Containers (sometimes called collections) are a staple of computer programming. Every major programming language has fundamental containers, such as arrays or lists. Modern programming languages usually have an assortment of more powerful containers for more specialized needs, such as trees.
This chapter presents the standard containers, the iterators used to examine containers, and the algorithms that can be used with the iterators.
The fundamental purpose of a container is to store multiple objects in a single container object. Different kinds of containers have different characteristics: speed, size, and ease of use. The choice of container depends on the characteristics and behavior you require.
In C++, the containers are class templates, so you can store anything in a container. (Well, almost anything. The type must have a public copy constructor and assignment operator. Some containers impose additional restrictions.)
The standard containers fall into two categories: sequences and associations. A sequence container preserves the original order in which items were added to the container. An associative container keeps items in ascending order, to speed up searching.
The standard containers are as follows:
deque
    The header is <deque>.
list
    The header is <list>.
map, multimap
    A map requires unique keys. A multimap permits duplicate keys. The header for map and multimap is <map>.
set, multiset
    The header for set and multiset is <set>.
basic_string, string, wstring
    Most programs treat the string and wstring types as strings, not containers of character values. Nonetheless, they meet the requirements of a sequence container, so you can use their iterators with the standard algorithms. The header is <string>.
vector
    The header is <vector>.
The set and map containers perform insertions, deletions, and searches in logarithmic time, which implies a tree or tree-like implementation. Items must be kept in sorted order, so a hash table implementation is not allowed. Many people consider the lack of a standard hash table to be a serious omission. When the C++ standard is revised, a hash table is likely to be added. Until then, there are plenty of hash table containers available on the Internet.
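The difference between a sequence container and an associative container is easy to see by loading the same values into each. The following is a minimal sketch; the function names as_sequence and as_sorted are made up for this illustration.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: copy the values into a sequence container.
// A vector keeps the items in the order in which they were added.
std::vector<int> as_sequence(const int* first, const int* last)
{
  return std::vector<int>(first, last);
}

// Hypothetical helper: copy the values into an associative container.
// A set keeps its items in ascending order, whatever the insertion order.
std::vector<int> as_sorted(const int* first, const int* last)
{
  std::set<int> s(first, last);
  return std::vector<int>(s.begin(), s.end());
}
```

Given the values 30, 10, 20, the first function returns them unchanged, and the second returns them in ascending order.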
In addition to the standard containers, the standard library has several container adapters. An adapter is a class template that uses a container for storage while providing container-like behavior. The standard adapters are priority_queue, queue, and stack. For more information, see the <queue> and <stack> sections in Chapter 13.
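As a rough illustration of an adapter, the sketch below uses a stack that stores its items in a vector instead of the default deque. The function name reversed is an assumption made for this example.

```cpp
#include <stack>
#include <vector>

// Hypothetical helper: reverse a vector by pushing every item onto a
// stack and popping the items off again. The second template argument
// tells the adapter which container to use for storage.
std::vector<int> reversed(const std::vector<int>& in)
{
  std::stack<int, std::vector<int> > st;
  for (std::vector<int>::const_iterator i = in.begin(); i != in.end(); ++i)
    st.push(*i);

  std::vector<int> out;
  while (!st.empty()) {
    out.push_back(st.top());   // top() reads the most recent item
    st.pop();                  // pop() discards it
  }
  return out;
}
```

The adapter exposes only push, pop, top, and a few other members, so the underlying vector cannot be examined with iterators.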
The standard library has a few class templates that are similar to the standard containers but fail one or more of the requirements for a standard container. In particular, bitset, valarray, and vector<bool> are container-like types that fall short of those requirements.
The bitset type represents a bitmask of arbitrary size. The size is fixed when the bitset is declared. There are no bitset iterators, so you cannot use a bitset with the standard algorithms. See <bitset> in Chapter 13 for details.
A valarray is an array of numeric values, optimized for computational efficiency. As part of the optimization, the compiler is free to make assumptions that prevent the use of valarray with the standard algorithms. See <valarray> in Chapter 13 for details.
The vector<bool> type is a specialization of the vector template. Although vector<> usually meets the requirements of a standard container, the vector<bool> specialization does not. See <vector> in Chapter 13 for details.
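If you write your own container, be sure to follow the conventions and rules laid down for the standard containers. In particular, your container should define the following types:
const_iterator
    The iterator type for const values.
const_reference
    A const lvalue type for the items stored in the container. Typically the same as the allocator's const_reference type.
difference_type
    A signed integral type that represents the distance between two iterators.
iterator
    The iterator type.
reference
    An lvalue type for the items stored in the container. Typically the same as the allocator's reference type.
size_type
    An unsigned integral type that can hold any non-negative difference_type value.
value_type
    The type of the items stored in the container.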
A container that supports bidirectional iterators should also define the reverse_iterator and const_reverse_iterator types.
An associative container should define key_type as the key type, key_compare as the key compare function, and value_compare as a function that compares two value_type objects.
Optionally, a container can declare pointer and const_pointer as synonyms for the allocator's types of the same name, and allocator_type for the allocator, which is typically the last template parameter.
A custom container should have standard default and copy constructors. The default constructor initializes the container to be empty. The copy constructor initializes the container with a copy of all the items from the other container. All containers should also have a constructor template that takes two arguments of the same type:
template<typename InIter> container(InIter first, InIter last)
    If InIter is an integral type, the container is initialized with first copies of last (converted to value_type). Otherwise, InIter must be an input iterator, and the container is initialized with copies of all the items in the range [first, last).
A sequence container should have the following additional constructor:
container(size_type n, const value_type& x)
    Initializes the container with n copies of x.
An associative container should have the following additional constructors:
container(key_compare compare)
    Initializes an empty container that uses compare to compare keys.
template<typename InIter> container(InIter first, InIter last, key_compare compare)
    Initializes the container with copies of the items in the range [first, last), comparing keys with compare.
Optionally, your container can be parameterized with an allocator, and all the constructors can take an optional, additional parameter to specify an allocator object.
The destructor should be sure to call the destructor for every object in the container.
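To make the conventions for nested types and constructors more concrete, here is a minimal sketch of a fixed-size container. The class name fixed_array is hypothetical, and the class is deliberately incomplete: it has no erase, clear, swap, or comparison functions, and it assumes N is greater than zero.

```cpp
#include <cstddef>

// Hypothetical fixed-size container that shows the standard nested types.
// This is an illustration only, not a complete standard container.
template<typename T, std::size_t N>
class fixed_array {
public:
  typedef T value_type;
  typedef T& reference;
  typedef const T& const_reference;
  typedef T* iterator;              // a pointer is a random access iterator
  typedef const T* const_iterator;
  typedef std::size_t size_type;
  typedef std::ptrdiff_t difference_type;

  // Value-initialize every item. The compiler-generated copy
  // constructor, assignment operator, and destructor are adequate.
  fixed_array() : data_() {}

  iterator begin() { return data_; }
  const_iterator begin() const { return data_; }
  iterator end() { return data_ + N; }
  const_iterator end() const { return data_ + N; }

  size_type size() const { return N; }
  size_type max_size() const { return N; }

  reference operator[](size_type n) { return data_[n]; }
  const_reference operator[](size_type n) const { return data_[n]; }

private:
  T data_[N];
};
```

Because the iterators are plain pointers, a fixed_array<int, 3> can be passed to the standard algorithms through begin() and end(), even though the class falls short of a full container.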
All containers should have the following member functions:
iterator begin()
const_iterator begin() const
    Returns an iterator that points to the first item in the container.
void clear()
    Erases all the items in the container.
bool empty() const
    Returns true if the container is empty (size() == 0).
iterator end()
const_iterator end() const
    Returns an iterator that points to one position past the last item in the container.
erase(iterator p)
erase(iterator first, iterator last)
    Erases the item that p points to or all the items in the range [first, last). For a sequence container, erase returns an iterator that points to the item that comes immediately after the last deleted item or end(). For an associative container, erase does not return a value.
size_type max_size() const
    Returns the largest number of items the container can possibly hold. Although most containers are constrained only by the limits of size_type, other container types might have a fixed maximum size, such as an array type.
container& operator=(const container& that)
    Erases all the items in this container and copies all the items from that.
size_type size() const
    Returns the number of items in the container.
void swap(container& that)
    Swaps the contents of this container with that.
All containers should have all the comparison functions defined, either as member functions or, preferably, as functions at the namespace level. Namespace-level functions offer more flexibility than member functions. For example, the compiler can use implicit type conversions on the left-hand operand, but only if the function is not a member function.
A container that supports bidirectional iterators should define rbegin() and rend() member functions to return reverse iterators.
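The advantage of namespace-level comparison functions can be sketched with a small wrapper class. The name int_bag and its converting constructor are assumptions made for this example; the point is only that the left-hand operand can be converted implicitly when the operator is not a member.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical container: a thin wrapper around std::vector<int>.
class int_bag {
public:
  typedef std::vector<int>::const_iterator const_iterator;
  typedef std::vector<int>::size_type size_type;

  int_bag() {}
  int_bag(int single) : items_(1, single) {}  // implicit conversion from int

  void add(int x) { items_.push_back(x); }
  const_iterator begin() const { return items_.begin(); }
  const_iterator end() const { return items_.end(); }
  size_type size() const { return items_.size(); }

private:
  std::vector<int> items_;
};

// Namespace-level operators: either operand can be converted implicitly.
inline bool operator==(const int_bag& a, const int_bag& b)
{
  return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
}

inline bool operator<(const int_bag& a, const int_bag& b)
{
  return std::lexicographical_compare(a.begin(), a.end(),
                                      b.begin(), b.end());
}
```

With these definitions, both b == 5 and 5 == b compile. If operator== were a member function, 5 == b would be rejected because the compiler does not convert the left-hand operand to find a member.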
The following functions are optional. You should provide only those functions that can run in constant time:
reference at(size_type n)
const_reference at(size_type n) const
    Returns the item at index n, or throws out_of_range if n >= size().
reference back()
const_reference back() const
    Returns the last item in the container.
reference front()
const_reference front() const
    Returns the first item in the container.
reference operator[](size_type n)
const_reference operator[](size_type n) const
    Returns the item at index n. Behavior is undefined if n >= size().
void pop_back()
    Erases the last item in the container.
void pop_front()
    Erases the first item in the container.
void push_back(const value_type& x)
    Inserts x as the new last item in the container.
void push_front(const value_type& x)
    Inserts x as the new first item in the container.
A sequence container should define the following member functions:
iterator insert(iterator p, const value_type& x)
    Inserts x immediately before p and returns an iterator that points to x.
void insert(iterator p, size_type n, const value_type& x)
    Inserts n copies of x before p.
template<typename InIter> void insert(iterator p, InIter first, InIter last)
    Copies the values from [first, last) and inserts them before p.
An associative container should define the following member functions:
size_type count(const key_type& k) const
    Returns the number of items equivalent to k.
pair<const_iterator,const_iterator> equal_range(const key_type& k) const
pair<iterator,iterator> equal_range(const key_type& k)
    Returns the equivalent of make_pair(lower_bound(k), upper_bound(k)).
size_type erase(const key_type& k)
    Erases all the items equivalent to k and returns the number of items erased.
const_iterator find(const key_type& k) const
iterator find(const key_type& k)
    Finds an item equivalent to k and returns an iterator that points to one such item, or end() if not found.
insert(const value_type& x)
    Inserts x. If the container permits duplicate keys, insert returns an iterator that points to the newly inserted item. If the container requires unique keys, insert returns pair<iterator,bool>, where the first element of the pair is an iterator that points to an item equivalent to x, and the second element is true if x was inserted or false if x was already present in the container.
iterator insert(iterator p, const value_type& x)
    Inserts x and returns an iterator that points to x. The iterator p is a hint for where x might belong.
template<typename InIter> void insert(InIter first, InIter last)
    Copies the items from [first, last) and inserts each item in the container.
key_compare key_comp() const
    Returns the key compare function or functor.
const_iterator lower_bound(const key_type& k) const
iterator lower_bound(const key_type& k)
    Returns an iterator that points to the first item in the container that does not come before k. That is, if k is in the container, the iterator points to the position of its first occurrence; otherwise the iterator points to the first position where k should be inserted.
value_compare value_comp() const
    Returns the value compare function or functor.
const_iterator upper_bound(const key_type& k) const
iterator upper_bound(const key_type& k)
    Returns an iterator that points to the first item in the container that comes after all occurrences of k.
Example 11-1 shows the slist container, which implements a singly-linked list. A singly-linked list requires slightly less memory than a doubly-linked list, but offers at best a forward iterator, not a bidirectional iterator.
Example 11-1: Implementing a custom container: a singly-linked list.
// Simple container for singly-linked lists.
template<typename T, typename Alloc = ::std::allocator<T> >
class slist {
  // Private type for a link (node) in the list.
  template<typename U>
  struct link {
    link* next;
    U value;
  };
  typedef link<T> link_type;

public:
  typedef typename Alloc::reference reference;
  typedef typename Alloc::const_reference const_reference;
  typedef typename Alloc::pointer pointer;
  typedef typename Alloc::const_pointer const_pointer;
  typedef Alloc allocator_type;
  typedef T value_type;
  typedef size_t size_type;
  typedef ptrdiff_t difference_type;

  class iterator;        // See Iterators, later in this
  class const_iterator;  // chapter for the iterators.

  slist(const slist& that);
  slist(const Alloc& alloc = Alloc());
  slist(size_type n, const T& x, const Alloc& alloc = Alloc());
  template<typename InputIter>
  slist(InputIter first, InputIter last,
        const Alloc& alloc = Alloc());
  ~slist() { clear(); }
  slist& operator=(const slist& that);
  allocator_type get_allocator() const { return alloc_; }

  iterator begin() { return iterator(0, head_); }
  const_iterator begin() const
    { return const_iterator(0, head_); }
  iterator end() { return iterator(0, 0); }
  const_iterator end() const { return const_iterator(0, 0); }

  void pop_front() { erase(begin()); }
  void push_front(const T& x) { insert(begin(), x); }
  const_reference front() const { return head_->value; }
  reference front() { return head_->value; }

  iterator insert(iterator p, const T& x);
  void insert(iterator p, size_type n, const T& x);
  template<typename InputIter>
  void insert(iterator p, InputIter first, InputIter last);
  iterator erase(iterator p);
  iterator erase(iterator first, iterator last);
  void clear() { erase(begin(), end()); }

  bool empty() const { return size() == 0; }
  size_type max_size() const
    { return ::std::numeric_limits<size_type>::max(); }
  void resize(size_type sz, const T& x = T());
  size_type size() const { return count_; }
  void swap(slist& that);

private:
  typedef typename
    allocator_type::template rebind<link_type>::other
    link_allocator_type;

  link_type* newitem(const T& x, link_type* next = 0);
  void delitem(link_type* item);

  template<typename InputIter>
  void construct(InputIter first, InputIter last,
                 is_integer_tag);
  template<typename InputIter>
  void construct(InputIter first, InputIter last,
                 is_not_integer_tag);

  link_type* head_;
  link_type* tail_;
  size_t count_;
  allocator_type alloc_;
  link_allocator_type linkalloc_;
};

// Constructor. If InputIter is an integral type, the
// standard requires the constructor to interpret first
// and last as a count and value, and perform the
// slist(size_type, T) constructor. Use the is_integer
// trait to dispatch to the appropriate construct function,
// which does the real work.
template<typename T, typename A>
template<typename InputIter>
slist<T,A>::slist(InputIter first, InputIter last,
                  const A& alloc)
: head_(0), tail_(0), count_(0),
  alloc_(alloc), linkalloc_(link_allocator_type())
{
  construct(first, last, typename is_integer<InputIter>::tag());
}

template<typename T, typename A>
template<typename InputIter>
void slist<T,A>::construct(InputIter first, InputIter last,
                           is_integer_tag)
{
  insert(begin(), static_cast<size_type>(first),
         static_cast<T>(last));
}

template<typename T, typename A>
template<typename InputIter>
void slist<T,A>::construct(InputIter first, InputIter last,
                           is_not_integer_tag)
{
  insert(begin(), first, last);
}

// Private function to allocate a new link node.
template<typename T, typename A>
typename slist<T,A>::link_type*
slist<T,A>::newitem(const T& x, link_type* next)
{
  link_type* item = linkalloc_.allocate(1);
  item->next = next;
  alloc_.construct(&item->value, x);
  return item;
}

// Private function to release a link node.
template<typename T, typename A>
void slist<T,A>::delitem(link_type* item)
{
  alloc_.destroy(&item->value);
  linkalloc_.deallocate(item, 1);
}

// Basic insertion function. All insertions eventually find
// their way here. Inserting at the head of the list
// (p == begin()) must set the head_ member.
// Inserting at the end of the list (p == end()) means
// appending to the list, which updates the tail_'s next
// member, and then sets tail_. Anywhere else in the list
// requires updating p.prev_->next. Note that inserting into
// an empty list looks like inserting at end(). Return an
// iterator that points to the newly inserted node.
template<typename T, typename A>
typename slist<T,A>::iterator
slist<T,A>::insert(iterator p, const T& x)
{
  // Allocate the new link before changing any pointers. If
  // newitem throws an exception, the list is not affected.
  link_type* item = newitem(x, p.node_);
  if (p.node_ == 0) {
    p.prev_ = tail_;
    // at end
    if (tail_ == 0)
      head_ = tail_ = item;   // empty list
    else {
      tail_->next = item;
      tail_ = item;
    }
  }
  else if (p.prev_ == 0)
    head_ = item;             // new head of list
  else
    p.prev_->next = item;
  p.node_ = item;
  ++count_;
  return p;
}

// Erase the item at p. All erasures come here eventually.
// If erasing begin(), update head_.
// If erasing the last item in the list, update tail_.
// Update the iterator to point to the node after the one
// being deleted.
template<typename T, typename A>
typename slist<T,A>::iterator
slist<T,A>::erase(iterator p)
{
  link_type* item = p.node_;
  p.node_ = item->next;
  if (p.prev_ == 0)
    head_ = item->next;
  else
    p.prev_->next = item->next;
  if (item->next == 0)
    tail_ = p.prev_;
  --count_;
  delitem(item);
  return p;
}

// Comparison functions are straightforward.
template<typename T>
bool operator==(const slist<T>& a, const slist<T>& b)
{
  return a.size() == b.size() &&
         ::std::equal(a.begin(), a.end(), b.begin());
}
A container holds stuff. Naturally, you need to know how to add stuff to a container, remove stuff from a container, find stuff in a container, and so on.
To add an item to a container, call an insert member function. Sequence containers might also have push_front or push_back to insert an item at the beginning or end of the sequence. The push_front and push_back members exist only if they can be implemented in constant time. (Thus, for example, vector does not have push_front.)
Every container has an insert(iter, item) function, where iter is an iterator and item is the item to insert. A sequence container inserts the item before the indicated position. Associative containers treat the iterator as a hint: if the item belongs immediately after the iterator's position, performance is constant instead of logarithmic.
Sequence containers have other insert functions, to insert many copies of an item at a position and to copy a range to a given position. Associative containers can copy a range into the container.
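The following sketch contrasts insertion into a sequence container with insertion into an associative container that requires unique keys. The function names insert_in_sequence and insert_once are invented for this illustration.

```cpp
#include <list>
#include <set>
#include <utility>

// Sequence container: the new item goes immediately before the
// position that the iterator denotes. Builds the list 10, 15, 20.
std::list<int> insert_in_sequence()
{
  std::list<int> l;
  l.push_back(10);
  l.push_back(20);
  std::list<int>::iterator pos = l.end();
  --pos;                // pos points to 20
  l.insert(pos, 15);    // 15 is inserted before 20
  return l;
}

// Associative container with unique keys: insert returns a pair whose
// second member tells whether the item was actually added.
bool insert_once(std::set<int>& s, int x)
{
  std::pair<std::set<int>::iterator, bool> result = s.insert(x);
  return result.second;
}
```

Calling insert_once twice with the same value returns true the first time and false the second time, and the set still holds only one copy.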
To remove an item from a container, call an erase member function. All containers have erase functions that take a single iterator (to delete the item that the iterator points to) and two iterators (to delete every item in the range). Associative containers also have an erase function that takes a key as an argument and erases all matching items.
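As a small sketch of the erase variations, the functions below erase by key from a multiset and erase a range from a vector. The names erase_all and erase_first_n are hypothetical.

```cpp
#include <set>
#include <vector>

// Erase by key: removes every matching item and returns how many
// items were removed.
std::multiset<int>::size_type erase_all(std::multiset<int>& ms, int key)
{
  return ms.erase(key);
}

// Erase a range: removes the first n items of a vector, or all the
// items if the vector holds fewer than n.
void erase_first_n(std::vector<int>& v, std::vector<int>::size_type n)
{
  if (n > v.size())
    n = v.size();
  v.erase(v.begin(), v.begin() + n);   // erases [begin, begin + n)
}
```

Note that the range form erases the half-open range, so the item at position begin + n, if any, becomes the new first item.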
Among the standard algorithms are remove and remove_if. Their names are suggestive, but misleading. They do not remove anything from the container. Instead, they rearrange the elements so the items to keep are packed at the front of the range, and they return an iterator that points to the first item to be erased. Call erase with this iterator as the first argument and end() as the second to erase the items from the container. This two-step process is needed because an iterator cannot erase anything. The only way to erase an item from a container is to call a member function of the container, and the standard algorithms do not have access to the containers, only iterators. Example 11-2 shows how to implement a generic erase function that calls remove and then the erase member function.
Example 11-2: Removing matching items from a sequence container.
#include <algorithm>
#include <functional>
#include <list>

// Erase all items from container c that are equal to item.
template<template<typename T, typename A> class C,
         typename T, typename A>
void erase(C<T,A>& c, const T& item)
{
  c.erase(std::remove(c.begin(), c.end(), item), c.end());
}

// Erase all items from container c for which pred returns true.
template<template<typename T, typename A> class C,
         typename T, typename A, typename Pred>
void erase_if(C<T,A>& c, Pred pred)
{
  c.erase(std::remove_if(c.begin(), c.end(), pred), c.end());
}

int main()
{
  std::list<int> l;
  // ...
  // Erase all items == 20.
  erase(l, 20);
  // ...
  // Erase all items < 20.
  erase_if(l, std::bind2nd(std::less<int>(), 20));
  // ...
}
The standard algorithms provide several different ways to search for items in a container: adjacent_find, find, find_end, find_first_of, find_if, search, and search_n. These algorithms essentially perform a linear search of a range. If you know exactly which item you want, you can search an associative container much faster by calling the find member function. For example, suppose you want to write a generic function, contains, that tells you whether a container contains at least one instance of an item. Example 11-3 shows one way to implement this function.
Example 11-3: Determining whether a container contains an item.
// Need a type trait to tell us which containers are
// associative and which are not.
struct associative_container_tag {};
struct sequence_container_tag {};
struct unknown_container_tag {};

// The trait is keyed on the complete container type, so it
// works no matter how many template parameters the container
// template takes (list has two, set has three).
template<typename C>
struct is_associative
{
  typedef unknown_container_tag tag;
};

template<typename T, typename A>
struct is_associative<std::list<T,A> >
{
  typedef sequence_container_tag tag;
};
// ditto for vector and deque

template<typename K, typename Cmp, typename A>
struct is_associative<std::set<K,Cmp,A> >
{
  typedef associative_container_tag tag;
};
// ditto for multiset, map, and multimap

template<typename C, typename T>
inline bool do_contains(const C& c, const T& item,
                        const associative_container_tag&)
{
  return c.end() != c.find(item);
}

template<typename C, typename T>
inline bool do_contains(const C& c, const T& item,
                        const sequence_container_tag&)
{
  return c.end() != ::std::find(c.begin(), c.end(), item);
}

template<typename C, typename T>
inline bool do_contains(const C& c, const T& item,
                        const unknown_container_tag&)
{
  return c.end() != ::std::find(c.begin(), c.end(), item);
}

// Here is the actual contains function. It dispatches
// to do_contains, picking the appropriate overloaded
// function depending on the type of the container c.
template<typename C, typename T>
bool contains(const C& c, const T& item)
{
  return do_contains(c, item, typename is_associative<C>::tag());
}
As you can see, iterators are important for using containers. You need them to insert at a specific position, identify an item for erasure, or specify ranges for algorithms. The next section discusses iterators in more depth.
An iterator is a kind of smart pointer for pointing into containers and other sequences. An ordinary pointer can point to different elements in an array. The ++ operator advances the pointer to the next element, and the * operator dereferences the pointer to return a value from the array. Iterators generalize the concept so the same operators have the same behavior for any container, even trees and lists. This section describes iterators in general. See the <iterator> section of Chapter 13 for more details.
There are five categories of iterators:
Input
    Permits you to read a sequence in one pass. The increment (++) operator advances to the next element, but there is no decrement operator. The dereference (*) operator does not return an lvalue, so you can read elements but not modify them.
Output
    Permits you to write a sequence in one pass. The increment (++) operator advances to the next element, but there is no decrement operator. You can dereference an element only to assign a value to it. You cannot compare output iterators.
Forward
    Permits unidirectional access to a sequence. You can refer to and assign to an item as many times as you want, and you can make multiple passes over the sequence.
Bidirectional
    Similar to a forward iterator, but also supports the -- (decrement) operator to move the iterator backward by one position.
Random access
    Similar to a bidirectional iterator, but also supports the [] (subscript) operator to access any index in the sequence. Also, you can add or subtract an integer to move a random access iterator by more than one position at a time. Subtracting two random access iterators yields an integer distance between them. Thus, a random access iterator is most like a conventional pointer, and a pointer can be used as a random access iterator.
An input, forward, bidirectional, or random access iterator can be a constant iterator. Dereferencing a constant iterator yields a constant lvalue.
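Generic code can be written against the weakest category it needs and still take advantage of stronger iterators. The sketch below finds the middle of a range; the function name middle is an assumption made for this example. It relies on std::distance and std::advance, which take constant time for random access iterators and linear time for other categories.

```cpp
#include <iterator>

// Hypothetical helper: return an iterator that points to the middle
// of the range [first, last). Any forward iterator is acceptable.
template<typename FwdIter>
FwdIter middle(FwdIter first, FwdIter last)
{
  typename std::iterator_traits<FwdIter>::difference_type n =
    std::distance(first, last);
  std::advance(first, n / 2);
  return first;
}
```

Because a pointer can be used as a random access iterator, middle works on a plain array as well as on a list or vector.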
The most important point to remember about iterators is that they are inherently unsafe. Like pointers, an iterator can point to a container that has been destroyed or to an element that has been erased. You can advance an iterator past the end of the container the same way a pointer can point past the end of an array. With a little care and caution, however, iterators are safe to use.
The first key to safe use of iterators is to make sure a program never dereferences an iterator that marks the end of a range. Two iterators can denote a range of values, typically in a container. One iterator points to the start of the range and another marks the end of the range by pointing to a position one past the last element in the range. The mathematical notation [first, last) tells you that the item that first points to is included in the range, but the item that last points to is excluded from the range.
A program must never dereference an iterator that is pointing to one past the end of a range (e.g., last) because that iterator might not be valid. It might be pointing to one past the end of the elements of a container, for example.
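A loop over a half-open range follows a fixed pattern: compare against last before every dereference, and never dereference last itself. The sketch below shows the pattern; the function name sum is made up for this example, and the standard accumulate algorithm in <numeric> does the same job.

```cpp
// Hypothetical helper: add up the items in the half-open range
// [first, last), starting from init.
template<typename InIter, typename T>
T sum(InIter first, InIter last, T init)
{
  for ( ; first != last; ++first)   // stop before reaching last
    init = init + *first;           // first is never equal to last here
  return init;
}
```

An empty range, where first equals last, is handled naturally: the loop body never runs, and the function returns init.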
Even a valid iterator can become invalid and therefore unsafe to use, for example if the container is destroyed. The detailed descriptions in Chapter 13 tell you this information for each container type. In general, iterators for the node-based containers (list, set, multiset, map, multimap) become invalid only when they point to an erased node. Iterators for the array-based containers (deque, vector) become invalid when the underlying array is reallocated, which might happen for any insertion and for some erasures.
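The invalidation rules affect how you write a loop that erases items as it goes. The following sketch shows one common pattern for an array-based container and one for a node-based container; the function names erase_negative and erase_empty are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// vector: erasing invalidates the iterator, so continue with the
// iterator that erase returns.
void erase_negative(std::vector<int>& v)
{
  for (std::vector<int>::iterator i = v.begin(); i != v.end(); ) {
    if (*i < 0)
      i = v.erase(i);   // points to the item after the erased one
    else
      ++i;
  }
}

// map: only the iterator for the erased node becomes invalid, so
// advance the iterator before the node is destroyed.
void erase_empty(std::map<int, std::string>& m)
{
  for (std::map<int, std::string>::iterator i = m.begin();
       i != m.end(); ) {
    if (i->second.empty())
      m.erase(i++);     // erase gets the old position; i has moved on
    else
      ++i;
  }
}
```

In both loops the increment is left out of the for statement so that an item is never skipped after an erasure.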
Iterators are often used with containers, but they have many more uses. You can define iterators for almost any sequence of objects. The standard library includes several examples of non-container iterators, most notably I/O iterators.
At the lowest level, a stream is nothing more than a sequence of characters. At a slightly higher level, you can think of a stream as a sequence of objects, which would be read with operator>> or written with operator<<. Thus, the standard library includes the following I/O iterators: istreambuf_iterator, ostreambuf_iterator, istream_iterator, ostream_iterator. Example 11-4 shows how to use streambuf iterators to copy one stream to another.
Example 11-4: Copying streams with streambuf iterators.
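#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>

// Copy every character from in to out, bypassing formatted I/O.
template<typename charT, typename traits>
void copy(std::basic_ostream<charT,traits>& out,
          std::basic_istream<charT,traits>& in)
{
  std::copy(std::istreambuf_iterator<charT>(in),
            std::istreambuf_iterator<charT>(),
            std::ostreambuf_iterator<charT>(out));
}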
A special kind of output iterator is an insert iterator, which inserts items in a sequence collection. The insert iterator requires a container, and an optional iterator to specify the position where the new items should be inserted. You can insert at the back of a sequence with back_insert_iterator, at the front of a sequence with front_insert_iterator, or at a specific position with insert_iterator. Each of these iterator class templates has an associated function template that creates the object for you, letting the compiler infer the type. Example 11-5 shows how to read a series of numbers from a stream, and store them all at the end of a vector.
Example 11-5: Inserting numbers in a vector.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main()
{
  using namespace std;
  vector<double> data;
  copy(istream_iterator<double>(cin),
       istream_iterator<double>(),
       back_inserter(data));
  // use the data...
  // Write the data, one number per line.
  copy(data.begin(), data.end(),
       ostream_iterator<double>(cout, "\n"));
}
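The other insert iterators work the same way as back_inserter. The sketch below uses the front_inserter and inserter function templates; the function names copy_to_front and copy_to_set are invented for this illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <set>
#include <vector>

// front_inserter calls push_front for each item, so the copy ends up
// in reverse order.
std::list<int> copy_to_front(const std::vector<int>& src)
{
  std::list<int> result;
  std::copy(src.begin(), src.end(), std::front_inserter(result));
  return result;
}

// inserter calls insert(position, item). For a set, the position is
// only a hint, and duplicate items are silently dropped.
std::set<int> copy_to_set(const std::vector<int>& src)
{
  std::set<int> result;
  std::copy(src.begin(), src.end(),
            std::inserter(result, result.begin()));
  return result;
}
```

The front_inserter function requires a container that has push_front, so it works with list and deque but not with vector.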
The simplest way to write your own iterator is to derive from the iterator class template, specializing it for your iterator category. The slist container from Example 11-1 needs an iterator and a const_iterator. The only difference is that a const_iterator returns rvalues instead of lvalues. Much of the iteration logic can be factored into a base class. Example 11-6 shows iterator and iterator_base; const_iterator is almost identical to iterator, so it is not shown.
Example 11-6: Writing a custom iterator.
// The declaration for iterator_base is nested in slist.
class iterator_base :
  public std::iterator< ::std::forward_iterator_tag, T> {
  friend class slist;
public:
  bool operator==(const iterator_base& i) const
    { return node_ == i.node_; }
  bool operator!=(const iterator_base& i) const
    { return ! (*this == i); }
protected:
  iterator_base(const iterator_base& i)
    : node_(i.node_), prev_(i.prev_) {}
  iterator_base(slist::link_type* prev, slist::link_type* node)
    : node_(node), prev_(prev) {}
  // If node_ == 0, the iterator == end().
  slist::link_type* node_;
  // A pointer to the node before node_ is needed to support
  // erase(). If prev_ == 0, the iterator points to the head
  // of the list.
  slist::link_type* prev_;
private:
  iterator_base();
};

// The declaration for iterator is nested in slist.
class iterator : public iterator_base {
  friend class slist;
public:
  iterator(const iterator& i) : iterator_base(i) {}
  iterator& operator++() {       // pre-increment
    this->prev_ = this->node_;
    this->node_ = this->node_->next;
    return *this;
  }
  iterator operator++(int) {     // post-increment
    iterator tmp = *this;
    operator++();
    return tmp;
  }
  T& operator*() { return this->node_->value; }
  T* operator->() { return &this->node_->value; }
private:
  iterator(slist::link_type* prev, slist::link_type* node)
    : iterator_base(prev, node) {}
};
Every container must provide an iterator type and a const_iterator type. Functions such as begin() and end() return iterator when called on a non-const container and return const_iterator when called on a const container.
Note that a const_iterator (with underscore) is quite different from a const iterator (without underscore). A const iterator is a constant object of type iterator. Being constant, it cannot change, so it cannot advance to point to a different position. A const_iterator, on the other hand, is a non-const object of type const_iterator. It is not constant, so it can change value. The key difference between iterator and const_iterator is that dereferencing an iterator yields a reference to a T object, and dereferencing a const_iterator yields a reference to const T. The standard requires that a plain iterator be convertible to const_iterator, but not the other way.
The problem is that some members of the standard containers (most notably erase and insert) take iterator as parameters, not const_iterator. If you have a const_iterator, you cannot use it as an insertion or erasure position.
Another problem is that it might be difficult to compare an iterator with a const_iterator. If the compiler reports an error when you try to compare iterators for equality or inequality, try swapping the order of the iterators, that is, if a == b fails to compile, try b == a. The problem, most likely, is that b is a const_iterator and a is a plain iterator. By swapping the order, you let the compiler convert a to a const_iterator and allow the comparison.
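The distinction between a const_iterator and a const iterator is easier to see in code. The following sketch uses vector; the function names count_positive and zero_first are assumptions made for this example.

```cpp
#include <vector>

// A const_iterator can move from item to item, but the items it
// points to are read-only.
int count_positive(const std::vector<int>& v)
{
  int n = 0;
  for (std::vector<int>::const_iterator i = v.begin();
       i != v.end(); ++i)
    if (*i > 0)        // reading is fine; *i = 0 would not compile
      ++n;
  return n;
}

// A const iterator cannot move, but the item it points to can be
// modified.
void zero_first(std::vector<int>& v)
{
  if (v.empty())
    return;
  const std::vector<int>::iterator first = v.begin();
  *first = 0;          // okay; ++first would not compile
}
```

In count_positive, begin() and end() return const_iterator because v is a reference to a const container.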
For a full explanation of how best to work with const_iterators, see Scott Meyers's Effective STL (Addison-Wesley, 2001).
The standard library includes the reverse_iterator class template as a convenient way to implement the reverse_iterator type in certain containers. A reverse iterator, as its name implies, is an iterator adapter that runs in the reverse direction of the adapted iterator. The adapted iterator must be a bidirectional or random access iterator.
On paper, the reverse iterator seems like a good idea. After all, a bidirectional iterator can run in two directions. There is no reason why an iterator adapter could not implement operator++ by calling the adapted iterator's operator-- function.
Reverse iterators share a problem with const_iterators, namely that several members, such as insert and erase, do not take an iterator template parameter, but require the exact iterator type, as declared in the container class. The reverse_iterator type is not accepted, so you must pass the adapted iterator instead, which is returned from the base() function.
As an insertion point, the base() iterator works fine, but for erasing, it is one off from the desired position. The solution is to increment the reverse iterator, then call base(), as shown in Example 11-7.
Example 11-7: Using a reverse iterator.
#include <algorithm>
#include <list>

int main()
{
  std::list<int> l;
  l.push_back(10);
  l.push_back(42);
  l.push_back(99);
  print(l);   // print is a helper whose definition is not shown.

  std::list<int>::reverse_iterator ri;
  ri = std::find(l.rbegin(), l.rend(), 42);
  l.insert(ri.base(), 33);
  // Okay: 33 inserted before 42, from the point of view
  // of a reverse iterator, that is, 33 inserted after 42.

  ri = std::find(l.rbegin(), l.rend(), 42);
  l.erase(ri.base());
  // Oops! Item 33 is deleted, not item 42.

  ri = std::find(l.rbegin(), l.rend(), 42);
  l.erase((++ri).base());
  // That's right! In order to delete the item ri points to,
  // you must advance ri first, then delete the item.
}
For a full explanation of how best to work with reverse iterators, see Scott Meyers's Effective STL (Addison-Wesley, 2001).
The so-called algorithms in the standard library set off C++ from other programming languages. Every major programming language has a set of containers, but in the traditional object-oriented approach, each container defines the operations that are permitted, e.g., sorting, searching, modifying. C++ turns object-oriented programming on its head and provides a set of function templates, called algorithms, that work with iterators, and therefore with almost any container.
The advantage of the C++ approach is that the library can contain a rich set of algorithms, where each algorithm can be written once and work with any kind of container. The set of algorithms is easily extensible without touching the container classes. Another benefit is that the algorithms work with iterators, not containers, so even non-container iterators (such as the stream iterators) can participate.
C++ algorithms have one glaring disadvantage, though. Remember that iterators, like pointers, can be unsafe. Algorithms use iterators, and therefore are equally unsafe. Pass the wrong iterator to an algorithm, and the algorithm cannot detect the error and will produce undefined behavior. Fortunately, most uses of algorithms make it easy to avoid programming errors.
Most of the standard algorithms are declared in the <algorithm> header, with some numerical algorithms in <numeric>. Refer to the respective section of Chapter 13 for details; this section presents the properties of algorithms in general.
The generic algorithms all work in a similar fashion. They are all function templates, where the template parameters include one or more iterators. Because the algorithms are templates, you can instantiate the function with any template argument that meets the basic requirements.
For example, for_each is declared as follows:
template<typename InIter, typename Function> Function for_each(InIter first, InIter last, Function func);
The names of the template parameters tell you what is expected as template arguments: InIter must be an input iterator, and Function must be a function pointer or functional object. The documentation for for_each further tells you that Function must take one argument whose type is the value_type of InIter. That's all. The InIter argument can be anything that meets the requirements of an input iterator. Notice that no container is mentioned in the declaration or documentation of for_each. You can use an istream_iterator, for example.
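As a sketch of how for_each can work without any container, the following reads integers straight from a stream. The names Summer and sum_of_ints are hypothetical, and a string stream stands in for whatever input stream a real program would use.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical functor: adds up every value it is called with.
class Summer {
public:
    Summer() : total_(0) {}
    void operator()(int x) { total_ += x; }
    int total() const { return total_; }
private:
    int total_;
};

// Sum the whitespace-separated integers in text. No container is involved:
// for_each reads directly from the stream through istream_iterator.
int sum_of_ints(const std::string& text)
{
    std::istringstream in(text);
    Summer result = std::for_each(std::istream_iterator<int>(in),
                                  std::istream_iterator<int>(),
                                  Summer());
    return result.total();
}

// Usage sketch.
void sum_of_ints_example()
{
    assert(sum_of_ints("1 2 3 4") == 10);
    assert(sum_of_ints("") == 0);
}
```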
For a programmer trained in traditional object-oriented programming, the flexibility of the standard algorithms might seem strange or backwards. Thinking in terms of algorithms takes some adjustment.
Think about how you would read a stream of numbers into a data array. Typically, you would set up a while loop to read the input stream, and for each number read, add the number to the array. Now rethink the problem in terms of an algorithmic solution. What you are actually doing is copying data from an input stream to an array, so use the copy algorithm as follows:
std::copy(std::istream_iterator<double>(stream), std::istream_iterator<double>(), std::back_inserter(data));
The copy algorithm copies all the items from one range to another. The input comes from an istream_iterator, which is an iterator interface for reading from an istream. The output range is a back_insert_iterator (created by the back_inserter function), which is an output iterator that pushes items onto a container.
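To make the comparison concrete, here is a minimal sketch that places the loop and the copy call side by side. The function names read_with_loop, read_with_copy, and read_example are hypothetical; both functions are assumed to receive a stream of whitespace-separated numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Loop version: read one number at a time and append it to data.
std::vector<double> read_with_loop(std::istream& stream)
{
    std::vector<double> data;
    double x;
    while (stream >> x)
        data.push_back(x);
    return data;
}

// Algorithm version: copy from the stream to the back of data.
std::vector<double> read_with_copy(std::istream& stream)
{
    std::vector<double> data;
    std::copy(std::istream_iterator<double>(stream),
              std::istream_iterator<double>(),
              std::back_inserter(data));
    return data;
}

// Usage sketch: both versions produce the same contents.
void read_example()
{
    std::istringstream a("1.5 2 3.25");
    std::istringstream b("1.5 2 3.25");
    std::vector<double> from_loop = read_with_loop(a);
    std::vector<double> from_copy = read_with_copy(b);
    assert(from_copy.size() == 3);
    assert(from_copy == from_loop);
}
```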
At first glance, the algorithmic solution doesn't seem any simpler or clearer than a straightforward loop. More complex examples demonstrate the value of the C++ algorithms.
For example, all major programming languages have a type for character strings. They typically also have a function for finding substrings. What about the more general problem of finding a subrange in any larger range? Suppose a researcher is looking for patterns in a data set and wants to see whether a small data pattern occurs in a larger data set. In C++, use the search algorithm:
std::vector<double> data;
...
if (std::search(data.begin(), data.end(),
                pattern.begin(), pattern.end()) != data.end())
{
  // found the pattern...
}
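A self-contained version of the same idea might look like the following sketch. The helper find_pattern is a hypothetical name; it reports the position of the first match as an index, and returns data.size() when the pattern does not occur.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first occurrence of pattern in data,
// or data.size() if the pattern does not occur.
std::size_t find_pattern(const std::vector<double>& data,
                         const std::vector<double>& pattern)
{
    std::vector<double>::const_iterator pos =
        std::search(data.begin(), data.end(),
                    pattern.begin(), pattern.end());
    return static_cast<std::size_t>(pos - data.begin());
}

// Usage sketch.
void find_pattern_example()
{
    const double d[] = { 1, 2, 3, 4, 2, 3 };
    const double p[] = { 2, 3 };
    const double q[] = { 3, 2 };
    std::vector<double> data(d, d + 6);
    std::vector<double> pattern(p, p + 2);
    std::vector<double> missing(q, q + 2);

    assert(find_pattern(data, pattern) == 1);            // first match wins
    assert(find_pattern(data, missing) == data.size());  // not found
}
```

The search algorithm compares elements with operator==, which is rarely what you want for floating-point measurements; an overload that takes a binary predicate lets you compare within a tolerance instead.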
A number of algorithms take a function pointer or functional object (that is, an object that has operator() overloaded) as one of the arguments. The algorithm calls the function and possibly uses the return value. For example, count_if counts the number of times the function returns a true (nonzero) result when applied to each element in a range:
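bool negative(double x)
{
  return x < 0;
}

// count_if returns the iterator's difference_type; iterator_traits
// obtains it portably, even if the iterator is a plain pointer.
std::iterator_traits<std::vector<double>::iterator>::difference_type neg_cnt;
std::vector<double> data;
...
neg_cnt = std::count_if(data.begin(), data.end(), negative);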
In spite of the unwieldy declaration for neg_cnt, the application of count_if to count the number of negative items in the data vector is easy to write and easy to read.
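The counting code can be packaged as a complete function, as in the sketch below. The names is_negative and count_negatives are hypothetical. The return type uses the container's own difference_type, which is the same type as the iterator's difference type and is a little shorter to spell.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Predicate passed to count_if as an ordinary function pointer.
bool is_negative(double x)
{
    return x < 0;
}

// Hypothetical helper: number of negative values in data.
std::vector<double>::difference_type
count_negatives(const std::vector<double>& data)
{
    return std::count_if(data.begin(), data.end(), is_negative);
}

// Usage sketch.
void count_negatives_example()
{
    const double d[] = { 3.5, -1.0, 0.0, -2.25, 7.0 };
    std::vector<double> data(d, d + 5);
    assert(count_negatives(data) == 2);
    assert(count_negatives(std::vector<double>()) == 0);
}
```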
If you don't want to write a function just for use with an algorithm, you might be able to use the standard functional objects (which are declared in the <functional> header). For example, the same count of negative values can be had with the following:
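std::iterator_traits<std::vector<double>::iterator>::difference_type neg_cnt;
std::vector<double> data;
...
// std::less<double>() constructs the functional object; the bare type
// name std::less<double> is not a value and would not compile here.
neg_cnt = std::count_if(data.begin(), data.end(),
                        std::bind2nd(std::less<double>(), 0.0));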
The std::less class template defines an operator() that takes two arguments and applies operator<. The bind2nd function template takes a two-argument functional object and binds a constant value (in this case 0.0) as the second argument, returning a one-argument functional object (which is what count_if requires). The use of standard functional objects can make the code harder to read, but also helps avoid writing one-off custom functions.
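To show what binding the second argument amounts to, here is a hand-written equivalent as a minimal sketch. LessThan and count_below are hypothetical names. The sketch deliberately avoids std::bind2nd itself, which was deprecated in C++11 and removed in C++17, so it should compile with older and newer compilers alike.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical functor: stores the second argument of std::less and
// supplies it on every call, leaving a one-argument predicate.
class LessThan {
public:
    explicit LessThan(double bound) : bound_(bound) {}
    bool operator()(double x) const
    {
        return std::less<double>()(x, bound_);
    }
private:
    double bound_;
};

// Hypothetical helper: number of values in data that are less than bound.
std::vector<double>::difference_type
count_below(const std::vector<double>& data, double bound)
{
    return std::count_if(data.begin(), data.end(), LessThan(bound));
}

// Usage sketch.
void count_below_example()
{
    const double d[] = { 3.5, -1.0, 0.0, -2.25, 7.0 };
    std::vector<double> data(d, d + 5);
    assert(count_below(data, 0.0) == 2);   // the negative values
    assert(count_below(data, 4.0) == 4);   // everything except 7.0
}
```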
When using functional objects, be very careful about objects that maintain state or have global side effects. Some algorithms copy the functional objects, and you must be sure the state is properly copied, too. The numerical algorithms do not permit functional objects that have side effects.
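The copying behavior is easy to observe with for_each, which takes its functional object by value. In this minimal sketch (Counter and counter_example are hypothetical names), the object passed in is left untouched and the accumulated state lives only in the returned copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor with state: counts how many times it is called.
class Counter {
public:
    Counter() : count_(0) {}
    void operator()(int) { ++count_; }
    std::size_t count() const { return count_; }
private:
    std::size_t count_;
};

// Usage sketch.
void counter_example()
{
    std::vector<int> v(5, 1);
    Counter original;
    Counter returned = std::for_each(v.begin(), v.end(), original);

    assert(original.count() == 0);   // for_each worked on its own copy
    assert(returned.count() == 5);   // the state is in the returned copy
}
```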
Example 11-8 shows one use for a functional object (or functor). It accumulates statistical data for computing the mean and variance of a data set. Pass an instance of Statistics to the for_each algorithm to accumulate the statistics. The copy that is returned from for_each contains the desired results.
Example 11-8: Computing statistics with a functor.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <vector>

template<typename T>
class Statistics {
public:
  typedef T value_type;
  Statistics() : n_(0), sum_(0), sumsq_(0) {}
  void operator()(T x)
  {
    ++n_;
    sum_ += x;
    sumsq_ += x * x;
  }
  std::size_t count() const { return n_; }
  T sum()   const { return sum_; }
  T sumsq() const { return sumsq_; }
  T mean()  const { return sum_ / n_; }
  // Sample variance; meaningful only after at least two values.
  T variance() const
  { return (sumsq_ - sum_*sum_ / n_) / (n_ - 1); }
private:
  std::size_t n_;
  T sum_;
  T sumsq_; // sum of squares
};

int main()
{
  using namespace std;

  vector<double> data;
  copy(istream_iterator<double>(cin), istream_iterator<double>(),
       back_inserter(data));

  // for_each returns a copy of the functor, which holds the results.
  Statistics<double> d =
    for_each(data.begin(), data.end(), Statistics<double>());
  cout << "count=" << d.count() << '\n';
  cout << "mean =" << d.mean() << '\n';
  cout << "var  =" << d.variance() << '\n';
  cout << "stdev=" << std::sqrt(d.variance()) << '\n';
  cout << "sum  =" << d.sum() << '\n';
  cout << "sumsq=" << d.sumsq() << '\n';
}
Chapter 13 describes all the algorithms in detail. This section presents a categorized summary of the algorithms.
If the algorithm has "_copy" in its name, it reads an input range and copies all or part of it to an output range. (The exception is the pair merge, which is a copy operation, and inplace_merge, which does not copy. The names reflect the fact that a copying merge is more common than an in-place merge.)
It is the programmer's responsibility to ensure the output range is large enough to accommodate the input.
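Two common ways to meet that responsibility are sketched below: size the destination before copying, or let an insert iterator grow it. The function names reversed, without, and copy_variants_example are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// reverse_copy writes exactly in.size() elements, so the destination
// can be given that size up front.
std::vector<int> reversed(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    std::reverse_copy(in.begin(), in.end(), out.begin());
    return out;
}

// remove_copy writes an unknown number of elements (it skips those equal
// to unwanted), so back_inserter lets the destination grow as needed.
std::vector<int> without(const std::vector<int>& in, int unwanted)
{
    std::vector<int> out;
    std::remove_copy(in.begin(), in.end(),
                     std::back_inserter(out), unwanted);
    return out;
}

// Usage sketch.
void copy_variants_example()
{
    const int a[] = { 1, 2, 3, 2 };
    std::vector<int> in(a, a + 4);

    std::vector<int> r = reversed(in);
    assert(r.size() == 4);
    assert(r[0] == 2 && r[1] == 3 && r[2] == 2 && r[3] == 1);

    std::vector<int> w = without(in, 2);
    assert(w.size() == 2);
    assert(w[0] == 1 && w[1] == 3);

    assert(in.size() == 4);   // the input range is left unchanged
}
```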
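If the algorithm name ends with "_if", the algorithm takes a function pointer or functional object that returns a Boolean result, in place of a value to compare against. The predicate is usually the final argument (replace_if and replace_copy_if take the replacement value last).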
Nonmodifying operations: count, count_if, for_each
Comparison: equal, lexicographical_compare, max, max_element, min, min_element, mismatch
Searching: adjacent_find, find, find_end, find_first_of, find_if, search, search_n
Binary search: binary_search, equal_range, lower_bound, upper_bound
Modifying sequence operations: copy, copy_backward, fill, fill_n, generate, generate_n, iter_swap, random_shuffle, remove, remove_copy, remove_copy_if, remove_if, replace, replace_copy, replace_copy_if, replace_if, reverse, reverse_copy, rotate, rotate_copy, swap_ranges, transform, unique, unique_copy
Sorting and partitioning: nth_element, partial_sort, partial_sort_copy, partition, sort, stable_partition, stable_sort
Merging: inplace_merge, merge
Set operations: includes, set_difference, set_intersection, set_symmetric_difference, set_union
Heap operations: make_heap, pop_heap, push_heap, sort_heap
Permutations: next_permutation, prev_permutation
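As one illustration of how algorithms from the summary combine, the following sketch uses sort, lower_bound, and binary_search to keep a vector in ascending order. The helper names insert_sorted, contains_sorted, and sorted_example are hypothetical; both helpers assume the vector is already sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Insert value into an already sorted vector, keeping it sorted.
// lower_bound finds the first position where value can go.
void insert_sorted(std::vector<int>& v, int value)
{
    v.insert(std::lower_bound(v.begin(), v.end(), value), value);
}

// Logarithmic-time membership test on a sorted vector.
bool contains_sorted(const std::vector<int>& v, int value)
{
    return std::binary_search(v.begin(), v.end(), value);
}

// Usage sketch.
void sorted_example()
{
    const int a[] = { 5, 1, 4 };
    std::vector<int> v(a, a + 3);
    std::sort(v.begin(), v.end());   // 1 4 5

    insert_sorted(v, 3);             // 1 3 4 5
    assert(v.size() == 4);
    assert(v[0] == 1 && v[1] == 3 && v[2] == 4 && v[3] == 5);

    assert(contains_sorted(v, 4));
    assert(!contains_sorted(v, 2));
}
```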
Writing your own algorithm is easy. Some care is always needed when writing function templates (as discussed in Chapter 8), but generic algorithms do not present any special or unusual challenges.
Probably the first generic algorithm most programmers write is copy_if, which was inexplicably omitted from the original standard (C++11 later added std::copy_if). The copy_if function copies an input range to an output range, but copies only the elements for which a predicate returns true. Example 11-9 shows a simple implementation of copy_if.
Example 11-9: One way to implement the copy_if function.
template<typename InIter, typename OutIter, typename Pred>
OutIter copy_if(InIter first, InIter last, OutIter result, Pred pred)
{
  for (; first != last; ++first)
    if (pred(*first)) {
      *result = *first;
      ++result;
    }
  return result;
}
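A usage sketch for such a function follows. It repeats the implementation inside a hypothetical namespace named sketch, because an unqualified call to a global copy_if with standard iterators can be ambiguous with std::copy_if on compilers that provide one. The names is_even, evens, and evens_example are also hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

namespace sketch {
// Copy the elements of [first, last) for which pred is true.
template<typename InIter, typename OutIter, typename Pred>
OutIter copy_if(InIter first, InIter last, OutIter result, Pred pred)
{
    for (; first != last; ++first)
        if (pred(*first)) {
            *result = *first;
            ++result;
        }
    return result;
}
} // namespace sketch

bool is_even(int x)
{
    return x % 2 == 0;
}

// Hypothetical helper: the even values of in, in their original order.
std::vector<int> evens(const std::vector<int>& in)
{
    std::vector<int> out;
    sketch::copy_if(in.begin(), in.end(), std::back_inserter(out), is_even);
    return out;
}

// Usage sketch.
void evens_example()
{
    const int a[] = { 1, 2, 3, 4, 6 };
    std::vector<int> in(a, a + 5);
    std::vector<int> out = evens(in);
    assert(out.size() == 3);
    assert(out[0] == 2 && out[1] == 4 && out[2] == 6);
}
```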
You can also specialize an algorithm. For example, you might be able to implement the algorithm more efficiently for a random access iterator. In that case, you can write helper functions and use the iterator_category trait to choose a specialized implementation. (Chapter 9 has more information about traits, including an example of using iterator traits to specialize a function.)
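The helper-function technique can be sketched as follows, using a range-length function as the example. The names sketch, length, and length_impl are hypothetical, and the function only illustrates the dispatch; the standard library already offers std::distance for this job.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

namespace sketch {

// General helper: step through the range one element at a time.
template<typename Iter>
typename std::iterator_traits<Iter>::difference_type
length_impl(Iter first, Iter last, std::input_iterator_tag)
{
    typename std::iterator_traits<Iter>::difference_type n = 0;
    for (; first != last; ++first)
        ++n;
    return n;
}

// Specialized helper: random access iterators can subtract directly.
template<typename Iter>
typename std::iterator_traits<Iter>::difference_type
length_impl(Iter first, Iter last, std::random_access_iterator_tag)
{
    return last - first;
}

// Public function: the iterator_category trait selects the helper.
template<typename Iter>
typename std::iterator_traits<Iter>::difference_type
length(Iter first, Iter last)
{
    return length_impl(first, last,
        typename std::iterator_traits<Iter>::iterator_category());
}

} // namespace sketch

// Usage sketch.
void length_example()
{
    std::vector<int> v(4, 0);   // random access: uses subtraction
    std::list<int> l(3, 0);     // bidirectional: uses the loop
    assert(sketch::length(v.begin(), v.end()) == 4);
    assert(sketch::length(l.begin(), l.end()) == 3);
}
```

Because the iterator tags form an inheritance hierarchy, forward and bidirectional iterators fall back to the input_iterator_tag helper without any extra overloads.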
The real trick in designing and writing algorithms is being able to generalize the problem, and then find an efficient solution. Before running off to write your own solution, check the standard library first. Your problem might already have a solution.
For example, I recently wanted to write an algorithm to find the median value in a range. There is no median algorithm, but there is nth_element, which solves the more general problem of finding the element at any sorted index. Writing median became a trivial matter of making a temporary copy of the data, calling nth_element, and then returning an iterator that points to the median value in the original range. Because median makes two passes over the input range, a forward iterator is required, as shown in Example 11-10.
Example 11-10: Finding the median of a range.
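template<typename FwdIter, typename Compare>
FwdIter median(FwdIter first, FwdIter last, Compare compare)
{
  // iterator_traits works for pointers as well as class-type iterators.
  typedef typename std::iterator_traits<FwdIter>::value_type value_type;
  std::vector<value_type> tmp(first, last);
  if (tmp.empty())
    return last;  // an empty range has no median
  typename std::vector<value_type>::size_type median_pos = tmp.size() / 2;
  std::nth_element(tmp.begin(), tmp.begin() + median_pos, tmp.end(),
                   compare);
  // find uses operator==, so value_type must be equality comparable.
  return std::find(first, last, tmp[median_pos]);
}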
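A usage sketch for a median function along these lines is shown below. It repeats the implementation inside a hypothetical namespace named sketch so that the block stands on its own, and it assumes the variant that returns last for an empty range. For a range with an even number of elements, the index size / 2 selects the upper of the two middle values.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <vector>

namespace sketch {
// Return an iterator to the median of [first, last), or last if empty.
template<typename FwdIter, typename Compare>
FwdIter median(FwdIter first, FwdIter last, Compare compare)
{
    typedef typename std::iterator_traits<FwdIter>::value_type value_type;
    std::vector<value_type> tmp(first, last);
    if (tmp.empty())
        return last;
    typename std::vector<value_type>::size_type pos = tmp.size() / 2;
    std::nth_element(tmp.begin(), tmp.begin() + pos, tmp.end(), compare);
    return std::find(first, last, tmp[pos]);
}
} // namespace sketch

// Usage sketch: a list works because only forward iteration is needed.
void median_example()
{
    const double odd[] = { 5, 1, 4, 2, 3 };
    std::list<double> a(odd, odd + 5);
    assert(*sketch::median(a.begin(), a.end(), std::less<double>()) == 3);

    const double even[] = { 4, 1, 3, 2 };
    std::list<double> b(even, even + 4);
    assert(*sketch::median(b.begin(), b.end(), std::less<double>()) == 3);

    std::list<double> none;
    assert(sketch::median(none.begin(), none.end(),
                          std::less<double>()) == none.end());
}
```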