Draft 2002-08-30

Chapter 11

Containers, Iterators, and Algorithms

Containers (sometimes called collections) are a staple of computer programming. Every major programming language has fundamental containers, such as arrays or lists. Modern programming languages usually have an assortment of more powerful containers for more specialized needs, such as trees.

This chapter presents the standard containers, the iterators used to examine containers, and the algorithms that can be used with the iterators.

Containers

The fundamental purpose of a container is to store multiple objects in a single container object. Different kinds of containers have different characteristics: speed, size, and ease of use. The choice of container depends on the characteristics and behavior you require.

In C++, the containers are class templates, so you can store anything in a container. (Well, almost anything. The type must have a public copy constructor and assignment operator. Some containers impose additional restrictions.)

Standard containers

The standard containers fall into two categories: sequences and associative containers. A sequence container preserves the order in which items were added to the container. An associative container keeps items sorted by key (in ascending order, by default) to speed up searching.
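To make the distinction concrete, the following sketch inserts the same three values into a vector (a sequence) and a set (an associative container) and reads them back in iteration order. The function names sequence_order and associative_order are illustrative only.

```cpp
#include <set>
#include <vector>

// A sequence keeps items in the order they were added.
std::vector<int> sequence_order()
{
  std::vector<int> v;
  v.push_back(3);
  v.push_back(1);
  v.push_back(2);
  return v;  // iteration order: 3 1 2
}

// An associative container keeps items sorted by key.
std::vector<int> associative_order()
{
  std::set<int> s;
  s.insert(3);
  s.insert(1);
  s.insert(2);
  // Copy the set out in iteration order: 1 2 3
  return std::vector<int>(s.begin(), s.end());
}
```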

The standard containers are as follows:

deque
A deque (double-ended queue) is a sequence container that supports fast insertions and deletions at the beginning and end of the container. Inserting or deleting at any other position is slower. Items are not stored contiguously. The header is <deque>.
list
A list is a sequence container that supports rapid insertion or deletion at any position, but does not support random access. Items are not stored contiguously. The header is <list>.
map
multimap
A map (or dictionary) is an associative container that stores pairs of keys and associated values. The keys determine the order of items in the container. A map requires unique keys. A multimap permits duplicate keys. The header for map and multimap is <map>.
set
multiset
A set is an associative container that stores keys in ascending order. A set requires unique keys. A multiset permits duplicate keys. The header for set and multiset is <set>.
basic_string
string
wstring
You will usually think of the string and wstring types as strings, not containers of character values. Nonetheless, they meet the requirements of a sequence container, so you can use their iterators with the standard algorithms. The header is <string>.
vector
A vector is a sequence container that is most like an array, except that the vector can grow as needed. Items can be rapidly added or removed only at the end. Items are stored contiguously. The header is <vector>.
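The following sketch shows std::string used as a container of characters, passing its iterators to the standard reverse and count algorithms. The helper names reversed and occurrences are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// string meets the sequence requirements, so its iterators
// work with the standard algorithms like any container's.
std::string reversed(std::string s)
{
  std::reverse(s.begin(), s.end());
  return s;
}

// Count how many times ch appears in s.
std::ptrdiff_t occurrences(const std::string& s, char ch)
{
  return std::count(s.begin(), s.end(), ch);
}
```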

The set and map containers perform insertions, deletions, and searches in logarithmic time, which implies a tree or tree-like implementation. Items must be kept in sorted order, so a hash table implementation is not allowed. Many people consider the lack of a standard hash table to be a serious omission. When the C++ standard is revised, a hash table is likely to be added. Until then, there are plenty of hash table containers available on the Internet.

Container adapters

In addition to the standard containers, the standard library has several container adapters. An adapter is a class template that uses a container for storage while providing container-like behavior. The standard adapters are priority_queue, queue, and stack. For more information, see the <queue> and <stack> sections in Chapter 13.
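As a sketch of an adapter at work, the following function pushes items onto a std::stack whose storage is a std::vector rather than the default std::deque, then pops them to show last-in, first-out order. The function name lifo_order is illustrative.

```cpp
#include <stack>
#include <vector>

// The second template argument selects the underlying
// container; the adapter exposes only push, pop, top,
// empty, and size.
std::vector<int> lifo_order(const std::vector<int>& items)
{
  std::stack<int, std::vector<int> > st;
  for (std::vector<int>::size_type i = 0; i < items.size(); ++i)
    st.push(items[i]);

  std::vector<int> out;
  while (!st.empty()) {
    out.push_back(st.top());  // most recently pushed item
    st.pop();
  }
  return out;
}
```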

Pseudo-containers

The standard library has a few class templates that are similar to the standard containers but fail one or more of the requirements for a standard container. In particular, bitset, valarray, and vector<bool> are container-like types that do not meet all the requirements of a standard container.

The bitset type represents a bitmask of arbitrary size. The size is fixed when the bitset is declared. There are no bitset iterators, so you cannot use a bitset with the standard algorithms. See <bitset> in Chapter 13 for details.

A valarray is an array of numeric values, optimized for computational efficiency. As part of the optimization, the compiler is free to make assumptions that prevent the use of valarray with the standard algorithms. See <valarray> in Chapter 13 for details.

The vector<bool> type is a specialization of the vector template. Although vector<> usually meets the requirements of a standard container, the vector<bool> specialization does not. See <vector> in Chapter 13 for details.
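The following sketch (C++11 or later, because it uses static_assert and <type_traits>) shows why vector<bool> is the odd one out: its reference type is a proxy class rather than bool&, so, for example, you cannot take the address of an item as a bool*. The helper set_and_read is hypothetical.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// vector<bool> packs its items as bits, so operator[] cannot
// return a real bool&. It returns a proxy object instead.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy, not bool&");
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "other vectors return real references");

// The proxy still supports assignment and conversion to bool.
// (bool* p = &v[i]; would not compile.)
bool set_and_read(std::vector<bool>& v, std::size_t i)
{
  v[i] = true;  // assigns through the proxy
  return v[i];  // proxy converts to bool
}
```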

Custom containers

If you write your own container, be sure to follow the conventions and rules laid down for the standard containers. In particular, your container should define the following types:

const_iterator
The iterator type for const values.
const_reference
A const lvalue type for the items stored in the container. Typically the same as the allocator's const_reference type.
difference_type
A signed integral type denoting the difference between two iterators.
iterator
The iterator type.
reference
An lvalue type for the items stored in the container. This is typically the same as the allocator's reference type.
size_type
An unsigned integral type that can hold any non-negative difference_type value.
value_type
The type of item stored in the container. This is typically the same as the first template parameter.

A container that supports bidirectional iterators should also define the reverse_iterator and const_reverse_iterator types.

An associative container should define key_type as the key type, key_compare as the type of the key comparison function, and value_compare as the type of a function that compares two value_type objects.

Optionally, a container can declare pointer and const_pointer as synonyms for the allocator's types of the same name, and allocator_type for the allocator, which is typically the last template parameter.
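Because every container defines the same nested types, a generic function can be written purely in terms of them. The following sketch assumes only that value_type is default-constructible and supports +; the name total is hypothetical.

```cpp
#include <list>
#include <vector>

// Uses only the nested types value_type and const_iterator,
// so it works with list, vector, deque, or a custom container
// that follows the same conventions.
template<typename Container>
typename Container::value_type total(const Container& c)
{
  typename Container::value_type sum = typename Container::value_type();
  for (typename Container::const_iterator i = c.begin(); i != c.end(); ++i)
    sum = sum + *i;
  return sum;
}
```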

A custom container should have standard default and copy constructors. The default constructor initializes the container to be empty. The copy constructor initializes the container with a copy of all the items from the other container. All containers should also have a constructor template that takes two arguments of the same template parameter type:

template<typename InIter> container (InIter first, InIter last)
If InIter is an integral type, the container is initialized with first copies of last (converted to value_type). Otherwise, InIter must be an input iterator, and the container is initialized with copies of all the items in the range [first, last).
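With std::vector, this rule can be observed directly: two int arguments produce copies of a value, while real iterators copy a range. A minimal sketch, with hypothetical function names:

```cpp
#include <list>
#include <vector>

// Two int arguments could be taken as an iterator range with
// InIter = int; the standard instead requires the container
// to treat them as a count and a value.
std::vector<int> from_count_and_value()
{
  return std::vector<int>(3, 7);  // three copies of 7
}

// With real iterators, the range [first, last) is copied.
std::vector<int> from_range(const std::list<int>& l)
{
  return std::vector<int>(l.begin(), l.end());
}
```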

A sequence container should have the following additional constructor:

container (size_type n, const value_type& x)
Initializes the container with n copies of x.

An associative container should have the following additional constructors:

container (key_compare compare)
Initializes an empty container that uses compare to compare keys.
template<typename InIter> container (InIter first, InIter last,
key_compare compare)
Initializes the container with copies of the items in the range [first, last), comparing keys with compare.
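The following sketch uses the range-plus-comparator constructor of std::set, passing std::greater<int> so that keys are kept in descending order instead of the default ascending order. The function name descending_unique is hypothetical.

```cpp
#include <functional>
#include <set>
#include <vector>

// Construct a set from [first, last) with an explicit
// comparison object; duplicates are dropped because a set
// requires unique keys.
std::vector<int> descending_unique(const std::vector<int>& in)
{
  std::greater<int> cmp;
  std::set<int, std::greater<int> > s(in.begin(), in.end(), cmp);
  return std::vector<int>(s.begin(), s.end());
}
```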

Optionally, your container can be parameterized with an allocator, and all the constructors can take an optional, additional parameter to specify an allocator object.

The destructor must call the destructor for every object in the container and release any memory that the container allocated.

All containers should have the following member functions:

iterator begin ()
const_iterator begin () const
Returns an iterator that points to the first item of the container.
void clear ()
Erases all the items in the container.
bool empty () const
Returns true if the container is empty (size() == 0).
iterator end ()
const_iterator end () const
Returns an iterator that points to one past the last item of the container.
erase (iterator p)
erase (iterator first, iterator last)
Erases the item that p points to or all the items in the range [first, last). For a sequence container, erase returns an iterator that points to the item that comes immediately after the last deleted item, or end() if there is no such item. For an associative container, erase does not return a value.
size_type max_size () const
Returns the largest number of items the container can possibly hold. Many containers are limited only by available memory and the range of size_type, but other container types, such as a fixed-size array type, might have a smaller fixed maximum.
container& operator= (const container& that)
Erases all items in this container and copies all the items from that.
size_type size () const
Returns the number of items in the container.
void swap (container& that)
Swaps the elements of this container with those of that.
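A short sketch exercising several of the required members on std::list: swap exchanges the contents of two containers, and clear leaves a container empty, so that begin() == end(). The function name is illustrative.

```cpp
#include <list>

bool swap_then_clear()
{
  std::list<int> a(3, 1);  // 1 1 1
  std::list<int> b;        // empty
  a.swap(b);               // now a is empty and b holds 1 1 1
  bool swapped = a.empty() && b.size() == 3;
  b.clear();               // erase every item in b
  return swapped && b.empty() && b.begin() == b.end();
}
```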

All containers should define all the comparison operators (==, !=, <, <=, >, >=), either as member functions or, preferably, as functions at namespace level. Namespace-level functions offer more flexibility than member functions. For example, the compiler can apply an implicit type conversion to the left-hand operand only if the operator is not a member function.
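The following sketch illustrates the point about conversions with a hypothetical class Id that converts implicitly from int. Because operator== is declared at namespace level, both x == 5 and 5 == x compile; a member operator== would accept only the first.

```cpp
// Hypothetical value type with an implicit conversion from int.
class Id {
public:
  Id(int n) : n_(n) {}  // implicit on purpose
  int value() const { return n_; }
private:
  int n_;
};

// Namespace-level operators: either operand may be
// converted from int before the call.
inline bool operator==(const Id& a, const Id& b)
{
  return a.value() == b.value();
}

inline bool operator!=(const Id& a, const Id& b)
{
  return !(a == b);
}
```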

A container that supports bidirectional iterators should define rbegin() and rend() member functions to return reverse iterators.

The following functions are optional. You should provide only those functions that can run in constant time:

reference at (size_type n)
const_reference at (size_type n) const
Returns the item at index n, or throws out_of_range if n >= size().
reference back ()
const_reference back () const
Returns the last item in the container. Behavior is undefined if the container is empty.
reference front ()
const_reference front () const
Returns the first item in the container. Behavior is undefined if the container is empty.
reference operator[] (size_type n)
const_reference operator[] (size_type n) const
Returns the item at index n. Behavior is undefined if n >= size().
void pop_back ()
Erases the last item in the container. Behavior is undefined if the container is empty.
void pop_front ()
Erases the first item in the container. Behavior is undefined if the container is empty.
void push_back (const value_type& x)
Inserts x as the new last item in the container.
void push_front (const value_type& x)
Inserts x as the new first item in the container.
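The difference between at and operator[] matters in practice: only at checks the index. A minimal sketch, with a hypothetical helper at_throws, using std::vector:

```cpp
#include <stdexcept>
#include <vector>

// at() checks its index and throws std::out_of_range;
// operator[] does not check, so an invalid index passed to
// operator[] is undefined behavior.
bool at_throws(const std::vector<int>& v, std::vector<int>::size_type n)
{
  try {
    (void)v.at(n);
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}
```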

A sequence container should define the following member functions:

iterator insert (iterator p, const value_type& x)
Inserts x immediately before p and returns an iterator that points to x.
void insert (iterator p, size_type n,
const value_type& x)
Inserts n copies of x before p.
template<typename InIter>
void insert (iterator p, InIter first, InIter last)
Copies the values from [first, last) and inserts them before p.
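The following sketch applies the three sequence insert members to a std::list, with the intermediate contents shown in comments. The function name insert_forms is illustrative.

```cpp
#include <list>
#include <vector>

std::vector<int> insert_forms()
{
  std::list<int> l;
  l.push_back(1);
  l.push_back(5);                       // 1 5
  std::list<int>::iterator p = l.begin();
  ++p;                                  // p points to 5
  p = l.insert(p, 4);                   // 1 4 5; p points to 4
  l.insert(p, 2, 3);                    // 1 3 3 4 5
  int extra[] = { 8, 9 };
  l.insert(l.end(), extra, extra + 2);  // 1 3 3 4 5 8 9
  return std::vector<int>(l.begin(), l.end());
}
```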

An associative container should define the following member functions:

size_type count (const key_type& k) const
Returns the number of items equivalent to k.
pair<const_iterator,const_iterator>
equal_range (const key_type& k) const
pair<iterator,iterator> equal_range (const key_type& k)
Returns make_pair(lower_bound(k), upper_bound(k)).
size_type erase (const key_type& k)
Erases all the items equivalent to k. Returns the number of items erased.
const_iterator find (const key_type& k) const
iterator find (const key_type& k)
Finds an item equivalent to k and returns an iterator that points to one such item, or end() if not found.
insert (const value_type& x)
Inserts x. If the container permits duplicate keys, insert returns an iterator that points to the newly inserted item. If the container requires unique keys, insert returns pair<iterator,bool>, where the first element of the pair is an iterator that points to the item equivalent to x, and the second element is true if x was inserted or false if x was already present in the container.
iterator insert (iterator p, const value_type& x)
Inserts x and returns an iterator that points to x. The iterator p is a hint for where x might belong.
template<typename InIter>
void insert (InIter first, InIter last)
Copies the items from [first, last) and inserts each item in the container.
key_compare key_comp () const
Returns the key compare function.
const_iterator lower_bound (const key_type& k) const
iterator lower_bound (const key_type& k)
Returns an iterator that points to the first item in the container that does not come before k. That is, if k is in the container, the iterator points to the position of its first occurrence; otherwise the iterator points to the first position where k should be inserted.
value_compare value_comp () const
Returns the value compare function.
const_iterator upper_bound (const key_type& k) const
iterator upper_bound (const key_type& k)
Returns an iterator that points to the first item in the container that comes after all occurrences of k.
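The following sketch exercises several of these members on std::multiset and std::set: equal_range brackets all items equivalent to a key, and a set's insert reports through pair<iterator,bool>::second whether it added anything. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <utility>

// Count the items in [lower_bound(k), upper_bound(k)).
// The result always equals s.count(k).
std::size_t equal_count(const std::multiset<int>& s, int k)
{
  typedef std::multiset<int>::const_iterator iter;
  std::pair<iter, iter> r = s.equal_range(k);
  std::size_t n = 0;
  for (iter i = r.first; i != r.second; ++i)
    ++n;
  return n;
}

// For a set (unique keys), insert returns pair<iterator,bool>.
bool inserted_new(std::set<int>& s, int k)
{
  return s.insert(k).second;
}
```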

Example 11-1 shows the slist container, which implements a singly-linked list. A singly-linked list requires slightly less memory than a doubly-linked list, but offers at best a forward iterator, not a bidirectional iterator.

Example 11-1: Implementing a custom container: a singly-linked list.

// Simple container for singly-linked lists.
template<typename T, typename Alloc = ::std::allocator<T> >
class slist {
  // Private type for a link (node) in the list.
  template<typename U>
  struct link {
    link* next;
    U value;
  };
  typedef link<T> link_type;

public:
  typedef typename Alloc::reference reference;
  typedef typename Alloc::const_reference const_reference;
  typedef typename Alloc::pointer pointer;
  typedef typename Alloc::const_pointer const_pointer;
  typedef Alloc allocator_type;
  typedef T value_type;
  typedef ::std::size_t size_type;
  typedef ::std::ptrdiff_t difference_type;

  class iterator;       // See Iterators, later in this
  class const_iterator; // chapter for the iterators.

  slist(const slist& that);
  slist(const Alloc& alloc = Alloc());
  slist(size_type n, const T& x, const Alloc& alloc=Alloc());
  template<typename InputIter>
  slist(InputIter first, InputIter last,
        const Alloc& alloc = Alloc());
  ~slist()                             { clear(); }

  slist& operator=(const slist& that);
  allocator_type get_allocator() const { return alloc_; }

  iterator begin()           { return iterator(0, head_); }
  const_iterator begin() const
    { return const_iterator(0, head_); }
  iterator end()             { return iterator(0, 0); }
  const_iterator end() const { return const_iterator(0, 0); }

  void pop_front()                  { erase(begin()); }
  void push_front(const T& x)       { insert(begin(), x); }
  const T& front()            const { return head_->value; }
  T& front()                        { return head_->value; }

  iterator insert(iterator p, const T& x);
  void insert(iterator p, size_type n, const T& x);
  template<typename InputIter>
  void insert(iterator p, InputIter first, InputIter last);

  iterator erase(iterator p);
  iterator erase(iterator first, iterator last);

  void clear()               { erase(begin(), end()); }
  bool empty()         const { return size() == 0; }
  size_type max_size() const
    { return ::std::numeric_limits<size_type>::max(); }
  void resize(size_type sz, const T& x = T());
  size_type size()     const { return count_; }
  void swap(slist& that);

private:
  typedef typename
    allocator_type::template rebind<link_type>::other
    link_allocator_type;

  link_type* newitem(const T& x, link_type* next = 0);
  void delitem(link_type* item);

  template<typename InputIter>
  void construct(InputIter first, InputIter last,
                 is_integer_tag);

  template<typename InputIter>
  void construct(InputIter first, InputIter last,
                 is_not_integer_tag);

  link_type* head_;
  link_type* tail_;
  size_type count_;
  allocator_type alloc_;
  link_allocator_type linkalloc_;
};

// Constructor. If InputIter is an integral type, the
// standard requires the constructor to interpret first
// and last as a count and value, and behave like the
// slist(size_type, T) constructor. Use the is_integer
// trait to dispatch to the appropriate construct function,
// which does the real work. Members are initialized in
// the order in which they are declared.
template<typename T, typename A>
template<typename InputIter>
slist<T,A>::slist(InputIter first, InputIter last,
                  const A& alloc)
: head_(0), tail_(0), count_(0),
  alloc_(alloc), linkalloc_(alloc)
{
  construct(first, last, typename is_integer<InputIter>::tag());
}

template<typename T, typename A>
template<typename InputIter>
void slist<T,A>::construct(InputIter first, InputIter last,
                           is_integer_tag)
{
  insert(begin(), static_cast<size_type>(first),
         static_cast<T>(last));
}

template<typename T, typename A>
template<typename InputIter>
void slist<T,A>::construct(InputIter first, InputIter last,
                           is_not_integer_tag)
{
  insert(begin(), first, last);
}

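// Private function to allocate a new link node. If copying
// x throws an exception, release the raw memory before
// passing the exception on, so nothing leaks.
template<typename T, typename A>
typename slist<T,A>::link_type*
   slist<T,A>::newitem(const T& x, link_type* next)
{
  link_type* item = linkalloc_.allocate(1);
  try {
    alloc_.construct(&item->value, x);
  } catch (...) {
    linkalloc_.deallocate(item, 1);
    throw;
  }
  item->next = next;
  return item;
}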

// Private function to release a link node.
template<typename T, typename A>
void slist<T,A>::delitem(link_type* item)
{
  alloc_.destroy(&item->value);
  linkalloc_.deallocate(item, 1);
}

// Basic insertion function. All insertions eventually find
// their way here. Inserting at the head of the list
// (p == begin()) must set the head_ member.
// Inserting at the end of the list (p == end()) means
// appending to the list, which updates the tail_'s next
// member, and then sets tail_. Anywhere else in the list
// requires updating p.prev_->next. Note that inserting into
// an empty list looks like inserting at end(). Return an
// iterator that points to the newly inserted node.
template<typename T, typename A>
typename slist<T,A>::iterator
  slist<T,A>::insert(iterator p, const T& x)
{
  // Allocate the new link before changing any pointers. If
  // newitem throws an exception, the list is not affected.
  link_type* item = newitem(x, p.node_);
  if (p.node_ == 0) {
    p.prev_ = tail_;
    // at end
    if (tail_ == 0)
      head_ = tail_ = item; // empty list
    else {
      tail_->next = item;
      tail_ = item;
    }
  }
  else if (p.prev_ == 0)
    head_ = item;          // new head of list
  else
    p.prev_->next = item;
  p.node_ = item;
  ++count_;
  return p;
}

// Erase the item at p. All erasures come here eventually.
// If erasing begin(), update head_.
// If erasing the last item in the list, update tail_.
// Update the iterator to point to the node after the one
// being deleted.
template<typename T, typename A>
typename slist<T,A>::iterator slist<T,A>::erase(iterator p)
{
  link_type* item = p.node_;
  p.node_ = item->next;
  if (p.prev_ == 0)
    head_ = item->next;
  else
    p.prev_->next = item->next;
  if (item->next == 0)
    tail_ = p.prev_;
  --count_;
  delitem(item);
  return p;
}

// Comparison functions are straightforward.
template<typename T, typename A>
bool operator==(const slist<T,A>& a, const slist<T,A>& b)
{
  return a.size() == b.size() &&
         ::std::equal(a.begin(), a.end(), b.begin());
}

Using containers

A container holds stuff. Naturally, you need to know how to add stuff to a container, remove stuff from a container, find stuff in a container, and so on.

To add an item to a container, call an insert member function. Sequence containers might also have push_front or push_back to insert an item at the beginning or end of the sequence. The push_front and push_back members exist only if they can be implemented in constant time. (Thus, for example, vector does not have push_front.)

Every container has an insert(iter, item) function, where iter is an iterator and item is the item to insert. A sequence container inserts the item before the indicated position. Associative containers treat the iterator as a hint: if the item belongs immediately after the iterator's position, insertion takes amortized constant time instead of logarithmic time.
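A sketch of insertion with a hint, assuming sorted input: each call passes the iterator returned by the previous insertion, so the new item belongs immediately after the hint, which is the fast case described above. C++11 later redefined the fast case as "immediately before the hint", so with a newer library end() is the better hint for sorted input. Either way, the hint affects only speed, never the result. The name from_sorted is hypothetical.

```cpp
#include <set>
#include <vector>

// Insert values (ideally already sorted) using the previous
// insertion point as the hint for the next insertion.
std::set<int> from_sorted(const std::vector<int>& sorted)
{
  std::set<int> s;
  std::set<int>::iterator hint = s.begin();
  for (std::vector<int>::size_type i = 0; i < sorted.size(); ++i)
    hint = s.insert(hint, sorted[i]);  // returns the item's position
  return s;
}
```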

Sequence containers have other insert functions, to insert many copies of an item at a position and to copy a range to a given position. Associative containers can copy a range into the container.

To remove an item from a container, call an erase member function. All containers have erase functions that take a single iterator (to delete the item that the iterator points to) and two iterators (to delete every item in the range). Associative containers also have an erase function that takes a key as an argument and erases all items with an equivalent key.
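A minimal sketch of the two kinds of erase: erase by key on a std::multiset, which returns the number of items erased, and erase of an iterator range on a std::vector. The function names are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Erase every item equivalent to k; return how many were erased.
std::size_t erase_key(std::multiset<int>& s, int k)
{
  return s.erase(k);
}

// Erase the range [begin(), begin() + 2) if it exists.
void drop_first_two(std::vector<int>& v)
{
  if (v.size() >= 2)
    v.erase(v.begin(), v.begin() + 2);
}
```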

Among the standard algorithms are remove and remove_if. Their names are suggestive but misleading: they do not remove anything from the container. Instead, they move the items to keep toward the beginning of the range and return an iterator that points one past the last kept item. The items from that position to the end of the range have unspecified values. Call erase with this iterator as the first argument and end() as the second to erase the leftover items from the container. This two-step process is needed because an iterator cannot erase anything. The only way to erase an item from a container is to call a member function of the container, and the standard algorithms have access only to iterators, not to the containers themselves. Example 11-2 shows how to implement a generic erase function that calls remove and then the erase member function.

Example 11-2: Removing matching items from a sequence container.

#include <algorithm>
#include <functional>
#include <list>

// Erase all items from container c that are equal to item.
template<template<typename, typename> class C,
         typename T, typename A>
void erase(C<T,A>& c, const T& item)
{
  c.erase(std::remove(c.begin(), c.end(), item), c.end());
}

// Erase all items from container c for which pred is true.
template<template<typename, typename> class C,
         typename T, typename A, typename Pred>
void erase_if(C<T,A>& c, Pred pred)
{
  c.erase(std::remove_if(c.begin(), c.end(), pred), c.end());
}

int main()
{
  std::list<int> l;
  ...
  // Erase all items == 20.
  erase(l, 20);
  ...
  // Erase all items < 20.
  erase_if(l, std::bind2nd(std::less<int>(), 20));
  ...
}

The standard algorithms provide several different ways to search for items in a container: adjacent_find, find, find_end, find_first_of, find_if, search, and search_n. These algorithms perform a linear search of a range. If you know exactly which item you want, you can search an associative container much faster (in logarithmic time) by calling its find member function. For example, suppose you want to write a generic function, contains, that tells you whether a container contains at least one instance of an item. Example 11-3 shows one way to implement this function.

Example 11-3: Determining whether a container contains an item.

#include <algorithm>
#include <list>
#include <set>

// Need a type trait to tell us which containers are
// associative and which are not.
struct associative_container_tag {};
struct sequence_container_tag {};
struct unknown_container_tag {};

// The trait is parameterized on the complete container type.
// (A two-parameter template template parameter does not
// portably match std::set, which has three template
// parameters.)
template<typename C>
struct is_associative
{
  typedef unknown_container_tag tag;
};
template<typename T, typename A>
struct is_associative<std::list<T,A> >
{
  typedef sequence_container_tag tag;
};
// ditto for vector and deque
template<typename T, typename Cmp, typename A>
struct is_associative<std::set<T,Cmp,A> >
{
  typedef associative_container_tag tag;
};
// ditto for multiset. A map or multimap stores key/value
// pairs, so it needs a version that searches by key.

template<typename C>
inline bool do_contains(const C& c,
  const typename C::value_type& item,
  const associative_container_tag&)
{
  return c.end() != c.find(item);
}

template<typename C>
inline bool do_contains(const C& c,
  const typename C::value_type& item,
  const sequence_container_tag&)
{
  return c.end() != ::std::find(c.begin(), c.end(), item);
}

template<typename C>
inline bool do_contains(const C& c,
  const typename C::value_type& item,
  const unknown_container_tag&)
{
  return c.end() != ::std::find(c.begin(), c.end(), item);
}

// Here is the actual contains function. It dispatches
// to do_contains, picking the appropriate overloaded
// function depending on the type of the container c.
template<typename C>
bool contains(const C& c, const typename C::value_type& item)
{
  return do_contains(c, item, typename is_associative<C>::tag());
}

As you can see, iterators are important for using containers. You need them to insert at a specific position, identify an item for erasure, or specify ranges for algorithms. The next section discusses iterators in more depth.

Iterators

An iterator is a kind of smart pointer for pointing into containers and other sequences. An ordinary pointer can point to different elements in an array. The ++ operator advances the pointer to the next element, and the * operator dereferences the pointer to return a value from the array. Iterators generalize the concept so the same operators have the same behavior for any container, even trees and lists. This section describes iterators in general. See the <iterator> section of Chapter 13 for more details.
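To see the generalization at work, the following sketch writes a linear search (the hypothetical my_find) that uses only !=, ++, and *, then applies it both to a built-in array, through pointers, and to a std::list, through its iterators.

```cpp
#include <list>

// Works with any iterator that supports !=, ++, and *,
// including plain pointers into an array.
template<typename Iter, typename T>
Iter my_find(Iter first, Iter last, const T& value)
{
  for (; first != last; ++first)
    if (*first == value)
      return first;
  return last;  // not found
}

// The same function template, used with pointers and with
// list iterators.
bool found_in_both()
{
  int a[] = { 4, 8, 15 };
  std::list<int> l(a, a + 3);
  return my_find(a, a + 3, 8) == a + 1 &&
         my_find(l.begin(), l.end(), 8) != l.end();
}
```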

Iterator categories

There are five categories of iterators:

Input
An input iterator permits one pass to read a sequence. The increment (++) operator advances to the next element, but there is no decrement operator. The dereference (*) operator does not return an lvalue, so you can read elements but not modify them.
Output
An output iterator permits one pass to write a sequence. The increment (++) operator advances to the next element, but there is no decrement operator. You can dereference an output iterator only to assign a value to the result. You cannot compare output iterators.
Forward
A forward iterator is like a combination of an input and an output iterator. You can use a forward iterator anywhere an input iterator is required or where an output iterator is required. A forward iterator, as its name implies, permits unidirectional access to a sequence, but you can refer to a single element and modify it multiple times before advancing the iterator.
Bidirectional
A bidirectional iterator is like a forward iterator but also supports the -- (decrement) operator to move the iterator backward by one position.
Random access
A random access iterator is like a bidirectional iterator but also supports the [] (subscript) operator to access any index in the sequence. Also, you can add or subtract an integer to move a random access iterator by more than one position at a time. Subtracting two random access iterators yields an integer distance between them. Thus, a random access iterator is most like a conventional pointer, and a pointer can be used as a random access iterator.

An input, forward, bidirectional, or random access iterator can be a constant iterator. Dereferencing a constant iterator yields a constant lvalue.
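Library code usually discovers an iterator's category through std::iterator_traits and selects an implementation by overloading on the category tag. The following sketch does that; tag_name and category_name are hypothetical helpers.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <vector>

// One overload per category tag. Overload resolution picks
// the exact tag type, which is how library code can choose
// an algorithm for a category.
inline std::string tag_name(std::input_iterator_tag)         { return "input"; }
inline std::string tag_name(std::output_iterator_tag)        { return "output"; }
inline std::string tag_name(std::forward_iterator_tag)       { return "forward"; }
inline std::string tag_name(std::bidirectional_iterator_tag) { return "bidirectional"; }
inline std::string tag_name(std::random_access_iterator_tag) { return "random access"; }

// iterator_traits works for class-type iterators and for
// plain pointers alike.
template<typename Iter>
std::string category_name(Iter)
{
  typedef typename std::iterator_traits<Iter>::iterator_category cat;
  return tag_name(cat());
}

// Usage:
//   std::list<int> l;    category_name(l.begin()) == "bidirectional"
//   std::vector<int> v;  category_name(v.begin()) == "random access"
```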

Iterator safety

The most important point to remember about iterators is that they are inherently unsafe. Like pointers, an iterator can point to a container that has been destroyed or to an element that has been erased. You can advance an iterator past the end of the container the same way a pointer can point past the end of an array. With a little care and caution, however, iterators are safe to use.

The first key to safe use of iterators is to make sure a program never dereferences an iterator that marks the end of a range. Two iterators can denote a range of values, typically in a container. One iterator points to the start of the range and another marks the end of the range by pointing to a position one past the last element in the range. The mathematical notation [first, last) tells you that the item that first points to is included in the range, but the item that last points to is excluded from the range.

A program must never dereference an iterator that points one past the end of a range (such as last), because that position might not hold a valid element. For example, last might be a container's end() iterator.
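The half-open convention keeps loops simple: iterate while first != last, and the end position is never dereferenced. An empty range, where first == last, runs zero iterations. A minimal sketch with a hypothetical count_positive function:

```cpp
// Count the positive values in [first, last). The loop stops
// as soon as first == last, so last is never dereferenced.
template<typename Iter>
int count_positive(Iter first, Iter last)
{
  int n = 0;
  for (; first != last; ++first)
    if (*first > 0)
      ++n;
  return n;
}
```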

Even a valid iterator can become invalid, and therefore unsafe to use, for example, when the item it points to is erased or the container is destroyed. The detailed descriptions in Chapter 13 tell you which operations invalidate iterators for each container type. In general, iterators for the node-based containers (list, set, multiset, map, multimap) become invalid only when the node they point to is erased. Iterators for the array-based containers (deque, vector) become invalid when the underlying array is reallocated, which might happen for any insertion and for some erasures.
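For a node-based container, the usual safe pattern for erasing while iterating is to continue from the iterator that erase returns, because the iterator to the erased node is itself invalid. A sketch for std::list, with a hypothetical erase_evens function:

```cpp
#include <list>

// Erase every even number while walking the list.
void erase_evens(std::list<int>& l)
{
  std::list<int>::iterator i = l.begin();
  while (i != l.end()) {
    if (*i % 2 == 0)
      i = l.erase(i);  // i now points to the next node
    else
      ++i;
  }
}
```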

Special iterators

Iterators are often used with containers, but they have many more uses. You can define iterators for almost any sequence of objects. The standard library includes several examples of non-container iterators, most notably I/O iterators.

At the lowest level, a stream is nothing more than a sequence of characters. At a slightly higher level, you can think of a stream as a sequence of objects, which would be read with operator>> or written with operator<<. Thus, the standard library includes the following I/O iterators: istreambuf_iterator, ostreambuf_iterator, istream_iterator, ostream_iterator. Example 11-4 shows how to use streambuf iterators to copy one stream to another.

Example 11-4: Copying streams with streambuf iterators.

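#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>

// Copy every character from in to out. The streambuf
// iterators read and write characters directly, without
// skipping white space or formatting. They must use the
// same traits type as the streams.
template<typename charT, typename traits>
void copy(std::basic_ostream<charT,traits>& out,
          std::basic_istream<charT,traits>& in)
{
  std::copy(std::istreambuf_iterator<charT,traits>(in),
            std::istreambuf_iterator<charT,traits>(),
            std::ostreambuf_iterator<charT,traits>(out));
}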

A special kind of output iterator is an insert iterator, which inserts items into a container. An insert iterator requires a container and, for insert_iterator, an iterator that specifies the position where the new items should be inserted. You can insert at the back of a sequence with back_insert_iterator, at the front of a sequence with front_insert_iterator, or at a specific position with insert_iterator. Each of these class templates has an associated function template (back_inserter, front_inserter, and inserter) that creates the object for you, letting the compiler infer the type. Example 11-5 shows how to read a series of numbers from a stream and store them all at the end of a vector.

Example 11-5: Inserting numbers in a vector.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main()
{
  using namespace std;

  vector<double> data;
  copy(istream_iterator<double>(cin),
       istream_iterator<double>(),
       back_inserter(data));
  // use the data...
  // Write the data, one number per line.
  copy(data.begin(), data.end(),
       ostream_iterator<double>(cout, "\n"));
}
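The following sketch shows the other two helper functions: front_inserter, which calls push_front and therefore reverses the copied items, and inserter, which inserts at a fixed position and keeps the copied items in their original order. The function names are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <vector>

// Each copied item becomes the new first item.
std::list<int> copy_to_front(const std::vector<int>& v)
{
  std::list<int> l;
  std::copy(v.begin(), v.end(), std::front_inserter(l));
  return l;
}

// Insert all of inner into the middle of a copy of outer.
std::vector<int> copy_into_middle(const std::vector<int>& outer,
                                  const std::vector<int>& inner)
{
  std::vector<int> result(outer);
  std::vector<int>::iterator mid = result.begin() +
    static_cast<std::vector<int>::difference_type>(result.size() / 2);
  std::copy(inner.begin(), inner.end(), std::inserter(result, mid));
  return result;
}
```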

Custom iterators

The simplest way to write your own iterator is to derive from the iterator class template, specializing it for your iterator category. The slist container from Example 11-1 needs an iterator and a const_iterator. The only difference between them is that a const_iterator returns constant lvalues (references to const T) instead of modifiable lvalues. Much of the iteration logic can be factored into a base class. Example 11-6 shows iterator and iterator_base; const_iterator is almost identical to iterator, so it is not shown.

Example 11-6: Writing a custom iterator.

// The declaration for iterator_base is nested in slist.
class iterator_base : 
  public std::iterator< ::std::forward_iterator_tag, T> {
    friend class slist;
public:
  bool operator==(const iterator_base& i) const
  { return node_ == i.node_; }
  bool operator!=(const iterator_base& i) const
  { return ! (*this == i); }

protected:
  // Initialize members in declaration order (node_, prev_).
  iterator_base(const iterator_base& i)
  : node_(i.node_), prev_(i.prev_) {}
  iterator_base(slist::link_type* prev,
                slist::link_type* node)
  : node_(node), prev_(prev) {}
  // If node_ == 0, the iterator == end().
  slist::link_type* node_;
  // A pointer to the node before node_ is needed to support
  // erase(). If prev_ == 0, the iterator points to the head
  // of the list.
  slist::link_type* prev_;
private:
  iterator_base();
};

// The declaration for iterator is nested in slist.
class iterator : public iterator_base {
  friend class slist;
public:
  iterator(const iterator& i) : iterator_base(i) {}
  iterator& operator++() {              // pre-increment
    this->prev_ = this->node_;
    this->node_ = this->node_->next;
    return *this;
  }
  iterator  operator++(int) {          // post-increment
    iterator tmp = *this;
    operator++();
    return tmp;
  }
  T& operator*()         { return  this->node_->value; }
  T* operator->()        { return &this->node_->value; }
private:
  iterator(slist::link_type* prev, slist::link_type* node)
  : iterator_base(prev, node) {}
};
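Example 11-6 depends on the surrounding slist class, so it cannot be compiled on its own. The following self-contained sketch shows the same ideas in a hypothetical IntList class (not the slist of Example 11-1). Instead of deriving from std::iterator, it declares the five member types that std::iterator_traits looks for, which is enough for the standard algorithms to accept it as a forward iterator.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Hypothetical minimal singly linked list of int, used only to show what a
// forward iterator must provide.
class IntList {
  struct Node {
    int value;
    Node* next;
  };
public:
  class iterator {
  public:
    // The member types that std::iterator_traits looks for.
    typedef std::forward_iterator_tag iterator_category;
    typedef int                       value_type;
    typedef std::ptrdiff_t            difference_type;
    typedef int*                      pointer;
    typedef int&                      reference;

    iterator() : node_(0) {}                      // the end() iterator
    explicit iterator(Node* node) : node_(node) {}

    reference operator*() const  { return node_->value; }
    pointer   operator->() const { return &node_->value; }
    iterator& operator++()       { node_ = node_->next; return *this; }
    iterator  operator++(int)    { iterator tmp = *this; ++*this; return tmp; }
    bool operator==(const iterator& i) const { return node_ == i.node_; }
    bool operator!=(const iterator& i) const { return node_ != i.node_; }
  private:
    Node* node_;  // 0 means end()
  };

  IntList() : head_(0) {}
  ~IntList()
  {
    while (head_ != 0) {
      Node* next = head_->next;
      delete head_;
      head_ = next;
    }
  }
  void push_front(int x)
  {
    Node* n = new Node;
    n->value = x;
    n->next = head_;
    head_ = n;
  }
  iterator begin() { return iterator(head_); }
  iterator end()   { return iterator(); }
private:
  IntList(const IntList&);             // copying is not supported
  IntList& operator=(const IntList&);
  Node* head_;
};
```

Because the iterator meets the forward iterator requirements, algorithms such as std::find and std::count work on an IntList without any knowledge of the list itself.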

const_iterators

Every container must provide an iterator type and a const_iterator type. Functions such as begin() and end() return iterator when called on a non-const container and const_iterator when called on a const container.

Note that a const_iterator (with underscore) is quite different from a const iterator (without underscore). A const iterator is a constant object of type iterator. Being constant, it cannot change, so it cannot advance to point to a different position. A const_iterator, on the other hand, is a non-const object of type const_iterator. It is not constant, so it can change value. The key difference between iterator and const_iterator is that iterator returns references to T, and const_iterator returns references to const T. The standard requires that a plain iterator be convertible to const_iterator, but not the other way around.

The problem is that some members of the standard containers (most notably erase and insert) take iterator parameters, not const_iterator. If you have a const_iterator, you cannot use it as an insertion or erasure position.

Another problem is that it might be difficult to compare an iterator with a const_iterator. If the compiler reports an error when you try to compare iterators for equality or inequality, try swapping the order of the operands: if a == b fails to compile, try b == a. The problem, most likely, is that b is a const_iterator and a is a plain iterator. By swapping the order, you let the compiler convert a to a const_iterator and allow the comparison.
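One way around the conversion problem, described in Meyers's Effective STL, is to measure how far the const_iterator is from begin() and advance a plain iterator by the same amount. The sketch below assumes a vector<int>; the name make_mutable is hypothetical. The conversion takes constant time for random access iterators and linear time otherwise. (Starting with C++11, the standard changed insert and erase to accept const_iterator, but the technique still applies whenever you need a plain iterator.)

```cpp
#include <iterator>
#include <vector>

// Hypothetical helper: returns an iterator that refers to the same item
// as ci. It measures ci's distance from begin() in the const_iterator
// domain (the plain begin() iterator converts implicitly), then advances
// a plain iterator by that amount.
std::vector<int>::iterator
make_mutable(std::vector<int>& v, std::vector<int>::const_iterator ci)
{
  std::vector<int>::iterator i = v.begin();
  std::advance(i, std::distance<std::vector<int>::const_iterator>(i, ci));
  return i;
}
```

Supplying the template argument to distance explicitly forces both arguments to const_iterator, which is the direction the standard guarantees is convertible.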

For a full explanation of how best to work with const_iterators, see Scott Meyers's Effective STL (Addison-Wesley, 2001).

Reverse iterators

The standard library includes the reverse_iterator class template as a convenient way to implement the reverse_iterator type in certain containers. A reverse iterator, as its name implies, is an iterator adapter that runs in the reverse direction of the adapted iterator. The adapted iterator must be a bidirectional or random access iterator.

On paper, the reverse iterator seems like a good idea. After all, a bidirectional iterator can run in two directions. There is no reason why an iterator adapter could not implement operator++ by calling the adapted iterator's operator-- function.

Reverse iterators share a problem with const_iterators, namely that several members, such as insert and erase, do not take an iterator template parameter, but require the exact iterator type declared in the container class. The reverse_iterator type is not accepted, so you must pass the adapted iterator instead, which is returned by the base() function.

As an insertion point, the base() iterator works fine, but for erasing, it is off by one from the desired position: base() points to the item after the one the reverse iterator refers to. The solution is to increment the reverse iterator, then call base(), as shown in Example 11-7.

Example 11-7: Using a reverse iterator.

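#include <algorithm>
#include <iostream>
#include <iterator>
#include <list>

// Print the items of l on one line.
void print(const std::list<int>& l)
{
  std::copy(l.begin(), l.end(),
            std::ostream_iterator<int>(std::cout, " "));
  std::cout << '\n';
}

int main()
{
  std::list<int> l;
  l.push_back(10); l.push_back(42); l.push_back(99);
  print(l);                           // 10 42 99
  std::list<int>::reverse_iterator ri;
  ri = std::find(l.rbegin(), l.rend(), 42);
  l.insert(ri.base(), 33);
  // Okay: 33 inserted before 42, from the point of view
  // of a reverse iterator, that is, 33 inserted after 42.
  print(l);                           // 10 42 33 99

  ri = std::find(l.rbegin(), l.rend(), 42);
  l.erase(ri.base());
  // Oops! Item 33 is deleted, not item 42.
  print(l);                           // 10 42 99

  ri = std::find(l.rbegin(), l.rend(), 42);
  l.erase((++ri).base());
  // That's right! To erase the item ri points to, advance ri
  // first, then erase the item that ri.base() designates.
  print(l);                           // 10 99
}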
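The rule behind Example 11-7 is that a reverse iterator r refers to the item just before r.base(). The following sketch packages the "advance, then call base()" step in a small helper; the name erase_position is hypothetical, and the list<int> element type is an assumption made to keep the sketch short.

```cpp
#include <algorithm>
#include <iterator>
#include <list>

// A reverse iterator r refers to the item just before r.base(), so
// *r == *std::prev(r.base()). This hypothetical helper returns the
// plain iterator that erase() needs to remove the item r refers to.
std::list<int>::iterator
erase_position(std::list<int>::reverse_iterator r)
{
  ++r;              // step r one item toward the front of the list
  return r.base();  // base() now designates the item r originally referred to
}
```

Because r is passed by value, the caller's reverse iterator is left unchanged, so l.erase(erase_position(ri)) removes exactly the item ri refers to.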

For a full explanation of how best to work with reverse iterators, see Scott Meyers's Effective STL (Addison-Wesley, 2001).

Algorithms

The so-called algorithms in the standard library set C++ apart from other programming languages. Every major programming language has a set of containers, but in the traditional object-oriented approach, each container defines the operations that are permitted, e.g., sorting, searching, and modifying. C++ turns object-oriented programming on its head and provides a set of function templates, called algorithms, that work with iterators, and therefore with almost any container.

The advantage of the C++ approach is that the library can contain a rich set of algorithms, where each algorithm can be written once and work with any kind of container. The set of algorithms is easily extensible without touching the container classes. Another benefit is that the algorithms work with iterators, not containers, so even non-container iterators (such as the stream iterators) can participate.

C++ algorithms have one glaring disadvantage, though. Remember that iterators, like pointers, can be unsafe. Algorithms use iterators, and are therefore equally unsafe. Pass the wrong iterator to an algorithm, and the algorithm cannot detect the error; the result is undefined behavior. Fortunately, most uses of algorithms make it easy to avoid programming errors.

Most of the standard algorithms are declared in the <algorithm> header, with some numerical algorithms in <numeric>. Refer to the respective section of Chapter 13 for details; this section presents the properties of algorithms in general.

How algorithms work

The generic algorithms all work in a similar fashion. They are all function templates, where the template parameters include one or more iterator types. Because the algorithms are templates, you can instantiate them with any template arguments that meet the basic requirements.

For example, for_each is declared as follows:

template<typename InIter, typename Function>
Function for_each(InIter first, InIter last, Function func);

The names of the template parameters tell you what is expected as template arguments: InIter must be an input iterator, and Function must be a function pointer or functional object. The documentation for for_each further tells you that Function must take one argument whose type is the value_type of InIter. That's all. The InIter argument can be anything that meets the requirements of an input iterator. Notice that no container is mentioned in the declaration or documentation of for_each. You can use an istream_iterator, for example.
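To make the point concrete, the following sketch applies for_each directly to a pair of istream_iterators reading from a string stream, with no container anywhere. The functor Summer and the function sum_stream are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical functor that keeps a running total of its arguments.
struct Summer {
  Summer() : total(0) {}
  void operator()(int x) { total += x; }
  int total;
};

// Sums the integers in text by applying for_each to a pair of
// istream_iterators; no container is involved.
int sum_stream(const std::string& text)
{
  std::istringstream in(text);
  Summer s = std::for_each(std::istream_iterator<int>(in),
                           std::istream_iterator<int>(),
                           Summer());
  return s.total;
}
```

Reading stops at the first item that cannot be parsed as an int, because the istream_iterator then compares equal to the end-of-stream iterator.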

For a programmer trained in traditional object-oriented programming, the flexibility of the standard algorithms might seem strange or backwards. Thinking in terms of algorithms takes some adjustment.

Think about how you would read a stream of numbers into a data array. Typically, you would set up a while loop to read the input stream, and for each number read, add the number to the array. Now rethink the problem in terms of an algorithmic solution. What you are actually doing is copying data from an input stream to an array, so use the copy algorithm as follows:

std::copy(std::istream_iterator<double>(stream),
          std::istream_iterator<double>(),
          std::back_inserter(data));

The copy algorithm copies all the items from one range to another. The input comes from an istream_iterator, which is an iterator interface for reading from an istream. The output range is a back_insert_iterator (created by the back_inserter function), which is an output iterator that pushes items onto a container.

At first glance, the algorithmic solution doesn't seem any simpler or clearer than a straightforward loop. More complex examples demonstrate the value of the C++ algorithms.

For example, all major programming languages have a type for character strings, and they typically also have a function for finding substrings. What about the more general problem of finding a subrange in any larger range? Suppose a researcher is looking for patterns in a data set and wants to know whether a small data pattern occurs in a larger data set. In C++, use the search algorithm:

std::vector<double> data, pattern;
...
if (std::search(data.begin(), data.end(),
    pattern.begin(), pattern.end()) != data.end())
{
  // found the pattern...
}
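The fragment above only tests whether the pattern occurs. Because search returns an iterator to the start of the first match, the position is available too. This sketch assumes vector<double> data; the function name find_pattern and the convention of returning -1 for "not found" are assumptions, not part of the library.

```cpp
#include <algorithm>
#include <vector>

// Returns the index where pattern first occurs in data, or -1 if it
// does not occur. The name and the -1 convention are hypothetical.
long find_pattern(const std::vector<double>& data,
                  const std::vector<double>& pattern)
{
  std::vector<double>::const_iterator pos =
    std::search(data.begin(), data.end(), pattern.begin(), pattern.end());
  if (pos == data.end())
    return -1;
  return static_cast<long>(pos - data.begin());
}
```

For data = {1, 2, 3, 2, 3, 4}, the pattern {2, 3, 4} is found at index 3, even though a partial match starts at index 1.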

A number of algorithms take a function pointer or functional object (that is, an object that overloads operator()) as one of their arguments. The algorithm calls the function and possibly uses the return value. For example, count_if counts the number of items in a range for which the function returns a true (non-zero) result:

bool negative(double x)
{
  return x < 0;
}

std::vector<double>::difference_type neg_cnt;
std::vector<double> data;
...
neg_cnt = std::count_if(data.begin(), data.end(), negative);

In spite of the unwieldy declaration for neg_cnt, the application of count_if to count the number of negative items in the data vector is easy to write and easy to read.

If you don't want to write a function just for use with an algorithm, you might be able to use the standard functional objects (which are declared in the <functional> header). For example, the same count of negative values can be had with the following:

std::vector<double>::difference_type neg_cnt;
std::vector<double> data;
...
neg_cnt = std::count_if(data.begin(), data.end(),
              std::bind2nd(std::less<double>(), 0.0));

The std::less class template defines a functional object whose operator() takes two arguments and applies operator<. The bind2nd function template takes a two-argument functional object and binds a constant value (in this case 0.0) as its second argument, returning a one-argument functional object (which is what count_if requires). The standard functional objects can make code harder to read, but they also help you avoid writing one-off custom functions.
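To see what bind2nd produces, it helps to write the equivalent functional object by hand. The class LessThan below is a hypothetical stand-in for the result of bind2nd(std::less<double>(), limit): it stores the bound second argument and compares each item against it.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical hand-written equivalent of
// std::bind2nd(std::less<double>(), limit).
class LessThan {
public:
  explicit LessThan(double limit) : limit_(limit) {}
  bool operator()(double x) const { return x < limit_; }
private:
  double limit_;  // the bound second argument of operator<
};

// Counts the negative items in data.
std::vector<double>::difference_type
count_negative(const std::vector<double>& data)
{
  return std::count_if(data.begin(), data.end(), LessThan(0.0));
}
```

Writing the class takes more code than calling bind2nd, but the intent of LessThan(0.0) is arguably easier to read at the call site.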

When using functional objects, be very careful about objects that maintain state or have global side effects. Some algorithms copy the functional objects, and you must be sure the state is properly copied, too. The numerical algorithms do not permit functional objects that have side effects.
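The copying is easy to observe with for_each, which takes its functional object by value and returns the copy it used. In this sketch, the functor Counter and the function count_calls are hypothetical; the point is that the caller's original object never sees any of the calls.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stateful functor: it counts how many times it is called.
struct Counter {
  Counter() : calls(0) {}
  void operator()(int) { ++calls; }
  int calls;
};

// for_each receives its functor by value and returns the copy it used.
// The result pairs the count in the caller's object (unchanged) with
// the count in the returned copy.
std::pair<int, int> count_calls(const std::vector<int>& v)
{
  Counter c;
  Counter used = std::for_each(v.begin(), v.end(), c);
  return std::make_pair(c.calls, used.calls);
}
```

For a three-item vector, the pair is (0, 3): to get at accumulated state, always use the object that for_each returns.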

Example 11-8 shows one use for a functional object (or functor). It accumulates the sums needed to compute the mean and variance of a data set. Pass an instance of Statistics to the for_each algorithm to accumulate the statistics. The copy that for_each returns contains the desired results.

Example 11-8: Computing statistics with a functor.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <vector>

template<typename T>
class Statistics {
public:
  typedef T value_type;
  Statistics() : n_(0), sum_(0), sumsq_(0) {}
  void operator()(T x) {
    ++n_;
    sum_ += x;
    sumsq_ += x * x;
  }
  std::size_t count() const { return n_; }
  T sum()        const { return sum_; }
  T sumsq()      const { return sumsq_; }
  T mean()       const { return sum_ / n_; }  // requires count() >= 1
  // Sample variance; requires count() >= 2.
  T variance()   const
      { return (sumsq_ - sum_*sum_ / n_) / (n_ - 1); }
private:
  std::size_t n_;
  T sum_;
  T sumsq_; // sum of squares
};

int main()
{
  using namespace std;

  vector<double> data;
  copy(istream_iterator<double>(cin),
       istream_iterator<double>(),
       back_inserter(data));

  Statistics<double> d =
    for_each(data.begin(), data.end(), Statistics<double>());
  cout << "count=" << d.count() << '\n';
  cout << "mean =" << d.mean() << '\n';
  cout << "var  =" << d.variance() << '\n';
  cout << "stdev=" << std::sqrt(d.variance()) << '\n';
  cout << "sum  =" << d.sum() << '\n';
  cout << "sumsq=" << d.sumsq() << '\n';
}

Standard algorithms

Chapter 13 describes all the algorithms in detail. This section presents a categorized summary of the algorithms.

If the algorithm has "_copy" in its name, it reads an input range and copies all or part of it to an output range. (The sole exceptions are merge, which copies even though its name lacks "_copy", and inplace_merge, which does not copy. The naming reflects the fact that a copying merge is more common than an in-place merge.)

It is the programmer's responsibility to ensure that the output range is large enough to accommodate the results, for example, by sizing the container in advance or by writing through an insert iterator.
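The two usual ways to meet this requirement look like this. Both helper names are hypothetical; the difference is whether the destination is made large enough before the copy, or grows as items arrive through back_inserter.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Presize the destination, then overwrite its items.
std::vector<int> copy_presized(const std::vector<int>& in)
{
  std::vector<int> out(in.size());               // room for every item
  std::copy(in.begin(), in.end(), out.begin());
  return out;
}

// Start empty and let an insert iterator grow the destination.
// Passing out.begin() here instead would write past the end.
std::vector<int> copy_growing(const std::vector<int>& in)
{
  std::vector<int> out;
  std::copy(in.begin(), in.end(), std::back_inserter(out));
  return out;
}
```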

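If the algorithm name ends with "_if", one of its arguments (usually the last) is a predicate: a function pointer or functional object that returns a Boolean result. (The replace_if and replace_copy_if algorithms take the new value after the predicate.)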

Nonmodifying operations

count
Returns the number of items that match a given value.
count_if
Returns the number of items for which a predicate returns true.
for_each
Applies a function or functor to each item.

Comparison

equal
Determines whether two ranges have equivalent contents.
lexicographical_compare
Determines whether one range compares less than another range.
max
Returns the maximum of two values.
max_element
Finds the maximum value in a range.
min
Returns the minimum of two values.
min_element
Finds the minimum value in a range.
mismatch
Finds the first position where two ranges differ.

Searching

adjacent_find
Finds the first position where an item is equal to its neighbor.
find
Finds the first occurrence of a value in a range.
find_end
Finds the last occurrence of a subsequence in a range.
find_first_of
Finds the first position where a value matches any one item from a range of values.
find_if
Finds the first position where a predicate returns true.
search
Finds a subsequence in a range.
search_n
Finds a run of n consecutive items that are equal to a given value.

Binary search

binary_search
Determines whether an item is in a sorted range, using binary search.
equal_range
Finds the lower and upper bounds of the subrange of a sorted range whose items are equivalent to a given value.
lower_bound
Finds the lower bound of where an item belongs in a sorted range.
upper_bound
Finds the upper bound of where an item belongs in a sorted range.

Modifying sequence operations

copy
Copies an input range to an output range.
copy_backward
Copies an input range to an output range, starting at the end of the output range.
fill
Fills a range with a value.
fill_n
Fills a fixed-size range with a value.
generate
Fills a range with values returned from a function.
generate_n
Fills a fixed-size range with values returned from a function.
iter_swap
Swaps the values that two iterators point to.
random_shuffle
Shuffles a range into random order.
remove
Reorders a range to prepare for erasing all elements equal to a given value.
remove_copy
Copies a range, removing all items equal to a given value.
remove_copy_if
Copies a range, removing all items for which a predicate returns true.
remove_if
Reorders a range to prepare to erase all items for which a predicate returns true.
replace
Replaces items of a given value with a new value.
replace_copy
Copies a range, replacing items of a given value with a new value.
replace_copy_if
Copies a range, replacing items for which a predicate returns true with a new value.
replace_if
Replaces items for which a predicate returns true with a new value.
reverse
Reverses a range in place.
reverse_copy
Copies a range in reverse order.
rotate
Rotates items from one end of a range to the other end.
rotate_copy
Copies a range, rotating items from one end to the other.
swap_ranges
Swaps values in two ranges.
transform
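Applies a transformation function to every item in a range (or to pairs of items from two ranges), writing the results to an output range, which may be the input range itself.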
unique
Reorders a range to prepare for erasing all adjacent, duplicate items.
unique_copy
Copies a range, removing adjacent, duplicate items.
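As the descriptions of remove and unique suggest, neither algorithm changes the size of the container; each returns an iterator to the new logical end, and the caller erases the leftover tail. The sketch below shows that erase-after-remove pattern; the function name strip_zeros_and_duplicates is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Removes every zero, then collapses each run of equal adjacent items
// to a single item. Each algorithm returns the new logical end, and
// erase discards everything after it.
void strip_zeros_and_duplicates(std::vector<int>& v)
{
  v.erase(std::remove(v.begin(), v.end(), 0), v.end());
  v.erase(std::unique(v.begin(), v.end()), v.end());
}
```

Starting from {1, 0, 1, 2, 2, 0, 3, 1}, remove leaves {1, 1, 2, 2, 3, 1} and unique then leaves {1, 2, 3, 1}; the final 1 survives because unique only removes adjacent duplicates.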

Sorting

nth_element
Reorders a range so the item at the nth position is the item that would be there if the range were sorted, no item before it is greater than it, and no item after it is less than it.
partial_sort
Reorders a range so the first part is sorted.
partial_sort_copy
Copies the smallest items of a range, in sorted order, to an output range.
partition
Reorders a range so all items for which a predicate is true come before all items for which the predicate is false.
sort
Sorts items in ascending order.
stable_partition
Reorders a range so all items for which a predicate is true come before all items for which the predicate is false. The relative order of items within a partition is maintained.
stable_sort
Sorts items in ascending order. The relative order of equal items is maintained.
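The stable variants matter when the original order carries meaning. This sketch uses stable_partition to move even numbers to the front while keeping each group in its original order; the names is_even and evens_first are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical predicate.
bool is_even(int x) { return x % 2 == 0; }

// Moves the even numbers to the front, keeping the original relative
// order within the even group and within the odd group. Returns the
// number of even items.
std::vector<int>::difference_type evens_first(std::vector<int>& v)
{
  std::vector<int>::iterator mid =
    std::stable_partition(v.begin(), v.end(), is_even);
  return mid - v.begin();
}
```

For {1, 2, 3, 4, 5, 6}, the result is {2, 4, 6, 1, 3, 5}. Plain partition guarantees only that the evens come first, not their order.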

Merging

inplace_merge
Merges two sorted, consecutive subranges in place, so the results replace the original ranges.
merge
Merges two sorted ranges, copying the results to a separate range.

Set operations

includes
Determines whether one sorted range is a subset of another.
set_difference
Copies the set difference of two sorted ranges to an output range.
set_intersection
Copies the intersection of two sorted ranges to an output range.
set_symmetric_difference
Copies the symmetric difference of two sorted ranges to an output range.
set_union
Copies the union of two sorted ranges to an output range.
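The set operations work on any sorted ranges, not just on set containers. The following sketch computes the intersection of two sorted vectors into an output range created with back_inserter; the function name common_items is hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Returns the items that appear in both a and b. Both inputs must
// already be sorted in ascending order.
std::vector<int> common_items(const std::vector<int>& a,
                              const std::vector<int>& b)
{
  std::vector<int> out;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(out));
  return out;
}
```

The result is itself sorted, so it can be passed straight to includes or to another set operation.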

Heap operations

make_heap
Reorders a range into heap order.
pop_heap
Reorders a range to remove the first item in the heap.
push_heap
Reorders a range to add the last item to the heap.
sort_heap
Reorders a range that starts in heap order into fully sorted order.
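The heap algorithms work together to implement a priority queue on top of an ordinary vector. In this sketch (the function name largest_first is hypothetical), a new item is appended and then pushed into the heap, and items are removed largest first with pop_heap, which moves the largest item to the back where pop_back can discard it.

```cpp
#include <algorithm>
#include <vector>

// Builds a heap from v, adds extra, then drains the heap so the
// items come out in descending order.
std::vector<int> largest_first(std::vector<int> v, int extra)
{
  std::make_heap(v.begin(), v.end());   // v.front() is now the largest item
  v.push_back(extra);                   // append the new item...
  std::push_heap(v.begin(), v.end());   // ...then restore heap order
  std::vector<int> out;
  while (!v.empty()) {
    std::pop_heap(v.begin(), v.end());  // moves the largest item to the back
    out.push_back(v.back());
    v.pop_back();
  }
  return out;
}
```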

Permutations

next_permutation
Reorders a range to form the next permutation.
prev_permutation
Reorders a range to form the previous permutation.
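A common idiom for visiting every permutation is to sort the range first and then call next_permutation until it returns false. (After the last permutation, next_permutation returns false and leaves the range in sorted order.) The function name all_permutations is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Lists every distinct permutation of s in lexicographic order.
std::vector<std::string> all_permutations(std::string s)
{
  std::vector<std::string> result;
  std::sort(s.begin(), s.end());   // start from the first permutation
  do {
    result.push_back(s);
  } while (std::next_permutation(s.begin(), s.end()));
  return result;
}
```

Duplicate items produce no duplicate permutations: "aab" yields only aab, aba, and baa.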

Custom algorithms

Writing your own algorithm is easy. Some care is always needed when writing function templates (as discussed in Chapter 8), but generic algorithms do not present any special or unusual challenges.

Probably the first generic algorithm most programmers write is copy_if, which was inexplicably omitted from the standard. The copy_if function copies an input range to an output range, but only if a predicate returns true. Example 11-9 shows a simple implementation of copy_if.

Example 11-9: One way to implement the copy_if function.

template<typename InIter, typename OutIter, typename Pred>
OutIter copy_if(InIter first, InIter last, OutIter result,
                Pred pred)
{
  for (; first != last; ++first)
    if (pred(*first)) {
      *result = *first;
      ++result;
    }
  return result;
}
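Here is copy_if from Example 11-9 in use, keeping only the positive values of a vector. Two assumptions apply: the names is_positive and positives are hypothetical, and the template is placed in a hypothetical namespace mylib. Later revisions of the standard (starting with C++11) added std::copy_if with the same parameters, and an unqualified call with standard iterators could then find both templates through argument-dependent lookup and be ambiguous; calling mylib::copy_if explicitly avoids that.

```cpp
#include <iterator>
#include <vector>

namespace mylib {  // hypothetical namespace
template<typename InIter, typename OutIter, typename Pred>
OutIter copy_if(InIter first, InIter last, OutIter result, Pred pred)
{
  for (; first != last; ++first)
    if (pred(*first)) {   // copy only the items the predicate accepts
      *result = *first;
      ++result;
    }
  return result;
}
}

bool is_positive(double x) { return x > 0; }

// Returns the positive items of data, in their original order.
std::vector<double> positives(const std::vector<double>& data)
{
  std::vector<double> out;
  mylib::copy_if(data.begin(), data.end(), std::back_inserter(out),
                 is_positive);
  return out;
}
```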

You can also specialize an algorithm. For example, you might be able to implement the algorithm more efficiently for a random access iterator. In that case, you can write helper functions, and use the iterator_category trait to choose a specialized implementation. (Chapter 9 has more information about traits, including an example of using iterator traits to specialize a function.)

The real trick in designing and writing algorithms is being able to generalize the problem, and then find an efficient solution. Before running off to write your own solution, check the standard library first. Your problem might already have a solution.

For example, I recently wanted to write an algorithm to find the median value in a range. There is no median algorithm, but there is nth_element, which solves the more general problem of finding the element at any sorted index. Writing median became a trivial matter of making a temporary copy of the data, calling nth_element and then returning an iterator that points to the median value in the original range. Because median makes two passes over the input range, a forward iterator is required, as shown in Example 11-10.

Example 11-10: Finding the median of a range.

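#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

template<typename FwdIter, typename Compare>
FwdIter median(FwdIter first, FwdIter last, Compare compare)
{
  if (first == last)   // an empty range has no median
    return last;
  std::vector<typename std::iterator_traits<FwdIter>::value_type>
    tmp(first, last);
  std::size_t median_pos = tmp.size() / 2;
  std::nth_element(tmp.begin(), tmp.begin() + median_pos,
                   tmp.end(), compare);
  return std::find(first, last, tmp[median_pos]);
}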