In lesson 17.7 -- Introduction to C-style arrays, we discussed arrays and loops, and noted that using a loop to step through the elements of an array is a common thing to do. Accessing each element of a container in some order is called traversal or iteration.
Let’s take a look at a simple example of traversing a std::array
to print the value of each element:
#include <array>
#include <iostream>
int main()
{
std::array arr { 9, 7, 5, 3, 1 };
// Traverse our array and print elements
for (std::size_t index { 0 }; index < std::size(arr); ++index)
std::cout << arr[index] << ' '; // note: using subscripting here
std::cout << '\n';
return 0;
}
This works just fine. Now, let’s try this same function with std::list
, a different kind of sequence container:
#include <iostream>
#include <list>
int main()
{
std::list arr { 9, 7, 5, 3, 1 };
for (std::size_t index { 0 }; index < std::size(arr); ++index)
std::cout << arr[index] << ' '; // compile error: std::list doesn't support operator[]
return 0;
}
Unlike our array containers, the std::list
container doesn’t have an operator[]
as part of its interface. The expression arr[index]
thus produces a compilation error.
For advanced readers
In an array, the memory to store the elements is allocated in one contiguous chunk, and each element is placed sequentially in memory. When we subscript an array, it is a quick calculation to determine where the element we care about resides in memory (we discuss this further in lesson 17.9 -- Pointer arithmetic and subscripting).
In a linked list, each element is called a node. A node typically contains 3 members: the value of the element, a pointer to the next node in the list, and a pointer to the previous node in the list. Unlike an array, storage for each node is allocated individually, which means a given node could reside anywhere in memory. And because the nodes of a list are not sequential, there is no convenient way to calculate where a given node resides in memory. Wikipedia has a good description of linked lists.
std::list
only has front()
and back()
member functions to access the first and last element of the list. So how do we even get to the middle elements of a std::list
? We’ll answer this question in a section below.
Because arrays and lists are both sequence containers, ideally we should be able to swap out our std::array
for a std::list
and have the rest of our code work without additional changes. But std::array
and std::list
have different interfaces, and even more different implementations. So how do we make this work in practice?
Since iterating through a container is such a common thing to do, it would be nice if there was some consistent way that we could iterate through a container regardless of what type of container it is.
Introduction to iterators
An iterator is an object that is used to iterate through a container using a simple and consistent interface.
Basic iterators support two primary operations:
- Get the current element, typically implemented via
operator*
. - Advance to the next element, typically implemented via
operator++
.
Once we have an iterator, we can use the iterator to traverse and access elements of the container without having to worry about how the container is implemented or what kind of interface the container exposes. In other words, an iterator provides a consistent interface for iterating through a container, abstracting away the implementation and interface details of that container.
Iterating through a C-style array using a pointer
Historically in C++, pointers were used as basic iterators to traverse C-style arrays.
Let’s look at a simple example of this:
#include <iostream>
int main()
{
int arr[] { 9, 7, 5, 3, 1 };
int* begin = arr; // pointer to first array element
int* end = arr + std::size(arr); // pointer to one-past-end of array
// Traverse our array and print elements
for ( auto iter { begin }; // iter starts at beginning of array
iter != end; // continue iterating if we haven't reached the end
++iter) // operator++ advances iter to the next contiguous element
std::cout << *iter << ' ' ; // operator* gets the value of the current element
std::cout << '\n';
return 0;
}
In the above example, iter
is our iterator object (of type int*
). We start our traversal at the element pointed to by begin
(which in this case is element 0 of the array). Since begin != end
yet, the loop body executes. Inside the loop, we access the current element via *iter
, which dereferences the pointer to fetch the object at the address being pointed to. After the loop body, we do ++iter
, which uses pointer arithmetic to change the address of iter
to the address of the next element. Since begin != end
is still true, the loop body executes again. Eventually, begin != end
is false
(which happens when begin == end
), and the loop terminates.
Related content
We cover pointer arithmetic in lesson 17.9 -- Pointer arithmetic and subscripting.
Thus, the above prints:
9 7 5 3 1
There are a few things worth noting here. First, with arrays, end
is set to one-element-past-the-end of the array. Having end
hold this address is fine (so long as we don’t dereference end
, as there isn’t a valid element at that address). We do this because it makes our math and comparisons as simple as possible (no need to add or subtract 1 anywhere).
Tip
For a pointer that is pointing to a C-style array element, pointer arithmetic is valid so long as the resulting address is the address of a valid array element, or one-past-last element. If pointer arithmetic results in an address beyond these bounds, it is undefined behavior (even if the result is not dereferenced).
Second, with iterators, it is conventional to test whether we’ve reached the end using operator==
or operator!=
rather than a relational comparison (e.g. operator>
). The reasons for this will become clear later.
Iterating through a std::array
Let’s do a similar example, using std::array
this time:
#include <array>
#include <iostream>
int main()
{
std::array arr { 9, 7, 5, 3, 1 };
for ( auto iter = arr.begin();
iter != arr.end();
++iter)
std::cout << *iter << ' ';
std::cout << '\n';
}
This prints:
9 7 5 3 1
This example works just like the prior example.
However, instead of manually determining begin
and end
, we’re using the begin()
and end()
member functions of std::array
. These return an iterator to the first and one-past-last element respectively. In the case of std::array
, the iterator is just a pointer (in this case, of type int*
).
All C++ Containers have these begin()
and end()
member functions. You can also use the std::begin()
and std::end()
non-member functions, which work with both Containers and C-style arrays.
To summarize, when using a pointer as an iterator, we use the following operators:
operator*
(dereference) is used to get the current element.operator++
(prefix or postfix) is used to advance to the next contiguous element.
The word “contiguous” is important here. When used on a pointer, operator++
increments the address of the pointer (by the size of the type being pointed to) so that the pointer now points to the address where the next contiguous element would be. This is exactly what we want when the elements of our container are laid out contiguously.
But what if they aren’t?
Iterators for non-contiguous containers
When the elements of a container are non-contiguous, using operator++
will advance the pointer to the address of where the next contiguous element would be, but there will be no valid element in that location. Dereferencing a pointer to an invalid element will result in undefined behavior.
So how do we iterate through a container with non-contiguous elements (like std::list
)? There are two challenges to solve here:
- We need some way to indicate that we’ve reached the end of the container and there are no more elements to iterate over (what
end
represents in the above example). In simple cases, we can just usenullptr
to represent the end. - We need some way to advance our iterator to the next element. This is the hard one. With an array, we know the elements are organized sequentially so we can just increment the address appropriately. With any other kind of container, how the elements of the container are organized is implementation-dependent. And that means how we advance to the next element is implementation-dependent as well. We’d probably want to write some function to do this. And that function would only work for the specific container type it was written for.
The C++ standard library has a clever solution for the above:
- In most cases, the iterator returned by
begin()
is not a pointer, but is a class that implements an iterator for that specific container. - The iterator class has an overloaded
operator*
that returns the current element, an overloadedoperator++
that advances the iterator to the next element (in whatever way makes sense for the container), and overloadedoperator==
/operator!=
that compares two iterators for equality.
Let’s see how this works with std::list
:
#include <iostream>
#include <list>
int main()
{
std::list arr { 9, 7, 5, 3, 1 };
for (auto it = arr.begin();
it != arr.end();
++it)
std::cout << *it << ' ';
std::cout << '\n';
}
This compiles, and produces the result we want:
9 7 5 3 1
Note that this program is otherwise identical to the std::array
version previously! The hidden difference is what arr.begin()
returns: instead of returning an int*
, it returns a std::list::iterator
object that knows how to iterate over a std::list
.
Key insight
Although std::vector
i
For consistency, modern C++ iterators adopt the same syntax.
insight
By providing a consistent interface for iteration, iterators allow us to iterate through a wide variety of different containers in a consistent manner.
No title
Iterating through a Container in C++
Creating an iterator in C++ is simple: in most cases, we can just ask our container for one.
All iteratable types in the standard library have (at least) two functions that return iterators:
- The
begin()
member function returns an iterator to the first element of the container. - The
end()
member function returns an iterator to one-past-the-last element of the container.
>> WHAT DOES STD::LIST DO?
tip
Views (like std::string_view
or std::span
) also offer iterators to the container they are viewing. This is why we use the term “iteratable types” (rather than “containers”) in the prior paragraph.
No title
You can also use non-member functions std::begin()
and std::end()
, which do the same thing, but also work with C-style arrays.
Let’s show how we can iterate through both a std::array
and a std::list
using an iterator:
#include <array>
#include <iostream>
#include <list>
template <typename T>
void printElements(const T& c)
{
for (auto it { c.begin() }; // get iterator to first element
it != c.end(); // stop iterating at end of container
++it) // operator++ used to advance iterator
std::cout << *it << ' '; // operator* used to get current element
std::cout << '\n';
}
int main()
{
std::array arr { 9, 7, 5, 3, 1 };
printElements(arr);
std::list lst { 9, 7, 5, 3, 1 };
printElements(lst);
return 0;
}
Let’s take a look at printElements()
in more detail. This is a template function that allows the caller to pass in whatever kind of container they like. We use a for-loop to iterate through the container. We start calling c.begin()
to get an iterator to the beginning
// Do example where just passing in iterator rather than entire container
A container may provide different kinds of iterators. For example, an array container might offer a forwards iterator that walks through the array in forward order, and a reverse iterator that walks through the array in reverse order.
Why operator== or operator!=
This is why we use
operator==
or operator!=
with iterators: it doesn’t make sense to compare the current iterator to nullptr
using operator<
or one of the other relational comparisons.
One-past-the-last element
And because C++ iterators typically use the same interface for traversal (operator++ to move to the next element) and access (operator* to access the current element), we can iterate through a wide variety of different container types using a consistent method.
An iterator is an object that is used to iterate through a container. Once an iterator is created, the programmer can then use the interface provided by the iterator to traverse and access elements of the container.
worry about what kind of traversal is being done or how the data is being stored in the container.
For consistency, iterators in C++ conventionally use a pointer-like syntax:
A container may provide different kinds of iterators. For example, an array container might offer a forwards iterator that walks through the array in forward order, and a reverse iterator that walks through the array in reverse order.
Once the appropriate type of iterator is created, the programmer can then use the interface provided by the iterator to traverse and access elements without having to worry about what kind of traversal is being done or how the data is being stored in the container. And because C++ iterators typically use the same interface for traversal (operator++ to move to the next element) and access (operator* to access the current element), we can iterate through a wide variety of different container types using a consistent method.
In lesson 17.9 -- Pointer arithmetic and subscripting, we discussed how a pointer can be used to iterate through a C-style array. This method works on any container with contiguous memory, so let’s adapt it to
Let’s take a look at a primitive type of iterator: the pointer.
In the following program, we use a pointer to iterate through an array.
#include <array>
#include <iostream>
int main()
{
std::array arr { 9, 7, 5, 3, 1 };
const int* begin{ arr.data() }; // begin points to start element
const int* end{ begin + std::size(arr) }; // end points to one-past-the-end element
for (; begin != end; ++begin) // iterate from begin up to (but excluding) end
{
std::cout << *begin << ' '; // dereference our loop variable to get the current element
}
return 0;
}
--
is designed to iterate through a container (e.g. the values in an array, or the characters in a string), providing access to each element along the way.
>> Accessing each element of a container in some order is called traversal, or traversing the container. Traversal is also sometimes called iterating over or iterating through the container.
Iterating through an array (or other structure) of data is quite a common thing to do in programming. And so far, we’ve covered many different ways to do so: with loops and an index (for-loops
and while loops
), with pointers and pointer arithmetic, and with range-based for-loops
:
#include <array>
#include <cstddef>
#include <iostream>
int main()
{
// In C++17, the type of variable data is deduced to std::array<int, 7>
// If you get an error compiling this example, see the warning below
std::array data{ 0, 1, 2, 3, 4, 5, 6 };
std::size_t length{ std::size(data) };
// while-loop with explicit index
std::size_t index{ 0 };
while (index < length)
{
std::cout << data[index] << ' ';
++index;
}
std::cout << '\n';
// for-loop with explicit index
for (index = 0; index < length; ++index)
{
std::cout << data[index] << ' ';
}
std::cout << '\n';
// for-loop with pointer (Note: ptr can't be const, because we increment it)
for (auto ptr{ &data[0] }; ptr != (&data[0] + length); ++ptr)
{
std::cout << *ptr << ' ';
}
std::cout << '\n';
// range-based for loop
for (int i : data)
{
std::cout << i << ' ';
}
std::cout << '\n';
return 0;
}
Warning
The examples in this lesson use a C++17 feature called class template argument deduction
to deduce the template arguments for a template variable from its initializer. In the example above, when the compiler sees std::array data{ 0, 1, 2, 3, 4, 5, 6 };
, it will deduce that we want std::array<int, 7> data { 0, 1, 2, 3, 4, 5, 6 };
.
If your compiler is not C++17 enabled, you’ll get an error that says something like, “missing template arguments before ‘data’”. In that case, your best bet is to enable C++17, as per lesson 0.12 -- Configuring your compiler: Choosing a language standard. If you can not, you can replace the lines that use class template argument deduction with lines that have explicit template arguments (e.g. replace std::array data{ 0, 1, 2, 3, 4, 5, 6 };
with std::array<int, 7> data { 0, 1, 2, 3, 4, 5, 6 };
Looping using indexes is more typing than needed if we only use the index to access elements. It also only works if the container (e.g. the array) provides direct access to elements (which arrays do, but some other types of containers, such as lists, do not).
Looping with pointers and pointer arithmetic is verbose, and can be confusing to readers who don’t know the rules of pointer arithmetic. Pointer arithmetic also only works if elements are consecutive in memory (which is true for arrays, but not true for other types of containers, such as lists, trees, and maps).
For advanced readers
Pointers (without pointer arithmetic) can also be used to iterate through some non-sequential structures. In a linked list, each element is connected to the prior element by a pointer. We can iterate through the list by following the chain of pointers.
Range-based for-loops are a little more interesting, as the mechanism for iterating through our container is hidden -- and yet, they still work for all kinds of different structures (arrays, lists, trees, maps, etc…). How do these work? They use iterators.
Iterators
An iterator is an object designed to traverse through a container (e.g. the values in an array, or the characters in a string), providing access to each element along the way.
A container may provide different kinds of iterators. For example, an array container might offer a forwards iterator that walks through the array in forward order, and a reverse iterator that walks through the array in reverse order.
Once the appropriate type of iterator is created, the programmer can then use the interface provided by the iterator to traverse and access elements without having to worry about what kind of traversal is being done or how the data is being stored in the container. And because C++ iterators typically use the same interface for traversal (operator++ to move to the next element) and access (operator* to access the current element), we can iterate through a wide variety of different container types using a consistent method.
Pointers as an iterator
The simplest kind of iterator is a pointer, which (using pointer arithmetic) works for data stored sequentially in memory. Let’s revisit a simple array traversal using a pointer and pointer arithmetic:
#include <array>
#include <iostream>
int main()
{
std::array data{ 0, 1, 2, 3, 4, 5, 6 };
auto begin{ &data[0] };
// note that this points to one spot beyond the last element
auto end{ begin + std::size(data) };
// for-loop with pointer
for (auto ptr{ begin }; ptr != end; ++ptr) // ++ to move to next element
{
std::cout << *ptr << ' '; // Indirection to get value of current element
}
std::cout << '\n';
return 0;
}
Output:
0 1 2 3 4 5 6
In the above, we defined two variables: begin
(which points to the beginning of our container), and end
(which marks an end point). For arrays, the end marker is typically the place in memory where the last element would be if the container contained one more element.
The pointer then iterates between begin
and end
, and the current element can be accessed by dereferencing the pointer.
Warning
You might be tempted to calculate the end marker using the address-of operator and array syntax like so:
int* end{ &data[std::size(data)] };
But this causes undefined behavior, because data[std::size(data)]
implicitly dereferences an element that is off the end of the array.
Instead, use:
int* end{ data.data() + std::size(data) }; // data() returns a pointer to the first element
Standard library iterators
Iterating is such a common operation that all standard library containers offer direct support for iteration. Instead of calculating our own begin and end points, we can simply ask the container for the begin and end points via member functions conveniently named begin()
and end()
:
#include <array>
#include <iostream>
int main()
{
std::array array{ 1, 2, 3 };
// Ask our array for the begin and end points (via the begin and end member functions).
auto begin{ array.begin() };
auto end{ array.end() };
for (auto p{ begin }; p != end; ++p) // ++ to move to next element.
{
std::cout << *p << ' '; // Indirection to get value of current element.
}
std::cout << '\n';
return 0;
}
This prints:
1 2 3
The iterator
header also contains two generic functions (std::begin
and std::end
) that can be used.
[/note]tip
std::begin
and std::end
for C-style arrays are defined in the <iterator> header.
std::begin
and std::end
for containers that support iterators are defined in the header files for those containers (e.g. <array>, <vector>).
!!!
#include <array> // includes <iterator>
#include <iostream>
int main()
{
std::array array{ 1, 2, 3 };
// Use std::begin and std::end to get the begin and end points.
auto begin{ std::begin(array) };
auto end{ std::end(array) };
for (auto p{ begin }; p != end; ++p) // ++ to move to next element
{
std::cout << *p << ' '; // Indirection to get value of current element
}
std::cout << '\n';
return 0;
}
This also prints:
1 2 3
Don’t worry about the types of the iterators for now, we’ll re-visit iterators in a later chapter. The important thing is that the iterator takes care of the details of iterating through the container. All we need are four things: the begin point, the end point, operator++ to move the iterator to the next element (or the end), and operator* to get the value of the current element.
operator<
vs operator!=
for iterators
In lesson 8.10 -- For statements, we noted that using operator<
was preferred over operator!=
when doing numeric comparisons in the loop condition:
for (index = 0; index < length; ++index)
With iterators, it is conventional to use operator!=
to test whether the iterator has reached the end element:
for (auto p{ begin }; p != end; ++p)
This is because some iterator types are not relationally comparable. operator!=
works with all iterator types.
Back to range-based for loops
All types that have both begin()
and end()
member functions, or that can be used with std::begin()
and std::end()
, are usable in range-based for-loops.
#include <array>
#include <iostream>
int main()
{
std::array array{ 1, 2, 3 };
// This does exactly the same as the loop we used before.
for (int i : array)
{
std::cout << i << ' ';
}
std::cout << '\n';
return 0;
}
Behind the scenes, the range-based for-loop calls begin()
and end()
of the type to iterate over. std::array
has begin
and end
member functions, so we can use it in a range-based loop. C-style fixed arrays can be used with std::begin
and std::end
functions, so we can loop through them with a range-based loop as well. Dynamic C-style arrays (or decayed C-style arrays) don’t work though, because there is no std::end
function for them (because the type information doesn’t contain the array’s length).
You’ll learn how to add these functions to your types later, so that they can be used with range-based for-loops too.
Range-based for-loops aren’t the only thing that makes use of iterators. They’re also used in std::sort
and other algorithms. Now that you know what they are, you’ll notice they’re used quite a bit in the standard library.
Iterator invalidation (dangling iterators)
Much like pointers and references, iterators can be left “dangling” if the elements being iterated over change address or are destroyed. When this happens, we say the iterator has been invalidated. Accessing an invalidated iterator produces undefined behavior.
Some operations that modify containers (such as adding an element to a std::vector
) can have the side effect of causing the elements in the container to change addresses. When this happens, existing iterators to those elements will be invalidated. Good C++ reference documentation should note which container operations may or will invalidate iterators. As an example, see the “Iterator invalidation” section of std::vector
on cppreference.
Since range-based for-loops use iterators behind the scenes, we must be careful not to do anything that invalidates the iterators of the container we are actively traversing:
#include <vector>
int main()
{
std::vector v { 0, 1, 2, 3 };
for (auto num : v) // implicitly iterates over v
{
if (num % 2 == 0)
v.push_back(num + 1); // when this invalidates the iterators of v, undefined behavior will result
}
return 0;
}
Here’s another example of iterator invalidation:
#include <iostream>
#include <vector>
int main()
{
std::vector v{ 1, 2, 3, 4, 5, 6, 7 };
auto it{ v.begin() };
++it; // move to second element
std::cout << *it << '\n'; // ok: prints 2
v.erase(it); // erase the element currently being iterated over
// erase() invalidates iterators to the erased element (and subsequent elements)
// so iterator "it" is now invalidated
++it; // undefined behavior
std::cout << *it << '\n'; // undefined behavior
return 0;
}
Invalidated iterators can be revalidated by assigning a valid iterator to them (e.g. begin()
, end()
, or the return value of some other function that returns an iterator).
The erase()
function returns an iterator to the element one past the erased element (or end()
if the last element was removed). Therefore, we can fix the above code like this:
#include <iostream>
#include <vector>
int main()
{
std::vector v{ 1, 2, 3, 4, 5, 6, 7 };
auto it{ v.begin() };
++it; // move to second element
std::cout << *it << '\n';
it = v.erase(it); // erase the element currently being iterated over, set `it` to next element
std::cout << *it << '\n'; // now ok, prints 3
return 0;
}
(h/t to nascardriver for significant contributions to this lesson)