When writing code, sometimes you’ll run across cases where you’re not sure which of two or more options will be more performant. So how might you tell? The answer is obvious: measure how long each takes to run!
While we could measure the amount of time some bit of code takes to run manually (using a stopwatch, or the timer app on your smartphone), it would be hard to start and stop the timer at precisely the desired points, and our anticipation and reflexes (or lack thereof) would introduce a lot of variance into our timing. This may not matter much if the options take a perceptibly different amount of time to run (e.g. one option takes 1 second, the other takes 10 seconds). But with modern computers being extremely fast, we more often end up in cases where the two options are perceptibly identical. For example, if one option takes 1 millisecond and the other takes 10 milliseconds, the second option still takes 10 times as long, but to us both options will appear to run instantaneously. We need a more precise way to measure.
A better option is to have our code measure the time it took to execute from one point to another. That way we can precisely choose where we want the timing measurement to start and stop, and avoid any kind of direct human variance.
To assist with this, we can use the Chrono standard library, which contains functionality for interacting with system clocks and handling time. While the Chrono library is a bit arcane, we can wrap the bits of the Chrono library that we need in a class with an easy-to-use interface. We can then use this class in our own programs to do time measurements.
Timer.h:
#ifndef TIMER_H
#define TIMER_H

#include <chrono> // for std::chrono functions

// A Timer class for measuring elapsed time
// Freely redistributable, courtesy of learncpp.com (https://clone.learncpp.com/cpp-tutorial/timing-your-code-timer-h/)
template <typename Clock=std::chrono::steady_clock>
class Timer
{
private:
    std::chrono::time_point<Clock> m_start { Clock::now() };

public:
    void reset()
    {
        m_start = Clock::now();
    }

    // Returns number of elapsed TimeUnits (defaults to fractional seconds)
    // T can be set to double to return fractional units, or an integral type to return whole units
    template <typename T=double, typename TimeUnit=std::ratio<1>>
    T elapsed() const
    {
        using Time = std::chrono::duration<T, TimeUnit>;
        return std::chrono::duration_cast<Time>(Clock::now() - m_start).count();
    }

    // Helper functions

    // Returns elapsed time in seconds (long long)
    template <typename T=long long>
    T sec() const   { return elapsed<T, std::ratio<1>>(); }

    // Returns elapsed time in fractional seconds (double)
    template <typename T=double>
    T sec_d() const { return elapsed<T, std::ratio<1>>(); }

    // Returns elapsed time in milliseconds (long long)
    template <typename T=long long>
    T milli() const { return elapsed<T, std::milli>(); }

    // Returns elapsed time in microseconds (long long)
    template <typename T=long long>
    T micro() const { return elapsed<T, std::micro>(); }

    // Returns elapsed time in nanoseconds (long long)
    template <typename T=long long>
    T nano() const  { return elapsed<T, std::nano>(); }
};

#endif
While this may look complex, that’s just because everything is templated, so you can change how things work if you don’t like the default template parameters.
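For example, the clock used for measurements is itself a template parameter (defaulting to std::chrono::steady_clock). Here’s a minimal sketch of overriding it, assuming the Timer.h above is available to include:
#include "Timer.h"
#include <chrono> // for std::chrono clocks
#include <iostream>

int main()
{
    Timer t{}; // uses the default clock, std::chrono::steady_clock

    Timer<std::chrono::high_resolution_clock> t2{}; // explicitly supply a different clock

    std::cout << t.nano() << " ns\n";
    std::cout << t2.nano() << " ns\n";

    return 0;
}
The default of std::chrono::steady_clock is generally the right choice for interval measurements, since it is guaranteed to be monotonic (it never jumps backward the way a wall-clock time can).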
To use the Timer:
- Include the “Timer.h” header.
- Instantiate a Timer object wherever you want to begin timing something.
- Call one of the helper functions to return the amount of time that has elapsed since the timer was instantiated.
The different helper functions return different time units (e.g. seconds, milliseconds). You can also provide an explicit template argument to change the return type. For example:
#include "Timer.h"
#include <iostream>
int main()
{
Timer t{}; // start timing here
// Code to time goes here
std::cout << "Time elapsed: " << t.micro() << " microseconds\n"; // get elapsed microseconds
// You can provide an explicit template argument to change the return type
std::cout << "Time elapsed: " << t.milli<double>() << " milliseconds\n"; // get elapsed (fractional) milliseconds as a double value
// This can be useful for to avoid having to explicitly static_cast the results to a narrower type
int save { t.sec<int>() }; // get elapsed seconds as an int
std::cout << "Time elapsed: " << save << " seconds\n";
return 0;
}
Using Timer.h in an example
In the following two examples, we’re going to sort an array of 10000 elements whose values begin in reverse order (the first element has value 10000, the second 9999, etc…).
First, we’ll do so with a sort function that we’ve written ourselves. You don’t need to know how the sort function works.
#include "Timer.h"
#include <array>
#include <iostream>
#include <numeric> // for std::iota
constexpr int g_numElements { 10000 };
// Selection sort works by finding the smallest element in the array and moving it to the front of the array. The first element is thus sorted. Then the process repeats on the remaining elements of the array, until all elements have been sorted.
// Selection sort is fairly simple, but not very efficient.
void selectionSort(std::array<int, g_numElements>& array)
{
// Step through each element of the array
// (except the last one, which will already be sorted by the time we get there)
for (std::size_t startIndex{ 0 }; startIndex < (g_numElements - 1); ++startIndex)
{
// smallestIndex is the index of the smallest element we’ve encountered this iteration
// Start by assuming the smallest element is the first element of this iteration
std::size_t smallestIndex{ startIndex };
// Then look for a smaller element in the rest of the array
for (std::size_t currentIndex{ startIndex + 1 }; currentIndex < g_numElements; ++currentIndex)
{
// If we've found an element that is smaller than our previously found smallest
if (array[currentIndex] < array[smallestIndex])
{
// then keep track of it
smallestIndex = currentIndex;
}
}
// smallestIndex is now the smallest element in the remaining array
// swap our start element with our smallest element (this sorts it into the correct place)
std::swap(array[startIndex], array[smallestIndex]);
}
}
int main()
{
std::array<int, g_numElements> array;
std::iota(array.rbegin(), array.rend(), 1); // fill the array with values 10000 to 1
Timer t{}; // start timing here
selectionSort(array);
std::cout << "Time taken: " << t.micro() << " microseconds\n";
return 0;
}
On the author’s machine, three runs produced timings of 160187, 161089, and 158606 microseconds. So we’ll say about 160,000 microseconds.
Now, let’s do the same test using std::sort from the standard library.
#include "Timer.h"
#include <array>
#include <iostream>
#include <numeric> // for std::iota
constexpr int g_numElements { 10000 };
int main()
{
std::array<int, g_numElements> array;
std::iota(array.rbegin(), array.rend(), 1); // fill the array with values 10000 to 1
Timer t{}; // start timing here
std::sort(array.begin(), array.end());
std::cout << "Time taken: " << t.micro() << " microseconds\n";
return 0;
}
On the author’s machine, this produced timings of 396, 371, and 414 microseconds. So we’ll say about 400 microseconds.
In other words, std::sort performed about 400 times faster than the sort we wrote ourselves in this case! That’s because std::sort uses a much more efficient algorithm. We’ll talk about the efficiency of algorithms later in this chapter.
Things that can impact the performance of your program
Timing a run of your program is fairly straightforward, but your results can be significantly impacted by a number of things, and it’s important to be aware of how to properly measure and what things can impact timing.
First, make sure you’re using a release build target, not a debug build target. Debug build targets typically turn optimization off, and that optimization can have a significant impact on the results. For example, using a debug build target, running the above std::sort example on the author’s machine took 13219 microseconds, about 33 times as long!
Second, your timing results may be influenced by other things your system may be doing in the background. Make sure your system isn’t doing anything CPU, memory, or hard drive intensive (e.g. playing a game, searching for a file, running an antivirus scan, or installing an update in the background). Seemingly innocent things, like idle web browsers, can temporarily spike your CPU to 100% utilization when the active tab rotates in a new ad banner and has to parse a bunch of JavaScript. The more apps you can shut down before measuring, the less variance in your results you are likely to have.
Third, if your program uses a random number generator, the particular sequence of random numbers generated may impact timing. For example, if you’re sorting an array filled with random numbers, the results will likely vary from run to run because the number of swaps required to sort the array will vary from run to run. To get more consistent results across multiple runs of your program, you can temporarily seed your random number generator with a literal value (rather than std::random_device or the system clock), so that it generates the same sequence of numbers with each run. However, if your program’s performance is highly dependent on the particular random sequence generated, this can also lead to misleading results overall.
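For example, here’s a minimal sketch of temporarily seeding a Mersenne Twister with a literal value (the seed value 5 is arbitrary; this snippet isn’t part of Timer.h):
#include <iostream>
#include <random>

int main()
{
    // Temporary: a fixed literal seed makes the "random" sequence identical every run,
    // which makes timing results more repeatable while benchmarking
    std::mt19937 mt{ 5 };

    // Normal: switch back to a non-deterministic seed when you're done benchmarking
    // std::mt19937 mt{ std::random_device{}() };

    std::cout << mt() << '\n'; // same first value every run with the fixed seed

    return 0;
}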
Fourth, I/O is slow. For best results, remove/disable all I/O from the code being timed, as it will skew the results. Also make sure you’re not timing waiting for user input, as how long the user takes to input something should not be part of your timing considerations. If user input is required, consider adding some way to provide that input that does not wait on the user (e.g. command line, from a file, having a code path that routes around the input and uses a hardcoded value).
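As a sketch of that last idea, a hypothetical input routine might route around the user like this (kBenchmarking and getValueFromUser are placeholder names for illustration, not part of Timer.h):
#include <iostream>

constexpr bool kBenchmarking { true }; // hypothetical flag: set to false for normal runs

int getValueFromUser() // hypothetical input routine
{
    if (kBenchmarking)
        return 42; // hardcoded value so timing doesn't include waiting on the user

    int value{};
    std::cout << "Enter a value: ";
    std::cin >> value;
    return value;
}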
Measuring and interpreting performance metrics
When measuring the performance of your program, gather at least 3 results. If the results are all similar, these likely represent the actual performance of your program on that machine. Otherwise, continue to take measurements until you have a cluster of similar results (and understand which other results are outliers). It’s not uncommon to have one or more outliers due to your system doing something in the background during some of those runs.
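For example, here’s a minimal sketch of gathering several measurements in a single run using the Timer’s reset() function (codeToTime is a placeholder for whatever you’re measuring):
#include "Timer.h"
#include <iostream>

// Placeholder for whatever code you're measuring
void codeToTime()
{
    // ... the code being benchmarked goes here ...
}

int main()
{
    Timer t{};

    for (int run{ 1 }; run <= 3; ++run)
    {
        t.reset(); // restart timing for this run

        codeToTime();

        std::cout << "Run " << run << ": " << t.milli() << " ms\n";
    }

    return 0;
}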
If your results have a lot of variance (and aren’t clustering well), your timings are likely being affected by either other things happening on the system, or by the effects of randomization within your application.
Because performance measurements are impacted by so many things (particularly hardware speed, but also OS, apps running, etc…), absolute performance metrics (e.g. “the program runs in 10 seconds”) are generally not that useful outside of understanding how well the program runs on one particular machine you care about. On a different machine, that same program may run in 1 second, 10 seconds, or 1 minute. It’s hard to know without actually measuring across a spectrum of different hardware.
However, on a single machine, relative performance measurements can be useful. We can gather performance results from several different variants of a program to determine which variant is the most performant. For example, if variant 1 runs in 10 seconds and variant 2 runs in 8 seconds, variant 2 is likely to be faster on all similar machines (regardless of the absolute performance of that machine).
After measuring the second variant, a good sanity check is to measure the first variant again. If the results of the first variant are consistent with your initial measurements for that variant, then the results of both variants should be reasonably comparable. For example, if variant 1 runs in 10 seconds, and variant 2 runs in 8 seconds, and then we measure variant 1 again and get 10 seconds, then we can reasonably conclude that both variants were measured fairly, and that variant 2 is faster.
However, if the results of the first variant are no longer consistent with your initial measurements for that variant, then something has happened on the machine that is now affecting performance, and it will be hard to tell whether differences in measurement are due to the variant or due to the machine itself. For example, if variant 1 runs in 10 seconds, and variant 2 runs in 12 seconds, and then we measure variant 1 again and get 14 seconds, we can’t trust that our measurement for variant 2 was comparable. In this case, it’s best to discard the existing results and re-measure.
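Put together, that measure/re-measure sanity check might look something like the following sketch, where variantOne and variantTwo are hypothetical stand-ins for the two implementations being compared:
#include "Timer.h"
#include <iostream>

// Hypothetical stand-ins for the two implementations being compared
void variantOne() { /* ... */ }
void variantTwo() { /* ... */ }

int main()
{
    Timer t{};
    variantOne();
    std::cout << "Variant 1: " << t.milli() << " ms\n";

    t.reset();
    variantTwo();
    std::cout << "Variant 2: " << t.milli() << " ms\n";

    // Re-measure variant 1: if this result is consistent with the first measurement,
    // the comparison between the two variants is probably fair
    t.reset();
    variantOne();
    std::cout << "Variant 1 (again): " << t.milli() << " ms\n";

    return 0;
}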