Performance of Parallel Algorithms
Understand the factors impacting the execution performance of parallel algorithms in C++17. Learn about overhead costs, synchronization effects, memory throughput challenges, and hardware dependencies. This lesson helps you evaluate when to use parallel STL algorithms effectively and how to measure their performance compared to sequential code.
We'll cover the following...
Factors affecting execution
Parallel algorithms are a robust abstraction layer. Although they’re relatively easy to use, assessing their performance is less straightforward.
The first point to note is that a parallel algorithm will generally do more work than the sequential version. That’s because the algorithm needs to set up and arrange the threading subsystem to run the tasks.
For example, if you invoke std::transform on a vector of 100k elements, then the STL implementation needs to divide your vector into chunks and then schedule each chunk to be executed appropriately. If necessary, the implementation might even copy the elements before the execution. If you have a system with 10 free threads, the 100k element vector might be divided into chunks of 10k, and then each chunk is transformed by one thread. Due to the setup cost and other limitations of parallel code, the whole execution time won’t be 10x faster than the serial version.
The ...