Search⌘ K
AI Features

Approach 2: Split the Algorithm Into Two Parts

Explore how to optimize C++ algorithms by dividing them into two stages: parallel conditional copying of elements followed by sequential merging to create a contiguous output. Understand performance implications using different predicates and see benchmarking results comparing parallel and standard algorithms.

The second approach is to split the algorithm into two parts. First, the conditional copying is performed in parallel chunks, and then the resulting sparse range is squeezed to a continuous range.

Part one: Copy elements in parallel into the destination range

The first part copies the elements in chunks, resulting in the sparse destination array illustrated below. Each chunk is conditionally copied in parallel, and the resulting range iterators are stored in std::future objects for later retrieval:

The sparse destination range after the first step of the conditional copying
The sparse destination range after the first step of the conditional copying

The following code implements the first half of the algorithm:

C++
template <typename SrcIt, typename DstIt, typename Pred>
auto par_copy_if_split(SrcIt first, SrcIt last, DstIt dst, Pred pred,
size_t chunk_sz) -> DstIt {
auto n = static_cast<size_t>(std::distance(first, last));
auto futures = std::vector<std::future<std::pair<DstIt, DstIt>>>{};
futures.reserve(n / chunk_sz);
for (size_t i = 0; i < n; i += chunk_sz) {
const auto stop_idx = std::min(i + chunk_sz, n);
auto future = std::async([=, &pred] {
auto dst_first = dst + i;
auto dst_last =
std::copy_if(first + i, first + stop_idx, dst_first, pred);
return std::make_pair(dst_first, dst_last);
});
futures.emplace_back(std::move(future));
}
// To be continued ...

We have now copied the elements (that should ...