While implementing element-wise additions in C++, there is more to consider than just code execution. The seemingly harmless choice of using separate loops or combined loops can drastically affect the program’s efficiency.
Separate loops process each data set in its own pass. Because each pass concentrates on one data set at a time, the memory locations it touches tend to be close together, which reduces cache misses. A cache miss occurs when the CPU requests data that is not in the cache and has to fetch it from a slower level of the memory hierarchy. With separate loops, this better cache locality can improve data access efficiency and overall performance.
Combined loops, on the other hand, pack multiple operations into a single iteration. While this looks concise, it can hurt memory access patterns: operations on different data sets interleaved within one loop can disrupt cache locality. Cache thrashing occurs when cache lines are repeatedly evicted and reloaded because several access streams compete for the same cache space. Thrashing increases memory latency and can slow overall execution.
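The two loop structures described above can be sketched as follows. This is a minimal illustration, not a benchmark: the four vectors a, b, c, and d and the function names add_separate and add_combined are hypothetical, chosen only to show two independent additions done in two passes versus one pass. Both functions compute the same result; only the order of memory accesses differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: two independent element-wise additions, one loop each.
// Assumes all four vectors have the same length.
void add_separate(std::vector<int>& a, const std::vector<int>& b,
                  std::vector<int>& c, const std::vector<int>& d) {
    assert(a.size() == b.size() && c.size() == d.size() && a.size() == c.size());
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        a[i] += b[i];  // first pass touches only a and b
    }
    for (std::size_t i = 0; i < n; ++i) {
        c[i] += d[i];  // second pass touches only c and d
    }
}

// Hypothetical helper: the same two additions fused into a single loop.
void add_combined(std::vector<int>& a, const std::vector<int>& b,
                  std::vector<int>& c, const std::vector<int>& d) {
    assert(a.size() == b.size() && c.size() == d.size() && a.size() == c.size());
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        a[i] += b[i];  // each iteration touches all four arrays
        c[i] += d[i];
    }
}
```

Which version runs faster is an empirical question that depends on the array sizes, their alignment in memory, the cache configuration, and the compiler, so both should be timed on the target machine.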
Element-wise addition in C++ adds the elements of two or more arrays (or other containers) position by position. The element at index i of the result is the sum of the elements at index i of the inputs, so the output is a new array of the same length. We will implement this logic with different loop structures in C++ to compare their performance.
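As a small worked example of the definition above, the following sketch adds two vectors position by position. The function name element_wise_sum is hypothetical, and the sketch assumes both inputs have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a new vector whose element i is x[i] + y[i].
// Assumes x and y have the same length.
std::vector<int> element_wise_sum(const std::vector<int>& x,
                                  const std::vector<int>& y) {
    assert(x.size() == y.size());
    std::vector<int> sum(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum[i] = x[i] + y[i];
    }
    return sum;
}
```

For the inputs {1, 2, 3} and {10, 20, 30}, the function returns {11, 22, 33}.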
The chrono library (the <chrono> header) is part of the C++ standard library. It provides clocks, time points, and durations for measuring time with high precision. We’re going to use it to calculate elapsed time.
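The timing pattern can be wrapped in a small reusable helper, sketched below. The name time_seconds is hypothetical. The sketch uses std::chrono::steady_clock rather than high_resolution_clock; this is a deliberate substitution, since steady_clock is guaranteed to be monotonic, which makes it a common choice for measuring intervals.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds.
template <typename Work>
double time_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

A call such as time_seconds([&] { /* loop to measure */ }) returns the duration of the lambda body. A single run is noisy, so repeating the measurement several times gives a more trustworthy figure.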
```cpp
#include <iostream>
#include <chrono>
using namespace std;

int main() {
    // Assigning sizes to arrays
    const int arraySize = 10000000;
    int* array_1 = new int[arraySize];
    int* array_2 = new int[arraySize];
    int* sum = new int[arraySize];

    // Initializing arrays with values
    for (int i = 0; i < arraySize; ++i) {
        array_1[i] = i;
        array_2[i] = i * 2;
    }

    auto start = chrono::high_resolution_clock::now();

    // Separate loops for element-wise addition
    for (int i = 0; i < arraySize; ++i) {
        sum[i] = array_1[i] + array_2[i];
    }

    auto end = chrono::high_resolution_clock::now();
    chrono::duration<double> elapsed = end - start;
    cout << "Separate loops runtime: " << elapsed.count() << " seconds\n";

    delete[] array_1;
    delete[] array_2;
    delete[] sum;
    return 0;
}
```
In the above C++ example, we allocate three arrays (array_1, array_2, and sum) of length 10,000,000 and initialize array_1[i] to i and array_2[i] to i * 2. The timed section, labeled as the separate-loops version, adds the corresponding elements of array_1 and array_2 in one loop and stores the results in the sum array. The program measures the time taken by the addition using chrono::high_resolution_clock, prints it in seconds, and then frees the arrays.
```cpp
#include <iostream>
#include <chrono>
using namespace std;

int main() {
    // Assigning array length
    const int arraySize = 10000000;
    int* array_1 = new int[arraySize];
    int* array_2 = new int[arraySize];
    int* sum = new int[arraySize];

    // Initializing arrays with values
    for (int i = 0; i < arraySize; ++i) {
        array_1[i] = i;
        array_2[i] = i * 2;
    }

    auto start = chrono::high_resolution_clock::now();

    // Combined loop for element-wise addition
    for (int i = 0; i < arraySize; ++i) {
        sum[i] = array_1[i] + array_2[i];
    }

    auto end = chrono::high_resolution_clock::now();
    chrono::duration<double> elapsed = end - start;
    cout << "Combined loop runtime: " << elapsed.count() << " seconds\n";

    delete[] array_1;
    delete[] array_2;
    delete[] sum;
    return 0;
}
```
In the above C++ code, we allocate three arrays (array_1, array_2, and sum) of length 10,000,000. Initial values are assigned so that array_1[i] holds i and array_2[i] holds i * 2. The timed section, labeled as the combined-loop version, adds the corresponding elements of the two arrays in a single loop and stores the results in the sum array. The runtime of this loop is measured with the chrono library and printed in seconds.
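Element-wise addition can be faster with separate loops than with a combined loop. When separate loops win, it is because each pass touches fewer arrays at once, which improves memory access patterns, reduces cache misses, and makes better use of the cache hierarchy. Note, however, that the two listings above time the same single loop over the same three arrays, so they cannot show this difference by themselves; any gap between their reported runtimes reflects run-to-run variation rather than loop structure. The actual outcome depends on the hardware, compiler optimizations, array sizes, and memory alignment, so it should be measured on the target system.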