Introduction
Explore the concept of code vectorization by comparing pure Python and NumPy methods for adding lists of integers. Understand how data types and input sizes impact performance and how NumPy's vectorized operations can lead to faster computations as dataset size increases.
What is Code vectorization?
Code vectorization means that the problem you’re trying to solve is inherently vectorizable and only requires a few NumPy tricks to make it faster. Of course, it does not mean it is easy or straightforward, but at least it does not necessitate totally rethinking your problem (as it will be the case in the Problem vectorization chapter). Still, it may require some experience to see where code can be vectorized.
Example
Let’s illustrate this through a simple example where we want to sum up two lists of integers.
Solution 1: Pure Python
One simple way using pure Python is:
Solution 2: Use np.add(list1,list2)
This first naive solution can be vectorized very easily using NumPy:
Compare time of two approaches
In this benchmark, the pure Python approach is actually faster because the inputs are standard Python lists and the dataset is small. In such cases, the overhead of calling NumPy functions and handling type conversion outweighs the benefits of vectorization.
The benchmark uses plain Python lists. Passing lists to NumPy functions incurs a hidden conversion cost. For best performance, always use NumPy arrays as inputs.
NumPy's speed advantage becomes significant at larger datasets (100,000+ elements). At small scales, the function call overhead outweighs the benefits of vectorization.
Not only does performance depend on input type and size, but the behavior of operators also changes depending on the data structure. This is why we do not use Z1 + Z2 for element-wise addition.
In Python, the + operator behaves differently depending on the object type:
- For lists → concatenation
- For NumPy arrays → element-wise addition
To illustrate this difference, consider nested lists:
Summing up, the first method concatenates the two lists together, the second method concatenates the internal lists together and the last one computes what is (numerically) expected.
Solve this Quiz!
Which of the following is a good approach when designing a solution to a problem?
Think of a brute force Python solution
Think of vectorization using NumPy tricks
Now that you have learned code vectorization, let’s move on to an exercise in the next lesson.