
Introduction

Explore the concept of code vectorization by comparing pure Python and NumPy methods for adding lists of integers. Understand how data types and input sizes impact performance and how NumPy's vectorized operations can lead to faster computations as dataset size increases.

What is code vectorization?

Code vectorization means that the problem you’re trying to solve is inherently vectorizable and only requires a few NumPy tricks to make it faster. That does not mean it is easy or straightforward, but at least it does not require totally rethinking your problem (as will be the case in the Problem vectorization chapter). Still, it may take some experience to see where code can be vectorized.

Example

Let’s illustrate this through a simple example where we want to add two lists of integers element by element.

Solution 1: Pure Python

One simple way using pure Python is:

Python 3.5
def add_python(Z1, Z2):
    return [z1 + z2 for (z1, z2) in zip(Z1, Z2)]

Solution 2: Use np.add(list1,list2)

This first naive solution can be vectorized very easily using NumPy:

Python 3.5
import numpy as np

def add_numpy(Z1, Z2):
    return np.add(Z1, Z2)
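
As a quick sanity check, the sketch below (not part of the original lesson) confirms that both functions produce the same values, and shows that np.add returns a NumPy array rather than a list. The sample inputs are arbitrary and chosen only for illustration.

```python
import numpy as np

def add_python(Z1, Z2):
    # Element-wise addition with a list comprehension
    return [z1 + z2 for (z1, z2) in zip(Z1, Z2)]

def add_numpy(Z1, Z2):
    # np.add converts list inputs to arrays, then adds element-wise
    return np.add(Z1, Z2)

# Arbitrary sample inputs (illustrative only)
Z1 = [1, 2, 3]
Z2 = [10, 20, 30]

result_python = add_python(Z1, Z2)
result_numpy = add_numpy(Z1, Z2)

print(result_python)                           # [11, 22, 33]
print(result_numpy)                            # [11 22 33]
print(type(result_numpy))                      # <class 'numpy.ndarray'>
print(result_numpy.tolist() == result_python)  # True
```

The values match, but the return types differ: code that expects a list may need result_numpy.tolist().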

Compare the timing of the two approaches

In this benchmark, the pure Python approach is actually faster because the inputs are standard Python lists and the dataset is small (100 elements). In such cases, the overhead of calling a NumPy function and converting the lists to arrays outweighs the benefits of vectorization.

Python 3.5
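import random
import numpy as np
# 'tools' is the helper module that accompanies this lesson,
# not the standard library timeit module
from tools import timeit

def add_python(Z1, Z2):
    return [z1 + z2 for (z1, z2) in zip(Z1, Z2)]

def add_numpy(Z1, Z2):
    return np.add(Z1, Z2)

# Two lists of 100 distinct random integers in [0, 1000)
Z1 = random.sample(range(1000), 100)
Z2 = random.sample(range(1000), 100)

timeit("add_python(Z1, Z2)", globals())
timeit("add_numpy(Z1, Z2)", globals())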
  • The benchmark uses plain Python lists. Passing lists to NumPy functions incurs a hidden conversion cost, because each list is first converted to an array. For best performance, pass NumPy arrays as inputs.

  • NumPy's speed advantage grows with the size of the dataset and becomes significant for large inputs (100,000+ elements). At small scales, the function call overhead outweighs the benefits of vectorization.
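
To see how input type and size affect the result on your own machine, here is a minimal sketch that uses the standard library timeit module instead of the lesson's tools helper. The benchmark function, its parameters, and the chosen sizes are assumptions made for illustration; the timings you get will vary with hardware and NumPy version.

```python
import random
import timeit

import numpy as np

def add_python(Z1, Z2):
    return [z1 + z2 for (z1, z2) in zip(Z1, Z2)]

def add_numpy(Z1, Z2):
    return np.add(Z1, Z2)

def benchmark(n, repeat=3, number=10):
    """Return the best per-call time in seconds for inputs of n elements.

    Hypothetical helper written for this sketch, not part of the lesson.
    """
    L1 = [random.randrange(1000) for _ in range(n)]
    L2 = [random.randrange(1000) for _ in range(n)]
    # Convert once, outside the timed code, so the conversion cost is excluded
    A1, A2 = np.array(L1), np.array(L2)

    def best(func, a, b):
        timer = timeit.Timer(lambda: func(a, b))
        return min(timer.repeat(repeat=repeat, number=number)) / number

    return {
        "python, lists": best(add_python, L1, L2),
        "numpy, lists": best(add_numpy, L1, L2),
        "numpy, arrays": best(add_numpy, A1, A2),
    }

for n in (100, 10_000, 100_000):
    times = benchmark(n)
    print(n, {name: f"{t * 1e6:.1f} us" for name, t in times.items()})
```

Comparing the "numpy, lists" and "numpy, arrays" rows separates the list-to-array conversion cost from the cost of the addition itself.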

Not only does performance depend on input type and size, but the behavior of operators also changes depending on the data structure. This is why we do not write Z1 + Z2 to add two lists element-wise.

In Python, the + operator behaves differently depending on the object type:

  • For lists → concatenation
  • For NumPy arrays → element-wise addition

To illustrate this difference, consider nested lists:

Python 3.5
import numpy as np

def add_python(Z1, Z2):
    return [z1 + z2 for (z1, z2) in zip(Z1, Z2)]

def add_numpy(Z1, Z2):
    return np.add(Z1, Z2)

Z1 = [[1, 2], [3, 4]]
Z2 = [[5, 6], [7, 8]]

print("Using concatenation:", Z1 + Z2)
# [[1, 2], [3, 4], [5, 6], [7, 8]]
print("Using pure python:", add_python(Z1, Z2))
# [[1, 2, 5, 6], [3, 4, 7, 8]]
print("Using numpy.add:", add_numpy(Z1, Z2))
# [[ 6  8]
#  [10 12]]

Summing up, the first method concatenates the two lists together, the second method concatenates the internal lists together and the last one computes what is (numerically) expected.
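
The following sketch, added for illustration, shows the other side of the same point: once the nested lists are converted to NumPy arrays, the + operator performs element-wise addition and agrees with np.add. The variable names A1 and A2 are introduced here and do not appear in the lesson.

```python
import numpy as np

Z1 = [[1, 2], [3, 4]]
Z2 = [[5, 6], [7, 8]]

# Convert the nested lists to 2-D arrays (A1, A2 are illustrative names)
A1 = np.asarray(Z1)
A2 = np.asarray(Z2)

# On arrays, + is element-wise addition, not concatenation
print((A1 + A2).tolist())                        # [[6, 8], [10, 12]]
print(np.array_equal(A1 + A2, np.add(Z1, Z2)))   # True
```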

Solve this Quiz!

1. Which of the following is a good approach when designing a solution to a problem?

A. Think of a brute force Python solution

B. Think of vectorization using NumPy tricks

Now that you have learned code vectorization, let’s move on to an exercise in the next lesson.