...

Introduction

In this lesson, we'll learn how to maximize speed with NumPy.

This chapter explains the basic anatomy of NumPy arrays, in particular memory layout, views, copies, and data types. These notions are critical to understand if you want your computations to benefit from the NumPy philosophy.

Let's consider a simple example where we want to clear all the values of an array whose data type is np.float32. How should we write it to maximize speed? The syntax below is rather obvious (at least for those familiar with NumPy), but the question is which operation is the fastest.
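For reference, the "obvious" syntax is a plain in-place assignment. The minimal sketch below also checks that clearing the array through a view of another data type zeroes the very same memory, which is what makes the variants comparable. The variable names and the reduced array size are assumptions chosen for illustration, not part of the lesson's code.

```python
import numpy as np

# Smaller than the lesson's array, purely to keep this sketch quick.
Z = np.ones(4 * 1000, dtype=np.float32)

# The obvious way: assign 0 to every element in place.
Z[...] = 0
obvious_ok = not Z.any()

# Same result through a byte view: the view shares Z's memory,
# and a float32 whose four bytes are all zero is exactly 0.0.
Z[...] = 1
V = Z.view(np.int8)
V[...] = 0
shares = np.shares_memory(Z, V)
view_ok = not Z.any()

print(obvious_ok, shares, view_ok)
```

Because a view only reinterprets the existing buffer, no data is copied. The only thing that changes is how many elements NumPy believes it has to write.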

import numpy as np
from tools import timeit  # custom timing helper defined in tools.py

# Array of 4*1000000 np.float32 values, all set to 1
Z = np.ones(4*1000000, np.float32)

print("np.float16:")
# Time clearing the array through an np.float16 view
timeit("Z.view(np.float16)[...] = 0", globals())
print("np.int16:")
# Time clearing the array through an np.int16 view
timeit("Z.view(np.int16)[...] = 0", globals())
print("np.int32:")
# Time clearing the array through an np.int32 view
timeit("Z.view(np.int32)[...] = 0", globals())
print("np.float32:")
# Time clearing the array through an np.float32 view (its own data type)
timeit("Z.view(np.float32)[...] = 0", globals())
print("np.int64:")
# Time clearing the array through an np.int64 view
timeit("Z.view(np.int64)[...] = 0", globals())
print("np.float64:")
# Time clearing the array through an np.float64 view
timeit("Z.view(np.float64)[...] = 0", globals())
print("np.complex128:")
# Time clearing the array through an np.complex128 view
timeit("Z.view(np.complex128)[...] = 0", globals())
print("np.int8:")
# Time clearing the array through an np.int8 (byte) view
timeit("Z.view(np.int8)[...] = 0", globals())

Here, timeit is a custom helper function. Interestingly enough, the obvious way of clearing all the values is not the fastest. Each statement above is timed over the same number of loops (100), yet two of them take less time per loop. By viewing the array as a larger data type such as np.float64, we gained a 25% speed factor, and by viewing it as a byte array (np.int8), we gained a 50% factor. The reason for this speedup is to be found in the internal NumPy machinery and in compiler optimization.
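The lesson does not show the contents of tools.py, so the following is only a sketch of what a comparable timing helper might look like, built on Python's standard timeit module. The name time_stmt, its default loop and repeat counts, and the choice of three data types are assumptions made for illustration. The timings it prints depend on the machine and the NumPy build, so they may not reproduce the figures quoted above.

```python
import timeit as _timeit
import numpy as np

def time_stmt(stmt, namespace, number=100, repeat=3):
    """Hypothetical stand-in for the lesson's custom timeit helper.

    Runs `stmt` `number` times per trial, repeats the trial `repeat`
    times, and returns the best time per loop in seconds.
    """
    timer = _timeit.Timer(stmt, globals=namespace)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number

# Same array as in the lesson: 4*1000000 float32 values set to 1.
Z = np.ones(4 * 1000000, dtype=np.float32)

results = {}
for name in ("float32", "float64", "int8"):
    stmt = "Z.view(np.%s)[...] = 0" % name
    results[name] = time_stmt(stmt, globals())
    print("np.%s: %.3f msec per loop" % (name, results[name] * 1e3))
```

Taking the minimum over several trials is a common way to reduce noise from other processes, since the fastest trial is the one least disturbed by the rest of the system.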

Solve this quiz!

How can you speed up clearing an array (that is, setting all of its values to 0)?

Z = np.ones(4*1000000, np.float32)

timeit("Z.view(np.float64)[...] = 0", globals())

timeit("Z.view(np.float16)[...] = 0", globals())
