Time Related Tips
Discover how to accurately measure code execution time with Python's timeit() function, and apply performance tips such as checkpointing intermediate results, speeding up lookups with sets and dictionaries, minimizing costly function calls, and handling large integers with care. This lesson helps you write faster and more maintainable Python programs.
Time it
A popular saying among programmers is “Make it right, then make it fast.” Once our program is correct and we have several working solutions, we want the fastest one. Having several options is common in Python, despite the principle from The Zen of Python that says: “There should be one-- and preferably only one --obvious way to do it.”
An inaccurate way to time our code is to call the time() function from the module of the same name before and after the code fragment of interest. The resolution of time() is coarse, often comparable to or longer than the fragment's execution time. The right tool for the job is the timeit() function in the timeit module. The function takes several parameters, the first two being the most important.
- The first parameter is the code fragment that we want to time. We must provide it to the function as a string; otherwise, Python will execute it before calling the timing function. The code must include the definitions of all involved variables, because the function evaluates the parameters in a clean environment that doesn't include any previously defined functions and variables or previously imported modules. The function evaluates the code number times, where number is an optional parameter, and returns the total execution time.
- The second parameter is another string that contains the initialization code. The timeit() function executes this code once before it starts collecting statistics. We can use this parameter to import modules or define global variables.
We may wonder what the best way to square a number is. Do we multiply it by itself, use the exponentiation operator (**), or call math.pow()? In cases like this, we should turn to The Zen of Python, which says: “In the face of ambiguity, refuse the temptation to guess.” Use timeit()!
Note: If we apply the first two code fragments to an integer, the results will be quite different. For more details, see the Beware of Large and Slow Ints section at the end of this lesson.

The timeit() function returns the total execution time in seconds. It executes the statement number times (1,000,000 by default); divide the returned time by number to get the per-statement execution time.
Let’s run the below code.
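A sketch of the comparison using timeit() (the base value 1.23 and the repetition count are our choices):

```python
import timeit

# Time three ways of squaring a float; timeit() returns the total
# time in seconds for `number` executions of the statement.
n = 1_000_000
t_mul = timeit.timeit('x * x', setup='x = 1.23', number=n)
t_pow = timeit.timeit('x ** 2', setup='x = 1.23', number=n)
t_math = timeit.timeit('math.pow(x, 2)',
                       setup='import math; x = 1.23', number=n)

print(f'x * x:          {t_mul:.17f}')
print(f'x ** 2:         {t_pow:.17f}')
print(f'math.pow(x, 2): {t_math:.17f}')
```

On a typical machine, the plain multiplication comes out fastest, but the exact numbers vary from run to run.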
All seventeen digits after the decimal point notwithstanding, multiplication looks notably faster, closely followed by the exponentiation operator. math.pow() is the slowest.
Multiplying a number by itself is faster than calling math.pow().
Checkpoint saves time
There's nothing more frustrating than irrecoverably deleting a half-finished Ph.D. dissertation file, or a program crashing after several hours (or days) of heavy-duty computations. Frequent backups help with the former. Checkpointing helps with the latter.
Checkpointing is a form of real-time backup and can be easily implemented in Python via pickling.
To create a checkpoint, save the values of the variables that were expensive to compute into files; if the program crashes, we can restore the values from the files instead of recomputing them.
For example, our program may call the fictitious and very time-consuming functions foo() and bar():
result1 = foo(original_data)
result2 = bar(result1)
We may want to add one or two checkpoints (two, if we want to treat the final result in the same way as the intermediate result). We should create a directory for the checkpoint data, and check whether the results already exist before recalculating them. If a checkpoint covers multiple results, store them as a tuple or one after another in the same file; pickle supports several objects per file.
import pickle

try:
    with open('checkpoints/result1.p', 'rb') as infile:
        result1 = pickle.load(infile)
except FileNotFoundError:
    result1 = foo(original_data)
    with open('checkpoints/result1.p', 'wb') as outfile:
        pickle.dump(result1, outfile)

try:
    with open('checkpoints/result2.p', 'rb') as infile:
        result2 = pickle.load(infile)
except FileNotFoundError:
    result2 = bar(result1)
    with open('checkpoints/result2.p', 'wb') as outfile:
        pickle.dump(result2, outfile)
This solution extends to any number of intermediate stages and can easily save lots of time and effort.
This tip is quite similar to CacheIt. As mentioned, checkpointing is a form of caching.
Optimize lookups
Some say (and introductory computer science books agree) that searching through data is one of the most frequent operations in computing, if not the most frequent one. If we need a whole engine for something (a search engine), that something must be important.
Use of in operator
The Python way to search for an item in some data set (typically in a collection, such as a list, a tuple, a set, or a dictionary) is to use the in operator. The operator looks up an item and checks if it’s in the collection either as a value or as an object. We shouldn’t worry about a single lookup in a small collection. As the collection size and the number of lookups grow, however, performance may become an issue.
List lookups are the slowest. A list is a sequential data structure. To check if a metaphorical needle of value 1001 is in the list haystack, Python must scan through the list and compare the needle with each list item. In the worst case, it may need to scan the whole list. The longer the list, the longer it takes to scan through. On an average computer, checking if a nonexistent needle is in a haystack of 1,000 integer numbers takes around 9.1 microseconds:
Let’s try it here.
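A sketch of the worst-case list lookup (our haystack holds 0 through 999, so the needle 1001 is never found and the whole list is scanned):

```python
import timeit

# The needle 1001 is absent, so every lookup scans all 1,000 items.
n = 100_000
t_list = timeit.timeit('1001 in haystack',
                       setup='haystack = list(range(1000))',
                       number=n)
print(f'{t_list / n * 1e6:.2f} microseconds per list lookup')
```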
Tuples are not any better. They are immutable but still sequential, so apparently mutability doesn't matter:
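The same measurement with a tuple (a sketch, using the same haystack contents as before):

```python
import timeit

# A tuple is immutable, but membership testing is still a linear scan.
n = 100_000
t_tuple = timeit.timeit('1001 in haystack',
                        setup='haystack = tuple(range(1000))',
                        number=n)
print(f'{t_tuple / n * 1e6:.2f} microseconds per tuple lookup')
```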
Dictionaries and sets offer a significant improvement. For a collection of 1,000 distinct numbers, they outperform lists by a factor of 450 to 500, and the speedup grows with the collection size:
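A sketch that measures the speedup directly (collection size and loop count are our choices; the exact ratios depend on the machine):

```python
import timeit

n = 100_000
t_list = timeit.timeit('1001 in haystack',
                       setup='haystack = list(range(1000))', number=n)
t_set = timeit.timeit('1001 in haystack',
                      setup='haystack = set(range(1000))', number=n)
t_dict = timeit.timeit('1001 in haystack',
                       setup='haystack = dict.fromkeys(range(1000))',
                       number=n)
print(f'set is about {t_list / t_set:.0f}x faster than list')
print(f'dict is about {t_list / t_dict:.0f}x faster than list')
```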
Hash tables
How is this possible? Python dictionaries and sets are implemented as hash tables, which have an almost constant lookup time. A hash table consists of bins (buckets) arranged in an array. A bucket's location is calculated from the needle's value by calling a hash function. Both the function call and the array lookup take constant time. If the needle is present in the set, it must be stored in that specific bucket; if it's not in that bucket, it's not in the set. Simple and fast.
The choice of the appropriate collection type may become a choice between a feasible program and a program that doesn’t practically terminate. On a practical note, if we’re given a list of values (say, a list of stopwords for a natural language-processing job), and we plan to search the list in a loop, we should invest in converting the list into a set and then search the set:
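A minimal sketch with a made-up stopword list (the words and variable names are illustrative):

```python
# A hypothetical stopword list; a real NLP job might have hundreds of words.
stopwords = ['a', 'an', 'the', 'and', 'or', 'in', 'on', 'of']

# One-time investment: convert the list into a set for fast lookups.
stopword_set = set(stopwords)

words = 'the cat sat on a mat'.split()
content_words = [w for w in words if w not in stopword_set]
print(content_words)  # ['cat', 'sat', 'mat']
```

Every lookup in the loop now takes near-constant time instead of scanning the whole list.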
This is also true for the not in operator.
Avoid function calls, they’re costly
It takes time to call a function, even one without parameters. On an average computer, a call to a function without parameters takes about 55 nanoseconds. For example, consider a function that takes no parameters and calculates the sum of two numbers:
def sum_of_two():
    a = 4
    b = 5
    return a + b
Let’s calculate the execution time for the above function call.
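A sketch of the measurement; conveniently, timeit() also accepts a callable, which saves us from spelling the call out as a string:

```python
import timeit

def sum_of_two():
    a = 4
    b = 5
    return a + b

# Time one million calls; divide by the count for the per-call cost.
n = 1_000_000
t = timeit.timeit(sum_of_two, number=n)
print(f'about {t / n * 1e9:.0f} ns per call')
```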
On top of that, passing each parameter costs another 3 nanoseconds. While minor, it adds up, especially when we call a function in a loop.
Now, let’s pass parameters to the function above and calculate the execution time difference.
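A sketch of the parameterized version; the lambda wrapper adds a little overhead of its own, but the argument passing is now part of what we measure:

```python
import timeit

def sum_of_two(a, b):
    return a + b

n = 1_000_000
# The lambda forwards the arguments, so each timed iteration
# includes the cost of passing two parameters.
t = timeit.timeit(lambda: sum_of_two(4, 5), number=n)
print(f'about {t / n * 1e9:.0f} ns per call with two parameters')
```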
This increasing delay means that we should not create and call functions without a need.
- An arithmetic addition expression is faster (and better) than a function add(a, b) that adds two numbers.
- A function with ten parameters is somewhat slower than the same function that takes one tuple of ten items.
- Multiplying a number by itself is faster than calling math.pow().
How do we know when to avoid a function call? Ask these questions:

- Is there a way to solve the problem without that call?
- Is the functionless solution maintainable and readable?
- Is the functionless solution much faster?

If the answer to all of them is yes, avoid the call. Otherwise, call the function, because our job is to make it right first and only then make it fast.
Build, then print
Function calls are costly in terms of execution time, but few function calls are as expensive as calls to print() and other input/output functions. Displaying visual output involves interacting with the operating system (slow on its own) and scrolling the console window (also slow). Compare the following two code fragments that accomplish precisely the same goal: displaying 1,000 letter a's on one line.
Let's try both: printing letter by letter, and building the string first and then printing it at once.
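A sketch of both fragments, wrapped in functions so that timeit() can run them (the repetition count is our choice):

```python
import timeit

# Fragment 1: a thousand print() calls, each suppressing the newline.
def print_each():
    for _ in range(1000):
        print('a', end='')
    print()

# Fragment 2: build the whole line first, then print it once.
def build_then_print():
    print('a' * 1000)

t1 = timeit.timeit(print_each, number=10)
t2 = timeit.timeit(build_then_print, number=10)
```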
The first fragment addresses the problem literally. It displays the letter 1,000 times without going to the next line, and finally breaks the line. The second fragment first builds a string that contains 1,000 a's and then displays it at once.
The second fragment takes less time to execute on an average computer.
Python string functions and operators are very efficient. If performance is important, we should construct as much output as possible as one string and display it with as few calls to print() as possible.
Waste space, save time
Python includes a strong collection of built-in data types and data structures. More data types and data structures come with the standard library. There seems to be a suitable container for any occasion.
There are cases when two containers work better (faster) together than one alone, though.
Consider a situation when we must remove duplicates from a list but preserve the list order (and keep only the first occurrence of each element). The naive solution of converting the list to a set and then back to a list fails because sets are unordered.
In the CPython implementation of Python 3.6, and in all implementations of Python 3.7 and above, dictionaries are guaranteed to preserve insertion order. We can construct a dictionary using the original list items as keys and some constant, like 1, as the value. If the list has duplicate items, only the first occurrence of each key is preserved:
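A sketch using dict.fromkeys(), which maps every item to the same constant value:

```python
items = [3, 1, 3, 2, 1, 4]

# Dictionary keys are unique and keep insertion order, so duplicates
# vanish while the first-occurrence order survives.
unique = list(dict.fromkeys(items, 1))
print(unique)  # [3, 1, 2, 4]
```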
What if we use an earlier version of Python? Checking whether the next item is already in the output list is time-consuming, as lists have notoriously slow lookup times. Instead, we can combine a set for tracking the duplicates with a list for collecting the unique values:
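A sketch of the two-container approach (variable names are ours):

```python
items = [3, 1, 3, 2, 1, 4]

seen = set()   # fast membership checks
unique = []    # preserves the original order
for item in items:
    if item not in seen:
        seen.add(item)
        unique.append(item)
print(unique)  # [3, 1, 2, 4]
```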
Two parallel data structures require more space but provide superior performance. Still, it's best to simply use a newer version of Python.
Beware of large and slow ints
Unlike C, C++, Java, and other popular programming languages, Python supports unbounded (very large and very small) integer numbers. The numbers smaller than or equal to sys.maxsize (9,223,372,036,854,775,807 on a 64-bit CPU) are represented natively as CPU words. Larger numbers use a more complicated implementation.
Let's check sys.maxsize.
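A quick check (a sketch; the exact value depends on the platform):

```python
import sys

# The largest "native" integer: 9,223,372,036,854,775,807 on a 64-bit CPU.
print(sys.maxsize)
```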
Python also supports numbers even larger than sys.maxsize. Thanks to this feature, we can perform arbitrary-precision computations with integer numbers. For example, we can have a googol, which is a one followed by 100 zeros:
Let’s run the below code.
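A sketch (the variable name googol is ours):

```python
import sys

googol = 10 ** 100  # a one followed by 100 zeros
print(googol)
print(googol > sys.maxsize)  # True
```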
Let’s add one to it and then check:
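Integer arithmetic stays exact, no matter how large the operands get:

```python
googol = 10 ** 100
print(googol + 1)             # the exact value, ending in ...0001
print((googol + 1) - googol)  # 1
```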
Unbounded integer numbers aren't free, though. Integer arithmetic always produces exact results, while floating-point arithmetic is approximate; the evaluation of an integer expression may take much more time but provides perfect accuracy.
Let’s compare the two operations:
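A sketch comparing exact integer exponentiation with approximate floating-point exponentiation (the base and exponent are our choices; 3.0 ** 500 still fits in a float, so no overflow occurs):

```python
import timeit

n = 10_000
# Same expression, but an int base forces exact big-integer arithmetic,
# while a float base uses fast (approximate) hardware floating point.
t_int = timeit.timeit('x ** 500', setup='x = 3', number=n)
t_float = timeit.timeit('x ** 500', setup='x = 3.0', number=n)
print(f'int:   {t_int:.4f} s')
print(f'float: {t_float:.4f} s')
```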
We typically don't need exact large results. Unless we do, we should stick to floating-point numbers and enjoy the speedy execution.
Quiz
Which data structures work as hash tables in Python? (Multi-select)
sets
lists
arrays
dictionaries