Manipulation Methods
Learn data manipulation methods in this lesson.
We consider manipulation methods to be the workhorses of pandas. When we have a dataset that we're trying to understand, clean up, and model, we reach for methods that operate on a Series and return a new Series, usually with the same index, so that we can put the result back into the DataFrame we're working on. Most of the methods we discuss here change the Series values but preserve the index.
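To make the pattern concrete, here is a minimal sketch with a small, made-up DataFrame. The column name `mpg` and the choice of `.clip` are ours, for illustration only. `.clip` returns a new Series whose values differ from the input but whose index is unchanged, so assigning it back lines up row for row.

```python
import pandas as pd

# Hypothetical data: a tiny DataFrame with a non-default string index
df = pd.DataFrame({"mpg": [18, 22, 35, 15]}, index=["a", "b", "c", "d"])

# .clip returns a new Series; the values change but the index is preserved
capped = df["mpg"].clip(upper=30)
print(capped.index.equals(df.index))  # True

# Because the indexes match, assignment aligns the new values by label
df["mpg_capped"] = capped
print(df)
```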
Manipulating data using .apply
The .apply method is a curious one: it is often recommended that we avoid it, but it sometimes comes in handy. It lets us apply a function element-wise to every value in the Series. If we pass in a NumPy function that works on an array (a ufunc such as np.sqrt), pandas calls it on the entire Series at once, broadcasting the operation instead of looping over the values.
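A minimal sketch of the two behaviors, using a made-up Series: a plain Python function (our hypothetical `add_one`) receives one scalar value per call, while a NumPy ufunc such as `np.sqrt` is handed the whole Series in a single call.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0, 16.0])

def add_one(x):
    # Receives a single scalar value each time it is called
    return x + 1

# Element-wise: pandas calls add_one once per value
plus_one = s.apply(add_one)
print(plus_one.tolist())  # [2.0, 5.0, 10.0, 17.0]

# ufunc: np.sqrt is applied to the entire Series in one vectorized call
roots = s.apply(np.sqrt)
print(roots.tolist())  # [1.0, 2.0, 3.0, 4.0]
```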
However, when we do see this method in use, it is usually a code smell. Why? The .apply method typically operates on each individual value in the Series, so the function is called once per value. If a Series holds 1,000,000 values, the function is called 1,000,000 times. That takes us out of the fast, vectorized code paths pandas provides and puts us back in slow, pure-Python code.
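One way to see the per-value cost is to count how often the function runs. This sketch uses a hypothetical Series of 1,000 integers and a hypothetical `greater_than_20` function that records each invocation in a counter.

```python
import pandas as pd

# Hypothetical data: the integers 0 through 999
values = pd.Series(range(1_000))

# Mutable counter so the function can record every invocation
counter = {"calls": 0}

def greater_than_20(x):
    counter["calls"] += 1  # one increment per Python-level call
    return x > 20

flags = values.apply(greater_than_20)
print(counter["calls"])  # 1000: one Python call per value
```

Scaling the Series up to a million values would scale the number of Python calls the same way.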
For example, we previously checked whether the mileage values were greater than 20. We can also do this with the .apply method. We'll use the Jupyter %%timeit cell magic to microbenchmark the comparison (note that this works only in Jupyter or IPython).
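The benchmark itself is not reproduced here, so the following is only a sketch under stated assumptions: the mileage Series is replaced by hypothetical random integers, and the standard-library `timeit` module stands in for `%%timeit` so the code also runs as a plain script. In Jupyter or IPython, each expression could instead go in its own cell under `%%timeit`.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical mileage data: 100,000 random integers from 10 to 40
rng = np.random.default_rng(seed=0)
mileage = pd.Series(rng.integers(10, 41, size=100_000))

# Slow path: a Python lambda is called once per value
via_apply = mileage.apply(lambda x: x > 20)

# Fast path: one vectorized comparison (equivalent to mileage > 20)
via_gt = mileage.gt(20)

# Both approaches produce the same boolean Series
assert via_apply.equals(via_gt)

# Time 10 runs of each approach
apply_time = timeit.timeit(lambda: mileage.apply(lambda x: x > 20), number=10)
gt_time = timeit.timeit(lambda: mileage.gt(20), number=10)
print(f".apply: {apply_time:.4f}s  .gt: {gt_time:.4f}s")
```

Timings depend on the machine and the pandas version, so no numbers are shown. The .gt version should come out well ahead, because its comparison runs in compiled code instead of calling a Python lambda 100,000 times.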