Accessors and Operations
Learn the accessors and operations for handling sparse arrays.
Introduction
Now that we’ve seen how sparse data can be represented as SparseArray objects in pandas, let’s look at the accessors and operations we can apply to these sparse arrays. We’ll work with a sparse dataset of movie ratings, where each viewer rates a movie on a scale of 1 to 5 and NaN means that the viewer hasn’t rated the movie yet:
Movie Ratings By Viewers
Viewer | Movie 1 | Movie 2 | Movie 3 | Movie 4 | Movie 5 | Movie 6 |
Viewer 1 | NaN | 3.0 | NaN | 5.0 | 3.0 | NaN |
Viewer 2 | NaN | NaN | 3.0 | NaN | NaN | 3.0 |
Viewer 3 | 2.0 | 1.0 | 1.0 | NaN | NaN | 1.0 |
Viewer 4 | 5.0 | NaN | NaN | NaN | NaN | 5.0 |
Viewer 5 | NaN | NaN | NaN | 2.0 | NaN | NaN |
Viewer 6 | 2.0 | NaN | NaN | NaN | NaN | NaN |
Accessors
pandas provides the .sparse accessor for sparse-specific methods and attributes on a Series or DataFrame that holds sparse data (i.e., data backed by SparseArray objects). It’s similar to the other accessors we have seen before, such as .str for string data and .dt for datetime data. First, let’s convert the original DataFrame into a fully sparse representation:
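The conversion can be sketched as follows. This is a minimal example that rebuilds the ratings table above by hand; the variable names ratings and sparse_ratings are assumptions chosen for illustration, and NaN is kept as the fill value so that unrated movies are the entries that aren't stored.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Rebuild the ratings table shown above (rows are viewers, columns are movies).
ratings = pd.DataFrame(
    [
        [nan, 3.0, nan, 5.0, 3.0, nan],
        [nan, nan, 3.0, nan, nan, 3.0],
        [2.0, 1.0, 1.0, nan, nan, 1.0],
        [5.0, nan, nan, nan, nan, 5.0],
        [nan, nan, nan, 2.0, nan, nan],
        [2.0, nan, nan, nan, nan, nan],
    ],
    index=[f"Viewer {i}" for i in range(1, 7)],
    columns=[f"Movie {i}" for i in range(1, 7)],
)

# Convert every column to a sparse dtype whose fill value is NaN,
# so only the actual ratings are stored.
sparse_ratings = ratings.astype(pd.SparseDtype("float64", np.nan))

# Each column should now report a Sparse[float64, nan] dtype.
print(sparse_ratings.dtypes)
```

Passing a single SparseDtype to astype() applies it to all columns, which is why every column ends up sparse rather than only some of them.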
We can then use the .sparse accessor to inspect attributes, such as the fill value and the stored (non-fill) values of a SparseArray, and the density of a DataFrame (i.e., the proportion of non-fill values).
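A short sketch of reading these attributes is shown below. It rebuilds the same ratings table so that it runs on its own; the choice of the Movie 1 column and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Same ratings table as above, converted to a fully sparse DataFrame.
ratings = pd.DataFrame(
    [
        [nan, 3.0, nan, 5.0, 3.0, nan],
        [nan, nan, 3.0, nan, nan, 3.0],
        [2.0, 1.0, 1.0, nan, nan, 1.0],
        [5.0, nan, nan, nan, nan, 5.0],
        [nan, nan, nan, 2.0, nan, nan],
        [2.0, nan, nan, nan, nan, nan],
    ],
    index=[f"Viewer {i}" for i in range(1, 7)],
    columns=[f"Movie {i}" for i in range(1, 7)],
)
sparse_ratings = ratings.astype(pd.SparseDtype("float64", np.nan))

# Column-level attributes come from the Series.sparse accessor.
movie_1 = sparse_ratings["Movie 1"]
fill_value = movie_1.sparse.fill_value    # the value that is NOT stored (NaN here)
stored_values = movie_1.sparse.sp_values  # the ratings that ARE stored: 2.0, 5.0, 2.0

# DataFrame-level density comes from the DataFrame.sparse accessor.
# 13 of the 36 cells hold a rating, so this is roughly 0.36.
overall_density = sparse_ratings.sparse.density

print("Fill value:", fill_value)
print("Stored values:", stored_values)
print("Density:", overall_density)
```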
In the example above, the fill_value and sp_values attributes are read at the column level (i.e., from a column whose data is a SparseArray with a SparseDtype). The density attribute, on the other hand, comes from the DataFrame.sparse accessor, which pandas also provides, so it describes the entire sparse DataFrame rather than a single column.
The DataFrame.sparse accessor also lets us perform conversions to other formats. For instance, the following code shows how to convert a sparse DataFrame into a sparse SciPy COO (Coordinate Format) matrix:
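The sketch below shows one way this conversion might look. It makes two assumptions that go beyond the lesson text: SciPy is installed, and the unrated entries are re-encoded as 0 before converting, because recent pandas versions reject to_coo() when the fill value isn't 0. Since valid ratings run from 1 to 5, a 0 can still be read as "not rated" here. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Same ratings table as above.
ratings = pd.DataFrame(
    [
        [nan, 3.0, nan, 5.0, 3.0, nan],
        [nan, nan, 3.0, nan, nan, 3.0],
        [2.0, 1.0, 1.0, nan, nan, 1.0],
        [5.0, nan, nan, nan, nan, 5.0],
        [nan, nan, nan, 2.0, nan, nan],
        [2.0, nan, nan, nan, nan, nan],
    ],
    index=[f"Viewer {i}" for i in range(1, 7)],
    columns=[f"Movie {i}" for i in range(1, 7)],
)

# Assumption: encode "not rated" as 0 so that the sparse fill value is 0,
# which is what to_coo() expects in recent pandas versions.
zero_filled = ratings.fillna(0.0).astype(pd.SparseDtype("float64", 0.0))

# Convert to a SciPy COO matrix (requires SciPy to be installed).
coo = zero_filled.sparse.to_coo()

print(type(coo))
print("Shape:", coo.shape)
print("Stored entries:", coo.nnz)

# Each stored entry is a (row, column, value) triple.
for row, col, value in zip(coo.row, coo.col, coo.data):
    print(row, col, value)
```

Note that the COO matrix keeps only positional row and column indices, so the viewer and movie labels need to be tracked separately if they are required later.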
The COO representation is a sparse matrix format for efficiently storing ...