Chunked arrays, series, and DataFrames

As we proceed in our quest to master data analysis in Rust, we’ll eventually find that even ndarray is limiting, because every element of an ndarray array has the same type. In many situations, we need to analyze tabular data where each column holds a different data type.

To overcome this problem, we can use Polars, which brings the DataFrame concept to Rust. Think of a DataFrame as a table in which each column can have its own data type.
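To make the idea of a table with heterogeneous columns concrete, here is a minimal sketch that uses only the Rust standard library. The names `Column`, `Table`, `add_column`, `shape`, and `example_table` are hypothetical and chosen for illustration; they are not part of the Polars API, and a real DataFrame does far more than this.

```rust
// Conceptual sketch only: `Column` and `Table` are hypothetical names,
// not types from the polars crate.

/// Each column holds values of one type, but different columns may differ.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int(Vec<i64>),
    Float(Vec<f64>),
    Text(Vec<String>),
}

impl Column {
    /// Number of rows stored in this column.
    fn len(&self) -> usize {
        match self {
            Column::Int(v) => v.len(),
            Column::Float(v) => v.len(),
            Column::Text(v) => v.len(),
        }
    }
}

/// A table is a list of named columns that all have the same length.
#[derive(Debug, Default)]
struct Table {
    columns: Vec<(String, Column)>,
}

impl Table {
    /// Adds a column, rejecting it if its length differs from the first column.
    fn add_column(&mut self, name: &str, column: Column) -> Result<(), String> {
        if let Some((_, first)) = self.columns.first() {
            if first.len() != column.len() {
                return Err(format!("column `{}` has the wrong length", name));
            }
        }
        self.columns.push((name.to_string(), column));
        Ok(())
    }

    /// Returns (rows, columns).
    fn shape(&self) -> (usize, usize) {
        let rows = self.columns.first().map_or(0, |(_, c)| c.len());
        (rows, self.columns.len())
    }
}

/// Builds a small table with an integer, a float, and a text column.
fn example_table() -> Table {
    let mut table = Table::default();
    table.add_column("id", Column::Int(vec![1, 2, 3])).unwrap();
    table.add_column("score", Column::Float(vec![9.5, 7.0, 8.25])).unwrap();
    let names = vec!["ann".to_string(), "bob".to_string(), "cy".to_string()];
    table.add_column("name", Column::Text(names)).unwrap();
    table
}

fn main() {
    let table = example_table();
    println!("shape = {:?}", table.shape());
}
```

The point of the sketch is the layout: data is stored column by column, and the only constraint across columns is that they have the same number of rows.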

Polars targets both Rust and Python. In the Python world, it is an alternative to the pandas library.

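Polars has both an eager API, which runs each operation immediately and is similar to pandas, and a lazy API, which first describes the computation and runs it only when the result is requested, much like building a graph in TensorFlow. We’ll explore the eager API here; the documentation for the lazy API covers more specialized cases.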
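The difference between eager and lazy evaluation can be illustrated with plain Rust iterators. This is only an analogy and does not use the Polars API; the function names `eager_sum_of_even_squares` and `lazy_sum_of_even_squares` are made up for this sketch.

```rust
// Analogy only: plain Rust iterators, not the polars API.

/// Eager style: each step runs immediately and materializes a new vector.
fn eager_sum_of_even_squares(data: &[i64]) -> i64 {
    let squares: Vec<i64> = data.iter().map(|x| x * x).collect();
    let evens: Vec<i64> = squares.into_iter().filter(|x| x % 2 == 0).collect();
    evens.iter().sum()
}

/// Lazy style: the steps are only described at first.
/// Nothing is computed until `sum` consumes the iterator chain.
fn lazy_sum_of_even_squares(data: &[i64]) -> i64 {
    let plan = data.iter().map(|x| x * x).filter(|x| x % 2 == 0);
    plan.sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("eager: {}", eager_sum_of_even_squares(&data));
    println!("lazy:  {}", lazy_sum_of_even_squares(&data));
}
```

Both functions return the same value. The eager version allocates two intermediate vectors, while the lazy version makes a single pass over the data once the result is requested.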

Polars can be added as a dependency in Cargo.toml as follows:

polars = "0.17.0"

Chunked arrays

Chunked arrays are at the base of Polars. A chunked array holds values of a single type T, stored in one or more contiguous chunks of memory (hence the name). Because Polars internally uses Apache Arrow for its memory model, this layout is very efficient.
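To build some intuition for this layout, here is a rough sketch that uses only the standard library. The `Chunked` type and its methods are hypothetical and are not the Polars `ChunkedArray`; the sketch assumes only what is described above, namely that one logical array of type T is backed by one or more contiguous chunks.

```rust
// Conceptual sketch only: `Chunked` is a hypothetical type,
// not polars' `ChunkedArray`.

/// A logical array of `T` whose values live in one or more contiguous chunks.
#[derive(Debug)]
struct Chunked<T> {
    chunks: Vec<Vec<T>>,
}

impl<T: Clone> Chunked<T> {
    /// Starts with a single chunk copied from the slice.
    fn from_slice(values: &[T]) -> Self {
        Chunked { chunks: vec![values.to_vec()] }
    }
}

impl<T> Chunked<T> {
    /// Appending stores the new values as an extra chunk,
    /// so the existing chunks are not copied or moved.
    fn append(&mut self, values: Vec<T>) {
        if !values.is_empty() {
            self.chunks.push(values);
        }
    }

    /// Total number of values across all chunks.
    fn len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }

    /// Number of chunks backing the array.
    fn n_chunks(&self) -> usize {
        self.chunks.len()
    }

    /// Looks up a logical index by walking the chunks in order.
    fn get(&self, mut index: usize) -> Option<&T> {
        for chunk in &self.chunks {
            if index < chunk.len() {
                return chunk.get(index);
            }
            index -= chunk.len();
        }
        None
    }
}

fn main() {
    let mut arr = Chunked::from_slice(&[1, 2, 3]);
    arr.append(vec![4, 5]);
    println!("len = {}, chunks = {}", arr.len(), arr.n_chunks());
    println!("arr[3] = {:?}", arr.get(3));
}
```

The trade-off in this sketch is that appending is cheap, while indexed access has to find the right chunk first.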

Let’s look at how to create a chunked array.
