A data manipulation library, implemented in Rust. It is a faster and more memory-efficient alternative to pandas; much of the speed comes from using all CPU cores, with rayon distributing the workload.

It exposes a dataframe API and, to a lesser extent, also supports SQL.

I’ve seen two orders of magnitude speedup with this tool (granted on a poorly implemented baseline).

API Flavours

Polars comes with a few flavours of API:

  1. An API that mimics pandas more closely. This allows for a quick import statement swap.
  2. An expression API that is closer to Spark [1]. This allows for independent composition of transformation logic. It has two modes, eager execution and lazy execution: the former produces a polars.DataFrame while the latter produces a polars.LazyFrame.
  3. A SQL API.

It is highly recommended to use the lazy API, as it allows for plenty of optimizations, including caching, predicate pushdown and projection pushdown. The Polars user guide has a list of all optimizations.

Warning

Unlike pandas, polars intentionally does not support the concept of indices, as its authors believe a query's semantics should not be affected by the state of an index.

| API flavour | Eager/Lazy | Optimizations | Streaming |
| --- | --- | --- | --- |
| pandas-like | Eager | Some | |
| Spark-like (aka expression-based); eager | Eager | Similar to above(?) | |
| Spark-like (aka expression-based); lazy | Lazy | A lot | |

todo Improve table

Streaming Support

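The lazy API also comes with streaming functionality, which processes data in batches and so lets you run out-of-core workloads, i.e. datasets larger than memory. It is enabled simply by passing a parameter to collect: .collect(streaming=True). Recent Polars releases spell this .collect(engine="streaming") instead.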

Footnotes

  1. duckdb also supports this API to a lesser extent