A data manipulation library, implemented in Rust. It is a faster and more memory efficient alternative to pandas, as it is able to take advantage of all CPU cores via rayon to distribute the workload.
It exposes a dataframe API and, to a lesser extent, also supports SQL.
I’ve seen two orders of magnitude speedup with this tool (granted on a poorly implemented baseline).
API Flavours
Polars comes with a few flavours of API:
- An API that mimics `pandas` more closely. This allows for a quick import statement swap
- An expression API that is closer to Spark[^1]. This allows for independent composition of transformation logic
  - This API has two flavours, eager execution and lazy execution. The former produces a `polars.DataFrame` while the latter produces a `polars.LazyFrame`
- A SQL API
It is highly recommended to use the lazy API, as that allows for plenty of optimizations including caching, predicate pushdown and projection pushdown. See here for a list of all optimizations.
Warning
Unlike `pandas`, `polars` intentionally does not support the concept of indices, as its developers believe a query's semantics should not be affected by the state of an index.
API Flavour | Eager/Lazy | Optimizations | Streaming |
---|---|---|---|
`pandas`-like | Eager | Some | ❌ |
Spark-like (aka expression-based); eager | Eager | Similar to above(?) | ❌ |
Spark-like (aka expression-based); lazy | Lazy | A lot | ✅ |
todo Improve table
Streaming Support
With the lazy API also comes streaming functionality. This allows you to operate on out-of-core (larger-than-memory) workloads. This can be done simply by passing `streaming=True` to `.collect()`.