Summary & references
Summary
This book is a practical guide to the Polars DataFrame library for Python, written for R and tidyverse users. Rather than starting from scratch, it maps Polars concepts directly onto what you already know — dplyr verbs, tidyr reshaping, lubridate datetime handling, and stringr string operations — so the learning curve is a translation, not a reinvention.
The chapters cover:
- First steps — installation, reading data, and inspecting DataFrames
- Data manipulation — the core toolkit: filtering, selecting, mutating, aggregating, sorting, joining and concatenating tables, pivoting, handling missing values, string methods, datetime operations, and conditional expressions
- Import / Export — reading and writing CSV, Parquet, Excel, SQL databases, and cloud storage (S3, GCS, Azure)
- Lazy API — deferred execution, query optimisation (predicate and projection pushdown), benchmarking, and streaming for larger-than-memory data
The Zen of Polars and tidyverse design principles
Polars’ design philosophy and the tidyverse’s tidy design principles are more than superficially similar — they share a common conviction that API design shapes how people think about data.
Both prioritise readability over brevity: a Polars method chain reads left-to-right like a dplyr pipe, and both ecosystems resist cryptic shortcuts in favour of self-documenting code. Both favour explicit over implicit: Polars is strict about data types and asks for an explicit cast in many places where R would coerce silently through its implicit coercion hierarchy, and the tidyverse makes data flow visible through the pipe rather than hiding it in global state. Both aim for predictable, pure operations: Polars methods return new DataFrames rather than modifying their input, echoing the tidyverse’s preference for functions that return new objects rather than modifying in place. And both treat minimising ambiguity as a first-class goal: one function, one clear purpose, consistent argument names.
The result is that switching from dplyr to Polars feels less like learning a new tool and more like translating a familiar vocabulary into a new dialect — the underlying grammar of tidy data transformation is the same.
Polars’ design principles in brief:
- explicit over implicit
- aim for a single return dtype per expression
- API should nudge to fast code
- pure over in-place
- underscore over concatenated words
- minimise ambiguity
References
Here is a compilation of resources — talks, articles, blogs, documentation, and tutorials — used in crafting this book.
Polars
- Must watch: Ritchie Vink’s (author of Polars) keynote at EuroSciPy 2023
- Polars user guide
- Polars Python API reference
- A bird’s eye view of Polars
- Understanding Polars data types
- Modern Polars
- Cookbook Polars for R
- Awesome Polars — a curated list of docs, talks, tools and articles
- Real Python — Python Polars: A lightning-fast DataFrame library
- Real Python — Polars GroupBy
- Real Python — Polars missing data
- Practical Business Python — Introduction to Polars
- The Polars vs pandas difference nobody is talking about
- LazyFrame vs DataFrame performance comparison
- Date and datetime manipulation in Polars
R / tidyverse
- R for Data Science (2e) — Hadley Wickham, Mine Çetinkaya-Rundel, Garrett Grolemund
- Tidy design principles
- dplyr documentation
- tidyr documentation
- lubridate documentation
- stringr documentation
- The split-apply-combine strategy for data analysis — Hadley Wickham, Journal of Statistical Software (2011)
- Per-operation grouping with .by
Python Polars — chapter references
- Data manipulation: Polars expressions and contexts, Polars selectors
- Strings: Polars string functions
- Datetime: Polars temporal expressions
- Missing values: Polars missing data
- Pivoting: Polars pivot transformations
- Lazy API: Polars lazy API concepts
- I/O — Excel: Polars Excel I/O
- I/O — Database: Polars database I/O
- I/O — Cloud: Polars cloud storage I/O