A tidyverse user’s guide to the Polars library

Author

An Chu

Published

February 4, 2024

Preface

This book was inspired by a series of blog posts recommending a Pythonic data science stack that mimics the comfort and familiarity of R’s tidyverse tools. In particular, this book explores the Polars library, a rising star in the DataFrame space (alongside established players like Pandas, Dask, Modin, Ray, and Vaex).

In a nutshell, Polars is a query engine with a DataFrame front-end. It offers a rich set of intuitive functions and principled workflows for data manipulation and analysis. Designed from the ground up with performance in mind, Polars is also noted for its fast execution speed. Despite its quick rise in popularity, Polars is still a relatively young project, although it has now reached the milestone of version 1.0.

The majority of this book consists of structured examples of data wrangling tasks, each showing idiomatic Polars code alongside the equivalent dplyr/tidyr code for comparison. Some examples also discuss API design choices.

import polars as pl
from lets_plot import *
LetsPlot.setup_html(no_js=True)

def okabeito(n):
    pal = ["#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7", "#999999", "#000000"]
    return pal[0:n]

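# Load the star history export and normalize the column names.
stars = (
    pl.read_csv('./data/star-history-202653.csv', has_header=True)
    .rename({'Repository': 'project', 'Date': 'date', 'Stars': 'stars'})
    .with_columns(
        # Keep the first 15 characters of the raw date string
        # (weekday, month, day, year) and parse them into a Date.
        short_date=pl.col('date').str.slice(0, 15).str.to_date('%a %b %d %Y')
    )
    .sort('short_date')
)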

(
    ggplot(stars.to_pandas(), aes(x='short_date', y='stars', color='project')) +
    geom_line(size=1.5) +
    labs(y='GitHub stars', title='Polars is surging in popularity', caption="Data as of May 2026") +
    scale_x_datetime(name='Date') +
    theme(legend_title=element_blank(),
          text=element_text(family='Roboto Condensed', size=15),
          plot_title=element_text(face='bold', size=18)) +
    scale_y_continuous(breaks=[0, 10_000, 20_000, 30_000, 40_000], format='0,', expand=[0.01,0.01]) +
    scale_color_manual(values=okabeito(5)) +
    ggsize(700,380)
)
[Figure: line chart titled "Polars is surging in popularity". x-axis: Date (2010 to 2025); y-axis: GitHub stars (0 to 40,000). Series: pandas-dev/pandas, dask/dask, vaexio/vaex, modin-project/modin, pola-rs/polars. Caption: Data as of May 2026.]

Run the code from this book

  • Clone the book repository:
git clone https://github.com/chuvanan/cookbook-python-polars.git python-polars-cookbook
cd python-polars-cookbook
  • Install uv then sync dependencies:
uv sync
python download-data.py

Credit

This work builds upon the organizational style of https://ddotta.github.io/cookbook-rpolars/, but the content and examples are tailored to the Polars library in Python. Credit goes to Damien Dotta; all remaining errors are mine.

Contributing

Feel free to open an issue if you notice any problems with this book. It’s free and open source, and your feedback is valuable to me.