1  First steps with Polars

1.1 Installation

Install the latest stable Polars release with uv, a fast Python package manager. uv pip install installs into a virtual environment (for example, one created with uv venv):

uv pip install polars

Optional dependencies are installed through extras. For example, to add NumPy, pandas, and PyArrow interoperability:

uv pip install 'polars[numpy,pandas,pyarrow]'

To inspect the installed version and enabled features, use show_versions():

import polars as pl
pl.show_versions()
--------Version info---------
Polars:              1.40.1
Index type:          UInt32
Platform:            Linux-6.8.0-117-generic-x86_64-with-glibc2.39
Python:              3.12.3 (main, Mar 23 2026, 19:04:32) [GCC 13.3.0]
Runtime:             rt32

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           0.4.5
deltalake            <not installed>
fastexcel            0.20.2
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
numpy                2.4.4
openpyxl             3.1.5
pandas               3.0.2
polars_cloud         <not installed>
pyarrow              24.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           2.0.49
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           3.2.9

1.2 Initial exploration

Throughout this book, Polars’ configuration options are set to keep output compact and readable:

  • Limit displayed columns and rows to 10 each
  • Apply rounded-corner table formatting

pl.Config.set_tbl_cols(10)
pl.Config.set_tbl_rows(10)
pl.Config.set_tbl_formatting(rounded_corners=True)

1.2.1 Data loading

Reading a CSV file using Polars is straightforward. Let’s take a quick look:

flights = pl.read_csv('./data/flights.csv')

Think of pl.read_csv() as readr::read_csv(). Once loaded, a DataFrame behaves like a tibble: it prints compactly, shows dtypes under column names, and has no row names.

Like dplyr’s glimpse(), Polars’ glimpse() method gives a compact, column-by-column overview:

# Glimpse the first 10 columns: flights[:, 0:10] selects all rows and columns 0-9, akin to flights[, 1:10] in base R
flights[:, 0:10].glimpse()
Rows: 999
Columns: 10
$ Year                        <i64> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022
$ Quarter                     <i64> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
$ Month                       <i64> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
$ DayofMonth                  <i64> 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
$ DayOfWeek                   <i64> 5, 6, 7, 1, 2, 3, 4, 5, 6, 7
$ FlightDate                  <str> '2022-01-14', '2022-01-15', '2022-01-16', '2022-01-17', '2022-01-18', '2022-01-19', '2022-01-20', '2022-01-21', '2022-01-22', '2022-01-23'
$ Reporting_Airline           <str> 'YX', 'YX', 'YX', 'YX', 'YX', 'YX', 'YX', 'YX', 'YX', 'YX'
$ DOT_ID_Reporting_Airline    <i64> 20452, 20452, 20452, 20452, 20452, 20452, 20452, 20452, 20452, 20452
$ IATA_CODE_Reporting_Airline <str> 'YX', 'YX', 'YX', 'YX', 'YX', 'YX', 'YX', 'YX', 'YX', 'YX'
$ Tail_Number                 <str> 'N119HQ', 'N122HQ', 'N412YX', 'N405YX', 'N420YX', 'N446YX', 'N116HQ', 'N419YX', 'N137HQ', 'N110HQ'


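Polars also provides head(), tail(), and describe() methods that mirror familiar R functions, with describe() playing the role of summary(). One difference: head() and tail() return 5 rows by default, whereas R returns 6.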

flights.head()
shape: (5, 110)
Year  Quarter  Month  DayofMonth  DayOfWeek  Div5TotalGTime  Div5LongestGTime  Div5WheelsOff  Div5TailNum
i64   i64      i64    i64         i64        str             str               str            str          str
2022  1        1      14          5          null            null              ""             ""           null
2022  1        1      15          6          null            null              ""             ""           null
2022  1        1      16          7          null            null              ""             ""           null
2022  1        1      17          1          null            null              ""             ""           null
2022  1        1      18          2          null            null              ""             ""           null
flights.tail()
shape: (5, 110)
Year  Quarter  Month  DayofMonth  DayOfWeek  Div5TotalGTime  Div5LongestGTime  Div5WheelsOff  Div5TailNum
i64   i64      i64    i64         i64        str             str               str            str          str
2022  1        1      12          3          null            null              ""             ""           null
2022  1        1      13          4          null            null              ""             ""           null
2022  1        1      14          5          null            null              ""             ""           null
2022  1        1      17          1          null            null              ""             ""           null
2022  1        1      18          2          null            null              ""             ""           null
flights.describe()
shape: (9, 111)
statistic     Year    Quarter  Month  DayofMonth  Div5TotalGTime  Div5LongestGTime  Div5WheelsOff  Div5TailNum
str           f64     f64      f64    f64         str             str               str            str          str
"count"       999.0   999.0    999.0  999.0       "0"             "0"               "999"          "999"        "0"
"null_count"  0.0     0.0      0.0    0.0         "999"           "999"             "0"            "0"          "999"
"mean"        2022.0  1.0      1.0    16.2002     null            null              null           null         null
"std"         0.0     0.0      0.0    8.802666    null            null              null           null         null
"min"         2022.0  1.0      1.0    1.0         null            null              ""             ""           null
"25%"         2022.0  1.0      1.0    9.0         null            null              null           null         null
"50%"         2022.0  1.0      1.0    16.0        null            null              null           null         null
"75%"         2022.0  1.0      1.0    24.0        null            null              null           null         null
"max"         2022.0  1.0      1.0    31.0        null            null              ""             ""           null


In R you pipe with |> or %>%; in Polars you chain methods with a dot, so flights.head() corresponds to flights |> head() in R. Method chaining is covered in more detail later.


To peek at rows from anywhere in the DataFrame, not just the top or bottom, use the sample() method. It returns n randomly chosen rows, much like dplyr::slice_sample(n = ...):

flights.sample(3)
shape: (3, 110)
Year  Quarter  Month  DayofMonth  DayOfWeek  Div5TotalGTime  Div5LongestGTime  Div5WheelsOff  Div5TailNum
i64   i64      i64    i64         i64        str             str               str            str          str
2022  1        1      14          5          null            null              ""             ""           null
2022  1        1      2           7          null            null              ""             ""           null
2022  1        1      27          4          null            null              ""             ""           null


The output from Polars comes with some useful features:

  • Underneath each column name is a data type.

  • No index numbers are present.

  • String values are quoted with double quotes.

  • Missing values are shown as null, whatever the data type.

1.2.2 Row and column counting

Determining the number of rows and columns in a Polars DataFrame is as simple as checking the shape:

flights.shape
(999, 110)

1.2.3 Understanding data structure

The two fundamental data structures in Polars are Series and DataFrame. A Series is a one-dimensional typed array, akin to R’s atomic vector: all elements must share the same dtype. A DataFrame is a collection of equal-length named Series, one per column.

In practice, you’ll rarely construct a Series by hand; Series mostly appear as the columns of a DataFrame. The quickest way to create a small DataFrame is from a dictionary that maps column names to values, which corresponds directly to tibble::tibble():

import numpy as np
num_rows = 5000
rng = np.random.default_rng(seed=7)

buildings = pl.DataFrame({
    "sqft": rng.exponential(scale=1000, size=num_rows),
    "year": rng.integers(low=1995, high=2023, size=num_rows),
    "building_type": rng.choice(["A", "B", "C"], size=num_rows),
})
buildings
shape: (5_000, 3)
sqft         year  building_type
f64          i64   str
707.529256   1996  "C"
1025.203348  2020  "C"
568.548657   2012  "A"
895.109864   2000  "A"
206.532754   2011  "A"
710.435755   2003  "C"
408.872783   2009  "C"
57.562059    2019  "C"
3728.088949  2020  "C"
686.678345   2011  "C"

This is the Polars equivalent of tibble(sqft = ..., year = ..., building_type = ...).

DataFrames come with several attributes for exploration:

# Get the number of rows
flights.height
999
# Get the number of columns
flights.width
110
# Get a list of column names
flights.columns[:10]
['Year',
 'Quarter',
 'Month',
 'DayofMonth',
 'DayOfWeek',
 'FlightDate',
 'Reporting_Airline',
 'DOT_ID_Reporting_Airline',
 'IATA_CODE_Reporting_Airline',
 'Tail_Number']
# Get a list of column dtypes
flights.dtypes[:10]
[Int64, Int64, Int64, Int64, Int64, String, String, Int64, String, String]
# Get a Schema object mapping column names to their dtype
flights[:,:10].schema
Schema([('Year', Int64),
        ('Quarter', Int64),
        ('Month', Int64),
        ('DayofMonth', Int64),
        ('DayOfWeek', Int64),
        ('FlightDate', String),
        ('Reporting_Airline', String),
        ('DOT_ID_Reporting_Airline', Int64),
        ('IATA_CODE_Reporting_Airline', String),
        ('Tail_Number', String)])

1.3 Summary

Series and DataFrame are the two fundamental building blocks of Polars. Every operation in the chapters that follow (filtering, grouping, joining, reshaping) is expressed as a method call on one of these objects, chained together to form a readable pipeline. The next chapter puts them to work on a real-world dataset.