Run several processes with different libraries. The time each one takes is estimated over several runs of the same code.
Here is another interactive site comparing pandas and Polars:
https://dash-polars-pandas-docker.onrender.com/speed-comparison
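
As a rough illustration of estimating run time over several runs of the same code, here is a minimal sketch built on the standard-library `timeit` module. The helper `time_call` and the stand-in `sample_workload` are hypothetical names, not taken from the notebook; in practice the workload would be a call into one of the libraries being compared.

```python
import statistics
import timeit


def time_call(func, repeat=5, number=1):
    """Return (mean, stdev) of seconds per call, measured over `repeat` runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in runs]
    # stdev needs at least two samples; report 0.0 for a single run.
    spread = statistics.stdev(per_call) if len(per_call) > 1 else 0.0
    return statistics.mean(per_call), spread


def sample_workload():
    # Hypothetical stand-in for a library operation such as reading a CSV.
    return sum(i * i for i in range(100_000))


mean_s, stdev_s = time_call(sample_workload, repeat=5)
print(f"{mean_s:.4f} s ± {stdev_s:.4f} s over 5 runs")
```

Reporting the spread alongside the mean makes it easier to tell a real difference between libraries from run-to-run noise.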
-
System:
MacBook Pro 15-inch, 2019
Processor: 2.3 GHz 8-Core Intel Core i9
Memory: 16 GB 2400 MHz DDR4
macOS: Sonoma 14.5 Beta (23F5049f)
-
Data source:
Western Pennsylvania Regional Data Center
Cumulative Crash Data:
Because the original dataset exceeds GitHub's file size limit, run this command from the notebook to download the data and generate files of different sizes:
exec(open("get_data.py").read())
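
The contents of `get_data.py` are not shown here. As a rough sketch of what generating files of different sizes could look like, the following assumes the source is a CSV file and writes the first `n` data rows to a new file for each requested size. The function `write_subsets`, the `crashes_<n>.csv` naming, and the tiny demo file are all hypothetical, not the script's actual behavior.

```python
import csv
import itertools
import tempfile
from pathlib import Path


def write_subsets(source, sizes, out_dir="."):
    """Write the first n data rows of `source` to crashes_<n>.csv for each n."""
    out_paths = []
    for n in sizes:
        out_path = Path(out_dir) / f"crashes_{n}.csv"  # hypothetical naming
        with open(source, newline="") as src, open(out_path, "w", newline="") as dst:
            reader = csv.reader(src)
            writer = csv.writer(dst)
            writer.writerow(next(reader))  # copy the header row
            writer.writerows(itertools.islice(reader, n))  # then n data rows
        out_paths.append(out_path)
    return out_paths


# Demo on a small made-up file so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.csv"
    with open(source, "w", newline="") as f:
        demo_writer = csv.writer(f)
        demo_writer.writerow(["id", "value"])
        demo_writer.writerows([i, i * 2] for i in range(10))
    paths = write_subsets(source, sizes=[3, 5], out_dir=tmp)
    row_counts = [len(p.read_text().splitlines()) - 1 for p in paths]
print(row_counts)
```

Streaming rows with `itertools.islice` avoids loading the full dataset into memory, which matters when the source file is large.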
-
Documentation
https://docs.pola.rs/py-polars/html/reference/dataframe/api/polars.DataFrame.clone.html
-
Installation
!pip install polars
-
Documentation
-
Installation
!pip install datatable
-
Documentation
-
Installation (with dask)
!pip install modin
!pip install "dask[distributed]" --upgrade
-
Set up (with dask)
import os
os.environ['MODIN_ENGINE'] = 'dask'
import modin.pandas as md
-
Documentation
https://pypi.org/project/vaex/
As of 2024-04-29, Vaex does not support Python 3.12 (vaexio/vaex#2397).
-
Installation
!pip install vaex
or
!conda install -c conda-forge vaex