marp | author | theme | class | paginate | math |
---|---|---|---|---|---|
true | Ian Lanham | gaia | invert | true | mathjax |
- Who uses Python regularly?
- How many people use Jupyter Notebook/Lab?
- How many people use Anaconda?
- What's your editor of choice?
- How many have installed a library with `pip`?
- How many people create virtual environments for their Python projects?
Ian Lanham @[email protected] GitHub: https://github.com/ilanham LinkedIn: https://www.linkedin.com/in/ian-lanham/
I've been a data professional for ~10 years, starting as a DBA on SQL Server and PostgreSQL. In 2023, I earned my Master's Degree in Data Analytics from UCF. It opened my eyes to how data science can shape decision-making across different business sectors.
I got hooked on Python in 2017, and haven't stopped talking about it since.
- When working in Python, you tend to reach for different libraries to solve different kinds of problems
- I want to use a database ORM/scrape websites/run OCR on images/train a neural network
- You can do all of these with Python, sometimes in the same script
- Jupyter has a lot of dependencies
- Other packages (almost) always have dependencies to install:
- dbt-core and flask both require Jinja2 and click (at different versions)
- Currently dagster doesn't run on Python 3.12, even though 3.12 is listed in its requirements
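To see overlaps like the Jinja2/click one on your own machine, here is a minimal sketch that uses only the standard library's `importlib.metadata`. The helper names (`requirement_name`, `shared_requirements`, `installed_requirements`) are made up for this illustration, and the name parsing is a simplification of full requirement syntax. It only reports dependencies that more than one package asks for. It does not decide whether the version ranges actually conflict.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # PEP 503 style normalization: lowercase, runs of "-", "_" and "." become "-"
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def shared_requirements(requires_by_package):
    """Map each dependency wanted by 2+ packages to {package: requirement string}."""
    by_dependency = {}
    for package, requirements in requires_by_package.items():
        for requirement in requirements:
            name = requirement_name(requirement)
            by_dependency.setdefault(name, {})[package] = requirement
    return {dep: users for dep, users in by_dependency.items() if len(users) > 1}


def installed_requirements(packages):
    """Read declared requirements for the packages that are installed here."""
    found = {}
    for package in packages:
        try:
            # requires() returns None when a package declares no dependencies
            found[package] = metadata.requires(package) or []
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment, so skip it
    return found


if __name__ == "__main__":
    shared = shared_requirements(installed_requirements(["dbt-core", "flask"]))
    for dependency, users in sorted(shared.items()):
        print(dependency, users)
```

If neither package is installed in the current environment, the script simply prints nothing.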
What does it look like?
- Understanding Python Packages pip Dependency Resolver and Version Conflicts (with Solutions) - codingshower.com
- The author shows the problem by installing two projects with a conflicting dependency from a `requirements.txt` file
Since PEP 621, dependencies can be declared in the `pyproject.toml` file at the root of the module's repo
# scikit-learn's pyproject.toml file
[build-system]
requires = [
"setuptools",
...
"numpy>=1.25",
"scipy>=1.6.0",
]
- Prior to PEP 621, dependencies were declared in `setup.py`
- We run into this when trying to install dependencies a few ways:
- Install everything into the default Python environment
- For notebooks: install Jupyter standalone and create a virtual environment for everything else
- Create a new virtual environment for everything
- I do this, it can get annoying
- Jupyter gets duplicated in each virtual environment
- To help with this, we have tools
The cause of (and sometimes solution to) the problem
- Installs packages to our Python environment
- Typical step 1 of 99.99% of Python tutorials: `pip install <something>`
- Installs packages from the Python Package Index (PyPI)
- `venv` has been part of the Python standard library since 3.3 and is included with installs from python.org
- What is a virtual environment?
- "A folder structure that contains an isolated environment of Python and its dependencies"
- YouTube - ArjanCodes - How to Create and Use Virtual Environments in Python With Poetry
- Deeper dive: Real Python - Python Virtual Environments: A Primer
- tl;dr: It isolates Python binaries & `pip` installs to a folder structure
- You can create a new venv and install different package versions from PyPI
python3 -m venv my_new_environment
- When you activate a venv, its `bin` folder (`Scripts` on Windows) is put at the front of your `PATH`, so the environment's Python binary is found first
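The same thing can be done from Python itself. This is a rough sketch using the standard library's `venv` module. `create_env` and `in_virtualenv` are names invented for this example, and `with_pip=False` is used only to keep it quick (the `python3 -m venv` command installs `pip` by default).

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def create_env(env_dir: Path) -> Path:
    """Create a venv at env_dir and return the expected path of its interpreter."""
    # Skipping pip keeps the sketch fast; it is not what the CLI does by default.
    venv.create(env_dir, with_pip=False)
    on_windows = sys.platform == "win32"
    bin_dir = "Scripts" if on_windows else "bin"
    exe = "python.exe" if on_windows else "python"
    return env_dir / bin_dir / exe


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python_path = create_env(Path(tmp) / "my_new_environment")
        print("Environment interpreter:", python_path)
        print("Running inside a venv right now?", in_virtualenv())
```

Activating the environment is what puts that interpreter's folder first on your `PATH`. You can also skip activation and call the environment's interpreter by its full path.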
Solution 2
- A project used prior to `venv`, starting with Python 2
- Much faster than `venv` when creating environments
- I can also specify a version of Python other than my default one (it still has to be installed):
virtualenv venv39 -p python3.9
A solution for a different problem
- Comes with Anaconda, a distribution of Python geared towards scientific computing, data science, and machine learning
- Can install non-Python libraries (C, R, Rust, Julia, etc.)
conda install rust --channel conda-forge
- More of a replacement for `pip` and PyPI: they manage their own versions of popular PyPI packages and ensure compatibility
- Default install is > 4 GB(!), over 250 packages included
- Creates a default virtual environment, `base`, that activates any time you open a terminal
- To turn it off:
conda config --set auto_activate_base false
- You should do this if you have a Python install with your OS or you installed Python yourself from python.org
- You can create an environment from a .yml file similar to a requirements.txt file: Conda Docs - Managing environments
- Like a GUI for exploring packages installed into a Python virtual environment
- Can also create new virtual environments from the GUI
- Helpful for those new to virtual environments
- The VS Code extension Python Environment Manager works similarly to Anaconda Navigator
- Miniconda: The conda program without 3-4 GB of extra binaries
- mamba: conda written in C++
- conda-forge: A community-driven alternative to base conda and the Anaconda channel
- Anaconda came before `pip` had a lot of the features and libraries it does now
- 1.0 release was in 2012, when the latest Python release was 3.2
- `pip` has matured greatly since the release of conda
- Two separate products:
- Jupyter Notebook (older style, most common)
- JupyterLab
- Depending on how you use Jupyter (native, web-hosted, VS Code plugin), you can run into lots of problems managing conflicting dependencies in the same notebook
- If you use Conda, it (and conda-forge) handles the package versions for us
- Anaconda.cloud
- Google Colab
- Sign in with a Gmail account
- Recommended by François Chollet in "Deep Learning with Python"
- JetBrains Datalore
- AWS SageMaker Studio Lab
- Azure Machine Learning Workspace
- These two have more pre-reqs to get started
- Has its own hosted JupyterLab workspace
- Also a large directory of Jupyter plugins
- Once you create a domain (combination of private notebook space, block storage, and compute engine), you can get started
- re: a Fast Launch instance - probably go get a cup of coffee
- Most similar to AWS in its initial setup, more explicit about what it requires
- It needs Application Insights, a Storage Account, and a Key Vault
- A little more to juggle: data assets have to be defined with the API version you want to be compatible with
Pre-stamped code to work with your Cloud Data™ locally:
- Creator of `pipx`, Chad Smith, has a table with all of these projects and their strengths, weaknesses, and use cases
- The question to answer here is: "Do I want to develop Python modules to distribute over PyPI (or a private repository)?"
- If yes, go through that list, find what meets your needs
- Can be installed by `pipx`
- Comes with pip-compile, which takes a rough list of packages as input
- i.e. a text file with just "bs4, requests" without versions or underlying dependencies
- pip-compile creates a full requirements.txt file from a rougher "requirements.in" file
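To get a feel for what "underlying dependencies" means, here is a rough sketch that starts from a short list of names and pins everything they pull in, using only `importlib.metadata`. This is not how pip-compile works: pip-compile resolves versions against the package index, while this toy only looks at what is already installed in the current environment. `requirement_name` and `pinned_closure` are hypothetical helper names, and environment markers other than extras are ignored.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def pinned_closure(top_level, get_version=metadata.version,
                   get_requires=metadata.requires):
    """Pin the given packages plus everything they depend on, as name==version."""
    pins = {}
    queue = [requirement_name(item) for item in top_level]
    while queue:
        name = queue.pop()
        if name in pins:
            continue  # already handled, which also protects against cycles
        try:
            pins[name] = get_version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed here, so there is nothing to pin
        for requirement in get_requires(name) or []:
            if re.search(r"extra\s*==", requirement):
                continue  # optional extras are not installed by default
            queue.append(requirement_name(requirement))
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


if __name__ == "__main__":
    # The rough "requirements.in" style list from the bullet above
    print("\n".join(pinned_closure(["bs4", "requests"])))
```

The two lookup functions are passed in as parameters so the walk can be tried out with fake data instead of real installed packages.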
- It focuses on modules that can be called from the command line
- A great alternative to creating a separate virtual environment for just one tool
- The tools installed with `pipx` are available globally
- A dependency and environment manager
- Also a build system
- It can also help you publish to PyPI
- Hatch
- Rye
- pyenv
- PDM
These belong at the beginning of the slide deck, but I wanted them to leave a lasting impression: