Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🔨(notebook) add example notebook that interacts with the database #126

Merged
merged 1 commit into from
Jul 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,9 @@ venv.bak/
# Jupytext: we version converted md files not ipynb sources
*.ipynb

# Jupyter
.ipynb_checkpoints/

# -- Provisioning
provisioning/.terraform*
provisioning/terraform.tfstate*
2 changes: 2 additions & 0 deletions docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,8 @@ services:
NB_GID: ${DOCKER_GID:-1000}
CHOWN_HOME: 'yes'
CHOWN_HOME_OPTS: -R
env_file:
- env.d/notebook
ports:
- 8888:8888
volumes:
Expand Down
1 change: 1 addition & 0 deletions env.d/notebook
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
DATABASE_URL=postgresql+psycopg://qualicharge:pass@postgresql:5432/qualicharge-api
8 changes: 6 additions & 2 deletions src/notebook/Dockerfile
Original file line number Diff line number Diff line change
@@ -1,11 +1,15 @@
# -- Custom image --
FROM jupyter/base-notebook
FROM quay.io/jupyter/base-notebook:notebook-7.2.1

# Install base dependencies
#
# FIXME: jupytext 1.16.4+ seems to fix the issue but is not released yet
# see: https://github.com/mwouts/jupytext/issues/1260
RUN mamba install --yes \
duckdb \
geopandas \
jupytext \
jupytext==1.16.2 \
matplotlib \
pandas \
psycopg[binary,pool] \
seaborn
145 changes: 145 additions & 0 deletions src/notebook/example.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
---
jupyter:
jupytext:
formats: ipynb,md
text_representation:
extension: .md
format_name: markdown
format_version: '1.3'
jupytext_version: 1.16.2
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

# QualiCharge data: an example notebook

This notebook aims to be an example notebook used as a starting point for a new analysis or indicator calculation. It provides code snippets and examples to fetch and record data from our PostgreSQL database.

## Create the database engine

```python
import os
from sqlalchemy import create_engine

# Get database URL from the environment
database_url = os.getenv("DATABASE_URL")

# Create a database engine that will be used to generate connections
engine = create_engine(database_url)
```

## Fetch data from the database

### Example 1: generate a stations map using GeoPandas

```python
from geopandas import GeoDataFrame

query = """
SELECT
Station.nom_station,
Station.id_station_itinerance,
Amenageur.nom_amenageur as amenageur,
Localisation."coordonneesXY" as geom
FROM
Station
INNER JOIN Localisation ON Station.localisation_id = Localisation.id
INNER JOIN Amenageur ON Station.amenageur_id = Amenageur.id
"""


with engine.connect() as conn:
# Query a PostgreSQL database using the PostGIS extension
stations = GeoDataFrame.from_postgis(query, conn)

print(f"Loaded {len(stations.index)} stations")
stations.sample(10)
```

```python
# Display an interactive map of the stations
stations.explore(column="amenageur")
```

### Example 2: explore operators distribution

```python
import pandas as pd

query = """
SELECT
Operateur.nom_operateur,
PointDeCharge.id_pdc_itinerance
FROM
PointDeCharge
INNER JOIN Station ON PointDeCharge.station_id = Station.id
INNER JOIN Operateur ON Station.operateur_id = Operateur.id
"""

with engine.connect() as conn:
# Query a PostgreSQL database using the PostGIS extension
pdcs = pd.read_sql_query(query, conn)

print(f"Loaded {len(pdcs.index)} points of charge")
pdcs.sample(10)
```

```python
import seaborn as sns

# Render a barplot with the number of points of charge by operator
sns.barplot(data=pdcs.value_counts("nom_operateur"))
```

## Write data to the database

### Example 1: create a new table with calculated indicator

In this example, we will calculate the number of points of charge per French department at a particular date/time (now) and store this stateful snapshot in the database.

```python
import uuid
import pandas as pd

# Get the city code insee for each point of charge
query = """
SELECT
Localisation.code_insee_commune
FROM
PointDeCharge
INNER JOIN Station ON PointDeCharge.station_id = Station.id
INNER JOIN Localisation ON Station.localisation_id = Localisation.id
"""
with engine.connect() as conn:
# Query a PostgreSQL database using the PostGIS extension
codes_insee = pd.read_sql_query(query, conn)

# Add a department column
codes_insee["department"] = codes_insee["code_insee_commune"].map(lambda x: int(x[:2]) if x else None)

# Calculate our indicator and add a timestamp to each department counts (row)
indicator = codes_insee.value_counts("department").to_frame().reset_index()
indicator["calculated_at"] = pd.Timestamp.now()

# Set UUIDs as the index
indicator["uuid"] = indicator.apply(lambda _: uuid.uuid4(), axis=1)
indicator.set_index("uuid", inplace=True)

# Explictly set the department column as integers
indicator = indicator.astype({"department": "int32"})
indicator
```

```python
# Save the indicator to a (new) table
indicator.to_sql("IDepartmentDynamic", engine, if_exists="append")
```

```python
# Check inserted results
query = 'SELECT * FROM "IDepartmentDynamic" WHERE department = 75'
paris = pd.read_sql_query(query, engine)
paris
```
Loading