🔨(notebook) add example notebook that interacts with the database
We will fetch data from, and write data to, a PostgreSQL database to
calculate various indicators and perform data analysis. The current work
is a suite of code snippets that can be used to do so.
jmaupetit committed Jul 22, 2024
1 parent 9be3357 commit e5fd4f1
Showing 5 changed files with 157 additions and 2 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -51,6 +51,9 @@ venv.bak/
# Jupytext: we version converted md files not ipynb sources
*.ipynb

# Jupyter
.ipynb_checkpoints/

# -- Provisioning
provisioning/.terraform*
provisioning/terraform.tfstate*
2 changes: 2 additions & 0 deletions docker-compose.yml
@@ -80,6 +80,8 @@ services:
NB_GID: ${DOCKER_GID:-1000}
CHOWN_HOME: 'yes'
CHOWN_HOME_OPTS: -R
env_file:
- env.d/notebook
ports:
- 8888:8888
volumes:
1 change: 1 addition & 0 deletions env.d/notebook
@@ -0,0 +1 @@
DATABASE_URL=postgresql+psycopg://qualicharge:pass@postgresql:5432/qualicharge-api
8 changes: 6 additions & 2 deletions src/notebook/Dockerfile
@@ -1,11 +1,15 @@
 # -- Custom image --
-FROM jupyter/base-notebook
+FROM quay.io/jupyter/base-notebook:notebook-7.2.1

 # Install base dependencies
+#
+# FIXME: jupytext 1.16.4+ seems to fix the issue but is not released yet
+# see: https://github.com/mwouts/jupytext/issues/1260
 RUN mamba install --yes \
 duckdb \
 geopandas \
-jupytext \
+jupytext==1.16.2 \
 matplotlib \
 pandas \
+psycopg[binary,pool] \
 seaborn
145 changes: 145 additions & 0 deletions src/notebook/example.md
@@ -0,0 +1,145 @@
---
jupyter:
jupytext:
formats: ipynb,md
text_representation:
extension: .md
format_name: markdown
format_version: '1.3'
jupytext_version: 1.16.2
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

# QualiCharge data: an example notebook

This notebook is meant as a starting point for a new analysis or indicator calculation. It provides code snippets and examples that fetch data from, and write data to, our PostgreSQL database.

## Create the database engine

```python
import os
from sqlalchemy import create_engine

# Get database URL from the environment
database_url = os.getenv("DATABASE_URL")

# Create a database engine that will be used to generate connections
engine = create_engine(database_url)
```
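
`os.getenv` returns `None` when the variable is missing, in which case the engine cannot be created. The guard below is a minimal sketch of failing early with a clearer message; `get_database_url` is a hypothetical helper that is not part of this notebook, and the error wording is an assumption.

```python
import os


def get_database_url(env=None):
    """Return DATABASE_URL from the given mapping, or raise a clear error.

    Hypothetical helper: `env` defaults to the process environment and
    can be replaced by a plain dict, which makes the guard easy to try.
    """
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if not url:
        # Fail before the URL reaches create_engine
        raise RuntimeError(
            "DATABASE_URL is not set; check the notebook service environment"
        )
    return url
```

With such a guard in place, the engine could be created with `create_engine(get_database_url())`.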

## Fetch data from the database

### Example 1: generate a stations map using GeoPandas

```python
from geopandas import GeoDataFrame

query = """
SELECT
Station.nom_station,
Station.id_station_itinerance,
Amenageur.nom_amenageur as amenageur,
Localisation."coordonneesXY" as geom
FROM
Station
INNER JOIN Localisation ON Station.localisation_id = Localisation.id
INNER JOIN Amenageur ON Station.amenageur_id = Amenageur.id
"""


with engine.connect() as conn:
    # Query a PostgreSQL database using the PostGIS extension
    stations = GeoDataFrame.from_postgis(query, conn)

print(f"Loaded {len(stations.index)} stations")
stations.sample(10)
```

```python
# Display an interactive map of the stations
stations.explore(column="amenageur")
```

### Example 2: explore the distribution of operators

```python
import pandas as pd

query = """
SELECT
Operateur.nom_operateur,
PointDeCharge.id_pdc_itinerance
FROM
PointDeCharge
INNER JOIN Station ON PointDeCharge.station_id = Station.id
INNER JOIN Operateur ON Station.operateur_id = Operateur.id
"""

with engine.connect() as conn:
    # Query the PostgreSQL database and load the result into a DataFrame
    pdcs = pd.read_sql_query(query, conn)

print(f"Loaded {len(pdcs.index)} points of charge")
pdcs.sample(10)
```

```python
import seaborn as sns

# Render a barplot with the number of points of charge by operator
sns.barplot(data=pdcs.value_counts("nom_operateur"))
```

## Write data to the database

### Example 1: create a new table with a calculated indicator

In this example, we will calculate the number of points of charge per French department at a given point in time (now) and store this timestamped snapshot in the database.

```python
import uuid
import pandas as pd

# Get the INSEE city code for each point of charge
query = """
SELECT
Localisation.code_insee_commune
FROM
PointDeCharge
INNER JOIN Station ON PointDeCharge.station_id = Station.id
INNER JOIN Localisation ON Station.localisation_id = Localisation.id
"""
with engine.connect() as conn:
    # Query the PostgreSQL database and load the result into a DataFrame
    codes_insee = pd.read_sql_query(query, conn)

# Add a department column
codes_insee["department"] = codes_insee["code_insee_commune"].map(lambda x: int(x[:2]) if x else None)

# Calculate our indicator and add a timestamp to each department count (row)
indicator = codes_insee.value_counts("department").to_frame().reset_index()
indicator["calculated_at"] = pd.Timestamp.now()

# Set UUIDs as the index
indicator["uuid"] = indicator.apply(lambda _: uuid.uuid4(), axis=1)
indicator.set_index("uuid", inplace=True)

# Explicitly set the department column as integers
indicator = indicator.astype({"department": "int32"})
indicator
```
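
The `int(x[:2])` mapping used for the department column assumes a numeric two-character prefix. Corsican commune codes start with `2A` or `2B`, which `int()` rejects, and overseas codes start with `97` or `98` and use a three-character prefix. The helper below is a sketch of a string-based alternative; `department_from_insee` is a hypothetical name and is not part of this notebook.

```python
def department_from_insee(code):
    """Return the department code of an INSEE commune code, or None.

    Hypothetical helper: the result is kept as a string so that
    non-numeric prefixes such as "2A" remain valid.
    """
    if not code:
        return None
    code = code.strip().upper()
    # Overseas territories use a three-character prefix (971, 972, ...)
    if code[:2] in ("97", "98"):
        return code[:3]
    # Metropolitan France, including Corsica ("2A" and "2B")
    return code[:2]
```

It could be applied with `codes_insee["code_insee_commune"].map(department_from_insee)`. The `department` column would then hold text rather than `int32`, so the cast and any query that filters on the column would need to compare against strings such as `'75'`.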

```python
# Save the indicator: the table is created if missing, otherwise rows are appended
indicator.to_sql("IDepartmentDynamic", engine, if_exists="append")
```

```python
# Check inserted results
query = 'SELECT * FROM "IDepartmentDynamic" WHERE department = 75'
paris = pd.read_sql_query(query, engine)
paris
```
