translate app4

ThomasFaria committed Jul 29, 2024
1 parent a184d31 commit d24abad
Showing 2 changed files with 141 additions and 10 deletions.
89 changes: 83 additions & 6 deletions slides/en/applications/_application4a.qmd
@@ -1,9 +1,86 @@
:::{.callout-tip collapse="true" icon=false}
## Introduction to MLflow concepts

:::{.incremental}
1. In `JupyterLab`, open the notebook located at `formation-mlops/notebooks/mlflow-introduction.ipynb`
2. Execute the notebook cell by cell. If you are finished early, explore the `MLflow` UI and try to build your own experiments from the example code provided in the notebook.
:::
:::

:::{.nonincremental}
:::: {.callout-tip collapse="true" icon=false}
## Part 1: Logging business metrics

1. Using the [logging](https://docs.python.org/3/library/logging.html) package, add logs to your API. For each request, display the label to be coded as well as the responses returned by your API. To do this, modify the `app/main.py` file.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the steps to complete </b></font>
</summary>

1. Import the logging package:

```{.python filename="main.py"}
import logging
```

2. Set up your logging configuration before defining your first entry point:

```{.python filename="main.py"}
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("log_file.log"),
        logging.StreamHandler(),
    ],
)
```

3. Add the label and the API response to your logs:

```{.python filename="main.py"}
# Logging
logging.info(f"{{'Query': {description}, 'Response': {predictions[0]}}}")
```

</details>

2. Commit your changes and push them to your remote repository.

3. Whenever you make a change to your API, it needs to be redeployed for the changes to take effect. In theory, we would need to rebuild a new image for our API containing the latest adjustments. To simplify, we have already built two images for the API, with and without logs. Until now you have used the image without logs; redeploy your API using the image with logs, tagged as `logs`.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the steps to complete </b></font>
</summary>

1. In the `kubernetes/deployment.yml` file, replace the `no-logs` tag with the `logs` tag:

```{.yaml code-line-numbers="8" filename="deployment.yml"}
template:
  metadata:
    labels:
      app: codification-api
  spec:
    containers:
      - name: api
        image: inseefrlab/formation-mlops:logs
        imagePullPolicy: Always
```

2. Commit your changes and push them to your remote repository.

3. Wait 5 minutes for `ArgoCD` to automatically synchronize the changes from your GitHub repository, or force the synchronization.

</details>

4. Run your `predict-api.py` script.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the command </b></font>
</summary>

```shell
python formation-mlops/src/predict-api.py
```
</details>

5. In ArgoCD, open your application and click on the pod whose name starts with `codification-api-...`. Observe the logs.

::::
:::

62 changes: 58 additions & 4 deletions slides/en/applications/_application4b.qmd
@@ -1,9 +1,63 @@
:::{.callout-tip collapse="true" icon=false}
## Part 2: Creating a monitoring dashboard

:::::{.nonincremental}

1. We will use [`Quarto Dashboards`](https://quarto.org/docs/dashboards/). Open the `dashboard/index.qmd` file and inspect the code. To retrieve the data needed to create the dashboard, we use a *serverless* DBMS: `DuckDB`. `DuckDB` allows us to run `SQL` queries on a `.parquet` file containing parsed logs. This file contains one row per prediction, with the variables `timestamp`, `text`, `prediction_1`, `proba_1`, `prediction_2`, and `proba_2`.

2. To visualize the dashboard, enter the following commands in a `Terminal` from the project root and click on the generated link.

```sh
cd dashboard
quarto preview index.qmd
```

3. Currently, the percentage of predictions with a probability greater than 0.8 does not correspond to reality. Modify the SQL query that computes the `pct_predictions` variable so that the correct value is displayed.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the answer </b></font>
</summary>

```python
pct_predictions = duckdb.sql(
    """
    SELECT 100 * COUNT(
        CASE WHEN proba_1 > 0.8 THEN 1 END
    ) / COUNT(*)
    FROM data;
    """
).fetchall()[0][0]
```

</details>

4. The two charts at the bottom of the dashboard are also incorrect. Modify the SQL query that computes the `daily_stats` variable so that the correct charts are displayed.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the answer </b></font>
</summary>

```python
daily_stats = duckdb.sql(
    """
    SELECT
        CAST(timestamp AS DATE) AS date,
        COUNT(*) AS n_liasses,
        (
            COUNT(
                CASE WHEN data.proba_1 > 0.8 THEN 1 END
            ) * 100.0 / COUNT(*)
        ) AS pct_high_proba
    FROM data
    GROUP BY CAST(timestamp AS DATE);
    """
).to_df()
```

</details>

5. Notice the changes made to the dashboard.
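For context, the parsed-logs `.parquet` file queried above is produced from the raw API logs of Part 1. A hypothetical sketch of that parsing step (the regular expression and the response keys are assumptions for illustration, not the project's actual parser):

```python
import re
from ast import literal_eval
from datetime import datetime

# Matches log lines of the form produced in Part 1:
# "2024-07-29 10:00:00,123 - INFO - {'Query': ..., 'Response': {...}}"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>[\d-]+ [\d:,]+) - INFO - "
    r"\{'Query': (?P<text>.*), 'Response': (?P<response>\{.*\})\}$"
)

def parse_log_line(line: str) -> dict:
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"Unparseable log line: {line!r}")
    # The response is a Python dict literal written by the API's f-string
    response = literal_eval(match.group("response"))
    return {
        "timestamp": datetime.strptime(
            match.group("timestamp"), "%Y-%m-%d %H:%M:%S,%f"
        ),
        "text": match.group("text"),
        "prediction_1": response["label"],
        "proba_1": response["proba"],
    }

row = parse_log_line(
    "2024-07-29 10:00:00,123 - INFO - "
    "{'Query': BOULANGERIE, 'Response': {'label': '1071C', 'proba': 0.95}}"
)
```

One such dictionary per request, written to a `.parquet` file, yields exactly the columns (`timestamp`, `text`, `prediction_1`, `proba_1`, ...) that the dashboard's SQL queries expect.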

:::::

:::
