:::{.nonincremental}
:::: {.callout-tip collapse="true" icon=false}
## Part 1: Logging business metrics

1. Using the [logging](https://docs.python.org/3/library/logging.html) package, add logs to your API. For each request, log the label to be coded as well as the responses returned by your API. To do this, modify the `app/main.py` file.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the steps to complete</b></font>
</summary>

1. Import the logging package:

```{.python filename="main.py"}
import logging
```

2. Set up your logging configuration before defining your first entry point:

```{.python filename="main.py"}
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("log_file.log"),
        logging.StreamHandler(),
    ],
)
```

3. Add the label and the API response to your logs:

```{.python filename="main.py"}
# Logging
logging.info(f"{{'Query': {description}, 'Response': {predictions[0]}}}")
```

</details>

2. Commit your changes and push them to your remote repository.

3. Whenever you make a change to your API, it needs to be redeployed for the changes to take effect. In theory, we would need to rebuild a new image of the API containing the latest adjustments. To keep things simple, we have already built both images, with and without logs. So far you have used the image without logs; now redeploy your API using the image with logs, tagged `logs`.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the steps to complete</b></font>
</summary>

1. In the `kubernetes/deployment.yml` file, replace the `no-logs` tag with the `logs` tag:

```{.yaml code-line-numbers="8" filename="deployment.yml"}
template:
  metadata:
    labels:
      app: codification-api
  spec:
    containers:
      - name: api
        image: inseefrlab/formation-mlops:logs
        imagePullPolicy: Always
```

2. Commit your changes and push them to your remote repository.

3. Wait 5 minutes for `ArgoCD` to automatically synchronize the changes from your GitHub repository, or force the synchronization.

</details>

4. Run your `predict-api.py` script.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the command</b></font>
</summary>

```shell
python formation-mlops/src/predict-api.py
```

</details>

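Conceptually, `predict-api.py` just sends labels to the API's prediction endpoint. A minimal sketch of how such a client might build its request URLs, assuming a hypothetical `/predict` endpoint with hypothetical `description` and `nb_echoes_max` query parameters (check the actual script for the real route and parameter names):

```python
from urllib.parse import urlencode

def build_predict_url(base_url: str, description: str, nb_echoes_max: int = 2) -> str:
    # Hypothetical parameter names -- see predict-api.py for the real ones
    query = urlencode({"description": description, "nb_echoes_max": nb_echoes_max})
    return f"{base_url}/predict?{query}"

url = build_predict_url("https://codification-api.example.com", "vente de fleurs")
print(url)
# → https://codification-api.example.com/predict?description=vente+de+fleurs&nb_echoes_max=2
```

Each such request, once the `logs` image is deployed, produces one `Query`/`Response` line in the pod's logs.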
5. In ArgoCD, open your application and click on the pod whose name starts with `codification-api-`. Observe the logs.

::::
:::
:::{.callout-tip collapse="true" icon=false}
## Part 2: Creating a monitoring dashboard

:::::{.nonincremental}

1. We will use [`Quarto Dashboards`](https://quarto.org/docs/dashboards/). Open the `dashboard/index.qmd` file and inspect the code. To retrieve the data needed to create the dashboard, we use a *serverless* DBMS: `DuckDB`. `DuckDB` allows us to run `SQL` queries on a `.parquet` file containing parsed logs. This file contains one row per prediction, with the variables `timestamp`, `text`, `prediction_1`, `proba_1`, `prediction_2`, and `proba_2`.

2. To visualize the dashboard, enter the following commands in a `Terminal` from the project root and click on the generated link:

```sh
cd dashboard
quarto preview index.qmd
```

3. Currently, the percentage of predictions with a probability greater than 0.8 does not correspond to reality. Modify the SQL query that computes the `pct_predictions` variable so that the correct value is displayed.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the answer </b></font>
</summary>

```python
pct_predictions = duckdb.sql(
    """
    SELECT 100 * COUNT(CASE WHEN proba_1 > 0.8 THEN 1 END) / COUNT(*)
    FROM data;
    """
).fetchall()[0][0]
```

</details>

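The percentage in question is simply the share of log rows whose `proba_1` exceeds 0.8. As a sanity check, it can be recomputed in pure Python over a few hypothetical probability values:

```python
# Hypothetical proba_1 values from the parsed logs
probas = [0.95, 0.60, 0.85, 0.90]

# Share of predictions with proba_1 above 0.8
pct_predictions = 100.0 * sum(p > 0.8 for p in probas) / len(probas)
print(pct_predictions)  # → 75.0
```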
4. The two charts at the bottom of the dashboard are also incorrect. Modify the SQL query that computes the `daily_stats` variable so that the correct charts are displayed.

<details>
<summary>
<font size="3" color="darkgreen"><b>Click to see the answer </b></font>
</summary>

```python
daily_stats = duckdb.sql(
    """
    SELECT
        CAST(timestamp AS DATE) AS date,
        COUNT(*) AS n_liasses,
        (
            COUNT(
                CASE WHEN data.proba_1 > 0.8 THEN 1 END
            ) * 100.0 / COUNT(*)
        ) AS pct_high_proba
    FROM data
    GROUP BY CAST(timestamp AS DATE);
    """
).to_df()
```

</details>

5. Notice the changes made to the dashboard.

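The `daily_stats` query groups log rows by day and computes, for each day, the number of records and the share with `proba_1` above 0.8. The same aggregation can be reproduced in pure Python over a few hypothetical rows, which is a quick way to check what the charts should display:

```python
from datetime import date

# Hypothetical parsed-log rows: (timestamp as a date, proba_1)
rows = [
    (date(2024, 1, 1), 0.95),
    (date(2024, 1, 1), 0.60),
    (date(2024, 1, 2), 0.85),
    (date(2024, 1, 2), 0.90),
]

# Per day: (number of rows, number with proba_1 > 0.8)
daily_stats = {}
for d, proba in rows:
    n, high = daily_stats.get(d, (0, 0))
    daily_stats[d] = (n + 1, high + (proba > 0.8))

for d, (n, high) in sorted(daily_stats.items()):
    print(d, n, round(100.0 * high / n, 1))
# → 2024-01-01 2 50.0
# → 2024-01-02 2 100.0
```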
:::::
:::