Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backend): Demonstrate metrics #1488

Draft
wants to merge 4 commits into
base: dev
Choose a base branch
from
Draft

feat(backend): Demonstrate metrics #1488

wants to merge 4 commits into from

Conversation

sylvlecl
Copy link
Member

@sylvlecl sylvlecl commented Apr 28, 2023

Description

This is a draft for defining metrics and exposing them to a prometheus instance, in order to improve the application observability.

The only metrics that are defined fro now are metrics about HTTP requests (count, duration).

Some technical details:

  • metrics are exposed at the /metrics endpoint
  • a "worker_id" label is used to enable the distinction between workers
  • since our app works with multiple processes (workers), we need to aggregate the workers metrics. For that purpose we use the facility provided by prometheus client, which uses a temporary folder to store individual workers metrics and then aggregate them. This requires the definition of an additional env variable.

In the future, we could add metrics about various things: DB sessions, tasks durations, ...

In the long run, using opentelemetry could be an alternative.

As a very basic illustration of grah in grafana:
image

@commit-lint
Copy link

commit-lint bot commented Apr 28, 2023

Features

  • backend: expose prometheus metrics (436e6d5)

Contributors

sylvlecl

Commit-Lint commands

You can trigger Commit-Lint actions by commenting on this PR:

  • @Commit-Lint merge patch will merge dependabot PR on "patch" versions (X.X.Y - Y change)
  • @Commit-Lint merge minor will merge dependabot PR on "minor" versions (X.Y.Y - Y change)
  • @Commit-Lint merge major will merge dependabot PR on "major" versions (Y.Y.Y - Y change)
  • @Commit-Lint merge disable will desactivate merge dependabot PR
  • @Commit-Lint review will approve dependabot PR
  • @Commit-Lint stop review will stop approve dependabot PR

Comment on lines 376 to 378
@staticmethod
def from_dict(data: JSON) -> "PrometheusConfig":
return PrometheusConfig(multiprocess=bool(data["multiprocess"]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pythonic way to implement multiple constructors is to use @classmethod. See this article about "Providing Multiple Constructors in Your Python Classes".

@classmethod
def from_dict(cls, data: JSON) -> "PrometheusConfig":
    return cls(multiprocess=bool(data["multiprocess"]))

Comment on lines 389 to 395
@staticmethod
def from_dict(data: JSON) -> "MetricsConfig":
return MetricsConfig(
prometheus=PrometheusConfig.from_dict(data["prometheus"])
if "prometheus" in data
else None
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use @classmethod instead

Comment on lines 234 to 235
def __init__(self, message: str) -> None:
super().__init__(message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not really useful unless you add some information, for instance, you can use the message template here (I mean "Environment variable {_PROMETHEUS_MULTIPROCESS_ENV_VAR} must be defined for use of prometheus in a multiprocess environment")

logger = logging.getLogger(__name__)


_PROMETHEUS_MULTIPROCESS_ENV_VAR = "PROMETHEUS_MULTIPROC_DIR"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a short comment to explain the purpose of this environment variable, for instance:

# The `PROMETHEUS_MULTIPROC_DIR` environment variable is used by
# the Python Prometheus client library to configure process-level metrics
# when running in a multi-process environment.

Comment on lines 76 to 79
if _PROMETHEUS_MULTIPROCESS_ENV_VAR not in os.environ:
raise ConfigurationError(
f"Environment variable {_PROMETHEUS_MULTIPROCESS_ENV_VAR} must be defined for use of prometheus in a multiprocess environment"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check that this environment variable exists but it is not used directly in the code (but only in the client). So, a little explanation is needed.
Moreover, we should also check that it is a directory that exists (with the right permissions).

@@ -8,7 +8,7 @@ checksumdir~=1.2.0
click~=8.0.3
contextvars~=2.4
fastapi-jwt-auth~=0.5.0
fastapi[all]~=0.73.0
fastapi[all]~=0.74.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: A version upgrade would be welcome to take advantage of the latest developments that simplify the update of the Swagger API documentation. But, we need to analyse the impacts...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! This was the minimum upgrade to get the "route" information from the request

"""
Put the worker_id into an env variable for further use within the app.
"""
os.environ["APP_WORKER_ID"] = str(worker.age)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I had a choice, I would prefer GUNICORN_WORKER_ID for that.

Why do you use worker.age instead of worker.id?

worker.age is a number that indicates how many requests the worker process has handled since it was started.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understood, worker.age is a counter incremented every time gunicorn creates a new worker (for ex. when one dies).
Using the age here allows to have an almost stable set of worker IDs between application runs (after a restart of the app, for example, we sill have IDs in the range [1-8] for 8 workers), to have mostly stable metrics labels.
I don't see any worker.id, we have the PID as an alternative, but it's not stable at all between runs.

@laurent-laporte-pro
Copy link
Contributor

@sylvlecl : do you know about that : https://autometrics.dev/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants