Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ratio Metrics / Derived Metrics #1297

Open
shangyian opened this issue Jan 30, 2025 · 1 comment
Open

Ratio Metrics / Derived Metrics #1297

shangyian opened this issue Jan 30, 2025 · 1 comment

Comments

@shangyian
Copy link
Contributor

A derived metric is defined using an expression that references existing metrics. Existing metrics are defined as a single aggregation function applied to a measure. However, to enable more complex analytical use cases, we can introduce derived metrics (by relaxing the requirements for our metrics today), which allow users to compose existing metrics into new ones, such as ratio metrics and other mathematical transformations.

Example

Let's consider two source nodes that represent events:

errors:
- user_id
- device_id
- timestamp

views:
- user_id
- is_valid
- timestamp
- device_id

We have a few regular metrics:

- name: total_errors
  query: SELECT COUNT(1) FROM errors
  required_dimensions: user_id

- name: total_views
  query: SELECT COUNT(is_valid) FROM views
  required_dimensions: user_id

- name: android_errors
  query: SELECT COUNT(device_id = 'Android') FROM errors 
  required_dimensions: user_id

We can define ratio metrics -- they reference the two regular metrics from above:

- name: error_rate
  query: SELECT SUM(total_errors) / SUM(total_views)

- name: android_errors
  query: SELECT SUM(android_errors) / SUM(total_errors)
@agorajek
Copy link
Member

agorajek commented Jan 30, 2025

Unless I am missing something I feel like the user_id is unnecessarily used as a required dimension in your metric examples.

But in general I like the idea and I wanted to extend the example to see if we can mix & match the required dimensions, e.g.:

- name: total_errors
  query: SELECT COUNT(1) FROM errors  # implicit GROUP BY NULL
  required_dimensions: None

- name: errors_by_device
  query: SELECT COUNT(1) FROM errors  # implicit GROUP BY device_id
  required_dimensions: device_id

- name: error_ratio_by_device (derived metric)
  query: errors_by_device / total_errors  # implicit GROUP BY device_id
  required_dimensions: device_id

And to clarify how the error_ratio_by_device (derived metric) should be computed:

SELECT
  errors_by_device.device_id, ANY(errors_by_device.errors_by_device / total_errors.total_errors)
FROM
 (
    SELECT COUNT(1) AS total_errors 
    FROM errors GROUP BY NULL
  ) AS total_errors
 JOIN
 (
    SELECT device_id, COUNT(1) AS errors_by_device 
    FROM errors GROUP BY device_id
  ) AS errors_by_device
ON 
  1 = 1 # nothing else because no dimensions are shared.
GROUP BY 
  errors_by_device.device_id

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants