Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event flows monitoring #4744

Merged
merged 11 commits into from
Jan 11, 2024
Merged

Event flows monitoring #4744

merged 11 commits into from
Jan 11, 2024

Conversation

scholtzan
Copy link
Collaborator

@scholtzan scholtzan commented Dec 22, 2023

Checklist for reviewer:

  • Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title).
  • If the PR comes from a fork, trigger integration CI tests by running the Push to upstream workflow and provide the <username>:<branch> of the fork as parameter. The parameter will also show up
    in the logs of the manual-trigger-required-for-fork CI task together with more detailed instructions.
  • If adding a new field to a query, ensure that the schema and dependent downstream schemas have been updated.
  • When adding a new derived dataset, ensure that data is not available already (fully or partially) and recommend extending an existing dataset in favor of creating new ones. Data can be available in the bigquery-etl repository, looker-hub or in looker-spoke-default.

For modifications to schemas in restricted namespaces (see CODEOWNERS):

┆Issue is synchronized with this Jira Task

@dataops-ci-bot

This comment has been minimized.

@scholtzan scholtzan marked this pull request as ready for review January 4, 2024 22:13
@dataops-ci-bot

This comment has been minimized.

@dataops-ci-bot

This comment has been minimized.

Comment on lines +153 to +187
MERGE
`{{ project_id }}.{{ target_table }}` r
USING
event_flows f
ON
r.flow_id = f.flow_id
-- look back up to 3 days to see if a flow has seen new events and needs to be replaced
AND r.submission_date > DATE_SUB(@submission_date, INTERVAL 3 DAY)
WHEN NOT MATCHED
THEN
INSERT
(submission_date, flow_id, events, normalized_app_name, channel, flow_hash)
VALUES
(f.submission_date, f.flow_id, f.events, f.normalized_app_name, f.channel, f.flow_hash)
WHEN NOT MATCHED BY SOURCE
-- look back up to 3 days to see if a flow has seen new events and needs to be replaced
AND r.submission_date > DATE_SUB(@submission_date, INTERVAL 3 DAY)
THEN
DELETE;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround for #4783

@dataops-ci-bot

This comment has been minimized.

@dataops-ci-bot

This comment has been minimized.

Copy link
Collaborator

@akkomar akkomar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+wc

@dataops-ci-bot

This comment has been minimized.

@scholtzan scholtzan enabled auto-merge (squash) January 10, 2024 23:11
@dataops-ci-bot

This comment has been minimized.

@dataops-ci-bot

This comment has been minimized.

@dataops-ci-bot
Copy link

Integration report for "Merge branch 'main' into event_flows_monitoring"

sql.diff

Click to expand!
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/dags/bqetl_monitoring.py /tmp/workspace/generated-sql/dags/bqetl_monitoring.py
--- /tmp/workspace/main-generated-sql/dags/bqetl_monitoring.py	2024-01-11 19:35:04.000000000 +0000
+++ /tmp/workspace/generated-sql/dags/bqetl_monitoring.py	2024-01-11 19:34:43.000000000 +0000
@@ -168,6 +168,19 @@
         depends_on_past=False,
     )
 
+    monitoring_derived__event_flow_monitoring_aggregates__v1 = bigquery_etl_query(
+        task_id="monitoring_derived__event_flow_monitoring_aggregates__v1",
+        destination_table=None,
+        dataset_id="monitoring_derived",
+        project_id="moz-fx-data-shared-prod",
+        owner="[email protected]",
+        email=["[email protected]", "[email protected]"],
+        date_partition_parameter=None,
+        depends_on_past=False,
+        parameters=["submission_date:DATE:"],
+        sql_file_path="sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/script.sql",
+    )
+
     monitoring_derived__event_monitoring_aggregates__v1 = bigquery_etl_query(
         task_id="monitoring_derived__event_monitoring_aggregates__v1",
         destination_table="event_monitoring_aggregates_v1",
@@ -335,6 +348,10 @@
         wait_for_copy_deduplicate_all
     )
 
+    monitoring_derived__event_flow_monitoring_aggregates__v1.set_upstream(
+        wait_for_copy_deduplicate_all
+    )
+
     monitoring_derived__event_monitoring_aggregates__v1.set_upstream(
         wait_for_copy_deduplicate_all
     )
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived: event_flow_monitoring_aggregates_v1
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/metadata.yaml	2024-01-11 19:32:17.000000000 +0000
@@ -0,0 +1,36 @@
+friendly_name: Event Flow Monitoring Aggregates
+description: |-
+  Aggregates of event flows (funnels based on flow_id) in event pings coming from all Glean apps.
+owners:
+- [email protected]
+- [email protected]
+labels:
+  incremental: true
+  dag: bqetl_monitoring
+  owner1: akomar
+  owner2: ascholtz
+scheduling:
+  dag_name: bqetl_monitoring
+  referenced_tables:
+  - - moz-fx-data-shared-prod
+    - '*_stable'
+    - events_v1
+  date_partition_parameter: null
+  parameters:
+  - 'submission_date:DATE:'
+bigquery:
+  time_partitioning:
+    type: day
+    field: submission_date
+    require_partition_filter: true
+    expiration_days: null
+  clustering:
+    fields:
+    - normalized_app_name
+    - channel
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+references: {}
+deprecated: false
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/schema.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/schema.yaml	2024-01-11 19:23:08.000000000 +0000
@@ -0,0 +1,54 @@
+fields:
+- mode: NULLABLE
+  name: submission_date
+  type: DATE
+  description: The date when flow was captured
+- mode: NULLABLE
+  name: flow_id
+  type: STRING
+  description: Unique identifier for the specific flow
+- fields:
+  - fields:
+    - name: category
+      type: STRING
+      description: Event category
+    - name: name
+      type: STRING
+      description: Event name
+    - name: timestamp
+      type: TIMESTAMP
+      description: Event timestamp
+    mode: NULLABLE
+    name: source
+    type: RECORD
+    description: Source event
+  - fields:
+    - name: category
+      type: STRING
+      description: Event category
+    - name: name
+      type: STRING
+      description: Event name
+    - name: timestamp
+      type: TIMESTAMP
+      description: Event timestamp
+    mode: NULLABLE
+    name: target
+    type: RECORD
+    description: Target event
+  mode: REPEATED
+  name: events
+  type: RECORD
+  description: Flow events
+- mode: NULLABLE
+  name: normalized_app_name
+  type: STRING
+  description: The name of the app the event flow is coming from
+- mode: NULLABLE
+  name: channel
+  type: STRING
+  description: The app channel
+- mode: NULLABLE
+  name: flow_hash
+  type: STRING
+  description: Hash of the complete event flow
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/script.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/script.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/script.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/event_flow_monitoring_aggregates_v1/script.sql	2024-01-11 19:30:08.000000000 +0000
@@ -0,0 +1,920 @@
+-- Generated via ./bqetl generate glean_usage
+-- This table aggregates event flows across Glean applications.
+DECLARE dummy INT64; -- dummy variable to indicate to BigQuery this is a script
+CREATE TEMP TABLE
+  event_flows(
+    submission_date DATE,
+    flow_id STRING,
+    normalized_app_name STRING,
+    channel STRING,
+    events ARRAY<
+      STRUCT<
+        source STRUCT<category STRING, name STRING, timestamp TIMESTAMP>,
+        target STRUCT<category STRING, name STRING, timestamp TIMESTAMP>
+      >
+    >,
+    flow_hash STRING
+  ) AS (
+    -- get events from all apps that are related to some flow (have 'flow_id' in event_extras)
+    WITH all_app_events AS (
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_desktop" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_desktop.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_desktop_background_update" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_desktop_background_update.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_desktop_background_defaultagent" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_desktop_background_defaultagent.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "pine" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.pine.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "fenix" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.fenix.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_ios" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_ios.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "reference_browser" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.reference_browser.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_fire_tv" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_fire_tv.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_reality" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_reality.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "lockwise_android" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.lockwise_android.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "lockwise_ios" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.lockwise_ios.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "mozregression" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.mozregression.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "burnham" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.burnham.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "mozphab" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.mozphab.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_echo_show" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_echo_show.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_reality_pc" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_reality_pc.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "mach" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.mach.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "focus_ios" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.focus_ios.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "klar_ios" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.klar_ios.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "focus_android" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.focus_android.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "klar_android" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.klar_android.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "bergamot" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.bergamot.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_translations" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_translations.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "mozilla_vpn" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.mozilla_vpn.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "mozillavpn_backend_cirrus" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.mozillavpn_backend_cirrus.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "glean_dictionary" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.glean_dictionary.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "mdn_yari" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.mdn_yari.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "bedrock" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.bedrock.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "viu_politica" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.viu_politica.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "treeherder" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.treeherder.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "firefox_desktop_background_tasks" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.firefox_desktop_background_tasks.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "accounts_frontend" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.accounts_frontend.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "accounts_backend" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.accounts_backend.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "monitor_cirrus" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.monitor_cirrus.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "debug_ping_view" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.debug_ping_view.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "monitor_frontend" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.monitor_frontend.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "moso_mastodon_backend" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.moso_mastodon_backend.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "moso_mastodon_web" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.moso_mastodon_web.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "tiktokreporter_ios" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.tiktokreporter_ios.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+      UNION ALL
+      SELECT DISTINCT
+        @submission_date AS submission_date,
+        ext.value AS flow_id,
+        event_category AS category,
+        event_name AS name,
+        TIMESTAMP_ADD(
+          submission_timestamp,
+          -- limit event.timestamp, otherwise this will cause an overflow
+          INTERVAL LEAST(event_timestamp, 20000000000000) MILLISECOND
+        ) AS timestamp,
+        "tiktokreporter_android" AS normalized_app_name,
+        client_info.app_channel AS channel
+      FROM
+        `moz-fx-data-shared-prod.tiktokreporter_android.events_unnested`,
+        UNNEST(event_extra) AS ext
+      WHERE
+        DATE(submission_timestamp) = @submission_date
+        AND ext.key = "flow_id"
+    ),
+    -- determine events that belong to the same flow
+    new_event_flows AS (
+      SELECT
+        @submission_date AS submission_date,
+        flow_id,
+        normalized_app_name,
+        channel,
+        ARRAY_AGG(
+          (
+            SELECT AS STRUCT
+              category AS category,
+              name AS name,
+              timestamp AS timestamp
+            LIMIT
+              100 -- limit number of events considered
+          )
+          ORDER BY
+            timestamp
+        ) AS events
+      FROM
+        all_app_events
+      GROUP BY
+        flow_id,
+        normalized_app_name,
+        channel
+    ),
+    unnested_events AS (
+      SELECT
+        new_event_flows.*,
+        event,
+        event_offset
+      FROM
+        new_event_flows,
+        UNNEST(events) AS event
+        WITH OFFSET AS event_offset
+    ),
+    -- create source -> target event pairs based on the order of when the events were seen
+    source_target_events AS (
+      SELECT
+        prev_event.flow_id,
+        prev_event.normalized_app_name,
+        prev_event.channel,
+        ARRAY_AGG(
+          STRUCT(prev_event.event AS source, cur_event.event AS target)
+          ORDER BY
+            prev_event.event.timestamp
+        ) AS events
+      FROM
+        unnested_events AS prev_event
+      INNER JOIN
+        unnested_events AS cur_event
+      ON
+        prev_event.flow_id = cur_event.flow_id
+        AND prev_event.event_offset = cur_event.event_offset - 1
+      GROUP BY
+        flow_id,
+        normalized_app_name,
+        channel
+    )
+    SELECT
+      @submission_date AS submission_date,
+      flow_id,
+      normalized_app_name,
+      channel,
+      ARRAY_AGG(event ORDER BY event.source.timestamp) AS events,
+      -- create a flow hash that concats all the events that are part of the flow
+      -- <event_category>.<event_name> -> <event_category>.<event_name> -> ...
+      ARRAY_TO_STRING(
+        ARRAY_CONCAT(
+          ARRAY_AGG(
+            CONCAT(
+              IF(event.source.category IS NOT NULL, CONCAT(event.source.category, "."), ""),
+              event.source.name
+            )
+            ORDER BY
+              event.source.timestamp
+          ),
+          [
+            ARRAY_REVERSE(
+              ARRAY_AGG(
+                CONCAT(
+                  IF(event.target.category IS NOT NULL, CONCAT(event.target.category, "."), ""),
+                  event.target.name
+                )
+                ORDER BY
+                  event.source.timestamp
+              )
+            )[SAFE_OFFSET(0)]
+          ]
+        ),
+        " -> "
+      ) AS flow_hash
+    FROM
+      (
+        SELECT
+          flow_id,
+          normalized_app_name,
+          channel,
+          event
+        FROM
+          source_target_events,
+          UNNEST(events) AS event
+        UNION ALL
+        -- some flows might go over multiple days;
+        -- use previously seen flows and combine with new flows
+        SELECT
+          flow_id,
+          normalized_app_name,
+          channel,
+          event
+        FROM
+          `moz-fx-data-shared-prod.monitoring_derived.event_flow_monitoring_aggregates_v1`,
+          UNNEST(events) AS event
+        WHERE
+          submission_date > DATE_SUB(@submission_date, INTERVAL 3 DAY)
+      )
+    GROUP BY
+      flow_id,
+      normalized_app_name,
+      channel
+  );
+
+MERGE
+  `moz-fx-data-shared-prod.monitoring_derived.event_flow_monitoring_aggregates_v1` r
+USING
+  event_flows f
+ON
+  r.flow_id = f.flow_id
+  -- look back up to 3 days to see if a flow has seen new events and needs to be replaced
+  AND r.submission_date > DATE_SUB(@submission_date, INTERVAL 3 DAY)
+WHEN NOT MATCHED
+THEN
+  INSERT
+    (submission_date, flow_id, events, normalized_app_name, channel, flow_hash)
+  VALUES
+    (f.submission_date, f.flow_id, f.events, f.normalized_app_name, f.channel, f.flow_hash)
+  WHEN NOT MATCHED BY SOURCE
+    -- look back up to 3 days to see if a flow has seen new events and needs to be replaced
+    AND r.submission_date > DATE_SUB(@submission_date, INTERVAL 3 DAY)
+THEN
+  DELETE;
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry/releases_latest/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry/releases_latest/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry/releases_latest/schema.yaml	2024-01-11 19:33:53.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry/releases_latest/schema.yaml	2024-01-11 19:26:46.000000000 +0000
@@ -2,21 +2,18 @@
 - name: date
   type: DATE
   mode: NULLABLE
-  description: null
 - name: product
   type: STRING
   mode: NULLABLE
 - name: category
   type: STRING
   mode: NULLABLE
-  description: null
 - name: channel
   type: STRING
   mode: NULLABLE
 - name: build_number
   type: INTEGER
   mode: NULLABLE
-  description: null
 - name: release_date
   type: DATE
   mode: NULLABLE

Link to full diff

@scholtzan scholtzan merged commit 8dd0e09 into main Jan 11, 2024
13 of 14 checks passed
@scholtzan scholtzan deleted the event_flows_monitoring branch January 11, 2024 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants