-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DENG-6880 - Daily aggregations for NewTab #6863
Conversation
@gkatre can we combine |
This comment has been minimized.
This comment has been minimized.
...moz-fx-data-shared-prod/telemetry_derived/newtab_conditional_daily_aggregates_v1/schema.yaml
Outdated
Show resolved
Hide resolved
…al_daily_aggregates_v1/schema.yaml Co-authored-by: Ben Wu <[email protected]>
This comment has been minimized.
This comment has been minimized.
@alekhyamoz the reason i did not combine
The below listed fields used in the granularities are expected to be mostly mutually exclusive so as to not double count the distinct clients and visits. Example: A client who has
cc: @bani : Would you also be able to validate these conditional computations in the |
@gkatre in that case can we add the columns from |
We can go the other way. We can rollup and add the columns from
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Thank you for working on this G. I will also let @bani review this.
) AS pocket_signed_in_client_count, | ||
COUNT( | ||
DISTINCT IF(channel = 'release' AND pocket_is_signed_in, newtab_visit_id, NULL) | ||
) AS pocket_signed_in_visit_count, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need one for counting just homepage category enabled/disabled independent of source?
Maybe I missed it but I couldn't find.
For this widget: https://mozilla.cloud.looker.com/explore/firefox_desktop/newtab_interactions?qid=qOIDEWK6Oo10MxPDZcg7rn&origin_space=464&toggle=vis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for this widget with newtab_category: https://mozilla.cloud.looker.com/explore/firefox_desktop/newtab_interactions?qid=L3HcgEzCJWLQBVDaYeVVbg&origin_space=464&toggle=vis
I think due to the count distinct we can't just SUM home vs newtab in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have it here:
Would this work? Or perhaps I could add newtab_homepage_category
and/or newtab_newtab_category
to the granularity (GROUP BY) if you feel its ok (that is: we will not end up double counting clients and visits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was that all aggregations of that field have either newtab_open_source = 'about:home'
or newtab_open_source = 'about:newtab'
but the query on the explore doesn't include anything about newtab_open_source and I'm not sure how that affects the numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Just to be safe, instead of adding newtab_homepage_category
and newtab_newtab_category
as a dimension in the GROUP BY, I will add a set of 2 distinct count metrics:
- one set for
newtab_homepage_category
enabled - one set for
newtab_newtab_category
enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bani added the aggregates for newtab_homepage_category enabled and newtab_newtab_category enabled
@alekhyamoz if there are any views to be updated, we can always do those anytime later. We dont need to hold up on the deployment/backfills for the retention project. As long as we are ok with the way these 3 tables (newtab_clients_daily_aggregates_v1, newtab_daily_interactions_aggregates_v1, newtab_conditional_daily_aggregates_v1) are created, we could perhaps proceed? |
topsites_rows, | ||
is_new_profile, | ||
activity_segment, | ||
COUNT(DISTINCT client_id) AS client_count, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This data will be an issue since it'll be quite common to have a client doing interactions across different dimensions in a day. But it doesn't seem feasible to create tables for each so we may need to just double check is stakeholders if there are any specific ones that really require no double counting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. If you feel by having it is confusing, I can remove these count(distinct --) fields from that table:
- client_count
- legacy_telemetry_client_count
- newtab_interactions_visits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to keep it as that's the closes we have to what is currently in the dashboard. It'll just be overcounting in certain cases once we SUM.
submission_date, | ||
search_engine, | ||
search_access_point, | ||
pocket_story_position, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one dimension I'm particularly worried about inflating all distinct client counts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that table is too granular. Therefore the distinct counts may not go well in that table because its not possible to rollup those values. I can remove the 3 count distinct fields, if you feel thats the right thing to do.
We have the conditional aggregate table for such distinct counts where needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, good to know that we covered all use cases.
Do we have something for |
The |
It's also worth highlighting to stakeholders that they have a lot of "COUNT DISTINCT" across time, and this is not something we'll be able to replicate for different timeframes in the past. |
It seems the dashboard is using some columns from But it's unclear if there is an use case for historical data there due to the same issue of unclear timeframes for aggregation. EDIT: this one actually look at the % over time, is this data point covered? https://mozilla.cloud.looker.com/explore/firefox_desktop/newtab_visits?qid=dh3tzf1nHzKbGVOATeQKrM&origin_space=464&toggle=vis |
@bani I agree completely! I also feel doing that I am also not sure what the stakeholders expectations are on historic data and if requirements are going to be relatively static to justify the need to build materialized aggregation tables. |
@bani You are correct that |
@alekhyamoz / @bani Is there anything you feel is a blocker to proceed with deploying and start backfilling these tables? |
This comment has been minimized.
This comment has been minimized.
Integration report for "Merge branch 'main' into newtab/DENG-6880-newtab_update_aggregations"
|
Is this referring to the weekly rollover? If so, I’ve already mentioned to Carolyn that we won’t be able to include metrics that rely on weekly rolling calculations; however, we will still be able to collect daily metrics. |
Description
The daily aggregations to address the retention requirements of the Desktop New Tab Dashboard
The following daily aggregation tables are created/updates:
Related Tickets & Documents
Reviewer, please follow this checklist
┆Issue is synchronized with this Jira Task