Request: Access to Granular Telemetry Data for Post-Hoc Analysis, Monitoring & Research Initiatives #6915
Comments
Thanks for raising this @helizabetholsen! Having more granular data available for particular users and deployments would be very powerful for a wide variety of analyses, and could supersede #5977. If we are able to manage it with permissions, we would be able to control for bandwidth concerns. @garethbowen how much data is currently collected on device per day? |
Thanks, @abbyad. Looking forward to @garethbowen's response. I think this definitely supersedes #5977 in priority, particularly for how this relates to R&L and data science work we have planned. Thanks for the swift response! |
How much data is collected depends very much on how much the app is used. For bandwidth, however, we're not concerned about how much data is collected, only how much is transmitted. The data is aggregated, so a month's worth of data is roughly the same size as a day's worth of data; changing the period to weekly will therefore increase bandwidth usage for telemetry by ~4x, and daily by ~30x. Each aggregated doc is on the order of 10KB uncompressed. |
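To make the arithmetic above concrete, here is a rough sketch of the estimate. It assumes the ~10KB uncompressed doc size quoted above, one aggregated doc per period, and a 30-day month with roughly 4 weeks; the function and constant names are illustrative only and are not part of the app.

```python
# Rough estimate of telemetry upload volume per user per month.
# Assumption: each aggregated doc is ~10 KB uncompressed, whatever the period.
DOC_SIZE_KB = 10

# Assumption: approximate number of aggregated docs sent per month.
DOCS_PER_MONTH = {"monthly": 1, "weekly": 4, "daily": 30}


def monthly_upload_kb(period, doc_size_kb=DOC_SIZE_KB):
    """Uncompressed KB uploaded per month for the given aggregation period."""
    return DOCS_PER_MONTH[period] * doc_size_kb


def increase_factor(period, baseline="monthly"):
    """How many times more data is sent compared with the baseline period."""
    return monthly_upload_kb(period) / monthly_upload_kb(baseline)


if __name__ == "__main__":
    for period in DOCS_PER_MONTH:
        print(period, monthly_upload_kb(period), "KB/month,",
              f"{increase_factor(period):.0f}x monthly")
```

These figures are before compression, so the transmitted size would be smaller in practice.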
Thanks @garethbowen, and good to distinguish data collected vs the size of the aggregate data. @helizabetholsen, would aggregate data be useful if it is at a daily/weekly resolution? Just want to make sure that is what you are looking for, and not the actual raw data. Also, do you have a list of data fields you are interested in? Are there some that you know of that aren't already captured? |
Aggregate data would definitely add value at a daily/weekly resolution. I actually think weekly monitoring would make the most sense for many of these metrics but after discussion with @abbyad I think we should start with daily and go from there. I'd want to ask @yrimal if he feels a need for the actual raw data but I do not at this time. I will make a point of sharing particular data fields that we are interested in, though I'd want to make sure this list is aligned with @kennsippell and his requests too. |
Sorry for the delayed response here, team! |
I have closed #5977, which is superseded by this, and assigned this ticket to the same boards and with the same triage as #5977. I have started to learn how telemetry works. In the meantime, I would like to add that the scope of this ticket should be narrowed to just increasing the granularity of the telemetry data, and how to set up such behavior. Any change related to the quality of the data, such as adding more fields, should be analyzed in a different ticket, even if we plan to include those changes in the same release. |
That sounds like a good plan, thanks @mrsarm! |
Please allocate any time spent on this to |
There are 3 key points we need to define:
I made a document with different proposals and observations for all the points, I think it would be better to continue the discussion there: https://docs.google.com/document/d/1XrbwJGBNWMvZzAewerFmtV12Pcbg0-9vof6pF-QBDeg/edit?usp=sharing |
@mrsarm it looks like you have what you need from discussion with @kennsippell and I agree with the proposals in the linked document. Happy to provide additional input as needed for Research and Monitoring use cases! |
@helizabetholsen, @yrimal, we did some investigation after discussing the data usage with Gareth, Kenn and Craig, and the conclusion is that it's fine in terms of data usage to go directly to daily telemetry: the increase in bandwidth is very small taking into account the compression used and the app's network usage for other use cases, and the increase in DB size won't have an impact either. So for the new spec I just need to define how we will encode the day on which the data was collected. Currently, in the metadata section (https://docs.communityhealthtoolkit.org/apps/guides/performance/telemetry/#metadata) of the JSON sent by the app we have the following:

    "metadata": {
      "year": 2021,
      "month": 4,
      "user": "...",
      //...

To include the day of the month the data belongs to, I plan to just add a new field:

    "metadata": {
      "year": 2021,
      "month": 4,
    + "day": 25,
      "user": "...",
      ...

I just want to confirm this is the best way for you to query the data, taking into account the tools your team uses. Another possibility could be to replace the |
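As a hedged illustration of how the proposed `day` field might be consumed on the analysis side, the sketch below turns the `metadata` block into a calendar date and filters docs by day. It assumes `month` is 1-based, as the example value 4 suggests (this is not confirmed in the thread), and that older monthly docs have no `day` field. The helper names `telemetry_date` and `docs_for_day` are hypothetical.

```python
from datetime import date


def telemetry_date(metadata):
    """Build a date from a telemetry metadata block.

    Assumption: "month" is 1-based. Docs without "day" (the older monthly
    format) are mapped to the first day of their month.
    """
    return date(metadata["year"], metadata["month"], metadata.get("day", 1))


def docs_for_day(docs, target):
    """Return only the daily docs whose metadata matches the target date."""
    return [
        doc for doc in docs
        if "day" in doc["metadata"] and telemetry_date(doc["metadata"]) == target
    ]


if __name__ == "__main__":
    docs = [
        {"metadata": {"year": 2021, "month": 4, "day": 25, "user": "a"}},
        {"metadata": {"year": 2021, "month": 4, "day": 26, "user": "a"}},
        {"metadata": {"year": 2021, "month": 4, "user": "b"}},  # monthly doc
    ]
    print(docs_for_day(docs, date(2021, 4, 25)))
```

Keeping `year`, `month` and `day` as separate numeric fields makes it easy to group by month or by day with the same docs, at the cost of rebuilding a date for range queries.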
Ready for AT, branch
Notes about testing the feature:
|
This ticket is a request for support on accessing and writing granular telemetry data (and other product user metadata) for post-hoc analysis, monitoring & research initiatives.
Ideally we'd like data to be uploaded daily, though weekly is acceptable, to enable monitoring for trends over time at different temporal intervals.
We'd like to be able to look at these data for specific role permissions or areas, e.g. only for supervisors or only for a specific geographic area.
This work would be a component of a grant and would enable more routine monitoring and data analyses of our telemetry/product user metadata to detect data anomalies and performance differences on the platform within and across user groups.