Request: Access to Granular Telemetry Data for Post-Hoc Analysis, Monitoring & Research Initiatives #6915

Closed
helizabetholsen opened this issue Feb 10, 2021 · 14 comments
Labels: Chargeable (This ticket needs to be charged against a specific timecode. Code in comments.), Monitoring (For the visibility of system status), Priority: 2 - Medium (Normal priority), Type: Feature (Add something new)

Comments

@helizabetholsen

helizabetholsen commented Feb 10, 2021

This ticket is a request for support on accessing and writing granular telemetry data (and other product user metadata) for post-hoc analysis, monitoring & research initiatives.

Ideally we'd like data to be uploaded daily, though weekly is acceptable, to enable monitoring for trends over time at different temporal intervals.

We'd like to be able to look at these data for specific role permissions, e.g. only for supervisors or in a specific geographic area.

This work would be a component of a grant and would enable more routine monitoring and data analyses of our telemetry/product user metadata to detect data anomalies and performance differences on the platform within and across user groups.

@helizabetholsen helizabetholsen added the Type: Feature Add something new label Feb 10, 2021
@abbyad
Contributor

abbyad commented Feb 10, 2021

Thanks for raising this @helizabetholsen! Having more granular data available for particular users and deployments would be very powerful for a wide variety of analyses, and could supersede #5977.

If we are able to manage it with permissions, we would be able to control for bandwidth concerns. @garethbowen how much data is currently collected on device per day?

@helizabetholsen
Author

Thanks, @abbyad. Looking forward to @garethbowen's response. I think this definitely supersedes #5977 in priority, particularly for how this relates to R&L and data science work we have planned. Thanks for the swift response!

@garethbowen
Contributor

The volume of data collected depends heavily on how much the app is used. For bandwidth, however, we're not concerned about how much data is collected, only how much is transmitted. The data is aggregated, so a month's worth of data is roughly the same size as a day's worth; changing the period to weekly would therefore increase bandwidth usage for telemetry by ~4x, and daily by ~30x. Each aggregated doc is on the order of 10KB uncompressed.
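To make the arithmetic above concrete, here is a rough sketch of the multipliers. The ~10KB doc size is the figure quoted above; the 30-day month and the function names are assumptions made purely for illustration, and compression is ignored.

```python
# Back-of-the-envelope estimate only, not a measurement.
DOC_SIZE_KB = 10      # "on the order of 10KB uncompressed", from the comment above
DAYS_PER_MONTH = 30   # simplifying assumption


def uploads_per_month(period_days: float) -> float:
    """How many aggregated docs are sent per month for a given period."""
    return DAYS_PER_MONTH / period_days


def monthly_kb(period_days: float) -> float:
    """Uncompressed KB sent per month, assuming doc size does not depend on the period."""
    return uploads_per_month(period_days) * DOC_SIZE_KB


for label, period_days in [("monthly", 30), ("weekly", 7), ("daily", 1)]:
    factor = uploads_per_month(period_days)
    print(f"{label:8s} x{factor:4.1f}  ~{monthly_kb(period_days):.0f} KB/month uncompressed")
```

Because each aggregated doc is roughly the same size regardless of the period it covers, the bandwidth multiplier is simply the number of uploads per month.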

@abbyad
Contributor

abbyad commented Feb 10, 2021

Thanks @garethbowen, and good to distinguish data collected vs the size of the aggregate data. @helizabetholsen, would aggregate data be useful if it is at a daily/weekly resolution? Just want to make sure that is what you are looking for, and not the actual raw data.

Also, do you have a list of data fields you are interested in? Are there some that you know of that aren't already captured?

@helizabetholsen
Author

helizabetholsen commented Feb 18, 2021

Aggregate data would definitely add value at a daily/weekly resolution. I actually think weekly monitoring would make the most sense for many of these metrics but after discussion with @abbyad I think we should start with daily and go from there.

I'd want to ask @yrimal if he feels a need for the actual raw data but I do not at this time.

I will make a point of sharing particular data fields that we are interested in, though I'd want to make sure this list is aligned with @kennsippell and his requests too.

@helizabetholsen
Author

Sorry for the delayed response here, team!

@mrsarm
Contributor

mrsarm commented Apr 6, 2021

I have closed #5977, which is superseded by this ticket, and assigned this ticket to the same boards with the same triage as #5977.

I have started learning how telemetry works. In the meantime, I would like to add that the scope of this ticket should be narrowed to just increasing the granularity of the telemetry data and how to set up that behavior. Any change related to the quality of the data, like adding more fields, should be analyzed in a separate ticket, even if we plan to include those changes in the same release.

@abbyad
Contributor

abbyad commented Apr 7, 2021

That sounds like a good plan, thanks @mrsarm!

@michaelkohn michaelkohn added the Chargeable This ticket needs to be charged against a specific timecode. Code in comments. label Apr 12, 2021
@michaelkohn
Contributor

Please allocate any time spent on this to Project | 214 Research in Clicktime.

@mrsarm
Contributor

mrsarm commented Apr 13, 2021

There are 3 key points we need to define:

  1. The format of the telemetry output, so that the granularity can be identified.
  2. How admins or app developers will be able to modify the granularity.
  3. Whether granularity changes in apps will be supported after initial deployment.

I made a document with different proposals and observations for all the points. I think it would be better to continue the discussion there: https://docs.google.com/document/d/1XrbwJGBNWMvZzAewerFmtV12Pcbg0-9vof6pF-QBDeg/edit?usp=sharing

CC @helizabetholsen @kennsippell @michaelkohn

@helizabetholsen
Author

@mrsarm it looks like you have what you need from discussion with @kennsippell and I agree with the proposals in the linked document. Happy to provide additional input as needed for Research and Monitoring use cases!

@mrsarm
Contributor

mrsarm commented May 3, 2021

@helizabetholsen, @yrimal, we did some investigation after discussing the data usage with Gareth, Kenn and Craig, and the conclusion is that, in terms of data usage, it's fine to go directly to daily telemetry: the increase in bandwidth is negligible given the compression used and the app's network usage for other purposes, and the increase in DB size won't have an impact either.

So for the new spec I just need to define how we will encode the day on which the data was collected. Currently, the metadata section (https://docs.communityhealthtoolkit.org/apps/guides/performance/telemetry/#metadata) of the JSON sent by the app contains the following:

 "metadata": {
    "year": 2021,
    "month": 4,
    "user": "...",
    //...

To include the day of the month the data belongs to, I plan to just add a new field, day, with the number of the day of the month (1 - 31), e.g. for data collected on April 25, 2021:

 "metadata": {
    "year": 2021,
    "month": 4,
+    "day": 25,
    "user": "...",
    ...

I just want to confirm this is the best way for you to query the data, taking into account the tools your team uses.

Another possibility would be to replace the year and month fields with a single new field holding the date in ISO format, e.g. iso_date: "2021-04-25", but I don't think it would make querying the data any easier, and it could make it harder if the data needs to be aggregated and compared with data that was aggregated monthly in the past and stored in the "legacy" format.
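For anyone querying the data, a minimal sketch of how the proposed fields could be read alongside legacy monthly docs might look like the following. The function name period_of is hypothetical, and the sketch assumes "month" is 1-based as in the April example above; it is not taken from the actual implementation.

```python
from datetime import date
from typing import Tuple


def period_of(metadata: dict) -> Tuple[date, str]:
    """Return (period start date, granularity) for a telemetry doc's metadata.

    Assumption: "month" is 1-based, matching the April example in this thread.
    Adjust if the deployed data uses a different convention.
    """
    day = metadata.get("day")
    if day is None:
        # Legacy doc aggregated monthly: pin it to the first of the month.
        return date(metadata["year"], metadata["month"], 1), "monthly"
    return date(metadata["year"], metadata["month"], day), "daily"


daily = {"year": 2021, "month": 4, "day": 25, "user": "..."}
legacy = {"year": 2021, "month": 4, "user": "..."}

print(period_of(daily))
print(period_of(legacy))
```

Keeping year and month as separate fields means daily docs can still be grouped on the same (year, month) key as the legacy monthly docs, which is the comparison concern raised above.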

@mrsarm
Contributor

mrsarm commented May 31, 2021

Ready for AT, branch 6915-telemetry-daily-freq.

Notes about testing the feature:

  • Testing the changes requires waiting up to 1 day for the data to be synchronized with Couch. The time on the devices can be changed, but that may trigger other issues, especially when testing the feature with mobile devices and SSL certs. I recommend testing the feature with more than 1 device and user so that different scenarios are covered without having to wait again for each series of tests, and backing up the meta databases first so the sync can be rolled back and tested again more easily.
  • Both scenarios should be tested: a device with no previous data synced and one with data already collected. For devices with data already collected, the expected behavior is described in the PR description notes under the "Backward compatibility" section.

@ngaruko ngaruko self-assigned this Jun 1, 2021
@ngaruko
Contributor

ngaruko commented Jun 1, 2021

LGTM.
Telemetry synced within a day, with the date in the name of the docs and in the metadata.

@mrsarm mrsarm closed this as completed Jun 1, 2021