6915 - Telemetry daily frequency #7095

Merged
merged 17 commits into from
Jun 1, 2021

mrsarm
Contributor

@mrsarm mrsarm commented May 12, 2021

Description

Change telemetry frequency from monthly to daily.

#6915

Improvements

  • Telemetry frequency changed from monthly to daily.
  • The id of a telemetry record now looks like telemetry-<year>-<month>-<day>-<username>-<uuid>. A new field day is also added to the metadata section, holding the day of the month the data belongs to.
  • More test coverage in the telemetry spec.
  • Minor improvements in the script to collect meta data from a terminal (scripts/get_users_meta_docs.js):
    • Output errors in the standard error stream to see the errors in the console while piping results to a JSON file.
    • Add Unix exec permission and header to execute the script without the node prefix.
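
As an illustration of the new id layout described above, here is a minimal sketch. The helper name telemetryDocId and the TelemetryMeta shape are assumptions made for this example, not code taken from the PR diff:

```typescript
// Hypothetical shape; field names are illustrative only.
interface TelemetryMeta {
  year: number;
  month: number;
  day: number;      // new in this PR: day of the month the data belongs to
  user: string;
  deviceId: string; // stands in for the <uuid> part of the id
}

// Builds an id following telemetry-<year>-<month>-<day>-<username>-<uuid>.
function telemetryDocId(meta: TelemetryMeta): string {
  return ['telemetry', meta.year, meta.month, meta.day, meta.user, meta.deviceId].join('-');
}

const exampleId = telemetryDocId({ year: 2021, month: 5, day: 13, user: 'greg', deviceId: 'aaabbb1234' });
console.log(exampleId); // telemetry-2021-5-13-greg-aaabbb1234
```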

Please also review the documentation changes here.

Bug fixed

This PR also fixes the following bug found in the current implementation:

When record() is called for the first time in a period (a period used to be a month and is now a day), the record is added to the telemetry DB first, and then the aggregation is executed over the data stored so far. As a result, the aggregation covers all the records from the previous period plus one record that actually belongs to the current period. That last record shouldn't be taken into account in the submitted aggregation.

This is a minor issue even for daily reports, and it is fixed in this PR by changing the order of execution: the aggregation runs first if a new period has started, and only then is the new record added.
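
To make the ordering concrete, here is a simplified in-memory sketch. TelemetryBuffer, its fields, and the string dates are hypothetical and only model the idea; the real service works against a local database:

```typescript
// Simplified model; all names here are assumptions for illustration.
type Entry = { key: string; value: number };

class TelemetryBuffer {
  private entries: Entry[] = [];
  private periodStart: string | null = null; // e.g. "2021-5-13"
  readonly submitted: { period: string; entries: Entry[] }[] = [];

  record(key: string, value: number, today: string): void {
    // 1. If a new period started, aggregate what was stored for the previous one.
    if (this.periodStart !== null && this.periodStart !== today) {
      this.submitted.push({ period: this.periodStart, entries: this.entries });
      this.entries = [];
      this.periodStart = null;
    }
    // 2. Only then add the new record, so it stays out of the previous aggregation.
    if (this.periodStart === null) {
      this.periodStart = today;
    }
    this.entries.push({ key, value });
  }
}

const buffer = new TelemetryBuffer();
buffer.record('boot_time', 120, '2021-5-13');
buffer.record('boot_time', 95, '2021-5-14'); // triggers aggregation of 2021-5-13 first
console.log(buffer.submitted.length, buffer.submitted[0].entries.length);
```

With the previous order (add first, aggregate afterwards), the second call would have pushed the value 95 into the aggregation for 2021-5-13.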

Backward compatibility

Systems that rely on telemetry data will need to adapt to the new frequency, but so far the data collected and the schema are the same, with the only addition being the new field metadata.day.

There is no special code to "migrate" existing data already collected on devices where the new version is installed. If a device was collecting telemetry data to send next month and the new version is then installed, the app will aggregate the collected data the same day the device is used and send the aggregation when a connection is available. The record will look like a "daily" aggregation rather than a monthly one, or more precisely a "partially monthly" one. E.g.:

  1. A device is in sync with a CHT v3.11 deployment and has been sending monthly aggregations for a while. The last time it sent an aggregation report was May 2nd, with the data collected the previous month (April). So the id of that report is something like telemetry-2021-4-greg-aaabbb1234.
  2. It is now May 12, so the device has been recording telemetry data since May 2. None of these records have been aggregated yet.
  3. On May 13 CHT v3.12.0 is deployed, and the app is synced the same day. The new frequency algorithm is applied, so data collected from May 2 to May 13 is aggregated and sent as if it were data collected on the last recorded day, May 2. The id of the record will be: telemetry-2021-4-2-greg-aaabbb1234.
  4. From the next day on, the data sent will be truly daily aggregated. On May 14 the device will aggregate the data collected on May 13 and the record id will be: telemetry-2021-5-13-greg-aaabbb1234. The next day, if there is data collected, it will be telemetry-2021-5-14-... and so on. Days on which the device is not used won't have telemetry recorded, and therefore there will be no reports to send.

So once an upgrade of the CHT is rolled out, the first reports will include data from many days, but it should be easy to identify these reports because the month in the id (or the field metadata.month) will be the past month, not the current month. Implementing a smoother transition, such as sending one report for each day whose telemetry data has not been aggregated, would require more changes, especially in the device webapp, and I'm not sure it is worth the effort.

CC @helizabetholsen and @dianabarsan on this, let me know if it is not clear enough.

To-do in another app / issue

medic-couch2pg has a PG view, useview_telemetry, with the telemetry data, and it has a field period_start that we need to adapt to support these changes. I've created a ticket to keep track of it: medic-couch2pg#85 → released in 3.3.0

Checklist

  • Readable: Concise, well named, follows the style guide, documented if necessary.
  • Documented: Configuration and user documentation on Telemetry with daily frequency changes cht-docs#493
  • Tested: Unit and/or e2e where appropriate
  • Backwards compatible: Works with existing data and configuration or includes a migration. Any breaking changes documented in the release notes. --> discussion above.

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

@mrsarm mrsarm force-pushed the 6915-telemetry-daily-freq branch from b6a63ad to b9eca76 Compare May 12, 2021 20:22
mrsarm added 2 commits May 12, 2021 17:41
- Output errors in the standard error stream to see the errors in the console while piping right results to a JSON file
- Add Unix exec permission and header to execute the script without the `node ` prefix
@mrsarm mrsarm requested a review from dianabarsan May 13, 2021 01:46
@mrsarm mrsarm marked this pull request as ready for review May 13, 2021 01:46
@mrsarm
Contributor Author

mrsarm commented May 13, 2021

@dianabarsan would you mind reviewing?

@garethbowen, could you at least check the description of the PR? Especially the "Backward compatibility" section.

@garethbowen
Contributor

I think the backwards compatibility section is fine. What will happen if couch2pg is not updated but this is rolled out?

@mrsarm
Contributor Author

mrsarm commented May 13, 2021

> I think the backwards compatibility section is fine. What will happen if couch2pg is not updated but this is rolled out?

The view has a unique index that should fail, so I guess it will prevent couch2pg from inserting data.

Anyway, I think we only need to update the SQL definition of that view, which should be easy to do. I'll start working on that tomorrow so we will have a PR this week. Moreover the changes should be backward compatible and work with both schemas of the metadata, so we can release couch2pg first, to have time to deploy it before CHT 3.12.

@dianabarsan
Member

dianabarsan commented May 13, 2021

> So once a upgrade of the CHT is rolled out, first reports will include data from many days, but should be easy to identify these reports because the month in the id (or the field metadata.month) will be from the past month, not the current month

How will we know which is the "current" month if we check these records years after they were created?

Member

@dianabarsan dianabarsan left a comment

Nice work! I left a few suggestions inline.

Inline comments (resolved) on:
webapp/src/ts/services/telemetry.service.ts
webapp/tests/karma/ts/services/telemetry.service.spec.ts
mrsarm and others added 4 commits May 13, 2021 15:30
Co-authored-by: Diana Barsan <[email protected]>
Co-authored-by: Diana Barsan <[email protected]>
Co-authored-by: Diana Barsan <[email protected]>
Co-authored-by: Diana Barsan <[email protected]>
@garethbowen
Contributor

garethbowen commented May 13, 2021

> Moreover the changes should be backward compatible and work with both schemas of the metadata, so we can release couch2pg first, to have time to deploy it before CHT 3.12.

For backwards compatibility we can't require an upgrade to couch2pg as part of a CHT Core upgrade unless we bump the major (ie: make 3.12.0 a 4.0.0 release). Not only would this be a headache for the internal rollout, there are self-hosting partners with couch2pg that would need to be notified.

We must make every attempt to maintain backwards compatibility. Worst case would be making this configurable defaulting to monthly, so app developers can opt in when their couch2pg supports it.

@mrsarm
Contributor Author

mrsarm commented May 14, 2021

> > Moreover the changes should be backward compatible and work with both schemas of the metadata, so we can release couch2pg first, to have time to deploy it before CHT 3.12.
>
> For backwards compatibility we can't require an upgrade to couch2pg as part of a CHT Core upgrade unless we bump the major (ie: make 3.12.0 a 4.0.0 release). Not only would this be a headache for the internal rollout, there are self-hosting partners with couch2pg that would need to be notified.
>
> We must make every attempt to maintain backwards compatibility. Worst case would be making this configurable defaulting to monthly, so app developers can opt in when their couch2pg supports it.

@garethbowen I just tested medic-couch2pg against a CouchDB that has daily telemetry generated with the changes made in this PR, and indeed it's NOT compatible with daily telemetry: it fails because of the unique constraint idx_useview_telemetry_period_start_user, which needs to be updated.

The definition of the constraint is as follows:

CREATE UNIQUE INDEX idx_useview_telemetry_period_start_user ON useview_telemetry(period_start,user_name);

The problem is not the constraint itself, though: it is fine to have only one record per period / user. The problem is that the period is defined as the concatenation of metadata.year and metadata.month, which prior to these changes was enough to make the period unique. Now we need to add metadata.day to the concatenation in the view's column (useview_telemetry.period_start).
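
To illustrate why the unique index rejects daily data, here is a small sketch of the collision. The Meta shape and the monthlyKey helper are assumptions for this example; they only mimic the (period_start, user_name) pair the index covers:

```typescript
// Hypothetical shape of the metadata fields relevant to the index.
interface Meta { year: number; month: number; day?: number; user: string }

// Old-style key: only year and month identify the period (illustrative only).
const monthlyKey = (m: Meta): string => `${m.year}-${m.month}|${m.user}`;

// Two daily records from the same user within the same month.
const dailyRecords: Meta[] = [
  { year: 2021, month: 5, day: 13, user: 'greg' },
  { year: 2021, month: 5, day: 14, user: 'greg' },
];

// Collect the records whose key was already seen, as a unique index would reject them.
const seenKeys = new Set<string>();
const collisions = dailyRecords.filter((m) => {
  const key = monthlyKey(m);
  if (seenKeys.has(key)) {
    return true;
  }
  seenKeys.add(key);
  return false;
});

console.log(collisions.length); // 1
```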

We can release a new version of medic-couch2pg that is compatible with both versions of the telemetry data, so if installed before the upgrade of the CHT, the transition would be smooth: records from CHT < 3.12 will have the period_start field set with year-month and once the CHT is upgraded, the PG view will compute the field as year-month-day if the record has the metadata.day field set.

Otherwise, as you said, we will need to make telemetry configurable. In that case we need to:

  1. Release a new version of medic-couch2pg anyway with the changes mentioned above, but deployments will require the upgrade of medic-couch2pg only if daily telemetry is enabled.

  2. Add a new section in the app_settings.json file as mentioned in this doc where we can configure telemetry, but maybe a simplified version (only support daily / monthly, not weekly / biweekly):

    "telemetry": {
      "granularity": "daily"
    }

    This change will require a new version of medic-conf, but that version will also need to be installed only if the new configuration section is used. In the document we discussed some drawbacks of moving from one frequency to another. We can document these potential problems in the CHT docs, or also add the field "start_from" as documented in the same doc here, though adding the field does not guarantee the prevention of these problems, so maybe we can just do the first.
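
A rough sketch of how the webapp could read such a setting while defaulting to monthly, as suggested earlier in the thread. The AppSettings type and the resolveGranularity function are hypothetical; no such code is part of this PR:

```typescript
type Granularity = 'daily' | 'monthly';

// Hypothetical subset of app_settings.json relevant to this proposal.
interface AppSettings {
  telemetry?: { granularity?: string };
}

// Falls back to 'monthly' when the section is missing or holds an unsupported value,
// so existing deployments keep their current behaviour unless they opt in.
function resolveGranularity(settings: AppSettings): Granularity {
  const configured = settings.telemetry && settings.telemetry.granularity;
  return configured === 'daily' ? 'daily' : 'monthly';
}

console.log(resolveGranularity({ telemetry: { granularity: 'daily' } })); // daily
console.log(resolveGranularity({}));                                      // monthly
```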

@mrsarm mrsarm force-pushed the 6915-telemetry-daily-freq branch from 4e100b8 to c9328d6 Compare May 14, 2021 19:05
@mrsarm
Contributor Author

mrsarm commented May 14, 2021

It is worth adding that the concatenation in the period_start column also appends the hardcoded string "-1", so if the year is 2021 and the month is May, the resulting value in the column is "2021-5-1", which in the end is cast to the SQL DATE type. So the new computation needs to replace the hardcoded "-1" with the real day. It is important to highlight that the final format of the column won't change, only its meaning.
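
The real change belongs in the SQL definition of the view, but the logic can be sketched in TypeScript. The function name periodStart and the TelemetryMetadata type are assumptions for illustration only:

```typescript
// Metadata of a telemetry record; day is only present on records from the daily schema.
interface TelemetryMetadata { year: number; month: number; day?: number }

// Uses the real day when available and falls back to the old hardcoded 1,
// so both monthly and daily records produce a value with the same format.
function periodStart(meta: TelemetryMetadata): string {
  const day = meta.day !== undefined ? meta.day : 1;
  return `${meta.year}-${meta.month}-${day}`;
}

console.log(periodStart({ year: 2021, month: 5 }));          // 2021-5-1
console.log(periodStart({ year: 2021, month: 5, day: 13 })); // 2021-5-13
```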

@mrsarm
Contributor Author

mrsarm commented May 14, 2021

I just found this bug in the latest changes introduced in medic-couch2pg to support users meta data. It will require a patch release regardless of the telemetry changes proposed here, forcing at least all the deployments that have the latest version installed to upgrade sooner or later.

CC @garethbowen

@garethbowen
Contributor

@mrsarm It is essential that upgrading is easy and free of breaking changes so we can keep projects on recent versions of the CHT. As such, we cannot knowingly release a version that will crash couch2pg. If you cannot find a solution to the backwards compatibility issue, then we'll have to delay this change to go out with CHT v4.0.0.

@mrsarm mrsarm force-pushed the 6915-telemetry-daily-freq branch from c9328d6 to 49c54d5 Compare May 18, 2021 02:55
@mrsarm
Contributor Author

mrsarm commented May 18, 2021

> @mrsarm It is essential that upgrading is easy and free of breaking changes so we can keep projects on recent versions of the CHT. As such, we cannot knowingly release a version that will crash couch2pg. If you cannot find a solution to the backwards compatibility issue, then we'll have to delay this change to go out with CHT v4.0.0.

@garethbowen , summarizing what we discussed in our last call regarding this:

medic-couch2pg versions < 3.2.x don't support migration of medic-users-meta information, so these versions ARE and WILL continue being compatible with any version of CHT, including the upcoming release if we include the changes here.

medic-couch2pg >= 3.2.x (3.2.0 and 3.2.1) do support medic-users-meta information from older CHT versions and DON'T support the changes introduced here, but as reported in medic/cht-couch2pg#86 and medic/cht-couch2pg#78 , NONE of the 3.2.x versions actually work, and at least until medic/cht-couch2pg#86 is resolved (medic/cht-couch2pg#78 was resolved in 3.2.1) we don't have a version that is fully compatible with either of the medic-users-meta versions (monthly or daily).

So, if CHT <= 3.11 deployments that upgrade to CHT 3.12 (with the changes included in this PR) don't upgrade to the upcoming medic-couch2pg release and stick with medic-couch2pg < 3.2.x, they won't notice any change when upgrading to CHT 3.12, because feedback and telemetry records aren't accessible from the Postgres databases synchronized with old versions of medic-couch2pg.

If CHT deployments of any version want to access feedback and telemetry data from the PG database synchronized with medic-couch2pg, they will need to upgrade medic-couch2pg to an upcoming version that is not released yet. This holds regardless of the CHT version, and therefore regardless of telemetry being monthly or daily, because no medic-couch2pg version today supports feedback and telemetry without bugs. This upcoming version will support both daily and monthly aggregation.

The upcoming release of medic-couch2pg needs to be released before CHT 3.12, but the PR that addresses the bugs and adds support for daily aggregation is under review now, so that shouldn't be a problem: medic/cht-couch2pg#91

@mrsarm
Contributor Author

mrsarm commented May 26, 2021

@dianabarsan, all the changes you suggested were applied. The PR was paused because of the couch2pg issue, which was indirectly related but is now fixed (the recreation of the telemetry view with the wrong index, which is now also compatible with the changes here). Could you please check again?

Member

@dianabarsan dianabarsan left a comment

Thanks for the updates. I think some of the comments from the previous review were skipped, so I've "bumped" them.

Inline comments (resolved) on:
webapp/src/ts/services/telemetry.service.ts
webapp/tests/karma/ts/services/telemetry.service.spec.ts
@mrsarm
Contributor Author

mrsarm commented May 27, 2021

@dianabarsan could you review again? I'm confident enough that I have addressed all the changes requested this time 😬

@mrsarm mrsarm requested a review from dianabarsan May 27, 2021 22:27
Member

@dianabarsan dianabarsan left a comment

Great work!
