Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update convert-form action to include a hash which maps report back to the source form #395

Closed
kennsippell opened this issue Apr 16, 2021 · 15 comments
Assignees
Labels
Type: Feature Add something new

Comments

@kennsippell
Copy link
Member

kennsippell commented Apr 16, 2021

Is your feature request related to a problem? Please describe.
Over the life of a CHT application, many different versions of a form will run in production. The resulting data in the reports are unschema and so it is difficult to map the report to the version of the form that was filled. Typically, people write SQL heuristics to determine which version they are analysing - but this isn't always possible (for example, changing the wording of a question but nothing else).

Describe the solution you'd like
When we convert-forms via medic-conf, could we hash the xlsx file.
Automatically inject that hash into the form xml so that the completed report will contain fields.formHash (or alike).
If the hashing algorithm we use is the same as GitHub, this would let users GitHub hash search for the exact version of the form that caused the report.

Describe alternatives you've considered
The form version is in the telemetry documents. Queries are complex and you lose some fidelity when relying on the aggregated info in those docs.

@kennsippell kennsippell added the Type: Feature Add something new label Apr 16, 2021
@kennsippell
Copy link
Member Author

kennsippell commented Apr 16, 2021

Not sure whether to add this as a comment on medic/cht-core#5977 or if it is its own feature request (?) It seems fairly easy, but the value would be very high in SQL-land.

@garethbowen
Copy link
Contributor

This is a great addition.

I don't think the hash should be in the fields object, because it's metadata, so it would be better at the top level of the report doc. Does that work for you?

Secondly I think it would make life easier to use a sequential version rather than a random looking hash. That would mean queries could be written to compare with < and > rather than having a long in statement. Assuming the version/hash must be stable between dev, test, and prod this would have to be generated by medic-conf at some point, for example when a change in the form is detected it increments the version number sets a timestamp?

@kennsippell
Copy link
Member Author

kennsippell commented Apr 20, 2021

I don't think the hash should be in the fields object, because it's metadata
would make life easier to use a sequential version rather than a random looking hash

Both seem excellent

I think the "GitHub hash search" is pretty nice for investigations. Not sure if we can get both somehow.

@derickl
Copy link
Member

derickl commented Apr 22, 2021

Would this be somewhat addressed by medic/cht-core#7310?

@aodhan-domhnaill
Copy link
Contributor

aodhan-domhnaill commented Aug 10, 2021

Could we do a hash and the Unix time of the file (Unix modification time would work)?

I like the sequence approach mentioned above, but a few concerns

  • the sequence number doesn't have meaning outside it's context, where timestamp would
  • we need to "trust" that the sequence code is working, while hash is more straight forward
  • searching on GitHub is important. So either commit or hash would work for that.

@garethbowen
Copy link
Contributor

@aidan-plenert-macdonald The changes needed for this are...

  1. The cht-conf upload-forms command somewhere around this line. This is where you want to write the version/timestamp so the doc that contains the xform also has the version when it's uploaded into CouchDB.
  2. The cht-core enketo service somewhere around here. This is where it will copy the version from the form document into the new report document and saved to pouch, replicated to couch, and finally synced to postgres to be used!

The code changes are quite minor but it's a complicated are, so once you've had a chance to read through feel free to reach out and I can help on how to make the actual change.

@kennsippell
Copy link
Member Author

kennsippell commented Sep 15, 2021

An alternative might be to append a note or calculate field into the xForm xml via a pyxform change or in the convert-app-forms action. You would end up with fields.formHash or whatever. It could be one step, and wouldn't require the core upgrade or compat complexity with medic-conf.

The _rev is a couchdb value which changes each time the doc changes. It is in the format of x-guid where x is an incremental numeric. Could the _rev value on the form doc be useful for the sequence hash?

@garethbowen
Copy link
Contributor

An alternative might be to append a note or calculate field into the xForm xml via a pyxform change or in the convert-app-forms action.

Yes that would mean this could be rolled out quicker but it's a less correct place for the version to be stored, and we'll be stuck with it forever more. Depending on the field name it also has a slight risk of colliding with a user defined field.

It's safer and more correct to store this at the root of the doc so I think we stick with that, and hopefully it's another great reason for people to upgrade cht-core!

Could the _rev value on the form doc be useful for the sequence hash?

The _rev isn't reliably sequential so it's not ideal unfortunately. I think if we just use the file last modified date it satisfies the sequential and (almost) guaranteed unique criteria, with the possible downside of ending up with spurious versions where nothing actually changed. This shouldn't be an issue so long as if clauses use less than/greater than and not equality.

Another way would be to require a user to enter a specific version, or have cht-conf attempt to detect when something has changed materially, but I prefer the timestamp as it's foolproof.

@aodhan-domhnaill
Copy link
Contributor

Draft PR's

#431

I'm not sure Git will preserve the file modified time correctly. What timestamp would you recommend?

Also, please check this medic/cht-core#7319. I think this is the right spot.

@mrjones-plip
Copy link
Contributor

Love this ticket/idea!!

Should we be concerned about forms getting uploaded via the admin GUI (/admin/#/forms) in the CHT?

@garethbowen
Copy link
Contributor

Should we be concerned about forms getting uploaded via the admin GUI (/admin/#/forms) in the CHT?

Yes I think so, just to preserve backwards compatibility. We probably just need to store the date the form was uploaded as the version.

I'm not sure Git will preserve the file modified time correctly. What timestamp would you recommend?

Comment added here: https://github.com/medic/cht-conf/pull/431/files#r711800800

@aodhan-domhnaill
Copy link
Contributor

I think this requires me to upgrade medic-conf to cht-conf. I started that here medic/cht-core#7340 because I'm having trouble otherwise.

@ngaruko
Copy link
Contributor

ngaruko commented Feb 3, 2022

Tested with cht-conf v3.10.0

  1. Check the version of cht-conf already installed (should be v3.9.x)

  2. Install cht-core if not already installed

  3. Upload app forms if not already done (with cht upload-app-forms)

  4. Check any form in CouchDb e.g : https://localhost/_utils/#database/medic/form:delivery and take note of the metadata:
    image

  5. Update cht-conf (npm install -g cht-conf) - Now the version should be 3.10.0

  6. Modify the form xml and save
    image

  7. Run cht upload-app-forms and check that the modified form was uploaded

  8. Back to couchdb, the same form now has
    "xmlVersion": { "time": 1643845530857, "sha256": "b0569dfc40d3969d6127cc146bc94701448f701e4ea52ccfcb9e4218c88181f4" },
    image

  9. Modify the form again, upload and check that the time and sha have changed
    image

@aodhan-domhnaill
Copy link
Contributor

#431

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Add something new
Projects
None yet
Development

No branches or pull requests

6 participants