Skip to content

Data removal

Thomas Sibley edited this page Jul 9, 2020 · 1 revision

Prerequisites

Before you get started, you'll need the following:

  • a personal admin account for the production ID3C database
  • a personal admin account for the testing ID3C database or access to the postgres user

ID3C

Removing an individual and all linked data

Follow these steps to delete all data related to a single individual.

  1. Login to the production ID3C database with admin account.

  2. Find individual connected to provided sample or collection barcode.

  3. Set psql variable individual to the individual.identifier

  4. Check delete-individual.sql to ensure that it is up-to-date with ID3C receiving and warehouse schema.

    • Notice this does not delete records from receiving.presence_absence. The only identifier those results contain is the sample barcodes for this individual, and these are embedded alongside results for unrelated samples. In order to prevent re-processing the specific sample results later, the script removes the sample and collection identifiers from our warehouse.identifier. The presence/absence ETL will ignore/skip such results.
  5. Run delete-individual.sql via the \include psql meta-command.

  6. Verify the appropriate records have been deleted.

    • If something doesn't look right, run rollback; to un-do all deletions.
  7. Run commit; to commit the transaction and make all changes permanent.

  8. Repeat steps 2-7 on the testing ID3C database or refresh the testing database.

Removing a sample and all linked data

Follow these steps to delete all data related to a single sample that is not associated with any encounter data.

  1. Login to the production ID3C database with admin account.

  2. Find sample.identifier for given sample or collection barcode.

  3. Set psql variable sample to the sample.identifier

  4. Check delete-sample.sql to ensure that it is up-to-date with ID3C receiving and warehouse schema.

    • Notice this does not delete records from receiving.presence_absence or receiving.fhir. The only identifier those results contain is the sample barcodes for this individual, and these are embedded alongside results for unrelated samples. In order to prevent re-processing the specific sample results later, the script removes the sample and collection identifiers from our warehouse.identifier. The presence/absence ETL and FHIR ETL will ignore/skip such results.
  5. Run delete-sample.sql via the \include psql meta-command.

  6. Verify the appropriate records have been deleted.

    • If something doesn't look right, run rollback; to un-do all deletions.
  7. Run commit; to commit the transaction and make all changes permanent.

  8. Repeat steps 2-7 on the testing ID3C database or refresh the testing database.

Other data stores

  • Notify all devs to remove local copies of database.
  • AWS backups of the database naturally expire within a month.
  • Notify/check-in with upstream data sources (SCH, UW, BBI, NWGC)
    • The upstream specimen manifest does not need to be changed because the sample and collection identifiers will be removed from our warehouse.identifier. The manifest ETL will ignore/skip such results.
Clone this wiki locally