Merge additional data during phylogenetic (instead of ingest) workflow #68
Description of proposed changes
This PR refines our approach to merging private data, such as latitude and longitude annotations for samples collected at mosquito traps, into state-level phylogenetic trees (e.g. Washington). Previously, we merged private data during the `ingest` workflow by extending an `annotations.tsv` file. This led to a convoluted `phylogenetic` config file that varied depending on whether data was sourced from S3 or from the `ingest/results/*` merged files.
In this update, private user data is incorporated directly during the `phylogenetic` workflow. We adopt a variation of the `config.additional_inputs` method (copied and modified from these lines in avian-flu) and further advocate for a generic pattern for including additional user data, as is being explored in pathogen-repo-guide#72.
Config changes
Before
After
This updated configuration greatly simplifies our documentation, making it easier for other states to modify the Washington state builds and incorporate their own private data.
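To illustrate what "incorporating private data during the `phylogenetic` workflow" means at the data level, here is a minimal Python sketch of merging a private metadata TSV (e.g. trap latitude/longitude) onto public metadata. It assumes both files are TSVs keyed by a `strain` column, that non-empty private values should override public ones, and that private-only samples should be kept. The names `read_tsv`, `merge_metadata`, `to_tsv`, `ID_COLUMN`, and the `trap_id` column are hypothetical and are not the actual rules from this PR or from avian-flu.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical ID column; the real workflow may key metadata on a different column.
ID_COLUMN = "strain"


def read_tsv(text: str) -> List[Row]:
    """Parse TSV text (header line + rows) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def merge_metadata(base: List[Row], extra: List[Row], id_column: str = ID_COLUMN) -> List[Row]:
    """Outer-merge `extra` (e.g. private annotations) onto `base` by `id_column`.

    Non-empty values in `extra` overwrite values in `base`; empty values in
    `extra` leave `base` untouched. Records found only in `extra` are appended.
    """
    merged: Dict[str, Row] = {row[id_column]: dict(row) for row in base}
    order: List[str] = [row[id_column] for row in base]
    for row in extra:
        key = row[id_column]
        if key not in merged:
            # Private-only sample: start an empty record and keep its position.
            merged[key] = {}
            order.append(key)
        for column, value in row.items():
            if value:  # skip empty strings (and None from short rows)
                merged[key][column] = value

    # Union of columns, base columns first, so every output row has the same fields.
    columns: List[str] = []
    for row in base + extra:
        for column in row:
            if column not in columns:
                columns.append(column)
    return [{column: merged[key].get(column, "") for column in columns} for key in order]


def to_tsv(rows: List[Row]) -> str:
    """Serialize rows back to TSV text with a header line."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical example data: public metadata lacks coordinates; private data adds them.
    public = "strain\tdate\tlatitude\tlongitude\nA\t2020-01-01\t\t\nB\t2021-06-01\t\t\n"
    private = "strain\tlatitude\tlongitude\ttrap_id\nA\t47.6\t-122.3\tT1\nC\t46.7\t-117.0\tT2\n"
    print(to_tsv(merge_metadata(read_tsv(public), read_tsv(private))))
```

An outer merge is used here so that samples present only in the private data still reach the tree, and empty private cells never erase public values; a real workflow may choose different precedence rules.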
Related issue(s)
Checklist