Add ingest #6

Merged
j23414 merged 13 commits into master from add-ingest
Jul 15, 2024

Conversation

j23414
Contributor

@j23414 j23414 commented Jul 9, 2024

Description of proposed changes

Add ingest folder from pathogen-repo-guide and make lassa-specific modifications, following these steps:

  • Add ingest folder from pathogen-repo-guide
  • Add lassa-specific config parameters
  • Remove files that are not currently being used for this workflow, including files related to Entrez data-fetching and Nextclade
  • Add ingest automation workflows and update lassa-specific configs

Related issue(s)

Checklist

  • Checks pass

j23414 added 9 commits July 9, 2024 06:58
Copy the ingest directory from pathogen-repo-guide

https://github.com/nextstrain/pathogen-repo-guide/tree/e3bfb52c8155058a3d48592f4268a7382bf3e12a

The `ingest/vendored` subdirectory is not copied over since that folder should
be added with `git-subrepo`.

Future commits will change this to work with lassa data.
…/vendored

subrepo:
  subdir:   "ingest/vendored"
  merged:   "c94d78d"
upstream:
  origin:   "https://github.com/nextstrain/ingest"
  branch:   "main"
  commit:   "c94d78d"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/ingydotnet/git-subrepo"
  commit:   "110b9eb"
Add a top level nextstrain-pathogen.yaml file to enable nextstrain build from subdirectories. See https://github.com/nextstrain/cli/releases/tag/8.2.0
@j23414 j23414 requested a review from a team July 9, 2024 11:47
j23414 added 2 commits July 9, 2024 07:49
Nextstrain pathogen repositories should be standardized to include a
`.gitattributes` file that forces line endings to be LF.

We've run into issues in the past with Windows users running into
workflow errors because of Windows line endings (CRLF):

nextstrain/mpox@def0a71
nextstrain/seasonal-flu@202263a

We are letting Git determine if a file is text or binary to avoid
corrupting binary files as alluded to by @jameshadfield in review ¹ and
suggested by @tsibley in a separate PR.²

¹ #37 (comment)
² nextstrain/mpox#47 (comment)
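
As an illustration of the problem described in this commit message, the sketch below detects Windows (CRLF) line endings and converts them to LF. It is only a rough approximation for explanation: the helper names `has_crlf` and `to_lf` are hypothetical, and Git's own text detection and conversion are more involved and are not reproduced here.

```python
def has_crlf(data: bytes) -> bool:
    """Return True if the content contains Windows (CRLF) line endings."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving all other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical file content with Windows line endings.
windows_text = b"rule all:\r\n    input: 'results/metadata.tsv'\r\n"
unix_text = to_lf(windows_text)
```

A check like this should only be applied to text files; rewriting bytes in a binary file could corrupt it, which is why the commit leaves the text-or-binary decision to Git.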
@j23414 j23414 mentioned this pull request Jul 10, 2024
4 tasks
Co-authored-by: John SJ Anderson <[email protected]>
@j23414 j23414 marked this pull request as draft July 11, 2024 16:55
@j23414 j23414 marked this pull request as ready for review July 14, 2024 09:50
@j23414 j23414 merged commit 9fee3e2 into master Jul 15, 2024
6 checks passed
@j23414 j23414 deleted the add-ingest branch July 15, 2024 13:45
Comment on lines +20 to +32
  workflow_dispatch:
    inputs:
      image:
        description: 'Specific container image to use for ingest workflow (will override the default of "nextstrain build")'
        required: false
        type: string
      trial_name:
        description: |
          Trial name for outputs.
          If not set, outputs will overwrite files at s3://nextstrain-data/files/workflows/lassa/
          If set, outputs will be uploaded to s3://nextstrain-data/files/workflows/lassa/trials/<trial_name>/
        required: false
        type: string
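
The `trial_name` behavior described in the input above can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation: the function name `upload_destination` and the constant `BASE` are hypothetical, and only the two S3 prefixes are taken from the input description.

```python
# Default upload location, copied from the `trial_name` input description.
BASE = "s3://nextstrain-data/files/workflows/lassa/"


def upload_destination(trial_name: str = "") -> str:
    """Return the S3 prefix that outputs would be uploaded to.

    Hypothetical helper that mirrors the `trial_name` description only.
    """
    if not trial_name:
        # No trial name: outputs overwrite files at the default location.
        return BASE
    # Trial name set: outputs go under trials/<trial_name>/ instead.
    return f"{BASE}trials/{trial_name}/"
```

Calling `upload_destination()` returns the default prefix, while `upload_destination("my-test")` returns a prefix under `trials/my-test/`, so a trial run does not overwrite the default files.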
Contributor

Copying my comments from rabies because I heard @j23414 mention lassa automation in the priorities meeting

Noting that this GH Action workflow will error out until rabies is added to the nextstrain/infra repo. See docs for automation.

I want to be explicitly clear here that this workflow does not have an automated schedule. Zika's automated workflow (ingest-to-phylogenetic) calls on the ingest workflow and phylogenetic workflow. Once the phylogenetic workflow is set up in this repo, they can be connected the same way.

Not asking for any changes in this PR, just wanted to make sure we're all on the same page 😄

Contributor

Adding lassa to supported pathogen repos in nextstrain/infra#26

joverlee521 added a commit to nextstrain/infra that referenced this pull request Jul 18, 2024
Prompted by nextstrain/lassa#6 which includes
the addition of a GH Action workflow that uses pathogen-repo-build
workflow.
3 participants