nextstrain · j23414 · Jan 19, 2024 · Nov 13, 2023
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -5,5 +5,24 @@ on:
   - pull_request
 
 jobs:
-  ci:
-    uses: nextstrain/.github/.github/workflows/pathogen-repo-ci.yaml@master
+  pathogen-ci:
+    strategy:
+      matrix:
+        runtime: [docker, conda]
+    permissions:
+      id-token: write
+    uses: nextstrain/.github/.github/workflows/pathogen-repo-build.yaml@master
+    secrets: inherit
+    with:
+      runtime: ${{ matrix.runtime }}
+      run: |
+        nextstrain build \
+          phylogenetic \
+          --configfile profiles/ci/profiles_config.yaml
+      artifact-name: output-${{ matrix.runtime }}
+      artifact-paths: |
+        phylogenetic/auspice/
+        phylogenetic/results/
+        phylogenetic/benchmarks/
+        phylogenetic/logs/
+        phylogenetic/.snakemake/log/
diff --git a/.gitignore b/.gitignore
@@ -9,7 +9,9 @@ build/
 environment*
 
 # Snakemake state dir
-/.snakemake
+.snakemake/
+benchmarks/
+logs/
 
 # Local config overrides
 /config_local.yaml

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,50 @@
+# Developer guide
+
+## CI
+
+Checks are automatically run on certain pushed commits for testing and linting
+purposes. Some are defined by [.github/workflows/ci.yaml][] while others are
+configured outside of this repository.
+
+[.github/workflows/ci.yaml]: ./.github/workflows/ci.yaml
+
+## Pre-commit
+
+[pre-commit][] is used for various checks (see [configuration][]).
+
+You can either [install it yourself][] to catch issues before pushing or look
+for the [pre-commit.ci run][] after pushing.
+
+[pre-commit]: https://pre-commit.com/
+[configuration]: ./.pre-commit-config.yaml
+[install it yourself]: https://pre-commit.com/#install
+[pre-commit.ci run]: https://results.pre-commit.ci/repo/github/493877605
+
+## Snakemake formatting
+
+We use [`snakefmt`](https://github.com/snakemake/snakefmt) to ensure consistency in style across Snakemake files in this project.
+
+### Installing
+
+- Using mamba/bioconda:
+
+```bash
+mamba install -c bioconda snakefmt
+```
+
+- Using pip:
+
+```bash
+pip install snakefmt
+```
+
+### IDE-independent
+
+1. Check for styling issues with `snakefmt --check .`
+1. Automatically fix styling issues with `snakefmt .`
+
+### Using VSCode extension
+
+1. Install the [VSCode extension](https://marketplace.visualstudio.com/items?itemName=tfehlmann.snakefmt)
+1. Check for styling issues by pressing `Ctrl+Shift+P` and selecting `snakefmt: Check`
+1. Automatically fix styling issues by pressing `Ctrl+Shift+P` and selecting `Format document`
diff --git a/README.md b/README.md
@@ -1,88 +1,12 @@
-# nextstrain.org/zika
+# Nextstrain repository for Zika virus
 
-This is the [Nextstrain](https://nextstrain.org) build for Zika, visible at
-[nextstrain.org/zika](https://nextstrain.org/zika).
+This repository contains two workflows for the analysis of Zika virus data:
 
-The build encompasses fetching data, preparing it for analysis, doing quality
-control, performing analyses, and saving the results in a format suitable for
-visualization (with [auspice][]).  This involves running components of
-Nextstrain such as [fauna][] and [augur][].
+- [`ingest/`](./ingest) - Download data from GenBank, clean and curate it, and upload it to S3
+- [`phylogenetic/`](./phylogenetic) - Make phylogenetic trees for nextstrain.org
 
-All Zika-specific steps and functionality for the Nextstrain pipeline should be
-housed in this repository.
+Each folder contains a README.md with more information.
 
-_This build requires Augur v6._
+## Documentation
 
-[![Build Status](https://github.com/nextstrain/zika/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/nextstrain/zika/actions/workflows/ci.yaml)
-
-## Usage
-
-If you're unfamiliar with Nextstrain builds, you may want to follow our
-[quickstart guide][] first and then come back here.
-
-There are two main ways to run & visualise the output from this build:
-
-The first, and easiest, way to run this pathogen build is using the [Nextstrain
-command-line tool][nextstrain-cli]:
-```
-nextstrain build . 
-nextstrain view auspice/
-```
-
-See the [nextstrain-cli README][] for how to install the `nextstrain` command.
-
-The second is to install augur & auspice using conda, following [these instructions](https://nextstrain.org/docs/getting-started/local-installation#install-augur--auspice-with-conda-recommended).
-The build may then be run via:
-```
-snakemake
-auspice --datasetDir auspice/
-```
-
-Build output goes into the directories `data/`, `results/` and `auspice/`.
-
-## Configuration
-
-Configuration takes place entirely with the `Snakefile`. This can be read top-to-bottom, each rule
-specifies its file inputs and output and also its parameters. There is little redirection and each
-rule should be able to be reasoned with on its own.
-
-
-## Input data
-
-This build starts by downloading sequences from
-https://data.nextstrain.org/files/zika/sequences.fasta.xz
-and metadata from
-https://data.nextstrain.org/files/zika/metadata.tsv.gz.
-These are publicly provisioned data by the Nextstrain team by pulling sequences
-from NCBI GenBank via ViPR and performing 
-[additional bespoke curation](https://github.com/nextstrain/fauna/blob/master/builds/ZIKA.md).
-
-Data from GenBank follows Open Data principles, such that we can make input data
-and intermediate files available for further analysis. Open Data is data that
-can be freely used, re-used and redistributed by anyone - subject only, at most,
-to the requirement to attribute and sharealike.
-
-We gratefully acknowledge the authors, originating and submitting laboratories
-of the genetic sequences and metadata for sharing their work in open databases.
-Please note that although data generators have generously shared data in an open
-fashion, that does not mean there should be free license to publish on this
-data. Data generators should be cited where possible and collaborations should
-be sought in some circumstances. Please try to avoid scooping someone else's
-work. Reach out if uncertain. Authors, paper references (where available) and
-links to GenBank entries are provided in the metadata file.
-
-A faster build process can be run working from example data by copying over
-sequences and metadata from `example_data/` to `data/` via:
-```
-mkdir -p data/
-cp -v example_data/* data/
-```
-
-[Nextstrain]: https://nextstrain.org
-[fauna]: https://github.com/nextstrain/fauna
-[augur]: https://github.com/nextstrain/augur
-[auspice]: https://github.com/nextstrain/auspice
-[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
-[nextstrain-cli]: https://github.com/nextstrain/cli
-[nextstrain-cli README]: https://github.com/nextstrain/cli/blob/master/README.md
-[quickstart guide]: https://nextstrain.org/docs/getting-started/quickstart
+- [Contributor documentation](./CONTRIBUTING.md)