Commit

minor fixups
bsweger committed Nov 19, 2024
1 parent 04c4347 commit cc26fc9
Showing 3 changed files with 13 additions and 12 deletions.
15 changes: 8 additions & 7 deletions README.md
@@ -43,7 +43,7 @@ files as they existing on this date (defaults to the current UTC datetime)

## Accessing sequence data

Each CladeTime object has a link to the full set of Nextstrain's SARS-Cov-2
Each `CladeTime` object has a link to the full set of Nextstrain's SARS-Cov-2
genomic sequences as they existed on the `sequence_as_of` date. This data
is in .fasta format, and most users won't need to download it directly.

@@ -57,12 +57,12 @@ https://nextstrain-data.s3.amazonaws.com/files/ncov/open/sequences.fasta.xz?vers
More interesting to most users will be the [metadata that describes each
sequence](https://docs.nextstrain.org/projects/ncov/en/latest/reference/metadata-fields.html).

The `sequence_metadata` attribute of a CladeTime object is a Polars LazyFrame
The `sequence_metadata` attribute of a `CladeTime` object is a Polars LazyFrame
that points to a copy of Nextstrain's sequence metadata.

You can apply your own filters and transformations to the LazyFrame, but
it's a good idea to start with CladeTime's built-in filter that removes
non-US and non-human sequences from the metadata.
it's a good idea to start with the built-in `filter_metadata` function that
removes non-US and non-human sequences from the metadata.

A `collect()` operation will return the filtered metadata as an in-memory
Polars DataFrame.
@@ -107,8 +107,9 @@ You may want to assign sequence clades using a reference tree from a past date.
This feature is helpful when creating "source of truth" data to evaluate
models that predict clade proportions:

- use the `tree_as_of` parameter when creating a `CladeTime` object
- create a `CladeTime` object using the `tree_as_of` parameter
- filter the sequence metadata to include only the sequences you want to assign
- pass the filtered metadata to the `assign_clades` method

CladeTime's `assign_clades` method returns two Polars LazyFrames:

@@ -172,10 +173,10 @@ shape: (5, 5)

## Reproducibility

CladeTime objects have an `ncov_metadata` property with information needed to
`CladeTime` objects have an `ncov_metadata` property with information needed to
reproduce the clade assignments in the object's sequence metadata.

In the example below, the `ncov_metadata` property shows that the
In the example below, `ncov_metadata` shows that the
[Nextclade dataset](https://docs.nextstrain.org/projects/nextclade/en/stable/user/datasets.html)
used for clade assignment on 2024-09-22 was `2024-07-17--12-57-03Z`.

4 changes: 3 additions & 1 deletion src/cladetime/cladetime.py
@@ -218,7 +218,9 @@ def assign_clades(self, sequence_metadata: pl.LazyFrame, output_file: str | None
For each sequence in a sequence file (.fasta), assign a Nextstrain
clade using the Nextclade reference tree that corresponds to the
tree_as_of date.
tree_as_of date. The earliest available tree_as_of date is 2024-08-01,
when Nextstrain began publishing the pipeline metadata that Cladetime
uses to retrieve past reference trees.
Parameters
----------
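
The `tree_as_of` lower bound added to the `assign_clades` docstring in this hunk can be expressed as a small date check. This is a sketch using only the standard library; `check_tree_as_of` and `MIN_TREE_AS_OF` are hypothetical names introduced for illustration and are not part of the Cladetime API.

```python
from datetime import date

# Earliest tree_as_of date, as stated in the docstring text in this hunk.
MIN_TREE_AS_OF = date(2024, 8, 1)


def check_tree_as_of(tree_as_of: str) -> date:
    """Hypothetical helper: parse a YYYY-MM-DD string and enforce the lower bound."""
    parsed = date.fromisoformat(tree_as_of)
    if parsed < MIN_TREE_AS_OF:
        raise ValueError(
            f"tree_as_of must be on or after {MIN_TREE_AS_OF.isoformat()}, got {tree_as_of}"
        )
    return parsed


print(check_tree_as_of("2024-09-22"))
```

A date earlier than 2024-08-01 raises `ValueError` here. How Cladetime itself handles such dates is not shown in this diff.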
6 changes: 2 additions & 4 deletions src/cladetime/sequence.py
@@ -191,13 +191,11 @@ def filter_metadata(
This function will filter out metadata rows with invalid state names or
date strings that cannot be cast to a Polars date format.
Example:
Example
--------
>>> from cladetime import CladeTime
>>> from cladetime.sequence import filter_metadata
Apply common filters to the sequence metadata of a CladeTime object:
>>>
>>> ct = CladeTime(seq_as_of="2024-10-15")
>>> ct = CladeTime(sequence_as_of="2024-10-15")
>>> filtered_metadata = filter_metadata(ct.sequence_metadata)
