Releases: googlegenomics/gcp-variant-transforms
Release v0.5.0
Main changes since last release:
- BigQuery to VCF (alpha release): Generate VCF files from any BigQuery table generated using Variant Transforms. This can be useful when working with tools that only operate on VCF files. The available options enable generating small files (e.g. by specifying genomic regions of interest or a subset of samples) to very large files (e.g. exporting an entire BigQuery table as a VCF file).
- Annotation type inference: By default, all annotation fields are loaded as strings. Simply add --infer_annotation_types and the pipeline will automatically infer field types from the annotation content.
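To make the idea of inferring field types from annotation content concrete, here is a minimal sketch. It is not the pipeline's actual implementation: the function name `infer_type`, the returned type names, and the rule of treating empty strings and "." as missing are all assumptions made for illustration.

```python
def infer_type(values):
    """Guess a column type from raw annotation strings (illustrative only).

    Returns "INTEGER" if every non-missing value parses as int, "FLOAT" if
    every one parses as float, and "STRING" otherwise.
    """
    def parses(cast, value):
        try:
            cast(value)
            return True
        except ValueError:
            return False

    # Assumption: empty strings and "." stand for missing values.
    present = [v for v in values if v not in ("", ".")]
    if not present:
        return "STRING"  # nothing to infer from, so keep the default
    if all(parses(int, v) for v in present):
        return "INTEGER"
    if all(parses(float, v) for v in present):
        return "FLOAT"
    return "STRING"
```

Under these assumptions a field whose values are all integer-like would be typed as an integer, while one non-numeric value anywhere in the field falls back to the string default.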
Release v0.4.2
Patch release consisting mostly of usability improvements (failing on unrecognized or incompatible flags, and additional logging when failures occur). Also includes a small fix for --optimize_for_large_inputs when partitioning is not requested.
Release v0.4.1
This is a patch release that makes the following improvements:
- The validator/preprocessor tool now catches more mismatch issues between VCF headers and variant records, e.g., type and Number mismatches (Issue #258).
- Support for running VEP on GRCh37-based VCF files is added (Issue #201).
- A fix to the HTTP request retry logic of google-api-python-client is integrated.
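As an illustration of the kind of header/record mismatch mentioned above (type and Number mismatches), the following sketch checks one INFO value against its header definition. It is a simplified stand-in, not the validator's real code: `check_info_field` and the dictionary layout are hypothetical, and only fixed integer `Number` values and the `Integer`/`Float` types are checked.

```python
def check_info_field(header_def, raw_value):
    """Compare one INFO value with its header definition (illustrative only).

    header_def: dict with "Number" and "Type" strings, as they appear in a
    VCF ##INFO header line. raw_value: the comma-separated value text taken
    from a variant record. Returns a list of problem descriptions.
    """
    problems = []
    values = raw_value.split(",") if raw_value else []

    # Only fixed counts are checked; "A", "R", "G" and "." depend on the
    # record's alleles and genotypes and are skipped in this sketch.
    number = header_def["Number"]
    if number.isdigit() and int(number) > 0 and len(values) != int(number):
        problems.append(
            "Number mismatch: expected %s value(s), found %d" % (number, len(values))
        )

    cast = {"Integer": int, "Float": float}.get(header_def["Type"])
    if cast is not None:
        for value in values:
            if value == ".":  # missing value, nothing to check
                continue
            try:
                cast(value)
            except ValueError:
                problems.append(
                    "Type mismatch: %r is not %s" % (value, header_def["Type"])
                )
    return problems
```

An empty result means this particular field looks consistent with its header; a real validator would also need to handle allele-dependent counts, flags, and FORMAT fields.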
Release v0.4.0
Main changes since last release:
- Native annotation support: Annotate and import VCF files to BigQuery through a single command that uses our newly published VEP v91 docker image and GRCh38 cache! Check out the documentation for more details.
- Native partitioning support: Partition the BigQuery output into any number of (configurable) smaller tables based on chromosome and/or regions. This feature can be used to reduce query cost for large tables especially in applications where particular regions are more heavily queried than others.
- Automatic BigQuery schema update on appends: Append data to existing tables even if the BigQuery schema is different (but still compatible) and the schema is automatically updated in such cases.
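The partitioning feature described above routes each variant to one of several smaller tables by chromosome and/or region. The sketch below shows one way such routing could work. The region layout, the table suffixes, the `residual` catch-all, and the function name `table_suffix` are all hypothetical and do not reflect the tool's actual configuration format.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (table suffix, chromosome, start, end). The end is
# exclusive, and a bound of None means "unbounded on that side".
Region = Tuple[str, str, Optional[int], Optional[int]]

REGIONS: List[Region] = [
    ("chr1_part1", "chr1", 0, 100_000_000),
    ("chr1_part2", "chr1", 100_000_000, None),
    ("chrX", "chrX", None, None),
]


def table_suffix(chrom: str, pos: int,
                 regions: List[Region] = REGIONS,
                 default: str = "residual") -> str:
    """Return the suffix of the output table a variant would be written to."""
    for suffix, region_chrom, start, end in regions:
        if chrom != region_chrom:
            continue
        if (start is None or pos >= start) and (end is None or pos < end):
            return suffix
    # Variants outside every configured region go to a catch-all table.
    return default
```

With a layout like this, a query restricted to a heavily used region only has to scan that region's table, which is where the cost reduction comes from.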
Release v0.3.0
Main changes since last release:
- VCF validator/preprocessor: a lightweight tool that can be used to validate VCF files and check for inconsistencies in the data before running the full VCF to BigQuery pipeline. Check out the documentation for more details.
- Robustness improvements: several new features to enhance robustness of Variant Transforms when dealing with malformed/incomplete data such as setting custom headers when parsing and more accurate header inference in case of missing headers.
- Performance improvements: optimizations for merging variants and writing to BigQuery when loading very large inputs (>5TB, >30B variants).
- Annotation enhancements (experimental): added support for running VEP natively as part of the pipeline using a pre-built docker image and cache files. Check out the documentation for more details.
Release v0.2.0
First release checkpoint of Variant Transforms! Main features of this release:
- Highly scalable import of VCF files to BigQuery (500K+ files, TBs of data)
- Robust import functions (see --infer_undefined_headers and --allow_incompatible_records)
- Annotation support (experimental). Add --annotation_fields when running the pipeline.
- See README and documents under the /docs folder for more details.
P.S. We're starting from 0.2.0 instead of 0.1.0 because the initial version at launch should have been 0.1.0.