Releases: googlegenomics/gcp-variant-transforms

Release v0.5.0

18 Oct 20:32
ccb26ee

Main changes since the last release:

  • BigQuery to VCF (alpha release): Generate VCF files from any BigQuery table generated using Variant Transforms. This can be useful when working with tools that only operate on VCF files. The available options cover everything from small files (e.g., by specifying genomic regions of interest or a subset of samples) to very large ones (e.g., exporting an entire BigQuery table as a VCF file).
  • Annotation type inference: By default, all annotation fields are loaded as strings. Simply add --infer_annotation_types and the pipeline will automatically infer field types from the annotation content.
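
The idea behind --infer_annotation_types can be illustrated with a minimal sketch. It assumes each value is tried as an integer, then as a float, falling back to String, with a field widened from Integer to Float when needed. The function names are hypothetical; this is not the pipeline's actual inference code.

```python
from typing import Iterable, Optional


def _is_int(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_float(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_type(values: Iterable[str]) -> str:
    """Return the narrowest of Integer, Float or String that fits every value.

    Empty strings are treated as missing and skipped. Hypothetical helper,
    not the pipeline's actual inference code.
    """
    inferred: Optional[str] = None  # None: no non-empty value seen yet
    for value in values:
        if value == "":
            continue
        if _is_int(value):
            candidate = "Integer"
        elif _is_float(value):
            candidate = "Float"
        else:
            return "String"  # one non-numeric value forces String
        # Widen Integer to Float when needed; never narrow Float back.
        if inferred is None or (inferred == "Integer" and candidate == "Float"):
            inferred = candidate
    return inferred or "String"  # no data at all: keep the String default


print(infer_type(["1", "2.5", ""]))  # Float
```

A single pass with early exit on the first non-numeric value keeps the check cheap, which matters when scanning annotation values across many records.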

Release v0.4.2

30 Jul 22:45
9c6b229

A patch release consisting mostly of usability improvements: the pipeline now fails on unrecognized or incompatible flags and logs additional information when a failure occurs. It also includes a small fix for --optimize_for_large_inputs when partitioning is not requested.

Release v0.4.1

16 Jul 18:30
731031b

This is a patch release that makes the following improvements:

  • The validator/preprocessor tool now catches more mismatch issues between VCF headers and variant records, e.g., type and Number mismatches (Issue #258).
  • Added support for running VEP on GRCh37-based VCF files (Issue #201).
  • Integrated a fix for the HTTP request retry logic in google-api-python-client (details).
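
The Number mismatch check mentioned above can be pictured with a minimal sketch. It assumes comma-separated field values and implements only the Number=A (one value per alternate allele), Number=R (one value per allele, including REF) and fixed-count cases from the VCF specification; Number=G, Number=. and flags are skipped. The function names are hypothetical, not the validator's API.

```python
from typing import Optional


def expected_count(number: str, num_alternates: int) -> Optional[int]:
    """Expected number of values for a field given its header Number.

    Returns None when this sketch does not check the count.
    """
    if number == "A":
        return num_alternates  # one value per alternate allele
    if number == "R":
        return num_alternates + 1  # one value per allele, including REF
    if number.isdigit() and int(number) > 0:
        return int(number)  # fixed count, e.g. Number=1
    return None  # '.', 'G' and Number=0 (flags) are not checked here


def has_number_mismatch(number: str, raw_value: str, num_alternates: int) -> bool:
    """True if a comma-separated field value has the wrong number of entries."""
    expected = expected_count(number, num_alternates)
    if expected is None:
        return False
    return len(raw_value.split(",")) != expected


# Header declares Number=A, but the record has two alternates and one value.
print(has_number_mismatch("A", "0.5", num_alternates=2))  # True
```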

Release v0.4.0

11 Jun 19:49
cf34b99

Main changes since the last release:

  • Native annotation support: Annotate and import VCF files to BigQuery through a single command that uses our newly published VEP v91 Docker image and GRCh38 cache! Check out the documentation for more details.
  • Native partitioning support: Partition the BigQuery output into any number of (configurable) smaller tables based on chromosome and/or regions. This feature can reduce query cost for large tables, especially in applications where particular regions are queried more heavily than others.
  • Automatic BigQuery schema update on appends: Append data to existing tables even if the BigQuery schema differs (but is still compatible); in such cases the table schema is updated automatically.
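
Partitioning by chromosome and/or region can be pictured as a first-match lookup from a variant's position to an output table. The sketch below assumes each partition is a named list of regions and that variants matching no region go to a catch-all table; Region, Partition, assign_partition and the "residual" name are all hypothetical, and the pipeline's real configuration format is described in its documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    chromosome: str
    start: int = 0  # inclusive
    end: Optional[int] = None  # exclusive; None means "to the end of the chromosome"

    def contains(self, chromosome: str, position: int) -> bool:
        if chromosome != self.chromosome:
            return False
        return position >= self.start and (self.end is None or position < self.end)


@dataclass
class Partition:
    name: str  # hypothetical output table suffix
    regions: List[Region]


def assign_partition(partitions: List[Partition], chromosome: str,
                     position: int, residual: str = "residual") -> str:
    """Return the name of the first partition containing the position."""
    for partition in partitions:
        if any(r.contains(chromosome, position) for r in partition.regions):
            return partition.name
    return residual  # nothing matched: hypothetical catch-all table


PARTITIONS = [
    Partition("chr1_head", [Region("chr1", 0, 1_000_000)]),
    Partition("chr2", [Region("chr2")]),  # whole chromosome
]
print(assign_partition(PARTITIONS, "chr1", 500))  # chr1_head
```

Checking partitions in order gives a deterministic answer when regions overlap; heavily queried regions can then live in small tables so queries scan less data.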

Release v0.3.0

09 May 21:05
5512e44

Main changes since the last release:

  • VCF validator/preprocessor: A lightweight tool for validating VCF files and checking for inconsistencies in the data before running the full VCF-to-BigQuery pipeline. Check out the documentation for more details.
  • Robustness improvements: Several new features make Variant Transforms more robust when handling malformed or incomplete data, such as supplying custom headers for parsing and more accurate header inference when headers are missing.
  • Performance improvements: Optimizations for merging variants and writing to BigQuery when loading very large inputs (>5TB, >30B variants).
  • Annotation enhancements (experimental): Added support for running VEP natively as part of the pipeline using a pre-built Docker image and cache files. Check out the documentation for more details.

Release v0.2.0

04 Apr 20:19

First release checkpoint of Variant Transforms! Main features of this release:

  • Highly scalable import of VCF files to BigQuery (500K+ files, TBs of data)
  • Robust import functions (see --infer_undefined_headers and --allow_incompatible_records)
  • Annotation support (experimental). Add --annotation_fields when running the pipeline.
  • See the README and the documents under the /docs folder for more details.

P.S. We're starting from 0.2.0 instead of 0.1.0 as the initial version at launch should have been 0.1.0.