- BigQuery query job labeling for collect and write operations. Labels are passed via the `job_labels` dict argument in `DatasetConfiguration` and `DatasetManager`.
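  A minimal sketch of how the new `job_labels` argument might be used; the import path, the remaining constructor arguments, and the `create_dataset_manager()` call are assumptions that may differ in your BigFlow version:

  ```python
  # Sketch only: `job_labels` comes from the changelog entry above; the import
  # path and the other constructor arguments are assumptions.
  from bigflow.bigquery import DatasetConfig

  dataset_config = DatasetConfig(
      env='dev',
      project_id='my-gcp-project',   # hypothetical project
      dataset_name='analytics',      # hypothetical dataset
      internal_tables=['events'],
      job_labels={'team': 'data-platform', 'workflow': 'events'},  # attached to collect/write query jobs
  )

  dataset_manager = dataset_config.create_dataset_manager()
  ```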
- Switched from Google Container Registry to Artifact Registry. Made `-r`/`--docker-repository` common for all deploy commands. Build and deploy commands authenticate to the Docker repository taken from `deployment_config.py` or CLI arguments, instead of the hardcoded `https://eu.gcr.io`.
- Bumped basic dependencies: Apache Beam 2.48.0, google-cloud-bigtable 2.17.0, google-cloud-language 2.10.0, google-cloud-storage 2.11.2, among others (#374).
- Added the `env_variable` argument to `bigflow.Workflow`, which allows changing the name of the variable used to obtain the environment name (#365).
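  A minimal sketch of the new argument; `env_variable` is taken from the changelog entry above, while the remaining `Workflow` arguments and the variable name are illustrative assumptions:

  ```python
  # Sketch only: `env_variable` comes from the changelog entry above; the other
  # arguments and the variable name are assumptions.
  import bigflow

  workflow = bigflow.Workflow(
      workflow_id='my_workflow',
      definition=[],                  # your jobs go here
      env_variable='MY_PROJECT_ENV',  # read the environment name from this variable
  )
  ```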
- Fixed compatibility issues with Cloud Composer 2.x and Airflow 2.x
- Cloud Composer 2.0.x is not properly supported; please use either Composer 1.x or 2.1+
- BigFlow CLI commands no longer fail on additional unknown parameters. This allows passing extra parameters to BigFlow jobs.
- Bumped dependencies of the main libraries (e.g. Apache Beam to 2.45 and the BigQuery client library to 3.6.0). This enables compatibility with Apple M1 MacBooks.
- Requires Python 3.8
- Enabled vault endpoint TLS certificate verification by default for the `bf build` and `bf deploy` commands. This fixes a MITM vulnerability. Kudos to Konstantin Weddige for reporting it.
- Default vault endpoint TLS certificate verification for `bf build` and `bf deploy` may fail in some environments. Use the `-vev`/`--vault-endpoint-verify` option to disable it or to provide a path to custom trusted certificates or CA certificates. Disabling verification makes execution vulnerable to MITM attacks and is discouraged; do it only when justified and in trusted environments. See https://requests.readthedocs.io/en/latest/user/advanced/#ssl-cert-verification for details.
- Added two more parameters to `KubernetesPodOperator` that are required since Composer 2.1.0
- Bumped MarkupSafe to `>2.1.0` (avoiding the broken 2.1.0 release)
- Bumped Jinja to `>=3,<4`
- Fixed the DAG builder issue introduced in 1.5.1; the builder now produces DAGs compatible with either (Airflow 1.x + Composer 1.x) or (Airflow 2.x + Composer 2.x)
Broken! The DAG builder produces DAGs incompatible with (Airflow 1.x + Composer 1.x); fixed in 1.5.2.
- Composer 2.0 support: generated DAGs use the `composer-user-workloads` namespace when running on Composer 2.x, which fixes the problem with inheriting the Composer service account
- Pinned grpcio-status to `<=1.48.2` to avoid pip-compile problems with the protobuf package
- Changed the Docker image caching implementation: `BUILDKIT_INLINE_CACHE=1` is used only if cache properties are set
- Always install `typing-extensions>=3.7` to avoid dependency clashes
- Deprecated the `log` and `dataproc` extras
- The `base_frozen` extras with frozen base requirements
- More type hints
- Made exporting the image to a tar file optional
- `bf build` arguments validation
- Fixed the broken MarkupSafe package version
- Optional support for 'pytest' as the testing framework
- Labels support for datasets and tables in `DatasetConfig` and `DatasetManager`
- Check if the Docker image was pushed before deploying Airflow DAGs
- Propagate 'env' to bigflow jobs
- Automatically create and push the `:latest` Docker tag
- Changes in the build toolchain: optimizations, better logging, optional `setup.py`
- Tool for syncing requirements with Dataflow's preinstalled Python dependencies.
- The schema for 'dirty' versions (not on a git tag, or with local changes) was changed. It now includes the git commit and git workdir hash instead of a random suffix.
- Don't delete intermediate docker layers after build.
- Dockerfile template was changed (does not affect existing projects).
- Integration with `pip-tools`
- Configurable timeout for jobs: `Job.execution_timeout` (see the sketch below)
- Flag `Job.depends_on_past` to ignore errors from the previous job run
- Tools/base classes to simplify writing e2e tests for BigQuery
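  A minimal sketch of how these two job attributes might be set on a BigFlow job class; the attribute names come from the entries above, but the base class, the timeout unit, and the remaining fields are assumptions:

  ```python
  # Sketch only: `execution_timeout` and `depends_on_past` are the attributes
  # named in the changelog; whether the timeout is given in seconds or as a
  # timedelta is an assumption; check your BigFlow version.
  class NightlyLoadJob:
      id = 'nightly_load'
      execution_timeout = 3600   # fail the Airflow task if the job runs longer (assumed to be seconds)
      depends_on_past = False    # don't block this run on a failed previous run

      def execute(self, context):
          print(f'Running nightly load for {context.runtime}')
  ```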
- Added `pyproject.toml` to new projects
- Project configuration moved to `setup.py`
- The same `setup.py` is used by Beam to build tarballs
- Deprecated some functions in `bigflow.resources` and `bigflow.commons`
- BigFlow uses version ranges for dependencies
- Dataflow jobs start much faster
- The Airflow task waits until the Dataflow job is finished
- Fixed `Job.retry_count` and `Job.retry_pause_sec`
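  A minimal sketch of setting the retry attributes on a job; the attribute names come from the entry above, while the job class itself is a hypothetical example:

  ```python
  # Sketch only: `retry_count` and `retry_pause_sec` are the attributes named
  # in the changelog; the job class is a hypothetical example.
  class FlakyApiJob:
      id = 'flaky_api_job'
      retry_count = 3         # retry the Airflow task up to 3 times
      retry_pause_sec = 60    # wait 60 seconds between retries

      def execute(self, context):
          ...  # call the external API here
  ```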
- Initial release