Scheduled Vertex Pipelines

This repo contains a Terraform module for scheduling a Vertex Pipeline using Google Cloud Scheduler, without the need for a Cloud Function or other "glue code".

This module is available in the Datatonic Terraform Registry.
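A call to the module that sets only its required inputs might look like the sketch below. The `source` path, module name, project ID, bucket paths and regions are hypothetical placeholders, not values taken from this repo. Point `source` at the module's registry address or at a local checkout.

```hcl
module "scheduled_pipeline" {
  # Hypothetical source path; use the module's registry address or a local checkout.
  source = "./modules/scheduled-vertex-pipelines"

  project                  = "my-project"
  vertex_region            = "europe-west2"
  cloud_scheduler_region   = "europe-west2"
  cloud_scheduler_job_name = "my-pipeline-schedule"

  # Compiled KFP pipeline spec and the pipeline's root output directory.
  pipeline_spec_path   = "gs://my-bucket/pipelines/pipeline.json"
  gcs_output_directory = "gs://my-bucket/pipeline-root/"

  # unix-cron: every day at 06:00, interpreted in time_zone (default "UTC").
  schedule = "0 6 * * *"
}
```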

Examples

See the examples directory.

Limitations

Pipeline job names

Vertex Pipeline jobs created using the Python SDK follow the format <pipeline name>-<timestamp>. This naming is implemented in the SDK itself, not by the API. Pipeline jobs created by this Terraform module instead use just the numeric ID as the job name, for two reasons:

1. The ability to specify the job name is only available in the gRPC API, not the HTTP API, and Cloud Scheduler jobs can only target the HTTP API.
2. Even if it were available, Cloud Scheduler jobs cannot dynamically alter the HTTP payload (for example, to insert a timestamp), so any job name set in the payload would be identical on every run.
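Terraform cannot supply a per-run timestamp either. In this hypothetical sketch, Terraform's built-in `timestamp()` function is evaluated when Terraform runs, not when Cloud Scheduler triggers the job. The resulting name would therefore be frozen into the job's static configuration and reused on every scheduled run. It would also change on every `terraform apply`, producing a perpetual diff. The `my-pipeline` prefix is a placeholder.

```hcl
locals {
  # Hypothetical: timestamp() is evaluated at Terraform run time,
  # not each time Cloud Scheduler fires, so this value is fixed per apply.
  frozen_job_name = "my-pipeline-${formatdate("YYYYMMDDhhmmss", timestamp())}"
}

output "frozen_job_name" {
  # Every scheduled run between two applies would reuse this same string.
  value = local.frozen_job_name
}
```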

Caching

Using the Python SDK, you can override the caching behaviour of Vertex Pipeline steps when submitting a run. This feature is not available in the HTTP API used by this module. Instead, specify the caching behaviour in the pipeline definition itself.

Development

Local setup

1. Install pre-commit.
2. Install the pre-commit hooks: `pre-commit install`

README

The README file is autogenerated using terraform-docs. This is done when you create a pull request (or push to an existing PR).

You can customise the template (including this text) in .github/workflows/pr-checks.yml.

Requirements

Name	Version
google	>= 4.0.0
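A root configuration that calls this module could declare a matching provider constraint as shown below. `hashicorp/google` is the standard source address for the Google provider. The block is a sketch, not a file from this repo.

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.0.0" # matches this module's stated requirement
    }
  }
}
```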

Providers

Name	Version
google	>= 4.0.0

Modules

No modules.

Resources

Name	Type
google_cloud_scheduler_job.job	resource
google_compute_default_service_account.default	data source
google_storage_bucket_object_content.pipeline_spec	data source

Inputs

Name	Description	Type	Default	Required
cloud_scheduler_job_attempt_deadline	The deadline for Cloud Scheduler job attempts. If the request handler does not respond by this deadline then the request is cancelled and the attempt is marked as a DEADLINE_EXCEEDED failure. The failed attempt can be viewed in execution logs. Cloud Scheduler will retry the job according to the RetryConfig. The allowed duration for this deadline is between 15 seconds and 30 minutes. A duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s"	`string`	`"320s"`	no
cloud_scheduler_job_description	A human-readable description for the Cloud Scheduler job. This string must not contain more than 500 characters.	`string`	`null`	no
cloud_scheduler_job_name	The name of the Cloud Scheduler job.	`string`	n/a	yes
cloud_scheduler_region	The GCP region where the Cloud Scheduler job should be executed.	`string`	n/a	yes
cloud_scheduler_retry_count	The number of attempts that the system will make to run a Cloud Scheduler job using the exponential backoff procedure described by maxDoublings. Values greater than 5 and negative values are not allowed.	`number`	`1`	no
cloud_scheduler_sa_email	Service account email to be used for executing the Cloud Scheduler job. The service account must be within the same project as the job.	`string`	`null`	no
display_name	The display name of the Pipeline. The name can be up to 128 characters long and can consist of any UTF-8 characters.	`string`	`null`	no
gcs_output_directory	Required. A path in a Cloud Storage bucket, which will be treated as the root output directory of the pipeline. It is used by the system to generate the paths of output artifacts. The artifact paths are generated with a sub-path pattern {job_id}/{taskId}/{output_key} under the specified output directory. The service account specified in this pipeline must have the storage.objects.get and storage.objects.create permissions for this bucket.	`string`	n/a	yes
kms_key_name	The Cloud KMS resource identifier of the customer managed encryption key used to protect a resource. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as the Vertex Pipeline execution.	`string`	`null`	no
labels	The labels with user-defined metadata to organize PipelineJob. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.	`map(string)`	`{}`	no
network	The full name of the Compute Engine network to which the Pipeline Job's workload should be peered, of the form projects/{project}/global/networks/{network}, where {project} is a project number (for example, 12345) and {network} is a network name. For example: projects/12345/global/networks/myVPC. Private services access must already be configured for the network. The pipeline job applies the network configuration to any GCP resources it launches, such as Vertex AI Training or Dataflow jobs. If left unspecified, the workload is not peered with any network.	`string`	`null`	no
parameter_values	The runtime parameters of the PipelineJob. The parameters will be passed into PipelineJob.pipeline_spec to replace the placeholders at runtime. This field is used by pipelines built using PipelineJob.pipeline_spec.schema_version 2.1.0, such as pipelines built using Kubeflow Pipelines SDK 1.9 or higher and the v2 DSL.	`map(any)`	`null`	no
parameters	Deprecated. Use parameter_values instead. The runtime parameters of the PipelineJob. The parameters will be passed into PipelineJob.pipeline_spec to replace the placeholders at runtime. This field is used by pipelines built using PipelineJob.pipeline_spec.schema_version 2.0.0 or lower, such as pipelines built using Kubeflow Pipelines SDK 1.8 or lower.	`map(any)`	`null`	no
pipeline_spec_path	Path to the KFP pipeline spec file (YAML or JSON). This can be a local file, GCS path, or Artifact Registry path.	`string`	n/a	yes
project	The GCP project ID where the Cloud Scheduler job and Vertex Pipeline should be deployed.	`string`	n/a	yes
schedule	The schedule on which the job will be executed, in unix-cron format (for example, "0 9 * * 1" runs every Monday at 09:00 in time_zone).	`string`	n/a	yes
time_zone	Specifies the time zone to be used in interpreting schedule. The value of this field must be a time zone name from the tz database.	`string`	`"UTC"`	no
vertex_region	The GCP region where the Vertex Pipeline should be executed.	`string`	n/a	yes
vertex_service_account_email	The service account that the pipeline workload runs as. If not specified, the Compute Engine default service account in the project will be used. See https://cloud.google.com/compute/docs/access/service-accounts#default_service_account. Users starting the pipeline must have the iam.serviceAccounts.actAs permission on this service account.	`string`	`null`	no
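The sketch below also sets several of the optional inputs listed above. All values are hypothetical placeholders. This includes the source path, the pipeline parameter names `learning_rate` and `epochs`, the service account emails and the label values. Parameter names must match the inputs declared in your own pipeline.

```hcl
module "scheduled_pipeline" {
  # Hypothetical source path; use the module's registry address or a local checkout.
  source = "./modules/scheduled-vertex-pipelines"

  project                         = "my-project"
  vertex_region                   = "europe-west2"
  cloud_scheduler_region          = "europe-west2"
  cloud_scheduler_job_name        = "training-pipeline-weekly"
  cloud_scheduler_job_description = "Weekly run of the training pipeline"

  pipeline_spec_path   = "gs://my-bucket/pipelines/training.json"
  gcs_output_directory = "gs://my-bucket/pipeline-root/"
  display_name         = "training-pipeline"

  # Every Monday at 09:00, interpreted in the given tz database time zone.
  schedule  = "0 9 * * 1"
  time_zone = "Europe/London"

  # Runtime parameters for pipelines using schema version 2.1.0
  # (KFP SDK 1.9+ / v2 DSL). Names must match your pipeline's inputs.
  parameter_values = {
    learning_rate = 0.01
    epochs        = 10
  }

  labels = {
    team = "data-science"
  }

  # Service account used to execute the Cloud Scheduler job (same project as the job).
  cloud_scheduler_sa_email = "scheduler@my-project.iam.gserviceaccount.com"
  # Service account the pipeline workload runs as (default: Compute Engine default SA).
  vertex_service_account_email = "pipeline-runner@my-project.iam.gserviceaccount.com"

  # Retry count may not exceed 5; the deadline must be between 15s and 30 minutes.
  cloud_scheduler_retry_count          = 2
  cloud_scheduler_job_attempt_deadline = "600s"
}
```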

Outputs

Name	Description
id	An identifier for the Cloud Scheduler job resource with format projects/{{project}}/locations/{{region}}/jobs/{{name}}
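A root configuration can re-export this output. The sketch below assumes the module was instantiated under the hypothetical name `scheduled_pipeline`.

```hcl
output "scheduler_job_id" {
  description = "Full resource ID of the Cloud Scheduler job"
  # "scheduled_pipeline" is a hypothetical module block name.
  value = module.scheduled_pipeline.id
}
```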
