This repo contains a Terraform module for scheduling a Vertex Pipeline using Google Cloud Scheduler, without the need for a Cloud Function or other "glue code".
This module is available in the Datatonic Terraform Registry.
Check out the examples directory for usage examples.
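A minimal sketch of calling the module follows. It sets every required input plus a few optional ones (`display_name`, `parameter_values`, `time_zone`). The `source` path, project ID, bucket, regions, job names and pipeline parameter names are hypothetical placeholders, not values taken from this repo. The examples directory remains the authoritative reference.

```hcl
module "scheduled_pipeline" {
  # Hypothetical local path; replace with this module's registry address or Git URL.
  source = "../.."

  project = "my-gcp-project" # hypothetical project ID

  # Compiled KFP pipeline spec and the pipeline root for output artifacts.
  pipeline_spec_path   = "gs://my-bucket/pipelines/training.json" # hypothetical
  gcs_output_directory = "gs://my-bucket/pipeline-root"           # hypothetical
  vertex_region        = "us-central1"
  display_name         = "training-pipeline"

  # Runtime parameters for pipelines compiled with KFP SDK 1.9+ / the v2 DSL.
  # Parameter names are hypothetical and must match your pipeline's inputs.
  parameter_values = {
    dataset_uri = "bq://my-gcp-project.my_dataset.my_table"
    model_name  = "churn-model"
  }

  # Cloud Scheduler settings: run every day at 06:00 London time.
  cloud_scheduler_job_name = "training-pipeline-daily"
  cloud_scheduler_region   = "us-central1"
  schedule                 = "0 6 * * *"
  time_zone                = "Europe/London"
}
```

Because `parameter_values` has type `map(any)`, Terraform converts all values in a single map to one common type, so mixing numbers and strings would turn the numbers into strings. The sketch uses only string values for that reason.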
Vertex Pipeline jobs created using the Python SDK follow the format `<pipeline name>-<timestamp>`. This is implemented in the SDK itself (not by the API).
Pipeline jobs created using this Terraform module instead have only the numeric ID as the job name. This is for two reasons:
- The ability to specify the job name is only available in the gRPC API, not the HTTP API, and Cloud Scheduler jobs can only target the HTTP API.
- Even if it were available, Cloud Scheduler jobs cannot dynamically alter the HTTP payload based on a timestamp.
Using the SDK, you can override the caching behaviour for Vertex Pipeline steps. This feature is not available in the HTTP API used by this module. Instead, you can specify the caching behaviour in your actual pipeline definition.
For local development:
- Install pre-commit
- Install the pre-commit hooks: `pre-commit install`
The README file is autogenerated using terraform-docs. This is done when you create a pull request (or push to an existing PR). You can customise the template (including this text, for example) in `.github/workflows/pr-checks.yml`.
Requirements

Name | Version |
---|---|
google | >= 4.0.0 |

Providers

Name | Version |
---|---|
google | >= 4.0.0 |
Modules

No modules.
Resources

Name | Type |
---|---|
google_cloud_scheduler_job.job | resource |
google_compute_default_service_account.default | data source |
google_storage_bucket_object_content.pipeline_spec | data source |
Inputs

Name | Description | Type | Default | Required |
---|---|---|---|---|
cloud_scheduler_job_attempt_deadline | The deadline for Cloud Scheduler job attempts. If the request handler does not respond by this deadline, the request is cancelled and the attempt is marked as a DEADLINE_EXCEEDED failure. The failed attempt can be viewed in execution logs. Cloud Scheduler will retry the job according to its retry configuration (see cloud_scheduler_retry_count). The allowed duration for this deadline is between 15 seconds and 30 minutes. Format: a duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s" | string | "320s" | no |
cloud_scheduler_job_description | A human-readable description for the Cloud Scheduler job. This string must not contain more than 500 characters. | string | null | no |
cloud_scheduler_job_name | The name of the Cloud Scheduler job. | string | n/a | yes |
cloud_scheduler_region | The GCP region where the Cloud Scheduler job should be executed. | string | n/a | yes |
cloud_scheduler_retry_count | The number of attempts that the system will make to run a Cloud Scheduler job using the exponential backoff procedure described by maxDoublings. Values greater than 5 and negative values are not allowed. | number | 1 | no |
cloud_scheduler_sa_email | Service account email to be used for executing the Cloud Scheduler job. The service account must be within the same project as the job. | string | null | no |
display_name | The display name of the Pipeline. The name can be up to 128 characters long and can consist of any UTF-8 characters. | string | null | no |
gcs_output_directory | A path in a Cloud Storage bucket, which will be treated as the root output directory of the pipeline. It is used by the system to generate the paths of output artifacts. The artifact paths are generated with a sub-path pattern {job_id}/{taskId}/{output_key} under the specified output directory. The service account specified in this pipeline must have the storage.objects.get and storage.objects.create permissions for this bucket. | string | n/a | yes |
kms_key_name | The Cloud KMS resource identifier of the customer-managed encryption key used to protect a resource. Has the form: projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key. The key needs to be in the same region as the Vertex Pipeline execution. | string | null | no |
labels | The labels with user-defined metadata to organize the PipelineJob. Label keys and values can be no longer than 64 characters (Unicode codepoints) and can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels. | map(string) | {} | no |
network | The full name of the Compute Engine network to which the Pipeline Job's workload should be peered, in the form projects/{project}/global/networks/{network}, where {project} is a project number (as in 12345) and {network} is a network name. For example, projects/12345/global/networks/myVPC. Private services access must already be configured for the network. If set, the pipeline job applies the network configuration to the GCP resources it launches, such as Vertex AI Training or Dataflow jobs. If left unspecified, the workload is not peered with any network. | string | null | no |
parameter_values | The runtime parameters of the PipelineJob. The parameters will be passed into PipelineJob.pipeline_spec to replace the placeholders at runtime. This field is used by pipelines built using PipelineJob.pipeline_spec.schema_version 2.1.0, such as pipelines built using Kubeflow Pipelines SDK 1.9 or higher and the v2 DSL. | map(any) | null | no |
parameters | Deprecated; use parameter_values instead. The runtime parameters of the PipelineJob. The parameters will be passed into PipelineJob.pipeline_spec to replace the placeholders at runtime. This field is used by pipelines built using PipelineJob.pipeline_spec.schema_version 2.0.0 or lower, such as pipelines built using Kubeflow Pipelines SDK 1.8 or lower. | map(any) | null | no |
pipeline_spec_path | Path to the KFP pipeline spec file (YAML or JSON). This can be a local file, GCS path, or Artifact Registry path. | string | n/a | yes |
project | The GCP project ID where the Cloud Scheduler job and Vertex Pipeline should be deployed. | string | n/a | yes |
schedule | Describes the schedule on which the job will be executed, typically as a unix-cron expression (e.g. "0 6 * * *" for 06:00 every day). | string | n/a | yes |
time_zone | Specifies the time zone to be used in interpreting the schedule. The value of this field must be a time zone name from the tz database. | string | "UTC" | no |
vertex_region | The GCP region where the Vertex Pipeline should be executed. | string | n/a | yes |
vertex_service_account_email | The service account that the pipeline workload runs as. If not specified, the Compute Engine default service account in the project will be used. See https://cloud.google.com/compute/docs/access/service-accounts#default_service_account. Users starting the pipeline must have the iam.serviceAccounts.actAs permission on this service account. | string | null | no |
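The inputs above imply some IAM setup: the pipeline service account needs storage.objects.get and storage.objects.create on the output bucket, and the account that starts the pipeline (here, the Cloud Scheduler service account) needs iam.serviceAccounts.actAs on the pipeline service account. The sketch below grants these with the Google provider. The service account IDs, project and bucket are hypothetical, and the choice of predefined roles (roles/storage.objectAdmin, roles/iam.serviceAccountUser, roles/aiplatform.user) is an assumption made for brevity; narrower custom roles may suit your project better.

```hcl
# Hypothetical service accounts; IDs and project are placeholders.
resource "google_service_account" "pipeline" {
  project      = "my-gcp-project"
  account_id   = "vertex-pipeline-runner"
  display_name = "Vertex Pipeline workload"
}

resource "google_service_account" "scheduler" {
  project      = "my-gcp-project"
  account_id   = "pipeline-scheduler"
  display_name = "Cloud Scheduler caller"
}

# Pipeline SA reads and writes artifacts under gcs_output_directory.
# roles/storage.objectAdmin includes storage.objects.get and storage.objects.create.
resource "google_storage_bucket_iam_member" "pipeline_output" {
  bucket = "my-bucket" # hypothetical bucket containing gcs_output_directory
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${google_service_account.pipeline.email}"
}

# The scheduler SA starts the pipeline, so it needs iam.serviceAccounts.actAs
# on the pipeline SA; roles/iam.serviceAccountUser includes that permission.
resource "google_service_account_iam_member" "scheduler_act_as_pipeline" {
  service_account_id = google_service_account.pipeline.name
  role               = "roles/iam.serviceAccountUser"
  member             = "serviceAccount:${google_service_account.scheduler.email}"
}

# Assumption: the scheduler SA also needs permission to create pipeline jobs.
resource "google_project_iam_member" "scheduler_vertex_user" {
  project = "my-gcp-project"
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.scheduler.email}"
}
```

These accounts would then be passed to the module as `vertex_service_account_email = google_service_account.pipeline.email` and `cloud_scheduler_sa_email = google_service_account.scheduler.email`.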
Outputs

Name | Description |
---|---|
id | An identifier for the Cloud Scheduler job resource with format projects/{{project}}/locations/{{region}}/jobs/{{name}} |
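The `id` output can be re-exported from a root module or referenced by other resources. In this sketch, the module label `scheduled_pipeline` is hypothetical and must match whatever name you gave your own module block.

```hcl
# Re-export the Cloud Scheduler job ID from a root module.
# "scheduled_pipeline" is a hypothetical module label.
output "pipeline_schedule_id" {
  description = "Full resource ID of the Cloud Scheduler job that triggers the pipeline."
  value       = module.scheduled_pipeline.id
}
```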