Skip to content

triglav-dataflow/triglav-agent-bigquery

Repository files navigation

Triglav::Agent::Bigquery

Triglav Agent for BigQuery

Requirements

  • Ruby >= 2.3.0

Prerequisites

  • BigQuery view is not supported

Installation

Add this line to your application's Gemfile:

gem 'triglav-agent-bigquery'

And then execute:

$ bundle

Or install it yourself as:

$ gem install triglav-agent-bigquery

CLI

Usage: triglav-agent-bigquery [options]
    -c, --config VALUE               Config file (default: config.yml)
    -s, --status VALUE               Status stroage file (default: status.yml)
    -t, --token VALUE                Triglav access token storage file (default: token.yml)
        --dotenv                     Load environment variables from .env file (default: false)
    -h, --help                       help
        --log VALUE                  Log path (default: STDOUT)
        --log-level VALUE            Log level (default: info)

Run as:

TRIGLAV_ENV=development bundle exec triglav-agent-bigquery --dotenv -c config.yml

Configuration

Prepare config.yml as example/config.yml.

You can use erb template. You may load environment variables from .env file with --dotenv option as an example/example.env file shows.

serverengine section

You can specify any serverengine options at this section

triglav section

Specify triglav api url, and a credential to authenticate.

The access token obtained is stored into a token storage file (--token option).

bigquery section

This section is the special section for triglav-agent-bigquery.

  • monitor_interval: The interval to watch tables (number, default: 60)
  • connection_info: key-value pairs of bigquery connection info where keys are resource URI pattern in regular expression, and values are connection infomation
    • auth_method: Authentication method. Must be one of service_account, authorized_user (for oauth2), compute_engine, and application_default. Default obtains from credentials.
    • credentials_file: Credentials file path such as service account json.
    • credentials: Instead of credentials_file, you may pass json contents as a string

Specification of Resource URI

Resource URI must be a form of:

https://bigquery.cloud.google.com/table/#{project}:#{dataset}.#{table}

#{table} also accepts strftime formatted suffix such as

#{table}_%Y%m%d

and strftime formatted partition decorator for a partitioned table such as

#{table}$%Y%m%d

How it behaves

  1. Authenticate with triglav
  • Store the access token into the token storage file
  • Read the token from the token storage file next time
  • Refresh the access token if it is expired
  1. Repeat followings in monitor_interval seconds:
  2. Obtain resource (table) lists of the specified prefix (keys of connection_info) from triglav.
  3. Connect to bigquery with an appropriate connection info for a resource uri, and find tables which are newer than last check.
  4. Store checking information into the status storage file for the next time check.

Development

Prepare

./prepare.sh

Edit .env or config.yml file directly.

Start

Start up triglav api on localhost.

Run triglav-anget-bigquery as:

TRIGLAV_ENV=development bundle exec triglav-agent-bigquery --dotenv --debug -c example/config.yml

The debug mode with --debug option ignores the last_modified_time value in status file.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/triglav-dataflow/triglav-agent-bigquery. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

About

BigQuery agent for Triglav, data-driven workflow tool

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published