Deploy new gardener and k8s etl parser to prod #305

Open
gfr10598 opened this issue Aug 5, 2020 · 6 comments

Contributor

gfr10598 commented Aug 5, 2020

It looks like prod mostly runs in us-central rather than in an east region, so the new k8s cluster should probably be there too.

There is some documentation in the README.md file from January.

Steps:

  1. Create data-processing cluster, with appropriate networking options
  2. Create node-pools for etl and gardener.
  3. Add a Cloud Build trigger for etl tags prefixed with prod-.

autolabel bot added the review/triage label ("Team should review and assign priority") Aug 5, 2020
Contributor Author

gfr10598 commented Aug 5, 2020

Based on the info in README.md, added create-cluster.sh in a new branch. It has all the gcloud commands to set up the network, subnet, firewall rules, cluster, and node-pools.
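For readers without access to that branch, the setup order described above can be sketched roughly as follows. This is not the contents of create-cluster.sh: the network name, subnet range, firewall rule, and node counts are placeholder assumptions, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical sketch of the setup order; NOT the real create-cluster.sh.
set -eu

PROJECT="${PROJECT:-mlab-oti}"
REGION="${REGION:-us-central1}"
NETWORK="${NETWORK:-data-processing}"   # assumed network/subnet name
CLUSTER="${CLUSTER:-data-processing}"
RANGE="${RANGE:-10.100.0.0/16}"         # assumed placeholder CIDR
DRY_RUN="${DRY_RUN:-1}"

# run prints the command in dry-run mode, otherwise executes it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_all() {
  run gcloud compute networks create "$NETWORK" \
    --project="$PROJECT" --subnet-mode=custom
  run gcloud compute networks subnets create "$NETWORK" \
    --project="$PROJECT" --network="$NETWORK" --region="$REGION" --range="$RANGE"
  run gcloud compute firewall-rules create "${NETWORK}-allow-internal" \
    --project="$PROJECT" --network="$NETWORK" \
    --allow=tcp,udp,icmp --source-ranges="$RANGE"
  run gcloud container clusters create "$CLUSTER" \
    --project="$PROJECT" --region="$REGION" \
    --network="$NETWORK" --subnetwork="$NETWORK" \
    --enable-ip-alias --num-nodes=1
  # One node-pool each for etl and gardener, as listed in the steps.
  for pool in etl gardener; do
    run gcloud container node-pools create "$pool" \
      --project="$PROJECT" --cluster="$CLUSTER" --region="$REGION" --num-nodes=1
  done
}

create_all
```

Running it with the defaults prints six gcloud commands for review; nothing is created until DRY_RUN=0 is exported.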

Contributor Author

gfr10598 commented Aug 6, 2020

Manually added the Cloud Build trigger. Note that gcloud beta builds now supports creating triggers, too:

gcloud beta builds triggers create github \
  --repo-name=[REPO_NAME] \
  --repo-owner=[REPO_OWNER] \
  --branch-pattern=".*" \
  --build-config=[BUILD_CONFIG_FILE]
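Since the step in this issue is about prod- tags rather than branches, a tag-based variant might look like the sketch below. The repo owner, repo name, and build config file are assumptions, and the pattern is anchored explicitly rather than relying on how Cloud Build anchors it. The script only prints the command and checks the pattern locally with grep.

```shell
#!/bin/sh
# Hypothetical sketch: a tag-triggered build for prod- tags.
set -eu

REPO_OWNER="${REPO_OWNER:-m-lab}"                # assumption
REPO_NAME="${REPO_NAME:-etl}"                    # assumption
BUILD_CONFIG="${BUILD_CONFIG:-cloudbuild.yaml}"  # assumption
TAG_PATTERN='^prod-.*$'

# tag_matches succeeds when the given tag matches TAG_PATTERN.
tag_matches() {
  printf '%s\n' "$1" | grep -Eq "$TAG_PATTERN"
}

# print_trigger_cmd prints (does not run) the trigger creation command.
print_trigger_cmd() {
  echo "gcloud beta builds triggers create github" \
    "--repo-owner=$REPO_OWNER --repo-name=$REPO_NAME" \
    "--tag-pattern='$TAG_PATTERN' --build-config=$BUILD_CONFIG"
}

print_trigger_cmd
```

Checking a few sample tags with tag_matches before creating the trigger is a cheap way to confirm that sandbox or staging tags would not fire a prod build.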

Contributor Author

gfr10598 commented Aug 6, 2020

bq --project=mlab-oti mk tmp_ndt
bq --project=mlab-oti mk raw_ndt

Need to add the table creation and schema updates to etl-schema.

Contributor Author

gfr10598 commented Aug 7, 2020

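-- WHERE date > CURRENT_DATE() should match no rows, so this appears intended to create an empty table with the source schema.
CREATE OR REPLACE TABLE `mlab-oti.raw_ndt.ndt7`
PARTITION BY date
CLUSTER BY metro
AS
SELECT
  date,
  REGEXP_EXTRACT(parser.ArchiveURL, ".-mlab[1-4]-([a-z]{3})[0-9]{2}.") AS metro,
  id,
  * EXCEPT(date, id)
FROM `mlab-sandbox.tmp_ndt.ndt7`
WHERE date > CURRENT_DATE()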

Contributor Author

gfr10598 commented Aug 7, 2020

CREATE OR REPLACE TABLE `mlab-oti.raw_ndt.annotation`
PARTITION BY date
CLUSTER BY metro
AS
SELECT
  date,
  REGEXP_EXTRACT(parser.ArchiveURL, ".-mlab[1-4]-([a-z]{3})[0-9]{2}.") AS metro,
  id,
  * EXCEPT(date, id)
FROM `mlab-sandbox.tmp_ndt.annotation`
WHERE date > CURRENT_DATE()
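To sanity-check what the metro expression in these queries pulls out of an archive URL, the same pattern can be tried locally. The sketch below approximates REGEXP_EXTRACT with sed, which is an assumption of equivalence rather than BigQuery's own regex engine, and the sample URL is made up for illustration.

```shell
#!/bin/sh
# Hypothetical local check of the metro pattern used in the queries.
set -eu

# extract_metro prints the first capture group of the query's pattern,
# or returns non-zero when the pattern does not match.
# sed needs the leading/trailing .* because it replaces the whole line,
# whereas REGEXP_EXTRACT matches anywhere in the string.
extract_metro() {
  metro=$(printf '%s\n' "$1" |
    sed -nE 's/^.*.-mlab[1-4]-([a-z]{3})[0-9]{2}..*$/\1/p')
  [ -n "$metro" ] && printf '%s\n' "$metro"
}

# Made-up archive URL in an assumed naming scheme.
extract_metro 'gs://example-bucket/ndt/ndt7/2020/08/07/20200807T000000.000000Z-ndt7-mlab1-lga03-ndt.tgz'
```

For the sample URL the three-letter site prefix is captured; URLs without a -mlabN-xxxNN segment yield no value, which corresponds to a NULL metro in the query.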

Contributor Author

gfr10598 commented Aug 7, 2020

NOTE: BigQuery does not store data in us-central. Does this mean we will get network egress charges for the BQ loads?

We should probably specify data_location=US on the BQ datasets to make them multi-regional. See https://cloud.google.com/bigquery/docs/locations#multi-regional-locations

The documentation is not crystal clear, so we should probably just look for these charges in billing.
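One way to pin the location at creation time is sketched below, using bq's global --location flag and the project:dataset form instead of the commands used earlier in this thread. The dataset names come from this issue; the dry-run wrapper is an illustrative assumption. The bq show step prints the dataset metadata so the location can be confirmed.

```shell
#!/bin/sh
# Hypothetical sketch: create the datasets in the US multi-region.
set -eu

PROJECT="${PROJECT:-mlab-oti}"
LOCATION="${LOCATION:-US}"
DRY_RUN="${DRY_RUN:-1}"

# run prints the command in dry-run mode, otherwise executes it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

make_datasets() {
  for ds in tmp_ndt raw_ndt; do
    run bq --location="$LOCATION" mk --dataset "${PROJECT}:${ds}"
    # Inspect the resulting metadata, including the location field.
    run bq show --format=prettyjson "${PROJECT}:${ds}"
  done
}

make_datasets
```

A dataset's location cannot be changed after creation, so datasets already created without the flag would need to be checked with bq show and recreated if they landed somewhere unexpected.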

laiyi-ohlsen removed the review/triage label ("Team should review and assign priority") Sep 28, 2020

gfr10598 self-assigned this Oct 28, 2020