
FE-147 add sentry for production readiness interim dagster #326

Merged

30 commits
f2b6065
Updated manifest.py with testing info, and updated dev & main git act…
bahill Jan 16, 2024
e30dd50
removing test data & updating build & publish yamls with better condi…
bahill Jan 18, 2024
e2c5218
ignore test data
bahill Jan 18, 2024
56080db
Merge branch 'FE-128-dedup-github-actions' of https://github.com/Data…
bahill Jan 18, 2024
acbc137
removing test data
bahill Jan 18, 2024
2d0fe27
fixing git action syntax
bahill Jan 18, 2024
d241965
renaming with underscores
bahill Jan 18, 2024
76e0324
updated syntax for ignoring branches and improved name
bahill Jan 18, 2024
f5d1f96
Updating dev action to not run tests, and main name to fix display is…
bahill Jan 18, 2024
59f94af
fixing syntax
bahill Jan 18, 2024
91e5910
Merge 59f94afe35e8117bf91e959edb82637e8c528cb4 into c5626c4464d79485f…
bahill Jan 18, 2024
4bcd722
Update requirements.txt
Jan 18, 2024
c3190dd
give name to validation job so it can be required by main as a check
bahill Jan 18, 2024
390852c
Merge branch 'FE-128-dedup-github-actions' of https://github.com/Data…
bahill Jan 18, 2024
a9f06ba
added Sentry to dependencies and Docker env
bahill Jan 19, 2024
a191fe0
adding fetch depth 0 for sonar cloud
bahill Jan 19, 2024
c422483
First attempt to integrate sentry - test 1
bahill Jan 19, 2024
29d992d
Merge branch 'main' of https://github.com/DataBiosphere/hca-ingest in…
bahill Jan 19, 2024
c4910d9
fixed left over from merge conflict resolution
bahill Jan 19, 2024
b2a918a
Adding fetch depth
bahill Jan 22, 2024
b9d66a0
importing os - maybe this is why the repo didn't load?
bahill Jan 22, 2024
b328575
linting
bahill Jan 23, 2024
baf8221
Merge branch 'main' into FE-147-add-sentry-for-production-readiness-i…
bahill Jan 24, 2024
182f2ec
Flake8 lint - needed two spaces before my func def
bahill Jan 24, 2024
80ec976
Adding Sentry to our active pipelines
bahill Jan 29, 2024
c93a2fa
linting
bahill Jan 29, 2024
3de4055
linting
bahill Jan 29, 2024
bc271be
Adding config for pylint - ignoring check for unpacking parameters
bahill Jan 29, 2024
d33785d
Merge bc271bed0d413ec6746869cf68b686b4c0c533fc into 4ed2d88480285f9eb…
bahill Jan 29, 2024
ce167b5
Update requirements.txt
Jan 29, 2024
625 changes: 625 additions & 0 deletions .github/linters/.pylintrc

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions .github/workflows/build_and_publish_dev.yaml
@@ -7,6 +7,9 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
# Needed by sonar to get the git history for the branch the PR will be merged into.
with:
fetch-depth: 0
- name: Fetch tag history
run: git fetch --tags
- uses: google-github-actions/[email protected]
9 changes: 7 additions & 2 deletions .github/workflows/build_and_publish_main.yaml
@@ -1,14 +1,19 @@
name: Main Validation and Release Workflow
on:
pull_request_target:
types:
- closed
types:
- closed
branches:
- main
jobs:
main-ci:
if: github.event.pull_request.merged == true
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
# Needed by sonar to get the git history for the branch the PR will be merged into.
with:
fetch-depth: 0
- name: Fetch tag history
run: git fetch --tags
- uses: olafurpg/setup-scala@v10
3 changes: 3 additions & 0 deletions .github/workflows/generate-requirements-file.yaml
@@ -13,6 +13,9 @@ jobs:
steps:
- name: Check out repository
uses: actions/checkout@v3
# Needed by sonar to get the git history for the branch the PR will be merged into.
with:
fetch-depth: 0
- name: Set up python
id: setup-python
uses: actions/setup-python@v4
4 changes: 3 additions & 1 deletion .github/workflows/super_linter.yaml
@@ -59,4 +59,6 @@ jobs:
VALIDATE_PYTHON_BLACK: false
VALIDATE_PYTHON_MYPY: false
DEFAULT_BRANCH: main
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
LINTER_RULES_PATH: ../linters

3 changes: 3 additions & 0 deletions .github/workflows/trivy.yaml
@@ -8,6 +8,9 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
# Needed by sonar to get the git history for the branch the PR will be merged into.
with:
fetch-depth: 0
- uses: broadinstitute/dsp-appsec-trivy-action@v1
with:
context: ./orchestration
1 change: 1 addition & 0 deletions .github/workflows/validate_pull_request_main.yaml
@@ -14,6 +14,7 @@ jobs:
project_id: ${{ secrets.DEV_PROJECT_ID }}
service_account_key: ${{ secrets.GCP_TEST_KEY }}
export_default_credentials: true
fetch-depth: 0
- name: Set up Python 3.9 for dataflow tests
uses: actions/setup-python@v2
with:
3 changes: 3 additions & 0 deletions .github/workflows/validate_python.yaml
@@ -15,6 +15,9 @@ jobs:
ENV: test
steps:
- uses: actions/checkout@v2
# Needed by sonar to get the git history for the branch the PR will be merged into.
with:
fetch-depth: 0
- uses: google-github-actions/[email protected]
with:
project_id: ${{ secrets.DEV_PROJECT_ID }}
3 changes: 2 additions & 1 deletion orchestration/Dockerfile
@@ -9,7 +9,8 @@ ENV PYTHONFAULTHANDLER=1 \
PIP_NO_CACHE_DIR=off \
PIP_DISABLE_PIP_VERSION_CHECK=on \
PIP_DEFAULT_TIMEOUT=100 \
POETRY_VERSION=1.1.8
POETRY_VERSION=1.1.8 \
SENTRY_DSN=https://[email protected]/4506559533088768
kilonzi marked this conversation as resolved.
Contributor:

Does this mean the pipeline is using the same DSN for dev and prod? Will that intermix errors between the two environments, or is there a way to distinguish between the two in the Sentry setup?

Contributor (author):

Yes - Sam has a setup for this whereby each env gets its own DSN, which I like, but it seemed like overkill for a throwaway. Plus, the only time dev runs is when I'm testing the pipeline, and hopefully I won't have to do that again.
And I'm still unsure of the utility here, as it's all just duplicating the dagit errors, which are also caught in the cloud console Logs Explorer.
What do you think?

Contributor:

Yeah, I'm not thrilled with the idea of intermixing the environments, but given the time constraints and the desire to replatform, it's probably not worth revisiting at this point.

Contributor (author):

FWIW - if it becomes an issue in this interim, I will address it and definitely will not intermix in the re-platform
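If separating the environments ever becomes worthwhile, Sentry's init accepts an environment tag, so even a single shared DSN can keep dev and prod events distinguishable. A minimal sketch under stated assumptions - the ENV variable name is hypothetical, not this repo's actual config, and the import is guarded only so the sketch runs where sentry-sdk isn't installed:

```python
import os

# Guarded import: keeps the sketch runnable without the sentry-sdk package.
try:
    import sentry_sdk
except ImportError:
    sentry_sdk = None


def init_sentry() -> str:
    """Initialize Sentry, tagging events with the current environment.

    ENV is a hypothetical variable name for this sketch; the pipeline's
    real deployment config may use something else.
    """
    environment = os.getenv("ENV", "dev")
    dsn = os.getenv("SENTRY_DSN", "")
    if sentry_sdk is not None and dsn:
        # Same DSN for every env, but each event carries an environment
        # tag, so dev and prod can be filtered apart in the Sentry UI.
        sentry_sdk.init(dsn=dsn, environment=environment, traces_sample_rate=1.0)
    return environment
```

This mirrors the per-env-DSN setup's goal without provisioning extra DSNs, at the cost of relying on filtering rather than hard separation.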


RUN pip install "poetry==$POETRY_VERSION"

10 changes: 10 additions & 0 deletions orchestration/hca_orchestration/pipelines/cut_snapshot.py
@@ -1,5 +1,7 @@
import os
import warnings

import sentry_sdk
from dagster import (
ExperimentalWarning,
HookContext,
@@ -36,6 +38,14 @@
warnings.filterwarnings("ignore", category=ExperimentalWarning)


SENTRY_DSN = os.getenv(
"SENTRY_DSN",
"",
)
if SENTRY_DSN:
sentry_sdk.init(dsn=SENTRY_DSN, traces_sample_rate=1.0)
Contributor:

The docs state that the SDK will attempt to pull the DSN from the environment automatically, and if it can't find one it will simply not send events. Given that, what do you think about simplifying this logic to a bare sentry_sdk.init(traces_sample_rate=1.0) and letting the SDK sort out the env var resolution?

Contributor (author):

That makes sense to me - I just copied this in from what Zebrafish & SCP did. I don't have strong feelings either way.

Contributor (author):

At this point... I'm inclined to leave it alone unless I need to touch this again. I'd have to rebuild, deploy to dev, test, then PR to main (which triggers the E2E tests), then go through review, deploy to prod, and smoke test. That functionally takes me to the end of the week.
What do you think about the compromise of making a ticket for this (it's what I did with cleaning up the git actions)? I'll pick this back up if I have to update the codebase again.

Contributor:

Sure, ticket makes sense.
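The simplification discussed in this thread would lean on the SDK's own env-var resolution. A hedged sketch of what the module-level block could shrink to, with the import guarded only so the sketch runs where the package isn't installed:

```python
# sentry_sdk.init() reads SENTRY_DSN from the environment by itself and
# becomes a no-op (sends nothing) when no DSN is set, so the explicit
# os.getenv guard around it could be dropped.
try:
    import sentry_sdk
    sentry_sdk.init(traces_sample_rate=1.0)  # no-op if SENTRY_DSN is unset
    SENTRY_ENABLED = True
except ImportError:
    # sentry-sdk not installed in this sketch's environment.
    SENTRY_ENABLED = False
```

One line instead of seven per pipeline file, with identical behavior when the DSN is absent.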



def cut_project_snapshot_job(hca_env: str, jade_env: str, steward: str) -> PipelineDefinition:
return cut_snapshot.to_job(
description="Given a source project ID, this pipeline will determine the corresponding "
36 changes: 29 additions & 7 deletions orchestration/hca_orchestration/pipelines/load_hca.py
@@ -3,19 +3,41 @@
area, transforming via Google Cloud Dataflow jobs into a form suitable for ingestion to TDR and the final
load to TDR itself.
"""
import os
import warnings

from dagster import graph, ExperimentalWarning

from hca_orchestration.solids.load_hca.data_files.load_data_files import import_data_files
from hca_orchestration.solids.load_hca.data_files.load_data_metadata_files import file_metadata_fanout
from hca_orchestration.solids.load_hca.non_file_metadata.load_non_file_metadata import non_file_metadata_fanout
from hca_orchestration.solids.load_hca.stage_data import clear_scratch_dir, pre_process_metadata, create_scratch_dataset
from hca_orchestration.solids.load_hca.utilities import send_start_notification, validate_and_send_finish_notification
import sentry_sdk
from dagster import ExperimentalWarning, graph
from hca_orchestration.solids.load_hca.data_files.load_data_files import (
import_data_files,
)
from hca_orchestration.solids.load_hca.data_files.load_data_metadata_files import (
file_metadata_fanout,
)
from hca_orchestration.solids.load_hca.non_file_metadata.load_non_file_metadata import (
non_file_metadata_fanout,
)
from hca_orchestration.solids.load_hca.stage_data import (
clear_scratch_dir,
create_scratch_dataset,
pre_process_metadata,
)
from hca_orchestration.solids.load_hca.utilities import (
send_start_notification,
validate_and_send_finish_notification,
)

warnings.filterwarnings("ignore", category=ExperimentalWarning)


SENTRY_DSN = os.getenv(
"SENTRY_DSN",
"",
)
if SENTRY_DSN:
sentry_sdk.init(dsn=SENTRY_DSN, traces_sample_rate=1.0)


@graph
def load_hca() -> None:
staging_dataset = create_scratch_dataset(
@@ -1,5 +1,7 @@
import os
import warnings

import sentry_sdk
from dagster import (
ExperimentalWarning,
HookContext,
@@ -30,6 +32,13 @@

warnings.filterwarnings("ignore", category=ExperimentalWarning)

SENTRY_DSN = os.getenv(
"SENTRY_DSN",
"",
)
if SENTRY_DSN:
sentry_sdk.init(dsn=SENTRY_DSN, traces_sample_rate=1.0)


def make_snapshot_public_job(hca_env: str, jade_env: str) -> PipelineDefinition:
return set_snapshot_public.to_job(
10 changes: 10 additions & 0 deletions orchestration/hca_orchestration/pipelines/validate_ingress.py
@@ -1,3 +1,6 @@
import os

import sentry_sdk
from dagster import (
HookContext,
InitResourceContext,
@@ -13,6 +16,13 @@
pre_flight_validate,
)

SENTRY_DSN = os.getenv(
"SENTRY_DSN",
"",
)
if SENTRY_DSN:
sentry_sdk.init(dsn=SENTRY_DSN, traces_sample_rate=1.0)
Reviewer:

Do you know, or can you test, whether it would also work to have a file, say sentry_entrypoint.py, containing

    SENTRY_DSN = os.getenv("SENTRY_DSN", "")
    if SENTRY_DSN:
        sentry_sdk.init(dsn=SENTRY_DSN, traces_sample_rate=1.0)

and then, in each of the files you want Sentry to monitor, write from sentry_entrypoint import * ?

Reviewer:

@bahill I was trying to find a way of not repeating the if SENTRY_DSN: block everywhere.

Contributor (author):

100% with you on that. I weighed it out - doing something like SCP does only saved me about four lines of code, and I'm not changing this again, so I went with the brute-force duct-tape approach.

Contributor (author), Jan 30, 2024:

Also - re: Drew's point from above - it's basically the same number of lines, and actually fewer files, if I were to leave out the if SENTRY_DSN bit and let Sentry just figure out its DSN from the env.
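The shared-module idea from this thread could look roughly like the following. The filename sentry_entrypoint.py comes from the review suggestion and is hypothetical; note a plain import is enough to trigger the module-level init, so import * isn't needed. The import guard exists only so the sketch runs without the SDK installed:

```python
# sentry_entrypoint.py -- hypothetical shared module. Each pipeline file
# would run the init once via a plain "import sentry_entrypoint"; Python
# executes module-level code on first import only, so repeated imports
# across pipelines are cheap.
import os

try:
    import sentry_sdk
except ImportError:
    sentry_sdk = None  # keeps the sketch runnable without sentry-sdk

SENTRY_DSN = os.getenv("SENTRY_DSN", "")
if SENTRY_DSN and sentry_sdk is not None:
    sentry_sdk.init(dsn=SENTRY_DSN, traces_sample_rate=1.0)
```

Each pipeline module would then replace its copied block with import sentry_entrypoint (flake8 may want a noqa: F401 on the apparently unused import).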



def run_config_for_validation_ingress_partition(
partition: Partition,
43 changes: 43 additions & 0 deletions orchestration/poetry.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions orchestration/pyproject.toml
@@ -31,6 +31,7 @@ protobuf = "3.20.2"
python-dateutil = "^2.8.1"
pyyaml = "^5.3"
rfc3339-validator = "^0.1.4"
sentry-sdk = "^1.39.2"
typing-extensions = "^3.7.4"
# werkzeug = "2.2.3"
# will have to update dagit which means updating broad-dagster-utils - FE-36
1 change: 1 addition & 0 deletions orchestration/requirements.txt
@@ -80,6 +80,7 @@ rfc3339-validator==0.1.4; (python_version >= "2.7" and python_full_version < "3.
rpds-py==0.10.6; python_version >= "3.8"
rsa==4.9; python_version >= "3.6" and python_version < "4" and (python_version >= "3.9" and python_full_version < "3.0.0" and python_version < "3.10" or python_full_version >= "3.6.0" and python_version >= "3.9" and python_version < "3.10")
rx==1.6.3; python_version >= "3.9" and python_version < "3.10"
sentry-sdk==1.39.2
six==1.16.0; python_version >= "3.9" and python_full_version < "3.0.0" and python_version < "3.10" or python_version >= "3.9" and python_version < "3.10" and python_full_version >= "3.6.0"
slack-sdk==3.23.0; python_full_version >= "3.6.0"
slackclient==2.9.4; python_version >= "3.9" and python_version < "3.10" and python_full_version >= "3.6.0"