Merge dev to stage (#76)
* [pre-commit.ci] pre-commit autoupdate (#48)

updates:
- [github.com/PyCQA/flake8: 7.1.0 → 7.1.1](PyCQA/flake8@7.1.0...7.1.1)
- [github.com/awslabs/cfn-python-lint: v1.9.0 → v1.15.0](aws-cloudformation/cfn-lint@v1.9.0...v1.15.0)
- [github.com/psf/black: 24.4.2 → 24.8.0](psf/black@24.4.2...24.8.0)
- [github.com/sirosen/check-jsonschema: 0.29.1 → 0.29.2](python-jsonschema/check-jsonschema@0.29.1...0.29.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [IT-3918] Fix the image URLs returned by the image service (#51)

* update the config of the image service

* use `{fully_qualified_domain_name}`

* use an f-string
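
A minimal sketch of the fix described above (the helper name is hypothetical, not from the repo): derive the image-service URLs from the environment's fully qualified domain name with an f-string instead of hard-coding the host, as `app.py` does later in this diff.

```python
# Hypothetical sketch of the IT-3918 fix: build service URLs from the
# environment's fully qualified domain name using an f-string.
def build_service_urls(fully_qualified_domain_name: str) -> dict:
    return {
        "API_DOCS_URL": f"https://{fully_qualified_domain_name}/api-docs",
        "CSR_API_URL": f"https://{fully_qualified_domain_name}/api/v1",
    }

urls = build_service_urls("openchallenges.io")
```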

* [pre-commit.ci] pre-commit autoupdate (#52)

updates:
- [github.com/awslabs/cfn-python-lint: v1.15.0 → v1.15.2](aws-cloudformation/cfn-lint@v1.15.0...v1.15.2)
- [github.com/sirosen/check-jsonschema: 0.29.2 → 0.29.3](python-jsonschema/check-jsonschema@0.29.2...0.29.3)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (#53)

updates:
- [github.com/pre-commit/pre-commit-hooks: v4.6.0 → v5.0.0](pre-commit/pre-commit-hooks@v4.6.0...v5.0.0)
- [github.com/awslabs/cfn-python-lint: v1.15.2 → v1.16.0](aws-cloudformation/cfn-lint@v1.15.2...v1.16.0)
- [github.com/psf/black: 24.8.0 → 24.10.0](psf/black@24.8.0...24.10.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Parametrize stack version and update app config (#54)

* parametrize stack version

* update data update date

* set Google tag manager ID

* rename `stack_version` to `image_version`

* Increase GH workflow timeout (#55)

A change[1] was made to update all containers at the same time,
which takes longer to deploy, so we need to increase the deployment
timeout.

[1] #54

* [pre-commit.ci] pre-commit autoupdate (#56)

updates:
- [github.com/awslabs/cfn-python-lint: v1.16.0 → v1.16.1](aws-cloudformation/cfn-lint@v1.16.0...v1.16.1)
- [github.com/sirosen/check-jsonschema: 0.29.3 → 0.29.4](python-jsonschema/check-jsonschema@0.29.3...0.29.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (#59)

updates:
- [github.com/awslabs/cfn-python-lint: v1.16.1 → v1.18.1](aws-cloudformation/cfn-lint@v1.16.1...v1.18.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [pre-commit.ci] pre-commit autoupdate (#60)

updates:
- [github.com/awslabs/cfn-python-lint: v1.18.1 → v1.18.2](aws-cloudformation/cfn-lint@v1.18.1...v1.18.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Set concurrency to 5 (best results) (#61)

* [pre-commit.ci] pre-commit autoupdate (#62)

updates:
- [github.com/awslabs/cfn-python-lint: v1.18.2 → v1.18.4](aws-cloudformation/cfn-lint@v1.18.2...v1.18.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update image tag, app version and data release date (#63)

* [pre-commit.ci] pre-commit autoupdate (#66)

updates:
- [github.com/awslabs/cfn-python-lint: v1.18.4 → v1.19.0](aws-cloudformation/cfn-lint@v1.18.4...v1.19.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Refactor mounting volumes (#67)

The previous implementation for mounting volumes was specific to one
container. We are replacing it with a much more generic implementation
that makes it easy to mount volumes in other containers.
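
The generic approach can be sketched as follows, mirroring the `ContainerVolume` dataclass added to `service_props.py` in this commit: each service declares the volumes it needs as data, rather than one container's mount being hard-coded.

```python
# Sketch based on the ContainerVolume dataclass in this commit's diff:
# services declare volumes declaratively instead of via container-specific code.
from dataclasses import dataclass


@dataclass
class ContainerVolume:
    path: str  # mount path inside the container
    size: int = 15  # volume size in GiB
    read_only: bool = False  # set to False for write access

# Any service can now declare its own volumes, e.g. MariaDB's data directory:
mariadb_volumes = [ContainerVolume(path="/data/db", size=30)]
```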

* [IT-4003] Auto-update pre-commit hook versions monthly

Change the frequency that PRs to update pre-commit hook versions are
auto-generated from weekly (the default) to monthly.

* Update to OC v1.1.1 (#69)

* Update to v1.1.1

* update data updated on

* [pre-commit.ci] pre-commit autoupdate (#70)

updates:
- [github.com/awslabs/cfn-python-lint: v1.19.0 → v1.20.1](aws-cloudformation/cfn-lint@v1.19.0...v1.20.1)
- [github.com/sirosen/check-jsonschema: 0.29.4 → 0.30.0](python-jsonschema/check-jsonschema@0.29.4...0.30.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [IT-3951] Fix guardduty container (#71)

We enable GuardDuty security monitoring for ECS in every account.
For that to work, we need to grant Fargate tasks the ECS permissions
provided by service-role/AmazonECSTaskExecutionRolePolicy[1].

[1] https://docs.aws.amazon.com/guardduty/latest/ug/prereq-runtime-monitoring-ecs-support.html#before-enable-runtime-monitoring-ecs
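
An illustrative check (the helper is an assumption for illustration, not repo code): GuardDuty runtime monitoring for ECS requires the Fargate task execution role to carry the AWS-managed policy named below.

```python
# Illustrative sketch (not from the repo): verify that a role's managed
# policies include the one GuardDuty ECS runtime monitoring requires.
REQUIRED_POLICY = "service-role/AmazonECSTaskExecutionRolePolicy"


def role_supports_runtime_monitoring(managed_policies: list[str]) -> bool:
    """Return True if the role carries the required execution policy."""
    return REQUIRED_POLICY in managed_policies
```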

* remove source.bat (#74)

* Add Docker in Docker to the dev container (#73)

* Add Docker in Docker to the devcontainer

* add docs about docker

* forward local environment variables to the devcontainer

* remove containerEnv

* Add AWS Lambda for upcoming data integration (ARCH-356) (#72)

* update docs on setup tools

* define lambda role and function

* update path to Dockerfile

* update README

* trigger the lambda every 5 minutes

* use plural form of the unit

* Remove lambda function architecture

* Migrate data integration code to L2 constructs

* Add @DataClass to DataIntegrationProps

* Add docstrings

* Replace `_lambda` by `lambda_`

* Add docstrings

* Add docstrings

* Externalize the description of the schedule (#75)

* define lambda role and function

* update path to Dockerfile

* Externalize the description of the schedule

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Khai Do <[email protected]>
Co-authored-by: Joni Harker <[email protected]>
Co-authored-by: Joni Harker <[email protected]>
5 people authored Dec 10, 2024
1 parent a29c10f commit be639e2
Showing 13 changed files with 251 additions and 32 deletions.
6 changes: 5 additions & 1 deletion .devcontainer/devcontainer.json
@@ -8,7 +8,11 @@
"ghcr.io/devcontainers/features/python:1.6.3": {
"version": "3.12.0"
},
-"ghcr.io/devcontainers/features/aws-cli:1": {}
+"ghcr.io/devcontainers/features/aws-cli:1": {},
+"ghcr.io/devcontainers/features/docker-in-docker:2.12.0": {
+"version": "27.0.3",
+"moby": false
+}
},
"postCreateCommand": "./tools/setup.sh",
"shutdownAction": "stopContainer"
2 changes: 1 addition & 1 deletion .github/workflows/aws-deploy.yaml
@@ -52,7 +52,7 @@ jobs:
role-session-name: ${{ inputs.role-session-name }}
role-duration-seconds: ${{ inputs.role-duration-seconds }}
- name: CDK deploy
-run: cdk deploy --all --require-approval never
+run: cdk deploy --all --concurrency 5 --require-approval never
env:
ENV: ${{ inputs.environment }}
SECRETS: ${{ inputs.secrets-location }}
7 changes: 5 additions & 2 deletions .pre-commit-config.yaml
@@ -1,3 +1,6 @@
ci:
autoupdate_schedule: monthly

default_language_version:
python: python3

@@ -17,7 +20,7 @@ repos:
hooks:
- id: yamllint
- repo: https://github.com/awslabs/cfn-python-lint
-rev: v1.16.0
+rev: v1.20.1
hooks:
- id: cfn-python-lint
args:
@@ -36,7 +39,7 @@ repos:
hooks:
- id: black
- repo: https://github.com/sirosen/check-jsonschema
-rev: 0.29.3
+rev: 0.30.0
hooks:
- id: check-github-workflows
- id: check-github-actions
7 changes: 6 additions & 1 deletion README.md
@@ -38,12 +38,17 @@
also include a Python virtual environment where all the Python packages needed
are already installed.

If you decide to develop outside of the dev container, some of the development
-tools can be installed by running:
+tools can be installed manually by running:

```console
./tools/setup.sh
```

When developing outside the dev container, the following tools must be installed
manually.

- [Docker](https://docs.docker.com/engine/install/) >= v27

Development requires the activation of the Python virtual environment:

```
31 changes: 27 additions & 4 deletions app.py
@@ -1,20 +1,23 @@
import aws_cdk as cdk
from aws_cdk.aws_scheduler_alpha import ScheduleExpression

from openchallenges.bucket_stack import BucketStack
from openchallenges.network_stack import NetworkStack
from openchallenges.ecs_stack import EcsStack
from openchallenges.service_stack import ServiceStack
from openchallenges.service_stack import LoadBalancedServiceStack
from openchallenges.load_balancer_stack import LoadBalancerStack
-from openchallenges.service_props import ServiceProps
+from openchallenges.service_props import ServiceProps, ContainerVolume
from openchallenges.data_integration_stack import DataIntegrationStack
from openchallenges.data_integration_props import DataIntegrationProps
import openchallenges.utils as utils

app = cdk.App()

# get the environment
environment = utils.get_environment()
stack_name_prefix = f"openchallenges-{environment}"
-image_version = "0.0.11"
+image_version = "1.1.1"

# get VARS from cdk.json
env_vars = app.node.try_get_context(environment)
@@ -45,6 +48,12 @@
"MARIADB_PASSWORD": secrets["MARIADB_PASSWORD"],
"MARIADB_ROOT_PASSWORD": secrets["MARIADB_ROOT_PASSWORD"],
},
container_volumes=[
ContainerVolume(
path="/data/db",
size=30,
)
],
)

mariadb_stack = ServiceStack(
@@ -297,9 +306,9 @@
f"ghcr.io/sage-bionetworks/openchallenges-app:{image_version}",
{
"API_DOCS_URL": f"https://{fully_qualified_domain_name}/api-docs",
-"APP_VERSION": "1.0.0-alpha",
+"APP_VERSION": image_version,
"CSR_API_URL": f"https://{fully_qualified_domain_name}/api/v1",
-"DATA_UPDATED_ON": "2024-10-11",
+"DATA_UPDATED_ON": "2024-11-27",
"ENVIRONMENT": "production",
"GOOGLE_TAG_MANAGER_ID": "GTM-NBR5XD8C",
"SSR_API_URL": "http://openchallenges-api-gateway:8082/api/v1",
@@ -322,6 +331,20 @@
app, f"{stack_name_prefix}-load-balancer", network_stack.vpc
)

data_integration_props = DataIntegrationProps(
schedule=ScheduleExpression.cron(
minute="*/5",
hour="*",
day="*",
month="*",
time_zone=cdk.TimeZone.AMERICA_LOS_ANGELES,
),
schedule_description="This is a cron-based schedule that will run every 5 minutes",
)
data_integration_stack = DataIntegrationStack(
app, f"{stack_name_prefix}-data-integration", data_integration_props
)

api_docs_props = ServiceProps(
"openchallenges-api-docs",
8010,
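
As a side note on the schedule defined above: `minute="*/5"` with wildcards in the other fields fires at every minute divisible by 5. A small, hypothetical expansion helper (not part of the repo) illustrates the semantics.

```python
# Hypothetical helper expanding a cron step field such as the
# minute="*/5" used in the data-integration schedule above.
def expand_cron_field(field: str, upper: int) -> list[int]:
    """Expand '*' or '*/n' (or a literal value) into matching values in [0, upper)."""
    if field == "*":
        return list(range(upper))
    if field.startswith("*/"):
        step = int(field[2:])
        return [v for v in range(upper) if v % step == 0]
    return [int(field)]

# minute="*/5" matches 0, 5, 10, ..., 55 — i.e. every 5 minutes.
```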
1 change: 1 addition & 0 deletions cdk_docker/data-integration-lambda/Dockerfile
@@ -0,0 +1 @@
FROM ghcr.io/sage-bionetworks/sandbox-lambda-python:sha-b38dc22
62 changes: 62 additions & 0 deletions openchallenges/data_integration_lambda.py
@@ -0,0 +1,62 @@
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda as lambda_
from constructs import Construct


class DataIntegrationLambda(Construct):
"""
A CDK construct to define an AWS Lambda function for data integration.
This construct creates an IAM role with the necessary permissions and a Docker-based
Lambda function for handling data integration tasks.
"""

def __init__(self, scope: Construct, id: str) -> None:
"""
Initializes the DataIntegrationLambda construct.
Args:
scope (Construct): The parent construct.
id (str): The unique identifier for this construct.
"""
super().__init__(scope, id)

self.lambda_role = self._build_lambda_role()
self.lambda_function = self._build_lambda_function(self.lambda_role)

def _build_lambda_role(self) -> iam.Role:
"""
Builds the IAM role for the Lambda function.
This role allows the Lambda function to execute basic AWS operations.
Returns:
iam.Role: The IAM role for the Lambda function.
"""
return iam.Role(
self,
"LambdaRole",
assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"),
managed_policies=[
iam.ManagedPolicy.from_aws_managed_policy_name(
managed_policy_name=("service-role/AWSLambdaBasicExecutionRole")
)
],
)

def _build_lambda_function(self, role: iam.Role) -> lambda_.Function:
"""
Builds the Docker-based AWS Lambda function.
The Lambda function uses a Docker image built from a local directory.
Args:
role (iam.Role): The IAM role to associate with the Lambda function.
Returns:
lambda_.Function: The Docker-based AWS Lambda function.
"""
return lambda_.DockerImageFunction(
self,
"LambdaFunction",
code=lambda_.DockerImageCode.from_image_asset(
# Directory (relative to where `cdk deploy` is run) that contains
# a Dockerfile with build instructions.
directory="cdk_docker/data-integration-lambda"
),
role=role,
memory_size=128,
)
19 changes: 19 additions & 0 deletions openchallenges/data_integration_props.py
@@ -0,0 +1,19 @@
from dataclasses import dataclass
from aws_cdk.aws_scheduler_alpha import ScheduleExpression


@dataclass
class DataIntegrationProps:
"""
Data integration properties.
Attributes:
schedule (ScheduleExpression): The schedule for triggering the data integration.
schedule_description (str): The description of the schedule.
"""

schedule: ScheduleExpression
"""The schedule for triggering the data integration."""

schedule_description: str
"""The description of the schedule."""
66 changes: 66 additions & 0 deletions openchallenges/data_integration_stack.py
@@ -0,0 +1,66 @@
import aws_cdk as cdk
from aws_cdk import (
aws_scheduler_alpha as scheduler_alpha,
aws_scheduler_targets_alpha as scheduler_targets,
)
from openchallenges.data_integration_lambda import DataIntegrationLambda
from openchallenges.data_integration_props import DataIntegrationProps
from constructs import Construct


class DataIntegrationStack(cdk.Stack):
"""
Defines an AWS CDK stack for data integration.
This stack sets up the resources required for scheduling and executing
data integration tasks using AWS Lambda and EventBridge Scheduler.
The stack includes:
- A Lambda function for data integration.
- An EventBridge Scheduler schedule to trigger the Lambda function.
- An EventBridge Scheduler group for organizing schedules.
"""

def __init__(
self, scope: Construct, id: str, props: DataIntegrationProps, **kwargs
) -> None:
"""
Initializes the DataIntegrationStack.
Arguments:
scope (Construct): The parent construct for this stack.
id (str): The unique identifier for this stack.
props (DataIntegrationProps): The properties required for data integration,
including the schedule.
**kwargs: Additional arguments passed to the base `cdk.Stack` class.
"""
super().__init__(scope, id, **kwargs)

data_integration_lambda = DataIntegrationLambda(self, "data-integration-lambda")

target = scheduler_targets.LambdaInvoke(
data_integration_lambda.lambda_function,
input=scheduler_alpha.ScheduleTargetInput.from_object({}),
)

# Create a group for the schedule (we may want to add more schedules
# to this group in the future)
schedule_group = scheduler_alpha.Group(
self,
"group",
group_name="schedule-group",
)

scheduler_alpha.Schedule(
self,
"schedule",
schedule=props.schedule,
target=target,
group=schedule_group,
description=props.schedule_description,
)
25 changes: 25 additions & 0 deletions openchallenges/service_props.py
@@ -1,6 +1,25 @@
from dataclasses import dataclass
from typing import List

CONTAINER_LOCATION_PATH_ID = "path://"


@dataclass
class ContainerVolume:
"""
Holds configuration for a volume used in the container.
Attributes:
path: The path on the container to mount the host volume at.
size: The size of the volume in GiB.
read_only: Whether the container has read-only access to the volume; set to `False` for write access.
"""

path: str
size: int = 15
read_only: bool = False


class ServiceProps:
"""
ECS service properties
@@ -13,6 +32,7 @@ class ServiceProps:
supports docker registry references (e.g. ghcr.io/sage-bionetworks/openchallenges-thumbor:latest)
container_env_vars: a json dictionary of environment variables to pass into the container
e.g. {"EnvA": "EnvValueA", "EnvB": "EnvValueB"}
container_volumes: List of `ContainerVolume` resources to mount into the container
"""

def __init__(
@@ -22,6 +42,7 @@ def __init__(
container_memory: int,
container_location: str,
container_env_vars: dict,
container_volumes: List[ContainerVolume] = None,
) -> None:
self.container_name = container_name
self.container_port = container_port
@@ -32,3 +53,7 @@ def __init__(
)
self.container_location = container_location
self.container_env_vars = container_env_vars
if container_volumes is None:
self.container_volumes = []
else:
self.container_volumes = container_volumes
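
The `None` check at the end of `__init__` sidesteps Python's mutable-default-argument pitfall: defaulting the parameter to `None` (rather than `[]`) avoids sharing one list object across all instances. A standalone sketch of the same pattern:

```python
# Sketch of the normalization ServiceProps.__init__ performs above.
def normalize_volumes(container_volumes=None) -> list:
    """Return the given list of volumes, or a fresh empty list for None."""
    return [] if container_volumes is None else container_volumes

a = normalize_volumes()
b = normalize_volumes()
a.append("vol")  # mutating one result must not affect the other
```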