This repository contains the data storage system (DSS). We use this Google Drive folder for design docs and meeting notes, and this Zenhub board to track our GitHub work.
The DSS is a replicated data storage system designed for hosting large sets of scientific experimental data on Amazon S3 and Google Cloud Storage. The DSS exposes an API for interacting with the data and is built using Chalice, API Gateway and AWS Lambda. The API also uses AWS Step Functions to orchestrate Lambdas for long-running tasks such as large file writes. You can find the API documentation and give it a try here.
The DSS API uses Swagger to define the API specification according to the OpenAPI 2.0 specification. Connexion is used to map the API specification to its implementation in Python.
You can use the Swagger Editor to review and edit the API specification. When the API is live, the spec is also available at `/v1/swagger.json`.
The DSS API Swagger is also available at https://dss.dev.ucsc-cgp-redwood.org.
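For a quick check that a deployed API is reachable and serving its specification, you can fetch the spec directly (a sketch; it assumes the dev deployment above is accessible from your network and uses `jq`, one of the utilities required later in this Readme):

# Fetch the live OpenAPI (Swagger) spec and print its title
curl -s https://dss.dev.ucsc-cgp-redwood.org/v1/swagger.json | jq .info.title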
- DSS: The Data Storage System
- Overview
- Table of Contents
- Getting Started
- Deployment
- Using the Data Store CLI Client
- Checking Indexing
- Running Tests
- Development
- Security Policy
- Contributing
In this section, you'll configure and deploy a development version of the DSS, consisting of a local API server and a suite of cloud services.
All commands given in this Readme should be run from the root of this repository after sourcing the correct environment (see the Configuration section below). The root directory of the repository is also available in the environment variable `$DSS_HOME`.
NOTE: Deploying the data store requires privileged access to cloud accounts (AWS, GCP, etc.). If your deployment fails due to access restrictions, please consult your local system administrators.
The first step to get started with the data store is to clone this repository:
git clone [email protected]:DataBiosphere/data-store.git
cd data-store
The DSS requires Python 3.6+ to run. The file `requirements.txt` contains Python dependencies for those running a data store, and `requirements-dev.txt` contains Python dependencies for those developing code for the data store. Once this repository has been cloned, use pip to install the Python dependencies:
pip install -r requirements-dev.txt
To interact with AWS and GCP from the command line, use the officially distributed CLI tools.
The `aws` CLI tool can be installed via `pip install awscli` (or any other method covered in the aws-cli repository Readme).
The `gcloud` CLI tool should be installed directly from Google Cloud. Use the `gcloud` Downloads page to download the latest version, and the `gcloud` Quickstarts page for installation instructions for various operating systems.
Terraform, a tool from HashiCorp, should also be downloaded from terraform.io and the binary moved somewhere on your `$PATH`.
The data store requires that a specific version of Terraform be used. Check `common.mk` for the specific version of Terraform that should be installed.
NOTE: The Dockerfile for the CI/CD test cluster, `allspark.Dockerfile`, contains a set of commands to download and install a specified version of Terraform.
The data store makes use of a number of other command line utilities that should be present on your system (if they are not, `make` commands will fail):
- `jq` - install via `apt-get install jq` or `brew install jq`
- `sponge` - install via `apt-get install moreutils` or `brew install moreutils`
- `envsubst` - install via `apt-get install gettext` or `brew install gettext && brew link gettext`
See the file `common.mk` for more information.
The DSS is configured via environment variables.
The file `environment` sets default values for all variables used in the data store. The file `environment.local` overrides default values with custom entries. To customize the configuration environment variables:
- Copy `environment.local.example` to `environment.local`
- Edit `environment.local` to add custom entries that override the default values in `environment`
- Run `source environment` now and whenever these environment files are modified.
When the user runs `source environment`, it will execute the entire `environment` file, setting each variable to its default value; then `environment` will source `environment.local`, overwriting the default values with the new values defined in `environment.local`.
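As an illustration only, an `environment.local` override file might look like the sketch below. The variable names come from this Readme, but the values are placeholders, and the exact syntax (plain assignments vs. `export` statements) should follow `environment.local.example`:

# environment.local -- illustrative overrides only; adjust values for your deployment
export DSS_DEPLOYMENT_STAGE=dev
export AWS_DEFAULT_REGION=us-west-2
export GCP_DEFAULT_REGION=us-west1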
The full list of configurable environment variables and their descriptions is here.
To configure a data store with multiple stages, several changes are needed:
- The `environment` configuration file will need additional stage-specific environment variables defined. See the Human Cell Atlas data store repo and its `environment` file for an example of an environment file for a multi-stage data store deployment.
- A stage-specific environment file `environment.$DSS_STAGE` should also be used to override some environment variable values. For an example of a stage-specific environment file, see the `environment.prod` file in the Human Cell Atlas data store repo.
The DSS uses Terraform's AWS S3 backend for deployment. This means Terraform will use an AWS S3 bucket to store its state files.
Before Terraform is used, the Terraform bucket that will contain the state must be created - Terraform will not create this bucket itself. Specify the bucket name using the environment variable `$DSS_TERRAFORM_BACKEND_BUCKET_TEMPLATE`.
All other buckets will be created by Terraform during the infrastructure deployment step and should not exist before deploying for the first time.
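Since Terraform will not create the backend bucket, it must be created by hand. A minimal sketch with the standard AWS CLI is shown below; the bucket name is a placeholder, and `$DSS_TERRAFORM_BACKEND_BUCKET_TEMPLATE` may be a template that incorporates the stage name, so substitute the resolved bucket name:

# Create the S3 bucket that will hold Terraform state (bucket name is illustrative)
aws s3 mb s3://my-org-dss-terraform-state --region $AWS_DEFAULT_REGION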
To configure the AWS CLI (see the sketch after this list for a quick way to verify your setup):
- Configure your AWS CLI credentials following the data store AWS CLI Configuration Guide.
- Verify that `AWS_DEFAULT_REGION` points to your preferred AWS region.
- Specify the names of S3 buckets in `environment.local` using the environment variables `DSS_S3_BUCKET_*`. These buckets will be created by Terraform and should not exist before deploying.
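A quick way to confirm that the CLI is picking up the intended identity and region (both are standard AWS CLI calls):

# Confirm which AWS account/identity the CLI resolves to
aws sts get-caller-identity
# Confirm the region the CLI resolves to matches AWS_DEFAULT_REGION
aws configure get region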
To configure GCP for deployment of infrastructure, start by creating an OAuth application and generating associated tokens. These will be stored in the AWS Secrets Manager and used for automated deployment of infrastructure to GCP. Here are the steps:
- Go to the GCP API and Service Credentials page. You may have to select Organization and Project again.
- Click Create Credentials and select OAuth client.
- For Application type, choose Other.
- Under application name, use `${DSS_PLATFORM}-dss-` followed by the stage name (i.e. the value of `DSS_DEPLOYMENT_STAGE`). This is a convention only and carries no technical significance.
- Click Create. Don't worry about noting the client ID and secret; click OK.
- Click the edit icon for the new credentials and click Download JSON.
- Place the downloaded JSON file into the project root as `application_secrets.json`.
- Run the following command to store `application_secrets.json` in the AWS Secrets Manager, to make it available later during the deployment process (a verification sketch follows):

### WARNING: RUNNING THIS COMMAND WILL
### CLEAR EXISTING SECRET VALUE
cat $DSS_HOME/application_secrets.json | ./scripts/dss-ops.py secrets set --force $GOOGLE_APPLICATION_SECRETS_SECRETS_NAME
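To double-check that the secret landed where later deployment steps expect it, you can read it back with the standard AWS CLI (a sketch; depending on how the deployment prefixes secret names, the stored secret may live under a name that includes the secrets store and stage, in which case `aws secretsmanager list-secrets` can help locate it):

# Read the stored OAuth application secret back from AWS Secrets Manager
aws secretsmanager get-secret-value --secret-id $GOOGLE_APPLICATION_SECRETS_SECRETS_NAME --query SecretString --output text | jq .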
Next, configure the `gcloud` command line utility with the following steps:
- Choose a region that has support for Cloud Functions and set `GCP_DEFAULT_REGION` to that region. See the GCP locations list for a list of supported regions.
- Run `gcloud config set project PROJECT_ID`, where `PROJECT_ID` is the ID (i.e. `dss-store-21555`), not the name (i.e. `dss-store`), of the GCP project you selected earlier.
- Enable the required APIs:

gcloud services enable cloudfunctions.googleapis.com
gcloud services enable runtimeconfig.googleapis.com
gcloud services enable iam.googleapis.com

- Specify the names of Google Cloud Storage buckets in `environment.local` using the environment variables `DSS_GS_BUCKET_*`. These buckets will be created by Terraform and should not exist before deploying.
The following environment variables must be set to enable user authentication and authorization (an illustrative sketch follows this list):
- `OIDC_AUDIENCE` must be populated with the expected JWT (JSON web token) audience.
- `OPENID_PROVIDER` is the generator of the JWT and is used to determine how the JWT is validated.
- `OIDC_GROUP_CLAIM` is the JWT claim that specifies the group the user belongs to.
- `OIDC_EMAIL_CLAIM` is the JWT claim that specifies the request's email.
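A sketch of what these might look like in `environment.local`; the provider URL, audience, and claim names below are placeholders, not values required by the data store:

# Illustrative OIDC settings -- replace with your identity provider's values
export OPENID_PROVIDER="https://auth.example.org/"
export OIDC_AUDIENCE="https://data.example.org/"
export OIDC_GROUP_CLAIM="https://auth.example.org/group"
export OIDC_EMAIL_CLAIM="email"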
Also update `authorizationUrl` in `dss-api.yml` to point to an authorization endpoint that will return a valid JWT.
Optionally, custom authentication and authorization can be configured in the swagger file before deploying the data store. This can be done with the script `scripts/swagger_auth.py`, which loads the swagger YML file, modifies or adds auth sections in the YML file, and adds auth to various API endpoints.
Auth is added to API endpoints using a key-value dictionary, where the keys are API endpoints and the values are the HTTP actions taken on the endpoint, such as "put" and "get". The configuration dictionary can be specified on the command line or stored in a file. To pass a configuration on the command line, use the `--config_security` or `-c` flag and pass the endpoints and their HTTP actions as a JSON string. For example:
python scripts/swagger_auth.py -c='{"/path": ["call"]}'
Alternatively, auth can be set for all swagger endpoints by passing the `--secure` flag:
python scripts/swagger_auth.py --secure
Note that removing auth from endpoints will currently break tests; adding auth, however, should be fine (`make test` should run successfully).
Some daemons (`dss-checkout-sfn`, for example) use Amazon SES to send emails. You must set `DSS_NOTIFICATION_SENDER` to your email address, then verify that email address using the SES Console. This will enable SES to send notification emails.
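If you prefer the command line to the SES Console, the standard AWS CLI can start the same verification (this sketch assumes `DSS_NOTIFICATION_SENDER` is already set in your environment):

# Ask SES to email a verification link to the notification sender address
aws ses verify-email-identity --email-address $DSS_NOTIFICATION_SENDER
# Check the verification status after clicking the link in that email
aws ses get-identity-verification-attributes --identities $DSS_NOTIFICATION_SENDER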
Run `./dss-api` in the top-level `data-store` directory to deploy the DSS API on your `localhost`.
We use Terraform to automatically create a Google Cloud service account (referred to as the "deployment service account") to deploy Google Cloud infrastructure.
When deploying for the first time, we need to manually create a (different) service account (referred to as the "utility service account") that Terraform can utilize to create the deployment service account. The utility service account is only used once, during the first deployment, to create the deployment service account.
To manually create the utility service account:
- In the Google Cloud Console, select the correct Google user account on the top right and the correct GCP project in the drop-down at the top center. Go to "IAM & Admin", then "Service accounts".
- Click "Create service account" and select "Furnish a new private key". Under "Roles", select the following roles:
  a) "Project – Owner"
  b) "Service Accounts – Service Account User"
  c) "Cloud Functions – Cloud Function Developer"
- Create the account and download the utility service account key JSON file.
- Place the file at `$DSS_HOME/gcp-credentials-util.json`. Terraform will use this utility service account credentials file to create the deployment service account.
Now that we have the utility service account credentials, we can use Terraform to create the deployment service account:
- Specify the name of the Google Cloud Platform deployment service account in `environment.local` using the environment variable `DSS_GCP_SERVICE_ACCOUNT_NAME`. This is the account name only (the portion before the `@` in the service account's email address).
- Specify that you want to use the utility service account credentials to create the deployment service account by setting `GOOGLE_APPLICATION_CREDENTIALS` to `$DSS_HOME/gcp-credentials-util.json`:

export GOOGLE_APPLICATION_CREDENTIALS="$DSS_HOME/gcp-credentials-util.json"

- Create the Google Cloud Platform deployment service account using the command

make -C infra COMPONENT=gcp_service_account apply

Alternatively, an existing service account can be imported instead using `terraform import` from the Google service account component directory:

cd infra/gcp_service_account
terraform import google_service_account.dss ${DSS_GCP_SERVICE_ACCOUNT_NAME}@${GCP_PROJECT_ID}.iam.gserviceaccount.com

This step can be skipped if you're rotating credentials.
- Once the deployment service account has been created, open the Google Cloud Platform web console and navigate to "IAM & Admin", then "Service accounts". Click the menu on the right and select the "Create new key" option. Create and download a new JSON key and place the downloaded key into the project root at `${DSS_HOME}/gcp-credentials.json`.
- Store the deployment service account credentials just downloaded in the AWS Secrets Manager:

### WARNING: RUNNING THIS COMMAND WILL
### CLEAR EXISTING SECRET VALUE
cat $DSS_HOME/gcp-credentials.json | ./scripts/dss-ops.py secrets set --force $GOOGLE_APPLICATION_CREDENTIALS_SECRETS_NAME
Lastly, when you have finished creating the deployment service account, switch to its credentials by resetting `GOOGLE_APPLICATION_CREDENTIALS` to the deployment service account credentials file, which should be at `${DSS_HOME}/gcp-credentials.json`:

export GOOGLE_APPLICATION_CREDENTIALS=${DSS_HOME}/gcp-credentials.json
Note that if you are having problems with GCP credentials that look like this:
Error applying IAM policy for project "${GCP_PROJECT_ID}":
Error setting IAM policy for project "${GCP_PROJECT_ID}":
googleapi: Error 403: The caller does not have permission, forbidden
double-check that your `GOOGLE_APPLICATION_CREDENTIALS` are set to the utility service account, and not the deployment service account - otherwise the deployment service account is trying to modify itself!
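A quick way to see which service account a credentials file actually belongs to is to inspect the key file itself (a sketch using `jq`, which is already required above; `client_email` is the standard field in GCP service account key files):

# Show the service account email embedded in the active credentials file
jq -r .client_email $GOOGLE_APPLICATION_CREDENTIALS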
Set admin account emails within AWS Secret Manager:
### WARNING: RUNNING THIS COMMAND WILL
### CLEAR EXISTING SECRET VALUE
echo -n '[email protected],[email protected]' | ./scripts/dss-ops.py secrets set --force $ADMIN_USER_EMAILS_SECRETS_NAME
Alternatively, define `ADMIN_USER_EMAILS` in `environment.local` and run:
### WARNING: RUNNING THIS COMMAND WILL
### CLEAR EXISTING SECRET VALUE
echo -n $ADMIN_USER_EMAILS | ./scripts/dss-ops.py secrets set --force $ADMIN_USER_EMAILS_SECRETS_NAME
Assuming the tests have passed above, the next step is to manually deploy. See the section below for information on CI/CD with Travis if continuous deployment is your goal.
Several components in the DSS are deployed separately as daemons, found in `$DSS_HOME/daemons`. Daemon deployment may depend on infrastructure being deployed, such as SQS queues or SNS topics. This infrastructure can be handled by placing Terraform files in the daemon directory, e.g., `${DSS_HOME}/daemons/dss-admin/my_queue_defs.tf`. This infrastructure is deployed non-interactively, without the usual Terraform workflow of planning and reviewing, so it should be lightweight in nature.
More complex or larger infrastructure should be added to `$DSS_HOME/infra` instead of the daemon infrastructure whenever possible.
Both AWS and GCP use global namespaces shared amongst all users, so ensure that you name your resources appropriately to avoid name collisions.
Buckets within AWS and GCP need to be available for use by the DSS. Use Terraform to set up the buckets:
make -C infra COMPONENT=buckets plan
make -C infra COMPONENT=buckets apply
The AWS Elasticsearch Service is used for metadata indexing. Currently the DSS uses Elasticsearch version 5.5. For typical development deployments the `t2.small.elasticsearch` instance type is sufficient. Use the `DSS_ES_` variables to adjust the cluster as needed.
The operator doing the deployment must add their public IP address as an allowed IP address for access to the Elasticsearch cluster. Allowed Elasticsearch IP addresses should be added to the Secrets Manager; separate IP addresses with commas. For example, if the public IP addresses of two operators needing to deploy a data store are `1.1.1.1` and `2.2.2.2`, the Elasticsearch allowed source IPs secret would be set like so:
### WARNING: RUNNING THIS COMMAND WILL
### CLEAR EXISTING SECRET VALUE
echo -n '1.1.1.1,2.2.2.2' | ./scripts/dss-ops.py secrets set --force $ES_ALLOWED_SOURCE_IP_SECRETS_NAME
Use Terraform to deploy the Elasticsearch resources:
make -C infra COMPONENT=elasticsearch plan
make -C infra COMPONENT=elasticsearch apply
Open the AWS Web Console and navigate to the Elasticsearch Service. The Elasticsearch domain with the name matching `DSS_ES_DOMAIN` should show up in the list. Open this Elasticsearch domain. The Elasticsearch endpoint will be shown there and will look something like:
https://search-${DSS_ES_DOMAIN}-abcxyz1234567890.${AWS_REGION}.es.amazonaws.com
Now set the environment variable `DSS_ES_ENDPOINT` in `environment.local` to this Elasticsearch URL, minus the `https://` prefix. For example:

DSS_ES_ENDPOINT="search-${DSS_ES_DOMAIN}-abcxyz1234567890.${AWS_REGION}.es.amazonaws.com"
Note that this value should not be stored in a version-controlled file like `environment`, but in the local environment file `environment.local` instead. Export the new environment variable values with `source environment` once the new variable is set.
Once the `DSS_ES_ENDPOINT`, `DSS_ES_ALLOWED_IPS`, and `ADMIN_USER_EMAILS` environment variables have been set, all variables required by the lambda functions have been set. The next step is to export the lambda function environment variables from the local environment and store them in the parameter store under the variable `environment`. These environment variables will then be set in each lambda function during the deployment step.
To export the lambda function environment variables, use the `lambda update` function of the DSS operations script:
./scripts/dss-ops.py lambda update
If there are already lambda functions deployed, you can add the `--update-deployed` flag to export the variables to all deployed lambda functions, in addition to exporting the variables to the parameter store.
./scripts/dss-ops.py lambda update --update-deployed
It is useful to be able to check the lambda environments to troubleshoot problems with a data store deployment. There are two ways to check the lambda environment, both using the `./scripts/dss-ops.py` script:
- Print lambda environment variables and values from the currently-deployed lambdas. These are the environment variable values that are currently deployed to the lambdas (and therefore may not match values in the parameter store or in your local environment):

./scripts/dss-ops.py lambda environment

- Print lambda environment variables and values stored in the parameter store. These are the environment variable values that will be deployed to the lambdas during the next deployment:

./scripts/dss-ops.py params environment
It is assumed that Route 53 and the AWS Certificate Manager are used to manage domains and HTTPS certificates for those domains.
The first step is to verify that the domain the data store will use is listed as a Hosted Zone in Route 53. To verify, open the AWS Web Console, select Route 53, then select Hosted Zones.
The next step is to create a wildcard certificate for your domain. Your ownership or control of the domain must be verified before a certificate matching the domain can be created. We recommend using the DNS method of verification, as it is well-integrated with Route 53.
- Open the AWS Web Console and select the AWS Certificate Manager.
- Click "Request a Certificate".
- Select "Request a public certificate" and click "Next".
- Enter the domain or subdomain you want the data store to use. You can use `*.example.com` to create a wildcard cert for an entire domain, or `*.data.example.com` to create a wildcard cert for the `data.example.com` subdomain only.
- Select "DNS validation" as the domain validation method.
- Optionally, add relevant tags (Name, Owner, Project, etc.) and click "Review".
- Click "Confirm and Request". This will inform you that the cert is pending validation and requires you to verify ownership.
- Click the triangle next to the domain name to expand the cert request. Click "Create a record in Route 53".
- The Certificate Manager will ask you to confirm creation of a Route 53 DNS record. Click "Create", then "Continue".
- Wait for the validation step to complete. Once the certificate validation step has finished, the "Status" will change to "Issued".
Once you have created your certificate, set `ACM_CERTIFICATE_IDENTIFIER` to the identifier of the certificate, which can be found on the AWS console.
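The identifier is the UUID at the end of the certificate's ARN. If you prefer the CLI to the console, a sketch for finding it with the standard AWS CLI (the region must match `AWS_DEFAULT_REGION`, as noted below):

# List certificates and their ARNs; the identifier is the UUID after "certificate/"
aws acm list-certificates --region $AWS_DEFAULT_REGION --query "CertificateSummaryList[].[DomainName,CertificateArn]" --output table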
Note that if you are having problems with certificates that look like this:
aws_api_gateway_domain_name.dss: Creating...
Error: Error creating API Gateway Domain Name: BadRequestException: The provided certificate does not exist.
double-check that the certificate you created in Certificate Manager was created in the same region specified in your `environment` file by the variable `AWS_DEFAULT_REGION`.
One last piece of infrastructure that must be created before deployment is the AWS event relay user. The event relay (`daemons/dss-gs-event-relay`) is responsible for relaying events between GCP and AWS. Running this script will create a user, which requires the `iam:CreateUser` privilege, granted to project admins:
# This script must be run by a GCP project admin
./scripts/create_config_aws_event_relay_user.py
If you do not run this step, the `make deploy` command will fail due to a missing secret in the Secrets Manager.
Now deploy using make:
make plan-infra
make deploy-infra
make deploy
If successful, you should be able to see the Swagger API documentation at:
https://${API_DOMAIN_NAME}
And you should be able to list bundles like this:
curl -X GET "https://${API_DOMAIN_NAME}/v1/bundles" -H "accept: application/json"
Please see the data-store-monitor repo for additional monitoring tools.
Updating environment variables defined in either `environment` or `environment.local` requires some care when generating Terraform files from those environment variables. Many environment variables make their way into Terraform files via templates that are generated with make commands, so when you change your `environment` or `environment.local` file, it is not enough to just run `source environment`.
The Terraform files that store variable values are called `variables.tf` and are located in subdirectories of the `infra/` folder. These `variables.tf` files are generated automatically by the `infra/build_deploy_config.py` script. To remake all of the `variables.tf` files in `infra/`, run the make command:
# Remake all variables.tf files in infra/
make plan-infra
To remake `variables.tf` for a particular component:
# Remake variables.tf files for bucket infra
make -C infra COMPONENT=buckets plan
Note that `make` commands should always be used; otherwise the operator may experience problems during the deployment process.
What happens when the deployment process tries to create resources, but those resources already exist?
Here, we have two options:
- Import an existing resource, so that Terraform can manage and use it as part of the data store's infrastructure
- Delete and re-create the infrastructure
Importing Existing Resources:
The `terraform import` command allows Terraform to import existing infrastructure so that it can be managed as part of the data store's infrastructure. This command must be run directly; there are no `make` commands for it.
Here is an example of how to import an existing Google service account:
cd infra/gcp_service_account
terraform import google_service_account.dss ${DSS_GCP_SERVICE_ACCOUNT_NAME}@${GCP_PROJECT_ID}.iam.gserviceaccount.com
and another example to import an existing DynamoDB table for the async step function database:
cd infra/async_state_db/
terraform import aws_dynamodb_table.sfn_state dss-async-state-dev
The first argument after `terraform import` is the Terraform resource address (e.g. `google_service_account.dss`), and the second argument is the identifier of the existing resource you would like to import.
Deleting and Re-Creating Resources:
You can use the Makefile to destroy infrastructure resources, just as you can use it to create them. For example, to delete all buckets (note: buckets must be empty! If they are not, remove their contents with the `aws s3` command), use the following make command:
# WARNING: THIS WILL DELETE ALL DATA STORE BUCKETS!
make -C infra COMPONENT=buckets destroy
If you wish to destroy all infra resources, use the following make command:
# WARNING: THIS WILL DELETE ALL DATA STORE INFRASTRUCTURE!
make -C infra destroy-all
Note that the infrastructure being destroyed uses names from the `environment` file, so if the `environment` file variables do not match existing infrastructure, the `make -C infra destroy-all` command will not work. You may also need to re-make the Terraform `variables.tf` files, as covered in the prior section.
Once the `make -C infra destroy-all` command has been used to destroy existing infrastructure, you can use the same `make deploy-infra` command covered above to re-create all of the infrastructure.
We use Travis CI for continuous unit testing that does not involve deployed components. A private GitLab instance is used for deployment to the `dev` environment if unit tests pass, as well as for further testing of deployed components, for every commit on the `master` branch. GitLab testing results are announced on the `data-store-eng` Slack channel in the HumanCellAtlas workspace.
Travis behaviour is defined in `.travis.yml`, and GitLab behaviour is defined in `.gitlab-ci.yml`.
Encrypted environment variables give Travis CI the AWS credentials needed to run the tests and deploy the app. Run `scripts/authorize_aws_deploy.sh IAM-PRINCIPAL-TYPE IAM-PRINCIPAL-NAME` (e.g. `authorize_aws_deploy.sh group travis-ci`) to give that principal the permissions needed to deploy the app. Because a group policy has a higher size limit (5,120 characters) than a user policy (2,048 characters), it is advisable to apply this to a group and add the principal to that group. Because this is a limited set of permissions, it does not have write access to IAM. To set up the IAM policies for resources in your account that the app will use, run `make deploy` using privileged account credentials once from your workstation. After this is done, Travis CI will be able to deploy on its own. You must repeat the `make deploy` step from a privileged account any time you change the IAM policy templates in `iam/policy-templates/`.
Now that you have deployed the data store, the next step is to use the Data Store CLI client `dbio` to upload and download data to the system. See the data-store-cli repo for installation instructions.
Examples of CLI use:
# list bundles
dbio dss post-search --es-query "{}" --replica=aws | less
# upload full bundle
dbio dss upload --replica aws --staging-bucket staging_bucket_name --src-dir ${DSS_HOME}/tests/fixtures/datafiles/example_bundle
Now that you've uploaded data, the next step is to confirm that indexing is working properly and that you can query the indexed metadata.
dbio dss post-search --replica aws --es-query '
{
"query": {
"bool": {
"must": [{
"match": {
"files.donor_organism_json.medical_history.smoking_history": "yes"
}
}, {
"match": {
"files.specimen_from_organism_json.genus_species.text": "Homo sapiens"
}
}, {
"match": {
"files.specimen_from_organism_json.organ.text": "brain"
}
}]
}
}
}
'
- Check that software packages required to test and deploy are available, and install them if necessary:

make --dry-run

- Populate test fixture buckets with test fixture data (this command will completely empty the given buckets before populating them with test fixture data, so please ensure the correct bucket names are provided):

tests/fixtures/populate.py --s3-bucket $DSS_S3_BUCKET_TEST_FIXTURES --gs-bucket $DSS_GS_BUCKET_TEST_FIXTURES

- Set the environment variable `DSS_TEST_ES_PATH` to the path of the `elasticsearch` binary on your machine.
- Run tests with `make test`.
All tests for the DSS fall into one of two categories:
- Standalone tests, which do not depend on deployed components, and
- Integration tests, which depend on deployed components.
As such, standalone tests can be expected to pass even if no deployment is configured, and in fact should pass before an initial deployment. For more information on tests, see tests/README.md.
The direct runtime dependencies of this project are defined in `requirements.txt.in`. Direct development dependencies are defined in `requirements-dev.txt.in`. All dependencies, direct and transitive, are defined in the corresponding `requirements.txt` and `requirements-dev.txt` files. The latter two can be generated using `make requirements.txt` or `make requirements-dev.txt`, respectively. Modifications to any of these four files need to be committed. This process is aimed at making dependency handling more deterministic without accumulating the upgrade debt that would be incurred by simply pinning all direct and transitive dependencies. Avoid being overly restrictive when constraining the allowed version range of direct dependencies in `requirements.txt.in` and `requirements-dev.txt.in`.
If you need to modify or add a direct runtime dependency declaration, follow the steps below (a consolidated command sketch follows this list):
- Make sure there are no pending changes to `requirements.txt` or `requirements-dev.txt`.
- Make the desired change to `requirements.txt.in` or `requirements-dev.txt.in`.
- Run `make requirements.txt`. Run `make requirements-dev.txt` if you have modified `requirements-dev.txt.in`.
- Visually check the changes to `requirements.txt` and `requirements-dev.txt`.
- Commit them with a message like `Bumping dependencies`.
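As a sketch only, the first few steps as a shell sequence (assuming a runtime dependency was edited in `requirements.txt.in`):

# 1. Confirm there are no pending changes to the generated files
git status --short requirements.txt requirements-dev.txt
# 2. After editing requirements.txt.in, regenerate the pinned requirements
make requirements.txt
# 3. Review the regenerated pins before committing
git diff requirements.txt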
You now have two commits: one that catches up with updates to transitive dependencies, and one that tracks your explicit change to a direct dependency. This process applies to development dependencies as well, using `requirements-dev.txt` and `requirements-dev.txt.in` respectively.
If you wish to re-pin all the dependencies, run `make refresh_all_requirements`. It is advisable to do a full test-deploy-test cycle after this (the test after the deploy is required to test the lambdas).
- Always use a module-level logger, call it `logger`, and initialize it as follows:

import logging
logger = logging.getLogger(__name__)
- Do not configure logging at module scope. It should be possible to import any module without side effects on logging. The `dss.logging` module contains functions that configure logging for this application, its Lambda functions and unit tests.
- When logging a message, pass either
  - an f-string as the first and only positional argument, or
  - a %-string as the first argument and substitution values as subsequent arguments. Do not mix the two string interpolation methods. If you mix them, any percent sign in a substituted value will raise an exception.

# In other words, use
logger.info(f"Foo is {foo} and bar is {bar}")
# or
logger.info("Foo is %s and bar is %s", foo, bar)
# but not
logger.info(f"Foo is {foo} and bar is %s", bar)
# Keyword arguments can be used safely in conjunction with f-strings:
logger.info(f"Foo is {foo}", exc_info=True)
- To enable verbose logging by application code, set the environment variable `DSS_DEBUG` to `1`. To enable verbose logging by dependencies, set `DSS_DEBUG` to `2`. To disable verbose logging, unset `DSS_DEBUG` or set it to `0`.
To assert in tests that certain messages were logged, use the
dss
logger or one of its childrendss_logger = logging.getLogger('dss') with self.assertLogs(dss_logger) as log_monitor: # do stuff # or import dss with self.assertLogs(dss.logger) as log_monitor: # do stuff
AWS X-Ray tracing is used for profiling the performance of deployed lambdas. This can be enabled for `chalice/app.py` by setting the lambda environment variable `DSS_XRAY_TRACE=1`. For all other daemons you must also check "Enable active tracing" under "Debugging and error handling" in the AWS Lambda console.
See our Security Policy.
External contributions are welcome. Please review the Contributing Guidelines.