feat: add tracing to collection API #236

Merged on Nov 13, 2024 (7 commits)

@jpetto jpetto commented Nov 8, 2024

Goal

Add tracing to the Collection API (take 2).

  • Uses Pocket's OTLP collector endpoint
  • Sends traces AND logs to GCP for easy viewing
  • Updates Sentry on all ECS services to version 8
    • Version 8 has most of our integrations built in - yay!
  • Removes some legacy infra code

This has been deployed to dev; the traces are visible in GCP.

Implementation Decisions

  • Instead of sending to AWS X-Ray (which was buggy and memory intensive), the service sends traces and logs to a standalone ECS service, which in turn forwards them to GCP for better monitoring.
  • Integrates feature flags (Unleash) so we can turn tracing on or off and set the sample rate for traces. This also gives us the ability to use feature flags for other things!
  • Greatly reduces logging levels application-wide. We don't need to log debug-level events, even in dev. (This can easily be re-enabled in dev if need be.)
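
A minimal sketch of the flag-gated sampling idea described in the bullets above, not the code in this PR. `OTLP_COLLECTOR_URL` and `LOG_LEVEL` are the environment variables shown in the collection-api plan output; the `TracingFlag` and `TracingConfig` shapes and both function names are hypothetical and exist only for illustration.

```typescript
// Hypothetical shapes: field names are illustrative, not the PR's actual types.
interface TracingFlag {
  enabled: boolean; // feature flag on/off
  sampleRate: number; // fraction of traces to keep, expected in 0..1
}

interface TracingConfig {
  collectorUrl: string;
  logLevel: string;
  flag: TracingFlag;
}

// Build a config from environment variables. The log level defaults to "info"
// so debug events are only logged when explicitly requested.
function resolveTracingConfig(
  env: Record<string, string | undefined>,
  flag: TracingFlag,
): TracingConfig {
  const logLevel = env["LOG_LEVEL"] ?? "info";
  const collectorUrl = env["OTLP_COLLECTOR_URL"];
  if (!collectorUrl) {
    // Assumption: with no collector configured, tracing is forced off.
    return { collectorUrl: "", logLevel, flag: { enabled: false, sampleRate: 0 } };
  }
  return { collectorUrl, logLevel, flag };
}

// Decide whether to record a single trace. `random` is injectable so the
// decision can be tested deterministically.
function shouldSample(flag: TracingFlag, random: () => number = Math.random): boolean {
  if (!flag.enabled) return false;
  const rate = Math.min(1, Math.max(0, flag.sampleRate)); // clamp to 0..1
  return random() < rate;
}
```

Keeping the on/off switch and the sample rate as separate inputs means tracing can be disabled outright without touching the configured rate.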

Deployment steps

  • Deployed to dev?
  • Secrets - created!

References

JIRA ticket:

github-actions bot commented Nov 8, 2024

Plan Result (prospect-translation-lambda-cdk-production)

CI link

Plan: 0 to add, 1 to change, 0 to destroy.
  • Update
    • aws_lambda_function.translation-lambda_translation-sqs-lambda_B9BDF6BA
Change Result
  # aws_lambda_function.translation-lambda_translation-sqs-lambda_B9BDF6BA will be updated in-place
  ~ resource "aws_lambda_function" "translation-lambda_translation-sqs-lambda_B9BDF6BA" {
        id                             = "ProspectAPI-Prod-Sqs-Translation-Function"
        tags                           = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI-Sqs-Translation"
        }
        # (22 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              ~ "GIT_SHA"                      = (sensitive value)
                # (4 unchanged elements hidden)
            }
        }

        # (4 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

⚠️ Errors

github-actions bot commented Nov 8, 2024

Plan Result (corpus-scheduler-lambda-cdk-production)

CI link

Plan: 0 to add, 1 to change, 0 to destroy.
  • Update
    • aws_lambda_function.corpus-scheduler-sqs-lambda_F2ECDF9F
Change Result
  # aws_lambda_function.corpus-scheduler-sqs-lambda_F2ECDF9F will be updated in-place
  ~ resource "aws_lambda_function" "corpus-scheduler-sqs-lambda_F2ECDF9F" {
        id                             = "CorpusSchedulerLambda-Prod-SQS-Function"
      ~ qualified_arn                  = "arn:aws:lambda:us-east-1:996905175585:function:CorpusSchedulerLambda-Prod-SQS-Function:169" -> (known after apply)
      ~ qualified_invoke_arn           = "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:996905175585:function:CorpusSchedulerLambda-Prod-SQS-Function:169/invocations" -> (known after apply)
        tags                           = {
            "app_code"       = "content"
            "component_code" = "content-corpusschedulerlambda"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "CorpusSchedulerLambda"
        }
      ~ version                        = "169" -> (known after apply)
        # (20 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              ~ "GIT_SHA"                          = (sensitive value)
                # (7 unchanged elements hidden)
            }
        }

        # (4 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

⚠️ Errors

github-actions bot commented Nov 8, 2024

Plan Result (prospect-api-cdk-production)

CI link

Plan: 0 to add, 3 to change, 0 to destroy.
  • Update
    • aws_dynamodb_table.dynamodb_prospects_dynamodb_table_9854E41E
    • aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6
    • aws_lambda_function.bridge-lambda_bridge-sqs-lambda_343B543A
Change Result
  # data.aws_iam_policy_document.application_ecs_service_ecs-iam_data-ecs-task-role-policy_090CC3AD will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_iam_policy_document" "application_ecs_service_ecs-iam_data-ecs-task-role-policy_090CC3AD" {
      + id            = (known after apply)
      + json          = (known after apply)
      + minified_json = (known after apply)
      + version       = "2012-10-17"

      + statement {
          + actions   = [
              + "dynamodb:BatchGet*",
              + "dynamodb:DescribeTable",
              + "dynamodb:Get*",
              + "dynamodb:Query",
              + "dynamodb:Scan",
              + "dynamodb:UpdateItem",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects",
              + "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects/*",
            ]
        }
      + statement {
          + actions   = [
              + "s3:*",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:s3:::pocket-prospectapi-prod-images",
              + "arn:aws:s3:::pocket-prospectapi-prod-images/*",
            ]
        }
      + statement {
          + actions   = [
              + "events:PutEvents",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:events:us-east-1:996905175585:event-bus/PocketEventBridge-Prod-Shared-Event-Bus",
            ]
        }
      + statement {
          + actions   = [
              + "logs:CreateLogGroup",
              + "logs:CreateLogStream",
              + "logs:DescribeLogGroups",
              + "logs:DescribeLogStreams",
              + "logs:PutLogEvents",
              + "xray:GetSamplingRules",
              + "xray:GetSamplingStatisticSummaries",
              + "xray:GetSamplingTargets",
              + "xray:PutTelemetryRecords",
              + "xray:PutTraceSegments",
            ]
          + effect    = "Allow"
          + resources = [
              + "*",
            ]
        }
    }

  # aws_dynamodb_table.dynamodb_prospects_dynamodb_table_9854E41E will be updated in-place
  ~ resource "aws_dynamodb_table" "dynamodb_prospects_dynamodb_table_9854E41E" {
        id                          = "PROAPI-Prod-Prospects"
        name                        = "PROAPI-Prod-Prospects"
        tags                        = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI"
        }
        # (9 unchanged attributes hidden)

      - global_secondary_index {
          - hash_key           = "scheduledSurfaceGuid" -> null
          - name               = "scheduledSurfaceGuid-prospectType" -> null
          - non_key_attributes = [] -> null
          - projection_type    = "ALL" -> null
          - range_key          = "prospectType" -> null
          - read_capacity      = 0 -> null
          - write_capacity     = 0 -> null
        }
      + global_secondary_index {
          + hash_key           = "scheduledSurfaceGuid"
          + name               = "scheduledSurfaceGuid-prospectType"
          + non_key_attributes = []
          + projection_type    = "ALL"
          + range_key          = "prospectType"
          + read_capacity      = 5
          + write_capacity     = 5
        }

        # (5 unchanged blocks hidden)
    }

  # aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6 will be updated in-place
  ~ resource "aws_iam_policy" "application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6" {
        id               = "arn:aws:iam::996905175585:policy/ProspectAPI-Prod-TaskRolePolicy"
        name             = "ProspectAPI-Prod-TaskRolePolicy"
      ~ policy           = jsonencode(
            {
              - Statement = [
                  - {
                      - Action   = [
                          - "dynamodb:UpdateItem",
                          - "dynamodb:Scan",
                          - "dynamodb:Query",
                          - "dynamodb:Get*",
                          - "dynamodb:DescribeTable",
                          - "dynamodb:BatchGet*",
                        ]
                      - Effect   = "Allow"
                      - Resource = [
                          - "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects/*",
                          - "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects",
                        ]
                    },
                  - {
                      - Action   = "s3:*"
                      - Effect   = "Allow"
                      - Resource = [
                          - "arn:aws:s3:::pocket-prospectapi-prod-images/*",
                          - "arn:aws:s3:::pocket-prospectapi-prod-images",
                        ]
                    },
                  - {
                      - Action   = "events:PutEvents"
                      - Effect   = "Allow"
                      - Resource = "arn:aws:events:us-east-1:996905175585:event-bus/PocketEventBridge-Prod-Shared-Event-Bus"
                    },
                  - {
                      - Action   = [
                          - "xray:PutTraceSegments",
                          - "xray:PutTelemetryRecords",
                          - "xray:GetSamplingTargets",
                          - "xray:GetSamplingStatisticSummaries",
                          - "xray:GetSamplingRules",
                          - "logs:PutLogEvents",
                          - "logs:DescribeLogStreams",
                          - "logs:DescribeLogGroups",
                          - "logs:CreateLogStream",
                          - "logs:CreateLogGroup",
                        ]
                      - Effect   = "Allow"
                      - Resource = "*"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        tags             = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI"
        }
        # (5 unchanged attributes hidden)
    }

  # aws_lambda_function.bridge-lambda_bridge-sqs-lambda_343B543A will be updated in-place
  ~ resource "aws_lambda_function" "bridge-lambda_bridge-sqs-lambda_343B543A" {
        id                             = "ProspectAPI-Prod-Sqs-Bridge-Function"
      ~ qualified_arn                  = "arn:aws:lambda:us-east-1:996905175585:function:ProspectAPI-Prod-Sqs-Bridge-Function:213" -> (known after apply)
      ~ qualified_invoke_arn           = "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:996905175585:function:ProspectAPI-Prod-Sqs-Bridge-Function:213/invocations" -> (known after apply)
        tags                           = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI"
        }
      ~ version                        = "213" -> (known after apply)
        # (20 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              ~ "GIT_SHA"                  = (sensitive value)
                # (4 unchanged elements hidden)
            }
        }

        # (4 unchanged blocks hidden)
    }

Plan: 0 to add, 3 to change, 0 to destroy.

github-actions bot commented Nov 8, 2024

Plan Result (collection-api-cdk-production)

CI link

⚠️ Resource Deletion will happen

This plan contains resource delete operation. Please check the plan result very carefully!

Plan: 1 to add, 2 to change, 1 to destroy.
  • Update
    • aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-execution-role-policy_2D469A77
    • aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6
  • Replace
    • aws_ecs_task_definition.application_ecs_service_ecs-task_461CC9D4
Change Result
  # aws_ecs_task_definition.application_ecs_service_ecs-task_461CC9D4 must be replaced
-/+ resource "aws_ecs_task_definition" "application_ecs_service_ecs-task_461CC9D4" {
      ~ arn                      = "arn:aws:ecs:us-east-1:996905175585:task-definition/CollectionAPI-Prod:938" -> (known after apply)
      ~ arn_without_revision     = "arn:aws:ecs:us-east-1:996905175585:task-definition/CollectionAPI-Prod" -> (known after apply)
      ~ container_definitions    = jsonencode(
          ~ [
              ~ {
                  ~ environment            = [
                        # (1 unchanged element hidden)
                        {
                            name  = "EVENT_BUS_NAME"
                            value = "PocketEventBridge-Prod-Shared-Event-Bus"
                        },
                      + {
                          + name  = "LOG_LEVEL"
                          + value = "info"
                        },
                        {
                            name  = "NODE_ENV"
                            value = "production"
                        },
                      + {
                          + name  = "OTLP_COLLECTOR_URL"
                          + value = "https://otel-collector.readitlater.com:443"
                        },
                    ]
                    name                   = "app"
                  ~ secrets                = [
                        # (1 unchanged element hidden)
                        {
                            name      = "SENTRY_DSN"
                            valueFrom = "arn:aws:ssm:us-east-1:996905175585:parameter/CollectionAPI/Prod/SENTRY_DSN"
                        },
                      + {
                          + name      = "UNLEASH_ENDPOINT"
                          + valueFrom = "arn:aws:ssm:us-east-1:996905175585:parameter/Shared/Prod/UNLEASH_ENDPOINT"
                        },
                      + {
                          + name      = "UNLEASH_KEY"
                          + valueFrom = "arn:aws:secretsmanager:us-east-1:996905175585:secret:CollectionAPI/Prod/UNLEASH_KEY"
                        },
                    ]
                    # (9 unchanged attributes hidden)
                },
            ] # forces replacement
        )
      ~ id                       = "CollectionAPI-Prod" -> (known after apply)
      ~ revision                 = 938 -> (known after apply)
        tags                     = {
            "app_code"       = "pocket"
            "component_code" = "pocket-collectionapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "CollectionAPI"
        }
        # (10 unchanged attributes hidden)
    }

  # aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-execution-role-policy_2D469A77 will be updated in-place
  ~ resource "aws_iam_policy" "application_ecs_service_ecs-iam_ecs-task-execution-role-policy_2D469A77" {
        id               = "arn:aws:iam::996905175585:policy/CollectionAPI-Prod-TaskExecutionRolePolicy"
        name             = "CollectionAPI-Prod-TaskExecutionRolePolicy"
      ~ policy           = jsonencode(
          ~ {
              ~ Statement = [
                  ~ {
                      - Sid      = ""
                        # (3 unchanged attributes hidden)
                    },
                  ~ {
                      ~ Resource = [
                          + "arn:aws:ssm:us-east-1:996905175585:parameter/Shared/Prod/*",
                          + "arn:aws:ssm:us-east-1:996905175585:parameter/Shared/Prod",
                            "arn:aws:ssm:us-east-1:996905175585:parameter/CollectionAPI/Prod/*",
                            # (1 unchanged element hidden)
                        ]
                      - Sid      = ""
                        # (2 unchanged attributes hidden)
                    },
                ]
                # (1 unchanged attribute hidden)
            }
        )
        tags             = {
            "app_code"       = "pocket"
            "component_code" = "pocket-collectionapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "CollectionAPI"
        }
        # (5 unchanged attributes hidden)
    }

  # aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6 will be updated in-place
  ~ resource "aws_iam_policy" "application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6" {
        id               = "arn:aws:iam::996905175585:policy/CollectionAPI-Prod-TaskRolePolicy"
        name             = "CollectionAPI-Prod-TaskRolePolicy"
      ~ policy           = jsonencode(
          ~ {
              ~ Statement = [
                    # (1 unchanged element hidden)
                    {
                        Action   = "events:PutEvents"
                        Effect   = "Allow"
                        Resource = "arn:aws:events:us-east-1:996905175585:event-bus/PocketEventBridge-Prod-Shared-Event-Bus"
                    },
                  ~ {
                      ~ Action   = [
                          - "xray:PutTraceSegments",
                          - "xray:PutTelemetryRecords",
                          - "xray:GetSamplingTargets",
                          - "xray:GetSamplingStatisticSummaries",
                          - "xray:GetSamplingRules",
                            "logs:PutLogEvents",
                            # (4 unchanged elements hidden)
                        ]
                        # (2 unchanged attributes hidden)
                    },
                ]
                # (1 unchanged attribute hidden)
            }
        )
        tags             = {
            "app_code"       = "pocket"
            "component_code" = "pocket-collectionapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "CollectionAPI"
        }
        # (5 unchanged attributes hidden)
    }

Plan: 1 to add, 2 to change, 1 to destroy.

Changes to Outputs:
  ~ ecs-task-arn           = "arn:aws:ecs:us-east-1:996905175585:task-definition/CollectionAPI-Prod:938" -> (known after apply)

@jpetto jpetto force-pushed the MC-1588-add-otlp-to-collection-api branch 3 times, most recently from 70e9657 to 753cd21 Compare November 8, 2024 21:35
@github-actions github-actions bot deployed to curated-corpus-api-dev November 8, 2024 21:48 Active
@github-actions github-actions bot deployed to collection-api-dev November 8, 2024 21:48 Active
@github-actions github-actions bot deployed to prospect-api-dev November 8, 2024 21:49 Active
@github-actions github-actions bot deployed to collection-api-dev November 8, 2024 22:14 Active
@github-actions github-actions bot deployed to collection-api-dev November 8, 2024 22:36 Active
@jpetto jpetto force-pushed the MC-1588-add-otlp-to-collection-api branch from 691f76c to f469c75 Compare November 8, 2024 23:02
@jpetto jpetto marked this pull request as ready for review November 8, 2024 23:03
@jpetto jpetto marked this pull request as draft November 8, 2024 23:04
@jpetto jpetto force-pushed the MC-1588-add-otlp-to-collection-api branch from 6746003 to 7963998 Compare November 12, 2024 17:13
README.md Outdated
2. Toggle the `default` environment to "On" in the left hand "Enabled in environments" box
3. Expand the `default` environment in the main, right-hand panel
4. Click the ✎ pencil icon to edit the "Gradual rollout" strategy
5. Move the "Rollout" slider to the percentage of traces you want to collect

Rollout should always be 100%. What you want to edit is the variant within the rollout: set its number value to between 0 and 1 to represent the fraction of traces to sample.
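
To illustrate the point about the variant value, here is a rough sketch of turning a variant's payload into a sample rate between 0 and 1. The `FlagVariant` shape is an assumption loosely modelled on a feature-flag variant with a string payload (the real Unleash types may differ), and `sampleRateFromVariant` is a hypothetical helper, not something from this PR.

```typescript
// Hypothetical variant shape: a variant that may carry a string payload.
interface FlagVariant {
  enabled: boolean;
  payload?: { type: string; value: string };
}

// Convert a variant into a sample rate in [0, 1]. A disabled variant, a
// missing payload, or an unparseable value all yield 0 (sample nothing).
function sampleRateFromVariant(variant: FlagVariant): number {
  if (!variant.enabled || variant.payload === undefined) return 0;
  const parsed = Number(variant.payload.value);
  if (!isFinite(parsed)) return 0; // rejects NaN and +/-Infinity
  return Math.min(1, Math.max(0, parsed)); // clamp out-of-range values
}
```

Defaulting to 0 on bad input is a deliberately conservative choice: a misconfigured variant disables sampling rather than tracing every request.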

bassrock
bassrock previously approved these changes Nov 12, 2024
@jpetto jpetto marked this pull request as ready for review November 12, 2024 17:50

- When adding a new aws-sdk, pin it to the version used throughout the monorepo.
- When upgrading aws-sdk, upgrade it consistently throughout the monorepo.
The syncpack config can be found in the `.syncpackrc` file.

thanks for the extensive documentation 🙏


To disable tracing on a service, simply toggle the `default` environment to "Off".

### Local Tracing

🔥

@katerinachinnappan katerinachinnappan left a comment

🚀 🔥

@jpetto jpetto dismissed stale reviews from katerinachinnappan and bassrock via a86912a November 13, 2024 16:42
@jpetto jpetto force-pushed the MC-1588-add-otlp-to-collection-api branch from 8995adc to a86912a Compare November 13, 2024 16:42
@jpetto jpetto merged commit 1d0f1c4 into main Nov 13, 2024
49 checks passed
@jpetto jpetto deleted the MC-1588-add-otlp-to-collection-api branch November 13, 2024 16:56
@jpetto jpetto mentioned this pull request Nov 15, 2024
3 tasks
3 participants