Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: ingest ML-supplied URL metadata with some prospects #251

Merged
merged 1 commit into from
Dec 17, 2024

Conversation

jpetto
Copy link
Contributor

@jpetto jpetto commented Dec 16, 2024

Goal

ingest ML-supplied URL metadata for some prospect types. this is an experiment in the direction of reducing reliance on the parser. editors will be providing feedback to ML directly on the quality of the metadata.

  • this will skip using the Parser for URL metadata for these prospects, except for publisher name, which we still need to rely on the Parser for (for now)
  • small refactoring/cleanup

intend to test this on dev prior to merging - will coordinate with ML.

Implementation Decisions

  • no additional data type/structure validation failures need to be implemented here, as we want any issues with ML-supplied URL metadata to be visible to the editors in the tool (as they are providing feedback there). i am sending any issues with the ML-supplied URL metadata to Sentry for added visibility.

Deployment steps

  • Deployed to and verified on dev?

References

JIRA ticket:

Copy link

github-actions bot commented Dec 16, 2024

Plan Result (prospect-translation-lambda-cdk-production)

CI link

Plan: 0 to add, 1 to change, 0 to destroy.
  • Update
    • aws_lambda_function.translation-lambda_translation-sqs-lambda_B9BDF6BA
Change Result (Click me)
  # aws_lambda_function.translation-lambda_translation-sqs-lambda_B9BDF6BA will be updated in-place
  ~ resource "aws_lambda_function" "translation-lambda_translation-sqs-lambda_B9BDF6BA" {
        id                             = "ProspectAPI-Prod-Sqs-Translation-Function"
        tags                           = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI-Sqs-Translation"
        }
        # (22 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              ~ "GIT_SHA"                      = (sensitive value)
                # (4 unchanged elements hidden)
            }
        }

        # (4 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

⚠️ Errors

Copy link

github-actions bot commented Dec 16, 2024

Plan Result (corpus-scheduler-lambda-cdk-production)

CI link

Plan: 0 to add, 1 to change, 0 to destroy.
  • Update
    • aws_lambda_function.corpus-scheduler-sqs-lambda_F2ECDF9F
Change Result (Click me)
  # aws_lambda_function.corpus-scheduler-sqs-lambda_F2ECDF9F will be updated in-place
  ~ resource "aws_lambda_function" "corpus-scheduler-sqs-lambda_F2ECDF9F" {
        id                             = "CorpusSchedulerLambda-Prod-SQS-Function"
      ~ qualified_arn                  = "arn:aws:lambda:us-east-1:996905175585:function:CorpusSchedulerLambda-Prod-SQS-Function:185" -> (known after apply)
      ~ qualified_invoke_arn           = "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:996905175585:function:CorpusSchedulerLambda-Prod-SQS-Function:185/invocations" -> (known after apply)
        tags                           = {
            "app_code"       = "content"
            "component_code" = "content-corpusschedulerlambda"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "CorpusSchedulerLambda"
        }
      ~ version                        = "185" -> (known after apply)
        # (20 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              ~ "GIT_SHA"                          = (sensitive value)
                # (7 unchanged elements hidden)
            }
        }

        # (4 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

⚠️ Errors

Copy link

github-actions bot commented Dec 16, 2024

Plan Result (prospect-api-cdk-production)

CI link

Plan: 0 to add, 2 to change, 0 to destroy.
  • Update
    • aws_dynamodb_table.dynamodb_prospects_dynamodb_table_9854E41E
    • aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6
Change Result (Click me)
  # data.aws_iam_policy_document.application_ecs_service_ecs-iam_data-ecs-task-role-policy_090CC3AD will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_iam_policy_document" "application_ecs_service_ecs-iam_data-ecs-task-role-policy_090CC3AD" {
      + id            = (known after apply)
      + json          = (known after apply)
      + minified_json = (known after apply)
      + version       = "2012-10-17"

      + statement {
          + actions   = [
              + "dynamodb:BatchGet*",
              + "dynamodb:DescribeTable",
              + "dynamodb:Get*",
              + "dynamodb:Query",
              + "dynamodb:Scan",
              + "dynamodb:UpdateItem",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects",
              + "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects/*",
            ]
        }
      + statement {
          + actions   = [
              + "s3:*",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:s3:::pocket-prospectapi-prod-images",
              + "arn:aws:s3:::pocket-prospectapi-prod-images/*",
            ]
        }
      + statement {
          + actions   = [
              + "events:PutEvents",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:events:us-east-1:996905175585:event-bus/PocketEventBridge-Prod-Shared-Event-Bus",
            ]
        }
      + statement {
          + actions   = [
              + "logs:CreateLogGroup",
              + "logs:CreateLogStream",
              + "logs:DescribeLogGroups",
              + "logs:DescribeLogStreams",
              + "logs:PutLogEvents",
            ]
          + effect    = "Allow"
          + resources = [
              + "*",
            ]
        }
    }

  # aws_dynamodb_table.dynamodb_prospects_dynamodb_table_9854E41E will be updated in-place
  ~ resource "aws_dynamodb_table" "dynamodb_prospects_dynamodb_table_9854E41E" {
        id                          = "PROAPI-Prod-Prospects"
        name                        = "PROAPI-Prod-Prospects"
        tags                        = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI"
        }
        # (9 unchanged attributes hidden)

      - global_secondary_index {
          - hash_key           = "scheduledSurfaceGuid" -> null
          - name               = "scheduledSurfaceGuid-prospectType" -> null
          - non_key_attributes = [] -> null
          - projection_type    = "ALL" -> null
          - range_key          = "prospectType" -> null
          - read_capacity      = 0 -> null
          - write_capacity     = 0 -> null
        }
      + global_secondary_index {
          + hash_key           = "scheduledSurfaceGuid"
          + name               = "scheduledSurfaceGuid-prospectType"
          + non_key_attributes = []
          + projection_type    = "ALL"
          + range_key          = "prospectType"
          + read_capacity      = 5
          + write_capacity     = 5
        }

        # (5 unchanged blocks hidden)
    }

  # aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6 will be updated in-place
  ~ resource "aws_iam_policy" "application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6" {
        id               = "arn:aws:iam::996905175585:policy/ProspectAPI-Prod-TaskRolePolicy"
        name             = "ProspectAPI-Prod-TaskRolePolicy"
      ~ policy           = jsonencode(
            {
              - Statement = [
                  - {
                      - Action   = [
                          - "dynamodb:UpdateItem",
                          - "dynamodb:Scan",
                          - "dynamodb:Query",
                          - "dynamodb:Get*",
                          - "dynamodb:DescribeTable",
                          - "dynamodb:BatchGet*",
                        ]
                      - Effect   = "Allow"
                      - Resource = [
                          - "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects/*",
                          - "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects",
                        ]
                    },
                  - {
                      - Action   = "s3:*"
                      - Effect   = "Allow"
                      - Resource = [
                          - "arn:aws:s3:::pocket-prospectapi-prod-images/*",
                          - "arn:aws:s3:::pocket-prospectapi-prod-images",
                        ]
                    },
                  - {
                      - Action   = "events:PutEvents"
                      - Effect   = "Allow"
                      - Resource = "arn:aws:events:us-east-1:996905175585:event-bus/PocketEventBridge-Prod-Shared-Event-Bus"
                    },
                  - {
                      - Action   = [
                          - "logs:PutLogEvents",
                          - "logs:DescribeLogStreams",
                          - "logs:DescribeLogGroups",
                          - "logs:CreateLogStream",
                          - "logs:CreateLogGroup",
                        ]
                      - Effect   = "Allow"
                      - Resource = "*"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        tags             = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI"
        }
        # (5 unchanged attributes hidden)
    }

Plan: 0 to add, 2 to change, 0 to destroy.

@jpetto jpetto force-pushed the MC-1628-use-ML-URL-metadata-for-qa-prospect-types branch from ef49f64 to f9e85dd Compare December 16, 2024 17:43
runDetails: ProspectRunDetails,
features: ProspectFeatures,
): SnowplowProspect => {
return {
object_version: 'new',
prospect_id: prospect.prospectId,
prospect_source: prospectSource,
prospect_source: prospect.prospectType,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this replacement value maps 1:1 with the extra parameter removed in this PR.

@github-actions github-actions bot deployed to prospect-api-dev December 16, 2024 18:02 Active
@github-actions github-actions bot deployed to collection-api-dev December 16, 2024 18:02 Active
@github-actions github-actions bot deployed to curated-corpus-api-dev December 16, 2024 18:04 Active
@jpetto jpetto force-pushed the MC-1628-use-ML-URL-metadata-for-qa-prospect-types branch from f9e85dd to b39b551 Compare December 16, 2024 20:47
- this will skip using the Parser for URL metadata for these prospects
- a little bit of refactoring/cleanup
@jpetto jpetto force-pushed the MC-1628-use-ML-URL-metadata-for-qa-prospect-types branch from b39b551 to 814747b Compare December 16, 2024 23:49
@jpetto jpetto marked this pull request as ready for review December 17, 2024 15:39
// noting which prospect types will have ML-supplied URL metadata. this is
// likely a temporary array, as we should move to a consistent URL metadata
// source for all prospect types.
export const ProspectTypesWithMlUrlMetadata: ProspectType[] = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: Most consts are in camelCase, is there a reason this one is PascalCase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, no. i think my brain was thinking "type" instead of "const".

@jpetto jpetto merged commit 66d419d into main Dec 17, 2024
45 checks passed
@jpetto jpetto deleted the MC-1628-use-ML-URL-metadata-for-qa-prospect-types branch December 17, 2024 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants