Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws_ecs.service: Cannot update service if service revisions is active #33195

Open
1 task
chrisdodds opened this issue Jan 27, 2025 · 2 comments
Open
1 task
Labels
@aws-cdk/aws-ecs Related to Amazon Elastic Container bug This issue is a bug. effort/medium Medium work item – several days of effort p3

Comments

@chrisdodds
Copy link

chrisdodds commented Jan 27, 2025

Describe the bug

When an ECS service has service revisions enabled, cdk cannot update the associated stack unless other stack changes trigger a re-deployment.

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

I would expect that cdk would handle this state gracefully/consistently and not leave the stack in an unmodifiable state.

In the current config, our preference would be for cdk to handle this change even if the revision has been triggered externally by a CD tool. I get that ideally, CD might trigger a CDK update to the stack rather than having it triggered by another tool, but that type of gitops config isn't always possible.

Current Behavior

"Invalid request provided: UpdateService error: TaskDefinition is inactive (Service: AmazonECS; Status Code: 400; Error Code: ClientException"

I have attempted this change when with the task definition versioning managed by CDK and without (task def updated by CLI). Both produce the same error.

Reproduction Steps

Deploy a service.
Deploy a revision of the service. (via cdk or cli)
Attempt to update the stack with changes that do not trigger a service re-deployment (like replica count).

Possible Solution

Image

This appears to be because ECS keeps the prior revision attached to the service, but set as inactive. Example: If I select the "source" revision in the screenshot above, that task definition version is listed as inactive. This makes sense, but possibly there is a way to safely handle this by either triggering a re-deploy or ignoring the inactive revision.

Image

Additional Information/Context

No response

CDK CLI Version

2.176.0

Framework Version

No response

Node.js Version

18.16

OS

MacOS Sequoia

Language

TypeScript

Language Version

No response

Other information

No response

@chrisdodds chrisdodds added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 27, 2025
@github-actions github-actions bot added the @aws-cdk/aws-ecs Related to Amazon Elastic Container label Jan 27, 2025
@pahud pahud self-assigned this Jan 28, 2025
@pahud
Copy link
Contributor

pahud commented Jan 28, 2025

Hi,

Deploy a service.
Deploy a revision of the service. (via cdk or cli)
Attempt to update the stack with changes that do not trigger a service re-deployment (like replica count).

Can you elaborate on how did you deploy a revision of the service using CDK?

I assume you are referring to the revision of the task definition.

Generally, when you deploy ecs service using CDK, you get an initial revision like this

DummyStackDummyTaskDef85B748C3:1 and the service would be associated with this revision.

Let me explain using this sample:

export class DummyStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Create VPC
    const vpc = ec2.Vpc.fromLookup(this, 'DummyVpc', {
      isDefault: true
    });

    // Create ECS Cluster
    const cluster = new ecs.Cluster(this, 'DummyCluster', {
      vpc,
    });

    // Create Task Definition
    const taskDefinition = new ecs.FargateTaskDefinition(this, 'DummyTaskDef', {
      memoryLimitMiB: 512,
      cpu: 256
    });

    // Add container to task definition
    taskDefinition.addContainer('DummyContainer', {
      image: ecs.ContainerImage.fromRegistry('nginx'),
      memoryLimitMiB: 512,
      cpu: 256,
      portMappings: [{ containerPort: 80 }],
      logging: ecs.LogDrivers.awsLogs({ streamPrefix: 'dummy-service' })
    });

    // Create Fargate Service
    const service = new ecs.FargateService(this, 'DummyService', {
      cluster,
      taskDefinition,
      desiredCount: 1,
      serviceName: 'dummy-service'
    });

    // Output the service name
    new CfnOutput(this, 'ServiceName', {
      value: service.serviceName,
      description: 'Name of the ECS Service'
    });

  }
}

As you can see, we are creating the task definition with undefined task name, which allows CDK/CFN to auto generate the name.

When we run cdk diff, a task definition will be created with an implicit revision 1.

Resources
[+] AWS::ECS::Cluster DummyCluster DummyClusterBF18C710
[+] AWS::IAM::Role DummyTaskDef/TaskRole DummyTaskDefTaskRoleA0584B61
[+] AWS::ECS::TaskDefinition DummyTaskDef DummyTaskDefE0A553CD
[+] AWS::Logs::LogGroup DummyTaskDef/DummyContainer/LogGroup DummyTaskDefDummyContainerLogGroupD83BBD9C
[+] AWS::IAM::Role DummyTaskDef/ExecutionRole DummyTaskDefExecutionRole7834A567
[+] AWS::IAM::Policy DummyTaskDef/ExecutionRole/DefaultPolicy DummyTaskDefExecutionRoleDefaultPolicy2E93016F
[+] AWS::ECS::Service DummyService/Service DummyServiceCCC23DB3
[+] AWS::EC2::SecurityGroup DummyService/SecurityGroup DummyServiceSecurityGroup21981670

Now if I change the task definition prop, cdk will trigger in-place update like this and bump the revision from 1 to 2 and associate the service to revision 2 with revision 1 inactive, triggering rolling update.

[~] AWS::ECS::TaskDefinition DummyTaskDef DummyTaskDefE0A553CD replace
 └─ [~] Memory (requires replacement)
     ├─ [-] 512
     └─ [+] 1024

Now, if you create the task definition revision from console/CLI/CD, this will create revision 3 with revision 2 still active but CDK/CFN would not aware of that.

Image

At this moment, if you run cdk diff, cdk won't detect any change.

Now, if you modify CDK code, cdk diff would detect the change from its last known CFN state for example

[~] AWS::ECS::TaskDefinition DummyTaskDef DummyTaskDefE0A553CD replace
 ├─ [~] Cpu (requires replacement)
 │   ├─ [-] 256
 │   └─ [+] 1024
 └─ [~] Memory (requires replacement)
     ├─ [-] 1024
     └─ [+] 2048

Now you will see rev 4 created and associated with service, while rev 3 still active and rev 2 inactive.

Image

And your service should be running with rev 4.

Now, if you manually create a new rev 5 and

Attempt to update the stack with changes that do not trigger a service re-deployment (like replica count).

let's update the desiredCount from 1 to 2

cd diff

Resources
[~] AWS::ECS::Service DummyService/Service DummyServiceCCC23DB3
 └─ [~] DesiredCount
     ├─ [-] 1
     └─ [+] 2

CloudFormation would just in-place update that service by changing its DesiredCount property without changing it's associated revision (rev 4 in this case) so your service with 2 tasks should be still running at rev 4. This should work as expected.

I am guessing your CD might create a new rev and update the service with that rev, which makes the CFN drift as CDK/CFN is not aware of the external changes by the CD and if you re-deploy CDK using new properties. The drift might cause some issues.

Generally I'll suggest if you use external CD to deploy ECS service/revisions, always stick to that for service/revision update and leave CDK for initial deployment as CFN might have issue handling drift in your case and that's not something CDK can handle as CDK simply synthesizes the CFN template.

Let me know if I miss anything and I'm always happy to help.

@pahud pahud added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p3 labels Jan 28, 2025
@pahud pahud removed their assignment Jan 28, 2025
@pahud pahud added effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Jan 28, 2025
@chrisdodds
Copy link
Author

chrisdodds commented Jan 28, 2025

Thanks, @pahud for the thorough response. Really appreciated.

That all makes sense. You're correct, we're using ecs-deploy in our CD to create an updated task definition that references a new container image and patch the service like so:

ecs deploy "${ECS_CLUSTER}" "${ECS_WEB_SERVICE}" \
    --image app "${DOCKER_REPO}/app:${SEMAPHORE_GIT_SHA}" \
    --env app GIT_HASH "${SEMAPHORE_GIT_SHA}" \
    --newrelic-revision "${SEMAPHORE_GIT_SHA}" \
    --newrelic-appid "${NEWRELIC_APP_ID}" \
    --newrelic-apikey "${NEW_RELIC_API_KEY}" \
    --user "${SEMAPHORE_GIT_COMMIT_AUTHOR}" \
    --comment "${SEMAPHORE_GIT_SHA}" \
    --timeout 600 # seconds

I think I'm being greedy and wanting the best of both worlds. I want to control everything in CDK other than the task-def reference to the new git-sha, treating the infra and the container image ref as separate concerns.

I suppose I could replace the ecs-deploy step with a cdk deploy that does basically the same, but right now cdk is in its own repo to handle shared components and such.

One other work-around I though of was to use
taskDefinitionRevision: ecs.TaskDefinitionRevision.LATEST,
on the stack which seems to resolve the issue but I haven't thought through all the (potentially) unintended consequence of that.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-ecs Related to Amazon Elastic Container bug This issue is a bug. effort/medium Medium work item – several days of effort p3
Projects
None yet
Development

No branches or pull requests

2 participants