
Feature request: handling deleted stacks via launch #1193

Closed
jfalkenstein opened this issue Jan 22, 2022 · 9 comments


@jfalkenstein
Contributor

Right now, sceptre launch will create new stacks and update existing stacks automatically. However, Sceptre doesn't currently see stacks that exist remotely but do not currently have a config locally. The only way to delete a stack via Sceptre is to have its Stack Config locally and run sceptre delete explicitly. But if the Stack Config doesn't exist, Sceptre can't even do that.

Requested behavior

When you run sceptre launch --allow-delete it would locate existing stacks that are in CloudFormation but do not currently exist in the local project and then delete them. When prompting the user before launch begins (unless -y is passed), it should list the stacks to be deleted separately to make it clear what deletions will occur.

Why the --allow-delete flag?

There are many situations where a user might be on a different branch or otherwise not want to delete stacks that aren't fully represented locally. As a result, this should be opt-in behavior; otherwise, it could have disastrous consequences.

How would it find remote-only stacks without state management?

It's important that Sceptre continues to avoid using a state management DB or file, since that's one of the "selling points" of Sceptre.

Instead, I propose that Sceptre be able to find remote-only stacks by:

  1. Every stack that Sceptre creates or updates should always add a "sceptre_project_code" tag and a "sceptre_name" tag automatically (in addition to any other configured stack tags).
    • Automatic tagging for state management is pretty common for infra-as-code tools. CDK, SAM CLI, and others use this mechanism to store metadata on stacks for later reference.
  2. To locate remote stacks for a given command path would involve:
    1. Gathering the project code for that command path
    2. Listing all non-deleted CloudFormation stacks and collecting those whose names start with the project code.
      • This could produce false positives, though, if one project code is a prefix of another.
    3. Describing each of the gathered stacks to retrieve their tags
    4. Selecting only those stacks that carry the correct sceptre_project_code tag and whose sceptre_name falls under the command path.

Doing this should provide a list of stack names that are remote but not locally represented in configs.
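The lookup described above can be sketched with boto3. This is only a sketch of the proposal, not existing Sceptre behavior; the tag names sceptre_project_code and sceptre_name are the ones proposed in this issue:

```python
SCEPTRE_PROJECT_TAG = "sceptre_project_code"  # tag names proposed in this issue
SCEPTRE_NAME_TAG = "sceptre_name"


def is_project_stack(tags, project_code, command_path):
    """Pure filter: True when a stack's tags prove it belongs to this
    project and falls under the given command path."""
    tag_map = {t["Key"]: t["Value"] for t in tags}
    return (
        tag_map.get(SCEPTRE_PROJECT_TAG) == project_code
        and tag_map.get(SCEPTRE_NAME_TAG, "").startswith(command_path)
    )


def find_remote_stacks(project_code, command_path, region):
    """Cheaply pre-filter by stack-name prefix via list_stacks, then
    confirm each candidate by describing it and checking its tags."""
    import boto3  # deferred so the pure filter above needs no AWS deps

    cfn = boto3.client("cloudformation", region_name=region)
    candidates = []
    for page in cfn.get_paginator("list_stacks").paginate():
        for summary in page["StackSummaries"]:
            if summary["StackStatus"] == "DELETE_COMPLETE":
                continue  # list_stacks also returns recently deleted stacks
            if summary["StackName"].startswith(project_code):
                candidates.append(summary["StackName"])

    matched = []
    for name in candidates:
        stack = cfn.describe_stacks(StackName=name)["Stacks"][0]
        if is_project_stack(stack.get("Tags", []), project_code, command_path):
            matched.append(name)
    return matched
```

Comparing against the deployed local stack names would then yield the remote-only set.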

@jfalkenstein
Contributor Author

@zaro0508 @zxiiro , this feature request came out of the conversation we had that day we met. I think this is a viable path forward that could accomplish our goals.

@craighurley
Contributor

This argument seems dangerous to me, and stretches beyond what sceptre should be responsible for.

Every stack that Sceptre creates or updates should always add a "sceptre_project_code" tag and a "sceptre_name"

It's possible (and common in many projects I have worked on) that different stack repos are responsible for deploying different sceptre projects to the same account. Said repos can be deployed via different pipelines/people.

If folks need to clean up stacks not managed by sceptre, there are already other methods and tools that can help there, for example awsls and awsrm:

awsls aws_cloudformation_stack -p profile1,profile2 -r us-east-1,us-west-2 | grep -i example | awsrm

https://github.com/jckuester/awsls
https://github.com/jckuester/awsrm

Why re-invent the wheel?

@jfalkenstein
Contributor Author

Hey, thanks for your input here! I'm not at all concerned with deleting stacks unassociated with Sceptre. I'm concerned with stacks that USED to be associated with Sceptre.

I think about the context of a CI/CD pipeline. When I push changes to stack configs or add stack configs, it applies those changes and stands up new stacks. Launch handles both those scenarios just fine. But if I want to remove a stack from the project that is no longer needed... Launch doesn't know. Sceptre doesn't have a way to see stacks that are on CF that have been removed from Sceptre.

It's possible (and common in many projects I have worked on) that different stack repos are responsible for deploying different sceptre projects to the same account.

My suggestion would care most about the project code. If you're using the project code correctly, it would be unique by project. In fact, ideally, it would be unique by environment as well.

Of course, many folks might not have designed their sceptre configs properly and have not really cared about implementing a proper strategy for the project_code field. That's why it would be opt-in with the flag. But if the project code has been used properly, my proposed plan should work without the dangers you've described.

@zaro0508
Contributor

This argument seems dangerous to me, and stretches beyond what sceptre should be responsible for.

I would agree with the first part but disagree with the second. I think it would be appropriate for Sceptre to somehow remove a remote stack when a user deletes its local stack config. That said, I agree with @craighurley that it's a dangerous operation that could cause serious problems for users.

I do like this idea though, and I don't see it being too dangerous as long as we figure out a good system to identify Sceptre-managed stacks. Here are some initial questions:

  • sceptre code is not a required setting in sceptre. Are you proposing that using the --allow-delete would fail unless appropriate stack_tags are set?
  • what is the sceptre_name tag and what would be its value? Or did you mean a stack_name tag?
  • instead of requiring the user to set any sceptre parameters, would it make sense to have sceptre always automatically apply one or a few unique tags to every stack? That way sceptre can identify whether it's a stack that's managed by sceptre.

@jfalkenstein
Contributor Author

sceptre code is not a required setting in sceptre

I believe project_code actually is required. But if it's not, then we definitely wouldn't want this behavior to apply if it's not set.

Are you proposing that using the --allow-delete would fail unless appropriate stack_tags are set?

I'm proposing that we always add those two extra tags to all stacks on every update and create action, whether or not the user has stack_tags in their config. This is consistent with AWS-created infra-as-code tools like CDK and SAM. But what I'm also saying is that the only stacks that would even be considered candidates for deletion would be those with the right tags on them that prove they are definitively a part of the command path, even if the local configs have been deleted.

what is the sceptre_name tag and what would be its value? Or did you mean a stack_name tag?

Yeah, I wanted the tag keys to start with "sceptre_" to distinguish them from any other tags. It would be what Sceptre calls the "sceptre_name", like "dev/something/or/another.yaml". This is NOT the same as the stack_name, because sometimes we override the stack name. Thus, by tagging the stacks, we can know what the "sceptre name" was even if the stack name is different.

instead of requiring the user to set any sceptre parameters, would it make sense to have sceptre always automatically apply one or a few unique tags to every stack? That way sceptre can identify whether it's a stack that's managed by sceptre.

Yeah, that's exactly what I'm suggesting. Those two tags would always be added to every stack, in addition to any other tags that might have been set on them.
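The always-on tagging being discussed could be as simple as merging two reserved keys over the user's stack_tags. A minimal sketch of the idea, assuming the tag names proposed in this issue (build_stack_tags is a hypothetical helper, not an existing Sceptre function):

```python
def build_stack_tags(user_tags, project_code, sceptre_name):
    """Merge user-configured stack_tags with the two automatic Sceptre
    tags. The automatic tags are written last so they win on key
    collisions and stay trustworthy for later identification."""
    merged = dict(user_tags or {})
    merged["sceptre_project_code"] = project_code
    merged["sceptre_name"] = sceptre_name
    # CloudFormation's API expects a list of {"Key": ..., "Value": ...} pairs.
    return [{"Key": k, "Value": v} for k, v in sorted(merged.items())]
```

Writing the automatic tags after the user's tags keeps them authoritative even if a config tries to override them.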

@jfalkenstein
Contributor Author

@zaro0508 @craighurley With the responses I've provided above, what are your thoughts on this feature? Do you think my proposal is viable?

If the "danger" factor is still too much for you, we could introduce a new command, something like sceptre sync, which would combine all the functionality of launch with the delete functionality I've described above. This would then be truly opt-in, since it wouldn't modify the launch command at all.

@zaro0508
Contributor

zaro0508 commented Jan 30, 2022

IMO, always tagging stacks with identifiable Sceptre properties is generally useful even without the ability to delete stacks. I also think that your proposal is viable, and I don't see why it should not work as you described. I also prefer this feature as a launch cleanup flag instead of a separate command, i.e. sceptre launch --cleanup ...

@jfalkenstein
Contributor Author

So I did some experimental development on this approach over the weekend. Here's what I found:

  • There will be some complexity dealing with multiple regions, profiles, and IAM roles across a given stack group. This is largely manageable as long as those are defined on StackGroup configs. However, if a since-deleted Stack Config specified a region/profile/iam_role different from what the StackGroup configs specify, we won't be able to locate that "orphaned" stack once the config is gone.
  • If the Stacks follow Sceptre's default naming conventions, locating associated stacks is pretty efficient, since we can filter out all stacks that don't start with the project code. Even with a large number of stacks in a given region, we can quickly hone in on the probable stacks and check their tags to ensure they are indeed the right stacks.
  • However, if the stacks have been named directly using stack_name and don't begin with the project_code, our only recourse is to check the tags of ALL stacks in the region(s), which takes far longer and requires far more boto calls: you can't locate a stack by tags, you can't fetch tags for many stacks in a single request, so you need a separate request per stack. This is pretty inefficient, and I would have concerns about using it in very large AWS orgs with potentially thousands of stacks.

In the end, I have some prototype code that can locate orphaned stacks. However, I'm wondering if it might be much cleaner and much safer if (instead of my proposed route) there was a simple stack config value that basically meant "delete this stack on launch". You could pretty much remove everything else from the stack configuration except maybe the stack_name. This would make the process of cleaning up stacks via CI/CD look like this:

step 1: Add a "delete_on_launch: True" config to the StackConfig or something like that
step 2: Deploy this with the CI/CD tool. This will delete all stacks marked with this on launch and not launch any stacks marked this way.
step 3: Delete the stack config, now that the stacks have been cleaned up.

This approach would remove the need to locate remote stacks not represented by Stack Configs.

If we do this, there will need to be some logic in the deletion: any stack that depends on a stack marked for deletion would also need to be marked for deletion; otherwise, the whole process should fail unless "ignore-dependencies" has been set.
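The dependency rule above amounts to a transitive-closure check over reverse dependencies. A sketch with a hypothetical dependency mapping (not Sceptre's real graph API):

```python
def stacks_requiring_deletion(marked, dependencies):
    """Given the stacks explicitly marked delete_on_launch and a mapping
    of stack -> stacks it depends on, return every stack that must also
    be deleted because it (transitively) depends on a marked stack."""
    # Invert the graph: stack -> set of stacks that depend on it.
    dependents = {}
    for stack, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(stack)

    required = set(marked)
    frontier = list(marked)
    while frontier:
        current = frontier.pop()
        for child in dependents.get(current, ()):
            if child not in required:
                required.add(child)
                frontier.append(child)
    return required
```

Launch could then compare this set against the stacks actually marked for deletion and fail fast on any mismatch (unless "ignore-dependencies" is set).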

@craighurley, @zaro0508 what would you think of this alternative approach?

@jfalkenstein
Contributor Author

I'm closing this issue in favor of #1212
