
AWS Nuke Stalls with 2000+ resources #453

Closed
adamLShine opened this issue Dec 10, 2024 · 16 comments

@adamLShine

I'm using AWS Nuke within a GitLab CI pipeline to nuke an AWS account with 2000+ resources, and it stalls. I have tested this with the latest Docker image (aws-nuke:v3.35.1) and with the binary supplied in issue 'Nuke stalls #426'.

Stall Screenshot:

[screenshot: aws-nuke output at the point of the stall, 2024-12-10 11:35 AM]

aws-nuke config:

```yaml
regions:
  - global
  - us-east-1
  - ap-southeast-2

accounts:
  "11111111111":

resource-types:
  excludes:
    - FMSNotificationChannel # Excluded because it's not available
    - FMSPolicy # Excluded because it's not available
    - MachineLearningMLModel # Excluded due to ML being unavailable
    - MachineLearningDataSource # Excluded due to ML being unavailable
    - MachineLearningBranchPrediction # Excluded due to ML being unavailable
    - MachineLearningEvaluation # Excluded due to ML being unavailable
    - QuickSightUser # Excluded as it makes AWS Nuke stall
    - ElasticTranscoderPreset # Deprecated service
    - ElasticTranscoderPipeline # Deprecated service
    - RoboMakerDeploymentJob # Deprecated service
    - RoboMakerFleet # Deprecated service
    - RoboMakerRobot # Deprecated service
    - RoboMakerSimulationJob
    - RoboMakerRobotApplication
    - RoboMakerSimulationApplication
    - OpsWorksApp # Deprecated service
    - OpsWorksInstance # Deprecated service
    - OpsWorksLayer # Deprecated service
    - OpsWorksUserProfile # Deprecated service
    - OpsWorksCMBackup # Deprecated service
    - OpsWorksCMServer # Deprecated service
    - OpsWorksCMServerState # Deprecated service
    - CodeStarProject # Deprecated service
    - CodeStarConnection # Deprecated service
    - CodeStarNotification # Deprecated service
    - Cloud9Environment # Deprecated service
    - CloudSearchDomain # Deprecated service
    - RedshiftServerlessSnapshot # Deprecated service
    - RedshiftServerlessNamespace # Deprecated service
    - RedshiftServerlessWorkgroup # Deprecated service
    - S3Object
    - ELBv2ListenerRule
    - CloudWatchLogsLogGroup
    - S3Bucket
    - S3MultipartUpload
    - BudgetsBudget
    - CloudWatchAlarm
```

adamLShine mentioned this issue Dec 10, 2024
@adamLShine
Author

adamLShine commented Dec 10, 2024

I changed my config file to include only the 'global' region and it doesn't appear that I get this error anymore. Is this intended?

@adamLShine
Author

I added my existing regions back into the config file and got similar errors.

I updated my regions to 'all' for additional testing and AWS Nuke stalls with this output:

```
time="2024-12-10T01:17:38Z" level=debug msg="skipping request: service 'route53' is global, but the session is not"
```

@ekristen
Owner

This is a debug statement; it's expected, it's not an error, and it is not the root cause of anything.

We need to identify which resource it's stuck querying.

Do you have any sense of what resource you have the most of?

ekristen changed the title from AWS Nuke Stalls "skipping request: service 'iam' is global, but the session is not" to AWS Nuke Stalls with 2000+ resources Dec 10, 2024
@adamLShine
Author

Is there a debug option to see which AWS API is being called? That way we can confirm what it's stalling on.

@ekristen
Owner

Can you run with the log level set to trace and send me the output up until it stalls, or at least the couple hundred lines before it stalls?

@ekristen
Owner

> Is there a debug option to see which AWS API is being called? That way we can confirm what it's stalling on.

I don't think so at the moment; that would be good to add. Trace level might give us extra detail.

@adamLShine
Author

adamLShine commented Dec 10, 2024

Thanks, I'll try this out. However, the GitLab-hosted runners only provide 4 MB of job log output and the trace log produces considerably more. Also, because the command is stalling, the GitLab runner times out, so I don't think I can redirect stdout into a file and produce a GitLab artifact. I may not be able to provide this.
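One possible workaround, sketched below purely as an illustration (the image location/tag, config path, job name, and timeout are assumptions, not details from this thread, and the `run` subcommand and flags should be checked against `aws-nuke --help`), is to write the trace log to a file instead of the job log and upload that file as an artifact; with `artifacts: when: always`, GitLab will generally still upload it when the job fails or is stopped by its timeout:

```yaml
# Hypothetical GitLab CI job sketch: keep the job log under the 4 MB limit,
# ship the full trace output as an artifact instead.
nuke-debug:
  image:
    name: ghcr.io/ekristen/aws-nuke:v3.35.1   # assumed image location and tag
    entrypoint: [""]
  timeout: 30m                                # fail before the runner's own timeout
  script:
    # Redirect the verbose trace log to a file rather than the job log.
    - aws-nuke run --config config.yml --log-level trace > nuke-trace.log 2>&1 || true
  artifacts:
    when: always                              # upload even on failure or timeout
    paths:
      - nuke-trace.log
    expire_in: 1 week
```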

@ekristen
Owner

It might be easier to debug this outside of the GitLab runner, but I'm pretty familiar with GitLab runners. When I'm back in front of a computer I'll look at what can be done.

@adamLShine
Author

Thanks, just to confirm I'm using GitLab SaaS runners :)

Also, just to give a better idea of my use case, I'm trying to do the following.

We have sandpit accounts that we manually create resources in, and those resources need to be nuked. We want to use GitLab scheduled pipelines to periodically run AWS Nuke against those accounts.
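For reference, a minimal sketch of that kind of scheduled job, assuming a pipeline schedule is configured in the GitLab UI and AWS credentials come from CI/CD variables; the job name and image tag are hypothetical, and the deletion flags are from memory and should be verified against `aws-nuke run --help`:

```yaml
# Hypothetical .gitlab-ci.yml fragment: run the nuke job only on scheduled pipelines.
nuke-sandpit:
  image:
    name: ghcr.io/ekristen/aws-nuke:v3.35.1   # assumed image location and tag
    entrypoint: [""]
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'  # skip pushes and merge requests
  script:
    # Without --no-dry-run, aws-nuke only reports what it would delete.
    - aws-nuke run --config config.yml --no-dry-run --force
```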

@adamLShine
Author

adamLShine commented Dec 10, 2024

I've been doing more testing and I believe aws-nuke would benefit from more output / progress updates during execution. Without this, it is extremely difficult to determine whether aws-nuke has stalled or is just working in the background.

For example, I ran aws-nuke locally with the log level set to trace. This was the last output I got, and then I received no other output for 10+ minutes, so I'm not sure whether aws-nuke is working in the background or has stopped working:

```
TRAC[0098] sending AWS request:
> POST / HTTP/1.1
> Host: ssm.ap-southeast-2.amazonaws.com
> Authorization: <hidden>
> Content-Length: 96
> Content-Type: application/x-amz-json-1.1
> User-Agent: aws-sdk-go/1.54.20 (go1.21.13; darwin; arm64)
> X-Amz-Date: 20241210T113649Z
> X-Amz-Target: AmazonSSM.ListTagsForResource
>
> {"ResourceId":"/////","ResourceType":"Parameter"}
TRAC[0098] received AWS response:
< HTTP/1.1 200 OK
< Connection: keep-alive
< Content-Type: application/x-amz-json-1.1
< Date: Tue, 10 Dec 2024 11:36:49 GMT
< Server: Server
< X-Amzn-Requestid: 5820193a-9bfe-4842-872c-f2685d7e5171
<
< {"TagList":[{"Key":"aws:cloudformation:stack-name","Value":"xx"},{"Key":"aws:cloudformation:logical-id","Value":"xx"},{"Key":"aws:cloudformation:stack-id","Value":"xx"}]}
```

@ekristen
Owner

Agreed.

@ekristen
Owner

Try https://github.com/ekristen/aws-nuke/releases/tag/v3.35.2 with --log-level debug -- I added logging to the libnuke library around how resources are scanned.

@adamLShine
Author

Thanks! I've tested it and it's heaps better.

@ekristen
Owner

OK, let me know where it stalls. :)

@adamLShine
Author

I completed my testing. I think it was originally stalling on the large number of CloudWatch log groups; your updated binary fixed that, and after more testing AWS Nuke has not stalled :)

I think adding the additional logs really helped because they showed me that nuke was still running and hadn't stalled. I think adding more output at the normal log level would be beneficial: if you are a new user nuking a large AWS account, you can have periods of 1+ hour with no output from aws-nuke, which makes people think something has gone wrong when in reality it's working in the background and just taking a long time. What are your thoughts? :)

Thanks for your support! We can close this issue now.

@ekristen
Owner

Great news! Thanks for the feedback.

I'll see what I can do to put more feedback in the tool. It's tough to duplicate some of these edge cases.
