Nuke stalls #426
My theory is that it hits a resource type with a huge number of resources and just gets stuck on it. If so, can I bypass it somehow? (I know I can add it to the exclusion list, but I would have to guess which resource type it is every time it happens.) And also, how do I actually clean up these resources?
Like, maybe have an option for the maximum number of resources to retrieve? That way the nuke can be done over several consecutive runs.
Is
Yes, as a matter of fact I tried to target exclusively
Well, this gives us a place to start. Unfortunately 34k is a LOT, and there are rate limits involved with those APIs. We can only make 10 describe queries per second, 50 log groups at a time, and 15 log-stream queries per second. https://github.com/ekristen/aws-nuke/blob/main/resources/cloudwatchlogs-loggroups.go#L44-L49 34k at 50 per query is 680 queries just to describe the log groups. Because we also query the log streams to get some additional metadata, there's 1 query per log group at 15 per second max, so that's 34k at 15/second, roughly 37 minutes just to discover everything. I can add some debug logging to a special build for testing via GitHub Actions. It's likely just taking forever to query everything. It's possible I could add a setting to bypass querying the log streams; that would cut out the 37 minutes of extra time.
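As a rough illustration of where that time goes (this is a hedged sketch, not the actual aws-nuke code, which lives in the file linked above), a paginated DescribeLogGroups loop with a per-group DescribeLogStreams lookup and the rate limits mentioned here might look like this with aws-sdk-go v1; the limiter values and variable names are assumptions for the sketch.

```go
// Sketch only: illustrates why discovery is slow; it is not the aws-nuke source.
// Assumes aws-sdk-go v1 and golang.org/x/time/rate; limits mirror the numbers above.
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudwatchlogs"
	"golang.org/x/time/rate"
)

func main() {
	svc := cloudwatchlogs.New(session.Must(session.NewSession()))
	ctx := context.Background()

	// 50 log groups per DescribeLogGroups page at ~10 pages/second,
	// plus one DescribeLogStreams call per group at ~15 calls/second.
	pageLimiter := rate.NewLimiter(rate.Limit(10), 1)
	streamLimiter := rate.NewLimiter(rate.Limit(15), 1)

	total := 0
	err := svc.DescribeLogGroupsPages(&cloudwatchlogs.DescribeLogGroupsInput{
		Limit: aws.Int64(50),
	}, func(page *cloudwatchlogs.DescribeLogGroupsOutput, lastPage bool) bool {
		_ = pageLimiter.Wait(ctx)
		for _, lg := range page.LogGroups {
			// The extra per-group metadata query is what adds up to ~37 minutes
			// for 34k log groups (34,000 / 15 per second ≈ 2,267 seconds).
			_ = streamLimiter.Wait(ctx)
			svc.DescribeLogStreams(&cloudwatchlogs.DescribeLogStreamsInput{
				LogGroupName: lg.LogGroupName,
				OrderBy:      aws.String("LastEventTime"),
				Descending:   aws.Bool(true),
				Limit:        aws.Int64(1),
			})
			total++
		}
		return true // keep paginating until every group has been listed
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("log groups discovered:", total)
}
```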
It's not just log groups; there are other resource types with a huge number of resources. Would it be possible to add a generic option to limit the number of resources nuke enumerates per resource type, something like
Interesting idea. What do you think would be more useful, per resource or global?
I think per resource type is the better option. Nuke has no problem going over a large number of resource types; it's when a particular resource type has a huge number of resources that it gets stuck. Limiting the number of resources retrieved per resource type should solve this issue.
My only wonder is whether it should be per-resource-type instead of global for all resource types. Like
Having an option per specific resource type is good if you know the resource type that is causing trouble; then you can target that type specifically. But sometimes you don't know in advance which type it's going to be. If it's possible to have it both ways (if a type is specified, apply the limit only to that type, otherwise to all types), it would be the best of both worlds.
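Purely as a hypothetical sketch of that "best of both worlds" idea (none of these types, fields, or resource-type strings exist in aws-nuke today; they are illustrative only), the resolution logic could look something like:

```go
// Hypothetical sketch: a global limit with optional per-resource-type overrides.
// LimitConfig, LimitFor, and the type names used below are illustrative only.
package main

import "fmt"

// LimitConfig holds an optional global cap plus per-resource-type overrides.
type LimitConfig struct {
	Global  int            // 0 means "no global limit"
	PerType map[string]int // per-resource-type overrides
}

// LimitFor returns the cap to apply while enumerating a resource type:
// a per-type override wins, otherwise the global value is used.
func (c LimitConfig) LimitFor(resourceType string) int {
	if n, ok := c.PerType[resourceType]; ok {
		return n
	}
	return c.Global
}

func main() {
	cfg := LimitConfig{
		Global:  500,
		PerType: map[string]int{"CloudWatchLogsLogGroup": 1000},
	}
	fmt.Println(cfg.LimitFor("CloudWatchLogsLogGroup")) // 1000, per-type override
	fmt.Println(cfg.LimitFor("QuickSightUser"))         // 500, global fallback
}
```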
You would need to run aws-nuke a lot more times to clean everything up with this limit in place, but at least it wouldn't get stuck. If aws-nuke is run regularly against an account it should work well most of the time. Cost Explorer can help identify where the spend is in an account, but that doesn't necessarily correspond to what you have a lot of. You can have a huge amount of something that costs a negligible amount, and not much of something that costs a lot.
It definitely feels like an advanced feature. It's something that would have to be implemented per resource too.
That's the idea. It's the initial cleanup that is problematic; once it's done we plan to schedule a weekly nuke run that should keep things tidy. We already do this with our other sandbox accounts, and it works pretty well.
Same here, e.g. we have thousands of log groups that are over 5 years old and contain literally gigabytes of logs.
@YuriGal this is not hard to implement, but it is very time consuming, and beyond this use case I'm not sure it makes sense to do just yet. However, I'd be willing to create a branch, make a hard-coded change to the 1 or 2 resources you are having issues with, and upload a release against the issue, or you can build the Docker image yourself with the changes. What do you think?
If I can get a Darwin ARM binary of a release that has this feature for CloudWatch log groups and QuickSight users, it would be a great help, thanks!
It would be custom, so not an actual release. Just a custom branch; I can build it for you and post a link to download.
Oh yes, I understand; I didn't mean it would be a general release. And I really appreciate you doing this.
@YuriGal builds are at the above link; let me know how it goes. Wishing you luck.
Thanks! What is the name of the flag you implemented? It doesn't seem to recognize
No flag. I just hard-coded it to 1000 max per run.
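For illustration only, a hard-coded per-run cap amounts to an early exit from the pagination callback. This standalone sketch is not the actual change in the custom build (that would live in resources/cloudwatchlogs-loggroups.go); maxPerRun and everything around it are assumptions.

```go
// Illustrative sketch of capping log-group discovery at 1000 per run;
// not the real custom-build diff, just the early-exit idea.
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudwatchlogs"
)

const maxPerRun = 1000 // assumed hard-coded cap, matching the build described above

func main() {
	svc := cloudwatchlogs.New(session.Must(session.NewSession()))

	var names []string
	err := svc.DescribeLogGroupsPages(&cloudwatchlogs.DescribeLogGroupsInput{
		Limit: aws.Int64(50),
	}, func(page *cloudwatchlogs.DescribeLogGroupsOutput, lastPage bool) bool {
		for _, lg := range page.LogGroups {
			names = append(names, aws.StringValue(lg.LogGroupName))
			if len(names) >= maxPerRun {
				return false // returning false stops pagination once the cap is hit
			}
		}
		return true
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("collected %d log groups this run\n", len(names))
}
```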
I am getting an error
I am running the commands under
I just tested on two different machines. Works OK.
Weird. How are you supplying AWS credentials?
Always environment variables. :) The i/o timeout indicates to me that the local system is preventing the network connection for whatever reason.
Still no luck. Maybe it's because it's an unsigned executable? macOS wouldn't even let me run the file until I found where to enable it, but I can't find the equivalent setting for a denied network connection.
That's it. I think I have it set up to only sign tagged builds. In System Preferences > Security you can hit Allow to run it; that should fix things.
That's the thing: I did that, and it allowed me to run this build. But apparently it doesn't allow it to connect.
Sorry, still unable to run it. I enabled it in the security settings, so I can execute it. I added it to the firewall's allowed list, but I am still getting
when running it. Just to reiterate, this does not happen with the released nuke version.
Hi all, I believe I may also be having this issue. I have approx. 2400 resources in my AWS account, and aws-nuke is flagging 1800 resources to be removed. My pipeline running aws-nuke has been going for 75 minutes; does that seem too long for that number of resources?
Yuri's problem is likely different. What resource types? How far does it get? Send logs and config.
Oh, you probably want to try the binaries I built for this; they limit it to 1000 each run and strip a bunch of extra queries out. Just to get you a decent baseline.
OK, thanks. I'm using the Docker images; are the binaries included in them, or will they need to be added separately?
You'll need to grab them from GitHub Actions. This is a special build to try to help you all out for the time being, while I think through whether I can easily implement a limits CLI option.
That message is normal. Do you have S3Object excluded? Please open a new issue about this; provide the version and config, run with log level trace, and provide that output.
Thanks, yes I do. The new issue is here: #453
Thanks! I finally managed to run it. Just to confirm: does the hard-coded 1000-resource limit apply only to QuickSight users and CloudWatch log groups, or to all resource types in the account?
Just to the CloudWatch log groups. We can do more hard coding in a special build if needed to help you out, I don't mind, but doing it in a configurable way will take a bit of effort that I don't have time for at the moment.
I think having CloudWatch log group support should suffice for us for now; they're the main offender. For the rest I'd rather wait until the feature is officially supported in a release. Thanks again!
We have an account in desperate need of cleanup; it's been used as a playground and has tons of old, stale resources. When I run nuke on this account, it just stalls, not outputting anything, not even the list of resources that would be removed. Any idea what's causing it?