
Nuke stalls #426

Closed · YuriGal opened this issue Nov 21, 2024 · 42 comments

YuriGal commented Nov 21, 2024

We have an account in desperate need of cleanup; it's been used as a playground and has tons of old, stale resources. When I run nuke on this account, it just stalls without outputting anything, not even the list of resources that would be removed. Any idea what's causing it?

YuriGal (Author) commented Nov 21, 2024

My theory is that it hits a resource type with a huge number of resources and just gets stuck on it. If so, can I bypass it somehow? (I know I can add it to the exclusion list, but I'd have to guess which resource type it is every time this happens.) And also, how do I actually clean up these resources?

YuriGal (Author) commented Nov 21, 2024

Like, maybe have an option for a maximum number of resources to retrieve? That way the nuke could be done in several consecutive runs.

ekristen (Owner) commented:

Is S3Object excluded in your configuration?

YuriGal (Author) commented Nov 21, 2024

Yes. As a matter of fact, I tried to target CloudWatchLogsLogGroup exclusively, but this account has 34K of those.

ekristen (Owner) commented:

Well, this gives us a place to start...

Unfortunately, 34k is a LOT, and there are rate limits involved with those APIs.

We can only make 10 describe queries per second, returning 50 log groups at a time, and then 15 log-stream queries per second.

https://github.com/ekristen/aws-nuke/blob/main/resources/cloudwatchlogs-loggroups.go#L44-L49

34k at 50 per query is 680 queries just to describe the log groups. Because we also query the log streams for some additional metadata, there's one extra query per log group at a maximum of 15 per second, so that's 34k at 15/second, roughly 37 minutes just to discover everything.

I can add some debug logging to a special build for testing via GitHub Actions. It's likely just taking forever to query everything.

It's possible I could add a setting to bypass querying the log streams; that would cut out the 37 minutes of extra time.
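
For illustration, here is a rough sketch of what that kind of rate-limited discovery looks like with aws-sdk-go v1 and golang.org/x/time/rate; the limiter values mirror the numbers above, and this is not aws-nuke's actual code (see the linked source for that):

```go
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudwatchlogs"
	"golang.org/x/time/rate"
)

func main() {
	svc := cloudwatchlogs.New(session.Must(session.NewSession()))

	// DescribeLogGroups: at most 10 calls per second, 50 log groups per page.
	groupLimiter := rate.NewLimiter(rate.Limit(10), 1)
	// DescribeLogStreams: at most 15 calls per second, one call per log group.
	streamLimiter := rate.NewLimiter(rate.Limit(15), 1)

	ctx := context.Background()
	input := &cloudwatchlogs.DescribeLogGroupsInput{Limit: aws.Int64(50)}
	total := 0

	for {
		_ = groupLimiter.Wait(ctx)
		page, err := svc.DescribeLogGroups(input)
		if err != nil {
			panic(err)
		}
		for _, lg := range page.LogGroups {
			total++
			// One extra call per log group for stream metadata, capped at 15/second.
			_ = streamLimiter.Wait(ctx)
			_, _ = svc.DescribeLogStreams(&cloudwatchlogs.DescribeLogStreamsInput{
				LogGroupName: lg.LogGroupName,
				Limit:        aws.Int64(1),
			})
		}
		if page.NextToken == nil {
			break
		}
		input.NextToken = page.NextToken
	}

	// At 34k log groups and 15 stream calls per second, discovery alone takes
	// roughly 34000 / 15 ≈ 2267 seconds, i.e. about 37-38 minutes.
	fmt.Println("discovered", total, "log groups")
}
```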

YuriGal (Author) commented Nov 21, 2024

It's not just log groups; there are other resource types with huge numbers of resources. Would it be possible to add a generic option to limit the number of resources nuke enumerates per resource type, something like --max-resources 5000, so it doesn't have to enumerate everything?

ekristen (Owner) commented:

Interesting idea. What do you think would be more useful, per resource type or global?

--max-per-resource-type=50 would limit to top 50 of each resource type.

--max-resources=5000 would limit to the first 5000 discovered (this might be harder to implement)

YuriGal (Author) commented Nov 21, 2024

I think per resource type is the better option. Nuke has no problem going over a large number of resource types; it's when a particular resource type has a huge number of resources that it gets stuck. Limiting the number of resources retrieved per resource type should solve this issue.

ekristen (Owner) commented:

My only question is whether it should be per resource type instead of global for all resource types, like --max-per-resource-type resourcetype=blah.

YuriGal (Author) commented Nov 21, 2024

Having the option per specific resource type is good if you know which resource type is causing trouble; then you can target that type specifically. But sometimes you don't know in advance which type it's going to be. If it's possible to have it both ways (if a type is specified, apply the limit only to that type, otherwise to all types), that would be the best of both worlds.
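
To make the "both ways" idea concrete, here is a hypothetical sketch in Go; neither --max-resources nor --max-per-resource-type exists in aws-nuke today, and the names, types, and defaults below are illustrative only:

```go
package main

import "fmt"

// scanLimits holds a hypothetical global cap plus per-resource-type overrides,
// as they might be parsed from --max-resources and --max-per-resource-type flags.
type scanLimits struct {
	global  int
	perType map[string]int
}

// reached reports whether discovery should stop for the given resource type.
// A per-type limit takes precedence; otherwise the global limit applies.
func (l scanLimits) reached(resourceType string, found int) bool {
	if limit, ok := l.perType[resourceType]; ok {
		return found >= limit
	}
	return l.global > 0 && found >= l.global
}

func main() {
	limits := scanLimits{
		global:  5000,
		perType: map[string]int{"CloudWatchLogsLogGroup": 1000},
	}
	fmt.Println(limits.reached("CloudWatchLogsLogGroup", 1000)) // true: per-type cap hit
	fmt.Println(limits.reached("QuickSightUser", 4999))         // false: under the global cap
}
```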

mdgm88 commented Nov 26, 2024

You would need to run aws-nuke a lot more times to clean everything up with this limit in place, but at least it wouldn't get stuck.

If aws-nuke is run regularly against an account it should work well most of the time.

Cost Explorer can help identify where the spend is in an account but that doesn't necessarily correspond to what you have a lot of. You can have a huge amount of something that costs a negligible amount, and not much of something that costs a lot.

ekristen (Owner) commented:

It definitely feels like an advanced feature. It's something that would have to be implemented per resource too.

YuriGal (Author) commented Nov 26, 2024

> If aws-nuke is run regularly against an account it should work well most of the time.

That's the idea. It's the initial cleanup that's problematic; once it's done, we plan to schedule a weekly nuke run that should keep things tidy. We already do this with our other sandbox accounts, and it works pretty well.

> Cost Explorer can help identify where the spend is in an account but that doesn't necessarily correspond to what you have a lot of. You can have a huge amount of something that costs a negligible amount, and not much of something that costs a lot.

Same here, e.g. we have thousands of log groups that are over 5 years old and contain literally gigabytes of logs.

ekristen (Owner) commented:

@YuriGal this is not hard to implement, but it is very time-consuming, and beyond this use case I'm not sure it makes sense to do just yet. However, I'd be willing to create a branch, make a hard-coded change for the 1 or 2 resources you are having issues with, and upload a release against the issue; or you can build the Docker image yourself with the changes. What do you think?

YuriGal (Author) commented Nov 26, 2024

If I can get a Darwin ARM binary for a release that has this feature for CloudWatch log groups and QuickSight users, it would be a great help, thanks!

ekristen (Owner) commented:

It would be custom, so not an actual release. Just a custom branch; I can build it for you and post a link to download.

YuriGal (Author) commented Nov 26, 2024

Oh yes, I understand, I didn't mean it would be a general release. And I really appreciate you doing this.

ekristen (Owner) commented Dec 3, 2024

ekristen closed this as completed Dec 3, 2024
ekristen reopened this Dec 3, 2024

ekristen (Owner) commented Dec 4, 2024

@YuriGal builds are at the above link, let me know how it goes. Wish you luck.

YuriGal (Author) commented Dec 4, 2024

Thanks! What is the name of the flag you implemented? It doesn't seem to recognize --max-per-resource-type

ekristen (Owner) commented Dec 4, 2024

No flag. I just hard-coded it to 1000 max per run.
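
As a rough illustration of what a hard-coded cap like that does (hypothetical code, not the actual change in the custom branch), pagination simply stops once N resources have been collected, and the remainder gets picked up on later runs:

```go
package main

import "fmt"

// maxPerRun is the kind of hard-coded cap used in the test build.
const maxPerRun = 1000

// collect walks pages of resource names and stops early at the cap.
func collect(pages [][]string) []string {
	var out []string
	for _, page := range pages {
		for _, name := range page {
			if len(out) >= maxPerRun {
				return out // remaining resources are left for subsequent runs
			}
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Simulate 30 pages of 50 log groups each (1500 total); only 1000 are kept.
	pages := make([][]string, 30)
	for i := range pages {
		for j := 0; j < 50; j++ {
			pages[i] = append(pages[i], fmt.Sprintf("/aws/lambda/group-%d-%d", i, j))
		}
	}
	fmt.Println(len(collect(pages))) // 1000
}
```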

YuriGal (Author) commented Dec 4, 2024

I am getting this error:

FATA[0120] failed get caller identity: RequestError: send request failed
caused by: Post "https://sts.amazonaws.com/": dial tcp: lookup sts.amazonaws.com: i/o timeout

I am running commands under aws-vault, and the released nuke doesn't have this issue.

ekristen (Owner) commented Dec 4, 2024

I just tested on two different machines. Works ok.

YuriGal (Author) commented Dec 4, 2024

Weird. How are you supplying AWS credentials?

ekristen (Owner) commented Dec 4, 2024

Always environment variables. :)

An i/o timeout indicates to me that the local system is preventing the network connection for whatever reason.
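
The failing step here is the DNS lookup itself ("lookup sts.amazonaws.com: i/o timeout"), so one quick sanity check (an illustrative standalone snippet, assuming the problem really is name resolution) is to resolve the host outside of aws-nuke:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// A plain hostname lookup; if this also times out, the problem is the
	// machine's network/DNS setup rather than the custom build.
	addrs, err := net.LookupHost("sts.amazonaws.com")
	if err != nil {
		fmt.Println("DNS lookup failed:", err)
		return
	}
	fmt.Println("resolved:", addrs)
}
```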

YuriGal (Author) commented Dec 4, 2024

Still no luck. Maybe it's because it's an unsigned executable? macOS wouldn't even let me run the file until I found where to enable it, but I can't find a similar setting for the denied network connection.

ekristen (Owner) commented Dec 4, 2024

That's it. I think I have it set up to only sign tagged builds. In System Preferences > Security you can hit Allow to run it; that should fix things.

YuriGal (Author) commented Dec 4, 2024

That's the thing - I did that, and it allowed me to run this build. But apparently it doesn't allow it to connect.

YuriGal (Author) commented Dec 5, 2024

Sorry, still unable to run it. I enabled it in the security settings, so I can execute it, and I added it to the firewall allowed list, but I am still getting

FATA[0120] failed get caller identity: RequestError: send request failed
caused by: Post "https://sts.amazonaws.com/": dial tcp: lookup sts.amazonaws.com: i/o timeout

when running it. Just to reiterate, this does not happen with the released nuke version.

ekristen (Owner) commented Dec 5, 2024

adamLShine commented:

Hi all,

I believe I may also be having this issue. I have approximately 2400 resources in my AWS account, and aws-nuke is flagging 1800 resources to be removed. My pipeline running aws-nuke has been going for 75 minutes; does that seem too long for that number of resources?

ekristen (Owner) commented Dec 9, 2024

Yuri's problem is likely different.

What resource types? How far does it get? Send logs and config.

adamLShine commented:

Here is a copy of my config file. I have 6000 log groups, and when I run aws-nuke within my pipeline it just stalls; I have had it running for 30+ minutes (see screenshot below).

```yaml
regions:
  - global
  - us-east-1
  - ap-southeast-2

bypass-alias-check-accounts:
  - "111111111111"

accounts:
  111111111111: # Sandpit
    resource-types:
      includes:
        - CloudWatchLogsLogGroup
```

[Screenshot: 2024-12-09 11:48 PM]

ekristen (Owner) commented Dec 9, 2024

Oh, you probably want to try the binaries I built for this; they limit it to 1000 per run and strip out a bunch of extra queries. Just to get you a decent baseline.

adamLShine commented:

OK, thanks. I'm using the Docker images; are the binaries included in them, or will they need to be added separately?

ekristen (Owner) commented Dec 9, 2024

You'll need to grab them from GitHub Actions. This is a special build to try and help you all out for the time being while I think through whether I can easily implement a limits CLI option.

adamLShine commented:

Thanks! That definitely helped, and it's no longer stalling on my CloudWatch log groups.

However, if I remove the resource-types section from my config file and run again, aws-nuke stalls again. From my testing it seems like aws-nuke has a difficult time running over a large AWS account without stalling.

[Screenshot: 2024-12-10 10:55 AM]

ekristen (Owner) commented Dec 9, 2024

That message is normal. Do you have S3Object excluded?

Please open a new issue about this; provide the version and config, run with log level trace, and provide that output.

adamLShine commented:

Thanks, yes I do. The new issue is here #453

YuriGal (Author) commented Dec 11, 2024

> @YuriGal fixed the signing -- https://github.com/ekristen/aws-nuke/actions/runs/12189713328/artifacts/2282303070

Thanks! I finally managed to run it. Just to confirm, does the hard-coded 1000-resource limit apply only to QuickSight users and CloudWatch log groups, or to all resource types in the account?

ekristen (Owner) commented:

Just the CloudWatch log groups. I don't mind doing more hard-coding in a special build if needed to help you out, but making it configurable will take a bit of effort that I don't have time for at the moment.

YuriGal (Author) commented Dec 11, 2024

I think having CloudWatch log group support should suffice for us for now; they're the main offender. For the rest, I'd rather wait until the feature is officially supported in a release. Thanks again!
