Regarding feat: implement unthrottled concurrency using task queue #141
Comments
I’m not? This is an open source tool to find archived URLs for a given domain…
Yes, and because it isn't throttled, use of this package harms the target, which is me.
Any progress? I was hoping for rate limiting, honoring 503 and 429 status codes, and exponential backoff, not just "unthrottled concurrency".
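For illustration, here is a minimal Python sketch of the three behaviors requested above: a crude rate limit, retrying on 429 and 503, and exponential backoff that defers to a numeric `Retry-After` header when the server sends one. It is not gau's code. The names `polite_get`, `backoff_delay`, and `retry_after_seconds`, the `(status, headers, body)` response shape, and all default values are assumptions made for this example. The `fetch` callable is injected so the logic can be exercised without touching a real server.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape for this sketch: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]

# Status codes that mean "slow down and try again later".
RETRY_STATUSES = {429, 503}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential delay for a 0-indexed attempt, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Parse a numeric Retry-After header; HTTP-date values are ignored here."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None


def polite_get(
    fetch: Callable[[str], Response],
    url: str,
    *,
    max_attempts: int = 5,
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Response:
    """Fetch `url` serially, waiting between requests and backing off on 429/503."""
    for attempt in range(max_attempts):
        # Crude rate limit: wait min_interval before every request.
        sleep(min_interval)
        status, headers, body = fetch(url)
        if status not in RETRY_STATUSES:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Prefer the server's own hint; otherwise back off exponentially
        # and add up to 100% random jitter so clients do not retry in lockstep.
        delay = retry_after_seconds(headers)
        if delay is None:
            delay = backoff_delay(attempt) * (1 + jitter())
        sleep(delay)
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```

A real client would also need to share the rate limit across concurrent workers, which this serial sketch does not attempt.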
It’s open source, so PRs are welcome. It is going to be a busy month with some life changes for me, so I will put this in my TODOs. Unfortunately, it will likely not get done until late June or early July.
Accidentally closed when commenting
Thanks for adding to your TODO list, I appreciate it! Here's an example of making a single query in Athena that's much more efficient than gau: https://positive.security/blog/ransack-data-exfiltration#common-crawl
Thanks for the reference & sorry about the slowness to implement. Getting hitched!
Congratulations!
Can you stop attacking the Common Crawl CDX API?