How to avoid IP blocking? #37

Open
vedantroy opened this issue Apr 1, 2024 · 2 comments

vedantroy commented Apr 1, 2024

I wrote a downloader using youtube-dlp, but many of the IPs get blocked after roughly 10K downloads. I'm surprised that people are successfully downloading the dataset on a single machine with the provided download script, as I would expect YouTube to start blocking after a few gigabytes of data have been downloaded.

Are there any proxies, tools, or tricks for downloading the entire dataset without getting blocked by YouTube?

@tsaishien-chen
Contributor

Hi @vedantroy,
Thanks for your interest in this dataset!
Unfortunately, this is quite a common issue. You can check some discussions like this one.
The best solution is to use a VPN and switch to a different IP once you detect that your current IP has been banned.
If you don't have a VPN, you can try slowing down the downloads by reducing processes_count and thread_count in the config file, and by adding a sleep counter that pauses after every few download steps.
Hope this information is helpful!
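
One way to act on "once you detect your IP is banned" is to treat a streak of consecutive failures as a suspected block and pause before continuing. The following is only a minimal sketch under stated assumptions: `download_with_backoff` and `download_one` are hypothetical names, not part of the provided downloading script, and the streak heuristic is an assumption about how a block shows up, not documented YouTube behavior. The pause is also the point where you could switch VPN endpoints instead of just waiting.

```python
import time
from typing import Callable, Iterable, List


def download_with_backoff(
    video_ids: Iterable[str],
    download_one: Callable[[str], bool],  # hypothetical: returns True on success
    max_consecutive_failures: int = 5,
    base_pause_s: float = 60.0,
    max_pause_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Try every ID once; pause with exponential backoff when a block is suspected.

    Returns the IDs that failed so they can be retried later.
    """
    failed: List[str] = []
    consecutive_failures = 0
    pause_s = base_pause_s
    for video_id in video_ids:
        if download_one(video_id):
            # A success suggests the IP is not blocked: reset the heuristic.
            consecutive_failures = 0
            pause_s = base_pause_s
            continue
        failed.append(video_id)
        consecutive_failures += 1
        if consecutive_failures >= max_consecutive_failures:
            # Assumption: a streak of failures means the IP is likely blocked.
            # Pause (or rotate the VPN here), then double the next pause.
            sleep(pause_s)
            pause_s = min(pause_s * 2, max_pause_s)
            consecutive_failures = 0
    return failed
```

The thresholds and pause lengths above are placeholders; suitable values depend on how aggressively the downloads are being throttled.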

peiliu0408 commented Jun 5, 2024

@tsaishien-chen I have been struggling with this IP blocking issue for quite some time. Is there a template for implementing a "sleep counter" that pauses after a few download steps?
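
No template was posted in this thread, so the following is only a rough sketch of what a "sleep counter" could look like, not the maintainers' implementation. It counts completed download steps and sleeps for a randomized interval after every `sleep_every` of them. The names `download_with_sleep_counter` and `download_one`, and all default values, are hypothetical and would need to be adapted to the actual downloading script.

```python
import random
import time
from typing import Callable, Iterable


def download_with_sleep_counter(
    video_ids: Iterable[str],
    download_one: Callable[[str], None],  # hypothetical per-video download function
    sleep_every: int = 50,     # pause after this many download steps
    sleep_s: float = 120.0,    # minimum pause length in seconds
    jitter_s: float = 30.0,    # extra random pause of up to this many seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run download_one on every ID, sleeping after every `sleep_every` steps.

    Returns the number of download steps performed.
    """
    completed = 0
    for video_id in video_ids:
        download_one(video_id)
        completed += 1
        if completed % sleep_every == 0:
            # Random jitter keeps the request pattern from being perfectly regular.
            sleep(sleep_s + random.uniform(0.0, jitter_s))
    return completed
```

A call might look like `download_with_sleep_counter(ids, my_download_fn, sleep_every=20, sleep_s=300)`, where `my_download_fn` stands in for whatever function fetches a single video. If downloads run in several processes, each process keeps its own counter, so the overall request rate also depends on processes_count and thread_count.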
