Bug report: scrape_filing_docs.R #100

Open
jingyujzhang opened this issue Aug 24, 2021 · 0 comments
Comments

jingyujzhang (Contributor) commented Aug 24, 2021

It seems that this script is intended to perform an incremental update of edgar.filing_docs.

  • Def14_a is never updated inside the while loop, so the code rescrapes the same set of files endlessly.

  • The SEC rate-limits traffic. Running in parallel with 8 cores gets the IP blocked. In my tests, 2 cores plus a 0.5 s sleep works (at least on my server). The key is to stay under 10 requests per second; otherwise the SEC blocks the IP for 10 minutes.
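A minimal sketch of both fixes, with hypothetical stand-ins for the script's actual functions (`docs_to_scrape` and `get_filing_docs` are assumptions, not names from the repository):

```r
library(parallel)

# Hypothetical helpers standing in for the script's own logic:
#   docs_to_scrape()       - queries edgar.filing_docs for filings not yet scraped
#   get_filing_docs(f)     - scrapes one filing index and writes its rows to the table

repeat {
  # Re-query on every pass so the work set shrinks and the loop can terminate,
  # instead of reusing a def14_a data frame computed once before the loop.
  def14_a <- docs_to_scrape()
  if (nrow(def14_a) == 0) break

  # 2 cores plus a 0.5 s sleep per request keeps the rate safely
  # under the SEC's ~10 requests/second limit.
  mclapply(def14_a$file_name,
           function(f) {
             Sys.sleep(0.5)
             get_filing_docs(f)
           },
           mc.cores = 2)
}
```

The same throttle could instead be enforced with a token-bucket limiter, but re-querying the remaining set each iteration is what actually fixes the infinite loop.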

jingyujzhang added a commit to jingyujzhang/edgar that referenced this issue Aug 24, 2021