South Africa scraper breaks in production sporadically #76

Open
zstumgoren opened this issue Aug 1, 2020 · 2 comments
@zstumgoren
Member

South Africa (ZAF) scraper is breaking in production sporadically. Note that I'm unable to duplicate breakage locally.

ubuntu@data-etl:~$ covid-world-scraper --cache-dir /home/ubuntu/data/covid-world-scraper/ --log-file /home/ubuntu/logs/covid-world-scraper.log zaf
covid_world_scraper.country_scraper - START SCRAPE - Zaf
covid_world_scraper.runner -   File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/site-packages/covid_world_scraper/runner.py", line 44, in run
    scraper.run()
  File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/site-packages/covid_world_scraper/country_scraper.py", line 58, in run
    raw_data_path = self.fetch()
  File "/home/ubuntu/.pyenv/versions/3.7.4/lib/python3.7/site-packages/covid_world_scraper/zaf.py", line 29, in fetch
    most_recent_link = data_links[0][1]
@zstumgoren zstumgoren self-assigned this Aug 1, 2020
@zstumgoren zstumgoren added the bug Something isn't working label Aug 1, 2020
@zstumgoren
Member Author

The response.status_code is 522 (a connection timeout error from Cloudflare). It seems to happen only intermittently in production. This appears to be a server-side issue, and the only apparent "fix" on our end would be to run the scraper again at a later time...
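
A minimal sketch of how the fetch step could report this more clearly. It assumes the failure at data_links[0][1] comes from parsing an error page that contains no data links. The names UpstreamUnavailable and ensure_ok are hypothetical and not part of covid_world_scraper:

```python
class UpstreamUnavailable(Exception):
    """The source site answered with a server-side error instead of data."""


def ensure_ok(status_code, url):
    """Raise a descriptive error for 5xx responses (e.g. Cloudflare's 522).

    Hypothetical helper: call it with response.status_code before parsing
    links, so a timeout is not reported as a failure at data_links[0][1].
    """
    if 500 <= status_code < 600:
        raise UpstreamUnavailable(
            f"{url} returned HTTP {status_code}; retry later"
        )
```

With a check like this in place, the log would show the HTTP status rather than a failure deep inside the link-parsing code.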

@zstumgoren
Member Author

The ideal long-term solution would be to schedule additional retries using Airflow, once it's deployed. Short-term, we could simply try running the scraper for this one country more frequently on cron.
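
Until scheduled retries exist, a small in-process retry might also work as a stopgap. This is only a sketch: run_with_retries and its parameters are hypothetical, and scrape stands in for whatever callable runs the ZAF scraper and raises on failure:

```python
import time


def run_with_retries(scrape, attempts=3, delay_seconds=300, sleep=time.sleep):
    """Call scrape() until it succeeds or the attempts run out.

    scrape: zero-argument callable that raises on failure (a hypothetical
    stand-in for running the ZAF scraper).
    sleep: injectable so the wait can be skipped in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            return scrape()
        except Exception:
            if attempt == attempts:
                raise  # out of retries; let the caller and logs see the error
            sleep(delay_seconds)
```

The delay is fixed here for simplicity. Since the 522 responses look like an upstream problem, a longer wait between attempts is probably more useful than rapid retries.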
