An automated scraper built with Scrapy that collects result data from the Zevenheuvelenloop running event. The data is scraped from http://evenementen.uitslagen.nl/.
The scraping process works as follows:
- The allowed domain is set to evenementen.uitslagen.nl.
- Change the max_id value to the number of result pages you want scraped.
- An example of a final URL is ~uitslag01233.html.
- The scraper visits the configured number of pages and writes every result row to the CSV file until done (a sketch of such a spider follows this list).
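A minimal sketch of what such a spider could look like is shown below. The exact URL path, the year, the column order, and the CSS selectors are assumptions for illustration and will not match the actual ZevenHeuvelSpider_spider.py exactly.

```python
import scrapy


class ZevenHeuvelSpider(scrapy.Spider):
    name = "zevenheuvel"
    allowed_domains = ["evenementen.uitslagen.nl"]
    max_id = 100  # set this to the number of result pages you want scraped

    def start_requests(self):
        # Result pages are numbered sequentially, e.g. ...uitslag01233.html,
        # so the spider simply walks the ids from 1 up to max_id.
        for page_id in range(1, self.max_id + 1):
            url = (
                "http://evenementen.uitslagen.nl/2019/zevenheuvelenloop/"
                f"uitslag{page_id:05d}.html"  # hypothetical path and year
            )
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Every data row of the results table becomes one CSV record.
        # Column layout is an assumption; adjust to the real page structure.
        for row in response.css("table tr"):
            cells = [c.strip() for c in row.css("td::text").getall()]
            if len(cells) >= 5:  # skip header and empty rows
                yield {
                    "position": cells[0],
                    "name": cells[1],
                    "category": cells[2],
                    "gun_time": cells[3],
                    "chip_time": cells[4],
                }
```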
After competing in the event myself, I wanted answers to some questions based on the race data, such as:
- What was the average finish time of a given category?
- How many people did a runner overtake within their own category?
- How many people did a runner overtake across all categories?
- And so on.
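To give an idea of how such questions could be answered from the scraped CSV, here is a small pandas sketch. The column names (name, category, gun_time, chip_time) follow the spider sketch above and are assumptions, as are the runner name and category used in the example.

```python
import pandas as pd

# Load the scraped results; column names are assumptions, not guaranteed
# to match the real CSV produced by the spider.
df = pd.read_csv("zevenheuvel.csv")

# Convert "H:MM:SS" time strings to seconds for arithmetic.
for col in ("gun_time", "chip_time"):
    df[col + "_s"] = pd.to_timedelta(df[col]).dt.total_seconds()

# Average finish (chip) time per category.
avg_per_category = df.groupby("category")["chip_time_s"].mean()
print(avg_per_category.apply(lambda s: pd.to_timedelta(s, unit="s")))


# Rough overtaking estimate: anyone who crossed the start line earlier
# (smaller gun-minus-chip offset) but reached the finish later (larger
# gun time) must have been passed somewhere on the course.
def overtaken(frame, runner_name):
    frame = frame.assign(start_offset=frame["gun_time_s"] - frame["chip_time_s"])
    me = frame[frame["name"] == runner_name].iloc[0]
    passed = frame[(frame["start_offset"] < me["start_offset"])
                   & (frame["gun_time_s"] > me["gun_time_s"])]
    return len(passed)


# "Some Runner" and the "M40" category are placeholders.
print(overtaken(df, "Some Runner"))                           # across all categories
print(overtaken(df[df["category"] == "M40"], "Some Runner"))  # within one category
```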
Run the scraper with the following command:
scrapy runspider ZevenHeuvelSpider_spider.py -o zevenheuvel.csv
After completion, open zevenheuvel.csv to inspect the results.
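For a quick look at the output you could, for instance, load the file with pandas (the column names depend on what the spider yields):

```python
import pandas as pd

# Quick inspection of the scraped results.
df = pd.read_csv("zevenheuvel.csv")
print(df.head())
print(f"{len(df)} rows scraped")
```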