URL = http://ceoharyana.nic.in/?module=draftroll
Year = 2018
Total number of files = 17,018
The script does two things:
- Produces haryana.csv, which contains metadata about the PDFs. The CSV has the following fields:
  district_name, assembly_constituency, polling_station_name, filename
- Downloads all the PDFs to a directory called haryana_pdfs/
To run the script:
pip install -r requirements.txt
python haryana.py
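
After a run, the CSV and the download directory can be cross-checked to see whether every listed PDF was actually saved. The following is only a minimal sketch and not part of haryana.py: the function name `missing_pdfs` is hypothetical, and it assumes that haryana.csv starts with a header row naming the four fields above and that the `filename` column holds the name of the file inside haryana_pdfs/.

```python
import csv
from pathlib import Path


def missing_pdfs(csv_path="haryana.csv", pdf_dir="haryana_pdfs"):
    """Return the CSV rows whose PDF is not present in pdf_dir.

    Hypothetical helper, not part of haryana.py.
    """
    pdf_dir = Path(pdf_dir)
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Assumption: the first row is a header naming the four fields.
        # skipinitialspace tolerates "a, b" style separators in the file.
        for row in csv.DictReader(f, skipinitialspace=True):
            # Assumption: `filename` is the PDF's name inside pdf_dir.
            if not (pdf_dir / row["filename"]).is_file():
                missing.append(row)
    return missing
```

Calling `missing_pdfs()` from the directory that holds haryana.csv returns a list of row dictionaries. An empty list means every file named in the CSV was found in haryana_pdfs/.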