URL = http://164.100.150.3/mrollpdf1/aceng.aspx
Year = Draft Roll for 2017
- `conda env create -f tools/environment.yml` to install the working environment, then `source activate erolls` to activate it.
- Or, `pip install -r requirements.txt` if not using a conda environment.
- `tools/utils.py` provides helper functions for downloading files and for sanity checks.
- `python jharkhand.py` downloads all the PDFs to the directory `../data/jharkhand/` and creates 'jharkhand.txt', listing the files that were not downloaded successfully.
- `python jharkhand_retry.py` retries the downloads for the files listed in 'jharkhand.txt'.
- `python jharkhand_SanityCheck.py` runs a sanity check on the downloaded files.
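
The download-then-retry flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in jharkhand.py or jharkhand_retry.py: the function names `fetch_pdf`, `download_all`, and `retry_failed` are hypothetical, and the local file name is simply the last component of the URL rather than the naming scheme the scripts use.

```python
import urllib.request
from pathlib import Path


def fetch_pdf(url, timeout=30):
    """Return the body of `url` as bytes; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, out_dir, failed_log, fetch=fetch_pdf):
    """Save each URL under out_dir and write the URLs that failed to failed_log."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        # Simplified naming (assumption): keep the last path component as-is.
        target = out_dir / url.rsplit("/", 1)[-1]
        try:
            target.write_bytes(fetch(url))
        except OSError:
            # URLError, HTTPError and timeouts are all OSError subclasses.
            failed.append(url)
    # One URL per line, so a later retry can read the list back.
    Path(failed_log).write_text("".join(u + "\n" for u in failed))
    return failed


def retry_failed(out_dir, failed_log, fetch=fetch_pdf):
    """Re-attempt the URLs in failed_log and rewrite it with those still failing."""
    urls = Path(failed_log).read_text().split()
    return download_all(urls, out_dir, failed_log, fetch=fetch)
```

Passing `fetch` as a parameter keeps the sketch testable without network access; in real use the default `fetch_pdf` would be left in place and `failed_log` would point at a file such as 'jharkhand.txt'.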
- Total Number of files = 28710
- The downloaded files are named in the form MR{assembly constituency number}_MR{assembly constituency number}{part number}.pdf
- Files not reachable:
  - http://164.100.150.3/mrollpdf1/ceopdf/MR002/MR0020305.PDF
  - http://164.100.150.3/mrollpdf1/ceopdf/MR002/MR0020319.PDF
  - http://164.100.150.3/mrollpdf1/ceopdf/MR002/MR0020321.PDF
  - http://164.100.150.3/mrollpdf1/ceopdf/MR002/MR0020334.PDF
  - http://164.100.150.3/mrollpdf1/ceopdf/MR002/MR0020344.PDF
  - http://164.100.150.3/mrollpdf1/ceopdf/MR017/MR0170272.PDF
  - http://164.100.150.3/mrollpdf1/ceopdf/MR020/MR0200088.PDF
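
As an illustration of how the naming pattern and the file total above could be checked, here is a small sketch. It is not the logic of jharkhand_SanityCheck.py. The digit widths (3 for the constituency, 4 for the part) are an assumption inferred from the URLs listed above, the names `parse_name` and `sanity_check` are hypothetical, and whether the total of 28710 includes the unreachable files is not stated, so the reported shortfall should be read with that in mind.

```python
import re
from pathlib import Path

# Assumed layout: "MR" + 3-digit constituency, "_",
# then "MR" + the same 3 digits + a 4-digit part number.
NAME_RE = re.compile(r"^MR(\d{3})_MR(\d{3})(\d{4})\.pdf$", re.IGNORECASE)


def parse_name(filename):
    """Return (constituency, part) as ints, or None if the name does not fit."""
    m = NAME_RE.match(filename)
    if m is None or m.group(1) != m.group(2):
        return None
    return int(m.group(1)), int(m.group(3))


def sanity_check(data_dir, expected_total=28710):
    """Count well-formed PDFs and list files with bad names or no PDF header."""
    good, bad_names, not_pdf = 0, [], []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        if parse_name(path.name) is None:
            bad_names.append(path.name)
            continue
        with open(path, "rb") as fh:
            header = fh.read(5)
        # Every PDF file starts with the bytes "%PDF-".
        if header != b"%PDF-":
            not_pdf.append(path.name)
        else:
            good += 1
    return {
        "ok": good,
        "bad_names": bad_names,
        "not_pdf": not_pdf,
        "shortfall": expected_total - good,
    }
```

Checking the header catches a common failure mode where a server returns an HTML error page that gets saved with a .pdf extension.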