The crawler retrieves information from the following online judges:
- Codeforces
- CodeChef
- URI Online Judge
- Sphere Online Judge
- DMOJ
- A² Online Judge
- AtCoder
- CS Academy
- Timus Online Judge
- Caribbean Online Judge
To run the crawler, follow these steps:
- First, make sure you have Python 3.6 and pip installed on your system. Then:
- Go to the src folder:
cd src
- Install project requirements:
pip install -r requirements.txt
- Run the crawler:
scrapy runspider crawler/questions.py
This starts a spider that runs a heuristic-guided breadth-first search and downloads all pages in the specified domain. You can watch the results appear during the crawl in the src/retrieved/documents and src/retrieved/objects folders.
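To illustrate the idea of a breadth-first crawl restricted to one domain and filtered by a heuristic, here is a minimal sketch. It is not the project's spider: the function `bfs_crawl`, the `should_follow` heuristic, and the `TOY_SITE` link graph are all hypothetical and stand in for real page downloads so the example runs without network access.

```python
from collections import deque
from urllib.parse import urlparse


def bfs_crawl(start_url, get_links, allowed_domain,
              should_follow=lambda url: True, max_pages=100):
    """Visit pages breadth-first, staying inside allowed_domain.

    get_links is a callable returning the outgoing links of a URL
    (hypothetical stand-in for downloading and parsing a page).
    should_follow is a heuristic deciding whether a link is worth queuing.
    """
    visited = []                 # pages in the order they were visited
    seen = {start_url}           # URLs already queued, to avoid revisits
    queue = deque([start_url])   # FIFO queue gives breadth-first order

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link in seen:
                continue
            if urlparse(link).netloc != allowed_domain:
                continue         # stay inside the specified domain
            if not should_follow(link):
                continue         # heuristic rejected this link
            seen.add(link)
            queue.append(link)
    return visited


# Toy link graph (assumption: replaces real HTTP requests).
TOY_SITE = {
    "https://example.com/": [
        "https://example.com/problems",
        "https://example.com/about",
        "https://other.org/",
    ],
    "https://example.com/problems": [
        "https://example.com/problems/1",
        "https://example.com/problems/2",
    ],
    "https://example.com/about": [],
    "https://example.com/problems/1": ["https://example.com/"],
    "https://example.com/problems/2": [],
}

order = bfs_crawl(
    "https://example.com/",
    lambda url: TOY_SITE.get(url, []),
    "example.com",
    should_follow=lambda url: "about" not in url,  # example heuristic
)
print(order)
```

The FIFO queue is what makes the traversal breadth-first: every page at depth one is visited before any page at depth two. Swapping the heuristic changes which links are queued without touching the traversal itself.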
After the crawler has run and retrieved documents, you have to set up an index manually before you can query them. To do this:
- Go to the src folder:
cd src
- Run the indexer:
python3 indexer/indexer.py
It searches for documents stored in src/retrieved/objects and creates the corresponding indexes. The indexes are then available for later queries in the src/indexes folder.
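As a rough illustration of what an index over retrieved documents can look like, the sketch below builds a simple inverted index (term to document IDs) and answers AND queries against it. This is an assumption for illustration only: the source does not describe which index structures indexer/indexer.py actually creates, and the names `tokenize`, `build_inverted_index`, `search`, and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents standing in for crawled problem statements.
docs = {
    "doc-1": "Theatre Square: cover a rectangular square with flagstones",
    "doc-2": "Prime Generator: print all prime numbers in a range",
    "doc-3": "A+B Problem: print the sum of two numbers",
}

index = build_inverted_index(docs)
hits = search(index, "print numbers")
print(sorted(hits))
```

Using `index.get` in `search` avoids inserting empty entries into the `defaultdict` for terms that never occurred in any document.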