Eureka is a REST API project for web scraping and data cleaning, based on FastAPI and following a hexagonal architecture. It was designed for the Eureka by Turing project of the National University of Colombia.
Disclaimer: this is a work-in-progress project; stay tuned for updates.
You should create a virtual environment and activate it:
python -m venv venv/
source venv/bin/activate
Clone the repository:
git clone https://github.com/julianVelandia/Eureka.git
And then install the development dependencies:
pip install -r requirements.dev.txt
You can run all the tests with:
make tests
Alternatively, you can run pytest yourself:
pytest
The project runs like any FastAPI application, and by default the configuration endpoint works:
uvicorn main:app --reload
- RenderEngine: render a web page from its URL to select the texts to scrape and save them in a JSON file
- Templates to visualize the scraped information
- Export data to JSON and CSV files
- Make automated requests from a JSON configuration file
- Unpack JSON configuration files
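The configuration and export features listed above can be illustrated with a minimal, standard-library-only sketch. It is not Eureka's actual implementation: the configuration schema (`requests`, `url`, `fields`), the function names `unpack_config`, `to_json` and `to_csv`, and the example URLs are all hypothetical and chosen only for illustration.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical configuration format; the real Eureka schema is not
# documented in this README.
CONFIG_JSON = """
{
  "requests": [
    {"url": "https://example.com/a", "fields": ["title", "body"]},
    {"url": "https://example.com/b", "fields": ["title"]}
  ]
}
"""


def unpack_config(raw: str) -> List[Dict]:
    """Parse a JSON configuration string into a list of request descriptions."""
    config = json.loads(raw)
    return config["requests"]


def to_json(records: List[Dict]) -> str:
    """Serialize scraped records as a pretty-printed JSON string."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: List[Dict]) -> str:
    """Serialize scraped records as CSV text.

    The header is the sorted union of all keys; a record that lacks a key
    gets an empty cell in that column.
    """
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

In this sketch, `unpack_config(CONFIG_JSON)` returns the two request descriptions, and a list of scraped records (one dictionary per page) can be passed to `to_json` or `to_csv` and written to a file with the usual `open(...).write(...)`.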
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
This project is licensed under the terms of the MIT license.