This is a scraper for the SeLoger website, built using Scrapy.
- Install virtualenv
pip install virtualenv
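- Create a new virtual workspace and activate it (on Windows, run Scripts\activate instead of source bin/activate):
virtualenv my_workspace && cd my_workspace && source bin/activate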
- Clone the project into your workspace folder:
git clone
and navigate to it:
cd scraper-seloger
- Install the required packages
pip install -r requirements.txt
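The same setup can also be scripted in Python. This sketch swaps in the standard-library venv module for virtualenv (a different tool, but it produces an equivalent environment for this purpose) and runs pip through the environment's own interpreter, so no activation step is needed. The function names are hypothetical helpers, not part of the project.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_workspace(path: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at `path` using the stdlib venv module."""
    env_dir = Path(path)
    # venv.create writes pyvenv.cfg plus bin/ (or Scripts/ on Windows)
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


def env_python(env_dir: Path) -> Path:
    """Return the interpreter inside the environment (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_requirements(env_dir: Path, requirements: str) -> None:
    # Calling "python -m pip" from inside the environment installs into it
    # directly, without having to activate it first
    subprocess.run(
        [str(env_python(env_dir)), "-m", "pip", "install", "-r", requirements],
        check=True,
    )
```

For example, create_workspace("my_workspace") followed by install_requirements(env, "my_workspace/scraper-seloger/requirements.txt") after cloning mirrors the manual steps.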
Go to the project folder:
cd ~/my_workspace/scraper-seloger/simple_seloger
Run the spider, passing the URL of your SeLoger search query:
scrapy crawl seloger -a search_url=",1&pxMax=1000000&idtt=2,5&naturebien=1,2,4&ci=910377"
You can use the -o option to specify an output file (JSON or CSV):
scrapy crawl seloger -o annonces.csv -a search_url=",1&pxMax=1000000&idtt=2,5&naturebien=1,2,4&ci=910377"
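If you prefer to launch crawls from a Python script (for example, to loop over several searches), here is a minimal sketch. It assumes the scrapy command is on your PATH and that you run it from the project folder; crawl_command and run_crawl are hypothetical helpers. Passing the arguments as a list means the & characters in the search URL reach Scrapy unchanged, without the shell quoting the commands above rely on.

```python
import subprocess
from typing import List, Optional


def crawl_command(search_url: str, output: Optional[str] = None) -> List[str]:
    """Build the argv for `scrapy crawl seloger`, mirroring the commands above."""
    argv = ["scrapy", "crawl", "seloger"]
    if output is not None:
        # -o picks the feed format from the file extension (.csv, .json, ...)
        argv += ["-o", output]
    # Each -a NAME=VALUE pair is handed to the spider as an argument
    argv += ["-a", f"search_url={search_url}"]
    return argv


def run_crawl(search_url: str, output: Optional[str] = None, cwd: str = ".") -> None:
    # A list argv bypasses the shell entirely, so '&' needs no escaping
    subprocess.run(crawl_command(search_url, output), check=True, cwd=cwd)
```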
- Make sure you have MongoDB installed and its daemon (mongod) running.
- Change MONGO_URI and MONGO_DB in the file of the project.
A typical configuration is:
MONGO_URI = 'mongodb://localhost:27017'
MONGO_DB = 'seloger'
- You can use Robo 3T (a MongoDB GUI) to browse your database and edit the data.
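For context, this is roughly how a Scrapy item pipeline consumes MONGO_URI and MONGO_DB. The sketch follows the MongoDB pipeline pattern from the Scrapy documentation and is not necessarily the code in this repo: the class name, the "annonces" collection name and the defaults are assumptions, and it needs pymongo at crawl time.

```python
class MongoPipeline:
    """Hypothetical item pipeline that writes every scraped item to MongoDB."""

    collection_name = "annonces"  # hypothetical collection name

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook with the running crawler, whose settings
        # include the MONGO_URI / MONGO_DB values configured above
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DB", "seloger"),
        )

    def open_spider(self, spider):
        # Imported here so the class can be loaded without pymongo installed
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Copy to a plain dict so MongoDB's added _id does not touch the item
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

A pipeline like this is enabled through the ITEM_PIPELINES setting, e.g. ITEM_PIPELINES = {"simple_seloger.pipelines.MongoPipeline": 300}, where the module path is again hypothetical.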
The repo has all the files you need to deploy to Heroku; I'll walk through the steps below.
- Install the Heroku CLI
- Create a new Heroku app
heroku create seloger-demo
- Add the new app as a remote
heroku git:remote -a seloger-demo
- In the scrapy.cfg file, change the url setting under the [deploy:local] section to:
url =
- Add and commit everything:
git add .
git commit -m "first commit"
- Finally, push to Heroku and watch your app deploy:
git push heroku master
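The scrapy.cfg edit in the steps above can also be scripted, for instance from a deploy helper. This is a minimal sketch using the standard-library configparser; set_deploy_url is a hypothetical helper, and the URL you pass must be your own app's address. Interpolation is disabled so that a % in a URL is written literally.

```python
import configparser


def set_deploy_url(cfg_path: str, url: str, target: str = "local") -> None:
    """Point the [deploy:<target>] section of scrapy.cfg at `url`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently treated as empty
    section = f"deploy:{target}"
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "url", url)
    # Note: ConfigParser.write drops any comments present in the original file
    with open(cfg_path, "w") as fh:
        parser.write(fh)
```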
The scrapyd interface is now accessible through
To start a job through scrapyd, run the following from your terminal:
curl -F project=default -F spider=seloger \
-F search_url=",1&pxMax=1000000&idtt=2,5&naturebien=1,2,4&ci=910377"
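The same job can be scheduled from Python with only the standard library. This sketch targets scrapyd's schedule.json endpoint, where any parameter other than project and spider is passed to the spider as an argument. It sends a URL-encoded form (like curl -d) rather than the multipart form that curl -F sends; scrapyd accepts both. The scrapyd_url value is an assumption: use the address of your own scrapyd interface.

```python
import json
import urllib.parse
import urllib.request


def schedule_request(scrapyd_url: str, search_url: str,
                     project: str = "default",
                     spider: str = "seloger") -> urllib.request.Request:
    """Build a POST request for scrapyd's schedule.json endpoint."""
    endpoint = scrapyd_url.rstrip("/") + "/schedule.json"
    # urlencode escapes the '&' and '=' inside search_url, so it arrives intact
    body = urllib.parse.urlencode(
        {"project": project, "spider": spider, "search_url": search_url}
    ).encode()
    return urllib.request.Request(endpoint, data=body, method="POST")


def schedule(scrapyd_url: str, search_url: str) -> str:
    """Start a crawl and return the job id reported by scrapyd."""
    with urllib.request.urlopen(schedule_request(scrapyd_url, search_url)) as resp:
        payload = json.load(resp)
    if payload.get("status") != "ok":
        raise RuntimeError(f"scrapyd refused the job: {payload}")
    return payload["jobid"]
```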
- Add the mLab MongoDB add-on to your existing app:
heroku addons:create mongolab:sandbox --app seloger-demo
- Get the mLab URI
heroku config:get MONGODB_URI --app seloger-demo
- Replace MONGO_URI in the file of the project with the URI returned in your terminal, and set MONGO_DB to the database name at the end of that URI.
- Example:
MONGO_URI = 'mongodb://heroku_v10nm298:<password>@<host>:37740/heroku_v10nm298'
MONGO_DB = 'heroku_v10nm298'
The data you scrape will now be saved in an external cloud MongoDB database linked with your Heroku app.
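Instead of pasting the credentials into the file, the settings can read them at runtime, since Heroku exposes config vars such as MONGODB_URI to the app as environment variables. This is a minimal sketch, assuming the settings live in a Python module; mongo_settings is a hypothetical helper. It takes the database name from the path of the URI, matching the example where MONGO_DB equals the last URI segment, and falls back to the local values otherwise.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlsplit

LOCAL_URI = "mongodb://localhost:27017"
LOCAL_DB = "seloger"


def mongo_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (MONGO_URI, MONGO_DB), preferring Heroku's MONGODB_URI config var."""
    env = os.environ if env is None else env
    uri = env.get("MONGODB_URI", LOCAL_URI)
    # mLab URIs end in /<database>; a local URI without a path uses LOCAL_DB
    db = urlsplit(uri).path.lstrip("/") or LOCAL_DB
    return uri, db


# In a settings module these become ordinary module-level settings
MONGO_URI, MONGO_DB = mongo_settings()
```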