Setting up a scraping server
Super duper quick notes on getting a scrapyd server running with the Open Recipes project.
This was performed on Ubuntu Server 12.10.
Add the Scrapy apt repo to `sources.list`:

```
nano /etc/apt/sources.list
```

Add the following line to the end of the file:

```
deb http://archive.scrapy.org/ubuntu quantal main
```

Save and exit.
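If you'd rather not open an editor, the same line can be appended from the shell; this is just an equivalent sketch, assuming you have sudo access:

```
# append the Scrapy repo line to sources.list without opening an editor
echo "deb http://archive.scrapy.org/ubuntu quantal main" | sudo tee -a /etc/apt/sources.list
```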
Add the GPG key for the Scrapy apt repo, then update and install scrapyd:

```
curl -s http://archive.scrapy.org/ubuntu/archive.key | sudo apt-key add -
aptitude update
aptitude install scrapyd-0.16
```
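As a quick sanity check you can confirm the daemon came up. This assumes the scrapyd package registers a `scrapyd` service and logs under `/var/log/scrapyd/`; adjust if your package lays things out differently:

```
# check that the scrapyd service is running
sudo service scrapyd status

# tail the log to confirm it started cleanly (log path is an assumption)
sudo tail /var/log/scrapyd/scrapyd.log
```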
Open port 6800 in the firewall if it is not already open.
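How you open the port depends on your firewall; here is a minimal sketch assuming `ufw`, Ubuntu's default frontend (adapt for plain iptables or anything else):

```
# allow inbound connections to the scrapyd web/API port, then verify
sudo ufw allow 6800/tcp
sudo ufw status
```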
Install `pip` and the `bleach` library for Python:

```
apt-get install python-pip
pip install bleach
```
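To verify bleach installed cleanly, a throwaway one-liner like this (just an import check, not something the project needs) should print the escaped markup:

```
# confirm bleach imports and escapes disallowed tags
python -c "import bleach; print(bleach.clean('an <script>evil()</script> example'))"
```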
Visit http://SERVERNAME:6800/ in a browser to check that scrapyd is running.
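If the box has no browser handy, you can hit the JSON API instead; `listprojects.json` is a standard scrapyd endpoint, though at this point the project list will be empty:

```
# should return something like {"status": "ok", "projects": []}
curl http://SERVERNAME:6800/listprojects.json
```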
Add/edit the file `~/.scrapy.cfg`. Enter the following:

```
[deploy:openrecipestest]
url = http://SERVERNAME:6800/
```
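To confirm Scrapy picked up the new target, the `deploy` command in the Scrapy release of that era (0.16) can list configured targets; run it from within `openrecipes/scrapy_proj` as in the next step, and treat the flag as an assumption if you're on a different version:

```
# list configured deploy targets; openrecipestest should appear
scrapy deploy -l
```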
From within `openrecipes/scrapy_proj`, deploy the project to the server:

```
scrapy deploy openrecipestest -p openrecipes
```

Schedule a single spider (here, `thepioneerwoman.feed`):

```
curl http://SERVERNAME:6800/schedule.json -d project=openrecipes -d spider=thepioneerwoman.feed
```

Or schedule every feed spider, then every non-feed spider, in one go (`xargs -p` prompts before each request):

```
scrapy list | grep .feed | xargs -I {} -p curl http://SERVERNAME:6800/schedule.json -d project=openrecipes -d spider={}
scrapy list | grep -v .feed | xargs -I {} -p curl http://SERVERNAME:6800/schedule.json -d project=openrecipes -d spider={}
```
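Once spiders are scheduled, you can watch their progress through the scrapyd API; `listjobs.json` reports pending, running, and finished jobs for a project:

```
# list pending/running/finished jobs for the openrecipes project
curl "http://SERVERNAME:6800/listjobs.json?project=openrecipes"
```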