Setting up a scraping server
Super duper quick notes on getting a scrapyd server running with the Open Recipes project.
This was performed on Ubuntu Server 12.10.
Add the Scrapy apt repo to `sources.list`:

```
nano /etc/apt/sources.list
```

Add the following line to the end of the file:

```
deb http://archive.scrapy.org/ubuntu quantal main
```

Save and exit.
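If you'd rather not open an editor, the same line can be appended from the shell; this is just an equivalent sketch, assuming you have sudo access:

```
# append the Scrapy repo line to sources.list without opening an editor
echo "deb http://archive.scrapy.org/ubuntu quantal main" | sudo tee -a /etc/apt/sources.list
```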
Add the GPG key for the Scrapy apt repo, then update and install scrapyd:

```
curl -s http://archive.scrapy.org/ubuntu/archive.key | sudo apt-key add -
aptitude update
aptitude install scrapyd-0.16
```
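As a quick sanity check you can confirm the daemon came up. This assumes the scrapyd package registers a `scrapyd` service and logs under `/var/log/scrapyd/`; adjust if your package lays things out differently:

```
# check that the scrapyd service is running
sudo service scrapyd status

# tail the log to confirm it started cleanly (log path is an assumption)
sudo tail /var/log/scrapyd/scrapyd.log
```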
Open port 6800 in the firewall if it is not already open.
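How you open the port depends on your firewall; here is a minimal sketch assuming `ufw`, Ubuntu's default frontend (adapt for plain iptables or anything else):

```
# allow inbound connections to the scrapyd web/API port, then verify
sudo ufw allow 6800/tcp
sudo ufw status
```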
Install `pip` and the `bleach` library for Python:

```
apt-get install python-pip
pip install bleach
```
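To verify bleach installed cleanly, a throwaway one-liner like this (just an import check, not something the project needs) should print the escaped markup:

```
# confirm bleach imports and escapes disallowed tags
python -c "import bleach; print(bleach.clean('an <script>evil()</script> example'))"
```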
Visit http://SERVERNAME:6800/ in a browser to check that scrapyd is running.
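If the box has no browser handy, you can hit the JSON API instead; `listprojects.json` is a standard scrapyd endpoint, though at this point the project list will be empty:

```
# should return something like {"status": "ok", "projects": []}
curl http://SERVERNAME:6800/listprojects.json
```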
Add/edit the file `~/.scrapy.cfg`. Enter the following:

```
[deploy:openrecipestest]
url = http://SERVERNAME:6800/
```
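To confirm Scrapy picked up the new target, the `deploy` command in the Scrapy release of that era (0.16) can list configured targets; run it from within `openrecipes/scrapy_proj` as in the next step, and treat the flag as an assumption if you're on a different version:

```
# list configured deploy targets; openrecipestest should appear
scrapy deploy -l
```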
From within `openrecipes/scrapy_proj`, deploy the project to the server:

```
scrapy deploy openrecipestest -p openrecipes
```

Schedule a single spider (here, `thepioneerwoman.feed`):

```
curl http://SERVERNAME:6800/schedule.json -d project=openrecipes -d spider=thepioneerwoman.feed
```

Or schedule every feed spider, then every non-feed spider, in one go (`xargs -p` prompts before each request):

```
scrapy list | grep .feed | xargs -I {} -p curl http://SERVERNAME:6800/schedule.json -d project=openrecipes -d spider={}
scrapy list | grep -v .feed | xargs -I {} -p curl http://SERVERNAME:6800/schedule.json -d project=openrecipes -d spider={}
```
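Once spiders are scheduled, you can watch their progress through the scrapyd API; `listjobs.json` reports pending, running, and finished jobs for a project:

```
# list pending/running/finished jobs for the openrecipes project
curl "http://SERVERNAME:6800/listjobs.json?project=openrecipes"
```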