SEP | 15 |
Title | ScrapyManager and SpiderManager API refactoring |
Author | Insophia Team |
Created | 2010-03-10 |
Status | Final |
This SEP proposes a refactoring of the ScrapyManager and SpiderManager APIs.
SpiderManager:

- get(spider_name) -> Spider instance
- find_by_request(request) -> list of spider names
- list() -> list of spider names
- remove fromdomain(), fromurl()
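The three SpiderManager lookups can be sketched as follows. This is a minimal illustration and not the Scrapy implementation: the Request and Spider classes, the allowed_domains matching rule, the KeyError on unknown names, and the sorted order returned by list() are all assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Request:
    """Hypothetical stand-in for a request; only the URL is modelled."""
    url: str


class Spider:
    """Hypothetical minimal spider: a name plus the domains it handles."""

    def __init__(self, name, allowed_domains):
        self.name = name
        self.allowed_domains = tuple(allowed_domains)

    def handles(self, request):
        # Assumed matching rule: exact host or a subdomain of an allowed domain.
        host = urlparse(request.url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in self.allowed_domains)


class SpiderManager:
    """Sketch of the lookups proposed in this SEP."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        # Returns a Spider instance; raising KeyError for unknown names is an assumption.
        return self._spiders[spider_name]

    def find_by_request(self, request):
        # Returns spider names (not instances) whose spiders accept the request.
        return [name for name, spider in self._spiders.items() if spider.handles(request)]

    def list(self):
        # Returns all known spider names; sorting is an assumption for determinism.
        return sorted(self._spiders)
```

Note that find_by_request() returns names, so a caller that needs the instance still goes through get(). An empty list and a list with several names are both valid results, which is why the caller decides how to handle ambiguity.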
ScrapyManager:

- crawl_request(request, spider=None)
  - calls SpiderManager.find_by_request(request) if spider is None
  - fails if len(spiders returned) != 1
- crawl_spider(spider)
  - calls spider.start_requests()
- crawl_spider_name(spider_name)
  - calls SpiderManager.get(spider_name)
  - calls spider.start_requests()
- crawl_url(url)
  - calls spider.make_requests_from_url()
- remove crawl(), runonce()
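The call chain of the crawl_* methods can be sketched as follows. This is an illustration under stated assumptions, not Scrapy's code: the Request, Spider and SpiderManager classes are hypothetical stand-ins, scheduling is modelled as appending to a list, the failure is modelled as a ValueError, and crawl_url() is assumed to resolve its spider with the same one-match rule as crawl_request(), which the proposal does not spell out.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for a request; only the URL is modelled."""
    url: str


class Spider:
    """Hypothetical minimal spider bound to a single host."""

    def __init__(self, name, host, start_urls=()):
        self.name = name
        self.host = host
        self.start_urls = tuple(start_urls)

    def make_requests_from_url(self, url):
        return Request(url)

    def start_requests(self):
        return [self.make_requests_from_url(url) for url in self.start_urls]


class SpiderManager:
    """Just enough of the proposed lookups to drive the sketch."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        return self._spiders[spider_name]

    def find_by_request(self, request):
        host = urlparse(request.url).hostname
        return [name for name, spider in self._spiders.items() if spider.host == host]


class ScrapyManager:
    def __init__(self, spider_manager):
        self.spiders = spider_manager
        self.scheduled = []  # (spider name, request) pairs; stands in for the engine queue

    def _resolve(self, request):
        # Fails unless exactly one spider claims the request.
        names = self.spiders.find_by_request(request)
        if len(names) != 1:
            raise ValueError(f"expected one spider for {request.url}, found {len(names)}")
        return self.spiders.get(names[0])

    def crawl_request(self, request, spider=None):
        if spider is None:
            spider = self._resolve(request)
        self.scheduled.append((spider.name, request))

    def crawl_spider(self, spider):
        for request in spider.start_requests():
            self.crawl_request(request, spider)

    def crawl_spider_name(self, spider_name):
        self.crawl_spider(self.spiders.get(spider_name))

    def crawl_url(self, url):
        # Assumption: the spider is looked up from the URL before building the request.
        spider = self._resolve(Request(url))
        self.crawl_request(spider.make_requests_from_url(url), spider)
```

In this sketch every entry point funnels into crawl_request(), so the "exactly one spider" check lives in a single place and only runs when the caller has not already supplied a spider.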
Instead of using runonce(), commands (such as crawl/parse) would call crawl_* and then start().
- if is_url(arg):
  - calls ScrapyManager.crawl_url(arg)
- else:
  - calls ScrapyManager.crawl_spider_name(arg)
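The command-side dispatch might look like the sketch below. The is_url() rule, the RecordingManager stand-in, and the run_crawl_command() function name are hypothetical and exist only to make the example runnable; the proposal itself only fixes which ScrapyManager method is called in each branch, followed by start().

```python
from urllib.parse import urlparse


def is_url(arg):
    # Assumed rule: an http(s) scheme plus a host means the argument is a URL.
    parsed = urlparse(arg)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


class RecordingManager:
    """Hypothetical stand-in that records which manager method was chosen."""

    def __init__(self):
        self.calls = []

    def crawl_url(self, url):
        self.calls.append(("crawl_url", url))

    def crawl_spider_name(self, spider_name):
        self.calls.append(("crawl_spider_name", spider_name))

    def start(self):
        self.calls.append(("start",))


def run_crawl_command(manager, args):
    # Schedule every argument first, then start the manager once.
    for arg in args:
        if is_url(arg):
            manager.crawl_url(arg)
        else:
            manager.crawl_spider_name(arg)
    manager.start()


manager = RecordingManager()
run_crawl_command(manager, ["http://example.com/", "example"])
```

Scheduling first and calling start() once at the end is what replaces the removed runonce() in this sketch.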
- should we rename ScrapyManager.crawl_* to schedule_* or add_*?
- SpiderManager.find_by_request or SpiderManager.search(request=request)?