SEP 15
Title ScrapyManager and SpiderManager API refactoring
Author Insophia Team
Created 2010-03-10
Status Final

SEP-015: ScrapyManager and SpiderManager API refactoring

This SEP proposes a refactoring of ScrapyManager and SpiderManager APIs.

SpiderManager

  • get(spider_name) -> Spider instance
  • find_by_request(request) -> list of spider names
  • list() -> list of spider names
  • remove fromdomain() and fromurl()
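
The proposed SpiderManager interface can be sketched roughly as follows. This is a minimal illustration, not Scrapy's implementation: the Request and Spider classes, the allowed_domains attribute, and the domain-matching rule in handles() are all assumptions made for the example; only the three method names and their return types come from the list above.

```python
from urllib.parse import urlparse


class Request:
    """Hypothetical stand-in for a request; it only carries a URL."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical spider with a name and the domains it handles."""

    def __init__(self, name, allowed_domains):
        self.name = name
        self.allowed_domains = tuple(allowed_domains)

    def handles(self, request):
        # Assumed rule: the request host equals, or is a subdomain of,
        # one of the spider's allowed domains.
        host = urlparse(request.url).hostname or ""
        return any(host == d or host.endswith("." + d)
                   for d in self.allowed_domains)


class SpiderManager:
    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        """Return the Spider instance registered under spider_name."""
        return self._spiders[spider_name]  # KeyError for unknown names

    def find_by_request(self, request):
        """Return the names of all spiders able to handle the request."""
        return sorted(name for name, spider in self._spiders.items()
                      if spider.handles(request))

    def list(self):
        """Return the names of all known spiders."""
        return sorted(self._spiders)
```

Returning names rather than instances from find_by_request() and list() keeps lookups cheap; a caller that needs the instance follows up with get().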

ScrapyManager

  • crawl_request(request, spider=None)
    • calls SpiderManager.find_by_request(request) if spider is None
    • fails unless exactly one spider is returned
  • crawl_spider(spider)
    • calls spider.start_requests()
  • crawl_spider_name(spider_name)
    • calls SpiderManager.get(spider_name)
    • calls spider.start_requests()
  • crawl_url(url)
    • calls spider.make_requests_from_url()
  • remove crawl() and runonce()

Instead of using runonce(), commands (such as crawl and parse) would call one of the crawl_* methods and then start().
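
A rough sketch of how the crawl_* methods could fit together is shown here. It is only an illustration under stated assumptions: Request, Spider, DictSpiderManager, the scheduled queue, and the body of start() are hypothetical stand-ins. The SEP also does not say which spider crawl_url() uses, so this sketch assumes it is resolved the same way as in crawl_request().

```python
class Request:
    """Hypothetical stand-in for a request; it only carries a URL."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical spider: a name, one domain, and some start URLs."""

    def __init__(self, name, domain, start_urls=()):
        self.name = name
        self.domain = domain
        self.start_urls = list(start_urls)

    def make_requests_from_url(self, url):
        return [Request(url)]

    def start_requests(self):
        return [request for url in self.start_urls
                for request in self.make_requests_from_url(url)]


class DictSpiderManager:
    """Stub exposing only the two SpiderManager methods used here."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        return self._spiders[spider_name]

    def find_by_request(self, request):
        # Assumed matching rule: the spider's domain appears in the URL.
        return [name for name, spider in self._spiders.items()
                if spider.domain in request.url]


class ScrapyManager:
    def __init__(self, spider_manager):
        self.spiders = spider_manager
        self.scheduled = []  # (request, spider) pairs waiting for start()

    def _resolve(self, request):
        # Fail unless exactly one spider claims the request.
        names = self.spiders.find_by_request(request)
        if len(names) != 1:
            raise ValueError("expected exactly one spider for %s, found %d"
                             % (request.url, len(names)))
        return self.spiders.get(names[0])

    def crawl_request(self, request, spider=None):
        if spider is None:
            spider = self._resolve(request)
        self.scheduled.append((request, spider))

    def crawl_spider(self, spider):
        for request in spider.start_requests():
            self.crawl_request(request, spider)

    def crawl_spider_name(self, spider_name):
        self.crawl_spider(self.spiders.get(spider_name))

    def crawl_url(self, url):
        # Assumption: the spider is looked up from the URL itself.
        spider = self._resolve(Request(url))
        for request in spider.make_requests_from_url(url):
            self.crawl_request(request, spider)

    def start(self):
        # Placeholder for running the crawl: drain the queue and report it.
        processed = [(spider.name, request.url)
                     for request, spider in self.scheduled]
        self.scheduled = []
        return processed
```

In this sketch the crawl_* methods only queue work and nothing runs until start() is called, which is the separation that replaces runonce().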

Changes to Commands

  • if is_url(arg):
    • calls ScrapyManager.crawl_url(arg)
  • else:
    • calls ScrapyManager.crawl_spider_name(arg)
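
The dispatch above could look roughly like this inside a command. The sketch makes several assumptions: is_url() is named in the SEP but not defined, so the scheme check below is a guess; RecordingManager is a hypothetical stub that only records calls; and run_crawl_command() is an illustrative name, not an existing function.

```python
from urllib.parse import urlparse


def is_url(arg):
    # Assumed rule: an argument is a URL if it has one of these schemes.
    return urlparse(arg).scheme in ("http", "https", "file")


class RecordingManager:
    """Hypothetical stub that records calls instead of crawling."""

    def __init__(self):
        self.calls = []

    def crawl_url(self, url):
        self.calls.append(("crawl_url", url))

    def crawl_spider_name(self, spider_name):
        self.calls.append(("crawl_spider_name", spider_name))

    def start(self):
        self.calls.append(("start",))


def run_crawl_command(manager, args):
    """Schedule each argument by URL or by spider name, then start."""
    for arg in args:
        if is_url(arg):
            manager.crawl_url(arg)
        else:
            manager.crawl_spider_name(arg)
    manager.start()
```

For example, run_crawl_command(manager, ["example", "http://example.com/"]) would schedule the spider named example, schedule the URL, and then call start() once.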

Pending issues

  • should we rename ScrapyManager.crawl_* to schedule_* or add_*?
  • should the lookup be SpiderManager.find_by_request(request) or SpiderManager.search(request=request)?