| SEP     | 21            |
|---------|---------------|
| Title   | Add-ons       |
| Author  | Pablo Hoffman |
| Created | 2014-02-14    |
| Status  | Draft         |
This proposal introduces add-ons, a unified way to manage Scrapy extensions, middlewares and pipelines.
Scrapy currently supports many hooks and mechanisms for extending its functionality, but no single entry point for enabling and configuring them. Instead, the hooks are spread over:
- Spider middlewares (SPIDER_MIDDLEWARES)
- Downloader middlewares (DOWNLOADER_MIDDLEWARES)
- Downloader handlers (DOWNLOADER_HANDLERS)
- Item pipelines (ITEM_PIPELINES)
- Feed exporters and storages (FEED_EXPORTERS, FEED_STORAGES)
- Overrideable components (DUPEFILTER_CLASS, STATS_CLASS, SCHEDULER, SPIDER_MANAGER_CLASS, ITEM_PROCESSOR, etc)
- Generic extensions (EXTENSIONS)
- CLI commands (COMMANDS_MODULE)
One problem with this approach is that enabling an extension often requires modifying several settings in a coordinated way, which is complex and error-prone. Add-ons are meant to fix this by providing a simple mechanism for enabling extensions.
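For a concrete sense of the coordination involved, here is a hedged sketch of a settings.py fragment that wires up a single feature spread across several of the settings listed above; the myproject.* component paths and ordering values are hypothetical, not part of this proposal:

```python
# settings.py -- enabling one feature "the old way" can require entries in
# several different settings dicts, each with a hand-picked ordering value.
# (Component paths below are hypothetical examples.)
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.MyDownloaderMiddleware': 543,
}
SPIDER_MIDDLEWARES = {
    'myproject.middlewares.MySpiderMiddleware': 543,
}
ITEM_PIPELINES = {
    'myproject.pipelines.MyPipeline': 300,
}
EXTENSIONS = {
    'myproject.extensions.MyExtension': 500,
}
```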
Goals:
- simple to manage: adding or removing extensions should be just a matter of adding or removing lines in a scrapy.cfg file
- backward compatibility with enabling extensions the "old way" (i.e. modifying settings directly)
Non-goals:
- a way to publish, distribute or discover extensions (use PyPI for that)
Add-ons are defined in the scrapy.cfg file, inside the [addons] section.

To enable the "httpcache" add-on, either shipped with Scrapy or available in the Python search path, create an entry for it in your scrapy.cfg, like this:
```ini
[addons]
httpcache =
```
You may also specify the full path to an add-on (which may be either a .py file or a folder containing __init__.py):
```ini
[addons]
mongodb_pipeline = /path/to/mongodb_pipeline.py
```
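Since each add-on takes a single line, combining them is just a matter of listing them together. A project using both of the add-ons above might have a section like this (a hypothetical combined example):

```ini
[addons]
httpcache =
mongodb_pipeline = /path/to/mongodb_pipeline.py
```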
Add-ons are Python modules that implement the following callbacks.
addon_configure

Receives the Settings object and modifies it to enable the required components. If it raises an exception, Scrapy will print it and exit.
Examples:
```python
def addon_configure(settings):
    settings.overrides['DOWNLOADER_MIDDLEWARES'].update({
        'scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware': 900,
    })
```
```python
def addon_configure(settings):
    try:
        import boto
    except ImportError:
        raise RuntimeError("boto library is required")
```
crawler_ready

Receives a Crawler object after it has been initialized. It is meant to be used to perform post-initialization checks, like making sure the extension and its dependencies were configured properly. If it raises an exception, Scrapy will print it and exit.
Example:
```python
def crawler_ready(crawler):
    if 'some.other.addon' not in crawler.extensions.enabled:
        raise RuntimeError("Some other addon is required to use this addon")
```
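Putting both callbacks together, a complete add-on is just a small module. Below is a hedged sketch of what the mongodb_pipeline add-on from the earlier scrapy.cfg example might look like; the pipeline path, the MONGODB_URI setting and the ordering value 300 are hypothetical illustrations, not part of this proposal:

```python
# mongodb_pipeline.py -- hypothetical add-on module implementing both callbacks.

def addon_configure(settings):
    # Fail early if the (assumed) dependency is missing.
    try:
        import pymongo  # noqa: F401
    except ImportError:
        raise RuntimeError("pymongo library is required")
    # Enable the (hypothetical) pipeline; the ordering value 300 is arbitrary.
    settings.overrides['ITEM_PIPELINES'].update({
        'mongodb_pipeline.MongoDBPipeline': 300,
    })

def crawler_ready(crawler):
    # Post-initialization check: make sure the required setting is present.
    if not crawler.settings.get('MONGODB_URI'):
        raise RuntimeError("MONGODB_URI setting is required to use this add-on")
```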