converted sep 7 for scrapy#629
AphonicChaos committed Mar 7, 2014
1 parent 690081b commit 358ec6e
Showing 2 changed files with 137 additions and 108 deletions.
137 changes: 137 additions & 0 deletions sep/sep-007.rst
@@ -0,0 +1,137 @@
=======  =============================
SEP      7
Title    ItemLoader processors library
Author   Ismael Carnales
Created  2009-08-10
Status   Draft
=======  =============================

======================================
SEP-007: ItemLoader processors library
======================================

This SEP proposes a library of ``ItemLoader`` processors to ship with Scrapy.

date.py
=======

``to_date``
-----------

Converts a date string to a YYYY-MM-DD string suitable for ``DateField``.
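
As a rough illustration of the conversion described above, the following is a
minimal sketch written for Python 3. The ``input_format`` parameter and its
default are assumptions made for this example, not part of the original
adaptor.

.. code-block:: python

    from datetime import datetime


    def to_date(value, input_format="%d/%m/%Y"):
        """Hypothetical sketch: parse ``value`` and return it as YYYY-MM-DD.

        ``input_format`` is an assumed parameter using ``strptime`` syntax.
        """
        return datetime.strptime(value, input_format).date().isoformat()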

**Decision**: Obsolete. ``DateField`` no longer exists.

extraction.py
=============

``extract``
-----------

This adaptor tries to extract data from the given locations. Any
``XPathSelector`` in it will be extracted, and any other data will be added
as-is to the result.

**Decision**: Obsolete. Functionality included in ``XpathLoader``.

``ExtractImageLinks``
---------------------

This adaptor may receive either XPathSelectors pointing to the desired
locations for finding image urls, or just a list of XPath expressions (which
will be turned into selectors anyway).

**Decision**: XXX

markup.py
=========

``remove_tags``
---------------

Factory that returns an adaptor for removing each tag in the ``tags`` parameter
found in the given value. If no ``tags`` are specified, all of them are
removed.
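
The factory pattern described above could be sketched as follows. This is a
regex-based approximation written for Python 3, not the proposed
implementation; a regex is only a rough stand-in for real markup parsing, and
the inner name ``adaptor`` is chosen for this example.

.. code-block:: python

    import re


    def remove_tags(tags=()):
        """Hypothetical factory: return an adaptor that strips markup tags."""
        if tags:
            names = "|".join(re.escape(tag) for tag in tags)
            # Match opening, closing, and self-closing forms of the named tags.
            pattern = re.compile(
                r"</?(?:%s)(?:\s[^>]*)?/?>" % names, re.IGNORECASE
            )
        else:
            # No tag names given: match every tag.
            pattern = re.compile(r"<[^>]+>")

        def adaptor(value):
            return pattern.sub("", value)

        return adaptor

Calling ``remove_tags(('b',))`` would then yield an adaptor that strips only
``<b>`` tags and leaves the enclosed text in place.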

**Decision**: XXX

``remove_root``
---------------

This adaptor removes the root tag of the given string/unicode, if it's found.

**Decision**: XXX

``replace_escape``
------------------

Factory that returns an adaptor for removing/replacing each escape character in
the ``wich_ones`` parameter found in the given value.
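
A minimal sketch of such a factory, written for Python 3, might look like the
following. The parameter spelling ``wich_ones`` follows the text above; the
``replace_by`` parameter and both default values are assumptions made for this
example.

.. code-block:: python

    def replace_escape(wich_ones=("\n", "\t", "\r"), replace_by=""):
        """Hypothetical factory: return an adaptor that replaces escape chars.

        The defaults and ``replace_by`` are assumed for illustration.
        """
        def adaptor(value):
            # Replace each listed escape character in turn.
            for char in wich_ones:
                value = value.replace(char, replace_by)
            return value

        return adaptor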

**Decision**: XXX

``unquote``
-----------

This factory returns an adaptor that receives a string or unicode, removes all
CDATAs and entities (except entities inside CDATAs and those named in the
``keep`` parameter), and returns the result as a new string or unicode.

**Decision**: XXX

misc.py
=======

``to_unicode``
--------------

Receives a string, converts it to unicode using the given encoding (UTF-8 if
none is specified), and returns a new unicode object. E.g.:

::

    >>> to_unicode('it costs 20\xe2\x82\xac, or 30\xc2\xa3')
    [u'it costs 20\u20ac, or 30\xa3']
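
For comparison, a Python 3 sketch of the same idea operates on ``bytes`` and
returns ``str``. This is an illustration under that assumption, handling a
single value rather than a list, and is not the adaptor's original code.

.. code-block:: python

    def to_unicode(value, encoding="utf-8"):
        """Hypothetical sketch: decode a bytes value into text."""
        return value.decode(encoding)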

**Decision**: XXX

``clean_spaces``
----------------

Collapses runs of multiple spaces into single spaces in the given string. E.g.:

::

    >>> clean_spaces(u'Hello   sir')
    u'Hello sir'
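
One possible way to implement this, sketched for Python 3, is a single regular
expression substitution. The sketch assumes that only literal space characters
are collapsed, leaving tabs and newlines untouched; the original adaptor may
have behaved differently.

.. code-block:: python

    import re

    # Two or more consecutive space characters.
    _MULTISPACE_RE = re.compile(r" {2,}")


    def clean_spaces(value):
        """Hypothetical sketch: collapse runs of spaces into one space."""
        return _MULTISPACE_RE.sub(" ", value)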

**Decision**: XXX

``drop_empty``
--------------

Removes any item that evaluates to false (such as ``None``, ``0`` or
``False``) from the provided iterable. E.g.:

::

    >>> drop_empty([0, 'this', None, 'is', False, 'an example'])
    ['this', 'is', 'an example']
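
The behaviour shown in the example can be sketched with a list comprehension.
This is an illustration written for Python 3, not the original code.

.. code-block:: python

    def drop_empty(iterable):
        """Hypothetical sketch: keep only the items that are truthy."""
        return [item for item in iterable if item]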

**Decision**: Obsolete. Functionality included in reducers.

``delist``
----------

This factory returns an adaptor that joins an iterable with the specified
delimiter.
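
A minimal sketch of this factory, written for Python 3, is shown here. The
default delimiter is an assumption, and the sketch expects an iterable of
strings.

.. code-block:: python

    def delist(delimiter=""):
        """Hypothetical factory: return an adaptor that joins strings."""
        def adaptor(values):
            return delimiter.join(values)

        return adaptor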

**Decision**: Obsolete. Functionality included in reducers.

``Regex``
----------

This adaptor receives either a list of strings or an ``XPathSelector`` and
returns a new list containing the matches of the given regular expression
against those strings. The regular expression is passed as a keyword argument
and is mandatory.
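
The list-of-strings case could be sketched as the callable class below,
written for Python 3. The keyword name ``regex`` is an assumption, and the
``XPathSelector`` case is deliberately left out of this illustration.

.. code-block:: python

    import re


    class Regex:
        """Hypothetical sketch: collect regex matches from a list of strings."""

        def __init__(self, *, regex):
            # Keyword-only and without a default, so the argument is mandatory.
            self.regex = re.compile(regex)

        def __call__(self, values):
            matches = []
            for value in values:
                # findall returns every non-overlapping match in the string.
                matches.extend(self.regex.findall(value))
            return matches

Note that ``re.findall`` returns the captured groups rather than the whole
match when the pattern contains groups, so a real adaptor would need to decide
which of the two it exposes.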

**Decision**: XXX
108 changes: 0 additions & 108 deletions sep/sep-007.trac

This file was deleted.
