w3lib release notes
===================

1.20.0 (2019-01-11)
-------------------

- Fix ``url_query_cleaner`` so that it does not append "?" to URLs without a
  query string (issue #109)
- Add support for Python 3.7 and drop Python 3.3 (issue #113)
- Add ``w3lib.url.add_or_replace_parameters`` helper (issue #117)
- Documentation fixes (issue #115)
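
As a rough illustration of the two URL changes above, the following
standard-library sketch sets several query parameters at once and leaves a
URL without a query string untouched. It is not w3lib's implementation:
``set_query_params`` is a hypothetical name, and this simplified version
collapses repeated parameter names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_params(url, new_params):
    """Hypothetical helper: add or replace several query parameters."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters such as "a=" survive the round trip.
    # Using a dict is a simplification: repeated names keep only the last value.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(new_params)
    # An empty query is serialized without a trailing "?".
    return urlunsplit(parts._replace(query=urlencode(params)))


print(set_query_params("http://example.com/?a=1&b=2", {"b": "3", "c": "4"}))
# http://example.com/?a=1&b=3&c=4
print(set_query_params("http://example.com/", {}))
# http://example.com/
```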

1.19.0 (2018-01-25)
-------------------

- Add a workaround for a CPython segfault (https://bugs.python.org/issue32583)
  that affects ``w3lib.encoding`` functions. This is technically **backwards
  incompatible** because it changes the way non-decodable bytes are replaced
  (in some cases you can get one ``\ufffd`` char instead of two).
  As a side effect, the fix speeds up decoding in Python 3.4+.
- Add an ``encoding`` parameter to ``w3lib.http.basic_auth_header``.
- Fix pypy testing setup, add pypy3 to CI.
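
For readers unfamiliar with the replacement character mentioned above, this
small standard-library snippet shows the general mechanism: bytes that cannot
be decoded are replaced with U+FFFD. It does not reproduce the w3lib
workaround itself, and how many replacement characters a longer invalid run
produces depends on the decoder, which is the detail that changed in this
release.

```python
# Decoding with errors="replace" substitutes U+FFFD for undecodable bytes.
raw = b"a\xffb"  # the byte 0xFF is never valid in UTF-8
text = raw.decode("utf-8", errors="replace")

print(text == "a\ufffdb")    # True
print(text.count("\ufffd"))  # 1
```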


1.18.0 (2017-08-03)
-------------------

- Include additional assets used for distribution packages in the source tarball
- Consider ``[`` and ``]`` as safe characters in path and query components
  of URLs, i.e. they are not escaped anymore
- Disable codecov project coverage check
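
The effect of treating ``[`` and ``]`` as safe can be seen with the standard
library's ``urllib.parse.quote``. This is only an illustration of the
escaping difference, not the code path w3lib uses; the sample path is made
up.

```python
from urllib.parse import quote

path = "/items[0]/name with space"

# Default behavior: square brackets are percent-encoded.
print(quote(path))              # /items%5B0%5D/name%20with%20space

# With "[" and "]" listed as safe characters, they are left as-is.
print(quote(path, safe="/[]"))  # /items[0]/name%20with%20space
```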


1.17.0 (2017-02-08)
-------------------

- Add Python 3.5 and 3.6 support
- Add ``w3lib.url.parse_data_uri`` helper for parsing "data:" URIs
- Add ``w3lib.html.strip_html5_whitespace`` function to strip leading and
  trailing whitespace as per W3C recommendations, e.g. for cleaning
  "href" attribute values
- Fix ``w3lib.http.headers_raw_to_dict`` for multiple headers with same name
- Do not distribute tests/test_*.pyc artifacts
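
To show why repeated header names need care, here is a minimal sketch that
parses raw headers into a dict of lists, so a second ``Set-Cookie`` line does
not overwrite the first. ``parse_raw_headers`` is a hypothetical name and the
bytes-in, lists-out shape is an assumption for this example, not a statement
about the exact return type of ``w3lib.http.headers_raw_to_dict``.

```python
def parse_raw_headers(raw):
    """Hypothetical helper: map each header name to a list of its values."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(b":")
        if not sep:
            # Skip lines that are not "Name: value" pairs.
            continue
        # setdefault + append keeps every value for a repeated name.
        headers.setdefault(name.strip(), []).append(value.strip())
    return headers


raw = b"Set-Cookie: a=1\r\nSet-Cookie: b=2\r\nContent-Type: text/html"
print(parse_raw_headers(raw))
```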


1.16.0 (2016-11-10)
-------------------

- ``canonicalize_url()`` and ``safe_url_string()``:
  strip ":" when no port is specified (as per `RFC 3986`_;
  see also https://github.com/scrapy/scrapy/issues/2377)
- ``url_query_cleaner()``: support new ``keep_fragments`` argument
  (defaulting to ``False``)

.. _RFC 3986: https://tools.ietf.org/html/rfc3986#section-3.2
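
The port normalization described above can be sketched with the standard
library as follows. ``strip_empty_port`` is a hypothetical name used for
illustration; the real functions do considerably more than this.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_empty_port(url):
    """Hypothetical helper: drop a dangling ":" when no port follows it."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if netloc.endswith(":"):
        netloc = netloc[:-1]
    return urlunsplit(parts._replace(netloc=netloc))


print(strip_empty_port("http://example.com:/path"))      # http://example.com/path
print(strip_empty_port("http://example.com:8080/path"))  # http://example.com:8080/path
```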

1.15.0 (2016-07-29)
-------------------

- Add ``canonicalize_url()`` to ``w3lib.url``

1.14.3 (2016-07-14)
-------------------

Bugfix release:

- Handle IDNA encoding failures in ``safe_url_string()`` (issue #62)

1.14.2 (2016-04-11)
-------------------

Bugfix release:

- fix function import for (deprecated) ``urljoin_rfc`` (issue #51)
- only expose wanted functions from ``w3lib.url``, via ``__all__``
  (see issue #54, https://github.com/scrapy/scrapy/issues/1917)

1.14.1 (2016-04-07)
-------------------

Bugfix release:

- For bytes URLs, when supplied encoding (or default UTF8) is wrong,
  ``safe_url_string`` falls back to percent-encoding offending bytes.
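
One way to get this kind of fallback is a custom codec error handler that
percent-encodes only the bytes that fail to decode. The sketch below assumes
that approach for illustration; the handler name ``percentencode_sketch`` and
the function ``decode_url_bytes`` are hypothetical and not part of w3lib's
API.

```python
import codecs


def _percent_encode_errors(exc):
    """Replace undecodable bytes with their %XX form and resume decoding."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(f"%{byte:02X}" for byte in bad), exc.end


codecs.register_error("percentencode_sketch", _percent_encode_errors)


def decode_url_bytes(url, encoding="utf-8"):
    """Hypothetical helper: decode a bytes URL, escaping offending bytes."""
    return url.decode(encoding, errors="percentencode_sketch")


# b"\xe9" is "é" in Latin-1, which is not valid UTF-8 here.
print(decode_url_bytes(b"/caf\xe9/menu"))  # /caf%E9/menu
```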

1.14.0 (2016-04-06)
-------------------

Changes to ``safe_url_string``:

- proper handling of non-ASCII characters in Python 2 and Python 3
- support IDNs
- new ``path_encoding`` argument to override the default UTF-8 when
  serializing non-ASCII characters before percent-encoding
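
The two ideas in this list, IDNA encoding of host names and choosing the
encoding applied before percent-encoding a path, can be illustrated with the
standard library alone. This is a sketch of the concepts, not of how w3lib
implements them; the host and path are made-up examples.

```python
from urllib.parse import quote

# Internationalized domain names are converted to their ASCII (IDNA) form.
host = "münchen.de".encode("idna").decode("ascii")
print(host)  # xn--mnchen-3ya.de

# The bytes that get percent-encoded depend on the chosen encoding.
print(quote("/café"))                      # /caf%C3%A9  (UTF-8, the default)
print(quote("/café", encoding="latin-1"))  # /caf%E9
```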

``html_body_declared_encoding`` also detects the encoding when it is not the
sole attribute in ``<meta>``.

Package is now properly marked as ``zip_safe``.

1.13.0 (2015-11-05)
-------------------

- remove_tags removes uppercase tags as well;
- ignore meta-redirects inside script or noscript tags by default,
  but add an option to not ignore them;
- replace_entities now handles entities without trailing semicolon;
- fixed uncaught UnicodeDecodeError when decoding entities.
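
For context on entities without a trailing semicolon, the standard library's
``html.unescape`` shows comparable leniency. The snippet below demonstrates
only that stdlib behavior; it is not w3lib's ``replace_entities`` and the two
may differ in edge cases.

```python
import html

# Named and numeric entities are resolved even without the closing ";".
print(html.unescape("Tom &amp Jerry"))   # Tom & Jerry
print(html.unescape("price: &#163 5"))   # price: £ 5
```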

1.12.0 (2015-06-29)
-------------------

- meta_refresh regex now handles leading newlines and whitespace in the URL;
- include tests folder in source distribution.

1.11.0 (2015-01-13)
-------------------

- url_query_cleaner now supports str or list parameters;
- add support for resolving base URLs in <base> tags with attributes
  before href.

1.10.0 (2014-08-20)
-------------------

- reverted all 1.9.0 changes.

1.9.0 (2014-08-16)
------------------

- all url-related functions accept bytes and unicode and now return bytes.

1.8.1 (2014-08-14)
------------------

- w3lib.http.basic_auth_header now returns bytes
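
A minimal sketch of building an HTTP Basic ``Authorization`` value as bytes
is shown below. ``basic_auth_sketch`` is a hypothetical stand-in rather than
the library function, and the ISO-8859-1 default is taken from the 1.4 notes
further down in this file.

```python
import base64


def basic_auth_sketch(username, password, encoding="ISO-8859-1"):
    """Hypothetical helper: return the Basic auth header value as bytes."""
    credentials = f"{username}:{password}".encode(encoding)
    return b"Basic " + base64.b64encode(credentials)


print(basic_auth_sketch("user", "pass"))  # b'Basic dXNlcjpwYXNz'
```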

1.8.0 (2014-07-31)
------------------

- add support for big5-hkscs encoding.

1.7.1 (2014-07-26)
------------------

- fixed headers_raw_to_dict and headers_dict_to_raw on Python 3;
- documentation improvements;
- provide wheels.

1.6 (2014-06-03)
----------------

- w3lib.form.encode_multipart is deprecated;
- docstrings and docs are improved;
- w3lib.url.add_or_replace_parameter is re-implemented on top of
  stdlib functions;
- remove_entities is renamed to replace_entities.

1.5 (2013-11-09)
----------------

- Python 2.6 support is dropped.

1.4 (2013-10-18)
----------------

- Python 3 support;
- get_meta_refresh encoding handling is fixed;
- check for '?' in add_or_replace_parameter;
- ISO-8859-1 is used for HTTP Basic Auth;
- fixed unicode handling in replace_escape_chars.

1.3 (2012-05-13)
----------------

- support non-standard gb_2312_80 encoding;
- drop Python 2.5 support.

1.2 (2012-05-02)
----------------

- Detect the encoding when the ``content`` attribute comes before
  ``http-equiv`` in a ``<meta>`` tag.

1.1 (2012-04-18)
----------------

- w3lib.html.remove_comments handles multiline comments;
- Added w3lib.encoding module, containing functions for working with character
  encoding, like encoding autodetection from HTML pages.
- w3lib.url.urljoin_rfc is deprecated.

1.0 (2011-04-17)
----------------

First release of w3lib.