w3lib release notes
===================
1.20.0 (2019-01-11)
-------------------
- Fix ``url_query_cleaner`` so that it no longer appends "?" to URLs without a query string (issue #109)
- Add support for Python 3.7 and drop Python 3.3 (issue #113)
- Add ``w3lib.url.add_or_replace_parameters`` helper (issue #117)
- Documentation fixes (issue #115)
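
The two URL changes in this release can be pictured with the standard library alone. The sketch below is not w3lib's code: ``replace_query_params`` is a hypothetical name, and the behaviour shown (overwrite existing keys in place, append new ones, never emit a bare "?") is an assumption about what such a helper would reasonably do.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def replace_query_params(url, new_params):
    """Return ``url`` with every key in ``new_params`` added or overwritten."""
    parts = urlsplit(url)
    merged = []
    replaced = set()
    # Walk the existing pairs so that their original order is preserved.
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in new_params:
            if key not in replaced:
                merged.append((key, new_params[key]))
                replaced.add(key)
        else:
            merged.append((key, value))
    # Append the keys that were not present in the original query string.
    for key, value in new_params.items():
        if key not in replaced:
            merged.append((key, value))
    # urlunsplit() omits the "?" when the query component is empty.
    return urlunsplit(parts._replace(query=urlencode(merged)))


updated = replace_query_params("http://example.com/?a=1&b=2", {"b": "3", "c": "4"})
untouched = replace_query_params("http://example.com/page", {})
```

Here ``updated`` keeps ``a=1``, rewrites ``b`` and appends ``c``, while ``untouched`` comes back without a trailing "?".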
1.19.0 (2018-01-25)
-------------------
- Add a workaround for a CPython segfault (https://bugs.python.org/issue32583)
  that affects ``w3lib.encoding`` functions. This is technically **backwards
  incompatible** because it changes the way non-decodable bytes are replaced
  (in some cases you can get one ``\ufffd`` character instead of two).
  As a side effect, the fix speeds up decoding in Python 3.4+.
- Add an ``encoding`` parameter to ``w3lib.http.basic_auth_header``.
- Fix pypy testing setup, add pypy3 to CI.
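
To see why an ``encoding`` parameter matters for Basic Auth, here is a minimal stdlib sketch. It is not the library's implementation: ``basic_auth_value`` is a made-up name, and the ISO-8859-1 default and bytes return type are taken from the 1.4 and 1.8.1 entries in these notes.

```python
import base64


def basic_auth_value(username, password, encoding="ISO-8859-1"):
    """Build the value of an HTTP ``Authorization: Basic`` header as bytes."""
    # The credentials are joined with ":" and encoded *before* base64,
    # so the chosen encoding changes the bytes that go over the wire.
    credentials = "{}:{}".format(username, password).encode(encoding)
    return b"Basic " + base64.b64encode(credentials)


ascii_header = basic_auth_value("user", "pass")
latin1_header = basic_auth_value("josé", "pass")
utf8_header = basic_auth_value("josé", "pass", encoding="utf-8")
```

For ASCII-only credentials the encoding makes no difference; for a name such as ``josé`` the Latin-1 and UTF-8 headers differ, which is why a server expecting UTF-8 needs the caller to be able to override the default.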
1.18.0 (2017-08-03)
-------------------
- Include additional assets used for distribution packages in the source tarball
- Consider ``[`` and ``]`` as safe characters in path and query components
  of URLs, i.e. they are no longer escaped
- Disable codecov project coverage check
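
The effect of treating ``[`` and ``]`` as safe can be shown with ``urllib.parse.quote``. This is only an illustration of the idea using the standard library; the exact set of safe characters w3lib uses is not reproduced here.

```python
from urllib.parse import quote

path = "/items[0]/name"

# With the default safe set ("/"), square brackets are percent-encoded.
escaped = quote(path)

# Adding the brackets to the safe set leaves them untouched.
kept = quote(path, safe="/[]")
```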
1.17.0 (2017-02-08)
-------------------
- Add Python 3.5 and 3.6 support
- Add ``w3lib.url.parse_data_uri`` helper for parsing "data:" URIs
- Add ``w3lib.html.strip_html5_whitespace`` function to strip leading and
  trailing whitespace as per W3C recommendations, e.g. for cleaning
  "href" attribute values
- Fix ``w3lib.http.headers_raw_to_dict`` for multiple headers with same name
- Do not distribute tests/test_*.pyc artifacts
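
HTML5 whitespace is a narrower set than what Python's ``str.strip()`` removes by default: only space, tab, line feed, form feed and carriage return. A minimal sketch of the stripping idea follows; ``strip_html5_ws`` is a hypothetical name used for illustration, not the library function itself.

```python
# The five characters HTML5 defines as "space characters".
HTML5_WHITESPACE = " \t\n\r\x0c"


def strip_html5_ws(text):
    """Strip leading and trailing HTML5 whitespace, and nothing else."""
    return text.strip(HTML5_WHITESPACE)


href = strip_html5_ws(" \n http://example.com/a b\t\x0c")
nbsp = strip_html5_ws("\xa0x\xa0")
```

Interior whitespace is preserved, and a non-breaking space (``\xa0``) is left alone because it is not in the HTML5 set, whereas a bare ``str.strip()`` would remove it.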
1.16.0 (2016-11-10)
-------------------
- ``canonicalize_url()`` and ``safe_url_string()``:
  strip ":" when no port is specified (as per `RFC 3986`_;
  see also https://github.com/scrapy/scrapy/issues/2377)
- ``url_query_cleaner()``: support new ``keep_fragments`` argument
  (defaulting to ``False``)
.. _RFC 3986: https://tools.ietf.org/html/rfc3986#section-3.2
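
The port normalisation above can be sketched with ``urllib.parse``. This is an assumption-laden illustration rather than w3lib's code, and ``strip_empty_port`` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_empty_port(url):
    """Drop a dangling ":" from the authority when no port follows it."""
    parts = urlsplit(url)
    netloc = parts.netloc
    # "host:" has a port delimiter but an empty port; RFC 3986 section 3.2.3
    # says the delimiter should be omitted in that case.
    if netloc.endswith(":"):
        netloc = netloc[:-1]
    return urlunsplit(parts._replace(netloc=netloc))


cleaned = strip_empty_port("http://www.example.com:/path")
unchanged = strip_empty_port("http://www.example.com:8080/path")
```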
1.15.0 (2016-07-29)
-------------------
- Add ``canonicalize_url()`` to ``w3lib.url``
1.14.3 (2016-07-14)
-------------------
Bugfix release:
- Handle IDNA encoding failures in ``safe_url_string()`` (issue #62)
1.14.2 (2016-04-11)
-------------------
Bugfix release:
- fix function import for (deprecated) ``urljoin_rfc`` (issue #51)
- only expose wanted functions from ``w3lib.url``, via ``__all__``
  (see issue #54, https://github.com/scrapy/scrapy/issues/1917)
1.14.1 (2016-04-07)
-------------------
Bugfix release:
- For bytes URLs, when the supplied encoding (or the default, UTF-8) is wrong,
  ``safe_url_string`` falls back to percent-encoding the offending bytes.
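
One way to picture this fallback is a custom codec error handler that turns undecodable bytes into percent-escapes instead of raising. The sketch below uses only the standard library; the handler name ``percentencode_sketch`` and the function ``decode_url_bytes`` are invented for this example and do not claim to match how w3lib implements it.

```python
import codecs


def _percent_encode_bad_bytes(error):
    # Only decoding errors carry raw bytes that can be percent-encoded.
    if not isinstance(error, UnicodeDecodeError):
        raise error
    bad = error.object[error.start:error.end]
    replacement = "".join("%{:02X}".format(byte) for byte in bad)
    # Resume decoding right after the offending bytes.
    return replacement, error.end


codecs.register_error("percentencode_sketch", _percent_encode_bad_bytes)


def decode_url_bytes(url, encoding="utf-8"):
    """Decode a bytes URL, percent-encoding bytes the codec rejects."""
    return url.decode(encoding, errors="percentencode_sketch")


# b"\xe9" is "é" in Latin-1 but is not valid UTF-8 on its own.
fallback = decode_url_bytes(b"http://example.com/caf\xe9")
valid = decode_url_bytes("http://example.com/café".encode("utf-8"))
```

Bytes that decode cleanly are returned as text, and only the offending bytes are escaped, so the URL stays usable even when the caller guessed the wrong encoding.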
1.14.0 (2016-04-06)
-------------------
Changes to ``safe_url_string``:
- proper handling of non-ASCII characters in Python 2 and Python 3
- support for IDNs
- new ``path_encoding`` argument to override the default UTF-8 when serializing
  non-ASCII characters before percent-encoding
``html_body_declared_encoding`` also detects the encoding when it is not the sole
attribute in ``<meta>``.
Package is now properly marked as ``zip_safe``.
1.13.0 (2015-11-05)
-------------------
- remove_tags removes uppercase tags as well;
- ignore meta-redirects inside script or noscript tags by default,
  but add an option to not ignore them;
- replace_entities now handles entities without trailing semicolon;
- fixed uncaught UnicodeDecodeError when decoding entities.
1.12.0 (2015-06-29)
-------------------
- meta_refresh regex now handles leading newlines and whitespace in the URL;
- include tests folder in source distribution.
1.11.0 (2015-01-13)
-------------------
- url_query_cleaner now supports str or list parameters;
- add support for resolving base URLs in ``<base>`` tags with attributes
  before href.
1.10.0 (2014-08-20)
-------------------
- reverted all 1.9.0 changes.
1.9.0 (2014-08-16)
------------------
- all url-related functions accept bytes and unicode and now return bytes.
1.8.1 (2014-08-14)
------------------
- w3lib.http.basic_auth_header now returns bytes
1.8.0 (2014-07-31)
------------------
- add support for big5-hkscs encoding.
1.7.1 (2014-07-26)
------------------
- fixed headers_raw_to_dict and headers_dict_to_raw on Python 3;
- documentation improvements;
- provide wheels.
1.6 (2014-06-03)
----------------
- w3lib.form.encode_multipart is deprecated;
- docstrings and docs are improved;
- w3lib.url.add_or_replace_parameter is re-implemented on top of
  stdlib functions;
- remove_entities is renamed to replace_entities.
1.5 (2013-11-09)
----------------
- Python 2.6 support is dropped.
1.4 (2013-10-18)
----------------
- Python 3 support;
- get_meta_refresh encoding handling is fixed;
- check for '?' in add_or_replace_parameter;
- ISO-8859-1 is used for HTTP Basic Auth;
- fixed unicode handling in replace_escape_chars.
1.3 (2012-05-13)
----------------
- support non-standard gb_2312_80 encoding;
- drop Python 2.5 support.
1.2 (2012-05-02)
----------------
- Detect encoding for content attr before http-equiv in meta tag.
1.1 (2012-04-18)
----------------
- w3lib.html.remove_comments handles multiline comments;
- Added w3lib.encoding module, containing functions for working with character
  encoding, like encoding autodetection from HTML pages.
- w3lib.url.urljoin_rfc is deprecated.
1.0 (2011-04-17)
----------------
First release of w3lib.