Chinese chars (for example) break lucille #2

gavinc · 2013-10-01T20:06:40Z

/giphy 最高

This will result in:
Traceback (most recent call last):
  File "lucille.py", line 98, in <module>
    encoded_t = urllib.quote_plus(t)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1257, in quote_plus
    return quote(s, safe)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1250, in quote
    return ''.join(map(quoter, s))
KeyError: u'\u8c22'

Running on OS X 10.8

gavinc · 2013-10-02T20:45:39Z

I did a brutal hack (one there's no way in heck I'm going to suggest you pull) to fix this for myself. Note that even though a URL-encoded version of each "t in terms" is made, I believe urllib does nothing to handle the 'extended' (non-ASCII) characters in this example.

My 'solution' was to restrict each "t in terms" to ASCII only, dropping or replacing everything else (via t.encode('ascii', 'ignore'), t.encode('ascii', 'replace') and t.encode('ascii', 'xmlcharrefreplace'), in brute-force "I really shouldn't be spending time on this" mode). I had to act on each "t in terms" directly because the "no results" processing refers back to the original terms, which weren't encoded.
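To show why none of these handlers really fixes the search, here is a minimal sketch of what each one does to the term 最高 before it is URL-encoded. It is written for Python 3 (lucille.py itself runs on Python 2.7, where the call is urllib.quote_plus), and `terms` and `ascii_only` are hypothetical names for illustration, not lucille's actual code.

```python
import urllib.parse

# Hypothetical stand-in for the search terms lucille builds from the command.
terms = ["最高"]


def ascii_only(term: str, errors: str) -> bytes:
    """Force a term down to ASCII using the given codec error handler."""
    return term.encode("ascii", errors)


for handler in ("ignore", "replace", "xmlcharrefreplace"):
    for t in terms:
        # Percent-encode the ASCII-only bytes, as the workaround effectively does.
        encoded_t = urllib.parse.quote_plus(ascii_only(t, handler))
        print(f"{handler:>18}: {encoded_t!r}")
```

With 'ignore' the query becomes an empty string, 'replace' searches for "??" (%3F%3F), and 'xmlcharrefreplace' searches for the literal entity text &#26368;&#39640;, so every variant avoids the crash but loses the user's actual query.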

So, apologies to non-English-speaking peoples. You probably want to fix this the "right" way :)
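One likely candidate for the "right" way is to encode each term to UTF-8 bytes before percent-encoding it, since urllib's quoting works on bytes. This is a hedged Python 3 sketch; `encode_term` is a hypothetical helper, and in Python 2.7 the equivalent call would be urllib.quote_plus(t.encode('utf-8')). It assumes the search endpoint expects UTF-8 percent-encoded query strings, which is the common convention but is not confirmed in this thread.

```python
import urllib.parse


def encode_term(term: str) -> str:
    """Percent-encode a search term as UTF-8 (hypothetical helper).

    Encoding to UTF-8 bytes first keeps every character, so nothing
    has to be dropped; spaces become '+' as with quote_plus on ASCII.
    """
    return urllib.parse.quote_plus(term.encode("utf-8"))


terms = ["最高", "hello world"]
for t in terms:
    # Only the URL copy is encoded; t itself is left unchanged.
    print(t, "->", encode_term(t))
```

Because only the URL copy is encoded, the original term stays intact, so the "no results" processing can keep referring back to it without any ASCII stripping.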
