This repository has been archived by the owner on Jun 2, 2020. It is now read-only.

Chinese chars (for example) break lucille #2

Open
gavinc opened this issue Oct 1, 2013 · 1 comment

Comments

@gavinc

gavinc commented Oct 1, 2013

/giphy 最高
(最高 is Chinese for "highest" or "the best".)

This will result in:
Traceback (most recent call last):
  File "lucille.py", line 98, in <module>
    encoded_t = urllib.quote_plus(t)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1257, in quote_plus
    return quote(s, safe)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1250, in quote
    return ''.join(map(quoter, s))
KeyError: u'\u8c22'

Running on OS X 10.8
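The KeyError comes from how Python 2.7's urllib.quote works: it builds a lookup table with one entry for each of the 256 single-byte characters and runs every character of the input through that table, so a unicode character such as u'\u8c22' has no entry. The Python 3 sketch below is a simplified, hypothetical re-creation of that lookup (SAFE_MAP, py27_style_quote and py27_style_quote_utf8 are illustrative names, not stdlib code), together with the usual remedy of encoding to UTF-8 bytes before the lookup.

```python
import string

# Simplified re-creation (hypothetical, not the stdlib source) of the table that
# Python 2.7's urllib.quote uses: exactly one entry per single-byte value 0-255.
ALWAYS_SAFE = string.ascii_letters + string.digits + "_.-"
SAFE_MAP = {
    chr(i): chr(i) if chr(i) in ALWAYS_SAFE else "%{:02X}".format(i)
    for i in range(256)
}


def py27_style_quote(s):
    # Looks up each character directly, like ''.join(map(quoter, s)) in the
    # traceback; a character above U+00FF has no entry and raises KeyError.
    return "".join(map(SAFE_MAP.__getitem__, s))


def py27_style_quote_utf8(s):
    # Encoding to UTF-8 first turns every character into byte values 0-255,
    # each of which has a table entry, so the lookup always succeeds.
    return "".join(SAFE_MAP[chr(b)] for b in s.encode("utf-8"))


try:
    py27_style_quote("\u8c22")
except KeyError as err:
    # ascii() keeps the output printable on any terminal: '\u8c22'
    print("KeyError:", ascii(err.args[0]))

print(py27_style_quote_utf8("\u8c22"))  # %E8%B0%A2
```

The same idea applies to the real function in Python 2.7: quoting a UTF-8 byte string instead of a unicode object avoids the missing-key lookup.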

@gavinc
Author

gavinc commented Oct 2, 2013

I did a brutal hack (one there's no way in heck I'm going to suggest you pull) to fix this issue for me. Note that even though a URL-encoded version of each "t in terms" is being made, I believe urllib's quoting can't handle the 'extended' (non-ASCII) characters in a unicode string like the one in this example.

My 'solution' was to restrict each "t in terms" to ASCII only, dropping or replacing everything else (via t.encode('ascii', 'ignore'), t.encode('ascii', 'replace'), and t.encode('ascii', 'xmlcharrefreplace'), in brute-force "I really shouldn't be spending time on this" mode). I had to act on each "t in terms" itself, rather than only on its encoded copy, because the "no results" processing refers back to the original terms, which weren't encoded.
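For reference, here is what each of those ASCII error handlers does to the term from the report. This is a Python 3 sketch (the same three handlers exist for unicode.encode in Python 2.7), and it shows why the hack loses the query rather than translating it.

```python
term = "最高"  # the search term from the report

# The three ASCII error handlers from the hack, applied to the same term.
ignored = term.encode("ascii", "ignore")              # drops each non-ASCII char
replaced = term.encode("ascii", "replace")            # writes "?" for each one
xml_refs = term.encode("ascii", "xmlcharrefreplace")  # writes &#NNNN; references

for handler, result in [("ignore", ignored), ("replace", replaced),
                        ("xmlcharrefreplace", xml_refs)]:
    print(handler, result)
```

With this input, 'ignore' gives b'' (an empty search), 'replace' gives b'??', and 'xmlcharrefreplace' gives b'&#26368;&#39640;', so the encoded query would no longer contain the original characters.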

So, apologies to non-English speakers. You probably want to fix this the "right" way :)
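One common way to fix this "right" is to percent-encode the UTF-8 bytes of each term for the URL while keeping the original unicode term for anything shown back to the user, such as the "no results" message. In Python 2.7 that amounts to urllib.quote_plus(t.encode('utf-8')); the minimal sketch below uses Python 3, where urllib.parse.quote_plus does the UTF-8 encoding itself. encode_terms and no_results_message are hypothetical names, not functions from lucille.py.

```python
from urllib.parse import quote_plus


def encode_terms(terms):
    # Hypothetical helper: pair each original term with a URL-safe copy.
    # For str input, quote_plus encodes to UTF-8 before percent-encoding and
    # turns spaces into "+", so no characters are dropped or replaced.
    return [(term, quote_plus(term)) for term in terms]


def no_results_message(term):
    # Hypothetical helper: build the "no results" text from the original term,
    # so non-ASCII input reaches the user unchanged.
    return "No results for: " + term
```

For example, encode_terms(["最高", "cat videos"]) returns [("最高", "%E6%9C%80%E9%AB%98"), ("cat videos", "cat+videos")]: the encoded half goes into the request URL and the original half into any message, so the terms never need to be restricted to ASCII.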
