-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BIG-3884 - Use PSL support from url-cpp. #45
Conversation
I prefer option 1 where we have a (replaceable) Singleton PSL in the Url package. Otherwise I have to think too often about the dependence of parse results on the particular PSL being used. I'd prefer that hidden somewhere under the covers. |
install_requires = [ | ||
'publicsuffix' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep an empty install_requires list, or lose the while thing until we need it again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to keep it.
We are just naming the PSL by date. Do we need to be more specific about its source or particular version on that date (does this thing change more than daily)? |
011c63a
to
eaced36
Compare
To add some substantiation to the claim I made about speed: import timeit
setup = '''
import url as URL
urls = map(URL.parse, [
'http://moz.com',
'http://amazon.co.uk',
'http://a.b.domain.biz',
'http://www.test.ac.jp',
'http://xn--85x722f.xn--55qx5d.cn'
])'''
stmt = '''
for url in urls:
url.pld
'''
timeit.repeat(stmt, setup, number=10000) The results:
|
I too vote for Option 1, but I'd rather you pass a PSL object rather than a path to a string so it let's you mock out the PSL lookup more easily. (eg. |
('http://foo.გე' , 'foo.გე'), | ||
('http://bar.foo.გე' , 'foo.გე'), | ||
('http://foo.xn--node' , 'foo.xn--node'), | ||
('http://bar.foo.xn--node', 'foo.xn--node') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like any of your tests check for a non-trivial TLD like .co.uk
.
b201778
to
4d36209
Compare
I ended up adding support for option 1, though not as Brandon suggested. Currently it accepts a string, and creates the corresponding |
Can we make sure to create a ticket or issue, so that idea doesn't get lost? |
|
||
# Grab it from the PSL site | ||
import requests | ||
url.set_psl(requests.get('https://publicsuffix.org/list/public_suffix_list.dat')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the return value of requests.get
string-y enough to work with PSL.fromString
, or should this use requests.get(...).text
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops! I meant to tack content
on there.
47b7c8c
to
aeede32
Compare
Sorry, I may have merged this a little prematurely. Let me know if there's any additional feedback, and I'll happily apply it retroactively. |
This adds support for:
punycode
andunpunycode
formsRelates to #39 and #29
I'd like to figure out a nice way to allow clients to provide their own lists. Any opinions on which is the preferred interface?
@b4hand @lindseyreno @neilmb