The PSL is provided in a format that no common parsers exist for. Every consumer needs to parse the list themselves, either via a library or by wrangling the strings directly. At best, this means each language/ecosystem will need to re-implement a PSL parser; at worst, it means that any code that wants to use the PSL has to implement a parser itself.
However, experience has shown that multiple implementations of parsers for bespoke human-friendly formats means multiple low-quality parsers with random infelicities and bugs. (Cf. HTTP/1.1.) Further, it's O(N) in the number of package ecosystems, giving each a chance to introduce new and exciting bugs. This is especially important as eTLD determination can be security critical.
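To make the hazard concrete, here is a minimal sketch (my own illustration, not taken from any existing library) of just the line-level handling the PSL spec requires every implementation to get right: comments, blank lines, whitespace-terminated rules, exception rules, and wildcard labels.

```javascript
// Sketch of the bare-minimum line handling every PSL consumer must
// re-derive from the spec. Function and field names here are mine.
function parsePslRules(text) {
  const rules = [];
  for (const raw of text.split("\n")) {
    // Per the spec, only the text up to the first whitespace is the rule.
    const line = raw.trim().split(/\s+/)[0];
    if (line === "" || line.startsWith("//")) continue; // blank lines and comments
    rules.push({
      isException: line.startsWith("!"),                   // e.g. "!city.kobe.jp"
      labels: line.replace(/^!/, "").split(".").reverse(), // TLD-first, for matching
    });
  }
  return rules;
}
```

And this sketch stops well short of the hard part: the longest-match and exception-precedence logic used for actual eTLD determination, which is exactly where subtle bugs become security bugs.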
As a case study, the JavaScript ecosystem doesn't seem to presently have a general-purpose parser (i.e., one that doesn't come with preconceptions about how you're going to use the list), and of the four PSL libraries I could find, they're all limited and sometimes buggy:
I'm not confident in any of their testing regimes.
And, on top of that, none of them make it convenient to get just the list in a structured format, meaning that, if you're doing anything beyond simple eTLD checking, you need to roll your own (probably buggy) parser.
If the JS ecosystem, with its size and Web focus, doesn't have a robust, general-purpose library for PSL parsing, I don't hold out much hope for smaller ecosystems.
In contrast, if a structured format were provided by publicsuffix.org, no one would need to implement their own parser: there would just be one, hopefully high-quality, parser. It goes from O(N) to O(1); less code gets written and there's less chance for random bugs to bite people.
My concrete strawman (if that's not a contradiction in terms) is to provide a JSON file that looks like this:
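For concreteness, a hypothetical shape such a file might take (the field names below are my illustration, not a concrete proposal from this thread; the ICANN/private distinction mirrors the sections already present in the list):

```json
{
  "icann": {
    "rules": ["com", "jp", "*.jp", "!city.kobe.jp"]
  },
  "private": {
    "rules": ["github.io", "blogspot.com"]
  }
}
```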
Every ecosystem has a JSON parser, and this format should be ready to use for any purpose without any further parsing. The less string wrangling that has to be done, the more reliable software using the PSL will be. If the string wrangling can be centralised and done once instead of many times, I think it would be a big win for the PSL's users.
Providing JSON was previously raised as #445 and closed as WONTFIX, but that issue doesn't actually provide a rationale for wanting JSON. I hope that this issue remedies that in a convincing way.
Conversely, the JS ecosystem may not have a robust set of parsers precisely because the use cases of JS do not align with the use cases of the PSL that much.
I don’t believe there’s any desire to maintain multiple versions of the PSL, so the question is whether we require all existing consumers to change in order to accommodate new consumers. I don’t think we have any plans to introduce that churn at this time, but at least that more clearly explains the cost calculus.
Given that the new format has just as much chance of introducing semantic bugs (if not more), it doesn’t seem compelling from a security point of view.
Note that many consumers already post-process the list into a form that is well suited to their use case. For example, Mozilla generates a hash table, Chromium and Opera generate a DAFSA, and several languages generate tries. It’s not unreasonable to generate a JSON representation for your JS consumption as part of a list-processing step.
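The post-processing step described above can be sketched as a small build script that converts the raw list to JSON once, ahead of time, so that runtime code only ever touches structured data (file names and the output shape here are illustrative assumptions, not anything from this thread):

```javascript
// Sketch of a one-time build step: raw PSL text in, JSON out.
// The output shape ({ rules: [...] }) is my own choice for illustration.
function pslToJson(text) {
  const rules = text
    .split("\n")
    .map((line) => line.trim().split(/\s+/)[0])               // rule ends at first whitespace
    .filter((line) => line !== "" && !line.startsWith("//")); // drop blanks and comments
  return JSON.stringify({ rules }, null, 2);
}

// Hypothetical usage in a build script:
// const fs = require("fs");
// fs.writeFileSync("psl.json",
//   pslToJson(fs.readFileSync("public_suffix_list.dat", "utf8")));
```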