No handling for encoded URLs #46
The archiver extension in CKAN appears to be unintentionally double percent-encoding URLs that are already percent encoded. For instance, a URL path like According to RFC 3986:
This means that your suggestion of always decoding incoming URLs does not comply with the RFC. Instead, the percent character ("%") should be used as an indicator of whether decoding needs to be performed. The discussion in issue #91 is also worth reading for additional context and potential solutions to this problem.
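To illustrate the "use % as an indicator" idea, here is a minimal Python sketch. The helper name `decode_if_encoded` and the regular expression are assumptions made for illustration only and are not taken from the archiver's code. It decodes a path segment once, and only when the segment contains a well-formed percent-escape. Note that this heuristic cannot tell a deliberately encoded literal "%" (that is, "%25") apart from a double-encoded value, so it is a starting point rather than a complete fix.

```python
import re
from urllib.parse import unquote

# A well-formed percent-escape: "%" followed by exactly two hex digits.
_PCT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def decode_if_encoded(segment: str) -> str:
    """Hypothetical helper: decode a path segment once, but only if it
    actually contains a percent-escape."""
    if _PCT_ESCAPE.search(segment):
        # unquote() does not turn "+" into a space; unquote_plus() would.
        return unquote(segment)
    # No escapes found, so treat the segment as already decoded.
    return segment


print(decode_if_encoded("kwarta%C5%82+2016"))
print(decode_if_encoded("kwartał+2016"))
```

Both calls should produce the same decoded name, which is the property that keeps a later encoding step from escaping the "%" a second time.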
I have run into this issue here: https://danepubliczne.gov.pl/dataset/informacja-kwartalna-o-stanie-finansow-publicznych/resource/86454cff-556a-4162-aa65-433158c133f4
Basically, the provider has linked the external resource as:
http://www.mf.gov.pl/documents/764034/1002163/Informacja+kwartalna++III+kwarta%C5%82+2016+r.
To make it clearer, let's assume the filename is
kwarta%C5%82+2016
This file is saved to disk as is, meaning
kwarta%C5%82+2016
It is then served by Apache, which escapes the percent signs:
kwarta%25C5%2582+2016
while CKAN links to the archived version using the name from the original URL:
kwarta%C5%82+2016
That leads to a 404 error on the archived link.
I think we should decode any incoming URLs (below) or erase all encoded chars. What do you think?
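The mismatch described above can be reproduced with Python's standard `urllib.parse` module. This is only a sketch of the encoding behaviour, not the archiver's or Apache's actual code; the variable names are made up for illustration, and keeping "+" unescaped via `safe="+"` is an assumption chosen to match the names shown in this report.

```python
from urllib.parse import quote, unquote

# Filename segment exactly as it appears in the source URL.
original = "kwarta%C5%82+2016"

# Saved verbatim, the name on disk contains literal "%" characters, so the
# URL that reaches that file must escape each "%" as "%25".
url_for_verbatim_file = quote(original, safe="+")

# Decoded once before saving, the name on disk holds the real characters,
# and encoding it again gives back the link CKAN already publishes.
decoded_name = unquote(original)
url_for_decoded_file = quote(decoded_name, safe="+")

print(url_for_verbatim_file)  # kwarta%25C5%2582+2016
print(decoded_name)           # kwartał+2016
print(url_for_decoded_file)   # kwarta%C5%82+2016
```

With the verbatim name, the published link and the reachable URL differ, which is the 404 case. With the decoded name they coincide.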