Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Addressable gem to handle URIs #174

Merged
merged 10 commits into from
Nov 25, 2014
Merged

Use Addressable gem to handle URIs #174

merged 10 commits into from
Nov 25, 2014

Conversation

RST-J
Copy link
Contributor

@RST-J RST-J commented Nov 2, 2014

This would cover #173
It adds support for Asian URLs.
When I changed the validation for the URI format to Addressable, I thought that the value should not be escaped (it was introduced to establish compatibility with Asian URIs by eca92af). The value either is parsable as URI as is or not and in the latter case no valid URI.
What do you think?

@@ -3,6 +3,7 @@ source "https://rubygems.org"
gemspec

gem "json", :platforms => :mri_18
gem "addressable", '~> 2.3'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add this to the gemspec instead, so that the dependency is picked up when users gem install json-schema.

@pd
Copy link
Contributor

pd commented Nov 2, 2014

Confirmed that this fixes #100 (or, at least, I can run all tests from a dir json schema).

I'm not sure if the chomp code introduced in #109 still needs to be around after this. Maybe! As @iainbeeston mentioned, that might've been entirely an artifact of the URI::File class we had.

For escaping URIs, addressable happily parses them with spaces, but open-uri still can't deal with it unless escaped:

require 'addressable/uri'
require 'open-uri'

unescaped = Addressable::URI.parse('http://foo.com/spaces in/the path#/not/fragment')
escaped = Addressable::URI.parse(Addressable::URI.escape('http://foo.com/spaces in/the path#/not/fragment'))

open(unescaped) #=> URI::InvalidURIError
open(escaped) #=> OpenURI::HTTPError: 404 Not Found

@RST-J
Copy link
Contributor Author

RST-J commented Nov 2, 2014

For the { "$ref": "/etc/passwd" } thing, I just tested that and it in deed does as you guessed.
And I do not see a reason to allow absolute file-URIs (also not file:///etc/passwd) to be referenced. If at all this should be something a user can activate via a security option after being informed about the dragons.

For the chomp thing I'll check and remove if its superfluous now.

@RST-J
Copy link
Contributor Author

RST-J commented Nov 2, 2014

Loading local files

In the last commit I tried to forbid using local file system references for security reasons. As @pd already said, this would break backwards compatibility (a lot of the reference tests actually rely on loading files from the disk).

I'm not sure how to interpret section 5.27. id from draft-03 in the context of a root schema.
What does

If this schema is not contained in
any parent schema, the current URI of the parent schema is held to be
the URI under which this schema was addressed. If id is missing, the
current URI of a schema is defined to be that of the parent schema.

mean for e.g. this schema, what is the id/URI for the following schema:

    {
      "$schema" => "http://json-schema.org/draft-03/schema#",
      "type" => "object",
      "properties" => {
        "b" => {
          "required" => true
        }
      }
    }

Currently, the best I can think of to not lose backwards compatibility, would be an option that allows loading relative references from disk. This option unfortunately needs to default to true then.
Forbidding absolute paths is not the solution I guess.
A solution could be that if the user wants to allow loading schemas from file then he can specify one or more directories from where (including sub folders) schemas may be loaded.
As this implies again more changes I'd suggest to move usage Addressable to v3.

@pd
Copy link
Contributor

pd commented Nov 3, 2014

I don't think this means we need to hold off on addressable until a 3.x release; it definitely solves problems today, means we could drop the URI::File code, etc. We just shouldn't break filesystem lookups (yet!). The issue with absolute paths already exists on master: JSON::Validator.validate({"$ref" => "/etc/passwd"}, {}).

If you force pushed this branch without 463f0e3, I think it's pretty much good to go.

A few days back I left a comment in #152 with my ideas for how to deal with preventing arbitrary filesystem lookups, accessing remote URIs without meaning to, etc etc. I haven't had a chance to spike a version of that yet, but I think we can get it into 2.x, then swap to the more restrictive default in 3.x.

@@ -7,8 +7,8 @@ def self.validate(current_schema, data, fragments, processor, validator, options
if data.is_a?(String)
error_message = "The property '#{build_fragment(fragments)}' must be a valid URI"
begin
URI.parse(URI.escape(data))
rescue URI::InvalidURIError
Addressable::URI.parse(Addressable::URI.escape(data))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you already explained this in the PR description (I didn't quite follow everything) but do we still need to escape the uri here now we're using addressable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the addressable readme, I suspect we need to parse urls from strings, and then normalise them (probably immediately before calling open on them)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first comment: We "need" the escape call here, because only 'parse' won't raise the expected expception for line 278 in draft4-tests. I'd also suspect it to not be actually necessary.

Addressable::URI.parse(data)

has the following output

(byebug) Addressable::URI.parse(data)
Addressable::URI::InvalidURIError Exception: Cannot assemble URI string with ambiguous path: '::boo'
(byebug) Addressable::URI.parse(data).class
Addressable::URI
(byebug) Addressable::URI.parse(data).to_s
Addressable::URI::InvalidURIError Exception: Cannot assemble URI string with ambiguous path: '::boo'
nil

You see that there is an exception somewhere, at least the text, but its not thrown unless e.g. to_s is called. I should probably file an issue on this as I'd expect an exception to be thrown by parse right away.

I'll go for to_s for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second: It depends I guess. File uris are a problem in general so far. open does not like the file:// scheme nor escaped paths. I agree for the parts that are actual URIs but we don't now that before e.g. nested refs have been resolved against their parent scheme's URI.
So I wouldn't do much about this until we decided how to handle files.

@iainbeeston
Copy link
Contributor

As @pd said I'd love to see if the chomp from #109 could be removed in this branch.

I also suspect that we should be using normalise (rather than unescape) with addressable... (Might need more investigation)

uri = URI.parse("file://#{Dir.pwd}/#{data}")
end
uri = Addressable::URI.parse(data)
# Check for absolute path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressable has a convert_path method that converts relative paths to file URIs :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't help us much, because open can't handle them. However, when we have a custom loader, I think its good when the scheme is properly set.
I wonder why there is no FileURI neither in the URI nor in Addressable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh what? That was half of my reasoning for using addressable. My bad. Then perhaps the URI::File class should be kept after all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I side-stepped this in #175 by avoiding the open(foo).read pattern for URIs with nil or file schemes. We shouldn't be using that on files anyhow, since it will leave dangling file descriptors open. eg:

$ pry
[1] pry(main)> $PROCESS_ID
=> 55886

[2] pry(main)> open('/etc/passwd').read
=> [lol snip]

# From another terminal, or after ctrl-z, whatever:
$ lsof -p 55886
ruby    55886 pd    8r   REG                1,3      5253   515836 /private/etc/passwd

Definitely absurd that open(Addressable::URI.convert_path('/etc/passwd')) fails yet open(Addressable::URI.parse('/etc/passwd')) works fine, though =)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also think we should keep URI::File.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What worries me, is that if I google other implementations of URI::File, I can't find any. In fact, only this gem comes up, which makes me wonder why nobody else is doing this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a feature reques from a year ago and one comment mentions the URI::File solution.

I also wonder why this topic seems to have few/no attention, but I think URI::File is a better solution than writing a custom wrapper around open (i.e. its the better wrapper).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree; URI::File is the wrong direction to go in. We only use it so that we can keep using open-uri, but we shouldn't be using open-uri to open local files -- there are many more obvious means of doing that (eg, File.read, Pathname#read, etc etc).

We can assign schema URIs as file:///tmp/ohai.json, but when we read that file, it should be using File.read(uri.path) or something similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So going back to my original question (if we're not using file uris) should we use convert path here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should merge this; my Loader PR has additional changes to more aggressively detect files vs remote URIs, so I think it can be taken care of there.

@@ -0,0 +1,3 @@
{
"name" : "john"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests actually hit this file yet, unless I'm overlooking something.

@RST-J
Copy link
Contributor Author

RST-J commented Nov 7, 2014

I updated the PR to work along with #177
Just wondering whether we still can open files with spaces in names and whether we did not reintroduce the problem with hash tags in file names on Windows.

@iainbeeston
Copy link
Contributor

If we can simplify the code, that would be great. Maybe try removing any workarounds and then see if anyone can test this branch on windows?=

@pd
Copy link
Contributor

pd commented Nov 8, 2014

I can test this on a windows box some time this weekend.

@pd
Copy link
Contributor

pd commented Nov 14, 2014

Aforementioned windows box apparently runs ubuntu now so uhhh, scratch that plan. I have no further access to a windows machine to try this out on. =\

@RST-J
Copy link
Contributor Author

RST-J commented Nov 22, 2014

Any more concerns here?

rescue URI::InvalidURIError
error_message = "The property '#{build_fragment(fragments)}' must be a valid URI"
# TODO
# Addressable only throws an exception on to_s for invalid URI strings, although it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to fix this todo before merging? If not could the comment link to a github issue in addressable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any upstream reports of this. I'm not sure exactly what to report if I were to open an issue. It's partly an artifact of our test data:

Addressable::URI.parse('http:')
#=> raises Addressable::URI::InvalidURIError

Addressable::URI.parse('::boo')
#=> returns a URI

Addressable::URI.parse('::boo').to_s
#=> raises Addressable::URI::InvalidURIError: Cannot assemble URI string with ambiguous path: '::boo'

addressable has a lot of tests that assert it can raise on .parse() alone, and indeed in most cases it does. I can't look up a URI RFC atm, but is ::boo somehow valid? Is it ambiguous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using addressable 2.3.6 I get the following results:

a = Addressable::URI.parse('http:')
# raises Addressable::URI::InvalidURIError: Absolute URI missing hierarchical segment: 'http:'
# a == nil

b = Addressable::URI.parse('::boo')
# raises nothing but returns Addressable::URI::InvalidURIError
# hence, b holds a Addressable::URI::InvalidURIError

I'd expect the second statement to also raise an exception but in no case to return one.
I'll go open an issue in addressable (sporkmonger/addressable#177).

So for now I think we should leave this as is.

@iainbeeston
Copy link
Contributor

I'd like to see this merged, once all concerns from the comments are sorted out. I think that if we can 1. Get rid of some hacky fixes we have for unusual urls (because of limitations in URI), and 2. Get better url parsing, then this is the way to go

@pd
Copy link
Contributor

pd commented Nov 22, 2014

@RST-J -- you'll need to rebase to deal with merge conflicts.

Just got ruby+git+etc booted on a windows VM and the test suite is green, including from dirs with spaces in the path. Really wish we had some way to run tests on windows that wasn't so insane =)

pd added a commit that referenced this pull request Nov 25, 2014
Use Addressable gem to handle URIs
@pd pd merged commit 7f4682b into voxpupuli:master Nov 25, 2014
iainbeeston added a commit to iainbeeston/json-schema that referenced this pull request Feb 6, 2015
To ensure that invalid uris of the form "::http" throw exceptions rather
than failing silently. This aught to make uri parsing more robust

See sporkmonger/addressable#177 and
voxpupuli#174
iainbeeston added a commit to iainbeeston/json-schema that referenced this pull request Feb 6, 2015
To ensure that invalid uris of the form "::http" throw exceptions rather
than failing silently. This aught to make uri parsing more robust

See sporkmonger/addressable#177 and
voxpupuli#174
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants