Supporting multiple API versions #2353
This setting is going to be removed, as part of supporting multiple api versions. The 0.6 is hardcoded elsewhere, e.g. in the routes, and so can be hardcoded here for now too.
I haven't looked through all the details yet, so please bear with me. I was wondering a bit how you would handle evolutionary or even revolutionary changes to the data model. Typically this is one of the most troubling and complicated aspects of new API versions, and I think it would be good to have some basic concept in place as well here. |
I can sum this up with the phrase "it depends"!
But it all really depends. I think the underlying question might be "how do we handle drastically non-backwards-compatible changes, like converting all closed ways into an area type (or if we introduce an area type into API N+1, how does that work with API N clients); or how would a change like removing segments work with two API versions running in parallel?" I don't have an answer for these major structural changes, except to say that if it's logically possible at all to access the data between versions, the code approach in this PR will be able to handle it. And perhaps there'll be some change to the data structure that prevents parallel API versions and it'll trigger a hard cutover between versions. But let's not get stuck on the biggest problem, and one that's not yet in hand. In the meantime, there's a ton of backwards-compatible but needs-API-bump changes that have been stuck for years, so we can at least sort out those ones 😄
I should have said, if there's specific API changes that you're thinking about, even if they are just to illustrate a point, let me know and I'll see how they fit. |
Likely making myself very unpopular:
Find your favourite API-0.7 wishlist, ignore the stuff about areas, and there's your list. 😄 Different people have created different lists over the years. I'm not intending to implement many changes, just the ones that I've been personally complaining about since API 0.6 was released over a decade ago (like incorrect http status codes, and plain-text responses, and stuff like that).
I know that you want them first, but that doesn't mean that they need to be done first. In particular, if the GDPR-related changes can be made without changing the API version then they can be implemented in parallel to this work, and so they are not interdependent. If they need to break API compatibility, then this work will need to come first anyway. So either way, they don't need to come first. |
Just off the top of my head, we'd see some immediate performance improvement on the iD side from:
(these require some coordination from the CGImap side too, but I don't think that's a blocker) |
This branch is already big enough
This ensures that raw XML links point to the latest available version
This could be reworked with some meta programming to get the latest API version in future.
This is because the changesets api is now multi-version
I've updated this PR so that the test suite passes, and it's now ready for review. Currently only a few routes have been adapted for multiple API version support, namely:
I intend to work on the rest of the routes in subsequent PRs. The settings in this PR ensure that only 0.6 is deployed by default, so it can be merged without any side effects. |
Could we get it on the sandbox first? |
First? Do you mean before code review? Or if you want it before merging, to what end? So far this PR is just internal code refactoring, there's no changes to the API (other than dropping one line from the api/0.7/capabilities response) and even if this is merged there will still be no changes since 0.7 is disabled. So we can ask Tom to set up a sandbox "first", but I'm not sure what you would want to do with it? Of course, it'll be worth having a sandbox available later on, but I don't think it's worthwhile effort at this stage. If you still have concerns, let me know what I can do to help. |
Before deployment, which in our case implies before merger.
Famous last words. In reality there are always things that might break, for example as when the authorisation refactoring was deployed. Being able to test against a deployment, while not a panacea, at least gives us a fighting chance to ferret out any assumptions that no longer hold true and so on. |
That's the bit I least like about the current multiple API version idea: it seems to focus on something I would describe as cosmetic changes only. That's ok, except for it only adds work to consumers of the API while offering no real value to them at all. Since they already have a working API integration, they would need to spend extra development time and effort to get back to a status quo. Isn't there anything more compelling to do that would give API consumers more of an incentive to move to the latest and greatest version? Unrelated question: do we want to support API clients using both 0.6 and 0.7 at the same time to support a gradual transition phase? |
existing consumers of the API. Some of these headaches and quirks need to be solved by every API consumer, and the total number of future API consumers yet to be written vastly exceeds the ones written so far. So the sooner we fix them, the better, and it's a shame they've been known about and unfixed for so long already.
Maybe we will want to put some of the more compelling things in v0.8, or v0.9, or something. But I'm determined to avoid getting into the same situation as has happened over, and over, and over again, where the scope of v0.7 expands inexorably until it collapses under its own weight! I'd rather break up the logjam and work on smaller, more frequent improvements (e.g. every 18-36 months, rather than 10+ years and counting) to the API. And I'd rather keep the API changes small so that a developer can upgrade their app with a small amount of code changes that's feasible in a weekend of hacking, rather than some significant API upgrade that puts them off doing any of it and leaves them stuck on v0.6. And making life easier for the developers is the whole point of adding multiple version support, so that they can upgrade when it suits them best. Anyway, let's talk about this further in a future issue, since we're burying the point of this PR ("is this the best code approach to multiple version support? Can you see a better way of coding the tests?") in wider discussions. |
@gravitystorm you might want to consider making a statement as to versioning of the API (for example if it will follow semver semantics going forward), or will we have to continue to assume that every version change is breaking as it is now (which is the real reason why the version is stuck at 0.6). PS: you probably have to consider decoupling data model version from the API version too, as changing the versions used for the former will likely break every tool out there. |
@@ -7,27 +7,54 @@ class CapabilitiesControllerTest < ActionController::TestCase
   def test_routes
     assert_routing(
       { :path => "/api/capabilities", :method => :get },
-      { :controller => "api/capabilities", :action => "show" }
+      { :controller => "api/v06/capabilities", :action => "show", :api_version => "0.6" }
Say we want to use semantic versioning like 1.2.0, how would this be reflected specifically in directory names like v06?
@mmd-osm a reasonable assumption IMHO would be to only actually have different "directory" names for major versions (aka breaking changes), the client can then determine from the capabilities which minor version is actually supported and from that determine which backwards compatible features are present.
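As a minimal sketch of the check a client could make under that assumption (major version in the URL, minor version read from capabilities): the helper name `compatible?` and the version strings are hypothetical and not part of this PR; `Gem::Version` is the version class bundled with Ruby.

```ruby
require "rubygems" # Gem::Version ships with Ruby

# Hypothetical helper: true when the version advertised by the server has the
# same major number as the client requires and is at least as new.
def compatible?(server_version, required_version)
  server = Gem::Version.new(server_version)
  required = Gem::Version.new(required_version)
  server.segments.first == required.segments.first && server >= required
end
```

A client written against 1.2.0 would accept a server advertising 1.4.0, but reject 1.1.0 (too old) and 2.0.0 (different major version).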
@@ -10,7 +10,7 @@ OSM = {
   MAX_REQUEST_AREA: <%= Settings.max_request_area.to_json %>,
   SERVER_PROTOCOL: <%= Settings.server_protocol.to_json %>,
   SERVER_URL: <%= Settings.server_url.to_json %>,
-  API_VERSION: <%= Settings.api_version.to_json %>,
+  API_VERSION: <%= Settings.api_versions.min_by(&:to_f).to_json %>,
Why is this value set to "min_by", and what are the implications of it? Does &:to_f play nice with semver (e.g. 1.2.0)?
It's mainly just min_by to keep everything working for now, since it will pick 0.6 unless the site operator chooses to only deploy 0.7.
It's used for the bits of the website that talk to the API, like notes and changeset comments. Since there's no changes yet, it's more of a "pick either" situation. .max_by would work fine too.
On a wider point, I'd rather work on refactoring those bits of the site to just be regular webpages like diary entry comments, but that's a different project!
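On the `&:to_f` question raised in this thread, a small sketch of how `String#to_f` behaves: it works for two-part versions like "0.6" and "0.7", but would misorder three-part versions once a component reaches two digits. The method names and the semver strings are illustrative assumptions, not code from this PR; `Gem::Version` is shown as one alternative that compares component by component.

```ruby
require "rubygems" # Gem::Version ships with Ruby

# The approach in the diff: compare versions as floats.
def lowest_by_float(versions)
  versions.min_by(&:to_f)
end

def highest_by_float(versions)
  versions.max_by(&:to_f)
end

# Illustrative alternative: compare component by component.
def highest_by_version(versions)
  versions.max_by { |v| Gem::Version.new(v) }
end

puts lowest_by_float(["0.6", "0.7"])          # fine for today's versions
puts highest_by_float(["1.2.0", "1.10.0"])    # "1.10.0".to_f is 1.1, so 1.2.0 wins
puts highest_by_version(["1.2.0", "1.10.0"])  # component-wise comparison
```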
# Our api versions are decimals, but controllers cannot start with a number
# or contain punctuation
def v_string(version)
  "v#{version.delete('.')}"
Thinking of semver again, would it be better to replace "." by "_" maybe, so it's v0_7 and v1_2? How about patch version number? Ignore or include?
As discussed in the comments, semver would only use major version numbers here. So e.g. v7 or v124. But it's a great point to raise, thank you.
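To make the three naming options in this thread concrete, here is a short sketch. `v_string` mirrors the helper in the diff; `v_string_underscore` and `v_string_major` are hypothetical names for the two alternatives discussed, and none of the semver inputs are real API versions.

```ruby
# As in the diff: drop the dots.
def v_string(version)
  "v#{version.delete('.')}"
end

# Reviewer's suggestion (hypothetical name): keep the separators as underscores.
def v_string_underscore(version)
  "v#{version.tr('.', '_')}"
end

# Major-version-only naming (hypothetical name), as in the reply above.
def v_string_major(version)
  "v#{version.split('.').first}"
end

puts v_string("0.6")              # v06
puts v_string_underscore("0.7")   # v0_7
puts v_string_major("124.3.1")    # v124
```

Dropping the dots is unambiguous for versions like 0.6 and 0.7, but with multi-part versions two different inputs can collapse to the same name: both "1.2.0" and "12.0" become `v120`. The underscore and major-only variants avoid that collision.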
# simple diff to create a node way and relation using placeholders
diff = <<CHANGESET.strip_heredoc
<osmChange>
At one point we'd also need to include a version number in <osmChange> (maybe not now)
Yeah, I noticed that during the refactoring, but let's leave that for now since we're not planning any changes to the osmchange format (afaik).
As I pointed out above, we return the data model version in all kinds of places, not just in OSC format. That needs to be decoupled from the API version.
So, trying (desperately) to drag the conversation back to the contents of this PR - does anyone have any comments on handling the indentation of the tests? I really dislike PRs that combine widespread indentation with additional changes to the code, since I find it very hard to spot where the real changes are and which lines are just indented with no other change. I've been considering whether I should split out the indentation into a separate PR, with a dummy

    def indent
      yield
    end

and then make a PR with the methods indented but with no other changes

     def test_something
    -  code
    -  code
    -  post :foo
    -  code
    +  indent
    +    code
    +    code
    +    post :foo
    +    code
    +  end
     end

and then in the second PR it will be more obvious which of the lines have real changes.

     def test_other_thing
    -  indent
    +  all_api_versions.each do |version|
         code
         code
    -    post :foo
    +    post :foo, params => { etc }
         code
       end
     end

Or am I worrying about this too much, and the PR is fine the way it is now?
@gravitystorm If it helps, one of the options in the GitHub pull request view is to ignore whitespace changes. It's in one of the hamburger menus when viewing the diffs. |
@gravitystorm you are essentially making an argument for versioning the representation too. Up to now the abstract underlying data representation has essentially been documented (if you so want) by the XML representation and has been in lock step with that. Now clearly we could change the representation in isolation, as your example suggests, without changing the API nor the underlying abstract data model and this would clearly require changing the API version too, because without that you would not know that you are getting a non-backwards compatible XML doc. But the ITU lies in the direction of doing and supporting that kind of change, so I hope we are not venturing there. But the other way around does not imply the same thing, for example all the changes that have been made to the API to date work just fine with 0.6 XML documents. And for example it would be -very- surprising if implementing #2348 and jacking up the API version would suddenly result in output that is not parseable by tools (at least those that do proper validation of their input) that would in principle work just fine if the gratuitous change to the "representation version" hadn't been made.
No, I'm explaining that the So I'll say it again, the
No, it hasn't. There's been lots of changes to the underlying data model in the last 10 years, none of which have had a change in API version number - because even though the data model has changed, the API request/responses have only changed in backwards compatible ways. For example, we added notes. That was a backwards-incompatible change to the data model (you can't fit notes into a database that doesn't have a table for them, for example) but a backwards-compatible change to the API, hence no version number change.
Nobody has suggested that adding a new backwards-compatible API call needs a new API version - except for you, when you wrote "Without it (this PR) we would have simply added them as a 0.6 feature". So I'm not going to rebut something of your own creation that you are now arguing against. If you have another example, that would be helpful.
I'm not proposing any "gratuitous" changes to the API version. The only reason I'm proposing to change the API version is to allow us to make backwards-incompatible changes that can't otherwise be made. The whole point of this PR is to allow us to run multiple API versions in parallel - it's even in the title! So any tools that can only parse one version can keep using that version. Gratuitous change would be to yank the old version when the new one is deployed, or to bump version numbers for backwards-compatible changes, and I'm proposing neither of those. |
Thanks @iandees, that's really helpful. Took me a while to find it, even after you'd told me it was there somewhere! |
OK, you change the error responses to be returned in a structured format and not the current plain text. |
@tomhughes I'd love to hear your feedback on this PR - for example on the indentation question, or the overall approach for how to support multiple versions |
I still very much disagree with the overall approach to tightly couple the OSM XML header version number to the version number that is part of the URL. I found that this blog post raises a valid point here:
Let's be fair, changing the OSM XML version header from 0.6 to 0.7 would cause some breakage for no good reason, where in reality you're maybe only trying to change the HTTP response codes or produce some nice error messages. In both cases there's no valid reason at all to also change the OSM XML version number. I still haven't seen an answer to a point I raised earlier on: what the long term evolution of new version numbers should look like and, in particular, for how long old versions will be supported. Without a clear strategy in place, you would keep on adding more and more versions over time to accommodate API consumers unable or unwilling to move to a newer version. Today, they don't think about "sun-setting" a version, but in the future they will have to, and they need to be aware of that up front. Handling multiple versions in parallel will add some mental strain when working with the code, even when they share a large part of the same code base. Please keep that in mind so the additional complexity won't kill the code in the long run.
It'd have to depend on what is a reasonable time for clients to move to a new version, and how much dev work is involved in maintaining the old version. It might be necessary to make an API version read-only too.
But these are, and always have been, the same number. The number in the response is the API version number, full stop. There's no such thing as an "OSM XML header version number" as an independent concept as you are describing, one that provides a "format version" that could stay the same as the API behaviour and version number changes. Consider the "api/X/gpx/1" response. Every time the API version number changed, the number in the trace response changed, even when there's no other change in the XML. Therefore that number is the API version number, and not some independent "XML format version" which just happens to be 0.7 too. Anyway, this is not a new concept being introduced by this PR, so I'm going to try to avoid debating it further here. Perhaps there is a need for a "OSM XML format version" (or, as previously discussed, an "OSM data model version", which is a third distinct concept), but that would be separate from the API version currently contained in the responses.
Given that we haven't decided what the final list of features are in the next version, it's a bit premature to say that it's going to be for "no good reason"! One of the points of this PR is to introduce a mechanism that allows us to implement whichever changes we see fit, independently of when they are deployed, by using a "feature flag" concept. That way we can implement a bunch of different improvements, and it gives us flexibility to decide when enough things are implemented to make the deployment of 0.7 "worth it". Until that point, it all lives behind the Also, in reality few things are going to break when 0.7 is released, since they can keep working with 0.6. Again, one of the purposes of this PR is to allow us to run multiple versions side by side. So no clients will talk to 0.7 until the developers make them compatible with 0.7. For software that is consuming OSM data without interacting with the API (e.g. osm2pgsql), sure, some of them wouldn't understand today what to do with e.g. an osm file saved from a So we'll need to see at that point whether it's for "no good reason" or not, it's not something I can decide on before we've even started implementing anything.
As Paul says, "that depends". I'm not going to debate here how long to keep 0.6 running after 0.7 is released, since at this rate we're never going to see 0.7 in the first place!
Absolutely, that's a big factor I considered while implementing this PR. If you have any comments on the approach used in this PR, I'd like to hear them. I'm very much open to suggestions as to how to streamline having multiple versions in the codebase. I'm happy with the current approach but alternative suggestions are valuable. |
Just as a data point: we have 100'000s of files, if not millions, on planet.openstreetmap.org that have nothing to do with the API that reference a version 0.6, not even to mention the ubiquitous PBF format that currently has a "OsmSchema-V0.6" field. |
Yes, and there are files there that reference 0.5, 0.4, and even 0.3, so we've survived previous version changes. I don't know what your question might be, and I'm not going to guess.
I have to agree with others here. Changing the version number in the XML and PBF files will break a lot of software, including mine. This software is out there and, even if we change new versions now to accept a new version number, old versions of this software will be out there for years. There is absolutely no way we can break the compatibility if it is not absolutely necessary. @gravitystorm As you mentioned yourself, 0.6 has been around for a very long time, so comparing the situation to 0.5 and earlier versions doesn't make sense. When we changed from 0.5 to 0.6 OSM was much smaller. And writing an I think the only way forward here is to decouple API and file format versions. And I don't see why this is a problem really. Keep the We might also need to think not only about file versions and API versions but versions of the (abstract) data model behind it. I am not sure myself whether they are the same or what exactly their relationship is. BTW: We already have some variants of the XML file format (JOSM, Overpass, ...) and they are not really compatible so should have gotten some kind of identifier, but that's a totally different issue again.
I'd like to check what you're proposing here. For example, let's say in the next version of the API, the nodes/ways/relations output is the same, but the format of the notes output is different[1]. Are you proposing a) all endpoints should continue to output Secondly, what if we introduce a new OSM document in the API? Until now, we've just used the existing number. For example, if we add a <diary_entry> API[2]. What number would that then output? Do you think it would be best to a) start again from 0.1 for fresh OSM document types [1] This is a genuine plan of mine, to make notes have a description instead of a special first comment, so I'm not making a strawman
IMHO if you start considering stuff outside of just the core OSM data and API it becomes tricky because there has never been any consensus on carving up the API so that essentially independent parts can actually be run independent of each other (see @zerebubuth suggestions way back). This is particularly noticeable with the Notes API which is really a third party service bolted on to the existing data and user API (btw just to make things complicated notes use a different XML document for on disk storage than what is returned from the API). In any case if we really want to indulge in 2nd system syndrome, I would suggest separating at least the documents and potentially the APIs for core user data, osm data, notes and for any social media functionality. Starting with 0.6 for everything that exists right now. |
Unfortunately that's what I have to do - everything under What do you consider as the limits of "core OSM data"? I assume at least nodes, ways, relations, and the output of the map call (e.g. the element). Would you also include the users and changesets output? Both are referenced from a map response but only by id.
I haven't seen much discussion of that, if any. I know there were discussions in the past about running all of I don't think it would solve any of the problem at hand though. If we split the API up into multiple software projects, it could still be run transparently at
Since I think given that splitting things up doesn't solve the topic of what numbers to put in the responses, I'm going to avoid this "2nd system syndrome" and declare it out of scope for this PR. |
Yes, that concept sounds familiar, see my comment here: #2162 (comment) Quoting myself: Today, we have so many different API endpoints under the umbrella of the API 0.6 that don't really belong there. Examples could be changeset discussions, gps traces, user data, and probably also map notes. Any incompatible changes here don't really warrant a new OSM API version in a strict sense IMHO, and this further complicates overall API evolution. Maybe that's worth exploring in another issue. |
So it's clear that this PR has got stuck, and that I need to find some way to move this forward. I've been trying to figure out what that would be for the last few weeks. This PR was (and still is) just a tiny first step towards general support for multiple versions, never mind any decisions as to how the versions will differ from each other when finally deployed. However, it's reasonable that people want to pitch in with their thoughts on the wider concepts around the API versions. Two recurring themes come up:
The latter is particularly what has got this PR stuck. We can't have a fully informed decision on what number(s) to put into the responses until we see more examples of what's actually going to change. But we can't code any of the specific changes until I have implemented general support for multiple versions. And this is just the first in ~20 PRs that will be needed for the general support, never mind the ~?? PRs which will be about actually changing any API responses. So my proposal is:
I think this will allow us to unblock the general development work while the wider issues are discussed more fully. |
I do have this in my queue for a technical review as I think you requested a while ago, but I kind of held off because it sort of blew up again. I don't expect to find any major technical issues though - if support for multiple API versions in parallel is necessary then I'm sure this is broadly speaking the correct approach. You and I disagree on whether support for multiple API versions in parallel is required - actually that's not quite true as I have no problem with the idea in principle but I just don't think there is any realistic way we can achieve it. Separately a lot of people think that the object model should be versioned separately to the API version, which you don't feel is necessary, but which I tend to agree with I think. In fact that ties in very much with my thoughts about multiple API versions in that I think there is no problem with multiple API versions where the object model remains the same but what people really want to do is to enhance the object model and I am very doubtful that we can support multiple versions of that in parallel.
I agree there are changes that will be hard or impossible to support in parallel. But that shouldn't stop us from making all the other changes that can. |
Closing this PR as per #2353 (comment) |
In their recent S2S meeting the OSMF Board discussed how they can support rewriting of the API ("Supporting API: confirm goals and identify next steps - supporting Andy Allen’s rewrite of the API (Paul, Joost, Guillaume)") -> https://wiki.osmfoundation.org/wiki/Board/Minutes/2020-10-S2S Maybe this is referring to some previous @gravitystorm blog post on supporting multiple versions? Anyone have some insights what this is all about? |
The board doing random stuff without coordination with anybody else? It's the norm, see also a budget without asking the WGs for their requirements and many other things. |
Don't get carried away. You know as well as I do that it could very well be a harmless question discussed internally ("can/should we use resources on this") and only after a preliminary nod would the board approach third parties. |
In a first step, I really wanted to get some better understanding what the intended scope of those requirements were, irrespective of effort, feasibility, etc. Agreed, communication could be a bit more transparent at times (otherwise I wouldn't be asking those questions), but this issue isn't a good place for that discussion. |
Just for clarity - I haven't spoken recently with any board members on this topic, so I don't have any insights as to this particular conversation or what was intended (or even what they mean by 'rewrite of the API'). But I am always happy to have more help, either on the narrow topic of the API or the wider topic of the everything else covered by this repo. |
Frederik supposes correctly. We're seeing how important Andy's work is, how progress there makes the work of others easier, and discussed if and how we could support it. There was a confusion between rails port and API, and we ended up chatting mostly about the rails port. We haven't decided to fund any projects, or indeed even had a chat with Andy recently (hi!). We're currently running the microgrants, three software Grant pilot projects (osm2pgsql, Nominatim, Potlatch) and paying Quincy to work on iD full-time. We intend the experience from those to support a reactivated EWG who would be in charge of managing the projects and allocating a budget. We're very interested in hearing from anyone who would like their projects to be supported by the OSMF, or anyone who would be interested in joining EWG.
@grischard : thank you for the clarification, the meeting minutes make a whole lot more sense now. |
I'd like to support multiple API versions, so that we can deploy API 0.7 in parallel to API 0.6. It's not something we've ever done before, but I think it's the only reasonable way to handle API version changes these days!
I'm opening this PR just to receive any initial feedback on this proposed approach to the code changes. So far this approach demonstrates the basic concepts, including:
API::V06::CapabilitiesController, and new code lives in the normal place e.g. API::CapabilitiesController. This makes future upgrades easier, since the assumption is that new code will be used in version N+1 too.
If you are interested in seeing how it works, then the changes to "config/routes.rb" along with the output from "bundle exec rake routes" are the best way to see the overall idea.
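A rough sketch of that namespacing idea in plain Ruby, outside of Rails routing: the constant `VERSION_SPECIFIC_CONTROLLERS` and the method `controller_for` are hypothetical and only illustrate the lookup; the real mapping lives in "config/routes.rb". The expected value for 0.6 matches the routing test in this PR ("api/v06/capabilities").

```ruby
# Hypothetical table of controllers that have a version-specific copy.
VERSION_SPECIFIC_CONTROLLERS = {
  "0.6" => ["capabilities"]
}.freeze

# Same transformation as the v_string helper in this PR.
def v_string(version)
  "v#{version.delete('.')}"
end

# Version-specific code wins if it exists; otherwise fall through to the
# controller in the normal place, which version N+1 would also use.
def controller_for(version, name)
  if VERSION_SPECIFIC_CONTROLLERS.fetch(version, []).include?(name)
    "api/#{v_string(version)}/#{name}"
  else
    "api/#{name}"
  end
end

puts controller_for("0.6", "capabilities") # api/v06/capabilities
puts controller_for("0.7", "capabilities") # api/capabilities
```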
The biggest drawbacks so far are around the tests.
First, I think it makes sense that every API version that the codebase knows about is fully tested, i.e. even if the result is expected to be the same, a given api test should run once for each api version. This leads to the indentation of all the tests changing, since we need to add a "all_api_versions" loop around almost every test. So the diffs and git blame are horrible.
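The loop described above can be sketched without Rails as follows. The version list, `capabilities_path`, and `check_capabilities_paths` are stand-ins invented for illustration; only the `all_api_versions` name comes from this PR, and its real implementation may differ.

```ruby
# Assumed list of versions the codebase knows about.
KNOWN_API_VERSIONS = ["0.6", "0.7"].freeze

def all_api_versions
  KNOWN_API_VERSIONS
end

# Stand-in for building the request a real controller test would make.
def capabilities_path(version)
  "/api/#{version}/capabilities"
end

# The same check runs once per version, even when the result is expected
# to be identical across versions.
def check_capabilities_paths
  all_api_versions.map do |version|
    path = capabilities_path(version)
    raise "unexpected path #{path}" unless path.start_with?("/api/#{version}/")
    path
  end
end

puts check_capabilities_paths
```

Wrapping each test body in such a loop is what adds one level of indentation to almost every test.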
Secondly, the controller methods involve lots of changes like this:
It leads to a lot of extra params => ... to skim read, which isn't ideal. I've tried working around this but the workarounds have their own drawbacks.
The tests don't all pass yet, so this is not yet in shape to be committed, but I hope to get it there soon. In the meantime, and before I make any further changes that you might not be happy with, any feedback is very welcome!