Taking API surface seriously: Proposing Syntax for declaring API #49973
Comments
I think the motivation here is solid! Well said :-) I don't so much like the feel like the proposed Also I don't understand why Did you consider the alternative of having the API declared separately, in a similar way to how the For example, we could have scoped export A, B The idea is to export A and B as public API, but keep their names "scoped" within the namespace of the exporting module. A nice thing about this is that it could be implemented as a contextual keyword: the Regardless of my quibbles, I think this is an important issue and it's great that you've put the effort into writing up your thoughts. Thank you! |
I explicitly haven't mentioned other proposals because I want to see how people react to this taken at face value - not to mention that I think most other ideas in this space are a bit underspecified in what exactly the desired semantics are. I think that's supported by the amount of discussion around what the various PRs & issues mean & want, which is what I attempted to give here directly. Most of the proposal follows from the requirements imposed by "what is a breaking change" when looking at functions, methods, variables etc from the POV of a user upgrading a version of a package. My reasoning for placing Another reason is that And a third reason - if the whole function is always considered API, what about new methods added from outside of the package defining the function? Are they also automatically considered API, and who supports them? That's not something you can do with Finally, I don't think it's difficult per se to add something like
as a syntax for an API list, marking all of these symbols as API. The difficulty lies in the semantics - what the semantics of that Apologies, this got longer than expected - thanks for asking about it!
This proposal can also be done like that, no? I've given an explicit list of places where writing julia> api function foo end
ERROR: syntax: extra token "function" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api module Foo end
ERROR: syntax: extra token "module" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api const bar = 1
ERROR: syntax: extra token "const" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api abstract type MyAbstract end
ERROR: syntax: extra token "abstract" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api struct Foo end
ERROR: syntax: extra token "struct" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
julia> api foo::Int = 1
ERROR: syntax: extra token "foo" after end of expression
Stacktrace:
[1] top-level scope
@ none:1
Thank you for taking the time to read through the proposal! I really appreciate your thoughts, especially because of your work on JuliaSyntax. If the scheme parser weren't so hard to jump into and hack on, this might have already been a draft PR instead of a proposal :) |
I'm a bit skeptical about the per-method thing. At least it needs some good thinking about what would be breaking. Coming from the above example
api function foo(arg1, arg2)
    arg1 + arg2
end
A user can now happily do
function foo(arg1::Int, arg2::Float64)
    arg1 * arg2
end
The same call as before is now accessing an internal method. |
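The scenario above can be reproduced in a few lines of plain Julia. This is only a sketch: the `api` keyword is the proposal's hypothetical syntax and does not exist, so the would-be API method is marked with a comment, and the variables `before` and `after` are illustrative names.

```julia
# The method the proposal would mark with `api` (hypothetical keyword,
# so the marker is only this comment).
function foo(arg1, arg2)
    arg1 + arg2
end

# Result of the call while only the generic method exists.
before = foo(2, 3.0)

# A more specific method added later, without any API marker.
function foo(arg1::Int, arg2::Float64)
    arg1 * arg2
end

# The very same call now dispatches to the new, unmarked method.
after = foo(2, 3.0)

println("before: ", before, ", after: ", after)
println(which(foo, (Int, Float64)))
```

The call site did not change, yet the method it reaches (and the value it returns) did, which is the accidental break being discussed.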
Yes, that is a good example for an accidental breaking change, of the sort described in the proposal that really must not happen. I realize now that I've only written about the case of removing
I think there is room for having this testable as part of a testsuite - for example, one could take the declared signature of all Another way this could be done without checking past versions is have something of the sort of @test_api foo(::Number, ::Number) to check whether the given signature (and its subtypes) dispatches to an |
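A check in the spirit of the `@test_api` idea could be approximated today with reflection. The following is a minimal sketch, not the proposed macro: `API_METHODS`, `mark_api!` and `dispatches_to_api` are hypothetical names, and the registry stands in for whatever an `api` marker would record at definition time.

```julia
# Hypothetical registry of methods declared as API.
const API_METHODS = Set{Method}()

# Record the method that `f` dispatches to for the given argument types.
mark_api!(f, types) = push!(API_METHODS, which(f, types))

# True if every method that can match `types` (including calls with
# subtypes of them) has been recorded as API.
function dispatches_to_api(f, types)
    matching = methods(f, types)
    !isempty(matching) && all(m -> m in API_METHODS, matching)
end

foo(arg1, arg2) = arg1 + arg2
mark_api!(foo, (Any, Any))

# Only the API method can match at this point.
ok_before = dispatches_to_api(foo, (Number, Number))

# An unmarked, more specific method is added.
foo(arg1::Int, arg2::Float64) = arg1 * arg2

# Some calls covered by the signature now reach a non-API method.
ok_after = dispatches_to_api(foo, (Number, Number))

println((ok_before, ok_after))
```

Run as part of a test suite, a check like this would flag the accidental change without needing to compare against a previous release.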
I don't have much insight into the merits of marking api per-method vs per-function, but I fear that per-method would be too complex and lead many devs to just not bother with |
Well conversely, the In either case, I think it's good that this discussion is happening - keep the feedback coming! |
Thanks for a well-written proposal! I 100% agree that there is an issue with a lack of consensus around what makes an API of a package, and that it would be nice to have a rigid way to specify this. However, first, in the bikeshed department, I don't like the keyword
Second, I'm a bit worried about how the notion of an API surface and the genericity of Julia code composes. It is often very convenient to define a function like Third, I think if something like this is implemented, having the notion of an "experimental API" or something similar will be invaluable as a way to experiment with new features, while declaring the intent that an interface could become stable in a later, non-breaking release. Finally, like with |
Thank you for your time and discussing the proposal!
The bikeshed is noted, but barring an alternative.. 🤷 I'm only partial to
I originally considered proposing an API-string macro, used like so: api"""
my_function(args...)
This is a nicely documented function that does what you expect.
"""
function my_function(args...) end and implemented like struct APIMarker
doc::String
end
macro api_str(s::String)
APIMarker(s)
end which would give any documentation rendering the ability to distinguish API from non-API. The issue with that or with placing the marker in a docstring is that you now need to parse & process the docstring to be able to say "this is API" for doing code analysis. It's no longer a static property of the source code as written, and is something that can change at runtime -
That whole paragraph is a very good point, and I have thought about that! There's also a subtlety that I forgot to put into my original proposal, so let me take this opportunity to reply and add that subtlety here as well, by way of an example. Consider a function with a single method like function foo(arg1, arg2)
arg1 + arg2
end that we say is API of some package. Its signature is So, barring being able to express those kinds of constraints, the next best thing is to say "well, I support this signature, provided you implement functions X, Y, Z with arguments W, V, S on your type T" in a docstring. The api function foo(x::MyAbstract, y)
2x+3y
end and being able to express "I only support
Well, I don't per se see an issue with that - not every implementation detail is part of the API of a function. While some implicit constraints on the arguments can be, in practice those may be considered an implementation detail, and if they are, it's fine if a non-breaking change (according to the docstring/additional documentation of the method) breaks some user code that relied on them. The correct solution here is to lift the actual requirements to the type domain, by using a dispatch type for those kinds of invariants/requirements, not to make the API more restrictive by lifting the implicit constraints to
I'd implement that with this proposal by having an inner module
Well.. from my POV, that's kind of what we're doing all the time though already, by having surface-level API functions like In part, the "API per method" section of the proposal stems from my (perhaps irrational) fear that people are going to slap |
I appreciate your point, but I disagree with the analysis. I could define a function: """
multiply_by_two(x)
Return `x` multiplied by two.
"""
multiply_by_two(x) = x This function will never error. It is the identity function. However, it is a buggy implementation, as it does in fact not do what it states. An API is not something that could or, I would argue, even should be analyzed from a code perspective. It's a slightly more fuzzy human-to-human contract written in prose.
As my point above, I don't see any compelling argument for why this shouldn't be allowed to change at runtime, though. As I see it, having API declarations in the code is a feature that is purely intended to help users (and potentially linters). Intentionally making it less flexible than the rest of Julia's documentation system seems to be in stark contrast to Julia's dynamic nature, and I think analysers will be able to overcome the docstring-parsing requirement.
While I agree in principle that it would be nice to use the type system to restrict these cases, based on my own experience, I don't think the type system is the correct level to do this at in Julia. And there are several cases where it would be impractical to require something to subtype a particular abstract type.
I also think it would be really nice to have all methods exposed at the surface-level of the dispatch tree, but there are a lot of practical reasons for having function indirection. Often it's with an |
Right, and I'm not claiming that
I disagree; that's exactly what a testsuite is. A docstring is the prose describing the desired behavior, and a testsuite is the code checking that the functional implementation matches the prose.
I agree with you here, this is not currently a desirable thing to do because there's only one possible direct abstract supertype. That's still an orthogonal concern to whether or not a single method is considered API or not though. Maybe that's an argument for holding off on method-level-API until such a time when we can express these kinds of constraints, of saying "I require the invariants these 2+ abstract types provide", better? |
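The docstring-versus-testsuite point from this exchange can be made concrete with the `multiply_by_two` example. This is a small sketch using the `Test` standard library; `multiply_by_two_buggy` is a hypothetical name for the identity implementation quoted earlier.

```julia
using Test

"""
    multiply_by_two(x)

Return `x` multiplied by two.
"""
multiply_by_two(x) = 2x

# The identity implementation from the earlier comment, under a hypothetical name.
multiply_by_two_buggy(x) = x

@testset "multiply_by_two matches its docstring" begin
    @test multiply_by_two(3) == 6
    @test multiply_by_two(1.5) == 3.0
    # The buggy version never errors, but it does not do what the prose says.
    @test multiply_by_two_buggy(3) != 6
end
```

The docstring states the contract in prose; the test set is the code that checks an implementation against it.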
personally, I would not prefer the syntax be a modifier to existing definitions like an API is kind of like an "interface for a package," so what if the declaration format were similar to an interface proposal I have seen being discussed recently? https://github.com/rafaqz/Interfaces.jl Mockup below with a new syntax
|
also, possibly this discussion should be folded together with #48819 ? |
I don't think that should be tackled in this proposal. I don't aim to modify the type system here at all - this is purely for allowing developers to declare their existing API, without having to migrate to new concepts in terms of what they want to develop. Also, while I can see a path for something like interfaces to be a thing, I don't think the approach in Interfaces.jl is right, in particular because it's a big disruption to the ecosystem due to the amount of "legacy" code we have. I have a different idea for how to tackle that (mostly based on type theoretic consequences/additions to the current type system), which does integrate with this proposal nicely without special cases while keeping code churn as small as possible (not larger than the code churn produced by this proposal, in fact), but that's for a different discussion.
While that discussion is the first time I wrote down parts of the ideas contained in this proposal, it's not a concrete discussion about a concrete proposal. Thus, I think this issue is more appropriate to talk about this proposal at hand, instead of the larger discussion whether we want something like access modifiers at all. So far, the only major gripe with this proposal seems to be the per-method-API and some minor bikeshedding on the name/syntax to expose this (which I think I have argued sufficiently now why it ought to be syntax, and not dynamic information). |
The only similarity is superficial. Otherwise, the "interface" is just a tuple of pairs without introducing any new type concepts. It almost doesn't require syntax at all; if modules were types one could just dispatch on some function |
IMO the less needed by package developers to convert the current code to an API-aware world is better. Where there is an API function in a package, I would expect any non-API methods of it to be an exception rather than the rule. So it would probably make more sense to allow marking a specific method of an API function as non-API, rather than the reverse? However, the better solution is to just make it a different function. Already pretty common I think to have e.g., The argument that a keyword will help people be mindful is undermined by having an all-method construct; at that point it's just as easy to define the API methods in a central location and rely on tooling to raise awareness of which functions are API. And a central location is a more natural extension of what we do with As far as the scenario where Package B adds a method to an API function (say, |
I have not given this anywhere near the amount of thought you have, @Seelengrab, but I am also a bit skeptical about the per-method thing. I kind of like @c42f's scoped export f
f(args...) = ... if they want that, too. ( As a way of ensuring that detached |
Trying to draw inspiration from how other languages do this, I'm reminded of how method overrides are handled in languages with classes. If an abstract method is marked as public, any overrides must be public as well. Less familiar with the space of languages with traits/protocols, but the couple I do know of have similar behaviour when it comes to implementing them (visibility of implementer must be >= visibility of trait/protocol methods). Not all languages use SemVer so it's difficult to compare breaking change policies, but the Rust guidelines related to types and functions are quite reminiscent of the ones in this proposal. |
I've segmented the responses according to who I've quoted, so it's easier to follow :)
Explicitly marking things as non-API has the issue that it's easy to forget - I mentioned that somewhere, I think. You really don't want to end up in the situation of having to do a technically breaking change in a non-breaking release (by semver) just because you forgot to add the non-API marker. No marker -> private/internal should be the default, rather than the exception. Rust goes as far as not even allowing you to access things that haven't been explicitly made available (which we don't want, and also can't do before 2.0 (but more like 10.0) because it's massively breaking).
That takes away agency from the developer of Package B, in that it doesn't allow the developer to have a public type, but without having To be fair, the example is a bit abstract, but I think the only reason something like this hasn't been a problem so far is because we really don't know what the API surface of any given thing is, so we somewhat expect things to break. It's also hard to opt into someone else's code, since we currently only have single abstract subtyping and the most successful attempt at having such a wide interface package, Tables.jl, doesn't use abstract types for signaling dispatches at all (there are trait functions though). I think that's a missing feature in Julia, but that's for another day and another proposal :)
I'm strongly opposed to calling this a
I'm not sure Aqua rules are enough for this - it's not a linting thing after all. I think something like
This is no coincidence - the rules laid out both by the Rust guidelines as well as the ones I've described in the proposal are purely a consequence of the type system and what it means to change something from version A to B. It's mostly a type theoretic argument that leads to the interpretation that it ought to be possible to mark individual methods as API. I feel though that to make this point more clear, I'll have to write some more about what I (and I think the wider programming language design community at large) considers a "type". So little time though... Taking that lens, as well as thinking about what SemVer means, leads to a very natural derivation of these rules. On the flipside, it's of course possible to take a coarser approach and mark whole functions as API (that's also described in the proposal after all, that's If dropping per-method-API gets more support for this syntax proposal, good - let's do it. We can always add it back later if we feel that we do want that granularity after all, since it's always allowed to widen your API ;) |
I guess there are two aspects: (1) what you name it (I'm happy to drop that aspect of the proposal), and (2) where you put it. If we drop the per-method idea, then there is no reason you couldn't put these annotations near all your |
To be clear, my point was that Rust appears to support many of the same rules about methods and API compat despite not allowing for private implementations of public methods. That's not to say allowing method-level granularity when specifying API surface area is a bad thing, but it suggests to me that these two pieces are somewhat orthogonal. |
I agree that consensus on what "API" means is good and should probably be documented. Thanks for thinking about this and making a concrete proposal. One of the advantages of a concrete proposal is that I can point to concrete and hopefully-easy-to-fix problems.
I think this is too strict. A better requirement is that any subtyping relation between two API types should be preserved. For example, This is okay because folks who are dispatching based on
This is too permissive because narrowing the type signature of a nonconstant global can cause writes to error. As for how to mark things as API or not API, I concur with some others in this thread that marking API similarly to export is a good idea.
* |
@LilithHafner along those lines, I enjoyed the proposal from a few years ago here #39235 (comment) where in this case |
That's a good point, and an even better one for the need for I like the idea of only keeping the guarantees you've actually promised to provide.
That's a predicament then - I originally wanted to allow widening of the type to one that gives fewer guarantees, but that obviously doesn't work because then reads may fail in dispatch (you can't take away guarantees you've given in a previous version). Taken together, these two imply that types of API globals must not change, at least without releasing a breaking version. Not related to the syntax proposal per se, but quite unexpected nonetheless! Since I'm not sure it's quite clear, my motivation for having |
I agree that having specific signatures be part of the official API makes sense, but I think that might be orthogonal from the Methods themselves. I say "sort of" because So I'd propose a modification: to give it teeth we have to tie it to signatures, but don't tie this to specific methods. Effectively, we're protecting the integrity of the callers rather than the callees. |
How can that be? In my mind, for that macro to be able to check that a call receiving that signature has any hope to succeed, a method taking After all, if no method
I'm sympathetic to that idea, but I consider it out of scope for this particular proposal. I think the idea "declare api" extends to other modifications we might want to do with the type system later on quite well; the semantic of "I support these additional guarantees" ought to be possible to tack onto almost anything after all, even after the fact. EDIT: To be clear, I consider the formalization of type signatures in the type system distinct from methods out of scope; not tying API to signature, as that's exactly what I want to do with per-method API :) |
You can implement foo(::Real, ::Real) # method 1
foo(::Any, ::Any) # method 2 and that signature is fully covered. The coverage is nominally wider than what we're guaranteeing to be API, but that's OK. In fact it's more than OK: as soon as you think about the implications of trying to actually write Again, the point is that you cannot implement https://timholy.github.io/SnoopCompile.jl/stable/jet/ provides a concrete example, if this is still unclear. I am concerned that inference failures are going to mean that we'll frequently run up against |
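The notion of a signature being "covered" by a set of methods can be illustrated with `hasmethod`. This is a rough sketch only: `bar` is a hypothetical contrast case that is not part of the discussion, and `hasmethod` merely asks whether some single method accepts every argument combination in the given tuple type, which is a simplification of the coverage idea.

```julia
# The two methods from the comment above.
foo(::Real, ::Real) = "method 1"
foo(::Any, ::Any) = "method 2"

# Hypothetical contrast: only the narrower method exists.
bar(::Real, ::Real) = "only method"

# Every call matching (Number, Number) has somewhere to go, via method 2.
foo_covered = hasmethod(foo, Tuple{Number, Number})

# Not covered: e.g. Complex arguments would have no applicable method.
bar_covered = hasmethod(bar, Tuple{Number, Number})

# A narrower signature is still covered.
bar_real_covered = hasmethod(bar, Tuple{Int, Float64})

println((foo_covered, bar_covered, bar_real_covered))
```

Note that the check says nothing about which method a caller lands on, only that the promised signature has an implementation, which matches the idea of protecting callers rather than callees.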
I am inclined to prefer demarcating API at the level of symbols because it is easier to tell a human Nevertheless, looking for an example where more specificity is needed, let's say I have a method foo(x, y) = x + y # version 1 and I think I might eventually want to change it to foo(x, y) = x - (-y) # version 2 Which has slightly different behavior for some input types (e.g. With per-function API this would look like api foo(x::Int, y::Int) = _foo(x, y)
_foo(x, y) = x + y |
One issue with per-method API granularity is that a user won't necessarily know if they are or are not calling an internal method if the types aren't known until runtime. Extended - what's the concrete benefit of having API and internal methods of the same function, over giving the internal functions a different name? Arguably the latter leads to a cleaner design for both developers and users to reason about. |
A couple of thoughts that I can't quite congeal into a thesis:
|
Specific Modules, Abstract Types, Concrete Types / Structs, (and anything else I missed) may have their own api. |
After spending quite some time at JuliaCon talking to lots of people, I'm only strengthened in my belief that having a default meaning for what is considered guaranteed when something is API is not only a good idea, but going to become increasingly important for industry adoption. Having to look at every single docstring for every little thing about what even can change between releases simply does not scale. To be more specific, I've identified two key areas that, from my POV, were the core essence of this proposal:
While I don't understand why point 1 was wholeheartedly rejected in triage way back when @LilithHafner voluntarily brought their own PR #50105 up for discussion there (and just to make it perfectly clear - I don't have any personal grudge against Lilith), the impression I got from the discussions I had at JuliaCon with potential industry adopters were very much in favor of having something like these default API guarantees. It just makes it so much easier to develop Julia code. I don't want to/can't impose what the core developer team ultimately goes with, but I do want to point out that Yuri liked this proposal as well, judging from the 👍 in the OP, for whatever that is worth... So I at least hope that this discussion can continue in the light of thinking about what's best for the language as a whole, and not around what would be simpler/easier to do as a first step. I think it pays to be greedy here - the time for "simple" steps seems to me to have passed long ago, seeing as the "correctness debate" surrounding Julia has been ongoing for much too long, in my opinion. Regardless, in the time since I last looked at this issue I've published RequiredInterfaces.jl, which (for my usecases at least) solves point 2 above, by allowing developers to declare "this method needs to be implemented, and then you'll receive these other methods from my interface". This pretty much covers all what I want to do with per-method API, which has been (seemingly) hard to get across in the 99 messages above. @timholy & @dalum , since you've shown great interest in the proposed semantics above, I've also included a lengthy document describing the intended usage in more detail, which should further clarify the points already raised in this issue.
Well, I certainly didn't understand the objections in that light - the communication from my end really felt more like wholehearted rejection of the entire notion 🙂 The fact of the matter is, API stability is a hard problem and the fact that we're years into this Julia experiment now and we're STILL struggling with it, in my opinion, just proves the point that this is a hard problem that doesn't have an easy solution. However, once we DO have fallback definitions that work with the semantics of the language, it ought to be pretty trivial to have a "julia-semver conformity checker" that can just be run when a new version is registered, or that people can run preemptively in CI of their packages. Just because it's not trivial to come up with such a ruleset doesn't mean that we should eschew it entirely. The longer we wait, the more we paint ourselves into a corner of having "too many exceptions", which (let's not kid ourselves) is something we must clean up sooner or later anyway. Bad API design doesn't vanish just because we close our eyes to the complexity of reworking it. |
I like having a formal interface spec, but unless we can solve #5 (which would be great!), it should have an option to be decoupled from an abstract type. Subtyping What you've done with method errors for incompletely implemented interfaces in RequiredInterfaces.jl is great! I want that. Forward compatibility is something to be super aware of with the @Seelengrab, sorry I missed you at JuliaCon, I'd love to hop on a call with you if you want to talk about this over voice chat. You can reach me on the Julia slack, or via email. |
Yes, the exact same issues come up, because it's type theoretically the same thing (to an extent - some issues don't apply at all). Just as traits are, mind you - the same issues come up with Holy Traits too, as I've described in the documentation. I don't want this thread to become a new version of issue 5 though, and I'm well aware of the discussions surrounding it (modulo some of the email chains that spurred the creation of it in 2014). I do think though that some form of I implore you to bring further discussion on
No worries! I'll be in Boston until the 2nd of August, so there's still plenty of time. |
I think a combination of the following things will carry us far:
A possible implementation of the check would be like this: struct InterfaceError <: Exception
X
T
f
InterfaceError(@nospecialize(X), @nospecialize(T), @nospecialize(f)) = new(X, T, f)
end
# TODO: implement printing
# Variant for concrete `T, N`
function required_interface(::Type{AbstractArray{T,N}}, ::Type{X}) where {T,N,X}
Base._return_type(Tuple{typeof(size), X}) <: NTuple{N,Integer} || throw(InterfaceError(X, AbstractArray{T,N}, size))
Base._return_type(Tuple{typeof(getindex), X, Vararg{Int,N}}) === T ||
Base._return_type(Tuple{typeof(getindex), X, Int}) === T || throw(InterfaceError(X, AbstractArray{T,N}, getindex))
end
# Variant for concrete `N`
function required_interface(::Type{AbstractArray{T,N} where T}, ::Type{X}) where {N,X}
Base._return_type(Tuple{typeof(size), X}) <: NTuple{N,Integer} || throw(InterfaceError(X, AbstractArray{T,N} where T, size))
Base._return_type(Tuple{typeof(getindex), X, Vararg{Int,N}}) !== Union{} ||
Base._return_type(Tuple{typeof(getindex), X, Int}) !== Union{} || throw(InterfaceError(X, AbstractArray{T,N} where T, getindex))
end
# Variant for concrete `T`
function required_interface(::Type{AbstractArray{T,N} where N}, ::Type{X}) where {T,X}
Base._return_type(Tuple{typeof(size), X}) <: Tuple{Vararg{Integer}} || throw(InterfaceError(X, AbstractArray{T,N} where N, size))
Base._return_type(Tuple{typeof(getindex), X, Vararg{Int}}) === T ||
Base._return_type(Tuple{typeof(getindex), X, Int}) === T || throw(InterfaceError(X, AbstractArray{T,N} where N, getindex))
end
# Variant for arbitrary `T,N`
function required_interface(::Type{AbstractArray}, ::Type{X}) where {X}
Base._return_type(Tuple{typeof(size), X}) <: Tuple{Vararg{Integer}} || throw(InterfaceError(X, AbstractArray, size))
Base._return_type(Tuple{typeof(getindex), X, Vararg{Int}}) !== Union{} ||
Base._return_type(Tuple{typeof(getindex), X, Int}) !== Union{} || throw(InterfaceError(X, AbstractArray, getindex))
end and then julia> required_interface(AbstractArray, Array)
true
julia> required_interface(AbstractArray, Set)
ERROR: InterfaceError(Set, AbstractArray, getindex)
Stacktrace:
[1] required_interface(#unused#::Type{AbstractArray}, #unused#::Type{Set})
@ Main ~/tmp/interfaces.jl:31
[2] top-level scope
@ REPL[3]:1
julia> required_interface(AbstractArray{Bool}, BitArray)
true
julia> required_interface(AbstractVector, Base.Sort.WithoutMissingVector) # this one fails, would be nice if it didn't. Can `U<:AbstractVector`?
ERROR: InterfaceError(Base.Sort.WithoutMissingVector, AbstractVector, size)
Stacktrace:
[1] required_interface(#unused#::Type{AbstractVector}, #unused#::Type{Base.Sort.WithoutMissingVector})
@ Main ~/tmp/interfaces.jl:18
[2] top-level scope
@ REPL[5]:1
julia> required_interface(AbstractVector{Float64}, Vector{Float64})
true I think this should be in |
I originally thought about using
julia> using RequiredInterfaces
julia> @required AbstractArray begin
Base.size(::AbstractArray)
Base.getindex(::AbstractArray, ::Int)
end
getindex (generic function with 184 methods)
julia> struct A{T,N} <: AbstractArray{T,N} end
julia> A() = A{Int,1}()
A
julia> size(A())
ERROR: NotImplementedError: The called method is part of a fallback definition for the `AbstractArray` interface.
Please implement `Base.size(::AbstractArray)` for your type `T <: AbstractArray`.
Stacktrace:
[1] size(::A{Int64, 1})
@ Main ./none:0
[2] top-level scope
@ REPL[6]:1
julia> getindex(A(), 1)
ERROR: NotImplementedError: The called method is part of a fallback definition for the `AbstractArray` interface.
Please implement `Base.getindex(::AbstractArray, ::Int)` for your type `T <: AbstractArray`.
Stacktrace:
[1] getindex(::A{Int64, 1}, ::Int64)
@ Main ./none:0
[2] top-level scope
@ REPL[7]:1 The details on how this is exactly implemented are of course malleable (I'd personally prefer language-level support for this, perhaps without actually defining methods? Would certainly get around the ugly method inspection hack I use at the moment), the current design is just a result of what I had to settle on to allow precompilation, (almost) zero-overhead, nice error messages for users all while being in a package. Checking whether something is part of a method-based interface is from my POV the "easy" part - making that easy & nice to use for users without having to themselves write Still, if there's an asserted return type like
that'd be trivial to add with |
That's what
I agree with the problems of relying on inference. But not using it has, I think, more serious problems of its own. I think you really can't declare something as passing unless it returns the kind of object that you'd describe in the documentation: just checking whether a method exists does not seem sufficient. It would be really broken, for example, to create an |
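The difference between "a method exists" and "the method returns the documented kind of object" can be shown with a deliberately broken type. This is an illustrative sketch, not an example taken from the thread: `BadArray` is a hypothetical type, and `Base.return_types` relies on inference, so its results may be less precise in more complicated cases.

```julia
# Hypothetical array type whose `size` method exists but returns the wrong thing.
struct BadArray{T,N} <: AbstractArray{T,N} end
Base.size(::BadArray) = "not a size"

# An existence check is satisfied ...
exists = hasmethod(size, Tuple{BadArray{Int,1}})

# ... but the inferred return type is not a tuple of integers.
inferred = Base.return_types(size, (BadArray{Int,1},))
returns_dims = all(rt -> rt <: Tuple{Vararg{Integer}}, inferred)

# For comparison, a well-behaved array type passes the same return-type check.
vector_ok = all(rt -> rt <: Tuple{Vararg{Integer}},
                Base.return_types(size, (Vector{Int},)))

println((exists, returns_dims, vector_ok))
```

An existence-only check reports a missing implementation well, while a return-type check is needed to catch an incorrect one; the two answer different questions.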
Do you mean that your version would be checked/called at every callsite of the interface..? Otherwise I don't see how you get the error message printed in regular user code, as is the case now with RequiredInterfaces.jl.
Right, if the interface requires the type to be the same or some specific type, that should obviously be part of the I've opened Seelengrab/RequiredInterfaces.jl#8 as a feature request for these kinds of return type annotations. Shouldn't be too bad to add. |
I mean add special printing for
I'm not sure there's a big usability difference between
vs
(which you could automate given the |
Ok, but I was asking about when that error would be thrown/displayed. It sounds like you're suggesting that the error should only be displayed when actively checking for conformity with the interface, is that correct?
There absolutely is - one informs of a missing implementation, the other informs of an incorrect one. That may be a little thing to you, but can be the difference between "Oh I forgot to implement X" and "what does this thing want, I thought I did everything?" for a new user. |
And now I suddenly know why Julia error messages/stacktraces are like that 😅 Tim Holy is too big-brained for the job of writing error messages. We need to find someone dumber to write error messages that I can understand |
Has there been any further development with regard to how to handle fields? IIUC With regard to fields, I'm assuming that since users can already define property methods there's some hesitancy to make any changes in how marking a field |
Fields should rarely be public. Properties, on the other hand, are sometimes public. The current story for public properties is that if you want a property to be public, document that property in a docstring and/or manual. To document internals, you have to document them and also say they're internal (just like all names prior to the public keyword). |
Now that we have *public*, which provides a way to include a function or
property in the api associated with a module ...
What about the complementary *private*, which would exclude a function from
use outside of a module?
```
module Example
export foo
public bar
private baz
function fn(x)
    x
end
end
```
```
using Example: baz
PrivacyError baz is not available outside of Example
```
and
```
using Example
using Example: bar, fn
a = foo(x)
b = bar(x)
c = fn(x)
d = Example.baz(x)
PrivacyError baz is not available outside of Example
```
|
I'd be against expressly disallowing use of internals - part of the appeal of Julia is that there are few or no hard barriers to anything you might want to do. Perhaps there could be a warning if an internal (anything not public) is accessed from another module, but it should be suppressible. |
I do not advocate warning if anything not public is accessed from another
module .. only if something explicitly marked private is accessed from
another module.
|
But why would you want or need this third level of accessibility? |
```
export simulate_the_universe
private _prepare_for_a_universe
function simulate_the_universe()
step1 = _prepare_for_a_universe()
universe(step1)
end
```
> But why would you want or need this third level of accessibility? public/export indicates the symbols that comprise the API surface area that should follow SemVer guidelines. Everything else is not. What would this third level be needed for?
|
Not breaking existing code limits what we can do. However, there's a decent argument to be made that since what's breaking is not a public API, blocking access to package internals is not actually technically breaking according to semver. Of course, we still have to be cautious to prevent massive ecosystem breakage. It would not go well to just flip the "no private access" switch for the whole ecosystem at once. Here's a possible transition strategy:
I'm not sure about the implementation side. If we can control module internals access per depender that would be ideal. That suggests that private/public needs to be metadata associated with the binding itself, which is kind of gnarly. Maybe we can do something with auto-wrapping each package in a public-only wrapper that rebinds only the public bindings of the internal package. That's the simplest to implement, but feels kind of icky. Having public/private annotations on fields in structs would also be good, but I'm not quite ready to tackle that. |
For what it's worth (and that's the last I'll say about this here), discussing "disallowing access to internals" without a concrete & clear picture of what exactly is included in that "internals" moniker and what "access" means (what this entire issue/proposal has been about) seems not fruitful to me. Base isn't alone in having things "internal" that are not as clear cut as "this symbol is internal, this other one is not"; the I'm pretty opposed to disallowing access to internals entirely, as I've stated multiple times in this issue. If this is a direction the Julia project wants to go in, I humbly request discussion about it to take place in a different issue/github discussion. |
I've copied @StefanKarpinski's comment verbatim to the OP of a new issue. I think that that particular comment is actionable and specific and makes a good OP. |
hard agree on this. IIRC, some Julia folks that are way more knowledgeable than me pointed out the "private" stuff as seen in C++/Java doesn't make programs safer more than they make programmers' life harder :( |
It would be a weak private—programmers would still be able to access internals if they want, it would just require different compat declarations. |
One thing that hasn't really been discussed so far but which often causes breakage (e.g. in Julia updates) is the modification or addition of method signatures that causes new ambiguities to occur for existing calls of other methods of some function. So far, from Julia's point of view, we kind of brush these under the rug and let the packages deal with it, or try to put in some tweak to fix the problems that occur in practice, but if we want to be serious about this I think it needs to be properly addressed. |
To give a small(ish) example:
PkgA 1.0.0
```
module PkgA
public foo
"Returns a tuple and does not throw"
foo(k::Any, v::Any) = (k, v)
end
```
PkgB 1.0.0, depends on PkgA version 1
```
module PkgB
using PkgA
public B
struct B end
PkgA.foo(k::B, v::Any) = (:b, v)
end
```
PkgC 1.0.0, depends on PkgA version 1
```
module PkgC
using PkgA
public C
struct C end
end
```
PkgD 1.0.0, depends on PkgA version 1, PkgB version 1, and PkgC version 1
```
module PkgD
using PkgA, PkgB, PkgC
const k = PkgB.B()
const v = PkgC.C()
const tup = PkgA.foo(k, v)
end
```
PkgC 1.1.0, depends on PkgA version 1
```
module PkgC
using PkgA
public C
struct C end
PkgA.foo(k::Any, v::C) = (k, :c)
end
```
PkgD loads when loaded with PkgA@1.0.0, PkgB@1.0.0, and PkgC@1.0.0 but fails to load when loaded with PkgA@1.0.0, PkgB@1.0.0, and PkgC@1.1.0. Because no package depends on the internals of any other package, this means PkgC version 1.1.0 was a breaking change even though all it did was add a non-pirated method. |
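The ambiguity scenario can be reproduced in a single session without creating four packages. The following is a minimal sketch under stated assumptions: the modules are plain submodules of `Main` standing in for the packages, and the `@eval` call stands in for upgrading PkgC from 1.0.0 to 1.1.0. The names `before` and `after` are only illustrative.

```julia
# Single-session stand-ins for the packages (submodules of Main, not real packages).
module PkgA
foo(k::Any, v::Any) = (k, v)
end

module PkgB
using ..PkgA
struct B end
PkgA.foo(k::B, v::Any) = (:b, v)
end

module PkgC            # stands in for PkgC 1.0.0
using ..PkgA
struct C end
end

k, v = PkgB.B(), PkgC.C()
before = PkgA.foo(k, v)            # PkgB's method is the most specific match

# Stand-in for upgrading to PkgC 1.1.0: add the new, non-pirated method.
@eval PkgC PkgA.foo(k::Any, v::C) = (k, :c)

# The same call is now ambiguous between PkgB's and PkgC's methods.
after = try
    PkgA.foo(k, v)
catch err
    err
end
```

Neither PkgB nor PkgC did anything wrong in isolation; the failure only appears when both are loaded together. Tooling such as `Test.detect_ambiguities` can flag this kind of conflict, but only for the set of packages actually loaded when it runs.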
The general goal of this proposal is to formalize what it means to be considered API in julia, to give a framework to talk about what even is a "breaking" change to make it easier for developers to provide more documentation for their packages as well as avoiding breakage in the first place. The reasoning in this document comes from the current practices in use in the ecosystem, as well as type theoretic requirements of the type system, though having a background in type theory is not required to participate in the discussion.
Please do reply if you have something to add/comment on!
Motivation
The current stance of Base and much of the package ecosystem of what is considered to be API under semver is "If it's in the manual, it's API". Other approaches are "If it has a docstring it's API", "If it's exported, it's API", "If it's not in the Internal submodule it's API" or "If we explicitly mention it, it's API". These approaches are inconsistent with each other and have a number of issues:
`@deprecate` is hard to use correctly and often not used at all.
This proposal has three parts: first, presenting a full-stack solution to both the Discoverability and Maintainability issues described above; second, proposing a small list of things that could be done with the proposed feature to make Reviewability easier; and third, a (preliminary, nonexhaustive) "How do we get there" list of things required to be implemented for this proposal.
There is also a FAQ list at the end, hoping to anticipate some questions about this proposal that already came up in previous discussions and thoughts about this design.
The `api` keyword
The main proposal for user-facing interaction with declaring whether an object is API is the new `api` keyword. The keyword can be placed in front of definitions in global scope, like `struct`, `abstract type`, `module`, `function` and `const`/type annotated global variables. Using `api` declares the intent of a developer about which parts of the accessible symbols they consider API under semver and plan to support in newer versions.
When a project/environment importing a package wants to access a symbol not marked as API (if it is not the same project/environment originally defining the symbol), a warning is displayed, making the user aware of the unsupported access without otherwise hindering it. This behavior should be, to an extent, configurable, to support legitimate accesses of internals (insofar as those exist). There is the caveat that silencing these warnings makes no difference to whether or not the access is supported by a developer. This is intended to provide an incentive to either use the supported subset of the API, or encourage a user to start a discussion with the developer to provide an API for what they would like to do. The result of such a discussion can be a "No, we won't support this"! This, however, is a far more desirable outcome than accessing internals and fearing breakage later on, if it could have been avoided by such a discussion.
The following sections explain how `api` interacts with various uses, what the interactions ought to mean semantically, as well as the reasoning for choosing the semantics to be so.
`function`
Consider this example:
This declares the function `foo`, with a single method taking two arguments of any type. The `api` keyword, when written in front of such a method definition, only declares the given method as API. If we were to later on define a new method, taking different arguments like so
the new method would not be considered API under semver. The reasoning for this is simple: once a method (any object, really) is included in a new release as API, removing it is a breaking change, even if that inclusion as API was accidental. As such, being conservative with what is considered API is a boon to maintainability.
Being declared on a per-method basis means the following:
- A method marked `api` MAY NOT be removed in a non-breaking release, without splitting the existing method into multiple definitions that are able to fully take on the existing dispatches of the previous single method. In type parlance, this means that the type union of the signatures of the replacement methods MUST be at least as general as the signature of the original method, and MAY be more general. This is to prevent introducing accidental `MethodError`s where there were none before.
- A method marked `api` MAY NOT introduce an error where there was none before, without that being done in a breaking release.
- A method marked `api` MAY change the return type of a given set of input arguments, even in a non-breaking release. Developers are free to strengthen this to `MAY NOT` if they feel it appropriate for their function/method.
- A method marked `api` MAY remove an error and introduce a non-throwing result.
- A method marked `api` MAY change one error type to another, but developers are free to strengthen this to `MAY NOT` if they feel it appropriate for their function.
This is not enforced in the compiler (it can't do so without versioned consistency checking between executions/compilations, though some third party tooling could implement such a mechanism for CI checks or similar), but serves as a semantic guideline to be able to anticipate breaking changes and allow developers to plan around and test for them more easily. The exact semantics a method that is marked as API must obey to be considered API, apart from its signature and the above points, are up to the developers of a function.
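The coverage requirement for replacement methods can be checked with Julia's existing subtype operator. The following is a minimal sketch of what third-party tooling might do: `covers` is a hypothetical helper, not part of Base or of this proposal, and the signatures are simplified to argument tuples (a real method signature also carries the function type as its first element).

```julia
# Hypothetical helper: does the union of the replacement signatures cover
# every call that the original signature accepted?
covers(original::Type{<:Tuple}, replacements::Type{<:Tuple}...) =
    original <: Union{replacements...}

original = Tuple{Real, Real}   # argument types of the method being split up

# Non-breaking split: every (Real, Real) call still finds a method.
ok_split = covers(original, Tuple{Integer, Real}, Tuple{Real, Real})

# Breaking split: e.g. a (Rational, Real) call would now hit a MethodError.
bad_split = covers(original, Tuple{Integer, Real}, Tuple{AbstractFloat, Real})
```

Here `ok_split` is `true` and `bad_split` is `false`. A check like this only covers the signature rule; whether a method newly throws or changes its return type cannot be read off the signatures.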
Depending on the use case (for example, some interface packages), it is desirable to mark all methods of a function as API. As a shorthand, the syntax
declares ALL methods of `bar` to be public API as an escape hatch, i.e., the above syntax declares the function `bar` to be API, not just individual methods. In Base-internal parlance, the `api` keyword on a single method only marks an entry in the method table as API, while the use on a zero-arg definition marks the whole method table as API. An API mark on the whole method table trumps a nonexistent mark on a single method: it effectively acts as if there were a method taking a sole `::Vararg{Any}` argument and marking that as API.
`struct`
The cases elaborated on here are (effectively) already the case today, and are only mentioned here for clarity. They are (mostly) a consequence of subtyping relationships and dispatch.
Consider this example:
A struct like the above annotated with `api` guarantees that the default constructor methods are marked as `api`. The subtyping relationship is considered API under semver, up to and including `Any`. The existence of the fields `a` and `b` is considered API, as well as their relationship to the type parameters `T` and `S`.
In the example above, the full chain `MyStruct{T,S} <: AbstractFoo <: Any` is considered API under semver, which means that methods declared as taking either an `AbstractFoo` or an `Any` argument must continue to also take objects of type `MyStruct`. This means that changing a definition like the one above into one like this
is a breaking change under semver. It is however legal to do the following:
because the old subtyping chain `MyStruct{T,S} <: AbstractFoo <: Any` is a subchain of the new chain `MyStruct{T,S} <: AbstractBar <: AbstractFoo <: Any`. That is, it is legal to grow the subtyping chain downwards.
Notably, making `MyStruct` API does not mean that `AbstractFoo` itself is API, i.e. adding new subtypes to `AbstractFoo` is not supported and is not considered API purely by annotating a subtype as `api`.
Since the new type in a changing release must be usable in all places where the old type was used, the only additional restriction placed on `MyStruct` as defined above is that no type parameters may be removed. Due to the way dispatch is lazy in terms of matching type parameters, it is legal to add more type parameters without making a breaking change (even if this makes uses of things like `MyStruct{S,T}` in structs containing objects of this type type unstable).
In regards to whether field access is considered API or not, it is possible to annotate individual fields as `api`:
This requires the main struct to be annotated as `api` as well; annotating a field as API without also annotating the struct as API is illegal. This means that accessing an object of type `MyStruct` via `getfield(::MyStruct, :b)` or `getproperty(::MyStruct, :b)` is covered under semver and considered API. The same is not true of the field `a`, its type or the connection to the first type parameter, the layout of `MyStruct` or the internal padding bytes that may be inserted into instances of `MyStruct`.
`abstract type`
`abstract type` behaves similarly to `struct`, in that it is illegal to remove a type from a subtype chain, while it is legal to extend the chain downwards or to introduce new supertypes in the supertype chain.
Consider this example:
The following changes do not require a breaking version:
The following changes require a new breaking version:
What the `api` keyword used on abstract types effectively means for the users of a package is that it is considered API to subtype the abstract type, to opt into some behavior/set of API methods/dispatches the package provides, as long as the semantics of the type (usually detailed in its docstring) are followed. In particular, this means that methods like `api function foo(a::MyAbstract)` are expected to work with new objects `MyConcrete <: MyAbstract` defined by a user, but methods like `function bar(a::MyAbstract)` (note the lack of `api`) are not.
At the same time, a lack of `api` can be considered an indicator that it is not generally expected nor supported to subtype the abstract type in question.
type annotated/`const` global variables
Marking a global `const` or type annotated variable as `api` means that the variable binding is considered API under semver and is guaranteed to exist in new versions as well. For a type annotated global variable, both reading from and writing to the variable by users of a package is considered API, while for `const` global variables only reading is considered API, writing is not API.
global variables only reading is considered API, writing is not API.The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving less guarantees) are expected to continue to work.
Non-type-annotated global variables can never be considered API, as the variable can make no guarantees about the object in question and any implicit assumptions of the object that should hold ought to be encoded in an abstract type representing those assumptions/invariants. It is legal to explicitly write `api Bar::Any = ...`.
It should be noted that it is the variable binding that is considered API, not the object the variable refers to. It is legal to document additional guarantees or requirements of an object being referred to through a binding marked as `api`.
`module`
Annotating an entire `module` expression with `api` means that all first-level symbols defined in that module are considered API under semver. This means that they cannot be removed from the module and accessing them should return an object compatible with the type that same binding had in the previous version.
Consider this example:
In this example, `Foo`, `Foo.f`, `Foo.Bar`, `Foo.Bar.baz` are considered API, while `Foo.Bar.bak` is not.
Consider this other example:
In this example, `Foo.g`, `Foo.Bar`, `Foo.Bar.baz` and `Foo.Bar.bak` are considered API, while `Foo` and `Foo.f` are not.
Consider this third example:
Only `Foo.Bar.baz` is considered API, the other names in and from `Foo` and `Foo.Bar` are not.
Uses
This is a list of imagined uses of this functionality:
- … `api` bindings available in the final image/binary/shared object
- … `api` marker could be used for not mangling the names of julia functions when compiling an `.so`, as currently all names are mangled by default (unless marked as `@ccallable`, if I'm not mistaken, which is limited to taking C-compatible types in a C-style ABI/calling convention).
Do you have ideas? Mention them and I'll edit them in here!
Required Implementation Steps
- … `api` keyword in the correct places and produce an expression the compiler can use later on
- … `api` marker handling to methods and the method table implementation, as well as to binding lookup in modules
- … `help>` mode aware of `api` tags.
- … `api`
Known Difficulties with the proposal
`Expr(:function)` does not have space in its first two arguments for additional metadata, so this would need to be added to either a third argument, or create a new `Expr(:api_function)`. Analogous issues exist for `Expr(:struct)`, `Expr(:=)`, `Expr(:module)` etc. Both approaches potentially require macros to be changed, to be aware of `api` in their expansion.
FAQ
… requires quite deep changes to `Method` and other (internal?) objects of `Base`, exposing this as a macro would also mean exposing this as an API to the runtime, even though this `api` distinction is not about dynamicness: the `api` surface of a package really ought to be fixed in a given version, and not change dynamically at runtime.
… `private` instead?
… into the language. Additionally, marking things as `private`, `internal` or similar instead of `api` means that any time a developer accidentally forgets to add that modifier, adding it later is a technically breaking change in a release. The whole point of this proposal is to avoid this kind of breakage.
… `public`?
`public` overloads this already overloaded term in the wider programming community too much. `public`/`private` are commonly associated with access modifiers, which is decidedly not what this proposal is about.
… `api`, which makes its intent very clear. It would also be prudent to have that discussion after we've come to a compromise about the desired semantics.
… `export`?
`export` is a bit tricky, since it doesn't distinguish between methods the way `api` does. I think it could work to mark all `export`ed symbols with `api` as well (this is certainly not without its own pitfalls...), though I also think that `export` is a bit of an orthogonal concept to `api`, due to the former being about namespacing and the latter being exclusively about what is considered to be actually supported. I think a good example is the way `save`/`load` are implemented with FileIO.jl. While the parent interface package exports `save` and `load`, packages wishing to register a new file format define new, private functions for these and register those on loading with FileIO (or FileIO calls into them if they're in the environment). This means that `MyPkg.save` is not exported from `MyPkg`, but is nevertheless a supported API provided by `MyPkg`. The intention is to support these kinds of use cases, where `export` is undesirable for various reasons, while still wishing to provide a documented/supported API surface to a package.
I hope this proposal leads to at least some discussion around the issues we face or, failing to get implemented directly, hopefully some other version of more formalized API semantics being merged at some point.