Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taking API surface seriously: Proposing Syntax for declaring API #49973

Open
Seelengrab opened this issue May 27, 2023 · 123 comments
Open

Taking API surface seriously: Proposing Syntax for declaring API #49973

Seelengrab opened this issue May 27, 2023 · 123 comments
Labels
design Design of APIs or of the language itself feature Indicates new feature / enhancement requests

Comments

@Seelengrab
Copy link
Contributor

Seelengrab commented May 27, 2023

The general goal of this proposal is to formalize what it means to be considered API in julia, to give a framework to talk about what even is a "breaking" change to make it easier for developers to provide more documentation for their packages as well as avoiding breakage in the first place. The reasoning in this document comes from the current practices in use in the ecosystem, as well as type theoretic requirements of the type system, though having a background in type theory is not required to participate in the discussion.

Please do reply if you have something to add/comment on!

Motivation

The current stance of Base and much of the package ecosystem of what is considered to be API under semver is "If it's in the manual, it's API". Other approaches are "If it has a docstring it's API", "If it's exported, it's API", "If it's in the Internal submodule it's API" or "If we explicitly mention it, it's API". These approaches are inconsistent with each other and have a number of issues:

  • Discoverability
    • As a user writing code, it is nigh impossible to figure out whether a given function they found being used in some other code is stable under semver or not, unnecessarily increasing the chances for accidental breakage later on.
    • Tooling can't reliably query/show whether some symbol/object is stable API or not, because the inconsistencies between the approaches only lend themselves to bespoke solutions on a package by package basis.
  • Maintainability
    • The fuzziness of some of these approaches disincentivizes writing documentation for internal functionality, for fear of others taking the existence of a docstring as guarantor for the function in question being API.
    • This fuzziness and lack of internal documentation increases the "time to first contribution" by newcomers and returning developers as well, due to having to reverse engineer the package to contribute in the first place.
    • "Unofficial" API surfaces grow that are using internals (often because the developers in question wrote the internal in the first place), leading to a sort-of tacit implication that it is ok to use internal functionality of Base/a package.
    • Accidental exposure of an internal as API (even just perceived API!) generally means that removing that internal would necessitate a breaking change - this doesn't scale well, and would quickly lead to not being able to change any code.
    • Due to the fuzziness of what exactly is considered API, deprecations are difficult - @deprecate is hard to use correctly and often not used at all.
  • Reviewability
    • It is all too easy to accidentally rely on an internal of another package due to the bad discoverability of what actually is API. This increases the burden on reviewers to check contributions for whether or not the functionality a PR is introducing is relying on/bringing in internal uses, even if they may not be familiar with the dependency in question.
    • The lax stance around what exactly is considered API and the fact that internals rarely get docstrings or even comments explaining what the code does means that contributions don't usually get checked for whether they (accidentally or not) increase the API surface or whether a particular aspect of a contribution or new feature ought to have a documented API in the first place.

This proposal has three parts - first, presenting a full-stack solution to both the Discoverability and Maintainability issues described above, and second, proposing a small list of things that could be done with the proposed feature to make Reviewability easier. Finally, the third part is focused on a (preliminary, nonexhaustive) "How do we get there" list of things required to be implemented for this proposal.

There is also a FAQ list at the end, hoping to anticipate some questions about this proposal that already came up in previous discussions and thoughts about this design.

The api keyword

The main proposal for user-facing interactions with declaring whether an object is the new api keyword. The keyword can be placed in front of definitions in global scope, like struct, abstract type, module, function and const/type annotated global variables. Using api is declaring the intent of a developer, about which parts of the accessible symbols they consider API under semver and plan to support in newer versions.

When a project/environment importing a package wants to access a symbol not marked as API (if it is not the same project/environment originally defining the symbol), a warning is displayed, making the user aware of the unsupported access but doesn't otherwise hinder it. This behavior should be, to an extent, configurable, to support legitimate accesses of internals (insofar those exist). There is the caveat that silencing these warnings makes no difference to whether or not the access is supported by a developer. This is intended to provide an incentive to either use the supported subset of the API, or encourage a user to start a discussion with the developer to provide an API for what they would like to do. The result of such a discussion can be a "No, we won't support this"! This, however, is a far more desirable outcome to accessing internals and fearing breakage later on, if it would have been avoided by such a discussion.

The following sections explain how api interacts with various uses, what the interactions ought to mean semantically as well as the reasoning for choosing the semantics to be so.

function

Consider this example:

api function foo(arg1, arg2)
   arg1 + arg2
end

# equivalently:
# api foo(arg1, arg2) = arg1 + arg2

This declares the function foo, with a single method taking two arguments of any type. The api keyword, when written in front of such a method definition, only declares the given method as API. If we were to later on define a new method, taking different arguments like so

function foo(arg1::Int, arg2::Float64)
  arg1 * arg2
end

the new method would not be considered API under semver. The reasoning for this is as simple - once a method (any object, really) is included in a new release as API, removing it is a breaking change, even if that inclusion as API was accidental. As such, being conservative with what is considered API is a boon to maintainability.

Being declared on a per-method case means the following:

  • A method annotated with api MAY NOT be removed in a non-breaking release, without splitting the existing method into multiple definitions that are able to fully take on the existing dispatches of the previous single method. In type parlance, this means that the type union of the signatures of the replacement methods MUST be at least as specific as the original method, but MAY be less specific. This is to prevent introducing accidental MethodErrors where there were none before.
  • A function/method annotated with api MAY NOT introduce an error where there was none before, without that being done in a breaking release.
  • A function/method annotated with api MAY change the return type of a given set of input arguments, even in a non-breaking release. Developers are free to strengthen this to MAY NOT if they feel it appropriate for their function/method.
  • A function/method annotated with api MAY remove an error and introduce a non-throwing result.
  • A function/method annotated with api MAY change one error type to another, but developers are free to strengthen this to MAY NOT if they feel it appropriate for their function.

This is not enforced in the compiler (it can't do so without versioned consistency checking between executions/compilations, though some third party tooling could implement such a mechanism for CI checks or similar), but serves as a semantic guideline to be able to anticipate breaking changes and allow developers to plan around and test for them easier. The exact semantics a method that is marked as API must obey to be considered API, apart of its signature and the above points, are up to the developers of a function.

Depending on usecase (for example, some interface packages), it is desirable to mark all methods of a function as API. As a shorthand, the syntax

api function bar end

declares ALL methods of bar to be public API as an escape hatch - i.e., the above syntax declares the function bar to be API, not just individual methods. In Base-internal parlance, the api keyword on a single method only marks an entry in the method table as API, while the use on a zero-arg definition marks the whole method table as API. An API mark on the whole method table trumps a nonexistent mark on a single method - it effectively acts as if there were a method taking a sole ::Vararg{Any} argument and marking that as API.

struct

The cases elaborated on here are (effectively) already the case today, and are only mentioned here for clarity. They are (mostly) a consequence of subtyping relationships and dispatch.

Consider this example:

abstract type AbstractFoo end

api struct MyStruct{T, S} <: AbstractFoo
    a::T
    b::S
end

A struct like the above annotated with api guarantees that the default constructor methods are marked as api. The subtyping relationship is considered API under semver, up to and including Any. The existence of the fields a and b are considered API, as well as their relationship to the type parameters T and S.

In the example above, the full chain MyStruct{T,S} <: AbstractFoo <: Any is considered API under semver, which means that methods declared as taking either an AbstractFoo or an Any argument must continue to also take objects of type MyStruct. This means that changing a definition like the one above into one like this

abstract type AbstractBar end
abstract type AbstractFoo end

api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end

is a breaking change under semver. It is however legal to do the following:

abstract type AbstractFoo end
abstract type AbstractBar <: AbstractFoo end

api struct MyStruct{T,S} <: AbstractBar
    a::T
    b::S
end

because the old subtyping chain MyStruct{T,S} <: AbstractFoo <: Any is a subchain of the new chain MyStruct{T,S} <: AbstractBar <: AbstractFoo <: Any. That is, it is legal to grow the subtyping chain downwards.

Notably, making MyStruct API does not mean that AbstractFoo itself is API, i.e. adding new subtypes to AbstractBar is not supported and is not considered API purely by annotating a subtype as API.

Since the new type in a changing release must be useable in all places where the old type was used, the only additional restriction placed on MyStruct as defined above is that no type parameters may be removed. Due to the way dispatch is lazy in terms of matching type parameters, it is legal to add more type parameters without making a breaking change (even if this makes uses of things like MyStruct{S,T} in structs containing objects of this type type unstable).

In regards to whether field access is considered API or not, it is possible to annotate individual fields as api:

api struct MyStruct{T,S}
    a::T
    api b::S
end

This requires the main struct to be annotated as api as well - annotating a field as API without also annotating the struct as API is illegal. This means that accessing an object of type MyStruct via getfield(::MyStruct, :b) or getproperty(::MyStruct, :b) is covered under semver and considered API. The same is not true of the field a, its type or the connection to the first type parameter, the layout of MyStruct or the internal padding bytes that may be inserted into instances of MyStruct.

abstract type

abstract type behaves similarly to struct, in that it is illegal to remove a type from a subtype chain while it being legal to extend the chain downwards or to introduce new supertypes in the supertype chain.

Consider this example:

abstract type AbstractBar end
api abstract type MyAbstract <: AbstractBar end
# MyAbstract <: AbstractBar <: Any

The following changes do not require a breaking version:

# introducing a supertype
abstract type AbstractBar end
abstract type AbstractFoo <: AbstractBar end
api abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: AbstractFoo <: AbstractBar <: Any

The following changes require a new breaking version:

# removing a supertype
api abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: <: Any
# removing the `api` keyword
abstract type AbstractBar end
abstract type MyAbstract <: AbstractFoo end
# MyAbstract <: AbstractBar <: Any

What the api keyword used on abstract types effectively means for the users of a package is that it is considered API to subtype the abstract type, to opt into some behavior/set of API methods/dispatches the package provides, as long as the semantics of the type (usually detailed in its docstring) are followed. In particular, this means that methods like api function foo(a::MyAbstract) are expected to work with new objects MyConcrete <: MyAbstract defined by a user, but methods like function bar(a::MyAbstract) (note the lack of api) are not.

At the same time, a lack of api can be considered an indicator that it is not generally expected nor supported to subtype the abstract type in question.

type annotated/const global variables

api MyFoo::Foo = ...
api const MyBar = ...

Marking a global const or type annotated variable as api means that the variable binding is considered API under semver and is guaranteed to exist in new versions as well. For a type annotated global variable, both reading from and writing to the variable by users of a package is considered API, while for const global variables only reading is considered API, writing is not API.

The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving less guarantees) are expected to continue to work.

Non-type-annotated global variables can never be considered API, as the variable can make no guarantees about the object in question and any implicit assumptions of the object that should hold ought to be encoded in an abstract type representing those assumptions/invariants. It is legal to explicitly write api Bar::Any = ....

It should be noted that it is the variable binding that is considered API, not the the variable refers to itself. It is legal to document additional guarantees or requirements of an object being referred to through a binding marked as api.

module

Annotating an entire module expression with api means that all first-level symbols defined in that module are considered API under semver. This means that they cannot be removed from the module and accessing them should return an object compatible with the type that same binding had in the previous version.

Consider this example:

api module Foo
    module Bar
      api const baz = 1
      const bak = 42
    end

    f() = "hello"
end

In this example, Foo, Foo.f, Foo.Bar, Foo.Bar.baz are considered API, while Foo.Bar.bak is not.

Consider this other example:

module Foo
    api module Bar
      const baz = 1
      const bak = 42
    end

    f() = "hello"
    api g() = 
end

In this example, Foo.g, Foo.Bar, Foo.Bar.baz and Foo.Bar.bak are considered API, while Foo and Foo.f are not.

Consider this third example:

module Foo
    module Bar
      api const baz = 1
      const bak = 42
    end

    f() = "hello"
end

Only Foo.Bar.baz is considered API, the other names in and from Foo and Foo.Bar are not.

Uses

This is a list of imagined uses of this functionality:

  • Linting hints in LSP.jl, to make users aware of accidentally using internals of a package/Base.
  • API surface tracking over time, especially in regards to test coverage and breaking changes.
  • Tree-shaking for Pkg images/static compilation, to only make api bindings available in the final image/binary/shared object
    • This is primarily thought of in the context of compilation to a shared object - the api marker could be used for not mangling the names of julia functions when compiling an .so, as currently all names are mangled by default (unless marked as @ccallable, if I'm not mistaken, which is limited to taking C-compatible types in a C-style ABI/calling convention).
  • Automatic generation of documentation hints about API surface in Documenter.jl
  • Easier tracking of deprecated functionality before it is ultimately removed in a breaking change
  • Have a PR template mentioning "Does this PR introduce new API?"
  • CI Check enforcing that new API bindings have a docstring associated with them

Do you have ideas? Mention them and I'll edit them in here!

Required Implementation Steps

  • Parse the api keyword in the correct places and produce an expression the compiler can use later on
  • Add api marker handling to methods and the method table implementation, as well as to binding lookup in modules
  • Make the REPL help> mode aware of api tags.
  • Go through all of Base and the stdlibs and mark the bindings currently residing in the manual as api

Known Difficulties with the proposal

  • Expr(:function) does not have space in its first two arguments for additional metadata, so this would need to be added to either a third argument, or create a new Expr(:api_function). Analogous issues exist for Expr(:struct), Expr(:=), Expr(:module) etc. Both approaches potentially require macros to be changed, to be aware of api in their expansion.
  • There is a lot of doc churn, but sadly there's no way around that. My hope is that this can make it easier to just write more docstrings, by virtue of not "promoting" a function to API status just by virtue of having a docstring.
  • This proposal requires quite a bit of julia-internals expertise, touching the parser, probably lowering as well, lots of internal objects/state, the REPL and external packages. It's a very large amount of work, and there's likely no chance of any one person being able to implement all of it - this will be a group effort.

FAQ

  • Why not a macro instead of new syntax?
    • A macro has the disadvantage of not composing nicely with existing macros. Additionally, since this
      requires quite deep changes to Method and other (internal?) objects of Base, exposing this
      as a macro would also mean exposing this as an API to the runtime, even though this api distinction is
      not about dynamicness - the api surface of a package really ought to be fixed in a given version,
      and not change dynamically at runtime.
  • What about annotating bindings as private instead?
    • There is no intention of preventing access to an internal object or otherwise introduce access modifiers
      into the language. Additionally, marking things as private, internal or similar instead of api
      means that any time a developer accidentally forgets to add that modifier means a technically breaking
      change in a release by adding that. The whole point of this proposal is to avoid this kind of breakage.
  • What about naming the keyword public?
    • As there is no intention to provide access modifiers, I feel like naming this public overloads this
      already overloaded term in the wider programming community too much. public/private are commonly
      associated with access modifiers, which is decidedly not what this proposal is about.
  • What about naming the keyword ?
    • Bikeshedding the name is always welcome, though I think it hard to compete with the short conciseness of
      api, which makes its intent very clear. It would also be prudent to have that discussion after we've
      come to a compromise about the desired semantics.
  • How does this proposal interact with export?
    • export is a bit tricky, since it doesn't distinguish between methods the way api does. I think
      it could work to mark all exported symbols with api as well (this is certainly not without its
      own pitfallse..), though I also think that export
      is a bit of an orthogonal concept to api, due to the former being about namespacing and the latter
      being exclusively about what is considered to be actually supported. I think a good example is
      the way save/load are implemented with FileIO.jl. While the parent interface package exports save
      and load, packages wishing to register a new file format define new, private functions for these
      and register those on loading with FileIO (or FileIO calls into them if they're in the environment).
      This means that MyPkg.save is not exported from MyPkg, but is nevertheless a supported API
      provided by MyPkg. The intention is to support these kinds of usecases, where export is
      undesirable for various reasons, while still wishing to provide a documented/supported API surface
      to a package.
  • Why not prototype this in a package?
    • There are various prototypes of similar things in some packages, none of which has been widely adopted as far as I know and I think this is something Base itself could really use as well. Not to mention that starting this in a package is IMO going to splinter the "this is how we define API" discussion further.

I hope this proposal leads to at least some discussion around the issues we face or, failing to get implemented directly, hopefully some other version of more formalized API semantics being merged at some point.

@c42f
Copy link
Member

c42f commented May 30, 2023

I think the motivation here is solid! Well said :-)

I don't so much like the feel like the proposed api keyword directly in front of syntactic constructs. Basically for similar reasons that I'm against the same thing for export, as expressed here #8005 (comment)

Also I don't understand why api would make sense as a per-method thing rather than a per-function thing. It's generic functions which are named/called by the user's code, not individual methods. So the entire generic function should have the same API status.

Did you consider the alternative of having the API declared separately, in a similar way to how the export list currently is? Something more with the flavor of the proposal in #30204?

For example, we could have scoped export:

scoped export A, B

The idea is to export A and B as public API, but keep their names "scoped" within the namespace of the exporting module.

A nice thing about this is that it could be implemented as a contextual keyword: the scoped would only be a keyword when followed by export in the same way that mutable is not a keyword unless followed by struct. Thus not breaking any existing code. It would also be easy to support in Compat.jl as @scoped export A, B (which would just return export A, B on older Julia versions, and would retain the usual special case parsing rules for export lists).

Regardless of my quibbles, I think this is an important issue and it's great that you've put the effort into writing up your thoughts. Thank you!

@inkydragon inkydragon added design Design of APIs or of the language itself feature Indicates new feature / enhancement requests labels May 30, 2023
@Seelengrab
Copy link
Contributor Author

Did you consider the alternative of having the API declared separately, in a similar way to how the export list currently is?

I explicitly haven't mentioned other proposals because I want to see how people react to this taken at face value - not to mention that I think most other ideas in this space are a bit underspecified in what exactly the desired semantics are. I think that's supported by the amount of discussion around what the various PRs & issues mean & want, which is what I attempted to give here directly. Most of the proposal follows from the requirements imposed by "what is a breaking change" when looking at functions, methods, variables etc from the POV of a user upgrading a version of a package.

My reasoning for placing api directly in front, rather than as an explicit list, is that it gives a small incentive/reminder to be careful when working on that code. The whole point is to be more explicit & thoughtful with what could be breaking changes, and I think having the API marker at the place of definition helps with that (it also makes it obvious that there's an undocumented API thing, which makes it extremely easy for newcomers to come in with a docstring PR, since they then "only" have to figure out what a function is doing to be able to document it, instead of having to figure out that there's an undocumented thing in the first place).

Another reason is that api is a relatively weak marker, giving only the most necessary semantics required from the type system (in case of methods/dispatch/structs/abstract types). For any more details what api means in the context of a given package, a docstring giving the exact surface is required - otherwise, every single bit of observable behavior could be taken as supported API, meaning you suddenly couldn't change the time complexity of a function anymore, for example.

And a third reason - if the whole function is always considered API, what about new methods added from outside of the package defining the function? Are they also automatically considered API, and who supports them? That's not something you can do with export - you can't export new names from a foreign module (without eval, but that breaks everything anyway). I think it's better to be a bit more conservative here - while the symbol is important in terms of what is directly accessible at all, often not the entire method surface is considered API, and while there could be a an argument made for subtly enforcing forwards to actual internal functions, I think that's quite an unnecessary indirection, requiring possible duplication of docs to devdocs, duplication of (future) additions in permitted dispatch, taking even more care to double check that you don't accidentally violate an invariant guaranteed by the parent caller etc. This, to me, feels like too much additional boilerplate, when we already have dispatch to distinguish these cases from each other - so why not expose that in API as well?

Finally, I don't think it's difficult per se to add something like

api foo, bar, baz

as a syntax for an API list, marking all of these symbols as API. The difficulty lies in the semantics - what the semantics of that api marker are depends strongly on what foo, bar and baz are, and so using such a list requires re-inspection into what foo etc are, at a time when they (potentially) aren't even defined yet (and requires users to jump around in the source code while writing docs - something I'd like to avoid for dev-UX reasons). So I guess that's a fourth reason why I prefer local api - unlike namespacing of symbols, which is a global property due to the symbol being unique, api really has local differences in meaning & requirements. Not to mention that Jeff already expressed dislike for having "more than one way" of writing the same thing when talking about export - maybe he feels differently about api, but we'll have to wait for him to chime in to know that :)

Apologies, this got longer than expected - thanks for asking about it!

A nice thing about this is that it could be implemented as a contextual keyword:

This proposal can also be done like that, no? I've given an explicit list of places where writing api would be legal, after all :) All of them are currently syntax errors, so adding this as a contextual keyword shouldn't break any existing code:

julia> api function foo end
ERROR: syntax: extra token "function" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api module Foo end
ERROR: syntax: extra token "module" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api const bar = 1
ERROR: syntax: extra token "const" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api abstract type MyAbstract end
ERROR: syntax: extra token "abstract" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api struct Foo end
ERROR: syntax: extra token "struct" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

julia> api foo::Int = 1
ERROR: syntax: extra token "foo" after end of expression
Stacktrace:
 [1] top-level scope
   @ none:1

Thank you for taking the time to read through the proposal! I really appreciate your thoughts, especially because of your work on JuliaSyntax. If the scheme parser weren't so hard to jump into and hack on, this might have already been a draft PR instead of a proposal :)

@martinholters
Copy link
Member

I'm a bit skeptical about the per-method thing. At least it needs some good thinking about what would be breaking. Coming from above example

api function foo(arg1, arg2)
   arg1 + arg2
end

A user can now happily do foo(1, 2.0). But the package in a new version now adds

function foo(arg1::Int, arg2::Float64)
  arg1 * arg2
end

The same call as before is now accessing an internal method.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented May 30, 2023

The same call as before is now accessing an internal method.

Yes, that is a good example for an accidental breaking change, of the sort described in the proposal that really must not happen. I realize now that I've only written about the case of removing foo(arg1, arg2), but of course this applies to adding new dispatches as well. I can add something like this to the list of "MUST"s, if you think that's correct:

  • Adding new methods to a function MUST NOT reduce the available API surface. Dispatches that were api before the addition must also be marked api after the addition, such that code relying on the old dispatch doesn't have to access a non-api method.

I think there is room for having this testable as part of a testsuite - for example, one could take the declared signature of all api methods of foo in the last commit, and compare that against the dispatches of those signatures in all methods of foo in the new commit. If any signature would dispatch to a non-api marked method in the new commit, you have a testfailure due to an accidental breaking change. This does have the downside of requiring the testsuite to have awareness of being a testsuite and being able to check against a known-good version, but IMO that's something we should do anyway - we currently just do it (badly) in our heads when writing the code.

Another way this could be done without checking past versions is have something of the sort of

@test_api foo(::Number, ::Number)

to check whether the given signature (and its subtypes) dispatches to an api method or not. I feel like there's also room for improvement there, but I haven't thought about that part too deeply yet (compared to the other parts of the proposal anyway).

@rfourquet
Copy link
Member

rfourquet commented May 30, 2023

I don't have much insignt into the merits of marking api per-method vs per-function, but I fear that per-method would be too complex and lead many devs to just don't bother with api because of cognitive overhead. On the other hand, the rule for per-function seems straightforward enough: "an api function must not break use cases allowed by the attached docstring". And for those motivated, it seems the code can always be re-organized to make "non api methods" unavailable by renaming them with e.g. a leading underscore.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented May 30, 2023

Well conversely, the api function foo end functionality would already give that exact behavior 🤷 I think a per-method annotation could be a boon, as described above, but I'm willing to drop that if others feel that it's too much mental overhead (though I'll gripe about this, since it'll inevitably lead to more accidental breakages down the road 😛 the complexity of API surface doesn't vanish by marking the whole function as api after all, it just hides it away until it becomes an issue).

In either case, I think it's good that this discussion is happening - keep the feedback coming!

@dalum
Copy link
Contributor

dalum commented May 30, 2023

Thanks for a well-written proposal! I 100% agree that there is an issue with a lack of consensus around what makes an API of a package, and that it would be nice to have a rigid way to specify this.

However, first, in the bikeshed department, I don't like the keyword api. Unlike all other keywords, it's a lowercase form of an abbreviation, and I'd much rather see a keyword that isn't pronounced letter-by-letter. Or maybe this would be the motivation to start pronouncing it "ay-pai". I'm also not even sure if we need a new keyword. It feels like something that could be added to the docstring for special-handling. That would also nudge developers to write docstrings for things considered public API. Something like, if the docstring starts with @API:

"""@API
    my_function(args...)
    
This is a nicely documented function that does what you expect.
"""
function my_function(args...) end

Second, I'm a bit worried about how the notion of an API surface and the genericity of Julia code composes. It is often very convenient to define a function like my_public_function(::Any, ::Any), which then, in principle, has an API surface covering anything the user throws at it that doesn't error, but in practice, and as perhaps written in its docstring, it requires some trait-like behaviour. I can definitely foresee situations where my_public_function would work for some arguments by accident in one version, and through a non-breaking change, according to the docstring of the function, suddenly wouldn't work in a later version, because the internal implementation changed. While it is, in a sense, problematic that the user code breaks in this case, I think a strength of Julia comes from actually being a bit fuzzy here. However, if I understand the semantics you proposed correctly, this would be considered an accidental breaking change that should "never happen", which I guess I then disagree with.

Third, I think if something like this is implemented, having the notion of an "experimental API" or something similar will be invaluable as a way to experiment with new features, while declaring the intent that an interface could become stable in a later, non-breaking release.

Finally, like with export, I do not think it is useful to think of the API at a method level. I think there has been a fairly strong push in Julia to make sure that a given function has just one meaning, which means that methods are simply implementation details on how to implement that functionality for a given set of types. I wouldn't want to propose a convention where only specific methods are considered part of the public API. Something feels off to me to think about my_function being public if you call it with types X and Y, but internal if you call it with Z. This does clash a bit with declaring the API in the docstring, but that might not be important.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented May 30, 2023

Thank you for your time and discussing the proposal!

However, first, in the bikeshed department, I don't like the keyword api.

The bikeshed is noted, but barring an alternative.. 🤷 I'm only partial to api myself because it's a ridicolously well-known concept that fits the intended meaning precisely, as well as being quick to type and unambiguous with other keywords (tab completion can then work its magic wonders). I do agree that it being an abbreviation is kind of bad though. Suggestions are welcome! :)

I'm also not even sure if we need a new keyword. It feels like something that could be added to the docstring for special-handling. That would also nudge developers to write docstrings for things considered public API. Something like, if the docstring starts with @API:

I originally considered proposing an API-string macro, used like so:

api"""
    my_function(args...)
    
This is a nicely documented function that does what you expect.
"""
function my_function(args...) end

and implemented like

struct APIMarker
    doc::String
end

macro api_str(s::String)
     APIMarker(s)
end

which would give any documentation rendering the ability to distinguish API from non-API. The issue with that or with placing the marker in a docstring is that you now need to parse & process the docstring to be able to say "this is API" for doing code analysis. It's no longer a static property of the source code as written, and is something that can change at runtime - Base.Docs allows a developer mutating documentation at runtime as they see fit, which isn't really something that should be done for what is considered the API surface of a package, in a given version - that should be static information, hence the proposal for syntax. I really do think this needs to be a language feature.

Second, I'm a bit worried about how the notion of an API surface and the genericity of Julia code composes.

That whole paragraph is a very good point, and I have thought about that! There's also a subtlety that I forgot to put into my original proposal, so let me take this opportunity to reply and add that subtlety here as well, by lieu of an example.

Consider a function with a single method like

function foo(arg1, arg2)
   arg1 + arg2
end

that we say is API of some package. It's signature is Tuple{typeof(foo), Any, Any}, and taken at face value, we might think that we can call foo with literally any two objects. This is of course wrong - there's an implicit constraint on the types of arg1 and arg2 to also have a method on +, taking their types - Tuple{typeof(+), typeof(arg1), typeof(arg2)}. Neither exporting a whole function name, nor marking a single method as api can represent that constraint accurately, and in fact we don't even currently have ways of discovering those kinds of constrains (I did play around with abstract interpretation for this, but quickly ran into dynamic dispatch ruining my day, so I postponed that idea). This is quite a bit more powerful a requirement than even Haskell does with its typeclasses, I think, though I don't have a proof of that. While typing arguments as Any is tempting and can often help with interoperability, it's really unfortunate that these kinds of implicit constraints in reality constrain Any further, so it wasn't ever going to work with truly Anything anyway. I guess the "hot take" here is that any API function that claims to take Any is a lie, and shouldn't be trusted (well, other than the identity function - at least, I haven't seen any object that couldn't be returned from a function. Sounds like a fun challenge..).

So, barring being able to express those kinds of constraints, the next best thing is to say "well, I support this signature, provided you implement functions X, Y, Z with arguments W, V, S on your type T" in a docstring. The foo(::Any, ::Any) example is just the most extreme case of this; the use case I had in mind when writing this was something along the lines of

api function foo(x::MyAbstract, y)
    2x+3y
end

and being able to express "I only support foo with a first argument that's obeying the constraints of MyAbstract", effectively transferring the constraint of *(::Int, ::MyAbstract) to the contract/api that MyAbstract provides/that subtypes of MyAbstract are required to provide. From my POV, an abstract type is already what we'd call an informal "interface" in other languages - they just don't compose at all, due to only being able to subtype a single abstract type, leading to the proliferation of Any in signatures (which ends up hiding the true required surface).

I can definitely foresee situations where my_public_function would work for some arguments by accident in one version, and through a non-breaking change, according to the docstring of the function, suddenly wouldn't work in a later version, because the internal implementation changed.

Well, I don't per se see an issue with that - not every implementation detail is part of the API of a function. While some implicit constraints on the arguments can be, in practice those may be considered an implementation detail, and if they are, it's fine if non-breaking change (according to the docstring/additional documentation of the method) the breaks some user code that relied on them. The correct solution here is to lift the actual requirements to the type domain, by using a dispatch type for those kinds of invariants/requirements, not to make the API more restrictive by lifting the implicit constraints to Any.

Third, I think if something like this is implemented, having the notion of an "experimental API" or something similar will be invaluable as a way to experiment with new features, while declaring the intent that an interface could become stable in a later, non-breaking release.

I'd implement that with this proposal by having an inner module Experimental, like Base.Experimental, which isn't marked as api (perhaps individual functions/methods inside of that module are marked API though, to make clear which part of the Experimental module is considered part of the experimental API and which parts are just implementation details of the experimental functionality). The reasoning for this is that experimental features, by definition, can't be relied upon under semver, as they may be removed at any time, so they must not be marked as part of the API surface of the package. Mixing experimental API into the same namespace as stable API seems to me like a way to more accidental breakages, through people assuming that something being marked as soft-API will be stable at some point, so it's fine to rely on it (which is not the case - usage of experimental functionality needs to be guarded by version checks, just like any other use of internal functionality).

Something feels off to me to think about my_function being public if you call it with types X and Y, but internal if you call it with Z.

Well.. from my POV, that's kind of what we're doing all the time though already, by having surface-level API functions like foo that forward their Any typed arguments to _foo or _foo_impl where the actual disambiguation through dispatch (with MethodError to boot if a combination is not supported) happens. All I'm proposing is removing that indirection, by making the API method-specific. If your _foo_impl has a different signature entirely to gather some more dispatchable data you don't need your users to give themselves (see an example here), it's of course still fine to forward to that internal function.

In part, the "API per method" section of the proposal stems from my (perhaps irrational) fear that people are going to slap api on everything to make the warnings go away, and then being undisciplined with actually providing the guarantees they claim to give with that marker. I think this issue already exists today, but it's just masked by noone being able to consistently tell whether something is part of the API of a package or not - my gut says that a surprising amount of code runs on the "happy path" and always uses the latest version of things anyway, but the few times a big upgrade in a version from 1-2 year old code and subsequent breakage of that code came up on slack just got stuck in my mind. If people were able to consistently rely on some given methods being API and developers were able to actually accurately test that their API surface on a method-by-method basis doesn't change, there'd be less of a risk of such breakage occuring. But again, that may be an irrational fear of mine 🤷

@dalum
Copy link
Contributor

dalum commented May 30, 2023

It's signature is Tuple{typeof(foo), Any, Any}, and taken at face value, we might think that we can call foo with literally any two objects. This is of course wrong - there's an implicit constraint on the types of arg1 and arg2 to also have a method on +, taking their types - Tuple{typeof(+), typeof(arg1), typeof(arg2)}. Neither exporting a whole function name, nor marking a single method as api can represent that constraint accurately, and in fact we don't even currently have ways of discovering those kinds of constrains (I did play around with abstract interpretation for this, but quickly ran into dynamic dispatch ruining my day, so I postponed that idea).

I appreciate your point, but I disagree with the analysis. I could define a function:

"""
    multiply_by_two(x)
    
Return `x` multiplied by two.
""""
multiply_by_two(x) = x

This function will never error. It is the identity function. However, it is a buggy implementation, as it does in fact not do what it states. An API is not something that could or, I would argue, even should be analyzed from a code perspective. It's a slightly more fuzzy human-to-human contract written in prose.

It's no longer a static property of the source code as written, and is something that can change at runtime - Base.Docs allows a developer mutating documentation at runtime as they see fit, which isn't really something that should be done for what is considered the API surface of a package, in a given version - that should be static information, hence the proposal for syntax. I really do think this needs to be a language feature.

As my point above, I don't see any compelling argument for why this shouldn't be allowed to change at runtime, though. As I see it, having API declarations in the code is a feature that is purely intended to help users (and potentially linters). Intentionally making it less flexible than the rest of Julia's documentation system seems to be in stark contrast to Julia's dynamic nature, and I think analysers will be able to overcome the docstring-parsing requirement.

The correct solution here is to lift the actual requirements to the type domain, by using a dispatch type for those kinds of invariants/requirements, not to make the API more restrictive by lifting the implicit constraints to Any.

While I agree in principle that it would be nice to use the type system to restrict these cases, based on my own experience, I don't think the type system is the correct level to do this at in Julia. And there are several cases where it would be impractical to require something to subtype a particular abstract type.

Well.. from my POV, that's kind of what we're doing all the time though already, by having surface-level API functions like foo that forward their Any typed arguments to _foo or _foo_impl where the actual disambiguation through dispatch (with MethodError to boot if a combination is not supported) happens. All I'm proposing is removing that indirection, by making the API method-specific. If your _foo_impl has a different signature entirely to gather some more dispatchable data you don't need your users to give themselves (see an example here), it's of course still fine to forward to that internal function.

I also think it would be really nice to have all methods exposed at the surface-level of the dispatch tree, but there are a lot of practical reasons for having function indirection. Often it's with an args... capture (Any-typed), etc., and it'd be very inconvenient to have to explicitly type those in every method.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented May 30, 2023

This function will never error. It is the identity function. However, it is a buggy implementation, as it does in fact not do what it states.

Right, and I'm not claiming that api (or the compiler) could or should be a total, machine readable description of all the invariants a function requires to be called - that would indeed be impractical and against the dynamic nature of julia. What I am saying/proposing is that the ability for a developer to declare "multiply_by_two is part of the API of my package" for a given version on a method-by-method basis is valuable; whether the actual implementation matches the desired behavior for a function is a different matter. Checking that is the job of a testsuite, not some declaration about what a developers sees as public API for their package (which is the only thing I'm concerned about here - giving developers the ability to say "this is what I support", in a consistent, Base supported manner. As such, it IMO needs to be flexible enough to accomodate granularity).

An API is not something that could or, I would argue, even should be analyzed from a code perspective. It's a slightly more fuzzy human-to-human contract written in prose.

I disagree; that's exactly what a testsuite is. A docstring is the prose describing the desired behavior, and a testsuite is the code checking that the functional implementation matches the prose.

While I agree in principle that it would be nice to use the type system to restrict these cases, based on my own experience, I don't think the type system is the correct level to do this at in Julia. And there are several cases where it would be impractical to require something to subtype a particular abstract type.

I agree with you here, this is not currently a desirable thing to do because there's only one possible direct abstract supertype. That's still an orthogonal concern to whether or not a single method is considered API or not though. Maybe that's an argument for holding off on method-level-API until such a time when we can express these kinds of constraints, of saying "I require the invariants these 2+ abstract types provide", better?

@adienes
Copy link
Contributor

adienes commented May 30, 2023

personally, I would not prefer the syntax be a modifier to existing definitions like api function ... because then the declarations are still scattered throughout the code. the primary problem for me is discoverability, which is maximized when the interface(s) are more or less centralized, or in some very clear hierarchical structure

an API is kind of like an "interface for a package," so what if the declaration format were similar to an interface proposal I have seen being discussed recently? https://github.com/rafaqz/Interfaces.jl

Mockup below with a new syntax interface block operating kind of like a header file, with arbitrary "tags" for API. the only information given about the functions is the names and signatures. Hopefully this should provide enough information for good tooling ? anything more substantive should probably just go in a docstring. interface could also support the where keyword so that methods can be made api with as much specificity as desired. also multiple declarations could be made so the entire interface doesn't have to fit in a single format if that pattern is desired. Another benefit of declaring it this way is that things can be made api conditionally via package extensions

module Animals
    export speak, Dogs

    abstract type Animal end
    speak(::Animal) = "Hello, World!"
    speak(::Animal, s <: AbstractString) = "Hello, $s !"

    module Dogs
        using ..Animals

        struct Dog <: Animals.Animal end
        Animals.speak(::Dog) = "woof"

        bark(d::Dog) = Animals.speak(d)
    end

    module Cats
        using ..Animals

        struct Cat <: Animals.Animal end

        Animals.speak(::Cat) = "meow"
        fight(::Cat, ::Cat) = String["hiss", "screech"]
        getalong(::Cat, ::Cat) = nothing
    end
end


interface Animals where T <: AbstractString
    :exported => (
        speak => Tuple{Animal} => String,
        speak => Tuple{Animal, T} => String,
        Dogs => bark => Tuple{Dog} => String
    ),
    :stable => (
        Animals => Cats => speak => Tuple{Cat} => String
    )
end

interface Animals
    :experimental => (
        Animals => Cats => fight => Tuple{Cat, Cat} => Vector{String}
    ),
    :deprecated => (
        Animals => Cats => getalong => Tuple{Cat, Cat} => Nothing
    )
end

@adienes
Copy link
Contributor

adienes commented May 30, 2023

also, possibly this discussion should be folded together with #48819 ?

@Seelengrab
Copy link
Contributor Author

Seelengrab commented May 31, 2023

an API is kind of like an "interface for a package," so what if the declaration format were similar to an interface proposal I have seen being discussed recently? https://github.com/rafaqz/Interfaces.jl

I don't think that should be tackled in this proposal. I don't aim to modify the type system here at all - this is purely for allowing developers to declare their existing API, without having to migrate to new concepts in terms of what they want to develop.

Also, while I can see a path for something like interfaces to be a thing, I don't think the approach in Interfaces.jl is right, in particular because it's a big disruption to the ecosystem due to the amount of "legacy" code we have. I have a different idea for how to tackle that (mostly based on type theoretic consequences/additions to the current type system), which does integrate with this proposal nicely without special cases while keeping code churn as small as possible (not larger than the code churn produced by this proposal, in fact), but that's for a different discussion.

also, possibly this discussion should be folded together with #48819 ?

While that discussion is the first time I wrote down parts of the ideas contained in this proposal, it's not a concrete discussion about a concrete proposal. Thus, I think this issue is more appropriate to talk about this proposal at hand, instead of the larger discussion whether we want something like access modifiers at all. So far, the only major gripe with this proposal seems to be the per-method-API and some minor bikeshedding on the name/syntax to expose this (which I think I have argued sufficiently now why it ought to be syntax, and not dynamic information).

@adienes
Copy link
Contributor

adienes commented May 31, 2023

I don't think that should be tackled in this proposal. I don't aim to modify the type system here at all

The only similarity is superficial. Otherwise, the "interface" is just a tuple of pairs without introducing any new type concepts. It almost doesn't require syntax at all; if modules were types one could just dispatch on some function interface(m::MyModule)

@BioTurboNick
Copy link
Contributor

IMO the less needed by package developers to convert the current code to an API-aware world is better.

Where there is an API function in a package, I would expect any non-API methods of it to be an exception rather than the rule. So it would probably make more sense to allow marking a specific method of an API function as non-API, rather than the reverse? However, the better solution is to just make it a different function. Already pretty common I think to have e.g., myfunc (exported) and _myfunc and __myfunc for next-level internals, for example.

The argument that a keyword will help people be mindful is undermined by having an all-method construct; at that point it's just as easy to define the API methods in a central location and rely on tooling to raise awareness of which functions are API. And a central location is a more natural extension of what we do with export.

As far as the scenario where Package B adds a method to an API function (say, getindex on one of its types) goes, I think you could say that Base.getindex(T::MyType, i...) is an API of Package B if and only if MyType is API type of Package B. And non-API types shouldn't be returned by API methods.

@timholy
Copy link
Member

timholy commented Jun 1, 2023

I have not given this anywhere near the amount of thought you have, @Seelengrab, but I am also a bit skeptical about the per-method thing. I kind of like @c42f's scoped export A, B, which would be fairly minimal intervention needed. I understand your points about proximity but developers can always add this locally:

scoped export f
f(args...) = ...

if they want that, too. (exports don't all have to go at the top of a module.)

As a way of ensuring that detached scoped exports don't lead to violations, we could add Aqua testing for your API rules.

@ToucheSir
Copy link

Trying to draw inspiration from how other languages do this, I'm reminded of how method overrides are handled in languages with classes. If an abstract method is marked as public, any overrides must be public as well. Less familiar with the space of languages with traits/protocols, but the couple I do know of have similar behaviour when it comes to implementing them (visibility of implementer must be >= visibility of trait/protocol methods). Not all languages use SemVer so it's difficult to compare breaking change policies, but the Rust guidelines related to types and functions are quite reminiscent of the ones in this proposal.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Jun 1, 2023

I've segmented the responses according to who I've quoted, so it's easier to follow :)


@BioTurboNick

So it would probably make more sense to allow marking a specific method of an API function as non-API, rather than the reverse? However, the better solution is to just make it a different function. Already pretty common I think to have e.g., myfunc (exported) and _myfunc and __myfunc for next-level internals, for example.

Explicitly marking things as non-API has the issue that it's easy to forget - I mentioned that somewhere, I think. You really don't want to end up in the situation of having to do a technically breaking change in a non-breaking release (by semver) just because you forgot to add the non-API marker. No marker -> private/internal should be the default, rather than the exception. Rust goes as far as not even allowing you to access things that haven't been explicitly made available (which we don't want, and also can't do before 2.0 (but more like 10.0) because it's massively breaking).

As far as the scenario where Package B adds a method to an API function (say, getindex on one of its types) goes, I think you could say that Base.getindex(T::MyType, i...) is an API of Package B if and only if MyType is API type of Package B. And non-API types shouldn't be returned by API methods.

That takes away agency from the developer of Package B, in that it doesn't allow the developer to have a public type, but without having getindex be part of the API of that type. In a sense, requiring this kind of proliferation means that every time you extend the API-function/method foo of some other package with an API-type of your package means that you are automatically now responsible for how the package uses foo internally, and that it doesn't break. This feels a bit hard to do, when you don't have control over that. Selecting API on a per-method basis circumvents that issue, because it allows the developer of package B to say "no, I want to use this internally, but I don't really want to support this publicly".

To be fair, the example is a bit abstract, but I think the only reason something like this hasn't been a problem so far is because we really don't know what the API surface of any given thing is, so we somewhat expect things to break. It's also hard to opt into someone elses code, since we currently only have single abstract subtyping and the most succesful attempt at having such a wide interface package, Tables.jl, doesn't use abstract types for signaling dispatches at all (there are trait functions though). I think that's a missing feature in julia, but that's for another day and another proposal :)


@timholy

I kind of like @c42f's scoped export A, B, which would be fairly minimal intervention needed. I understand your points about proximity but developers can always add this locally:

I'm strongly opposed to calling this a scoped export, because it does nothing of the sort. It is NOT exporting a symbol, and I think conflating the meaning of "this is OK to access, I support this" with export is what brought us into this mess in the first place. It's a subtlety that I think is best to highlight by giving it a different name, so that people think of "This is my API" when using api (or whatever we end up calling it) and "this is what I want you to have directly available after using" when using export.

As a way of ensuring that detached scoped exports don't lead to violations, we could add Aqua testing for your API rules.

I'm not sure Aqua rules are enough for this - it's not a linting thing after all. I think something like @test_api is good to have in Base, since we do have the issue of fuzziness of API in Base too (and it's much more prevalent than in some parts of the ecosystem 😬 )


@ToucheSir

Not all languages use SemVer so it's difficult to compare breaking change policies, but the Rust guidelines related to types and functions are quite reminiscent of the ones in this proposal.

This is no coincidence - the rules laid out both by the Rust guidelines as well as the ones I've described in the proposal are purely a consequence of the type system and what it means to change something from version A to B. It's mostly a type theoretic argument that leads to the interpretation that it ought to be possible to mark individual methods as API. I feel though that to make this point more clear, I'll have to write some more about what I (and I think the wider programming language design community at large) considers a "type". So little time though... Taking that lense, as well as thinking about what SemVer means, leads to a very natural derivation of these rules.

On the flipside, it's of course possible to take a coarser approach and mark whole functions as API (that's also described in the proposal after all, that's api function foo end) and if that's the direction we want to go in, great! As I said elsewhere (and too often), anything to mark "this symbol is intended API, even if not exported" is better than the nothing we have now.

If dropping per-method-API gets more support for this syntax proposal, good - let's do it. We can always add it back later if we feel that we do want that granularity after all, since it's always allowed to widen your API ;)

@timholy
Copy link
Member

timholy commented Jun 1, 2023

I'm strongly opposed to calling this a scoped export

I guess there are two aspects: (1) what you name it (I'm happy to drop that aspect of the proposal), and (2) where you put it. If we drop the per-method idea, then there is no reason you couldn't put these annotations near all your exports. Personally I like that design, it makes it clearer it applies to the function rather than the method.

@ToucheSir
Copy link

To be clear, my point was that Rust appears to support many of the same rules about methods and API compat despite not allowing for private implementations of public methods. That's not to say allowing method-level granularity when specifying API surface area is a bad thing, but it suggests to me that these two pieces are somewhat orthogonal.

@LilithHafner
Copy link
Member

I agree that consensus on what "API" means is good and should probably be documented. Thanks for thinking about this and making a concrete proposal. One of the advantages of a concrete proposal is that I can point to concrete and hopefully-easy-to-fix problems.

in the example above [MyStruct is API and AbstractFoo is not], the full chain MyStruct{T,S} <: AbstractFoo <: Any is considered API under semver

I think this is too strict. A better requirement is that any subtyping relation between two API types should be preserved. For example, _InternalAbstractType can be removed in MyStruct <: _InternalAbstractType <: Any.

This is okay because folks who are dispatching based on _InternalAbstractType are already using internals and if the package defines an API function or method declared api public_method(::_InternalAbstractType) = 0, that method accepts ::MyStruct, so it must continue to accept ::MyStruct even after MyStruct <: _InternalAbstractType no longer holds.

For a type annotated global variable, both reading from and writing to the variable by users of a package is considered API ... The type of a type annotated variable is allowed to be narrowed to a subtype of the original type (i.e. a type giving more guarantees), since all uses of the old assumption (the weaker supertype giving less guarantees) are expected to continue to work.

This is too permissive because narrowing the type signature of a nonconstant global can cause writes to error.


As for how to mark things as API or not API, I concur with some others in this thread that marking API similarly to export is a good idea.

enters namespace with using does not enter namespace with using
part of API export my_function, my_constant, MyType foo_bar my_function, my_constant, MyType
not part of API don't do this default

*foo_bar could be scoped export, api, export scoped=true etc.

@adienes
Copy link
Contributor

adienes commented Jun 2, 2023

@LilithHafner along those lines, I enjoyed the proposal from a few years ago here #39235 (comment)

where in this case foo_bar might be a semicolon separator

@Seelengrab
Copy link
Contributor Author

This is okay because folks who are dispatching based on _InternalAbstractType are already using internals and if the package defines an API function or method declared api public_method(::_InternalAbstractType) = 0, that method accepts ::MyStruct, so it must continue to accept ::MyStruct even after MyStruct <: _InternalAbstractType no longer holds.

That's a good point, and an even better one for the need for @test_api! What your version effectively means is that abstract types give some guarantees (existence of fields, dispatches of functions, invariants..) and subtyping an abstract type means saying "I give those same guarantees" (as well as dispatching like them, because you give those same guarantees). Viewed under that light, having a non-API supertype just means that you don't give the guarantees of that supertype! That's quite elegant - the only way it could be even more elegant is if we'd be able to subtype more than one abstract type, which would effectively mean saying "I give those guarantees too" (you can, of course, get in hot water if you claim to e.g. both be an even number, as well as not being an even number, should such abstract types representing these concepts exist).

I like the idea of only keeping the guarantees you've actually promised to provide.

This is too permissive because narrowing the type signature of a nonconstant global can cause writes to error.

That's a predicament then - I originally wanted to allow widening of the type to one that gives fewer guarantees, but that obviously doesn't work because then reads may fail in dispatch (you can't take away guarantees you've given in a previous version). Taken together, these two imply that types of API globals must not change, at least without releasing a breaking version. Not related to the syntax proposal per se, but quite unexpected nonetheless!


Since I'm not sure it's quite clear, my motivation for having api markers on a per-method basis is that it makes it abundently clear what is API; only marking the name API makes it (in my eyes) much more likely to forget to add a test for the API surface - if it's marked on a per-method basis, it's quite trivial to take that exact function signature, put it into @test_api foo(bar, baz) and you're safe from accidental API regressions. That's just not the case when doing a name-wide API declaration.

@timholy
Copy link
Member

timholy commented Jun 2, 2023

it's quite trivial to take that exact function signature, put it into @test_api foo(bar, baz) and you're safe from accidental API regressions.

I agree that having specific signatures be part of the official API makes sense, but I think that might be orthogonal from the Methods themselves. @test_api foo(::Number, ::Number) is something that can be implemented (sort of) independent of whether there is an actual method with that signature.

I say "sort of" because @test_api evidently depends on type-inference, and that presumably works only up to runtime dispatch boundaries. Unless one is then going to check all potentially-applicable methods (for the callee of that runtime dispatch), in which case is once again out of the land of specific method signatures.

So I'd propose a modification: to give it teeth we have to tie it to signatures, but don't tie this to specific methods. Effectively, we're protecting the integrity of the callers rather than the callees.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Jun 2, 2023

@test_api foo(::Number, ::Number) is something that can be implemented (sort of) independent of whether there is an actual method with that signature.

How can that be? In my mind, for that macro to be able to check that a call receiving that signature has any hope to succeed, a method taking (::Number, ::Number) has to exist; the semantic meaning of @test_api is that no MethodError happens when calling foo. This doesn't mean that there can't be a MethodError further down the callchain of course, as a dynamic dispatch isn't generally penetrable to inference, as you rightly point out.

After all, if no method foo(::Number, ::Number) (or wider, as foo(::Any, ::Any) also works) exists, it must be a failing test case, since the signature is statically known not to dispatch (barring eval, but that throws a wrench into everything).

So I'd propose a modification: to give it teeth we have to tie it to signatures, but don't tie this to specific methods. Effectively, we're protecting the integrity of the callers rather than the callees.

I'm sympathetic to that idea, but I consider it out of scope for this particular proposal. I think the idea "declare api" extends to other modifications we might want to do with the type system later on quite well; the semantic of "I support these additional guarantees" ought to be possible to tack onto almost anything after all, even after the fact.

EDIT: To be clear, I consider the formalization of type signatures in the type system distinct from methods out of scope; not tying API to signature, as that's exactly what I want to do with per-method API :)

@timholy
Copy link
Member

timholy commented Jun 2, 2023

How can that be? In my mind, for that macro to be able to check that a call receiving that signature has any hope to succeed, a method taking (::Number, ::Number) has to exist; the semantic meaning of @test_api is that no MethodError happens when calling foo.

You can implement

foo(::Real, ::Real)   # method 1
foo(::Any, ::Any)    # method 2

and that signature is fully covered. The coverage is nominally wider than what we're guaranteeing to be API, but that's OK. In fact it's more than OK: as soon as you think about the implications of trying to actually write @test_api , this issue is going to come up at every runtime-dispatch boundary. So allowing this at the "entry point" is not really any different than allowing it at runtime dispatch boundaries.

Again, the point is that you cannot implement @test_api without relying on inference, and when inference fails at runtime dispatch then you have to consider the universe of all methods that might be applicable. Otherwise you haven't proven that you fully support the API you claim to support.

https://timholy.github.io/SnoopCompile.jl/stable/jet/ provides a concrete example, if this is still still unclear.

I am concerned that inference failures are going to mean that we'll frequently run up against bar(::Any) but we don't propose to make bar(::Any) guaranteed API. Presumably we can only apply the API guarantee to things that have no (or limited) inference failures. So loading a package seems like it could potentially break the API guarantee, but unless that package tests the universe of all API-guaranteed objects, you might not catch this in the package's tests.

@LilithHafner
Copy link
Member

I am inclined to prefer demarcating API at the level of symbols because it is easier to tell a human sort! is API and ht_getindex is not, than to tell a human sort!(::AbstractVector) is API and sort!(::AbstractVector, ::Int, ::Int, ::Base.Sort.Algorithm, ::Base.Order.Ordering) is not. Further, it would be really nice for discoverability if names(Module) returned exactly all the symbols that are a part of the public API.

Nevertheless, looking for an example where more specificity is needed, let's say I have a method

foo(x, y) = x + y    # version 1

and I think I might eventually want to change it to

foo(x, y) = x - (-y) # version 2

Which has slightly different behavior for some input types (e.g. foo(1, typemin(Int16))). I want to guarantee the behavior on foo(::Int, ::Int) now and leave the rest as internal. Can I do this & how?

With per-function API this would look like

api foo(x::Int, y::Int) = _foo(x, y)
_foo(x, y) = x + y

@BioTurboNick
Copy link
Contributor

One issue with per-method API granularity is that a user won't necessarily know if they are or are not calling an internal method if the types aren't known until runtime.

Extended - what's the concrete benefit of having API and internal methods of the same function, over giving the internal functions a different name? Arguably the latter leads to a cleaner design for both developers and users to reason about.

@mrufsvold
Copy link
Contributor

A couple of thoughts that I can't quite congeal into a thesis:

  1. While api is not of global concern in the way that export is, I agree that api is a similar to export in that it is of external concern, so semantics to support quick discoverability (list at the top) makes sense to me.
  2. On the other hand, I am also very persuaded by the idea that there should be a proximate reminder that you are editing an API object.
  3. It seems like, if a solution settles around (1), then we could solve (2) with linting. If a symbol is marked with <insert final "scoped export" name here>, then we could change its highlight color and tool tip description.
  4. I generally agree that the "mark as not API" approach is bad because it's too easy to forget. But, if we go the direction of "mark functions as API", I could see the case for an opt-out option for specific methods to handle the problematic cases that @Seelengrab pointed out.

@JeffreySarnoff
Copy link
Contributor

Specific Modules, Abstract Types, Concrete Types / Structs, (and anything else I missed) may have their own api.
It seems reasonable to consider a very shallow hierarchy of api specificities (even if they are handled in the same way).

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Jul 30, 2023

After spending quite some time at JuliaCon talking to lots of people, I'm only strengthened in my belief that having a default meaning for what is considered guaranteed when something is API is not only a good idea, but going to become increasingly important for industry adoption. Having to look at every single docstring for every little thing about what even can change between releases simply does not scale. To be more specific, I've identified two key areas that, from my POV, were the core essence of this proposal:

  1. Having a clear definition of what it means for stability across versions when marking something as supported under semver.
  2. Having queryable per-method requirements & guarantees.

While I don't understand why point 1 was wholeheartedly rejected in triage way back when @LilithHafner voluntarily brought their own PR #50105 up for discussion there (and just to make it perfectly clear - I don't have any personal grudge against Lilith), the impression I got from the discussions I had at JuliaCon with potential industry adopters were very much in favor of having something like these default API guarantees. It just makes it so much easier to develop Julia code. I don't want to/can't impose what the core developer team ultimately goes with, but I do want to point out that Yuri liked this proposal as well, judging from the 👍 in the OP, for whatever that is worth... So I at least hope that this discussion can continue in the light of thinking about what's best for the language as a whole, and not around what would be simpler/easier to do as a first step. I think it pays to be greedy here - the time for "simple" steps seems to me to have passed long ago, seeing as the "correctness debate" surrounding Julia has been ongoing for much too long, in my opinion.

Regardless, in the time since I last looked at this issue I've published RequiredInterfaces.jl, which (for my usecases at least) solves point 2 above, by allowing developers to declare "this method needs to be implemented, and then you'll receive these other methods from my interface". This pretty much covers all what I want to do with per-method API, which has been (seemingly) hard to get across in the 99 messages above. @timholy & @dalum , since you've shown great interest in the proposed semantics above, I've also included a lengthy document describing the intended usage in more detail, which should further clarify the points already raised in this issue.


I recall that triage was opposed to fallback definitions on the grounds of "what are you doing marking an undocumented symbol as public?!?" (#26919 👀) and opposed to universal minimum requirements because nobody has come up with a nontrivial set of requirements that doesn't have exceptions.

Well, I certainly didn't understand the objections in that light - the communication from my end really felt more like wholehearted rejection of the entire notion 🙂 The fact of the matter is, API stability is a hard problem and the fact that we're years into this Julia experiment now and we're STILL struggling with it, in my opinion, just proves the point that this is a hard problem that doesn't have an easy solution. However, once we DO have fallback definitions that work with the semantics of the language, it ought to be pretty trivial to have a "julia-semver conformity checker" that can just be run when a new version is registered, or that people can run preemptively in CI of their packages. Just because it's not trivial to come up with such a ruleset doesn't mean that we should eskew it entirely. The longer we wait, the more we paint ourselves into a corner of having "too many exceptions", which (let's not kid ourselves) is something we must clean up sooner or later anyway. Bad API design doesn't vanish just because we close our eyes to the complexity of reworking it.

@LilithHafner
Copy link
Member

I like having a formal interface spec, but unless we can solve #5 (which would be gerat!), it should have an option to be decoupled from an abstract type. Subtyping Meet seems semantically equivalent to abstract multiple inheritance, so I wouldn't be surprised if the same issues with #5 come up with Meet.

What you've done with method errors for incompletely implemented interfaces in RequiredInferfaces.jl is great! I want that.

Forward compatibility is something to be super aware of with the public keyword, and it's not too late (for now) to change course if we find a use-case that the public keyword cannot be forward compatible with, interface specs is something I'm thinking of as an example use case we would want to extend this to.

@Seelengrab, sorry I missed you at JuliaCon, I'd love to hop on a call with you if you want to talk about this over voice chat. You can reach me on the Julia slack, or via email.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Jul 30, 2023

Subtyping Meet seems semantically equivalent to abstract multiple inheritance, so I wouldn't be surprised if the same issues with #5 come up with Meet.

Yes, the exact same issues come up, because it's type theoretically the same thing (to an extent - some issues don't apply at all). Just as traits are, mind you - the same issues come up with Holy Traits too, as I've described in the documentation. I don't want this thread to become a new version of issue 5 though, and I'm well aware of the discussions surrounding it (modulo some of the email chains that spurred the creation of it in 2014). I do think though that some form of Meet is exactly the solution to the issues we do currently face with Holy Traits, namely dispatchability & extensibility (as also elaborated on in the design document). I'm not worried about weird interactions with Tuple{Int,Number} or somesuch because the Meet I'm thinking of is only concerned with abstract types - it doesn't make sense to ask for the Meet of some concrete type with some other concrete type, because any given object can only have one concrete type. It's an undispatchable signature. I'm not sure if I've mentioned that explicitly in the design doc, but it should follow from what's written there. I'll check & add it explicitly, if necessary.

I implore you to bring further discussion on Meet to that repo though, to have it in one central place, instead of being in this issue.

sorry I missed you at JuliaCon, I'd love to hop on a call with you if you want to talk about this over voice chat. You can reach me on the Julia slack, or via email.

No worries! I'll be in Boston until the 2nd of august, so there's still plenty of time.

@timholy
Copy link
Member

timholy commented Jul 31, 2023

I think a combination of the following things will carry us far:

  • Add public keyword #50105
  • a feature like your @required but with a different implementation, see below
  • An Aqua test to check interfaces, which would including failing if the requisite @required-like check hasn't been implemented.

A possible implementation of the check would be like this:

struct InterfaceError <: Exception
    X
    T
    f

    InterfaceError(@nospecialize(X), @nospecialize(T), @nospecialize(f)) = new(X, T, f)
end
# TODO: implement printing

# Variant for concrete `T, N`
function required_interface(::Type{AbstractArray{T,N}}, ::Type{X}) where {T,N,X}
    Base._return_type(Tuple{typeof(size), X}) <: NTuple{N,Integer} || throw(InterfaceError(X, AbstractArray{T,N}, size))
    Base._return_type(Tuple{typeof(getindex), X, Vararg{Int,N}}) === T ||
        Base._return_type(Tuple{typeof(getindex), X, Int}) === T || throw(InterfaceError(X, AbstractArray{T,N}, getindex))
end
# Variant for concrete `N`
function required_interface(::Type{AbstractArray{T,N} where T}, ::Type{X}) where {N,X}
    Base._return_type(Tuple{typeof(size), X}) <: NTuple{N,Integer} || throw(InterfaceError(X, AbstractArray{T,N} where T, size))
    Base._return_type(Tuple{typeof(getindex), X, Vararg{Int,N}}) !== Union{} ||
        Base._return_type(Tuple{typeof(getindex), X, Int}) !== Union{} || throw(InterfaceError(X, AbstractArray{T,N} where T, getindex))
end
# Variant for concrete `T`
function required_interface(::Type{AbstractArray{T,N} where N}, ::Type{X}) where {T,X}
    Base._return_type(Tuple{typeof(size), X}) <: Tuple{Vararg{Integer}} || throw(InterfaceError(X, AbstractArray{T,N} where N, size))
    Base._return_type(Tuple{typeof(getindex), X, Vararg{Int}}) === T ||
        Base._return_type(Tuple{typeof(getindex), X, Int}) === T || throw(InterfaceError(X, AbstractArray{T,N} where N, getindex))
end
# Variant for arbitrary `T,N`
function required_interface(::Type{AbstractArray}, ::Type{X}) where {X}
    Base._return_type(Tuple{typeof(size), X}) <: Tuple{Vararg{Integer}} || throw(InterfaceError(X, AbstractArray, size))
    Base._return_type(Tuple{typeof(getindex), X, Vararg{Int}}) !== Union{} ||
        Base._return_type(Tuple{typeof(getindex), X, Int}) !== Union{} || throw(InterfaceError(X, AbstractArray, getindex))
end

and then

julia> required_interface(AbstractArray, Array)
true

julia> required_interface(AbstractArray, Set)
ERROR: InterfaceError(Set, AbstractArray, getindex)
Stacktrace:
 [1] required_interface(#unused#::Type{AbstractArray}, #unused#::Type{Set})
   @ Main ~/tmp/interfaces.jl:31
 [2] top-level scope
   @ REPL[3]:1

julia> required_interface(AbstractArray{Bool}, BitArray)
true

julia> required_interface(AbstractVector, Base.Sort.WithoutMissingVector)   # this one fails, would be nice if it didn't. Can `U<:AbstractVector`?
ERROR: InterfaceError(Base.Sort.WithoutMissingVector, AbstractVector, size)
Stacktrace:
 [1] required_interface(#unused#::Type{AbstractVector}, #unused#::Type{Base.Sort.WithoutMissingVector})
   @ Main ~/tmp/interfaces.jl:18
 [2] top-level scope
   @ REPL[5]:1

julia> required_interface(AbstractVector{Float64}, Vector{Float64})
true

I think this should be in BaseTest (non-exported), so of course these method definitions should be function Test.required_interface ... end. Or we could create a TestInterfaces stdlib or something.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Jul 31, 2023

I originally thought about using return_type, but decided against it for two reasons:

  1. We often get told not to use it, relying on inference is often discouraged :)
  2. It's too coarse a tool when checking for faulty implementations - there's no way to distinguish throwing a NotImplementedError from a different error just based on the return type. It's all Union{} 🤷

@required does a bit more than just the method checking part, especially in avoiding 2. from above. The advantage of the current implementation (explicitly checking the method) is that you get nice errors for when you forgot to implement something, right at the callsite:

julia> using RequiredInterfaces

julia> @required AbstractArray begin
           Base.size(::AbstractArray)
           Base.getindex(::AbstractArray, ::Int)
       end
getindex (generic function with 184 methods)

julia> struct A{T,N} <: AbstractArray{T,N} end

julia> A() = A{Int,1}()
A

julia> size(A())
ERROR: NotImplementedError: The called method is part of a fallback definition for the `AbstractArray` interface.
Please implement `Base.size(::AbstractArray)` for your type `T <: AbstractArray`.
Stacktrace:
 [1] size(::A{Int64, 1})
   @ Main ./none:0
 [2] top-level scope
   @ REPL[6]:1

julia> getindex(A(), 1)
ERROR: NotImplementedError: The called method is part of a fallback definition for the `AbstractArray` interface.
Please implement `Base.getindex(::AbstractArray, ::Int)` for your type `T <: AbstractArray`.
Stacktrace:
 [1] getindex(::A{Int64, 1}, ::Int64)
   @ Main ./none:0
 [2] top-level scope
   @ REPL[7]:1

The details on how this is exactly implemented are of course malleable (I'd personally prefer language-level support for this, perhaps without actually defining methods? Would certainly get around the ugly method inspection hack I use at the moment), the current design is just a result of what I had to settle on to allow precompilation, (almost) zero-overhead, nice error messages for users all while being in a package. Checking whether something is part of a method-based interface is from my POV the "easy" part - making that easy & nice to use for users without having to themselves write Base._return_type(....) style functions is the hard part :)

Still, if there's an asserted return type like

getindex(::AbstractArray{T}, ::Int...)::T

that'd be trivial to add with return_type, and is a good addition. I just shy away from using that for conformity checking, that's all - Union{} is inferred for all error types after all.

@timholy
Copy link
Member

timholy commented Jul 31, 2023

you get nice errors

That's what # TODO: implement printing means 🙂. (I defined an Exception subtype to make this possible, but the fallback is to just print the raw exception.)

I originally thought about using return_type, but decided against it for two reasons

I agree with the problems of relying on inference. But not using it has, I think, more serious problems of its own. I think you really can't declare something as passing unless it returns the kind of object that you'd describe in the documentation: just checking whether a method exists does not seem sufficient. It would be really broken, for example, to create an AbstractMatrix subtype where size returns Tuple{Int} (that's appropriate for AbstractVector but wrong for AbstractMatrix), or where getindex(::AbstractMatrix{T}, i, j) returns ::S where S !== T.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Jul 31, 2023

That's what # TODO: implement printing means slightly_smiling_face. (I defined an Exception subtype to make this possible, but the fallback is to just print the raw exception.)

Do you mean that your version would be checked/called at every callsite of the interface..? Otherwise I don't see how you get the error message printed in regular user code, as is the case now with RequiredInterfaces.jl.

I think you really can't declare something as passing unless it returns the kind of object that you'd describe in the documentation

Right, if the interface requires the type to be the same or some specific type, that should obviously be part of the @required definition. I just don't want to rely on having inferred Union{} to figure out whether or not the interface was implemented at all, which just isn't sufficient. The inspection based on reflection I currently use should, in a proper implementation in Base, certainly be replaced with a check on some internal datastructure, and not rely on inspection. The runtime would then throw the NotImplementedError instead of a MethodError. There wouldn't have to be an actual method actively throwing that itself.

I've opened Seelengrab/RequiredInterfaces.jl#8 as a feature request for these kinds of return type annotations. Shouldn't be too bad to add.

@timholy
Copy link
Member

timholy commented Jul 31, 2023

Do you mean that your version would be checked/called at every callsite of the interface

I mean add special printing for InterfaceError, like in https://github.com/JuliaLang/julia/blob/master/base/errorshow.jl.

I just don't want to rely on having inferred Union{} to figure out whether or not the interface was implemented at all

I'm not sure there's a big usability difference between

ERROR: `MyArray` does not satisfy the `AbstractArray` interface because you never implemented `getindex` for `MyArray`

vs

ERROR: `getindex(::MyArray{T}, Int)` must return an object of type `T` to quality as an `AbstractArray`

(which you could automate given the InterfaceError object or something like it) even if the reason why is that no getindex method exists.

@Seelengrab
Copy link
Contributor Author

I mean add special printing for InterfaceError

Ok, but I was asking about when that error would be thrown/displayed. It sounds like you're suggesting that the error should only be displayed when actively checking for conformity with the interface, is that correct?

I'm not sure there's a big usability difference between

There absolutely is - one informs of a missing implementation, the other informs of an incorrect one. That may be a little thing to you, but can be the difference between "Oh I forgot to implement X" and "what does this thing want, I thought I did everything?" for a new user.

@ParadaCarleton
Copy link
Contributor

ParadaCarleton commented Aug 23, 2023

I'm not sure there's a big usability difference between

ERROR: `MyArray` does not satisfy the `AbstractArray` interface because you never implemented `getindex` for `MyArray`

vs

ERROR: `getindex(::MyArray{T}, Int)` must return an object of type `T` to quality as an `AbstractArray`

And now I suddenly know why Julia error messages/stacktraces are like that 😅

Tim Holy is too big-brained for the job of writing error messages. We need to find someone dumber to write error messages that I can understand

@Tokazama
Copy link
Contributor

Have there been any further development with regards to how to handle fields? IIUC public doesn't actually change access to module symbols (except for names), but provides some additional reflective methods and documentation.

With regards to fields, I'm assuming since user can already define property methods there's some hesitancy to make any changes in how marking a field public would change this? Would it make sense to at least support ispublic for annotated fields and add a warning to getfield docs stating that its functionality doesn't imply stable public fields?

@LilithHafner
Copy link
Member

Fields should rarely be public. Properties, on the other hand, are sometimes public. The current story for public properties is that if you want a property to be public, document that property in a docstring and/or manual. To document internals, you have to document them and also say they're internal (just like all names prior to the public keyword).

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Nov 27, 2023 via email

@BioTurboNick
Copy link
Contributor

Now that we have public, which provides a way to include a function or property in the api associated with a module ... What about the complementary private, which would exclude a function from use outside of a module?

I'd be against expressly disallowing use of internals - part of the appeal of Julia is that there are few or no hard barriers to anything you might want to do. Perhaps there could be a warning if an internal (anything not public) is accessed from another module, but it should be suppressible.

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Nov 27, 2023 via email

@BioTurboNick
Copy link
Contributor

I do not advocate warning if anything not public is accessed from another module .. only if something explicitly marked private is accessed from another module.

But why would you want or need this third level of accessibility? public/export indicates the symbols the comprise the API surface area that should follow SemVer guidelines. Everything else is not. What would this third level be needed for?

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Nov 27, 2023 via email

@StefanKarpinski
Copy link
Member

Not breaking existing code limits what we can do. However, there's a decent argument to be made that since what's breaking is not a public API, blocking access to package internals is not actually technically breaking according to semver. Of course, we still have to be cautious to prevent massive ecosystem breakage. It would not go well to just flip the "no private access" switch for the whole ecosystem at once. Here's a possible transition strategy:

  1. Allow packages to opt into preventing access to their private internals. This should probably be a flag in the project file, but should maybe also be in the registry. There are three cases for a project that depends on a package that blocks private access:
    1. Project doesn't access internals so there's no problem;
    2. Project accesses internals but can easily use a public API instead—minor fix required;
    3. Project accesses internals and can't easily use a public API instead—can't be easily fixed.
  2. To handle the last case, we also allow projects to override one of their dependencies blocking private access. A project opting into this acts as an explicit indicator that non-breaking upgrades can break.
  3. For a while make the flag for whether private access is allowed or not mandatory—you can choose yes or no but you need to have some value in the project file.
  4. Later we change to preventing private access by default. Projects can drop the flag if the value is to block private access since that's the new default and only packages that need to allow private access need to keep the flag in their project files.

I'm not sure about the implementation side. If we can control module internals access per depender that would be ideal. That suggests that private/public needs to be metadata associated with the binding itself, which is kind of gnarly. Maybe we can do something with auto-wrapping each package in a public-only wrapper that rebinds only the public bindings of the internal package. That's the simplest to implement, but feels kind of icky.

Having public/private annotations on fields in structs would also be good, but I'm not quite ready to tackle that.

@Seelengrab
Copy link
Contributor Author

For what it's worth (and that's the last I'll say about this here), discussing "disallowing access to internals" without a concrete & clear picture of what exactly is included in that "internals" moniker and what "access" means (what this entire issue/proposal has been about) seems not fruitful to me. Base isn't alone in having things "internal" that are not as clear cut as "this symbol is internal, this other one is not"; the rand/Sampler API is just one prominent example. So unless you want to have pretty much all of Base fall into the category "don't mark it private, but act as if it was", you're going to have to break API in terms of what is allowed to be overloaded/extended.

I'm pretty opposed to disallowing access to internals entirely, as I've stated multiple times in this issue. If this is a direction the Julia project wants to go in, I humbly request discussion about it to take place in a different issue/github discussion.

@LilithHafner
Copy link
Member

I've copied @StefanKarpinski's comment verbatim to the OP of a new issue. I think that that particular comment is actionable and specific and makes a good OP.

@Moelf
Copy link
Contributor

Moelf commented Nov 27, 2023

I'm pretty opposed to disallowing access to internals entirely, as I've stated multiple times in this issue. If this is a direction the Julia project wants to go in, I humbly request discussion about it to take place in a different issue/github discussion.

hard agree on this. IIRC, some Julia folks that are way more knowledgeable than me pointed out the "private" stuff as seen in C++/Java doesn't make programs safer more than they make programmers' life harder :(

@LilithHafner
Copy link
Member

It would be a weak private—programmers would still be able to access internals if they want, it would just require different compat declarations.

@KristofferC
Copy link
Member

One thing that hasn't really been discussed so far but which often causes breakages (e.g. in Julia updated) is modification or addition of method signatures that causes new ambiguities to occur for existing calls of other methods of some function. So far, from Julia's p.o.v we kind of brush these under the rug and let the packages deal with it or try put in some tweak to fix the problems that occur in practice but if we want to be serious about this I think it needs to be properly addressed.

@LilithHafner
Copy link
Member

To give a small(ish) example

PkgA 1.0.0

module PkgA
public foo 
"Returns a tuple and does not throw"
foo(k::Any, v::Any) = (k, v)
end

PkgB 1.0.0, depends on PkgA version 1

module PkgB
using PkgA
public B
struct B end
PkgA.foo(k::B, v::Any) = (:b, v)
end

PkgC 1.0.0, depends on PkgA version 1

module PkgC
using PkgA
public C
struct C end
end

PkgD 1.0.0, depends on PkgA version 1, PkgB version 1, and PkgC version 1

module PkgD
using PkgA, PkgB, PkgC
const k = PkgA.B()
const v = PkgA.C()
const tup = PkgA.foo(k, v)
end

PkgC 1.1.0, depends on PkgA version 1

module PkgC
using PkgA
public C
struct C end
PkgA.foo(k::Any, v::C) = (k, :c)
end

PkgD loads when loaded with [email protected], [email protected], and [email protected] but fails to load when loaded with [email protected], [email protected], and [email protected]. Because no package depends on the internals of any other package, this means PkgC version 1.1.0 was a breaking change even though all it did was add a non-pirated method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design of APIs or of the language itself feature Indicates new feature / enhancement requests
Projects
None yet
Development

No branches or pull requests