Document ForeignConvertible #1073

Merged (4 commits) on Nov 22, 2023
Changes from 3 commits
64 changes: 57 additions & 7 deletions Library/Hylo/Core/ForeignConvertible.hylo
@@ -1,13 +1,63 @@
/// A type that can be converted to and from an a foreign representation.
/// A type whose values can be converted to and from a representation suitable for crossing a
/// language boundary.
///
Collaborator

I still don't understand what that means. Why would any type's representation in Hylo be unsuitable for crossing a language boundary?

Contributor Author

@kyouko-taiga kyouko-taiga Oct 26, 2023

If we're talking about C and taking "suitable" to just mean "I can read bytes of the thing", then indeed no type representation is unsuitable. With this narrow interpretation we could just send a pointer to the raw bytes of every Hylo object to the foreign language and let it deal with it.

If we're trying to describe more useful abstractions, then one may need a way to explain how one can look at a Hylo object through the lens of another language's type system. For example, take Union<Pointer<Int>, Pointer<Float64>>. The Hylo representation of this type will likely be a single 64-bit integer but it won't be very useful to say that it's just being presented as char[8] in C. You wouldn't know where the discriminator is, or even how to read the discriminator. We can standardize this information (though I'm not sure we'd want to) but that would still mean the foreign language has to do the work of reconstructing the abstraction.

So I would claim that Union<Pointer<Int>, Pointer<Float64>> is not suitable to be represented in C. What we want is a representation that already makes sense "as is" w.r.t. the abstraction that it represents. Builtin.i64 fits the definition because int64_t makes sense "as is" in C.

Collaborator

@dabrahams dabrahams Oct 26, 2023

> If we're trying to describe more useful abstractions, then one may need a way to explain how one can look at a Hylo object through the lens of another language's type system. For example, take Union<Pointer<Int>, Pointer<Float64>>. The Hylo representation of this type will likely be a single 64-bit integer but it won't be very useful to say that it's just being presented as char[8] in C. You wouldn't know where the discriminator is, or even how to read the discriminator.

This is a fantastic example. Nothing in this API seems to me like it's going to help your C function with that problem. You need, at the very least, some C declarations. The most obvious kind, and the one I think you're aiming for, is a type declaration, but it could also be function declarations; see below.

> So I would claim that Union<Pointer<Int>, Pointer<Float64>> is not suitable to be represented in C.

Disagreed.
We can certainly create a C struct type—containing a boolean discriminator and a union—that represents the same notional type, which would constitute, “representing Union<Pointer<Int>, Pointer<Float64>> in C.” I think you mean that the data layout is not suitable for C, but even the data layout can be suitable for C if you supply C with a set of functions for accessing the basis operations of the type. I don't believe the right answer for every Hylo type is to serialize/deserialize it into a different representation that is in some sense already known to the other language when passing it across an FFI boundary.
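
As a rough illustration of the struct described above, here is a minimal C sketch of a discriminator-plus-union type, together with accessor functions standing in for its basis operations. Every name in it (`ptr_union`, `ptr_union_from_int`, and so on) is hypothetical, the mapping of Hylo's `Int` to `int64_t` and `Float64` to `double` is an assumption, and nothing here reflects how Hylo actually lays out `Union<Pointer<Int>, Pointer<Float64>>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the notional type
   Union<Pointer<Int>, Pointer<Float64>>.
   Assumption: Hylo's Int maps to int64_t and Float64 to double. */
typedef struct {
  bool holds_float;      /* discriminator: selects the active union member */
  union {
    int64_t *int_ptr;    /* active when holds_float is false */
    double *float_ptr;   /* active when holds_float is true */
  } payload;
} ptr_union;

/* Builds a value whose active case is the Int pointer. */
static ptr_union ptr_union_from_int(int64_t *p) {
  ptr_union u;
  u.holds_float = false;
  u.payload.int_ptr = p;
  return u;
}

/* Builds a value whose active case is the Float64 pointer. */
static ptr_union ptr_union_from_float(double *p) {
  ptr_union u;
  u.holds_float = true;
  u.payload.float_ptr = p;
  return u;
}

/* Returns the Int pointer, or NULL when the Float64 case is active. */
static int64_t *ptr_union_as_int(const ptr_union *u) {
  assert(u != NULL);
  return u->holds_float ? NULL : u->payload.int_ptr;
}

/* Returns the Float64 pointer, or NULL when the Int case is active. */
static double *ptr_union_as_float(const ptr_union *u) {
  assert(u != NULL);
  return u->holds_float ? u->payload.float_ptr : NULL;
}
```

With accessors like these, C code never needs to know where the discriminator lives. Whether the struct is produced by a conversion or shared directly is exactly the design question under discussion.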

An experience of seamless interop is very nice, but there are lots of ways of approaching it. I think the above shows the framing you're using to describe the problem to me is incomplete or confused. Getting that right is a prerequisite to creating the right API and documenting it properly.

/// Types conforming to `ForeignConvertible` can appear in foreign function interfaces (FFI) and
/// are automatically converted from Hylo to their foreign representation, or vice versa.
public trait ForeignConvertible {
/// A function declaration with the `@ffi` attribute introduces a foreign function interface (FFI),
/// an entity whose implementation is defined externally, typically in a different programming
/// language. Because this other language may not understand the layout of Hylo types, some glue
/// code has to be written to adapt the representations of values crossing the language boundary.
/// Hylo uses conformances to `ForeignConvertible` to generate this code, requiring the parameters
Collaborator

@dabrahams dabrahams Oct 26, 2023

That seems wrong to me. Any language with sufficiently low-level access to memory can decode any data layout.

I think you're trying to say something else entirely. Perhaps something like:

> Foreign language X has a datatype Y that notionally corresponds to Hylo's datatype Z, but may not share the same layout. We want to be able to syntactically pass a Z directly to an X function fun f(_: Y), with an implicit conversion from Z to Y mediated by ForeignConvertible.
?

If that interpretation is roughly correct (and I am far from confident that it is), this is an unnecessary syntactic sugar feature because we could always explicitly convert every Z to a Y instead of having that conversion happen implicitly. Moreover, I'm somewhat concerned about what mischief may be hidden behind that implicit code. Does it amount to an implicitly-generated copy in some cases? Why wouldn't we want the Y to be a projection from the Z instead of a returned value?

The rest of the comment seems to be about details of the mechanism, but what remains unaddressed for me is the motivation for having this thing in the first place.

Contributor Author

@kyouko-taiga kyouko-taiga Oct 26, 2023

> That seems wrong to me. Any language with sufficiently low-level access to memory can decode any data layout.

As I said above, I don't think that's the right question to ask. You can read raw bytes with pretty much any language, but a good FFI solution should at least help you do it properly.

> If that interpretation is roughly correct [...]

I think it is roughly correct. I would add that layout, if it means how fields are laid out in a struct, is not the only thing to take into account. For example, we also have to care about the way one may represent a union in the foreign language because it may not agree with Hylo's approach.

> this is an unnecessary syntactic sugar feature because we could always explicitly convert every Z to a Y instead of having that conversion happen implicitly

Yes, except that it will make you expose Builtin to everyone wanting to use a C FFI like fdopen, and have a way to expose the built-in value wrapped in Int, Pointer, etc. In other words, what is today let stdout = fdopen(1, "w") would become let stdout = MemoryAddress(base: fdopen(1.value, "w".utf8.base)) (and it will get worse once utf8 is no longer just a pointer).

Of course you can wrap this boilerplate in the standard library, e.g. LibC.fdopen, but that is still busy boilerplate that my trait can eliminate rather elegantly. Plus, the boilerplate would come back for any other FFI our users may want to use.

Think about the convenience of FFIs in Swift. You import a C header and you get a beautiful Swift function fdopen(_: Int, _: UnsafePointer<CChar>) -> UnsafePointer<FILE>. This convenience is entirely due to compiler magic, though. I want a similar feature in Hylo without hardcoding type translations in the compiler.
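
To make the crossing values concrete: on the C side, the `fdopen` example only ever sees an `int`, a `const char *`, and a returned `FILE *`. The sketch below is plain POSIX C. The helper `write_line_to_fd` is a hypothetical name introduced for illustration, and duplicating the descriptor first is a choice made here so that closing the stream leaves the caller's descriptor open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes `text` plus a newline to descriptor `fd`
   through a stdio stream. Returns the number of bytes formatted, or -1
   on failure. */
static int write_line_to_fd(int fd, const char *text) {
  assert(text != NULL);

  /* Duplicate the descriptor so fclose() does not close the caller's copy. */
  int copy = dup(fd);
  if (copy < 0) return -1;

  /* The values crossing into libc: an int and a pointer to a C string. */
  FILE *stream = fdopen(copy, "w");
  if (stream == NULL) {
    close(copy);
    return -1;
  }

  int written = fprintf(stream, "%s\n", text);

  /* fclose() flushes the buffer and closes the duplicated descriptor. */
  if (fclose(stream) != 0) return -1;
  return written;
}
```

A Hylo declaration for `fdopen` has to produce exactly these argument and result types, either through explicit conversions at the call site or through generated glue.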

> Does it amount to an implicitly-generated copy in some cases?

That is a valid concern. I thought about it and concluded that it was okay to "pay" for a copy when you use an FFI because you probably can't trust it to uphold the rights/duties of all Hylo's passing conventions anyway. So foreign functions (the actual ones, not the FFI generated around it) take everything with a sink convention.

We can revisit this choice later but since the only crossing types for now are built-in numerics and pointers, a copy is the best strategy anyway.

Collaborator

> Yes, except that it will make you expose Builtin to everyone wanting to use a C FFI like fdopen,

I don't see any reason why exposing Builtin should be necessary, and the code you give after "in other words" shows no exposure of Builtin.

> Think about the convenience of FFIs in Swift.

It's lovely.

If this is just about how to make something with the semantics of let stdout = fdopen(1, "w") pretty, I think there are lots of possible approaches, and I'm not at all convinced we have the right one here. And, related to this PR, we still don't have a successful description of the meaning of this protocol. There's a broad spectrum of possible FFI functions and I strongly doubt we have enough examples in front of us to design the right interface. Note that I did something like what you've written for Boost.Python years ago so I understand the logic that leads to a design like this, but that's a much more constrained problem in some ways than what this API purports to address—and even that has dimensions you haven't accounted for, like coupling of lifetimes when a C++ function returns a reference to a part of a parameter.

> I thought about it and concluded that it was okay to "pay" for a copy when you use an FFI because you probably can't trust it to uphold the rights/duties of all Hylo's passing conventions anyway.

IMO you are talking about two orthogonal concerns (safety and performance), which in turn are orthogonal to my concern, which is about the hidden-ness of the copy.

I would much prefer to start out by having FFIs be uglier than we'd like, and to explore the use cases extensively, before we decide how they should be addressed.

Contributor Author

It seems you're pushing very hard against an implemented, tested, and used feature only because you don't understand how it works and/or because it doesn't solve all possible FFI problems we can think of. Yes, it doesn't address C++ lifetimes, or talk to the GC of a JVM, or interact with the reference counter of the Swift runtime. Many of these problems are open research questions that I don't even try to tackle.

It's pretty clear to me what this trait does, how it works, and why/when/how I would declare additional conformances. I did my best to describe it. I may have failed but that doesn't mean the trait is not useful. It currently does the job I want done: programmatically explain to the compiler how to translate Hylo.Int to the corresponding type in a C function.

I don't want to remove this trait unless we have an equally concise way to call an FFI that doesn't require hardcoding translations in the compiler. If you have a better approach that fits these constraints, I'm happy to merge a PR.

Collaborator

@dabrahams dabrahams Oct 26, 2023

> It seems you're pushing very hard against an implemented, tested, and used feature only because you don't understand how it works.

That's a mischaracterization. I understand how it works perfectly. I'm pushing back because it adds complexity and you can't explain to me why it's needed and how it solves any real problems. IIRC another contributor expressed a similar concern in one of our meetings, but I forget who, so it's not just me. Ints and pointers don't need any special translation to work with C: the FFI C function, for which we are manually writing a Hylo declaration, can simply be declared to take an Int or Pointer. Your technique forces me to declare another type to translate my type into/out of (or could I actually translate the type to itself?), and to create a conformance, and it all seems like needless complication. What's the benefit?

You haven't explained how this is going to actually solve FFI problems for any types that you might want to be presented to C with a different layout, and we don't have any actual examples of those today that we can use to validate this abstraction. It seems to me that when we run across such a thing we will still need to declare a Hylo type that has the correct layout and define translations to and from that type. Those translations can be as syntactically lightweight as .to_c and .to_hylo property accesses. That approach is the same amount of work as a trait conformance, generalizes well to multiple languages, doesn't create any implicit conversions, and works even if multiple C types need to map into the same Hylo type (which is likely to happen for C integer types). Moreover, such types are incredibly rare. They only occur when you have both an existing Hylo layout and a different existing C layout for a given notional type, both of which are already in use in their respective languages. Otherwise whichever language is new to the notional type could simply consume the other language's layout.

Yes, when we can import C and C++ headers (both of which are a long way off), we will want to automatically map the types declared in those languages into types consumable by Hylo, and it's important to be able to do the layout-compatible mappings to types already in the standard library (e.g. Int-into-Int) automatically. But for that job, the trait goes in the wrong direction: the compiler needs to look up a Hylo type based on the C type, not the other way around. With this trait you'd still need some way to relate C's int to Int.ForeignRepresentation, which is essentially the type translation that Swift hardcodes in the compiler and that you want to avoid. Having this protocol doesn't avoid that AFAICT.

User- and library-defined types declared in C will need to be mapped into automatically synthesized (layout-compatible) Hylo counterparts from the imported C module. It's only the rare case where you have the same notional type with different layouts that this trait even becomes relevant as far as I can tell, and the idea that we ever want to do that layout translation silently is still extremely questionable.

> I don't want to remove this trait unless we have an equally concise way to call an FFI that doesn't require hardcoding translations in the compiler. If you have a better approach that fits these constraints, I'm happy to merge a PR.

I'm pretty sure that, as noted above, this trait doesn't accomplish what you say w.r.t. hardcoding. I'm happy to write a PR that removes all the mechanism, but I don't know how to solve that hardcoding problem today, and since we aren't ready to use a solution to it until we're importing C headers, I don't want to try. IMO interop beyond manual redeclaration of foreign functions is a complex problem that deserves more attention than we can give it right now.

/// and return types of FFIs to be `ForeignConvertible`.
///
/// Types conforming to `ForeignConvertible` implement two methods for converting instances to and
/// from their foreign representations. These methods are inverses of each other:
///
/// - `init(foreign_value:)` creates an instance from its foreign representation.
/// - `foreign_value()` returns the foreign representation of an instance.
///
/// Given a type `T: ForeignConvertible`, `T.ForeignRepresentation` is either a "crossing type"
/// (i.e., a type whose instances are capable of crossing a language boundary) or another type
/// conforming to `ForeignConvertible`. Either way, the foreign representation of `T` shall not
/// refer to `T`. Crossing types currently include built-in numeric types and built-in pointers.
/// Other types may be added to this list in the future.
///
/// Hylo generates two functions for every declaration annotated with `@ffi`. The first is the
/// foreign function itself, whose declaration is only visible in compiled code. The second is
/// a regular Hylo function that implements the above-mentioned glue code. Specifically:
///
/// 1. Arguments are converted to their foreign representations, from left to right.
/// 2. The foreign function is called.
/// 3. The result of the foreign function is converted to its Hylo representation.
///
/// Conversions are performed using the following algorithms. Note that specialized implementations
/// of these algorithms are synthesized for each FFI. No tests or erasure are actually performed.
///
///     fun convert<T: ForeignConvertible>(
///       from_hylo_value v: T
///     ) -> Any {
///       let w = v.foreign_value()
///       if sink let u: any ForeignConvertible = w {
///         return convert(from_hylo_value: u)
///       } else {
///         return w
///       }
///     }
///
///     fun convert<T: ForeignConvertible>(
///       from_foreign_value v: sink Any
///     ) -> T {
///       if T.ForeignRepresentation is ForeignConvertible {
///         T.init(foreign_value: convert<T.ForeignRepresentation>(from_foreign_value: v))
///       } else {
///         T.init(foreign_value: v as! T.ForeignRepresentation)
///       }
///     }
///
/// You should avoid using `ForeignConvertible` to implement long chains of conversions through
/// intermediate foreign representations.
public trait ForeignConvertible: Equatable {

/// The foreign representation of the type.
///
/// All built-in types conform to ForeignConvertible.
type ForeignRepresentation: ForeignConvertible
type ForeignRepresentation

/// Creates a new instance from its foreign representation.
init(foreign_value: sink ForeignRepresentation)
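
The three glue steps listed in the doc comment (convert the arguments from left to right, call the foreign function, convert the result back) can be modeled in C roughly as follows. This is only a sketch of the idea, not the code the Hylo compiler generates: `hylo_int`, `foreign_add`, and `add_glue` are hypothetical names, and `int64_t` is assumed to stand in for `Builtin.i64`.

```c
#include <stdint.h>

/* Hypothetical model of a Hylo wrapper type whose ForeignRepresentation is a
   crossing type. Assumption: int64_t stands in for Builtin.i64. */
typedef struct {
  int64_t value;
} hylo_int;

/* Models foreign_value(): Hylo value -> foreign representation. */
static int64_t hylo_int_foreign_value(hylo_int self) {
  return self.value;
}

/* Models init(foreign_value:): foreign representation -> Hylo value. */
static hylo_int hylo_int_from_foreign(int64_t raw) {
  hylo_int result = { raw };
  return result;
}

/* Stand-in for a foreign function whose implementation lives elsewhere. */
static int64_t foreign_add(int64_t a, int64_t b) {
  return a + b;
}

/* Models the generated glue function wrapping the foreign function. */
static hylo_int add_glue(hylo_int a, hylo_int b) {
  /* 1. Convert arguments to their foreign representations, left to right. */
  int64_t raw_a = hylo_int_foreign_value(a);
  int64_t raw_b = hylo_int_foreign_value(b);

  /* 2. Call the foreign function. */
  int64_t raw_result = foreign_add(raw_a, raw_b);

  /* 3. Convert the result back to its Hylo representation. */
  return hylo_int_from_foreign(raw_result);
}
```

In this model every value crosses the boundary by copy, which matches the sink convention described in the thread; whether that copy should be implicit is the point under debate.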