Skip to content

Latest commit

 

History

History
244 lines (201 loc) · 9.37 KB

sum_types.md

File metadata and controls

244 lines (201 loc) · 9.37 KB

Sum types

Table of contents

Overview

In Carbon, a sum type is a type whose values are grouped into several distinct named cases, called alternatives. A value of a sum type notionally consists of a discriminator tag that identifies which alternative is present, together with that alternative's value if it has one. Sum types are typically handled with pattern matching.

choice declarations

The choice keyword is used to declare a sum type by specifying its interface, leaving the implementation to the compiler. A choice declaration consists of the choice keyword followed by the name of the type, and then a list of alternatives inside curly braces. An alternative declaration consists of the alternative name, optionally followed by a parameter list in parentheses. If present, the parameter list has the same syntax as in a function declaration. For example:

choice OptionalI32 {
  Some(value: i32),
  None
}

This declares a sum type named OptionalI32 with two alternatives: Some, which holds a single i32 value (the parameter name value has no effect other than documentation), and None, which is empty. Choice types can also be parameterized, like class types:

choice Optional(T:! Type) {
  Some(value: T),
  None
}

A value of a function-like alternative is specified by "calling" it like a function, and a value of an empty alternative like None is specified by naming it:

var my_opt: Optional(i32) = Optional(i32).None;
my_opt = Optional(i32).Some(42);

The value of a choice type can be inspected using a match statement:

match (my_opt) {
  case .Some(the_value: i32) => {
    Print(the_value);
  }
  case .None => {
    Print("None");
  }
}

User-defined sum types

choice declarations are a convenience shorthand for common use cases, but they have limited flexibility. There is no way to control the representation of a choice type, or define methods or other members for it (although you can extend it to implement interfaces, using an external impl or adapter). However, a class type can be extended to behave like a sum type. This is much more verbose than a choice declaration, but gives the author full control over the representation and class members.

The ability to create instances of the sum type can be straightforwardly emulated with factory functions and static constants, and the internal storage layout will presumably involve untagged unions or some other low-level storage primitive which hasn't been designed yet, but the key to defining a sum type's interface is enabling it to support pattern matching. To do that, the sum type has to specify two things:

  • The set of all possible alternatives, including their names and parameter types, so that the compiler can typecheck the match body, identify any unreachable cases, and determine whether any cases are missing.
  • The algorithm that, given a value of the sum type, determines which alternative is present, and specifies the values of its parameters.

It does so by implementing the Match interface, which is defined as follows:

interface Match {
  interface BaseContinuation {
    let ReturnType:! Type;
  }

  let template Continuation:! Type;
  fn Op[me: Self, C:! Continuation](continuation: C*)
    -> C.(MatchContinuation.ReturnType);
}

Continuation must itself be an interface that extends Match.BaseContinuation, and its definition specifies the set of possible alternatives: each alternative is represented as a method of that interface. When compiling a proper pattern (or set of patterns that includes a proper pattern, as with the cases of a match) whose type is a sum type, the compiler generates an implementation of Continuation and passes it to Match.Op. The sum type's implementation of Match.Op is responsible for determining which alternative is present and what its parameters are, and calling the corresponding method of continuation with those parameters. The Match.Op implementation is required to call exactly one such method exactly once before returning. The compiler populates the Continuation method bodies with whatever code should be executed when the corresponding alternatives match.

TODO: if Carbon has explicit support for tail calls, we should probably require that Match.Op invoke the continuation as a tail call.

For example, here's how Optional can be defined as a class:

class Optional(T:! Type) {
  // Factory functions
  fn Some(value: T) -> Self;
  let None:! Self;

  private var has_value: bool;
  private var value: T;

  external impl as Match {
    interface Continuation {
      extends Match.BaseContinuation;
      fn Some[addr me: Self*](value: T) -> ReturnType;
      fn None[addr me: Self*]() -> ReturnType;
    }

    fn Op[me: Self, C:! Continuation](continuation: C*) -> C.ReturnType {
      if (me.has_value) {
        return continuation->Some(me.value);
      } else {
        return continuation->None();
      }
    }
  }

  // Operations like destruction, copying, assignment, and comparison are
  // omitted for brevity.
}

And here's how the compiler-generated implementation of Optional.(Match.Continuation) for the match statement shown earlier might look, if it were written in Carbon:

class __MatchStatementImpl {
  impl as Match(Optional.MatchContinuation) where .ReturnType = () {
    fn Some(the_value: i32) {
      Print(the_value);
    }
    fn None() {
      Print("None");
    }
  }
}

my_opt.(Match.Op)({} as __MatchStatementImpl);

(The name __MatchStatementImpl is a placeholder for illustration purposes; the actual generated class will be anonymous.)

The mechanism described above for proper patterns may also be used for expression patterns if they have the form of an alternative pattern. An expression pattern of type T has the form of an alternative pattern if T implements Match, and the expression consists of an optional expression that names T, followed by a designator that names a method of T.(Match.Continuation), optionally followed by a tuple expression. If an expression pattern has that form, it may be matched using the mechanism above, as if it were a proper pattern, rather than by evaluating the expression and comparing it to the scrutinee using ==. Both possible implementations must be well-formed (and this is enforced by the compiler), but it is unspecified which implementation is used to generate code.

As a result, it is strongly recommended that user-defined sum types ensure that for every alternative there is a factory function or constant member with the same name and parameter list, such that pattern-matching on the result will correctly reproduce the arguments to the factory function. For example, the definition of Optional above satisfies this requirement, because for any regular type T, the expression Optional(T).None evaluates to a value that matches the pattern Optional(T).None (under both possible matching mechanisms), and for any x of type T, the expression Optional(T).Some(x) evaluates to a value that matches the pattern Optional(T).Some(y: T) and binds y to a value that's equal to x. Expression patterns involving a sum type that doesn't meet this requirement will fail to compile, or have behavior that observably changes depending on the compiler's implementation choices.

Another corollary of this rule is that if an alternative takes no arguments, its pattern syntax is the same as its expression syntax. For example, case Optional(i32).None() => ... is not well-formed, because Optional(i32).None() has the form of an alternative pattern, but the implementation in terms of == is not well-formed because Optional(i32).None() is not a well-formed expression. If we had defined Optional.None as a factory function instead of a constant, case Optional(i32).None() => ... would be well-formed but case Optional(i32).None => ... would not be.

Note that the compiler-generated continuation method bodies are not required to contain the code in the case body (or whatever code is in the scope of the pattern). For example, they might only store the parameter values and then return an index that identifies the case body to be executed.

Alternatives considered

References