Coma backend #968

Merged
merged 4 commits into from
Jun 3, 2024


xldenis
Collaborator

@xldenis xldenis commented Mar 10, 2024

MLCFG has served us quite well until now, but the time has come for us to consider replacements. In particular, MLCFG has one essential limitation, forced upon us by Why3: all functions must be opaque. This is an issue because, for small trivial functions, the contract is often as long as or longer than the function itself.
Closures suffer acutely from this: the closure is often no more than a single expression while its contract requires at least a precondition and a postcondition.

Fixing this in WhyML is complex: it goes against the current design of VC generation, and almost any fix would be a hack. Luckily, Andreï, Jean-Christophe and Paul have been working on an alternative in the form of Coma.

Coma is a new IVL with a few interesting design choices:

  1. It's hyper minimalist and clearly not intended to be written or read by humans. The (abstract) grammar is very simple and has few constructs.
  2. It's CPS structured: functions (called 'handlers') accept zero or more continuations to implement control flow. This gives the frontend huge flexibility in how it encodes things like exceptions.
  3. It has an explicit abstraction barrier operator. The black box operator opacifies an expression, generating proof obligations to show the safety of this replacement.
    This means we can control where abstraction barriers occur in our programs!
    It even opens the door to partially transparent functions.
  4. The mechanics of Coma even allow us to have pre and post operators which reify the preconditions and postconditions of handlers as predicates. This is again useful for closures as it allows us to provide the correct definitions for the postcondition and precondition functions.
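
As a rough illustration of point 2, here is a Rust analogy (not Coma syntax) of a handler that takes one continuation per exit. The names `checked_div` and `describe` are hypothetical and chosen only for this sketch; the idea is that the caller, not the callee, decides what the exceptional exit means.

```rust
// Rust analogy only: this is not Coma syntax, and all names are hypothetical.

/// A "handler" in CPS: instead of returning, it passes control to one of its
/// continuations, `ok` on success or `div_by_zero` on failure.
fn checked_div<R>(
    a: i64,
    b: i64,
    ok: impl FnOnce(i64) -> R,
    div_by_zero: impl FnOnce() -> R,
) -> R {
    if b == 0 {
        div_by_zero()
    } else {
        ok(a / b)
    }
}

/// The caller chooses how to encode the exceptional exit, here as an `Err`.
fn describe(a: i64, b: i64) -> Result<i64, String> {
    checked_div(a, b, |q| Ok(q), || Err(String::from("division by zero")))
}
```

A different frontend could pass a continuation that jumps to a cleanup block instead, without changing `checked_div` itself.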

By adopting Coma in Creusot, we can solve our issues with closures, provide the developers of Coma with a good initial use case, and potentially future-proof ourselves against changes in Why3.

In this PR

This PR implements a complete port of Creusot to Coma. In some cases I think this makes the code generation easier to understand; in others it makes it more complex.

There are two essential limitations to consider in Coma:

  1. A black box must be inserted to break all cycles between handlers. This is understandable, as a cycle corresponds to something like a loop.
  2. Effect inference is handled lexically. If we're not careful, we can lose framing information about mutable variables. Handler definition blocks define a single scope, and thus if a single handler modifies a variable, other sibling handlers will have to consider that variable as potentially modified.
    This is only a problem when dealing with loops though, as for acyclic control flow the final VC will "inline" the real value for that variable.

Taking these limitations into consideration, we obtain the following high-level design:

  1. Each cycle is (recursively) translated to a group of handlers.
    a. The loop head is surrounded in a black box and guarded by invariants.
  2. Each basic block becomes one handler.
    a. Within a basic block, we translate each statement to a handler as well. (Purely aesthetic / cleanliness)
  3. For opaque functions: the whole body is wrapped in a black box, and the return continuation is itself guarded by one.
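
To make step 1 a bit more concrete, here is a minimal sketch of how loop heads could be identified in a control-flow graph: a depth-first search that records the targets of back edges. This is an assumption for illustration, not necessarily how this PR finds cycles; `loop_heads` and the `succs` representation are hypothetical, and on irreducible graphs the result depends on traversal order.

```rust
use std::collections::BTreeSet;

/// Returns the blocks that are the target of a back edge in a depth-first
/// search from `entry`. `succs[b]` lists the successors of basic block `b`.
/// Each returned block is a candidate loop head, i.e. a place where a black
/// box and invariants would be needed to break the cycle.
fn loop_heads(succs: &[Vec<usize>], entry: usize) -> BTreeSet<usize> {
    #[derive(Clone, Copy)]
    enum Color {
        White, // not visited yet
        Grey,  // on the current DFS path
        Black, // fully explored
    }

    fn visit(b: usize, succs: &[Vec<usize>], color: &mut [Color], heads: &mut BTreeSet<usize>) {
        color[b] = Color::Grey;
        for &s in &succs[b] {
            match color[s] {
                // An edge to a block on the current path closes a cycle.
                Color::Grey => {
                    heads.insert(s);
                }
                Color::White => visit(s, succs, color, heads),
                Color::Black => {}
            }
        }
        color[b] = Color::Black;
    }

    let mut color = vec![Color::White; succs.len()];
    let mut heads = BTreeSet::new();
    visit(entry, succs, &mut color, &mut heads);
    heads
}
```

For the graph `0 -> 1 -> 2 -> {1, 3}`, only block 1 is reported, so only that handler would need to be guarded.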

To aid in the translation between direct and continuation-passing styles, we make use of an IntermediateStmt type.
An IntermediateStmt is an 'atomic' (to Coma) effectful operation; these are sequenced together.
Once we have finished translating some set of MIR constructs, we can take the sequence of generated IntermediateStmts and fold them into a single CPS expression.
In Rust this is much more ergonomic than passing around some boxed continuation closure.
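
The folding step can be sketched roughly as follows. The variants below are hypothetical and much simpler than the real `IntermediateStmt`, and the output is an informal pseudo-notation built from strings rather than an actual Coma AST; the point is only that a right fold turns a flat sequence into nested continuations.

```rust
// Hypothetical, simplified stand-in for the real `IntermediateStmt` type.
enum IntermediateStmt {
    /// Bind `name` to a pure term.
    Assign { name: String, term: String },
    /// Assert a condition before continuing.
    Assert(String),
    /// Call a handler; its result is bound to `ret` in the continuation.
    Call { handler: String, args: Vec<String>, ret: String },
}

/// Folds the statements right-to-left, so that each statement receives
/// "the rest of the program" as its continuation.
fn fold_cps(stmts: Vec<IntermediateStmt>, tail: String) -> String {
    stmts.into_iter().rfold(tail, |rest, stmt| match stmt {
        IntermediateStmt::Assign { name, term } => format!("let {name} = {term} in {rest}"),
        IntermediateStmt::Assert(cond) => format!("{{ {cond} }} {rest}"),
        IntermediateStmt::Call { handler, args, ret } => {
            format!("{handler} {} (fun {ret} -> {rest})", args.join(" "))
        }
    })
}
```

The translation code can then push statements in program order and never has to thread a boxed continuation closure through every function.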

In Coma, only handlers are effectful, while in MLCFG many 'expressions' could be as well, which we were leveraging to encode places.
When a place asserts a constructor (X as Some), we would encode it as a let Some(_) = X, which Why3 would then prove always succeeds.
These expressions could be used in pure contexts in MLCFG, but this does not work in Coma.
Instead, we have to use a handler which asserts that the passed-in value has a specific constructor and produces its component fields.
Thus X as Some becomes some X (fun _0 -> ... ).
As places are often quite long, this is the primary source of unreadability in Coma code.
Some effort has been made to improve legibility and prevent right-wards drift of expressions, but even then it remains hard to read.
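
As a Rust analogy of this encoding (again, not Coma syntax), the handler for a constructor can be pictured as a function that checks the constructor and passes the field to a continuation. Here the check is a runtime panic, whereas in Coma it would be a proof obligation; `some` and `inner` are hypothetical names for this sketch.

```rust
/// Analogue of the `some` handler: asserts that `x` is `Some` and hands the
/// contained field to the continuation `k`.
fn some<T, R>(x: Option<T>, k: impl FnOnce(T) -> R) -> R {
    match x {
        Some(v) => k(v),
        // In Coma this would be a proof obligation, not a runtime check.
        None => panic!("expected the Some constructor"),
    }
}

/// The place `((x as Some).0 as Some).0`, written with nested continuations.
fn inner(x: Option<Option<u32>>) -> u32 {
    some(x, |_0| some(_0, |_0| _0))
}
```

Each extra projection in the place adds one more level of nesting, which is where the right-wards drift comes from.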

Limitations

While the PR implements a complete backend, we do have a few notable regressions.

  1. No support for variant: these will have to be manually encoded, and we will need to see how we can recover Why3's structurally recursive variants.
  2. We lose the "subregion analysis" we were performing on MLCFG. This will need to be re-implemented at the Creusot level, but we can also use this analysis to automatically insert loop invariants for type invariants.

Future work

Besides rectifying the limitations there are some improvements we can still make:

  1. Andreï actually has implemented a syntactic extension which could improve legibility of places but we are not (yet) using it.
    He defines a . operator which works much like Haskell's $.
    Without it, ((X as Some).0 as Some).0 becomes:
some X (fun _0 -> some _0 (fun _0 -> do something with it here))

This quickly becomes unreadable and causes deep nesting.
Using the Coma dot:

some X . (fun _0 -> some _0) . (fun _0 -> do something with it here)
  2. Paul has implemented a nice let%attr syntax which will allow us to de-duplicate our usage of spans. This will dramatically improve the legibility of Coma expressions and of our test diffs. let%attr lets us give an attribute a name, which can then be used via [%@ name].
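
To compare the nested and dotted place encodings from the first item, here is a small sketch that prints both forms for a chain of constructor projections. It only manipulates strings; `nested` and `dotted` are hypothetical helpers, not functions from this PR, and the output simply mirrors the two examples written above.

```rust
/// Nested form: every projection wraps the rest of the place in a new
/// continuation, e.g. `some X (fun _0 -> some _0 (fun _0 -> BODY))`.
fn nested(root: &str, ctors: &[&str], body: &str) -> String {
    let mut acc = body.to_string();
    for (i, ctor) in ctors.iter().enumerate().rev() {
        // The first handler is applied to the root, later ones to `_0`.
        let arg = if i == 0 { root } else { "_0" };
        acc = format!("{ctor} {arg} (fun _0 -> {acc})");
    }
    acc
}

/// Dotted form: projections are chained left to right, e.g.
/// `some X . (fun _0 -> some _0) . (fun _0 -> BODY)`.
fn dotted(root: &str, ctors: &[&str], body: &str) -> String {
    if ctors.is_empty() {
        return body.to_string();
    }
    let mut out = String::new();
    for (i, ctor) in ctors.iter().enumerate() {
        if i == 0 {
            out.push_str(&format!("{ctor} {root}"));
        } else {
            out.push_str(&format!(" . (fun _0 -> {ctor} _0)"));
        }
    }
    format!("{out} . (fun _0 -> {body})")
}
```

The dotted form keeps the nesting depth constant regardless of how long the place is, which is the main legibility gain.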

@xldenis
Collaborator Author

xldenis commented Mar 11, 2024

A few issues remain:

  1. The code for handling loops needs to be reworked a bit.
  2. Casts need to be changed: the of_int portion in particular is not a pure function and thus should be a Coma handler.
  3. Reading from and writing to places needs to be rebuilt from scratch, which is rather annoying.

@xldenis xldenis force-pushed the coma branch 2 times, most recently from 12b98bc to 6606ed0 Compare April 10, 2024 14:23
@xldenis xldenis marked this pull request as ready for review April 30, 2024 07:09
@xldenis xldenis force-pushed the coma branch 3 times, most recently from b3b1a5b to 7c5b35e Compare April 30, 2024 13:10
@xldenis xldenis force-pushed the coma branch 10 times, most recently from 6c6bda9 to 3ee97b0 Compare May 23, 2024 22:06
@xldenis xldenis force-pushed the coma branch 4 times, most recently from b485f57 to fc730f6 Compare June 2, 2024 18:47
@xldenis xldenis merged commit 1d38909 into master Jun 3, 2024
4 checks passed
@xldenis xldenis deleted the coma branch June 3, 2024 12:03