Coma backend #968

xldenis · 2024-03-10T09:48:20Z

MLCFG has served us quite well until now, but the time has come for us to consider replacements. In particular, MLCFG has one essential limitation, forced upon us by Why3: all functions must be opaque. This is an issue because, for small trivial functions, the contract is often as long as or longer than the function itself.
Closures suffer acutely from this: the closure is often no more than a single expression, while its contract requires at least a precondition and a postcondition.

Fixing this in WhyML is complex: it goes against the current design of the VC, and almost any fix would be a hack. Luckily, Andreï, Jean-Christophe and Paul have been working on an alternative in the form of Coma.

Coma is a new IVL with a few interesting design choices:

It's hyper minimalist, clearly not intended to be written / read by humans. The (abstract) grammar is very simple, and has few constructs.
It's CPS structured: functions (called 'handlers') accept zero or more continuations to implement control flow. This gives the frontend huge flexibility in how it encodes things like exceptions.
It has an explicit abstraction barrier operator. The black box operator opacifies an expression, generating proof obligations to show the safety of this replacement.
This means we can control where abstraction barriers occur in our programs!
It even opens the door to partially transparent functions.
The mechanics of Coma even allow us to have pre and post operators which reify the preconditions and postconditions of handlers as predicates. This is again useful for closures as it allows us to provide the correct definitions for the postcondition and precondition functions.
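As a rough illustration of the CPS structure described in the list above, here is a Rust analogy (not Coma syntax). The names `div_cps`, `ok`, `on_error` and `describe` are hypothetical and chosen only for this sketch: a handler-like function receives one continuation per outcome, so the caller decides what the "exceptional" path does.

```rust
/// Hypothetical handler-style function: instead of returning a `Result`,
/// it receives one continuation per possible outcome and calls exactly one.
fn div_cps<R>(
    num: i64,
    den: i64,
    ok: impl FnOnce(i64) -> R,
    on_error: impl FnOnce() -> R,
) -> R {
    // `i64::checked_div` yields `None` for a zero divisor or on overflow.
    match num.checked_div(den) {
        Some(q) => ok(q),
        None => on_error(),
    }
}

/// The caller chooses how each outcome continues; here both paths
/// simply produce a description.
fn describe(num: i64, den: i64) -> String {
    div_cps(
        num,
        den,
        |q| format!("quotient = {q}"),
        || String::from("no quotient"),
    )
}
```

Because the control flow lives entirely in the continuations, a frontend is free to map exceptions, early returns or branches onto extra continuation parameters.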

By adopting Coma in Creusot, we can solve our issues with closures, provide the developers of Coma with a good initial use case, and potentially future-proof ourselves against changes in Why3.

In this PR

This PR implements a complete port of Creusot to Coma. In some cases I think this makes the code generation easier to understand; in others it makes it more complex.

There are two essential limitations to consider in Coma:
1. A black box must be inserted to break all cycles between handlers. This is understandable, as a cycle corresponds to something like a loop.
2. Effect inference is handled lexically. If we're not careful, we can lose framing information about mutable variables. Handler definition blocks define a single scope, and thus if a single handler modifies a variable, other sibling handlers will have to consider that variable as potentially modified.
This is only a problem when dealing with loops, though: for acyclic control flow, the final VC will "inline" the real value for that variable.

Taking these limitations into consideration, we obtain the following high-level design:

1. Each cycle is (recursively) translated to a group of handlers.
a. The loop head is surrounded by a black box and guarded by invariants.
2. Each basic block becomes one handler.
a. Within a basic block, we translate each statement to a handler as well. (Purely aesthetic / cleanliness)
3. For opaque functions: the whole body is wrapped in a black box, and the return continuation is itself guarded by one.
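To make the "break every cycle at the loop head" idea concrete, here is a minimal sketch that finds candidate loop heads in a control-flow graph by collecting the targets of DFS back edges. This is an assumption-laden illustration, not Creusot's actual algorithm: the adjacency-list representation and the name `loop_heads` are hypothetical, and back-edge targets only coincide with loop heads for reducible control flow.

```rust
use std::collections::BTreeSet;

/// `successors[b]` lists the successor blocks of basic block `b`
/// (a hypothetical, simplified CFG representation).
/// Returns every block that is the target of a DFS back edge; cutting the
/// graph at these blocks leaves the reachable part acyclic.
fn loop_heads(successors: &[Vec<usize>], entry: usize) -> BTreeSet<usize> {
    #[derive(Clone, Copy, PartialEq)]
    enum State {
        Unvisited,
        OnStack,
        Done,
    }

    fn visit(
        b: usize,
        succ: &[Vec<usize>],
        state: &mut Vec<State>,
        heads: &mut BTreeSet<usize>,
    ) {
        state[b] = State::OnStack;
        for &s in &succ[b] {
            let st = state[s];
            match st {
                // An edge to a block still on the DFS stack closes a cycle.
                State::OnStack => {
                    heads.insert(s);
                }
                State::Unvisited => visit(s, succ, state, heads),
                State::Done => {}
            }
        }
        state[b] = State::Done;
    }

    let mut state = vec![State::Unvisited; successors.len()];
    let mut heads = BTreeSet::new();
    if entry < successors.len() {
        visit(entry, successors, &mut state, &mut heads);
    }
    heads
}
```

For a CFG `0 -> 1 -> 2 -> {1, 3}`, the only block reported is `1`, which is where a black box guarded by invariants would go under the design above.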

To aid in the translation between direct and continuation-passing styles, we make use of an IntermediateStmt type.
An IntermediateStmt is an 'atomic' (to Coma) effectful operation; these operations are sequenced together.
Once we have finished translating some set of MIR constructs, we can take the sequence of generated IntermediateStmts and fold them into a single CPS expression.
In Rust this is much more ergonomic than passing around some boxed continuation closure.
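The folding step can be sketched as follows. The types below are simplified, hypothetical stand-ins (the real `IntermediateStmt` in this PR may look quite different), and `render` prints an illustrative notation modelled on the `some X (fun _0 -> ...)` shape used in this description rather than exact Coma syntax.

```rust
/// Hypothetical, simplified CPS expression.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A reference to a continuation or variable by name.
    Symbol(String),
    /// A handler call whose last argument is a continuation binding `binder`.
    Call { handler: String, args: Vec<String>, binder: String, body: Box<Expr> },
    /// An assertion followed by the rest of the computation.
    Assert { cond: String, rest: Box<Expr> },
}

/// Hypothetical direct-style statement: one atomic effectful step.
enum IntermediateStmt {
    Call { handler: String, args: Vec<String>, binder: String },
    Assert(String),
}

/// Fold a direct-style sequence into one nested CPS expression.
/// Iterating in reverse makes each statement wrap everything after it.
fn fold_stmts(stmts: Vec<IntermediateStmt>, tail: Expr) -> Expr {
    stmts.into_iter().rev().fold(tail, |rest, stmt| match stmt {
        IntermediateStmt::Call { handler, args, binder } => {
            Expr::Call { handler, args, binder, body: Box::new(rest) }
        }
        IntermediateStmt::Assert(cond) => Expr::Assert { cond, rest: Box::new(rest) },
    })
}

/// Pretty-print in an illustrative, Coma-like notation.
fn render(e: &Expr) -> String {
    match e {
        Expr::Symbol(s) => s.clone(),
        Expr::Call { handler, args, binder, body } => {
            let mut parts = vec![handler.clone()];
            parts.extend(args.iter().cloned());
            parts.push(format!("(fun {} -> {})", binder, render(body)));
            parts.join(" ")
        }
        Expr::Assert { cond, rest } => format!("{{ {} }} {}", cond, render(rest)),
    }
}
```

The translation code can push statements onto a `Vec` in program order and only build the nested continuation structure once, at the end, which avoids threading boxed closures through every translation function.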

In Coma, only handlers are effectful, while in MLCFG many 'expressions' could be as well, which we were leveraging to encode places.
When a place asserts a constructor (X as Some), we would encode it as a let Some(_) = X, which Why3 would then prove always succeeds.
These expressions could be used in pure contexts in MLCFG, but this does not work in Coma.
Instead, we have to use a handler which asserts that the passed-in value has a specific constructor and produces its component fields.
Thus X as Some becomes some X (fun _0 -> ... )
As places are often quite long, this is the primary source of unreadability in Coma code.
Some effort has been made to improve legibility and prevent rightward drift of expressions, but even then it remains hard to read.
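A minimal sketch of how a chain of constructor downcasts turns into nested handler calls is shown below. The `Place` model, the lower-cased handler names and the reuse of `_0` as the binder are assumptions made for illustration; they follow the `some X (fun _0 -> ...)` shape above and are not the backend's actual code.

```rust
/// Hypothetical model of a place: a base variable followed by a chain of
/// constructor downcasts, each of the form `(_ as Ctor).0`.
struct Place {
    base: String,
    downcasts: Vec<String>,
}

/// Render the read of a place as nested handler calls.
/// `use_value` is the text of the innermost continuation body.
fn read_place(place: &Place, use_value: &str) -> String {
    // Build from the inside out: the last downcast is the innermost call.
    let mut body = use_value.to_string();
    for (i, ctor) in place.downcasts.iter().enumerate().rev() {
        // The first downcast inspects the base; later ones inspect the
        // field bound by the enclosing continuation.
        let scrutinee = if i == 0 { place.base.as_str() } else { "_0" };
        body = format!("{} {} (fun _0 -> {})", ctor.to_lowercase(), scrutinee, body);
    }
    body
}
```

Each additional downcast adds one more level of nesting, which is why long places dominate the generated output.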

Limitations

While the PR implements a complete backend, we do have a few notable regressions.

No support for variant: these will have to be manually encoded, and we will need to see how we can recover Why3's structurally recursive variants.
We lose the "subregion analysis" we were performing on MLCFG. This will need to be re-implemented at the Creusot level, but we can also use this analysis to automatically insert loop invariants for type invariants.

Future work

Besides rectifying the limitations, there are some improvements we can still make:

Andreï has implemented a syntactic extension which could improve the legibility of places, but we are not (yet) using it.
He defines a . operator which works much like Haskell's $.
Without it, ((X as Some).0 as Some).0 becomes:

some X (fun _0 -> some _0 (fun _0 -> do something with it here))

This quickly becomes unreadable and causes deep nesting.
Using the Coma dot:

some X . (fun _0 -> some _0) . (fun _0 -> do something with it here)
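The effect on nesting can be checked with a small sketch that prints the dotted shape and measures parenthesis depth. Both helpers (`render_dotted`, `max_paren_depth`) are hypothetical, and the output only mirrors the textual form shown above; the precise semantics of the Coma dot are not modelled here.

```rust
/// Render a chain of constructor handlers applied to `base` in the dotted
/// style: one flat segment per step, joined by " . ".
fn render_dotted(base: &str, ctors: &[&str], body: &str) -> String {
    let mut segments = Vec::new();
    for (i, ctor) in ctors.iter().enumerate() {
        if i == 0 {
            segments.push(format!("{} {}", ctor, base));
        } else {
            segments.push(format!("(fun _0 -> {} _0)", ctor));
        }
    }
    if ctors.is_empty() {
        segments.push(body.to_string());
    } else {
        segments.push(format!("(fun _0 -> {})", body));
    }
    segments.join(" . ")
}

/// Maximum parenthesis nesting depth of a rendered expression.
fn max_paren_depth(s: &str) -> usize {
    let mut depth = 0usize;
    let mut max = 0usize;
    for c in s.chars() {
        match c {
            '(' => {
                depth += 1;
                max = max.max(depth);
            }
            ')' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    max
}
```

With this rendering the depth stays at one however long the place is, whereas the nested form grows by one level per downcast.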

Paul has implemented a nice let%attr syntax which will allow us to de-duplicate our usage of spans. This should dramatically improve the legibility of Coma expressions and of our diffs in tests. let%attr lets us give an attribute a name, which can then be used via [%@ name].

xldenis · 2024-03-11T09:50:09Z

A few issues remain:

The code for handling loops needs to be reworked a bit.
Casts need to be changed: the of_int portion in particular is not a pure function and thus should be a Coma handler.
Reading and writing to places needs to be rebuilt from scratch, which is rather annoying.

xldenis force-pushed the coma branch from 41ca350 to 444f394 Compare March 12, 2024 17:27

xldenis force-pushed the coma branch from e024982 to ae721ee Compare March 28, 2024 08:45

xldenis force-pushed the coma branch 2 times, most recently from 12b98bc to 6606ed0 Compare April 10, 2024 14:23

xldenis marked this pull request as ready for review April 30, 2024 07:09

xldenis force-pushed the coma branch 3 times, most recently from b3b1a5b to 7c5b35e Compare April 30, 2024 13:10

xldenis force-pushed the coma branch 10 times, most recently from 6c6bda9 to 3ee97b0 Compare May 23, 2024 22:06

xldenis force-pushed the coma branch 4 times, most recently from b485f57 to fc730f6 Compare June 2, 2024 18:47

xldenis added 4 commits June 2, 2024 21:22

Coma backend

a95dfe6

Fix promoted constants and derive macros

0bbe895

Update tests

b8a1a8e

Update sessions

9f1f266

xldenis force-pushed the coma branch from fc730f6 to 9f1f266 Compare June 2, 2024 19:22

xldenis merged commit 1d38909 into master Jun 3, 2024
4 checks passed

xldenis deleted the coma branch June 3, 2024 12:03

Armael mentioned this pull request Jun 3, 2024

language file 'Coma' not found, when launching why3 ide #1018

Closed
