Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new sort inference algorithm #3673

Merged
merged 52 commits into from
Dec 6, 2023
Merged

Conversation

Scott-Guest
Copy link
Contributor

@Scott-Guest Scott-Guest commented Sep 27, 2023

Closes #3601.

This PR introduces a new sort inference algorithm, aiming to eventually replace the Z3-based approach and pave the way for parametric rules and sorts.

The design is heavily inspired by The Simple Essence of Algebraic Subtyping: Principal Type Inference with Subtyping Made Easy by Lionel Parreaux. A high-level description explaining the relevant background can be found in docs/developers/sort_inference.md.

The current PR implements a limited form of the proposed algorithm, still falling back to the Z3-based algorithm for any terms containing:

  • ambiguities
  • strict casts
  • parametric sorts

For reviewers, I would begin by reading

  • docs/developers/sort_inference.md for a high-level conceptual understanding
  • the comment at the top of SortInferencer.java briefly explaining how the high-level design is actually implemented with our data structures
  • the paper if anything is unclear

@Scott-Guest Scott-Guest self-assigned this Sep 27, 2023
@ehildenb
Copy link
Member

ehildenb commented Oct 5, 2023

Let's make this into a blog post! Thanks for the diligent research :)

  • K is a cool tech we're developing, lots of moving parts.
  • Parsing is a huge part of that, which involves sort inference.
  • We're a research company, so we did research on sort inference, found SimpleSub algorithm good for our cases.
  • Novel from that, we also are able to handle parsing ambiguities! Letting users define grammar makes parsing very difficult, but we solved it!

@Scott-Guest Scott-Guest requested a review from dwightguth October 5, 2023 20:37
@dwightguth
Copy link
Collaborator

Most of this I understand, but I don't completely understand the theory behind the work you're doing with ambiguities. However, that's okay. As long as it works in practice, I understand the algorithm itself well enough to be able to say that it should be fine. Let's see what the implementation looks like and how it performs and behaves once it's complete.

@Scott-Guest
Copy link
Contributor Author

Scott-Guest commented Oct 19, 2023

Road Map

The first step is to implement the core of SimpleSub, but working with our data structures

  • Record constraints as bounds on type variables (ignoring strict casts, parametric sorts, and ambiguities)
  • Collapse bounds / make compact types
  • Perform co-occurrence analysis and simplify

After this, we need to make the necessary adjustments to change the inferred SimpleSub types to actual K sorts

  • Actually compute type joins and meets in bounds, erroring when impossible
  • Monomorphize inferred parametric sorts
  • Hook inference engine into the main pipeline, inserting type annotations
  • Determine which test cases pass, and whether we can use the current implementation with Z3 as a fall-back

Finally, we need to handle the more serious complications related to K:

  • Handle ambiguities
  • Add support for strict casts
  • Add support for MInt
  • Add support for general parametric sorts

final class MonomorphizationError extends SortInferenceError {
// TODO: Produce better error messages!
public MonomorphizationError(HasLocation loc) {
super("Term is not well-sorted due to monomorphization failure.", loc);
Copy link
Contributor Author

@Scott-Guest Scott-Guest Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message is quite poor, but I don't think it's actually possible to hit it currently (see my other comments on the "Over-Simplification" conceptual issue).

@Scott-Guest
Copy link
Contributor Author

Scott-Guest commented Nov 29, 2023

Conceptual Issue - Unrealizable Sub-Term Types

The current algorithm consists of two phases:

  1. Infer a SimpleSub-esque type which may include
    • type intersections / unions such as Int ∧ Bool which do not actually exist in the subsort poset
    • type variables
  2. Realize the SimpleSub-esque type as a K sort by
    • computing type intersections / unions as meets / joins in the subsort poset, erroring if they do not exist
    • monomorphizing type variables

In particular, this process is run on the inferred "function type" of the overall term (i.e. converting TermSort<CompactSort> to TermSort<Sort>). Thus, we only check that the type intersections / unions are valid for the sorts of the variables and the sort of the overall term.

However, I am concerned we could run into the following situation:

  • Both the type of the overall term and the type of every variable can actually be realized as a K sort
  • The type of some sub-term of the overall term cannot actually be realized as a K sort (i.e. it involves a type intersection / union that does not exist), and we never check if this is valid

For example, consider

syntax Foo ::= "foo" [token]
syntax Bar ::= "bar" [token]
syntax Top1 ::= Foo | Bar
syntax Top2 ::= Foo | Bar
rule foo => bar

The rewrite here has type Foo ∨ Bar, but there is no such least upper-bound (Top1 and Top2 are both upper bounds yet incomparable), so this should report a type error. However, the produced TermSort<Sort> contains no variables and the overall sort is just #RuleContent, so SortInferencer never reports any issue.

For rewrites, this is still caught by a later check,

"Could not compute least upper bound for rewrite sort. Possible candidates: " + lub, loc);

so this has not actually caused issues in practice yet. However, it's likely possible to construct other examples not involving the rewrite itself which this check will miss.

Solution

Every Production's declared sorts only involves sort parameters and primitive sorts. Thus, the only way some sub-term / ProductionReference can have a sort which can't be realized is if the instantiation of its sort parameters can't be realized. That is, it suffices to check that every sort parameter's instantiation can be realized in every polarity where it occurs.

Conceptual Issue - Over-Simiplification

Because the top-level sort of the term is almost always #RuleContent, and the sort of the overall term is the only sort in positive position, every sort variable only occurs in negative position and thus will be simplified away, meaning the monomorphization logic never gets hit.

From our view of K Terms as Simple-Sub functions, this makes sense. Consider

𝜆𝑥 . if 𝑥 < 0 then 0 else 𝑥

with nat <: int, < : int → int → bool, and 0 : nat. The inferred type is 𝛼 ⊓ int → nat ⊔ 𝛼 i.e. either int → int or nat → nat. If we wrap this term in some other function ignore : ⊤ → ⊤

𝜆𝑥 . ignore (if 𝑥 < 0 then 0 else 𝑥)

then the inferred type instead because int → ⊤. In a sense, the type simplification only cares if each type variable is relevant to the externally visible type of the function, without regard for the types of any sub-terms. The runtime type of if 𝑥 < 0 then 0 else 𝑥 in the ignore example will still be nat if we pass in a nat for x, but this is irrelevant for the functions actual signature.

Solution

For now, this doesn't actually matter because we fully monomorphize types, but if we eventually want to support parametric rules we will need to ensure that the exact sort of (at least some of the) sub-terms is expressible over the sorts of the variables.

For every sort that we need to express, we can just consider the usages of any type variable there during co-occurrence analysis to avoid simplifying them aways. However, I'm unsure which sorts actually matter, i.e. which of the following is relevant:

  • every rewrite's sort?
  • the sort of the overall rule?
  • the sort of every single sub-term?

Comment on lines 114 to 118
bounds.removeIf(
s ->
subsorts.lessThanEq(s, Sorts.KLabel())
|| subsorts.lessThanEq(s, Sorts.KBott())
|| subsorts.greaterThan(s, Sorts.K()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not confident that filtering the bounds this way is correct, but it was necessary lest we get multiple unrelated bounds like KLabel and K.

Copy link
Collaborator

@dwightguth dwightguth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand this code, but it seems well thought out, well documented, and well architected, and I'm going to assume that you've run the entire test suite with the CHECKED mode as the default, which would at least tell you that it's inferring the exact same terms as the old algorithm, so assuming that's correct, this seems reasonable.

See my comment though; if we've adequately tested this on the entire test suite and know that it's identical in behavior to the old system on those tests, and if it's easy to turn off in the event of bugs in other semantics, and it's faster than the old code, we ought to turn this on by default.

"Choose between the Z3-based and SimpleSub-based type inference algorithms, or run both"
+ " and check that their results are equal. Must be one of [z3|simplesub|checked].",
hidden = true)
public TypeInferenceMode typeInferenceMode = TypeInferenceMode.SIMPLESUB;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the rationale behind leaving it off by default. Surely we want it on so that it gets tested by our integration testing moving forward, right?

@dwightguth
Copy link
Collaborator

There might be room to add a bit more documentation on fields and classes that don't currently have javadocs associated with them, though.

Copy link
Contributor

@Baltoli Baltoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've nitpicked the code a bit, but broadly speaking I agree with Dwight - this looks really well thought through, and I'm confident enough in the surrounding infrastructure that we should be OK to get it merged and try it out on live projects using a flag.

Excellent work Scott; well done for seeing this through to the point of merging!

docs/developers/sort_inference.md Outdated Show resolved Hide resolved
"Choose between the Z3-based and SimpleSub-based type inference algorithms, or run both"
+ " and check that their results are equal. Must be one of [z3|simplesub|checked].",
hidden = true)
public TypeInferenceMode typeInferenceMode = TypeInferenceMode.SIMPLESUB;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will also let us selectively roll the feature out to projects one-by-one and keep an eye on them to make sure we don't see bugs or performance regressions there. I agree that in the end we definitely want this on by default, but it's a big enough feature that for now we should take care rolling it out.

@radumereuta
Copy link
Contributor

I've tested the c-semantics with the new and old algorithm and here are some findings:

total time ms
1639 semantics/c.k:477   parse: 375 typeInf:1061
 568 semantics/c.k:477   parse: 339 typeInf:  18 // new alg

254 semantics/implementation/x86_64-linux-gnu/config.k:41   parse:  18 typeInf: 206
 48 semantics/implementation/x86_64-linux-gnu/config.k:41   parse:  13 typeInf:   1 // new alg

The new implementation is much faster.

@rv-jenkins rv-jenkins merged commit fe2916e into develop Dec 6, 2023
8 checks passed
@rv-jenkins rv-jenkins deleted the new-type-inference branch December 6, 2023 01:31
@Baltoli Baltoli mentioned this pull request Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants