Implement new sort inference algorithm #3673

Scott-Guest · 2023-09-27T20:03:27Z

Closes #3601.

This PR introduces a new sort inference algorithm, aiming to eventually replace the Z3-based approach and pave the way for parametric rules and sorts.

The design is heavily inspired by The Simple Essence of Algebraic Subtyping: Principal Type Inference with Subtyping Made Easy by Lionel Parreaux. A high-level description explaining the relevant background can be found in docs/developers/sort_inference.md.

The current PR implements a limited form of the proposed algorithm, still falling back to the Z3-based algorithm for any terms containing:

- ambiguities
- strict casts
- parametric sorts

For reviewers, I would begin by reading

- docs/developers/sort_inference.md for a high-level conceptual understanding
- the comment at the top of SortInferencer.java briefly explaining how the high-level design is actually implemented with our data structures
- the paper if anything is unclear

ehildenb · 2023-10-05T15:13:39Z

Let's make this into a blog post! Thanks for the diligent research :)

K is a cool tech we're developing, lots of moving parts.
Parsing is a huge part of that, which involves sort inference.
We're a research company, so we did research on sort inference, found SimpleSub algorithm good for our cases.
Novel from that, we also are able to handle parsing ambiguities! Letting users define grammar makes parsing very difficult, but we solved it!

dwightguth · 2023-10-09T14:04:12Z

Most of this I understand, but I don't completely understand the theory behind the work you're doing with ambiguities. However, that's okay. As long as it works in practice, I understand the algorithm itself well enough to be able to say that it should be fine. Let's see what the implementation looks like and how it performs and behaves once it's complete.

Scott-Guest · 2023-10-19T20:30:12Z

Scott-Guest · 2023-11-28T21:59:53Z

...l/src/main/java/org/kframework/parser/inner/disambiguation/inference/SortInferenceError.java

+final class MonomorphizationError extends SortInferenceError {
+  // TODO: Produce better error messages!
+  public MonomorphizationError(HasLocation loc) {
+    super("Term is not well-sorted due to monomorphization failure.", loc);


This error message is quite poor, but I don't think it's actually possible to hit it currently (see my other comments on the "Over-Simplification" conceptual issue).

Scott-Guest · 2023-11-29T00:58:27Z

Conceptual Issue - Unrealizable Sub-Term Types

The current algorithm consists of two phases:

1. Infer a SimpleSub-esque type which may include
   - type intersections / unions such as Int ∧ Bool which do not actually exist in the subsort poset
   - type variables
2. Realize the SimpleSub-esque type as a K sort by
   - computing type intersections / unions as meets / joins in the subsort poset, erroring if they do not exist
   - monomorphizing type variables

In particular, this process is run on the inferred "function type" of the overall term (i.e. converting TermSort<CompactSort> to TermSort<Sort>). Thus, we only check that the type intersections / unions are valid for the sorts of the variables and the sort of the overall term.

However, I am concerned we could run into the following situation:

- Both the type of the overall term and the type of every variable can actually be realized as a K sort
- The type of some sub-term of the overall term cannot actually be realized as a K sort (i.e. it involves a type intersection / union that does not exist), and we never check if this is valid

For example, consider

syntax Foo ::= "foo" [token]
syntax Bar ::= "bar" [token]
syntax Top1 ::= Foo | Bar
syntax Top2 ::= Foo | Bar
rule foo => bar

The rewrite here has type Foo ∨ Bar, but there is no such least upper-bound (Top1 and Top2 are both upper bounds yet incomparable), so this should report a type error. However, the produced TermSort<Sort> contains no variables and the overall sort is just #RuleContent, so SortInferencer never reports any issue.
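To make the missing join concrete, here is a minimal sketch of a least-upper-bound computation over the subsort relation from this example. The `SubsortPoset` class and all of its methods are hypothetical names invented for illustration; they are not the K kernel's actual poset API, and the sketch assumes the relation is given as direct subsort edges.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, simplified subsort poset; not the K kernel's data structure. */
final class SubsortPoset {
  // Maps each sort to its direct supersorts.
  private final Map<String, Set<String>> supers = new HashMap<>();

  void addSubsort(String sub, String sup) {
    supers.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    supers.computeIfAbsent(sup, k -> new HashSet<>());
  }

  /** All sorts greater than or equal to s (reflexive-transitive closure). */
  Set<String> upperSet(String s) {
    Set<String> seen = new HashSet<>();
    Deque<String> work = new ArrayDeque<>(List.of(s));
    while (!work.isEmpty()) {
      String cur = work.pop();
      if (seen.add(cur)) {
        work.addAll(supers.getOrDefault(cur, Set.of()));
      }
    }
    return seen;
  }

  boolean lessThanEq(String a, String b) {
    return upperSet(a).contains(b);
  }

  /** The join of a and b, or empty if no least upper bound exists. */
  Optional<String> join(String a, String b) {
    Set<String> common = new HashSet<>(upperSet(a));
    common.retainAll(upperSet(b));
    // A least upper bound must lie below every common upper bound.
    return common.stream()
        .filter(c -> common.stream().allMatch(d -> lessThanEq(c, d)))
        .findFirst();
  }

  /** The subsort relation declared by the four syntax declarations above. */
  static SubsortPoset example() {
    SubsortPoset p = new SubsortPoset();
    for (String top : List.of("Top1", "Top2")) {
      p.addSubsort("Foo", top);
      p.addSubsort("Bar", top);
    }
    return p;
  }
}
```

With this relation, `SubsortPoset.example().join("Foo", "Bar")` is empty because Top1 and Top2 are incomparable, whereas `join("Foo", "Top1")` yields Top1.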

For rewrites, this is still caught by a later check,

k/kernel/src/main/java/org/kframework/compile/AddSortInjections.java, line 538 in 699568b:

           "Could not compute least upper bound for rewrite sort. Possible candidates: " + lub, loc);

so this has not actually caused issues in practice yet. However, it's likely possible to construct other examples not involving the rewrite itself which this check will miss.

Solution

Every Production's declared sorts only involve sort parameters and primitive sorts. Thus, the only way some sub-term / ProductionReference can have a sort which can't be realized is if the instantiation of its sort parameters can't be realized. That is, it suffices to check that every sort parameter's instantiation can be realized in every polarity where it occurs.

Conceptual Issue - Over-Simplification

Because the top-level sort of the term is almost always #RuleContent, and the sort of the overall term is the only sort in positive position, every sort variable only occurs in negative position and thus will be simplified away, meaning the monomorphization logic never gets hit.

From our view of K Terms as Simple-Sub functions, this makes sense. Consider

𝜆𝑥 . if 𝑥 < 0 then 0 else 𝑥

with nat <: int, < : int → int → bool, and 0 : nat. The inferred type is 𝛼 ⊓ int → nat ⊔ 𝛼 i.e. either int → int or nat → nat. If we wrap this term in some other function ignore : ⊤ → ⊤

𝜆𝑥 . ignore (if 𝑥 < 0 then 0 else 𝑥)

then the inferred type instead becomes int → ⊤. In a sense, the type simplification only cares if each type variable is relevant to the externally visible type of the function, without regard for the types of any sub-terms. The runtime type of if 𝑥 < 0 then 0 else 𝑥 in the ignore example will still be nat if we pass in a nat for 𝑥, but this is irrelevant for the function's actual signature.
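The effect on the two lambda terms can be sketched as follows. This is a deliberately simplified illustration, not the code in SortInferencer: `PolaritySimplifier`, `FunType`, and the convention that type variables start with a single quote are all assumptions made for the example, and a function type is flattened to one intersection of atoms in negative position and one union of atoms in positive position.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical sketch of polarity-based variable elimination. */
final class PolaritySimplifier {
  /** Parameter = intersection of the negative atoms, result = union of the positive atoms. */
  record FunType(Set<String> negative, Set<String> positive) {}

  /** Assumed convention for this sketch: type variables are written 'a, 'b, ... */
  static boolean isVar(String atom) {
    return atom.startsWith("'");
  }

  /** Drops every type variable that does not occur in both polarities. */
  static FunType simplify(FunType t) {
    Set<String> both = new TreeSet<>(t.negative());
    both.retainAll(t.positive()); // atoms occurring negatively and positively
    return new FunType(prune(t.negative(), both), prune(t.positive(), both));
  }

  // Keeps all primitive sorts, and only those variables listed in keep.
  private static Set<String> prune(Set<String> atoms, Set<String> keep) {
    return atoms.stream()
        .filter(a -> !isVar(a) || keep.contains(a))
        .collect(Collectors.toCollection(TreeSet::new));
  }
}
```

Under these assumptions, `simplify(new FunType(Set.of("'a", "int"), Set.of("nat", "'a")))` keeps `'a` because it occurs on both sides, while `simplify(new FunType(Set.of("'a", "int"), Set.of("top")))` drops it and leaves int → top, mirroring the ignore example.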

Solution

For now, this doesn't actually matter because we fully monomorphize types, but if we eventually want to support parametric rules we will need to ensure that the exact sort of (at least some of the) sub-terms is expressible over the sorts of the variables.

For every sort that we need to express, we can just consider the usages of any type variable there during co-occurrence analysis to avoid simplifying them away. However, I'm unsure which sorts actually matter, i.e. which of the following is relevant:

- every rewrite's sort?
- the sort of the overall rule?
- the sort of every single sub-term?

Scott-Guest · 2023-11-29T01:00:19Z

kernel/src/main/java/org/kframework/parser/inner/disambiguation/inference/CompactSort.java

+    bounds.removeIf(
+        s ->
+            subsorts.lessThanEq(s, Sorts.KLabel())
+                || subsorts.lessThanEq(s, Sorts.KBott())
+                || subsorts.greaterThan(s, Sorts.K()));


I am not confident that filtering the bounds this way is correct, but it was necessary lest we get multiple unrelated bounds like KLabel and K.

dwightguth

I'm not sure I fully understand this code, but it seems well thought out, well documented, and well architected, and I'm going to assume that you've run the entire test suite with the CHECKED mode as the default, which would at least tell you that it's inferring the exact same terms as the old algorithm, so assuming that's correct, this seems reasonable.

See my comment though; if we've adequately tested this on the entire test suite and know that it's identical in behavior to the old system on those tests, and if it's easy to turn off in the event of bugs in other semantics, and it's faster than the old code, we ought to turn this on by default.

dwightguth · 2023-11-29T17:00:16Z

kernel/src/main/java/org/kframework/kompile/KompileOptions.java

+          "Choose between the Z3-based and SimpleSub-based type inference algorithms, or run both"
+              + " and check that their results are equal. Must be one of [z3|simplesub|checked].",
+      hidden = true)
+  public TypeInferenceMode typeInferenceMode = TypeInferenceMode.SIMPLESUB;


I'm not sure I understand the rationale behind leaving it off by default. Surely we want it on so that it gets tested by our integration testing moving forward, right?
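For readers unfamiliar with the three modes named in the option description, the following is a rough sketch of what "run both and check that their results are equal" could look like. Everything here (`InferenceDispatch`, `Mode`, `infer`, and passing the two algorithms as functions) is hypothetical and chosen for illustration; it does not reproduce how ParseInModule or TypeInferenceMode are actually wired.

```java
import java.util.function.Function;

/** Hypothetical sketch of a three-way inference mode switch. */
final class InferenceDispatch {
  enum Mode { Z3, SIMPLESUB, CHECKED }

  /** Runs the selected algorithm; in CHECKED mode runs both and fails on disagreement. */
  static <T, R> R infer(Mode mode, T term, Function<T, R> z3, Function<T, R> simpleSub) {
    switch (mode) {
      case Z3:
        return z3.apply(term);
      case SIMPLESUB:
        return simpleSub.apply(term);
      default: // CHECKED
        R expected = z3.apply(term);
        R actual = simpleSub.apply(term);
        if (!expected.equals(actual)) {
          throw new IllegalStateException(
              "Inference mismatch: Z3 gave " + expected + " but SimpleSub gave " + actual);
        }
        return actual;
    }
  }
}
```

A checked mode of this shape is what makes it possible to run a whole test suite and learn whether the two algorithms ever disagree, at the cost of paying for both on every term.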

dwightguth · 2023-11-29T17:13:50Z

There might be room to add a bit more documentation on fields and classes that don't currently have javadocs associated with them, though.

Baltoli

I've nitpicked the code a bit, but broadly speaking I agree with Dwight - this looks really well thought through, and I'm confident enough in the surrounding infrastructure that we should be OK to get it merged and try it out on live projects using a flag.

Excellent work Scott; well done for seeing this through to the point of merging!

Baltoli · 2023-11-30T09:54:07Z

kernel/src/main/java/org/kframework/kompile/KompileOptions.java

+          "Choose between the Z3-based and SimpleSub-based type inference algorithms, or run both"
+              + " and check that their results are equal. Must be one of [z3|simplesub|checked].",
+      hidden = true)
+  public TypeInferenceMode typeInferenceMode = TypeInferenceMode.SIMPLESUB;


It will also let us selectively roll the feature out to projects one-by-one and keep an eye on them to make sure we don't see bugs or performance regressions there. I agree that in the end we definitely want this on by default, but it's a big enough feature that for now we should take care rolling it out.

radumereuta · 2023-12-05T13:23:54Z

I've tested the c-semantics with the new and old algorithm and here are some findings:

total time (ms)  location                                                 parse  typeInf  algorithm
1639             semantics/c.k:477                                          375     1061  old
 568             semantics/c.k:477                                          339       18  new
 254             semantics/implementation/x86_64-linux-gnu/config.k:41       18      206  old
  48             semantics/implementation/x86_64-linux-gnu/config.k:41       13        1  new

The new implementation is much faster.

Scott-Guest self-assigned this Sep 27, 2023

Scott-Guest requested a review from dwightguth October 5, 2023 20:37

Scott-Guest mentioned this pull request Oct 10, 2023

Develop a more principled type inference algorithm #3601

Closed

Scott-Guest added 2 commits October 19, 2023 16:51

Initial implementation of constraint collection and compactification for new sort inference.

40d7690

Implement co-occurrence analysis and corresponding simplification.

8c1c057

Scott-Guest force-pushed the new-type-inference branch from 93237dd to 8c1c057 Compare October 19, 2023 20:52

Scott-Guest added 9 commits November 8, 2023 12:38

Merge remote-tracking branch 'origin/develop' into new-type-inference

b4ff3d5

Apply google style formatting to ParseInModule

b78669d

Applied formatting. Removed partially-implemented ambiguity logic

f748757

Implement rough draft of cast insertion

e177dcd

Implement monomorphization

a12d545

Add --z3-type-inference feature flag.

ffec1b0

Actually hook new type inference algorithm into parser.

d04a883

Actually check --z3-type-inference flag. Change inference debug directory to be in -kompiled.

0b4790a

Merge remote-tracking branch 'origin/develop' into new-type-inference

2b4ed78

Scott-Guest force-pushed the new-type-inference branch from 64c7f50 to 2b4ed78 Compare November 15, 2023 20:17

Scott-Guest added 11 commits November 15, 2023 15:27

Remove unused cast insertion modes from SortInferencer

23d7bbd

Address IntelliJ warnings in Interface.scala

f23d825

Change logging to print whether each rule is inferred with Z3 or not.

375731f

Fix handling of function rules to prevent widening.

ea46d7b

Track if the inferred term is a direct child of a #RuleContent or #RuleBody to differentiate macro rules vs rules using a macro on the LHS

96c2e9a

Improve error messages. Various bug fixes.

94f7077

Add missing copyright comments.

70925be

Apply scalafmt with 100 column width.

04812a8

Fix all IntelliJ warnings.

2dd8177

Remove id from ProductionReference and store in Map instead.

e794277

Merge remote-tracking branch 'origin/develop' into new-type-inference

9e7ab11

Scott-Guest added 4 commits November 29, 2023 15:30

Make Variable a record as the contained SortVariable now ensures that distinct Variables instances are unequal.

870b529

Improve documentation and comments.

5d208d5

Fix small bug in simplification where already optimized out variables could be re-processed.

939c658

Merge branch 'develop' into new-type-inference

282ba57

Baltoli approved these changes Nov 30, 2023

Scott-Guest added 7 commits November 30, 2023 10:55

Restructure conditional TypeInferenceMode.CHECKED assertion.

a60aa74

Remove HashSet double brace initialization

172050e

Improve documentation

9d36065

Make use of enhanced instanceof check where applicable.

87bb328

Add further explanation of why MonomorphizationError has a poor error message.

0731f4f

Merge branch 'develop' into new-type-inference

b1965f7

Disable SimpleSub by default, but enable it in ktest.mak and ktest-fail.mak for testing.

cef8679

Scott-Guest force-pushed the new-type-inference branch from 3c3f2bb to cef8679 Compare December 1, 2023 18:37

Scott-Guest added 5 commits December 1, 2023 13:42

Merge remote-tracking branch 'origin/develop' into new-type-inference

1a16dfc

Set the default in ParseInModule rather than KompileOptions so that it is respected everywhere

2f282d2

Fix bounds computation to allow us to infer parser sorts when they are the only option.

402de1a

Add K as upper bound upon variable creation rather than special casing in LUB/GLB computation.

bbde842

Temporarily re-enable CHECKED to ensure CI still passes

536db01

Scott-Guest added 2 commits December 5, 2023 19:04

Merge remote-tracking branch 'origin/develop' into new-type-inference

ff0eed3

Turn feature off for merge

a7e7aa7

Scott-Guest added the automerge label Dec 6, 2023

rv-jenkins merged commit fe2916e into develop Dec 6, 2023
8 checks passed

rv-jenkins deleted the new-type-inference branch December 6, 2023 01:31

Baltoli mentioned this pull request Dec 12, 2023

2023 Goals #3098

Closed
