Zoo zone for introducing inductive data types? #25
Some pros and cons:
- Eliminators (induction principles)
- Pattern matching + termination checking
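For concreteness, here is the same function written both ways in Lean (a sketch for illustration; `addRec` and `addMatch` are made-up names):

```lean
-- addition via Nat's eliminator (induction principle); marked `noncomputable`
-- because Lean's code generator does not compile bare recursors
noncomputable def addRec (m n : Nat) : Nat :=
  Nat.rec (motive := fun _ => Nat) n (fun _ ih => ih + 1) m

-- the same function via pattern matching, with termination checked structurally
def addMatch : Nat → Nat → Nat
  | .zero,   n => n
  | .succ m, n => addMatch m n + 1
```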
It would be good to implement several versions of inductive types here. Plain inductive families à la Dybjer are a good choice. Ideally, both the eliminator version and the mini-TT case-split version would be included, perhaps with termination checking for the latter. The existing code is actually rather unsatisfactory in some places (see the repo issues), so I'd like to revise it first. After that, I might get to inductive types.
@AndrasKovacs It seems that the major difficulty is implementing inductive families. If those are not supported, eliminators and case splits are equally simple to implement. As far as I can see, the miniTT you mentioned does not support inductive families. To support them, you either add first-order unification (the solution, conflict, cycle, and deletion rules) with rewrites/substitution, or you internalise the index unification, cluttering everything with coercions and no-confusion proofs. Coq/Lean internalise the unification, while in Agda everything lives in the typechecker. There are also possibilities halfway between eliminators and full Agda-style pattern matching, e.g. explicit case trees with unification of indices. There is a proposal by Barras et al. (A New Elimination Rule for the Calculus of Inductive Constructions) where some substitutions are applied to the result type. Since you already have substitutions in the evaluator to support metavariables/elaboration, maybe adding first-order unification with substitution is not a big concern? But overall I am quite unsure about the "best way" to implement inductive families. It would be great to hear your thoughts!
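For reference, a minimal Haskell sketch (hypothetical code, not from this repo) of first-order unification over a toy term language, implementing the four rules named above:

```haskell
-- Hypothetical sketch: first-order unification over a toy term language.
import qualified Data.Map.Strict as M

data Term = Var String | Con String [Term]  -- flexible variables, rigid constructors
  deriving (Eq, Show)

type Subst = M.Map String Term

-- Apply a substitution, chasing solved variables.
apply :: Subst -> Term -> Term
apply s (Var x)    = maybe (Var x) (apply s) (M.lookup x s)
apply s (Con c ts) = Con c (map (apply s) ts)

occurs :: String -> Term -> Bool
occurs x (Var y)    = x == y
occurs x (Con _ ts) = any (occurs x) ts

unify :: Subst -> Term -> Term -> Maybe Subst
unify s t0 u0 = go (apply s t0) (apply s u0)
  where
    go (Var x) (Var y) | x == y = Just s           -- deletion: x = x
    go (Var x) u
      | occurs x u = Nothing                       -- cycle: x occurs in u
      | otherwise  = Just (M.insert x u s)         -- solution: x := u
    go t (Var y)   = go (Var y) t                  -- orient rigid-flex
    go (Con c ts) (Con d us)
      | c /= d     = Nothing                       -- conflict: distinct constructors
      | otherwise  = goArgs s ts us                -- injectivity: unify the arguments
    goArgs acc []     []     = Just acc
    goArgs acc (t:ts) (u:us) = do acc' <- unify acc t u
                                  goArgs acc' ts us
    goArgs _   _      _      = Nothing

-- ghci> unify M.empty (Con "suc" [Var "m"]) (Con "suc" [Con "zero" []])
-- Just (fromList [("m",Con "zero" [])])
```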
To me, the distinguishing feature of miniTT case splits is the lack of nesting and argument permutation, compared to Agda patterns. So I view it as a minimal version of pattern matching. I don't think it's difficult to add index unification on top of simple case splits, especially if we don't care about …

I think that if we only want to support inductive families, but not more general inductive types, then Agda-style pattern matching is convenient and efficient (in terms of core representation and evaluation), so I'd vote for that. On the other hand, with more general inductive types, and perhaps more support for metaprogramming with algebraic theories and signatures, I would instead choose eliminators, because I'd want to work on as few new and complicated features as possible.

I think that no-confusion bloat can be handled just fine, in terms of performance, in a reasonably efficient implementation (which Coq & Agda aren't). Moreover, in setoid TT no-confusion holds by definition, so there's no bloat. So in my current project on a setoid TT implementation, I'm planning to use eliminators.
Yes, this works. I implemented something like that in an NbE typechecker, which is similar to yours and @ollef's. But I am also interested in full pattern matching on the frontend, and I am not convinced that full Agda-style pattern matches should be part of the core representation. I am wondering what a good core representation and target language for the elaboration procedure should look like, if the goal is a core language which can be checked by an independent typechecker. Ideally, this typechecker should have significantly lower complexity than the elaborator.

In particular, the unification aspect worries me, since it is an additional mechanism needed purely to support inductive families, while it would not be needed for anything else during type checking. On the other hand, the eliminator approach is also not perfectly satisfying, since everything is cluttered with "meaningless terms".
I wonder about that remark: are the possible efficiency gains you are talking about mostly due to avoiding unfolding everything too eagerly? Or is there something else that Coq and Agda are doing wrong?
Interesting, how does that work? Is there a good reference you can recommend regarding setoid TT?
MiniTT's splitting is via pattern matching lambdas. They're simpler IMO than Coq-style matching, where the matching is not tied to a lambda.
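In Lean syntax, such a splitting lambda would look roughly like this (an analogue for illustration, not miniTT's actual syntax):

```lean
-- a single, non-nested case split tied directly to a lambda
def pred : Nat → Nat := fun
  | .zero   => .zero
  | .succ n => n
```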
I wouldn't view any of that as meaningless. No-confusion may appear to be clutter in a simple setting, but when we go to e.g. cubical Agda, no-confusion can be false for some (higher) inductive types, and essential for transport computation in others.
For any inductive type constructor without quotients, we can define the equality type recursively on constructors, computing automatically along constructor injectivity and disjointness. For example:
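A sketch in Lean of such a definition for Nat (a reconstruction for illustration, not the original snippet):

```lean
-- equality for Nat defined by recursion on constructors; injectivity and
-- disjointness then compute definitionally, so no-confusion needs no proof terms
def NatEq : Nat → Nat → Prop
  | .zero,   .zero   => True       -- same constructor, no arguments
  | .zero,   .succ _ => False      -- distinct constructors: definitionally disjoint
  | .succ _, .zero   => False
  | .succ m, .succ n => NatEq m n  -- same constructor: equate the arguments
```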
For quotient inductive types, no-confusion might be false, so there we only assume that equality is a congruence on constructors. Here's a quick overview. The setoid TT paper is nice, but its syntax is rather unwieldy in practice. My implementations differ significantly, and have some features which aren't yet covered by any rigorous metatheoretic work that I know of.
Unfolding is important, but there are plenty of performance issues which aren't directly related. You can see some Coq examples.
Regarding pattern matching, I am confused. How do you define pattern matching? For me, pattern matching is one or more arguments being matched on, potentially nested, while case splits are single-constructor, non-nested matches. In the core representation I would only want either these simple case splits or eliminators. Therefore I am not classifying miniTT as pattern matching.
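To illustrate the distinction (a Lean sketch with made-up names): a nested pattern match, followed by the equivalent tree of single-constructor, non-nested case splits:

```lean
-- nested pattern matching: the pattern .succ .zero inspects two constructors at once
def isOne : Nat → Bool
  | .succ .zero => true
  | _           => false

-- the same function as a tree of simple case splits, one constructor at a time
def isOne' : Nat → Bool
  | .zero   => false
  | .succ m =>
    match m with
    | .zero   => true
    | .succ _ => false
```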
Agreed. They are obviously not meaningless, otherwise they wouldn't be needed; I mean that they can be constructed quite easily and are usually not useful to show to the user, at least in the simple setting of inductive families. I see this as something of an abstraction-leaking problem, where the underlying type theory leaks through while the user is offered nice high-level pattern matching. On the other hand, it is quite instructive to be able to look at every level of a construction, which is not possible when the theory is externalised and everything is baked into the typechecker.
Ouch, there are quite a few. These issues are really nice, with superlinear graphs all over the place :) But I don't have enough insight to tell whether there is some fundamental issue going on here, or just a suboptimal isolated algorithm at work that could easily be replaced by something better.
Interesting, what is the reason for doing that? While walking down during elaboration or checking, it seems quite natural to use levels. It seems to me that Agda is just trying too many things at once, all in one big typechecker. But once again, I don't have the insight here; I've only looked briefly at the Agda typechecker a few times and never got very far.

Thank you for the explanations/references regarding setoid TT, I will take a look! I have to think about how pattern matching can be compiled down there. But basically you are saying that a separate no-confusion construction is unnecessary, since equality already gives you that?
I tend to use "pattern matching" for anything which is not eliminators.
As an example, this issue is simple and clearly indicates a fundamental problem.
I don't think that's the case; I would actually even compress the pipeline a bit more: Agda uses a separate scope-aware pass for operator parsing, and I'd instead do scope-sensitive parsing in one go.

The design choice to use only indices is probably as old as Agda 2. Back then the NbE elaboration approach was extremely obscure, and it probably hadn't been implemented anywhere at the time.
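For context, the usual NbE convention, in a Haskell sketch (names are mine): de Bruijn indices in the syntax, de Bruijn levels in the semantic domain, converted at quotation time:

```haskell
-- de Bruijn indices count from the innermost binder; levels count from the outside
newtype Ix  = Ix  Int deriving (Eq, Show)
newtype Lvl = Lvl Int deriving (Eq, Show)

-- When quoting a value back to a term under `depth` binders, a variable
-- at level x becomes the index (depth - x - 1).
lvlToIx :: Lvl -> Lvl -> Ix
lvlToIx (Lvl depth) (Lvl x) = Ix (depth - x - 1)
```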
That's correct. I admit, though, that this only eliminates the no-confusion proof terms; the general boilerplate remains, e.g. whenever I want to write a function over an indexed family by its eliminator.
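As a hypothetical illustration of the kind of boilerplate that survives (my own Lean example, not the one originally given): taking the head of a nonempty vector via the eliminator forces a motive generalized over the index, plus a vacuous branch for the impossible case:

```lean
inductive Vec (A : Type) : Nat → Type
  | nil  : Vec A 0
  | cons : (n : Nat) → A → Vec A n → Vec A (n + 1)

-- `noncomputable` because Lean's code generator does not compile bare recursors
noncomputable def head {A : Type} {n : Nat} (v : Vec A (n + 1)) : A :=
  Vec.rec (motive := fun m _ => m = n + 1 → A)
    (fun h => nomatch h)   -- nil branch: 0 = n + 1 is impossible
    (fun _ a _ _ _ => a)   -- cons branch: discard index, tail, IH, and proof
    v rfl                  -- instantiate at v with the trivial index equation
```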
Well, that's only parsing. Probably things can be compressed there, but in my opinion splitting the pipeline is not necessarily a reason for a slow compiler (see for example Chez's nanopass design). I think Haskell simply drops too much performance on the floor during basic operations, and many of the standard libraries are not fast either. Take serialization, for example, with binary and cereal: I forked persist from store to fix that for myself, but it is probably still not as good as it should be, in particular regarding deduplication, which is not available.

Generally, about doing many things at once: this makes me concerned about the consistency of the core. And it seems to me that the Agda people are also looking into minimizing the core theory somehow; at least I've read, in some issues deriving False, that it would be nice to have a smaller trusted kernel.
Yes, I expected that, hence my question. It is still nice to have no additional constructions. But even in CIC, where one needs no-confusion terms, they are not that huge. And when showing the eliminators, they could somehow be compressed, e.g. by hiding the coercions and impossible branches.
I think the introduction of inductive data types would be great, but it may bring in more problems due to recursion/induction: do we use eliminators like `T.rec_on`, or pattern matching, for simple structural recursion? Any plans for examples of this kind?

BTW, ❤️ this repo so much for offering great examples of making elaborators!