Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse trees are in source order instead of argument order #170

Open
digama0 opened this issue Oct 28, 2024 · 6 comments · May be fixed by #173
Open

Parse trees are in source order instead of argument order #170

digama0 opened this issue Oct 28, 2024 · 6 comments · May be fixed by #173

Comments

@digama0
Copy link
Member

digama0 commented Oct 28, 2024

Not all metamath term constructors take their arguments in source order. An important example is wal $p A. x ph $., which has x before ph in source order but because wph $f wff ph $. is declared before vx $f setvar x $., the actual argument order is wal(ph, x), which is visible in syntax proofs in the database.

Currently, grammar.rs does not handle this situation correctly, and always adds the arguments to the tree in source order, resulting in ill formed trees of the form wal(x, ph). This is the source of one of the errors reported at digama0/mm-hammer#1 .

cc: @tirix

@tirix
Copy link
Collaborator

tirix commented Oct 28, 2024

Interesting - In grammar.rs I did not intend to add the arguments to the tree in source order, but only in a deterministic order, i.e. every time a formula is parsed, the tree is ordered the same way, so that two trees can be compared. That order is order of appearance in the formula, so that parsing is fast. I considered the way the tree is encoded and stored as an internal which did not need to be known outside of the program itself.

See the comment and the code for the build_syntax_proof method: it specifically reorders the tree in "frame" order (which is the "source" order you mention), to build the proof in the correct order.

I assume that you need something like this - or maybe even exactly this function.
What is your use case exactly?

digama0 added a commit that referenced this issue Oct 28, 2024
digama0 added a commit that referenced this issue Oct 28, 2024
@digama0 digama0 linked a pull request Oct 28, 2024 that will close this issue
@digama0
Copy link
Member Author

digama0 commented Oct 28, 2024

The trees are not an internal detail, they are the output of parsing. Any tool that wants to use the parser will want the resulting syntax objects after parsing, as a tree, or otherwise what was the point of the parse? And the tree consists of nodes and those nodes are syntax axioms which have a definite order for their arguments which is not lexical order. I have a fix for the parser code in #173.

See the comment and the code for the build_syntax_proof method: it specifically reorders the tree in "frame" order (which is the "source" order you mention), to build the proof in the correct order.

When I said "source" order, I mean textual left-to-right parsing order (what the parser currently does), which is not the same as "frame" order (which is what I believe it should be doing). I am not yet sure what to do about this function in #173.

I considered the way the tree is encoded and stored as an internal which did not need to be known outside of the program itself.

By the way, this also caused difficulties for me, which is one of the reasons I needed to use a branch in order to get at the tree data. I don't think I want build_syntax_proof, because I want literally the structure which is in the tree itself, not encoded as metamath inferences.

@tirix
Copy link
Collaborator

tirix commented Oct 28, 2024

Any tool that wants to use the parser will want the resulting syntax objects after parsing, as a tree, or otherwise what was the point of the parse?

Until now it was sufficient for the tools I wrote to have an "opaque" tree which could be matched against other trees and where variables could be substituted with other trees. I needed parsing and those operations to be as quick as possible, and the move back to the "proof-order" world occurred only once, if a proof was found, and after the proof was build.

By the way, this also caused difficulties for me, which is one of the reasons I needed to use a branch in order to get at the tree data.

Sorry about that!

I don't think I want build_syntax_proof, because I want literally the structure which is in the tree itself, not encoded as metamath inferences.

And what do you do with that structure?

In my use cases until now, I have been using build_syntax_proof to build the syntax proof tree for a formula, and include it in the generated proofs, so the need to access internal information in the parsed tree did not emerge.
I've only read the description of mm-hammer from which I see this discussion stems, and from it I was assuming you have a similar requirement.

Indeed I agree that if the formula needed to be accessed, one might expect the "frame" order (sorry for the confusion of naming the children ordering, I wrote "source order" for "frame order" earlier). But that "frame" order is especially important for building proofs, which build_syntax_proof already does.

PS. I've read some of the MM Hammer source code and it seems you maybe first want to output some LISP style formula to be used as an input for the external tools. Am I correct?

@digama0
Copy link
Member Author

digama0 commented Oct 28, 2024

Any tool that wants to use the parser will want the resulting syntax objects after parsing, as a tree, or otherwise what was the point of the parse?

Until now it was sufficient for the tools I wrote to have an "opaque" tree which could be matched against other trees and where variables could be substituted with other trees. I needed parsing and those operations to be as quick as possible, and the move back to the "proof-order" world occurred only once, if a proof was found, and after the proof was build.

For that, I don't see why you need a tree at all. Metamath already supports substitution and matching of formulas, without trees.

PS. I've read some of the MM Hammer source code and it seems you maybe first want to output some LISP style formula to be used as an input for the external tools. Am I correct?

Yes, that is the use case at the moment. mm-hammer is an adaptation of the translation from MM to MM0, which basically needs to insert parentheses in MM theorem statements and turn them into a lispish format, where the applications of course need to be consistent with the declaration of the term constructor.

I'm looking at sub_build_syntax_proof now and it's doing something quadratic or worse in order to construct symbol strings for every proof step. This seems like a weakness of the ProofBuilder interface, because many implementers don't care about the constants at all, and this way of building them is very costly.

@tirix
Copy link
Collaborator

tirix commented Oct 30, 2024

Matching and substitution.

For that, I don't see why you need a tree at all. Metamath already supports substitution and matching of formulas, without trees.

Metamath does, but metamath-knife did not.
Also, I wanted those operations to be fast. If I remember well, Metamath-exe has an algorithm with some heuristics (for example, using parenthesis matching), which needs backtracking and is probably quadratic for each match. Once one has the formulas parsed, those operations becomes linear at worst.

I'm looking at sub_build_syntax_proof now and it's doing something quadratic or worse in order to construct symbol strings for every proof step.

I believe with your proposal the build_syntax_proof should become linear, because the terms would already be correctly sorted in "frame" order.

This seems like a weakness of the ProofBuilder interface, because many implementers don't care about the constants at all, and this way of building them is very costly.
I'm sorry I don't immediately see how constants play a special role here.

@digama0
Copy link
Member Author

digama0 commented Oct 30, 2024

#173 now implements an updated and improved version of build_syntax_proof which does the tree walk in postorder as long as you don't need the constant strings. When you do need the constant strings, this is effectively the same problem as "rendering" to a string, so we have to do the walk in source order while filling the arrays in postorder, which is a bit messy. Still, it can be done in linear time either way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants