
1 / Fix issues with pattern matching for Rubi #1176

Merged
merged 10 commits on Nov 23, 2024

Conversation

@aravindh-krishnamoorthy (Collaborator) commented Nov 19, 2024

Overview

This PR collects fixes for pattern matching, especially those that concern Rubi.

Related issues

Failing patterns

(Fixes for items struck out above are moved to the next PR.)

Final review

Tests

Documentation

N/A

@rocky (Member) commented Nov 19, 2024

For the kinds of problems this fixes, let's please add a test that gets run automatically, so that we can ensure there are no regressions in the code.

Since MatchQ[] is currently in mathics.core.testing_expressions, a test case could be added to test/builtin/test_testing_expressions.py. Or, if you want a more isolated kind of unit test, a new file test/core/test_patterns.py could be created. Thanks.

@aravindh-krishnamoorthy (Collaborator, Author):

> For the kinds of problems this fixes, let's please add a test that gets run automatically, so that we can ensure there are no regressions in the code.

Fully agree! I will definitely add them before I finalise this PR.

@aravindh-krishnamoorthy aravindh-krishnamoorthy changed the title Fix issues with pattern matching for expressions with two or more optionals Fix issues with pattern matching for Rubi Nov 21, 2024
@aravindh-krishnamoorthy (Collaborator, Author):

> Pattern x_Integer + y_Integer z_. for 1 + 2 x (Not directly relevant for Rubi)

Possibly due to this optimization?

# "leading_blanks" below handles expressions with leading Blanks H[x_, y_, ...]
# much more efficiently by not calling get_match_candidates_count() on elements
# that have already been matched with one of the leading Blanks. this approach
# is only valid for Expressions that are not Orderless (as with Orderless, the
# concept of leading items does not exist).
#
# simple performance test case:
#
# f[x_, {a__, b_}] = 0;
# f[x_, y_] := y + Total[x];
# First[Timing[f[Range[5000], 1]]]
#
# without "leading_blanks", Range[5000] will be tested against {a__, b_} in a
# call to get_match_candidates_count(), which is slow.
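
To make the quoted optimization concrete, here is a small self-contained sketch of the idea (a toy model with hypothetical names, not the mathics-core implementation): leading single blanks in a non-Orderless pattern each consume exactly one element, so only the tail needs the expensive general search.

```python
def is_single_blank(pattern):
    # Toy predicate: "_" stands for a pattern like x_ that matches
    # exactly one element.
    return pattern == "_"


def match_elements(patterns, elements):
    """Consume leading blanks positionally, then hand the tail to a
    (here trivial) general matcher.

    In mathics-core the tail would go through the slow
    get_match_candidates_count() search; the point of leading_blanks
    is that the already-consumed prefix never reaches it.
    """
    i = 0
    bindings = []
    while i < min(len(patterns), len(elements)) and is_single_blank(patterns[i]):
        bindings.append(elements[i])  # a leading blank always matches one element
        i += 1
    # Stand-in for the general matcher: exact positional comparison.
    tail_ok = len(patterns) - i == len(elements) - i and all(
        p == "_" or p == e for p, e in zip(patterns[i:], elements[i:])
    )
    return bindings, tail_ok


match_elements(["_", "_", "a"], [1, 2, "a"])  # -> ([1, 2], True)
```

As the quoted comment notes, this shortcut is only valid for heads without the Orderless attribute, since with Orderless there is no notion of "leading" elements.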

@rocky (Member) commented Nov 21, 2024

We need to get it right before we "optimize". So please fix. Thanks.

@aravindh-krishnamoorthy (Collaborator, Author):

> We need to get it right before we "optimize". So please fix. Thanks.

Sorry, @rocky, what do you mean? The mentioned optimization is not in the PR but in the mathics-core code.

@rocky (Member) commented Nov 21, 2024

>> We need to get it right before we "optimize". So please fix. Thanks.
>
> Sorry, @rocky, what do you mean? The mentioned optimization is not in the PR but in the mathics-core code.

I mean: please add to this PR a fix for the mathics-core code. Thanks.

@mmatera (Contributor) commented Nov 22, 2024

> Pattern x_Integer + y_Integer z_. for 1 + 2 x (Not directly relevant for Rubi)
>
> Possibly due to this optimization?
>
> [the "leading_blanks" comment quoted above]

If you want to deactivate the optimization, just set leading_blanks=False before the loop.

@mmatera (Contributor) commented Nov 22, 2024

Up to here, the changes look reasonable, but it would be nice to see some tests that pass after these changes.

@rocky (Member) commented Nov 22, 2024

> Up to here, the changes look reasonable, but it would be nice to see some tests that pass after these changes.

Let me go further and explain. From my side, it is better to have many smaller fixes and PRs for specific, independent bugs than to wait several weeks or months until everything is done and have one huge PR.

It is also useful to add tests as each bug fix is done, rather than viewing testing as some sort of separate and independent task.

@aravindh-krishnamoorthy (Collaborator, Author):

Indeed, it makes sense to have smaller PRs. I will finalise these with tests soon.

@aravindh-krishnamoorthy aravindh-krishnamoorthy changed the title Fix issues with pattern matching for Rubi 1 / Fix issues with pattern matching for Rubi Nov 22, 2024
@aravindh-krishnamoorthy (Collaborator, Author):

> Pattern x_Integer + y_Integer z_. for 1 + 2 x (Not directly relevant for Rubi)
> Possibly due to this optimization?
>
> [the "leading_blanks" comment quoted above]
>
> If you want to deactivate the optimization, just set leading_blanks=False before the loop.

Perhaps the removal of optionals in this PR can be reverted after we undo the leading_blanks optimization. I'll double-check that in the next PR.

    ("str_expr", "msgs", "str_expected", "fail_msg"),
    [
        # Two default arguments (linear)
        ("MatchQ[1, a_.+b_.*x_]", None, "True", None),
Contributor:

Something that would be interesting is to check MatchQ over expressions of the form
A[x_.,B[y_.*z_]]
and check the match with A[1,B[2,y]], A[1,B[y,2]], A[B[y,2],1] for different settings of the attributes of A and B.

Another group of tests should check if variables are properly assigned. To do that, we should test rules using these patterns. For example,

rule=A[x_.,B[y_.*z_]]->{x,y,z}

and then apply it to the previous cases.
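
The default-argument semantics these tests exercise can be sketched in plain Python (a toy model with a simplified tuple encoding, not mathics-core code; it ignores Orderless/Flat attributes). In Wolfram Language, Default[Plus] is 0 and Default[Times] is 1, which is why MatchQ[1, a_. + b_.*x_] is True with a -> 0, b -> 1, x -> 1:

```python
PLUS_DEFAULT = 0   # Default[Plus]
TIMES_DEFAULT = 1  # Default[Times]


def match_default_pattern(expr):
    """Match expr against the pattern a_. + b_.*x_.

    expr is an atom, ("Plus", u, v), or ("Times", u, v). An optional
    argument (x_.) takes its head's default value when that head is
    absent from the expression.
    """
    if isinstance(expr, tuple) and expr[0] == "Plus":
        a, rest = expr[1], expr[2]
    else:
        a, rest = PLUS_DEFAULT, expr       # a_. falls back to Default[Plus]
    if isinstance(rest, tuple) and rest[0] == "Times":
        b, x = rest[1], rest[2]
    else:
        b, x = TIMES_DEFAULT, rest         # b_. falls back to Default[Times]
    return {"a": a, "b": b, "x": x}


match_default_pattern(1)
# -> {"a": 0, "b": 1, "x": 1}, mirroring MatchQ[1, a_. + b_.*x_] == True
```

The rule-based tests suggested above check exactly these bindings: applying rule = A[x_., B[y_.*z_]] -> {x, y, z} verifies not just that the match succeeds but that each variable receives the right value or default.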

Collaborator (Author):

Agree! This makes sense.

Collaborator (Author):

I will add them as soon as I have all your and rocky's comments.

@rocky (Member) commented Nov 22, 2024

I'd like to see some unit tests for mathics.core.pattern as its own unit, i.e. not via MatchQ. Other than that, I don't have any further comments.

When everything is in place, I'd like to look it over.

Collaborator (Author):

Done!

Collaborator (Author):

> I'd like to see some unit tests for mathics.core.pattern as its own unit, i.e. not via MatchQ. Other than that, I don't have any further comments.
>
> When everything is in place, I'd like to look it over.

I guess it's always gonna be via MatchQ or rules, as mmatera suggested. Alternate ideas are welcome!

Collaborator (Author):

Also, please feel free to add commits to this PR. I think I've enabled the option.

Contributor:

>> I'd like to see some unit tests for mathics.core.pattern as its own unit, i.e. not via MatchQ. Other than that, I don't have any further comments.
>> When everything is in place, I'd like to look it over.
>
> I guess it's always gonna be via MatchQ or rules, as mmatera suggested. Alternate ideas are welcome!

At some point, we should try to add tests at a lower level (using mathics.eval.patterns.match or directly testing the subclasses of mathics.core.pattern.Pattern) that avoid passing through the evaluation loop. But for now, I think this is a good starting point.

@aravindh-krishnamoorthy aravindh-krishnamoorthy marked this pull request as ready for review November 22, 2024 11:57
@mmatera (Contributor) left a comment:

LGTM

@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-
 """
-Unit tests for mathics pattern matching
+Unit tests for mathics.core.pattern
@rocky (Member) commented Nov 22, 2024

Here is the part that feels a little weird to me. We are testing a lower-level function using higher-level tests. I am a little bit sensitive to this kind of thing, because this less-good practice is pervasive in this code.

At one time, we didn't have pytests. There were doctests that took the place of pytests. It took us a while to get unburied from that practice.

The MatchQ tests, to me, should be put in test.builtin.testing_expressions.expression_tests, and the rule tests put wherever rules are located.

Tests here could (and should) build an ExpressionPattern and run match on it.

Collaborator (Author):

I see. Indeed, I can do that if you and mmatera prefer it... But that would be more of a Python-level test than one in Wolfram syntax, unlike the current tests...

Contributor:

> Here is the part that feels a little weird to me. We are testing a lower-level function using higher-level tests. I am a little bit sensitive to this kind of thing, because this less-good practice is pervasive in this code.
>
> At one time, we didn't have pytests. There were doctests that took the place of pytests. It took us a while to get unburied from that practice.
>
> The MatchQ tests, to me, should be put in test.builtin.testing_expressions.expression_tests, and the rule tests put wherever rules are located.
>
> Tests here could (and should) build an ExpressionPattern and run match on it.

Yes, but in any case, at some point we can rewrite these tests as lower-level tests. I do not think that these tests actually test the behavior of MatchQ or ReplaceAll, but the low-level code. I can do the translation later, when I find some time for it.

@rocky (Member) commented Nov 22, 2024

Tests in test.builtin are testing Mathics3 Builtin functions. Of course, any time one tests a higher-level function one is necessarily testing all sorts of lower-level functions. So all of these tests are valid; they are just organized, in my opinion, in the wrong place.

Tests of the ExpressionPattern match function are good because they isolate that function from all of the other kinds of stuff, and we can pinpoint the problem better when there is one. A MatchQ test could fail because of something totally unrelated to the ExpressionPattern match function.

Another thing that these smaller unit tests do (which shouldn't be needed but in fact is used way too much) is to show developers how one can work with, test, and debug the smaller units. Ideally, if our APIs and modularity are done well, the code needed to do this is not that much. Personally, I have found ways to improve modularity when I have been forced to work with these smaller parts in isolation.

@rocky (Member) commented Nov 22, 2024

> Yes, but in any case, at some point we can rewrite these tests as lower-level tests. I do not think that these tests actually test the behavior of MatchQ or ReplaceAll, but the low-level code. I can do the translation later, when I find some time for it.

My experience is that rewrites tend to happen at a slower rate than the creation of more code that is somewhat ill-conceived. In fact, I am in the middle of a larger rewrite, for operators, right now. It is not fun. This aspect is what makes me the least happy about working on this project as compared to others.

If you want to do the minimal thing, then move the MatchQ tests to where MatchQ should be tested and the ReplaceAll tests to where ReplaceAll is tested. At least this honestly reflects what the tests are doing. If later you want to rewrite, you can do so then.

Contributor:

When we test the evaluation of an expression like MatchQ[1+2 a, x_.+y_.*z_Symbol], many processes are involved: the main evaluation loop, the definitions of Plus and Times and their internal implementations, the pattern matching done in the evaluation loop, and, eventually, the implementation of MatchQ and the pattern-matching mechanisms.
If we want to know if the pattern matching works as it should, it would be better to use mathics.eval.patterns.match, which does not try to evaluate its arguments, but just implements the pattern matching.
I think that this is the right level of testing for these tests.

Lower levels involve testing whether the functions and methods used to evaluate matching do what they are supposed to. But at this point, there are functions for which I am still unable to say exactly what the expected behavior is, even after spending quite a long time trying to understand and document them.

Contributor:

And yes, in that case, the tests should be under test/eval/.

@rocky (Member) commented Nov 22, 2024

> When we test the evaluation of an expression like MatchQ[1+2 a, x_.+y_.*z_Symbol], many processes are involved: the main evaluation loop, the definitions of Plus and Times and their internal implementations, the pattern matching done in the evaluation loop, and, eventually, the implementation of MatchQ and the pattern-matching mechanisms. If we want to know if the pattern matching works as it should, it would be better to use mathics.eval.patterns.match, which does not try to evaluate its arguments, but just implements the pattern matching. I think that this is the right level of testing for these tests.

Ok. If Matcher.match() in mathics.eval.patterns is a good place to test against, then let's do that; it would be done in a test.eval.pattern module.

I am just saying that if we're going to test temporarily using MatchQ, then test.core.patterns is not the right test module for this.

As written, the test is extending the MatchQ tests, even if the motivation for doing this is to verify what's going on at a lower level. I leave it to @aravindh-krishnamoorthy whether to rewrite using Matcher.match() inside test.eval.pattern or to move the existing tests to test.builtin.testing_expressions.expression_tests and to wherever Rules are tested.

> Lower levels involve testing whether the functions and methods used to evaluate matching do what they are supposed to. But at this point, there are functions for which I am still unable to say exactly what the expected behavior is, even after spending quite a long time trying to understand and document them.

Then I think it was a mistake to "optimize" what is neither well understood nor, as we have seen, well covered by tests.

Collaborator (Author):

I think I'll always have something more interesting to work on (in this case Rubi). So, I want to do it right, right now. I'll first try to rewrite the current tests in this PR using Matcher.match() and move them to test/eval. Only if that turns out to be too cumbersome will I fall back to the current ones and move them to expression_tests.

@rocky (Member) commented Nov 22, 2024

Thanks for your understanding and patience. Sure, if it turns out to be too cumbersome, do as you say.

In fact, the messes in this code have been so great that often I find I can't fix a single mess in one go, but have to break things up into little pieces. As you are learning, pattern matching is one of those big messes.

Down the line, we will use trie-based matching akin to the Go-based expreduce (and may try to copy its algorithm). In preparation for that, though, having more and better pattern-matching tests in advance will help greatly.
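
For reference, the trie idea mentioned here can be sketched in a few lines (a toy illustration of dispatching rules on the sequence of argument heads; this is an assumption about the general technique, not expreduce's actual data structure, and the "_" wildcard and "rhs" key are hypothetical names):

```python
def add_rule(trie, head_seq, rhs):
    """Insert a rule under its sequence of argument heads; "_" is a wildcard."""
    node = trie
    for head in head_seq:
        node = node.setdefault(head, {})
    node["rhs"] = rhs  # toy convention: the rule payload sits under "rhs"


def lookup(trie, heads):
    """Walk the trie over the argument heads, preferring exact head
    matches and falling back to the "_" wildcard branch, instead of
    trying every rule in turn."""
    node = trie
    for head in heads:
        if head in node:
            node = node[head]
        elif "_" in node:
            node = node["_"]
        else:
            return None
    return node.get("rhs")


trie = {}
add_rule(trie, ["Integer", "Integer"], "int-int rule")
add_rule(trie, ["Integer", "_"], "int-any rule")
lookup(trie, ["Integer", "Integer"])  # -> "int-int rule"
lookup(trie, ["Integer", "Symbol"])   # -> "int-any rule"
```

The payoff is that rule selection costs roughly the length of the argument list rather than the number of downvalues, which matters for rule sets the size of Rubi's.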

@rocky (Member) commented Nov 23, 2024

Great - thanks!

@rocky rocky merged commit d1cf6a4 into Mathics3:master Nov 23, 2024
13 checks passed