Rewrites under Local Assumptions in E-Graphs #241

Tarmean · 2023-03-06T17:51:29Z

The problem

E-Graphs are great at datalog-style rules and very bad at backtracking search. I occasionally want to apply a rule if it would match under local assumptions, applying the rewrite without making the assumption global.

Take this SQL query as an example:

SELECT U1.name, U2.name
FROM users U1, users U2
WHERE U1.user_id = U2.user_id

Assuming user_id is a primary key of users, we can skip the join: U1 and U2 always point to the same tuple. Here is how we could encode this:

Turn proj(U1, user_id) ~ proj(U2, user_id) into an E-Graph equality
Translate fundeps into skolem functions, e.g. U1 ~ user_id_fundep(proj(U1, user_id))
Let congruence closure do the rest
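
As an illustration of the three steps above, here is a minimal Python sketch of a toy e-graph with naive congruence closure. The EGraph class, the operator names (proj_user_id, user_id_fundep) and the fixpoint-style rebuild are assumptions made for this example; they are not egg's API or its rebuild algorithm.

```python
class EGraph:
    """Toy e-graph: union-find plus a hash-cons table (illustrative only)."""

    def __init__(self):
        self.parent = []   # union-find parent pointers, indexed by class id
        self.nodes = {}    # (op, child class ids...) -> class id

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def add(self, op, *children):
        key = (op, *(self.find(c) for c in children))
        if key not in self.nodes:
            self.parent.append(len(self.parent))
            self.nodes[key] = len(self.parent) - 1
        return self.find(self.nodes[key])

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a

    def rebuild(self):
        # Naive congruence closure: re-canonicalize every node until no
        # two nodes with identical (op, children) sit in different classes.
        changed = True
        while changed:
            changed = False
            canon = {}
            for (op, *kids), cls in list(self.nodes.items()):
                key = (op, *(self.find(k) for k in kids))
                cls = self.find(cls)
                if key in canon and self.find(canon[key]) != cls:
                    self.union(canon[key], cls)
                    changed = True
                canon[key] = self.find(cls)
            self.nodes = canon


g = EGraph()
u1, u2 = g.add("U1"), g.add("U2")
k1 = g.add("proj_user_id", u1)   # proj(U1, user_id)
k2 = g.add("proj_user_id", u2)   # proj(U2, user_id)

# Step 2: the fundep as a Skolem function, U ~ user_id_fundep(proj(U, user_id))
g.union(u1, g.add("user_id_fundep", k1))
g.union(u2, g.add("user_id_fundep", k2))

# Step 1: the WHERE clause as an equality
g.union(k1, k2)
merged_before = g.find(u1) == g.find(u2)

# Step 3: congruence closure merges the two fundep nodes, hence U1 and U2
g.rebuild()
merged_after = g.find(u1) == g.find(u2)
```

Note that the union of k1 and k2 is global here, which is exactly the limitation discussed next.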

This treats the query as a logic formula which checks whether a tuple is in the result set.
But if the query is nested in some outer join or an OR expression, we can't just turn the local WHERE clause into global equalities.
The local rewrite is still valid, though, intuitively because the join is in the same select-project-join block as the U1.user_id = U2.user_id guard.

Does anyone have smart tricks to handle such localized but non-backtracking reasoning?

Solutions I have played with

These all seem to sort-of work but none feels super amazing:

E-Graph non-rewriting

I've tried using the E-Graph as an oracle only. Use the E-Graph for pattern matching and analysis, but apply the rewrite to a normal AST. After each AST-rewrite iteration rebuild the E-Graph.
At least rewriting an AST also means you don't accidentally rewrite all references to some node if the rewrite is only valid for some.

Distribute all choices

Distribute all choices so they only occur at the top level, build one E-Graph for each concrete version, and intersect the results. E-Graph intersection is quadratic and can be lossy, though, and we'd duplicate a lot of work.
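
To make the duplicated work concrete, the following sketch enumerates the choice-free versions of a term. It assumes terms are nested Python tuples of the form (op, arg, ...) with a hypothetical "or" operator; under this approach each yielded version would get its own E-Graph.

```python
from itertools import product


def concrete_versions(term):
    """Yield every or-free version of a nested tuple term."""
    if not isinstance(term, tuple):
        yield term          # leaf, nothing to choose
        return
    op, *args = term
    if op == "or":
        # A choice: each alternative becomes its own top-level version.
        for arg in args:
            yield from concrete_versions(arg)
    else:
        # Any other operator: combine the versions of all arguments.
        for combo in product(*(list(concrete_versions(a)) for a in args)):
            yield (op, *combo)


query = ("and", ("or", "p", "q"), ("or", "r", "s"))
versions = list(concrete_versions(query))
```

Two binary choices already give four versions here; n independent binary choices give 2^n E-Graphs to saturate and intersect.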

Distribute a bit smarter

We could use a decision tree instead of doing all choices at once.
Expand a decision tree node by:

Taking some expression x=or(a,b), generate child E-Graphs with x=a and x=b
While running, build a diff of what changed
Propagate common changes up the decision tree by intersecting the diffs

Using diffs can make the intersection more tractable but the implementation wants pure functional data structures or push/pop style backtracking.
Plus, intersection works best if you rewrite thoroughly (everything has to align syntactically), and if each branch produces exponentially many new terms, the diff approach doesn't help much.
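
A rough sketch of the push/pop variant, reduced to a plain union-find without congruence closure. Everything named here (BacktrackingUF, common_merges, split_on, and the toy saturate rule) is hypothetical and only meant to show how per-branch diffs could be intersected and propagated to the parent.

```python
from collections import defaultdict


class BacktrackingUF:
    """Union-find with an undo trail (no path compression, so undo is easy)."""

    def __init__(self, terms):
        self.parent = {t: t for t in terms}
        self.trail = []    # (loser_root, winner_root) for every union
        self.marks = []

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.trail.append((b, a))

    def push(self):
        self.marks.append(len(self.trail))

    def pop(self):
        """Undo all unions since the matching push and return the diff:
        a map from every touched pre-branch root to its root in the branch."""
        mark = self.marks.pop()
        touched = {r for pair in self.trail[mark:] for r in pair}
        diff = {r: self.find(r) for r in touched}
        while len(self.trail) > mark:
            loser, _ = self.trail.pop()
            self.parent[loser] = loser
        return diff


def common_merges(diffs):
    """Groups of roots that ended up merged in every branch."""
    shared = set.intersection(*(set(d) for d in diffs))
    groups = defaultdict(set)
    for root in shared:
        groups[tuple(d[root] for d in diffs)].add(root)
    return [g for g in groups.values() if len(g) > 1]


def split_on(uf, x, alternatives, saturate):
    diffs = []
    for alt in alternatives:
        uf.push()
        uf.union(x, alt)         # local assumption x = alt
        saturate(uf)
        diffs.append(uf.pop())   # the assumption is undone here
    for group in common_merges(diffs):
        first, *rest = group
        for other in rest:
            uf.union(first, other)   # valid in all branches, so keep it


def saturate(uf):
    # Toy rule for the demo: f(x) = c whenever x equals a or b.
    if uf.find("x") in (uf.find("a"), uf.find("b")):
        uf.union("f(x)", "c")


uf = BacktrackingUF(["x", "a", "b", "f(x)", "c"])
split_on(uf, "x", ["a", "b"], saturate)
```

After the split, f(x) and c are merged in the parent, while x stays separate from both a and b.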

Colored E-Graph

I've heard that some people worked on E-Graphs using colored union finds. This may just answer all my problems, but so far I haven't found an implementation.

Do explicit context passing

Track context info syntactically, à la explicit substitution calculi. That way each context gets hash-consed into a different node. Feels quite awkward and not very E-Graph-y, though.
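
A minimal sketch of what "hash-consed into a different node" could look like, assuming the context is simply a frozenset of assumed facts that becomes part of the hash-cons key. ContextualHashCons and its intern method are made-up names for this example.

```python
class ContextualHashCons:
    """Hash-cons table whose keys include the context a term lives in."""

    def __init__(self):
        self.ids = {}

    def intern(self, ctx, op, *children):
        key = (ctx, op, children)
        # Reuse the id if this exact (context, node) pair was seen before.
        return self.ids.setdefault(key, len(self.ids))


table = ContextualHashCons()
top = frozenset()                                   # no assumptions
block = frozenset({"U1.user_id = U2.user_id"})      # inside the SPJ block

u1_top = table.intern(top, "U1")
u1_block = table.intern(block, "U1")
u1_block_again = table.intern(block, "U1")
```

Because u1_top and u1_block are distinct nodes, an equality derived under the block's assumption does not leak to uses of U1 outside it. The cost is that nothing is shared between contexts unless extra rules move facts across them.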

I'm not in love with any of these. Have others run into similar situations before?

oflatt · 2023-03-06T23:03:31Z

Maintainer

Hi,
As far as I know, there isn't an agreed-upon best or most efficient solution, and it probably depends on the domain. You already gave a great overview of the approaches people have tried; I think people mostly encode contexts using syntax. In this paper, they use assume nodes to assume facts about the subcontext.

This is an active area of research, and a difficult open problem! You have some interesting ideas with "distributing smarter". We have also thought about writing a functional egraph implementation as you mentioned.

3 replies

oflatt Mar 6, 2023
Maintainer

Another thing I've been thinking about is how to share more of the matching across contexts to make "colored egraphs" more efficient and usable.

Tarmean Mar 8, 2023
Author

Thank you for the pointer to the paper! Still haven't fully wrapped my head around the details and how to do rewrites with assume. Encoding the assume'd set of predicates as a datalog-y relation seems like it allows nice queries?

I wonder whether seminaive evaluation would be enough for faster matching in these contexts? Guessing usually causes tiny changes, so having rule matching scale with the number of changes could allow for drastic improvements. I don't think it'd be too invasive:

Transform all symbols so there are *-old and *-new, plus a *-temp to collect new inserts
After each iteration do x-old += x-new; x-new = x-temp; x-temp = [] for all x
When matching do x-new tables first
Split rules into separate ones with all combinations of old and new, dropping the one which only touches old data

But I guess this all breaks down when a rule blows up because then suddenly x-new tables can be much larger?
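
For reference, the old/new/temp scheme described above can be sketched on a textbook example, transitive closure with the rule path(x,z) :- path(x,y), path(y,z). This is a generic seminaive loop over Python sets, not egg or egglog code; the function names are placeholders.

```python
def join(left, right):
    """All (x, z) such that (x, y) is in left and (y, z) is in right."""
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}


def seminaive_closure(edges):
    old, new = set(), set(edges)
    while new:
        # The rule split into its old/new combinations; the old-old
        # variant is dropped because it cannot derive anything new.
        temp = join(new, new) | join(new, old) | join(old, new)
        temp -= old | new   # keep only genuinely new tuples
        old |= new          # x-old += x-new
        new = temp          # x-new = x-temp
    return old


def naive_closure(edges):
    """Reference implementation that rejoins everything each round."""
    path = set(edges)
    while True:
        bigger = path | join(path, path)
        if bigger == path:
            return path
        path = bigger


edges = {(1, 2), (2, 3), (3, 4)}
closure = seminaive_closure(edges)
```

Each round only joins against the tuples added in the previous round, so the work scales with the size of new rather than with the whole table.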

oflatt Mar 9, 2023
Maintainer

That's a great idea. We've also been thinking that seminaive evaluation, which we use in egglog, the upcoming next generation of egg, could help here.

I think some combination of seminaive evaluation and sharing equalities in contexts would be very cool.
