Rewrites under Local Assumptions in E-Graphs #241
Tarmean
started this conversation in
E-graph theory
Replies: 1 comment 3 replies
-
Hi, This is an active area of research, and a difficult open problem! You have some interesting ideas with "distributing smarter". We have also thought about writing a functional egraph implementation as you mentioned. |
Beta Was this translation helpful? Give feedback.
3 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
The problem
E-Graphs are great at datalog style rules and very bad at backtracking search. I occasionally want to apply rules if it would match under local assumptions, applying the rewrite without making the assumption global.
Take this SQL query as an example:
Assuming
user_id
is a primary key ofusers
we can skip the join; U1 and U2 always point to the same tuple. Here is how we could encode this:proj(U1, user_id) ~ proj(U2, user_id)
into an E-Graph equalityU1 ~ user_id_fundep(proj(U1, user_id))
This treats the query as a logic formula which checks whether a tuple is in the result set.
But if the query is nested in some outer join or
or
statement, we can't just turn the local WHERE clause into global equalities.The local rewrite is still valid, though, intuitively because the join is in the same select-project-join block as the
U1.user_id = U2.user_id
guard.Does anyone have smart tricks to handle such localized but non-backtracking reasoning?
Solutions I have played with
These all seem to sort-of work but none feels super amazing:
E-Graph non-rewriting
I've tried using the E-Graph as an oracle only. Use the E-Graph for pattern matching and analysis, but apply the rewrite to a normal AST. After each AST-rewrite iteration rebuild the E-Graph.
At least rewriting an AST also means you don't accidentally rewrite all references to some node if the rewrite is only valid for some.
Distribute all choices
Distribute all choices so they only occur at the top-level, do one E-Graph for each concrete version, intersect the results. E-Graph intersection is quadratic quadratic and can be lossy, though, and we'd duplicate a lot of work.
Distribute a bit smarter
We could use a decision tree instead of doing all choices at once.
Expand a decision tree node by:
x=or(a,b)
, generate child E-Graphs withx=a
andx=b
Using diffs can make the intersection more tractable but the implementation wants pure functional data structures or push/pop style backtracking.
Plus intersection works best if you rewrite thoroughly (everything has to align syntactically) and if you have exponentially many new terms after each branch the diff approach doesn't help much.
Colored E-Graph
I've heard that some people worked on E-Graphs using colored union finds. This may just answer all my problems, but so far I haven't found an implementation.
Do explicit context passing
Track context info syntactically, ala explicit substitution calculi. That way each context gets hash-consed into a different node. Feels quite awkward and not very E-Graph-y, though.
I'm not in love with any of these. Have others run into similar situations before?
Beta Was this translation helpful? Give feedback.
All reactions