Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] fixpoint iteration support #603

Open
wants to merge 38 commits into
base: master
Choose a base branch
from
Open

Conversation

carljm
Copy link
Contributor

@carljm carljm commented Oct 23, 2024

This PR removes the existing unwind-based cycle fallback support (a plus for WASM compatibility), and replaces it with support for fixpoint iteration of cycles.

To opt in to fixpoint iteration, provide two additional arguments to salsa::tracked on the definition of a tracked function: cycle_initial and cycle_fn. The former is a function which should provide a provisional starting value for fixpoint iteration on this query, and the latter is a function which has the opportunity, after each iteration that failed to converge, to decide whether to continue iterating or fallback to some fixed value. See the added test in cycle_fixpoint.rs for details.

Usability points that should be covered in the documentation:

  • With the old cycle fallback, it was sufficient to avoid panic for at least one query in a cycle to define a cycle fallback. With fixpoint iteration, to avoid cycle panics you must define cycle_fn and cycle_initial on every query that might end up as the "head" of a cycle (that is, queried for its value while it is already executing.)
  • It is entirely possible to define cycle_fn and cycle_initial so as to cause iteration to diverge and never terminate; it's up to the user to avoid this. Techniques to avoid this include a) ensuring that cycles will converge, by defining the initial value and the queries themselves monotonically (for example, in a type-inference scenario, the initial value is the bottom, or empty, type, and types will only widen, never narrow, as the cycle iterates -- thus the cycle must eventually converge to the top type, if nowhere else), and/or b) with a larger hammer, by ensuring that cycle_fn respects the iteration count it is given, and always halts iteration with a fallback value if the count reaches some "too large" value.
  • It's also entirely possible to define cycle_fn and cycle_initial such that memoized results can vary depending only on the order in which queries occur. Avoid this by minimizing the number of tracked functions that support fixpoint iteration and ensuring initial values and fallback values are consistent among tracked functions that may occur in a cycle together.

This is an RFC pull request to get initial reviewer feedback on the design and implementation. Remaining TODO items:

  • add tests for more complex cycles:
    • nested (multiple head) cycles
    • cycles with multiple paths back to the same cycle head
  • add tests for cross-thread cycles
  • add tests that use inputs in cycle recovery functions
  • test in red-knot and validate it works there
  • performance improvements
    • lazy creation of initial-value memo?
  • documentation

Copy link

netlify bot commented Oct 23, 2024

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit cec63ba
🔍 Latest deploy log https://app.netlify.com/sites/salsa-rs/deploys/6788547db04499000907d5e0

Copy link

codspeed-hq bot commented Oct 23, 2024

CodSpeed Performance Report

Merging #603 will degrade performances by 22.87%

Comparing carljm:fixpoint (cec63ba) with master (e4d65a6)

Summary

❌ 6 regressions
✅ 3 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark master carljm:fixpoint Change
new[Input] 18.5 µs 22.9 µs -19.12%
new[InternedInput] 21.1 µs 25.2 µs -16.45%
mutating[10] 16.5 µs 21.3 µs -22.73%
mutating[20] 16.9 µs 21.9 µs -22.87%
mutating[30] 17 µs 22 µs -22.53%
many_tracked_structs 49.2 µs 55.4 µs -11.11%

@nikomatsakis
Copy link
Member

This is very cool! (Admittedly, I say this pre-review.)

Copy link
Contributor

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I left a few comments where I struggled understanding the implementation or had smaller suggestions.

@@ -179,12 +182,16 @@ macro_rules! setup_tracked_fn {
$inner($db, $($input_id),*)
}

fn cycle_initial<$db_lt>(db: &$db_lt dyn $Db) -> Self::Output<$db_lt> {
$($cycle_recovery_initial)*(db)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that it's possible that the initial value function or the recover functions itself could create new cycles. Is this indeed the case and if so, what's salsa's behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. In the use cases I have in mind (e.g. for red-knot) there should not be any reason to call a query from within one of these functions. And it seems like this could be quite difficult to deal with. So unless we have clear use cases, I would rather consider this out of scope. We could just document "don't do that," or we could add some kind of explicit prevention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I'm not suggesting that users should do that. I only wonder about what happens if a user does it regardless.

Ideally, we wouldn't provide a Db but we can't do that because a user might want to create a tracked struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the simplest approach here would be to add some state on Runtime that causes an error if you try to push a new active query. I'll wait for Niko review before doing that, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO comment on this to get Niko's feedback.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this replaces the old cycle handling, we (rust-analyzer) actually needs to be able to call queries here as we already do that in some of our cycle recovery functions today. I think most of those are in part because of assumptions else where in the code, an example being generic_defaults always returning a list with length matching the number of generics, and to compute the number we need to call a query. That query can run into a cycle though, so we still need to figure out the number of generics in the recovery function which means we need to call the corresponding query.

src/function/execute.rs Outdated Show resolved Hide resolved
src/function/execute.rs Outdated Show resolved Hide resolved
src/function/execute.rs Outdated Show resolved Hide resolved
src/function/execute.rs Outdated Show resolved Hide resolved
src/function/execute.rs Outdated Show resolved Hide resolved
src/zalsa_local.rs Outdated Show resolved Hide resolved
src/zalsa_local.rs Outdated Show resolved Hide resolved
}
}

#[salsa::tracked(cycle_fn=cycle_recover, cycle_initial=cycle_initial)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we define a custom CycleRecovery trait that defines recover and initial methods. It would give us a good point to put cycle handling documentation and avoids the risk for incorrect-macro use when only specifiying one but not boht values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about this as well. It's not clear to me what is actually the better UX. Implementing a trait is somewhat more boilerplate, and in practice there's not much difference in the experience if you fail to implement one of the methods. Either way the compiler catches the error for you, because its an error in macro expansion if you don't provide both cycle_fn and cycle_initial, with a trait it would be a compiler error that you didn't fully implement the trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess with a trait your IDE might give you the right signatures of the methods for free, which is kind of nice...

Copy link
Contributor

@davidbarsky davidbarsky Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I normally don't like traits, but I feel like a trait would be preferable because the current annotation seems a little too wordy for my tastes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, going to still wait for Niko's feedback on this point before updating, but the trait idea makes sense to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm okay with a non-trait based approach. rust-analyzer uses a #[salsa::cycle(path::to::function)]-style annotation, so I think to ease the transition, avoiding a trait would be nice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rust-analyzer uses a #[salsa::cycle(path::to::function)]-style annotation, so I think to ease the transition, avoiding a trait would be nice.

Is cycle handling common in ra? I do see how it reduces the diff size because it isn't necessary to make the function a trait-method but you would still have to change every use because you now have to specify both the cycle and cycle initial functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO comment in the code for this question as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its common but not common enough that it would matter (there are ~11 cycle handlers and the majority of them return a dummy result)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One theoreitcal nice thing about taking the functions here directly would be that we could allow supporting closure expressions in that position inline instead of having to create a whole function (as cycles that are not going to be fixpoint are likely not gonna have complex functions here), a trait would be quite wordy imo.

@carljm
Copy link
Contributor Author

carljm commented Oct 29, 2024

In writing more comprehensive tests for this, I realized that it needs some changes to correctly handle multi-revision scenarios; taking it to Draft mode until I get that fixed.

@carljm carljm marked this pull request as draft October 29, 2024 18:15
@carljm
Copy link
Contributor Author

carljm commented Oct 30, 2024

Ok, multiple-revision cases are now fixed, and we now populate the initial provisional value only lazily, in case a cycle is actually encountered, which should reduce the number of memos created by quite a lot.

Also added a bunch of tests, including multiple-revision cases and one test involving durability. Still need to add cross-thread cycle tests.

@carljm carljm marked this pull request as ready for review October 30, 2024 00:37
Comment on lines 126 to 139
// Diagram nomenclature for nodes: Each node is represented as a:xx(ii), where `a` is a sequential
// identifier from `a`, `b`, `c`..., xx is one of the four query kinds:
// - `Ni` for `min_iterate`
// - `Xi` for `max_iterate`
// - `Np` for `min_panic`
// - `Xp` for `max_panic`
//
// and `ii` is the inputs for that query, represented as a comma-separated list, with each
// component representing an input:
// - `a`, `b`, `c`... where the input is another node,
// - `uXX` for `UntrackedRead(XX)`
// - `vXX` for `Value(XX)`
// - `sY` for `Successor(Y)`
//
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are admittedly obscure-looking chicken scratches. I'm not claiming they are super readable, but they are concise enough to put into an ASCII graph diagram, and (once you get familiar with them) give a lot of information about the behavior of the test. They were really helpful to me in writing and debugging the tests.

Open to feedback that I should do this differently for better readability by future maintainers...

Copy link
Contributor

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lazy creation of the initial value is a neat improvement. Nice for taking the time to work on it !

src/function/execute.rs Outdated Show resolved Hide resolved
src/function/execute.rs Outdated Show resolved Hide resolved
src/function/execute.rs Outdated Show resolved Hide resolved
@MichaReiser
Copy link
Contributor

MichaReiser commented Oct 30, 2024

The benchmarks show a 4-5% regression. It seems that we're now resizing some hash maps more often. Are we reporting more tracked reads than before? Could you take a look what's causing it?

@carljm
Copy link
Contributor Author

carljm commented Nov 1, 2024

Initial experiments using this in the red-knot type checker are promising: astral-sh/ruff#14029

Not yet using it for loopy control flow in that PR, but there are cycles in the core type definitions of Python builtins and standard library, which we previously had a hacky fallback in place for using Salsa's previous cycle fallback support. Moving over to fixpoint iteration just worked, and fixed the type of a builtin impacted by the cycle.

On the downside, it is a performance regression. Need to do more work there.

@@ -73,22 +114,29 @@ where
);

// Check if the inputs are still valid and we can just compare `changed_at`.
if self.deep_verify_memo(db, &old_memo, &active_query) {
return Some(old_memo.revisions.changed_at > revision);
let active_query = zalsa_local.push_query(database_key_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the red knot benchmarks, the regression mainly comes from the extra push_query calls here (that probably also applies for queries not participating in cycles) and the constructed hash set in deep_verify_memo (specific to red knot?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move the push_query call into deep_verify_memo after the second shallow_verify_memo or does that result in deadlocks? Just so that we can avoid pushing queries unless it's absolutely necessary

Copy link
Member

@Veykril Veykril Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems weird, this isnt an extra push_query call after all. This merely moved down a couple lines

Edit: Nevermind, there are more in execute

Copy link
Contributor Author

@carljm carljm Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we no longer pass an active query to execute, so we push twice instead of once, even for non-cycle queries.

Once the behavior is right I can look at whether we can eliminate this. Maybe we don't have to push and pop the cycle-head active query for each iteration loop.

{
return false;
loop {
let mut cycle_heads = FxHashSet::default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that we create a new cycle_heads in every iteration? Could we re-use the cycle_heads and instead call clear to avoid re-allocating the hash set on every iteration?

})
}
/// Cut off iteration and use the given result value for this query.
Fallback(T),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my own understanding, but having a cycle_fn that returns Fallback immediately would effectively be the same as the previous cycle handling right? And it would never call the initial function then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation, the initial function will still be called. If query A(0) is executing, it calls B(0), which in turn calls A(0), the second call to A(0) will call the initial function to get an initial value to return to B(0), which will compute its provisional value and return that to A(0), which will then call the cycle recovery function. If that cycle recovery function immediately returns Fallback, then that fallback value will be used for A(0). (I think we also need to then iterate the cycle one more time in order to populate a correct value for B(0) based on the fallback value for A(0), but that's not currently implemented.)

It seems like it could be possible to instead call the cycle recovery function first thing when we encounter the cycle, and allow it to return Fallback immediately, before the initial function is ever called. That seems worth experimenting with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A key difference in this PR from the current cycle handling is that in the current cycle handling all cycle participants that have a cycle fallback receive a fallback value; this ensures determinism regardless of query order, but it means that you can't have some participants in the cycle compute a real value based on a fallback for some other cycle participant.

In this PR, if your recovery function falls back to some value, that fallback only applies to the individual query, and the rest of the cycle will then resolve normally using that fallback value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks, that explanation made this very clear!

Copy link
Contributor Author

@carljm carljm Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think we also need to then iterate the cycle one more time in order to populate a correct value for B(0) based on the fallback value for A(0), but that's not currently implemented.)

I was wrong, I had already implemented this.

There is a question about what should happen if the cycle_fn says to fallback to some value, and we provisionally insert the fallback value and iterate the cycle one more time, and the cycle still fails to converge. It would be nice if we can require that the cycle does converge using the fallback, because that means we still leave the cycle in a consistent converged state. But maybe we will have to just return the fallback value and leave the cycle in an inconsistent state.

I'm not sure if real-world use cases for fallback will require that we leave the cycle in an inconsistent state. For now I've added a panic, but we can change this.

Comment on lines +33 to +37
pub(crate) enum ClaimResult<'a> {
Retry,
Cycle,
Claimed(ClaimGuard<'a>),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ClamGuard only has one niche so this might affect perf slightly compared to the Option wrapping from before

Funnily enough, we can give it 254 niches by adding a bool field to it which occupies its padding, something to keep in mind.

// only when a cycle is actually encountered.
let mut opt_last_provisional: Option<&Memo<<C as Configuration>::Output<'db>>> = None;

loop {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we branch on the cycle strategy here? There is a lot of work that I don't think needs to happen for Panic strategies that non cycle supporting queries currently pay (and given the branching would happen on a const it should optimized out)

Comment on lines +13 to +23
/// Result of memo validation.
pub enum VerifyResult {
/// Memo has changed and needs to be recomputed.
Changed,

/// Memo remains valid.
///
/// Database keys in the hashset represent cycle heads encountered in validation; don't mark
/// memos verified until we've iterated the full cycle to ensure no inputs changed.
Unchanged(FxHashSet<DatabaseKeyIndex>),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change (bool being replaced by VerifyResult) also seems like it will make non cycle queries pay a hefty perf price. That type has a destructor and is 40 bytes where as the previous return value could fit into a register, this can't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll focus first on getting consistently right behavior (including across threads) and then revisit this and other possible ways to claw back perf. Off the top of my head I'm not sure how to improve this, will need to think about it.

Copy link
Member

@Veykril Veykril Dec 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes of course, I am mainly noting down what I saw, no need to tackle perf nits/reviews until things are working properly. (also happy to help out with perf work on this PR once its at that stage)

@Veykril
Copy link
Member

Veykril commented Dec 29, 2024

On a related perf note, would it make sense to move the dataflow test (or a maybe slightly more involved version) into a benchmark?

* master: (32 commits)
  fix `HashEqLike` for `Box<T>`
  bless ui tests
  expose `Database::unwind_if_revision_cancelled`
  Describe deadlock risk for assemble closure in interning
  Simplify Event construction
  Remove panicking paths from `maybe_changed_after`
  Remove PartialOrd, Ord implementations from keys
  Remove unnecessary `Option` around `Storage::zalsa_impl`
  Remove unreachable unwraps via typing
  Make `QueryEdge` an enum
  Type `QueryEdge`
  Encapsulate `DependencyIndex`
  Fix mutating benchmark benching more than just mutation
  Use proper setup for compare benches
  Fix benchmarks accidentally re-using the same database
  Cleanup
  Add self-ref test
  Split Lookup into two traits
  Implement self-referential interning
  Improve safety comments on function/fetch
  ...
* master:
  fix completions in `#[salsa::tracked]` functions
* master:
  Shrink `QueryRevisions` size by 3 words
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants