Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cross product and join Order are not well defined #1662

Open
jhellerstein opened this issue Jan 16, 2025 · 4 comments
Open

cross product and join Order are not well defined #1662

jhellerstein opened this issue Jan 16, 2025 · 4 comments
Assignees
Labels
hydro lang global choreographic language

Comments

@jhellerstein
Copy link
Collaborator

The DFIR join operator uses symmetric_hash_join_into_iter, which (a) dynamically chooses which side is the "outer loop" based on collection size, and (b) uses a HashMap to determine the order of the "inner loop". This leads to non-determinism for even simple cross_product operators in Hydro:

use hydro_lang::*;
use dfir_rs::futures::StreamExt;
tokio_test::block_on(test_util::stream_transform_test(|process| {
   let tick = process.tick();
   let stream1 = process.source_iter(q!(vec!['a', 'b', 'c']));
   let stream2 = process.source_iter(q!(vec![1, 2, 3]));
   stream1.cross_product(stream2)
}, |mut stream| async move {
// should be ('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1), ('c', 2), ('c', 3)
for w in vec![('a', 1), ('b', 1), ('c', 1), ('a', 2), ('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)] {
    assert_eq!(stream.next().await.unwrap(), w);
}
}));
@jhellerstein jhellerstein self-assigned this Jan 16, 2025
@shadaj
Copy link
Member

shadaj commented Jan 16, 2025

Until we can add a DFIR variant that uses a deterministic algorithm, we should at least mark the return type of cross_product as NoOrder for now.

@jhellerstein
Copy link
Collaborator Author

The fix shouldn't be that hard, so maybe not necessary to have a temporary fix. We have dfir_rs/src/compiled/pull/cross_join.rs we just aren't setting it up in dfir_lang/src/graph/ops/cross_join.rs and dfir_lang/src/graph/ops/cross_join_multiset.rs -- we are reusing the logic from dfir_lang/src/graph/ops/join.rs .. I think to avoid copying a lot of persistence boilerplate.

@shadaj
Copy link
Member

shadaj commented Jan 16, 2025

cross_product will still be non-deterministic for unbounded streams though, which is why I think we should do the conservative thing for now till we figure out what the semantics should be.

@jhellerstein
Copy link
Collaborator Author

jhellerstein commented Jan 17, 2025 via email

@MingweiSamuel MingweiSamuel added hydro lang global choreographic language documentation Improvements or additions to documentation and removed documentation Improvements or additions to documentation labels Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hydro lang global choreographic language
Projects
None yet
Development

No branches or pull requests

3 participants