Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(hydroflow_plus)!: mark non-deterministic operators as unsafe and introduce timestamped streams #1584

Merged
merged 1 commit into from
Nov 27, 2024

Conversation

shadaj
Copy link
Member

@shadaj shadaj commented Nov 21, 2024

Big PR.

First big change is we introduce a Timestamped location. This is a bit of a hybrid between top-level locations and Tick locations. The idea is that you choose where timestamps are generated, and then have a guarantee that everything after that will be atomically computed (useful for making sure we add payloads to the log before ack-ing).

The contract is that an operator or module that takes a Timestamped input must still be deterministic regardless of the stamps on messages (which are hidden unless you tick_batch). But unlike a top-level stream (which has the same constraints), you have the atomicity guarantee. Right now the guarantee is trivial since we have one global tick for everything. But in the future when we want to apply @davidchuyaya's optimizations this will be helpful to know when there are causal dependencies on when data can be sent to others.

Second change is we mark every non-deterministic operator (modulo explicit annotations such as NoOrder) with Rust's unsafe keyword. This makes it super clear where non-determinism is taking place.

I've used this to put unsafe blocks throughout our example code and add SAFETY annotations that argue why the non-determinism is safe (or point out that we've explicitly documented / expect non-determinism). I also added #![warn(unsafe_op_in_unsafe_fn)] to the examples and the template, since this forces good hygiene of annotating sources of non-determinism even inside a module that is intentionally non-deterministic.

Paxos changes are mostly refactors, and I verified that the performance is the same as before.

Copy link

cloudflare-workers-and-pages bot commented Nov 21, 2024

Deploying hydroflow with  Cloudflare Pages  Cloudflare Pages

Latest commit: e5d39ac
Status: ✅  Deploy successful!
Preview URL: https://66f5947c.hydroflow.pages.dev
Branch Preview URL: https://pr1584.hydroflow.pages.dev

View logs

@shadaj shadaj force-pushed the pr1584 branch 10 times, most recently from 92a7be4 to 7974894 Compare November 22, 2024 02:51
@shadaj shadaj marked this pull request as ready for review November 22, 2024 03:44
@shadaj shadaj requested a review from MingweiSamuel November 22, 2024 03:44
@shadaj shadaj marked this pull request as draft November 22, 2024 03:44
@shadaj shadaj removed the request for review from MingweiSamuel November 22, 2024 03:45
@shadaj shadaj force-pushed the pr1584 branch 2 times, most recently from da4c9b5 to edb2843 Compare November 22, 2024 21:55
@shadaj shadaj changed the title feat(hydroflow_plus)!: mark non-deterministic operators as unsafe feat(hydroflow_plus)!: mark non-deterministic operators as unsafe and introduce timestamped streams Nov 22, 2024
@shadaj shadaj force-pushed the pr1584 branch 2 times, most recently from 323b4cd to 95e6ac1 Compare November 22, 2024 22:45
@shadaj shadaj marked this pull request as ready for review November 23, 2024 04:58
@shadaj shadaj force-pushed the pr1584 branch 2 times, most recently from fdb1430 to 843df89 Compare November 23, 2024 07:37
Copy link
Member

@MingweiSamuel MingweiSamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel a bit weird about abusing unsafe in this way. Would be good to have a naming convention also or instead.

In the current code it's also really hard to tell which chained method call is the unsafe one

hydroflow_plus/src/lib.rs Outdated Show resolved Hide resolved
hydroflow_plus/src/location/tick.rs Outdated Show resolved Hide resolved
hydroflow_plus/src/stream.rs Outdated Show resolved Hide resolved
hydroflow_plus_test/src/cluster/paxos.rs Outdated Show resolved Hide resolved
hydroflow_plus_test/src/lib.rs Outdated Show resolved Hide resolved
template/hydroflow_plus/src/lib.rs Outdated Show resolved Hide resolved
… introduce timestamped streams

Big PR.

First big change is we introduce a `Timestamped` location. This is a bit of a hybrid between top-level locations and `Tick` locations. The idea is that you choose where timestamps are generated, and then have a guarantee that everything after that will be atomically computed (useful for making sure we add payloads to the log before ack-ing).

The contract is that an operator or module that takes a `Timestamped` input must still be deterministic regardless of the stamps on messages (which are hidden unless you `tick_batch`). But unlike a top-level stream (which has the same constraints), you have the atomicity guarantee. Right now the guarantee is trivial since we have one global tick for everything. But in the future when we want to apply @davidchuyaya's optimizations this will be helpful to know when there are causal dependencies on when data can be sent to others.

Second change is we mark every non-deterministic operator (modulo explicit annotations such as `NoOrder`) with Rust's `unsafe` keyword. This makes it super clear where non-determinism is taking place.

I've used this to put `unsafe` blocks throughout our example code and add `SAFETY` annotations that argue why the non-determinism is safe (or point out that we've explicitly documented / expect non-determinism). I also added `#![warn(unsafe_op_in_unsafe_fn)]` to the examples and the template, since this forces good hygiene of annotating sources of non-determinism even inside a module that is intentionally non-deterministic.

Paxos changes are mostly refactors, and I verified that the performance is the same as before.
@shadaj
Copy link
Member Author

shadaj commented Nov 27, 2024

Yeah, went back and forth a bunch on using unsafe. The conclusion I came to is that because we are in a staged context, it's more okay for us to use unsafe in a different way since it doesn't affect runtime logic. I think naming conventions are a bit messy because we'd need end-users to also follow that. In lieu of a naming convention, I enabled semantic highlighting rules for VSCode so that all unsafe API calls stick out in bright red.

@shadaj shadaj requested a review from MingweiSamuel November 27, 2024 22:13
.vscode/settings.json Show resolved Hide resolved
Comment on lines +99 to +110
let out = self
.l
.spin()
.flat_map(q!(move |_| 0..batch_size))
.flat_map_ordered(q!(move |_| 0..batch_size))
.map(q!(|_| ()))
.tick_batch(self)
.timestamped(self);

unsafe {
// SAFETY: at runtime, `spin` produces a single value per tick,
// so each batch is guaranteed to be the same size.
out.tick_batch()
}
Copy link
Member

@MingweiSamuel MingweiSamuel Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now can just leave the rest of the unsafe blocks as is, but would be good as a convention to only wrap the unsafe method calls in unsafe, otherwise the whole call chain goes into unsafe and it doesn't distinguish anything (have to check each method called/memorize to see if that is the unsafe one)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think I will try to write up a style guide soon on what to wrap in unsafe. Sometimes it's nice to include additional API calls if it is a "temporary"/"local" unsafety rather than one which propagates out the function call.

@shadaj shadaj merged commit 9393899 into main Nov 27, 2024
29 checks passed
@shadaj shadaj deleted the pr1584 branch November 27, 2024 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants