-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(hydroflow_plus)!: mark non-deterministic operators as unsafe and introduce timestamped streams #1584
Conversation
Deploying hydroflow with Cloudflare Pages
|
92a7be4
to
7974894
Compare
da4c9b5
to
edb2843
Compare
323b4cd
to
95e6ac1
Compare
fdb1430
to
843df89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel a bit weird about abusing unsafe
in this way. Would be good to have a naming convention also or instead.
In the current code it's also really hard to tell which chained method call is the unsafe
one
… introduce timestamped streams Big PR. First big change is we introduce a `Timestamped` location. This is a bit of a hybrid between top-level locations and `Tick` locations. The idea is that you choose where timestamps are generated, and then have a guarantee that everything after that will be atomically computed (useful for making sure we add payloads to the log before ack-ing). The contract is that an operator or module that takes a `Timestamped` input must still be deterministic regardless of the stamps on messages (which are hidden unless you `tick_batch`). But unlike a top-level stream (which has the same constraints), you have the atomicity guarantee. Right now the guarantee is trivial since we have one global tick for everything. But in the future when we want to apply @davidchuyaya's optimizations this will be helpful to know when there are causal dependencies on when data can be sent to others. Second change is we mark every non-deterministic operator (modulo explicit annotations such as `NoOrder`) with Rust's `unsafe` keyword. This makes it super clear where non-determinism is taking place. I've used this to put `unsafe` blocks throughout our example code and add `SAFETY` annotations that argue why the non-determinism is safe (or point out that we've explicitly documented / expect non-determinism). I also added `#![warn(unsafe_op_in_unsafe_fn)]` to the examples and the template, since this forces good hygiene of annotating sources of non-determinism even inside a module that is intentionally non-deterministic. Paxos changes are mostly refactors, and I verified that the performance is the same as before.
Yeah, went back and forth a bunch on using |
let out = self | ||
.l | ||
.spin() | ||
.flat_map(q!(move |_| 0..batch_size)) | ||
.flat_map_ordered(q!(move |_| 0..batch_size)) | ||
.map(q!(|_| ())) | ||
.tick_batch(self) | ||
.timestamped(self); | ||
|
||
unsafe { | ||
// SAFETY: at runtime, `spin` produces a single value per tick, | ||
// so each batch is guaranteed to be the same size. | ||
out.tick_batch() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now can just leave the rest of the unsafe blocks as is, but would be good as a convention to only wrap the unsafe method calls in unsafe
, otherwise the whole call chain goes into unsafe
and it doesn't distinguish anything (have to check each method called/memorize to see if that is the unsafe one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I will try to write up a style guide soon on what to wrap in unsafe
. Sometimes it's nice to include additional API calls if it is a "temporary"/"local" unsafety rather than one which propagates out the function call.
Big PR.
First big change is we introduce a
Timestamped
location. This is a bit of a hybrid between top-level locations andTick
locations. The idea is that you choose where timestamps are generated, and then have a guarantee that everything after that will be atomically computed (useful for making sure we add payloads to the log before ack-ing).The contract is that an operator or module that takes a
Timestamped
input must still be deterministic regardless of the stamps on messages (which are hidden unless youtick_batch
). But unlike a top-level stream (which has the same constraints), you have the atomicity guarantee. Right now the guarantee is trivial since we have one global tick for everything. But in the future when we want to apply @davidchuyaya's optimizations this will be helpful to know when there are causal dependencies on when data can be sent to others.Second change is we mark every non-deterministic operator (modulo explicit annotations such as
NoOrder
) with Rust'sunsafe
keyword. This makes it super clear where non-determinism is taking place.I've used this to put
unsafe
blocks throughout our example code and addSAFETY
annotations that argue why the non-determinism is safe (or point out that we've explicitly documented / expect non-determinism). I also added#![warn(unsafe_op_in_unsafe_fn)]
to the examples and the template, since this forces good hygiene of annotating sources of non-determinism even inside a module that is intentionally non-deterministic.Paxos changes are mostly refactors, and I verified that the performance is the same as before.