feat: Creating a new PR for semiring_application #1499

Open
wants to merge 73 commits into
base: semiring_application_2


chaeyeonlee07
Collaborator

The `semiring_application` branch had a lot of lint errors.

shadaj and others added 2 commits October 14, 2024 13:40
The latest Rust Sitter drops the dependency on `tree-sitter-cli`, which
eliminates many transitive dependencies.

cloudflare-workers-and-pages bot commented Oct 22, 2024

Deploying hydroflow with Cloudflare Pages

Latest commit: 0dc709e
Status: ✅  Deploy successful!
Preview URL: https://e330af08.hydroflow.pages.dev


@jhellerstein jhellerstein changed the title Creating a new PR for semiring_application feat: Creating a new PR for semiring_application Oct 22, 2024
A few spurious `too_many_arguments` lints and a spurious
`zombie_processes` lint still fire on current nightly (`clippy 0.1.84 (4392847410
2024-10-21)`)
jhellerstein and others added 22 commits October 28, 2024 17:34
Generalized Hash Tries are part of the SIGMOD '23 FreeJoin
[paper](https://dl.acm.org/doi/abs/10.1145/3589295) by
Wang/Willsey/Suciu. They provide a compressed ("factorized")
representation of relations. By operating in the factorized domain, join
algorithms can defer cross-products and achieve asymptotically optimal
performance.
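
To make the "factorized" idea concrete, here is a minimal sketch using only the standard library. It is not the implementation added in this commit: `HashTrie`, `join_factorized`, and `expand` are hypothetical names, and the trie is fixed at two levels for a binary relation. The join keeps matching groups side by side and only builds the cross-product when `expand` is called.

```rust
use std::collections::HashMap;

/// A two-level hash trie over a binary relation R(a, b):
/// the first level is keyed on `a`, and each leaf holds the `b` values.
#[derive(Default)]
struct HashTrie {
    children: HashMap<u32, Vec<u32>>,
}

impl HashTrie {
    fn insert(&mut self, a: u32, b: u32) {
        self.children.entry(a).or_default().push(b);
    }

    fn from_rows(rows: &[(u32, u32)]) -> Self {
        let mut trie = Self::default();
        for &(a, b) in rows {
            trie.insert(a, b);
        }
        trie
    }
}

/// Joins R(a, b) with S(a, c) on `a`. The result stays factorized:
/// for each matching key we keep the two leaf groups side by side
/// instead of materializing their cross-product.
fn join_factorized<'t>(r: &'t HashTrie, s: &'t HashTrie) -> Vec<(u32, &'t [u32], &'t [u32])> {
    let mut out: Vec<_> = r
        .children
        .iter()
        .filter_map(|(a, bs)| s.children.get(a).map(|cs| (*a, bs.as_slice(), cs.as_slice())))
        .collect();
    // HashMap iteration order is arbitrary, so sort by key for stable output.
    out.sort_by_key(|(a, _, _)| *a);
    out
}

/// Expands the factorized result into flat tuples, only when a caller needs them.
fn expand(groups: &[(u32, &[u32], &[u32])]) -> Vec<(u32, u32, u32)> {
    let mut rows = Vec::new();
    for &(a, bs, cs) in groups {
        for &b in bs {
            for &c in cs {
                rows.push((a, b, c));
            }
        }
    }
    rows
}

fn main() {
    let r = HashTrie::from_rows(&[(1, 10), (1, 11), (2, 20)]);
    let s = HashTrie::from_rows(&[(1, 100), (1, 101), (3, 300)]);
    let groups = join_factorized(&r, &s);
    println!("{} factorized group(s), {} flat row(s)", groups.len(), expand(&groups).len());
}
```

Only key `1` appears in both relations, so the join yields a single group whose expansion has four rows.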

---------

Co-authored-by: Mingwei Samuel <[email protected]>
Co-authored-by: Andre Giron <[email protected]>
…module (#1485)

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/hydro-project/hydroflow/pull/1485).
* #1493
* #1492
* #1489
* #1488
* #1487
* #1486
* __->__ #1485
changed --server-addr flag to --address

Co-authored-by: Mingwei Samuel <[email protected]>
git push --set-upstream origin clean

Co-authored-by: Dee <[email protected]>
…type parameter (#1519)

Now, a location that is a top-level `Process` or `Cluster` corresponds
to a `NoTick`, and for streams inside a tick we wrap the location type
(e.g. `Tick<Process<...>>`). This simplifies type signatures for a lot
of our example code.
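
A rough sketch of what "wrapping the location type" can look like, using stand-in types rather than the real `hydroflow_plus` API. `Process`, `Cluster`, `Tick`, and `Stream` mirror the names used above, but every definition here (including the `Location` trait, `describe`, `Leader`, and this `tick_batch` signature) is a hypothetical simplification.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins for the location tags named in the description.
struct Process<P>(PhantomData<P>);
struct Cluster<C>(PhantomData<C>);
struct Tick<L>(PhantomData<L>);

/// Renders a location type as a string so the nesting is visible at runtime.
trait Location {
    fn describe() -> String;
}

impl<P> Location for Process<P> {
    fn describe() -> String {
        "Process".to_string()
    }
}

impl<C> Location for Cluster<C> {
    fn describe() -> String {
        "Cluster".to_string()
    }
}

impl<L: Location> Location for Tick<L> {
    fn describe() -> String {
        format!("Tick<{}>", L::describe())
    }
}

/// A stream tagged only by its location: a bare `Process` or `Cluster` means
/// "outside any tick", and `Tick<...>` means "inside a tick".
struct Stream<T, L: Location> {
    items: Vec<T>,
    _loc: PhantomData<L>,
}

impl<T, L: Location> Stream<T, L> {
    fn new(items: Vec<T>) -> Self {
        Stream { items, _loc: PhantomData }
    }

    fn location(&self) -> String {
        L::describe()
    }

    /// Moves a stream into a tick by wrapping its location type.
    /// (This simplified version does not prevent nesting ticks.)
    fn tick_batch(self) -> Stream<T, Tick<L>> {
        Stream { items: self.items, _loc: PhantomData }
    }
}

/// Marker type identifying one particular process.
struct Leader;

fn main() {
    let top: Stream<u32, Process<Leader>> = Stream::new(vec![1, 2, 3]);
    println!("{}", top.location());
    let batched = top.tick_batch();
    println!("{} with {} item(s)", batched.location(), batched.items.len());
}
```

With this shape a single type parameter carries both "where" and "inside a tick or not", which is the simplification the description refers to.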
…s in a tick, fix #1518 (#1520)

Adds the minimal reproducer test from @shadaj

Note this may have negative performance implications, as the singleton value is now stored in the state API (heap) instead of locally. If we use singleton syntax, this duplicate allocation could probably be avoided.

> Confirmed that this fixed the bugs in our Paxos implementation, no noticeable performance impact. @shadaj
shadaj and others added 30 commits November 8, 2024 09:32
This also changes the behavior of `with_default_optimize` to be
terminal; users who want to apply further optimizations afterwards
should invoke them explicitly.
…flow_datalog v0.10.0, hydroflow_deploy_integration v0.10.0, hydroflow_macro v0.10.0, lattices_macro v0.5.7, variadics v0.0.7, variadics_macro v0.5.5, lattices v0.5.8, multiplatform_test v0.3.0, pusherator v0.0.9, hydroflow v0.10.0, hydro_deploy v0.10.0, stageleft_macro v0.4.0, stageleft v0.5.0, stageleft_tool v0.4.0, hydroflow_plus v0.10.0, hydro_cli v0.10.0, safety bump 8 crates

SAFETY BUMP: hydroflow_datalog_core v0.10.0, hydroflow_datalog v0.10.0, hydroflow_macro v0.10.0, hydroflow v0.10.0, hydro_deploy v0.10.0, stageleft v0.5.0, hydroflow_plus v0.10.0, hydro_cli v0.10.0
Main breaking change is MDX parsing, which trips up on unescaped `<` in
the generated docs, so we have to adjust the generator logic.
Some rephrasing and extra details, after working through it with Rohit.
…riant (#1559)

Our lifetimes were accidentally made covariant when the lifetime `'a`
was removed from the process/cluster tag type. This fixes that typing
hole, and also loosens some restrictions on the lifetime of deploy
environments.
Reduces namespace pollution when wildcard-importing `hydroflow_plus`.
…e bounded (#1565)

Returning a `Stream` from `union` on unbounded streams was unsound,
since the order of outputs is not deterministic.
Also fixes compiler crashes when using `.enumerate()` on an un-batched
stream.
- Remove TODO.md (contents filed as an issue, #1538)
- Update stack diagram
- Update discussion of stack diagram
- Change `README.md` to reflect the project as a whole, not just
hydroflow.
- Copy Rust getting-started info from Hydroflow docs to HF+ docs
We only need it for examples, so this reduces compilation burden for
`trybuild`.
Significantly improves the appearance of Hydroflow+ logs when the
terminal causes wrapping.
I've decided to leave some more challenging TODOs as comments for now,
just so we can get the live site back into decent shape.

Also changes the "Get Started" button on the landing to point to the HF+
quickstart.
Docusaurus v3 broke Mermaid in the playground due to an API change.
…trengthen determinism guarantees (#1568)

Previously, sending data from a `Cluster` would return a stream assumed
to have deterministic contents **and** ordering, which is false. This
introduces another type parameter for `Stream` which tracks whether
element ordering is expected to be deterministic, and restricts
operators such as `fold` and `reduce` to commutative aggregations
accordingly.
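
The restriction can be sketched with a marker type parameter. This is an illustration under assumptions, not the `hydroflow_plus` API: `NoOrder` is a name used elsewhere in this thread, while `TotalOrder`, `forget_order`, and `fold_commutative` are hypothetical. The compiler cannot check commutativity, so the differently named method is where the caller takes on that promise.

```rust
use std::marker::PhantomData;

// Ordering markers. `TotalOrder` is an assumed name for this sketch.
struct TotalOrder;
struct NoOrder;

struct Stream<T, Order> {
    items: Vec<T>,
    _order: PhantomData<Order>,
}

impl<T, Order> Stream<T, Order> {
    fn new(items: Vec<T>) -> Self {
        Stream { items, _order: PhantomData }
    }

    /// Available for any ordering. The caller promises that `comb` is
    /// commutative, so the result does not depend on element order.
    fn fold_commutative<A>(self, init: A, comb: impl Fn(A, T) -> A) -> A {
        self.items.into_iter().fold(init, comb)
    }
}

impl<T> Stream<T, TotalOrder> {
    /// Order-sensitive fold, only defined when the order is deterministic.
    fn fold<A>(self, init: A, comb: impl Fn(A, T) -> A) -> A {
        self.items.into_iter().fold(init, comb)
    }

    /// Models sending from a cluster: arrival order is no longer guaranteed,
    /// so the ordering marker is downgraded.
    fn forget_order(self) -> Stream<T, NoOrder> {
        Stream { items: self.items, _order: PhantomData }
    }
}

fn main() {
    let ordered: Stream<i32, TotalOrder> = Stream::new(vec![1, 2, 3]);
    // String concatenation is order-sensitive, so it needs `TotalOrder`.
    let digits = ordered.fold(String::new(), |mut acc, x| {
        acc.push_str(&x.to_string());
        acc
    });

    let unordered = Stream::<i32, TotalOrder>::new(vec![3, 1, 2]).forget_order();
    // `unordered.fold(...)` would not compile: `fold` exists only for `TotalOrder`.
    let sum = unordered.fold_commutative(0, |acc, x| acc + x);

    println!("{digits} {sum}");
}
```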
…ced as a global constant (#1574)

This eliminates the need to store `cluster.self_id()` in a local
variable first; instead, you can directly reference `CLUSTER_SELF_ID`.
… introduce timestamped streams (#1584)

Big PR.

First big change is we introduce a `Timestamped` location. This is a bit
of a hybrid between top-level locations and `Tick` locations. The idea
is that you choose where timestamps are generated, and then have a
guarantee that everything after that will be atomically computed (useful
for making sure we add payloads to the log before ack-ing).

The contract is that an operator or module that takes a `Timestamped`
input must still be deterministic regardless of the stamps on messages
(which are hidden unless you `tick_batch`). But unlike a top-level
stream (which has the same constraints), you have the atomicity
guarantee. Right now the guarantee is trivial since we have one global
tick for everything. But in the future, when we want to apply
@davidchuyaya's optimizations, this will help us know when there are
causal dependencies that constrain when data can be sent to others.

Second change is we mark every non-deterministic operator (modulo
explicit annotations such as `NoOrder`) with Rust's `unsafe` keyword.
This makes it super clear where non-determinism is taking place.

I've used this to put `unsafe` blocks throughout our example code and
add `SAFETY` annotations that argue why the non-determinism is safe (or
point out that we've explicitly documented / expect non-determinism). I
also added `#![warn(unsafe_op_in_unsafe_fn)]` to the examples and the
template, since this forces good hygiene of annotating sources of
non-determinism even inside a module that is intentionally
non-deterministic.
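
A small sketch of the annotation pattern described above, using plain functions instead of Hydroflow+ operators. The function names (`pick_any`, `pick_any_nonzero`, `shared_parity`) are hypothetical. The PR uses the crate-level `#![warn(unsafe_op_in_unsafe_fn)]`; this sketch applies the same lint per item so the snippet stays self-contained.

```rust
/// Picks one element with no promise about which one.
///
/// # Safety
/// Nothing here is memory-unsafe. `unsafe` flags non-determinism:
/// callers must not depend on which element is returned.
#[warn(unsafe_op_in_unsafe_fn)]
unsafe fn pick_any(items: &[u64]) -> Option<u64> {
    // A real operator might depend on network arrival order;
    // this stand-in simply takes the last element.
    items.last().copied()
}

/// Intentionally non-deterministic wrapper.
///
/// # Safety
/// Same contract as `pick_any`.
#[warn(unsafe_op_in_unsafe_fn)]
unsafe fn pick_any_nonzero(items: &[u64]) -> Option<u64> {
    let nonzero: Vec<u64> = items.iter().copied().filter(|&x| x != 0).collect();
    // With the lint enabled, the inner call still needs its own block and note.
    // SAFETY: non-determinism is this function's documented contract,
    // so it is passed on to the caller.
    unsafe { pick_any(&nonzero) }
}

/// Deterministic: returns the parity shared by all elements, if there is one.
fn shared_parity(items: &[u64]) -> Option<u64> {
    let first = items.first()? % 2;
    if items.iter().any(|x| x % 2 != first) {
        return None;
    }
    // SAFETY: every element has the same parity, so the answer is the same
    // whichever element `pick_any` returns.
    let picked = unsafe { pick_any(items) }?;
    Some(picked % 2)
}

fn main() {
    println!("{:?}", shared_parity(&[2, 4, 6]));
    // SAFETY: this demo only prints the value and accepts any element.
    println!("{:?}", unsafe { pick_any_nonzero(&[0, 7, 0, 9]) });
}
```

The point of the pattern is that each `unsafe` block marks exactly one place where a reader has to check a determinism argument.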

Paxos changes are mostly refactors, and I verified that the performance
is the same as before.
7 participants