Conflict detection and rebase #374

Open
paraseba opened this issue Nov 5, 2024 · 0 comments · May be fixed by #420

paraseba commented Nov 5, 2024

Design document

Goal

The goal here is to describe and finish designing the mechanism to "rebase" changes.

When user A tries to commit a change, the commit currently fails if user
B has committed since A's session started. This is the best and safest
default, but it's not necessarily what A wants every time. For example,
maybe A wrote to array /array_a and B wrote to /array_b, and those
changes are unrelated. In a case like that, A may decide to go ahead with
the commit, accepting the risks, if they know exactly what B's changes were.
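The current default can be illustrated with a minimal sketch. Everything here (`ToyBranch`, `ToySession`, `ConflictError`, the `s0`/`s1` snapshot ids) is a hypothetical stand-in for illustration, not the project's API: a commit is rejected whenever the branch tip differs from the tip the session started from, regardless of which paths were touched.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when the branch moved after the session started."""


@dataclass
class ToyBranch:
    # Snapshot ids in commit order; the last one is the branch tip.
    history: list[str] = field(default_factory=lambda: ["s0"])

    @property
    def tip(self) -> str:
        return self.history[-1]


@dataclass
class ToySession:
    branch: ToyBranch
    user: str
    base: str = ""  # branch tip seen when the session started
    written_paths: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.base = self.branch.tip

    def write(self, path: str) -> None:
        self.written_paths.add(path)

    def commit(self) -> str:
        # Safe default: refuse to commit if anyone committed since we started.
        if self.branch.tip != self.base:
            raise ConflictError(
                f"{self.user}: expected tip {self.base}, found {self.branch.tip}"
            )
        new_id = f"s{len(self.branch.history)}"
        self.branch.history.append(new_id)
        self.base = new_id
        return new_id


branch = ToyBranch()
a = ToySession(branch, "A")
b = ToySession(branch, "B")
a.write("/array_a")
b.write("/array_b")
b.commit()  # B commits first
try:
    a.commit()  # A fails even though the two paths are unrelated
    outcome = "committed"
except ConflictError as err:
    outcome = f"conflict: {err}"
```

In this toy model A's commit fails purely because the tip moved, which is the situation a rebase is meant to address.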

A rebase, then, is the process of "merging" a change, potentially modifying it,
on top of other pre-existing changes.

We want to provide:

  • A mechanism for users to execute a rebase after a failed commit.
  • A way for users to define which changes are OK to rebase and which are not,
    and how their changes must be modified for a clean rebase. Example: if a
    user wrote to an array but a previous commit deleted that array, the user
    may choose either to fail their commit or to rebase while ignoring any
    writes to the array.
  • An explanation of why a rebase failed, whenever it does.
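One way the last requirement could look is an error type that carries the unresolved conflicts and renders them as a readable report. This is only a sketch; `UnresolvedConflict`, `RebaseFailed`, and their fields are hypothetical names, not part of the design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnresolvedConflict:
    path: str         # node the conflict is about
    ours: str         # what the change being rebased did
    theirs: str       # what the earlier commit did
    snapshot_id: str  # commit that introduced the conflicting change


class RebaseFailed(Exception):
    """Carries every conflict the rebase could not resolve."""

    def __init__(self, conflicts: list[UnresolvedConflict]) -> None:
        self.conflicts = conflicts
        super().__init__(self.explain())

    def explain(self) -> str:
        lines = [f"rebase failed with {len(self.conflicts)} unresolved conflict(s):"]
        for c in self.conflicts:
            lines.append(
                f"  {c.path}: we {c.ours}, but snapshot {c.snapshot_id} {c.theirs}"
            )
        return "\n".join(lines)


failure = RebaseFailed(
    [UnresolvedConflict("/array_a", "wrote chunks", "deleted the array", "s1")]
)
report = failure.explain()
```

Keeping the structured list alongside the message would let callers inspect conflicts programmatically as well as print them.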

Transaction logs

As part of this change we will introduce the concept of a TransactionLog.
These are files we will store on disk, in their own prefix, with the same
id as the corresponding snapshot. A transaction log contains a somewhat
expanded serialization of the ChangeSet.

They serve at least two purposes:

  • An easy way to know what the conflicting commits changed, so that rebases
    can be executed without comparing snapshots (which would be very
    expensive).
  • In the future, an easy way to provide diff functionality.
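To illustrate the first point, here is a rough sketch of conflict detection driven only by transaction logs. It assumes, purely for illustration, that each log can be reduced to the set of node paths it touched; `find_overlaps` and the `s1`/`s2` ids are hypothetical.

```python
def find_overlaps(
    our_paths: set[str], logs_since_base: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Map each later snapshot id to the paths it shares with our change.

    Only the (small) logs are read; no snapshot contents are compared.
    """
    overlaps: dict[str, list[str]] = {}
    for snapshot_id, changed_paths in logs_since_base.items():
        common = our_paths & changed_paths
        if common:
            overlaps[snapshot_id] = sorted(common)
    return overlaps


# Paths touched by the commits made after our session started.
logs = {"s1": {"/array_b"}, "s2": {"/array_a", "/group"}}

unrelated = find_overlaps({"/array_c"}, logs)  # no shared paths
related = find_overlaps({"/array_a"}, logs)    # shares a path with s2
```

A path-level intersection is a coarse first filter; a real check would also need to look at what kind of change each side made to the shared path.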

Transaction logs will be generated from the ChangeSet (and probably a bit
of extra information, like the list of existing nodes), and they will be
written during the commit process.
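A minimal sketch of that generation step follows. The `ToyChangeSet` fields, the JSON encoding, the `transactions/` prefix, and the use of the existing-node list to record the kind of each deleted node are all assumptions made for illustration; the actual ChangeSet and on-disk format are not specified here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyChangeSet:
    # Hypothetical, much-reduced stand-in for the real ChangeSet.
    new_arrays: set[str] = field(default_factory=set)
    deleted_nodes: set[str] = field(default_factory=set)
    written_chunks: dict[str, set[tuple[int, ...]]] = field(default_factory=dict)


def build_transaction_log(changes: ToyChangeSet, existing_nodes: dict[str, str]) -> dict:
    """Expand the change set using extra information (path -> node kind)."""
    return {
        "new_arrays": sorted(changes.new_arrays),
        # Assumption: the log records whether a deleted node was an array or a
        # group, which the change set alone may not know.
        "deleted": [
            {"path": path, "kind": existing_nodes.get(path, "unknown")}
            for path in sorted(changes.deleted_nodes)
        ],
        "written_chunks": {
            path: sorted(list(coord) for coord in coords)
            for path, coords in sorted(changes.written_chunks.items())
        },
    }


def write_transaction_log(store: dict[str, bytes], snapshot_id: str, log: dict) -> str:
    """Write the log under its own prefix, keyed by the snapshot id."""
    key = f"transactions/{snapshot_id}"
    store[key] = json.dumps(log, sort_keys=True).encode()
    return key


store: dict[str, bytes] = {}  # a plain dict stands in for object storage
changes = ToyChangeSet(
    new_arrays={"/b"},
    deleted_nodes={"/a"},
    written_chunks={"/b": {(0, 1), (0, 0)}},
)
log = build_transaction_log(changes, existing_nodes={"/a": "array"})
key = write_transaction_log(store, "s1", log)
```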

Transaction logs can be made optional. For maximum performance users may choose
not to write them, but in that case they give up rebase and diff
functionality.

Conflict resolution

In the most detailed case, conflict resolution could be done interactively.
Users may want to investigate their own change, together with the diffs of
the conflicting changes, and decide in full detail how to modify their
change for the rebase. This sounds like a very advanced use case, and we don't
need to support it initially. We just need to make sure it remains possible in
the future.

In the simpler case, the user will run a rebase after a commit fails with a
conflict. They will call a rebase function, passing a ConflictSolver
that encodes the policy for dealing with different types of conflicts.
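The shape of that call could look roughly like the sketch below. Only the name ConflictSolver comes from this document; `PolicySolver`, `Resolution`, `Conflict`, the conflict kind strings, and the `rebase` signature are hypothetical and only illustrate a policy-driven solver.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    KEEP_OURS = "keep_ours"  # apply our change anyway
    DROP_OURS = "drop_ours"  # rebase without this part of our change
    FAIL = "fail"            # abort the rebase


@dataclass(frozen=True)
class Conflict:
    kind: str  # e.g. "write_to_deleted_array" (hypothetical label)
    path: str


class PolicySolver:
    """Hypothetical ConflictSolver: maps conflict kinds to resolutions."""

    def __init__(self, policy: dict[str, Resolution], default: Resolution = Resolution.FAIL):
        self.policy = policy
        self.default = default

    def solve(self, conflict: Conflict) -> Resolution:
        return self.policy.get(conflict.kind, self.default)


def rebase(our_changes: dict[str, str], conflicts: list[Conflict], solver: PolicySolver) -> dict[str, str]:
    """Return our change, modified per the solver, or raise if it cannot be rebased."""
    rebased = dict(our_changes)
    for conflict in conflicts:
        resolution = solver.solve(conflict)
        if resolution is Resolution.FAIL:
            raise RuntimeError(f"unresolved {conflict.kind} at {conflict.path}")
        if resolution is Resolution.DROP_OURS:
            rebased.pop(conflict.path, None)
    return rebased


ours = {"/array_a": "write chunks", "/array_c": "write chunks"}
conflicts = [Conflict("write_to_deleted_array", "/array_c")]

lenient = PolicySolver({"write_to_deleted_array": Resolution.DROP_OURS})
rebased = rebase(ours, conflicts, lenient)

strict = PolicySolver({})  # every conflict kind falls back to FAIL
try:
    rebase(ours, conflicts, strict)
    strict_outcome = "rebased"
except RuntimeError as err:
    strict_outcome = str(err)
```

Defaulting unknown conflict kinds to failure keeps the safe behavior unless the user explicitly opts in to a resolution.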

Some conflict resolution examples

  • If two changes write to the same chunk, the user can select "ours" or
    "theirs".
  • If writes happen to an entity deleted in a previous change, we may
    support ignoring the write or failing the rebase.
  • TODO: more
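The chunk case can be sketched as follows, under the assumption that "theirs" is already committed, so choosing it simply means dropping our write to the contested chunk. The function name and the data shapes are hypothetical.

```python
def resolve_chunk_conflicts(
    our_writes: dict[tuple[int, ...], bytes],
    their_coords: set[tuple[int, ...]],
    prefer: str,
) -> dict[tuple[int, ...], bytes]:
    """Return the chunk writes we still apply after the rebase."""
    if prefer not in ("ours", "theirs"):
        raise ValueError(f"prefer must be 'ours' or 'theirs', got {prefer!r}")
    if prefer == "ours":
        # Our writes go on top of theirs and overwrite the contested chunks.
        return dict(our_writes)
    # "theirs": keep only the chunks the earlier commit did not touch.
    return {
        coord: payload
        for coord, payload in our_writes.items()
        if coord not in their_coords
    }


our_writes = {(0, 0): b"A", (0, 1): b"A"}
their_coords = {(0, 1), (5, 5)}

keep_ours = resolve_chunk_conflicts(our_writes, their_coords, "ours")
keep_theirs = resolve_chunk_conflicts(our_writes, their_coords, "theirs")
```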

Exhaustive list of conflicts and resolutions

This is WIP

  • When a previous change deleted an array:

    • if chunks were written to it: recoverable by not applying the change
    • if user attributes were set: recoverable by not applying the change
    • if metadata was changed: recoverable by not applying the change
  • When a previous change deleted a group:

    • if user attributes were set on it: recoverable by not applying the change
    • if a new array is created inside of it: recoverable by re-creating the
      implicit group
  • When a previous change created an array:

    • if a node is created on the same path: recoverable by not applying the change
    • if an implicit group is created on the same path: recoverable by not
      applying the change
  • When a previous change created a group:

    • if a node is created on the same path, except if it's implicit
  • When a previous change updated user attributes:

    • if the same node attributes are also updated
  • When a previous change updates zarr metadata

  • When a previous change writes/deletes a chunk
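The entries above that already name a resolution can be captured in a small lookup table. This is a sketch only: the string labels and the `resolution_for` helper are hypothetical, and cases the list leaves open deliberately return `None` rather than a guessed resolution.

```python
from typing import Optional

SKIP_OUR_CHANGE = "do not apply our change"
RECREATE_IMPLICIT_GROUP = "re-create the implicit group"

# (what the previous change did, what our change did) -> resolution.
# Only the combinations the list above marks as recoverable are included.
RECOVERABLE: dict[tuple[str, str], str] = {
    ("deleted_array", "wrote_chunks"): SKIP_OUR_CHANGE,
    ("deleted_array", "set_user_attributes"): SKIP_OUR_CHANGE,
    ("deleted_array", "changed_metadata"): SKIP_OUR_CHANGE,
    ("deleted_group", "set_user_attributes"): SKIP_OUR_CHANGE,
    ("deleted_group", "created_array_inside"): RECREATE_IMPLICIT_GROUP,
    ("created_array", "created_node_same_path"): SKIP_OUR_CHANGE,
    ("created_array", "created_implicit_group_same_path"): SKIP_OUR_CHANGE,
}


def resolution_for(previous: str, ours: str) -> Optional[str]:
    """Return the listed resolution, or None if the case is still undecided."""
    return RECOVERABLE.get((previous, ours))


listed = resolution_for("deleted_group", "created_array_inside")
undecided = resolution_for("updated_user_attributes", "updated_user_attributes")
```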

@paraseba paraseba self-assigned this Nov 5, 2024
@mpiannucci mpiannucci linked a pull request Nov 29, 2024 that will close this issue
2 tasks