Conflict detection and rebase #374

paraseba · 2024-11-05T00:35:38Z

Design document

Goal

The goal is to describe, and finish designing, the mechanism to "rebase" changes.

When user A tries to commit a change, the commit currently fails if user
B has committed since A's session started. This is the best and safest
default, but it's not necessarily what A wants every time. For example,
A may have written to array /array_a while B wrote to /array_b, so the
changes are unrelated. In a case like that, A may decide to go ahead with
the commit, accepting the risks, provided they know exactly what B's changes were.

A rebase, then, is the process of "merging" a change, potentially modifying it,
on top of other pre-existing changes.
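
To make the scenario concrete, here is a minimal sketch of a commit that fails because the branch tip moved, followed by a rebase that succeeds because the two users touched different paths. The `Repo` class, its methods, and the representation of a commit as a set of paths are all assumptions made for illustration; none of them is the real API.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when the branch tip moved, or when both sides changed a path."""


@dataclass
class Repo:
    # Hypothetical model: each commit is just the set of paths it touched,
    # and the number of commits so far plays the role of the snapshot id.
    commits: list[set[str]] = field(default_factory=list)

    def tip(self) -> int:
        return len(self.commits)

    def commit(self, parent: int, paths: set[str]) -> int:
        # Safest default: refuse if anyone committed since `parent`.
        if parent != self.tip():
            raise ConflictError(f"tip moved from {parent} to {self.tip()}")
        self.commits.append(set(paths))
        return self.tip()

    def rebase(self, parent: int, paths: set[str]) -> int:
        # Accept the rebase only if the intervening commits are disjoint.
        for other in self.commits[parent:]:
            overlap = other & paths
            if overlap:
                raise ConflictError(f"both sides changed {sorted(overlap)}")
        return self.tip()


repo = Repo()
a_parent = repo.tip()                    # user A starts a session
repo.commit(repo.tip(), {"/array_b"})    # user B commits first

try:
    repo.commit(a_parent, {"/array_a"})  # A's commit is rejected
    failed = False
except ConflictError:
    failed = True

new_parent = repo.rebase(a_parent, {"/array_a"})  # changes are unrelated
final = repo.commit(new_parent, {"/array_a"})     # now the commit goes through
```

The sketch only checks for overlapping paths; the design below is about replacing that single check with user-defined policies.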

We want to provide:

- A mechanism for users to execute a rebase after a failed commit.
- A way for users to define which changes are OK to rebase and which are
  not, and how their changes must be modified for a clean rebase. Example:
  if the user wrote to an array but a previous commit deleted that array,
  the user may choose either to fail their commit or to simply rebase,
  ignoring any writes to the array.
- An explanation of why a rebase failed, whenever one does.

Transaction logs

As part of this change we will introduce the concept of TransactionLog.
These are files we will store on-disk, in their own prefix, and with the same
id as the corresponding snapshot. The transaction log contains a serialization,
somewhat expanded, of the ChangeSet.

They serve at least two purposes:

- An easy way to know what the conflicting commits changed, so that rebases
  can be executed without having to compare snapshots (which would be very
  expensive).
- In the future, an easy way to provide diff functionality.

Transaction logs will be generated from the ChangeSet (and probably a bit
of extra information, like the list of existing nodes), and they will be
written during the commit process.
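
As a rough illustration of the idea, the sketch below builds a transaction log from a change set and serializes it under its own prefix, keyed by snapshot id. The field names, the JSON encoding, the `transactions` prefix, and the `SNAP1` id are all hypothetical; the actual ChangeSet contents and on-disk format are not specified in this document.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ChangeSet:
    # Hypothetical, heavily simplified view of a session's pending changes.
    new_arrays: list[str] = field(default_factory=list)
    deleted_arrays: list[str] = field(default_factory=list)
    # array path -> list of chunk coordinates that were written
    updated_chunks: dict[str, list[list[int]]] = field(default_factory=dict)


@dataclass
class TransactionLog:
    snapshot_id: str
    new_arrays: list[str]
    deleted_arrays: list[str]
    updated_chunks: dict[str, list[list[int]]]

    @classmethod
    def from_change_set(cls, snapshot_id: str, cs: ChangeSet) -> "TransactionLog":
        # Sort everything so the serialized form is deterministic.
        return cls(
            snapshot_id=snapshot_id,
            new_arrays=sorted(cs.new_arrays),
            deleted_arrays=sorted(cs.deleted_arrays),
            updated_chunks={p: sorted(c) for p, c in sorted(cs.updated_chunks.items())},
        )

    def key(self, prefix: str = "transactions") -> str:
        # Same id as the snapshot, but stored under a separate prefix.
        return f"{prefix}/{self.snapshot_id}"

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self), sort_keys=True).encode()


# Usage: written as one more object during the commit process.
storage: dict[str, bytes] = {}
cs = ChangeSet(new_arrays=["/b", "/a"], updated_chunks={"/a": [[1, 0], [0, 0]]})
log = TransactionLog.from_change_set("SNAP1", cs)
storage[log.key()] = log.to_bytes()
```

A rebase can then read one small object per conflicting commit instead of comparing whole snapshots.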

Transaction logs can be made optional. For maximum performance, users may choose
not to use them, but in that case they give up rebase and diff
functionality.

Conflict resolution

In the most detailed case, conflict resolution could be done interactively.
Users may want to investigate their own change, together with the diffs of
the conflicting changes, and decide in full detail how to modify their
change for the rebase. This sounds like a very advanced use case, and we don't
need to support it initially. We just need to make sure it remains possible in
the future.

In the simpler case, the user will run a rebase after a commit has failed with
a conflict. They will call a rebase function, passing a ConflictSolver
that includes the policy on how to deal with different types of conflicts.
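
One possible shape for this, sketched under assumptions: a `ConflictSolver` maps each kind of conflict to a resolution, and `rebase` either returns the modified change or raises an error that lists every unresolved conflict. The names `Resolution`, `Conflict`, `RebaseError`, the `solve` method, and the `rebase` signature are hypothetical; only the name ConflictSolver comes from this document.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    SKIP = "skip"  # drop the user's conflicting change and keep going
    FAIL = "fail"  # abort the rebase


@dataclass(frozen=True)
class Conflict:
    kind: str  # e.g. "write_to_deleted_array"
    path: str


class RebaseError(Exception):
    """Carries the conflicts so the caller can explain why the rebase failed."""

    def __init__(self, conflicts: list[Conflict]):
        super().__init__(f"unresolved conflicts: {conflicts}")
        self.conflicts = conflicts


@dataclass
class ConflictSolver:
    policy: dict[str, Resolution]
    default: Resolution = Resolution.FAIL  # anything not listed is fatal

    def solve(self, conflict: Conflict) -> Resolution:
        return self.policy.get(conflict.kind, self.default)


def rebase(my_writes: set[str], deleted_by_others: set[str],
           solver: ConflictSolver) -> set[str]:
    """Return the writes that survive the rebase, or raise RebaseError."""
    kept: set[str] = set()
    unresolved: list[Conflict] = []
    for path in sorted(my_writes):
        if path not in deleted_by_others:
            kept.add(path)
            continue
        conflict = Conflict("write_to_deleted_array", path)
        if solver.solve(conflict) is Resolution.FAIL:
            unresolved.append(conflict)
        # Resolution.SKIP: the write is silently dropped.
    if unresolved:
        raise RebaseError(unresolved)
    return kept


lenient = ConflictSolver({"write_to_deleted_array": Resolution.SKIP})
survivors = rebase({"/a", "/gone"}, {"/gone"}, lenient)
```

Collecting all unresolved conflicts before raising, rather than stopping at the first one, is one way to meet the requirement that a failed rebase explains why it failed.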

Some conflict resolution examples

- If two changes write to the same chunk, the user can select "ours" or "theirs".
- If writes happen to an entity deleted in a previous change, we may
  support two options: ignore the write or fail the rebase.
- TODO: more
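
The chunk case can be sketched as follows, assuming each side's chunk writes are available as a mapping from chunk coordinates to some reference. The function name, the `prefer` parameter, and the string references are hypothetical. Since "theirs" is already committed, resolving in its favor simply means dropping our write to that chunk.

```python
Coord = tuple[int, ...]


def resolve_chunk_writes(ours: dict[Coord, str], theirs: dict[Coord, str],
                         prefer: str) -> dict[Coord, str]:
    """Return the chunk writes from `ours` that should still be applied."""
    if prefer not in ("ours", "theirs"):
        raise ValueError(f"unknown preference: {prefer!r}")
    if prefer == "ours":
        # Our writes overwrite whatever the previous commit stored.
        return dict(ours)
    # Prefer theirs: keep only the chunks the previous commit did not write.
    return {coord: ref for coord, ref in ours.items() if coord not in theirs}


ours = {(0, 0): "mine-0", (0, 1): "mine-1"}
theirs = {(0, 1): "theirs-1"}
keep_ours = resolve_chunk_writes(ours, theirs, "ours")
keep_theirs = resolve_chunk_writes(ours, theirs, "theirs")
```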

Exhaustive list of conflicts and resolutions

This is WIP

When a previous change deleted an array:
- if chunks were written to it: recoverable by not applying the change
- if user attributes were set: recoverable by not applying the change
- if metadata was changed: recoverable by not applying the change
When a previous change deleted a group:
- if user attributes were set on it: recoverable by not applying the change
- if a new array is created inside it: recoverable by re-creating the
  implicit group
When a previous change creates an array:
- if a node is created on the same path: recoverable by not applying the change
- if an implicit group is created on the same path: recoverable by not
  applying the change
When a previous change creates a group:
- if a node is created on the same path, except if it's implicit
When a previous change updates user attributes:
- if the same node attributes are also updated
When a previous change updates zarr metadata
When a previous change writes/deletes a chunk
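
The entries filled in so far could be encoded as a lookup table, which makes the gaps in this WIP list easy to spot. This is only a sketch: the string keys and the `resolution_for` helper are hypothetical, and entries for which the list does not yet state a resolution are recorded as `None` rather than guessed.

```python
SKIP = "do not apply the change"
RECREATE = "re-create the implicit group"
UNSPECIFIED = None  # conflict is listed above, but no resolution is stated yet

# (what the previous change did, what our change does) -> resolution
RESOLUTIONS = {
    ("array_deleted", "chunks_written"): SKIP,
    ("array_deleted", "user_attributes_set"): SKIP,
    ("array_deleted", "metadata_changed"): SKIP,
    ("group_deleted", "user_attributes_set"): SKIP,
    ("group_deleted", "array_created_inside"): RECREATE,
    ("array_created", "node_created_same_path"): SKIP,
    ("array_created", "implicit_group_created_same_path"): SKIP,
    # Per the list, this one does not apply when our node is an implicit group.
    ("group_created", "node_created_same_path"): UNSPECIFIED,
    ("user_attributes_updated", "user_attributes_updated"): UNSPECIFIED,
}


def resolution_for(previous: str, ours: str):
    """Return (is_known_conflict, resolution_or_None) for a pair of changes."""
    key = (previous, ours)
    if key not in RESOLUTIONS:
        return (False, None)
    return (True, RESOLUTIONS[key])


# Pairs that are listed as conflicts but still need a decision.
pending = sorted(k for k, v in RESOLUTIONS.items() if v is UNSPECIFIED)
```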


paraseba self-assigned this Nov 5, 2024

paraseba added the design doc 🧑‍🎨 label Nov 5, 2024

mpiannucci linked a pull request Nov 29, 2024 that will close this issue

Python learns to detect conflicts and rebase #420

Draft

2 tasks
