Multiple creations of the same data across a partitioned network results in repeat results in query APIs #394

Open
pospi opened this issue Jul 18, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@pospi
Member

pospi commented Jul 18, 2023

I think this is partially an issue with anchored_record_helpers.rs, but for now it's only possible to de-duplicate this optimistically. (Further work is needed on reads to resolve duplicate writes of the same data in re-synced network partitions.)

@pospi pospi added the bug Something isn't working label Jul 18, 2023
@pospi pospi self-assigned this Jul 18, 2023
@pospi pospi closed this as completed in d09dfe9 Jul 18, 2023
@pospi pospi changed the title Multiple creations of the same Unit result in repeat results in query.units Multiple creations of the same data across a partitioned network results in repeat results in query APIs Jul 18, 2023
@pospi
Member Author

pospi commented Jul 18, 2023

This issue mostly affects Unit records, since most other record types are made unique at the moment they're created by adding some randomBytes() to an internal nonce field.
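
To illustrate why the nonce matters, here is a minimal sketch using only the Rust standard library. `content_address` is a hypothetical stand-in for a content-addressed entry hash (Holochain derives real hashes differently); it only shows that identical content collapses to one address unless a nonce is mixed in.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a content-addressed entry hash.
/// Not a Holochain API: it only demonstrates the principle.
fn content_address(fields: &[&str], nonce: &[u8]) -> u64 {
    // DefaultHasher::new() uses fixed keys, so equal input gives equal output.
    let mut hasher = DefaultHasher::new();
    fields.hash(&mut hasher);
    nonce.hash(&mut hasher);
    hasher.finish()
}
```

Two writes of the same Unit-like data with an empty nonce yield the same address, so any index that references it can end up with repeated entries. Passing different nonce bytes on each write yields different addresses, which is why the other record types are not affected in the same way.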

Now resolved for writes: a synchronised network will no longer add unnecessary create Action headers. Reopening as an issue to resolve generically for reads of "idempotently unique" data (i.e. data with a manually defined retrieval key) which has been duplicated as a result of a network partition or partial sync.

Aside from behaviour in hdk_records/src/anchored_record_helpers.rs, this also affects indexing retrieval logic in hdk_semantic_indexes and hdk_time_indexing. At minimum, any feature which depends on link_if_not_linked in the write phase to ensure uniqueness must also be able to de-duplicate accidental repeat writes in its associated read phase.
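
A rough sketch of what read-phase de-duplication could look like, assuming results come back as a list of links that may contain repeats. `IndexLink` and `dedup_links` are hypothetical names for illustration, not types or functions from hdk or the crates mentioned above.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a link read back from an index; not an hdk type.
#[derive(Clone, Debug, PartialEq)]
struct IndexLink {
    /// Address of the record the link points at.
    target: String,
    /// Time at which the link was written.
    timestamp: u64,
}

/// Keep only the first link seen for each target, preserving input order.
fn dedup_links(links: Vec<IndexLink>) -> Vec<IndexLink> {
    let mut seen = HashSet::new();
    links
        .into_iter()
        // HashSet::insert returns false when the target was already present.
        .filter(|link| seen.insert(link.target.clone()))
        .collect()
}
```

Keeping the first occurrence is an arbitrary choice here; a real implementation might prefer the earliest timestamp or some other deterministic rule so that all agents converge on the same result after a re-sync.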

A complete solution may need to consider other elements, and platform features such as 'bucketing' could be leveraged to 'compact' data in this way in future versions of the Holochain hdi & hdk libs.

@pospi pospi reopened this Jul 18, 2023
@pospi pospi removed their assignment Jul 18, 2023
@pospi pospi added this to the Holochain core stabilising milestone Jul 27, 2023
@pospi
Member Author

pospi commented Jul 27, 2023

I have fixed issues with duplicate Unit record writes in an unpartitioned network in 589aeff. A single agent can no longer cause duplicate entries in the Units read API response by repeatedly writing the same record. This would have actually affected any content-addressable data written into a time index, but Unit is the only record type that operates this way (others have a nonce injected to force them to be unique, so recreating records will result in different hashes).
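
For readers unfamiliar with the pattern, a write-phase guard of this kind can be sketched as below. This is an in-memory illustration under assumed names (`TimeIndex`, `index_if_not_indexed`), not the code from 589aeff or the actual hdk_time_indexing API.

```rust
/// Hypothetical in-memory stand-in for a time index:
/// a list of (timestamp, record address) pairs.
struct TimeIndex {
    entries: Vec<(u64, String)>,
}

impl TimeIndex {
    /// Add the address to the index unless it is already present.
    /// Returns true if a new entry was written, false if it was skipped.
    fn index_if_not_indexed(&mut self, timestamp: u64, address: &str) -> bool {
        let already_indexed = self
            .entries
            .iter()
            .any(|(_, existing)| existing.as_str() == address);
        if already_indexed {
            return false;
        }
        self.entries.push((timestamp, address.to_string()));
        true
    }
}
```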

This does not resolve the issue for writes under partitioned network conditions, which still persists as above.
