Replies: 7 comments 2 replies
-
Raft does not include checksumming. A way to store checksums with keys and values in Khepri would be nice, but this needs to be investigated first, so I'm moving this to a discussion.
-
Yeah, I meant there's a degree of protection initially thanks to replication.
-
What kinds of corruption would you like Khepri to detect/handle? For example: problems with disk storage, problems with memory hardware, bugs in Erlang/Raft/Khepri, etc.
-
Ideally, all kinds :)
-
There's also the question of scope. I understand you built it to store (small) metadata in RabbitMQ, but if it could store data reliably, it could become the recommended general-purpose DB in the BEAM ecosystem. The more data you store, the higher the standard.
-
I personally haven't encountered it. I just thought that it would be a cool feature. But if Ra has it already - that's great!
-
Yeah, I meant Raft the protocol/paper. Ra does offer some checksum features, and we have seen their effects in practice with RabbitMQ. That should be enough for more than 90% of cases. For the others, it may be a good idea to checksum and verify the values stored in Khepri. What is not obvious to me right now is what can be done (besides relying on Ra, that is) to checksum the keys as well. Note that fetching a single value by key does not need this, but when you get a list of keys back, you cannot be certain that exactly the requested keys were loaded. Key corruption specifically is a very rare scenario, but some storage systems (Cassandra, IIRC) care about key integrity as much as about that of values.
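To make the key question a bit more concrete, here is a rough sketch of one possible approach: compute the checksum over the key and the value together, so a record that comes back under the wrong key fails verification. This is plain Python for illustration only. `seal` and `unseal` are hypothetical helpers, not part of Khepri or Ra, and CRC32 is just an assumed choice of checksum.

```python
import zlib

def _crc(key: str, value: bytes) -> int:
    # The NUL byte separates key and value so ("ab", b"c") and ("a", b"bc") differ.
    return zlib.crc32(key.encode("utf-8") + b"\x00" + value)

def seal(key: str, value: bytes) -> bytes:
    """Hypothetical helper: prefix the value with a CRC32 over key + value."""
    return _crc(key, value).to_bytes(4, "big") + value

def unseal(key: str, stored: bytes) -> bytes:
    """Hypothetical helper: return the value or raise ValueError on a mismatch."""
    if len(stored) < 4:
        raise ValueError(f"record for key {key!r} is too short to hold a checksum")
    expected = int.from_bytes(stored[:4], "big")
    value = stored[4:]
    if _crc(key, value) != expected:
        raise ValueError(f"checksum mismatch for key {key!r}")
    return value
```

With this layout, a reader that gets a list of (key, record) pairs back can call `unseal` on each pair and will notice both a damaged value and a record attached to a different key than the one it was written under.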
-
Why
I think this is a fantastic library and a game changer for the BEAM world!
To earn more trust in mission-critical environments, I think it would be cool for it to have a mechanism to protect itself from silent data corruption and the ability to self-heal automatically.
I understand Raft provides some degree of protection, but does it prevent corruption completely? I don't think so.
How
Perhaps it should use checksums? Periodic checks? Verify the checksum on every read?
Obviously such a feature would carry a performance penalty and should be optional.
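A minimal sketch of what I have in mind, assuming a checksum is stored next to each value. It is written in Python against an in-memory dict purely for illustration: `ChecksummedStore`, `verify_on_read`, and `scrub` are made-up names, not an existing or proposed Khepri API.

```python
import zlib

class ChecksummedStore:
    """In-memory stand-in for a key/value store (hypothetical, not Khepri)."""

    def __init__(self, verify_on_read: bool = False):
        # Verification is opt-in because it costs one CRC pass per read.
        self.verify_on_read = verify_on_read
        self._data = {}  # key -> (crc32 of value, value)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = (zlib.crc32(value), value)

    def get(self, key: str) -> bytes:
        crc, value = self._data[key]
        if self.verify_on_read and zlib.crc32(value) != crc:
            raise ValueError(f"checksum mismatch for key {key!r}")
        return value

    def scrub(self) -> list:
        """Periodic check: return the keys whose checksum no longer matches."""
        return sorted(
            key for key, (crc, value) in self._data.items()
            if zlib.crc32(value) != crc
        )
```

A background job could call `scrub()` on a timer and report the keys it returns. In a replicated setup those keys would be the candidates for repair from a healthy replica, but that part is not shown here.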