As I need a persistence solution for the P2P data storage rather soon, I started to look into different database approaches.
Use cases and requirements
Public data from P2P network
Private application data
Data is small and kept in memory
Read only at startup
No query
Write at data changes
Backups
P2P network data
We usually have quite a low amount of data to persist. Most data is in the low MB range or even lower. The only exception is the DAO state with the BSQ blockchain. For that (if it will be integrated into Misq) I am considering breaking the blocks down into separate files, which are then very small as well. Apart from that, we could treat it as a special case, and maybe a database approach is justified there.
We usually read data only at startup and write to disk on changes so that we get the latest state at the next startup. P2P network data is public data and can be reconstructed if it becomes corrupted.
We keep all data in memory (the main use case is to check whether the key for new data is already in the map, to decide if we should relay the message or not), so access is very fast, and we want to avoid duplicated maps (e.g. adding entries to both a DB map and the in-memory map).
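The relay check described above can be sketched roughly as follows. This is an illustrative sketch, not the actual Misq code; the class and method names are made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the in-memory dedup check: the map itself is the
// single source of truth at lookup time, so no parallel DB map is consulted.
public class NetworkDataStore {
    // Key is e.g. the hash of the payload; value is the serialized payload.
    private final ConcurrentHashMap<String, byte[]> dataByHash = new ConcurrentHashMap<>();

    // Returns true if the data was previously unknown and should be relayed.
    public boolean addIfAbsent(String hash, byte[] payload) {
        return dataByHash.putIfAbsent(hash, payload) == null;
    }

    public int size() {
        return dataByHash.size();
    }
}
```

Persisting such a map then only means serializing it to disk on changes and reading it back at startup; the database is never on the hot path of the relay decision.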
Databases
I think a document-based or key-value-based DB would fit our use case better than a relational DB. A relational DB would come with the requirement to build a duplicated schema (in addition to protobuf), and I think that carries quite some cost regarding maintenance, backward compatibility and flexibility, and I don't see the benefits for our use cases.
I tried out LevelDB, MapDB, ObjectBox and LMDB. Here is my quick feedback:
LevelDB:
Used in Bitcoin Core and IPFS. Developed by Google.
Super fast, super simple API, but adding duplicate keys lets the DB grow (though it overwrites the earlier entry); I have not looked deeper into why that is the case. The keys/values are byte arrays, so it would not serve as the in-memory map but would have to be used in parallel (e.g. add/remove on both maps), as we do not want to deserialize the value each time we access the data.
MapDB:
Similar to LevelDB: seems pretty fast and lightweight, and has more flexibility for key/value serializers, but only supports serializers for basic data types. So keeping a Java object graph as a value does not work either, leading to the same issue that we would need to maintain two data sinks.
Lmdb:
Similar to the above: very fast and much more feature-rich than LevelDB, but overkill for what we need.
Objectbox:
They have an annotation-based model with code generation, which looked interesting, but I failed to get it running as the required classes were not generated. It is focused on Android use cases, and I assume my missing Android setup caused the problems.
Conclusion:
So my current conclusion is to stick with file-based persistence but use different strategies for different types of data.
We have the public data, which is stored only for caching reasons to avoid repeated downloads. We can store it at low frequency, and data corruption will not cause critical problems. That is the use case for the P2P network data storage.
As it seems we will stick with protobuf, we should use protobuf serialisation to avoid requiring a duplicate schema. As network data is untrusted, Java serialisation must not be used due to its vulnerabilities.
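The "cache only" strategy for public data could look roughly like this: best-effort writes on changes, and a startup read that simply drops the cache on any failure, since the data can always be re-requested from the network. File names and the API are illustrative assumptions, not the actual implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Sketch of corruption-tolerant caching for public P2P network data.
public class NetworkDataCache {
    // Returns the cached bytes, or empty if the file is missing or unreadable.
    // On read failure the cache is discarded; the data is re-downloadable.
    public static Optional<byte[]> readCache(Path file) {
        try {
            return Optional.of(Files.readAllBytes(file));
        } catch (IOException e) {
            try {
                Files.deleteIfExists(file);
            } catch (IOException ignored) {
                // Nothing more to do; the cache will be rebuilt from the network.
            }
            return Optional.empty();
        }
    }

    // Best-effort write on data changes; a failed write only costs a re-download.
    public static void writeCache(Path file, byte[] serialized) {
        try {
            Files.write(file, serialized);
        } catch (IOException ignored) {
            // Acceptable for cache-only data.
        }
    }
}
```

The bytes passed in would be the protobuf-serialized map of network data, so no second schema is needed.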
Private application data
Examples of private application data in Bisq:
Trades
Disputes
Preferences
User (accounts)
UI-states
DAO: My votes, my proposals,...
All this data comes from trusted, user-generated Java objects, so the security risks of Java serialisation do not apply here.
If we decide to use Java serialisation for these types of data, we avoid the need for protobuf definitions, except for data that will be used in messages to peers, such as account data for trades. Though it is not clear yet whether we want to keep those complex object graphs in that area. Using a more flexible format like JSON might also be worth considering, as we want to be more flexible with custom user-defined payment methods.
I think it is only a small subset of the data that overlaps with network use cases, and that subset needs to be serialized with protobuf for security reasons.
In some performance tests I saw that protobuf is about 5 times faster than Java serialisation and produces about 20% smaller data. But as changes to this data are triggered by user activity and therefore will not happen very frequently (as opposed to network data, which can get updated very frequently), we do not have a use case where we have to write this data 100 times a second. So I think those performance differences will not be relevant. The same goes for data size, as this data is all rather small.
All of it is kept in memory as well, and there is no query use case.
So again, I do not see the use case for a database. Dealing with data corruption in case of failed writes might be the only justified reason. As this data can be critical, we have to ensure data consistency.
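Crash-safe writes for this critical private data can be achieved without a database, for example with the classic write-temp-then-rename pattern: a crash mid-write leaves the old file untouched, so the last consistent state always survives. This is a hedged sketch of one possible approach, not a decided design.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: write the serialized data to a temp file first, then atomically
// rename it over the old file, so a half-written file never replaces a
// consistent one.
public class AtomicFileWriter {
    public static void write(Path target, byte[] serialized) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, serialized);
        try {
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE,
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback for file systems without atomic rename; slightly weaker
            // guarantee, but still avoids partial writes to the target.
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Combined with periodic backup copies of the target file, this should cover the consistency requirement for trades, disputes and the other critical private data listed above.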
We don't need to decide on that until we work on it, but I just wanted to share my preliminary thoughts...