-
On a test set of ~2300 messages, the results are:
Also noting that with Brotli at quality 11, storing messages took about 10x longer, which caused a noticeable slowdown in tracking speed. At quality 5 the difference in speed was negligible, so that seems to be the way to go; quality levels 6-9 did not reduce the size any further compared to 5. Unfortunately, there is no support for custom dictionaries yet, at least in the .NET APIs. They could be interesting to experiment with, since the message JSON format is fairly rigid.
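To illustrate why a custom dictionary could help with a rigid JSON format, here is a minimal Python sketch. It is not Brotli and not the .NET API: it uses zlib's preset dictionary (`zdict`) as a stand-in, and both the dictionary contents and the sample message fields are made-up assumptions, so the sizes it prints say nothing about real Brotli results.

```python
import json
import zlib
from typing import Optional

# Hypothetical preset dictionary: byte sequences assumed to recur in most
# messages. Both sides must use the exact same bytes.
PRESET_DICT = (
    b'"attachments":[],"embeds":[],"mentions":[],"pinned":false,'
    b'"author":{"id":"","username":""},"timestamp":"",'
    b'"content":"","channel_id":"","id":""'
)


def compress(raw: bytes, zdict: Optional[bytes] = None, level: int = 6) -> bytes:
    """Compress with zlib, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level)
    else:
        comp = zlib.compressobj(level, zdict=zdict)
    return comp.compress(raw) + comp.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Reverse compress(); the same dictionary must be supplied if one was used."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    # Made-up message shaped loosely like a chat message object.
    message = json.dumps(
        {
            "id": "1",
            "channel_id": "2",
            "content": "hello",
            "timestamp": "2022-01-01T00:00:00+00:00",
            "author": {"id": "3", "username": "someone"},
            "attachments": [],
            "embeds": [],
            "mentions": [],
            "pinned": False,
        },
        separators=(",", ":"),
    ).encode("utf-8")

    plain = compress(message)
    primed = compress(message, PRESET_DICT)
    assert decompress(primed, PRESET_DICT) == message
    print(len(message), len(plain), len(primed))
```

The gain comes from short messages being able to reference the dictionary instead of spelling out the same keys every time. The trade-off is that the dictionary becomes part of the storage format: changing it makes previously stored blobs unreadable unless the old version is kept around.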
-
As a side note, I might split the main |
-
Does .NET support zstd? That's a pretty amazing compressor, and it's really fast. It compressed a 700MB file in about 1 second for me! (https://gist.github.com/TheTechRobo/091acf60f80d007557ad35821fdb3a6d)
-
I have experimented a bit with LiteDB, which is an embedded JSON database. I have concerns about its stability compared to SQLite, but it would be a more natural way of storing Discord's data. By default, LiteDB databases are quite large compared to SQLite. The stored data also includes a lot of redundancy, so there are opportunities to reduce the size at the expense of time, and sometimes of the ability to fully reconstruct the raw data, but I think that could be configurable. Most users probably don't need the truly raw data, which includes null values and empty arrays. The remaining redundancy could be compressed using knowledge of Discord's JSON formats and common values. That would not allow for simple extraction of raw data using external tools, but the app could support an export to JSON which would perform the decompression. Tests on 25K messages:
Note: "No Empty Values and Author" strips the message author information, which is usually duplicated across messages. It would be possible to store author information separately, possibly even with support for bots/webhooks that might customize author information per-message. The final size would then be larger, but likely not by much, so most of the 50% savings should remain. I have not tested any custom compression yet.
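The two reductions described above can be sketched roughly as follows, in Python for brevity. This is only an illustration under assumptions: the helper names `strip_empty` and `split_author` and the `author_id` field are hypothetical, and the sketch keeps only the first author object seen per id, so per-message customization by bots/webhooks would need extra handling.

```python
from typing import Any, Dict


def strip_empty(value: Any) -> Any:
    """Recursively drop object fields whose value is null or an empty array.

    List elements are cleaned but never removed, so positions are preserved.
    Other "falsy" values such as 0, False, and "" are kept.
    """
    if isinstance(value, dict):
        cleaned = {key: strip_empty(item) for key, item in value.items()}
        return {
            key: item
            for key, item in cleaned.items()
            if item is not None and item != []
        }
    if isinstance(value, list):
        return [strip_empty(item) for item in value]
    return value


def split_author(message: Dict[str, Any], authors: Dict[str, Any]) -> Dict[str, Any]:
    """Move the embedded author object into a shared table keyed by author id.

    Assumes the message has an "author" object with an "id" field. The
    "author_id" reference field is a hypothetical name for this sketch.
    """
    slim = dict(message)
    author = slim.pop("author", None)
    if author is not None:
        # Keep the first version seen; later variations are ignored here.
        authors.setdefault(author["id"], author)
        slim["author_id"] = author["id"]
    return slim


if __name__ == "__main__":
    authors: Dict[str, Any] = {}
    raw = {
        "id": "1",
        "content": "hi",
        "edited_timestamp": None,
        "embeds": [],
        "author": {"id": "7", "username": "a", "avatar": None},
    }
    print(split_author(strip_empty(raw), authors), authors)
```

Both steps are lossy with respect to the raw payload: a field that was explicitly `null` cannot be told apart from a field that was absent once it has been stripped, which is why making this configurable seems reasonable.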
-
Opening an exploration of storing raw message data in the database. This could mean the eventual possibility of resolving some long-standing feature requests, related to data DHT does not currently store:
The obvious issue is that the database size would expand, possibly by an order of magnitude, so this would have to be an optional feature. .NET has native support for Brotli compression, which should help reduce the size.
I will create a proof-of-concept and determine a realistic expectation for the database size if this feature was enabled.
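As a rough idea of what such a measurement could look like, here is a small Python sketch that stores compressed raw JSON as a BLOB in SQLite and sums the stored bytes. It is not the app's actual schema or code: the `raw_messages` table and all function names are hypothetical, and zlib stands in for Brotli because it ships with Python.

```python
import json
import sqlite3
import zlib


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with a hypothetical table for compressed raw messages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_messages ("
        "message_id INTEGER PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def store_raw(conn: sqlite3.Connection, message_id: int, raw_json: str, level: int = 6) -> int:
    """Compress one message's raw JSON, store it, and return the blob size in bytes."""
    blob = zlib.compress(raw_json.encode("utf-8"), level)
    conn.execute(
        "INSERT OR REPLACE INTO raw_messages (message_id, data) VALUES (?, ?)",
        (message_id, blob),
    )
    return len(blob)


def load_raw(conn: sqlite3.Connection, message_id: int) -> str:
    """Fetch and decompress the raw JSON of one message."""
    row = conn.execute(
        "SELECT data FROM raw_messages WHERE message_id = ?", (message_id,)
    ).fetchone()
    return zlib.decompress(row[0]).decode("utf-8")


def stored_bytes(conn: sqlite3.Connection) -> int:
    """Total size of all compressed blobs, excluding SQLite's own overhead."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(LENGTH(data)), 0) FROM raw_messages"
    ).fetchone()
    return total


if __name__ == "__main__":
    conn = open_db()
    raw = json.dumps({"id": "1", "content": "hello " * 50})
    store_raw(conn, 1, raw)
    print(len(raw.encode("utf-8")), stored_bytes(conn))
```

Running this over a real sample of messages, once with and once without compression, would give a first estimate of the size increase; page and index overhead in the database file would come on top of the blob total.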