How do we implement deserialization on network bytestreams? #2
-
I think the best answer to this is "not particularly", but messages are always ordered; that is, you can't really multithread reading from the stream. So I think the best design to start with is to allow a single thread to pass in a stream or buffer. If we want multithreading specifically for building structures out of the data, then the API user should have one thread reading from the stream and making a buffer out of each message, and then sending those buffers out to a pool of threads to be decoded.
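A minimal sketch of that split, assuming System.Threading.Channels as the hand-off between the reading thread and the decoders. The stream is simulated, the one-byte "messages" are stand-ins, and the names `messages`, `reader`, and `decoders` are illustrative rather than part of any planned API:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded so that a slow decoder pool applies backpressure to the reader.
var messages = Channel.CreateBounded<byte[]>(capacity: 16);
int decodedCount = 0;

// The only code that touches the (here simulated) stream.
Task reader = Task.Run(async () =>
{
    for (int i = 0; i < 100; i++)
    {
        byte[] message = { (byte)i }; // stand-in for one framed message
        await messages.Writer.WriteAsync(message);
    }
    messages.Writer.Complete();
});

// A small pool of decoders pulling whole-message buffers off the channel.
Task[] decoders = Enumerable.Range(0, 4)
    .Select(_ => Task.Run(async () =>
    {
        await foreach (byte[] message in messages.Reader.ReadAllAsync())
        {
            // A real decoder would build Cap'n Proto structures here.
            Interlocked.Increment(ref decodedCount);
        }
    }))
    .ToArray();

await reader;
await Task.WhenAll(decoders);
Console.WriteLine($"Decoded {decodedCount} messages");
```

Decoders can finish out of order, so this shape only fits if ordering across messages does not matter or is restored afterwards.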
-
Talked with Open about this. We want to support multithreading, so we'll be going with the memory pool approach for networked bytestreams.
-
Some findings I wanted to jot down in some form of writing:
Pipelines
One thing I stumbled upon that I haven't considered is System.IO.Pipelines, which is what
I am unsure if I want to go all-in on Pipelines though, since it looks like we can only read slices of the buffer at a time before advancing, there's no way to obtain ownership of a segment of the buffer without copying, and I'd like to get as close to zero-copy as I can. Still, it could be a useful reference.
Pooling types
Another point to consider is whether to use
A high-performance parsing library encountered problems with extra allocations on
We may even want to write a custom implementation of
-
The first architectural decision I need to make is "How do we even implement deserialization?" This writing is going to be a "stream of consciousness".
Cap'n Proto's unit of communication is a message. Messages are sequences of binary blobs called "segments", and it's up to the implementers to decide on a framing for segments, but there is a recommended framing for bytestreams.
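For reference, the recommended stream framing starts with a little-endian segment table: a 4-byte count of segments minus one, then one 4-byte size (in 8-byte words) per segment, padded to an 8-byte boundary, followed by the segment contents. A minimal sketch of reading that table, assuming .NET 7+ for `Stream.ReadExactly`; `ReadSegmentTable` and the `MaxSegments` limit are hypothetical names, not part of any existing API:

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hand-built header for a two-segment message: (count - 1) = 1, then the
// sizes 1 and 2 (in 8-byte words), then 4 padding bytes, then 24 body bytes.
var bytes = new byte[16 + 24];
BinaryPrimitives.WriteUInt32LittleEndian(bytes.AsSpan(0, 4), 1);
BinaryPrimitives.WriteUInt32LittleEndian(bytes.AsSpan(4, 4), 1);
BinaryPrimitives.WriteUInt32LittleEndian(bytes.AsSpan(8, 4), 2);

using var stream = new MemoryStream(bytes);
uint[] sizesInWords = ReadSegmentTable(stream);
Console.WriteLine($"{sizesInWords.Length} segments, body starts at byte {stream.Position}");

// Hypothetical helper: reads the segment table and leaves the stream
// positioned at the first byte of the first segment.
static uint[] ReadSegmentTable(Stream input)
{
    const uint MaxSegments = 512; // assumed safety limit, not part of the format

    Span<byte> field = stackalloc byte[4];

    input.ReadExactly(field);
    uint countMinusOne = BinaryPrimitives.ReadUInt32LittleEndian(field);
    if (countMinusOne >= MaxSegments)
        throw new InvalidDataException("Too many segments.");

    var sizes = new uint[countMinusOne + 1];
    for (int i = 0; i < sizes.Length; i++)
    {
        input.ReadExactly(field);
        sizes[i] = BinaryPrimitives.ReadUInt32LittleEndian(field);
    }

    // The header is padded to an 8-byte boundary: 4 + 4 * N bytes is a
    // multiple of 8 only when N is odd, so an even N is followed by 4 pad bytes.
    if (sizes.Length % 2 == 0)
        input.ReadExactly(field);

    return sizes;
}
```

Once the table is known, the total message length is known up front, which is what makes it possible to read a whole message into a single buffer.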
The question is how do we pull those sequences from whatever source we have (let's just say it's a network bytestream for now)?
In the C++ implementation, it seems like a new MessageReader is instantiated with a reference to the bytestream for each message that is read. Then the message bytes are read lazily, as more segments are requested. If we went down this path for the C# implementation, we'd face two issues:
1. MessageReader would have to be synchronized in some way (with a Mutex), or locked to a single thread.
2. Effectively only a single thread will be able to read a message at a time.
Maybe problem 1 could be mitigated by having the MessageReader take ownership of the stream, and some clever use of value types or ref structs, but I'm not certain.
Another approach is to read the entirety of a message into a buffer and then pass slices of that buffer to the deserializer. We could avoid GC pressure by using a MemoryPool, which is designed for this use case. Each message can own its piece of the memory pool using the Memory<T> object without requiring a copy.
The biggest downside here, of course, is that we waste memory and time in some use cases (e.g. the application only reads a single field which is defined early in the message). But the tradeoff here is that we don't need to synchronize MessageReader in a multithreaded context.
I guess another good question to answer is whether I should worry about multithreading scenarios at all. Does the RPC specification require messages to arrive in a particular order? I don't think so, but it would be good to get confirmation on this.
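The buffer-per-message approach described above could look roughly like this. It is a sketch under assumptions: the segment sizes are taken as already parsed from the framing header, `ReadSegments` and `MaxMessageBytes` are hypothetical names, and `Stream.ReadExactly` requires .NET 7+. The rented buffer is tracked through `IMemoryOwner<byte>`, and each segment is a `ReadOnlyMemory<byte>` slice of it, so no bytes are copied after the initial read:

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.IO;

// Demo input: the body of a message with two segments of 1 and 2 words
// (8 and 16 bytes), with the framing header already consumed.
var body = new byte[24];
body[8] = 0xAB; // first byte of the second segment
using var source = new MemoryStream(body);

// Disposing the owner returns the buffer to the pool; the slices must
// not be used after that point.
using IMemoryOwner<byte> owner = ReadSegments(source, new uint[] { 1, 2 }, out var segments);
Console.WriteLine($"{segments.Count} segments, second starts with 0x{segments[1].Span[0]:X2}");

// Hypothetical helper: reads one whole message into a pooled buffer and
// exposes each segment as a slice of that buffer.
static IMemoryOwner<byte> ReadSegments(
    Stream input, ReadOnlySpan<uint> sizesInWords, out List<ReadOnlyMemory<byte>> segments)
{
    const long MaxMessageBytes = 64 * 1024 * 1024; // assumed limit

    long totalBytes = 0;
    foreach (uint words in sizesInWords)
        totalBytes += (long)words * 8;
    if (totalBytes > MaxMessageBytes)
        throw new InvalidDataException("Message too large.");

    IMemoryOwner<byte> rented = MemoryPool<byte>.Shared.Rent((int)totalBytes);
    try
    {
        // The pool may hand back more than requested, so trim to the exact size.
        Memory<byte> message = rented.Memory.Slice(0, (int)totalBytes);
        input.ReadExactly(message.Span);

        segments = new List<ReadOnlyMemory<byte>>(sizesInWords.Length);
        int offset = 0;
        foreach (uint words in sizesInWords)
        {
            int length = (int)words * 8;
            segments.Add(message.Slice(offset, length));
            offset += length;
        }
        return rented;
    }
    catch
    {
        rented.Dispose(); // do not leak the rental if the read fails
        throw;
    }
}
```

The cost noted above still applies: the whole message is read even if the caller only touches one early field.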