How do we implement deserialization on network bytestreams? #2

ILikePizza555 · 2024-04-29T03:55:04Z

ILikePizza555
Apr 29, 2024
Maintainer

The first architectural decision I need to make is "How do we even implement deserialization?" This post is going to be a stream of consciousness.

Cap'n Proto's unit of communication is a message. Messages are sequences of binary blobs called "segments", and it's up to the implementers to decide on a framing for segments, but there is a recommended framing for bytestreams.
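For reference, here is a minimal sketch of parsing the segment table in the recommended stream framing: a little-endian uint32 holding the segment count minus one, then one little-endian uint32 per segment giving its length in 8-byte words, then zero padding up to an 8-byte boundary. The name `ReadSegmentTable` and the `maxSegments` cap are made up for illustration and are not part of any existing API.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Parses the segment table at the start of a stream-framed message.
// Returns each segment's size in 8-byte words; headerBytes is the table size including padding.
// maxSegments is a hypothetical safety cap, not something the framing itself defines.
static uint[] ReadSegmentTable(ReadOnlySpan<byte> data, out int headerBytes, int maxSegments = 512)
{
    // Bytes 0..3: (segment count - 1) as a little-endian uint32.
    uint countMinusOne = BinaryPrimitives.ReadUInt32LittleEndian(data);
    if (countMinusOne >= (uint)maxSegments)
        throw new InvalidDataException("Too many segments.");

    var sizes = new uint[countMinusOne + 1];
    for (int i = 0; i < sizes.Length; i++)
    {
        // Then one little-endian uint32 per segment: its length in words.
        sizes[i] = BinaryPrimitives.ReadUInt32LittleEndian(data.Slice(4 + 4 * i, 4));
    }

    // The table is zero-padded up to the next 8-byte boundary.
    headerBytes = (4 + 4 * sizes.Length + 7) & ~7;
    return sizes;
}

// Two segments of 2 words and 1 word: 12 table bytes plus 4 padding bytes.
byte[] sample = { 1, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 };
uint[] words = ReadSegmentTable(sample, out int tableBytes);
Console.WriteLine($"{words.Length} segments, table is {tableBytes} bytes");
```

Because the table comes first, the total message size is known before any segment data is read.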

The question is how do we pull those sequences from whatever source we have (let's just say it's a network bytestream for now)?

In the C++ implementation, it seems like a new MessageReader is instantiated with a reference to the bytestream for each message that is read. Then the message bytes are read lazily, as more segments are requested. If we went down this path for the C# implementation, we'd face two issues:

1. Creating a heap-allocated object for each message would lead to a lot of GC pressure.
2. MessageReader would have to be synchronized in some way (e.g. with a mutex) or confined to a single thread. Effectively, only a single thread would be able to read a message at a time.

Maybe problem 1 could be mitigated by having the MessageReader take ownership of the stream, plus some clever use of value types or ref structs, but I'm not certain.

Another approach is to read the entirety of a message into a buffer and then pass slices of that buffer to the deserializer. We could avoid GC pressure by using a MemoryPool<T>, which is designed for this use case. Each message can own its rented block through an IMemoryOwner<T> and hand out Memory<T> slices of it without requiring a copy.
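A rough sketch of that buffer-first approach, assuming the recommended stream framing and .NET 7 or later (for `Stream.ReadExactly`). `ReadMessage` and `MaxSegments` are hypothetical names for illustration; this is not a proposal for the final API.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO;

// Reads one stream-framed message into a single pooled buffer.
// The caller owns the returned IMemoryOwner and must Dispose it to return the buffer.
static IMemoryOwner<byte> ReadMessage(Stream stream, out ReadOnlyMemory<byte>[] segments)
{
    const int MaxSegments = 512; // hypothetical cap, chosen for the sketch

    Span<byte> field = stackalloc byte[4];
    stream.ReadExactly(field);
    uint countMinusOne = BinaryPrimitives.ReadUInt32LittleEndian(field);
    if (countMinusOne >= MaxSegments) throw new InvalidDataException("Too many segments.");
    int count = (int)countMinusOne + 1;

    var lengths = new int[count];
    int totalBytes = 0;
    for (int i = 0; i < count; i++)
    {
        stream.ReadExactly(field);
        // Segment sizes are in 8-byte words; checked() rejects sizes that overflow int.
        lengths[i] = checked((int)BinaryPrimitives.ReadUInt32LittleEndian(field) * 8);
        totalBytes = checked(totalBytes + lengths[i]);
    }
    // The table is padded to 8 bytes, which happens exactly when the count is even.
    if (count % 2 == 0) stream.ReadExactly(field);

    // The pool may hand back more than requested, so slice to the real size.
    IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(totalBytes);
    try
    {
        Memory<byte> body = owner.Memory.Slice(0, totalBytes);
        stream.ReadExactly(body.Span);

        // Each segment is a view over the same rented block; nothing is copied.
        segments = new ReadOnlyMemory<byte>[count];
        int offset = 0;
        for (int i = 0; i < count; i++)
        {
            segments[i] = body.Slice(offset, lengths[i]);
            offset += lengths[i];
        }
        return owner;
    }
    catch
    {
        owner.Dispose(); // do not leak the rental if the stream ends early
        throw;
    }
}

byte[] wire = new byte[40]; // 16-byte table, then segments of 16 and 8 bytes
wire[0] = 1; wire[4] = 2; wire[8] = 1;
using (IMemoryOwner<byte> message = ReadMessage(new MemoryStream(wire), out var parts))
{
    Console.WriteLine($"{parts.Length} segments, first is {parts[0].Length} bytes");
}
```

Once the owner is handed off, the buffer can be decoded on any thread without touching the stream again.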

The biggest downside here, of course, is that we waste memory and time in some use cases (e.g. the application only reads a single field that is defined early in the message). The tradeoff is that we don't need to synchronize MessageReader in a multithreaded context.

I guess another good question to answer is if I should worry about multithreading scenarios at all. Does the RPC specification require messages to arrive in a particular order? I don't think so but it would be good to get confirmation on this.

octylFractal · 2024-04-29T05:38:25Z

octylFractal
Apr 29, 2024

Does the RPC specification require messages to arrive in a particular order?

I think the best answer to this is "not particularly", but messages are always ordered on the stream; that is, you can't really multithread reading from the stream. So I think the best design to start with is to let a single thread pass in a stream or buffer. If we want multithreading specifically for building structures out of the data, the API user should have one thread reading from the stream and making a buffer for each message, then sending those buffers out to a pool of threads to be decoded.
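One way that split could look, sketched with System.Threading.Channels. `PumpAsync`, the `decode` callback, and the capacity of 16 are placeholders for illustration. The channel is bounded so the reading side waits when the decoders fall behind.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// One producer (the stream reader) feeds whole-message buffers to several decoder tasks.
// `decode` stands in for the real deserializer, which does not exist yet.
static async Task<int> PumpAsync(IEnumerable<byte[]> messages, int decoderCount, Action<byte[]> decode)
{
    var channel = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(16)
    {
        SingleWriter = true,  // only the reading side writes
        SingleReader = false, // many decoders read
    });

    int decoded = 0;
    Task[] decoders = Enumerable.Range(0, decoderCount).Select(_ => Task.Run(async () =>
    {
        await foreach (byte[] message in channel.Reader.ReadAllAsync())
        {
            decode(message);
            Interlocked.Increment(ref decoded);
        }
    })).ToArray();

    // The single reader: in real code this loop would pull framed messages off the stream.
    foreach (byte[] message in messages)
    {
        await channel.Writer.WriteAsync(message); // waits while the channel is full
    }
    channel.Writer.Complete();

    await Task.WhenAll(decoders);
    return decoded;
}

int handled = await PumpAsync(Enumerable.Range(0, 100).Select(_ => new byte[8]), 4, _ => { });
Console.WriteLine($"decoded {handled} messages");
```

Note that decoders finish in no particular order, so anything that depends on stream order has to be tracked separately (for example by tagging each buffer with a sequence number).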

1 reply

aiverson Apr 29, 2024
Maintainer

Remember that we want to relax the ordering requirement as we switch to datagram delivery and implement the causal DAG.

ILikePizza555 · 2024-04-29T16:08:17Z

ILikePizza555
Apr 29, 2024
Maintainer Author

Talked with Open about this. We want to support multithreading, so we'll be going with the memory pool approach for networked bytestreams.

1 reply

aiverson Apr 29, 2024
Maintainer

Just remember to ensure that message size limits work, read amplification limits work, and back pressure works.
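A small sketch of what the first two checks might look like, assuming sizes are counted in 8-byte words. `CheckedMessageWords` and `TryConsume` are hypothetical helpers and the limits are arbitrary. The idea is to reject an oversized message from its segment table alone, before renting any buffer, and to charge every pointer traversal against a shared word budget.

```csharp
using System;
using System.IO;
using System.Threading;

// Message size limit: sum the declared segment sizes and fail before allocating anything.
static long CheckedMessageWords(ReadOnlySpan<uint> segmentWords, long maxMessageWords)
{
    long total = 0;
    foreach (uint words in segmentWords)
    {
        total += words;
        if (total > maxMessageWords)
            throw new InvalidDataException("Message exceeds the configured size limit.");
    }
    return total;
}

// Read amplification limit: every traversal spends words from a per-message budget.
// Interlocked keeps the budget consistent if several threads decode the same message.
static bool TryConsume(ref long remainingWords, long words)
{
    return Interlocked.Add(ref remainingWords, -words) >= 0;
}

long budget = 10;
Console.WriteLine(CheckedMessageWords(new uint[] { 2, 1 }, maxMessageWords: 8));
Console.WriteLine(TryConsume(ref budget, 6)); // fits in the budget
Console.WriteLine(TryConsume(ref budget, 6)); // budget exhausted
```

Back pressure is a separate concern: it comes from bounding how many undecoded buffers may be in flight between the reader and the decoders.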

ILikePizza555 · 2024-04-29T23:52:14Z

ILikePizza555
Apr 29, 2024
Maintainer Author

Some findings I wanted to jot down in some form of writing:

Pipelines

One thing I stumbled upon that I hadn't considered is System.IO.Pipelines, which is what System.Buffers was made to support.

I am unsure if I want to go all-in on Pipelines, though. It looks like we can only read slices of the buffer before advancing, and there's no way to obtain ownership of a segment of the buffer without copying. I'd like to get as close to zero-copy as I can. Still, it could be a useful reference.

Pooling types

Another point to consider is whether to use ArrayPool<T> over MemoryPool<T>. From my research it seems like ArrayPool<T> makes fewer higher-gen GC allocations, but I'd need to benchmark.

A high-performance parsing library encountered problems with extra allocations on MemoryPool<T>: https://endjin.com/blog/2020/09/arraypool-vs-memorypool-minimizing-allocations-ais-dotnet
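To make the difference concrete, here is the same trivial operation written against both pools. The function names are made up. As far as I can tell, `MemoryPool<T>.Shared` is backed by `ArrayPool<T>.Shared` and allocates a small owner object on every `Rent`, while `ArrayPool<T>` hands back the array itself; treat that as an assumption to confirm with a benchmark.

```csharp
using System;
using System.Buffers;

// ArrayPool: Rent returns the byte[] directly, and the caller must Return it.
static long SumWithArrayPool(ReadOnlySpan<byte> payload)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(payload.Length); // may be longer than requested
    try
    {
        payload.CopyTo(rented);
        long sum = 0;
        foreach (byte b in rented.AsSpan(0, payload.Length)) sum += b;
        return sum;
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(rented);
    }
}

// MemoryPool: Rent returns an IMemoryOwner<byte>, and Dispose gives the buffer back.
static long SumWithMemoryPool(ReadOnlySpan<byte> payload)
{
    using IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(payload.Length);
    Span<byte> span = owner.Memory.Span.Slice(0, payload.Length);
    payload.CopyTo(span);
    long sum = 0;
    foreach (byte b in span) sum += b;
    return sum;
}

byte[] data = { 1, 2, 3 };
Console.WriteLine(SumWithArrayPool(data));
Console.WriteLine(SumWithMemoryPool(data));
```

The owner object is what makes ownership transfer between threads convenient, so the extra allocation may be worth paying once per message.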

We may even want to write a custom implementation of MemoryPool<T> to support our needs. I found an old discussion on the dotnet issue tracker about the implementation of System.IO.Pipelines:
dotnet/runtime#27748

2 replies

ILikePizza555 Apr 30, 2024
Maintainer Author

I also found this NuGet package, which may come in handy later: https://www.nuget.org/packages/CommunityToolkit.HighPerformance

ILikePizza555 Apr 30, 2024
Maintainer Author

I think the Pipe implementation might be a useful reference for high-performance I/O in .NET, but it falls short in a few areas:

Reading out of a Pipe pretty much necessitates a copy. We want to rent out ownership of the buffer.
Pipe assumes that you'll read entire segments of the stream at a time in order to search for a delimiter, but the bytestream framing of Cap'n Proto means that we know how large a message is up front, so we can read from the stream such that subsequent reads always align on a message boundary.
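A short sketch of the copy in question, assuming .NET 6 or later for `PipeReader.ReadAtLeastAsync`. `ReadExactlyFromPipeAsync` is a made-up helper. The memory behind `ReadResult.Buffer` belongs to the pipe and is recycled after `AdvanceTo`, so bytes that must outlive the call have to be copied out first.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;

// Pulls exactly `length` bytes out of a PipeReader.
static async Task<byte[]> ReadExactlyFromPipeAsync(PipeReader reader, int length)
{
    ReadResult result = await reader.ReadAtLeastAsync(length);
    ReadOnlySequence<byte> buffer = result.Buffer;
    if (buffer.Length < length)
        throw new EndOfStreamException("Stream ended in the middle of a message.");

    ReadOnlySequence<byte> message = buffer.Slice(0, length);
    byte[] copy = new byte[length];
    message.CopyTo(copy.AsSpan()); // the copy we would like to avoid

    // Consume only this message; any extra buffered bytes stay for the next call.
    reader.AdvanceTo(message.End);
    return copy;
}

PipeReader pipe = PipeReader.Create(new MemoryStream(new byte[] { 1, 2, 3, 4, 5 }));
byte[] first = await ReadExactlyFromPipeAsync(pipe, 3);
byte[] second = await ReadExactlyFromPipeAsync(pipe, 2);
Console.WriteLine($"{first.Length} then {second.Length} bytes");
```

Reading straight from the stream into a rented buffer skips both the pipe's internal buffer and this copy, at the cost of managing the framing ourselves.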
