Provide access to the raw byte data for a JsonElement for efficient transcoding of simple, custom types without allocating an intermediate string. #42839

mwadams · 2020-09-29T08:52:55Z

Background and Motivation

There are a large number of datatypes that need to be implemented as formats of the string type. The canonical example is time handling, but there are many others. For instance, we choose to model our .NET Date/Time types using the NodaTime entities, to better map to the intent of JSON schema. (See this issue for details.)

On the writing side this is straightforward, as we can use Utf8JsonWriter to encode and write the values efficiently.

On the reading side, we can do this efficiently if we drop to Utf8JsonReader; otherwise we have to allocate dotnet strings and then parse them into the target types.
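For illustration, the allocating path described above might look like the following minimal sketch. The payload and the "range" property name are hypothetical; the calls (JsonDocument.Parse, GetString, long.Parse) are existing APIs.

```csharp
using System;
using System.Text.Json;

// Hypothetical payload: a string value shaped like "<long>-<long>".
using JsonDocument document = JsonDocument.Parse("{\"range\":\"1234567-7654321\"}");
JsonElement element = document.RootElement.GetProperty("range");

// GetString() transcodes the UTF-8 bytes into a newly allocated System.String.
string text = element.GetString() ?? string.Empty;

// Parsing then runs over UTF-16 chars rather than the bytes the document already holds.
int separatorIndex = text.IndexOf('-');
long first = long.Parse(text.AsSpan(0, separatorIndex));
long second = long.Parse(text.AsSpan(separatorIndex + 1));

Console.WriteLine($"{first}, {second}");
```

The intermediate string exists only to be parsed and discarded, which is the allocation this proposal aims to avoid.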

Dropping to Utf8JsonReader means we then have to do all of the heavy lifting, all the time, even though JsonDocument provides us with a perfectly efficient parse over the higher-level structure and efficient access to properties.

We would like to be able to access the raw bytes that back the JsonElement when that is necessary to efficiently decode an instance of a type like this.

JsonDocument has parsed the data using its internal db/row structure, and decoding of values internally is handled via GetRawValue() which returns a ReadOnlyMemory<byte> based on the given start index, and searching for an end-index.

JsonElement offers GetRawText() which uses JsonDocument.GetRawValue() and then transcodes it to a string.

We would like JsonElement.GetRawValue() which simply eliminates the transcoding to a string.

The downside of this is that people could potentially misuse the content (it has not been unescaped or validated, for example), but it would enable a lot of customization scenarios without throwing out all the good (efficient!) work JsonDocument has already done for us.

Proposed API

public readonly partial struct JsonElement
{
+      /// <summary>
+      ///   Gets the original input data backing this value, returning it as a <see cref="ReadOnlyMemory{T}"/> of <see cref="byte"/> representing the UTF8 encoded text.
+      /// </summary>
+      /// <returns>
+      ///   The original input data backing this value, returning it as a <see cref="ReadOnlyMemory{T}"/> of <see cref="byte"/>.
+      /// </returns>
+      /// <exception cref="ObjectDisposedException">
+      ///   The parent <see cref="JsonDocument"/> has been disposed.
+      /// </exception>
+      /// <remarks>
+      ///  This provides the raw, escaped, UTF8 encoded text. You should consider using <see cref="GetRawText()"/> if you
+      ///  require the same raw text as a <see cref="string"/>, or <see cref="GetString()"/> if you require an unescaped <see cref="string"/> value.
+      /// </remarks>
+      public ReadOnlyMemory<byte> GetRawValue()
+      {
+          CheckValidInstance();
+
+          return _parent.GetRawValue(_idx);
+      }
}

Usage Examples

// Example of parsing a value which is structured like "#######-#######".
// Assumes the proposed GetRawValue() returns the value without its surrounding quotes.
ReadOnlySpan<byte> value = myElement.GetRawValue().Span;
int separatorIndex = value.IndexOf((byte)'-');
// Guard against a missing separator before slicing.
if (separatorIndex > 0
    && Utf8Parser.TryParse(value.Slice(0, separatorIndex), out long firstLong, out int consumedFirst)
    && consumedFirst == separatorIndex
    && Utf8Parser.TryParse(value.Slice(separatorIndex + 1), out long secondLong, out int consumedSecond)
    && consumedSecond == value.Length - (separatorIndex + 1))
{
    result = new MyPairOfLongs(firstLong, secondLong);
    return true;
}
result = MyPairOfLongs.Empty;
return false;

Alternative Designs

You can always fall back to Utf8JsonReader for this, but that entails dealing with the ValueSequence/ValueSpan and managing the buffers. JsonDocument has already done that for you with this approach.
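As a rough sketch of what that fallback involves, the following reads the same kind of value with Utf8JsonReader. The payload and the "range" property name are hypothetical, and a real implementation would likely rent a buffer rather than call ToArray() for multi-segment input.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;
using System.Text.Json;

// Hypothetical payload; the reader works directly over UTF-8 bytes.
byte[] utf8Json = Encoding.UTF8.GetBytes("{\"range\":\"1234567-7654321\"}");
var reader = new Utf8JsonReader(utf8Json);

long first = 0, second = 0;
bool parsed = false;

while (reader.Read())
{
    // With the reader, the caller walks the structure token by token.
    if (reader.TokenType == JsonTokenType.PropertyName && reader.ValueTextEquals("range"))
    {
        reader.Read(); // advance to the property value

        // Multi-segment input exposes ValueSequence instead of ValueSpan,
        // so the caller has to flatten it into a buffer before parsing.
        ReadOnlySpan<byte> value = reader.HasValueSequence
            ? reader.ValueSequence.ToArray()
            : reader.ValueSpan;

        int separatorIndex = value.IndexOf((byte)'-');
        parsed = separatorIndex > 0
            && Utf8Parser.TryParse(value.Slice(0, separatorIndex), out first, out int consumedFirst)
            && consumedFirst == separatorIndex
            && Utf8Parser.TryParse(value.Slice(separatorIndex + 1), out second, out int consumedSecond)
            && consumedSecond == value.Length - (separatorIndex + 1);
    }
}

Console.WriteLine(parsed ? $"{first}, {second}" : "not parsed");
```

Even in this small case the caller owns the token loop and the ValueSequence/ValueSpan branch, which is work JsonDocument has already done when a JsonElement is in hand.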

You could also use WriteTo(Span<byte>) - but that would require you to allocate (and potentially recycle, although that is not guaranteed) a target buffer. Whether or not you need to allocate, it also means making additional copies of data, which is never good for performance.
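A sketch of the copy-based alternative follows, under the assumption that the existing JsonElement.WriteTo(Utf8JsonWriter) overload is used with an ArrayBufferWriter<byte> as the target (the shipped API takes a writer rather than a span). The payload and property name are hypothetical.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text.Json;

using JsonDocument document = JsonDocument.Parse("{\"range\":\"1234567-7654321\"}");
JsonElement element = document.RootElement.GetProperty("range");

// A target buffer has to be allocated (or rented) before anything can be read.
var buffer = new ArrayBufferWriter<byte>();
using (var writer = new Utf8JsonWriter(buffer))
{
    element.WriteTo(writer); // copies the value's UTF-8 bytes into the buffer
    writer.Flush();
}

// The written JSON string includes its surrounding quotes; slice them off.
ReadOnlySpan<byte> written = buffer.WrittenSpan;
ReadOnlySpan<byte> value = written.Slice(1, written.Length - 2);

long first = 0, second = 0;
int separatorIndex = value.IndexOf((byte)'-');
bool parsed = separatorIndex > 0
    && Utf8Parser.TryParse(value.Slice(0, separatorIndex), out first, out int consumedFirst)
    && consumedFirst == separatorIndex
    && Utf8Parser.TryParse(value.Slice(separatorIndex + 1), out second, out int consumedSecond)
    && consumedSecond == value.Length - (separatorIndex + 1);

Console.WriteLine(parsed ? $"{first}, {second}" : "not parsed");
```

This avoids the string, but the buffer, the writer, and the extra copy are all overhead that direct access to the backing bytes would not incur.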

Risks

The chief risk is that people try to use this low-level API without understanding how the underlying UTF8 byte stream actually works. Ensuring that people are pointed at System.Buffers.Text.Utf8Parser should help mitigate this.

mwadams · 2020-09-29T10:33:10Z

It is also possible that people could try to hang on to the ReadOnlyMemory<byte> after the underlying document is disposed (just as they could the JsonElement itself).

mwadams · 2020-09-29T12:30:05Z

If a future implementation were to create a synthetic JsonElement that was not backed by a JsonDocument (and a corresponding slice of memory), then this would become an 'expensive' method: it would need to allocate a buffer and write the text into it. Of course, this would still be no worse than the existing GetRawText() method, which would face the same problem, as there would be no "raw text" to get.

However, I don't believe the primary use case for this (mapping to a specific dotnet type from a string-like value) is likely to be using these synthetic types - it is optimising for reading from source. YMMV.

mwadams · 2020-10-01T10:13:09Z

It is also possible that people could try to hang on to the ReadOnlyMemory<byte> after the underlying document is disposed (just as they could the JsonElement itself).

@idg10 has suggested flipping the API to pass a callback which takes a ReadOnlySpan<byte>.

That's definitely worth considering. It would involve allocating a delegate per call, and would also make the API somewhat more complex for what I consider to be a negligible benefit given that we already have the lifetime consideration for the JsonElement itself.

In implementation, I also considered adding the method to JsonProperty (for symmetry) but that does not support this specific use case, so I think that should be a separate change with a separate justification.

eiriktsarpalis · 2021-10-20T19:24:09Z

Quoting from @bartonjs in an older issue:

That's something we're explicitly keeping out of the API. JsonDocument and JsonElement can apply over UTF-8 or UTF-16 data, exposing the span removes that abstraction.

Related to #54410, we're planning on implementing this for .NET 7.

mwadams added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Sep 29, 2020

Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Sep 29, 2020

layomia removed the untriaged New issue has not been triaged by the area owner label Sep 30, 2020

layomia added this to the Future milestone Sep 30, 2020

mwadams mentioned this issue Oct 1, 2020

Issue 42839 #42945

Closed

eiriktsarpalis closed this as completed Oct 20, 2021

ghost locked as resolved and limited conversation to collaborators Nov 20, 2021
