add Utf8JsonTransformer #2239

Merged
merged 18 commits into from
Jun 15, 2024
Conversation

@knaeckeKami knaeckeKami commented Jun 8, 2024

New Pull Request Checklist

  • I have searched for a similar pull request in the project and found none
  • I have read the Documentation
  • I have updated this branch with the latest main branch to avoid conflicts (via merge from master or rebase)
  • I have added the required tests to prove the fix/feature I'm adding
  • I have updated the documentation (if necessary)
  • I have run the tests without failures
  • I have updated the CHANGELOG.md in the corresponding package

Additional context and info (if any)

fixes #2238

This PR adds the Utf8JsonTransformer.

I made sure that it stays as close to BackgroundTransformer as possible.
The only behavior change is that it checks the Content-Length header to decide whether to switch to a background isolate for decoding.
If the Content-Length header is not set or not valid, it falls back to counting the bytes in the response.

This is because it's slightly faster to feed the response stream directly into the decoder without consolidating all the response bytes into a single Uint8List first. In case the server does not send a valid Content-Length header, the transformer will check the actual bytes from the response, as the BackgroundTransformer does.
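To illustrate the streaming path being described (a hedged sketch, not the PR's actual code — the function name and stream shape are assumptions), Dart's fused UTF-8/JSON converter can consume the byte stream directly:

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Decodes a UTF-8 encoded JSON byte stream without first consolidating
/// it into a single Uint8List. Fusing utf8.decoder with json.decoder
/// also skips materializing an intermediate String.
Future<Object?> decodeJsonStream(Stream<Uint8List> byteStream) {
  // The fused converter emits exactly one decoded object per stream.
  return utf8.decoder.fuse(json.decoder).bind(byteStream).single;
}
```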

The remaining question is if we should set the Transformer as default.
IMO it would make sense to change the default transformer to

  /// The default [Transformer] that transfers requests and responses
  /// into corresponding content to send.
  @override
  Transformer transformer = Utf8JsonTransformer(contentLengthIsolateThreshold: 50 * 1024);

so all users get the improved performance by default.

@knaeckeKami knaeckeKami requested a review from a team as a code owner June 8, 2024 15:43
@ueman (Contributor) commented Jun 8, 2024

https://github.com/knaeckeKami/dio_benchmark

Resolved review threads (outdated):
dio/lib/src/transformer.dart
dio/lib/src/transformers/utf8_json_transformer.dart
    import 'package:dio/src/compute/compute.dart';

    /// A [Transformer] that has a fast path for decoding utf8-encoded JSON.
    /// If the response is utf8-encoded JSON and no custom decoder for a Request is specified, this transformer
Member:
What is the Request referencing here?

Contributor Author:

The RequestOptions. If a custom decoder is set in the RequestOptions, we respect that, but in that case we cannot use the fused decoder.
I changed the comments to clarify that.

dio/lib/src/transformers/utf8_json_transformer.dart (resolved review thread, outdated)
    /// By default, this transformer will transform responses in the main isolate,
    /// but a custom threshold can be set to switch to an isolate for large responses by passing
    /// [contentLengthIsolateThreshold].
    class Utf8JsonTransformer extends Transformer {
Member:
How about FusedTransformer?

Contributor Author:
Fine by me, changed the name.

    final shouldUseIsolate = !(contentLengthIsolateThreshold < 0) &&
        contentLength >= contentLengthIsolateThreshold;
    if (shouldUseIsolate) {
      // we can't send the stream to the isolate, so we need to decode the response bytes first
Member:
Would that lead to a memory issue compared to the previous transformer?

Contributor Author (@knaeckeKami, Jun 9, 2024):
No, the previous transformers always decode into a single Uint8List first (and in a horribly slow way: this copies byte-by-byte, potentially even creating wrapper objects for each byte).
SyncTransformer:

    final chunks = await responseBody.stream.toList();
    final responseBytes = Uint8List.fromList(chunks.expand((c) => c).toList());

It's an optimization in this Transformer that it has a path where it can avoid that.

    } else {
      if (!hasContentLengthHeader || contentLength == 0) {
        responseBytes ??= await _consolidateStream(responseBody.stream);
        // if the response is empty, return null, since the decoder would throw
Member:
Could you explain more on this case with examples? Also, comments should start with a capital letter and end with a dot:

Suggested change
// if the response is empty, return null, since the decoder would throw
// Returns `null` to let the decoder throws when the response is empty.

Contributor Author (@knaeckeKami, Jun 9, 2024):
I added a more detailed explanation as a comment.
This branch exists for backwards compatibility with the other transformers.

The old behavior was that if a server responds with a Content-Type: application/json header but an empty response body, the transformer would return null.
The fused decoder, however, would throw an exception in this case.

So, if we don't have a valid Content-Length header, we first check whether the body is empty before converting the bytes to JSON. If it's empty, we manually return null.
If we do have a Content-Length header which is > 0, we can feed the response bytes directly into the decoder, which is the most performant path.
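The fallback described above can be sketched like this (illustrative names, assuming a fused utf8/json decoder; not the PR's exact code):

```dart
import 'dart:convert';
import 'dart:typed_data';

final _fusedDecoder = utf8.decoder.fuse(json.decoder);

/// Backwards-compatible decoding: an empty body yields null instead of
/// the FormatException the fused decoder would throw on empty input.
Object? decodeConsolidatedBytes(Uint8List bytes) {
  if (bytes.isEmpty) {
    return null; // matches the old transformers' behavior for empty bodies
  }
  return _fusedDecoder.convert(bytes);
}
```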

@Reprevise (Contributor):

Can you add a jsonDecodeCallback parameter to this new transformer? I have a long-running isolate whose purpose is just to decode JSON, whereas the internal compute function spawns a new isolate per request.

@ueman (Contributor) commented Jun 13, 2024

Can you add a jsonDecodeCallback parameter to this new transformer? I have a long running isolate who's purpose is just to decode JSON whereas the internal compute function spawns a new isolate per request.

That doesn't really make sense for this specific transformer as far as I can tell. It's an interesting idea though. What benefits do you get by doing it?

@knaeckeKami (Contributor Author) commented Jun 13, 2024

We could allow passing in a jsonDecodeCallback, but to leverage the fused decoder it would need to operate on a Uint8List, not on a String like the one in BackgroundTransformer.

However, we should measure whether that even makes sense in current versions of Dart.
Spawning a new isolate got a lot cheaper recently, and an advantage of using compute() is that it can use Isolate.exit to return the decoded JSON to the calling isolate, which avoids a copy (at least on recent versions of Dart, which have isolate groups).

see https://api.dart.dev/stable/3.4.4/dart-isolate/Isolate/exit.html

If the port is a native port -- one provided by ReceivePort.sendPort or RawReceivePort.sendPort -- the system may be able to send this final message more efficiently than normal port communication between live isolates. In these cases this final message object graph will be reassigned to the receiving isolate without copying. Further, the receiving isolate will in most cases be able to receive the message in constant time.

So the question is, is spawning a new Isolate more expensive than copying the decoded map back into the main isolate?
I don't know, we could benchmark it.
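As a sketch of the Isolate.exit mechanism being discussed (hedged: Isolate.run is a dart:isolate convenience that spawns a short-lived isolate and returns its result via Isolate.exit; this is not dio's implementation):

```dart
import 'dart:convert';
import 'dart:isolate';
import 'dart:typed_data';

/// Decodes UTF-8 encoded JSON bytes in a short-lived isolate.
/// Because the isolate terminates via Isolate.exit, the decoded object
/// graph is handed back to the caller without copying (isolate groups).
Future<Object?> decodeInFreshIsolate(Uint8List bytes) {
  return Isolate.run(() => utf8.decoder.fuse(json.decoder).convert(bytes));
}
```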

@knaeckeKami (Contributor Author) commented Jun 13, 2024

FWIW, I found that a large portion of the performance improvement actually comes from the more efficient consolidation of the Stream<Uint8List> into a single Uint8List.

The SyncTransformer and the extending BackgroundTransformer do

    Uint8List.fromList(chunks.expand((c) => c).toList());

which copies every byte sequentially, probably also creating wrapper objects for each byte.

Using a ByteBuffer instead is an optimization that could also be integrated into SyncTransformer without breaking changes.
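A minimal sketch of such a consolidation, using BytesBuilder (an assumption for illustration — the PR's _consolidateStream may differ in detail):

```dart
import 'dart:async';
import 'dart:typed_data';

/// Consolidates a byte stream into a single Uint8List by appending whole
/// chunks, instead of expanding and copying the stream byte-by-byte.
Future<Uint8List> consolidateBytes(Stream<List<int>> stream) async {
  final builder = BytesBuilder(copy: false); // takes ownership of chunks
  await for (final chunk in stream) {
    builder.add(chunk);
  }
  return builder.takeBytes();
}
```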

@Reprevise (Contributor):

So the question is, is spawning a new Isolate more expensive than copying the decoded map back into the main isolate?
I don't know, we could benchmark it.

I'm just interested in the best performance possible. In my project, the size cutoff is 50 KB: anything larger gets sent to the long-running isolate for parsing. If we find spawning many isolates is better, then I'll concede.

@Reprevise (Contributor):

FWIW, I found that a large portion of the performance improvement actually comes from the more efficient consolidating of the Stream<Uint8List> to a single Uint8List

Sounds like a free win. Maybe a separate PR is in order? I think this PR should move forward as well, though; I think the performance benefits of using the fused decoder are worth it.

@knaeckeKami (Contributor Author):

Sounds like a free win. Maybe a separate PR is in order? I think this PR should move forward as well though, I think the performance benefits of using the fused decoder is worth it.

Yeah, when this is merged I'll prepare a new one that extracts the _consolidateStream function and makes SyncTransformer use it as well.

@knaeckeKami (Contributor Author):

I'm just interested in the best performance possible. In my project, the size cutoff is 50KB. Any larger than that and it gets sent to the long-running Isolate for parsing. If we find spawning many isolates is better, then I'll concede.

The question is: for which aspect do we want to optimize?
We get the least amount of CPU usage overall by decoding the JSON in the main isolate, because no data needs to be copied between isolates and no new isolates need to be spawned.

The least amount of CPU usage on the main isolate is likely achieved by spawning a new isolate each time, as the spawning of the new isolate happens on another thread and the decoded JSON does not need to be copied back into the main isolate.

With a long-running isolate, the question is whether first consolidating the byte stream on the main isolate, then copying it to another isolate, and then copying the decoded JSON map back into the main isolate is really less CPU-intensive than just decoding it in the main isolate in the first place.

But taking the https://github.com/knaeckeKami/dio_benchmark repo and adapting it to show how a long running isolate performs might be a good idea.

@ueman (Contributor) left a review:
LGTM. This is awesome. Thank you!

Code Coverage Report: Only Changed Files listed

Package                                           Base Coverage   New Coverage   Difference
dio/lib/src/compute/compute_io.dart               🔴 0%           🟠 69.81%      🟢 69.81%
dio/lib/src/transformer.dart                      🟢 100%         🟢 84%         🔴 -16%
dio/lib/src/transformers/fused_transformer.dart   🔴 0%           🟢 85.71%      🟢 85.71%
dio/lib/src/transformers/sync_transformer.dart    🟢 86.67%       🟢 100%        🟢 13.33%
Overall Coverage                                  🟢 80.9%        🟢 82.46%      🟢 1.56%

Minimum allowed coverage is 0%, this run produced 82.46%

@ueman ueman merged commit 4d310a2 into cfug:main Jun 15, 2024
3 checks passed
ueman pushed a commit that referenced this pull request Jun 16, 2024
…dTranformer (#2248)

Follow up to #2239:

The more efficient Stream<Uint8List> -> Uint8List conversion can be
backported to `SyncTransformer` & the extending `BackgroundTransformer`
@AlexV525 (Member):

The remaining question is if we should set the Transformer as default. IMO it would make sense to change the default transformer to

  /// The default [Transformer] that transfers requests and responses
  /// into corresponding content to send.
  @override
  Transformer transformer = Utf8JsonTransformer(contentLengthIsolateThreshold: 50 * 1024);

so all users get the improved performance by default.

We are not doing this currently, right? IMO Content-Length might not be specified in regular requests, so consolidating bytes will still have its place. If there are no drawbacks then we can definitely move on with the new one.

@ueman (Contributor) commented Jun 17, 2024

Exactly, that's not done yet.

@knaeckeKami (Contributor Author) commented Jun 17, 2024

We are not doing this currently, right? IMO Content-Length might not be specified in regular requests, so consolidating bytes will still have its place. If there are no drawbacks then we can definitely move on with the new one.

There are no drawbacks compared to the old Transformer, as the old transformer does the same unconditionally.
Also, I think in practice pretty much any server that returns JSON also sets Content-Length anyway.

Yes, I'll prepare a new PR for setting the default transformer.


Successfully merging this pull request may close these issues.

Opportunity to improve the json decoding performance