Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not create string objects from consumerTag, exchange and routingKey, or get them from a string cache. #1231

Open
zgabi opened this issue Jun 22, 2022 · 34 comments
Assignees
Milestone

Comments

@zgabi
Copy link

zgabi commented Jun 22, 2022

In our real application the most allocated objects are Strings. Most of them are allocated in RabbitMQ Client:

image

They usually contains the same value.

Please consider using ReadOnlySpan<char> or ReadOnlySpan<byte> with a pooled array in the background similar to the data.
Alternatively get the strings from a cache table (maybe separate table for each "string type" (consumerTag, exchange...)), since most of them are the same.

By the way in the screenshot the 3rd, 4th, 5th and 6th allocations also belongs to RabbitMQ Client, but maybe some of them are already solved in the main branch. (I'm using the latest 6.x package)
The 2nd one is WFP, I have a PR for that.

@michaelklishin
Copy link
Member

How much difference would this make to even a very basic (tutorial 1-2 style) consumer?

@zgabi
Copy link
Author

zgabi commented Jun 22, 2022

I don't undertand your question. We have 1 consumer in this process.

@lukebakken
Copy link
Contributor

Feel free to submit a pull request implementing your suggestion.

@zgabi
Copy link
Author

zgabi commented Jun 22, 2022

Which one do you prefer?

  1. Receive ReadOnlySpan<char> in the IBasicConsumer's Handle* methods (e.g HandleBasicDeliver), so it is already UTF-8 decoded characters
  2. Receive ReadOnlySpan<byte> in the IBasicConsumer's Handle* methods (e.g HandleBasicDeliver), the raw bytes
  3. Do not change the interface, but cache the strings in WireFormatting.ReadShortstr.. maybe this one is slower, because the cached strings should be compared with the received data. But do not require to change the interface.

@lukebakken lukebakken self-assigned this Jun 22, 2022
@zgabi
Copy link
Author

zgabi commented Jun 22, 2022

In the framework? dotnet/runtime#28368

@lukebakken
Copy link
Contributor

No for some reason I thought string interning worked differently.

I'd like to get feedback from the "usual gang", aka @bollhals, @bording, @danielmarbach and @stebet if they have time. I'm a bit surprised nobody has reported this issue in their use of this library.

@lukebakken lukebakken added this to the 6.5.0 milestone Jun 22, 2022
@michaelklishin
Copy link
Member

@zgabi my point is that caching is always harder than it sounds. What would be the gains (in terms of memory footprint and allocations) to a realistic consumer (application) like yours, or even a basic tutorial-style one.

Previously all efficiency improvements in this client were driven by profiler results. Sure,
you already have some but I am curious about the difference it makes.

@zgabi
Copy link
Author

zgabi commented Jun 22, 2022

I'd like to get feedback from the "usual gang", aka @bollhals, @bording, @danielmarbach and @stebet if they have time. I'm a bit surprised nobody has reported this issue in their use of this library.

Because they are just some small allocations. 2-3 small strings / message ("exchange" is empty string for me, which is not allocated)
And everybody is happy since a lot of other allocations were removed in the past 3-4 years. (Me, too) Version 6.x is much better and faster than 4.x was 4 years ago. And I saw that the main branch has more new related improvelemts (BasicProperties is a struct, etc...)

We are sending a lot of messages (~200/ sec), because it is a requirement for us to send everything between the server and the clients through RabbitMQ... even video streams.
This string allocation problem is also not a big issue for us... 1.8 M messages allocations / 38 minutes. GC can handle it :)

I reporteed this issue, since it is the top 1 in the memory profiler.

@zgabi
Copy link
Author

zgabi commented Jun 23, 2022

Yes, caching is harder than simply passing spans to the user.

For exmaple:

  • caculating some hash from the data
  • put it to a thread static dictionary (with this the string will exist only 1 time / thread, which is quite good, and does not need locking).

So probably this is slower than just simply creating a new string.

So I'd prefer passing span (or memory) of byte to the user.... similar to the data.
It needs the modification of the IBasicConsumer, but does not have any other disadvantages. It will be even faster, since no UTF-8 decoding is needed. And no string allocation, although it is fast...

@michaelklishin
Copy link
Member

@zgabi we can do this for 7.0. Option 2 above is more in the spirit of RabbitMQ and clients but I can see how many might vote for option 1.

@lukebakken
Copy link
Contributor

I agree with @michaelklishin that this should be 7.0 only if there is an API change.

@lukebakken lukebakken modified the milestones: 6.5.0, 7.0.0 Jun 23, 2022
@bollhals
Copy link
Contributor

In our real application the most allocated objects are Strings. Most of them are allocated in RabbitMQ Client:

Yeah the strings always were a bit of a problem, but we've focused on everything else so far.

Which one do you prefer?

  1. Receive ReadOnlySpan<char> in the IBasicConsumer's Handle* methods (e.g HandleBasicDeliver), so it is already UTF-8 decoded characters
  2. Receive ReadOnlySpan<byte> in the IBasicConsumer's Handle* methods (e.g HandleBasicDeliver), the raw bytes
  3. Do not change the interface, but cache the strings in WireFormatting.ReadShortstr.. maybe this one is slower, because the cached strings should be compared with the received data. But do not require to change the interface.

Regarding the options
ROS<> in general is problematic for as long as we want to support async methods.
We can think about ROM<>.

Caching, I've actually 1.5 years ago started experimenting with option 3 back then (see commit here) and even saw gains in overall performance if the string I think was > 16 chars. But it's been a while, so one would had to check again.

In principal the API is a complex topic and depends which usecase we want to support.

  • Caching is the default easy to use for the customer API that the user already knows. (Though fine tuning might require some new API, e.g. for cleaning up the cache)
  • ROM is still kind of convenient to use, but often you have to have a own copy / ToArray() it, where you'd loose all gains again. (E.g. all ConsumeTag are stored in the consumer, so we'd have to adapt the API there as well and allocate here to store the information.)
  • ROM on to of the above, we'd save on the encoding. But pay the price of usability. In some cases (especially where you don't use it) you have a lot of gain, but on others where you do want to access it en encode, it looks like it's more a step in the wrong direction

=> So for me the caching option looks like the best option if the API can somewhat be simple enough for the common use case.

@zgabi
Copy link
Author

zgabi commented Jun 24, 2022

What about distinguish the consumerTag from the another 2 strings. As you wrote consumerTags are problematic, they are stored in dictionary and hashset. Multiple handlers receve it.
It is quite easy to get rid of the another two to change it to ROM<byte> or similar custom struct which has an overridden ToString... so the user's can easily get the string value.

I've investigated a little bit.

This is measure from the current main branch:
image

(Btw: you can see that everything other disappeared from the list which was in v 6.4 in my first comment)

This measure is when I change the exchange and routingKey to ROM<byte>:
image

Changes: zgabi@a6ac5c4

I tried to make similar time periods for the measure, but check the ratio between the number of the objects. In the latest main code there are almost the same number ofd strings as WPF EventKeys. In the modified code about half of them.

I could change the ROM to another struct with an overridden ToString. It won't affect the memory allocations, and also not really changes the performance. So the user can get the string... but he has the opportunity to convert to Span<char> to without allocation if he also cares it.

@danielmarbach
Copy link
Collaborator

I haven't looked in depth into the code base for a while... I was wondering if we have already looked at places where we can benefit from the low allocation APIs like string.Create to give use some gains there before we go into the complexity of string caching?

@zgabi
Copy link
Author

zgabi commented Jun 24, 2022

@danielmarbach string.Create is also creating a new string object :) This issue is not about string concetanation where string.Create would be useful. There are no concatenation in this case.

@danielmarbach
Copy link
Collaborator

yeah I'm aware of that. My point was not about the specific problem but more about making sure where we use string and concatenation we do it efficiently before we go into complex string caching.

@zgabi
Copy link
Author

zgabi commented Jun 24, 2022

As far as I saw there are no string concatenation issues in the code. After the connection is established 100% of the new strings in RabbitMQ client are cunsumerTags, excanges and routingKeys.

The only string concatientaion in this library is in the ToString method of the ShutdownEventArgs.. which is (according to the method comment's) only for debugging:

/// <summary>
/// Override ToString to be useful for debugging.
/// </summary>
public override string ToString()
{
return $"AMQP close-reason, initiated by {Initiator}"
+ $", code={ReplyCode}"
+ (ReplyText != null ? $", text='{ReplyText}'" : string.Empty)
+ $", classId={ClassId}"
+ $", methodId={MethodId}"
+ (Cause != null ? $", cause={Cause}" : string.Empty);
}

@zgabi
Copy link
Author

zgabi commented Jun 24, 2022

Created a test commit for removing the string allocation.
Since the consumers are stored in a dictionary, and the dictionary lookup is needed anyway, RabbitMQ client can store the the known clientTag strings in that dictionary. I haven't measured, but probably this one is slightly faster, because the "utf8 data" is about half than the "string data", so the hash calcualtion and comparsion should be faster.
The string allocation is needed only when there is no consumer for the consumerTag.
For me it removes 100% of the string allocations. I don't know whether it is a problem for anybody that the allocations are not removed when consumer does not exists. (How can that happen?)

Commit:
zgabi@4b67b23

Memory profiler result after both commits:
image

So basically the consulerTag remains string everywhere, but (only) in the HandleBasicDeliver it is not allocated when consumer exists)

@danielmarbach
Copy link
Collaborator

FYI Ben Adams wrote once an intern pool https://github.com/benaadams/Ben.StringIntern.

Yet it seems by the first look your approach is elegant and low effort. Need to look closer though a bit next week

@bollhals
Copy link
Contributor

One thing you need to take care of if you store it in a dictionary or cache it, is that the backing storage is a rented array, hence it will be returned and potentially overwritten, which changes the content of your ROM

I'll try to look into the options and provide a bit more context.

Btw, we already have this CachedString class here, the intend when introducing it was cases like this.

@zgabi
Copy link
Author

zgabi commented Jun 24, 2022

Yes, I create a new byte[] in the AddConsumer method. No rented array is used after it is returned to the pool.

@bollhals
Copy link
Contributor

If I'm not mistaken then you return the array right after you call dispatch. Whereas for de body array we pass it on to the dispatcher.

@zgabi
Copy link
Author

zgabi commented Jun 25, 2022

Oh, so you mean the excahnge and routingKey.

Yes.

I've created new commits, fixed some other things which was commented by @danielmarbach (See comments here: zgabi@4b67b23)

New commits:
zgabi@926b2f8
and
zgabi@fc636a2

@bording
Copy link
Collaborator

bording commented Jun 25, 2022

I've created new commits, fixed some other things which was commented by @danielmarbach (See comments here:

You should make your commits on a separate branch and open a draft PR. It makes it much easier to see what you're doing that way.

@zgabi
Copy link
Author

zgabi commented Jun 25, 2022

Ok, created a draft PR (#1233)

@bollhals
Copy link
Contributor

I've had a look at it and I think it's very important to be consistent in the API.

Meaning if we expose e.g. ExchangeName as ReadOnlyMemory, then every API that takes a Exchange name, should have the possibility to work with ReadOnlyMemory. Otherwise it's very confusing for the customer to get something different from what the user provided elsewhere.
This means e.g. QueueBind should have a overload that takes ReadOnlyMemory.

On the topic of ReadOnlyMemory vs caching:

  • ROM is easier to implement / maintain for the library.
  • With ROM it's not obvious that it is a UTF8 string. So more work for the devs to figure this out to be usable.
    => For me more a question what direction the library wants to go for, that's though is not my decision.
    (My Opinion: I'd go for simplicity in the library, UTF8 byte array should be common enough for the devs that use it to not be a huge burden)

@zgabi
Copy link
Author

zgabi commented Jun 27, 2022

Yes, all the methods which has routingKey / exchange should accept ROM<byte>,too.. even more, that should be the prefered use.
For compatibility they can have an overload which accepts string, but internally they should only convert the received parameter to byte[]. So the users should know that when they use the string parameter version, it is slower.
This makes the CachedString and some relates classes unnecessary. (For example the class called BasicPublis... always the BasicPublishMemory will be used - which should be renamed to BasicPublish)

Instead of accepting (and providing in the Handle* methods) ROM, the parameter type could be a "CustomString" (todo: give a good name) struct, which internally similar to the ROM<byte>, but it will be obvoius to the users that they are strings. Can for example explicit/implicit cast from/to string or call ToString().

In this case it will be enough to have only 1 parameter combinataion for for example the BasicPublish method

void BasicPublish<TProperties>(CustomString exchange, CustomString routingKey, ref TProperties basicProperties, ReadOnlyMemory<byte> body = default, bool mandatory = false)

And the users can call it with strings (implicit cast):
BasicPublish("myexchange", "myRoutingKey", ref basicProperties, body)

And the users who are interested in the performance will "cache" the CustomString stuct.

@michaelklishin
Copy link
Member

@zgabi in the protocol, the type used by those fields is called "short strings". It can be a decent type name to use in the context of this client.

@bollhals
Copy link
Contributor

The default up until now was and still is the string case, so we should "try" not to make this case worse, but rather provide a even better alternative. (The CachedString overload was just recently added and can probably easily be modified if needed)

Nesting structs in structs will have it's own challenges and inefficiencies, but if wanted it can be managed. But I'd hold off implicit conversions to this type in general, and from this type if they allocate. (E.g. No implicit converstion to string, if it means allocating a string. then I rather have an explicit ConvertToString or similar)

@zgabi
Copy link
Author

zgabi commented Jun 27, 2022

I created a new commit which is using CachedStrings for every exhange and routingKey (sometimes the parameter name is "name" or "source", but it contains exchange or routingKey)

This branch and commit does not contain the counsumerTag change as requested:
zgabi@2e71e8b

All public method has both string and CachedString overloads. (Except the HandleBasicDeliver, which is a callback, it makes no sense to add and call both) Internal is only CachedString (when it is enough)

Should I create a new PR?

@zgabi
Copy link
Author

zgabi commented Aug 1, 2022

Any news on this issue?

@michaelklishin
Copy link
Member

FTR, the PR was submitted as #1233.

@lukebakken
Copy link
Contributor

@zgabi as you can see, in #1233 I have brought your code in-line with main.

Could you please provide precise instructions, including code if necessary, for me to see the benefit of this PR? Assume I have never done memory profiling (because it has been so long I don't remember the details). Thank you.

@lukebakken
Copy link
Contributor

lukebakken commented May 16, 2024

@zgabi @bollhals @michaelklishin

Hi everyone, now that I've done some work on #1233 and have gotten more familiar with the rationale behind this discussion and associated PR, I have the following to ask -

  • I'd really like to get version 7 of this library done May of 2024. Do the benefits to this change justify extending the release date?
  • Currently, Do not create string objects from consumerTag, exchange and routingKey, or get them from a string cache #1233 modifies the public API for handling deliveries such that exchange and routing key strings are ReadOnlyMemory<byte>. I agree with @bollhals that the public API must be consistent in this library, so I am leaning towards re-naming CachedString to ShortStr (as is named in the AMQP 091 spec) and using it consistently through the API for all string arguments. Thoughts?

@lukebakken lukebakken modified the milestones: 7.0.0, 6.9.0, 8.0.0 Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants