Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leaks while using fs2-kafka consumer #687

Closed
vkorchik opened this issue Feb 1, 2024 · 16 comments · Fixed by zio/zio#8660
Closed

Memory leaks while using fs2-kafka consumer #687

vkorchik opened this issue Feb 1, 2024 · 16 comments · Fixed by zio/zio#8660

Comments

@vkorchik
Copy link
Contributor

vkorchik commented Feb 1, 2024

I have a ZIO-based app but for kafka I am still using fs2-kafka. As of interops I was using zio-interop-cats:23.0.0.0. Everything was fine.

After bumping all the libs to the newest versions I got memory leak. After some investigation I thought memory leak could be due to inconsistency of cats-effect versions: the newest one is 3.5.3 but zio-interops-cats are compiled against 3.4.8. Starting from 3.5.3 there are some crucial changes in runtime semantics so I have rollbacked versions a bit so they are the same throughout the whole stack, but memory leak still appears.

I have examined different versions of cats/interops/zio but no success.

After even more debugging I see that the consumer side is leaking: once consumer consumes events it does something wrong. A lot of objects of zio.ZIO$EvaluationStep$UpdateRuntimeFlags$$anon$28 are created, but never released. Number is just growing.

Versions that are working fine:

  • zio:2.0.1
  • zio-interop-cats:23.0.0.0
  • cats-effect:3.3.13
  • fs2-kafka:2.5.0

Versions that I am testing against and getting leaks:

  • zio:2.0.16
  • zio-interop-cats:23.1.0.0
  • cats-effect:3.4.8
  • fs2-kafka:3.0.0

Versions that are leaking are many but basically all of them are based on cats-effect:3.4.x.

This is the link to the repo and to the heap dump from leaking run.

And also picture of heap usage. Once heap is close to the current max size of heap it just breaks through it and this never ends until all heap is used.
image

Appreciate any help regarding this issue.

@vkorchik
Copy link
Contributor Author

vkorchik commented Feb 1, 2024

After some more tests I have found out that problem started at zio-interop-cats:23.0.0.4 and persists till the newest version. 23.0.0.3 works fine for no matter which version of zio as well as fs2-kafka.

To be even more precise, the reason is actually bump to zio greater than 2.0.10. Any version above affects code (Consumer in my case) so the memory is leaking (amount of zio.ZIO$EvaluationStep$UpdateRuntimeFlags$$anon$28 is constantly growing).

@vkorchik
Copy link
Contributor Author

vkorchik commented Feb 4, 2024

After even more tests I have found the commit/PR and the exact line of code that has broken cats-interop back then and my example as well. It was one of the PRs of 2.0.11 release.

If I would revert this line (and also the other two dependent lines - 2575 and 5044 of code, otherwise compile fails) my example works just fine. But adding to a working example just the problematic line (not touching those two others 2575 and 5044) will break the code again. So obviously the problem is somehow connected to line 2589.

Maybe the actual code that has to be fixed is in cats-interop itself, but I am not able to track it down, and the only thing I was able to find is actually the exact change that caused my example to fail.

@vkorchik
Copy link
Contributor Author

@adamgfraser ,
I want to add some new findings here.

It was confusing to me how changing type from Fiber.Runtime to FiberRuntime helps to run streams without memory leak. And now I know it.
Since the memory leak is observed on consumer side, and leak was not present on interop:23.0.0.3 and below. The thing is that ZIO in project was of version above 2.0.10. And since signature of several methods was changed (like makeFiberChild) and consumer is using those methods...there was simply a MethodNotFoundException. Which was not printed out since the process was forked (I found this error later, much later). Consumer was not started - memory is not leaking.
Yes, I know, stupid mistake of mine.

TLDR, I have made wrong conclusion. Using more specific type does not help us. Having that said - the memory leak is still there.

I am testing older versions of ZIO and interops to find a 100% working solution. I will put any new information here.

@calvinlfer
Copy link
Member

calvinlfer commented Feb 15, 2024

Hey @adamgfraser, just to echo @vkorchik
I have also tried to build the latest 2.0.x release with the fix you made and I want to also confirm that @vkorchik findings are accurate, the memory leak is still present even with the changed type. Another reproducer is here that @vkorchik and I worked on, https://github.com/calvinlfer/kafka-zio-integration-leak-reproducer/blob/main/src/main/scala/com/kaizensolutions/experiments/ReproducerFs2KafkaZIO.scala

Also cc’ing @neko-kai or @joroKr21 for their expertise

@joroKr21
Copy link
Contributor

Sorry, it doesn't ring a bell for me.

@jdegoes jdegoes reopened this Feb 17, 2024
@jdegoes
Copy link
Member

jdegoes commented Feb 17, 2024

/bounty $500

Copy link

algora-pbc bot commented Feb 17, 2024

💎 $500 bounty created by ZIO
💎 $500 bounty created by kaizen-solutions

🙋 If you start working on this, comment /attempt #687 along with your implementation plan
👉 To claim this bounty, submit a pull request that includes the text /claim #687 somewhere in its body
📝 Before proceeding, please make sure you can receive payouts in your country
💵 Payment arrives in your account 2-5 days after the bounty is rewarded
💯 You keep 100% of the bounty award
🙏 Thank you for contributing to zio/interop-cats!

👉 Add a bountyShare on socials

Attempt Started (GMT+0) Solution
🟢 @vkorchik Feb 17, 2024, 10:51:43 AM #688

@vkorchik
Copy link
Contributor Author

vkorchik commented Feb 17, 2024

It seems to me that issue first appeared in interop:23.0.0.2 and after this PR to be precise.

And what causes the issue is switching from parentFiber.scope to FiberScope.global according to cats-effect semantics.

I have switched back from global scope to parent one, and the heap usage is stable for me (that zio.ZIO$EvaluationStep$UpdateRuntimeFlags$$anon$28 also does not grow), comparing to what we have seen above in this issue and in discord thread.
image

I am not a PRO and it is hard for me to explain why parent scope vs global scope makes such a difference.

Anyway, I will create PR soon, so everyone can check it out and share thoughts, and we can compare the results for both zio 2.0.x as well as 2.1.x.

PS: should I use /attempt #687 tag?

@neko-kai
Copy link
Member

neko-kai commented Feb 17, 2024

Thanks for investigating this!
Seems like this should be fixed in ZIO Core.
Moving race to parent scope breaks cats-effect semantics, e.g. it prohibits the loser fiber from continuing to run if the parent fiber is interrupted after raceWith has finished.

@vkorchik
Copy link
Contributor Author

@neko-kai
thanks for explanation.
Failing tests in my PR show this, I think. Simply switching from global to parent scope (as it was before 23.0.0.2) will make the app work, but bunch of tests are failing, those for checking interruptions.
It could have been a quick fix, but not necessarily a good and proper one 🙂

Also I am not sure I can tackle with this issue on my own, I am really very novice here, so, if you know how to fix it, please, go on 💪

Copy link

algora-pbc bot commented Feb 17, 2024

💎 kaizen-solutions added a $500 bounty into the prize pool!

@vkorchik
Copy link
Contributor Author

vkorchik commented Feb 17, 2024

I have added explicit onInterrupt to raceFibersWith as was before the change with scopes, and it seems working for now, my memory consumption tests are still in progress. Will share results asap.

PS: results are here.

Copy link

algora-pbc bot commented Feb 17, 2024

💡 @vkorchik submitted a pull request that claims the bounty. You can visit your bounty board to reward.

@calvinlfer
Copy link
Member

calvinlfer commented Feb 19, 2024

Hello again, I think it's safe to close this unless you all say otherwise (i.e. if there's anything more to be done in ZIO core?)

Here is my reproducer with interop 23.1.0.1 (which has @vkorchik's changes) and zio 2.0.21:

import fs2.Stream
import fs2.kafka.*
import zio.{durationInt as _, *}
import zio.interop.catz.*
import zio.logging.backend.SLF4J

import scala.concurrent.duration.*

// Leak
object ReproducerFs2KafkaZIO extends ZIOAppDefault:
  override val bootstrap: ZLayer[ZIOAppArgs, Any, Any] =
    Runtime.removeDefaultLoggers ++ SLF4J.slf4j

  val topic = "example-topic"

  val consumer: Stream[Task, Unit] =
    val settings =
      ConsumerSettings[Task, String, String]
        .withBootstrapServers("localhost:9092")
        .withGroupId("test-consumer-group-id-2")
        .withAutoOffsetReset(AutoOffsetReset.Earliest)

    KafkaConsumer
      .stream[Task, String, String](settings)
      .evalTap(_.subscribeTo(topic))
      .stream
      .mapChunks(_.map(_.offset))
      .through(commitBatchWithin[Task](2048, 10.seconds))

  val producer: Stream[Task, ProducerResult[String, String]] =
    val settings =
      ProducerSettings[Task, String, String]
        .withBootstrapServers("localhost:9092")
        .withBatchSize(128)
        .withAcks(Acks.One)
        .withEnableIdempotence(false)
        .withRetries(128)

    val producerPipe = KafkaProducer.pipe[Task, String, String](settings)

    Stream
      .iterate[Task, Long](0L)(_ + 1L)
      .map(n => ProducerRecord(topic = topic, key = s"key: $n", value = s"value: $n"))
      .chunkN(n = 128, allowFewer = true)
      .map(ProducerRecords[fs2.Chunk, String, String])
      .through(producerPipe)

  override val run: Task[Unit] =
    Stream(producer, consumer).parJoinUnbounded.compile.drain

As you can see below, nice and flat, stable heap usage:
image

Prior to this change, I was seeing the following:
image

So this definitely fixed it

Copy link

algora-pbc bot commented Feb 19, 2024

🎉🎈 @vkorchik has been awarded $500 by kaizen-solutions.io! 🎈🎊

Copy link

algora-pbc bot commented Feb 19, 2024

🎉🎈 @vkorchik has been awarded $500 by ZIO! 🎈🎊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants