
[bug] Aedes is running out of memory when starting with 30 million subscriptions in MongoDB. #973

Open
zagadheesh opened this issue Aug 16, 2024 · 12 comments

@zagadheesh

System Information

  • Aedes: 0.52.0
  • NodeJS: 16.15.0
  • OS: Amazon Linux EC2

Describe the bug
I configured Node.js to use more than 5 GB of memory via "--max-old-space-size=6000". I have 30 million subscriptions stored in Mosca, and I am transferring them from Mosca's MongoDB to Aedes' MongoDB using a Java tool. The tool performs the necessary transformation, converting the 'client' attribute to 'clientId' in each subscription document: it streams the data from Mosca's MongoDB, transforms it, and inserts it into Aedes' MongoDB. I have continuously monitored server stats and can confirm that Aedes is loading a huge amount of data into memory, because of which Node.js throws the following error:

<--- Last few GCs --->

[1761157:0x614f740] 176313 ms: Mark-Compact 5895.4 (6035.5) -> 5884.2 (6037.8) MB, 3557.40 / 0.01 ms (average mu = 0.133, current mu = 0.029) task; scavenge might not succeed
[1761157:0x614f740] 179961 ms: Mark-Compact 5899.0 (6038.0) -> 5887.0 (6040.8) MB, 3420.21 / 0.01 ms (average mu = 0.099, current mu = 0.062) task; scavenge might not succeed

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

1: 0xb80c98 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
2: 0xeede90 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
3: 0xeee177 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
4: 0x10ffd15 [node]
5: 0x11002a4 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
6: 0x1117194 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
7: 0x11179ac v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
8: 0x117076c v8::internal::MinorGCJob::Task::RunInternal() [node]
9: 0xd368e6 [node]
10: 0xd39e8f node::PerIsolatePlatformData::FlushForegroundTasksInternal() [node]
11: 0x18af2d3 [node]
12: 0x18c3d4b [node]
13: 0x18afff7 uv_run [node]
14: 0xbc7be6 node::SpinEventLoopInternal(node::Environment*) [node]
15: 0xd0ae44 [node]
16: 0xd0b8dd node::NodeMainInstance::Run() [node]
17: 0xc6fc8f node::Start(int, char**) [node]
18: 0x7f786d23feb0 [/lib64/libc.so.6]
19: 0x7f786d23ff60 __libc_start_main [/lib64/libc.so.6]
20: 0xbc430e start [node]
run.sh: line 3: 1761157 Aborted (core dumped) node --nouse-idle-notification --max-old-space-size=6000 /apps/aedes/server-mongo.js > /apps/aedes/logs/console$cdata.log

Please suggest further steps.

To Reproduce
Steps to reproduce the behavior:

  1. Manually copy 30 million subscriptions into Aedes' MongoDB
  2. Start Aedes with a command like "node --nouse-idle-notification --max-old-space-size=6000 /apps/aedes/server-mongo.js" (a minimal server sketch follows below)
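
For reference, a minimal sketch of what a server script like server-mongo.js might look like (hypothetical; the actual file is not shown in this issue). It only uses the documented aedes and aedes-persistence-mongodb entry points; the connection URL is an assumption.

```js
// Hypothetical minimal server-mongo.js: an Aedes broker backed by
// aedes-persistence-mongodb. On startup the persistence layer streams
// every stored subscription into memory, which is where the OOM occurs.
const aedes = require('aedes')
const mongoPersistence = require('aedes-persistence-mongodb')
const { createServer } = require('net')

const persistence = mongoPersistence({
  url: 'mongodb://127.0.0.1/aedes' // assumed connection string
})

const broker = aedes({ persistence })

createServer(broker.handle).listen(1883, () => {
  console.log('Aedes broker listening on port 1883')
})
```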

Expected behavior
Aedes should start and function properly.

@zagadheesh zagadheesh added the bug label Aug 16, 2024
@robertsLando
Member

I suggest you take a heap dump and check where that memory goes. Try a tool like https://github.com/davidmarkclements/0x and/or https://clinicjs.org/flame

@zagadheesh
Author

zagadheesh commented Sep 6, 2024

I took the dump and identified that Aedes is loading all the subscriptions into memory while initializing aedes-persistence-mongodb (persistence.js). Below is the relevant part of the profile:

[JavaScript]:
ticks total nonlib name
10628 14.1% 23.3% JS: *deserializeObject /apps/aedes/node_modules/bson/lib/parser/deserializer.js:65:27
1150 1.5% 2.5% JS: * /apps/aedes/node_modules/aedes-persistence-mongodb/persistence.js:102:26
824 1.1% 1.8% JS: *QlobberSub._add_value /apps/aedes/node_modules/qlobber/aedes/qlobber-sub.js:46:44
618 0.8% 1.4% JS: *Qlobber._add /apps/aedes/node_modules/qlobber/lib/qlobber.js:297:35
554 0.7% 1.2% JS: *QlobberSub._initial_value /apps/aedes/node_modules/qlobber/aedes/qlobber-sub.js:16:48
518 0.7% 1.1% JS: *_readNext /apps/aedes/node_modules/mongodb/lib/cursor/abstract_cursor.js:621:14
474 0.6% 1.0% JS: *slice node:buffer:621:12
357 0.5% 0.8% JS: *getEncodingOps node:buffer:706:24
352 0.5% 0.8% JS: *next /apps/aedes/node_modules/mongodb/lib/cursor/abstract_cursor.js:501:14
247 0.3% 0.5% JS: *copy node:buffer:803:16
240 0.3% 0.5% JS: *readableAddChunkPushObjectMode node:internal/streams/readable:514:40
131 0.2% 0.3% JS: *ObjectId /apps/aedes/node_modules/bson/lib/objectid.js:24:22
90 0.1% 0.2% JS: * /apps/aedes/node_modules/mongodb/lib/cursor/find_cursor.js:81:35
56 0.1% 0.1% JS: *alloc node:buffer:388:30
27 0.0% 0.1% JS: *processTicksAndRejections node:internal/process/task_queues:67:35
26 0.0% 0.1% JS: *serializeInto /apps/aedes/node_modules/bson/lib/parser/serializer.js:553:23

I think the code below is where all the subscriptions are loaded into memory:

https://github.com/moscajs/aedes-persistence-mongodb/blob/9e2eb68567c6bb88cf007b177e5c89b719f29d8d/persistence.js#L99
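
Roughly what that code path does, as an illustrative sketch (simplified, not the library's exact code): every subscription document is streamed out of MongoDB, deserialized by bson (the hot spot in the profile above), and added to an in-memory Qlobber matcher, so memory grows linearly with the number of stored subscriptions.

```js
// Illustrative sketch of the startup path in aedes-persistence-mongodb
// (not the library's exact code). Each of the 30M documents becomes an
// entry in the in-memory QlobberSub trie.
const QlobberSub = require('qlobber/aedes/qlobber-sub')

async function loadAllSubscriptions (subscriptionsCollection) {
  const trie = new QlobberSub({
    separator: '/',
    wildcard_one: '+',
    wildcard_some: '#'
  })
  // The full collection is streamed and inserted one document at a time,
  // so startup memory scales with the total subscription count.
  for await (const doc of subscriptionsCollection.find({})) {
    trie.add(doc.topic, { clientId: doc.clientId, topic: doc.topic, qos: doc.qos })
  }
  return trie
}
```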

@robertsLando Could you please suggest?

@robertsLando
Member

@zagadheesh Unfortunately there is no other way to do this; subscriptions must be loaded into memory

@zagadheesh
Author

@robertsLando Thank you very much for confirming! I was wondering if we could integrate a feature that implements a subscriptions cache as an LRU (Least Recently Used) cache. This way, we can limit the memory usage of the subscriptions cache. I’m thinking of implementing it in a manner similar to Qlobber -> QlobberLRU. If this approach seems feasible, I can raise a PR with the necessary changes.
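
A rough sketch of the shape this proposal could take (purely illustrative; QlobberLRU does not exist in qlobber today, and the next reply explains why eviction is problematic for a broker's subscription set):

```js
// Hypothetical LRU wrapper around QlobberSub: caps memory by evicting the
// least-recently-used subscription once a limit is exceeded. Illustrates
// the idea only; evicted subscriptions would silently stop matching.
const QlobberSub = require('qlobber/aedes/qlobber-sub')

class LruSubscriptionCache {
  constructor (limit) {
    this.limit = limit
    this.trie = new QlobberSub({ separator: '/', wildcard_one: '+', wildcard_some: '#' })
    this.recency = new Map() // insertion order doubles as recency order
  }

  add (topic, sub) { // sub: { clientId, topic, qos }
    const key = sub.clientId + ' ' + topic
    this.recency.delete(key) // refresh position if already present
    this.recency.set(key, sub)
    this.trie.add(topic, sub)
    if (this.recency.size > this.limit) {
      // Evict the oldest entry from both the map and the trie.
      const [oldKey, oldSub] = this.recency.entries().next().value
      this.recency.delete(oldKey)
      this.trie.remove(oldSub.topic, { clientId: oldSub.clientId, topic: oldSub.topic })
    }
  }

  match (topic) {
    return this.trie.match(topic)
  }
}
```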

@robertsLando
Member

That's just wrong: the cache must always contain all the subscriptions, not only the least recently used ones, so that the broker can forward messages to every matching client. The only solution to this problem is to run Aedes in a cluster environment, so that clients, and therefore subscriptions, are spread across instances

@zagadheesh
Author

Even when using Aedes in a clustered environment, each instance still loads all the subscriptions at startup, consuming memory for all subscriptions in every instance. Is there an alternative way to prevent this?

@robertsLando
Member

@zagadheesh How many clients do you have? Is the clean flag set to true or false on these clients?

@zagadheesh
Author

30 million clients and the clean flag is false

@robertsLando
Member

I think a possible improvement could be to load only the subscriptions of connected clients plus the subscriptions from non-clean sessions. This needs to be tested, BTW. Would you like to submit a PR?
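
A sketch of what that restricted loading might look like (hypothetical; the field name mirrors the subscription documents that aedes-persistence-mongodb stores, and deriving the set of client IDs to load is exactly the untested part):

```js
// Hypothetical variant of the startup query: instead of streaming the
// whole subscriptions collection, fetch only the clients the broker
// actually needs (currently connected clients plus non-clean sessions).
function createRestrictedStream (subscriptionsCollection, neededClientIds) {
  return subscriptionsCollection
    .find({ clientId: { $in: neededClientIds } })
    .stream()
}
```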

@zagadheesh
Author

I understand how to load subscriptions for connected clients. However, what exactly does it mean to load subscriptions from non-clean sessions? I have 30 million non-clean subscriptions (some of which may not have pending outgoing packets stored in MongoDB). Can we optimize by caching only the subscriptions of connected clients?

@robertsLando
Member

@zagadheesh Sorry, I misread your comment above. I think your problem is that those clients should connect with the clean flag set to true, not false! That's a common error that a lot of users make. What you may want instead is retained messages: when clients disconnect for any reason, their state is brought back in sync on reconnect, because they receive the last message sent on the subscribed topic. Having tons of clients with the clean flag set to false is generally bad practice, and here is why: if for any reason one of your clients goes offline forever, your broker will keep storing its offline packets forever, causing huge memory leaks!

Generally, setting clean to false is only useful when you have an application that needs to store the historical data sent to the broker (not really needed with Aedes, since you can intercept messages directly). In that case it is the only client connecting with clean set to false, and you are sure that if it goes offline for some reason it will receive, on the next reconnection, all the packets sent by other clients while it was offline.
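
To make the recommendation concrete, a sketch using the mqtt.js client (broker URL and topic names are assumptions): connect with a clean session and rely on retained messages for state recovery, instead of server-side offline packet storage.

```js
// Sketch of the recommended pattern with the mqtt.js client: clean
// sessions plus retained messages, so no offline packet queue accumulates
// in the broker for each client.
const mqtt = require('mqtt')

const client = mqtt.connect('mqtt://broker.example.com', {
  clean: true // broker keeps no session state for this client after disconnect
})

client.on('connect', () => {
  // retain: true => the broker stores only the *last* message per topic
  // and delivers it immediately to any (re)connecting subscriber.
  client.publish('devices/42/state', 'online', { retain: true, qos: 1 })
  client.subscribe('devices/+/state', { qos: 1 })
})

client.on('message', (topic, payload) => {
  console.log('state update on %s: %s', topic, payload.toString())
})
```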

@zagadheesh
Author

@robertsLando Thank you for your valuable insights. We're using this broker for a chat messaging application, so retaining historical data for each subscription is essential. In this case, the clean flag should be set to false. Could you please advise whether using this MQTT broker for chat functionality with historical data retention is the appropriate architecture?

2 participants