Intermittent NoClassDefFoundError: com/mongodb/internal/binding/ReferenceCounted #43290

Open
Osmyslitelny opened this issue Nov 26, 2024 · 15 comments
Labels
status: feedback-reminder We've sent a reminder that we need additional information before we can continue status: waiting-for-feedback We need additional information before we can continue status: waiting-for-triage An issue we've not yet triaged

Comments

@Osmyslitelny

Introduction:
I originally created an issue in spring-data-mongodb, but the maintainers recommended creating one for Spring Boot instead, so I have copied the issue here.

Context:
We have scheduled services (jobs) that run multiple times a day. On 1-3 runs per day, the job fails with an error; all other runs complete without problems. The error is therefore intermittent.

Environment:

  1. We run these services from a Docker image on Kubernetes.
  2. We connect to AWS DocumentDB.

Versions:
I think the main dependencies are Spring and the related MongoDB libraries (such as mongo-driver). We obtain these dependencies through spring-boot-dependencies, so each Spring Boot version brings different versions of the Mongo-related libraries. We have tested the following versions:

  1. Spring Boot 3.3.2 - no issues (or we never caught them)
  2. Spring Boot 3.3.4 - error occurs
  3. Spring Boot 3.3.5 - error occurs
  4. Spring Boot 3.4.0 - error occurs

Verification:
Since the issue is not consistent, we have examined the logs and checked for common causes, such as a missing classpath entry or empty BOOT-INF content. Everything appears correct (and most of the runs succeed).

Error full stacktrace:
trace.txt

Notes:

  1. With different versions, we sometimes encounter a NoClassDefFoundError for other com/mongodb classes.
  2. The Mongo FAQ describes a related problem, but everything on our side looks correct. That is why we concluded that the problem could be on the Spring side.
  3. Local integration tests using a MongoDB Testcontainer (rather than AWS DocumentDB) never fail.
  4. This may be related to #38611.
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Nov 26, 2024
@wilkinsona
Member

Thanks for the report.

There certainly appears to be some similarity with #38611, but in this case it seems that the failure's occurring during startup and on the main thread. The problem in #38611 was occurring off the main thread and only when it had been interrupted. Unfortunately, without a way to reproduce the failure, I don't think we'll be able to diagnose the problem and either fix it or determine that it has an external cause.

You could try using the classic loader implementation to see if that makes a difference. The new loader was introduced in Spring Boot 3.2.0 so, given that 3.3.2 works, it may not make a difference. It would also be interesting to know if the problem occurs with 3.3.3 as that would narrow down the differences between a working version and a failing version.

Beyond that, I think we're really going to need a minimal sample that reproduces the problem to make any progress here.

@wilkinsona wilkinsona added the status: waiting-for-feedback We need additional information before we can continue label Nov 26, 2024
@wilkinsona wilkinsona changed the title Blinker NoClassDefFoundError: com/mongodb/internal/binding/ReferenceCounted Intermittent NoClassDefFoundError: com/mongodb/internal/binding/ReferenceCounted Nov 26, 2024
@Osmyslitelny
Author

We tried using the classic loader on Spring 3.4.0 with a cron job set to run every 5 minutes. In the past 24 hours, we haven't had any failures, which is better than our usual rate of around 3 failures. Next, we will test version 3.3.3 without the classic loader for the next 24 hours, and also continue running the job without the classic loader for 48 hours to further evaluate its result.
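A 5-minute schedule gives 288 runs a day, so "around 3 failures" is roughly a 1% per-run failure rate. The back-of-the-envelope sketch below estimates how likely a clean 24 hours would be by chance if nothing had changed. It assumes runs fail independently at a constant rate, which is an assumption for illustration and not something established in this thread; the class and method names are hypothetical.

```java
// Rough estimate: how surprising is a 24-hour window with zero failures
// if the underlying failure rate were unchanged?
// ASSUMPTION: runs fail independently with a constant per-run probability.
public class FailureOdds {

    // Probability of seeing no failures in `runs` independent runs,
    // each failing with probability `perRunFailure`.
    static double probabilityOfNoFailures(int runs, double perRunFailure) {
        return Math.pow(1.0 - perRunFailure, runs);
    }

    public static void main(String[] args) {
        int runsPerDay = 24 * 60 / 5;              // 5-minute cron schedule
        double perRunFailure = 3.0 / runsPerDay;   // "around 3 failures" a day
        double p = probabilityOfNoFailures(runsPerDay, perRunFailure);
        System.out.printf("runs/day=%d, P(no failures in a day)=%.3f%n", runsPerDay, p);
    }
}
```

Under that assumption a single clean day has about a 5% chance of happening by luck, so it is suggestive but not conclusive; several consecutive clean days shrink that probability multiplicatively.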

Unfortunately, I don't know how to reproduce the failure: with only about 3 failures a day on a 5-minute schedule, it is hard to see how to reproduce or catch it.

Tomorrow, I will update the information regarding 3.3.3. If you have any ideas on how to reproduce or expand the logs, or any other suggestions, please feel free to share them.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Nov 27, 2024
@Osmyslitelny
Author

We checked another version, and the current results are:

  1. Spring Boot 3.3.2 - no issues (or we never caught them)
  2. Spring Boot 3.3.3 - no issues (or we never caught them)
  3. Spring Boot 3.3.4 - error occurs
  4. Spring Boot 3.3.5 - error occurs
  5. Spring Boot 3.4.0 - error occurs
  6. Spring Boot 3.4.0 with the classic loader - no issues (or we never caught them)

In case it's relevant, we use amazoncorretto:21.0.5.

@wilkinsona
Member

Thanks for the additional details.

Looking more closely at the original stack trace, I've noticed a couple of things:

  1. A MultiServerCluster is being created
  2. The failure's occurring beneath com.mongodb.internal.Locks.withInterruptibleLock which casts some doubt on the theory that thread interruption is the cause

1 requires some non-default configuration of Mongo. Please share that configuration with us.

For 2, if I call Thread.currentThread().interrupt() before the Mongo client is created, it quickly fails with a com.mongodb.MongoInterruptedException. If I use the debugger to interrupt the thread once the lock has been taken, creation of the cluster succeeds without any class loading problems. I won't discount thread interruption completely at this point, but it is looking unlikely, so we need some other avenues to explore.
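The two interruption experiments can be mimicked with the JDK alone. The sketch below is a stdlib analogy, not MongoDB driver code: it assumes, without confirming it from the driver source, that `com.mongodb.internal.Locks.withInterruptibleLock` behaves roughly like `ReentrantLock.lockInterruptibly()`. The class and method names are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

// Stdlib analogy (NOT MongoDB driver code) of the two interruption experiments:
// 1) interrupt before requesting an interruptible lock -> fails immediately;
// 2) interrupt after the lock is held -> plain work under the lock still completes.
public class InterruptDemo {

    private static final ReentrantLock LOCK = new ReentrantLock();

    // Acquires the lock interruptibly, runs the work and reports the outcome.
    static String runWithInterruptibleLock(Runnable work) {
        try {
            LOCK.lockInterruptibly();
        } catch (InterruptedException e) {
            // Roughly where the driver would surface a MongoInterruptedException.
            return "interrupted-before-lock";
        }
        try {
            work.run();
            return "completed";
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        // Case 1: the interrupt flag is already set when the lock is requested.
        Thread.currentThread().interrupt();
        System.out.println(runWithInterruptibleLock(() -> { }));
        // lockInterruptibly() cleared the interrupt flag when it threw.

        // Case 2: the thread is interrupted while it holds the lock.
        System.out.println(runWithInterruptibleLock(() -> Thread.currentThread().interrupt()));
        Thread.interrupted(); // clear the flag set in case 2 so it doesn't leak
    }
}
```

Case 2 only shows that non-interruptible work is unaffected by the flag; interruption-sensitive I/O performed under the lock could still behave differently.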

The loader can be configured to output some debug information (-Dloader.debug=true). It could be informative to compare this output from a run that succeeds and a run that fails. The two runs should be with the exact same binary so that all of the information about the positions of files in the archive and their sizes remains constant. Could you please gather this information and share it with us?
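One way to compare the `-Dloader.debug=true` output of a passing and a failing run is to locate the first line where they diverge. The helper below is a hypothetical sketch, not part of Spring Boot; it assumes both logs are UTF-8 text files and that any per-run noise (timestamps, pod names) has already been stripped, otherwise every line would differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: reports the first line at which two debug logs diverge.
// ASSUMPTION: logs are UTF-8 and free of per-run prefixes such as timestamps.
public class FirstDivergence {

    // Returns the 1-based index of the first differing line, or -1 if identical.
    static int firstDifference(List<String> a, List<String> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i + 1;
            }
        }
        // One log is a prefix of the other: they diverge where the shorter one ends.
        return a.size() == b.size() ? -1 : common + 1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FirstDivergence.java <passed.log> <failed.log>");
            System.exit(2);
        }
        List<String> passed = Files.readAllLines(Path.of(args[0]));
        List<String> failed = Files.readAllLines(Path.of(args[1]));
        int line = firstDifference(passed, failed);
        if (line < 0) {
            System.out.println("Logs are identical");
            return;
        }
        System.out.println("First difference at line " + line);
        System.out.println("passed: " + (line <= passed.size() ? passed.get(line - 1) : "<end of file>"));
        System.out.println("failed: " + (line <= failed.size() ? failed.get(line - 1) : "<end of file>"));
    }
}
```

As noted above, this only makes sense when both runs use the exact same binary, so that entry offsets and sizes in the debug output are comparable.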

@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Nov 28, 2024
@Osmyslitelny
Author

I will collect all of the requested information for the various versions so they can be compared. The error hasn't happened again in the last few days, but please don't close the issue for now. Once I have all the required logs, I will update the thread.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Dec 2, 2024
@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Dec 2, 2024
@AceOfSnakes

It looks like -Dloader.debug=true solved or masked the real problem.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Dec 2, 2024
@philwebb
Copy link
Member

philwebb commented Dec 2, 2024

That may well be because the System.out logging causes enough of a delay that the race condition (or whatever is causing the problem) doesn't occur.

@AceOfSnakes

Yes, exactly, in my opinion.

@wilkinsona
Copy link
Member

That's unfortunate, but thanks for trying it. Working on the theory that it may be a race condition, could you try deploying the apps with -Dspring.backgroundpreinitializer.ignore=true? This will (largely) limit class loading during startup to the main thread.

@Osmyslitelny
Copy link
Author

Osmyslitelny commented Dec 4, 2024

We collected full logs with the different settings you requested. The Mongo settings can be found in the 'without debug' file, where they are printed as a 'log info' object.

spring_335_wihtout_debug_failed.log
spring_335_without_debug_passed.log
spring_335_debug_failed.log
spring_335_debug_passed.log

I am also running with -Dspring.backgroundpreinitializer.ignore=true (without the debug flag), but we need time to observe it (1-2 days, I think).

@Osmyslitelny
Copy link
Author

With -Dspring.backgroundpreinitializer.ignore=true, we still get the same error.

@wilkinsona
Copy link
Member

Thanks, @Osmyslitelny. I had hoped to compare the debug_failed and debug_passed logs but, unfortunately, the former seems to be incomplete. debug_failed starts differently to debug_passed and only seems to contain content related to loading Spring Boot's failure analysers. This would only happen after the NoClassDefFoundError has been thrown so my guess is that some earlier logging has somehow been missed.

Would it be possible to capture the output again please?

@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Dec 12, 2024
@Osmyslitelny
Copy link
Author

I agree that the debug_failed logs look incomplete, but these are the full logs that the pods return with the -Dloader.debug=true flag. It is not obvious to me why earlier output disappears with a flag that should make the logs more verbose. I am going to try to capture more logs, but I don't know why this happens, so I am open to any ideas or recommendations on how to get the full logs while keeping this flag.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Dec 12, 2024
@wilkinsona
Copy link
Member

Sorry, I don't know why that would be. The passed logs show that output from the beginning of startup can be produced successfully. If I were to guess, I'd guess that something in the pod is limiting or tailing the output, perhaps only once it has reached a certain length.

@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Dec 12, 2024
@spring-projects-issues
Copy link
Collaborator

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Dec 19, 2024
Projects
None yet
Development

No branches or pull requests

5 participants