/healthcheck fails when starting TMail with Redis event bus from scratch #1556

Closed
quantranhong1999 opened this issue Feb 20, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@quantranhong1999
Member

Why

When deploying a new TMail instance from scratch with the Redis event bus, /healthcheck fails with a 500 error code.

Error log:

{
    "timestamp": "2025-02-20T04:59:56.453Z",
    "level": "ERROR",
    "thread": "qtp829000452-191",
    "mdc": {
        "host": "localhost:8000",
        "verb": "GET",
        "action": "/healthcheck",
        "protocol": "webadmin"
    },
    "logger": "spark.http.matching.GeneralError",
    "message": "",
    "context": "default",
    "exception": "com.github.fge.lambdas.ThrownByLambdaException: java.io.IOException
    at com.github.fge.lambdas.predicates.ThrowingPredicate.test(ThrowingPredicate.java:27)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
    at java.base/java.util.stream.Streams$StreamBuilderImpl.tryAdvance(Unknown Source)
    at java.base/java.util.stream.Streams$ConcatSpliterator.tryAdvance(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline.forEachWithCancel(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
    at java.base/java.util.stream.FindOps$FindOp.evaluateSequential(Unknown Source)
    at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
    at java.base/java.util.stream.ReferencePipeline.findAny(Unknown Source)
    at org.apache.james.events.RabbitEventBusConsumerHealthCheck.check(RabbitEventBusConsumerHealthCheck.java:74)
    at org.apache.james.events.RabbitEventBusConsumerHealthCheck.lambda$check$0(RabbitEventBusConsumerHealthCheck.java:60)
    at com.github.fge.lambdas.functions.FunctionChainer.doApply(FunctionChainer.java:20)
    at com.github.fge.lambdas.functions.ThrowingFunction.apply(ThrowingFunction.java:17)
    at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:106)
    at reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
    at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onNext(FluxRetryWhen.java:178)
    at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.onNext(MonoSubscribeOn.java:146)
    at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571)
    at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.trySchedule(MonoSubscribeOn.java:189)
    at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.onSubscribe(MonoSubscribeOn.java:134)
    at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
    at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
    at reactor.core.publisher.Mono.subscribe(Mono.java:4568)
    at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126)
    at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
    at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
    
    Suppressed: com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no queue 'mailboxEvent-workQueue-org.apache.james.events.GroupRegistrationHandler$GroupRegistrationHandlerGroup' in vhost 'tmail', class-id=50, method-id=10)
        at com.rabbitmq.client.impl.AMQChannel.processShutdownSignal(AMQChannel.java:437)
        at com.rabbitmq.client.impl.ChannelN.startProcessShutdownSignal(ChannelN.java:295)
        at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:624)
        at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:557)
        at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:550)
        at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.lambda$close$0(AutorecoveringChannel.java:74)
        at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.executeAndClean(AutorecoveringChannel.java:102)
        at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.close(AutorecoveringChannel.java:74)
        at org.apache.james.events.RabbitEventBusConsumerHealthCheck.lambda$check$0(RabbitEventBusConsumerHealthCheck.java:59)
        ... 20 common frames omitted

    Suppressed: java.lang.Exception: #block terminated with an error
        at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:104)
        at reactor.core.publisher.Mono.block(Mono.java:1779)
        at org.apache.james.webadmin.routes.HealthCheckRoutes.validateHealthChecks(HealthCheckRoutes.java:128)
        at spark.ResponseTransformerRouteImpl$1.handle(ResponseTransformerRouteImpl.java:47)
        at spark.http.matching.Routes.execute(Routes.java:61)
        at spark.http.matching.MatcherFilter.doFilter(MatcherFilter.java:134)
        at spark.embeddedserver.jetty.JettyHandler.doHandle(JettyHandler.java:50)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1598)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:516)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        ... 1 common frames omitted

    Caused by: java.io.IOException: null
        at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:140)
        at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:136)
        at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:158)
        at com.rabbitmq.client.impl.ChannelN.queueDeclarePassive(ChannelN.java:1033)
        at com.rabbitmq.client.impl.ChannelN.consumerCount(ChannelN.java:1052)
        at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.consumerCount(AutorecoveringChannel.java:377)
        at org.apache.james.events.RabbitEventBusConsumerHealthCheck.lambda$check$1(RabbitEventBusConsumerHealthCheck.java:73)
        at com.github.fge.lambdas.predicates.PredicateChainer.doTest(PredicateChainer.java:21)
        at com.github.fge.lambdas.predicates.ThrowingPredicate.test(ThrowingPredicate.java:23)
        ... 34 common frames omitted"
}

Reason: RabbitEventBusConsumerHealthCheck relies on the hardcoded James GroupRegistrationHandlerGroup and therefore expects the queue mailboxEvent-workQueue-org.apache.james.events.GroupRegistrationHandler$GroupRegistrationHandlerGroup to exist.

However, the Redis event bus relies on its own TmailGroupRegistrationHandler, which results in the queue mailboxEvent-workQueue-org.apache.james.events.TmailGroupRegistrationHandler$GroupRegistrationHandlerGroup being created instead.

As a consequence, the /healthcheck webadmin endpoint fails with a 500 error, likely because RabbitEventBusConsumerHealthCheck assumes the James group queue always exists.
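The mismatch can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the James or TMail code: the class `HealthCheckSketch`, its methods, and the in-memory `Map` standing in for the broker are hypothetical. Only the queue prefix and the two group names are taken from the log and description above. The lookup mimics a passive queue declaration, which fails when the queue does not exist.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical, simplified model of the failure; not the actual James/TMail classes.
class HealthCheckSketch {
    static final String QUEUE_PREFIX = "mailboxEvent-workQueue-";
    // Group name the health check expects (from the 404 in the error log).
    static final String JAMES_GROUP =
        "org.apache.james.events.GroupRegistrationHandler$GroupRegistrationHandlerGroup";
    // Group name the Redis event bus actually registers (from the issue description).
    static final String TMAIL_GROUP =
        "org.apache.james.events.TmailGroupRegistrationHandler$GroupRegistrationHandlerGroup";

    static String queueName(String group) {
        return QUEUE_PREFIX + group;
    }

    // Stand-in for the broker (queue name -> consumer count).
    // Like a passive declare, it throws when the queue is missing.
    static int consumerCount(Map<String, Integer> broker, String queue) throws IOException {
        Integer count = broker.get(queue);
        if (count == null) {
            throw new IOException("NOT_FOUND - no queue '" + queue + "'");
        }
        return count;
    }

    // Reports what a check against the given group would observe.
    static String describe(Map<String, Integer> broker, String group) {
        try {
            return "consumers=" + consumerCount(broker, queueName(group));
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A fresh deployment with the Redis event bus: only the TMail group queue exists.
        Map<String, Integer> broker = Map.of(queueName(TMAIL_GROUP), 1);
        System.out.println(describe(broker, JAMES_GROUP)); // queue missing, lookup fails
        System.out.println(describe(broker, TMAIL_GROUP)); // queue present
    }
}
```

In this model the lookup for the James group fails while the one for the TMail group succeeds, which matches the NOT_FOUND reply in the log. If both event buses used the same group name, both lookups would target the same queue.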

I took the chance to review the Redis event bus more deeply. I spotted that when we use RabbitMQAndRedisEventBus, we still start RabbitMQEventBus and register some dedicated listeners, which creates these unused queues:

  • jmapEvent-workQueue-org.apache.james.events.GroupRegistrationHandler$GroupRegistrationHandlerGroup
  • emailAddressContactEvent-workQueue-org.apache.james.events.GroupRegistrationHandler$GroupRegistrationHandlerGroup

Also, ScheduledReconnectionHandler checks the James group queues, not the TMail group queues, cf. https://github.com/linagora/tmail-backend/blob/master/tmail-backend/guice/distributed/src/main/java/com/linagora/tmail/ScheduledReconnectionHandler.java#L322.

How

I propose a small refactoring:

  • RabbitMQAndRedisEventBus should use the same group as RabbitMQEventBus, which results in the same group queue name as James. This makes sense IMO, as the group handling is the same.
  • Refactor the Guice modules JMAPEventBusModule and RabbitMQEventBusModule so that the RabbitMQEventBus startup part can be split out. That way, when we use RabbitMQAndRedisEventBus, we won't start RabbitMQEventBus and create unused queues and unused consumers.

DoD

/healthcheck passes in DistributedServerWithRedisEventBusKeysTest.

@hungphan227
Contributor

james: apache/james-project#2656
tmail: #1566

@chibenwa chibenwa closed this as completed Mar 4, 2025
3 participants