Package v2.5.0 causes broker null pointer exception #1799

Open
7 tasks done
alegtk opened this issue Aug 16, 2024 · 2 comments


alegtk commented Aug 16, 2024

Description

Greetings.
We're trying to upgrade the package from v2.4.0 to v2.5.0 but have run into a problem.

  • Confluent Community v6.1 brokers with consumer on package 2.4.0: works
  • Confluent Community v6.1 brokers with consumer on package 2.5.0: works
  • Confluent Community v7.5.3 brokers with consumer on package 2.5.0: fails
  • Confluent Community v7.5.3 brokers with consumer on package 2.4.0: works

The tests were made by changing only the confluent-kafka version at the consumer; everything else remained the same.
Producers work fine with both versions of confluent-kafka and both broker versions (all four combinations tested). The problem arises only with the consumer.

After turning on debug logging at the consumer, the following entry appears:
20240816 12:11:02.356 FETCHERR [canario-uno#consumer-1] [thrd:sasl_ssl://dfcdsrvv7526.srv.cd.metal:9092/bootstrap]: sasl_ssl://dfcdsrvv7526.srv.cd.metal:9092/3: 12706.files.incremental [4]: Fetch failed at offset 11548 (leader epoch -1): UNKNOWN_TOPIC_ID

On the broker side:
java.lang.NullPointerException
[KafkaApi-2] Unexpected error handling request RequestHeader(apiKey=FETCH, apiVersion=15, clientId=canario-uno, correlationId=52, headerVersion=2) -- FetchRequestData(clusterId=null, replicaId=-1, replicaState=ReplicaState(replicaId=-1, replicaEpoch=-1), maxWaitMs=500, minBytes=1, maxBytes=52428800, isolationLevel=1, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='', topicId=AAAAAAAAAAAAAAAAAAAAAA, partitions=[FetchPartition(partition=1, currentLeaderEpoch=20, fetchOffset=17059, lastFetchedEpoch=-1, logStartOffset=-1, partitionMaxBytes=1048576), FetchPartition(partition=5, currentLeaderEpoch=23, fetchOffset=13667, lastFetchedEpoch=-1, logStartOffset=-1, partitionMaxBytes=1048576), FetchPartition(partition=0, currentLeaderEpoch=21, fetchOffset=27572, lastFetchedEpoch=-1, logStartOffset=-1, partitionMaxBytes=1048576)])], forgottenTopicsData=[], rackId='') with context RequestContext(header=RequestHeader(apiKey=FETCH, apiVersion=15, clientId=canario-uno, correlationId=52, headerVersion=2), connectionId='10.139.66.190:9092-172.24.45.139:50232-24318', clientAddress=/172.24.45.139, principal=User:12706.usr2, listenerName=ListenerName(INTERNAL), securityProtocol=SASL_SSL, clientInformation=ClientInformation(softwareName=confluent-kafka-python, softwareVersion=2.5.0-rdkafka-2.5.0), fromPrivilegedListener=false, principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@1f09d5d6])

The topic is named 12706.files.incremental and has six partitions. In the consumer logs (debug_240.txt and debug_250.txt) the result of get_watermark_offsets appears as the first entry. Consumer time is in UTC-3; broker time is in UTC.

Can you please further investigate?

How to reproduce

Broker based on Confluent Community v7.5.3
Consumer based on confluent-kafka v2.5.0
A simple producer and a simple consumer that just calls poll() are enough. The 2.5.0 consumer never fetches anything; the 2.4.0 consumer fetches messages as expected.
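The reproduction above can be sketched as the following minimal consumer. The topic name comes from the logs in this report; the fallback broker address and group id are placeholder assumptions, and the SASL/SSL settings from the checklist are omitted for brevity:

```python
# Minimal reproduction sketch (hypothetical fallback values; the real
# SASL_SSL settings from the checklist below would be added for a
# secured cluster).
import os


def build_consumer_config():
    """Assemble a consumer configuration mirroring the one in the checklist."""
    return {
        'metadata.broker.list': os.environ.get('KAFKA_BROKERS', 'localhost:9092'),
        'group.id': os.environ.get('TOPIC_KAFKA_ID_GROUP_EXTRA', 'repro-group'),
        'client.id': 'canario-uno',
        'enable.auto.commit': 'false',
        'enable.auto.offset.store': 'false',
        'debug': 'consumer,cgrp,topic,fetch',
    }


def main():
    # Imported here so the sketch can be read without confluent-kafka installed.
    from confluent_kafka import Consumer

    consumer = Consumer(build_consumer_config())
    consumer.subscribe(['12706.files.incremental'])
    try:
        while True:
            # With client 2.5.0 against CC 7.5.3 this loop never yields a
            # message; with 2.4.0 it fetches as expected.
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                print('error:', msg.error())
                continue
            print('fetched offset', msg.offset())
    finally:
        consumer.close()


if __name__ == '__main__':
    main()
```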

Checklist

Please provide the following information:

  • confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):
    >>> confluent_kafka.version()
    ('2.5.0', 33882112)
    >>> confluent_kafka.libversion()
    ('2.5.0', 33882367)
  • Apache Kafka broker version: Confluent Community 7.5.3
  • Client configuration:
    {'security.protocol': 'SASL_SSL',
     'sasl.mechanisms': 'SCRAM-SHA-512',
     'sasl.username': os.environ['KAFKA_SASL_USR_EXTRA'],
     'sasl.password': os.environ['KAFKA_SASL_PWD_EXTRA'],
     'ssl.ca.location': PEM,
     'metadata.broker.list': os.environ['KAFKA_BROKERS'],
     'group.id': os.environ['TOPIC_KAFKA_ID_GROUP_EXTRA'],
     'client.id': 'canario-uno',
     'enable.auto.commit': 'false',
     'enable.auto.offset.store': 'false',
     'log_level': 0,
     'debug': 'consumer,cgrp,topic,fetch'}
  • Operating system: linux x86_64
  • Provide client logs (with 'debug': '..' as necessary)
    debug_240.txt
    debug_250.txt
  • Provide broker log excerpts
    broker_log_250.txt
  • Critical issue
    Causes me worries.
Contributor

emasab commented Aug 16, 2024

Hi, we have a PR, confluentinc/librdkafka#4806, to fix this. It happens with Fetch version 12 (CC 7.x) when inter.broker.protocol.version is set to less than 2.8. As a workaround, if you want to use broker versions 7.x with CKPy 2.5.0, you can raise that broker property; otherwise you can upgrade the client directly to the next version.
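For reference, the broker-side workaround would look like the following, assuming a server.properties-based deployment (the value 2.8 is the minimum that avoids the affected code path; any version supported by your broker release and >= 2.8 should work):

```properties
# server.properties (set on each broker, then perform a rolling restart).
# Raising this above 2.8 avoids the Fetch v12 path that triggers the NPE.
inter.broker.protocol.version=2.8
```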

Author

alegtk commented Aug 19, 2024

Hello, @emasab. Many thanks for your attention!

@pranavrth pranavrth added the bug label Aug 23, 2024
daniil-quix added a commit to quixio/quix-streams that referenced this issue Sep 25, 2024