We are trying to capture events from Azure Event Hubs and save them to the Data Lake after some processing, using a Databricks workflow. The workflow works fine in the dev and stg environments. However, when we run it in prd (with a different Event Hub, Databricks, and Data Lake account) with exactly the same configuration and JARs, it fails with the following error:
In partition 7 of <event-hub-name>, with consumer group <consumer-group-name>, request seqNo 210 is less than the received seqNo 198. The earliest seqNo is 210, the last seqNo is 210, and received seqNo 198.
The event hub has 8 partitions, and the namespace contains 8 event hubs in total (including this one), all with a message retention of 7 days.
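For anyone reproducing the setup, here is a minimal PySpark sketch of the reader options. It assumes the option names and the start-of-stream position (`offset` "-1") described in the azure-eventhubs-spark PySpark guide. The report does not show the job's actual configuration or language, and `build_eventhubs_conf` is a hypothetical helper.

```python
import json
from typing import Dict


def build_eventhubs_conf(encrypted_connection_string: str, consumer_group: str) -> Dict[str, str]:
    """Build an options dict for spark.readStream.format("eventhubs") (hypothetical helper).

    The connection string is assumed to be encrypted already, as the connector's
    PySpark guide does with EventHubsUtils.encrypt on the driver.
    """
    # offset "-1" requests the start of the stream; seqNo and enqueuedTime are unused here.
    starting_position = {
        "offset": "-1",
        "seqNo": -1,
        "enqueuedTime": None,
        "isInclusive": True,
    }
    return {
        "eventhubs.connectionString": encrypted_connection_string,
        "eventhubs.consumerGroup": consumer_group,
        "eventhubs.startingPosition": json.dumps(starting_position),
    }
```

In a Databricks notebook the result would be passed as `spark.readStream.format("eventhubs").options(**conf).load()`. The placeholder arguments stand in for the real prd values.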
Expected behavior
The data is processed and written to the Data Lake, as it is in the other environments.
The error makes no sense: it says that request seqNo 210 is less than the received seqNo 198, but 210 is clearly not less than 198.
We inspected the messages the job processes when reading with EventPosition.earliest(). The event hub returns messages with sequence numbers between 188 and 194. However, the messages with sequence numbers below 194 were enqueued about two weeks ago, and the event hub retains messages for only 7 days. Of the messages 188 through 194, therefore, all but one should have expired by now. The job's error message supports this: it uses sequence number 194 as the starting point, but then complains that it received sequence number 188.
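The expectation in the paragraph above comes down to date arithmetic: an event enqueued about 14 days ago falls outside a 7-day retention window. The sketch below uses hypothetical timestamps, since the report gives only approximate ages. It models the reporter's expectation only and makes no claim about how the Event Hubs service actually enforces retention.

```python
from datetime import datetime, timedelta, timezone

# Message retention configured on every event hub in the namespace, per the report.
RETENTION = timedelta(days=7)


def should_have_expired(enqueued_at: datetime, now: datetime,
                        retention: timedelta = RETENTION) -> bool:
    """Return True if an event enqueued at `enqueued_at` is older than the retention window."""
    return now - enqueued_at > retention


# Hypothetical timestamps: "about 2 weeks ago" versus a few days ago.
now = datetime.now(timezone.utc)
old_event = now - timedelta(days=14)
recent_event = now - timedelta(days=3)

print(should_have_expired(old_event, now))     # True
print(should_have_expired(recent_event, now))  # False
```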
Yet the error itself is strange, because it reports the larger sequence number (194) as being less than the smaller one (188).
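Both pairs of numbers point the same way. The event actually delivered (198, or 188) is older than the position the job requested (210, or 194). In the prd case, it also lies below the earliest seqNo the partition reports. The message therefore appears to phrase the comparison backwards. The sketch below is a hypothetical model of such a check. It assumes the check simply compares the first received sequence number with the requested one. It is not the connector's source code, and `describe_mismatch` and `PartitionInfo` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionInfo:
    earliest_seq_no: int  # earliest sequence number the partition reports
    last_seq_no: int      # last sequence number the partition reports


def describe_mismatch(request_seq_no: int, received_seq_no: int,
                      info: Optional[PartitionInfo] = None) -> Optional[str]:
    """Hypothetical check: None if the first received event carries the requested
    sequence number, otherwise a description worded to match the actual numbers."""
    if received_seq_no == request_seq_no:
        return None
    relation = "less than" if received_seq_no < request_seq_no else "greater than"
    msg = f"received seqNo {received_seq_no} is {relation} request seqNo {request_seq_no}"
    if info is not None and received_seq_no < info.earliest_seq_no:
        msg += f" and below the reported earliest seqNo {info.earliest_seq_no}"
    return msg


# Numbers from the prd error message.
print(describe_mismatch(210, 198, PartitionInfo(earliest_seq_no=210, last_seq_no=210)))
# received seqNo 198 is less than request seqNo 210 and below the reported earliest seqNo 210
```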
Bug Report:
Scala version is 2.12.
artifactId: azure-eventhubs-spark_2.12
version: 2.3.22