Slowdown due to blocking track records in RecordProcessorImpl.processRecords #12

trucnguyenlam · 2020-01-28T13:57:07Z

I have run into a throughput problem while using this library to process a Kinesis stream.
My stream has 20 shards and I want to read it from the trim horizon, so I create 20 workers (same application name, different IDs) to process it. In theory, 20 shards should allow me to read up to 2 MB/s x 20 shards x 60 s = 2400 MB per minute, but I only observe a maximum of about 1000 MB per minute. This seemed odd, so I ran an experiment: I changed the code to comment out the trackRecords call, as shown below. With that change, the snapshot build reached the maximum read throughput (but nothing is checkpointed to the DynamoDB table, because records are no longer tracked).

    abortStreamOnError("processRecords") {
      val records = transformRecords(processRecordsInput.records())

      // Experiment: record tracking and the initial checkpoint are disabled.
      //trackRecords(records)
      //checkpointIfNeeded(processRecordsInput.checkpointer())

      records.grouped(EnqueueBatchSize).foreach { r =>
        enqueueRecords(r)
        checkpointIfNeeded(processRecordsInput.checkpointer())
      }
    }
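
For reference, the throughput figures above can be reproduced with a small calculation. This is only a sketch: `ThroughputEstimate` and its members are hypothetical names, not part of this library, and it assumes the 2 MB/s per-shard read limit that applies to shared-throughput consumers. With my numbers, the observed rate comes out at roughly 42% of the theoretical maximum.

```scala
// Back-of-the-envelope read-throughput estimate for a Kinesis stream.
// All names here are hypothetical and not part of the library.
object ThroughputEstimate {
  // Assumed per-shard read limit for shared-throughput consumers, in MB/s.
  val PerShardReadMbPerSecond: Double = 2.0

  /** Theoretical maximum read volume per minute for the given shard count. */
  def theoreticalMbPerMinute(shards: Int): Double =
    PerShardReadMbPerSecond * shards * 60

  /** Fraction of the theoretical maximum that was actually observed. */
  def utilization(observedMbPerMinute: Double, shards: Int): Double =
    observedMbPerMinute / theoreticalMbPerMinute(shards)

  def main(args: Array[String]): Unit = {
    val shards = 20
    val observed = 1000.0 // approximate figure reported in this issue
    println(f"theoretical: ${theoreticalMbPerMinute(shards)}%.0f MB/min")
    println(f"utilization: ${utilization(observed, shards) * 100}%.1f%%")
  }
}
```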

Could you please let me know whether there is any way to increase the throughput without losing correctness?

Many thanks,
Truc
