Handle Shutdown Whilst Processing Batches #8

markglh · 2017-05-30T11:02:37Z

Currently the Kinesis Shutdown works as follows:

1. Shutdown is called on the KinesisConsumer (either explicitly or via the JVM shutdown hook).
2. This calls requestShutdown on the KCL Worker, blocking until completion.
3. The KCL Worker propagates this down to the ConsumerProcessingManager (which is the IRecordProcessor), calling shutdownRequested on each instance (one per shard).
4. shutdownRequested sends a GracefulShutdown message to the ConsumerWorker actor, blocking until a response is received (Ask + Await).
5. On receipt of this message, the ConsumerWorker switches context to ignore all future messages. If a batch is currently being processed, it responds to the sender of that batch (the manager), which will be blocked awaiting confirmation of the batch. This is by design: the KCL requires that we don't return from processRecords until we have finished the batch, otherwise the next batch is sent immediately.
6. The ConsumerWorker then forces a final checkpoint and responds to the manager once it has completed (or failed), which allows shutdown to continue and the KinesisConsumer to shut down.

This all sounds great. However, if we're processing a batch (and therefore blocking processRecords), the KCL doesn't allocate a separate thread to call shutdownRequested. So even though the ConsumerWorker allows batch processing to be aborted early, this never happens: shutdownRequested cannot be called until batch processing is complete, because the thread that would call it is the one blocked in processRecords.
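The blocking behaviour can be modelled without the KCL at all. The following is a minimal sketch, assuming a single thread serves both callbacks for a shard; the class name, the latches, and the event strings are hypothetical stand-ins, not KCL or project code. It shows that a shutdown call queued behind a blocked processRecords only runs after the batch has finished.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BlockedShutdownSketch {

    /** Returns the events in the order they occurred. */
    static List<String> run() throws InterruptedException {
        List<String> events = new CopyOnWriteArrayList<>();
        // Assumption: one thread serves every callback for a shard.
        ExecutorService shardThread = Executors.newSingleThreadExecutor();
        CountDownLatch batchStarted = new CountDownLatch(1);
        CountDownLatch batchConfirmed = new CountDownLatch(1);

        // Stand-in for processRecords: blocks until the batch is confirmed.
        shardThread.execute(() -> {
            events.add("processRecords started");
            batchStarted.countDown();
            try {
                batchConfirmed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("processRecords finished");
        });

        batchStarted.await();
        // Stand-in for shutdownRequested: queued behind the blocked call.
        shardThread.execute(() -> {
            events.add("shutdownRequested called");
        });

        // Give the shutdown call time to run; it cannot, the thread is busy.
        Thread.sleep(100);
        events.add("batch still in flight");

        // Only once the batch is confirmed does the shutdown call get its turn.
        batchConfirmed.countDown();
        shardThread.shutdown();
        shardThread.awaitTermination(5, TimeUnit.SECONDS);
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        run().forEach(System.out::println);
    }
}
```

In this model the abort path in the worker is never exercised, because nothing can deliver the shutdown request while the batch is outstanding.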

Possible solutions
What needs to happen is for the KCL to call shutdownRequested on a separate thread; we'll then unblock processRecords automatically and checkpoint accordingly. This will require a change to the KCL (assuming the issue is indeed with the KCL). We'd need to write a test which reproduces this (using Java), then raise an issue on the KCL GitHub for them to fix it.
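As a rough illustration of why a separate thread would be enough, here is a minimal sketch in plain Java. The class, the CompletableFuture standing in for the worker's reply, and the outcome string are all hypothetical; it does not use the KCL or the actor code. The shutdown call arrives on its own thread and releases the blocked processRecords early.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class SeparateThreadShutdownSketch {

    // Hypothetical stand-in for the reply that confirms or aborts the batch.
    private final CompletableFuture<String> batchOutcome = new CompletableFuture<>();

    /** Blocks, as processRecords does, until the batch has an outcome. */
    String processRecords() throws Exception {
        return batchOutcome.get(5, TimeUnit.SECONDS);
    }

    /** Called from a different thread; aborts the in-flight batch. */
    void shutdownRequested() {
        batchOutcome.complete("aborted by shutdown");
    }

    static String run() throws Exception {
        SeparateThreadShutdownSketch processor = new SeparateThreadShutdownSketch();
        ExecutorService shardThread = Executors.newSingleThreadExecutor();
        ExecutorService shutdownThread = Executors.newSingleThreadExecutor();
        try {
            Callable<String> batch = processor::processRecords;
            Future<String> result = shardThread.submit(batch);
            // The shutdown request no longer queues behind the blocked batch.
            shutdownThread.execute(processor::shutdownRequested);
            return result.get(5, TimeUnit.SECONDS);
        } finally {
            shardThread.shutdownNow();
            shutdownThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

A reproduction test against the real KCL would follow the same shape: block inside processRecords, request shutdown, and assert that shutdownRequested is invoked before processRecords returns.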

Alternatively, maybe the issue is with us? Potentially the GracefulShutdown message is stuck in the mailbox whilst we process the batch. If this is the case (a test could prove it by sending shutdown before the batch's messages have been acked), then one option is to use a priority mailbox to give the GracefulShutdown message a higher priority, so that it skips the queue.
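The priority mailbox idea can be sketched with a plain priority queue. This is only a model of the ordering, assuming a stable priority queue keyed on message type; the class, the ProcessEvent message names, and the priority values are hypothetical. In the actor itself this would be done with one of Akka's priority mailbox types rather than hand-rolled code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class PriorityMailboxSketch {

    /** A queued message; the lowest priority value is delivered first. */
    static final class Envelope {
        final int priority;
        final long sequence;
        final String message;

        Envelope(int priority, long sequence, String message) {
            this.priority = priority;
            this.sequence = sequence;
            this.message = message;
        }
    }

    // Order by priority, then by arrival, so equal-priority messages stay FIFO.
    private static final Comparator<Envelope> ORDER =
        Comparator.comparingInt((Envelope e) -> e.priority)
                  .thenComparingLong(e -> e.sequence);

    private final PriorityBlockingQueue<Envelope> queue =
        new PriorityBlockingQueue<>(16, ORDER);
    private final AtomicLong nextSequence = new AtomicLong();

    void enqueue(String message) {
        // Assumption: only GracefulShutdown is allowed to jump the queue.
        int priority = message.equals("GracefulShutdown") ? 0 : 1;
        queue.add(new Envelope(priority, nextSequence.getAndIncrement(), message));
    }

    /** Delivers every queued message in mailbox order. */
    List<String> drain() {
        List<String> delivered = new ArrayList<>();
        Envelope next;
        while ((next = queue.poll()) != null) {
            delivered.add(next.message);
        }
        return delivered;
    }

    static List<String> run() {
        PriorityMailboxSketch mailbox = new PriorityMailboxSketch();
        mailbox.enqueue("ProcessEvent 1");
        mailbox.enqueue("ProcessEvent 2");
        mailbox.enqueue("GracefulShutdown");
        mailbox.enqueue("ProcessEvent 3");
        return mailbox.drain();
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Note that a priority mailbox only reorders messages that are still queued. It does not interrupt a message the actor is already handling, so it helps only if GracefulShutdown really is waiting behind other queued messages.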

markglh · 2017-09-06T16:28:32Z

The KCL does now allocate a separate thread... testy testy!!!
https://github.com/awslabs/amazon-kinesis-client/pull/191/files
