This repository has been archived by the owner on Oct 23, 2023. It is now read-only.

Batch Consumer (Feature Request) #13

Open
etspaceman opened this issue Jul 15, 2017 · 3 comments

Comments

@etspaceman
Contributor

Hi There -

Something I'm struggling with when using this library is that I can only set up single-event processors for the Consumer via ConsumerWorker. This prevents me from using the library with something like a Redshift endpoint, which encourages users to batch their commands through a series of staged tables. I can't simply run an INSERT/UPDATE/DELETE command on Redshift for each Kinesis event, because doing so is incredibly slow.

If we were able to use an actor that received the sequence of events being processed here:

https://github.com/WW-Digital/reactive-kinesis/blob/master/src/main/scala/com/weightwatchers/reactive/kinesis/consumer/ConsumerWorker.scala#L327

We could batch the commands across all records received and efficiently update batch-oriented endpoints like Redshift.
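
For illustration, a batch-capable processor might look something like the sketch below. The message types (ProcessEventBatch, BatchProcessed) and the staging logic are purely hypothetical and not part of the current API; the idea is just that the processor receives the whole sequence of records at once and loads it in a single staged-table round trip.

```scala
import akka.actor.{Actor, ActorLogging}

// Hypothetical message types, not part of the current reactive-kinesis API.
// A batch-capable ConsumerWorker would hand the processor every record it
// received from the shard in one message, instead of one message per record.
final case class ProcessEventBatch(payloads: Seq[String])
final case class BatchProcessed(successful: Boolean)

class RedshiftBatchProcessor extends Actor with ActorLogging {

  override def receive: Receive = {
    case ProcessEventBatch(payloads) =>
      // Load the whole batch in one round trip (e.g. COPY into a staging table,
      // then a single INSERT ... SELECT into the target), rather than issuing
      // an INSERT/UPDATE/DELETE per Kinesis record.
      val ok = stageAndMerge(payloads)
      sender() ! BatchProcessed(ok)
  }

  // Placeholder for the staged-table load.
  private def stageAndMerge(payloads: Seq[String]): Boolean = {
    log.info("Loading batch of {} records into staging table", payloads.size)
    true
  }
}
```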

@markglh
Contributor

markglh commented Aug 16, 2017

@etspaceman thanks for this. Need to think about how, but it's certainly doable.

How do you envisage the checkpointing working? Couple of options:

  • Checkpoint the batch as a whole (you're basically saying "I have processed all messages in this batch, move on to the next one"). This would be a bigger code change because it would mean reworking the checkpointing logic to allow this different path. Currently it checkpoints per message internally, and subsequently retries per message, so retries here would differ too; would we retry the whole batch? (A rough sketch of what this contract could look like follows the list.)
  • Checkpoint per message as it does now, so no checkpointing change.
  • Some other elaborate option?
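
To make option 1 concrete, here's a rough, purely hypothetical protocol sketch. None of these types exist in the library; they only show the shape a per-batch contract might take.

```scala
// Hypothetical sketch of option 1 (checkpoint per batch) - not the library's API.
object BatchCheckpointProtocol {

  // The worker sends the whole batch, tagged with the highest sequence number it contains.
  final case class ProcessBatch(payloads: Seq[String], highestSequenceNumber: String)

  // A single ack checkpoints everything up to and including that sequence number.
  final case class BatchProcessed(highestSequenceNumber: String)

  // A failure (or timeout) would cause the worker to retry the entire batch,
  // rather than retrying individual messages as it does today.
  final case class BatchFailed(highestSequenceNumber: String)
}
```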

@markglh
Contributor

markglh commented Aug 16, 2017

Apologies for taking a month to respond; I lost access to the project and so didn't get notified.

@etspaceman
Contributor Author

etspaceman commented Aug 16, 2017

Hey @markglh - I think we would need to be able to configure checkpointing either way (per batch or per message). Off the top of my head, the biggest benefit of being able to checkpoint / retry by batch is that it allows for transactional rollbacks against persisted data stores, e.g. Redshift, and a retry of that batch at a later point. If we were to checkpoint by message, I'm not sure we could accomplish this.

Users should also be able to use the existing retry strategy on batch loads, since they may not care whether the whole batch succeeded or only certain messages did.
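
As a sketch of the transactional-rollback idea (illustrative only; the JDBC URL, table names, and the shape of the batch handed to us by the consumer are all assumptions), the whole batch could be loaded inside one Redshift transaction so that a failure rolls everything back and the batch can be checkpointed, or retried, as a unit:

```scala
import java.sql.{Connection, DriverManager}

object RedshiftBatchLoader {

  // Illustrative only: load one consumer batch inside a single Redshift transaction.
  // On failure nothing is committed (and nothing would be checkpointed), so the
  // whole batch can simply be retried later.
  def loadBatchTransactionally(jdbcUrl: String, payloads: Seq[String]): Boolean = {
    val conn: Connection = DriverManager.getConnection(jdbcUrl)
    try {
      conn.setAutoCommit(false)

      // Stage every record in the batch with a single JDBC batch insert.
      val insert = conn.prepareStatement("INSERT INTO staging_events (payload) VALUES (?)")
      payloads.foreach { p =>
        insert.setString(1, p)
        insert.addBatch()
      }
      insert.executeBatch()

      // Merge from the staging table into the target table in the same transaction.
      val merge = conn.createStatement()
      merge.executeUpdate("INSERT INTO events (payload) SELECT payload FROM staging_events")
      merge.executeUpdate("DELETE FROM staging_events")

      conn.commit()
      true
    } catch {
      case _: Exception =>
        conn.rollback() // the batch is untouched and can be retried as a whole
        false
    } finally {
      conn.close()
    }
  }
}
```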
