This repository has been archived by the owner on Oct 23, 2023. It is now read-only.

Batch Consumer (Feature Request) #13

Open
etspaceman opened this issue Jul 15, 2017 · 3 comments

Comments

@etspaceman
Contributor

Hi There -

Something I'm struggling with when using this library is that I can only set up single-event processors for the Consumer via ConsumerWorker. This prevents me from using the library with something like a Redshift endpoint, which encourages users to batch their commands through a series of staged tables. I can't simply run an INSERT/UPDATE/DELETE command on Redshift for each Kinesis event, because doing so is incredibly slow.

If we were able to use an actor that received the sequence of events being processed here:

https://github.com/WW-Digital/reactive-kinesis/blob/master/src/main/scala/com/weightwatchers/reactive/kinesis/consumer/ConsumerWorker.scala#L327

We could batch the commands across all records received and efficiently update batch-oriented endpoints like Redshift.
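
For illustration, a batch-capable processor might look something like the sketch below. The message types (ProcessEventBatch, BatchProcessed) and the staging logic are purely hypothetical and not part of the current API; the idea is just that the processor receives the whole sequence of records at once and loads it in a single staged-table round trip.

```scala
import akka.actor.{Actor, ActorLogging}

// Hypothetical message types, not part of the current reactive-kinesis API.
// A batch-capable ConsumerWorker would hand the processor every record it
// received from the shard in one message, instead of one message per record.
final case class ProcessEventBatch(payloads: Seq[String])
final case class BatchProcessed(successful: Boolean)

class RedshiftBatchProcessor extends Actor with ActorLogging {

  override def receive: Receive = {
    case ProcessEventBatch(payloads) =>
      // Load the whole batch in one round trip (e.g. COPY into a staging table,
      // then a single INSERT ... SELECT into the target), rather than issuing
      // an INSERT/UPDATE/DELETE per Kinesis record.
      val ok = stageAndMerge(payloads)
      sender() ! BatchProcessed(ok)
  }

  // Placeholder for the staged-table load.
  private def stageAndMerge(payloads: Seq[String]): Boolean = {
    log.info("Loading batch of {} records into staging table", payloads.size)
    true
  }
}
```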

@markglh
Contributor

markglh commented Aug 16, 2017

@etspaceman thanks for this. Need to think about how, but it's certainly doable.

How do you envisage the checkpointing working? Couple of options:

  • Checkpoint the batch as a whole (you're basically saying "I have processed all messages in this batch, move on to the next one"). This would be a bigger code change because it would mean reworking the checkpointing logic to allow this different path. Currently it checkpoints per message internally, and subsequently retries per message, so retries here would differ too; would we retry the whole batch? (A rough sketch of what this contract could look like follows the list.)
  • Checkpoint per message as it does now, so no checkpointing change.
  • Some other elaborate option?
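
To make option 1 concrete, here's a rough, purely hypothetical protocol sketch. None of these types exist in the library; they only show the shape a per-batch contract might take.

```scala
// Hypothetical sketch of option 1 (checkpoint per batch) - not the library's API.
object BatchCheckpointProtocol {

  // The worker sends the whole batch, tagged with the highest sequence number it contains.
  final case class ProcessBatch(payloads: Seq[String], highestSequenceNumber: String)

  // A single ack checkpoints everything up to and including that sequence number.
  final case class BatchProcessed(highestSequenceNumber: String)

  // A failure (or timeout) would cause the worker to retry the entire batch,
  // rather than retrying individual messages as it does today.
  final case class BatchFailed(highestSequenceNumber: String)
}
```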

@markglh
Contributor

markglh commented Aug 16, 2017

Apologies for taking a month to respond; I lost access to the project and so didn't get notified.

@etspaceman
Contributor Author

etspaceman commented Aug 16, 2017

Hey @markglh - I think we would need to be able to configure checkpointing either way (per batch or per message). Off the top of my head, the biggest benefit of being able to checkpoint / retry by batch is that it allows for transactional rollbacks against persisted data stores, e.g. Redshift, and a retry of that batch at a later point. If we were to checkpoint by message, I'm not sure we could accomplish this.

Users should also be able to use the existing retry strategy on batch loads, since they may not care whether the whole batch succeeded or only certain messages did.
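
As a sketch of the transactional-rollback idea (illustrative only; the JDBC URL, table names, and the shape of the batch handed to us by the consumer are all assumptions), the whole batch could be loaded inside one Redshift transaction so that a failure rolls everything back and the batch can be checkpointed, or retried, as a unit:

```scala
import java.sql.{Connection, DriverManager}

object RedshiftBatchLoader {

  // Illustrative only: load one consumer batch inside a single Redshift transaction.
  // On failure nothing is committed (and nothing would be checkpointed), so the
  // whole batch can simply be retried later.
  def loadBatchTransactionally(jdbcUrl: String, payloads: Seq[String]): Boolean = {
    val conn: Connection = DriverManager.getConnection(jdbcUrl)
    try {
      conn.setAutoCommit(false)

      // Stage every record in the batch with a single JDBC batch insert.
      val insert = conn.prepareStatement("INSERT INTO staging_events (payload) VALUES (?)")
      payloads.foreach { p =>
        insert.setString(1, p)
        insert.addBatch()
      }
      insert.executeBatch()

      // Merge from the staging table into the target table in the same transaction.
      val merge = conn.createStatement()
      merge.executeUpdate("INSERT INTO events (payload) SELECT payload FROM staging_events")
      merge.executeUpdate("DELETE FROM staging_events")

      conn.commit()
      true
    } catch {
      case _: Exception =>
        conn.rollback() // the batch is untouched and can be retried as a whole
        false
    } finally {
      conn.close()
    }
  }
}
```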
