This repository has been archived by the owner on Oct 23, 2023. It is now read-only.
Hi There -

Something that I'm struggling with when using this library is that I can only set up single-event processors for the Consumer via ConsumerWorker. This limits me from using this library for something like a Redshift endpoint, which encourages users to batch their commands through a series of staged tables. I cannot simply run INSERT/UPDATE/DELETE commands on Redshift when reacting to Kinesis events, because per-row operations are incredibly slow.

If we were able to use an actor that received the sequence of events being processed here:

https://github.com/WW-Digital/reactive-kinesis/blob/master/src/main/scala/com/weightwatchers/reactive/kinesis/consumer/ConsumerWorker.scala#L327

we could batch the commands across all records received and efficiently update batch-oriented endpoints like Redshift.
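As a rough sketch of the idea (the names and table layout below are illustrative assumptions, not the library's API): a handler that received the whole batch could build a single multi-row statement for a staging table instead of issuing one statement per record.

```scala
// Hypothetical sketch, not part of reactive-kinesis: given a batch of decoded
// Kinesis records, emit ONE multi-row INSERT for a Redshift staging table
// rather than one statement per record. `Event` is an illustrative type.
final case class Event(id: String, payload: String)

def batchInsert(table: String, batch: Seq[Event]): String = {
  val rows = batch
    .map(e => s"('${e.id}', '${e.payload}')") // real code must escape/bind values
    .mkString(", ")
  s"INSERT INTO $table (id, payload) VALUES $rows;"
}
```

A batch-aware worker could accumulate the records it is handed and flush one such statement per batch, which is where the speed-up over per-event processing comes from.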
@etspaceman thanks for this. I need to think about how, but it's certainly doable.

How do you envisage the checkpointing working? A couple of options:

1. Checkpoint the batch as a whole (you're basically saying "I have processed all messages in this batch, move on to the next one"). This would be a bigger code change, because it means reworking the checkpointing logic to allow this different path. Currently it checkpoints per message internally, and subsequently retries per message. So retries here would differ: retry the whole batch?
2. Checkpoint per message as it does now, so no checkpointing change.
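The two options above could be modelled as a configurable strategy; a minimal sketch, assuming hypothetical names that are not part of the library:

```scala
// Hypothetical modelling of the two checkpointing options discussed above.
sealed trait CheckpointStrategy
case object PerMessage extends CheckpointStrategy // option 2: current behaviour
case object PerBatch   extends CheckpointStrategy // option 1: checkpoint once per batch

// The retry unit follows from the strategy: a failure under PerBatch
// re-processes the whole batch; under PerMessage, only the failing message.
def retryUnit(strategy: CheckpointStrategy, batchSize: Int): Int =
  strategy match {
    case PerMessage => 1
    case PerBatch   => batchSize
  }
```

Branching on such a flag inside the worker is one way the "different path" could be introduced without removing the existing per-message behaviour.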
Hey @markglh - I think we would need to make checkpointing configurable so that users can choose either approach. Off the top of my head, the biggest benefit of being able to checkpoint / retry by batch is that it allows transactional rollbacks on persisted data stores (i.e. Redshift) and a retry of that batch at a later point. If we were to checkpoint by message, I'm not sure we could accomplish this.
Users should be able to use the existing retry strategy on batch loads as well, since they may not care whether the whole batch succeeded or only certain messages did.
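To illustrate why batch checkpointing enables the transactional-rollback case: apply the whole batch inside one transaction and only checkpoint on commit. This is a hedged sketch; `runTransaction` and `checkpoint` are caller-supplied stand-ins, not library functions.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch of batch checkpointing with transactional rollback:
// the batch is all-or-nothing, so checkpointing is only safe after commit.
def processBatch[A](batch: Seq[A])(
    runTransaction: Seq[A] => Try[Unit],
    checkpoint: () => Unit): Boolean =
  runTransaction(batch) match {
    case Success(_) =>
      checkpoint() // safe: the whole batch is durably committed
      true
    case Failure(_) =>
      false // transaction rolled back; a retry strategy can replay the batch later
  }
```

Under per-message checkpointing there is no equivalent safe point: a checkpoint after message N asserts messages 1..N are durable, which a rolled-back batch transaction would violate.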