
fix: limit paginated queries by buffer size #69

Merged 2 commits into main from paginated-query-limit on Aug 14, 2024

Conversation

pvlugter (Contributor)

Refs #67

Query switching to and from backtracking relies on limited queries.

Will check this in tests against actual DynamoDB.

patriknw (Member) left a comment

good catch

} else {
createSerializedJournalItem(item, includePayload = true)
}
.mapConcat(_.items.iterator.asScala)
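
For context on the excerpt above, a minimal sketch of the idea behind the fix, assuming an AWS SDK v2 DynamoDbAsyncClient and Akka Streams as used in the plugin; the exact operator and placement in the PR may differ:

import scala.jdk.CollectionConverters._
import akka.NotUsed
import akka.stream.scaladsl.Source
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient
import software.amazon.awssdk.services.dynamodb.model.{ AttributeValue, QueryRequest, QueryResponse }

// Hedged sketch, not the PR's exact change: bound the item stream itself,
// since the paginator's per-page Limit does not bound the total result count.
def itemSource(
    client: DynamoDbAsyncClient,
    request: QueryRequest,
    bufferSize: Int): Source[java.util.Map[String, AttributeValue], NotUsed] =
  Source
    .fromPublisher(client.queryPaginator(request)) // one QueryResponse per page
    .mapConcat((response: QueryResponse) => response.items.iterator.asScala)
    .take(bufferSize) // cap the total number of items by the buffer size
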
Member

Wonder if we can go too far ahead of the backtracking window anyway. What if the last of the 1000 events (bufferSize) has a timestamp greater than the previous backtracking timestamp plus the backtracking window? In the query we have a toTimestamp, but for the normal query that is always based on the current time. Maybe that toTimestamp should also be limited by where the previous backtracking query got to.

I'm not sure this is needed since I think we always prioritize backtracking to catch up first.

Member

However, we must also be aware that there can be a period of time when there are no events at all, and that duration can be larger than the backtracking window.

pvlugter (Contributor, author), Aug 13, 2024

Yes, we could possibly get too far ahead with 1000 events, especially if the processing is slow enough.

Also thought about limiting the toTimestamp with the latest backtracking and window. But periods of time without events would need to be handled too, by moving forward in some way for empty results.

An alternative to limiting the end timestamp would be doing it at the returned event level. Similar to the pubsub skip-if-too-far-ahead behaviour: ignore events from the regular query if they're too far ahead of the backtracking window, and process them again once backtracking has caught up.
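
For illustration, a minimal sketch of the two alternatives mentioned here, with hypothetical names; neither is part of this PR:

import java.time.Instant
import scala.concurrent.duration.FiniteDuration
import scala.jdk.DurationConverters._

// Variant 1 (hypothetical): cap the regular query's toTimestamp so it cannot
// run more than the backtracking window ahead of the previous backtracking
// position. Empty results would still need a way to move the position forward.
def cappedToTimestamp(
    now: Instant,
    latestBacktracking: Instant,
    backtrackingWindow: FiniteDuration): Instant = {
  val cap = latestBacktracking.plus(backtrackingWindow.toJava)
  if (cap.isBefore(now)) cap else now
}

// Variant 2 (hypothetical): event-level check; an event for which this returns
// true would not be processed (and the query offset not advanced past it)
// until backtracking has caught up.
def tooFarAhead(
    eventTimestamp: Instant,
    latestBacktracking: Instant,
    backtrackingWindow: FiniteDuration): Boolean =
  eventTimestamp.isAfter(latestBacktracking.plus(backtrackingWindow.toJava))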

Member

That would mean that we would rely on backtracking catching the dropped events, and if there are many such dropped events that could make the performance situation even worse, since backtracking events don't contain the payload but will load the payload on demand when the event is accepted (not a duplicate).

pvlugter (Contributor, author)

It wouldn't be covered by backtracking, but by the next regular query, as backtracking goes up to the latest query timestamp. The regular query would just stop processing if it's ahead by the backtracking window, let backtracking catch up, then continue from where it was.

pvlugter (Contributor, author)

We probably do need this extra safeguard. I can look at it separately from the fix in this PR.

pvlugter (Contributor, author), Aug 14, 2024

Trying to come up with a realistic scenario for this.

So to have a backtracking out-of-sequence error, we need to have had a delayed write, missed by the regular query, and for this to be further behind the latest query timestamp than the backtracking window (2 minutes) on restart.

Noting that backtracking is only guaranteed to catch delayed events up to the backtracking behind-current-time, which is 10 seconds, when queries are keeping up-to-date. If events are delayed by more than 10 seconds, we could miss them regardless of restarts.

One way for the timestamp range in results to be more than the backtracking window is for the latest query timestamp to already be further behind, with events spread out enough. But if it's that far behind the write side (more than 2 minutes), we expect all delayed writes to already be there and to be seen by the regular query.

So to create this scenario, we need to have had queries that were fairly up-to-date originally (within 10 seconds), miss a delayed write on the regular queries, and then keep processing slowly enough that we get over 2 minutes ahead of that delayed write (and then restart before backtracking catches it). The toTimestamp is based on the current time before querying. The switch to backtracking happens when there are more items than 3 x buffer size, or when the query is more than half the backtracking window ahead. So processing would need to be slow enough to stretch out more than 2 minutes, but without triggering backtracking first.

This may be possible with DynamoDB: with the query streams being per slice and then merged for the slice range, the downstream projection processing could be much slower relative to the buffer size and handling for each stream. We've also possibly seen something similar with a customer using r2dbc when under-provisioned.

Also thinking that there could be an optimisation for when queries are falling behind. Because we're only catching delayed events up to 10 seconds behind when backtracking is up-to-date, the expectation is that we won't have delayed writes further behind than this. If queries are even further behind the current time, then backtracking is not needed and could be disabled, reducing load to allow the queries to catch up, and those queries will see all events anyway.
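
As a reference for the conditions in this scenario, a small sketch with hypothetical names, using the thresholds quoted in this thread (3 x buffer size, half the backtracking window, 10 seconds behind current time):

import java.time.{ Duration => JDuration, Instant }

// Switch to backtracking when more items than 3 x buffer size have been seen,
// or when the regular query is more than half the backtracking window ahead of
// the backtracking position.
def switchToBacktracking(
    itemsSinceBacktracking: Long,
    bufferSize: Int,
    latestQueryTimestamp: Instant,
    latestBacktrackingTimestamp: Instant,
    backtrackingWindow: JDuration): Boolean =
  itemsSinceBacktracking > 3L * bufferSize ||
  JDuration
    .between(latestBacktrackingTimestamp, latestQueryTimestamp)
    .compareTo(backtrackingWindow.dividedBy(2)) > 0

// Possible optimisation sketched above: if the regular query is already further
// behind current time than the backtracking behind-current-time threshold,
// delayed writes are expected to be visible to the regular query anyway, so a
// backtracking run could be skipped.
def backtrackingNeeded(
    now: Instant,
    latestQueryTimestamp: Instant,
    behindCurrentTime: JDuration): Boolean =
  JDuration.between(latestQueryTimestamp, now).compareTo(behindCurrentTime) <= 0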

Member

Maybe the async update of the secondary index could be more delayed with DynamoDB? Some pids more delayed than others within the same slice. Maybe we should have more tolerance with DynamoDB by increasing the default configuration for the 10 seconds and 2 minutes?

pvlugter (Contributor, author)

Haven't seen any issues with secondary index delays. But should probably add some specific instrumentation for that, checking if we ever catch events with backtracking in large tests.

Tests so far with the fix in this PR all look good, with falling behind and projection restarts not causing issues. I'll move these ideas to a separate issue, to explore safeguards against any edge cases we can think up, as well as possible optimisations.

pvlugter (Contributor, author)

Moved to #70.

@@ -310,19 +310,20 @@ import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest
.keyConditionExpression(s"$EntityTypeSlice = :entityTypeSlice AND $EventTimestamp BETWEEN :from AND :to")
.expressionAttributeValues((attributeValues ++ filterAttributeValues).asJava)
// Limit won't limit the number of results you get with the paginator.
Member

at least I noticed that the limit is about pagination 😄
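
To spell out the behaviour behind that code comment, a hedged illustration of the SDK's Limit semantics (not the plugin's code):

import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient
import software.amazon.awssdk.services.dynamodb.model.QueryRequest

// QueryRequest.limit caps a single page only, and the async paginator
// transparently follows LastEvaluatedKey to fetch further pages, so the total
// result count is not bounded by it. The overall bound has to be applied on
// the consuming stream.
def pagedQuery(client: DynamoDbAsyncClient, baseRequest: QueryRequest, pageSize: Int) = {
  val request = baseRequest.toBuilder.limit(pageSize).build()
  client.queryPaginator(request) // publisher of pages; may emit many pages
}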

pvlugter (Contributor, author)

Testing of this looks good.

pvlugter marked this pull request as ready for review on August 14, 2024, 01:06.
pvlugter (Contributor, author)

Buffer size back to 100 as before. Since these queries are per slice and then merged, the buffer size probably shouldn't be too big, as this is multiplied for each projection instance.
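
For a rough sense of scale, assuming the fixed 1024 slices and an example deployment of 4 projection instances (the instance count is an assumption, not from this PR):

val totalSlices = 1024           // fixed number of slices in Akka Persistence queries
val projectionInstances = 4      // example deployment size (assumption)
val bufferSize = 100             // query buffer size from this PR
val slicesPerInstance = totalSlices / projectionInstances  // 256 per-slice streams merged per instance
val worstCaseBuffered = slicesPerInstance * bufferSize     // up to 25,600 buffered items per instance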

patriknw (Member) left a comment

LGTM

pvlugter merged commit 80ecc99 into main on Aug 14, 2024
3 checks passed
pvlugter deleted the paginated-query-limit branch on August 14, 2024 23:46