CVR Strategy #8

tantaman · 2023-11-09T21:59:40Z

tantaman
Nov 9, 2023
Maintainer

CVR Definition

CVR: Client View Record. Background: https://doc.replicache.dev/strategies/row-version

What we're trying to capture with a CVR is a record of all data that the server has ever sent to the client.

The point of this is so that the server can exclude data it has previously sent the client from future sends. In Replicache, clients get new data by pulling from the server.

What the CVR is doing, on each pull, is the following:

-- initial pull
SELECT id, version FROM foo
-- pull 2
SELECT id, version FROM foo EXCEPT _what was sent in pull1_
-- pull 3
SELECT id, version FROM foo EXCEPT _what was sent in pull1 + pull2_
-- pull 4
SELECT id, version FROM foo EXCEPT _what was sent in pull1 + pull2 + pull3_
-- etc...

The contents of the CVR are what gets excluded (EXCEPTed) from the result set.
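As a rough illustration of the EXCEPT step, here is a minimal sketch that uses in-memory Python sets in place of a real table and a real CVR. The names `rows_to_send`, `foo`, and `cvr` are hypothetical and exist only for this example.

```python
def rows_to_send(table_rows, cvr):
    """Rough equivalent of: SELECT id, version FROM foo EXCEPT <contents of CVR>."""
    return set(table_rows) - set(cvr)


# Hypothetical stand-ins: `foo` is the table as (id, version) pairs,
# `cvr` is the set of (id, version) pairs already sent to this client.
foo = {("a", 1), ("b", 1)}
cvr = set()

# Initial pull: the CVR is empty, so every row is new to the client.
pull1 = rows_to_send(foo, cvr)
cvr |= pull1

# Row b is updated and row c is inserted before the next pull.
foo = {("a", 1), ("b", 2), ("c", 1)}

# Pull 2: only rows that are not already recorded in the CVR are sent.
pull2 = rows_to_send(foo, cvr)
cvr |= pull2
```

Row `a` is not resent in the second pull because its `(id, version)` pair is already in the CVR.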

For the except step to work correctly, one of the following must hold:

Every new CVR must include everything that was in the prior CVR: CVR_n = data_from_this_pull + CVR_(n-1)
Or each CVR represents only the current pull, and we check all historical CVRs on each pull
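The two bookkeeping options should produce the same patches. The sketch below compares them on a few hypothetical table snapshots; `pull_cumulative` and `pull_with_history` are illustrative names, not part of any library.

```python
def pull_cumulative(table_rows, prev_cvr):
    """Option 1: CVR_n = data_from_this_pull + CVR_(n-1)."""
    patch = set(table_rows) - prev_cvr
    return patch, prev_cvr | patch


def pull_with_history(table_rows, history):
    """Option 2: each CVR holds a single pull; every pull checks all of them."""
    already_sent = set().union(*history)
    patch = set(table_rows) - already_sent
    return patch, history + [patch]


# Hypothetical state of the table at three successive pulls, as (id, version) pairs.
snapshots = [
    {("a", 1)},
    {("a", 1), ("b", 1)},
    {("a", 2), ("b", 1)},
]

cvr, history = set(), []
cumulative_patches, history_patches = [], []
for rows in snapshots:
    patch, cvr = pull_cumulative(rows, cvr)
    cumulative_patches.append(patch)
    patch, history = pull_with_history(rows, history)
    history_patches.append(patch)
```

Option 1 reads a single record per pull, while option 2 has to read every historical record, so the cost of option 2 grows with the number of pulls.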

Note that we also want to EXCEPT in the opposite direction. To detect rows that were deleted we need to diff the contents of the CVR with what currently exists in the database.

Note that in both cases we can make an optimization by keeping only one entry per row in the CVR. In other words, the CVR doesn't need to record every version of every row that was sent, only the latest version of each row sent down. A CVR missing a row is the same as the CVR showing the row at an earlier version than what the database has.
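A minimal sketch of the two-directional diff, with the CVR stored as one entry per row (id mapped to the latest version sent). It assumes versions are comparable integers; `diff_against_cvr` is a hypothetical helper.

```python
def diff_against_cvr(db, cvr):
    """Both arguments map row id -> version. Returns (puts, dels, next_cvr)."""
    # Forward direction: rows the CVR lacks, or holds at an earlier version.
    puts = {
        rid: version
        for rid, version in db.items()
        if rid not in cvr or cvr[rid] < version
    }
    # Opposite direction: rows in the CVR that no longer exist in the database.
    dels = sorted(rid for rid in cvr if rid not in db)
    # Next CVR: prior CVR plus what was just sent, minus deleted rows.
    next_cvr = {**cvr, **puts}
    for rid in dels:
        del next_cvr[rid]
    return puts, dels, next_cvr


# Hypothetical state: the client has a, b, c; the database now has a, b (updated), d.
cvr = {"a": 1, "b": 1, "c": 3}
db = {"a": 1, "b": 2, "d": 1}

puts, dels, next_cvr = diff_against_cvr(db, cvr)
```

Here `b` is resent because the CVR holds an earlier version, `d` is sent because the CVR has no entry for it, and `c` is reported as deleted.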

Why not use cursors?

Why not sync like:

SELECT * FROM foo WHERE version > ?last_pulled_version

where each row has a version that comes from a universal and monotonically increasing source.

The issue is that data can come into view without ever being modified.

Consider this query:

SELECT todo.* FROM todo JOIN filter ON filter.owner = ?userid AND filter.status = todo.status WHERE todo.version > ?last_pulled_version;

We're syncing todos, but only those that match a given filter. Suppose this query returns the todo with the largest version, and then the user changes their filter to some other status. The todos that match the new status were not modified, so their versions are at or below the cursor. Since we've used a cursor, we won't see the todos that have come into view and will never sync them down.
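The failure can be reproduced with a few in-memory rows. This is a sketch under assumed data: the `todos` list and the helpers `visible`, `cursor_pull`, and `cvr_pull` are hypothetical and only mimic the query above.

```python
# Hypothetical todo table. No row is modified during the example.
todos = [
    {"id": 1, "status": "open", "version": 1},
    {"id": 2, "status": "done", "version": 2},
    {"id": 3, "status": "open", "version": 3},
]


def visible(status):
    """Rows matching the user's current filter, as id -> version."""
    return {t["id"]: t["version"] for t in todos if t["status"] == status}


def cursor_pull(status, last_pulled_version):
    """Cursor-based sync: only rows with version > cursor are returned."""
    return {i: v for i, v in visible(status).items() if v > last_pulled_version}


def cvr_pull(status, cvr):
    """CVR-based sync: rows the CVR lacks or holds at a different version."""
    return {i: v for i, v in visible(status).items() if cvr.get(i) != v}


# First pull with the filter set to "open".
first = cursor_pull("open", 0)
cursor = max(first.values())
cvr = dict(first)

# The user switches the filter to "done". Todo 2 comes into view unmodified.
missed_by_cursor = cursor_pull("done", cursor)
found_by_cvr = cvr_pull("done", cvr)
```

Todo 2 has version 2, which is below the cursor value of 3, so the cursor-based pull returns nothing while the CVR-based pull returns it.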

CVRs and client driven sync

Ideally the client drives sync by specifying the queries it is interested in. These queries will change over time as a client navigates through an application.

Since the server needs to know about all the data the client has, we should union all CVRs produced by all pulls sent to a given client group.

Note that CVRs must be versioned. We can't be certain that the client actually received the data that was part of a CVR. The only way we know that the client did receive this data is if the client, on its next pull, sends a CVR version equal to the last CVR version we sent it. If the client sends an older CVR version, we wind back to that CVR on the server for that pull.
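One way to picture CVR versioning is a store that keeps every CVR it has produced and diffs against whichever version the client reports. This is only a sketch: `CVRStore` is hypothetical, versions are a simple counter, and treating an unknown version as an empty CVR (a full resync) is an assumption made for the example.

```python
class CVRStore:
    """Hypothetical per-client-group store of versioned CVRs (row id -> version)."""

    def __init__(self):
        self.cvrs = {0: {}}  # version 0: nothing has been sent yet
        self.next_version = 1

    def pull(self, client_cvr_version, db):
        # Diff against the CVR the client reports, not the newest one we produced.
        # Assumption: an unknown version is treated as an empty CVR.
        base = self.cvrs.get(client_cvr_version, {})
        puts = {rid: v for rid, v in db.items() if base.get(rid) != v}
        dels = sorted(rid for rid in base if rid not in db)
        # After applying puts and dels, the client's view equals `db`.
        version = self.next_version
        self.next_version += 1
        self.cvrs[version] = dict(db)
        return version, puts, dels


store = CVRStore()

# Pull 1 produces CVR 1, but assume the response never reaches the client.
v1, puts1, dels1 = store.pull(0, {"a": 1})

# The client pulls again and still reports CVR 0, so row a is sent again.
v2, puts2, dels2 = store.pull(0, {"a": 1, "b": 1})

# This time the client received the response and reports CVR 2.
v3, puts3, dels3 = store.pull(v2, {"a": 1, "b": 1})
```

The sketch never reuses a version number, so a late-arriving response for an abandoned CVR cannot be confused with a newer one. A real implementation would also need to expire old CVRs.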

Implementing a CVR in Postgres

todo: document your findings related to join performance, space saving by overwriting versions, delete culling, etc.

Performance (time)

todo: document query tricks to make this fast as well as how client driven queries make perf a minimal to no problem

Performance (space)

todo: envelope math on increased data weight
