[CF-557] Changelog updates should wait for transaction commit #137
base: master
Conversation
(force-pushed from 3c24db3 to b8c6173)
…e writing out the changes, and include the ledger sequence in the changelog
…ices, propagate ledger sequence for each statement by storing it into the SQLiteWatchChange, also add a boolean for whether the changelog entry is from a transaction or not, and add another test case
```go
func NewChangelogEntry(seq int64, family string, table string, key []interface{}) *ChangelogEntry {
	return &ChangelogEntry{Seq: seq, Family: family, Table: table, Key: key}
}

func (w *ChangelogWriter) WriteChange(e ChangelogEntry) error {
	structure := struct {
```
I don't really get why we inline this struct definition instead of just putting the JSON tags on the ChangelogEntry.
This is a good question -- I don't get it either.
It reminds me that I'm unsure about why we have 2 separate structs for the changelog entries. Felt like I was having to sprinkle the same fields all over the place to get it through the pipes.
- writing (this code): ctlstore/pkg/changelog/changelog_writer.go, lines 30 to 36 in 3652065:

```go
func (w *ChangelogWriter) WriteChange(e ChangelogEntry) error {
	structure := struct {
		Seq    int64         `json:"seq"`
		Family string        `json:"family"`
		Table  string        `json:"table"`
		Key    []interface{} `json:"key"`
	}{
```

- reading: lines 3 to 11 in 3652065:

```go
// entry represents a single row in the changelog
// e.g.
// {"seq":1,"family":"fam","table":"foo","key":[{"name":"id","type":"int","value":1}]}
type entry struct {
	Seq    int64  `json:"seq"`
	Family string `json:"family"`
	Table  string `json:"table"`
	Key    []Key  `json:"key"`
}
```
I wonder if that's a consequence of calling the root of this pkg vs `internal`.
Feels like there should be just one definition of this data.
With the json annotations too.
FWIW, answers by Copilot:

"what would you suspect the reason is that the structure in the WriteChange function duplicates the ChangelogEntry? why would the developer who wrote the code not reuse the ChangelogEntry?"

The developer might have chosen to define a separate structure within the WriteChange function for a few reasons:

- Field tagging: the inline structure allows the developer to add struct tags (`json:"seq"`, `json:"family"`, etc.) which are used for JSON marshalling. These tags define how the struct fields are encoded to JSON, which is not specified in the ChangelogEntry struct.
- Encapsulation: the ChangelogEntry struct might be used in various parts of the codebase, and the developer might not want to tie its implementation too closely to the JSON encoding used in this specific function. By using a separate structure for the JSON encoding, changes to the ChangelogEntry struct will not directly impact the WriteChange function.
- Flexibility: if, in the future, the WriteChange function needs to include additional data or transform the data before writing, having a separate structure within the function makes it easier to do so without affecting the ChangelogEntry struct.

In general, this approach provides a level of decoupling between the ChangelogEntry data structure and the specific requirements of the WriteChange function.
```
@@ -114,7 +114,7 @@ type supervisorCliConfig struct {
type ledgerHealthConfig struct {
	Disable            bool          `conf:"disable" help:"disable ledger latency health attributing (DEPRECATED: use disable-ecs-behavior instead)"`
	DisableECSBehavior bool          `conf:"disable-ecs-behavior" help:"disable ledger latency health attributing"`
	MaxHealthyLatency  time.Duration `conf:"max-healty-latency" help:"Max latency considered healthy"`
```
One of the annoying aspects of this being OSS is that this is a change we can't verify doesn't break anything. Internally, I didn't see any hits for `healty` related to ctlstore, though.
I searched internally too -- it's a good call out that this could break something externally, but my belief is that no one is actually using this system outside of Segment, so I'm not too concerned.
That being said, I could be convinced to add an "unsupported" entry that has the typo'ed configuration name and then panics. i.e., with this change if someone had the typo'ed configuration knob it would just be silently ignored AFAIU.
Or even have logic to have that typo'ed configuration entry configure the "right" one and spit out a loud deprecation warning message. We don't have real releases so it's a bit unclear what the best strategy is.
I'm fine just doing a breaking change. It may not be great OSS stewardship, but I have a hard time believing there's even a single other user of ctlstore outside of Segment.
```go
	ChangeBuffer *sqlite.SQLChangeBuffer
	// Accumulated changes across multiple ApplyDMLStatement calls
	transactionChanges []sqlite.SQLiteWatchChange
```
should we consider a cap on the size of a transaction so we don't buffer too much?
Interesting proposal, issues around that are where my head was at as I was adding the metrics.
This consideration relates to the "support SoR transactions" project. We don't yet know how large SoR transactions can get, especially for those that we care about for that project. Putting some cap here with a loud complaint that we exceeded it (log + metric) and then dumping the currently buffered changes to the changelog (i.e., invoking callbacks) seems a fine behavior for now.
We know that the changes would not be greater than 200 right now based on hardcoded limits elsewhere & the behavior of `REPLACE INTO`, so perhaps just setting it to like 500 for now would be good. i.e.,

- `REPLACE INTO` is translated by SQLite into a `DELETE` op then an `INSERT` op ➡️ 2 changes.
- max of 100 entries in a ledger transaction:

ctlstore/pkg/executive/db_executive.go, lines 455 to 458 in 3652065:

```go
// Reject requests that are too large
if len(requests) > limits.LimitMaxMutateRequestCount {
	return &errs.PayloadTooLargeError{Err: "Number of requests exceeds maximum"}
}
```
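The cap-and-flush behavior proposed above (flush with a loud complaint once the buffer exceeds the limit) can be sketched as follows. The `buffer`/`change` types and the 500 limit are illustrative stand-ins, not the real ctlstore types:

```go
package main

import "fmt"

// maxBufferedChanges caps how many changes accumulate within one
// transaction before we flush early. 500 is the value floated in the
// review (2 SQLite changes per REPLACE INTO x 100 ledger entries, with
// headroom); it is purely illustrative here.
const maxBufferedChanges = 500

type change struct{ table string }

type buffer struct {
	changes []change
	flushed int // count of early flushes; a stand-in for a metric
}

// add buffers a change and flushes early, loudly, if the cap is hit.
func (b *buffer) add(c change) {
	b.changes = append(b.changes, c)
	if len(b.changes) >= maxBufferedChanges {
		fmt.Println("WARN: transaction change buffer exceeded cap; flushing early")
		b.flush()
	}
}

// flush empties the buffer; the real writer would invoke the changelog
// callbacks here.
func (b *buffer) flush() {
	b.changes = b.changes[:0]
	b.flushed++
}

func main() {
	var b buffer
	for i := 0; i < 501; i++ {
		b.add(change{table: "foo"})
	}
	// One early flush at 500; the 501st change starts a fresh buffer.
	fmt.Println(len(b.changes), b.flushed)
}
```

A companion metric on `flushed` would make it visible when real SoR transactions start exceeding the cap.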
that sounds good to me
This comment is not relevant to today's transactions since they are size limited but rather to the future when we are supporting SoR transactions. A sane number of statements within a transaction is smart, but by doing this we need to ensure this limit is well understood externally. Once we say we support transactions we need to be sure we really do.
🔥 Nice work Erik!
…ssibly premature optimization of updating passed-in slice; also remove metric for ldb_changes_accumulated in favor of just using the ldb_changes_written one
Description
CF-557
Ensure that we don't propagate changelog updates until after a transaction is committed.
Prior to this fix, every DML statement processed (even within a transaction) would immediately lead to a changelog event being written out. That's problematic when client applications see the changelog event occur, since the clients are racing the transaction commit when that event prompts them to read the LDB for the associated row.
We avoid this by detecting that a transaction has begun. When in a transaction we buffer the changes and wait for the transaction to end before propagating them to the changelog.
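The detect-buffer-flush flow described above can be sketched roughly like this; the `writer` type, the string-based statement detection, and the callback are simplified stand-ins for the real ctlstore writer:

```go
package main

import "fmt"

// writer buffers changes while a transaction is open and only hands them
// to the changelog callback once the transaction commits.
type writer struct {
	inTx    bool
	pending []string
	emit    func(string) // changelog callback
}

func (w *writer) apply(stmt string) {
	switch stmt {
	case "BEGIN":
		w.inTx = true
	case "COMMIT":
		w.inTx = false
		for _, c := range w.pending {
			w.emit(c) // propagate only after the commit
		}
		w.pending = nil
	default:
		if w.inTx {
			w.pending = append(w.pending, stmt) // buffer: clients must not see this yet
		} else {
			w.emit(stmt) // non-transactional statement: emit immediately
		}
	}
}

func main() {
	var out []string
	w := writer{emit: func(s string) { out = append(out, s) }}
	w.apply("BEGIN")
	w.apply("INSERT a")
	fmt.Println(len(out)) // nothing emitted while the transaction is open
	w.apply("COMMIT")
	fmt.Println(out)
}
```

The key property is that the callback fires strictly after commit, so a client that reacts to the changelog can never read the LDB ahead of the committed row.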
Details

- Buffer changes during transactions in `ApplyDMLStatement` (in `ldb_callback_writer.go`).
- Added a note to `ApplyDMLStatement` in `ldb_writer_with_changelog.go` about how that code isn't actually used.
- New changelog entry fields:
  - `ledgerSeq`: ledger sequence corresponding to the DML statement that led to this changelog entry
  - `tx`: boolean for whether the changelog entry is from a transaction

Minor fixes

- Added the `NewChangelogEntry` function
- Fixed config typo: `max-healty-latency` -> `max-healthy-latency`

TODO

- Verify that `REPLACE INTO` results in 2 SQLite changes like it does for so many of the real ledger statements we have.

Testing

Testing completed successfully if all of these are checked:

- Observed changelog output via `tail -F /var/spool/ctlstore/changelog` and a simple program that uses the Event Iterator.