kv: abort span access is expensive (2% cpu on oltp_read_write) #122719
Labels
A-kv-transactions
Relating to MVCC and the transactional model.
C-performance
Perf of queries or internals. Solution not expected to change functional behavior.
o-perf-efficiency
Related to performance efficiency
T-kv
KV Team
Comments
nvanbenschoten added the C-performance, A-kv-transactions, and T-kv labels on Apr 19, 2024
tbg changed the title from "kv: abort span access is expensive" to "kv: abort span access is expensive (2% cpu)" on Dec 2, 2024
tbg changed the title from "kv: abort span access is expensive (2% cpu)" to "kv: abort span access is expensive (2% cpu on oltp_read_write)" on Dec 2, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue on Dec 2, 2024
Informs cockroachdb#122719.

This commit updates `DeclareKeysForBatch` to only declare the abort span key when the transaction has acquired locks and will need to check the abort span. We were previously declaring the abort span key for all batches, even if we did not intend to check the abort span.

This is a short-term patch until we get around to reworking abort span access more thoroughly (see cockroachdb#122719).

```
name                                           old time/op    new time/op    delta
Sysbench/KV/1node_local/oltp_read_only-10       334µs ± 4%     325µs ± 4%   -2.63%  (p=0.035 n=10+9)
Sysbench/KV/1node_local/oltp_read_write-10      863µs ± 9%     860µs ±15%     ~     (p=0.661 n=10+9)
Sysbench/KV/1node_local/oltp_point_select-10   15.6µs ± 4%    15.9µs ±12%     ~     (p=0.529 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10     1.88ms ±26%    1.80ms ± 5%     ~     (p=1.000 n=10+9)
Sysbench/SQL/1node_local/oltp_read_write-10    4.22ms ± 5%    4.18ms ±11%     ~     (p=0.400 n=9+10)
Sysbench/SQL/1node_local/oltp_point_select-10   114µs ± 5%     120µs ±21%     ~     (p=0.796 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Sysbench/KV/1node_local/oltp_read_write-10      487kB ± 0%     484kB ± 1%   -0.55%  (p=0.011 n=8+9)
Sysbench/KV/1node_local/oltp_read_only-10       260kB ± 0%     259kB ± 1%   -0.50%  (p=0.011 n=8+10)
Sysbench/SQL/1node_local/oltp_point_select-10  27.8kB ± 0%    27.6kB ± 0%   -0.47%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10      878kB ± 0%     876kB ± 0%   -0.25%  (p=0.003 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10    1.25MB ± 1%    1.25MB ± 1%     ~     (p=0.146 n=10+8)
Sysbench/KV/1node_local/oltp_point_select-10   4.68kB ± 1%    4.68kB ± 2%     ~     (p=0.474 n=9+9)

name                                           old allocs/op  new allocs/op  delta
Sysbench/KV/1node_local/oltp_read_only-10         522 ± 0%       507 ± 0%   -2.72%  (p=0.000 n=10+10)
Sysbench/KV/1node_local/oltp_read_write-10      1.51k ± 0%     1.50k ± 0%   -0.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_point_select-10     238 ± 0%       237 ± 0%   -0.42%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10      4.76k ± 0%     4.74k ± 0%   -0.39%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10     7.55k ± 0%     7.53k ± 0%   -0.24%  (p=0.003 n=10+10)
Sysbench/KV/1node_local/oltp_point_select-10     29.0 ± 0%      29.0 ± 0%     ~     (all equal)
```

Release note: None
craig bot pushed a commit that referenced this issue on Dec 3, 2024
136523: kv: only declare abort span key when txn has locks r=nvanbenschoten a=nvanbenschoten

Informs #122719.

This commit updates `DeclareKeysForBatch` to only declare the abort span key when the transaction has acquired locks and will need to check the abort span. We were previously declaring the abort span key for all batches, even if we did not intend to check the abort span.

This is a short-term patch until we get around to reworking abort span access more thoroughly (see #122719).

```
name                                           old time/op    new time/op    delta
Sysbench/KV/1node_local/oltp_read_only-10       334µs ± 4%     325µs ± 4%   -2.63%  (p=0.035 n=10+9)
Sysbench/KV/1node_local/oltp_read_write-10      863µs ± 9%     860µs ±15%     ~     (p=0.661 n=10+9)
Sysbench/KV/1node_local/oltp_point_select-10   15.6µs ± 4%    15.9µs ±12%     ~     (p=0.529 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10     1.88ms ±26%    1.80ms ± 5%     ~     (p=1.000 n=10+9)
Sysbench/SQL/1node_local/oltp_read_write-10    4.22ms ± 5%    4.18ms ±11%     ~     (p=0.400 n=9+10)
Sysbench/SQL/1node_local/oltp_point_select-10   114µs ± 5%     120µs ±21%     ~     (p=0.796 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Sysbench/KV/1node_local/oltp_read_write-10      487kB ± 0%     484kB ± 1%   -0.55%  (p=0.011 n=8+9)
Sysbench/KV/1node_local/oltp_read_only-10       260kB ± 0%     259kB ± 1%   -0.50%  (p=0.011 n=8+10)
Sysbench/SQL/1node_local/oltp_point_select-10  27.8kB ± 0%    27.6kB ± 0%   -0.47%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10      878kB ± 0%     876kB ± 0%   -0.25%  (p=0.003 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10    1.25MB ± 1%    1.25MB ± 1%     ~     (p=0.146 n=10+8)
Sysbench/KV/1node_local/oltp_point_select-10   4.68kB ± 1%    4.68kB ± 2%     ~     (p=0.474 n=9+9)

name                                           old allocs/op  new allocs/op  delta
Sysbench/KV/1node_local/oltp_read_only-10         522 ± 0%       507 ± 0%   -2.72%  (p=0.000 n=10+10)
Sysbench/KV/1node_local/oltp_read_write-10      1.51k ± 0%     1.50k ± 0%   -0.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_point_select-10     238 ± 0%       237 ± 0%   -0.42%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_only-10      4.76k ± 0%     4.74k ± 0%   -0.39%  (p=0.000 n=10+10)
Sysbench/SQL/1node_local/oltp_read_write-10     7.55k ± 0%     7.53k ± 0%   -0.24%  (p=0.003 n=10+10)
Sysbench/KV/1node_local/oltp_point_select-10     29.0 ± 0%      29.0 ± 0%     ~     (all equal)
```

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
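For readers skimming the thread, the idea in the commit message above can be pictured with a small Go sketch. This is not the code from the PR: `span`, `txn`, the `hasLocks` field, `abortSpanKey`, and the key format are all hypothetical stand-ins, and the real condition in `DeclareKeysForBatch` may be expressed differently.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a declared key span.
type span struct{ key string }

// txn is a simplified transaction record. hasLocks is an assumed field
// meaning "this transaction has acquired locks".
type txn struct {
	id       string
	hasLocks bool
}

// abortSpanKey builds an illustrative key string; the real encoding of
// range-ID-local keys is different.
func abortSpanKey(rangeID int, txnID string) string {
	return fmt.Sprintf("/Local/RangeID/%d/AbortSpan/%s", rangeID, txnID)
}

// declareKeysForBatch returns the abort span key declared for a batch.
// The key is declared only when the transaction holds locks, since only
// then would the abort span be checked during evaluation.
func declareKeysForBatch(rangeID int, t *txn) []span {
	var spans []span
	if t != nil && t.hasLocks {
		spans = append(spans, span{key: abortSpanKey(rangeID, t.id)})
	}
	return spans
}

func main() {
	readOnly := &txn{id: "a"}
	writer := &txn{id: "b", hasLocks: true}
	fmt.Println(len(declareKeysForBatch(1, readOnly))) // no abort span key declared
	fmt.Println(len(declareKeysForBatch(1, writer)))   // abort span key declared
}
```

In this toy model a batch from a transaction without locks declares nothing extra, which is consistent with the benchmark deltas above showing up mainly on `oltp_read_only`.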
@nvanbenschoten and I saw this while looking at an (admittedly overloaded) 3x8vcpu tpcc-nowait wh=1000 cluster. Notably, this was running with #136523 as well as #122862, both of which in principle reduce abort span access.
The abort span (`pkg/kv/kvserver/abortspan`) is a mechanism that sets markers for aborted transactions. It protects against an aborted but still active transaction failing to read values it wrote because those intents have been removed.

The "span" is a slice of the range-id-local keyspace which is read on each `BatchRequest` that is part of a read-write transaction. The logic for this is here:

cockroach/pkg/kv/kvserver/replica_evaluate.go
Lines 208 to 227 in 55991cb

This is an additional LSM read per `BatchRequest`, which can be seen prominently in CPU profiles under `checkIfTxnAborted`, accounting for 3.59% of CPU time on write-heavy workloads: profile_abort_span.pb.gz

Some basic experimentation with the sysbench workload (`sysbench/oltp_write_only/nodes=7/cpu=16/conc=128`) demonstrates about a 2% increase in throughput by disabling this abort span read (i.e. not calling `checkIfTxnAborted`). This testing reveals the cost of the mechanism. Optimizations (up to and including disabling it) could provide up to this much benefit to throughput.

Given how significant this cost is and how much of an edge case the scenarios that the abort span is protecting against are, we should reevaluate whether there's something better that we can do here. Are there simple optimizations that could make this mechanism perform better? Could we make it a little weaker to avoid most of the cost? These questions are worthwhile to explore.

At a minimum, we should expose an option to disable these abort span checks.
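To make the per-batch cost and the proposed opt-out concrete, here is a minimal, self-contained Go sketch. It is a toy model rather than CockroachDB's implementation: `abortSpan`, `batch`, `checkAborted`, and the `disableCheck` flag are hypothetical names, and a map with a read counter stands in for the LSM lookup in the range-id-local keyspace.

```go
package main

import "fmt"

// txnID is a hypothetical stand-in for a transaction identifier.
type txnID string

// abortSpan is a toy model of the abort span: a set of markers for
// aborted transactions. A map replaces the storage engine here.
type abortSpan struct {
	aborted map[txnID]bool
	reads   int // lookups performed, standing in for LSM reads
}

// batch is a hypothetical, simplified BatchRequest.
type batch struct {
	txn       txnID // empty for non-transactional batches
	readWrite bool  // whether the batch belongs to a read-write transaction
}

// checkAborted models the per-batch check. disableCheck is an assumed
// option of the kind this issue asks for; it is not an existing setting.
func (a *abortSpan) checkAborted(b batch, disableCheck bool) error {
	if disableCheck || b.txn == "" || !b.readWrite {
		return nil // no lookup is performed
	}
	a.reads++ // one extra read for every qualifying batch
	if a.aborted[b.txn] {
		return fmt.Errorf("txn %s was aborted", b.txn)
	}
	return nil
}

func main() {
	as := &abortSpan{aborted: map[txnID]bool{"t1": true}}
	rw := batch{txn: "t1", readWrite: true}

	fmt.Println(as.checkAborted(rw, false)) // check enabled: lookup finds the marker
	fmt.Println(as.checkAborted(rw, true))  // check disabled: lookup skipped
	fmt.Println("lookups:", as.reads)
}
```

In this model, disabling the check removes the lookup entirely but also removes the protection: the aborted transaction `t1` is no longer told that it was aborted.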
Jira issue: CRDB-38032
Epic CRDB-40199