-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(core): Primary key index scans and single-column secondary index scans #350
Conversation
023f7f6
to
8c141a9
Compare
Copypasting from discord: sqlite compat is failing due to limbo returning '1' here, while sqlite returns '182': sqlite> SELECT id from users limit 1; For some reason, sqlite3 chooses to use age_idx here. (much later edit: I think it's an optimization because the index is smaller due to only containing 2 columns so less IO)
This is sort of a tricky compatibility test to fix because what the fuck is the heuristic that makes it use the age index to get id from a single row EDIT: I made a workaround for this by changing the test to be:
because it is more resilient to sqlite planner idiosyncrasies EDIT2: I did the same for another compat test where it also decided to use |
1d3a683
to
d3e797f
Compare
831452e
to
01cfcb4
Compare
472dea4
to
15a66ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewing from phone so small nits
core/storage/btree.rs
Outdated
let comparison = match cmp { | ||
SeekOp::GT => rowid_key <= *_rowid, | ||
SeekOp::GE => rowid_key < *_rowid, | ||
SeekOp::EQ => rowid_key < *_rowid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be == ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is correct because if we're trying to find a record in a leaf node where key == rowid
, then we need to go to left_child_page
when ´key < interior_cell_rowid´, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait nvm, if interior rowid is 10
then also rowid 10
is in the left child page (i.e. <=
not <
). I remembered that wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer your question: the updated comparisons are now:
SeekOp::GT => rowid_key < *_rowid,
SeekOp::GE => rowid_key <= *_rowid,
SeekOp::EQ => rowid_key <= *_rowid,
Explanation:
Assume the search key is: rowid 100, and we want to find exactly that leaf node (EQ)
if internal node rowid is over 100, we need to go left. if its exactly 100, we need to go left. otherwise we need to go right. hence <=
Assume the search key is: rowid 100, and we want to find the first leaf node that is greater or equal to our search key (GE)
if internal node rowid is over 100, we need to go left, because greater or equal keys are on the left. if its exactly 100, we need to go left, because a leaf with rowid 100 is on the left. otherwise we need to go right. hence <=
Assume the search key is: rowid 100, and we want to find the first leaf node that is GREATER than our search key (GT)
if internal node rowid is over 100, we need to go left, because greater keys exist on the left. if its exactly 100 or less, we need to go right, because there are no leaves with rowid>100 on the left. hence <
core/storage/btree.rs
Outdated
let comparison = match cmp { | ||
SeekOp::GT => index_key <= &record, | ||
SeekOp::GE => index_key < &record, | ||
SeekOp::EQ => index_key < &record, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explained above
core/storage/btree.rs
Outdated
mem_page.advance(); | ||
let record = crate::storage::sqlite3_ondisk::read_record(payload)?; | ||
let rowid = match record.values.last() { | ||
Some(OwnedValue::Integer(rowid)) => *rowid as u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't index keys also blobs/strings/whatever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are getting the rowid here and AFAIK it's always the last value in the payload record
Another weird buggerino I introduced in this PR:
doesnt return a row (3815 does, etc). so btree traversal is probably wrong in multiple ways as @pereman2 alluded to this works correctly:
so something is up with the EDIT: think i got a fix for these, but found more bugs, stay tuned 😂 EDIT2: the problem is that there are so few actual page changes with this amount of data that a lot of the existing tests just didnt hit any of those boundaries |
done for today... i added some regression tests, with one failing: theres a single users row with rowid
getting a bit lost, probably getting back to this next weekend or something EDIT: i think the issue is that since index interior cells also contain payloads, when scanning thru the index we need to first 1. iterate all the child keys and return them, and THEN: 2. also return the key from the parent |
9629dc5
to
15a66ea
Compare
45cd116
to
e5cf052
Compare
interior index cells have values that are not in the leaves, e.g. (interior: 3) / \ (leaf: 2) (leaf: 4) so their values need to be emitted after the left subtree is emitted.
6283963
to
fc71f2b
Compare
yeah so the issue was that the index was not traversed properly. index interior cells have payloads, and those payloads do not exist in the leaves (unlike in table btrees where an interior cell and a leaf cell may share the same rowid). this commit addresses that: penberg@fc71f2b basically we need to traverse the left subtree first (returning all the values from the subtree in order), THEN return the value from the interior cell, and then advance to the next interior cell. |
core/storage/btree.rs
Outdated
unreachable!("index seek key should be a record"); | ||
}; | ||
let record = crate::storage::sqlite3_ondisk::read_record(payload)?; | ||
let comparison = match cmp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we rename comparison to found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to found
in seek()
and to target_leaf_page_is_in_left_subtree
in move_to()
core/storage/btree.rs
Outdated
if comparison { | ||
let rowid = match record.values.get(1) { | ||
Some(OwnedValue::Integer(rowid)) => *rowid as u64, | ||
_ => unreachable!("index cells should have an integer rowid"), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this part. We assume record.values.get(1)
is the rowid. But I imagine there could be indexes with multiple columns right? Shouldn't the rowid be the last value? Furthermore, should we maybe change rowid name to key or something similar because it can be eitther rowid and primary key I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a bug i forgot to refactor. it should be last
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in penberg@93a8110
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the rowid naming, copypasted from discord:
I think i will leave the rowid naming because it reflects current reality better, doesnt our code pretty much assume we dont have WITHOUT ROWID tables right now? And I think WITHOUT ROWID tables are pretty much btree indexes in that they store everything in the key area? https://www.sqlite.org/withoutrowid.html#:~:text=A%20WITHOUT%20ROWID%20table%20uses,binary%20search%20on%20the%20rowid.
…sult set is already ordered' from Jussi Saurio Built on top of, and currently targets, #350, _not_ main Closes #365 Examples: ``` limbo> explain select u.first_name, p.name from users u join products p on u.id = p.id order by u.id; addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 15 0 0 Start at 15 1 OpenReadAsync 0 2 0 0 table=u, root=2 2 OpenReadAwait 0 0 0 0 3 OpenReadAsync 1 3 0 0 table=p, root=3 4 OpenReadAwait 0 0 0 0 5 RewindAsync 0 0 0 0 6 RewindAwait 0 14 0 0 Rewind table u 7 RowId 0 1 0 0 r[1]=u.rowid 8 SeekRowid 1 1 12 0 if (r[1]!=p.rowid) goto 12 9 Column 0 1 2 0 r[2]=u.first_name 10 Column 1 1 3 0 r[3]=p.name 11 ResultRow 2 2 0 0 output=r[2..3] 12 NextAsync 0 0 0 0 13 NextAwait 0 7 0 0 14 Halt 0 0 0 0 15 Transaction 0 0 0 0 16 Goto 0 1 0 0 ``` ``` limbo> explain select * from users where age > 80 order by age limit 5; addr opcode p1 p2 p3 p4 p5 comment ---- ----------------- ---- ---- ---- ------------- -- ------- 0 Init 0 23 0 0 Start at 23 1 OpenReadAsync 0 2 0 0 table=users, root=2 2 OpenReadAwait 0 0 0 0 3 OpenReadAsync 1 274 0 0 table=age_idx, root=274 4 OpenReadAwait 0 0 0 0 5 Integer 80 1 0 0 r[1]=80 6 SeekGT 1 22 1 0 7 DeferredSeek 1 0 0 0 8 RowId 0 2 0 0 r[2]=users.rowid 9 Column 0 1 3 0 r[3]=users.first_name 10 Column 0 2 4 0 r[4]=users.last_name 11 Column 0 3 5 0 r[5]=users.email 12 Column 0 4 6 0 r[6]=users.phone_number 13 Column 0 5 7 0 r[7]=users.address 14 Column 0 6 8 0 r[8]=users.city 15 Column 0 7 9 0 r[9]=users.state 16 Column 0 8 10 0 r[10]=users.zipcode 17 Column 0 9 11 0 r[11]=users.age 18 ResultRow 2 10 0 0 output=r[2..11] 19 DecrJumpZero 12 22 0 0 if (--r[12]==0) goto 22 20 NextAsync 1 0 0 0 21 NextAwait 1 7 0 0 22 Halt 0 0 0 0 23 Transaction 0 0 0 0 24 Integer 5 12 0 0 r[12]=5 25 Goto 0 1 0 0 ``` Closes #366
data does not match predicate when using index, e.g: `select id, age from users where age > 90 limit 1;` will return data with age 90 the reason is that the current index seek directly uses record for comparison, but the record of the index itself is longer than the record of the key (because it contains the primary key), so Gt is invalid. since only single-column indexes are currently supported: #350, only the first value of the record is currently used for comparison. Reviewed-by: Jussi Saurio <[email protected]> Closes #593
This PR adds an index on
users.age
totesting.db
, and support for indexed lookups. Only single-column ascending indexes are currently supported.This PR also gets rid of
Operator::Seekrowid
in favor ofOperator::Search
which handles all non-full-table-scan searches: 1. integer primary key (rowid) point queries 2. integer primary key index scans, and 3. secondary index scans.examples:
Sqlite version:
´Seek` instructions are also now supported for primary key rowid searches:
sqlite:
More complex example with a join that uses both a rowid lookup and a secondary index scan:
sqlite: