Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(core): Primary key index scans and single-column secondary index scans #350

Closed
wants to merge 32 commits into from

Conversation

jussisaurio
Copy link
Collaborator

@jussisaurio jussisaurio commented Sep 29, 2024

This PR adds an index on users.age to testing.db, and support for indexed lookups. Only single-column ascending indexes are currently supported.

This PR also gets rid of Operator::Seekrowid in favor of Operator::Search which handles all non-full-table-scan searches: 1. integer primary key (rowid) point queries 2. integer primary key index scans, and 3. secondary index scans.

examples:

limbo> select first_name, age from users where age > 90 limit 10;
Miranda|90
Sarah|90
Justin|90
Justin|90
John|90
Jeremy|90
Stephanie|90
Joshua|90
Jenny|90
Jennifer|90

limbo> explain query plan select first_name, age from users where age > 90 limit 10;
QUERY PLAN
`--TAKE 10
   `--PROJECT first_name, age
   |  `--SEARCH users USING INDEX age_idx

limbo> explain select first_name, age from users where age > 90 limit 10;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     15    0                    0   Start at 15
1     OpenReadAsync      0     2     0                    0   table=users, root=2
2     OpenReadAwait      0     0     0                    0   
3     OpenReadAsync      1     274   0                    0   table=age_idx, root=274
4     OpenReadAwait      0     0     0                    0   
5     Integer            90    1     0                    0   r[1]=90
6     SeekGT             1     14    1                    0   
7       DeferredSeek     1     0     0                    0   
8       Column           0     1     2                    0   r[2]=users.first_name
9       Column           0     9     3                    0   r[3]=users.age
10      ResultRow        2     2     0                    0   output=r[2..3]
11      DecrJumpZero     4     14    0                    0   if (--r[4]==0) goto 14
12    NextAsync          1     0     0                    0   
13    NextAwait          1     7     0                    0   
14    Halt               0     0     0                    0   
15    Transaction        0     0     0                    0   
16    Integer            10    4     0                    0   r[4]=10
17    Goto               0     1     0                    0

Sqlite version:

sqlite> explain select first_name, age from users where age > 90 limit 10;
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     13    0                    0   Start at 13
1     Integer        10    1     0                    0   r[1]=10; LIMIT counter
2     OpenRead       0     2     0     10             0   root=2 iDb=0; users
3     OpenRead       1     274   0     k(2,,)         0   root=274 iDb=0; age_idx
4     Integer        90    2     0                    0   r[2]=90
5     SeekGT         1     12    2     1              0   key=r[2]
6       DeferredSeek   1     0     0                    0   Move 0 to 1.rowid if needed
7       Column         0     1     3                    0   r[3]= cursor 0 column 1
8       Column         1     0     4                    0   r[4]= cursor 1 column 0
9       ResultRow      3     2     0                    0   output=r[3..4]
10      DecrJumpZero   1     12    0                    0   if (--r[1])==0 goto 12
11    Next           1     6     0                    0   
12    Halt           0     0     0                    0   
13    Transaction    0     0     3     0              1   usesStmtJournal=0
14    Goto           0     1     0                    0 

´Seek` instructions are also now supported for primary key rowid searches:

limbo> select id, first_name from users where id > 9995;
9996|Donald
9997|Ruth
9998|Dorothy
9999|Gina
10000|Nicole

limbo> explain query plan select id, first_name from users where id > 9995;
QUERY PLAN
`--PROJECT id, first_name
   `--SEARCH users USING INTEGER PRIMARY KEY (rowid=?)

limbo> explain select id, first_name from users where id > 9995;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     11    0                    0   Start at 11
1     OpenReadAsync      0     2     0                    0   table=users, root=2
2     OpenReadAwait      0     0     0                    0   
3     Integer            9995  1     0                    0   r[1]=9995
4     SeekGT             0     10    1                    0   
5       RowId            0     2     0                    0   r[2]=users.rowid
6       Column           0     1     3                    0   r[3]=users.first_name
7       ResultRow        2     2     0                    0   output=r[2..3]
8     NextAsync          0     0     0                    0   
9     NextAwait          0     5     0                    0   
10    Halt               0     0     0                    0   
11    Transaction        0     0     0                    0   
12    Goto               0     1     0                    0 

sqlite:

sqlite> explain select id, first_name from users where id > 9995;
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     8     0                    0   Start at 8
1     OpenRead       0     2     0     2              0   root=2 iDb=0; users
2     SeekGT         0     7     1                    0   key=r[1]; pk
3       Rowid          0     2     0                    0   r[2]=users.rowid
4       Column         0     1     3                    0   r[3]= cursor 0 column 1
5       ResultRow      2     2     0                    0   output=r[2..3]
6     Next           0     3     0                    0   
7     Halt           0     0     0                    0   
8     Transaction    0     0     3     0              1   usesStmtJournal=0
9     Integer        9995  1     0                    0   r[1]=9995
10    Goto           0     1     0                    0 

More complex example with a join that uses both a rowid lookup and a secondary index scan:

limbo> explain query plan select u.first_name, p.name from users u join products p on u.id = p.id and u.age > 70;
QUERY PLAN
`--PROJECT u.first_name, p.name
   `--JOIN
   |  |--SEARCH u USING INDEX age_idx
   |  `--SEARCH p USING INTEGER PRIMARY KEY (rowid=?)

limbo> explain select u.first_name, p.name from users u join products p on u.id = p.id and u.age > 70;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     18    0                    0   Start at 18
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0   
3     OpenReadAsync      1     274   0                    0   table=age_idx, root=274
4     OpenReadAwait      0     0     0                    0   
5     OpenReadAsync      2     3     0                    0   table=p, root=3
6     OpenReadAwait      0     0     0                    0   
7     Integer            70    1     0                    0   r[1]=70
8     SeekGT             1     17    1                    0   
9       DeferredSeek     1     0     0                    0   
10      RowId            0     2     0                    0   r[2]=u.rowid
11      SeekRowid        2     2     15                   0   if (r[2]!=p.rowid) goto 15
12      Column           0     1     3                    0   r[3]=u.first_name
13      Column           2     1     4                    0   r[4]=p.name
14      ResultRow        3     2     0                    0   output=r[3..4]
15    NextAsync          1     0     0                    0   
16    NextAwait          1     9     0                    0   
17    Halt               0     0     0                    0   
18    Transaction        0     0     0                    0   
19    Goto               0     1     0                    0

sqlite:

sqlite> explain select u.first_name, p.name from users u join products p on u.id = p.id and u.age > 70;
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     14    0                    0   Start at 14
1     OpenRead       0     2     0     10             0   root=2 iDb=0; users
2     OpenRead       2     274   0     k(2,,)         0   root=274 iDb=0; age_idx
3     OpenRead       1     3     0     2              0   root=3 iDb=0; products
4     Integer        70    1     0                    0   r[1]=70
5     SeekGT         2     13    1     1              0   key=r[1]
6       DeferredSeek   2     0     0                    0   Move 0 to 2.rowid if needed
7       IdxRowid       2     2     0                    0   r[2]=rowid; users.rowid
8       SeekRowid      1     12    2                    0   intkey=r[2]
9       Column         0     1     3                    0   r[3]= cursor 0 column 1
10      Column         1     1     4                    0   r[4]= cursor 1 column 1
11      ResultRow      3     2     0                    0   output=r[3..4]
12    Next           2     6     0                    0   
13    Halt           0     0     0                    0   
14    Transaction    0     0     3     0              1   usesStmtJournal=0
15    Goto           0     1     0                    0

@jussisaurio jussisaurio changed the title index scan wip foo doesnt work yet (core): Primary key index scans and single-column secondary index scans Oct 5, 2024
@jussisaurio jussisaurio marked this pull request as ready for review October 5, 2024 13:08
@jussisaurio
Copy link
Collaborator Author

jussisaurio commented Oct 5, 2024

Copypasting from discord:

sqlite compat is failing due to limbo returning '1' here, while sqlite returns '182':

sqlite> SELECT id from users limit 1;
182

For some reason, sqlite3 chooses to use age_idx here.

(much later edit: I think it's an optimization because the index is smaller due to only containing 2 columns so less IO)

sqlite> explain select id from users limit 1;
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     9     0                    0   Start at 9
1     Integer        1     1     0                    0   r[1]=1; LIMIT counter
2     OpenRead       1     274   0     k(2,,)         0   root=274 iDb=0; age_idx
3     Rewind         1     8     2     0              0   
4       IdxRowid       1     2     0                    0   r[2]=rowid; users.rowid
5       ResultRow      2     1     0                    0   output=r[2]
6       DecrJumpZero   1     8     0                    0   if (--r[1])==0 goto 8
7     Next           1     4     0                    1   
8     Halt           0     0     0                    0   
9     Transaction    0     0     3     0              1   usesStmtJournal=0
10    Goto           0     1     0                    0 

This is sort of a tricky compatibility test to fix because what the fuck is the heuristic that makes it use the age index to get id from a single row

EDIT:

I made a workaround for this by changing the test to be:

select typeof(id) from users limit 1;
integer

because it is more resilient to sqlite planner idiosyncrasies

EDIT2:

I did the same for another compat test where it also decided to use age_idx where it isn't really that important

@jussisaurio jussisaurio force-pushed the index-scan branch 2 times, most recently from 472dea4 to 15a66ea Compare October 6, 2024 11:05
Copy link
Collaborator

@pereman2 pereman2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing from phone so small nits

let comparison = match cmp {
SeekOp::GT => rowid_key <= *_rowid,
SeekOp::GE => rowid_key < *_rowid,
SeekOp::EQ => rowid_key < *_rowid,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be == ?

Copy link
Collaborator Author

@jussisaurio jussisaurio Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is correct because if we're trying to find a record in a leaf node where key == rowid, then we need to go to left_child_page when ´key < interior_cell_rowid´, right?

Copy link
Collaborator Author

@jussisaurio jussisaurio Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait nvm, if interior rowid is 10 then also rowid 10 is in the left child page (i.e. <= not <). I remembered that wrong.

Copy link
Collaborator Author

@jussisaurio jussisaurio Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer your question: the updated comparisons are now:

                            SeekOp::GT => rowid_key < *_rowid,
                            SeekOp::GE => rowid_key <= *_rowid,
                            SeekOp::EQ => rowid_key <= *_rowid,

Explanation:

Assume the search key is: rowid 100, and we want to find exactly that leaf node (EQ)

if internal node rowid is over 100, we need to go left. if its exactly 100, we need to go left. otherwise we need to go right. hence <=

Assume the search key is: rowid 100, and we want to find the first leaf node that is greater or equal to our search key (GE)

if internal node rowid is over 100, we need to go left, because greater or equal keys are on the left. if its exactly 100, we need to go left, because a leaf with rowid 100 is on the left. otherwise we need to go right. hence <=

Assume the search key is: rowid 100, and we want to find the first leaf node that is GREATER than our search key (GT)

if internal node rowid is over 100, we need to go left, because greater keys exist on the left. if its exactly 100 or less, we need to go right, because there are no leaves with rowid>100 on the left. hence <

let comparison = match cmp {
SeekOp::GT => index_key <= &record,
SeekOp::GE => index_key < &record,
SeekOp::EQ => index_key < &record,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explained above

mem_page.advance();
let record = crate::storage::sqlite3_ondisk::read_record(payload)?;
let rowid = match record.values.last() {
Some(OwnedValue::Integer(rowid)) => *rowid as u64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't index keys also blobs/strings/whatever?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are getting the rowid here and AFAIK it's always the last value in the payload record

@jussisaurio
Copy link
Collaborator Author

jussisaurio commented Oct 6, 2024

Another weird buggerino I introduced in this PR:

select * from users where id = 3816;

doesnt return a row (3815 does, etc). so btree traversal is probably wrong in multiple ways as @pereman2 alluded to

this works correctly:

select * from users where id > 3810 and id < 3820;

so something is up with the seek function

EDIT:

think i got a fix for these, but found more bugs, stay tuned 😂

EDIT2:

the problem is that there are so few actual page changes with this amount of data that a lot of the existing tests just didnt hit any of those boundaries

@jussisaurio
Copy link
Collaborator Author

jussisaurio commented Oct 6, 2024

done for today... i added some regression tests, with one failing: theres a single users row with rowid 3425 missing when scanning the entire age_idx. dunno why exactly 1 row is missing. 💀

do_execsql_test where-age-index-seek-regression-test-2 {
    select count(1) from users where age > 0;
} {10000}

# this returns 9999, because row with rowid 3425 is missing

getting a bit lost, probably getting back to this next weekend or something

EDIT:

i think the issue is that since index interior cells also contain payloads, when scanning thru the index we need to first 1. iterate all the child keys and return them, and THEN: 2. also return the key from the parent

@jussisaurio jussisaurio force-pushed the index-scan branch 2 times, most recently from 45cd116 to e5cf052 Compare October 7, 2024 06:23
interior index cells have values that are not in the leaves, e.g.

     (interior: 3)
    /            \
(leaf: 2)     (leaf: 4)

so their values need to be emitted after the left subtree is emitted.
@jussisaurio
Copy link
Collaborator Author

yeah so the issue was that the index was not traversed properly. index interior cells have payloads, and those payloads do not exist in the leaves (unlike in table btrees where an interior cell and a leaf cell may share the same rowid).

this commit addresses that: penberg@fc71f2b

basically we need to traverse the left subtree first (returning all the values from the subtree in order), THEN return the value from the interior cell, and then advance to the next interior cell.

unreachable!("index seek key should be a record");
};
let record = crate::storage::sqlite3_ondisk::read_record(payload)?;
let comparison = match cmp {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we rename comparison to found?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

penberg@8563d62

Renamed to found in seek() and to target_leaf_page_is_in_left_subtree in move_to()

Comment on lines 247 to 251
if comparison {
let rowid = match record.values.get(1) {
Some(OwnedValue::Integer(rowid)) => *rowid as u64,
_ => unreachable!("index cells should have an integer rowid"),
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this part. We assume record.values.get(1) is the rowid. But I imagine there could be indexes with multiple columns right? Shouldn't the rowid be the last value? Furthermore, should we maybe change rowid name to key or something similar because it can be eitther rowid and primary key I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a bug i forgot to refactor. it should be last

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in penberg@93a8110

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the rowid naming, copypasted from discord:

I think i will leave the rowid naming because it reflects current reality better, doesnt our code pretty much assume we dont have WITHOUT ROWID tables right now? And I think WITHOUT ROWID tables are pretty much btree indexes in that they store everything in the key area? https://www.sqlite.org/withoutrowid.html#:~:text=A%20WITHOUT%20ROWID%20table%20uses,binary%20search%20on%20the%20rowid.

@penberg penberg closed this in fdf3b0d Oct 13, 2024
penberg added a commit that referenced this pull request Oct 13, 2024
…sult set is already ordered' from Jussi Saurio

Built on top of, and currently targets, #350, _not_ main
Closes #365
Examples:
```
limbo> explain select u.first_name, p.name from users u join products p on u.id = p.id order by u.id;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     15    0                    0   Start at 15
1     OpenReadAsync      0     2     0                    0   table=u, root=2
2     OpenReadAwait      0     0     0                    0
3     OpenReadAsync      1     3     0                    0   table=p, root=3
4     OpenReadAwait      0     0     0                    0
5     RewindAsync        0     0     0                    0
6     RewindAwait        0     14    0                    0   Rewind table u
7       RowId            0     1     0                    0   r[1]=u.rowid
8       SeekRowid        1     1     12                   0   if (r[1]!=p.rowid) goto 12
9       Column           0     1     2                    0   r[2]=u.first_name
10      Column           1     1     3                    0   r[3]=p.name
11      ResultRow        2     2     0                    0   output=r[2..3]
12    NextAsync          0     0     0                    0
13    NextAwait          0     7     0                    0
14    Halt               0     0     0                    0
15    Transaction        0     0     0                    0
16    Goto               0     1     0                    0
```
```
limbo> explain select * from users where age > 80 order by age limit 5;
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     23    0                    0   Start at 23
1     OpenReadAsync      0     2     0                    0   table=users, root=2
2     OpenReadAwait      0     0     0                    0
3     OpenReadAsync      1     274   0                    0   table=age_idx, root=274
4     OpenReadAwait      0     0     0                    0
5     Integer            80    1     0                    0   r[1]=80
6     SeekGT             1     22    1                    0
7       DeferredSeek     1     0     0                    0
8       RowId            0     2     0                    0   r[2]=users.rowid
9       Column           0     1     3                    0   r[3]=users.first_name
10      Column           0     2     4                    0   r[4]=users.last_name
11      Column           0     3     5                    0   r[5]=users.email
12      Column           0     4     6                    0   r[6]=users.phone_number
13      Column           0     5     7                    0   r[7]=users.address
14      Column           0     6     8                    0   r[8]=users.city
15      Column           0     7     9                    0   r[9]=users.state
16      Column           0     8     10                   0   r[10]=users.zipcode
17      Column           0     9     11                   0   r[11]=users.age
18      ResultRow        2     10    0                    0   output=r[2..11]
19      DecrJumpZero     12    22    0                    0   if (--r[12]==0) goto 22
20    NextAsync          1     0     0                    0
21    NextAwait          1     7     0                    0
22    Halt               0     0     0                    0
23    Transaction        0     0     0                    0
24    Integer            5     12    0                    0   r[12]=5
25    Goto               0     1     0                    0
```

Closes #366
jussisaurio added a commit that referenced this pull request Jan 3, 2025
data does not match predicate when using index, e.g: `select id, age
from users where age > 90 limit 1;` will return data with age  90
the reason is that the current index seek directly uses record for
comparison, but the record of the index itself is longer than the record
of the key (because it contains the primary key), so Gt is invalid.
since only single-column indexes are currently supported:
#350, only the first value of
the record is currently used for comparison.

Reviewed-by: Jussi Saurio <[email protected]>

Closes #593
@penberg penberg deleted the index-scan branch January 5, 2025 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants