feat: propose a simpler way to get software row_id #912

jiashenC · 2023-06-29T05:42:33Z

The current approach to getting the row_id in the base branch is a bit complicated. It requires us to modify or insert an expression to derive the row_id. That is easy for create index, but it becomes problematic for index scan: index scan is enabled during optimization, so we need to take care of the row_id expression during optimization by traversing multiple operators. I think this is not very clean.

I am thinking we can just generate a runtime row id for storage other than structured tables. The assumption is that those data tables won't be used for operations like join, so it is safe to just use a runtime row id which is different from the row_id actually stored on disk.

Implementation-wise, I think this is simpler because we don't need to manually insert any expression. Feedback is appreciated if any major issue is overlooked.
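To make the proposal concrete, here is a minimal sketch of generating the id while batches are read from storage. It is only an illustration: the function name `with_runtime_row_id`, the column name `_row_number`, and the list-of-dicts batch shape are all hypothetical and are not taken from the code base.

```python
from itertools import count
from typing import Dict, Iterable, Iterator, List

# Hypothetical batch shape: a list of rows, each row a dict of column -> value.
Batch = List[Dict[str, object]]


def with_runtime_row_id(
    batches: Iterable[Batch], column: str = "_row_number"
) -> Iterator[Batch]:
    """Attach a monotonically increasing id to every row as it is read.

    The id is assigned at read time and is not persisted, so it is only
    meaningful within a single scan.
    """
    next_id = count()  # one counter shared across all batches of the scan
    for batch in batches:
        yield [{column: next(next_id), **row} for row in batch]


batches = [[{"data": "a"}, {"data": "b"}], [{"data": "c"}]]
numbered = list(with_runtime_row_id(batches))
```

Because the id is attached where the rows are produced, nothing downstream has to insert or rewrite an expression to obtain it.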

xzdandy · 2023-06-29T08:21:59Z

The following is confusing; if we do so, we should give it another name:

> a runtime row id which is different from the row_id actually stored on disk.

I am in favor of redesigning the row_id. The current implementation is complicated and requires special treatment in different components of the system.

jiashenC · 2023-06-29T17:13:23Z

> The following is confusing; if we do so, we should give it another name:
>
> > a runtime row id which is different from the row_id actually stored on disk.
>
> I am in favor of redesigning the row_id. The current implementation is complicated and requires special treatment in different components of the system.

One alternative is to rename it to "id". For normal tables, "id" would be equal to "row_id"; for others, it would be created at runtime. Any thoughts?

gaurav274 · 2023-07-01T17:01:48Z

> The current approach to getting the row_id in the base branch is a bit complicated. It requires us to modify or insert an expression to derive the row_id. That is easy for create index, but it becomes problematic for index scan: index scan is enabled during optimization, so we need to take care of the row_id expression during optimization by traversing multiple operators. I think this is not very clean.
>
> I am thinking we can just generate a runtime row id for storage other than structured tables. The assumption is that those data tables won't be used for operations like join, so it is safe to just use a runtime row id which is different from the row_id actually stored on disk.
>
> Implementation-wise, I think this is simpler because we don't need to manually insert any expression. Feedback is appreciated if any major issue is overlooked.

I like the idea! However, this won't work for delete, which is problematic. How about we create a runtime unique id similar to what is done in the master, but we do it in the storage engine? This ensures that the rest of the system does not need to worry about it. We can make sure the _row_id (or some other column name) is always unique.

jiashenC · 2023-07-04T16:54:55Z

> > The current approach to getting the row_id in the base branch is a bit complicated. It requires us to modify or insert an expression to derive the row_id. That is easy for create index, but it becomes problematic for index scan: index scan is enabled during optimization, so we need to take care of the row_id expression during optimization by traversing multiple operators. I think this is not very clean.
> >
> > I am thinking we can just generate a runtime row id for storage other than structured tables. The assumption is that those data tables won't be used for operations like join, so it is safe to just use a runtime row id which is different from the row_id actually stored on disk.
> >
> > Implementation-wise, I think this is simpler because we don't need to manually insert any expression. Feedback is appreciated if any major issue is overlooked.
>
> I like the idea! However, this won't work for delete, which is problematic. How about we create a runtime unique id similar to what is done in the master, but we do it in the storage engine? This ensures that the rest of the system does not need to worry about it. We can make sure the _row_id (or some other column name) is always unique.

OK. A few clarifications:

1. Can you remind me why it would not work for delete?
2. "How about we create a runtime unique id similar to what is done in the master, but we do it in the storage engine?" Are you suggesting that we just create a new column with a different name for the runtime unique id?

xzdandy · 2023-07-06T06:12:23Z

If I understand correctly, ROWID needs to be permanent (https://www.ibm.com/docs/en/db2-for-zos/11?topic=types-row-id-values), while the ROWNUM can be generated at run time.

I like the idea of doing this in the storage engine. For structured data, we can get the rowid from sqlalchemy (or the underlying database), so we only need to handle the multimedia case to generate a unique id.

PS: I suggest that SELECT * does not return the ROWID column (https://www.ibm.com/docs/en/informix-servers/12.10?topic=statements-rowid-values-in-select), which makes the column implicit. It helps cases like CREATE TABLE x AS SELECT * ... (#786).
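A rough sketch of what an implicit column could look like when the select list is resolved. Everything here is hypothetical (the `Column` class, the `implicit` flag, and the `expand_star` and `resolve` helpers are illustrative names only); it just shows the intended behavior, where `*` skips the row id but naming it explicitly still works.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    name: str
    implicit: bool = False  # hypothetical flag: hidden from SELECT *


def expand_star(columns: List[Column]) -> List[str]:
    """Columns that SELECT * expands to: everything except implicit ones."""
    return [c.name for c in columns if not c.implicit]


def resolve(columns: List[Column], requested: List[str]) -> List[str]:
    """Resolve a select list; an implicit column is returned only when named."""
    known = {c.name for c in columns}
    resolved: List[str] = []
    for name in requested:
        if name == "*":
            resolved.extend(expand_star(columns))
        elif name in known:
            resolved.append(name)
        else:
            raise KeyError(f"unknown column: {name}")
    return resolved


schema = [Column("_row_id", implicit=True), Column("name"), Column("data")]
star_columns = resolve(schema, ["*"])
explicit_columns = resolve(schema, ["_row_id", "name"])
```

With this behavior, a statement of the form CREATE TABLE x AS SELECT * ... would not copy the source table's row id into the new table.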

jiashenC · 2023-09-08T14:34:27Z

Closing this for now; #1073 has been merged as a fix.

propose a simpler way to get software row_id

ad85ec7

jiashenC mentioned this pull request Sep 8, 2023

fix: create index from single document #1073

Merged

jiashenC added a commit that referenced this pull request Sep 8, 2023

fix: create index from single document (#1073)

f0dd533

Use a runtime `row_number` to build the index by incorporating design discussions from #912 and #868.

jiashenC closed this Sep 8, 2023
