Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: propose a simpler way to get software row_id #912

Closed

Conversation

jiashenC
Copy link
Member

@jiashenC jiashenC commented Jun 29, 2023

The current approach to get the row_id in the base branch is a bit complicated. It requires us to modify or insert an expression to derive the row_id. It is easy for create index, but it becomes problematic for index scan. Index scan is enabled during optimization. We need to take care of the row_id expression during optimization by traversing multiple operators. I think this is not very clean.

I am thinking we can just generate runtime time row id for storage besides structure table. The assumption is for those data table, they won't be used for operations like join, etc, so it is safe to just use a runtime row id which is different from the row_id actually stored on disk.

Implementation-wise, I think it becomes simpler that we don't need manually insert any expression. Feedback is appreciated if any major issue is overlooked.

@xzdandy
Copy link
Collaborator

xzdandy commented Jun 29, 2023

The following is confusing, if we do so, we shall give another name.

a runtime row id which is different from the row_id actually stored on disk.

I am in to redesign the row_id. The current implementation is complicated and requires special treatments in different components of the system.

@jiashenC
Copy link
Member Author

The following is confusing, if we do so, we shall give another name.

a runtime row id which is different from the row_id actually stored on disk.

I am in to redesign the row_id. The current implementation is complicated and requires special treatments in different components of the system.

One alternative is that we can rename it to "id"? For normal tables, "id" should be equal to "row_id". For others, it will be created at runtime. Any idea?

@gaurav274
Copy link
Member

The current approach to get the row_id in the base branch is a bit complicated. It requires us to modify or insert an expression to derive the row_id. It is easy for create index, but it becomes problematic for index scan. Index scan is enabled during optimization. We need to take care of the row_id expression during optimization by traversing multiple operators. I think this is not very clean.

I am thinking we can just generate runtime time row id for storage besides structure table. The assumption is for those data table, they won't be used for operations like join, etc, so it is safe to just use a runtime row id which is different from the row_id actually stored on disk.

Implementation-wise, I think it becomes simpler that we don't need manually insert any expression. Feedback is appreciated if any major issue is overlooked.

I like the idea! However, this won't work for delete, which is problematic. How about we create a runtime unique id similar to what is done in the master, but we do it in the storage engine? This ensures that the rest of the system does not need to worry about it. We can make sure the _row_id (or some other column name) is always unique.

@jiashenC
Copy link
Member Author

jiashenC commented Jul 4, 2023

The current approach to get the row_id in the base branch is a bit complicated. It requires us to modify or insert an expression to derive the row_id. It is easy for create index, but it becomes problematic for index scan. Index scan is enabled during optimization. We need to take care of the row_id expression during optimization by traversing multiple operators. I think this is not very clean.
I am thinking we can just generate runtime time row id for storage besides structure table. The assumption is for those data table, they won't be used for operations like join, etc, so it is safe to just use a runtime row id which is different from the row_id actually stored on disk.
Implementation-wise, I think it becomes simpler that we don't need manually insert any expression. Feedback is appreciated if any major issue is overlooked.

I like the idea! However, this won't work for delete, which is problematic. How about we create a runtime unique id similar to what is done in the master, but we do it in the storage engine? This ensures that the rest of the system does not need to worry about it. We can make sure the _row_id (or some other column name) is always unique.

Ok. A few clarifications.

  1. Can you remind me why it would not work for delete?
  2. How about we create a runtime unique id similar to what is done in the master, but we do it in the storage engine? Are you suggesting that we just create a new column with a different name for runtime unique id?

@xzdandy
Copy link
Collaborator

xzdandy commented Jul 6, 2023

If I understand correctly, ROWID needs to be permanent (https://www.ibm.com/docs/en/db2-for-zos/11?topic=types-row-id-values), while the ROWNUM can be generated at run time.

I like the idea of storage engine. For structure data, we can get rowid from the sqlalchemy (or underlying database). So we only need handle the multimedia case to generate a unique id.

PS: I suggest that SELECT * does not return ROWID column (https://www.ibm.com/docs/en/informix-servers/12.10?topic=statements-rowid-values-in-select), which makes the column implicit. It helps the case like CREATE TABLE x AS SELECT * ... (#786).

jiashenC added a commit that referenced this pull request Sep 8, 2023
Use a runtime `row_number` to build the index by incorporating design
discussions from #912 and
#868.
@jiashenC
Copy link
Member Author

jiashenC commented Sep 8, 2023

Close this for now. #1073 is merged as a fix.

@jiashenC jiashenC closed this Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants