Optimize planner to avoid excessive schema transform functions #9144

Open
comphead opened this issue Feb 7, 2024 · 2 comments
Labels: enhancement (New feature or request)

Comments

comphead commented Feb 7, 2024

Is your feature request related to a problem or challenge?

It was found that the planner calls schema transform functions excessively.

As an experiment, let's take a simple query:

        let sql = "select a, a + 1 b from (select 1 a union all select 2 a) x";

Then increase the number of rows in the x subquery by adding more `union all` branches. The number of schema function calls grows with every added row:

| rows | new_with_metadata calls | merge calls |
|------|-------------------------|-------------|
| 2    | 58                      | 35          |
| 3    | 131                     | 83          |
| 4    | 165                     | 109         |

IMHO that is not expected: once the plan has been built, the number of schema calls should not increase with every new record in the dataset.
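For anyone who wants to collect similar numbers, one possible approach is a global atomic counter bumped inside the function under test. The sketch below is only an illustration: `ToySchema`, `NEW_WITH_METADATA_CALLS` and `count_calls` are hypothetical names and a toy type, not DataFusion's `DFSchema` or the instrumentation actually used for the table above.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

// Global call counter. Hypothetical instrumentation, not part of DataFusion.
static NEW_WITH_METADATA_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Toy stand-in for a schema type (an assumption for illustration only).
#[derive(Debug, Clone)]
struct ToySchema {
    fields: Vec<String>,
    metadata: HashMap<String, String>,
}

impl ToySchema {
    /// Constructor that records every invocation in the global counter.
    fn new_with_metadata(fields: Vec<String>, metadata: HashMap<String, String>) -> Self {
        NEW_WITH_METADATA_CALLS.fetch_add(1, Ordering::Relaxed);
        ToySchema { fields, metadata }
    }
}

/// Runs `f` and returns how many times the counted constructor
/// was called while `f` was executing.
fn count_calls<F: FnOnce()>(f: F) -> usize {
    let before = NEW_WITH_METADATA_CALLS.load(Ordering::Relaxed);
    f();
    NEW_WITH_METADATA_CALLS.load(Ordering::Relaxed) - before
}
```

Wrapping the planning step of a query in `count_calls` would then give one number per query, which can be compared across queries with different row counts.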

Describe the solution you'd like

Ideally, eliminate the excessive calls entirely; more realistically, reduce their number.

Describe alternatives you've considered

No response

Additional context

Follow up on #9104 investigations

comphead commented Feb 7, 2024

Basically the same happens with a table:

create table t1 as (select 1 a union all select 2 a )      
let sql: &str = "select a, a + 1 b, a+2 from t1";

It seems that planner-phase work such as schema transforms affects the execution phase.

comphead commented Mar 15, 2024

Update: reading from a table or Parquet doesn't cause exponential growth in schema calls.
But queries over literals, like the one below, do:

let sql = "select a, a + 1, a+2 b from (select 1 a union all select 2 a union all select 3 a union all select 4 a union all select 5 a union all select 6 a)";

This case is rare, though, and is unlikely to be critical.
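To scale this experiment without hand-editing the SQL, the query can be generated for any number of literal rows. This is a minimal sketch; `union_literal_query` is a hypothetical helper name, and it only builds the string, it does not run the planner.

```rust
/// Builds a query of the same shape as the one above, with `rows`
/// literal branches joined by `union all`.
/// Hypothetical helper for illustration; not part of DataFusion.
fn union_literal_query(rows: usize) -> String {
    assert!(rows >= 1, "need at least one literal row");
    let branches: Vec<String> = (1..=rows)
        .map(|i| format!("select {} a", i))
        .collect();
    format!(
        "select a, a + 1, a+2 b from ({})",
        branches.join(" union all ")
    )
}
```

Calling `union_literal_query(6)` yields the six-row query shown above; larger values give the next data points.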

However, when reading Parquet files and tables, another problem arises with wildcard expansion.

Adding an outer `select *` adds 50 new calls to new_with_metadata for each `*`:

    let sql = "select * from (select * from (select * from (select * from (select * from (select a, a+1, a+2, a+3, a+4, a+5, a+6, a+7, a+8, a+9 from t1)))))";

This query makes 359 calls.
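The nested wildcard query can likewise be generated for an arbitrary nesting depth, which makes it easier to check how the call count changes per `select *` level. A minimal sketch, assuming a table named `t1` with a column `a` as in the query above; `nested_wildcard_query` is a hypothetical helper name.

```rust
/// Builds `select a, a+1, ..., a+N from t1` and wraps it in `depth`
/// levels of `select * from (...)`.
/// Hypothetical helper for illustration; not part of DataFusion.
fn nested_wildcard_query(depth: usize, extra_exprs: usize) -> String {
    // Innermost projection: the column itself plus `a+1 .. a+extra_exprs`.
    let mut cols = vec!["a".to_string()];
    cols.extend((1..=extra_exprs).map(|i| format!("a+{}", i)));
    let mut sql = format!("select {} from t1", cols.join(", "));

    // Each iteration adds one outer wildcard projection.
    for _ in 0..depth {
        sql = format!("select * from ({})", sql);
    }
    sql
}
```

`nested_wildcard_query(5, 9)` reproduces the query above, and `nested_wildcard_query(0, 9)` gives the baseline without any wildcard.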
