Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Introduce the "parser" feature to gate the SQL text processing and leaving only AST and other support types #1691

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

felipecrv
Copy link

This crate can become a very lightweight dependency to datafusion (and other projects) that have their own SQL parser but need to use datafusion-sqlparser-rs AST types to interact with the rest of datafusion (as a modular library and not as a full-featured SQL engine).

WORK-IN-PROGRESS notice: this is draft state and I want to see how CI behaves, reduce the number of cfg(feature = "parser") annotations needed and ensure datafusion still builds even if the "parser" feature is disabled.

Before merging, "parser" will of course, become part of the default set of features of the crate.

Caveats:
- `dialect` is completely disabled which is not the final goal.
- `parser/mod.rs` and `tokenizer/mod.rs` should be split so that
  the number of annotations is greatly reduced and applied at the
  file level instead of struct/function level.
@alamb
Copy link
Contributor

alamb commented Jan 30, 2025

This crate can become a very lightweight dependency to datafusion (and other projects) that have their own SQL parser but need to use datafusion-sqlparser-rs AST types to interact with the rest of datafusion (as a modular library and not as a full-featured SQL engine).

BTW another way to approach this issue would be to remove the dependency of sqlparser from DataFusion (e.g. make the parsing an optional module)

@felipecrv
Copy link
Author

This crate can become a very lightweight dependency to datafusion (and other projects) that have their own SQL parser but need to use datafusion-sqlparser-rs AST types to interact with the rest of datafusion (as a modular library and not as a full-featured SQL engine).

BTW another way to approach this issue would be to remove the dependency of sqlparser from DataFusion (e.g. make the parsing an optional module)

I tried, but there are non-trivial dependencies on the ast:: and dynamic dispatch to the dialect trait. I think it's reasonable to share the ast:: types instead of abstracting them, hence my idea of disabling the parsing from the crate and leave the rest available.

@alamb
Copy link
Contributor

alamb commented Jan 30, 2025

I tried, but there are non-trivial dependencies on the ast:: and dynamic dispatch to the dialect trait. I think it's reasonable to share the ast:: types instead of abstracting them, hence my idea of disabling the parsing from the crate and leave the rest available.

The dependencies on the ast can be handled by copying the definitions into DataFusion

The dpeendencies on Dialect is maybe harder to imagine

BTW to be clear the idea in this PR seems fine to me -- I just figured I would comment

@felipecrv
Copy link
Author

felipecrv commented Jan 30, 2025

I tried, but there are non-trivial dependencies on the ast:: and dynamic dispatch to the dialect trait. I think it's reasonable to share the ast:: types instead of abstracting them, hence my idea of disabling the parsing from the crate and leave the rest available.

The dependencies on the ast can be handled by copying the definitions into DataFusion

But then sqlparser would depend on datafusion and not just the other way around.

The dpeendencies on Dialect is maybe harder to imagine

I might be fuzzy on the details as I'm just getting started on the codebase.

BTW to be clear the idea in this PR seems fine to me -- I just figured I would comment

And thank you for commenting!

@alamb
Copy link
Contributor

alamb commented Jan 31, 2025

But then sqlparser would depend on datafusion and not just the other way around.

I meant literally copy/pasting (or some variant thereof) of the relevant structures to break the dependency. Similar to the way @eliaperantoni did in this PR for Span:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants