Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A fork and some feedback #1167

Closed
nene opened this issue Oct 23, 2022 · 4 comments
Closed

A fork and some feedback #1167

nene opened this issue Oct 23, 2022 · 4 comments
Assignees

Comments

@nene
Copy link
Contributor

nene commented Oct 23, 2022

I wanted to let you know that I'm forking this library: https://github.com/nene/sql-parser-cst

My needs for an SQL parser are somewhat different, and this library (like all others I've found) doesn't fulfill them. It has however provided me with something to start with, for which I'm thankful for.

My fork is still in very early stages of development, though there's already not much left from the original code. It's more of a complete-rewrite kind of fork.

While working with this code, I've discovered some issues, on which I'd like to share some feedback:

Huge amount of duplication

There's a separate parser for each SQL dialect, but most of the code is really the same, which leads to maintenance nightmare. I've seen you making commits where you do the exact same change in nine files. This will only get worse over time, unless some drastic action is taken against this.

In my fork I've augmented the parser generator with a small plugin that allows me to generate multiple parsers from a single grammar file.

Mix of responsibilities

I was surprised to discover that lists of visited tables and columns are built alongside the AST. This information can simply be derived from AST.

A much better approach would be to first only build AST, and then post-process the AST to detect which tables and columns were visited. This would also reduce the amount of duplication as that code could be applied to all dialects.

Complex logic in sqlify()

Initially I tried to make some changes to the code that converts AST back to SQL, but soon gave up. I found it very hard to navigate in this code and just generally hard to comprehend.

Lacking/incorrect SQL syntax support

I've worked mainly with the MySQL parser code, from which I've discovered lots of SQL syntax that's not actually supported by MySQL and at the same time missing support for lots of features in MySQL. Then again the Postgres parser contains support for various MySQL-specific features that are definitely not part of Postgres.

It seems to me that very little research has gone into supporting all these different dialects that this project claims to support.

In conclusion

I don't intend this all as a discouragement. You've definitely put lots of time and effort into this. If anything, it requires some guts to take on a language as complex and convoluted as SQL.

@taozhi8833998
Copy link
Owner

@nene Thanks for ur feedback, the issues in ur feedback all exist.

Huge amount of duplication

  • In Early, I try to write a common parser logical into modular Modular grammar #100, but pegjs doesn't support it now.

Mix of responsibilities

Complex logic in sqlify()

  • it's my bad, I didn't organize the code well, and didn't generate the base data structure for all DB, which make the code too complexity

Lacking/incorrect SQL syntax support

  • I'm not familiar with all databases, such as pg. The origin pg parser file, which I copied from MySQL pegjs file, change some logic in it, and it didn't support all pg syntax and some MySQL syntax in it.

All in all, You can fork a new project based on it, and fix the above issues in your feedback. If anything I can help you to make your project move fast, please let me know. I am very glad to help and learn. Thanks again for ur feedback and patience.

@taozhi8833998
Copy link
Owner

@nene BTW, I will try to fix the issues in ur feedback also. But It may take some time to finish.

@nene
Copy link
Contributor Author

nene commented Oct 24, 2022

Well, my fork is currently still in very early stages of development. I would welcome some help, but I'm not sure if and how you could help me right now. I'm still sort of fleshing things out and getting it to parse the most basic SQL. I haven't even released it to NPM or anything.

If you do want to help, I guess for start you could just check out the repo, run the tests and take a look at the output of that parser and see if it makes sense to you. The syntax tree it produces is something quite different. Or look though the code in general and report if you find something strange or hard to understand.

BTW, for historical context, wanted to ask what was your original use-case for the parser? Usually people don't simply start tinkering with a large and complex parser unless they have some actual need for it.

@nene
Copy link
Contributor Author

nene commented Nov 7, 2022

By now I have implemented full support for SQLite. And by full, I do mean 100% of all the syntax. Though there are likely some corner cases where it still fails.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants