Idea about making parsing "easier" #36

Open
audreyseo opened this issue Jul 8, 2019 · 1 comment
Labels: priority: LOW (not at all necessary in the short term), question (further information is requested)

Comments

@audreyseo
Collaborator

So parsing C++ is supposed to be Evil because parsing C++'s grammar is actually undecidable in general: whether a name denotes a type can depend on template instantiation, which is Turing-complete. On top of that, there are forms that look exactly the same, such as foo bar(baz);, which declares a function bar if baz names a type, but defines an object bar constructed from baz if baz is a variable. That is crazy to me, and there are probably many more ambiguities like this in the grammar.
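To make the foo bar(baz); ambiguity concrete, here is a small sketch (assuming C++11 or later; the names Foo, as_type and as_value are made up for illustration). The same tokens Foo bar(baz); appear in both namespaces, and only the meaning of baz decides which reading the compiler picks. This is closely related to what is usually called the "most vexing parse".

```cpp
#include <type_traits>

// A simple class that can be constructed from an int.
struct Foo {
    explicit Foo(int v) : value(v) {}
    int value;
};

// Reading 1: `baz` names a type, so `Foo bar(baz);` declares a function
// `bar` that takes an int and returns a Foo. No object is created.
namespace as_type {
    using baz = int;
    Foo bar(baz);
}

// Reading 2: `baz` names a variable, so the very same tokens define an
// object `bar` of type Foo, direct-initialized via Foo(int).
namespace as_value {
    int baz = 42;
    Foo bar(baz);
}

// Compile-time confirmation of the two readings.
static_assert(std::is_function<decltype(as_type::bar)>::value,
              "baz is a type, so bar is a function");
static_assert(std::is_same<decltype(as_value::bar), Foo>::value,
              "baz is a variable, so bar is a Foo object");
```

A parser that only sees the tokens cannot tell these apart; it has to know what baz was declared as, which is why a C++ parser needs a symbol table (and, with templates, potentially arbitrary computation) just to build the right tree.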

BUT, we don't necessarily have to accept Every Single C++ file, since we can limit the scope of the input. Parsers in real compilers deal with ambiguous constructs by committing to one of the possible readings (for C++, guided by name lookup and the standard's disambiguation rules), and similarly, we can decide that for every ambiguous construct, the files accepted by the sketch parser use only one of the possible forms.
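One way to picture the "pick one form" rule is a classifier for statements shaped like T name(arg);. The sketch below is hypothetical: classify, Reading, and the known_types/known_values sets are illustrative names (the sets stand in for a symbol table the sketch parser would have to build), and it assumes C++17 for std::optional. If arg is a known type, the declaration reading is forced; if it is a known value, the statement constructs an object; if the name is unknown, the parser simply commits to a single fixed form instead of failing.

```cpp
#include <optional>
#include <regex>
#include <set>
#include <string>

// The two ways a statement shaped like `T name(arg);` can be read.
enum class Reading { FunctionDecl, ObjectDef };

// Hypothetical disambiguation policy for a restricted input language.
// Returns std::nullopt when `stmt` is not of the form `T name(arg);`.
std::optional<Reading> classify(const std::string& stmt,
                                const std::set<std::string>& known_types,
                                const std::set<std::string>& known_values,
                                Reading policy = Reading::ObjectDef) {
    // Exactly three identifiers, `T name(arg);`, with optional whitespace.
    static const std::regex shape(
        R"(\s*([A-Za-z_]\w*)\s+([A-Za-z_]\w*)\s*\(\s*([A-Za-z_]\w*)\s*\)\s*;\s*)");
    std::smatch m;
    if (!std::regex_match(stmt, m, shape)) {
        return std::nullopt;
    }
    const std::string arg = m[3].str();
    if (known_types.count(arg) != 0) {
        return Reading::FunctionDecl;  // `arg` is a type: it's a parameter list
    }
    if (known_values.count(arg) != 0) {
        return Reading::ObjectDef;     // `arg` is a value: constructor argument
    }
    return policy;  // unknown name: commit to the one form we allow
}
```

For example, classify("Foo bar(baz);", {"baz"}, {}) yields FunctionDecl, while classify("Foo bar(baz);", {}, {}) falls back to the ObjectDef policy. Defaulting to the object-constructor reading matches the choice suggested in the next paragraph, but the policy parameter makes it easy to flip.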

And if people think that this is unnecessary or makes this tool unusable, I think it would be more tractable to create an LLVM-based (i.e., Clang-based) pre-parser that removes the ambiguities by rewriting each ambiguous construct into one chosen form. For example, in the case of a function declaration versus an object constructor, we could opt for the object constructor. The pre-parser would then find every function declaration, which Clang should be able to detect, look up the actual definition of that function, which should be in some .hpp file, and replace the declaration with the definition.
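The rewriting step can be sketched as a toy, line-based text transform. This is only an illustration under loud assumptions: replace_declarations is a hypothetical name, it matches single-line declarations with a regex rather than using Clang's AST, and the definitions map stands in for whatever the Clang side would find in the .hpp files. Once a declaration like Foo bar(baz); is replaced by a definition with a body, any remaining Foo bar(baz); can only be an object construction.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Toy stand-in for the proposed pre-parser. `definitions` maps a function
// name to the full definition text that should replace its declaration.
std::string replace_declarations(
        const std::string& source,
        const std::map<std::string, std::string>& definitions) {
    // A one-line declaration `RetType name(params);`, capturing `name`.
    static const std::regex decl(
        R"(\s*[A-Za-z_]\w*\s+([A-Za-z_]\w*)\s*\([^)]*\)\s*;\s*)");
    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_match(line, m, decl)) {
            auto it = definitions.find(m[1].str());
            if (it != definitions.end()) {
                out << it->second << '\n';  // swap declaration for definition
                continue;
            }
        }
        out << line << '\n';  // everything else passes through unchanged
    }
    return out.str();
}
```

Note that every output line ends with a newline, even if the input did not. A real Clang-based tool would also need to handle multi-line declarations and avoid introducing duplicate definitions into a translation unit.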

I'm just not sure whether there's a complete list of all the ambiguous forms in C++'s grammar, which would be necessary for doing such a thing.

@audreyseo added the question, priority: MED, and priority: LOW labels and removed the priority: MED label on Jul 8, 2019
@ivoysey
Contributor

ivoysey commented Jul 11, 2019

I mentioned this on Slack, so this isn't new info, but I'll put it here so that it doesn't get lost in scrollback. https://github.com/GumTreeDiff/gumtree might be a good tool to look into for lightweight AST construction. It's language-agnostic, so it has some limitations, but it might be good enough for your purposes for now.


2 participants