So parsing C++ has a reputation for being Evil. The grammar is ambiguous and context-sensitive, and in the general case parsing is actually undecidable: deciding whether a name refers to a type or a value can require instantiating templates, and templates are Turing-complete. A simple example of the ambiguity is `foo bar(baz);`, which is a function declaration if `baz` names a type, or the definition of an object `bar` constructed from `baz` if `baz` names a value. That is crazy to me, and there are probably many more ambiguities like it in the grammar.
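To make the `foo bar(baz);` case concrete, here is a minimal sketch. The `struct foo` and the two helper functions are made up for illustration; the point is only that the same tokens parse differently depending on what `baz` names in scope.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type used only for this illustration.
struct foo {
    int value;
    explicit foo(int v) : value(v) {}
};

// Case 1: `baz` names a value, so `foo bar(baz);` defines an object.
int object_case() {
    int baz = 42;
    foo bar(baz);  // object of type foo, constructed from baz
    assert(bar.value == baz);
    return bar.value;
}

// Case 2: `baz` names a type, so the very same tokens declare a function.
bool function_case() {
    using baz = int;
    foo bar(baz);  // block-scope declaration of a function: foo bar(int)
    static_assert(std::is_function<decltype(bar)>::value,
                  "bar is a function here, not an object");
    return std::is_function<decltype(bar)>::value;
}
```

A parser cannot tell the two cases apart from the statement alone; it has to know what `baz` refers to at that point.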
BUT we don't necessarily have to accept Every Single C++ file, since we can limit the scope of the input. Parsers in actual compilers deal with these ambiguities by applying a fixed rule that picks one of the possible readings. Similarly, we can decide that the sketch parser only accepts one of the possible forms for each ambiguous construct.
And if people think that this restriction is unnecessary or makes the tool unusable, I think it would be more tractable to create an LLVM-based pre-parser that removes the ambiguities by rewriting every ambiguous form into one chosen form. For example, in the case of a function declaration versus an object constructor, we could opt for the object constructor. The pre-parser would then detect all function declarations (which it should be able to do), find the actual definition of each function, which should be included in some `.hpp` file, and replace the declaration with the definition.
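As one possible target for that kind of normalization (an assumption on my part, not something the sketch parser does today), a pre-parser could rewrite object constructions into brace initialization, which can never be read as a function declaration. The `struct foo` and `normalized_object` names below are hypothetical.

```cpp
#include <cassert>

// Hypothetical type used only for this illustration.
struct foo {
    int value;
    explicit foo(int v) : value(v) {}
};

// After normalization, `foo bar(baz);` would be written with braces.
int normalized_object() {
    int baz = 7;
    foo bar{baz};  // always an object definition, never a function declaration
    assert(bar.value == baz);
    return bar.value;
}
```

If `baz` named a type instead, `foo bar{baz};` would simply fail to compile rather than silently declare a function, so the restricted input stays unambiguous.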
I'm just not sure if there's a list of all the ambiguous C++ grammar forms, which would be necessary for doing such a thing.
I mentioned this on Slack, so this isn't new info, but I'll put it here so that it doesn't get lost in the scrollback. https://github.com/GumTreeDiff/gumtree might be a good tool to look into for lightweight AST construction. It's language-agnostic, so it has some limitations, but it might be good enough for your purposes for now.