Parsing large files is too slow #3

Open
drieks opened this issue Oct 21, 2019 · 7 comments
Labels
bug (Something isn't working), enhancement (New feature or request)

Comments

@drieks
Collaborator

drieks commented Oct 21, 2019

These files are currently not included in SelfTest.kt because the processing does not finish within a reasonable time:

  • KotlinLexer.kt
  • KotlinParser.kt
  • UnicodeClasses.kt
@drieks
Collaborator Author

drieks commented Oct 27, 2019

Hi @martinflorek,

please try version fd6123da02. Can you tell me the required parsing time of the old and the new version? Thank you very much!

@martinflorek

I am not able to properly measure the parsing time only, because I process several source code repositories at once and I am looking for specific files only before parsing them.

But the new version runs a bit faster: my whole processing run went from ~33 seconds to ~32 seconds. The version with Kastree runs in 1.7 seconds.
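
One way to separate parsing time from file discovery and I/O is to load the sources first and time only the parse call. The sketch below is a minimal illustration, not code from kotlinx.ast: `parseSource` is a hypothetical stand-in for whichever parser is being measured, and `totalParseMillis` is a helper name made up for this example.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical stand-in for the real parser call (assumption: not a kotlinx.ast API).
// Replace the body with the actual parse function being measured.
fun parseSource(source: String): Int = source.count { it == '\n' } + 1

// Times only the parse step for sources that are already in memory,
// so file discovery and reading are excluded from the result.
fun totalParseMillis(sources: List<String>, parse: (String) -> Any?): Double {
    var totalNanos = 0L
    for (source in sources) {
        totalNanos += measureNanoTime { parse(source) }
    }
    return totalNanos / 1_000_000.0
}

fun main() {
    val sources = listOf("fun a() = 1\n", "class B\nclass C\n")
    // Warm-up runs so that JIT compilation does not dominate the measurement.
    repeat(5) { totalParseMillis(sources, ::parseSource) }
    val millis = totalParseMillis(sources, ::parseSource)
    println("parse-only time: $millis ms")
}
```

Passing the parser in as a function makes it possible to run the same loop against the old version, the new version, and Kastree, and compare the three numbers directly.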

@drieks
Collaborator Author

drieks commented Oct 30, 2019

I refactored kotlinx.ast so that it is now possible to use either antlr-kotlin or antlr-java to parse Kotlin sources.
Example: https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-antlr-java/src/test/kotlin/kotlinx/ast/example/ExampleMain.kt

But sadly, it seems that antlr-java is not much faster than antlr-kotlin. I will try to figure out how to speed up parsing.

@drieks
Collaborator Author

drieks commented Nov 6, 2019

@ShikaSD pointed me to antlr-optimized, so I implemented support for this ANTLR fork in kotlinx.ast. But sadly, it is not as fast as hoped.
I will try to implement a lexer and parser based on the antlr4 grammar files, supporting only the features that are required to parse Kotlin files.
For this use case, I already added support for parsing antlr4 grammar files in kotlinx.ast:grammar-antlr4-parser-antlr-java.

drieks added a commit that referenced this issue Dec 7, 2020
@drieks
Collaborator Author

drieks commented Dec 7, 2020

The time for ./gradlew clean check was reduced from 3min 30s in commit c7dd6bb to 2min 30s in commit f088b3c.

Because of this, all Kotlin files are now scanned in the self test.

This still needs to be sped up; I think we need a patch to the Kotlin parser/lexer for this.

@drieks
Collaborator Author

drieks commented Dec 7, 2020

Build time for commit 95db180 is 44s, so we can assume that testing the previously excluded files takes around 1 minute 45s:

  • KotlinLexer.kt
  • KotlinParser.kt
  • UnicodeClasses.kt

drieks added the bug and enhancement labels Dec 17, 2020
@fab1an

fab1an commented Dec 15, 2021

Can you have a look at my comment in #50? Why is a large garbage string faster to parse than a large string containing JSON?
