Performance improvement? #38
Comments
Can you provide an example of how you're specifically performing the matches? About how long is a long time? Is it on the order of minutes, hours, or days? This will help me look into the performance issue. |
For example, it takes about 20 μs per file, which adds up to 2 seconds for 100k files. And if we use big regex and use |
I checked
vs
I'll check if I can fix it. |
I made it to:
so far. Thing is, |
I got a slightly better RE for the start of the pattern: |
Is it worth adding an actual benchmark with e.g. pytest-benchmark or asv? |
Of course it is
|
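As a rough starting point before wiring up pytest-benchmark or asv, the comparison discussed in this thread can be timed with the standard library's `timeit`. This is only a sketch: the regexes and paths below are made up for illustration and are not what pathspec generates, and the timings will vary by machine.

```python
import re
import timeit

# Hypothetical per-line regexes, roughly shaped like "*.ext0", "*.ext1", ...
regexes = [rf"(?:.+/)?[^/]*\.ext{i}\Z" for i in range(30)]
# Synthetic paths; about half of them match one of the regexes above.
paths = [f"dir{i % 50}/sub/file{i}.ext{i % 60}" for i in range(2000)]

separate = [re.compile(r) for r in regexes]
combined = re.compile("|".join(f"(?:{r})" for r in regexes))


def run_separate():
    # One regex call per pattern until one of them matches.
    return sum(any(p.match(path) for p in separate) for path in paths)


def run_combined():
    # A single regex call per path.
    return sum(combined.match(path) is not None for path in paths)


# Take the best of a few repeats to reduce noise.
t_sep = min(timeit.repeat(run_separate, number=1, repeat=3))
t_comb = min(timeit.repeat(run_combined, number=1, repeat=3))
print(f"separate: {t_sep:.4f}s  combined: {t_comb:.4f}s")
```

A real benchmark would use patterns and file lists taken from an actual project, since the result depends heavily on both.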
It would be great if there was a way to combine multiple patterns from different lines into larger regexes automatically. |
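A minimal sketch of that idea using only the standard `re` module, assuming every line is a plain include pattern (no `!` negations). The regex strings are hand-written stand-ins for what a library might compile from `*.pyc`, `build/`, and `*.log`; they are assumptions, not pathspec's actual output.

```python
import re

# Hypothetical regexes, one per pattern line.
line_regexes = [
    r"(?:.+/)?[^/]*\.pyc\Z",      # "*.pyc"
    r"(?:.+/)?build(?:/.*)?\Z",   # "build/"
    r"(?:.+/)?[^/]*\.log\Z",      # "*.log"
]

# Baseline: one compiled regex per line.
separate = [re.compile(r) for r in line_regexes]

# Combined: wrap each regex in a non-capturing group and join with "|".
combined = re.compile("|".join(f"(?:{r})" for r in line_regexes))


def match_separately(path):
    return any(p.match(path) for p in separate)


def match_combined(path):
    return combined.match(path) is not None
```

Both functions should agree on every path; the combined version just makes one regex call instead of one per line. Once negation patterns are involved, the order of lines matters and a flat alternation like this is no longer enough on its own.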
👀 It is possible, from my experimentation:
then you end up with one long pattern like But I have a completely different implementation so idk how hard that would be for this project. I actually have 2 patterns: one that's used if the path is a directory, and one that's used if the path doesn't exist or is a file. That lets me flatten all the patterns into one. But since checking if it's a dir is comparatively slow, I also have a setting to not check and assume everything passed in is a file such that I'm still working on fixing #74 |
I only used method 1 in another project and got a significant performance improvement. Method 2 is something I didn't think of. In my case, I split the patterns into several groups, since only patterns of the same type can be joined together. |
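One way to read "only the same type of pattern can be joined together" is to merge consecutive runs of include patterns or negation patterns, keeping the runs in order so that the last matching line still wins. The sketch below assumes that interpretation; the regexes and the `is_ignored` helper are hypothetical and not taken from any project mentioned here.

```python
import re
from itertools import groupby

# (regex, include) pairs in file order; include=False stands for a "!" line.
patterns = [
    (r"(?:.+/)?[^/]*\.log\Z", True),       # "*.log"
    (r"(?:.+/)?[^/]*\.tmp\Z", True),       # "*.tmp"
    (r"(?:.+/)?keep\.log\Z", False),       # "!keep.log"
    (r"(?:.+/)?build(?:/.*)?\Z", True),    # "build/"
]


def group_patterns(patterns):
    """Join each consecutive run of same-type patterns into one regex."""
    groups = []
    for include, run in groupby(patterns, key=lambda p: p[1]):
        joined = "|".join(f"(?:{regex})" for regex, _ in run)
        groups.append((re.compile(joined), include))
    return groups


def is_ignored(groups, path):
    """Last matching group wins, so scan from the end and stop early."""
    for regex, include in reversed(groups):
        if regex.match(path):
            return include
    return False


groups = group_patterns(patterns)
```

With the four lines above this yields three compiled regexes instead of four; the saving grows when long runs of include patterns are only occasionally interrupted by a negation.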
Hello.
We are users of `pathspec` in another project, and I have a performance question. With a long list of rules (dozens) matched against a large number of files (hundreds of thousands), `match_file` takes a long time. Is there any way to improve its performance? For example, by using one big regex instead of multiple small ones.