
Multiprocessing vulture with python. #330

Open
pernordeman123 opened this issue Oct 4, 2023 · 3 comments

@pernordeman123

Hi!

Is it possible to run vulture with multiprocessing?
If so, do you have any recommendations on how to do it?

Trying to split the files across different processes/threads breaks the tool, so it would be great if it were possible to share/split the AST between processes.
I'm using Python's threading and queue libs.

BR
Per

@jendrikseipp
Owner

Vulture currently doesn't support parallel processing. And I feel that if it's slow enough to make somebody wish for parallel processing, we should make its single-core execution faster first. Is the code you're testing open source, and if so, can you share a link and how you call Vulture? How long does running Vulture on your code take?

@gtkacz

gtkacz commented Sep 11, 2024

Just to add to this, multiprocessing is only efficient in Python when it's I/O bound because of the GIL. So until something like PEP 734 or PEP 684 makes it into Python, I'm not sure this is feasible.

@giampaolo

giampaolo commented Nov 15, 2024

FYI, some time ago I contributed something similar for the autoflake CLI tool: PyCQA/autoflake#107.

I took a look at the vulture source code. As it currently stands, it's difficult to integrate multiprocessing into it, because the AST parsing logic is intertwined with the Vulture class.

Vulture.scan() (the method which should be parallelized) and the other methods which depend on it should be turned into a function or staticmethod, independent of the Vulture class. It should accept one argument (either a path or code), traverse the entire AST and return a "final result" (a Python dict). You want to use standard Python types (dict, str, int, etc.) for both the function argument and the return value, because they need to be serialized by multiprocessing.
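A minimal sketch of what such a standalone function could look like (the function name, the dict keys, and the toy name-collection logic are hypothetical simplifications, not Vulture's actual analysis):

```python
import ast
import sys
from multiprocessing import Pool

def scan_file(path):
    # Hypothetical standalone scan: takes a plain path, returns a plain
    # dict, so both argument and result pickle cleanly across processes.
    with open(path, encoding="utf-8") as f:
        source = f.read()
    tree = ast.parse(source, filename=path)
    defined, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return {"path": path, "defined": sorted(defined), "used": sorted(used)}

if __name__ == "__main__":
    # One worker process per CPU; each parses its own files independently.
    with Pool() as pool:
        for result in pool.map(scan_file, sys.argv[1:]):
            print(result["path"], result["defined"], result["used"])
```

The per-file dicts can then be merged in the parent process, which is where any whole-project logic (e.g. a name defined in one file but used in another) has to live.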

Likewise, the argparse namespace should be turned into a dict for the same reason.
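For example, the standard-library way to do that conversion is vars() (the option name here is made up for illustration):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--min-confidence", type=int, default=0)
args = parser.parse_args(["--min-confidence", "80"])

# vars() turns the Namespace into a plain dict, which pickles cleanly
# when handed to worker processes.
options = vars(args)
```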

In summary: adding multiprocessing is the easy part. The hard part is refactoring the Vulture class first. =)

@gtkacz wrote:

Just to add to this, multiprocessing is only efficient in Python when it's I/O bound because of the GIL.

It's the other way around actually. You want to use threading when the work is I/O bound, and multiprocessing when it's CPU bound. This sort of work is mostly CPU bound. Yes, vulture reads multiple files from disk, but that's fast already (.py files are small, disk read()s are cached, etc.). What's slow here is parsing the AST trees. That's the typical sort of work you want to split across multiple CPUs. The speedup will be massive.
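A small illustration of the distinction (this toy node counter is not Vulture's code): parsing is pure CPU work, so a process pool spreads it across cores, whereas a thread pool would serialize the same work behind the GIL.

```python
import ast
from concurrent.futures import ProcessPoolExecutor

def count_nodes(source):
    # Pure CPU work: no I/O to wait on, so threads would just take turns
    # holding the GIL, while separate processes run truly in parallel.
    return sum(1 for _ in ast.walk(ast.parse(source)))

if __name__ == "__main__":
    sources = ["x = 1\n", "def f(a, b):\n    return a + b\n"]
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(count_nodes, sources)))
```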
