Implement parallel processing #11

Open
samirelanduk opened this issue Nov 26, 2017 · 3 comments

Comments

@samirelanduk
Owner

The multiprocessing library could speed up parts of the PDB parsing process - especially those parts that are just processing thousands of records.

@gf712
Contributor

gf712 commented Jul 18, 2018

In the past I have managed to get a speedup by processing PDB files in parallel using C++. The way to go, in my opinion, is as follows:

  1. The main thread parses the file
  2. Either:
    a. as the parser loads data into a thread-safe container, e.g. Python's multiprocessing Manager, launch threads that instantiate individual Atom objects concurrently
    b. after the parser is done, start multiple threads that instantiate Atom objects
  3. Check whether Atom objects can be connected (this could also be processed in parallel, but I suspect there would be very little gain, if any)

2a should be faster than 2b, but you need to be careful with deadlocks and thread safety, which can be a pain to debug! In any case, step 2 is where I think you could gain from parallel processing.
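
A minimal sketch of option 2b in Python, to make the idea concrete. It is an illustration under assumptions, not atomium's code: the `Atom` dataclass, `atom_from_record` and `build_atoms` are hypothetical names, and the column slices follow the fixed-width ATOM/HETATM record layout of the PDB format. It uses worker processes rather than threads, because in standard CPython builds the GIL stops threads from running Python bytecode in parallel.

```python
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass


@dataclass
class Atom:
    # Hypothetical stand-in for the library's atom class (not atomium's real one).
    serial: int
    name: str
    residue: str
    chain: str
    x: float
    y: float
    z: float


def atom_from_record(line):
    """Build one Atom from a fixed-width PDB ATOM/HETATM record line."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        residue=line[17:20].strip(),   # columns 18-20
        chain=line[21],                # column 22
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def build_atoms(lines, workers=1, chunksize=2000):
    """Step 1: select atom records in the caller. Step 2b: instantiate Atoms."""
    records = [l for l in lines if l.startswith(("ATOM", "HETATM"))]
    if workers <= 1:
        # Serial fallback, useful as a baseline when timing the parallel path.
        return [atom_from_record(r) for r in records]
    # Records are sent to worker processes in chunks; results come back in order.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(atom_from_record, records, chunksize=chunksize))
```

Calling `build_atoms(lines, workers=4)` from a script should sit under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module. Every record and every resulting `Atom` has to be pickled between processes, so whether this beats the serial path would need to be measured on real files.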

I am happy to help with this!

@wojdyr

wojdyr commented Oct 29, 2018

My two cents: the PDB and mmCIF parsers could be made one to two orders of magnitude faster, although not in pure Python. Parallel processing would then not be needed.
You may want to have a look at https://github.com/project-gemmi/mmcif-benchmark

@samirelanduk
Owner Author

Hi - thanks for the benchmark link. I hadn't seen this before and it will be very useful.

atomium 0.12, currently under development and hopefully out in the next few days, does have large speed increases, though still in pure Python (see this tweet). Moving to compiled code is a medium-term goal for this library.
