Replies: 12 comments 2 replies
-
This would be a great addition. I'd be glad to collaborate in any way to make it happen and get it integrated with NAMD 3. Your plan sounds fine to me. Most of the performance gain will come from avoiding the latency of copying the values over and having the CPU perform the relevant math, so I wouldn't worry about the GPU performance of the code right now.
-
This is very much connected to issue #61: the plan described there to move to flat arrays of shape xyzxyz could be amended.
-
I'd suggest just getting colvars running on the GPU in whatever way is possible with the fewest code changes using whatever bleeding-edge features might help, even if it only runs on Volta and Ampere under CUDA 11. Don't try to optimize performance before you know what the actual bottlenecks are.
-
As @jhenin pointed out, there is already a discussion on the matter of refactoring. In part, it also reflects conversations with another UIUC programmer (John). Regarding details, I am of course open to changes. The only plans that shouldn't be changed are those that are so good that they are carried out quickly... And clearly that plan isn't (see the date on issue #61). One thing that stands out is that the choice between

We all agree that ruling out C++11 or other modern resources is not useful any more. Eventually even VMD will get there. But then the question is how much of the Colvars GPU code, which will most likely be written with NAMD in mind, could be reused. If it needs to be tied to NAMD,

Alternatively, there is also the option to just wrap NAMD objects, like we do already for centers of mass or GridForces maps, both of which are computed entirely by NAMD code (Colvars is just a wrapper to those features). This could give you guys more flexibility, e.g. to handle the transition from the current single-GPU scheme to a (new and better than before) multi-GPU scheme. Not to mention that you could probably also reuse said variables with
-
I guess I put stuff down without actually engaging in the conversation, so with that in mind here is a question: which variables does NAMD 3 most sorely need to have on the GPU?
-
I think the final goal may be totally avoiding the data transfer between CPU and GPU, so the variables include all the atomic coordinates, colvar grids and biases. Is that possible?
-
All of them, I think, would be ideal, to make colvars properly GPU-resident the whole time the simulation is running (except for file I/O). Otherwise you still incur a memory transfer across the PCI bus or whatever its successor is, and the simulation would need to wait for that step to complete. Right now it's those memory transfers that appear to be the bottleneck, since calculating relatively simple colvars substantially reduced performance.
-
I have put example code for computing the optimal RMSD with respect to a reference frame using CUDA at https://github.com/HanatoK/RMSD_CUDA . It does all the calculations (COM, matrix F, eigenvalues and reduction) on the GPU, but of course the code is not optimal.
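For readers unfamiliar with the first two steps listed above, here is a minimal CPU-side sketch in plain C++. It is not taken from the linked repository. The names `Point`, `Mat3`, `centroid` and `correlation` are hypothetical, and it assumes equal masses (so the COM is a plain centroid) and non-empty inputs of equal length. It only builds the 3x3 correlation matrix of the centered coordinates, which is the usual input to the overlap matrix in quaternion-based RMSD fits; the eigenvalue step is left out.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical three-component position type (not a Colvars or NAMD type).
struct Point {
  double x, y, z;
};

using Mat3 = std::array<std::array<double, 3>, 3>;

// Unweighted center of the positions (assumes equal masses, non-empty input).
Point centroid(const std::vector<Point> &p) {
  Point c{0.0, 0.0, 0.0};
  for (const Point &a : p) {
    c.x += a.x;
    c.y += a.y;
    c.z += a.z;
  }
  const double n = static_cast<double>(p.size());
  c.x /= n;
  c.y /= n;
  c.z /= n;
  return c;
}

// Correlation matrix m[r][c] = sum_i (pos_i - <pos>)_r * (ref_i - <ref>)_c.
// Assumes pos and ref have the same number of atoms.
Mat3 correlation(const std::vector<Point> &pos, const std::vector<Point> &ref) {
  const Point cp = centroid(pos);
  const Point cr = centroid(ref);
  Mat3 m{};  // value-initialized: all elements start at zero
  for (std::size_t i = 0; i < pos.size(); ++i) {
    const double a[3] = {pos[i].x - cp.x, pos[i].y - cp.y, pos[i].z - cp.z};
    const double b[3] = {ref[i].x - cr.x, ref[i].y - cr.y, ref[i].z - cr.z};
    for (int r = 0; r < 3; ++r) {
      for (int c = 0; c < 3; ++c) {
        m[r][c] += a[r] * b[c];
      }
    }
  }
  return m;
}
```

On a GPU, both loops over atoms become parallel reductions, which is presumably what "reduction" refers to in the list above.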
-
Hi, here are a few comments that I hope will contribute constructively to this conversation.
To those of you who are endeavoring to make complex plans, we will need much more frequent and accurate communication than there ever was in the NAMD project. Whether you are making such plans or are reviewing them, do keep this in mind.
-
@giacomofiorin
-
Good luck with your defense! I'll take a closer look again at Julio's draft changes to If the GPU-aware implementation of
-
I am now a NAMD developer, so I can do more on this front. On the NAMD side, I have proposed a new interface for GPU-resident NAMD that avoids the CPU-GPU copy (see https://gitlab.com/tcbgUIUC/namd/-/wikis/Developer-Notes/CudaGlobalMaster). PytorchForces serves as an example of the new interface (see https://gitlab.com/tcbgUIUC/namd/-/wikis/Developer-Notes/PytorchForces). The code is ready (see the

Any updates from the Colvars side? I am willing to help, but I don't know how to get the ball rolling. @giacomofiorin Do you have any ideas? Thanks!
-
COVID work has slowed down a bit, so now I have time to think about how to better integrate colvars onto GPUs to avoid data transfers across the relatively slow PCI bus. As a discussion point, there are a few parts to colvars that aren't ideal from a GPU perspective as it has currently been designed.
Right now, quantities like the atomic coordinates are stored as arrays of rvectors, so the data is stored xyzxyzxyz, etc. On the GPU, the coordinates are stored as independent arrays (xxxyyyzzz), since this better exposes the parallelism on the hardware. rvector seems to be used everywhere within the module, so in the short term it may be worth rearranging the data within the GPU every timestep into an array of `double3` (or `float3`, depending on the `cvm::real` typedef).

There are some code patterns that would probably vectorize well, but instead need to resort to simple for loop patterns because `std::transform` and their `thrust` equivalents are verboten until VMD allows us to use C++11.

My inclination is to use as much of the existing code-base as is practical at the expense of potential GPU performance, so long as no memory needs to move across the PCI-bus, but this is still at the beginning stages so it would be useful to gather feedback. Thoughts and opinions welcome!
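To make the two layouts concrete, here is a rough host-side sketch in plain C++11. It is not Colvars code: `Vec3` is a hypothetical stand-in for a three-component type such as rvector, and `CoordsSoA`, `to_soa`, `to_aos` and `scale_in_place` are names invented for this illustration. It shows the xyzxyz to xxxyyyzzz rearrangement and the kind of `std::transform` pattern that C++11 lambdas would allow in place of a hand-written loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a three-component vector type such as rvector.
struct Vec3 {
  double x, y, z;
};

// Structure-of-arrays layout: three independent arrays (xxx..., yyy..., zzz...).
struct CoordsSoA {
  std::vector<double> x, y, z;
};

// Rearrange interleaved positions (xyzxyz...) into three independent arrays.
CoordsSoA to_soa(const std::vector<Vec3> &aos) {
  CoordsSoA soa;
  soa.x.reserve(aos.size());
  soa.y.reserve(aos.size());
  soa.z.reserve(aos.size());
  for (const Vec3 &v : aos) {
    soa.x.push_back(v.x);
    soa.y.push_back(v.y);
    soa.z.push_back(v.z);
  }
  return soa;
}

// Inverse rearrangement: independent arrays back to interleaved positions.
// Assumes soa.x, soa.y and soa.z all have the same length.
std::vector<Vec3> to_aos(const CoordsSoA &soa) {
  std::vector<Vec3> aos(soa.x.size());
  for (std::size_t i = 0; i < aos.size(); ++i) {
    aos[i] = Vec3{soa.x[i], soa.y[i], soa.z[i]};
  }
  return aos;
}

// Element-wise operation written with std::transform and a C++11 lambda
// instead of an explicit for loop: scale every position by s.
void scale_in_place(std::vector<Vec3> &pos, double s) {
  std::transform(pos.begin(), pos.end(), pos.begin(),
                 [s](const Vec3 &v) { return Vec3{s * v.x, s * v.y, s * v.z}; });
}
```

On the device the same rearrangement would be a small kernel with one thread per atom, and the `std::transform` call has a close analogue in `thrust::transform`. Whether the per-timestep shuffle costs less than a PCI transfer is something only profiling can answer.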