Support writing MPCD particle data to file #774

mphoward · 2020-09-04T14:38:57Z

Description

The state of the MPCD particle data cannot currently be saved to a GSD file. We have done restarts by using a snapshot with NumPy arrays to save the final state, but it would be more convenient to be able to save this directly in a nicer file format. I discussed this with @joaander a long time ago but there seem to be two options, and I wanted to get feedback / thoughts on them:

1. Define an MPCD particle data schema and save the particles to their own GSD file.
2. Embed the MPCD data into the GSD file with the rest of the HOOMD data, again using some reasonable schema.

With the new v3 API, does one of these seem simpler / more appealing? I'm thinking in particular about how initialization might look. Currently, MPCD initialization happens in a second stage from the HOOMD system, so would the same GSD file need to be read twice (in two init commands) if we went with option 2? Also, there are usually many MPCD particles so it is not good to save them too frequently (i.e., the MPCD particle data would probably be written much less frequently than the HOOMD particle data). Last, is one of these options easier in terms of accessing the data using the gsd python module (like, one is already supported but the other would need to be implemented)?

~~I was somewhat favoring option 1, but I am totally open to either.~~ With the new API, this is probably easier using approach 2 so that the MPCD particles can be initialized from the same GSD file as the MD particles. Otherwise, we need an additional argument for the MPCD GSD file, or a method that can be called after the state is created to also read the MPCD particles.

Developer

I will work on this eventually, but it will be lower priority for me than the other migration tasks because there is a reasonable workaround already.

github-actions · 2022-03-22T19:03:50Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions · 2022-04-02T19:03:41Z

This issue has been automatically closed because it has not had recent activity.

mphoward · 2024-03-05T16:54:14Z

@joaander I would like to restart the discussion on this. I am having issues, similar to what I saw on Blue Waters a long time ago, where the process of taking a Snapshot with MPCD particles fails. (This is using HOOMD 2, but I expect the behavior to be similar in HOOMD 4 because the error comes from Gatherv failure.) I would guess that I can generate similar errors if I try to initialize from a Snapshot with a large number of MPCD particles.

I think it would be great if we had a way to read/write the MPCD particles to GSD, but I am not sure how to proceed. What would be the best way to do this?

joaander · 2024-03-05T17:30:02Z

Similar to the HOOMD v2 codebase, create_state_from_gsd reads the GSD file to a Snapshot, then initializes from the snapshot:

hoomd-blue/hoomd/simulation.py

Lines 230 to 240 in da678fe

    
    reader = _hoomd.GSDReader(self.device._cpp_exec_conf, filename,
                              abs(frame), frame < 0)
    snapshot = Snapshot._from_cpp_snapshot(reader.getSnapshot(),
                                           self.device.communicator)
    step = reader.getTimeStep() if self.timestep is None else self.timestep
    self._state = State(self, snapshot, domain_decomposition)
    reader.clearSnapshot()
    self._init_system(step)

This always requires rank 0 to have enough memory to store the entire system. For MD/HPMC simulations, the memory per node has grown massively along with core counts in recent years, so I have not found a strong need to refactor the initialization to operate in parallel with O(N/P) memory requirements. I think you would need to implement parallel initialization for MPCD particles to solve the problem you mention, though the problem could be an underlying limitation in the size of arrays supported by MPI. The gsd C API would also need to be expanded with partial data chunk reading support to avoid reading all N particles on each rank.
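To make the O(N/P) idea concrete, here is a minimal sketch of how N particles could be split into contiguous blocks, one per rank, so that each rank would only need to read its own slice of a data chunk. The function name `block_range` is hypothetical and is not part of HOOMD or gsd; it only illustrates the index arithmetic that a partial chunk read would rely on.

```python
from typing import Tuple


def block_range(n_total: int, n_ranks: int, rank: int) -> Tuple[int, int]:
    """Return (start, count) of the contiguous slice owned by `rank`.

    Hypothetical helper: splits n_total items as evenly as possible
    over n_ranks, giving the remainder to the lowest ranks.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("need n_ranks > 0 and 0 <= rank < n_ranks")
    base, extra = divmod(n_total, n_ranks)
    # The first `extra` ranks each take one additional item.
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


# Example: 10 particles over 4 ranks.
print([block_range(10, 4, r) for r in range(4)])
# [(0, 3), (3, 3), (6, 2), (8, 2)]
```

With a split like this, no rank holds more than ceil(N/P) particles from the file, but the particles would still need to be redistributed afterwards according to the spatial domain decomposition.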

On the other points:

- reader in this code could be kept and used later to prevent the need to re-open a gsd file during mpcd initialization. This may not be necessary as opening a gsd file costs a few milliseconds.
- What schema would you propose in GSD for mpcd particles? What would you expect tools like VMD and Ovito to do with this data? If you want to add this into the hoomd schema, note that all data chunks are defined valid at all times (https://gsd.readthedocs.io/en/v3.2.1/schema-hoomd.html#data-chunks). Thus, it is not feasible to have one gsd file with separate triggers for normal and mpcd particles.
- I don't think any one option is easier or harder to implement in the gsd python module. The hoomd schema is read by hoomd.py (https://github.com/glotzerlab/gsd/blob/trunk-patch/gsd/hoomd.py). You can just as easily add mpcd.py as you can add the same code to hoomd.py.

mphoward · 2024-03-05T18:38:24Z

> though the problem could be an underlying limitation in the size of arrays supported by MPI

I did some additional testing for my simulation that was crashing, and I came to the same conclusion. I think the issue is with the use of int internally by the MPI library and routines in HOOMDMPI.h. I was getting crashes from gather_v when taking a snapshot, but even if I disabled that, I was also getting weird behavior when I initialized from a snapshot, which indicates an issue with scatter_v as well. If I reduced the number of particles, everything worked fine.
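As a rough illustration of why a large particle count can exceed what an int count can describe, the sketch below checks a total message size against the largest 32-bit signed int and splits it into pieces that fit. The 28 bytes per particle (single-precision position and velocity plus a 4-byte typeid) and the assumption that the count is measured in bytes are mine, for illustration only; the helper names are hypothetical and not taken from HOOMDMPI.h.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit signed C int


def fits_in_int(count: int) -> bool:
    """True if `count` can be passed as a 32-bit signed int."""
    return 0 <= count <= INT_MAX


def chunk_counts(total: int, max_count: int = INT_MAX) -> list:
    """Split `total` into pieces that are each no larger than `max_count`."""
    full, rest = divmod(total, max_count)
    return [max_count] * full + ([rest] if rest else [])


# Assumed example: 100 million particles at 28 bytes each.
n_particles = 100_000_000
bytes_per_particle = 28  # assumption: 3 + 3 floats and one 4-byte typeid
total_bytes = n_particles * bytes_per_particle

print(total_bytes, fits_in_int(total_bytes))
# 2800000000 False
print(chunk_counts(total_bytes))
# [2147483647, 652516353]
```

Splitting the transfer into several smaller calls is one possible workaround; it does not remove the need for rank 0 to hold the whole system in memory.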

I saw that MPICH has an MPI_Count and MPI_Aint (address int), which can be passed to an alternative API like MPI_Gatherv_c and are supposed to address this issue. It looked like this was added in MPICH 3.1 to support the MPI-3 standard (released a while ago), but I'm not sure how widely supported this is by other MPI libraries.

Unfortunately, this means that parallel initialization and write would be necessary to fix my problem because I could not go through a Snapshot, but I think it would still be useful to have GSD support even if it doesn't work for these big problems.

> What schema would you propose in GSD for mpcd particles?

MPCD particles are like pared-down HOOMD particles. Each one has a position, velocity, and typeid. Additionally, we would need to record the number of particles, the mass m (a scalar, the same for all particles), and the list of types. We would also want to have a copy of the box.
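One way to picture the fields listed above is as a small container that maps onto named data chunks. This is only a sketch: the chunk names under `mpcd/` are hypothetical, the six-value box layout (Lx, Ly, Lz, xy, xz, yz) is borrowed from the existing HOOMD schema as an assumption, and plain Python lists stand in for the arrays a real implementation would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class MPCDFrame:
    """Hypothetical container for one frame of MPCD particle data."""

    position: List[Sequence[float]]  # N rows of (x, y, z)
    velocity: List[Sequence[float]]  # N rows of (vx, vy, vz)
    typeid: List[int]                # N indices into `types`
    mass: float = 1.0                # one scalar shared by all particles
    types: List[str] = field(default_factory=lambda: ["A"])
    box: Sequence[float] = (1.0, 1.0, 1.0, 0.0, 0.0, 0.0)

    @property
    def N(self) -> int:
        return len(self.position)

    def validate(self) -> None:
        """Check that the per-particle fields are consistent."""
        if len(self.velocity) != self.N or len(self.typeid) != self.N:
            raise ValueError("position, velocity, and typeid need N entries")
        if any(len(row) != 3 for row in list(self.position) + list(self.velocity)):
            raise ValueError("position and velocity rows need 3 components")
        if any(not 0 <= t < len(self.types) for t in self.typeid):
            raise ValueError("typeid must index into types")
        if len(self.box) != 6:
            raise ValueError("box needs 6 values")

    def to_chunks(self) -> Dict[str, object]:
        """Map fields to chunk names (all names here are assumptions)."""
        self.validate()
        return {
            "mpcd/configuration/box": list(self.box),
            "mpcd/particles/N": self.N,
            "mpcd/particles/mass": self.mass,
            "mpcd/particles/types": list(self.types),
            "mpcd/particles/position": self.position,
            "mpcd/particles/velocity": self.velocity,
            "mpcd/particles/typeid": self.typeid,
        }


frame = MPCDFrame(
    position=[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    velocity=[[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]],
    typeid=[0, 0],
)
print(sorted(frame.to_chunks()))
```

The same set of fields would apply whether the chunks live in a separate MPCD schema or in an mpcd/ namespace of the HOOMD schema; only the chunk names and where the box is stored would differ.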

> What would you expect tools like VMD and Ovito to do with this data?

I would want them to be ignored because there are so many particles, and they are also basically points.

> Thus, it is not feasible to have one gsd file with separate triggers for normal and mpcd particles.

That is a good point. If the MPCD particles were in the HOOMD schema, I would put them in their own mpcd/ namespace. Could we create multiple dump writers if we needed the info for the different particles at different rates, using dynamic to opt in, like:

solute_only = hoomd.write.GSD(
    trigger=1e4,
    filename="solute.gsd")

solvent_only = hoomd.write.GSD(
    trigger=1e5,
    filename="solvent.gsd",
    dynamic=["configuration/box", "mpcd"])

restart = hoomd.write.GSD(
    trigger=1e6,
    filename="restart.gsd",
    dynamic=["property", "momentum", "attribute", "topology", "mpcd"],
    truncate=True)

?

Overall, I think the question is then whether we want the MPCD particles to be part of the standard HOOMD schema (initialize at the same time as the MD particles, write like above), or to make a separate MPCD schema (initialize separately from the MD particles, always write separately too). The argument for the first is convenience for single-frame operations like initialization and restart, but it is a potentially clunkier mechanism for making a trajectory, and it adds to the HOOMD schema. The argument for the second is that it doesn't touch the HOOMD schema, and writing trajectories is cleaner because the user is less likely to make the mistake of writing too much data.

github-actions · 2024-11-20T19:00:36Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

mphoward added enhancement New feature or request mpcd MPCD component labels Sep 4, 2020

mphoward mentioned this issue Sep 4, 2020

Migrate MPCD component to v4 #775

Closed

11 tasks

github-actions bot added the stale There has been no activity on this for some time. label Mar 22, 2022

github-actions bot closed this as completed Apr 2, 2022

mphoward removed the stale There has been no activity on this for some time. label Mar 5, 2024

mphoward reopened this Mar 5, 2024

mphoward mentioned this issue Sep 25, 2024

Segmentation fault for MPI operations on large data sizes #1895

Closed

github-actions bot added the stale There has been no activity on this for some time. label Nov 20, 2024
