New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[wip] Storage manager update #679

Open

richardjgowers wants to merge 6 commits into main from storage_manager_update

Contributor

richardjgowers commented Jan 4, 2024 •

edited

Loading

[no ci]

fixes #292

Developers certificate of origin

I certify that this contribution is covered by the MIT License here and the Developer Certificate of Origin at https://developercertificate.org/.

richardjgowers added 2 commits

January 1, 2024 18:12


          WIP of new storage backend in RFE protocol

19991a3


          register files I'm expecting to be created

6b81bb6

pep8speaks commented Jan 4, 2024 •

edited

Loading

Hello @richardjgowers! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file openfe/protocols/openmm_rfe/equil_rfe_methods.py:

Line 821:80: E501 line too long (81 > 79 characters)
Line 1005:37: E127 continuation line over-indented for visual indent
Line 1006:37: E127 continuation line over-indented for visual indent

Comment last updated at 2024-01-22 14:33:52 UTC

richardjgowers added 2 commits

January 9, 2024 14:47


          more WIP of new filesystem access

0571d28

currently fails with:

UnsupportedOperation: not readable


          more WIP of new filesystem access

ecd6716

dwhswenson reviewed

View reviewed changes

Member

dwhswenson left a comment

Quick overview of current status:

Main thing is that I think you're using file.as_path() more frequently than needed. This should be used primarily for avoiding the automatic registration that comes from pathlib.Path(file), especially for files that already exist in the cloud (user story: don't download a big file from the cloud when all I want is its parent directory, so I use .as_path()).

When the file doesn't exist, using just pathlib.Path(file) will mark it as something to transfer to the appropriate external storage. Mainly, you don't want to do this on files that you delete during the unit (although those really should have been in scratch, not in either shared or permanent). I think it will also do something on transfer if file in that was a directory (either warning or info-level log message; can't remember which).

You're also using str(file.as_path()) frequently. (1) I think in most cases you actually want str(pathlib.Path(file)), which invokes registration, so you don't need to do it manually; and (2) we should probably implement StagingPath.__str__(self): return str(pathlib.Path(self)), so that your usage would actually just be str(file).

openfe/protocols/openmm_rfe/equil_rfe_methods.py Outdated

Comment on lines 583 to 584

		shared_basepath: stagingregistry.StagingRegistry,
		permanent_basepath: stagingregistry.StagingRegistry,

Member

dwhswenson Jan 10, 2024

I don't think these are StagingRegistry ... aren't they StagingPath?

The input to this method was, e.g., root/$DAG_LABEL/shared_$UNIT_LABEL. StagingRegistry would correspond to root/. The full path comes in as a StagingPath, I believe (this is something that changed at one point, because there was no need to expose both classes to protocol developers; it also simplified serialization, IIRC.)

Contributor Author

richardjgowers Jan 11, 2024

I'm working off these hints here, these three vars are just the context unpacked: https://github.com/OpenFreeEnergy/gufe/blob/staging-execute-dag/gufe/protocols/protocolunit.py#L30

openfe/protocols/openmm_rfe/equil_rfe_methods.py Outdated

Comment on lines 597 to 602

+                      scratch_basepath: StagingDirectory
+                        Where to store temporary files
+                      shared_basepath : StagingDirectory
+                        Where to run the calculation
+                      permanent_basepath : StagingDirectory
+                        Where to store files that must persist beyond the DAG

Member

dwhswenson Jan 10, 2024

update typing in docstring

openfe/protocols/openmm_rfe/equil_rfe_methods.py

@@ @@ -664,11 +664,13 @@ def run(self, *, dry=False, verbose=True, @@
                       else:
                           ffcache = None
+                      ffcache.register()

Member

dwhswenson Jan 10, 2024

Problem 1 (easy): this line and the ffcache.as_path() below will fail if ffcache is None.

Problem 2 (more complicated): How is ffcache supposed to work? Isn't that supposed to be something that we can pull down from some previous unit? shared_basepath is supposed to be for a specific execution of a given unit. That directory might not exist until we're about to start executing this unit. How does a file get in there? How are we currently using that? [EDIT: I'm having a call with @IAlibay tomorrow to clarify what is going on with this]

Contributor Author

richardjgowers Jan 15, 2024

reading through, I think the only thing that would be cached would be the LJ parameters for the small molecules. We can probably remove this?

openfe/protocols/openmm_rfe/equil_rfe_methods.py

                       chk = sim_settings.checkpoint_storage
                       reporter = multistate.MultiStateReporter(
-                          storage=nc,
+                          storage=str(nc.as_path()),

Member

dwhswenson Jan 10, 2024

You're using this str(file.as_path()) pattern a lot. When you use .as_path(), you're explicitly bypassing the automatic registration mechanism, meaning that you must explicitly register the object.

I'm not sure if this is implemented, but you should be able to instead just use str(file), which should also register the file. Equivalently, str(pathlib.Path(file)) will definitely work, which means you don't need to manually use .register().

You should only need to use .register() on paths that are created outside your control (like the checkpoint and RTA files, which are paths generated inside OpenMMTools and not exposed to us).

Contributor Author

richardjgowers Jan 15, 2024

Yeah once it's MultiStateReporter wants a string (not path) and the other is going into a subprocess call, so again it's bypassing file-like objects and is a string.

openfe/protocols/openmm_rfe/equil_rfe_methods.py Outdated

Comment on lines 992 to 993

		'nc': nc.as_path(),
		'last_checkpoint': checkpoint.as_path(),

Member

dwhswenson Jan 10, 2024

Probably better to return the StagingPath itself, which can then be reloaded even if you move the path. Imagine something running on the EC2, where files will get stored on S3. Knowing the path on your EC2 instance isn't particularly useful.

openfe/protocols/openmm_rfe/equil_rfe_methods.py Outdated

                               **analyzer.unit_results_dict
                           }
                       else:
                           return {'debug': {'sampler': sampler}}
                   @staticmethod
-                  def analyse(where) -> dict:
+                  def analyse(where: stagingregistry.StagingRegistry) -> dict:

Member

dwhswenson Jan 10, 2024

this is probably a StagingPath

openfe/protocols/openmm_rfe/equil_rfe_methods.py Outdated

                       # don't put energy analysis in here, it uses the open file reporter
                       # whereas structural stuff requires that the file handle is closed
-                      ret = subprocess.run(['openfe_analysis', str(where)],
+                      trjdir = (where / '')

Member

dwhswenson Jan 10, 2024

what's the purpose of doing this?

Contributor Author

richardjgowers Jan 11, 2024

This was the question I was asking on slack. I needed a .as_path() of a directory and StagingRegistry doesn't have this

openfe/protocols/openmm_rfe/equil_rfe_methods.py Outdated

+                      trjdir = (where / '')
+                      output = (where / 'results.json')
+                      ret = subprocess.run(['openfe_analysis', 'RFE_analysis',
+                                                  str(trjdir.as_path()),

Member

dwhswenson Jan 10, 2024

If I understand correctly, has the requirement that the files are already in the directory trjdir. Note that this will only work for situations where this analyse method is called on the same computer where the data was generated.

To run it on a different computer would require downloading all files within the directory. The best way to do that now would be to enumerate them here -- in principle there is a way to obtain all files with a given string prefix, but all of that gets deep into the weeds.

Contributor Author

richardjgowers Jan 11, 2024

this method is called from _execute() right after run() so it should always be the same machine

Contributor Author

richardjgowers Jan 11, 2024

it is a little ugly that it needs to know if the trj was stored in scratch/shared/permanent, so there's probably something there I need to fix

openfe/protocols/openmm_rfe/equil_rfe_methods.py

                           plt.close(f2)
                       f3 = plotting.plot_ligand_RMSD(data['time(ps)'], data['ligand_RMSD'])
-                      f3.savefig(savedir / "ligand_RMSD.png")
+                      f3.savefig(where / "ligand_RMSD.png")

Member

dwhswenson Jan 10, 2024

(not really this line, but this is the closest changed line)

You may want to include the figure filenames in the return value, too. Otherwise it will be harder to download them from the cloud.


          pass StagingPath objects back in Unit return

37d448c

richardjgowers added this to the 1.0.0 milestone


          fixed StagingPath type annotations

fb85a58

richardjgowers removed this from the 1.0.0 milestone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet