Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Profile data mirroring #6723

Draft
wants to merge 60 commits into
base: main
Choose a base branch
from

Conversation

GeigerJ2
Copy link
Contributor

@GeigerJ2 GeigerJ2 commented Jan 22, 2025

Design questions:

  • If a Node itself is not created newly, but only added to a new group, and the mirroring is run, it is not dumped, as the node is not new... does adding to a group affect its mtime? Possibly not. Other way to get this change, e.g., check collections since the last dump?
  • If a sub-workflow/calculation of another workflow is put into its own group, no folder for this group is created with the default settings
  • When I delete a group, but don't delete its nodes, and I re-run the mirroring, this change is not picked up, as the filtering is done by the mtime of the node -> Currently working on this.
  • Three dumping modes:
    • incremental=False, overwrite=False: Will error out with FileExistsError if the directory exists, goes through if doesn't exist or empty.
    • incremental=True, overwrite=False: Will keep original main directory, but update subdirectories with new data.
    • incremental=False, overwrite=True: Will clean main directory and perform the dumping again from scratch.
    • If both options are set to True, --overwrite will take precedence, and a report message will be issued to the user. This is because --incremental is by default True, as it is the most sensible option, and should not be required to be always specified. However, if also --overwrite is set, we don't raise an exception (as I had it initially implemented), as that would require the user to always pass --overwrite --no-incremental, which is annoying. Automatically setting --incremental to False if --overwrite is specified could be handled by a click callback, but for now I just change the options on the fly at a later stage in the code.
  • Ways to specify the output path for dumping/mirroring (of all, processes, groups, and the profile data):
    • Passing no value should generate a sensible default output path in all cases.
    • If a relative path is given, this should be created under the CWD.
    • If an absolute path is given, this should be used as the top-level directory of the dumping.
    • These three options should be handled in the same way via the verdi CLI, as well as the top-level dump method of each Dumper class, so that the path can be set accordingly via the Python API.
    • To achieve this, internally, the path is split into the dump_parent_path (absolute, defaults to CWD) and an output_path part (relative, either provided by the user, or automatically generated), which, combined, yield the full top-level path where the files are dumped.

General notes:

  • What happens if I delete a calculation that was called by another workchain, from AiiDA's DB, and I run with the --delete-missing option? -> Possibly use graph_traversal_rules like for verdi node delete when updating directories after a node was deleted.
  • What to do if group gets deleted and verdi profile mirror --delete-missing is executed? Should also keep track of the groups in the DumpLogger, and delete the directory in that case.
  • dump_parent_path is the CWD from which the dumping/mirroring command is called, while dump still provides an output_path parameter to denote the directory name of the profile, group, or process that will be dumped. This is optional, and if not provided by the user, it will be auto-generated.
  • Possibly use graph_traversal_rules and add get_nodes_dump to src/aiida/tools/graph/graph_traversers.py, as well as AiidaEntitySet from src/aiida/tools/graph/age_entities.py, etc., to first obtain the nodes, and then run the dumping.

(Possible) future TODOs:

  • Allow specifying options via config file
  • Add data dumping (raw/rich)
  • Support batch operations?
  • Keep track of symlinks and/or dumped node types in DumpLogger?
  • Expose endpoint to CollectionDumper and allow for mixed node types
  • Add option to check directory existence/mtime/contents to determine what to do for incremental dumping

Bugs

  • README for dumped processes in wrong (too high) directory

@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch 3 times, most recently from 9597527 to 1c4b67b Compare January 23, 2025 16:07
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from ce20e4c to 2dfe2ca Compare January 28, 2025 16:26
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch 3 times, most recently from 09e6f01 to 074b053 Compare February 13, 2025 17:13
GeigerJ2 and others added 22 commits February 17, 2025 14:56
Either in groups, or not associated with any group.
Either sorted by groups, or in a top-level flat hierarchy.
"De-duplication" works by symlinking calculations if they are part
of a workflow.

Next, check what happens if a workflow is part of two groups -> Here,
de-deplucation should actually make more sense.
Add `BaseDumper`, `ProfileDumper` and `CollecionDumper` -> `GroupDumper`
Remove code related to data and rich dumping
- Use the `BaseDumper` instead of passing arguments to the
  `ProcessDumper`
- Append PKs to the test output paths and use `aiida_profile_clean`
  fixture for reproducible results
And back to `CollectionDumper`
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from 06bc55e to b795eda Compare February 17, 2025 16:30
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from 7cccc3a to 99c7276 Compare February 19, 2025 11:10
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from f1cce61 to 2ea72f9 Compare February 19, 2025 15:47
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from 6d9c50d to 8971fbf Compare February 20, 2025 09:09
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from b9fd668 to b4cc58b Compare February 20, 2025 09:34
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from e1be6cb to 09c434c Compare February 20, 2025 17:03
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from ba25a5c to 568af45 Compare February 25, 2025 16:36
@GeigerJ2 GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from 02acb56 to 8e0cfdc Compare February 25, 2025 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant