WIP: Profile data mirroring #6723

GeigerJ2 · 2025-01-22T12:56:30Z

Design questions:

If an existing node is only added to a new group (rather than newly created) and the mirroring is re-run, it is not dumped, because the node itself is not new. Does adding a node to a group affect its mtime? Possibly not. Is there another way to detect this change, e.g., by checking which collections changed since the last dump?
If a sub-workflow/calculation of another workflow is put into its own group, no folder for this group is created with the default settings.
When I delete a group without deleting its nodes and re-run the mirroring, the change is not picked up, because the filtering is done by the mtime of the node. -> Currently working on this.
Three dumping modes:
- incremental=False, overwrite=False: Raises FileExistsError if the directory exists and is not empty; proceeds if it does not exist or is empty.
- incremental=True, overwrite=False: Keeps the original main directory, but updates subdirectories with new data.
- incremental=False, overwrite=True: Cleans the main directory and performs the dumping again from scratch.
- If both options are set to True, --overwrite takes precedence, and a report message is issued to the user. This is because --incremental is True by default, as it is the most sensible option, and should not have to be specified every time. If --overwrite is also set, we don't raise an exception (as I had initially implemented), since that would force the user to always pass --overwrite --no-incremental, which is annoying. Automatically setting --incremental to False when --overwrite is specified could be handled by a click callback, but for now I just change the options on the fly at a later stage in the code.
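The mode handling described above could be sketched roughly as follows. This is only an illustration using the standard library, not the code in this PR: the function name `prepare_dump_dir` and the report text are assumptions made for the example.

```python
import shutil
from pathlib import Path


def prepare_dump_dir(path: Path, incremental: bool = True, overwrite: bool = False) -> Path:
    """Hypothetical helper: prepare the top-level dump directory for one of the modes."""
    if incremental and overwrite:
        # Both set: overwrite takes precedence, so incremental is switched off on the fly.
        print("Report: both incremental and overwrite are set, overwrite takes precedence.")
        incremental = False

    non_empty = path.is_dir() and any(path.iterdir())
    if non_empty:
        if overwrite:
            # Clean the main directory and start again from scratch.
            shutil.rmtree(path)
        elif not incremental:
            # Neither option set: refuse to touch an existing, non-empty directory.
            raise FileExistsError(f"{path} already exists and is not empty.")
        # incremental=True: keep the directory, new data is added to it later.

    path.mkdir(parents=True, exist_ok=True)
    return path
```

With the defaults (`incremental=True`, `overwrite=False`) an existing directory is kept as is, which matches the intent that the most common case should need no extra flags.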
Ways to specify the output path for dumping/mirroring (of all, processes, groups, and the profile data):
- Passing no value should generate a sensible default output path in all cases.
- If a relative path is given, this should be created under the CWD.
- If an absolute path is given, this should be used as the top-level directory of the dumping.
- These three options should be handled in the same way via the verdi CLI, as well as the top-level dump method of each Dumper class, so that the path can be set accordingly via the Python API.
- To achieve this, internally, the path is split into the dump_parent_path (absolute, defaults to CWD) and an output_path part (relative, either provided by the user, or automatically generated), which, combined, yield the full top-level path where the files are dumped.
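A minimal sketch of the path splitting described above, assuming POSIX-style paths. The function name `resolve_dump_paths` and the `default_name` and `cwd` parameters are hypothetical and only serve the illustration; how the default name is generated in the PR is not shown here.

```python
from pathlib import Path
from typing import Optional, Tuple, Union


def resolve_dump_paths(
    output_path: Optional[Union[str, Path]] = None,
    default_name: str = "profile-mirror",  # assumed placeholder for the auto-generated name
    cwd: Optional[Path] = None,
) -> Tuple[Path, Path]:
    """Return (dump_parent_path, output_path); joined, they give the full dump path."""
    cwd = cwd if cwd is not None else Path.cwd()

    if output_path is None:
        # No value passed: fall back to a default name under the CWD.
        return cwd, Path(default_name)

    output_path = Path(output_path)
    if output_path.is_absolute():
        # Absolute path: use it as the top-level directory, split into parent and name.
        return output_path.parent, Path(output_path.name)

    # Relative path: created under the CWD.
    return cwd, output_path
```

In all three cases `dump_parent_path / output_path` yields the full top-level path, so the CLI and the Python API can share the same logic.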

General notes:

What happens if I delete a calculation that was called by another workchain from AiiDA's DB, and then run with the --delete-missing option? -> Possibly use graph_traversal_rules, as verdi node delete does, when updating directories after a node was deleted.
What should happen if a group gets deleted and verdi profile mirror --delete-missing is executed? The DumpLogger should also keep track of the groups, so that the group's directory can be deleted in that case.
dump_parent_path is the CWD from which the dumping/mirroring command is called, while dump still provides an output_path parameter to denote the directory name of the profile, group, or process that will be dumped. output_path is optional; if the user does not provide it, it is auto-generated.
Possibly use graph_traversal_rules and add get_nodes_dump to src/aiida/tools/graph/graph_traversers.py, as well as AiidaEntitySet from src/aiida/tools/graph/age_entities.py, etc., to first obtain the nodes, and then run the dumping.
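One way to picture the group bookkeeping is to compare the group membership recorded at the last dump with the current one, which does not depend on node mtimes. The sketch below uses plain dictionaries and does not touch the actual DumpLogger or any AiiDA API; `GroupChanges` and `diff_groups` are hypothetical names.

```python
from typing import Dict, NamedTuple, Set

# Maps a group identifier to the UUIDs of the nodes it contains.
Snapshot = Dict[str, Set[str]]


class GroupChanges(NamedTuple):
    deleted_groups: Set[str]
    new_groups: Set[str]
    added_nodes: Dict[str, Set[str]]
    removed_nodes: Dict[str, Set[str]]


def diff_groups(logged: Snapshot, current: Snapshot) -> GroupChanges:
    """Compare the membership logged at the last dump with the current state."""
    deleted = set(logged) - set(current)  # directories that could be removed
    new = set(current) - set(logged)      # directories that still need to be created
    added: Dict[str, Set[str]] = {}
    removed: Dict[str, Set[str]] = {}
    for group in set(logged) & set(current):
        gained = current[group] - logged[group]
        lost = logged[group] - current[group]
        if gained:
            added[group] = gained
        if lost:
            removed[group] = lost
    return GroupChanges(deleted, new, added, removed)
```

A diff like this would also catch an existing node that was only added to a group, since membership is compared directly.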

(Possible) future TODOs:

Allow specifying options via config file
Add data dumping (raw/rich)
Support batch operations?
Keep track of symlinks and/or dumped node types in DumpLogger?
Expose endpoint to CollectionDumper and allow for mixed node types
Add option to check directory existence/mtime/contents to determine what to do for incremental dumping

Bugs

README for dumped processes is written to the wrong (too high) directory

Take it from here

Either in groups, or not associated with any group. Either sorted by groups, or in a top-level flat hierarchy. "De-duplication" works by symlinking calculations if they are part of a workflow. Next, check what happens if a workflow is part of two groups -> Here, de-duplication should actually make more sense.
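The symlink-based de-duplication can be illustrated with a small standard-library sketch. It is not the PR's implementation: `dump_or_link`, the `dumped` registry and the `node.txt` stand-in file are made up for the example, and it assumes a platform where creating symlinks is permitted.

```python
from pathlib import Path
from typing import Dict


def dump_or_link(node_uuid: str, target: Path, dumped: Dict[str, Path]) -> Path:
    """Dump a node once; later requests for the same node become symlinks."""
    target.parent.mkdir(parents=True, exist_ok=True)
    if node_uuid in dumped:
        # Already dumped elsewhere (e.g. in another group): link instead of copying.
        target.symlink_to(dumped[node_uuid].resolve(), target_is_directory=True)
    else:
        target.mkdir()
        # Stand-in for the real dumping of the node's files.
        (target / "node.txt").write_text(node_uuid)
        dumped[node_uuid] = target
    return target
```

For a workflow that is part of two groups, the first call creates the real directory and the second one only creates a link to it.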

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch 3 times, most recently from 9597527 to 1c4b67b · January 23, 2025 16:07

GeigerJ2 mentioned this pull request Jan 28, 2025

Add ArithmeticAdd CJ Node fixture without run or submit #6733

Open

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from ce20e4c to 2dfe2ca · January 28, 2025 16:26

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch 3 times, most recently from 09e6f01 to 074b053 · February 13, 2025 17:13

GeigerJ2 and others added 22 commits February 17, 2025 14:56

Move dumping test fixtures to conftest.py

a238052

First working version with DataDumper and CollectionDumper

c601cc2

Take it from here

[pre-commit.ci] auto fixes from pre-commit.com hooks

78c7239

for more information, see https://pre-commit.ci

Major code refactor

223b35a

Add `BaseDumper`, `ProfileDumper` and `CollectionDumper` -> `GroupDumper`. Remove code related to data and rich dumping.

[pre-commit.ci] auto fixes from pre-commit.com hooks

b2d9474

Symlinking of workflows between groups works.

4bee149

[pre-commit.ci] auto fixes from pre-commit.com hooks

709f68e

Fix verdi process dump tests

fa46ec5

- Use the `BaseDumper` instead of passing arguments to the `ProcessDumper`
- Append PKs to the test output paths and use `aiida_profile_clean` fixture for reproducible results

Fix mypy complaints

132fbac

Start to work on group testing

c49f7c6

[pre-commit.ci] auto fixes from pre-commit.com hooks

4b28016

Add ArithmeticAdd CJ Node fixture without run

667b81a

[pre-commit.ci] auto fixes from pre-commit.com hooks

35b4415

First tests for node collection dumping

782f735

And back to `CollectionDumper`

[pre-commit.ci] auto fixes from pre-commit.com hooks

ddd1fb9

Improve logging and add dry-run feature.

61dcabb

BaseDumper dataclass. get_processes return dict. Extend tests.

9f65e18

Add ProcessesToDump NamedTuple

e60425d

Use compare_tree utility function for dumping tests

2bd3925

[pre-commit.ci] auto fixes from pre-commit.com hooks

2d9cff9

Start making test methods smaller

614b086

First version of renaming group paths and relabeling the Logger on group relabel.

b795eda

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from 06bc55e to b795eda · February 17, 2025 16:30

pre-commit-ci bot and others added 2 commits February 17, 2025 16:32

[pre-commit.ci] auto fixes from pre-commit.com hooks

69cd541

Moved click options to options/main.py and added group mirror CLI entry point.

b5fa6a8

agoscinski assigned GeigerJ2 Feb 19, 2025

Improve path handling and incremental/overwrite options.

99c7276

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from 7cccc3a to 99c7276 · February 19, 2025 11:10

Improve path handling and fix _some_ of the bugs.

2ea72f9

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from f1cce61 to 2ea72f9 · February 19, 2025 15:47

pre-commit-ci bot and others added 3 commits February 19, 2025 15:48

[pre-commit.ci] auto fixes from pre-commit.com hooks

394ebcb

prepare_dump_path check if symlink. --symlink-duplicates default to False.

06891e3

Fix bug in subdirectories duplicated for profile mirror.

8971fbf

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from 6d9c50d to 8971fbf · February 20, 2025 09:09

Add --only-groups option to profile mirror.

b4cc58b

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from b9fd668 to b4cc58b · February 20, 2025 09:34

Dumping of import groups still buggy...

09c434c

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from e1be6cb to 09c434c · February 20, 2025 17:03

pre-commit-ci bot and others added 7 commits February 20, 2025 17:03

[pre-commit.ci] auto fixes from pre-commit.com hooks

92479cc

Synchronize local and remote changes.

59def9b

[pre-commit.ci] auto fixes from pre-commit.com hooks

942ebdb

Working on fixing bugs.

984f71f

[pre-commit.ci] auto fixes from pre-commit.com hooks

6972840

Currently working version. Commit before I break it again...

5618c19

Commit before refactor to CollectionDumper inheritance.

568af45

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from ba25a5c to 568af45 · February 25, 2025 16:36

pre-commit-ci bot and others added 3 commits February 25, 2025 16:36

[pre-commit.ci] auto fixes from pre-commit.com hooks

29a02f2

VSC lost my CollectionDumper :(

65f30cf

fml

8e0cfdc

GeigerJ2 force-pushed the feature/verdi-profile-mirror branch from 02acb56 to 8e0cfdc · February 25, 2025 17:50

[pre-commit.ci] auto fixes from pre-commit.com hooks

38d2940
