Add robustness towards pipeline.config change of an inited pipeline. #46

varisd · 2024-08-12T12:54:13Z

Right now, adjusting pipeline.config (generated during pipeline init) does not have any effect on the existing pipeline (serves only an informational purpose).

A feasible feature would be a support of editing pipeline.config. This would require:

identifying steps which have step.parameters different from the pipeline.config definition.
identifying whether the current directory structure (and individual step.dependencies) reflect the pipeline graph defined in pipeline.config
checking for correctness of the edited pipeline.config

What should be the default response when "inconsinstencies" are detected? Do we:

only report the inconsistencies (and leave the new initialization to the user)?
reinitialize "in a dumb way", i.e. deleting the pipeline directory step subdirs and reinitialize?
implement some smart reinitialization of only the parts that were changed?
Personally, I suggest either 1) or 2) since the initialization process is fairly quick so it does not have to be super optimized.

bhaddow · 2024-08-12T13:34:39Z

The third option would be the behaviour I would expect of a tool like this (eg make, snakemake). If a step has changed, then it should be re-run. If other steps are dependent on this (according to the DAG) then they should be rerun. If I fix the evaluation script, I really don't want the pipeline to re-run training.

Options 1 and 2 may even make the situation slightly worse. At the moment, if I make a change to the pipeline config, I have to delete and rerun. If I make a trivial change that I know will not affect the rerunning, then I can avoid the delete. If we implement options 1 or 2 then the system may force me to delete and rerun (even though I know that delete is not needed).

Connected with this issue, why do we have init and run as separate commands? I would think that run and dry-run would make more sense. dry-run is the same as run, except that it does not execute anything.

varisd · 2024-08-12T13:44:20Z

You make a good point about the rerunning of modified steps. In practice, codewise, it should not be more complicated, only inefficient part would be reinitializing of steps that have the change step as dependency (the graph has only unidirectional pointers at the moment). The usual graphs should usually be rather small so that should not be a big issue.

Regarding changing "init" to "dry-run" and having "run" command to do both initialization and running: I support this change since right now there is really no practical reason to "execute" the pipeline in two steps (init and run). The CLI mainly copied the internal implementation, where separating init and run kinda makes sense.
I propose opening another issue regarding this change - it only requires some adjustments to the opuspocus_cli/ scripts (rest of the codebase can and should stay the same).

varisd added feature New feature or request help wanted Extra attention is needed question Further information is requested labels Aug 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add robustness towards pipeline.config change of an inited pipeline. #46

Add robustness towards pipeline.config change of an inited pipeline. #46

varisd commented Aug 12, 2024

bhaddow commented Aug 12, 2024

varisd commented Aug 12, 2024

Add robustness towards pipeline.config change of an inited pipeline. #46

Add robustness towards pipeline.config change of an inited pipeline. #46

Comments

varisd commented Aug 12, 2024

bhaddow commented Aug 12, 2024

varisd commented Aug 12, 2024