Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent unwanted subprocess termination #1079

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

ollie-bell
Copy link
Contributor

@ollie-bell ollie-bell commented Aug 10, 2024

Closes # (if applicable).

#839 (comment)

Changes proposed in this Pull Request

See comment #839 (comment)

Checklist

  • I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
  • Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

@davide-f
Copy link
Member

Hello @ollie-bell :D
This PR got a bit stuck and apologies! If interested, you are also welcome to the initiative for discussions in case :)

I see this is an attempt to basically disable the cutout check.
I was wondering if instead of having the call check, a simple config parameter may help.
For example, in the run section, we could have a parameter like run->allow_cutout_overwrite that is false by default and can be enabled to disable the check that may create problems in some runs.
What do you think?

@ollie-bell
Copy link
Contributor Author

ollie-bell commented Sep 18, 2024

Hello @ollie-bell :D This PR got a bit stuck and apologies! If interested, you are also welcome to the initiative for discussions in case :)

I see this is an attempt to basically disable the cutout check. I was wondering if instead of having the call check, a simple config parameter may help. For example, in the run section, we could have a parameter like run->allow_cutout_overwrite that is false by default and can be enabled to disable the check that may create problems in some runs. What do you think?

Hey no worries!

So it doesn't quite disable the cutout check. It only disables the check if snakemake is being run as a subprocess (i.e. snakemake calling itself).

The behaviour it avoids is something like this (although it's been a while since I thought about it now!):

  1. You don't have a cutout built and you have build_cutout parameter set to True.
  2. You run the workflow and at some point the cutout gets built when it hits the build_cutout rule
  3. When the workflow hit's a subsequent rule with a run or shadow statement, snakemake will spawn a nested subprocess (for whatever reason, that's just how it works).
  4. That subprocess targets a specific rule but it will rerun python code in the snakefile again, including the cutout termination check, so it will see cutout already exists and will trigger the failure even though the rule it was supposed to run was not the build_cutout rule. The whole workflow then fails.

If instead you just want to remove the cutout check functional call entirely and replace it with a config param then that's also a solution for sure, and this PR will not be necessary. I just proposed this change which didn't change the overall behaviour or introduce any new configuration.

@davide-f
Copy link
Member

davide-f commented Sep 18, 2024

Ah now I see.
Not sure if I see this right, but this is seems a customized solution for your case where the runs are executed with a special order and the specific first always create the right cutout.

However, your question raises the problem whether we want this check to be performed always, which is a good point.
Some users, including you, do not like this feature.
An option enable/disable the check by config may be interesting for some.
Maybe I'm wrong, but may my proposal tackle your same problem from a different angle?
I'd be in favor of such option

@ollie-bell
Copy link
Contributor Author

ollie-bell commented Sep 18, 2024

So I think what you mean is if you modify these lines

pypsa-earth/Snakefile

Lines 348 to 349 in f660a3a

if config["enable"].get("build_cutout", False):
terminate_if_cutout_exists(config)

to something like:

if config["enable"].get("build_cutout", False): 
    if not config["run"].get("allow_cutout_overwrite", False):
        terminate_if_cutout_exists(config) 

I think the problem my PR addresses will still exist if this is the case.

You might not want to allow cutout overwrite if one already exists, so you'd set allow_cutout_overwrite to False and the terminate_if_cutout_exists code will get run. However, you still may be in a position where the cutout doesn't exist when you run snakemake and you want the rule to create it, but then once the build_cutout rule has been run and the cutout now exists, the job will then fail when it gets to any subsequent rule with a run or shadow statement (copy_defaultnatura_tiff, retrieve_cost_data, solve_network or run_scenario)

@davide-f
Copy link
Member

Mmm so in plain terms, what do you want to achieve is that the first execution creates the cutout and the following do not right?
With the premature (intended) return statement in the terminate_..._exists basically you disable the check when the subprocesses are called.

It appears to me that the behavior is no different than calling the scenarios by simply disabling the check on the cutout, no?

@ollie-bell
Copy link
Contributor Author

Ah yes if you remove the check entirely (as in, delete the terminate_if_cutout_exists function from the Snakefile) then indeed that does achieve the same thing!

@davide-f
Copy link
Member

Ah yes if you remove the check entirely (as in, delete the terminate_if_cutout_exists function from the Snakefile) then indeed that does achieve the same thing!

Just to further follow up, I was wondering if you were interested in such feature

@ollie-bell
Copy link
Contributor Author

ollie-bell commented Sep 29, 2024

Hi @davide-f. I've added a test snakefile which is a minimal reproduction of the issue. You can run it like so:

snakemake --snakefile test/test_cutout_termination.smk -c1 --config subprocess_check=0

I have included detailed output below which shows that the run fails after the build_cutout rule has already completed. Changing subprocess_check=0 to subprocess_check=1 to apply the fix results in the test running successfully.

If you have another way you'd like to address the issue then absolutely no problem!

Without fix:
Building DAG of jobs...
Using shell: /opt/homebrew/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job              count
-------------  -------
build_cutout         1
remove_cutout        1
total                2

Select jobs to execute...

[Sun Sep 29 15:55:48 2024]
rule build_cutout:
    output: cutout.txt
    jobid: 1
    reason: Missing output files: cutout.txt
    resources: tmpdir=/var/folders/gs/wmf0yh194f77d4k9694p85p00000gp/T

[Sun Sep 29 15:55:48 2024]
Finished job 1.
1 of 2 steps (50%) done
Select jobs to execute...

[Sun Sep 29 15:55:48 2024]
rule remove_cutout:
    input: cutout.txt
    jobid: 0
    reason: Rules with a run or shell declaration but no output are always executed.
    resources: tmpdir=/var/folders/gs/wmf0yh194f77d4k9694p85p00000gp/T

Exception in file /Users/ollieb/Documents/pypsa-earth/test/test_cutout_termination, line 25:
A cutout file cutout.txt exists and risks to be overwritten. If this is an intended behavior, please move or delete this file and re-run the rule.
  File "/Users/ollieb/Documents/pypsa-earth/test/test_cutout_termination", line 31, in <module>
  File "/Users/ollieb/Documents/pypsa-earth/test/test_cutout_termination", line 25, in terminate_if_cutout_exists
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-09-29T155548.095661.snakemake.log
With fix:
Building DAG of jobs...
Using shell: /opt/homebrew/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job              count
-------------  -------
build_cutout         1
remove_cutout        1
total                2

Select jobs to execute...

[Sun Sep 29 16:03:06 2024]
rule build_cutout:
    output: cutout.txt
    jobid: 1
    reason: Missing output files: cutout.txt
    resources: tmpdir=/var/folders/gs/wmf0yh194f77d4k9694p85p00000gp/T

[Sun Sep 29 16:03:06 2024]
Finished job 1.
1 of 2 steps (50%) done
Select jobs to execute...

[Sun Sep 29 16:03:06 2024]
rule remove_cutout:
    input: cutout.txt
    jobid: 0
    reason: Rules with a run or shell declaration but no output are always executed.
    resources: tmpdir=/var/folders/gs/wmf0yh194f77d4k9694p85p00000gp/T

Building DAG of jobs...
Using shell: /opt/homebrew/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Sun Sep 29 16:03:06 2024]
Finished job 0.
2 of 2 steps (100%) done
Complete log: .snakemake/log/2024-09-29T160306.064004.snakemake.log

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ollie-bell ! :D
Alternative approaches are possible and solves your issue. We are happy to welcome it :)

We have two minor comments and happy to merge this afterwards :D

@@ -36,6 +36,7 @@ PyPSA-Earth 0.4.1
* Fixed problematic float parsing (`_parse_float`) in `clean_osm_data.py` to make sure all OSM lines are correctly accounted for `PR #1089 <https://github.com/pypsa-meets-earth/pypsa-earth/pull/1089>`__
* Fix minor bug for advanced csp implementation `PR #1076 <https://github.com/pypsa-meets-earth/pypsa-earth/pull/1076>`__
* Fix minor bug in `build_powerplants.py` where the gas technology assignment incorrectly introduced NaN values for all powerplant technologies. `PR #1102 <https://github.com/pypsa-meets-earth/pypsa-earth/pull/1102>`__
* Fix unwanted termination due to existing cutout during snakemake subprocess calls `PR #1079 <https://github.com/pypsa-meets-earth/pypsa-earth/pull/1079>`__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind moving this up? :)

@@ -0,0 +1,50 @@
# SPDX-FileCopyrightText: PyPSA-Earth and PyPSA-Eur Authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for the test example! :)
Do you think it shall be included or can we remove it? [less material to maintain]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants