Ensemble members do not recognize breakpoints #3089
Comments
Comparing the two log files, in mem000 it looks like
Please retry with current develop. This was fixed in #3009.
Will do, thanks Walter. :)
@benjamin-cash Did this work with the more recent hash?
Hi @WalterKolczynski-NOAA - I'd have to look back at my notes to be sure, but I don't think I've been able to test it yet. I upgraded my whole experiment suite to a more recent version of the SFS branch, and the SKEB settings caused all of my ensemble members to blow up almost immediately. I didn't think to check the log files to see whether the breakpoints were being correctly identified while that was happening. Assuming the scrubber on Frontera didn't get them, I'll take a look today and see.
@WalterKolczynski-NOAA - I can confirm now, from the logs at least, that the ensemble members are picking up the breakpoints. Thanks!
What is wrong?
Running with multiple ensemble members and breakpoints, only mem000 appears to have recognized the breakpoint and stopped at the appropriate forecast hour.
What should have happened?
All members of the ensemble should have stopped at fhour 1472, where the breakpoint was set. However, only mem000 did so; the other members continued past the breakpoint and timed out.
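For anyone triaging a similar run, the expected behavior can be checked with something like the sketch below. It assumes you have already pulled the last forecast hour each member reached out of its log by whatever means fits your log format; the function name `members_past_breakpoint` and the sample values are hypothetical and are not taken from global-workflow or from the actual logs.

```python
def members_past_breakpoint(last_fhour_by_member, breakpoint_fhour):
    """Return the members whose last forecast hour is beyond the breakpoint.

    last_fhour_by_member: dict mapping a member name to the last forecast
    hour seen in that member's log (how this is extracted is up to you).
    """
    return sorted(
        member
        for member, last_fhour in last_fhour_by_member.items()
        if last_fhour > breakpoint_fhour
    )


# Hypothetical values for illustration only; not taken from the real logs.
example = {"mem000": 1472, "mem001": 1500, "mem002": 1496}
print(members_past_breakpoint(example, 1472))  # ['mem001', 'mem002']
```

An empty list would mean every member stopped at or before the breakpoint, which is the behavior expected here.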
What machines are impacted?
All or N/A
What global-workflow hash are you using?
077ad5f, from https://github.com/NeilBarton-NOAA/global-workflow/tree/SFS
Steps to reproduce
Generate ensemble workflow with breakpoints via https://github.com/NeilBarton-NOAA/global-workflow/tree/SFS
Run at least two members
Additional information
Log files from the successful mem000 and failed mem001 are too large to upload to GitHub, so I have moved them to Hercules: /work/noaa/nems/cash/breakpoint_issue
Do you have a proposed solution?
No response