Ensemble members do not recognize breakpoints #3089

Closed
benjamin-cash opened this issue Nov 12, 2024 · 6 comments

Labels
bug Something isn't working

Comments

@benjamin-cash

What is wrong?

When running with multiple ensemble members and breakpoints, only mem000 appears to recognize the breakpoint (breakpnt) and stop at the appropriate forecast hour.

What should have happened?

All members of the ensemble should have stopped at fhour 1472, the forecast hour the breakpoint was set for. However, only mem000 did so; the other members continued past the breakpoint and timed out.

What machines are impacted?

All or N/A

What global-workflow hash are you using?

077ad5f, from https://github.com/NeilBarton-NOAA/global-workflow/tree/SFS

Steps to reproduce

Generate ensemble workflow with breakpoints via https://github.com/NeilBarton-NOAA/global-workflow/tree/SFS
Run at least two members

Additional information

Log files from the successful mem000 and the failed mem001 are too large to upload to GitHub, so I have moved them to Hercules: /work/noaa/nems/cash/breakpoint_issue

Do you have a proposed solution?

No response

benjamin-cash added the bug and triage labels Nov 12, 2024
@benjamin-cash
Author

benjamin-cash commented Nov 12, 2024

Comparing the two log files, it looks like mem000 uses the breakpnt hours value in forecast_predet.sh (log line 6358), while mem001 uses the FHMAX_GFS value instead (log line 6769).
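
For anyone repeating this comparison on large logs, the check can be roughly scripted. The following is only a minimal sketch: it assumes the strings `breakpnt` and `FHMAX_GFS` appear literally in each member's log, as they did in the lines mentioned above, and the function names and the idea of keying results by the log's parent directory are hypothetical, not part of global-workflow.

```python
from pathlib import Path

# Strings to look for; taken from the variable names mentioned in this issue.
KEYWORDS = ("breakpnt", "FHMAX_GFS")


def find_keyword_lines(log_text, keywords=KEYWORDS):
    """Return {keyword: [1-based line numbers]} for lines mentioning each keyword."""
    hits = {kw: [] for kw in keywords}
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for kw in keywords:
            if kw in line:
                hits[kw].append(lineno)
    return hits


def summarize_members(log_paths):
    """Map each log's parent directory name (assumed to identify the member) to its hits."""
    summary = {}
    for path in map(Path, log_paths):
        # errors="replace" keeps the scan going if a log contains stray bytes.
        text = path.read_text(errors="replace")
        summary[path.parent.name] = find_keyword_lines(text)
    return summary
```

Calling `summarize_members` on one log per member and comparing where each keyword shows up should make it easier to spot a member that only ever references `FHMAX_GFS`. The line numbers are a starting point for reading the log, not proof of which value the forecast actually used.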

@WalterKolczynski-NOAA
Contributor

WalterKolczynski-NOAA commented Nov 12, 2024

Please retry with the current develop branch. This was fixed in #3009.

WalterKolczynski-NOAA removed the triage label Nov 12, 2024
@benjamin-cash
Author

Will do, thanks Walter. :)

@WalterKolczynski-NOAA
Contributor

@benjamin-cash Did this work with the more recent hash?

@benjamin-cash
Author

Hi @WalterKolczynski-NOAA - I have to look back at my notes to be sure, but I don't think I have been able to test it yet. I upgraded my whole experiment suite to a more recent version of the SFS branch, and the SKEB settings caused all of my ensemble members to blow up almost immediately. I didn't think to check the log files to see whether the breakpoints were being correctly identified while that was happening. Assuming the scrubber on Frontera didn't get them, I'll take a look today and see.

@benjamin-cash
Author

@WalterKolczynski-NOAA - I can confirm now, from the logs at least, that the ensemble members are picking up the breakpoints. Thanks!

2 participants