run benchmarks for 24h simulation time #844

Closed


Member

@juliasloan25 juliasloan25 commented Jun 8, 2024

Purpose

The benchmark runs have been set to run for 12 hours, but it looks like there's still some variability in SYPD at that point. Since we're iterating on the table output itself less now, it's probably worth it to increase the simulation length from 12 hours to 1 day.

The 2 original runs, AMIP and ClimaAtmos with diagnostic EDMF, seem to have a stable SYPD after 12 hours, but the newest ClimaAtmos without diagnostic EDMF seems to still have variability at 12 hours.

To-do

Content


  • I have read and checked the items on the review checklist.

@Sbozzolo
Member

Could you please quantify the variability?

There shouldn't be much for a 12h run and we should understand why things fluctuate if they do.

@juliasloan25
Member Author

> Could you please quantify the variability?
>
> There shouldn't be much for a 12h run and we should understand why things fluctuate if they do.

Here are the SYPDs for the 3 runs [coupled, atmos with diag. edmf, atmos without diag. edmf], in 3 builds from the last week all using the same package versions and no performance changes in ClimaCoupler:

In Atmos, @szy21 has seen that the output SYPDs sometimes take up to 24 hours of simulation time to converge to the value we treat as the accurate measurement, so we may need to increase our runtime here too.
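For reference, here is a minimal sketch of how an SYPD (simulated years per wall-clock day) estimate can be computed and tracked as a run progresses. It is not the code ClimaCoupler or ClimaAtmos uses: the function names `sypd` and `running_sypd` are hypothetical, and the 365.25-day year is an assumption that may not match the model calendar.

```python
SECONDS_PER_DAY = 86_400.0
DAYS_PER_YEAR = 365.25  # assumption: the model calendar may differ


def sypd(simulated_seconds: float, wall_seconds: float) -> float:
    """Simulated years per wall-clock day."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    simulated_years = simulated_seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)
    wall_days = wall_seconds / SECONDS_PER_DAY
    return simulated_years / wall_days


def running_sypd(sim_seconds_per_step: float, step_wall_seconds: list[float]) -> list[float]:
    """SYPD estimate after each step, based on cumulative wall time.

    Hypothetical helper: it assumes a fixed simulated time per step and a
    list of measured wall-clock durations, one per step.
    """
    estimates = []
    total_wall = 0.0
    for n, wall in enumerate(step_wall_seconds, start=1):
        total_wall += wall
        estimates.append(sypd(n * sim_seconds_per_step, total_wall))
    return estimates
```

Because the estimate divides by cumulative wall time, any steps that are slower than the rest keep weighing on the number until enough later steps have accumulated, which is one way a short run can report an SYPD that has not yet settled.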

@Sbozzolo
Member

Sbozzolo commented Jun 10, 2024

The difference is less than 2%. That is a very reasonable amount of variability, and I don't think we should be concerned with reducing it further. Such variability could even be due to the physical temperature of the device or to how processes are distributed by the operating system.

If we want a really accurate measurement, we would have to do a statistically significant number of runs (as we do for the bucket in ClimaLand) and compute statistics from them.
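As a rough illustration of what "taking statistics" over repeated runs could look like, the sketch below summarizes a list of SYPD values by mean, sample standard deviation, coefficient of variation, and relative spread. The function name `summarize_sypd` is hypothetical, and the sample values are placeholders chosen for the example, not measurements from these builds.

```python
import statistics


def summarize_sypd(samples: list[float]) -> dict[str, float]:
    """Summary statistics for SYPD values from repeated benchmark runs."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.mean(samples)
    std = statistics.stdev(samples)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "relative_spread": (max(samples) - min(samples)) / mean,
    }


# Placeholder values for illustration only, not data from this PR.
example_runs = [1.00, 1.01, 0.99]
summary = summarize_sypd(example_runs)
print(f"mean={summary['mean']:.3f}, cv={summary['cv']:.2%}, "
      f"spread={summary['relative_spread']:.2%}")
```

With only three builds, as in this thread, the relative spread is probably the more honest number to quote; a standard deviation becomes meaningful once there are many more runs.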

@szy21
Member

szy21 commented Jun 10, 2024

In Atmos we now run most of the simulations for 12 hours for scaling. The difference I saw is also within 2%, so I think it's ok?

@juliasloan25
Member Author

Okay, I won't change it here then
