run slabplanet on gpu #529

Merged
merged 7 commits into from
Feb 1, 2024

Conversation

juliasloan25
Member

@juliasloan25 juliasloan25 commented Nov 30, 2023

Purpose

Go through coupler_driver_modular.jl and update the functions that won't run on GPU, so that the slabplanet setup runs on GPU.

Status

This currently runs on clima, but not on central, because a too-large parameter is passed to the GPU.
See CliMA/ClimaCore.jl#1597

closes #530

To-do

  • find better binary_mask solution
  • clean up
  • add GPU slabplanet case to buildkite
  • check AMIP paperplots change - why?
  • remove Ref() in atmos_init

Content

  • update mpi_init
    • simplified, moved to driver
  • update atmos_init 0-setting
    • shouldn't use parent, but we can fix that later
  • Regridder use CPU context for file reading/writing
    • fixes unique_nodes error
  • don't pass FT as an argument (try to get it from e.g. space instead)
    • ex. in slab_ocean_space_init, but probably more places too
  • pi is of type Irrational{:π}, which isn't isbits (should be wrapped in FT)
    • in slab_ocean_space_init
  • binary_mask is not isbits
    • when threshold = eps(FT) or 0, can replace with clamp, but need other solution for other thresholds
  • bug: anom_ampl is undefined in this line of slab_ocean_space_init; the code never reaches this point because anomaly is always false
    • function changed to remove unused code branches
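
As a rough illustration of two of the items above (getting FT from the data instead of passing it as an argument, and wrapping pi in FT), here is a minimal sketch. It uses a plain array as a stand-in for a ClimaCore field, and the function name `lat_in_radians` is hypothetical, not something from this PR.

```julia
# Hypothetical helper: converts latitudes in degrees to radians.
# A plain array stands in for a ClimaCore field here (assumption).
function lat_in_radians(lat_deg::AbstractArray)
    FT = eltype(lat_deg)   # recover the float type from the data, not from an argument
    pi_FT = FT(pi)         # convert the Irrational constant to FT once, up front
    return lat_deg ./ FT(180) .* pi_FT
end

radlat = lat_in_radians(Float32[-90, 0, 90])
```

With `Float32` input, every value inside the broadcast is a plain `Float32`, so the result keeps the element type of the input.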

Specific todos (later work - documented in SDI)

  • look for places where FT passed as argument
  • look for places where pi not wrapped in FT
  • remove use of parent
  • fix plotting (make sure on CPU not GPU)

Notes

  • Bucket model is not entirely GPU-compatible (esp. file reading - BulkAlbedoTemporal, BulkAlbedoStatic)
    • need to fix in ClimaLSM before coupler can be GPU-compatible
    • for now, can test with BulkAlbedoFunction, which is GPU-compatible

  • I have read and checked the items on the review checklist.

@Sbozzolo
Member

Sbozzolo commented Dec 1, 2023

The Regridder unique_nodes error reminds me of something I fixed in ClimaLSM. I don't remember the details, but it could have been in the first PR.

@juliasloan25 juliasloan25 changed the title initial gpu changes initial gpu changes - slabplanet Jan 22, 2024
@juliasloan25 juliasloan25 changed the title initial gpu changes - slabplanet run slabplanet on gpu Jan 22, 2024
@juliasloan25 juliasloan25 self-assigned this Jan 22, 2024
@juliasloan25 juliasloan25 force-pushed the js/gpu branch 2 times, most recently from 3a8caa4 to 6cf7daf Compare January 22, 2024 23:32
Comment on lines 230 to 247
T_sfc_0 = FT(271.0)
if land_temperature_anomaly == "zonally_asymmetric"
    Y.bucket.T = map(coords.subsurface) do coord
        radlat = coord.lat / FT(180) * pi
        ΔT = FT(0)

        anom_ampl = FT(0) # this is zero, no anomaly
        lat_0 = FT(60) / FT(180) * pi
        lon_0 = FT(-90) / FT(180) * pi
        radlon = coord.long / FT(180) * pi
        stdev = FT(5) / FT(180) * pi
        ΔT = anom_ampl * exp(-((radlat - lat_0)^2 / 2stdev^2 + (radlon - lon_0)^2 / 2stdev^2))
        T_sfc_0 + ΔT
    end
elseif land_temperature_anomaly == "aquaplanet"
    Y.bucket.T = map(coords.subsurface) do coord
        ΔT = FT(29) * exp(-coord.lat^2 / (2 * 26^2))
        T_sfc_0 + ΔT
    end
elseif land_temperature_anomaly == "amip"
    Y.bucket.T = map(coords.subsurface) do coord
        radlat = coord.lat / FT(180) * pi
        ΔT = FT(40 * cos(radlat)^4)
        T_sfc_0 + ΔT
    end
else
    Y.bucket.T = map(coords.subsurface) do coord
        ΔT = FT(0)
        T_sfc_0 + ΔT
    end
end
Member

This section could be improved:

  • There are lots of hardcoded values, and there's no explanation of where they come from.
  • You are dispatching over strings that identify a small, finite set of possible options, with a fallback. There are two problems with this:
    1. The fallback might lead to bugs that are very hard to find. For example, if land_temperature_anomaly were "zonaly_asymmetric" (i.e., there's a typo), the code would follow the else branch and do something completely different from the expected behavior.
    2. Even without the fallback, the pattern is not very robust or scalable, and doesn't provide failsafes. The standard way to do this in other languages is to use [Enum](https://docs.julialang.org/en/v1/base/base/#Base.Enums.@enum)s, where you define the set of allowed options. I don't think this is done as much in Julia, where we prefer dispatching over types. The advantages of using Enums are that they scale better with more options and do not rely on strings. They also provide a clear place where all the options are collected.
  • There's a lot of redundancy that can be removed. Instead of putting the map in each branch, you could do something like this:
aquaplanet_T(coord) = XXX
amip_T(coord) = YYY
[...]
T_functions = Dict("aquaplanet" => aquaplanet_T, "amip" => amip_T, ....)
haskey(T_functions, land_temperature_anomaly) || error("T function not supported")
T_func = T_functions[land_temperature_anomaly]
Y.bucket.T  .= T_func.(coords.subsurface)
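
The "dispatching over types" alternative mentioned above could look roughly like the sketch below. All type and function names are hypothetical and are not taken from ClimaCoupler. Only the numeric values (271, 29, 26, 40) come from the snippet under review, and a plain array of latitudes in degrees stands in for coords.subsurface.

```julia
# Hypothetical types: one per supported initial-condition profile.
abstract type AbstractSurfaceTemperatureProfile end
struct AquaplanetProfile <: AbstractSurfaceTemperatureProfile end
struct AMIPProfile <: AbstractSurfaceTemperatureProfile end

# Translate the user-facing string once, and fail loudly on unknown names or typos.
function profile_from_string(name::AbstractString)
    name == "aquaplanet" && return AquaplanetProfile()
    name == "amip" && return AMIPProfile()
    error("unsupported land_temperature_anomaly: \"$name\"")
end

# lat is in degrees, as in the snippet under review.
function surface_temperature(::AquaplanetProfile, lat::FT) where {FT <: AbstractFloat}
    return FT(271) + FT(29) * exp(-lat^2 / (2 * FT(26)^2))
end

function surface_temperature(::AMIPProfile, lat::FT) where {FT <: AbstractFloat}
    radlat = lat / FT(180) * FT(pi)
    return FT(271) + FT(40) * cos(radlat)^4
end

lats = Float32[-60, 0, 60]   # stand-in for coords.subsurface (assumption)
profile = profile_from_string("aquaplanet")
T = map(lat -> surface_temperature(profile, lat), lats)
```

A misspelled option such as "zonaly_asymmetric" now raises an error in `profile_from_string` instead of silently taking a fallback branch, and adding a new profile means adding one type and one method.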

Member Author

@LenkaNovak I'm not sure where the hardcoded values come from, do you have any input?

Collaborator

@LenkaNovak LenkaNovak Jan 30, 2024

It comes from various experiments we did to match observed or ClimaAtmos prescribed values as initial conditions. zonally_asymmetric is not being exercised anymore, so feel free to remove it. aquaplanet follows ClimaAtmos PrognosticSurface initial condition (@szy21 may know which paper these values come from, or if it's based on climatology) and amip vaguely follows the observed temperature and is what was found most stable (we could replace this with a file read as well, but not as part of this PR).

Member

The zonally symmetric aquaplanet SST is from the moist Held-Suarez paper, eq. 6: https://gmd.copernicus.org/articles/9/1263/2016/gmd-9-1263-2016.pdf
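
For reference, the profile implemented by the aquaplanet branch of the snippet under review can be written as below. The symbols are labels chosen here to match the code (latitude in degrees) and may not match the paper's notation.

```latex
T_s(\varphi) = T_{\min} + \Delta T \, \exp\!\left( -\frac{\varphi^{2}}{2\,(\Delta\varphi)^{2}} \right),
\qquad T_{\min} = 271\ \mathrm{K}, \quad \Delta T = 29\ \mathrm{K}, \quad \Delta\varphi = 26^{\circ}
```

At the equator this gives 271 K + 29 K = 300 K, and the temperature decays toward 271 K at high latitudes.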

Collaborator

We should document this in Atmos as well. :)

Member Author

@juliasloan25 juliasloan25 Jan 31, 2024

I've added some comments explaining the aquaplanet and AMIP cases, and removed the zonally asymmetric case. Let me know if the comments aren't sufficient.

Collaborator

@LenkaNovak LenkaNovak left a comment

After addressing Gabriele's comments, I only have a few minor ones. Thank you, @juliasloan25. I'm also noting a small difference in the AMIP paperplots from the last merge. It doesn't look wrong, it's just slightly different. Wouldn't we expect no behavioral change?

Do we want to open a new issue on adding the albedo read from file to the GPU AMIP runs, once that's sorted in ClimaLSM?

@juliasloan25 juliasloan25 force-pushed the js/gpu branch 3 times, most recently from d613b0e to ccb772e Compare January 29, 2024 20:49
Collaborator

@LenkaNovak LenkaNovak left a comment

LGTM, @juliasloan25. Following @Sbozzolo's comments, I only have minor comments. We should also open an issue on reducing the allocations with temporary fields, if we all agree this would help. Thank you!

FT = eltype(area_fraction)

# atmospheric surface density
Interfacer.update_field!(sim, Val(:air_density), csf.ρ_sfc)

# turbulent fluxes
mask = Regridder.binary_mask.(area_fraction)
Collaborator

This probably allocates, as it is called every dt. I wonder if we should have a field in the cache that holds temporary calculations on the boundary space, like Atmos does. @Sbozzolo, what do you think? There will be many instances of this though, so I'd suggest leaving this for a separate PR.

Member

Yes, good point (for a future PR)!

Member Author

I opened an issue here: #584. I can change this back to calling binary_mask inside the update_field! calls. I thought this way was more readable, but I agree we should avoid extra allocations.
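
One way the "temporary field in the cache" idea could look is sketched below, under stated assumptions: plain arrays stand in for ClimaCore fields, `binary_mask_sketch` is a hypothetical stand-in for Regridder.binary_mask (whose actual definition is not shown in this thread), and `update_mask!` and the `cache` layout are made up for illustration.

```julia
# Hypothetical stand-in for Regridder.binary_mask:
# 1 where the value exceeds the threshold, 0 otherwise.
binary_mask_sketch(x, threshold) = x > threshold ? one(x) : zero(x)
binary_mask_sketch(x) = binary_mask_sketch(x, eps(typeof(x)))

area_fraction = Float32[0.0, 0.3, 1.0]   # stand-in for a boundary-space field

# Allocate the scratch buffer once, at initialization.
cache = (; mask = similar(area_fraction))

# Called every timestep: writes into the preallocated buffer
# instead of creating a new array for the mask.
function update_mask!(cache, area_fraction)
    cache.mask .= binary_mask_sketch.(area_fraction)
    return cache.mask
end

mask = update_mask!(cache, area_fraction)
```

This keeps the readability of a named `mask` variable while reusing the same storage on every call.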

@juliasloan25
Member Author

Do we want to open a new issue on adding the albedo read from file to the GPU AMIP runs, once that's sorted in ClimaLSM?

Just opened an issue for it: #583

@juliasloan25 juliasloan25 force-pushed the js/gpu branch 2 times, most recently from 6d31acc to e77a41d Compare January 31, 2024 02:00
@juliasloan25 juliasloan25 merged commit e44ffa0 into main Feb 1, 2024
7 checks passed
Successfully merging this pull request may close these issues.

run slabplanet sim on GPU
4 participants