Bitrounding + Lossless compression #3599
Just added BitInformation to the Project.toml, due to dependency on
Also, why is the Manifest.toml committed?
Through past experience we found that we needed the Manifest committed to make sense of the errors we encounter during CI.
@@ -326,7 +320,7 @@ simulation = Simulation(model, Δt=1.25, stop_iteration=3)

 f(model) = model.clock.time^2; # scalar output

-g(model) = model.clock.time .* exp.(znodes(grid, Center())) # vector/profile output
+g(model) = model.clock.time .* exp.(znodes(Center, grid)) # vector/profile output
You may want to merge main, because I think we need this change for the doctest to pass.
I'm two commits ahead of main and none behind (main...mk/compression). I haven't actively changed these, but maybe @simone-silvestri and I started off from an outdated branch?
Yes, not sure, but this change does walk back a recent PR.
Shoot, then I might have created an inconsistent history. Sorry, I'll try to resolve that.
default_bit_rounding(::Val{:T}) = 7
default_bit_rounding(::Val{:S}) = 16 # 12 at the surface, 16 deep ocean
This is interesting. Why is there a difference between T and S? Is this specific to the simulation that this was tested on, or can we be sure this is valid for all simulations (past climates, future climates, idealized simulations at other resolutions, etc.)?
It seems we need to have a default bit rounding for passive tracers.
Although relatively robust through time and space, this depends on a lot of things, including whether your unit carries an offset around (e.g. Kelvin vs °C, density vs density anomaly). So it's tricky to generalise. I suggest having some reasonable defaults if someone uses bit rounding (default nothing or single precision, as you like), but highlighting that this should be checked, similar to how I did it here with the bitinformation analysis above.
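To illustrate why an offset matters, here is a rough sketch (the helper name `rounded_spacing` is made up for this example, not part of any package). It computes the gap between neighbouring representable values once only `keepbits` mantissa bits are kept; the maximum rounding error is half of that gap.

```julia
# Hypothetical helper: spacing between neighbouring representable values
# after rounding to `keepbits` explicit mantissa bits.
# `exponent(x)` is the base-2 exponent of x, so the spacing is 2^(exponent - keepbits).
rounded_spacing(x::AbstractFloat, keepbits::Integer) = 2.0^(exponent(x) - keepbits)

T_celsius = 10.0                 # lies in [8, 16), so exponent is 3
T_kelvin  = T_celsius + 273.15   # lies in [256, 512), so exponent is 8

spacing_celsius = rounded_spacing(T_celsius, 7)   # 0.0625
spacing_kelvin  = rounded_spacing(T_kelvin, 7)    # 2.0
```

With the same 7 keepbits, the same temperature is resolved to about 0.06 °C when stored in °C but only to 2 K when stored in Kelvin, so a keepbits default chosen for one unit does not carry over to the other.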
For global ocean simulations I expect these to be reasonable defaults. I believe for now this is mostly to reduce the file sizes for OMIP simulations.
True, I'm just not sure that OMIP is going to be the most common use case, so there's a question about what default is appropriate here
The OMIP defaults might belong in the ClimaOcean setup, perhaps.
We could set the defaults here as 23 mantissa bits (=Float32 precision, whether you use Float32 or 64) and then lower in ClimaOcean?
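A minimal sketch of how that split could look, assuming the `Val`-based `default_bit_rounding` methods from the diff stay as they are. The constant `FLOAT32_MANTISSA_BITS` and the helper `keepbits_for` are names invented for this example.

```julia
# Float32 has 23 explicit mantissa bits; keeping all of them means
# "Float32 precision" regardless of whether the field is Float32 or Float64.
const FLOAT32_MANTISSA_BITS = 23

# Conservative fallback for any field name, including passive tracers.
default_bit_rounding(::Val) = FLOAT32_MANTISSA_BITS

# More aggressive values, as in the diff above. In this proposal they
# would live downstream (e.g. in an OMIP setup) rather than here.
default_bit_rounding(::Val{:T}) = 7
default_bit_rounding(::Val{:S}) = 16

# Hypothetical convenience wrapper: look up keepbits by field name.
keepbits_for(name::Symbol) = default_bit_rounding(Val(name))
```

Here `keepbits_for(:T)` returns 7, while an unlisted tracer such as `keepbits_for(:c)` falls back to 23.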
function BitRounding(outputs = nothing;
                     user_rounding...)
For the purpose of figuring out good defaults, perhaps we should include model as an input here?
Then default_bit_rounding can take model as an argument and dispatch on various things, for example the equation of state (which should know the units of temperature), and perhaps the biogeochemistry model, which may know the units of some important tracers.
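A rough sketch of what such dispatch could look like. All types here (`AbstractEquationOfState`, `CelsiusEOS`, `KelvinEOS`, `ToyModel`) are placeholders invented for illustration and are not Oceananigans types. The only non-default value, 7 for temperature in °C, is taken from the diff above.

```julia
# Placeholder types standing in for a model and its equation of state.
abstract type AbstractEquationOfState end
struct CelsiusEOS <: AbstractEquationOfState end   # temperature in °C
struct KelvinEOS  <: AbstractEquationOfState end   # temperature in K

struct ToyModel{E<:AbstractEquationOfState}
    equation_of_state::E
end

# Entry point: forward to a method that dispatches on the equation of state
# and on the field name.
default_bit_rounding(model::ToyModel, name::Symbol) =
    default_bit_rounding(model.equation_of_state, Val(name))

# Fallback: keep Float32 precision (23 mantissa bits) unless we know better.
default_bit_rounding(::AbstractEquationOfState, ::Val) = 23

# Only lower the precision where the units are known to match the analysis.
default_bit_rounding(::CelsiusEOS, ::Val{:T}) = 7
```

With this layout a model whose temperature carries a Kelvin offset is never given the °C default by accident; it stays at the conservative fallback until someone adds a checked value for it.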
Co-authored-by: Gregory L. Wagner <[email protected]>
TL;DR: We can compress 18GB of Oceananigans simulation checkpoints into 350MB with bitrounding and lossless compression.
Problem
Output is currently uncompressed in Float64 which contains
Proposed solution
Bitrounding to remove false information (replaced with zero bits -> redundancies) then lossless compression to remove redundancies.
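For readers unfamiliar with the rounding step, here is a minimal sketch for Float32 using round-to-nearest with ties to even. The function `round_keepbits` is written for this example only and is not the BitInformation.jl implementation.

```julia
# Round a Float32 to `keepbits` explicit mantissa bits (round-to-nearest,
# ties to even). The dropped trailing bits become zeros, which is what a
# lossless compressor can then exploit.
function round_keepbits(x::Float32, keepbits::Integer)
    keepbits >= 0 || throw(ArgumentError("keepbits must be non-negative"))
    keepbits >= 23 && return x            # Float32 has 23 mantissa bits: nothing to drop
    isfinite(x) || return x               # leave NaN and Inf untouched

    drop = 23 - keepbits                  # number of trailing mantissa bits to zero
    ui   = reinterpret(UInt32, x)
    half = UInt32(1) << (drop - 1)        # half of the last kept bit
    lsb  = (ui >> drop) & UInt32(1)       # last kept bit, breaks ties towards even
    ui  += half - UInt32(1) + lsb         # rounding offset; a carry may bump the exponent
    ui  &= ~((UInt32(1) << drop) - UInt32(1))   # zero the dropped bits
    return reinterpret(Float32, ui)
end

x = Float32(π)
r = round_keepbits(x, 7)                  # 3.140625f0
```

Broadcasting over an array, e.g. `round_keepbits.(A, 7)`, rounds every element the same way. With 7 keepbits the last 16 bits of each value are zero and the relative error stays below 2^-8.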
I've looked into the bitwise real information content for a single checkpoint in Simone's OMIP simulations and got this, with the orange line denoting the 99.9% of real information
So
The checkpoint file Simone provided had
Compression options
The 18GB can be compressed into
This currently uses Zstd (https://github.com/facebook/zstd), a modern yet already widely available lossless compressor, through its command-line interface zstd. With JLD2, at the moment compress=true uses ZlibCompressor from https://github.com/JuliaIO/CodecZlib.jl, which is similarly good but 2-3x slower. I'm working on getting CodecZstd supported in JLD2: JuliaIO/JLD2.jl#560

While this PR is still a draft, I'm proposing the new defaults compress=true for JLD2 and deflatelevel=3 for netCDF, plus a bitrounder that rounds to the keepbits as suggested above and that can be used instead of bitrounder=nothing (default).

We can then independently tweak the precision (how many keepbits, ideally as a function of the vertical, see salinity) and the lossless compressor (Zlib -> Zstandard).