Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store conserved variable values to HDF5 #314

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

ChunYen-Chen
Copy link
Collaborator

Use the solution 1 mentioned in #305 to resolve #305. Now, *_Ref is assigned only at the start of the simulation, otherwise, it is loaded from HDF5.

@hyschive hyschive requested a review from hsinhaoHHuang May 22, 2024 12:46
@hyschive hyschive added enhancement output Data output and log test Test problems labels May 22, 2024
@hyschive hyschive removed the request for review from hsinhaoHHuang May 22, 2024 12:55
@hyschive hyschive requested a review from hsinhaoHHuang May 22, 2024 13:03
Copy link
Contributor

@hsinhaoHHuang hsinhaoHHuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have run some tests using BlastWave, ClusterMerger, CR_Diffusion, JetICMWall, MHD_OrszagTangVortex and Plummer to check values in HDF5 data and check the Record__Conservation can be reproduced after restart.
There is only one small issue:

  • For the BlastWave test problem (serial by default) with --bitwise_reproducibility=true and OPT__CK_CONSERVATION turned on. The Record__Conservation of the restarted run is different (order of machine precision) from the Record__Conservation of the fresh run.
    • However, if the tests are run with MPI (--mpi=true), then the restarted Record__Conservation is the same as the fresh run.
    • This issue also happens in the main branch: The recorded total momentum is different after the restart.

And I have one minor suggestion:

  • Since the variables XXX_Ref and XXX_H5Ref have become global and are not inside Aux_Check_Conservation(), how about adjusting the variable names or adding some comments to describe what kind of references they are to avoid confusion?

src/Main/Main.cpp Outdated Show resolved Hide resolved
src/Output/Output_DumpData_Total_HDF5.cpp Show resolved Hide resolved
@ChunYen-Chen
Copy link
Collaborator Author

Add a simple check of storing the reference value at OutputDumpData_HDF5.cpp.

@ChunYen-Chen
Copy link
Collaborator Author

@hsinhaoHHuang The file Aux_Check_Conservation.cpp has been updated to the latest version. Please review it, thanks!

@ChunYen-Chen
Copy link
Collaborator Author

Add a simple check of storing the reference value at OutputDumpData_HDF5.cpp.

Done!

@ChunYen-Chen
Copy link
Collaborator Author

@hsinhaoHHuang I have added the force record reference value code.

Copy link
Contributor

@hsinhaoHHuang hsinhaoHHuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After your latest changes, it (Plummer test problem) will show this error if OPT__CK_CONSERVATION is off.

[eureka04:64557] *** An error occurred in MPI_Allreduce
[eureka04:64557] *** reported by process [140131862052865,1719644775776256]
[eureka04:64557] *** on communicator MPI_COMM_WORLD
[eureka04:64557] *** MPI_ERR_TRUNCATE: message truncated
[eureka04:64557] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[eureka04:64557] ***    and potentially your MPI job)

src/Auxiliary/Aux_Check_Conservation.cpp Outdated Show resolved Hide resolved
src/Auxiliary/Aux_Check_Conservation.cpp Outdated Show resolved Hide resolved
src/Auxiliary/Aux_Check.cpp Outdated Show resolved Hide resolved
src/Output/Output_DumpData_Total_HDF5.cpp Outdated Show resolved Hide resolved
@ChunYen-Chen
Copy link
Collaborator Author

@hsinhaoHHuang Thanks for reviewing. I have fixed the bug caused by ConservedRefLoaded not being updated at non-master rank.

Copy link
Contributor

@hsinhaoHHuang hsinhaoHHuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!
I have finished further tests on restarting simulations from runs with OPT__CK_CONSERVATION on or off initially. The output Record__Conservation and those stored reference values in HDF5 data are all the same as the fresh runs.

I only have one final suggestion:

  • Since ANGMOM_ORIGIN_* will always be used to calculate the conservation reference, how about also always recording them in Record__Note at src/Auxiliary/Aux_TakeNote.cpp Line.1545? It is possible that OPT__CK_CONSERVATION is off at first and is turned on during the restart, and user may want to look up what origin (which could possibly be reset during restart) is used to calculate the angular momenta reference.

src/Auxiliary/Aux_Check.cpp Outdated Show resolved Hide resolved
src/Auxiliary/Aux_Check.cpp Outdated Show resolved Hide resolved
@ChunYen-Chen
Copy link
Collaborator Author

@hsinhaoHHuang Thanks for the review again! I have updated the code according to your suggestions.

Conflicts:
	src/Output/Output_DumpData_Total_HDF5.cpp
Conflicts:
	src/Main/Main.cpp
	src/Output/Output_DumpData_Total_HDF5.cpp
@hyschive
Copy link
Contributor

@ChunYen-Chen Please resolve the conflicts.

Conflicts:
	src/Output/Output_DumpData_Total_HDF5.cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement output Data output and log test Test problems
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store the reference values in Aux_Check_Conservation.cpp
3 participants