Allow using a Colvars input state file instead of binary checkpoint in GROMACS #610

Merged
merged 10 commits into from
Nov 15, 2023
58 changes: 38 additions & 20 deletions doc/colvars-refman-main.tex
@@ -490,7 +490,7 @@
string}{%
If a value is provided, it is interpreted as either the name of the input state file, or as the prefix of the file named \emph{input}\texttt{.colvars.state}.
This file contains information needed to continue a previous collective variables-based calculation, including the number of the last computed step (useful for time-dependent biases).
The same information is also stored in the binary restart files written by LAMMPS, so this option is not needed when continuing a calculation from a LAMMPS restart.
The same information is also stored in the binary restart files written by LAMMPS, so this option is generally not needed when the \texttt{read\_restart} LAMMPS command is used.
}

\item %
@@ -966,7 +966,7 @@

\cvsubsec{Input state file}{sec:colvars_input}

Several of the sampling methods implemented in Colvars are time- or history-dependent, i.e.\ they work by accumulating data as a simulation progresses, and use these data to determine their biasing forces. If the simulation engine uses a checkpoint or restart file (as GROMACS and LAMMPS do), any data needed by Colvars are embedded inti that file. Otherwise, a dedicated \emph{state file} can be loaded into Colvars directly.
Several of the sampling methods implemented in Colvars are time- or history-dependent, i.e.\ they work by accumulating data as a simulation progresses, and use these data to determine their biasing forces. If the simulation engine uses a checkpoint or restart file (as GROMACS and LAMMPS do), any data needed by Colvars are embedded into that file. Otherwise, a dedicated \emph{state file} can be loaded into Colvars directly.

When a dedicated Colvars state file is used, it may be in either one of two formats:
\begin{itemize}
@@ -984,7 +984,10 @@
This means that the step number used internally by Colvars may not always match the step number reported by \MDENGINE{}.
\cvnamdonly{This is particularly important in NAMD, which represents step numbers as 32-bit integers that overflow after $\sim$ 2 billion steps, effectively negating the usefulness of the \texttt{firstTimeStep} keyword. However, step numbers are implemented correctly in the Colvars state file.}

% \cvgromacsonly{\cvsubsubsec{Restarting in GROMACS.}{} TODO }
\cvgromacsonly{\cvsubsubsec{Restarting in GROMACS.}{}
Beginning with GROMACS 2024, all information necessary to restart Colvars is included in the checkpoint ``\texttt{.cpt}'' file.
No special provisions are therefore needed compared to a GROMACS simulation without Colvars enabled.
}

\cvlammpsonly{\cvsubsubsec{Restarting in LAMMPS.}{}
For continuing a Colvars-based simulation, the recommended method is using the standard LAMMPS \texttt{read\_restart} command, which reads the Colvars state data from the LAMMPS restart file (in binary format).
@@ -1009,42 +1012,57 @@
} % end \cvnamdonly


\cvsubsubsec{Restarting after a change in Colvars configuration.}{}
It useful in some cases to modify the configuration of variables or biases between consecutive runs: a typical example would the addition or removal of a restraint in a simulation.
By restarting using text-format Colvars state files, it is possible to read previous data while allowing for changes in configuration.
For each newly defined variable or bias, no information will be read from the state file if this is unavailable: such new objects will remain uninitialized until the first compute step.
Conversely, any information that the state file has about variables or biases that are no longer defined is silently ignored.
\emph{Because these checks are performed based solely on the names of variables and biases, it is your responsibility to ensure that these names correspond to consistent definitions between runs.}
\cvsubsubsec{Changing configuration upon restarting.}{}

When restarting using binary state files, configuration changes are not allowed. The easiest solution would be to produce a text-format state file specifically for that purpose.
\ifdefined\cvscriptapi{%
Alternatively, after restarting Colvars using a state file consistent with the previous configuration, the configuration may be changed using the scripting interface (see \ref{sec:cv_scripting}).
}\fi
In some cases, it is useful to modify the configuration of variables or biases between consecutive runs, for example by adding or removing a restraint.
Some special provisions apply in that case.
When a state file is loaded, no information is available about any newly added variable or bias, which will thus remain uninitialized until the first compute step.
Conversely, any information that the state file may contain about variables or biases that are no longer defined will be silently ignored.
Please note that these checks are performed based only on the \emph{names} of variables and biases: it is your responsibility to ensure that these names have \emph{consistent definitions between runs.}

The flexibility just described carries some limitations: namely, it is only supported when reading \emph{text-format} Colvars state files.
Instead, restarting from binary files\cvlammpsonly{ (such as the LAMMPS restart file)}\cvgromacsonly{ (such as the GROMACS checkpoint file)} after a configuration change will trigger an error.
It is also important to remember that when switching to a different build of \MDENGINE, the binary format may change slightly, even if the release version is the same.

To work around the potential issues just described, a text-format Colvars state file should be loaded.
\ifdefined\cvvmdornamd{This is the default in \MDENGINE{} unless the environment variable ``\texttt{COLVARS\_BINARY\_RESTART}'' is set to 1; this information is provided here only for troubleshooting purposes.}\fi\cvlammpsonly{This can be achieved by providing an explicit \texttt{input} keyword when initializing the Colvars fix (see \ref{sec:colvars_mdengine_parameters}), which instructs Colvars to use the given filename instead of the LAMMPS restart file.
Furthermore, the \texttt{fix\_modify} scripting command allows loading a Colvars state file after initialization (\ref{sec:cv_command_loadsave}).}\cvgromacsonly{Loading such a state file requires an exception to the standard behavior in GROMACS (i.e.\ loading a checkpoint file): this exception is supported by the following Colvars configuration keyword:
\begin{itemize}
\item %
\labelkey{Colvars-global|defaultInputStateFile}
\key{defaultInputStateFile}{%
global}{%
Default input state file, if not provided by \MDENGINE}{%
UNIX filename}{%
Define a state file that will be loaded by default, unless \MDENGINE{} provides restarting information for Colvars through the checkpoint file.
}
\end{itemize}
When a Colvars configuration featuring \texttt{defaultInputStateFile} is processed into a TPR file, and a GROMACS simulation is started from this TPR file but without providing a checkpoint, Colvars will load its state from the file named by \texttt{defaultInputStateFile}.
Later, when that same simulation is continued by providing a checkpoint file to GROMACS, Colvars will ignore \texttt{defaultInputStateFile} and will read its data from the checkpoint file.
For the sake of clarity, we recommend that as soon as a suitable GROMACS checkpoint is available, the \texttt{defaultInputStateFile} keyword is \emph{removed}, and a new TPR file is produced accordingly.
}


\cvsubsec{Output files}{sec:colvars_output}

If the output prefix \outputName{} is defined, the following output files are written during a simulation run:
If the output prefix \outputName{} is defined\cvgromacsonly{ (in GROMACS, this is defined by the value of the \texttt{-e} flag of \texttt{mdrun})}, the following output files are written during a simulation run:

\begin{itemize}

\item A \emph{state file}, named \outputName\texttt{.colvars.state}, which is written at the end of the specified run\cvscriptonly{, and can also be written at any time with the scripting command \texttt{save} (\ref{sec:cv_command_loadsave})}.
This file is in plain text format by default\cvnamdonly{, regardless of the value of \texttt{binaryOutput} of the NAMD coordinate and velocity files}, or in binary format if the environment variable \texttt{COLVARS\_BINARY\_RESTART} is set to a non-zero integer.
The state file is used to continue a simulation, and is required to restart a simulation unless the engine supports embedding this information into its checkpoint file (as GROMACS or LAMMPS currently do).

\item If the parameter \refkey{colvarsRestartFrequency}{Colvars-global|colvarsRestartFrequency} is larger than zero, a \emph{restart file} is written every that many steps: this file is fully equivalent to the final state file.
\item If the parameter \refkey{colvarsRestartFrequency}{Colvars-global|colvarsRestartFrequency} is larger than zero and \restartName{} is defined\cvgromacsonly{ (this is \emph{not} the case in GROMACS)}, a \emph{restart file} is written every that many steps: this file is fully equivalent to the final state file.
The name of this file is \restartName\texttt{.colvars.state}.

\item If the parameter \refkey{colvarsTrajFrequency}{Colvars-global|colvarsTrajFrequency} is greater than 0 (default value: 100 steps), a \emph{trajectory file}, named \outputName\texttt{.colvars.traj}, is written during the simulation. Unlike the state file, this file is not needed to restart a simulation, but can be used for post-processing and analysis. The format of this file is described in sec.~\ref{sec:colvars_traj_format}.

\item Additionally, certain features, when enabled, can emit output files with a specific purpose: for example, potentials of mean force can be written to file to be analyzed or plotted. These files are described in the respective sections, but as a general rule they all use names beginning with the \outputName prefix.
\item Additionally, certain features, when enabled, can emit output files with a specific purpose: for example, potentials of mean force (PMFs) can be written to file to be analyzed or plotted. These files are described in the respective sections, but as a general rule they all use names beginning with the \outputName{} prefix.
Like the trajectory file, these additional files are needed only for analyzing a simulation's results, but not to continue it.

\end{itemize}

Other output files may also be written by specific methods, e.g.{} the ABF or metadynamics methods (\ref{sec:colvarbias_abf}, \ref{sec:colvarbias_meta}).
Like the trajectory file, they are needed only for analyzing, not continuing a simulation.
All such files' names also begin with the prefix \outputName.

\cvnamdonly{Lastly, the total energy of all biases or restraints applied to the colvars appears under the NAMD standard output, under the MISC column.}
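
The name-based matching described under "Changing configuration upon restarting" can be illustrated with a small sketch. It is not the Colvars implementation: `RestartReport` and `match_by_name` are hypothetical names introduced here, and the sketch only classifies names the way the manual text describes for text-format state files.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical illustration, not part of the Colvars API.
struct RestartReport {
  std::vector<std::string> restored;       // defined now and found in the state file
  std::vector<std::string> uninitialized;  // defined now, absent from the state file
  std::vector<std::string> ignored;        // in the state file, no longer defined
};

// Classify variables/biases purely by name, as the manual text describes.
RestartReport match_by_name(const std::set<std::string>& configured,
                            const std::set<std::string>& in_state_file)
{
  RestartReport report;
  for (const std::string& name : configured) {
    if (in_state_file.count(name) > 0) {
      report.restored.push_back(name);
    } else {
      report.uninitialized.push_back(name);  // stays so until the first compute step
    }
  }
  for (const std::string& name : in_state_file) {
    if (configured.count(name) == 0) {
      report.ignored.push_back(name);  // silently skipped when reading
    }
  }
  return report;
}

// Usage example: one restraint is replaced by another between two runs.
inline void example_name_matching()
{
  const RestartReport r = match_by_name({"dist", "restraint_new"},
                                        {"dist", "restraint_old"});
  assert(r.restored.size() == 1 && r.restored[0] == "dist");
  assert(r.uninitialized.size() == 1 && r.uninitialized[0] == "restraint_new");
  assert(r.ignored.size() == 1 && r.ignored[0] == "restraint_old");
}
```

Because only names are compared, a variable that keeps its name but changes its definition would land in `restored`, which is exactly the case the manual asks users to guard against.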


@@ -40,6 +40,7 @@
*/

#include "colvarproxygromacs.h"
#include "colvarproxy_gromacs_version.h"

#include <sstream>

@@ -57,6 +58,8 @@ ColvarProxyGromacs::ColvarProxyGromacs(const std::string& colvarsConfigString,
int seed) :
gmxAtoms_(atoms), pbcType_(pbcType), logger_(logger), doParsing_(doParsing)
{
engine_name_ = "GROMACS";
version_int = get_version_from_string(COLVARPROXY_VERSION);

//! From colvarproxy
//! The 5 variables below are defined in the `colvarproxy` base class
@@ -128,7 +131,8 @@ ColvarProxyGromacs::ColvarProxyGromacs(const std::string& colvarsConfigString,
// Citation Reporter
cvm::log(std::string("\n") + colvars->feature_report(0) + std::string("\n"));

colvars->set_initial_step(static_cast<cvm::step_number>(0L));
// TODO get initial step number from MDModules
// colvars->set_initial_step(static_cast<cvm::step_number>(0L));
}
}

49 changes: 41 additions & 8 deletions gromacs/tests/library/run_tests.sh
@@ -168,14 +168,47 @@ for dir in ${DIRLIST} ; do
fi

if [ "${basename}" == "test.restart" ] ; then
${BINARY} convert-tpr -s ${basename%.restart}.tpr -nsteps 40 -o ${basename}.tpr >& ${basename}.grompp.out
${BINARY} mdrun -s ${basename}.tpr -ntomp 1 -deffnm ${basename} -noappend -cpi ${basename%.restart}.cpt >& ${basename}.out
RETVAL=$?
output=${basename}.part0002
for file in ${output}.* ; do
# Remove the part number
mv -f ${file} ${file/.part0002/}
done

if [ -n "${FORCE_INPUT_STATE_FILE}" ] ; then

# Restart GROMACS using the checkpoint but Colvars using its own state file

# Add defaultInputStateFile to the Colvars config
NEW_CVCONF=$(mktemp test.XXXXX.in)
cat test.in > ${NEW_CVCONF}
echo "defaultInputStateFile test.colvars.state" >> ${NEW_CVCONF}

NEW_MDP=$(mktemp test.XXXXX.mdp)
cat ../Common/test.mdp > ${NEW_MDP}
sed -i "s/test.in/${NEW_CVCONF}/" ${NEW_MDP}
# Mimic the initial step of a job restarted from checkpoint, to be
# consistent with reference outputs
echo "init-step = 20" >> ${NEW_MDP}
${BINARY} grompp -f ${NEW_MDP} -c ../Common/da.pdb -p ../Common/da.top -t ${basename%.restart}.cpt -o ${basename}.tpr >& ${basename}.grompp.out
rm -f ${NEW_MDP} ${NEW_CVCONF}
${BINARY} mdrun -s ${basename}.tpr -ntomp 1 -deffnm ${basename} -noappend >& ${basename}.out
RETVAL=$?

output=${basename}.part0001
for file in ${output}.* ; do
# Remove the part number
mv -f ${file} ${file/.part0001/}
done

else

# Restart both GROMACS and Colvars using the GROMACS checkpoint file
${BINARY} convert-tpr -s ${basename%.restart}.tpr -nsteps 40 -o ${basename}.tpr >& ${basename}.grompp.out
${BINARY} mdrun -s ${basename}.tpr -ntomp 1 -deffnm ${basename} -noappend -cpi ${basename%.restart}.cpt >& ${basename}.out

RETVAL=$?
output=${basename}.part0002
for file in ${output}.* ; do
# Remove the part number
mv -f ${file} ${file/.part0002/}
done

fi
fi

else
2 changes: 2 additions & 0 deletions lammps/src/COLVARS/colvarproxy_lammps.cpp
@@ -35,6 +35,8 @@ colvarproxy_lammps::colvarproxy_lammps(LAMMPS_NS::LAMMPS *lmp)
{
_random = nullptr;

engine_name_ = "LAMMPS";

first_timestep = true;
previous_step = -1;
do_exit = false;
2 changes: 2 additions & 0 deletions namd/src/colvarproxy_namd.C
@@ -42,6 +42,8 @@

colvarproxy_namd::colvarproxy_namd()
{
engine_name_ = "NAMD";

version_int = get_version_from_string(COLVARPROXY_VERSION);

first_timestep = true;
30 changes: 26 additions & 4 deletions src/colvarmodule.cpp
@@ -407,6 +407,11 @@ int colvarmodule::parse_global_params(std::string const &conf)
parse->get_keyval(conf, "sourceTclFile", source_Tcl_script);
#endif

if (proxy->engine_name() == "GROMACS" && proxy->version_number() >= 20231003) {
parse->get_keyval(conf, "defaultInputStateFile", default_input_state_file_,
default_input_state_file_);
}

return error_code;
}
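
The hunk above accepts `defaultInputStateFile` only when the engine is GROMACS and the proxy version is at least 20231003. A minimal sketch of that gate follows. It assumes, as an illustration only, that the version is a date-style string such as "2023-10-03" reduced to an integer by keeping its digits. `version_to_int` and `accepts_default_input_state_file` are hypothetical helpers, not Colvars functions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: "2023-10-03" -> 20231003 (keeps digits, drops separators).
int version_to_int(const std::string& version)
{
  int value = 0;
  for (unsigned char c : version) {
    if (std::isdigit(c)) {
      value = value * 10 + (c - '0');
    }
  }
  return value;
}

// Mirrors the condition in the diff: GROMACS only, and a recent enough proxy.
bool accepts_default_input_state_file(const std::string& engine,
                                      const std::string& version)
{
  return engine == "GROMACS" && version_to_int(version) >= 20231003;
}

// Usage example: the keyword is honored for GROMACS but not for LAMMPS.
inline void example_version_gate()
{
  assert(accepts_default_input_state_file("GROMACS", "2023-10-03"));
  assert(!accepts_default_input_state_file("LAMMPS", "2023-10-03"));
}
```

With this gate, the keyword is never parsed by other engines or by older GROMACS proxies, so `default_input_state_file_` stays empty there.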

@@ -1317,7 +1322,13 @@ int colvarmodule::reset()

int colvarmodule::setup_input()
{
if (proxy->input_prefix().size()) {
if (proxy->input_prefix().empty() && (!proxy->input_stream_exists("input state string")) &&
input_state_buffer_.empty()) {
// If no input sources have been defined up to this point, use defaultInputStateFile
proxy->set_input_prefix(default_input_state_file_);
}

if (!proxy->input_prefix().empty()) {

// Read state from a file

@@ -1378,14 +1389,23 @@ int colvarmodule::setup_input()
}
cvm::log(cvm::line_marker);

// Now that the explicit input file was read, we shall ignore any unformatted buffer
// Now that an explicit state file was read, we shall ignore any other restart info
if (proxy->input_stream_exists("input state string")) {
proxy->delete_input_stream("input state string");
}
input_state_buffer_.clear();

proxy->delete_input_stream(restart_in_name);
}

if (proxy->input_stream_exists("input state string")) {

if (!input_state_buffer_.empty()) {
return cvm::error("Error: formatted/text and unformatted/binary input state buffers are "
"defined at the same time.\n",
COLVARS_BUG_ERROR);
}

cvm::log(cvm::line_marker);
cvm::log("Loading state from formatted string.\n");
read_state(proxy->input_stream("input state string"));
@@ -1394,7 +1414,7 @@ int colvarmodule::setup_input()
proxy->delete_input_stream("input state string");
}

if (input_state_buffer_.size() > 0) {
if (!input_state_buffer_.empty()) {
cvm::log(cvm::line_marker);
cvm::log("Loading state from unformatted memory.\n");
cvm::memory_stream ms(input_state_buffer_.size(), input_state_buffer_.data());
@@ -1404,7 +1424,9 @@ int colvarmodule::setup_input()
input_state_buffer_.clear();
}

return cvm::get_error();
default_input_state_file_.clear();

return get_error();
}
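
The order of checks in `setup_input()` above can be summarized as a precedence rule. The sketch below is a simplified model rather than the actual function: `StateSource`, `RestartInputs` and `select_state_source` are hypothetical names, and the example assumes that state embedded in a checkpoint reaches the module as the unformatted buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout: an illustration, not the Colvars API.
enum class StateSource { None, File, FormattedString, BinaryBuffer, Conflict };

struct RestartInputs {
  std::string input_prefix;        // explicit state file or prefix, if any
  bool has_state_string = false;   // formatted/text state held in memory
  bool has_binary_buffer = false;  // unformatted/binary state held in memory
  std::string default_state_file;  // value of defaultInputStateFile, if any
};

// Decide where the state would be read from, following the diff's order of checks.
StateSource select_state_source(RestartInputs in)
{
  // The default file is only a fallback when no other source was defined
  if (in.input_prefix.empty() && !in.has_state_string && !in.has_binary_buffer) {
    in.input_prefix = in.default_state_file;
  }
  // An explicit state file takes over; other restart data are then discarded
  if (!in.input_prefix.empty()) {
    return StateSource::File;
  }
  // Text and binary in-memory data at the same time is reported as a bug
  if (in.has_state_string) {
    return in.has_binary_buffer ? StateSource::Conflict : StateSource::FormattedString;
  }
  return in.has_binary_buffer ? StateSource::BinaryBuffer : StateSource::None;
}

// Usage example: a first run without checkpoint, then a continuation with one.
inline void example_state_source()
{
  RestartInputs first_run;
  first_run.default_state_file = "test.colvars.state";
  assert(select_state_source(first_run) == StateSource::File);

  RestartInputs continued = first_run;
  continued.has_binary_buffer = true;  // assumed: state embedded in the checkpoint
  assert(select_state_source(continued) == StateSource::BinaryBuffer);
}
```

The second case matches the behavior described in the manual hunk: once the engine supplies restart data, `defaultInputStateFile` is ignored.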


3 changes: 3 additions & 0 deletions src/colvarmodule.h
@@ -452,6 +452,9 @@ class colvarmodule {

template <typename IST> IST & read_state_template_(IST &is);

/// Default input state file; if given, it is read unless the MD engine provides it
std::string default_input_state_file_;

/// Internal state buffer, to be read as an unformatted stream
std::vector<unsigned char> input_state_buffer_;

10 changes: 9 additions & 1 deletion src/colvarproxy.h
@@ -602,6 +602,11 @@ class colvarproxy
/// Destructor
~colvarproxy() override;

inline std::string const &engine_name() const
{
return engine_name_;
}

bool io_available() override;

/// Request deallocation of the module (currently only implemented by VMD)
@@ -714,7 +719,10 @@ class colvarproxy
/// Track which features have been acknowledged during the last run
size_t features_hash;

private:
protected:

/// Name of the simulation engine that the derived proxy object supports
std::string engine_name_ = "standalone";

/// Queue of config strings or files to be fed to the module
void *config_queue_;
1 change: 1 addition & 0 deletions update-colvars-code.sh
@@ -654,6 +654,7 @@ then
mkdir ${target_folder}
mkdir -p ${target_folder}/tests/refdata
fi
condcopy gromacs/src/colvarproxy_gromacs_version.h "${target_folder}/colvarproxy_gromacs_version.h"
for src in ${source}/gromacs/gromacs-mdmodules/applied_forces/colvars/*.* ; do
tgt=$(basename ${src})
condcopy "${src}" "${target_folder}/${tgt}"
1 change: 1 addition & 0 deletions vmd/src/colvarproxy_vmd.C
@@ -77,6 +77,7 @@ colvarproxy_vmd::colvarproxy_vmd(Tcl_Interp *interp, VMDApp *v, int molid)
msgColvars("colvars: ")
#endif
{
engine_name_ = "VMD";
version_int = get_version_from_string(COLVARPROXY_VERSION);
b_simulation_running = false;
