some rewrites of archiving
zingale committed Jan 30, 2024
1 parent 3cd5987 commit 66be1b8
Showing 1 changed file with 57 additions and 38 deletions.
95 changes: 57 additions & 38 deletions sphinx_docs/source/nersc-hpss.rst
@@ -2,52 +2,71 @@
Archiving Data to HPSS
======================

.. note::
The `NERSC HPSS Archive <https://docs.nersc.gov/filesystems/archive/>`_
is a large tape library that can store simulation files for long
periods of time. We recommend moving your data to HPSS
frequently, since the scratch filesystems fill up and NERSC
purges data periodically.

Access the xfer queue by loading the ``esslurm`` module:

.. prompt:: bash
The script ``nersc.xfer.slurm``:

module load esslurm
:download:`nersc.xfer.slurm <../../job_scripts/perlmutter/nersc.xfer.slurm>`

Then you can use ``sbatch`` and ``squeue`` to submit and monitor
jobs in the ``xfer`` queue. Details are provided at:
https://docs.nersc.gov/jobs/examples/#xfer-queue
can be used to archive data to
HPSS automatically. This is submitted to the xfer queue and runs the
script ``process.xrb``:

:download:`process.xrb <../../job_scripts/perlmutter/process.xrb>`

The script ``nersc.xfer.slurm`` in
``workflow/job_scripts/cori-haswell/`` can be used to archive data to
HPSS automatically. This is submitted to the xfer queue and runs the
script ``process.xrb`` which continually looks for output and stores
which continually looks for output and stores
it to HPSS.

To use the scripts, first create a directory in HPSS that has the same
name as the directory on lustre you are running in (just the directory
name, not the full path). For example, if you are running in a directory called
``wdconvect_run``, then do:

.. prompt:: bash

hsi
mkdir wdconvect_run

.. note::

If the ``hsi`` command prompts you for your password, you will need
to talk to the NERSC help desk to ask for password-less access to
HPSS.

The script ``process.xrb`` is called from the xfer job and will run in
the background and continually wait until checkpoint or plotfiles are
created (actually, it always leaves the most recent one alone, since
data may still be written to it, so it waits until there are more than
1 in the directory). Then the script will use ``htar`` to archive the
plotfiles and checkpoints to HPSS. If the ``htar`` command was
successful, then the plotfiles are copied into a ``plotfile/``
subdirectory. This is actually important, since you don’t want to try
archiving the data a second time and overwriting the stored copy,
especially if a purge took place. The same is done with checkpoint
files.
The following describes how to use the scripts:

1. Create a directory in HPSS that has the same
name as the directory your plotfiles are located in
(just the directory name, not the full path). For example, if you are running in a directory called
``/pscratch/sd/z/zingale/wdconvect_run/``, then do:

.. prompt:: bash

hsi
mkdir wdconvect_run

.. note::

If the ``hsi`` command prompts you for your password, you will need
to talk to the NERSC help desk to ask for password-less access to
HPSS.

2. Copy the ``process.xrb`` script and the slurm script ``nersc.xfer.slurm``
into the directory with the plotfiles.

3. Submit the archive job:

.. prompt:: bash

sbatch nersc.xfer.slurm
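
For step 1, the HPSS directory name is just the last component of the run directory path. A minimal Python sketch of that rule follows; the helper name ``hpss_dir_name`` is hypothetical and is not part of ``process.xrb`` or ``nersc.xfer.slurm``:

```python
from pathlib import PurePosixPath


def hpss_dir_name(run_dir: str) -> str:
    """Return the final path component of run_dir (hypothetical helper)."""
    # PurePosixPath ignores a trailing slash, so ".../wdconvect_run/" works too.
    return PurePosixPath(run_dir).name


# Example path taken from the step above.
print(hpss_dir_name("/pscratch/sd/z/zingale/wdconvect_run/"))
```

The printed name is what you would pass to ``mkdir`` inside ``hsi``.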

The script ``process.xrb`` is called from the xfer job. It runs in
the background and waits for new checkpoint files or plotfiles to be
created.

.. note::

``process.xrb`` always leaves the most recent plotfile and checkpoint file alone, since
data may still be written to them.

The script will use ``htar`` to archive the plotfiles and
checkpoints to HPSS.

If the ``htar`` command was successful, then the plotfiles are
copied into a ``plotfile/`` subdirectory. This is actually important,
since you don’t want to try archiving the data a second time and
overwriting the stored copy, especially if a purge took place. The
same is done with checkpoint files.
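
The logic described above can be sketched in Python. This is not the actual ``process.xrb`` script, only an illustration under stated assumptions: output directories are assumed to be named with a prefix followed by digits (``plt`` here is a hypothetical default), ``htar -cvf`` is assumed as the archive invocation, and the sketch moves each archived directory into the subdirectory so that it is not picked up a second time. The function names ``archivable`` and ``archive_one`` are made up for this example.

```python
import re
import shutil
import subprocess
from pathlib import Path


def archivable(names, prefix="plt"):
    """Return outputs named <prefix><digits>, oldest first, minus the newest."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbered = []
    for name in names:
        match = pattern.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()
    # Skip the newest output: data may still be written to it.
    return [name for _, name in numbered[:-1]]


def archive_one(run_dir, hpss_dir, name, done_subdir="plotfile"):
    """Archive one output with htar; move it aside only on success."""
    run_dir = Path(run_dir)
    # Assumed invocation: create <hpss_dir>/<name>.tar from the local directory.
    result = subprocess.run(
        ["htar", "-cvf", f"{hpss_dir}/{name}.tar", name], cwd=run_dir
    )
    if result.returncode != 0:
        return False  # leave the output in place so it can be retried
    done = run_dir / done_subdir
    done.mkdir(exist_ok=True)
    shutil.move(str(run_dir / name), str(done / name))
    return True


# Selection only; no htar call is made here.
print(archivable(["plt00200", "plt00000", "plt00100", "chk00100"]))
```

Keeping the selection step separate from the ``htar`` call makes it possible to check which outputs would be archived without touching HPSS.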


Additionally, if the ``ftime`` executable is in your path
(``ftime.cpp`` lives in ``amrex/Tools/Plotfile/``), then