Merge pull request #175 from jobordner/new-input

New input
enzo-project · Jul 1, 2022 · 1051b3f
2 parents 8b24976 + 6579049
commit 1051b3f

Showing 141 changed files with 6,353 additions and 1,807 deletions.
diff --git a/doc/source/conf.py b/doc/source/conf.py
@@ -30,7 +30,6 @@
               'breathe',]
 todo_include_todos=True
 
-
 # Add any paths that contain templates here, relative to this directory.
 # templates_path = ['_templates']
 

diff --git a/doc/source/design/adapt-balance-2.fig b/doc/source/design/adapt-balance-2.fig
@@ -1,4 +1,4 @@
-#FIG 3.2  Produced by xfig version 3.2.7b
+m#FIG 3.2  Produced by xfig version 3.2.7b
 Landscape
 Center
 Inches

diff --git a/doc/source/design/design-input.rst b/doc/source/design/design-input.rst
@@ -0,0 +1,317 @@
+.. include:: ../roles.incl
+
+*************************
+Checkpoint/Restart Design
+*************************
+
+.. toctree::
+
+============
+Requirements
+============
+
+Three core functional requirements of I/O in Cello are:
+
+  1. writing data dumps for subsequent reading by external
+     analysis/visualization applications
+  2. writing checkpoint files, and
+  3. reading checkpoint files to restart a previously run simulation
+
+(While writing image files such as "png" files is also included in the
+I/O component of Cello, here we focus on HDF5 files containing
+actual block data.)
+
+Additionally, writing and reading disk files must be scalable to the
+largest simulations runnable on the largest HPC platforms available,
+which necessarily include the largest parallel file systems available.
+
+This scalable I/O approach has been implemented for
+checkpoint/restart, and will be adapted for use with data dumps in the
+near future.
+
+========
+Approach
+========
+
+The approach has three parts: a block ordering that determines how
+blocks are mapped to files, the data written to the files, and how
+file I/O is parallelized.
+
+--------
+Ordering
+--------
+
+The approach involves a generalization of the previous
+``MethodOutput`` method, but enables load-balancing of data between
+disk files through the use of block `orderings` to define how blocks
+are mapped to files. Currently, the ordering used in ``MethodOutput``,
+which is implicit and embedded in the code, is based on a regular
+partitioning of root-level blocks together with their descendants. The
+updated implementation factors out this ordering into an ``Ordering``
+class, provides a Morton space-filling curve ordering, and enables
+defining other orderings, such as Hilbert curves.
+------------
+File content
+------------
+
+The content of the data files must be augmented to include all state
+data required to recreate a previously saved AMR block array on
+restart. Some information, such as block connectivity, is generated as
+blocks are inserted into the mesh hierarchy. Other information, such as
+method or solver parameters, is not stored but is taken from
+the parameter file. This allows for "tweaking" of parameters on
+restart, for example to adjust refinement criteria or solver
+convergence criteria.
+
+------------
+Control flow
+------------
+
+Control flow is handled by separate ``IoWriter`` or ``IoReader`` chare
+arrays, where each element is associated with a single HDF5
+file. Advantages over previous approaches are better load-balancing of
+I/O operations, and decoupling of I/O operations from the Block chare
+array. For Enzo-E checkpoint/restart data in particular,
+``IoEnzoReader`` and ``IoEnzoWriter`` chare arrays are used.
+
+======
+Design
+======
+
+
+Components of the new I/O approach include
+
+  1. Control management
+
+     * ``control_restart.cpp``
+
+        - ``Main::r_restart_enter()``
+        - ``Main::p_restart_done()``
+        - ``Main::restart_exit()``
+
+  2. New Classes
+
+     * ``EnzoMethodCheck``
+     * ``IoEnzoReader``
+
+        - ``IoEnzoReader::IoEnzoReader()``
+
+     * ``IoEnzoWriter``
+
+        - ``IoEnzoWriter::IoEnzoWriter()``
+
+     * ``IoReader``
+
+        - ``IoReader::IoReader()``
+
+     * ``IoWriter``
+
+        - ``IoWriter::IoWriter()``
+
+     * ``MethodOrderMorton``
+
+------------------
+Output: checkpoint
+------------------
+
+.. image:: io-output.png
+           :width: 800
+
+--------------
+Input: restart
+--------------
+
+The UML sequence diagram below shows how the ``Simulation`` group,
+IoReader chare array, and Block chare array interoperate to read data
+from a checkpoint directory. Time runs vertically starting from the top,
+and the three Charm++ group/arrays are arranged into three columns.
+Code for restart is found in the ``enzo_control_restart.cpp`` file.
+
+.. image:: io-read.png
+
+startup
+-------
+
+Restart begins in the "startup" phase, with the unique root block for
+the (0,0,0) octree in the array-of-octrees calling the ``Simulation``
+entry method ``p_restart_enter()``.
+
+The ``p_restart_enter()`` entry method reads the number of
+restart files from the top-level `file-list`
+file, initializes synchronization counters, and creates the
+``IoEnzoReader`` chare array, one element for each file.
+
+The ``IoEnzoReader`` constructors call the ``p_io_reader_created()``
+entry method in the root ``Simulation`` object to notify it that
+they've been created.
+
+``p_io_reader_created()`` counts the number of calls, and after it
+has received the last ``IoEnzoReader`` notification, it distributes the
+``proxy_io_enzo_reader`` array proxy to all other ``Simulation`` objects by
+calling ``p_set_io_reader()``.
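
The restart phases repeatedly use the same pattern: count incoming notifications and act only after the last expected one arrives. The following is a minimal single-process sketch of that pattern. The `SyncCounter` class is hypothetical and only mimics the counting logic; the real code uses Charm++ entry methods on distributed objects.

```python
class SyncCounter:
    """Run a callback once, after a fixed number of notifications."""

    def __init__(self, expected, on_complete):
        self.expected = expected        # number of notifications to wait for
        self.count = 0                  # notifications received so far
        self.on_complete = on_complete  # action taken after the last one

    def notify(self):
        self.count += 1
        if self.count == self.expected:
            self.on_complete()


# One notification per reader; the proxy is distributed only after all arrive.
events = []
num_files = 3
created = SyncCounter(num_files, lambda: events.append("set_io_reader"))
for _ in range(num_files):
    created.notify()
```

Here `events` stands in for the follow-up call, which in the text above is the distribution of the reader proxy via ``p_set_io_reader()``.
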
+
+``p_set_io_reader()`` stores the incoming proxy, then calls the
+``r_restart_start()`` barrier across ``Simulation`` objects, which is
+used to guarantee that all proxy elements will have been initialized
+before any are accessed in subsequent phases.
+
+level 0
+-------
+
+In the level-0 (root-level) phase, the root ``Simulation`` object
+reads the file names from the `file-list` file, and calls the
+``p_init_root()`` entry method in all ``IoEnzoReader`` objects,
+sending the checkpoint directory and file names.
+
+The ``p_init_root()`` entry method opens the `block-data` (HDF5) file
+and reads global attributes. It also opens and reads the `block-list`
+(text) file, reading in the list of blocks and organizing them by mesh
+refinement level. It then reads the data for each block, saving data for
+blocks in levels greater than 0 and sending data to level-0 blocks. Note
+that level-0 blocks exist at the beginning of restart, but no blocks in
+levels higher than 0 do.  Data are packed and sent to blocks in levels
+<= 0 using the ``EnzoBlock::p_restart_set_data()`` entry method.
+
+The ``EnzoBlock::p_restart_set_data()`` method unpacks the data
+into the Block, then notifies the associated ``IoEnzoReader`` file
+object that data has been received using the ``p_block_ready()`` entry
+method.
+
+``IoEnzoReader::p_block_ready()`` counts the number of block-ready
+acknowledgements, and after the last one calls
+``Simulation::p_restart_next_level()`` to process the blocks in the next
+refinement level.
+
+level k
+-------
+
+The level-k phase for k=1 to L is more complicated than level-0
+because the level k > 0 blocks must be created first.
+
+Assuming blocks up through level k-1 have been created, the
+root ``Simulation`` object calls ``IoEnzoReader::p_create_level(k)``
+for each ``IoEnzoReader``.
+
+In ``p_create_level()``, synchronization counters are initialized for
+counting the k-level blocks, and then each block in the list of level-k
+blocks is processed. To reuse code from the adapt phase, level-k blocks
+are created by refining the `parent` block, via a
+``p_restart_refine()`` entry method.
+
+In ``p_restart_refine()``, the parent level k-1 block creates a new
+child block, inserts the new block in its own child list, and
+recategorizes itself as a non-leaf.
+
+In the ``EnzoBlock`` constructor, the newly created block checks if
+it's in a restart phase, and if so sends an acknowledgement to the
+associated ``IoEnzoReader`` object using the ``p_block_created()`` entry
+method.
+
+In ``p_block_created()``, the ``IoEnzoReader`` object counts the number
+of acknowledgements from newly-created level-k blocks, and after it
+receives the last one it calls ``p_restart_level_created()`` on
+the root-level ``Simulation`` object. After this, the rest of
+the level-k phase mirrors that of the level-0 phase.
+
+cleanup
+-------
+
+In the cleanup section, after all blocks up to the maximum level have
+been created and initialized, the ``p_restart_next_level()`` entry
+method calls the Charm++ function ``doneInserting()`` on the block chare
+array, then calls ``p_restart_done()`` on all the blocks, which
+completes the restart phase.
+
+-------
+Classes
+-------
+
+EnzoMethodInput
+
+===========
+Data format
+===========
+
+Data for a given checkpoint dump are stored in a single checkpoint
+directory, specified in the user's parameter file using the
+``Method:check:dir`` parameter.
+
+The number of data files in the directory is specified using the
+``Method:check:num_files`` parameter. A rule-of-thumb is to use the
+same number of files as (physical) nodes in the simulation.
+
+Data files are named ``block_data-`` `x` ``.h5``, where 0 <= x <
+``num_files``. The format of data files is given in the next section.
+
+
+Each data file has an associated `block-list` text file named
+``block_data-`` `x` ``.block_list``. The block-list file contains a
+list of all block names in the associated data file, together with each
+block's mesh refinement level. There is one block listed per line, and
+the block name and level are separated by a space.
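
A reader for this block-list format can be sketched as follows, assuming only what is stated above: one block per line, with the name and level separated by a space. The function name `parse_block_list` and the block names in the sample are hypothetical placeholders, not real Cello block names.

```python
from collections import defaultdict


def parse_block_list(text):
    """Group block names by refinement level from block-list file text."""
    by_level = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last run of whitespace: name first, level second.
        name, level = line.rsplit(None, 1)
        by_level[int(level)].append(name)
    return dict(by_level)


# Placeholder names; levels may be negative, zero, or positive.
sample = "super -1\nroot 0\nchild_a 1\nchild_b 1\n"
levels = parse_block_list(sample)
```

Grouping by level matches the restart sequence described earlier, in which level-0 blocks receive data first and higher levels are created and filled one level at a time.
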
+
+A ``check.file_list`` text file is also included, which contains the
+number of data files and a list of the file prefixes ``block_data-`` `x`.
+
+Note that all blocks are included in the files, not just leaf blocks;
+this includes blocks in "negative" refinement levels.
+
+------------------
+Data file contents
+------------------
+
+The HDF5 data files are used to store all block state data, as well as
+some global data.
+
+Simulation attributes
+---------------------
+
+Metadata for the simulation are stored in the top-level "/" group.
+These include the following:
+
+* `cycle`: Cycle of the simulation dump.
+* `dt`: Current global time-step.
+* `time`: Current time in code units.
+* `rank`: Dimensionality of the problem.
+* `lower`: Lower extents of the simulation domain.
+* `upper`: Upper extents of the simulation domain.
+* `max_level`: Maximum refinement level.
+
+Block attributes
+----------------
+
+Block attributes and data are stored in HDF5 groups with the same name
+as the block, e.g. "B00:0_00:0_00:0".
+
+Block attribute data include the following:
+
+* `cycle`: Cycle of this block.
+* `dt`: Current block time-step.
+* `time`: Current time of this block.
+* `lower`: Lower extents of the block.
+* `upper`: Upper extents of the block.
+* `index`: Index of the block, specified using three 32-bit integers.
+* `adapt_buffer`: Encoding of the block's neighbor configuration.
+* `num_field_data`: Currently unused.
+* `array`: Indices identifying the octree containing the block in the "array-of-octrees".
+* `enzo_CellWidth`: Corresponds to the EnzoBlock ``CellWidth`` parameter.
+* `enzo_GridDimension`: Corresponds to the EnzoBlock ``GridDimension`` parameter.
+* `enzo_GridEndIndex`: Corresponds to the EnzoBlock ``GridEndIndex`` parameter.
+* `enzo_GridLeftEdge`: Corresponds to the EnzoBlock ``GridLeftEdge`` parameter.
+* `enzo_GridStartIndex`: Corresponds to the EnzoBlock ``GridStartIndex`` parameter.
+* `enzo_dt`: Corresponds to the EnzoBlock ``dt`` parameter.
+* `enzo_redshift`: Corresponds to the EnzoBlock ``redshift`` parameter.
+
+Block data
+----------
+
+Block data are stored as HDF5 datasets.
+
+Fields are currently stored as
+arrays of size ``(mx,my,mz)``, where ``mx``, ``my``, and ``mz`` are
+the dimensions of the field data `including` ghost data. (Note that
+future checkpoint versions may only include non-ghost data to reduce
+disk space.) Dataset names are field names with ``"field_"`` prepended,
+for example ``"field_density"``.
+
+Particles are stored as one-dimensional HDF5 datasets, one dataset per
+attribute per particle type. Datasets are named using ``"particle"`` +
+`particle-type` + `particle attribute`, delimited by underscores. For
+example, ``"particle_dark_vx"`` holds the values of the x-velocity
+attribute ``"vx"`` for the ``"dark"`` type particles in the block.  The
+length of each array equals the number of particles of that type in the
+block.
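
The dataset naming rules above can be expressed as two small helpers. This is a sketch of the naming convention only; the function names are hypothetical, and no HDF5 access is shown.

```python
def field_dataset_name(field):
    """Dataset name for a field: the field name with "field_" prepended."""
    return "field_" + field


def particle_dataset_name(particle_type, attribute):
    """Dataset name for one attribute of one particle type:
    "particle", the type, and the attribute, joined by underscores."""
    return "_".join(["particle", particle_type, attribute])


density_name = field_dataset_name("density")
dark_vx_name = particle_dataset_name("dark", "vx")
```

Going the other way, from a dataset name back to its parts, is ambiguous if a particle type or attribute name itself contains an underscore, so this sketch only builds names and does not parse them.
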
+
diff --git a/doc/source/design/design-io.rst b/doc/source/design/design-io.rst
diff --git a/doc/source/design/index.rst b/doc/source/design/index.rst
@@ -13,5 +13,5 @@ Cello, including flux-correction, IO, and ghost zone refresh.
    :glob:
    :titlesonly:
    :numbered:
-      
+
    design-*
diff --git a/doc/source/design/io-input.png b/doc/source/design/io-input.png
diff --git a/doc/source/design/io-output.png b/doc/source/design/io-output.png
diff --git a/doc/source/design/io-read.png b/doc/source/design/io-read.png