Refactor default actions

* Support actions with duplicate names.
* Add: [default.action] table. Set the default for any action key (or sub-key).
* Add: [action.from] string. Copy any omitted keys from the given action.
* Add: Include operators ">=" and "<="
* Breaking: Rename "greater_than" to ">"
* Breaking: Rename "less_than" to "<"
* Breaking: Rename "equal_to" to "=="
* Breaking: [submit_options] is now [default.action.submit_options]
glotzerlab · May 14, 2024 · c21b459
1 parent f2114a8

Showing 62 changed files with 1,604 additions and 592 deletions.
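
As a rough illustration of the `[default.action]` entry in the commit message above, the following Python sketch fills the keys that an action omits from a table of defaults, descending into sub-tables so that a default can apply to a single sub-key. The `fill_omitted` helper and the example values are hypothetical; this is not row's implementation, and its actual merge rules may differ.

```python
from copy import deepcopy


def fill_omitted(action: dict, defaults: dict) -> dict:
    """Return a copy of `action` with any omitted keys taken from `defaults`.

    Nested tables are filled key by key, so a default may target a sub-key.
    Keys that `action` sets explicitly are never overwritten.
    """
    result = deepcopy(action)
    for key, default_value in defaults.items():
        if key not in result:
            result[key] = deepcopy(default_value)
        elif isinstance(result[key], dict) and isinstance(default_value, dict):
            result[key] = fill_omitted(result[key], default_value)
    return result


# Hypothetical stand-ins for a [default.action] table and one action entry.
default_action = {"resources": {"threads_per_process": 4}}
action = {"name": "simulate", "resources": {"gpus_per_process": 1}}

resolved = fill_omitted(action, default_action)
print(resolved)
```

The same helper could model `action.from` by passing the referenced action as `defaults`, assuming that "copy any omitted keys" means explicitly set keys always win.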
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -1 +1 @@
-* joaander
+* @joaander
diff --git a/DESIGN.md b/DESIGN.md
@@ -299,3 +299,4 @@ status may take a long time, so it should display a progress bar.
 # TODO: Dependabot configuration
 # TODO: readthedocs builds
 # TODO: logo
+# TODO: release CI
diff --git a/doc/src/SUMMARY.md b/doc/src/SUMMARY.md
@@ -16,12 +16,17 @@
   - [Working with signac projects](guide/python/signac.md)
   - [Writing action commands in Python](guide/python/actions.md)
 - [Concepts](guide/concepts/index.md)
-  - [Best practices](guide/concepts/best-practices.md)
   - [Process parallelism](guide/concepts/process-parallelism.md)
   - [Thread parallelism](guide/concepts/thread-parallelism.md)
   - [Directory status](guide/concepts/status.md)
   - [JSON pointers](guide/concepts/json-pointers.md)
   - [Cache files](guide/concepts/cache.md)
+- [How-to](guide/howto/index.md)
+  - [Best practices](guide/howto/best-practices.md)
+  - [Set the cluster account](guide/howto/account.md)
+  - [Submit the same action to different groups/resources](guide/howto/same.md)
+  - [Summarize directory groups with an action](guide/howto/summarize.md)
+
 # Reference
 
 - [row](row/index.md)
@@ -34,14 +39,13 @@
     - [show launchers](row/show/launchers.md)
   - [scan](row/scan.md)
   - [clean](row/clean.md)
-
 - [`workflow.toml`](workflow/index.md)
   - [workspace](workflow/workspace.md)
-  - [submit_options](workflow/submit-options.md)
   - [action](workflow/action/index.md)
     - [group](workflow/action/group.md)
     - [resources](workflow/action/resources.md)
     - [submit_options](workflow/action/submit-options.md)
+  - [default](workflow/default.md)
 - [`clusters.toml`](clusters/index.md)
   - [cluster](clusters/cluster.md)
   - [Built-in clusters](clusters/built-in.md)

diff --git a/doc/src/clusters/built-in.md b/doc/src/clusters/built-in.md
@@ -4,47 +4,50 @@
 
 ## Anvil (Purdue)
 
-[Anvil documentation](https://www.rcac.purdue.edu/knowledge/anvil).
-
-**Row** automatically selects from the following partitions:
+**Row** automatically selects from the following partitions on [Anvil]:
 * `shared`
 * `wholenode`
 * `gpu`
 
 Other partitions may be selected manually.
 
-There is no need to set `--mem-per-*` options on Anvil as the cluster automatically
+There is no need to set `--mem-per-*` options on [Anvil] as the cluster automatically
 chooses the largest amount of memory available per core by default.
 
-## Delta (NCSA)
+> Note: The whole node partitions **require** that each job submitted request an
+> integer multiple of 128 CPU cores.
 
-[Delta documentation](https://docs.ncsa.illinois.edu/systems/delta).
+[Anvil]: https://www.rcac.purdue.edu/knowledge/anvil
 
-**Row** automatically selects from the following partitions:
+## Delta (NCSA)
+
+**Row** automatically selects from the following partitions on [Delta]:
 * `cpu`
 * `gpuA100x4`
 
 Other partitions may be selected manually.
 
-Delta jobs default to a small amount of memory per core. **Row** inserts `--mem-per-cpu`
-or `--mem-per-gpu` to select the maximum amount of memory possible that allows full-node
-jobs and does not incur extra charges.
+[Delta] jobs default to a small amount of memory per core. **Row** inserts
+`--mem-per-cpu` or `--mem-per-gpu` to select the maximum amount of memory possible that
+allows full-node jobs and does not incur extra charges.
 
-## Great Lakes (University of Michigan)
+[Delta]: https://docs.ncsa.illinois.edu/systems/delta
 
-[Great Lakes documentation](https://arc.umich.edu/greatlakes/).
+## Great Lakes (University of Michigan)
 
-**Row** automatically selects from the following partitions:
+**Row** automatically selects from the following partitions on [Great Lakes]:
 * `standard`
 * `gpu_mig40,gpu`
 * `gpu`
 
 Other partitions may be selected manually.
 
-Great Lakes jobs default to a small amount of memory per core. **Row** inserts
+[Great Lakes] jobs default to a small amount of memory per core. **Row** inserts
 `--mem-per-cpu` or `--mem-per-gpu` to select the maximum amount of memory possible that
 allows full-node jobs and does not incur extra charges.
 
 > Note: The `gpu_mig40,gpu` partition is selected only when there is one GPU per job.
 > This is a combination of 2 partitions which decreases queue wait time due to the
 > larger number of nodes that can run your job.
+
+[Great Lakes]: https://arc.umich.edu/greatlakes/
diff --git a/doc/src/clusters/cluster.md b/doc/src/clusters/cluster.md
@@ -38,10 +38,10 @@ on this cluster. The table **must** have one of the following keys:
 * `by_environment`: **array** of two strings - Identify the cluster when the environment
   variable `by_environment[0]` is set and equal to `by_environment[1]`.
 * `always`: **bool** - Set to `true` to always identify this cluster. When `false`,
-  this cluster can only be chosen by an explicit `--cluster` option.
+  this cluster may only be chosen by an explicit `--cluster` option.
 
 > Note: The *first* cluster in the list that sets `identify.always = true` will prevent
-> any later cluster from being identified.
+> any later cluster from being identified (except by explicit `--cluster=name`).
 
 ## scheduler
 
@@ -87,16 +87,18 @@ will pass this option to the scheduler. For example SLURM schedulers will set
 
 ### cpus_per_node
 
-`cluster.partition.cpus_per_node`: **string** - Number of CPUs per node. When
-`cpus_per_node` is not set, **row** will ask the scheduler to schedule only a given
-number of tasks. In this case, some schedulers are free to spread tasks among any
-number of nodes (for example, shared partitions on Slurm schedulers).
+`cluster.partition.cpus_per_node`: **string** - Number of CPUs per node.
 
-When `cpus_per_node` is set, **row** will request the minimal number of nodes needed
-to satisfy `n_nodes * cpus_per_node >= total_cpus`. This may result in longer queue
-times, but will lead to more stable performance for users.
+When `cpus_per_node` is not set, **row** will request `n_processes` tasks. In this case,
+some schedulers are free to spread tasks among any number of nodes (for example, shared
+partitions on Slurm schedulers).
 
-Set `cpus_per_node` only when all nodes in the partition have the same number of CPUs.
+When `cpus_per_node` is set, **row** will **also** request the minimal number of nodes
+needed to satisfy `n_nodes * cpus_per_node >= total_cpus`. This may result in longer
+queue times, but will lead to more stable performance for users.
+
+> Note: Set `cpus_per_node` only when all nodes in the partition have the same number
+> of CPUs.
 
 ### minimum_gpus_per_job
 
@@ -131,7 +133,7 @@ will pass this option to the scheduler. For example SLURM schedulers will set
 ### gpus_per_node
 
 `cluster.partition.gpus_per_node`: **string** - Number of GPUs per node. Like
-`cpus_per_node` but used on jobs that request GPUs.
+`cpus_per_node` but used when jobs request GPUs.
 
 ### prevent_auto_select
 
@@ -140,6 +142,6 @@ automatically selecting this partition.
 
 ### account_suffix
 
-`cluster.partition.account_suffix`: **string** - Set to provide an account suffix
-when submitting jobs to this partition. Useful when clusters define separate
-`aacount-cpu` and `account-gpu` accounts.
+`cluster.partition.account_suffix`: **string** - An account suffix when submitting jobs
+to this partition. Useful when clusters define separate `account-cpu` and `account-gpu`
+accounts.
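
The `cpus_per_node` hunk above describes requesting the minimal number of nodes that satisfies `n_nodes * cpus_per_node >= total_cpus`. A minimal Python sketch of that arithmetic follows; the function name `minimal_nodes` is an assumption made for illustration and does not come from row's source.

```python
def minimal_nodes(total_cpus: int, cpus_per_node: int) -> int:
    """Smallest n_nodes such that n_nodes * cpus_per_node >= total_cpus."""
    if total_cpus < 0 or cpus_per_node <= 0:
        raise ValueError("total_cpus must be >= 0 and cpus_per_node must be > 0")
    # Ceiling division using integers only.
    return -(-total_cpus // cpus_per_node)


# With 128 CPUs per node, 129 CPUs no longer fit on a single node.
for total in (1, 128, 129, 256):
    print(total, minimal_nodes(total, 128))
```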
diff --git a/doc/src/clusters/index.md b/doc/src/clusters/index.md
@@ -18,13 +18,15 @@ name = "cluster2"
 ```
 
 User-provided clusters in `$HOME/.config/row/clusters.toml` are placed first in the
-array.
+array. Execute [`row show cluster --all`](../row/show/cluster.md) to see the complete
+cluster configuration.
 
 ## Cluster identification
 
 On startup, **row** iterates over the array of clusters in order. If `--cluster` is not
 set, **row** checks the `identify` condition in the configuration. If `--cluster` is
-set, **row** checks to see if the name matches.
+set, **row** checks to see if the name matches. **Row** selects the *first* cluster
+that matches.
 
-> Note: **Row** uses the *first* such match. To override a built-in, your configuration
-> should include a cluster by the same name and `identify` condition.
+> To override a built-in, your configuration should include a cluster by the same name
+> and `identify` condition.
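
The first-match rule described in this hunk can be sketched in Python as below. The dictionary layout, the `origin` key, the `fallback` cluster name, and the `EXAMPLE_CLUSTER` variable are hypothetical; only the `by_environment` and `always` conditions and the `--cluster` name match come from the documentation.

```python
import os


def identify_cluster(clusters, cluster_option=None, environ=None):
    """Return the first cluster that matches, or None when none does."""
    environ = os.environ if environ is None else environ
    for cluster in clusters:
        if cluster_option is not None:
            # --cluster is set: match by name only.
            if cluster["name"] == cluster_option:
                return cluster
            continue
        identify = cluster["identify"]
        if "by_environment" in identify:
            variable, value = identify["by_environment"]
            if environ.get(variable) == value:
                return cluster
        elif identify.get("always", False):
            return cluster
    return None


# User-provided clusters come first, so they shadow built-ins of the same name.
clusters = [
    {"name": "anvil", "origin": "user",
     "identify": {"by_environment": ["EXAMPLE_CLUSTER", "anvil"]}},
    {"name": "anvil", "origin": "built-in",
     "identify": {"by_environment": ["EXAMPLE_CLUSTER", "anvil"]}},
    {"name": "fallback", "origin": "built-in", "identify": {"always": True}},
]

print(identify_cluster(clusters, environ={"EXAMPLE_CLUSTER": "anvil"})["origin"])
```

Because iteration stops at the first match, a user entry with the same name and `identify` condition as a built-in is always chosen ahead of it.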
diff --git a/doc/src/developers/contributing.md b/doc/src/developers/contributing.md
@@ -2,7 +2,7 @@
 
 Contributions are welcomed via [pull requests on GitHub][github]. Contact the **row**
 developers before starting work to ensure it meshes well with the planned development
-direction and standards set for the project.
+direction and follows standards set for the project.
 
 [github]: https://github.com/glotzerlab/gsd/row
 
@@ -17,40 +17,44 @@ assist you in designing flexible interfaces.
 
 Expensive code paths should only execute when requested.
 
+### Maintain compatibility
+
+New features should be opt-in and *preserve the behavior* of all existing user scripts.
+
 ## Version control
 
 ### Base your work off the correct branch
 
-- Base all new work on `trunk`.
+Base all bug fixes and new features on `trunk`.
 
 ### Propose a minimal set of related changes
 
-All changes in a pull request should be closely related. Multiple change sets that are
-loosely coupled should be proposed in separate pull requests.
+All changes in a pull request should be *closely related*. Multiple change sets that are
+loosely coupled should be proposed in *separate pull requests*.
 
 ### Agree to the Contributor Agreement
 
-All contributors must agree to the Contributor Agreement before their pull request can
-be merged.
+All contributors must agree to the **Contributor Agreement** before their pull request
+can be merged.
 
 ### Set your git identity
 
 Git identifies every commit you make with your name and e-mail. [Set your identity][id]
-to correctly identify your work and set it identically on all systems and accounts where
-you make commits.
+to correctly identify your work and set it *identically on all systems* and accounts
+where you make commits.
 
 [id]: http://www.git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup
 
 ## Source code
 
 ### Use a consistent style
 
-The **Code style** section of the documentation sets the style guidelines for **row**
-code.
+Follow all guidelines outlined in the [Code style](style.md) section of the
+documentation.
 
 ### Document code with comments
 
-Use **Rust** documentation comments for classes, functions, etc. Also comment complex
+Write Rust documentation comments for traits, functions, etc. Also comment complex
 sections of code so that other developers can understand them.
 
 ### Compile without warnings
@@ -61,12 +65,12 @@ Your changes should compile without warnings.
 
 ### Write unit tests
 
-Add unit tests for all new functionality.
+Add unit tests for all new functionality and bug fixes.
 
-### Validity tests
+### Test validity
 
-The developer should run research-scale simulations using the new functionality and
-ensure that it behaves as intended. When appropriate, add a new test to `validate.py`.
+Run research-scale simulations using new functionality and ensure that it behaves as
+intended.
 
 ## User documentation
 
@@ -77,8 +81,7 @@ and any important user-facing change in the mdBook documentation.
 
 ### Tutorial
 
-When applicable, update or write a new tutorial.
-
+When applicable, update or write a new tutorial or how-to guide.
 
 ### Add developer to the credits
 

diff --git a/doc/src/developers/style.md b/doc/src/developers/style.md
@@ -3,7 +3,8 @@
 ## Rust
 
 **Row's** Rust code follows the [Rust style guide][1]. **Row's** [pre-commit][2]
-configuration applies style fixes with `rustfmt` checks for common errors with `clippy`.
+configuration applies style fixes with `rustfmt` and checks for common errors with
+`clippy`.
 
 [1]: https://doc.rust-lang.org/style-guide/index.html
 [2]: https://pre-commit.com/
@@ -16,7 +17,7 @@ configuration applies style fixes with `rustfmt` checks for common errors with `
 
 Wrap **Markdown** files at 88 characters wide, except when not possible (e.g. when
 formatting a table). Follow layout and design patterns established in existing markdown
-files.
+files. Use reference-style links for long URLs.
 
 ## Spelling/grammar
 

diff --git a/doc/src/developers/testing.md b/doc/src/developers/testing.md
@@ -8,9 +8,12 @@ cargo test
 ```
 in the source directory to execute the unit and integration tests.
 
-All tests must be marked either `#[serial]` or `#[parallel]` explicitly. Some serial
-tests set environment variables and/or the current working directory, which may conflict
-with any test that is automatically run concurrently. Check for this with:
+## Writing unit tests
+
+Write tests using standard Rust conventions. All tests must be marked either `#[serial]`
+or `#[parallel]` explicitly. Some serial tests set environment variables and/or the
+current working directory, which may conflict with any test that is automatically run
+concurrently. Check for this with:
 ```bash
 rg --multiline "#\[test\]\n *fn"
 ```

diff --git a/doc/src/env.md b/doc/src/env.md
@@ -1,7 +1,6 @@
 # Environment variables
 
-> Note: Environment variables that influence the execution of **row** are documented in
-> [the command line options](row/index.md).
+## In job scripts
 
 **Row** sets the following environment variables in generated job scripts:
 
@@ -14,3 +13,18 @@
 | `ACTION_PROCESSES_PER_DIRECTORY` | Set to the value of `action.resources.processes_per_directory`. Unset when `processes_per_submission`.|
 | `ACTION_THREADS_PER_PROCESS` | Set to the value of `action.resources.threads_per_process`. Unset when `threads_per_process` is omitted. |
 | `ACTION_GPUS_PER_PROCESS` | Set to the value of `action.resources.gpus_per_process`. Unset when `gpus_per_process` is omitted. |
+
+## Set row options
+
+Set any of these environment variables to provide default values for
+[command line options].
+
+| Environment variable | Option |
+|----------------------|--------|
+| `ROW_CLEAR_PROGRESS` | `--clear-progress` |
+| `ROW_CLUSTER` | `--cluster` |
+| `ROW_COLOR` | `--color` |
+| `ROW_IO_THREADS` | `--io-threads` |
+| `ROW_NO_PROGRESS` | `--no-progress` |
+
+[command line options]: row/index.md
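
The job-script table in this file notes that `ACTION_THREADS_PER_PROCESS` and `ACTION_GPUS_PER_PROCESS` are unset when the matching resource key is omitted. A minimal Python sketch of an action reading them defensively is shown below; the `optional_int` helper is hypothetical and assumes the values are integers.

```python
import os


def optional_int(name, environ=None):
    """Return the integer value of an environment variable, or None when unset."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    return None if value is None else int(value)


# Simulated job-script environment: threads are set, GPUs are omitted.
example_environ = {"ACTION_THREADS_PER_PROCESS": "8"}

threads = optional_int("ACTION_THREADS_PER_PROCESS", example_environ)
gpus = optional_int("ACTION_GPUS_PER_PROCESS", example_environ)
print(threads, gpus)
```

Passing a dictionary in place of `os.environ` keeps the sketch testable outside a generated job script.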
diff --git a/doc/src/guide/concepts/best-practices.md b/doc/src/guide/concepts/best-practices.md