Rework ARC profile

nismod · Nov 29, 2023 · c105026
1 parent 8e280fd

Showing 3 changed files with 43 additions and 74 deletions.
diff --git a/config/arc_cluster/README.md b/config/ARC/README.md
@@ -1,70 +1,50 @@
 # Running open-gira on ARC
 
-As open-gira is built with snakemake, its use is remarkably similar from a
-laptop to a cluster. However there are a few differences. They are discussed
-here.
+As open-gira is built using `snakemake`, its use is fairly similar from a
+laptop to a cluster. However there are a few differences, notably using a
+`profile` (discussed here).
 
 ## Python environment
 
-### Initialising our shared conda installation
+### Micromamba
 
-There is a conda install which users may share. This means we don't need to
-create many duplicate environments unnecessarily (snakemake will check to see
-if an equivalent environment has already been created).
+I recommend installing
+[micromamba](https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html#install-script)
+into your userspace as a package manager for Python packages (and more).
 
-Prior to using the conda install for the first time you must initialise it for
-your shell with the following command:
-```
-/data/ouce-gri-jba/anaconda/condabin/conda init
-```
+### Creating an execution environment
 
-Your `~/.bashrc` should then contain someting like this:
+To create an environment on ARC containing the necessary software to run the workflows:
 ```
-i# >>> conda initialize >>>
-# !! Contents within this block are managed by 'conda init' !!
-    __conda_setup="$('/data/ouce-gri-jba/anaconda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
-if [ $? -eq 0 ]; then
-    eval "$__conda_setup"
-else
-    if [ -f "/data/ouce-gri-jba/anaconda/etc/profile.d/conda.sh" ]; then
-        . "/data/ouce-gri-jba/anaconda/etc/profile.d/conda.sh"
-    else
-        export PATH="/data/ouce-gri-jba/anaconda/bin:$PATH"
-    fi
-fi
-unset __conda_setup
-# <<< conda initialize <<<
+micromamba create -f environment.yml -y
 ```
 
-### Enabling an environment with snakemake
-
-We use snakemake to create jobs for us. We could use the ARC provided snakemake
-executable from the module load system, but their version is quite old
-(6.10.0). N.B. Versions <7.0.0 may cause the following problem:
-https://github.com/snakemake/snakemake/issues/1392
-
-Instead use a shared conda environment we have created which contains
-snakemake:
+To activate this:
 ```
-conda activate snakemake-7.12.1
+micromamba activate open-gira
 ```
 
 Your prompt should then change to something like:
 ```
-(snakemake-7.12.1) [cenv0899@arc-login01 ~]$
+(open-gira) [cenv0899@arc-login01 ~]$
 ```
 
-## Osmium
+## Exactextract
+
+There is one dependency of `open-gira` that is not available via the conda
+(micromamba) ecosystem, `exactextract`. To install this, see
+[here](https://github.com/isciences/exactextract#compiling) and place the
+compiled binary in your `PATH`.
 
-open-gira jobs which filter Open Street Map datasets may require the use of a
-tool called osmium. This has been compiled on the cluster (with
-`/data/ouce-gri-jba/osmium/build_osmium.sh`). To run osmium, place a symlink
-somewhere on your `$PATH`, pointing to the wrapper script. For example:
+To build (and run) `exactextract` on ARC you will need to use the `module`
+program to load two dependencies:
 ```
-mkdir -p ~/bin
-ln -s /data/ouce-gri-jba/osmium/run_osmium.sh ~/bin/osmium
+module load GEOS/3.10.3-GCC-11.3.0
+module load GDAL/3.5.0-foss-2022a
 ```
 
+I suggest placing these lines in your `~/.bashrc` file so they automatically run on login.
+
 ## Session persistence
 
 To persist a terminal over time (and despite dropped SSH connections) consider using `tmux`.
@@ -81,29 +61,24 @@ Here's a [friendly guide](https://www.hamvocke.com/blog/a-quick-and-easy-guide-t
 
 `tmux attach-session -t <session_id>` to reattach to a session.
 
-## Allocate resources
+## Invoke workflow
 
-Allocate some nodes for use:
+The general pattern to doing work with `open-gira` on `ARC` is to activate the
+environment (see above) and issue a request for a target file:
 ```
-salloc --ntasks-per-node=<max tasks per node> --nodes=<num nodes> --partition=<short|medium|long> --time=01:00:00 --mem=8000
+snakemake --profile config/ARC <target name>
 ```
 
-## Invoke pipeline
-
-Having allocated resources with `salloc` (see above), you can then invoke
-snakemake to dispatch jobs and satisfy your target rule. From the open-gira
-repository call the command you wish to run, using the cluster specific
-profile. For more details on the cluster execution, see the config.yaml file
-in the profile directory. The general pattern is:
-```
-snakemake --profile config/arc_cluster <target name>
-```
+`snakemake` will then identify what work is required, issue job requests to
+`SLURM` and monitor the filesystem to watch for completed results.
 
 To test the pipeline with a short job, try the following:
 ```
-snakemake --profile config/arc_cluster results/exposure/tanzania-mini_filter-road/hazard-aqueduct-river/img
+snakemake --profile config/ARC results/exposure/tanzania-mini_filter-road/hazard-aqueduct-river/img
 ```
 
+Resource allocation is defined per rule, with defaults in `config/ARC/config.yaml`.
+
 ## Interpreting errors
 
 Each submitted job will have its `stdout` logged to file. This is very useful
@@ -148,4 +123,5 @@ Traceback (most recent call last):
 FileNotFoundError: [Errno 2] No such file or directory: 'osmium'
 ```
 
-In particular, here, the `FileNotFoundError` says that the job runner couldn't find `osmium`, which is needed to run this rule.
+In particular, here, the `FileNotFoundError` says that the job runner couldn't
+find `osmium`, which is needed to run this rule.
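
Note on the README change above: it suggests placing the two `module load` lines in `~/.bashrc` so they run on login. A minimal sketch of doing that without duplicating the lines on repeated runs might look like the following. The helper names `append_once` and `add_arc_modules` are hypothetical and not part of open-gira; only the two `module load` lines come from the README.

```shell
# append_once LINE FILE: add LINE to FILE unless an identical line is already there.
# (Hypothetical helper, not part of open-gira.)
append_once() {
    local line="$1" file="$2"
    # -q: quiet, -x: match the whole line, -F: fixed string rather than regex.
    # If FILE does not exist yet, grep fails and printf creates it.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# add_arc_modules [FILE]: register the two module loads named in the README.
# FILE defaults to ~/.bashrc (an assumption about which startup file you use).
add_arc_modules() {
    local target="${1:-$HOME/.bashrc}"
    append_once 'module load GEOS/3.10.3-GCC-11.3.0' "$target"
    append_once 'module load GDAL/3.5.0-foss-2022a' "$target"
}
```

Calling `add_arc_modules` with no argument would edit `~/.bashrc`; passing a path lets you try it on a scratch file first. Running it twice leaves a single copy of each line.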
diff --git a/config/arc_cluster/config.yaml b/config/ARC/config.yaml
@@ -8,28 +8,21 @@ cluster:
     --job-name=smk-{rule}-{wildcards}
     --output=logs/{rule}/{rule}-{wildcards}-%j.out
     --export=ALL
+    --cluster=htc
+    --mail-type=BEGIN,END,FAIL
+    [email protected]
     --parsable
 default-resources:
   - qos=standard  # {basic, standard, priority} only have credits for standard
   - partition=short  # {short, medium, long, devel, interactive}
   - mem_mb=16000
-  - time="08:00:00"  # maximum time for a single job
-restart-times: 1  # if a job fails, retry it once
-max-jobs-per-second: 16  # maximum jobs to _submit_ per second
+  - time="01:00:00"  # maximum time for a single job
+max-jobs-per-second: 1
 max-status-checks-per-second: 1
-local-cores: 1
+restart-times: 1  # if a job fails, retry it once
 latency-wait: 15  # seconds to wait for files to appear before failing
-jobs: 64  # max simultaneous jobs
+jobs: 4  # max simultaneous jobs
 keep-going: True  # do not stop workflow if job(s) fail
 rerun-incomplete: True
 printshellcmds: True
 scheduler: greedy
-cluster-status: status-sacct.sh  # script to poll for job status
-use-conda: True  # activate conda env prior to running any given rule
-# the following path is where an anaconda install (with mamba) is located, envs
-# installed here by snakemake or otherwise should be reusable across the group
-conda-prefix: /data/ouce-gri-jba/anaconda/envs
-# use `mamba` to create envs
-# micromamba support coming soon? see:
-# https://github.com/snakemake/snakemake/pull/1889
-conda-frontend: mamba
diff --git a/config/arc_cluster/status-sacct.sh → config/ARC/status-sacct.sh b/config/arc_cluster/status-sacct.sh → config/ARC/status-sacct.sh