Getting the code running on the CADES cluster takes a little bit of work, but it's not too terrible. It can be broken down into a couple of steps:
- Connect to the cluster
- Setup the python environment
- Run the code
All commands need to be run on the slurm-login node rather than the entry CADES login node.
These steps will need to be repeated each time you want to execute an HPC job.
```bash
ssh <username>@cades-extlogin01.ornl.gov
ssh <username>@or-slurm-login.ornl.gov
```
- The `or-slurm-login` hostname is actually load balanced across a number of nodes. Take note of which one you actually connect to (e.g. `or-slurm-login01`), as you'll want to connect directly to that one in the future.
CADES already has built-in support for anaconda, which makes python package management incredibly easy.
You'll first need to load the anaconda module:
```bash
module load anaconda3
```
From there, you can either create a new environment or connect to an existing one:
```bash
conda create --name <environment name>
conda activate <environment name>
```
Finally, you'll need to install the Python dependencies (this will also bring along any system libraries that are needed):
```bash
conda install jupyter dask geopandas rtree arrow shapely scipy
```
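As an optional sanity check (not part of the original steps), you can confirm the environment resolved correctly by importing a few of the key packages from a Python shell:

```python
# Optional check: these imports should all succeed inside the
# activated conda environment created above.
import dask
import geopandas
import shapely
import scipy

print("dask", dask.__version__)
print("geopandas", geopandas.__version__)
```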
Most HPC work is framed around jobs, which are an abstraction across a number of nodes dedicated to executing a parallel piece of work. These jobs are submitted to the cluster, from the login nodes, via the SLURM job scheduler.
Dask natively supports SLURM and will automatically launch a series of interactive jobs to perform a single piece of work.
Dask code can be run either directly via a python script, or interactively via an ipython shell (I tend to use the latter).
Most scripts start with the following prologue, which initializes and spins up the Dask backend:
```python
import dask.dataframe as dd
import pandas as pd
import numpy as np
import dask
from dask.distributed import Client
from datetime import datetime

# CADES configuration
from dask_jobqueue import SLURMCluster

# Be careful with using too many processes, or you'll run out of file descriptors
cluster = SLURMCluster(
    project='covid19', queue='covid19_burst',
    cores=12, memory='350 GB', processes=3,
    walltime="4:00:00", job_extra=["-N 1"], interface="ib0",
)
cluster.scale(jobs=20)
client = Client(cluster)
```
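One optional check before waiting on the queue: dask-jobqueue can show you the SLURM batch script it generates for each job, which is handy for confirming the queue, project, and resource requests look right:

```python
# Print the SLURM batch script dask-jobqueue will submit for each job.
print(cluster.job_script())
```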
It will take some time for the jobs to start and become ready.
You can check the status either via the `squeue` command:

```
!squeue -u <username>
```

or by calling the `cluster` object directly.
The code can then be run as normal, and when the session exits, it will automatically release the HPC nodes.
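As an illustration (a minimal sketch, not taken from the original write-up), any Dask computation issued after the prologue runs across the SLURM-backed workers:

```python
import dask.array as da

# A throwaway computation just to exercise the workers; real work would
# operate on data stored on Lustre (see below).
x = da.random.random((100_000, 1_000), chunks=(10_000, 1_000))
print(x.mean().compute())

# Exiting the IPython session (or calling client.close() and cluster.close())
# releases the SLURM jobs and hands the nodes back.
```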
In order to share code and data between the various compute nodes, you'll need to make sure information is stored on the high-performance Lustre filesystem. The base path is `/lustre/or-hydra/cades-birthright/<username>`.
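For instance (the file names below are hypothetical placeholders), keeping both inputs and outputs under that base path ensures every worker can reach them:

```python
import dask.dataframe as dd

# Hypothetical input/output locations under the Lustre base path;
# substitute your own username and data layout.
base = "/lustre/or-hydra/cades-birthright/<username>"

df = dd.read_csv(f"{base}/data/*.csv")
df.to_csv(f"{base}/results/part-*.csv", index=False)
```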
The easiest way to transfer data to the cluster is via the Globus application, which has an easy web interface for moving data between the cluster and your local machine.