SchedMD/slurm-gcp

Latest commit

e294609 · Feb 9, 2024
Slurm on Google Cloud Platform

This repository is no longer actively developed

6.3.1 is the last release on this repository. Active development will continue here:

https://github.com/GoogleCloudPlatform/slurm-gcp

Google's HPC Toolkit is the recommended way to use Slurm in GCP.


FAQ | Troubleshooting | Glossary

Overview

slurm-gcp is an open-source software solution that enables setting up Slurm clusters on Google Cloud Platform with ease. With it, you can create and manage Slurm cluster infrastructure in GCP, deployed in different configurations.

Google's HPC Toolkit, on GitHub, can be used to manage and deploy Slurm clusters and other supporting infrastructure via HPC Blueprints.

Image Support

See supported Operating Systems and published Image Family for machine image support.

SchedMD

SchedMD provides professional services and commercial support to help you get up and running and stay running.

Issues and/or enhancement requests can be submitted to SchedMD's Bugzilla.

Also, join community discussions on either the Slurm User mailing list or the Google Cloud & Slurm Community Discussion Group.

Cluster Configurations

slurm-gcp can be deployed in several configurations to meet your computing needs.

See HPC Blueprints for production-ready HPC Toolkit example cluster configurations.

Cloud

All Slurm cluster resources will exist in the cloud.

See the Cloud Cluster Guide for details.

Hybrid

Only Slurm compute nodes will exist in the cloud. The Slurm controller and other Slurm components will remain in the on-premises environment.

See the Hybrid Cluster Guide for details.

Multi-Cluster/Federation

Two or more clusters are connected, allowing jobs to be submitted from and run on different clusters. This can be a mix of on-premises and cloud clusters.

See the Federated Cluster Guide for details.
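
As a rough illustration of what multi-cluster submission looks like from the user's side, the sketch below builds an `sbatch` command line that uses Slurm's `--clusters` option to offer a job to several clusters. It is a minimal sketch, not part of slurm-gcp: the helper `build_sbatch_command`, the cluster names `onprem` and `gcp`, and the script name `job.sh` are all hypothetical, and the command is only printed, not executed.

```python
import shlex


def build_sbatch_command(script, clusters):
    """Return the argv for submitting `script` to one of `clusters`.

    Hypothetical helper for illustration only; it is not part of slurm-gcp.
    """
    if not clusters:
        raise ValueError("at least one cluster name is required")
    # sbatch's --clusters option takes a comma-separated list of cluster names.
    return ["sbatch", f"--clusters={','.join(clusters)}", script]


# Hypothetical cluster names: one on-premises cluster and one cloud cluster.
cmd = build_sbatch_command("job.sh", ["onprem", "gcp"])
print(shlex.join(cmd))
```

The printed command could then be run on a login node of a cluster that is configured for multi-cluster operation. The Federated Cluster Guide remains the reference for how the clusters themselves are connected.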

Upgrade to v6

See the Upgrade to v6 Guide for details.

TPU support

slurm-gcp supports TPU VM nodes. See the TPU guide for details.

Help and Support

Please reach out to us here. We will be happy to support you!