This guide collects how-to guides, sample code, recommendations, and technical best practices to help new users get started with Arm-based systems like the NVIDIA Arm HPC Developer Kit. While it is intended for users and administrators of NVIDIA's Arm-based platforms, it is also useful for anyone running HPC applications on Arm CPUs, with or without GPUs. The focus is mostly on the CPU, since a GPU hosted by an Arm CPU is the same as a GPU hosted by any other CPU.
- Introduction to Arm64 and the NVIDIA Arm HPC Developer Kit
- Transitioning Workloads to Arm64
- Examples
- Supported Software (See the full list)
- Optimizing for Arm64
- Language-specific Considerations
- Additional Resources
- Acknowledgements
The easiest way to find help and talk to the experts is to join the NVIDIA Arm HPC Slack workspace.
The NVIDIA Arm HPC Developer Kit (simply "DevKit" in this guide) is an integrated hardware and software platform for creating, evaluating, and benchmarking HPC, AI, and scientific computing applications on a heterogeneous GPU- and CPU-accelerated computing system. The kit includes an Arm CPU, dual NVIDIA A100 Tensor Core GPUs, dual NVIDIA BlueField-2 DPUs, and the NVIDIA HPC SDK suite of tools. See the product page for more information.
This validated platform provides quick and easy bring-up and a stable environment for accelerated code execution and evaluation, performance analysis, system experimentation, and system characterization.
- Delivers a validated system for quick and easy bring-up in familiar HPC environments
- Offers a stable hardware and software platform for development and performance analysis of accelerated HPC, AI, and scientific computing applications
- Enables experimentation and characterization of high-performance, NVIDIA-accelerated, Arm server-based system architectures
Hardware | Specification |
---|---|
Model | GIGABYTE G242-P32, 2U server |
CPU | 1x Ampere Altra Q80-30 (Arm processor) |
GPU | 2x NVIDIA A100 GPU |
Memory | 512 GB DDR4 memory |
Storage | 6 TB SAS/SATA 3.5″ |
Network | 2x NVIDIA BlueField-2 E-Series DPU: 200GbE/HDR single-port QSFP56 |
The DevKit CPU uses the Arm architecture. The Arm architecture powers over two hundred billion chips across practically all computing domains, so the term "Arm" is somewhat overloaded. Various communities refer to the architecture as "Arm", "ARM", "Arm64", "AArch64", "arm64", etc. You may also find the term "SBSA" used to refer to server-class Arm CPUs. For simplicity, this guide will use the term "Arm64" to refer to any CPU built on the Armv8 or Armv9 standards and implementing Arm's Server Base System Architecture (SBSA). This includes CPUs like:
- Ampere Altra (NVIDIA Arm HPC Developer Kit)
- NVIDIA Grace
- AWS Graviton
- Alibaba Yitian
This guide will call out differences between Arm64 CPUs as needed. Note that this guide is not intended for mobile and embedded Arm CPUs such as NVIDIA Tegra. While many of the general principles and approaches presented here also hold for mobile and embedded Arm platforms, this guide focuses on server-class platforms.
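Because different tools report the same architecture under different spellings, scripts that branch on the CPU type often need to normalize the name first. The following is a minimal Python sketch of that idea, not part of the DevKit software. The function name `normalize_arch` and the alias sets are hypothetical choices made for illustration, and the check only identifies the instruction set: it says nothing about whether a CPU implements SBSA.

```python
import platform

# Spellings that operating systems commonly report for 64-bit Arm.
# This set is an assumption for illustration and is not exhaustive.
ARM64_ALIASES = {"aarch64", "arm64"}

# Common spellings for 64-bit x86, included so the example has a contrast case.
X86_64_ALIASES = {"x86_64", "amd64"}


def normalize_arch(machine: str) -> str:
    """Map a raw machine string to the naming used in this guide."""
    name = machine.strip().lower()
    if name in ARM64_ALIASES:
        return "Arm64"
    if name in X86_64_ALIASES:
        return "x86_64"
    # Unknown names are passed through in lowercase rather than guessed at.
    return name


if __name__ == "__main__":
    # platform.machine() typically returns "aarch64" on Arm64 Linux.
    raw = platform.machine()
    print(f"reported: {raw!r}, normalized: {normalize_arch(raw)}")
```

A build or job script could call `normalize_arch(platform.machine())` once and then select Arm64-specific compiler flags or binaries based on the result.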
- NVIDIA Arm HPC Developer Kit
- Neoverse N1 Software Optimization Guide
- Armv8 reference manual
- Package repository search tool
Unless otherwise indicated, this work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Individual examples or attached source code may be under a different license. Check the related README or LICENSE files.
This guide was inspired by and borrows from the excellent AWS Graviton Getting Started Guide. The authors of this guide gratefully acknowledge the work of the AWS engineers and thank AWS for freely providing this valuable information in the public domain.