Add a Dockerfile for AMD ROCm #3750

Open
wants to merge 4 commits into base: dev
Conversation

dark-penguin

Description

Provide a Dockerfile for AMD ROCm. Finding a good base image is not trivial because, unlike the Torch image for CUDA, the Torch image for ROCm is 71 GB for whatever reason.

Additionally, having a Dockerfile that "works" is a great reference for when you are trying to install something on bare metal.

Notes

Build with: docker build -t sdnext -f Dockerfile.rocm .
Run with (example): docker run -it --rm --device /dev/dri --group-add video -v /sdnext:/mnt -p 7860:7860 sdnext

  • --device /dev/dri - "mounts" the graphics card devices into the container (instead of using the NVIDIA Container Toolkit)
  • --group-add video - the user inside the container needs access to that device
  • -v /sdnext:/mnt - mount a volume or a directory to keep persistent data
  • -p 7860:7860 - publish the port

The Dockerfile is derived from the "official" NVIDIA Dockerfile with as few changes as possible, to keep the difference minimal.

Since the Torch image for ROCm is 71 GB, one difference I had to make was using a smaller image with only the essentials of ROCm installed (3 GB). Torch is then installed at build time (~2 GB download size). The total size of the built image is 23 GB (apparently Torch is packed really well).
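A minimal sketch of that approach (the base image tag and the ROCm/Torch versions below are illustrative assumptions, not necessarily the exact ones used in this PR):

```dockerfile
# Hypothetical sketch: start from a small ROCm runtime image (~3 GB)
# instead of the full 71 GB ROCm Torch image. The image tag is an assumption.
FROM rocm/dev-ubuntu-22.04:6.2

# Install Torch at build time from PyTorch's ROCm wheel index (~2 GB download).
# Pin the index suffix (rocm6.2 here) to match the ROCm version of the base image.
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
```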

Environment and Testing

Tested on Debian 12 Bookworm (I had to remove the --skip-all option from the CMD while testing since it's currently broken in master).

@dark-penguin
Author

Oops, I guess I should have opened the PR against the dev branch...

@vladmandic vladmandic changed the base branch from master to dev February 8, 2025 21:19
Dockerfile.rocm Outdated
LABEL org.opencontainers.image.licenses="AGPL-3.0"
LABEL org.opencontainers.image.title="SD.Next"
LABEL org.opencontainers.image.description="SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models"
LABEL org.opencontainers.image.base.name="https://hub.docker.com/pytorch/pytorch:2.5.1-cuda12.4-cudnn9-runtime"
Contributor

This doesn't seem correct here, does it? (given that this uses ROCm)

Author

Right!

@lbeltrame
Contributor

You may want to add a comment at the top of the Dockerfile mentioning the *GFX_OVERRIDE (forgot the complete name) env variable, because it needs to be set for people who are not running an officially supported card.
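For reference, the ROCm override variable in question is presumably HSA_OVERRIDE_GFX_VERSION; a hedged sketch of how such a comment might look in the Dockerfile (the example value is only illustrative):

```dockerfile
# For GPUs without an officially supported GFX target, override the detected version.
# 10.3.0 corresponds to RDNA2 (gfx1030); pick the value matching your card's generation.
# ENV HSA_OVERRIDE_GFX_VERSION=10.3.0
```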

@dark-penguin
Author

Good point, but that's up to @vladmandic I guess. A comment in the Dockerfile or a note in the Wiki?

@vladmandic
Owner

a) yes, rocm overrides should be exposed as it's quite a common thing.
b) if we're to include docker for anything except cuda, the wiki page needs a rewrite as well. adding a dockerfile without that is pointless. https://github.com/vladmandic/sdnext/wiki/Docker

@vladmandic
Owner

ok, i've pretty much rewritten https://github.com/vladmandic/sdnext/wiki/Docker so it's not cuda-specific
this pr should target this file, not create a new one in the root: https://github.com/vladmandic/sdnext/blob/dev/configs/Dockerfile.rocm

@Disty0
Collaborator

Disty0 commented Feb 12, 2025

Added Dockerfile.rocm: https://github.com/vladmandic/sdnext/blob/dev/configs/Dockerfile.rocm

Went with a different approach than CUDA because of flash attention.

We can save 30 GB of disk space by installing flash attention in the rocm-complete image and then sharing the venv with the smaller rocm runtime image.
The venv can also be shared between different instances if you have multiple GPUs.

Also using Ubuntu 24 with Python 3.12, because onnxruntime-rocm needs Python 3.12.
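One way the shared-venv idea could look with Docker Compose (the service names, image tag, and venv path below are assumptions for illustration, not taken from configs/Dockerfile.rocm):

```yaml
# Hypothetical sketch: two instances (one per GPU) sharing a single venv volume,
# so flash attention and Torch are installed once and reused by both.
services:
  sdnext-gpu0:
    image: sdnext-rocm            # assumed image tag
    devices: ["/dev/dri:/dev/dri"]
    group_add: ["video"]
    ports: ["7860:7860"]
    volumes:
      - sdnext-venv:/app/venv     # assumed venv path inside the container
  sdnext-gpu1:
    image: sdnext-rocm
    devices: ["/dev/dri:/dev/dri"]
    group_add: ["video"]
    ports: ["7861:7860"]
    volumes:
      - sdnext-venv:/app/venv
volumes:
  sdnext-venv:
```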

If you want to make changes, please target the new file.
