amd build improvements #156
Skipping CI for Draft Pull Request.
/test all
cf62620 to 64a8df2
64a8df2 to d7a74df
RUN rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
    rpm -ql epel-release && \
Do we need EPEL and the -ql listing?
We need EPEL for ccache; the -ql listing is not required.
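A minimal sketch of what the step could look like once the listing is dropped. The base image and the `dnf install` line are assumptions for illustration; only the EPEL URL comes from the diff above.

```dockerfile
# Assumption: a UBI 9 base image; the actual base image is not shown in this thread.
FROM registry.access.redhat.com/ubi9/ubi:latest

# EPEL is enabled because it is needed for ccache (per the reply above).
# The `rpm -ql epel-release` listing is omitted since it only prints file names.
RUN rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm && \
    dnf install -y ccache && \
    dnf clean all
```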
Dockerfile.rocm.ubi
Outdated
ENV CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/libtorch/include:/libtorch/include/torch/csrc/api/include:/opt/rocm/include
ENV PYTORCH_ROCM_ARCH="gfx908;gfx90a;gfx942;gfx1100"
ENV CCACHE_DIR=/root/.cache/ccache
ENV PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}
Isn't this variable only used by vLLM later on?
Yes, we can remove this.
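One possible reading, sketched below, is that the self-assignment `ENV PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}` is dropped and the variable is declared once in the stage that builds vLLM. The base image and the stage name `build_vllm` are hypothetical; the two `ENV` values are copied from the diff above.

```dockerfile
# Assumption: base image and stage name are illustrative, not taken from the PR.
FROM registry.access.redhat.com/ubi9/ubi:latest AS build_vllm

# Declared once, in the stage that actually consumes it.
ENV PYTORCH_ROCM_ARCH="gfx908;gfx90a;gfx942;gfx1100"
ENV CCACHE_DIR=/root/.cache/ccache

# No `ENV PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}` line: re-assigning a variable
# to its own value within the same stage changes nothing.
```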
Dockerfile.rocm.ubi
Outdated
torch==2.5.0.dev20240726+rocm6.1 \
torchvision==0.20.0.dev20240726+rocm6.1 && \
We already installed torch at line 77; do we just copy files from the mounted cache here?
Moved the torch install to the rocm_base layer instead.
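A rough sketch of that layout, so later stages inherit torch instead of reinstalling it. The base image, the `python3-pip` install, the nightly index URL, and the `build` stage name are assumptions; the pinned versions and the `rocm_base` name come from this thread. The cache mount requires BuildKit.

```dockerfile
# Assumption: UBI 9 base; the real rocm_base stage also installs ROCm packages.
FROM registry.access.redhat.com/ubi9/ubi:latest AS rocm_base

RUN dnf install -y python3-pip && dnf clean all

# torch is installed once here; the index URL is an assumption for the rocm6.1 nightlies.
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install --pre \
        --index-url https://download.pytorch.org/whl/nightly/rocm6.1 \
        torch==2.5.0.dev20240726+rocm6.1 \
        torchvision==0.20.0.dev20240726+rocm6.1

# Hypothetical later stage: torch is already present, so no second install is needed.
FROM rocm_base AS build
RUN python3 -c "import torch; print(torch.__version__)"
```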
e5f6c41 to 0f2d1ef
0f2d1ef to a25f69f
a25f69f to 8f1fcff
@dtrifiro: The following test failed, say
Smoke test failure is unrelated.
8f1fcff to 1e8d6df
- get rid of non-essential dependencies
- consolidate package installs
- do not copy wheels in final stage
- fix ccache usage
- use flashattention with triton backend by default:
  - clone main_perf branch
  - build rocm target
  - set up triton rocm env var
- configure numba, outlines and triton cache directory
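The flash-attention and cache-directory items above might look roughly like the sketch below. Everything in it is an assumption except the `main_perf` branch name: the repository URL, the variable `FLASH_ATTENTION_TRITON_AMD_ENABLE` (as documented by the ROCm flash-attention fork), the cache paths, and the stage name are illustrative and may differ from what the PR actually does.

```dockerfile
# Assumption: UBI 9 base and stage name are illustrative.
FROM registry.access.redhat.com/ubi9/ubi:latest AS flash_attention_sketch

RUN dnf install -y git && dnf clean all

# Clone the main_perf branch (repository URL is an assumption).
RUN git clone --depth 1 --branch main_perf \
        https://github.com/ROCm/flash-attention.git /opt/flash-attention

# Select the Triton backend on ROCm (variable name assumed from the fork's README).
ENV FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"

# Point numba, outlines and triton at writable cache directories (hypothetical paths).
ENV NUMBA_CACHE_DIR=/tmp/cache/numba \
    OUTLINES_CACHE_DIR=/tmp/cache/outlines \
    TRITON_CACHE_DIR=/tmp/cache/triton
```

Setting explicit cache directories matters when the container runs as a non-root user whose home directory is not writable, which is a common setup on OpenShift.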
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dtrifiro, NickLucche
Dockerfile.rocm.ubi: flash-attention with triton backend by default: https://issues.redhat.com/browse/RHOAIENG-12611