~NGC release testing #200

Triggered: manually, January 28, 2025 09:09
Status: Failure
Total duration: 11h 43m 54s
Artifacts: 27

ngc-release-testing.yaml

on: workflow_dispatch
Matrix: test-maxtext / maxtext-multinode
Matrix: test-maxtext / single-process-multi-device
Matrix: test-jax / run-unit-test
Matrix: test-rosetta-pax / rosetta-pax-multi-node-te
Matrix: test-rosetta-pax / rosetta-pax-multi-node
Matrix: test-rosetta-pax / rosetta-pax-single-node-dropout-te
Matrix: test-rosetta-pax / single-process-evaluation-te
Matrix: test-rosetta-pax / single-process-multi-device-te
test-jax / runner / launch-slurm-runner: 18m 51s
test-maxtext-eks / maxtext: 5m 49s
test-nccl / build-mpi-operator-compatible-base / build-mpi-operator-compatible-base: 2m 52s
test-maxtext / test-maxtext-summary: 0s
test-maxtext / test-maxtext-metrics: 0s
test-rosetta-pax / test-pax-rosetta-summary: 0s
test-rosetta-pax / test-pax-rosetta-metrics: 0s
Matrix: test-nccl / nccl-test
test-maxtext / test-maxtext-sitrep / sitrep: 6s
test-rosetta-pax / test-pax-rosetta-sitrep / sitrep: 8s
test-maxtext / test-maxtext-outcome: 0s
test-rosetta-pax / test-pax-rosetta-outcome: 0s
finalize / workflow-badge: 3s
finalize / report: 15s
finalize / upload-badge: 4s
finalize / publish-badge: 3s

Annotations

8 errors
test-rosetta-pax / rosetta-pax-multi-node (4, 2, 1, 2)
The job running on runner jumpbox-vc69x-jjv9x has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node (4, 2, 1, 2)
The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node-te (16DP1FSDP1TP1PP_TE, 1, 16, 1, 1, 4)
The job running on runner jumpbox-vc69x-rzvh2 has exceeded the maximum execution time of 360 minutes.
test-maxtext / maxtext-multinode (1, 4, 2, 2)
The job running on runner jumpbox-vc69x-ltbrk has exceeded the maximum execution time of 360 minutes.
test-maxtext / maxtext-multinode (1, 4, 2, 2)
The operation was canceled.
test-maxtext / test-maxtext-outcome
Process completed with exit code 1.
test-rosetta-pax / test-pax-rosetta-outcome
Process completed with exit code 1.
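
The annotations above alternate between a job name and its error message. As a rough illustration of how they could be triaged, the sketch below pairs each job with its messages and picks out the jobs that hit the 360-minute limit. The helper names (`group_annotations`, `timed_out_jobs`) are hypothetical and not part of the workflow; the text is copied from the annotations listed on this page.

```python
from collections import defaultdict

# Annotation text copied from this run page: job name, then message, alternating.
ANNOTATIONS = """\
test-rosetta-pax / rosetta-pax-multi-node (4, 2, 1, 2)
The job running on runner jumpbox-vc69x-jjv9x has exceeded the maximum execution time of 360 minutes.
test-rosetta-pax / rosetta-pax-multi-node (4, 2, 1, 2)
The operation was canceled.
test-rosetta-pax / rosetta-pax-multi-node-te (16DP1FSDP1TP1PP_TE, 1, 16, 1, 1, 4)
The job running on runner jumpbox-vc69x-rzvh2 has exceeded the maximum execution time of 360 minutes.
test-maxtext / maxtext-multinode (1, 4, 2, 2)
The job running on runner jumpbox-vc69x-ltbrk has exceeded the maximum execution time of 360 minutes.
test-maxtext / maxtext-multinode (1, 4, 2, 2)
The operation was canceled.
test-maxtext / test-maxtext-outcome
Process completed with exit code 1.
test-rosetta-pax / test-pax-rosetta-outcome
Process completed with exit code 1.
"""


def group_annotations(text):
    """Pair alternating (job, message) lines and group the messages per job."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    grouped = defaultdict(list)
    for job, message in zip(lines[0::2], lines[1::2]):
        grouped[job].append(message)
    return dict(grouped)


def timed_out_jobs(grouped):
    """Return the sorted names of jobs with a 'maximum execution time' message."""
    return sorted(
        job
        for job, messages in grouped.items()
        if any("maximum execution time" in m for m in messages)
    )


if __name__ == "__main__":
    for job in timed_out_jobs(group_annotations(ANNOTATIONS)):
        print(job)
```

Read this way, three matrix jobs ran into the time limit, and the two `-outcome` jobs exited with code 1.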

Artifacts

Produced during runtime
Name                                                        Size
artifact-final-report                                       2.06 KB
artifact-maxtext-test                                       653 Bytes
artifact-mpi-operator-compatible-base-build-amd64           638 Bytes
artifact-rosetta-pax-mgmn-test                              724 Bytes
artifact-workflow-metadata                                  266 Bytes
jax-unit-test-A100                                          20.7 KB
jax-unit-test-V100                                          27.3 KB
rosetta-pax-13007049301-1DP1FSDP1TP1PP_TE                   1.31 KB
rosetta-pax-13007049301-1DP2FSDP4TP1PP_single_process_TE    1.49 KB
rosetta-pax-13007049301-1DP8FSDP1TP1PP_TE                   1.34 KB
rosetta-pax-13007049301-2DP1FSDP1TP4PP                      1.31 KB
rosetta-pax-13007049301-4DP1FSDP2TP1PP                      1.31 KB
rosetta-pax-13007049301-4DP1FSDP2TP1PP_TE                   1.35 KB
rosetta-pax-13007049301-5B_fused_attn_0                     1.33 KB
rosetta-pax-13007049301-5B_fused_attn_1                     1.33 KB
rosetta-pax-13007049301-8DP1FSDP1TP1PP                      1.31 KB
rosetta-pax-13007049301-8DP1FSDP1TP1PP_TE                   1.34 KB
rosetta-pax-13007049301-8DP1FSDP1TP1PP_eval_TE              1.4 KB
rosetta-pax-13007049301-8DP1FSDP1TP1PP_single_process_TE    1.48 KB
rosetta-pax-13007049301-8DP_TE_dropout                      1.36 KB
rosetta-pax-13007049301-LLaMA_eval_TE                       1.31 KB
upstream-maxtext-13007049301-1DP1FSDP1TP1PP                 876 Bytes
upstream-maxtext-13007049301-1DP1FSDP8TP1PP                 907 Bytes
upstream-maxtext-13007049301-1DP2FSDP4TP1PP_single_process  933 Bytes
upstream-maxtext-13007049301-1DP4FSDP2TP1PP                 903 Bytes
upstream-maxtext-13007049301-1DP8FSDP1TP1PP                 909 Bytes
upstream-maxtext-13007049301-2DP2FSDP2TP1PP                 899 Bytes
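
Many of the artifact names above embed a configuration string such as 4DP1FSDP2TP1PP. A minimal sketch of how such a string could be decoded follows. It assumes the tokens denote data-parallel, fully-sharded data-parallel, tensor-parallel and pipeline-parallel degrees, and that their product gives the device count; this page confirms neither. The function names `parse_mesh` and `device_count` are hypothetical.

```python
import re
from math import prod

# Assumed naming scheme: <N>DP<N>FSDP<N>TP<N>PP, optionally followed by a
# suffix such as _TE. This is inferred from the artifact names only.
MESH_PATTERN = re.compile(r"(\d+)DP(\d+)FSDP(\d+)TP(\d+)PP")


def parse_mesh(artifact_name):
    """Return the degrees encoded in an artifact name, or None if absent."""
    match = MESH_PATTERN.search(artifact_name)
    if match is None:
        return None
    dp, fsdp, tp, pp = (int(group) for group in match.groups())
    return {"DP": dp, "FSDP": fsdp, "TP": tp, "PP": pp}


def device_count(mesh):
    """Product of all degrees (assumed to equal the number of devices)."""
    return prod(mesh.values())


if __name__ == "__main__":
    name = "rosetta-pax-13007049301-4DP1FSDP2TP1PP_TE"
    mesh = parse_mesh(name)
    print(mesh, device_count(mesh))
```

Names without the full four-part pattern, such as rosetta-pax-13007049301-8DP_TE_dropout or the 5B_fused_attn artifacts, return None rather than a guessed configuration.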