docs: describe containerset retrystrategy and verify it works. Fixes: #11502 #12809

Merged on Apr 4, 2024 (66 commits). Changes shown from 5 commits.

Commits
0a7fe4c
fix: test doc
shuangkun Mar 19, 2024
e57439f
fix: test
shuangkun Mar 19, 2024
6bdf3fe
fix: test
shuangkun Mar 19, 2024
609b565
fix: docs
shuangkun Mar 20, 2024
cb06893
fix: docs
shuangkun Mar 20, 2024
0ffacb2
fix: move clisute to retry suit.
shuangkun Mar 20, 2024
dec8483
fix: test
shuangkun Mar 20, 2024
9b684d6
Update docs/container-set-template.md
shuangkun Mar 26, 2024
f6dd091
fix: comments
shuangkun Mar 26, 2024
b08c977
fix: comments
shuangkun Mar 26, 2024
da0b05a
fix: codegen.
shuangkun Mar 26, 2024
396700d
fix: codegen
shuangkun Mar 26, 2024
136ae10
fix: docs
shuangkun Mar 26, 2024
46965bc
fix: docs
shuangkun Mar 26, 2024
1703b4c
fix: docs
shuangkun Mar 26, 2024
728138a
fix: test
shuangkun Mar 26, 2024
fabf22b
fix: test
shuangkun Mar 26, 2024
41822f1
fix: test
shuangkun Mar 26, 2024
439c7ab
fix: test
shuangkun Mar 26, 2024
ba61ea9
fix: test
shuangkun Mar 26, 2024
257596b
fix: comments
shuangkun Mar 27, 2024
707fbf3
fix: test
shuangkun Mar 27, 2024
76cee1a
fix: docs
shuangkun Mar 27, 2024
0387c4e
fix: test doc
shuangkun Mar 19, 2024
cb6202f
fix: test
shuangkun Mar 19, 2024
6888008
fix: test
shuangkun Mar 19, 2024
41f5823
fix: docs
shuangkun Mar 20, 2024
5bb915c
fix: docs
shuangkun Mar 20, 2024
311ce1f
fix: move clisute to retry suit.
shuangkun Mar 20, 2024
d08e66b
fix: test
shuangkun Mar 20, 2024
2f31d71
Update docs/container-set-template.md
shuangkun Mar 26, 2024
e87bfda
fix: comments
shuangkun Mar 26, 2024
395f521
fix: comments
shuangkun Mar 26, 2024
2eed443
fix: codegen.
shuangkun Mar 26, 2024
44594fe
fix: codegen
shuangkun Mar 26, 2024
996201a
fix: docs
shuangkun Mar 26, 2024
91cb203
fix: docs
shuangkun Mar 26, 2024
65c499b
fix: docs
shuangkun Mar 26, 2024
e079ccc
fix: test
shuangkun Mar 26, 2024
6248d06
fix: test
shuangkun Mar 26, 2024
d3d36f4
fix: test
shuangkun Mar 26, 2024
b593f8b
fix: test
shuangkun Mar 26, 2024
f483b1e
fix: test
shuangkun Mar 26, 2024
9b8ec1a
fix: comments
shuangkun Mar 27, 2024
5bb4f9d
fix: test
shuangkun Mar 27, 2024
a1b1a8e
fix: docs
shuangkun Mar 27, 2024
8d85df2
Update docs/container-set-template.md
shuangkun Mar 29, 2024
b354390
Update docs/container-set-template.md
shuangkun Mar 29, 2024
5dab9c5
Merge branch 'addTestForContainerSetRetry' of github.com:shuangkun/ar…
shuangkun Mar 31, 2024
c743a59
fix: add note.
shuangkun Mar 31, 2024
db7c161
fix: add note.
shuangkun Mar 31, 2024
485c661
Merge branch 'main' into addTestForContainerSetRetry
shuangkun Mar 31, 2024
05094bf
Update docs/container-set-template.md
shuangkun Apr 4, 2024
1ab992e
Update docs/container-set-template.md
shuangkun Apr 4, 2024
058bc04
Update docs/container-set-template.md
shuangkun Apr 4, 2024
c161049
Update pkg/apis/workflow/v1alpha1/container_set_template_types.go
shuangkun Apr 4, 2024
149bf7d
fix: codegen
shuangkun Apr 4, 2024
6e5e615
fix: docs
shuangkun Apr 4, 2024
87ac23d
fix: docs
shuangkun Apr 4, 2024
50a2270
fix: docs
shuangkun Apr 4, 2024
22caa45
fix: docs
shuangkun Apr 4, 2024
8f3655f
fix: docs
shuangkun Apr 4, 2024
836aff0
fix: docs
shuangkun Apr 4, 2024
af66061
fix: docs
shuangkun Apr 4, 2024
decbcc2
Merge branch 'main' into addTestForContainerSetRetry
shuangkun Apr 4, 2024
02ce8da
fix markdown in admonition with an ignore and some new line corrections
agilgur5 Apr 4, 2024
2 changes: 1 addition & 1 deletion api/jsonschema/schema.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion api/openapi-spec/swagger.json

23 changes: 23 additions & 0 deletions docs/container-set-template.md
@@ -116,3 +116,26 @@ Example B: Lopsided requests, e.g. `a -> b` where `a` is cheap and `b` is expens
Can you see the problem here? `a` only has small requests, but the container set will use the total of all requests. So it's as if you're using all that GPU for 10h. This will be expensive.

Solution: do not use container set when you have lopsided requests.

## Container Set Retries

> v3.3 and after

A container set retry strategy describes how to retry a container node in the container set if it fails.

You can set the number of retries (default 0) and the sleep duration between retries (default 0s, meaning an instant retry).

The container won't retry if it's unable to locate the command.

Here is an example of a Container Set Template with `retryStrategy`:

```yaml
containerSet:
  retryStrategy:
    retries: "3" # ContainerSetRetryStrategy uses `retries`, not `limit`
  containers:
    - name: retry-containerset
      image: alpine:latest
      command: [ sh, -c ]
      args: [ "echo intentional failure; exit 1" ]
```
2 changes: 1 addition & 1 deletion docs/fields.md
@@ -2411,7 +2411,7 @@ _No description available_
| Field Name | Field Type | Description |
|:----------:|:----------:|---------------|
|`containers`|`Array<`[`ContainerNode`](#containernode)`>`|_No description available_|
|`retryStrategy`|[`ContainerSetRetryStrategy`](#containersetretrystrategy)|RetryStrategy describes how to retry a container nodes in the container set if it fails. Nbr of retries(default 0) and sleep duration between retries(default 0s, instant retry) can be set.|
|`retryStrategy`|[`ContainerSetRetryStrategy`](#containersetretrystrategy)|RetryStrategy describes how to retry a container nodes in the container set if it fails. Nbr of retries(default 0) and sleep duration between retries(default 0s, instant retry) can be set. The container won't retry if it's unable to locate the command.|
|`volumeMounts`|`Array<`[`VolumeMount`](#volumemount)`>`|_No description available_|

## DAGTemplate
1 change: 1 addition & 0 deletions pkg/apis/workflow/v1alpha1/container_set_template_types.go
@@ -14,6 +14,7 @@ type ContainerSetTemplate struct {
	VolumeMounts []corev1.VolumeMount `json:"volumeMounts,omitempty" protobuf:"bytes,3,rep,name=volumeMounts"`
	// RetryStrategy describes how to retry a container nodes in the container set if it fails.
	// Nbr of retries(default 0) and sleep duration between retries(default 0s, instant retry) can be set.
	// The container won't retry if it's unable to locate the command.
	RetryStrategy *ContainerSetRetryStrategy `json:"retryStrategy,omitempty" protobuf:"bytes,5,opt,name=retryStrategy"`
}
1 change: 1 addition & 0 deletions pkg/apis/workflow/v1alpha1/generated.proto

2 changes: 1 addition & 1 deletion pkg/apis/workflow/v1alpha1/openapi_generated.go

56 changes: 56 additions & 0 deletions test/e2e/cli_test.go
@@ -1733,6 +1733,62 @@ func (s *CLISuite) TestPluginStruct() {
	})
}

func (s *CLISuite) TestWorkflowTemplateWithRetryStrategyInContainerSet() {
	var name string
	s.Given().
		WorkflowTemplate("@testdata/workflow-template-with-containerset.yaml").
		Workflow(`
metadata:
  generateName: workflow-template-containerset-
spec:
  workflowTemplateRef:
    name: containerset-with-retrystrategy
`).
		When().
		CreateWorkflowTemplates().
		SubmitWorkflow().
		WaitForWorkflow(fixtures.ToBeFailed).
		Then().
		ExpectWorkflow(func(t *testing.T, metadata *metav1.ObjectMeta, status *wfv1.WorkflowStatus) {
			// testify expects (expected, actual) argument order.
			assert.Equal(t, wfv1.WorkflowFailed, status.Phase)
			name = metadata.Name
		})
	// c1 succeeds, so no retry is needed.
	s.Run("ContainerLogs", func() {
		s.Given().
			RunCli([]string{"logs", name, name, "-c", "c1"}, func(t *testing.T, output string, err error) {
				if assert.NoError(t, err) {
					count := strings.Count(output, "capturing logs")
					assert.Equal(t, 1, count)
					assert.Contains(t, output, "hi")
				}
			})
	})
	// c2's command cannot be found, so the retry logic is never entered.
	s.Run("ContainerLogs", func() {
		s.Given().
			RunCli([]string{"logs", name, name, "-c", "c2"}, func(t *testing.T, output string, err error) {
				if assert.NoError(t, err) {
					count := strings.Count(output, "capturing logs")
					assert.Equal(t, 0, count)
					assert.Contains(t, output, "executable file not found in $PATH")
				}
			})
	})
	// c3 fails at runtime, so it is retried.
	s.Run("ContainerLogs", func() {
		s.Given().
			RunCli([]string{"logs", name, name, "-c", "c3"}, func(t *testing.T, output string, err error) {
				if assert.NoError(t, err) {
					count := strings.Count(output, "capturing logs")
					assert.Equal(t, 2, count)
					countFailureInfo := strings.Count(output, "intentional failure")
					assert.Equal(t, 2, countFailureInfo)
				}
			})
	})
}

func TestCLISuite(t *testing.T) {
	suite.Run(t, new(CLISuite))
}
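
The assertions in this test infer how many times a container ran by counting marker strings in the log output. A minimal standalone sketch of that idea follows; `countRuns` is a hypothetical helper and the sample log text is made up for illustration, not captured from a real run.

```go
package main

import (
	"fmt"
	"strings"
)

// countRuns is an illustrative helper: it reports how many times marker
// occurs in output, using the same strings.Count call the test relies on.
func countRuns(output, marker string) int {
	return strings.Count(output, marker)
}

func main() {
	// sample is invented log text that imitates two runs of a failing container.
	sample := "capturing logs\nintentional failure\ncapturing logs\nintentional failure\n"

	fmt.Println(countRuns(sample, "capturing logs"))
	fmt.Println(countRuns(sample, "intentional failure"))
}
```

Counting a marker that the executor prints once per run keeps the check independent of what the container itself writes to its logs.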
32 changes: 32 additions & 0 deletions test/e2e/testdata/workflow-template-with-containerset.yaml
@@ -0,0 +1,32 @@
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: containerset-with-retrystrategy
  annotations:
    workflows.argoproj.io/description: |
      This workflow template is used to create a workflow with containerset.
spec:
  entrypoint: test
  templates:
    - name: test
      containerSet:
        retryStrategy:
          retries: "2"
        containers:
          - name: c1
            image: python:alpine3.6
            command:
              - python
              - -c
            args:
              - |
                print("hi")
          - name: c2
            image: python:alpine3.6
            command:
              - invalid
              - command
          - name: c3
            image: alpine:latest
            command: [ sh, -c ]
            args: [ "echo intentional failure; exit 1" ]