Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't log non-errors as "Non-transient error: <nil>". Fixes #13881 #13917

Merged
merged 2 commits into from
Nov 24, 2024

Conversation

MasonM
Copy link
Contributor

@MasonM MasonM commented Nov 19, 2024

Fixes #13881

Motivation

There's many places that can call IsTransientErr() with nil, e.g.

return !errorsutil.IsTransientErr(err), err

This caused IsTransientErr() to log Non-transient error: <nil>. We shouldn't be generating logs in this case.

Modifications

Don't log when the error is nil

Verification

This can be reproduced locally by running make start PROFILE=mysql. Before this change:

$ head -n9 logs/controller.log
time="2024-11-19T05:55:16Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true                                                                                                                                                                                          
time="2024-11-19T05:55:16Z" level=info msg="cron config" cronSyncPeriod=10s                                                                                                                                                                                                        
time="2024-11-19T05:55:16Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s                                                                                                   
time="2024-11-19T05:55:16.010Z" level=info msg="not enabling pprof debug endpoints"
time="2024-11-19T05:55:16.014Z" level=debug msg="Get configmaps 200" 
time="2024-11-19T05:55:16.017Z" level=info msg="Configuration:\nartifactRepository:\n  s3:\n    accessKeySecret:\n      key: accesskey\n      name: my-minio-cred\n    bucket: my-bucket\n    endpoint: minio:9000\n    insecure: true\n    secretKeySecret:\n      key: secretkey\n      name: my-minio-cred\ncolumns:\n- key: workflows.argoproj.io/completed\n  name: Workflow Completed\n  type: label\nexecutor:\n  imagePullPolicy: IfNotPresent\n  name: \"\"\n  resources:\n    limits:\n      cpu: 500m\n      memory: 256Mi\n    requests:\n      cpu: 100m\n      memory: 64Mi\nimages:\n  docker/whalesay:latest:\n    cmd:\n    - cowsay\ninitialDelay: 0s\nlinks:\n- name: Workflow Link\n  scope: workflow\n  url: http://logging-facility?namespace=${metadata.namespace}&workflowName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Link\n  scope: pod\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Logs Link\n  scope: pod-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Event Source Logs Link\n  scope: event-source-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Sensor Logs Link\n  scope: sensor-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Completed Workflows\n  scope: workflow-list\n  url: http://workflows?label=workflows.argoproj.io/completed=true\nmetricsConfig:\n  enabled: true\n  path: /metrics\n  port: 9090\nnamespaceParallelism: 10\nnodeEvents: {}\npersistence:\n  archive: true\n  archiveTTL: 168h0m0s\n  connectionPool:\n    maxIdleConns: 100\n  mysql:\n    database: argo\n    host: mysql\n    passwordSecret:\n      key: password\n      name: argo-mysql-config\n    port: 3306\n    tableName: argo_workflows\n    userNameSecret:\n      key: username\n      name: argo-mysql-config\n  nodeStatusOffLoad: true\npodSpecLogStrategy: {}\nretentionPolicy:\n  completed: 10\n  errored: 2\n  failed: 2\nsso:\n  clientId:\n    key: \"\"\n  clientSecret:\n    key: \"\"\n  issuer: \"\"\n  redirectUrl: \"\"\n  sessionExpiry: 0s\ntelemetryConfig: {}\nworkflowDefaults:\n  metadata:\n    creationTimestamp: null\n  spec:\n    activeDeadlineSeconds: 300\n    arguments: {}\n    podSpecPatch: |\n      terminationGracePeriodSeconds: 3\n  status:\n    finishedAt: null\n    startedAt: null\nworkflowEvents: {}\n"                             
time="2024-11-19T05:55:16.017Z" level=info msg="Persistence configuration enabled"                                                                                                                                                                                                 
time="2024-11-19T05:55:16.019Z" level=debug msg="Get secrets 200"                                                                                                                                                                                                                  
time="2024-11-19T05:55:16.019Z" level=warning msg="Non-transient error: <nil>"                                                                                                                                                                                                     

After:

$ head -n9 logs/controller.log                      
time="2024-11-19T05:55:57Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2024-11-19T05:55:57Z" level=info msg="cron config" cronSyncPeriod=10s
time="2024-11-19T05:55:57Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s
time="2024-11-19T05:55:57.231Z" level=info msg="not enabling pprof debug endpoints"
time="2024-11-19T05:55:57.235Z" level=debug msg="Get configmaps 200"
time="2024-11-19T05:55:57.239Z" level=info msg="Configuration:\nartifactRepository:\n  s3:\n    accessKeySecret:\n      key: accesskey\n      name: my-minio-cred\n    bucket: my-bucket\n    endpoint: minio:9000\n    insecure: true\n    secretKeySecret:\n      key: secretkey\n      name: my-minio-cred\ncolumns:\n- key: workflows.argoproj.io/completed\n  name: Workflow Completed\n  type: label\nexecutor:\n  imagePullPolicy: IfNotPresent\n  name: \"\"\n  resources:\n    limits:\n      cpu: 500m\n      memory: 256Mi\n    requests:\n      cpu: 100m\n      memory: 64Mi\nimages:\n  docker/whalesay:latest:\n    cmd:\n    - cowsay\ninitialDelay: 0s\nlinks:\n- name: Workflow Link\n  scope: workflow\n  url: http://logging-facility?namespace=${metadata.namespace}&workflowName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Link\n  scope: pod\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Logs Link\n  scope: pod-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Event Source Logs Link\n  scope: event-source-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Sensor Logs Link\n  scope: sensor-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Completed Workflows\n  scope: workflow-list\n  url: http://workflows?label=workflows.argoproj.io/completed=true\nmetricsConfig:\n  enabled: true\n  path: /metrics\n  port: 9090\nnamespaceParallelism: 10\nnodeEvents: {}\npersistence:\n  archive: true\n  archiveTTL: 168h0m0s\n  connectionPool:\n    maxIdleConns: 100\n  mysql:\n    database: argo\n    host: mysql\n    passwordSecret:\n      key: password\n      name: argo-mysql-config\n    port: 3306\n    tableName: argo_workflows\n    userNameSecret:\n      key: username\n      name: argo-mysql-config\n  nodeStatusOffLoad: true\npodSpecLogStrategy: {}\nretentionPolicy:\n  completed: 10\n  errored: 2\n  failed: 2\nsso:\n  clientId:\n    key: \"\"\n  clientSecret:\n    key: \"\"\n  issuer: \"\"\n  redirectUrl: \"\"\n  sessionExpiry: 0s\ntelemetryConfig: {}\nworkflowDefaults:\n  metadata:\n    creationTimestamp: null\n  spec:\n    activeDeadlineSeconds: 300\n    arguments: {}\n    podSpecPatch: |\n      terminationGracePeriodSeconds: 3\n  status:\n    finishedAt: null\n    startedAt: null\nworkflowEvents: {}\n"
time="2024-11-19T05:55:57.239Z" level=info msg="Persistence configuration enabled"
time="2024-11-19T05:55:57.241Z" level=debug msg="Get secrets 200"
time="2024-11-19T05:55:57.243Z" level=debug msg="Get secrets 200"

…3881

There's many places that can call `IsTransientErr()` with nil, e.g. https://github.com/argoproj/argo-workflows/blob/2fd54884844bb76d760466027afa023c5bfd6b64/util/util.go#L39

This caused `IsTransientErr()` to log `Non-transient error: <nil>`. We
shouldn't be generating logs in this case.

This can be reproduced locally by running `make start PROFILE=mysql`:
``
$ head -n9 logs/controller.log
time="2024-11-19T05:55:16Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2024-11-19T05:55:16Z" level=info msg="cron config" cronSyncPeriod=10s
time="2024-11-19T05:55:16Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s
time="2024-11-19T05:55:16.010Z" level=info msg="not enabling pprof debug endpoints"
time="2024-11-19T05:55:16.014Z" level=debug msg="Get configmaps 200"
time="2024-11-19T05:55:16.017Z" level=info msg="Configuration:\nartifactRepository:\n  s3:\n    accessKeySecret:\n      key: accesskey\n      name: my-minio-cred\n    bucket: my-bucket\n    endpoint: minio:9000\n    insecure: true\n    secretKeySecret:\n      key: secretkey\n      name: my-minio-cred\ncolumns:\n- key: workflows.argoproj.io/completed\n  name: Workflow Completed\n  type: label\nexecutor:\n  imagePullPolicy: IfNotPresent\n  name: \"\"\n  resources:\n    limits:\n      cpu: 500m\n      memory: 256Mi\n    requests:\n      cpu: 100m\n      memory: 64Mi\nimages:\n  docker/whalesay:latest:\n    cmd:\n    - cowsay\ninitialDelay: 0s\nlinks:\n- name: Workflow Link\n  scope: workflow\n  url: http://logging-facility?namespace=${metadata.namespace}&workflowName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Link\n  scope: pod\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Logs Link\n  scope: pod-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Event Source Logs Link\n  scope: event-source-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Sensor Logs Link\n  scope: sensor-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Completed Workflows\n  scope: workflow-list\n  url: http://workflows?label=workflows.argoproj.io/completed=true\nmetricsConfig:\n  enabled: true\n  path: /metrics\n  port: 9090\nnamespaceParallelism: 10\nnodeEvents: {}\npersistence:\n  archive: true\n  archiveTTL: 168h0m0s\n  connectionPool:\n    maxIdleConns: 100\n  mysql:\n    database: argo\n    host: mysql\n    passwordSecret:\n      key: password\n      name: argo-mysql-config\n    port: 3306\n    tableName: argo_workflows\n    userNameSecret:\n      key: username\n      name: argo-mysql-config\n  nodeStatusOffLoad: true\npodSpecLogStrategy: {}\nretentionPolicy:\n  completed: 10\n  errored: 2\n  failed: 2\nsso:\n  clientId:\n    key: \"\"\n  clientSecret:\n    key: \"\"\n  issuer: \"\"\n  redirectUrl: \"\"\n  sessionExpiry: 0s\ntelemetryConfig: {}\nworkflowDefaults:\n  metadata:\n    creationTimestamp: null\n  spec:\n    activeDeadlineSeconds: 300\n    arguments: {}\n    podSpecPatch: |\n      terminationGracePeriodSeconds: 3\n  status:\n    finishedAt: null\n    startedAt: null\nworkflowEvents: {}\n"
time="2024-11-19T05:55:16.017Z" level=info msg="Persistence configuration enabled"
time="2024-11-19T05:55:16.019Z" level=debug msg="Get secrets 200"
time="2024-11-19T05:55:16.019Z" level=warning msg="Non-transient error: <nil>"
```

Signed-off-by: Mason Malone <[email protected]>
@MasonM MasonM changed the title fix: don't log non-errors as "Non-transient error:". Fixes #13881 fix: don't log non-errors as "Non-transient error: <nil>". Fixes #13881 Nov 19, 2024
@MasonM MasonM mentioned this pull request Nov 19, 2024
4 tasks
Signed-off-by: Mason Malone <[email protected]>
@MasonM MasonM marked this pull request as ready for review November 19, 2024 09:31
@@ -150,7 +150,7 @@ func (s *SignalsSuite) TestSignaledContainerSet() {
Workflow("@testdata/signaled-container-set-workflow.yaml").
When().
SubmitWorkflow().
WaitForWorkflow().
WaitForWorkflow(killDuration).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to fix a flaky test:
signals_test.go:153: timeout after 1m30s waiting for condition
https://github.com/argoproj/argo-workflows/actions/runs/11907225382/job/33180716236

@isubasinghe isubasinghe self-requested a review November 21, 2024 11:02
@tczhao
Copy link
Member

tczhao commented Nov 24, 2024

error introduced from #13516

@tczhao tczhao merged commit 6b221f4 into argoproj:main Nov 24, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-transient error: <nil>
3 participants