
Storing logs fails with operation error S3: UploadPart, https response error StatusCode: 404 ... api error NoSuchUpload: UnknownError #719

Open
iainsproat opened this issue Mar 1, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

iainsproat commented Mar 1, 2024

There is a step in a pipeline run which fails with an error (the application exits with a non-zero exit code). Unfortunately, the log content for that step is not stored.

All other pipelines and steps have their logs stored as expected, so this does not seem to be a general configuration problem.

Expected Behavior

All steps in the pipeline would have their logs stored, even (and especially) if the user application exited with a non-zero exit code. A copy of the pipeline can be found below.

This fails consistently across numerous repeated runs.

Actual Behavior

The Tekton results API logs the following two errors:

{
  "level": "error",
  "ts": 1709314179.3972826,
  "caller": "v1alpha2/logs.go:103",
  "msg": "operation error S3: UploadPart, https response error StatusCode: 404, RequestID: tx00000563224ac1343fcba-0065e21083-4fb58942-ams3c, HostID: 4fb58942-ams3c-ams3-zg03, api error NoSuchUpload: UnknownError",
  "stacktrace": "github.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).UpdateLog.func1\n\tgithub.com/tektoncd/results/pkg/api/server/v1alpha2/logs.go:103\ngithub.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).UpdateLog\n\tgithub.com/tektoncd/results/pkg/api/server/v1alpha2/logs.go:156\ngithub.com/tektoncd/results/proto/v1alpha2/results_go_proto._Logs_UpdateLog_Handler\n\tgithub.com/tektoncd/results/proto/v1alpha2/results_go_proto/api_grpc.pb.go:686\ngithub.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/recovery.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/go-grpc-middleware/[email protected]/interceptors/recovery/interceptors.go:48\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/server_metrics.go:121\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/auth.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/auth/auth.go:66\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/logging/zap/server_interceptors.go:53\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/tags/interceptors.go:39\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:58\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgoogle.golang.org/[email protected]/server.go:1673\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/[email protected]/server.go:1787\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tgoogle.golang.org/[email protected]/server.go:1016"
}
{
  "level": "error",
  "ts": 1709314179.3973947,
  "caller": "zap/options.go:212",
  "msg": "finished streaming call with code Unknown",
  "grpc.auth_disabled": true,
  "grpc.start_time": "2024-03-01T17:29:38Z",
  "system": "grpc",
  "span.kind": "server",
  "grpc.service": "tekton.results.v1alpha2.Logs",
  "grpc.method": "UpdateLog",
  "peer.address": "10.244.1.76:58650",
  "grpc.user": "system:serviceaccount:tekton-pipelines:tekton-results-watcher",
  "grpc.issuer": "https://kubernetes.default.svc.cluster.local",
  "error": "operation error S3: UploadPart, failed to rewind transport stream for retry, request stream is not seekable",
  "grpc.code": "Unknown",
  "grpc.time_duration_in_ms": 766,
  "stacktrace": "github.com/grpc-ecosystem/go-grpc-middleware/logging/zap.DefaultMessageProducer\n\tgithub.com/grpc-ecosystem/[email protected]/logging/zap/options.go:212\ngithub.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/logging/zap/server_interceptors.go:61\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1\n\tgithub.com/grpc-ecosystem/[email protected]/tags/interceptors.go:39\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1.1.1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:49\ngithub.com/grpc-ecosystem/go-grpc-middleware.ChainStreamServer.func1\n\tgithub.com/grpc-ecosystem/[email protected]/chain.go:58\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tgoogle.golang.org/[email protected]/server.go:1673\ngoogle.golang.org/grpc.(*Server).handleStream\n\tgoogle.golang.org/[email protected]/server.go:1787\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tgoogle.golang.org/[email protected]/server.go:1016"
}

I can see that a corresponding log record has been stored:

{
  "name": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d/results/8b706226-37fa-4e79-add7-5e6f1d4e7534/records/2c4acb8d-3234-3176-93b9-6df94deb472c",
  "id": "7ca4d473-7f0f-41e1-92b3-a85083b9e7ad",
  "uid": "7ca4d473-7f0f-41e1-92b3-a85083b9e7ad",
  "data": {
    "type": "results.tekton.dev/v1alpha2.Log",
    "value": "eyJraW5kIjogIkxvZyIsICJzcGVjIjogeyJ0eXBlIjogIlMzIiwgInJlc291cmNlIjogeyJ1aWQiOiAiMDAxNDNmOTktODU5MC00MjJmLWFkMTQtM2FkOThjOGFhYWE3IiwgImtpbmQiOiAiVGFza1J1biIsICJuYW1lIjogImI1ZGQxNjUxYjUxYTJhOTljYzBkLTcxZmExNDI0MzctM2EwMGQ0YmJjMi0wIiwgIm5hbWVzcGFjZSI6ICJlMjcxZTkwNTUxLTY4NjIyZDc1ZWEtYjVkZDE2NTFiNTFhMmE5OWNjMGQifX0sICJzdGF0dXMiOiB7InBhdGgiOiAiZTI3MWU5MDU1MS02ODYyMmQ3NWVhLWI1ZGQxNjUxYjUxYTJhOTljYzBkLzJjNGFjYjhkLTMyMzQtMzE3Ni05M2I5LTZkZjk0ZGViNDcyYy9iNWRkMTY1MWI1MWEyYTk5Y2MwZC03MWZhMTQyNDM3LTNhMDBkNGJiYzItMC1sb2ciLCAic2l6ZSI6IDIwNzA5Mzc2MH0sICJtZXRhZGF0YSI6IHsidWlkIjogIjJjNGFjYjhkLTMyMzQtMzE3Ni05M2I5LTZkZjk0ZGViNDcyYyIsICJuYW1lIjogImI1ZGQxNjUxYjUxYTJhOTljYzBkLTcxZmExNDI0MzctM2EwMGQ0YmJjMi0wLWxvZyIsICJuYW1lc3BhY2UiOiAiZTI3MWU5MDU1MS02ODYyMmQ3NWVhLWI1ZGQxNjUxYjUxYTJhOTljYzBkIiwgImNyZWF0aW9uVGltZXN0YW1wIjogbnVsbH0sICJhcGlWZXJzaW9uIjogInJlc3VsdHMudGVrdG9uLmRldi92MWFscGhhMiJ9"
  },
  "etag": "7ca4d473-7f0f-41e1-92b3-a85083b9e7ad-1709314181023022210",
  "createdTime": "2024-03-01T17:29:38.541771Z",
  "createTime": "2024-03-01T17:29:38.541771Z",
  "updatedTime": "2024-03-01T17:29:41.023022Z",
  "updateTime": "2024-03-01T17:29:41.023022Z"
}

For convenience, the base64-decoded content of the above value is:

{
  "kind": "Log",
  "spec": {
    "type": "S3",
    "resource": {
      "uid": "00143f99-8590-422f-ad14-3ad98c8aaaa7",
      "kind": "TaskRun",
      "name": "b5dd1651b51a2a99cc0d-71fa142437-3a00d4bbc2-0",
      "namespace": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d"
    }
  },
  "status": {
    "path": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d/2c4acb8d-3234-3176-93b9-6df94deb472c/b5dd1651b51a2a99cc0d-71fa142437-3a00d4bbc2-0-log",
    "size": 207093760
  },
  "metadata": {
    "uid": "2c4acb8d-3234-3176-93b9-6df94deb472c",
    "name": "b5dd1651b51a2a99cc0d-71fa142437-3a00d4bbc2-0-log",
    "namespace": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d",
    "creationTimestamp": null
  },
  "apiVersion": "results.tekton.dev/v1alpha2"
}
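
The decoded content above can be reproduced with a one-liner. In this sketch a short stand-in payload is used so the snippet is self-contained; substitute the full data.value string from the record:

```shell
# Decode a record's data.value (base64 -> JSON). The stand-in payload below is
# hypothetical; paste the full data.value string from the record in its place.
VALUE='eyJraW5kIjogIkxvZyJ9'
echo "$VALUE" | base64 -d   # prints {"kind": "Log"}
```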

When inspecting the contents of the S3 blob storage directly, I can see that no object has been created at the given path.

Likewise, the logs cannot be retrieved via the API, as would be expected given the above symptoms.

Steps to Reproduce the Problem

Unfortunately, I have not yet been able to create a minimal failing example of the user workload that generates these logs. The pipeline is as follows:

apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  labels:
    app.kubernetes.io/instance: pipelines
    app.kubernetes.io/name: pipelines
    app.kubernetes.io/part-of: e271e90551
    app.kubernetes.io/version: 68622d75ea
    kubernetes.io/metadata.name: e271e90551-68622d75ea-b5dd1651b51a2a99cc0d
  name: e271e90551-68622d75ea
  namespace: e271e90551-68622d75ea-b5dd1651b51a2a99cc0d
spec:
  tasks:
  - name: 71fa142437-3a00d4bbc2-0
    taskSpec:
      metadata: {}
      spec: null
      steps:
      - args:
        - /userinput/values.json
        command:
        - dotnet
        - function.dll
        computeResources:
          limits:
            cpu: "1"
            memory: 1000Mi
          requests:
            cpu: "1"
            memory: 1000Mi
        image: private_registry/71fa142437:d66859c
        name: 71fa142437-3a00d4bbc2-0
        volumeMounts:
        - mountPath: /userinput
          name: 71fa142437-3a00d4bbc2-0
          readOnly: true
      volumes:
      - name: 71fa142437-3a00d4bbc2-0
        secret:
          secretName: 71fa142437-3a00d4bbc2-0

Additional Info

  • Kubernetes version:

v1.28.2

  • Tekton Pipeline version:

v0.56.1

  • Tekton Results version

v0.9.1

iainsproat added the kind/bug label on Mar 1, 2024
@iainsproat (Author) commented:

Excerpt from the above:

"status": {
    "path": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d/2c4acb8d-3234-3176-93b9-6df94deb472c/b5dd1651b51a2a99cc0d-71fa142437-3a00d4bbc2-0-log",
    "size": 207093760
}

Does this status.size value indicate a log size of 207 MB, or am I misinterpreting this?

@iainsproat (Author) commented:

To answer my own question: the value appears to be in bytes, so this would be about 207 MB: https://github.com/tektoncd/results/blob/32314594d4e5cf6de35e13cb7429ae216a969781/pkg/api/server/v1alpha2/logs.go#L173C21-L173C28
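
As a quick sanity check on that arithmetic (a sketch; 1 MB = 10^6 bytes, 1 MiB = 2^20 bytes):

```shell
# Convert the reported status.size (in bytes) to decimal MB and binary MiB.
SIZE=207093760
echo "$((SIZE / 1000000)) MB"   # prints 207 MB
echo "$((SIZE / 1048576)) MiB"  # prints 197 MiB
```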

@sayan-biswas (Contributor) commented:

@iainsproat I believe the size of the log has something to do with this. Are you still having difficulty storing large logs?

@iainsproat (Author) commented:

Hi @sayan-biswas, yes, this continues to be an issue for me.

Unfortunately, I haven't yet had time to create a minimal reproducing example, so I'm not sure whether the issue lies with our cloud provider, my configuration of S3 or of logging in Tekton Results, or something amiss in Tekton Results itself.
