Advise on how to debug Lambda metric errors that don't show up in Logs #364

Closed
andreas-venturini opened this issue Jan 29, 2024 · 13 comments

Comments

@andreas-venturini

andreas-venturini commented Jan 29, 2024

This follows a discussion over at awslabs/aws-lambda-rust-runtime#786, where we were initially advised to enable response streaming for our Lambda function URL to work around the bug from that issue.

After changing the function invoke mode to response streaming (and setting AWS_LWA_INVOKE_MODE to response_stream) our Lambda function continued to work normally and there were no errors reported in either CloudWatch or X-Ray.
However, Lambda metric reports suddenly started showing an error count (on the chart one can clearly see when buffered mode was changed to response streaming and back).

image

Nothing but the invoke mode was changed. Also, these errors are not related to the problematic source file(s) that triggered the bug reported in the linked issue.

We were advised to open an issue about this here.

Any pointers on how we might gain visibility into these errors would be appreciated. We searched our CloudWatch logs using multiple regex patterns, e.g. filter @message LIKE /ERROR/ etc. to no avail.

@DarthSim

Some info from my side:

I'm digging into the same issue. I've done some load testing using a single source file. I've configured JSON logs for my function and set the most verbose log levels for both application and system logs. I didn't find any errors in the logs, yet the function's monitoring reports errors.

The only error-like log records I see are readiness check failures during the function's initialization. Yet these records aren't treated as errors by my monitoring, and, in fact, this is normal behavior.

@andreas-venturini andreas-venturini changed the title Advise on how to debug Lambda metric errors that don't show up in CloudWatch Advise on how to debug Lambda metric errors that don't show up in Logs Jan 30, 2024
@DarthSim

I made a couple more tests.

  1. I removed the Lambda adapter from the Docker image and added experimental native Lambda support to the software. The errors didn't disappear.

  2. I built a Docker image with a sample program that just answers OK to every request. The errors didn't disappear. The whole test program code is:

package main

import (
	"net/http"
	"time"

	"github.com/aws/aws-lambda-go/lambdaurl"
)

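func main() {
	// lambdaurl.Start serves a standard http.Handler behind a Lambda function URL.
	lambdaurl.Start(http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
		// Simulate a small amount of work before responding.
		time.Sleep(100 * time.Millisecond)
		rw.Header().Set("Content-Type", "text/plain")
		rw.WriteHeader(http.StatusOK)
		rw.Write([]byte("OK"))
	}))
}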

Hence, neither the Lambda adapter nor our software is causing these errors.

@bnusunny
Contributor

Thanks for this information. I will do some tests to verify.

In the meantime, from your test results, it seems like a Lambda service issue. Could you please open a ticket with AWS support?

@bnusunny
Contributor

bnusunny commented Feb 1, 2024

@andreas-venturini @DarthSim It almost recovered. I got 1 or 2 errors out of thousands of invokes. Could you please check if you see the same?

image

@DarthSim

DarthSim commented Feb 1, 2024

Unfortunately, nothing changed in my case. I noticed that the bigger the response the larger the error rate. A function with the code I posted above indeed causes only a couple of errors for thousands of requests. Yet the software that responds with images of a few kilobytes causes tons of errors.

@bnusunny
Contributor

bnusunny commented Feb 2, 2024

Indeed. I see the same. I'm following up with the Lambda team.

@andreas-venturini
Author

@bnusunny has there been any feedback from the Lambda team so far? Thanks

@bnusunny
Contributor

The Lambda team has identified the cause. This should be fixed soon. I will update here when the fixes are rolled out.

@henriwoodcock

@bnusunny any more information on this? I'm experiencing a similar issue.

@bnusunny
Contributor

bnusunny commented Jan 8, 2025

@henriwoodcock There was some issue with the rollout.

But this is actually an issue with Lambda Function URL, not with this project. Could you please open a ticket with AWS support? That is the right channel to get this issue fixed.

I will close this one.

@bnusunny bnusunny closed this as completed Jan 8, 2025
@andreas-venturini
Author

@bnusunny thanks for the update but isn't there already an internal ticket for the Lambda team? Or did they decide not to fix it after the rollout issue?
I’m not sure I understand why we should open a new ticket with AWS Support or how that would change the status quo if the Lambda team is already aware.

@bnusunny
Contributor

bnusunny commented Jan 8, 2025

As I mentioned before, this is actually a Lambda Function URL issue, not a problem with this repo. A support ticket is the right process to solve it, and customer voice will help the Lambda team prioritize the work.

@henriwoodcock

I've opened a support ticket for our issue. If the answer is relevant to this issue I'll make sure to update here too
