
Add new option to server to report crashes/exceptions #377

Closed
beligante opened this issue Jul 7, 2024 · 7 comments

Comments

@beligante
Contributor

beligante commented Jul 7, 2024

Is your feature request related to a problem? Please describe.

Hello Team!

Currently at my work, the team faces a challenge when it comes to reporting crashes to tools such as Sentry or Bugsnag. Internally we use a fork of this project where we added some implementation details to report crashes from inside Cowboy.Handler.

So, I thought of a proposal where we could introduce something generic and flexible enough that anyone can use it for their own purposes.

My idea here is to introduce an option on the gRPC server startup where you can pass a callback to be invoked whenever an exception occurs (or whatever the maintainers consider an exception worth reporting). The callback receives the exception, and internally any implementation can be used.

Describe the solution you'd like

The new option would be passed at server startup, e.g.:

# bugsnag (no sanitization)
Supervisor.init(endpoint: MyEndpoint, port: 1234, start_server: true, crash_report: &Bugsnag.report/1)

# sentry
Supervisor.init(endpoint: MyEndpoint, port: 1234, start_server: true, crash_report: &Sentry.capture_exception/1)

# own impl
Supervisor.init(endpoint: MyEndpoint, port: 1234, start_server: true, crash_report: &MyOwnReport.capture/1)
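
To make the intent concrete, here's a rough sketch of how the server internals could invoke such a callback; the option name crash_report matches the examples above, but maybe_report_crash and its placement are assumptions for illustration, not existing grpc-elixir code:

# Hypothetical helper inside the server/adapter code path. If a
# :crash_report callback was configured, invoke it with the exception;
# otherwise do nothing. Reporting failures are swallowed so a broken
# reporter can never take the handler down with it.
defp maybe_report_crash(opts, exception) do
  case Keyword.get(opts, :crash_report) do
    report when is_function(report, 1) ->
      try do
        report.(exception)
      catch
        _kind, _reason -> :ok
      end

    _ ->
      :ok
  end
end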

Describe alternatives you've considered

We could alternatively add these report calls at the service-level implementation, such as:

defmodule FeatureServer do
  use GRPC.Server, service: Routeguide.RouteGuide.Service

  def get_feature(point, _stream) do
    # something that can generate exceptions
  catch
    kind, e -> Bugsnag.report(e, __STACKTRACE__)  
  end
end

Though, adding the report at that level has its limitations. For example:

  1. We can only capture exceptions raised by the function used to resolve the request (which is usually what we're interested in), but at this level we don't have access to the request ( :cowboy.req ) to add another layer of information about that execution
  2. We can't capture exceptions related to issues at the gRPC connection level that can at some point terminate the connection (such as the client canceling the request or closing the connection abruptly)

Additional context
I made a WIP PR to better illustrate the idea; there's some polishing required, but I think I can add more depth to my request. It's on my fork for now. A good part of the work there was inspired by @darrenclark

beligante#3

@beligante
Contributor Author

Another interesting option was presented to me today: Logger metadata, specifically the crash_reason metadata.

The change in this case would be that, instead of invoking a callback, we would just produce an error log, passing the exception as part of the metadata. E.g.:

def info({:EXIT, pid, {:handle_error, error}} = err, req, state = %{pid: pid}) do
    Logger.warning("3. #{inspect(state)} #{inspect(err)}")

    # Reply to the client with a generic error, keeping the original
    # `error` (with its kind/stack) around for reporting purposes.
    rpc_error = %RPCError{status: GRPC.Status.unknown(), message: "Internal Server Error"}
    req = send_error(req, rpc_error, state, :error)

    exception = HandlerException.new(req, error.kind, error.stack)

    Logger.error("Unexpected error happened", crash_reason: {exception, error.stack})

    {:stop, req, state}
end

Each issue reporter can implement its own logger backend and consume this crash_reason to report the issues. Sentry already does that.
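
For instance, a minimal sketch of a custom :logger handler that picks crash_reason out of the metadata and forwards it (MyReporter is a hypothetical stand-in for Sentry, Bugsnag, or anything else):

defmodule MyCrashReasonHandler do
  # :logger handler callback, invoked for every log event once attached.
  # Only events carrying a crash_reason tuple in their metadata are forwarded.
  def log(%{meta: %{crash_reason: {exception, stacktrace}}}, _config) do
    MyReporter.report(exception, stacktrace)
  end

  def log(_event, _config), do: :ok
end

# Attached somewhere during application start:
# :logger.add_handler(:crash_reporter, MyCrashReasonHandler, %{})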

Let me know if one of these captures your interest. I can work on them ASAP.

@sleipnir
Collaborator

sleipnir commented Jul 8, 2024

Hello friend.

I have a few questions:

1. Can we measure the impact on the library's overall performance when adding these calls? I know that if we've already failed, this would matter the least, but even so, I still think it's relevant to reflect on the possible impact these calls could have.
2. Could we suffer some kind of cascading error if some of these external services are down?

@sleipnir
Collaborator

sleipnir commented Jul 8, 2024

Another interesting option was presented to me today: Logger metadata, specifically the crash_reason metadata.

The change in this case would be that, instead of invoking a callback, we would just produce an error log, passing the exception as part of the metadata. E.g.:

def info({:EXIT, pid, {:handle_error, error}} = err, req, state = %{pid: pid}) do
    Logger.warning("3. #{inspect(state)} #{inspect(err)}")

    # Reply to the client with a generic error, keeping the original
    # `error` (with its kind/stack) around for reporting purposes.
    rpc_error = %RPCError{status: GRPC.Status.unknown(), message: "Internal Server Error"}
    req = send_error(req, rpc_error, state, :error)

    exception = HandlerException.new(req, error.kind, error.stack)

    Logger.error("Unexpected error happened", crash_reason: {exception, error.stack})

    {:stop, req, state}
end

Each issue reporter can implement its own logger backend and consume this crash_reason to report the issues. Sentry already does that.

Let me know if one of these captures your interest. I can work on them ASAP.

I find this solution cleaner and more interesting.

@beligante
Contributor Author

Hello friend.

I have a few questions:

1. Can we measure the impact on the library's overall performance when adding these calls? I know that if we've already failed, this would matter the least, but even so, I still think it's relevant to reflect on the possible impact these calls could have.
2. Could we suffer some kind of cascading error if some of these external services are down?

  1. I don't know exactly how to measure that, but if we follow the crash_reason path, I believe the impact is minimal. It will all vary based on the Logger backend you apply to capture those exceptions, so it's pretty much in the hands of whoever uses the lib to decide how much burden to add here.
  • To add to this, we're mainly working with already-raised exceptions and we're just adding the :cowboy.req as extra information, which is a pretty simple struct.
  2. It all varies by implementation, but in my experience Sentry and Bugsnag have excellent implementations for handling outages.
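
To illustrate how little the library itself has to do here, this is roughly what the consumer-side wiring looks like for Sentry; the module name is the one I recall from the Sentry SDK docs, so treat it as illustrative:

# config/config.exs of the consuming application (illustrative)
import Config

# Sentry's Logger integration consumes crash_reason from the log metadata.
config :logger, backends: [:console, Sentry.LoggerBackend]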

@sleipnir
Collaborator

Hello friend.
I have a few questions:
1. Can we measure the impact on the library's overall performance when adding these calls? I know that if we've already failed, this would matter the least, but even so, I still think it's relevant to reflect on the possible impact these calls could have.
2. Could we suffer some kind of cascading error if some of these external services are down?

  1. I don't know exactly how to measure that, but if we follow the crash_reason path, I believe the impact is minimal. It will all vary based on the Logger backend you apply to capture those exceptions, so it's pretty much in the hands of whoever uses the lib to decide how much burden to add here.
  • To add to this, we're mainly working with already-raised exceptions and we're just adding the :cowboy.req as extra information, which is a pretty simple struct.
  2. It all varies by implementation, but in my experience Sentry and Bugsnag have excellent implementations for handling outages.

Since you chose to follow the Logger approach, I believe there will be no major impact. My comments were aimed at your first suggestion.

@beligante
Contributor Author

@sleipnir I'm okay with closing this in favor of: #381

@sleipnir
Collaborator

@sleipnir I'm okay with closing this in favor of: #381

Yes, of course.
