proposal: add cause to canceled context when Client sending a RST_STREAM due to an error on their end #7778
2/3 Fail(s): DESCRIPTION, MILESTONE
Status: FAIL
Here are some stats of the run:
3 validations were ran.
1 PASSED
2 FAILED
Details
✔️ Validator: LABEL
- ✔️ label must include '^Type:'
Input : Status: Requires Reporter Clarification,Type: Feature,Area: Server
Settings :{"must_include":{"regex":"^Type:"}}
❌ Validator: DESCRIPTION
- ❌ description does not include "^RELEASE NOTES:\s([Nn][Oo][Nn][Ee]|[Nn]/[Aa]|\n(*|-)\s.+)$"**
Input : Related Issue: #7541
Summary:
This PR introduces a feature to add a cause to the context cancellation, for example when a client sends an RST_STREAM frame. This enhancement aims to provide more transparency and ease debugging by distinguishing between different reasons for context cancellation.
Use Case:
When a client resets a connection, the http2_server
cancels the current context to interrupt all server internal processing. However, it is challenging to determine the exact reason for the context cancellation, whether it was due to an internal error, a timeout, or a client sending an RST frame. This feature will help in identifying the cause of the context cancellation, making it easier to debug and handle errors appropriately.
Proposed Solution:
- Replace
context.CancelFunc
withcontext.CancelCauseFunc
. - Adjust all context cancellations to report the cause.
- For example, when handling an RST_STREAM frame, the context will be canceled with a specific error indicating the receipt of the RST_STREAM frame.
Additional Context:
This change is backward-compatible and does not affect existing application code that checks for ctx.Err()
. The new context.Cause(ctx)
method will provide additional information when available.
Example:
The PR adds an example which shows how to implement a server with asynchronous processing using goroutines.
The example illustrates some client behaviors (RST frame, close connection, timeout) which are propagated to the worker goroutine through the context.
Here are the logs of the example server (with comments):
starting work on message: "world"
starting work on message: "hello"
'context canceled' with cause 'NO_ERROR', message worker for message "world" stopping. 👈🏻 NO_ERROR indicates the context was caused by HTTP status 0
server: error receiving from stream: rpc error: code = Canceled desc = context canceled
'context canceled' with cause 'NO_ERROR', message worker for message "hello" stopping. 👈🏻 see above
starting work on message: "world"
starting work on message: "hello"
'context canceled' with cause 'context canceled', message worker for message "hello" stopping. 👈🏻 this case could be made more transparent when setting a cancel cause for a closed connection as well.
'context canceled' with cause 'context canceled', message worker for message "world" stopping. 👈🏻 see above
server: error receiving from stream: rpc error: code = Canceled desc = context canceled
starting work on message: "world"
starting work on message: "hello"
'context deadline exceeded' with cause 'context deadline exceeded', message worker for message "world" stopping. 👈🏻 this case could be made more transparent by setting a custom cause for the context deadline in `ctx, timoutCancel = context.WithTimeout(ctx, timeout)`
server: error receiving from stream: rpc error: code = DeadlineExceeded desc = context deadline exceeded
'context deadline exceeded' with cause 'context deadline exceeded', message worker for message "hello" stopping. 👈🏻 see above
Open Issues
-
This PR is still a draft and almost no unit tests have been added to production code. This can be done when the reviewers agree that the feature of this PR will be considered to be added to the library.
-
Some uses cases (client closes connection, timeout) do not receive a proper cause in this PR, as it was explicitly requested to set the cause only for the RST frame case yet.
-
So far one error type (
Http2CodeError
) has been added to illustrate how server implementations could do error matching. The design for error types could be discussed in more detail. I'd see more error cases in the future that could be propagated through the context (e.g.ConnectionClosed
by client,GrpcTimeout
to distinguish from other timeouts).Settings : ```{"must_include":{"regex":"^RELEASE NOTES:\\s*([Nn][Oo][Nn][Ee]|[Nn]/[Aa]|\\n(\\*|-)\\s*.+)$","regex_flag":"m"}}```
❌ Validator: MILESTONE
- ❌ milestone does not include "Release$"
Input :
Settings :{"must_include":{"regex":"Release$"}}