Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(proxy-wasm) invoke 'on_http_call_response' on dispatch failures #625

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

thibaultcha
Copy link
Member

@thibaultcha thibaultcha commented Nov 15, 2024

Invoke on_http_call_response on dispatch connection failures. This allows catching failures such as:

  • timeout
  • broken connection
  • resolver failures
  • TLS handshake failures
  • Any other failures after the connection has attempted to be established

Dispatch failures occurring before a connection has attempted to be established will not trigger on_http_call_response as they are already supposed to trigger a Wasm exception (i.e. trap).

When a dispatch connection failure occurs, on_http_call_response is invoked with: on_http_call_response(call_id, 0, 0, 0) (i.e. no header, no body, no trailers). Calls to retrieve the response :status will return None. A user may triage the connection failure by querying the :dispatch_status pseudo-header:

fn on_http_call_response(
    &mut self,
    token_id: u32,
    nheaders: usize,
    body_size: usize,
    ntrailers: usize,
) {
    let dispatch_status = self.get_http_call_response_header(":dispatch_status");

    match dispatch_status.as_deref() {
        Some("timeout") => {},
        Some("broken connection") => {},
        Some("tls handshake failure") => {},
        Some("resolver failure") => {},
        Some("reader failure") => {},
        Some(s) => info!("dispatch_status: {}", s),
        None => {}
    }

    self.resume_http_request()
}

The socket status "bad argument" also exists but since the connection has not been attempted yet, it won't show up during on_http_call_response and is for internal use only (i.e. could be added to the socket error log for example).

Fix #622.


TODO

  • [ ] contribute proxy_get_status() for HTTP dispatch calls to proxy-wasm-rust-sdk
  • a directive to not log socket errors
  • document connection error handling & Envoy disparities in our docs

Copy link

codecov bot commented Nov 15, 2024

Codecov Report

Attention: Patch coverage is 88.59649% with 13 lines in your changes missing coverage. Please review.

Project coverage is 90.83481%. Comparing base (ccc56a2) to head (a3d0e43).

Files with missing lines Patch % Lines
src/common/ngx_wasm_socket_tcp.c 69.56522% 7 Missing ⚠️
src/common/ngx_wasm_socket_tcp_readers.c 83.33333% 4 Missing ⚠️
src/common/proxy_wasm/ngx_proxy_wasm_maps.c 93.75000% 1 Missing ⚠️
src/http/proxy_wasm/ngx_http_proxy_wasm_dispatch.c 96.29630% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##                main        #625         +/-   ##
===================================================
+ Coverage   90.69893%   90.83481%   +0.13588%     
===================================================
  Files             52          52                 
  Lines          11117       11260        +143     
===================================================
+ Hits           10083       10228        +145     
+ Misses          1034        1032          -2     
Files with missing lines Coverage Δ
src/common/proxy_wasm/ngx_proxy_wasm.c 92.67760% <100.00000%> (+0.10929%) ⬆️
src/http/ngx_http_wasm_module.c 93.58974% <100.00000%> (+1.03853%) ⬆️
src/http/proxy_wasm/ngx_http_proxy_wasm.c 93.87755% <100.00000%> (+0.12754%) ⬆️
src/wasm/ngx_wasm_core_module.c 94.48819% <100.00000%> (+0.04375%) ⬆️
src/common/proxy_wasm/ngx_proxy_wasm_maps.c 93.76623% <93.75000%> (+0.21783%) ⬆️
src/http/proxy_wasm/ngx_http_proxy_wasm_dispatch.c 91.42857% <96.29630%> (-0.25848%) ⬇️
src/common/ngx_wasm_socket_tcp_readers.c 87.25100% <83.33333%> (-0.41420%) ⬇️
src/common/ngx_wasm_socket_tcp.c 89.77591% <69.56522%> (+0.29504%) ⬆️

... and 6 files with indirect coverage changes

Flag Coverage Δ
unit 90.59621% <87.37864%> (+0.01310%) ⬆️
valgrind 82.43045% <84.31373%> (+1.30292%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

---- 🚨 Try these New Features:

@thibaultcha thibaultcha force-pushed the feat/catch-dispatch-failures branch 24 times, most recently from 182bb79 to eb13e45 Compare November 20, 2024 02:05
@hishamhm
Copy link
Collaborator

    let dispatch_status = self.get_http_call_response_header(":dispatch_status");

I assume that this is in an interim solution until we can get something like get_http_call_status (which would call the proxy_get_status hostcall in HttpContext) added to the proxy-wasm-rust-sdk, right? I can work with that.

@thibaultcha
Copy link
Member Author

I assume that this is in an interim solution until we can get something like get_http_call_status (which would call the proxy_get_status hostcall in HttpContext) added to the proxy-wasm-rust-sdk, right? I can work with that.

Yes, and I looked into this and it was unclear how to contribute this at first sight, so I logged it for later. It looks to me like get_http_call_status could cause a spec breaking change or at least it could be a host breaking change for existing implementations. At the moment, proxy_get_status (the hostcall) is only used in ctx.get_grpc_status and it worries me that it only receives return pointers, as the host must do some trickery to know where it is being invoked from ("are we executing http callback or grpc callback?"). The problem is that both gRPC and HTTP callbacks can belong to the HTTP filter context trait (HttpContext), so a host must restrict usage to within the callback and be able to distinguish which type of callback is currently running. The ideal would even be to pass it the HTTP/gRPC token_id argument so it could be queried from anywhere and not just the callback. That said for now it looks to me like Envoy (or proxy-wasm-cpp-host) is NYI on this hostcall.

@thibaultcha thibaultcha force-pushed the feat/catch-dispatch-failures branch 3 times, most recently from 8ed89d9 to 7880974 Compare November 22, 2024 03:06
}
}
--- response_body_like
:dispatch_status: timeout
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hishamhm This version will support get_http_headers and return the :dispatch_status pseudo-header in the table

Invoke `on_http_call_response` on dispatch connection failures. This
allows catching failures such as:

- timeout
- broken connection
- resolver failures
- TLS handshake failures
- Any other failures *after the connection has attempted to be
  established*

Dispatch failures occurring *before* a connection has attempted to be
established will not trigger `on_http_call_response` as they are already
supposed to trigger a Wasm exception (i.e. trap).

When a dispatch connection failure occurs, `on_http_call_response` is
invoked with: `on_http_call_response(call_id, 0, 0, 0)` (i.e. no header,
no body, no trailers). Calls to retrieve the response `:status` will
return `None`. A user may triage the connection failure by querying the
`:dispatch_status` pseudo-header:

```rust
fn on_http_call_response(
    &mut self,
    token_id: u32,
    nheaders: usize,
    body_size: usize,
    ntrailers: usize,
) {
    let dispatch_status = self.get_http_call_response_header(":dispatch_status");

    match dispatch_status.as_deref() {
        Some("timeout") => {},
        Some("broken connection") => {},
        Some("tls handshake failure") => {},
        Some("resolver failure") => {},
        Some("reader failure") => {},
        Some(s) => info!("dispatch_status: {}", s),
        None => {}
    }

    self.resume_http_request()
}
```

The socket status `"bad argument"` also exists but since the connection
has not been attempted yet, it won't show up during
`on_http_call_response` and is for internal use only (i.e. could be
added to the socket error log for example).

Fix #622.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to handle dispatch_http_call/other timeout
2 participants