Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support proxying client requests through direct and tunneling HTTP/HTTPS proxies #1080

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

MisterDA
Copy link
Contributor

@MisterDA MisterDA commented Jul 23, 2024

This PR implements a new Cohttp_lwt_unix.Connection_proxy module satisfying Cohttp_lwt.S.Connection_cache, allowing client HTTP/HTTPS requests to be routed through proxies.

I recommend a quick glance at Everything curl - 9.8 Proxies for a nice introduction to proxies. As curl is well established, I try following the same terminology and support similar features.

  • HTTP proxy

    An HTTP proxy is a proxy that the client speaks HTTP with to get the transfer done.

    If the remote server is reachable through HTTP, then the client should send HTTP requests where targets are in absolute form (see RFC 9112).

    When making a request to a proxy, other than a CONNECT or server-wide OPTIONS request (as detailed below), a client MUST send the target URI in "absolute-form" as the request-target.

    GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1

    The first step is to thread an ~absolute_form parameter when building requests. This way, the Cohttp_lwt_unix.Connection_cache module can simply be extended with an optional ?proxy parameter, taking the URI of the proxy. If a proxy is specified, all requests will be send in the absolute form, and instead of connecting to the remote server, the client will transparently connect to the proxy. Conduit handles the connection to an HTTPS proxy automatically.

  • HTTPS with HTTP proxy

    If the remote server uses HTTPS, then a tunneling proxy needs to be used. The CONNECT HTTP requests asks the proxy to open a connection to the remote server. If it succeeds, the proxy then blindly forwards data between the client and the remote server. Conduit needs to be modified to handle an HTTPS connection (more specifically, a TLS handshake and encryption layer) on top of an already opened connection. If the proxy itself is reachable through HTTPS, conduit needs to be able to tunnel a TLS connection on top of another TLS connection.

    To handle tunneled client connections, we provide the (private) Cohttp_lwt.Connection_tunnel, which takes the proxy URI when created. It then maintains a cache of opened connections via the proxy to each remote server, and is able to distribute new requests amongst currently opened connections (the code is mostly copied from Cohttp_lwt.Connection_cache balancing).

    We expose the Cohttp_lwt.Connection_proxy via Cohttp_lwt_unix.Connection_proxy which, based on the remote URI, selects either a direct (absolute-form queries) or tunneled connection.

  • MITM proxy

    [MITM proxies] require users to install a custom "trust root" (Certificate Authority (CA) certificate) in the client, and then the proxy terminates all TLS traffic from the client, impersonates the remote server and acts like a proxy.

    This can be achieved by installing a new certificate system-wide, or passing it to the ca-certs library via external means (environment variable, parameter at startup, ...). See Facilitate the use of extra CA certificates ca-certs#30.

  • Proxy environment variables

    We show how to support ALL_PROXY, NO_PROXY, and <scheme>_proxy (such as http_proxy, https_proxy), as documented by the curl project.

This PR relies on tunneling the TLS-on-TLS feature being added to ocaml-tls and Conduit. We've added some opam pin-depends to allow it to build with the latest dependencies.

@art-w art-w force-pushed the connection-cache-proxy branch 5 times, most recently from 0add310 to ba4a191 Compare August 30, 2024 08:28
@rgrinberg
Copy link
Member

I don't think you should modify the Http.Request.t type. The point of this type is to stay minimal and not reflect any application specific concerns.

@MisterDA
Copy link
Contributor Author

MisterDA commented Sep 3, 2024

I don't think you should modify the Http.Request.t type. The point of this type is to stay minimal and not reflect any application specific concerns.

We're only adding a boolean, absolute_form, which I'd argue is a property of the request. If we don't add it, write_header in Cohttp.Request.Make has no way of knowing whether the request needs to be formatted in absolute form.

@rgrinberg
Copy link
Member

rgrinberg commented Sep 3, 2024

Whether it's just a boolean or not is besides the point. You're changing the semantics of comparison, pretty printing, etc for a large group of users who aren't gaining anything from this feature.

Some alternatives you could try:

  1. Construct your resource field in absolute form
  2. Pass the absolute form as a parameter to the header encoder
  3. Introduce your own request type that includes this flag

EDIT: after glancing at the spec, seems like 1. is the way to go. the resource field should already handle requests in "absolute form"

@MisterDA
Copy link
Contributor Author

MisterDA commented Sep 3, 2024

EDIT: after glancing at the spec, seems like 1. is the way to go. the resource field should already handle requests in "absolute form"

I did not follow this option, because the resource field would then need to duplicate the scheme and the host, which are already stored in the request record. I can try this and see what happens.

@rgrinberg
Copy link
Member

Good point, doesn't that mean we can get rid of scheme from the request as well? @anuragsoni why did we add scheme?

@anuragsoni
Copy link
Contributor

why did we add scheme?

From what I remember we kept it around mostly to maintain compatibility with the existing release of cohttp.

@rgrinberg
Copy link
Member

@MisterDA do you want to take the honors of getting rid of it?

let* ans = Cohttp_lwt_unix.Client.get ?headers uri in
follow_redirect ~max_redirects ?headers uri ans

and follow_redirect ~max_redirects ?headers request_uri (response, body) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't need to be done for this PR but that would be nice to have a similar function in the client API (so that every user do not have to re-implement an half-broken version of it)

@MisterDA
Copy link
Contributor Author

MisterDA commented Sep 3, 2024

@MisterDA do you want to take the honors of getting rid of it?

It doesn't seem possible to remove the scheme field of the request, as Cohttp.Request.uri needs to be able to reconstruct the full URI from a request, as exhibited by the test:

let uri_round_trip _ =
let expected_uri = Uri.of_string "https://www.example.com/test" in
let actual_uri = Request.make expected_uri |> Request.uri in
Alcotest.check uri_testable "Request.make uri round-trip" actual_uri
expected_uri

we might want to remove this feature altogether. This mostly impacts logging and testing, but also the Cohttp_lwt.Client.callv function (that is deprecated).

@art-w art-w force-pushed the connection-cache-proxy branch from ba4a191 to 02a78e3 Compare September 4, 2024 08:57
@rgrinberg
Copy link
Member

Indeed it's not worth it. I got rid of it here #1086

@MisterDA MisterDA force-pushed the connection-cache-proxy branch from 02a78e3 to bc8f3a4 Compare September 10, 2024 15:43
@MisterDA MisterDA marked this pull request as draft September 10, 2024 15:44
http/test/test_parser.ml Outdated Show resolved Hide resolved
@MisterDA MisterDA force-pushed the connection-cache-proxy branch from bc8f3a4 to 473456b Compare September 19, 2024 16:25
@MisterDA MisterDA force-pushed the connection-cache-proxy branch 3 times, most recently from 5fbbee0 to 40af14b Compare October 7, 2024 17:42
@ajbt200128
Copy link

Hey wondering if there was any work left here besides the merge conflicts?

@samoht
Copy link
Member

samoht commented Dec 5, 2024

LGTM - happy to merge this once rebased

@art-w art-w force-pushed the connection-cache-proxy branch from 40af14b to e272348 Compare December 5, 2024 11:34
@rgrinberg
Copy link
Member

I was waiting to do another round of review after this PR went past the "draft" stage

@samoht
Copy link
Member

samoht commented Dec 5, 2024

@rgrinberg - ha ok! feel free to dismiss my comment and make a proper round of review then :-)

I'm still interested in moving the helper functions to manage redirects into the lib, but I am not sure where exactly.

@art-w art-w force-pushed the connection-cache-proxy branch from c35d0c5 to c877c41 Compare December 5, 2024 12:23
@ajbt200128
Copy link

@samoht I think @rgrinberg is waiting for this PR to no longer a draft. Is there anything I can help do to get this out of the draft stage?

@MisterDA MisterDA marked this pull request as ready for review January 7, 2025 22:42
MisterDA and others added 9 commits January 9, 2025 09:32
> When making a request to a proxy, other than a CONNECT or
> server-wide OPTIONS request (as detailed below), a client MUST send
> the target URI in "absolute-form" as the request-target.

https://www.rfc-editor.org/rfc/rfc9112#name-absolute-form

See https://www.rfc-editor.org/rfc/rfc3986#appendix-A for the ABNF of
absolute-URI.

Signed-off-by: Antonin Décimo <[email protected]>
To ensure end-to-end security it's possible to use a tunneling
proxy. The proxy in the middle blindly forwards data.

This is needed if the remote server is available via HTTPS. First, a
CONNECT request is made to the proxy with the remote server as
target. If it succeeds, a new connection can be made to the remote
server, tunneled via the connection made to the proxy.

See also RFC 9110 § 9.3.6. CONNECT.
https://www.rfc-editor.org/rfc/rfc9110#name-connect

We consider that it's a sane default to always tunnel connections to
HTTPS remote server. We provide the Connection_proxy module that
automatically opens connections to a direct proxy or a tunneling
proxy, based on the remote sheme used.

We show how to respect curl's [scheme]_proxy, ALL_PROXY, and NO_PROXY
environment variables.
https://curl.se/libcurl/c/libcurl-env.html

Signed-off-by: Antonin Décimo <[email protected]>
@art-w art-w force-pushed the connection-cache-proxy branch from 811f790 to f0ff485 Compare January 9, 2025 08:37
@cmellazchy
Copy link

Appreciate all the effort put into this. These changes are really valuable to us as well. So, I just wanted to check in on the status. Thanks again!

@samoht
Copy link
Member

samoht commented Jan 23, 2025

@MisterDA @art-w - do you think you could fix the CI errors? I'm happy to merge and release it now that @rgrinberg has approved it.

@MisterDA
Copy link
Contributor Author

The CI errors will be fixed when core_unix 0.18 is released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants