Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client binding issues #1778

Closed
alexrabi opened this issue Nov 20, 2024 · 19 comments
Closed

Client binding issues #1778

alexrabi opened this issue Nov 20, 2024 · 19 comments

Comments

@alexrabi
Copy link
Collaborator

Sometimes, it is useful to set the initial path used by the client to establish a connection, e.g. to bypass the default routing or to specify the port number used by the client. The function picoquic_set_local_addr exists in the API, but it appears to be a legacy function that is never used anywhere nor does it seem to be covered in any of the tests.

When using MP-QUIC, the use of picoquic_set_local_addr seems inconsistent. On one of my machines, it can be used by the application to successfully select the initial path and the port number used by the client, but it seems to have adverse effects elsewhere.

If the client binds to a specific port number (i.e. a non-zero port number), some funky things seem to be possible with picoquic_probe_new_path_ex. When probing for new paths, it seems to be possible to create a new path using the IP address as the initial path, but with a port number of zero. The result will be two paths using identical IP/Port 4-tuples, since I guess the new path inherits the source port number of the initial path, even if the checks say otherwise. If the initial path is bound to port 0 instead (i.e. using a random port), this is not the case, as I assume that the checks in picoquic_set_local_addr will recognize that the paths use an identical 4-tuple. Specifically setting the port numbers when doing both picoquic_set_local_addr and picoquic_probe_new_path_ex also works in that regard, but that also appears to make any alternative path (i.e. using a different IP-address) fail the path challenge validation.

In any case, path 0 seems to be unable to transmit any data when binding to an address on that machine (using picoquic_set_local_addr), but the "duplicate" path using the same 4-tuple as path 0 can if it exists. I am not sure whether these issues are related or not.

@huitema
Copy link
Collaborator

huitema commented Nov 21, 2024

It is entirely possible to build two paths with the same 4-tuple -- that's explicitly called out in the multipath spec. As long as the path-id in the CID are different, these will be two different paths.

There may indeed be some ambiguity in the code. On the client, the assumption at one point was that the client set the port number to 0, then the stack fills it after getting the address of the socket. But on multipath, the choice of the socket depends on the source port and IP address. There may well be a few loose ends.

The real assumption is that when using multipath the client has several IP addresses. In that case we can use a single socket, and everything works. Using multiple ports and a single IP is pretty much only a test-only scenario...

@alexrabi
Copy link
Collaborator Author

Even if having two paths with the same 4-tuple is allowed in the spec, the way that path probing in picoquic works now seems like the opposite of that, no? For example, setting the source IP and port of a new path specifically to the same as path 0 seems to fail.

What does work is setting the port number of the new path to 0 (given that the initial path was bound to some specific port), which will give new path the same port number as path 0. I would argue that this is confusing since the convention is that a port number of 0 typically means using randomized port number.

@huitema
Copy link
Collaborator

huitema commented Nov 22, 2024

First action is to remove "picoquic_set_local_addr ". This is a bad API, because it assumes that the local address is the address of path 0. That behavior does not make sense with multipath.

What has to work is:

int picoquic_probe_new_path_ex(picoquic_cnx_t* cnx, const struct sockaddr* addr_peer,
    const struct sockaddr* addr_local, int if_index, uint64_t current_time, int to_preferred_address);

The stack supports that, but there is a twist. The stack has an API with the lower layer implementing the socket code. By default, that's the code in sockloop.c. Normally, that code opens just one socket per address family, so all outgoing connection appear to come from the same port. There is a debugging option to open 2 sockets with 2 separate port numbers, in which case the outgoing socket is chosen based on the outgoing port. Can you test that?

@alexrabi
Copy link
Collaborator Author

Are you saying that the functionality to specifically set the address of path 0 should go altogether? While I do think that there are issues with picoquic_set_local_addr indeed, I would still argue that being able to specify the address to used by path 0 before the connection is established can be very useful, especially for latency-sensitive applications. My proposal would be to do something like this:

picoquic_cnx_t* picoquic_create_cnx_ex(picoquic_quic_t* quic,
    picoquic_connection_id_t initial_cnx_id, picoquic_connection_id_t remote_cnx_id,
    const struct sockaddr* addr_to, const struct sockaddr* addr_from,
    uint64_t start_time, uint32_t preferred_version,
    char const* sni, char const* alpn, char client_mode);
picoquic_cnx_t* picoquic_create_client_cnx_ex(picoquic_quic_t* quic,
    struct sockaddr* addr_to, struct sockaddr* addr_from,
    uint64_t start_time, uint32_t preferred_version,
    char const* sni, char const* alpn,
    picoquic_stream_data_cb_fn callback_fn, void* callback_ctx);

I.e., allowing the address to be used by the initial path to be specified when creating the connection context (using the addr_from parameter). Using a NULL value would then make picoquic resort to the current behavior, i.e., relying on the routing table to decide the initial path. Does that make sense?

This debug option that you are mentioning is for picoquicdemo I take it? I have been using my own application (https://git.cs.kau.se/alexrabi/monty), but I can do some debugging using picoquicdemo if that is more useful to you.

@huitema
Copy link
Collaborator

huitema commented Nov 22, 2024

I don't think that making the picoquic_create_cnx more complex is the best solution. I think the pattern out to be:

  • create connection with essential parameters, all other parameters get default value
  • series of get/set calls to set the specific parameters that the application needs
  • start the connection

I will refrain from removing the set local address API. Maybe make it fail if the connection is already started?

@alexrabi
Copy link
Collaborator Author

If you don't want to make picoquic_create_cnx any more complex than it already is, then picoquic_set_local_addr should definitely stay (in some form). Calling it after the connection has already been established makes little sense, as you have already pointed out, so making it fail seems reasonable. At the same time picoquic_set_local_addr (or something like it) is needed if the application wants select which address to use when making the initial connection establishment. The name picoquic_set_local_addr may be a bit misrepresentative in the context of MP-QUIC granted, but that is a minor issue.

The pattern that you are describing is already what I am using for my application. The issue of the inconsistencies in terms of the meaning of port numbers between the picoquic_set_local_addr and picoquic_probe_new_path_ex still remains.

@huitema
Copy link
Collaborator

huitema commented Nov 23, 2024

The picoquic_set_local_addr function will stay. It fails if the address is already set, which is OK.

If you are writing your own application, are you using the default socket code (sockloop.c) or writing your own?

@alexrabi
Copy link
Collaborator Author

I am using the default socket code for my application.

@huitema
Copy link
Collaborator

huitema commented Nov 23, 2024

OK. In that case, you should mimic what picoquicdemo is doing. In the call to...

ret = picoquic_packet_loop_v2(qclient, &param, client_loop_cb, &loop_cb);

... look for the argument "param". It programs what the loop shall do. The key argument here is extra_socket_required:

        param.extra_socket_required = 0;
        if (force_migration == 1 || force_migration == 3 || config->multipath_alt_config != NULL) {
            param.local_port = (uint16_t)picoquic_uniform_random(30000) + 20000;
            param.extra_socket_required = 1;
        }

It instructs the "packet loop" code to create an extra socket. The default socket is bound to the specified local_port, while the extra socket will be bound to port 0, i.e., to a port number randomly chosen by the OS.
Once that port is chosen, the packet loop code will do a packet loop callback:

            ret = loop_callback(quic, picoquic_packet_loop_alt_port, loop_callback_ctx, &alt_port);

If the source address of a packet is set to the alt_port , the packet loop code will send that packet to the alternate socket. For any other value, it will be sent through the default socket, and to the peer it will appear to come from the local_port.

And yes, I understand that this code is a bit too convoluted. Suggestions for refactoring are welcome!

@alexrabi
Copy link
Collaborator Author

Hmm, it looks like combining the local_port and extra_socket_required arguments doesn't work as expected then, at least on my end. If I limit the set arguments to just the local_port, the socket loop uses the socket with the specified port, as it should. If I combine the two arguments then two sockets bound to different ports are indeed created, but it seems like the socket bound to port 0 (i.e. random port) is used as default, and NOT the socket bound to the local_port . That seems to be the opposite of what you are describing should happen?

@huitema
Copy link
Collaborator

huitema commented Nov 25, 2024

Extra socket means two sockets are created, one with the "local port" value (or a random value), the other with the "alt port" value. By default, the first packet goes out with an unspecified source port, which means picking one of the two sockets at random. If you set the local port value, you probably also want to call the set local port API so the first packet goes out with that value.

@alexrabi
Copy link
Collaborator Author

Yes, I can see that two sockets are created, one with the specified port number (e.g. 12345) and another one with a random port number. However, it does not seem like the socket bound to port 12345 is ever used whether I set the port number using the param.local_port argument, the picoquic_set_local_addr API call, or using both of them at the same time. I feel like I must be missing something?

@huitema
Copy link
Collaborator

huitema commented Nov 26, 2024

As I mentioned before, the sockloop code could certainly be improved. The current code kind of guesses the application intent's based on the addresses that it sees used. The guesswork works fine when the app uses only one UDP socket, or when the app behaves exactly as picoquicdemo. The takeway form the discussion so far is that this is not good enough. So, let's focus on what "really good" would be. For example:

  • some applications will not set the local port. The port number in outgoing packets will be zero. In that case, what socket shall the sockloop pick?
  • if an application sets a port number, and if there is a socket with the matching port number, the packet ought to go out on that socket. That's the only way to ensure that the port number in the outgoing packet matches the application's intent. But what shall the sockloop do if there is no such socket?

That last question has at least four possible answers: drop the packet, always pick the first socket, pick a socket at random, or maybe create a new socket on the fly. Each of those behaviors has its pros and cons. If the code drops the packet, shall it provide a warning to the app, and how? Same for picking a socket with a different port, should that look like a NAT rebinding? If the sockloop code creates sockets on the fly, how does is close them?

@huitema
Copy link
Collaborator

huitema commented Nov 26, 2024

Another issue: should the port numbers in the application be in host order or in network order? Is that clear to the application?

@alexrabi
Copy link
Collaborator Author

alexrabi commented Dec 2, 2024

I believe that most applications will not set the local port as they will likely have no need for it. Wanting to set the port numbers on the client should probably be considered a "special" case. Creating multiple sockets with different port numbers is also likely a very special case. I am only really considering these options because I am currently doing quite a lot of unorthodox stuff that is likely far from the average use-case.

For the first point, generally speaking, if the port number is not selected by the user, or specifically set to zero, the user probably "does not care" which socket is used. Perhaps the current behavior would be fine in that case.

For the second point and third points, I guess that depends on "how complete" you aim picoquic to be as a solution. Currently it has a mix between high-level APIs (i.e. socket loop) and low-level APIs. From my understanding the original intention was that picoquic would be more of a lower-level thing, but has drifted more towards higher-level APIs over time. My guess is that most users would tend to gravitate to the high-level APIs for the sake of simplicity, but ultimately the deciding factor on what to focus on should be the overall vision for what picoquic should be.

If the goal is to provide more high-level APIs, then I would say that it would definitely be very nice if the socket loop could create and close sockets on the fly. Doing so would probably require that the socket loop code is extended to a provide a more complete set of API functions (e.g. a separate function to specifically tell the socket loop which socket or port number to use for the initial connection setup, and perhaps another API to potentially create new sockets whenever the user probes for new paths). Tying the paths to the sockets could then be used for cleanup (e.g. if all paths tied to a specific socket has been closed, then the socket may safely be closed). Of course, implementing all of this would likely require a lot of work.

As for the network byte order issue, I would say that this is not at all clear. For a "low-level" API, it would make sense if the user needs to translate the port number into the correct byte order. For a higher level API, then I would expect picoquic would do this for me. Both options are sensible, but with the current mix of high and low level APIs it is really unclear.

@huitema
Copy link
Collaborator

huitema commented Dec 2, 2024

Yes, the initial intent was to focus on the protocol mechanisms, with the low level API connection to network code that could be basic socket code, some advanced stuff, or a network simulator. The "socket loop" was merely for demonstration and example. But then more and more user wanted to just keep it and use it, hence the drift. I guess we have to live with that and beat it in shape. And find a way to properly test it -- test coverage is just 76.9% lines, 64.5% branches. But it is quite hard to write automated tests that work on Windows, Linux and Apple.

Let's file a bug specifically on the mix of host and network byte order.

Then, let's update the socket loop code to enforce determinism, "if no preference pick the first socket, if port is specified pick the matching socket, if no matching socket yell."

Socket on demand will have to wait, unless someone volunteers a PR.

@alexrabi
Copy link
Collaborator Author

alexrabi commented Dec 3, 2024

Sounds like good ideas to me! Should I close this issue and create new ones for each of the three items (i.e. byte order, socket loop, and sockets on demand) separately?

@huitema
Copy link
Collaborator

huitema commented Dec 5, 2024

Yes, please.

@alexrabi
Copy link
Collaborator Author

alexrabi commented Dec 6, 2024

Alright, these issues have been created #1802 #1803 #1804, and I'm closing this.

@alexrabi alexrabi closed this as completed Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants