
Very slow over SSH port forwarding #53

Open
TobiasKarnat opened this issue Mar 29, 2019 · 11 comments

Comments

@TobiasKarnat

TobiasKarnat commented Mar 29, 2019

Throughput is very slow (~9 MiB/s) with SSH port forwarding; CPU utilization of diod+ssh is only ~35%.
Can I do anything to optimize the speed? With scp I can get ~120 MiB/s (AES-NI on both sides).

My idea was that it could be caused by the large default MTU (65536) of the loopback interface on the server side, so I used iptables to rewrite the MSS value of the initial SYN packet:
iptables -A OUTPUT -o lo -p tcp --dport 564 --syn -j TCPMSS --set-mss 1412
It does rewrite the initial MSS, but doesn't improve performance.

Keeping the Nagle algorithm enabled doesn't help either; it decreases performance to ~2 MiB/s:
https://unix.stackexchange.com/questions/434825/ssh-speed-greatly-improved-via-proxycommand-but-why
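
For reference, the ProxyCommand variant discussed in that post would look roughly like this for my setup (untested here; assumes nc is installed, and makes ssh talk to the network through a pipe instead of its own socket):
ssh -o ProxyCommand='nc %h %p' -L 1564:localhost:564 -NT -o StrictHostKeyChecking=accept-new servername &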

Ubuntu 18.04.2 with HWE-Kernel 4.18

Client Configuration:
ssh -L 1564:localhost:564 -NT -o StrictHostKeyChecking=accept-new servername &
mount -t 9p -n -o aname=/,access=client,cache=loose,msize=65536,port=1564 127.0.0.1 /mnt

Server Configuration:
/etc/diod.conf
listen = { "127.0.0.1:564" }
auth_required = 0
exports = { "ctl", "/", "/var" }
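
One way to check whether the tunnel itself is the bottleneck would be to measure raw throughput through it, independent of 9P, e.g. with iperf3 (a sketch; assumes iperf3 is installed on both sides and uses its default port 5201):
# on the server
iperf3 -s
# on the client, through a second forwarded port
ssh -L 5201:localhost:5201 -NT servername &
iperf3 -c 127.0.0.1 -p 5201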

@mia-0
Contributor

mia-0 commented Mar 31, 2019

You might want to try using WireGuard instead, if your setup allows it. It will drop packets that are not authenticated for their address on the VPN. This also works better for more permanent setups, and has higher tolerance for unstable networks.
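
For illustration, a minimal client-side sketch (keys, addresses, and the listen port are placeholders, and diod would also need to listen on the tunnel address):
# /etc/wireguard/wg0.conf on the client
[Interface]
PrivateKey = <client-private-key>
Address = 10.10.0.2/24

[Peer]
PublicKey = <server-public-key>
Endpoint = servername:51820
AllowedIPs = 10.10.0.1/32

# then bring it up and mount 9p against the tunnel address:
wg-quick up wg0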

@mia-0
Contributor

mia-0 commented Mar 31, 2019

FWIW, I am getting very slow speeds when caching is enabled, too. And without it, throughput isn’t great either: Roughly 50-60 MiB/s instead of the usual 100, lower with smaller msize. I assume the protocol is very latency-sensitive then?
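
One way to probe that latency sensitivity would be to repeat a sequential read at different msize values (a sketch; mount options mirror the ones above, the test path is illustrative, and the server may cap msize):
for m in 16384 65536 262144; do
    mount -t 9p -n -o aname=/,access=client,msize=$m,port=1564 127.0.0.1 /mnt
    dd if=/mnt/tmp/largefile of=/dev/null bs=1M count=100
    umount /mnt
done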

@TobiasKarnat
Author

The network is not unstable, nor does it have high latency (our two data centers are only 150 meters apart).
It's interesting what causes the slowness: the CPU is not maxed out, so something else must be the cause.

WireGuard or any other VPN is too complex and permanent for my setup.
For my more than 100 Linux VMs, using WireGuard would mean giving each an additional address and firewall rule. I just need to mount filesystems remotely from time to time to copy or compare files.

I chose 9P because it is small, doesn't require dependencies like rpcbind, and only needs one port.


@mia-0
Contributor

mia-0 commented Mar 31, 2019 via email

@garlick
Member

garlick commented Mar 31, 2019

Just curious - did the slow test perform OK when running without the loopback tunnel?

If you're sharing root file systems, my guess is 9P is a bad fit. And the observation that you can only get away with it at really low latency is probably spot on!

Not sure if it helps in your particular case, but on our big Linux clusters we shared root file systems with NFS for many years, briefly experimented with 9P (far too slow!), and more recently settled on sharing one read-only ext4 image to many clients with iSCSI, then layering a RAM-based block device (like zram) on top at the client's block layer to make it appear writable. Nothing beats networking at the block layer for this type of workload.
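
For illustration, a rough sketch of that layering (device names and sizes are hypothetical; a non-persistent dm-snapshot serves as the writable RAM layer):
# /dev/sdX is the read-only iSCSI LUN attached on the client
modprobe zram num_devices=1
echo 1G > /sys/block/zram0/disksize
# overlay writes onto the zram device; N = non-persistent, 8-sector chunks
echo "0 $(blockdev --getsz /dev/sdX) snapshot /dev/sdX /dev/zram0 N 8" | dmsetup create root_rw
mount /dev/mapper/root_rw /mnt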

@andrewnazarov

Out of curiosity, @garlick, have you got a write-up about your experiments and stuff?

@garlick
Member

garlick commented Apr 4, 2019

Ah right, I never restored the google code wiki stuff here. Following the artifacts in #20, this may be of interest:

https://github.com/JamesB192/diod/wiki/Performance

@TobiasKarnat
Author

Without the tunnel it's faster by a factor of 10.
I find it interesting that ssh can slow it down that much.

There are some Linux patches (not stable yet) for async rpc:
https://marc.info/?l=v9fs-developer&m=154453214509159&w=2
https://marc.info/?l=v9fs-developer&m=154453214709161&w=2
https://marc.info/?l=v9fs-developer&m=154453214609160&w=2

Would these help and would changes to diod be necessary?


@garlick
Member

garlick commented Apr 4, 2019

Finally put up those wiki pages here:

https://github.com/chaos/diod/wiki

@garlick
Member

garlick commented Apr 4, 2019

There are some Linux patches (not stable yet) for async rpc:

I don't know offhand if those patches would be likely to help; I would need to refresh my memory of the kernel code before I could grok what is going on there. The kernel certainly didn't block waiting for RPC responses before, so it must be about not giving up the CPU when there is more work to do?

Since 9P semantics already support overlapped RPCs I don't see how this could affect the diod code.

How are you measuring performance? E.g. what workload are you putting on the file system?

@TobiasKarnat
Author

Found the solution: -o cache=loose caused the slowdown (not the ssh session).
Without it I get 60 MiB/s on average, with peaks of 90 MiB/s, over ssh.

For the benchmark I just mounted the remote share and ran dd:
watch dd if=/mnt/tmp/largefile of=/dev/null bs=1M count=20

I lose the filesystem cache by omitting the option.
Is the slowdown with cache=loose a bug or expected behavior?
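
For comparison, the two mount variants side by side (same tunnel as above):
# slow over the tunnel: cache=loose
mount -t 9p -n -o aname=/,access=client,cache=loose,msize=65536,port=1564 127.0.0.1 /mnt
# fast: no cache option
mount -t 9p -n -o aname=/,access=client,msize=65536,port=1564 127.0.0.1 /mnt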
