
4.0 webserver extremely slow on linux #7983

Closed · robaho opened this issue Nov 9, 2023 · 10 comments · Fixed by #7989

robaho commented Nov 9, 2023

Running Helidon 4.0 on Linux shows really slow performance. The exact same code runs as expected on OSX.

The startup shows:

Nov 09, 2023 4:44:04 PM io.helidon.webserver.ServerListener start
INFO: [0x34e21d23] http://127.0.0.1:8888 bound for socket '@default'
Nov 09, 2023 4:44:04 PM io.helidon.common.features.HelidonFeatures features
INFO: Helidon SE 4.0.0 features: [Config, Encoding, Media, WebServer]
Nov 09, 2023 4:44:04 PM io.helidon.webserver.LoomServer startIt
INFO: Started all channels in 60 milliseconds. 289 milliseconds since JVM startup. Java 21+35-LTS-2513

The Linux environment is:

5.15.0-88-generic #98~20.04.1-Ubuntu SMP Mon Oct 9 16:43:45 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

You can open the Helidon test project here

There are instructions on using h2load in the readme.
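For context, a minimal sketch of what the /devnull endpoint exercised below could look like in Helidon SE 4 (hypothetical; the class name, query-parameter access, and handler details are assumptions, not the actual test project code):

import io.helidon.webserver.WebServer;
import io.helidon.webserver.http.ServerRequest;
import io.helidon.webserver.http.ServerResponse;

import java.io.OutputStream;

// Hypothetical sketch: a Helidon SE 4 server on port 8888 that streams `size`
// bytes of zeroes for GET /devnull?size=N. The real test project may differ.
public class DevNullServer {
    public static void main(String[] args) {
        WebServer.builder()
                .port(8888)
                .routing(routing -> routing.get("/devnull", DevNullServer::devNull))
                .build()
                .start();
    }

    static void devNull(ServerRequest req, ServerResponse res) throws Exception {
        // Query-parameter access is an assumption; default to 1 MB like the benchmark.
        int size = Integer.parseInt(req.query().first("size").orElse("1000000"));
        byte[] chunk = new byte[8192];
        try (OutputStream out = res.outputStream()) {
            int remaining = size;
            while (remaining > 0) {
                int n = Math.min(chunk.length, remaining);
                out.write(chunk, 0, n);
                remaining -= n;
            }
        }
    }
}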

The "baseline" for the test is available here.

Running against the enobufs baseline:

rengels:~$ h2load -n100000 -c4 -m10 -t4 --h1 http://localhost:8888/devnull?size=1000000
starting benchmark...
spawning thread #0: 1 total client(s). 25000 total requests
spawning thread #1: 1 total client(s). 25000 total requests
spawning thread #2: 1 total client(s). 25000 total requests
spawning thread #3: 1 total client(s). 25000 total requests
Application protocol: http/1.1
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 15.68s, 6375.89 req/s, 5.94GB/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 93.14GB (100006800000) total, 3.91MB (4100000) headers (space savings 0.00%), 93.13GB (100000000000) data
                     min         max         mean         sd        +/- sd
time for request:     3.26ms     34.84ms      6.24ms      1.22ms    81.98%
time for connect:      125us       201us       149us        34us    75.00%
time to 1st byte:    15.66ms     17.08ms     16.19ms       627us    75.00%
req/s           :    1594.00     1609.09     1601.37        7.05    50.00%
rengels:~$ 

I can't even run the same test against Helidon since it takes too long. Lowering the number of requests from 100k to 1k (the increase appears to be O(N^2)), I get:

rengels:~$ h2load -n1000 -c4 -m10 -t4 --h1 http://localhost:8888/devnull?size=1000000
starting benchmark...
spawning thread #0: 1 total client(s). 250 total requests
spawning thread #1: 1 total client(s). 250 total requests
spawning thread #2: 1 total client(s). 250 total requests
spawning thread #3: 1 total client(s). 250 total requests
Application protocol: http/1.1
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 165.26s, 6.05 req/s, 5.77MB/s
requests: 1000 total, 1000 started, 1000 done, 1000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 1000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 953.81MB (1000146000) total, 108.40KB (111000) headers (space savings 0.00%), 953.67MB (1000000000) data
                     min         max         mean         sd        +/- sd
time for request:     2.47ms       6.73s       6.48s    866.25ms    96.20%
time for connect:      175us       211us       193us        15us    50.00%
time to 1st byte:     1.53ms      1.79ms      1.68ms       120us    75.00%
req/s           :       1.51        1.52        1.51        0.00    75.00%

The code that prompted this investigation is here; it uses a similar design to Helidon, runs the same on Linux and OSX, and achieves the same performance as the baseline.

As I noted at the start, the Helidon code runs as expected when the test is run on OSX.

robaho commented Nov 9, 2023

For reference, here is Helidon running on OSX:

iMac:~ robertengels$ h2load -n100000 -c4 -m10 -t4 --h1 http://localhost:8888/devnull?size=1000000
starting benchmark...
spawning thread #0: 1 total client(s). 25000 total requests
spawning thread #1: 1 total client(s). 25000 total requests
spawning thread #2: 1 total client(s). 25000 total requests
spawning thread #3: 1 total client(s). 25000 total requests
Application protocol: http/1.1
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 67.90s, 1472.79 req/s, 1.37GB/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 93.15GB (100014600000) total, 10.59MB (11100000) headers (space savings 0.00%), 93.13GB (100000000000) data
                     min         max         mean         sd        +/- sd
time for request:    11.62ms    223.87ms     27.14ms     10.14ms    97.27%
time for connect:      160us       189us       177us        12us    75.00%
time to 1st byte:    73.40ms     73.44ms     73.42ms        21us    75.00%
req/s           :     368.20      368.56      368.44        0.16    75.00%

and the baseline running on the same OSX machine:

iMac:~ robertengels$ h2load -n100000 -c4 -m10 -t4 --h1 http://localhost:8888/devnull?size=1000000
starting benchmark...
spawning thread #0: 1 total client(s). 25000 total requests
spawning thread #1: 1 total client(s). 25000 total requests
spawning thread #2: 1 total client(s). 25000 total requests
spawning thread #3: 1 total client(s). 25000 total requests
Application protocol: http/1.1
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 19.07s, 5244.72 req/s, 4.88GB/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 93.14GB (100006800000) total, 3.91MB (4100000) headers (space savings 0.00%), 93.13GB (100000000000) data
                     min         max         mean         sd        +/- sd
time for request:     3.50ms     92.39ms      7.33ms      3.54ms    96.90%
time for connect:      133us       155us       148us        10us    75.00%
time to 1st byte:     3.85ms      4.22ms      4.02ms       161us    50.00%
req/s           :    1311.19     1524.31     1369.00      103.65    75.00%

Note that on OSX the baseline requires setSendBufferSize(256k) in order to not exhaust the network buffers.
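A sketch of that workaround against the plain JDK socket API (not the actual baseline code; the helper name is made up): pin SO_SNDBUF only on OSX and leave Linux to the kernel's auto-tuning.

import java.net.Socket;
import java.net.SocketException;

// Sketch: on OSX, pin SO_SNDBUF to 256 KiB per connection so the network
// buffers are not exhausted; on Linux, leave it unset so the kernel auto-tunes.
final class SendBufferWorkaround {
    static void apply(Socket socket) throws SocketException {
        boolean macOs = System.getProperty("os.name").toLowerCase().contains("mac");
        if (macOs) {
            socket.setSendBufferSize(256 * 1024);
        }
    }
}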

@danielkec (Contributor)

Did you try to run the Linux version with a higher sendBufferSize, as you did on OSX?

@danielkec (Contributor)

I don't see any issue with .connectionOptions(b -> b.socketSendBufferSize(1024 * 1024)):

➜  helidon_test git:(main) ✗ uname -a
Linux defiant 6.5.0-10-generic #10-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 13 13:49:38 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
➜  helidon_test git:(main) ✗ h2load -n100000 -c50 -m10 --h1 http://localhost:8888/devnull\?size\=1000000
starting benchmark...
spawning thread #0: 50 total client(s). 100000 total requests
Application protocol: http/1.1
progress: 10% done
...
progress: 100% done

finished in 17.83s, 5609.77 req/s, 5.23GB/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 93.15GB (100014700000) total, 10.68MB (11200000) headers (space savings 0.00%), 93.13GB (100000000000) data
                     min         max         mean         sd        +/- sd
time for request:     3.70ms      14.92s     62.52ms    575.16ms    99.70%
time for connect:       43us       568us       275us       138us    60.00%
time to 1st byte:    61.47ms      14.89s       6.00s       5.58s    40.00%
req/s           :     112.20      271.49      182.39       68.82    42.00%

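For reference, a sketch of where that option sits on the Helidon 4 SE builder (the routing and handler are placeholders assumed from the test above, not the actual project code):

import io.helidon.webserver.WebServer;

// Sketch: raise the per-connection send buffer to 1 MiB via connectionOptions,
// as quoted in the comment above. The routing/handler below are placeholders.
public class TunedServer {
    public static void main(String[] args) {
        WebServer.builder()
                .port(8888)
                .connectionOptions(b -> b.socketSendBufferSize(1024 * 1024))
                .routing(routing -> routing.get("/devnull", (req, res) -> res.send("ok")))
                .build()
                .start();
    }
}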
danielkec self-assigned this Nov 10, 2023
@danielkec (Contributor)

Can you share the benchmark with robaho httpserver? What is the SO_SNDBUF setting used there?

robaho commented Nov 10, 2023

Sure. You can use the runSimpleFileServer gradle task in github.com/robaho/httpserver.

In the baseline, you can comment out the setSendBufferSize() call - it is only there for OSX, which has a bug. On Linux, if you let the kernel buffer size float/default, there is no issue.

It sounds like Helidon sets a default kernel buffer size of 32k (from loom-dev) - I was not aware of this.

Usually you don't want to set this; you let the OS float it for maximum performance. (But OSX has a bug with this.)

I will add a similar config option to the robaho httpserver and try to sync up.

If Helidon forces a buffer size of 32k as a default, that could easily explain it.
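A small illustration of that point using the plain JDK socket API (a sketch, not Helidon internals): once setSendBufferSize() is called, the kernel stops auto-tuning SO_SNDBUF for that socket, so a small fixed default like 32k caps throughput on the 1 MB responses in this test.

import java.net.Socket;

// Sketch: print the socket's initial SO_SNDBUF (which Linux will grow during
// transfer when left unset), then pin it explicitly; after setSendBufferSize()
// the kernel no longer auto-tunes the buffer for this socket, which is the
// behavior discussed above. Assumes the test server is listening on 8888.
public class SendBufferDemo {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket("localhost", 8888)) {
            System.out.println("initial SO_SNDBUF: " + s.getSendBufferSize());
            s.setSendBufferSize(32 * 1024); // explicit value disables auto-tuning
            System.out.println("pinned SO_SNDBUF:  " + s.getSendBufferSize());
        }
    }
}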

@danielkec (Contributor)

I see we are turning off TCP auto-tuning by setting the JDK default explicitly.

robaho commented Nov 10, 2023

The auto-tuning is very valuable when you have a mixture of request types (typical data loads vs. file uploads/downloads). Otherwise you either get slow performance or you waste a lot of memory on every connection.

robaho commented Nov 10, 2023

It is also interesting that doing this on OSX doesn’t degrade the performance so drastically.

danielkec added a commit to danielkec/helidon that referenced this issue Nov 10, 2023
danielkec added this to the 4.0.1 milestone Nov 10, 2023
robaho commented Nov 11, 2023

Nice quick work. Glad to see it was trivial. Thanks for responding.

@danielkec (Contributor)

Thx for reporting the issue!

m0mus added this to Backlog Aug 12, 2024
m0mus moved this to Closed in Backlog Aug 12, 2024