
Build with PGO #5588

Open
bt90 opened this issue Jun 22, 2023 · 12 comments
Labels
help wanted 🆘 Extra attention is needed optimization 📉 Performance or cost improvements

Comments

@bt90
Contributor

bt90 commented Jun 22, 2023

Go 1.21 will ship with PGO support enabled by default. Maybe we can squeeze a little extra performance out of this.

https://go.dev/doc/pgo
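For reference, a minimal sketch of what a PGO build looks like with Go 1.21 (the endpoint, duration, and paths here are illustrative):

```shell
# Collect a CPU profile from a running instance that exposes net/http/pprof:
curl -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Go 1.21 defaults to -pgo=auto: a file named default.pgo in the main
# package directory is picked up automatically at build time.
cp cpu.pprof ./cmd/caddy/default.pgo
go build ./cmd/caddy

# Alternatively, point the build at a profile explicitly:
go build -pgo=cpu.pprof ./cmd/caddy
```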

@francislavoie francislavoie added help wanted 🆘 Extra attention is needed optimization 📉 Performance or cost improvements labels Jun 22, 2023
@francislavoie
Member

I'm not sure PGO will be a good fit for Caddy. A general purpose webserver that's user-configured can be used in infinite ways, so there's no one profile that would be the best fit.

I think it's unlikely that Matt or I will spend time on this, but contributions are welcome.

@mohammed90
Member

I looked into that and thought about it. The optimization depends on the profiles of the production load, analyzing the executed paths, and optimizing the machine code based on known historic workload. Wouldn't this be different for every user? For instance, my own deployment doesn't use any of the FastCGI features, so any optimization based on profiles of my production deployment will not optimize FastCGI aspects. Different users utilize different parts of Caddy, so their preferred optimizations will be different.

What do you think?

@bt90
Contributor Author

bt90 commented Jun 22, 2023

It's unlikely that we can cover all use cases, but that's also not the point of PGO. Detecting and optimizing shared hot code paths would be good enough.

If I understood it correctly, we can also merge profiles. So we could generate two or three based on frequently used workloads:

  • basic webserver
  • reverse proxy
  • etc
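The merging mentioned above can be done with `go tool pprof`. A sketch, with hypothetical per-workload profile names:

```shell
# Collect one profile per representative workload, e.g.:
#   fileserver.pprof - basic web server load
#   revproxy.pprof   - reverse proxy load

# Merge them into a single profile for the build:
go tool pprof -proto fileserver.pprof revproxy.pprof > default.pgo
```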

@bt90
Contributor Author

bt90 commented Jun 22, 2023

Is PGO limited to our code, or is it also applied to the dependencies we're using? The benefit would be a lot greater if it also applied to the code of e.g. quic-go.

Edit: it's pointed out in the FAQ:

PGO in Go applies to the entire program. All packages are rebuilt to consider potential profile-guided optimizations, including standard library packages [...], including packages in dependencies

@mholt
Member

mholt commented Jun 22, 2023

The little bit I've read about PGO (as of this morning 😅) is that it shouldn't slow down a program, but can offer nominal performance improvements in hot paths with a slightly larger binary size and slightly longer compile times.

I agree with @bt90, maybe we generate profiles that utilize primarily:

  • reverse_proxy
  • file_server
  • host, path, and header matchers (maybe expr and regex too, since those are known to be slower but are somewhat common)
  • Caddyfile-generated JSON config

Of course, because we don't have telemetry (:cry:) we have no idea what the popular configurations are, so we can only guess. (Thank you, unnecessary community backlash of 2018, for leaving us in the dark.)

I'd definitely be open to trying this after releasing 2.7.

@Forza-tng
Contributor

Perhaps we can have an option to turn on profiling with xcaddy; then the user can run their workloads for a bit and run xcaddy again with the profile as input? At least, this is how I do it with GCC PGO builds.

@mholt
Member

mholt commented Sep 30, 2023

Profiles can be obtained from any Caddy instance, for years now -- just go to :2019/debug/pprof to see the profile options.
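For example, a CPU profile can be pulled from the admin endpoint like this (address and duration are illustrative):

```shell
# List the available pprof endpoints:
curl "http://localhost:2019/debug/pprof/"

# Fetch a 60-second CPU profile suitable for PGO:
curl -o cpu.pprof "http://localhost:2019/debug/pprof/profile?seconds=60"
```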

I actually collected a profile this week from our Caddy website, deployed a PGO-optimized instance of Caddy, and noticed barely any speedup... quite insignificant (maybe 2-4%, depending on the run of the load test).

Maybe that's significant enough to warrant it, and maybe our profile didn't have enough data (I ran it for an hour, but it's not a very busy site compared to big enterprise services).

@Forza-tng
Contributor

Forza-tng commented Oct 1, 2023

I had a go at a simple test, benchmarking with h2load against a Caddy file server. I ran each test 3 times, then restarted Caddy and ran it 3 times again. The variation was very small, which makes the results more trustworthy.

h2load -n1000 -c10 -m10 "https://mirrors.tnonline.net/"

Result without PGO:

finished in 1.12s, 888.94 req/s, 16.92MB/s
requests: 1000 total, 1000 started, 1000 done, 1000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 1000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 19.03MB (19953887) total, 8.08KB (8277) headers (space savings 95.57%), 18.99MB (19909000) data
                     min         max         mean         sd        +/- sd
time for request:     1.65ms    576.89ms     98.96ms    113.61ms    84.80%
time for connect:     5.99ms     21.65ms     14.14ms      5.27ms    60.00%
time to 1st byte:    33.93ms    156.91ms     58.03ms     36.55ms    90.00%
req/s           :      88.96      123.50       97.79       10.42    90.00%

Result with PGO:

finished in 919.88ms, 1087.10 req/s, 20.69MB/s
requests: 1000 total, 1000 started, 1000 done, 1000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 1000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 19.03MB (19953864) total, 8.06KB (8254) headers (space savings 95.59%), 18.99MB (19909000) data
                     min         max         mean         sd        +/- sd
time for request:     1.83ms    550.37ms     77.79ms    103.68ms    84.20%
time for connect:     4.64ms     23.81ms     14.22ms      6.52ms    60.00%
time to 1st byte:    32.42ms    182.82ms     91.29ms     60.17ms    70.00%
req/s           :     108.84      137.31      116.76        9.73    80.00%

This is a 22% increase in handled requests per second. Not bad IMHO.

Profile was collected with go tool pprof "http://127.0.0.1:2019/debug/pprof/profile?seconds=600" while I was browsing the server from my phone, including other domains running MediaWiki and Nextcloud via php-fpm. Basically a normal usage pattern for this server.

The build script I use is:

#!/bin/sh
export XCADDY_SETCAP=1
export GOARCH="amd64"
export GOAMD64="v3"
export CGO_ENABLED=1
export GOFLAGS="-pgo=/usr/src/caddy/default.pgo"
/root/go/bin/xcaddy build \
  --with github.com/caddyserver/caddy/v2=/usr/src/caddy/git/caddy \
  --with github.com/ueffel/caddy-brotli \
  --with github.com/caddyserver/transform-encoder \
  --with github.com/caddyserver/cache-handler \
  --with github.com/kirsch33/realip \
  --with github.com/git001/caddyv2-upload
strip -s -v caddy
setcap cap_net_bind_service=+ep ./caddy

Graph (PGO): (image attachment)

Graph (no PGO): (image attachment)

@Forza-tng
Contributor

I'm not sure PGO will be a good fit for Caddy. A general purpose webserver that's user-configured can be used in infinite ways, so there's no one profile that would be the best fit.

I think it's unlikely that Matt or I will spend time on this, but contributions are welcome.

I agree. PGO can be highly dependent on the use case and the host hardware configuration.

It may be better to include PGO support as an option in xcaddy, e.g. xcaddy --profile=/path/to.pprof. In addition, we can document how to gather several short samples over time from the running Caddy instance, how to merge them, and how to feed the result to xcaddy.

Quoting from the Go PGO page below:

A more robust strategy is collecting multiple profiles at different times from different instances to limit the impact of differences between individual instance profiles. Multiple profiles may then be merged into a single profile for use with PGO.

Many organizations run “continuous profiling” services that perform this kind of fleet-wide sampling profiling automatically, which could then be used as a source of profiles for PGO.

go tool pprof -proto sample1.pprof sample2.pprof > merged.pprof

https://go.dev/doc/pgo
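Putting that together, the documented workflow might look like this (the xcaddy --profile flag is hypothetical at this point; only the merge command comes from the Go docs):

```shell
# 1. Gather several short samples from the running instance over time:
curl -o sample1.pprof "http://localhost:2019/debug/pprof/profile?seconds=120"
# ...repeat later, under different load...
curl -o sample2.pprof "http://localhost:2019/debug/pprof/profile?seconds=120"

# 2. Merge the samples into one profile (per the Go PGO docs):
go tool pprof -proto sample1.pprof sample2.pprof > merged.pprof

# 3. Feed the merged profile to the build (flag name is illustrative):
xcaddy build --profile=merged.pprof
```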

@mholt
Member

mholt commented Oct 2, 2023

I know @WeidiDeng has merged some Caddy profiles successfully for pgo.

Maybe I should ask the community to submit their profiles and we'll try merging them and see if that helps. Seeing your results above is encouraging so maybe we just need a variety.

@Forza-tng
Contributor

I'm thinking that the xcaddy option to build Caddy with a profile as input is a good first step. What do you think about opening an issue at https://github.com/caddyserver/xcaddy ?

@mholt
Member

mholt commented Dec 19, 2023

@Forza-tng That sounds like a plan. See caddyserver/xcaddy#163
