Add ssl_ctx test #196
base: master
Conversation
One of the tasks here should ideally be to isolate the cost of loading the certificate store in the SSL_CTX, by sharing the store across multiple SSL_CTXs. The mechanism for this is described in the manpage of SSL_CTX_set1_cert_store(3). Basically, get a handle on the store of the first fully initialised CTX, and then set it as the store for all the others. In other words, compare the cost of creating a fully fleshed-out SSL_CTX with a private store vs. creating a similar SSL_CTX using a pre-existing store, amortising CAfile loading over multiple SSL_CTX objects. However, there may then be multiple threads accessing that shared store, and that could impact the performance of store operations, so there is a trade-off. Thus we should also look for any impact on handshake performance when multiple SSL_CTX objects are used, but their underlying cert store is shared.
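For concreteness, a minimal C sketch (not code from this PR) of the shared-store variant being described: one fully initialised context owns the parsed CAfile, and subsequent contexts borrow its X509_STORE via SSL_CTX_set1_cert_store(). The helper name, the file name, and the minimal error handling are placeholders of my own, not anything from the repo.

```c
#include <stddef.h>
#include <openssl/ssl.h>

/* Build one fully initialised SSL_CTX, then share its populated cert store
 * with n-1 further contexts so the CAfile is only parsed once. */
static int make_contexts_with_shared_store(SSL_CTX **out, size_t n,
                                           const char *cafile)
{
    if (n == 0)
        return 0;

    out[0] = SSL_CTX_new(TLS_client_method());
    if (out[0] == NULL
        || !SSL_CTX_load_verify_locations(out[0], cafile, NULL))
        return 0;                       /* the expensive step: CAfile parsing */

    /* Borrow the already-populated store from the first context. */
    X509_STORE *shared = SSL_CTX_get_cert_store(out[0]);

    for (size_t i = 1; i < n; i++) {
        out[i] = SSL_CTX_new(TLS_client_method());
        if (out[i] == NULL)
            return 0;
        /* Bumps the store's refcount; no reload of the CAfile. */
        SSL_CTX_set1_cert_store(out[i], shared);
    }
    return 1;
}
```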
Is that really what we want here though? I'm honestly asking, as I'm not sure. While I understand that what you're describing is an optimization for creating multiple SSL_CTX objects, this is a performance test. Do we want to test the optimized path, or do we want to test the worst-case path? I could see an argument for both, but if we're working under the assumption that an optimized application commonly only creates a single (or very few) SSL_CTX objects that then get shared among multiple SSL objects, I would argue that we want to test and optimize the slow path, which rebuilds the store every time, so that the single-creation case gets addressed. Opinions welcome, of course.
The point isn't to test the optimisation; rather, it is to determine whether, in fact, as surmised, the majority of the cost of SSL_CTX creation is loading the CAfile, and therefore any effort to reduce that cost should be spent on decoders and the store, or whether there are other costs that also matter. So certainly comparing a shared store vs. a per-CTX store was/is part of what we want to learn. Whether this then impacts runtime perf is, I guess, interesting but optional, in that we're now exploring whether to recommend this optimisation, or whether it has a non-trivial downside. Good to learn, but perhaps a separate question? I'd like to know the answer, but it is a separate question.
But we can't (or well, we can, but arguably shouldn't) create fan-out for every combination of operations in one test (that would become unruly very quickly, I think). If you want to know where you spend most of your time in a given set of call paths, use them all and run a tool like perf or gprof to determine where your time is most frequently spent.
Sure, which is why we can probably skip measuring handshake impact, but if we truly want to understand CTX creation cost, it is, I think, useful to try to isolate the X509 trust store setup, which is naïvely expected to be the bulk of the cost. Otherwise, what are we really trying to measure here?
My expectation (based on prior comments) was that we were trying to measure the 'cold' creation time of an SSL_CTX, that is to say, the creation and population of an SSL_CTX with data that would make it usable for handling connections. The identification of highly latent paths within that context is (IMHO) the responsibility of adjunct tools like gprof or perf, should we find a performance degradation between compared builds.
That's different from the original discussion during the backlog review. I think that isolating the store creation is an obvious high-level knob that is much simpler to turn than gprof, but I'm probably repeating myself. What was the motivation for doing this work? I thought it was reports of regressions in performance of this call, possibly as a result of higher costs in the decoder side, primarily in the CAfile parsing, which splitting it out would expose. Your call...
There were no explicit issues raised regarding the latency of this call (that I am aware of). Rather, the handshake test creates a shared ssl_ctx for all created connections, and the discussion I recall was that: b) it would be nice to have a standalone ssl_ctx test to measure ssl_ctx_new performance (based on institutional knowledge that ssl_ctx creation is generally an expensive operation). That's what this test does. The comments here suggested that calling SSL_CTX_new by itself is in fact not expensive, but that the bulk of the latency occurs when adding a keystore/CA/key/etc., and so this test was modified to add those operations. It's my view that any test should really attempt to test the worst-case scenario, from which we can use adjunct tools to identify the slow paths. Creating efficiencies in our test cases really just puts us in the position of observing that, in the nominal/efficient case, performance is OK (unless someone opens an issue that says the nominal/efficient case is somehow bad, in which case, yes, that certainly drives the argument that we should create a test case for that scenario).
Put another way: right now we're working on the assumption that SSL_CTX_new with a per-CTX cert store/key/CAfile/etc. is our slow path, and that's what we need to improve. If there are reports of other scenarios, yes, we should add those test cases as well, but barring that, adding efficiencies just makes things run faster and lets us ignore the slow path. Not saying that we shouldn't have extra tests for various situations, but from my understanding of past discussions and comments, this is what we want to address right now.
Thanks for the clarification, makes sense. Just to make sure we're not miscommunicating: I was not suggesting only testing the code path where the trust store is shared. Rather, I was anticipating that the test would cover the worst-case scenario you describe, and also, separately (via a suitable command-line flag), would test with a shared store for comparison, to validate the intuition that the store setup alone accounts for almost all of the cost. Perhaps I already made that clear; if so, please pardon the repetition. If you feel that having both metrics is not worth the effort, so be it. Thanks for bearing with me so far.
Looks good in general, have just two minor suggestions.
we have the handshake test, but it would be nice to have a perf test that measures just the amount of time taken to execute an SSL_CTX_new call. Add that test here
Add command line switches to do ssl_ctx setup for a client or server, and add in the assigning of cert/key for server and certstore for clients
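For illustration, a rough sketch (not the PR's actual test code) of how a worst-case timing loop along those lines might look, with per-iteration "cold" setup of either a client-style or server-style context so per-CTX CAfile/key loading is deliberately not amortised. The iteration count, file names, and lack of error handling are placeholder assumptions.

```c
#include <stdio.h>
#include <time.h>
#include <openssl/ssl.h>

#define ITERATIONS 1000  /* placeholder */

/* Time ITERATIONS full "cold" context setups, client or server flavour. */
static double time_ctx_setup(int as_server)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERATIONS; i++) {
        SSL_CTX *ctx = SSL_CTX_new(as_server ? TLS_server_method()
                                             : TLS_client_method());
        if (as_server) {
            /* Server: load certificate chain and private key each time. */
            SSL_CTX_use_certificate_chain_file(ctx, "server.pem");
            SSL_CTX_use_PrivateKey_file(ctx, "server.key", SSL_FILETYPE_PEM);
        } else {
            /* Client: populate the cert store from the CAfile each time. */
            SSL_CTX_load_verify_locations(ctx, "ca-bundle.pem", NULL);
        }
        SSL_CTX_free(ctx);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - start.tv_sec)
           + (end.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void)
{
    printf("client setup: %.3fs, server setup: %.3fs\n",
           time_ctx_setup(0), time_ctx_setup(1));
    return 0;
}
```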
@mattcaswell given that we're moving these tools to the perf repository, I'm actually going to close this and open a new PR there, with your issues addressed
Actually, scratch that last comment. I'll fix them up here; it will be much easier for me to port it over once it's all fixed up properly
all looks good, thanks.
Hmm... this PR needs to be transferred over to the openssl/perftools repo.
Yeah, see my comment here. My plan is to get these merged, then do a sync PR to perftools, while at the same time doing a PR to remove the code from this one
IMO there is no point in merging this here. The PR is approved, so if you apply identical changes to the perftools repo, you can just merge it there.
Is this still pertinent? I can review if that still makes sense.
I think this PR needs to be opened against https://github.com/openssl/perftools if we want to move it forward.