
Performance improvements #453

Merged

Conversation

@henrinormak (Contributor) commented Feb 16, 2023

I think I found a few places that can be tuned to improve performance in setups with many parallel test cases. To put the numbers into context, I used one of the test sets I have access to where testcontainers is being rolled out.

The test set is fairly large, consisting of about 1300 Jest test cases (in about 180 test suites), where almost every test case touches a container. Container reuse is enabled, but each case still goes through GenericContainer and the reuse logic there (i.e. there is no in-memory resource sharing between test cases). All of the test cases are marked concurrent, and throughout my test runs I used 4 Jest workers, so 4 test suites were running in parallel.

I ran each scenario 5 times. That is probably not statistically significant, but the test set is big enough that running it more often takes too long. I discarded any run that failed (some test cases are flaky and fail occasionally); in total, fewer than 5 runs were discarded.

Here's the baseline I captured with version 9.1.3:

Run 1 - Done in 351.90s.
Run 2 - Done in 337.98s.
Run 3 - Done in 309.03s.
Run 4 - Done in 339.93s.
Run 5 - Done in 316.15s.
= Average 330.99s

After moving the auth config lookup so that it only happens when an image actually needs pulling, adding a lock around image pulling, and skipping some logic when logging is disabled (mainly determining the system statistics) [1]:

Run 1 - Done in 307.70s.
Run 2 - Done in 340.78s.
Run 3 - Done in 290.94s.
Run 4 - Done in 288.53s.
Run 5 - Done in 278.26s.
= Average 301.24s

After optimising the way the image existence check occurs [2]:

Run 1 - Done in 284.07s.
Run 2 - Done in 272.30s.
Run 3 - Done in 270.35s.
Run 4 - Done in 298.30s.
Run 5 - Done in 278.50s.
= Average 280.70s

Reasoning

[1] - The auth check does not need to happen if we decide not to pull the image. This also fixes a bug: previously runInContainer did not do the auth config lookup at all, which I assume it should have. The lock I added prevents concurrent tests from all starting to pull an image they share; this way the image is pulled only once, and the other checks then see that it already exists (see the first sketch below). As for determining the system statistics: even though it is spun off as a promise that is not awaited, I think it makes no sense to do it when logging is not enabled, which is probably the majority of cases.

[2] - Previously, every single check fetched the entire list of images from Docker, which on some machines can be quite a large set. The new approach caches the results and also applies a lock, so parallel checks for the same image no longer do the same work in parallel. When an image is not found in the cache, the full image list is fetched again and the cache refreshed (see the second sketch below).
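For illustration, here is a minimal sketch of the idea in [1]: only resolve the auth config when a pull is actually going to happen, and serialise pulls per image name so concurrent tests pull a shared image only once. It assumes the async-lock package for the lock, and imageExists/getAuthConfig are hypothetical stand-ins for the library's own helpers; this is not the exact code in this PR.

```typescript
import AsyncLock from "async-lock";
import Dockerode from "dockerode";

// Hypothetical stand-ins for the library's own helpers:
declare function imageExists(dockerode: Dockerode, imageName: string): Promise<boolean>;
declare function getAuthConfig(imageName: string): Promise<Record<string, unknown> | undefined>;

const pullLock = new AsyncLock();

export async function pullImageIfNeeded(dockerode: Dockerode, imageName: string): Promise<void> {
  // Serialise on the image name so parallel test suites sharing an image pull it only once.
  await pullLock.acquire(imageName, async () => {
    if (await imageExists(dockerode, imageName)) {
      return; // Image already present: skip both the auth lookup and the pull.
    }
    // The auth config lookup only happens once we know a pull is needed.
    const authconfig = await getAuthConfig(imageName);
    const stream = await dockerode.pull(imageName, { authconfig });
    // Wait for the pull to complete before releasing the lock.
    await new Promise<void>((resolve, reject) =>
      dockerode.modem.followProgress(stream, (err: Error | null) => (err ? reject(err) : resolve()))
    );
  });
}
```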
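And a rough sketch of the cached existence check in [2], again assuming async-lock; the names are illustrative and the real implementation differs in its details. Results are remembered in a set keyed by image tag, and a cache miss triggers one refresh of the full image list under a per-image lock.

```typescript
import AsyncLock from "async-lock";
import Dockerode from "dockerode";

const imageCheckLock = new AsyncLock();
const existingImages = new Set<string>();

export async function imageExists(dockerode: Dockerode, imageName: string): Promise<boolean> {
  // One check per image name at a time; concurrent callers share the result.
  return imageCheckLock.acquire(imageName, async () => {
    // Fast path: a previous check (or a completed pull) already saw this image.
    if (existingImages.has(imageName)) {
      return true;
    }
    // Cache miss: fetch the full image list once and cache every tag we see.
    const images = await dockerode.listImages();
    for (const image of images) {
      for (const repoTag of image.RepoTags ?? []) {
        existingImages.add(repoTag);
      }
    }
    return existingImages.has(imageName);
  });
}
```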

Followup

I suspect there is some more room for improvement. One thing I noticed is the default waiting strategy, which uses Promise.all for the internal port check commands. It could probably be improved with a modified approach using Promise.any, resolving fast when one of the commands is faster than the others and gives a positive answer (a sketch follows below). I left this out because Promise.any was introduced in Node 15, and if I understand correctly, this project still supports Node 14.
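Purely to illustrate the Promise.any idea (this is not the library's wait-strategy code; the probe commands and the execInContainer signature are made up for the example): each probe rejects on failure, so Promise.any resolves with the first probe that succeeds and only rejects, with an AggregateError, once every probe has failed.

```typescript
export async function internalPortCheck(
  execInContainer: (command: string[]) => Promise<{ exitCode: number }>,
  port: number
): Promise<boolean> {
  // Two illustrative ways of probing a port from inside the container.
  const commands = [
    ["/bin/sh", "-c", `nc -vz -w 1 localhost ${port}`],
    ["/bin/bash", "-c", `</dev/tcp/localhost/${port}`],
  ];

  try {
    await Promise.any(
      commands.map(async (command) => {
        const { exitCode } = await execInContainer(command);
        if (exitCode !== 0) {
          throw new Error(`Port ${port} not listening yet`);
        }
      })
    );
    return true; // At least one probe answered positively.
  } catch {
    return false; // All probes failed.
  }
}
```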

It is also possible that the execContainer utility could be improved to avoid polling. At least dockerode's issue on this suggests that the latest versions of Docker ship a shim that solves the original problem where stream.on('end') did not fire properly on Windows (I am not sure whether that is the reason this library uses the polling approach).
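For reference, a sketch of what an exec without polling could look like with dockerode: wait for the stream's 'end' event, then read the exit code once. It assumes a Docker version where 'end' fires reliably, and error handling is omitted.

```typescript
import Dockerode from "dockerode";
import { PassThrough } from "stream";

export async function execWithoutPolling(
  container: Dockerode.Container,
  command: string[]
): Promise<{ output: string; exitCode: number }> {
  const exec = await container.exec({ Cmd: command, AttachStdout: true, AttachStderr: true });
  const stream = await exec.start({});

  // Demultiplex stdout/stderr into a single buffer.
  let output = "";
  const sink = new PassThrough();
  sink.on("data", (chunk) => (output += chunk.toString()));
  container.modem.demuxStream(stream, sink, sink);

  // Instead of polling exec.inspect() until Running is false, wait for the
  // stream to end and then read the exit code once.
  await new Promise<void>((resolve) => stream.on("end", () => resolve()));
  const { ExitCode } = await exec.inspect();

  return { output, exitCode: ExitCode ?? -1 };
}
```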

@cristianrgreco (Collaborator)

Thanks @henrinormak, these are great findings and an excellent write-up! I'll review this shortly. I will also look into whether the polling can be dropped from the exec; it would make my day if it could be 😄

@henrinormak (Contributor, Author)

Locally on macOS with Docker version 20.10.22, build 3a2c30b, I am indeed able to refactor the polling away and instead wait for the stream to end. I don't have a Windows machine at hand to test it on.

Not sure what kind of impact (if any) it has on performance.

@cristianrgreco (Collaborator)

@henrinormak if you have that change ready I'd appreciate the PR, and I can see about getting it tested on Windows. No worries if not; I'll look into it when I can.

@henrinormak (Contributor, Author)

@cristianrgreco pushed the change to this branch

@cristianrgreco (Collaborator) left a comment

It's... beautiful 😄

@cristianrgreco added the enhancement (New feature or request), patch (Backward compatible bug fix) and minor (Backward compatible functionality) labels and removed the patch label on Feb 17, 2023
@cristianrgreco (Collaborator)

Just tested this on a Windows machine running Docker 20.10.22 and it works


```diff
 export const imageExists = async (dockerode: Dockerode, imageName: DockerImageName): Promise<boolean> => {
   log.debug(`Checking if image exists: ${imageName}`);
-  return (await listImages(dockerode)).some((image) => image.equals(imageName));
+  return imageCheckLock.acquire(imageName.toString(), async () => {
```
@cristianrgreco (Collaborator)

Is it necessary to provide a timeout here, in case something goes wrong and the lock can't be acquired? Likewise does the lock need to be released in some finally block?

@henrinormak (Contributor, Author)

The lock implementation handles the release even if the user code throws, and it also allows configuring a timeout. For now I left the timeout out, as presumably the underlying operations already have timeouts of their own.
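(For reference, a minimal sketch of how such a timeout could be configured, assuming async-lock is the lock implementation in question: acquire() rejects if the lock is not obtained within the timeout, and the lock is released automatically even when the guarded callback throws.)

```typescript
import AsyncLock from "async-lock";

// Lock with a 60s acquisition timeout; the number here is arbitrary.
const lockWithTimeout = new AsyncLock({ timeout: 60_000 });

async function guardedWork(imageName: string): Promise<void> {
  await lockWithTimeout.acquire(imageName, async () => {
    // Work that must not run concurrently for the same image; any thrown
    // error propagates, but the lock is still released.
  });
}
```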

@cristianrgreco cristianrgreco merged commit 1533445 into testcontainers:main Feb 28, 2023
@henrinormak henrinormak deleted the performance-improvements branch March 7, 2023 02:24
@cristianrgreco cristianrgreco changed the title refactor: performance improvements Performance improvements Mar 8, 2023
@henrinormak henrinormak mentioned this pull request Mar 13, 2023