-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(reliability): integrate the ryuk container for better container cleanup #314
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. Yesterday, I was looking into the Python implementation and noticed that Ryuk is not yet supported. Although I am not very familiar with the Python implementation, I had a quick look at it and it looks good so far. I will take a closer look at it next week.
I have one question in advance. Do you know if we can add an extra HTTP header to the Docker client requests? I've looked into the docs, but it looks like there is no way to add additional HTTP headers to the client. Is there any other way to do it?
@HofmeisterAn Docker and Ryuk communicates directly via Unix Domain Sockets, so they are not using the HTTP protocol at all, only the lower level TCP protocol. This means you cannot send HTTP headers as you are used to in other network-based applications. So no, it is not possible (to my knowledge) to send extra headers in any form to the Docker server. |
@HofmeisterAn Aha, I see we are thinking of different things then. Did a quick search through the Docker source code, and it seems that the import docker
client = docker.from_env() # Or some other way to instantiate a client
client.api.headers["x-tc-sid"] = "your-session-id" which will then include the header in all requests. If you want to pass this header for just a single request you are gonna have a harder time, as the |
Fortunately, we require that header in every request. This header enables Testcontainers Cloud to operate in turbo mode. It would be great to have it available in Python as well. Would you mind adding the header immediately? Alternatively, I can add it as a follow-up PR. |
@HofmeisterAn Added header in last commit 👍 |
@HofmeisterAn Fixed a circular import, could you run the tests again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I am not very familiar with TC for Python, I think most parts look very good. I have added some minor suggestions. Probably, we do not need to address them immediately. They are just things I noticed comparing it to their language implementations. Thanks 🙏 for the contribution. OC, a final review is necessary by the maintainer.
if labels is None: | ||
labels = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you like you can add the following labels by default too:
org.testcontainers.lang=python
org.testcontainers.version=0.1.0 // The corresponding version of Testcontainers for Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the org.testcontainers.lang=python
label in a commit, but reading the actual version is a bit harder, because of how this repo is structured. The core
package is bundled together with the other packages published from this repo, and is still on a static v0.0.1rc1. Finding out which package is actually using the core
package during runtime is quite brittle and prone to errors, so I'll leave that as a later task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, for the other languages we just set the core version too. At least for Java and .NET, there are no individual versions.
@@ -26,6 +28,7 @@ def __init__(self, image: str, docker_client_kw: Optional[dict] = None, **kwargs | |||
self.env = {} | |||
self.ports = {} | |||
self.volumes = {} | |||
self.ryuk = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Ryuk should be enabled by default and consider the TESTCONTAINERS_RYUK_DISABLED
env variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is okay with you, I would love that! I think however we should do a major version bump for that, as it is quite a breaking change for all package users.
I fully support it though, brings the behaviour closer to the rest of the implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point. Usually, it should not affact existing environments. Although aligning with the other implementations would be great, I am fine with either of both. @tillahoffmann any preferences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tillahoffmann Quite a bit of discussion about enabling Ryuk by default in this PR. I think we should make it on by default, but keep the current common usage pattern by using a context if we want automatic cleanup control. See this comment for a more detailed answer: #314 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now changed the implementation to enable Ryuk by default, with the possibility to turn it off for specfic use cases with .with_auto_remove(False)
. As per the discussion in other comments, some testcontainers
implementations doesn't allow Ryuk to be disabled at all, should I still add the env variable to override the programatic approach (or let the programatic approach override the env variable)?
Personally I vote no because of the ambiguity in what should take presendence, then at the very least we need to be very clear about the order of config loading, and apply it for all other env variables as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looked through the testcontainers-node
implementation, and there the API does not allow disabling Ryuk programmatically at all, only through the TESTCONTAINERS_RYUK_DISABLED
env variable. I am slightly leaning towards removing .with_auto_remove
and instead rely solely on the env variable in order to keep implementations in sync.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looked through the
testcontainers-node
implementation, and there the API does not allow disabling Ryuk programmatically at all, only through theTESTCONTAINERS_RYUK_DISABLED
env variable. I am slightly leaning towards removing.with_auto_remove
and instead rely solely on the env variable in order to keep implementations in sync.Thoughts?
👍 This is how most TC language implementations do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it's taken so long. I've made a first pass. Would it be possible to add documentation regarding Ryuk? I'm not personally familiar with it, and it might also be useful for other users of the library.
Thank you very much for the thorough reviews, @tillahoffmann and @HofmeisterAn! Just keep it coming, I'll address them this coming Sunday or Monday |
The import from __future__ import annotations effectively adds support for native union syntax (|) because type hints are treated as strings at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall you got my 👍 even if the implementation detail looks a bit java-like to me.
- I gave you some thoughts on why and how
- Take it if you want, but it's not a blocker, it can also be refactored later without more breaking changes induced
The last remaining thing "is this really a breaking change". IMO I don't think so but I'm happy to go with your suggestion.
@alexanderankin can you chime in if you think it's a breaking change? The implementation is LGTM and the only change is that it won't allow my_container.__del__()
- but if you don't resist, I'll just release this as 5.0 and we'll keep going with new releases (release early, release often)
import contextlib | ||
from platform import system | ||
from typing import Optional | ||
from __future__ import annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we sure this is what we want to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either we need to put almost everything in return types in quotes and switch to old syntax for unions, or we can have this import which postpones the evaluation of type hints by basically making them evaluate as strings. Both approaches are fully supported by linters such as Ruff, Pylance and Mypy. When we drop support for 3.9 we can more easily use modern syntax without breaking backwards compatibility (Even though we have to wait until 3.11 before we can use Self
without any modifications).
from __future__ import annotations
is not really that big of a deal, it even can speed up evaluation of a module because of the postponed evaluation (Not really a concern in our case though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the record, the Optional[T]
is back in place, I've made ruff
respect the runtime types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced the __future__
import with quoted type hints now as well. Easy to revert back if we want to, but as you mentioned on Slack it should probably be done project-wide if we want to go for that approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure about some of the changes in here i would like to minimize them and im not sure i am so comfortable hard-switching the behavior right away. i think the python ecosystem generally benefits from more backwards compatibility not less
|
||
|
||
class Reaper: | ||
_instance: Optional[Reaper] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably if this is here we can actually reuse ryuk on multiple instances but then the actual code for telling ryuk about it is not in a regular instance public method - so not sure but i think this part could be more coherent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure if I understand what you are pointing to here - The Reaper
class is intended to be used as a singleton, shared across all spawned containers by the same Python process. Once a Reaper
instance is requested (through the get_instance
method), it stores the reference to the instance in this class-level variable, so that on the next get_instance
call, it imply returns the stored reference. As part of the initial setup when get_instance
is first called, it sets up the required socket connection to the container which is used to communicate what Ryuk should terminate when the socket connection ends (either through Python process exit or the delete_instance
method)
| ----------------------------------------- | ----------------------------- | ---------------------------------------- | | ||
| `TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE` | `/var/run/docker.sock` | Path to Docker's socket used by ryuk | | ||
| `TESTCONTAINERS_RYUK_PRIVILEGED` | `false` | Run ryuk as a privileged container | | ||
| `TESTCONTAINERS_RYUK_DISABLED` | `false` | Disable ryuk | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we change this default to true to try to preserve backwards compatibility - like in principle we should make it really easy for people to move to the new thing and only really bother everyone else much later on.
like in jvm langs you can do echo ryuk.disabled=true >> src/test/resources/testcontainers.properties
- im not sure of the python equivalent off the bat - maybe we have to stick to some sort of env var stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about setting it to true
by default and release 4.1 (or 4.2, whichever this one lands in) and then changing it to false
and release 5.0 straight away? That way we don't force people into using Ryuk on their 4.x builds, but I really think this should be enabled by default, as it is the expected behavior as seen in other implementations.
By releasing it in both versions almost simultaneously we give them time to upgrade, and they have the possibility to test out the functionality without major bumping anything.
🤖 I have created a release *beep* *boop* --- ## [4.1.0](testcontainers-v4.0.1...testcontainers-v4.1.0) (2024-03-19) ### Features * **reliability:** integrate the ryuk container for better container cleanup ([#314](#314)) ([d019874](d019874)) ### Bug Fixes * changelog after release-please ([#469](#469)) ([dcb4f68](dcb4f68)) * **configuration:** strip whitespaces when reading .testcontainers.properties ([#474](#474)) ([ade144e](ade144e)) * try to fix release-please by setting a bootstrap sha ([#472](#472)) ([ca65a91](ca65a91)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…cleanup (testcontainers#314) > [!NOTE] > Editor's note from @totallyzen: > After a thorough discussion between @santi @alexanderankin, @kiview and @totallyzen on the [slack](https://testcontainers.slack.com/archives/C04SRG5AXNU/p1710156743640249) we've decided this isn't a breaking change in api, but a modification in behaviour. Therefore not worth a 5.0 but we'll release it under 4.x > If this did end up breaking your workflow, come talk to us about your use-case! **What are you trying to do?** Use Ryuk as the default resource cleanup strategy. **Why should it be done this way?** The current implementation of tc-python does not handle container lifecycle management the same way as other TC implementations. In this PR I introduce Ryuk to be the default resource cleanup strategy, in order to be better aligned with other TC implementations. Ryuk is enabled by default, but can be disabled with the `TESTCONTAINERS_RYUK_DISABLED` env variable (which follows the behavior of other TC implementations). Ryuk behavior can further be altered with the `TESTCONTAINERS_RYUK_PRIVILEGED`, `RYUK_CONTAINER_IMAGE` and `TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE` (also following the same conventions from other implementations). Documentation of these env variables is added to the README in the root of the repo. The implementation uses a singleton container that starts an instance of Ryuk and stores it on class / module level, so that it is reused within a process. This follows the same pattern as in other languages, and Ryuk behaviour itself is implemented the exact same way as other implementations. **BREAKING CHANGE** `__del__` support is now removed, in order to better align with other TC implementations. From the comments in the `__del__` implementation, it seems like this was added as a final attempt to cleanup resources on exit/garbage collection. This leaves three ways to cleanup resources, which has better defined behaviors and follows modern patterns for resource management in Python: Method 1: Context Manager Init/cleanup through a context manager (__enter__/__exit__): ``` with DockerContainer("postgres") as container: # some logic here # end of context: container is killed by `__exit__` method ``` Method 2: Manual start() and stop() ``` container = DockerContainer("postgres").start() # some logic here container.stop() ``` Method 3: Ryuk ``` container = DockerContainer("postgres").start() # some logic here # You forget to .stop() the container, but Ryuk kills it for you 10 seconds after your process exits. ``` _Why remove `__del__`?_ According to the previous maintainer of the repo, it has been causing “[a bunch of issues](https://github.com/testcontainers/testcontainers-python/pull/314#discussion_r1185321083)”, which I have personally experienced while using TC in a Django app, due to the automatic GC behavior when no longer referencing the container with a variable. E.g. if you instantiate the container in a method, only returning the connection string, the Python garbage collector will automatically call `__del__` on the instance at the end of the function, thus killing your container. This leads to clunky workarounds like having to store a reference to the container in a module-level variable, or always having to return a reference to the container from the function creating the container. In addition, the gc behaviour is not consistent across Python implementations, making the reliance on `__del__` flaky at best. Also, having the __del__ method cleanup your container prevents us from implementing `with_reuse()` (which is implemented in other TC implementations) in the future, as a process exit would always instantly kill the container, preventing us to use it in another process before Ryuk reaps it. **Next steps** Once this PR is accepted, my plan is to implement the `with_reuse()` functionality seen in other implementations, to enable faster / instant usage of existing containers. This is very useful in simple testing scenarios or local development workflows using hot reload behaviour. The `with_reuse()` API requires the removal of `__del__` cleanup, as otherwise the container would not be available for reuse due to the GC reaping the container as soon as the process exits. **Other changes** - Adds “x-tc-sid=SESSION_ID” header to the underlying Docker API client with the value of the current session ID (created on module init), in order to enable Testcontainers Cloud to operate in “Turbo mode” testcontainers#314 (comment) - Adds labels `org.testcontainers.lang=python` and `org.testcontainers.session-id=SESSION_ID`to the containers created by TC - As mentioned above, the env variables TESTCONTAINERS_RYUK_DISABLED, TESTCONTAINERS_RYUK_PRIVILEGED, RYUK_CONTAINER_IMAGE and TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE are now used for customizing tc-python behavior. --------- Co-authored-by: Andre Hofmeister <[email protected]> Co-authored-by: Balint Bartha <[email protected]>
🤖 I have created a release *beep* *boop* --- ## [4.1.0](testcontainers/testcontainers-python@testcontainers-v4.0.1...testcontainers-v4.1.0) (2024-03-19) ### Features * **reliability:** integrate the ryuk container for better container cleanup ([testcontainers#314](testcontainers#314)) ([d019874](testcontainers@d019874)) ### Bug Fixes * changelog after release-please ([testcontainers#469](testcontainers#469)) ([dcb4f68](testcontainers@dcb4f68)) * **configuration:** strip whitespaces when reading .testcontainers.properties ([testcontainers#474](testcontainers#474)) ([ade144e](testcontainers@ade144e)) * try to fix release-please by setting a bootstrap sha ([testcontainers#472](testcontainers#472)) ([ca65a91](testcontainers@ca65a91)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@@ -3,3 +3,8 @@ | |||
MAX_TRIES = int(environ.get("TC_MAX_TRIES", 120)) | |||
SLEEP_TIME = int(environ.get("TC_POOLING_INTERVAL", 1)) | |||
TIMEOUT = MAX_TRIES * SLEEP_TIME | |||
|
|||
RYUK_IMAGE: str = environ.get("RYUK_CONTAINER_IMAGE", "testcontainers/ryuk:0.5.1") | |||
RYUK_PRIVILEGED: bool = environ.get("TESTCONTAINERS_RYUK_PRIVILEGED", "false") == "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets evaluated on import and can't be manupulated afterwards better make it a method for better evaluation. Currently in order for this to work I need:
os.environ['TESTCONTAINERS_RYUK_DISABLED'] = 'true'
from testcontainers.core.container import DockerContainer
This setup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@teodor-t-tenev - #532 solves this, and its included since 4.3.2 - i think it should work if you have regular imports at the top of your file and later do
from testcontainers.core.config import testcontainers_config as config
config.ryuk_disabled = True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reply. That did the work for 4.3.2.
Note
Editor's note from @totallyzen:
After a thorough discussion between @santi @alexanderankin, @kiview and @totallyzen on the slack we've decided this isn't a breaking change in api, but a modification in behaviour. Therefore not worth a 5.0 but we'll release it under 4.x
If this did end up breaking your workflow, come talk to us about your use-case!
What are you trying to do?
Use Ryuk as the default resource cleanup strategy.
Why should it be done this way?
The current implementation of tc-python does not handle container lifecycle management the same way as other TC implementations. In this PR I introduce Ryuk to be the default resource cleanup strategy, in order to be better aligned with other TC implementations.
Ryuk is enabled by default, but can be disabled with the
TESTCONTAINERS_RYUK_DISABLED
env variable (which follows the behavior of other TC implementations). Ryuk behavior can further be altered with theTESTCONTAINERS_RYUK_PRIVILEGED
,RYUK_CONTAINER_IMAGE
andTESTCONTAINERS_DOCKER_SOCKET_OVERRIDE
(also following the same conventions from other implementations). Documentation of these env variables is added to the README in the root of the repo.The implementation uses a singleton container that starts an instance of Ryuk and stores it on class / module level, so that it is reused within a process. This follows the same pattern as in other languages, and Ryuk behaviour itself is implemented the exact same way as other implementations.
BREAKING CHANGE
__del__
support is now removed, in order to better align with other TC implementations. From the comments in the__del__
implementation, it seems like this was added as a final attempt to cleanup resources on exit/garbage collection. This leaves three ways to cleanup resources, which has better defined behaviors and follows modern patterns for resource management in Python:Method 1: Context Manager
Init/cleanup through a context manager (enter/exit):
Method 2: Manual start() and stop()
Method 3: Ryuk
Why remove
__del__
? According to the previous maintainer of the repo, it has been causing “a bunch of issues”, which I have personally experienced while using TC in a Django app, due to the automatic GC behavior when no longer referencing the container with a variable. E.g. if you instantiate the container in a method, only returning the connection string, the Python garbage collector will automatically call__del__
on the instance at the end of the function, thus killing your container. This leads to clunky workarounds like having to store a reference to the container in a module-level variable, or always having to return a reference to the container from the function creating the container. In addition, the gc behaviour is not consistent across Python implementations, making the reliance on__del__
flaky at best.Also, having the del method cleanup your container prevents us from implementing
with_reuse()
(which is implemented in other TC implementations) in the future, as a process exit would always instantly kill the container, preventing us to use it in another process before Ryuk reaps it.Next steps
Once this PR is accepted, my plan is to implement the
with_reuse()
functionality seen in other implementations, to enable faster / instant usage of existing containers. This is very useful in simple testing scenarios or local development workflows using hot reload behaviour. Thewith_reuse()
API requires the removal of__del__
cleanup, as otherwise the container would not be available for reuse due to the GC reaping the container as soon as the process exits.Other changes
org.testcontainers.lang=python
andorg.testcontainers.session-id=SESSION_ID
to the containers created by TC