-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Seed data for DbContainer #541
Comments
Good shout, I didn't know these existed (there goes my evening, reading the manual like I should have done first thing) Let's say, at best, this proposed Or we could just scrap the feature request altogether because it exists already as part of most images, haha. Either way works for me, the feature gap itself is solved by the links provided. |
i think theoretically it could be a neat thing to highlight the feature in the library images, but since it seems like we are now discussing quite a different implementation than proposed in the PR, im going to close the PR |
Makes sense! Just to be clear, the only convenient way I see of exposing the The less convenient way that I don't recommend is to have some kind of FROM somedatabaseimage
COPY db/seeds/ /docker-entrypoint-initdb.d/ and then using that image? That doesn't feel very convenient, building an intermediary image we don't care about (long term) just to get this one test to work? With that in mind, I will explore a similar I'm aiming for: with MysqlContainer(seed="db/seeds/"):
# equivalent to:
container.with_volume_mapping("db/seeds/", "/docker-entrypoint-initdb.d/") I'll need to confirm if this folder works for ALL |
Implemented in #552 for MySQL, to further this discussion with specific example. Edit to add: Note I'm not attached to any of these PRs, so feel free to chuck them out if that's not useful. I'm just happy to get familiar with the innards of this codebase for my own future uses as I explore this feature request, hoping the PRs make the discussion easier. |
i think theoretically there is a preference for is for using transferable over volume mapping, because volume mapping only works locally (local to the docker engine), but we dont have that in python implementation yet - it looks something like this - theoretically my implementation could be even more tied to tar equivalent and also there's an even more concise version of "copy file to container" algorithm floating around in the codebase already, like in the kafka module. havent had time to really think about this too much but i approved your run and it looks like all the checks passed. |
So, I was curious about
Patch: transferring using tar archives. Click to expandFrom ac0fc34717c27b53d3f291ca59b3218d2036c762 Mon Sep 17 00:00:00 2001
From: Jb DOYON <[email protected]>
Date: Fri, 26 Apr 2024 02:06:06 +0100
Subject: [PATCH] Experiment with transfering via tar archives
---
core/testcontainers/core/generic.py | 4 ++++
modules/mysql/testcontainers/mysql/__init__.py | 14 +++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/core/testcontainers/core/generic.py b/core/testcontainers/core/generic.py
index 515c283..23c24ea 100644
--- a/core/testcontainers/core/generic.py
+++ b/core/testcontainers/core/generic.py
@@ -71,7 +71,11 @@ class DbContainer(DockerContainer):
self._configure()
super().start()
self._connect()
+ self._transfer_seed()
return self
def _configure(self) -> None:
raise NotImplementedError
+
+ def _transfer_seed(self) -> None:
+ pass
diff --git a/modules/mysql/testcontainers/mysql/__init__.py b/modules/mysql/testcontainers/mysql/__init__.py
index e95b87d..c7ed97b 100644
--- a/modules/mysql/testcontainers/mysql/__init__.py
+++ b/modules/mysql/testcontainers/mysql/__init__.py
@@ -11,7 +11,10 @@
# License for the specific language governing permissions and limitations
# under the License.
import re
+import tarfile
+from io import BytesIO
from os import environ
+from pathlib import Path
from typing import Optional
from testcontainers.core.generic import DbContainer
@@ -85,7 +88,7 @@ class MySqlContainer(DbContainer):
if self.username == "root":
self.root_password = self.password
if seed is not None:
- self.with_volume_mapping(seed, "/docker-entrypoint-initdb.d/")
+ self.seed = seed
def _configure(self) -> None:
self.with_env("MYSQL_ROOT_PASSWORD", self.root_password)
@@ -105,3 +108,12 @@ class MySqlContainer(DbContainer):
return super()._create_connection_url(
dialect="mysql+pymysql", username=self.username, password=self.password, dbname=self.dbname, port=self.port
)
+
+ def _transfer_seed(self) -> None:
+ src_path = Path(self.seed)
+ dest_path = "/docker-entrypoint-initdb.d/"
+ with BytesIO() as archive, tarfile.TarFile(fileobj=archive, mode="w") as tar:
+ for filename in src_path.iterdir():
+ tar.add(filename.absolute(), arcname=filename.relative_to(src_path))
+ archive.seek(0)
+ self.get_wrapped_container().put_archive(dest_path, archive)
--
2.34.1
EDIT: Wrong again, it's because I waited to do the archive transfer till AFTER the connect, not before, which waits for container ready, and moving it before makes it work again. Updated #552 to use this new system! |
Yes, transferring only works after it starts, so you want to run that after
calling super().start() in the overriden .start()
…On Thu, Apr 25, 2024, 9:20 PM Jb DOYON ***@***.***> wrote:
So, I was curious about put_archive and the "transferable" system, so I
implemented it experimentally, and it didn't work:
The files did make it to the folder, but the tables weren't loaded in any
of the tests, which I can see by running docker exec on breakpoints.
Can't prove it, but I think that since the archive can only be loaded once
the container has started (compared to volume mounts which *become* part
of the creation of container, the mysql server is no longer looking at the
/docker-entrypoint-initdb.d/ folder by the time data gets in, so the test
table stays empty and fails test.
Attached is the patch (applies on top of the linked PR), in case it's
useful anyway?
Patch: transferring using tar archives. Click to expand ```patch From
ac0fc34 Mon Sep 17 00:00:00 2001 From: Jb
DOYON Date: Fri, 26 Apr 2024 02:06:06 +0100 Subject: [PATCH] Experiment
with transfering via tar archives
------------------------------
core/testcontainers/core/generic.py | 4 ++++
modules/mysql/testcontainers/mysql/*init*.py | 14 +++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/core/testcontainers/core/generic.py
b/core/testcontainers/core/generic.py
index 515c283..23c24ea 100644
--- a/core/testcontainers/core/generic.py
+++ b/core/testcontainers/core/generic.py
@@ -71,7 +71,11 @@ class DbContainer(DockerContainer):
self._configure()
super().start()
self._connect()
-
self._transfer_seed()
return self
def _configure(self) -> None:
raise NotImplementedError
-
-
def _transfer_seed(self) -> None:
-
pass
diff --git a/modules/mysql/testcontainers/mysql/*init*.py
b/modules/mysql/testcontainers/mysql/*init*.py
index e95b87d..c7ed97b 100644
--- a/modules/mysql/testcontainers/mysql/*init*.py
+++ b/modules/mysql/testcontainers/mysql/*init*.py
@@ -11,7 +11,10 @@
License for the specific language governing permissions and limitations under
the License.
import re
+import tarfile
+from io import BytesIO
from os import environ
+from pathlib import Path
from typing import Optional
from testcontainers.core.generic import DbContainer
@@ -85,7 +88,7 @@ class MySqlContainer(DbContainer):
if self.username == "root":
self.root_password = self.password
if seed is not None:
-
self.with_volume_mapping(seed, "/docker-entrypoint-initdb.d/")
-
self.seed = seed
def _configure(self) -> None:
self.with_env("MYSQL_ROOT_PASSWORD", self.root_password)
@@ -105,3 +108,12 @@ class MySqlContainer(DbContainer):
return super()._create_connection_url(
dialect="mysql+pymysql", username=self.username,
password=self.password, dbname=self.dbname, port=self.port
)
-
-
def _transfer_seed(self) -> None:
-
src_path = Path(self.seed)
-
dest_path = "/docker-entrypoint-initdb.d/"
-
with BytesIO() as archive, tarfile.TarFile(fileobj=archive, mode="w") as tar:
-
for filename in src_path.iterdir():
-
tar.add(filename.absolute(), arcname=filename.relative_to(src_path))
-
archive.seek(0)
-
self.get_wrapped_container().put_archive(dest_path, archive)
--
2.34.1
</details>
Curious if you have an idea on how to work around that.
—
Reply to this email directly, view it on GitHub
<#541 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACECGJGIIFE5GNZ5OKQV3L3Y7GTWZAVCNFSM6AAAAABGKF4JQSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANZYGQ3DGNRTGE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Small bump: Should we get into a review phase for this proof of concept PR? |
Ref #541. New capability of "seeding" a db container using image's support for /docker-entrypoint-initdb.d/ folder. Using the "transferable" system, borrowed from Kafka. Updates DbContainer to have a new (NOOP-default) `_transfer_seed()` method, run after `_start()` and before `_connect()`, to allow the folder transfer. Currently implemented only in MySQL, but extensible to others that use the `/docker-entrypoint-initdb.d/` system. --------- Co-authored-by: Jb DOYON <[email protected]>
Wow, thank you for merging that PR! (I expected more of a fight, haha) Can anyone think of other DbContainer derivatives that may benefit from this feature too? |
Maybe documentation about it would be good
…On Sat, May 11, 2024, 11:47 AM Jb DOYON ***@***.***> wrote:
Wow, thank you for merging that PR! (I expected more of a fight, haha)
I feel like our Postgres friends would be missing out on this, so I'll try
to draft a similar PR for pg, will link here.
Can anyone think of other DbContainer derivatives that may benefit from
this feature too?
—
Reply to this email directly, view it on GitHub
<#541 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACECGJHKCDXQ2XS6IX3PPRDZBY4PJAVCNFSM6AAAAABGKF4JQSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMBVHEZTONBUGM>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
See #576 which fixes the doctest of mysql, which acted as documentation for the feature, and adds the same feature to postgres. I'm interested in better documentation for the feature, but I'm not sure how that could look like? |
What are you trying to do?
In order to do meaningful tests using my fresh database, I wish to load arbitrary SQL scripts from my DbContainer-s.
I can do (and have done) this per-project, in test folders, but believe there's value in allowing an arbitrary list of SQL scripts to be run before the testcontainer is yielded, and making this a pytest fixture to reuse.
I do this usually by first passing a path to a folder with scripts, volume-mount it to
/seeds
path in the database container, and also pass a list of script files in that folder to be executed. Then before yielding the ready db-container, I loop over eachscript
filename and docontainer.exec_run(["mysql", "-e" f"source /seeds/{script}")
.The
list
of scripts means we can feed both schema first, then arbitrary sample data files.I suggest a new
DbContainer._seed()
method, defaulting toraise NotImplementedError
, to be overriden per database, allowing a new optional parameterseed
.Each database then just has to define their
_seed()
function to the specific way toexec_run
the scripts (psql
for Postgres,mysql
for MySQL CLI...)Example from #542 implementation draft:
Why should it be done this way?
I believe most users of testcontainers, beyond just testing connectivity with a database, want a simple-to-reproduce test environment with realistic datastructure and mock data. This is currently annoying to do locally, and can be upstreamed into a simple-looking
seed
parameter, which could enable the feature for most databases, by pushing it intoDbContainer
itself.See #542 for a draft implementation using MySQL as target (generalizeable to others of course)
The text was updated successfully, but these errors were encountered: