Investigate best practices for creating an XLoader Docker image + subsequent CKAN (worker) container #66

Open
kowh-ai opened this issue May 18, 2024 · 3 comments · May be fixed by #76
Comments

Contributor

kowh-ai commented May 18, 2024

Eventually XLoader will replace DataPusher as the de facto tool to load tabular data into the CKAN DataStore.

Ideally, XLoader will run in its own container, so that container will also need CKAN installed.

The "worker" process is:

ckan -c /etc/ckan/default/ckan.ini jobs worker

Is there a better way to wrap this process so that it takes advantage of native Docker methods?

Will we use supervisor to look after the running worker process?

ckanext.xloader.api_token is also required to be set somewhere within the setup scripts
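
One possible answer to the wrapping question, as a minimal sketch rather than a tested setup: make the worker the container's main process, so Docker itself starts and stops it (and can restart it through a restart policy), which may make supervisor unnecessary. The Python entrypoint below builds the worker command quoted above and replaces itself with it via os.execvp, so the worker receives Docker's stop signal directly. The function names and the CKAN_INI fallback are assumptions for illustration; they are not part of XLoader or CKAN.

```python
import os

# Path taken from the worker command quoted in this issue.
DEFAULT_INI = "/etc/ckan/default/ckan.ini"


def build_worker_argv(ini_path):
    """Return the argv list for the CKAN background job worker."""
    return ["ckan", "-c", ini_path, "jobs", "worker"]


def run_worker():
    """Replace the current process with the worker; does not return on success."""
    # Assumption: CKAN_INI is set in the container environment.
    # If it is not, fall back to the default path above.
    ini_path = os.environ.get("CKAN_INI", DEFAULT_INI)
    argv = build_worker_argv(ini_path)
    # execvp swaps this Python process for the ckan CLI, so no wrapper
    # process is left sitting between Docker and the worker.
    os.execvp(argv[0], argv)
```

Calling run_worker() as the last step of the container's entrypoint script would hand control to the worker. ckanext.xloader.api_token would still need to be set before that call.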

Contributor Author

kowh-ai commented Jul 16, 2024

I have a background worker running in its own container; however, I have the following problem:

Replacement of DataPusher with XLoader:

CKAN (UI) runs in its own image/container
CKAN (background worker) also runs in its own image/container

These 2 containers are on the same Docker network

CKAN_SITE_URL is set to one of:
1. https://localhost:8443 in the “Production” Docker environment
2. http://localhost:5000 in the “Development” environment

In the past I have had to configure XLoader (worker) in the UI container as I couldn’t get it running in a separate container.

My questions:

Are any of these variables used for XLoader? I assume not…
CKAN_DATAPUSHER_URL
CKAN__DATAPUSHER__CALLBACK_URL_BASE
DATAPUSHER_REWRITE_RESOURCES
DATAPUSHER_REWRITE_URL

To get the background worker “triggered” to Express Load a dataset resource into the DataStore, I have had to include the xloader plugin in the CKAN UI container as well as the CKAN worker container. However, only the worker container runs: ckan -c ${CKAN_INI} jobs worker

I have tried this setup in both Production mode and Development mode. I get the following error in Dev mode:
[Screenshot: error in Dev mode, 2024-07-16 at 1:08:34 pm]

I get a similar error in Production mode. It seems XLoader uses the CKAN_SITE_URL variable for a callback to CKAN. I need to be able to call back to the CKAN container URL.
Update: it doesn't use CKAN_SITE_URL; it uses the download URL to call back to CKAN.

To work around this, I had to add an entry to the /etc/hosts file in the xloader container that assigns localhost to the ckan-dev IP address. After that, it loaded the tabular data into the DataStore.

/etc/hosts:
[Screenshot: /etc/hosts entry, 2024-07-16 at 1:04:01 pm]

Here is the log from a successful xloader load:
[Screenshot: xloader log, 2024-07-16 at 1:02:50 pm]

I must be missing something here…any idea?

Contributor Author

kowh-ai commented Jul 17, 2024

Update 1: I have tried updating both the CKAN_SITE_URL environment variable and the same parameter in the ckan.ini file on the xloader container so that it talks back to the CKAN container, but it still uses localhost, i.e.: http://localhost:5000/dataset/f29a8063-49e4-4b2b-ae2a-830282d19d2c/resource/2def947f-3bff-43c1-94ca-eab8973d09e5/download/sample.people.csv
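
To make the failure concrete: the worker fetches the resource from its download URL, and inside the worker container localhost refers to the worker itself, not to CKAN. The sketch below shows the rewrite that the /etc/hosts workaround effectively achieves, swapping a localhost host for the CKAN container's name. rewrite_localhost is a hypothetical helper, not something XLoader provides, and ckan-dev is assumed to be the CKAN container's name on the shared Docker network.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts that only make sense from the machine that generated the URL.
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def rewrite_localhost(url, container_host="ckan-dev"):
    """Point a localhost URL at the CKAN container instead (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.hostname not in LOCAL_HOSTS:
        # Already addressed to some other host: leave it untouched.
        return url
    # Keep the original port, if any. Any user:password part is dropped.
    if parts.port is None:
        netloc = container_host
    else:
        netloc = f"{container_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

Applied to the download URL above, this yields the same path on http://ckan-dev:5000, which the worker container can resolve over the Docker network.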

Contributor Author

kowh-ai commented Jul 18, 2024

Update 2: I have managed to get it working, but it's not a great solution. Docker containers can access the host network using the host.docker.internal hostname. If I add an entry to my macOS (Big Sur) hosts file that makes host.docker.internal an alias of localhost, I can use the URL http://host.docker.internal:5000 to access CKAN. This URL is also available within the running Docker containers, so a callback to the CKAN container via the host network is possible.

If I also add an entry to my macOS hosts file that makes ckan-dev an alias of localhost, and use "Dev Mode", I can get this working as well, because ckan-dev is the container name for the ckan container.
