
Fix broken Docker recipe and update dependencies #7

Merged: 2 commits merged into main on Dec 13, 2023
Conversation

@tuncK (Contributor) commented Nov 23, 2023

The jax version dictated by colabfold is too old, resulting in a Docker build error.

Also updated TensorFlow to 2.15 and CUDA to 12.2, along with other packages.

seaborn==0.12.2 \
voila==0.4.1 \
"colabfold[alphafold] @ git+https://github.com/sokrypton/ColabFold" && \
&& \
# As of Nov 2023, colabfold requires 0.3.25 <= jax < 0.4.0, which leads to build errors.
Member

@anuprulez you need to decide if we can remove that.

Contributor Author

Strictly speaking, the TensorFlow errors have been an issue since the previous release as well, and we now have another IT with a separate container (v0.2) that provides the colabfold service.

Option 2: if we want to keep it, I could try to provide colabfold from a conda environment installed inside this container, so that TensorFlow et al. still work.

@anuprulez ?

Member

Somehow I did not get this email even though I am subscribed to this repo. It is not even in my spam. Sorry for that!

I will have a look at it today :)

Member

I agree that Colabfold has issues with the latest versions of TensorFlow, maybe with CUDA as well. We can remove it from v0.3 and later versions of this Docker container. We already have it on v0.2 in case it is needed for my defense.
@tuncK @bgruening

Contributor Author

@bgruening, shall we merge this?

Member

Sure.

bgruening merged commit d8cebdf into main on Dec 13, 2023 (2 checks passed).
bgruening deleted the jax branch on December 13, 2023, 09:02.
@bgruening (Member)

@anuprulez do you want to create a new release?

@tuncK (Contributor, Author) commented Dec 13, 2023

@anuprulez is the IT on .eu still broken? Because that is why I started this.

@anuprulez (Member) commented Dec 14, 2023

@tuncK yes, it's still broken, I think, with the following error message when I run a TensorFlow notebook:

2023-12-14 14:09:26.827782: W tensorflow/tsl/framework/cpu_allocator_impl.cc:82] Allocation of 188160000 exceeds 10% of free system memory.
2023-12-14 14:09:28.328554: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.2
  /usr/local/cuda
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-12-14 14:09:28.339603: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-12-14 14:09:28.340107: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc

libdevice not found at ./libdevice.10.bc
	 [[{{node StatefulPartitionedCall_81}}]] [Op:__inference_train_function_14105]
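
The warning above already names the usual workaround: tell XLA where the CUDA tree lives via XLA_FLAGS. A minimal sketch, assuming the flag is set from the notebook before TensorFlow is imported; the path is a placeholder and must be a directory that actually contains nvvm/libdevice inside the container:

import os

# Point XLA at the CUDA installation before TensorFlow is imported, so the
# flag is seen when the first XLA computation is compiled. The path below is
# a placeholder; use whichever directory holds nvvm/libdevice in the container.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/local/cuda"

import tensorflow as tf  # imported only after the flag is set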

@anuprulez (Member)

I can create a new release, but only after verifying all the notebooks (except those using Colabfold) by running this container on my VM. Probably next week.

@anuprulez (Member) commented Dec 21, 2023

The newly released v0.4 of this tool throws the same error as above, as does v0.3:

libdevice not found at ./libdevice.10.bc

It is not possible to train models using TensorFlow: TF recognises the GPU but fails while training any model.
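
A minimal check along those lines, assuming TensorFlow 2.x inside the container; the toy model and the jit_compile flag are illustrative, not taken from the affected notebooks:

import numpy as np
import tensorflow as tf

# The GPU is detected ...
print(tf.config.list_physical_devices("GPU"))

# ... but an XLA-compiled training step is what triggers the libdevice lookup.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse", jit_compile=True)

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)  # fails with "libdevice not found" when XLA cannot locate CUDA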

bgruening added a commit to usegalaxy-eu/infrastructure-playbook that referenced this pull request Dec 21, 2023
bgruening added a commit to usegalaxy-eu/infrastructure-playbook that referenced this pull request Dec 21, 2023
@bgruening (Member)

Can you please try this again tomorrow? I tried usegalaxy-eu/infrastructure-playbook#1067

@anuprulez (Member)

I tried it, but unfortunately it did not work. I had already tried this solution directly in the Docker container as well. It still does not find the libdevice file, even though it is present at /opt/conda/nvvm/. I will look into it.
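
For what it is worth, the warning earlier in the thread searches for ${CUDA_DIR}/nvvm/libdevice, i.e. a libdevice subdirectory, so a file sitting directly under /opt/conda/nvvm/ would not be picked up even with the data-dir flag set. A hypothetical check, with all paths assumed rather than verified against the container:

from pathlib import Path

cuda_dir = Path("/opt/conda")                                   # candidate value for --xla_gpu_cuda_data_dir
expected = cuda_dir / "nvvm" / "libdevice" / "libdevice.10.bc"  # layout XLA searches for
found = cuda_dir / "nvvm" / "libdevice.10.bc"                   # where the file was reportedly seen

print("expected location exists:", expected.exists())
if not expected.exists() and found.exists():
    expected.parent.mkdir(parents=True, exist_ok=True)
    expected.symlink_to(found)  # make the on-disk layout match what XLA expects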

@tuncK (Contributor, Author) commented Dec 22, 2023

The last time I dealt with this, it had something to do with:

  • the paths it looks in by default
  • versions hardcoded in the filenames (/some/path/xlib.v123.so)
