GPU training doesn't work? #108

Open
yongshuo-Z opened this issue May 24, 2021 · 5 comments

yongshuo-Z commented May 24, 2021

Hi, thanks for your nice code.

When I train the model, it runs on the CPU instead of the GPU, which makes training quite slow.

I've installed tensorflow-gpu 1.14.0 and keras 2.2.5, and the environment works fine with other projects (they train on the GPU). Is there any configuration we need to set explicitly to make the GPU work? Thanks!
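
A quick way to narrow this kind of problem down is to check whether TensorFlow itself sees a GPU in the environment used for N2V. The snippet below is a minimal sketch and not part of N2V; the helper name `visible_gpu_names` is made up for illustration. It uses `device_lib.list_local_devices()`, which exists in both TF 1.x and TF 2.x.

```python
def visible_gpu_names():
    """Return the names of the GPUs TensorFlow detects.

    Returns None if TensorFlow cannot be imported in this environment.
    `visible_gpu_names` is a hypothetical helper, not an N2V function.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Each entry describes one local device; keep only the GPU entries.
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


if __name__ == "__main__":
    gpus = visible_gpu_names()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow is installed but sees no GPU")
    else:
        print("GPUs visible to TensorFlow:", gpus)
```

If the list is empty, the problem is likely in the TensorFlow/CUDA installation of that environment rather than in any N2V setting.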

Tokariew commented Nov 8, 2021

name: n2vv2
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - abseil-cpp=20210324.2=h9c3ff4c_0
  - absl-py=0.15.0=pyhd8ed1ab_0
  - aiohttp=3.7.4.post0=py39h3811e60_1
  - argon2-cffi=21.1.0=py39h3811e60_2
  - astunparse=1.6.3=pyhd8ed1ab_0
  - async-timeout=3.0.1=py_1000
  - async_generator=1.10=py_0
  - attrs=21.2.0=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - bleach=4.1.0=pyhd8ed1ab_0
  - blinker=1.4=py_1
  - brotlipy=0.7.0=py39h3811e60_1003
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2021.10.8=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - cachetools=4.2.4=pyhd8ed1ab_0
  - certifi=2021.10.8=py39hf3d152e_1
  - cffi=1.15.0=py39h4bc2ebd_0
  - chardet=4.0.0=py39hf3d152e_2
  - click=8.0.3=py39hf3d152e_1
  - cryptography=35.0.0=py39h95dcef6_2
  - cudatoolkit=11.3.1=ha36c431_9
  - cudnn=8.2.1.32=h86fa8c9_0
  - cupti=11.3.1=0
  - dataclasses=0.8=pyhc8e2a94_3
  - debugpy=1.5.1=py39he80948d_0
  - decorator=5.1.0=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - entrypoints=0.3=pyhd8ed1ab_1003
  - gast=0.4.0=pyh9f0ad1d_0
  - giflib=5.2.1=h36c2ea0_2
  - google-auth=1.35.0=pyh6c4a22f_0
  - google-auth-oauthlib=0.4.6=pyhd8ed1ab_0
  - google-pasta=0.2.0=pyh8c360ce_0
  - grpc-cpp=1.39.1=h850795e_1
  - grpcio=1.39.0=py39hff7568b_0
  - h5py=3.1.0=nompi_py39h25020de_100
  - hdf5=1.10.6=nompi_h6a2412b_1114
  - icu=68.2=h9c3ff4c_0
  - idna=2.10=pyh9f0ad1d_0
  - importlib-metadata=4.8.1=py39hf3d152e_1
  - importlib_resources=5.4.0=pyhd8ed1ab_0
  - ipykernel=6.4.2=py39hef51801_0
  - ipython=7.29.0=py39hef51801_1
  - ipython_genutils=0.2.0=py_1
  - jedi=0.18.0=py39hf3d152e_3
  - jinja2=3.0.2=pyhd8ed1ab_0
  - jpeg=9d=h36c2ea0_0
  - jsonschema=4.2.1=pyhd8ed1ab_0
  - jupyter_client=7.0.6=pyhd8ed1ab_0
  - jupyter_core=4.9.1=py39hf3d152e_0
  - jupyterlab_pygments=0.1.2=pyh9f0ad1d_0
  - keras=2.6.0=pyhd8ed1ab_0
  - keras-preprocessing=1.1.2=pyhd8ed1ab_0
  - krb5=1.19.2=hcc1bbae_3
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - libblas=3.9.0=12_linux64_openblas
  - libcblas=3.9.0=12_linux64_openblas
  - libcurl=7.79.1=h2574ce0_1
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.4.2=h9c3ff4c_4
  - libgcc-ng=11.2.0=h1d223b6_11
  - libgfortran-ng=11.2.0=h69a702a_11
  - libgfortran5=11.2.0=h5c6108e_11
  - libgomp=11.2.0=h1d223b6_11
  - liblapack=3.9.0=12_linux64_openblas
  - libnghttp2=1.43.0=h812cca2_1
  - libopenblas=0.3.18=pthreads_h8fe5266_0
  - libpng=1.6.37=h21135ba_2
  - libprotobuf=3.16.0=h780b84a_0
  - libsodium=1.0.18=h36c2ea0_1
  - libssh2=1.10.0=ha56f1ee_2
  - libstdcxx-ng=11.2.0=he4da1e4_11
  - libzlib=1.2.11=h36c2ea0_1013
  - markdown=3.3.4=pyhd8ed1ab_0
  - markupsafe=2.0.1=py39h3811e60_1
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - mistune=0.8.4=py39h3811e60_1005
  - multidict=5.2.0=py39h3811e60_1
  - nbclient=0.5.4=pyhd8ed1ab_0
  - nbconvert=6.2.0=py39hf3d152e_0
  - nbformat=5.1.3=pyhd8ed1ab_0
  - nccl=2.11.4.1=hdc17891_0
  - ncurses=6.2=h58526e2_4
  - nest-asyncio=1.5.1=pyhd8ed1ab_0
  - notebook=6.4.5=pyha770c72_0
  - numpy=1.19.5=py39hdbf815f_2
  - oauthlib=3.1.1=pyhd8ed1ab_0
  - openssl=1.1.1l=h7f98852_0
  - opt_einsum=3.3.0=pyhd8ed1ab_1
  - packaging=21.0=pyhd8ed1ab_0
  - pandoc=2.16.1=h7f98852_0
  - pandocfilters=1.5.0=pyhd8ed1ab_0
  - parso=0.8.2=pyhd8ed1ab_0
  - pexpect=4.8.0=pyh9f0ad1d_2
  - pickleshare=0.7.5=py_1003
  - pip=21.3.1=pyhd8ed1ab_0
  - prometheus_client=0.12.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.22=pyha770c72_0
  - protobuf=3.16.0=py39he80948d_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.21=pyhd8ed1ab_0
  - pygments=2.10.0=pyhd8ed1ab_0
  - pyjwt=2.3.0=pyhd8ed1ab_0
  - pyopenssl=21.0.0=pyhd8ed1ab_0
  - pyparsing=3.0.5=pyhd8ed1ab_0
  - pyrsistent=0.18.0=py39h3811e60_0
  - pysocks=1.7.1=py39hf3d152e_4
  - python=3.9.7=hb7a2778_3_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python-flatbuffers=1.12=pyhd8ed1ab_1
  - python_abi=3.9=2_cp39
  - pyu2f=0.1.5=pyhd8ed1ab_0
  - pyzmq=22.3.0=py39h37b5a0c_1
  - re2=2021.09.01=h9c3ff4c_0
  - readline=8.1=h46c0cb4_0
  - requests=2.25.1=pyhd3deb0d_0
  - requests-oauthlib=1.3.0=pyh9f0ad1d_0
  - rsa=4.7.2=pyh44b312d_0
  - scipy=1.7.1=py39hee8e79c_0
  - send2trash=1.8.0=pyhd8ed1ab_0
  - setuptools=58.5.3=py39hf3d152e_0
  - six=1.15.0=pyh9f0ad1d_0
  - snappy=1.1.8=he1b5a44_3
  - sqlite=3.36.0=h9cd32fc_2
  - tensorboard=2.6.0=pyhd8ed1ab_1
  - tensorboard-data-server=0.6.0=py39h95dcef6_1
  - tensorboard-plugin-wit=1.8.0=pyh44b312d_0
  - tensorflow=2.6.0=cuda112py39h9dc3950_2
  - tensorflow-base=2.6.0=cuda112py39h0b4cdfd_2
  - tensorflow-estimator=2.6.0=cuda112py39heacc632_2
  - termcolor=1.1.0=py_2
  - terminado=0.12.1=py39hf3d152e_1
  - testpath=0.5.0=pyhd8ed1ab_0
  - tk=8.6.11=h27826a3_1
  - tornado=6.1=py39h3811e60_2
  - traitlets=5.1.1=pyhd8ed1ab_0
  - typing-extensions=3.7.4.3=0
  - typing_extensions=3.7.4.3=py_0
  - tzdata=2021e=he74cb21_0
  - urllib3=1.26.7=pyhd8ed1ab_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - webencodings=0.5.1=py_1
  - werkzeug=2.0.1=pyhd8ed1ab_0
  - wheel=0.37.0=pyhd8ed1ab_1
  - wrapt=1.12.1=py39h3811e60_3
  - xz=5.2.5=h516909a_1
  - yarl=1.7.2=py39h3811e60_1
  - zeromq=4.3.4=h9c3ff4c_1
  - zipp=3.6.0=pyhd8ed1ab_0
  - zlib=1.2.11=h36c2ea0_1013
  - pip:
    - csbdeep==0.6.3
    - cycler==0.11.0
    - imagecodecs==2021.8.26
    - kiwisolver==1.3.2
    - matplotlib==3.4.3
    - pillow==8.4.0
    - ruamel-yaml==0.17.17
    - ruamel-yaml-clib==0.2.6
    - tifffile==2021.11.2
    - tqdm==4.62.3
prefix: /home/tokariew/.local/share/conda/envs/n2vv2

With this conda environment, GPU training works for me on Linux with an NVIDIA GPU. Hope it helps…

I installed n2v from GitHub and edited setup.py to bump the Keras version.

Collaborator

tibuch commented Nov 17, 2021

The most recent N2V version requires TF2. Could you try this combination:

conda create -n n2v_env python=3.7
conda activate n2v_env
conda install cudatoolkit=10.1 cudnn
pip install tensorflow==2.3
pip install n2v
pip install jupyter

zxy126 commented Nov 19, 2021

The most recent N2V version requires TF2. Could you try this combination:

conda create -n n2v_env python=3.7
conda activate n2v_env
conda install cudatoolkit=10.1 cudnn
pip install tensorflow==2.3
pip install n2v
pip install jupyter

I also added "X:\anaconda3\envs\n2v_env\Library\bin" to the system PATH. It works very well on Win10.
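
To check whether a directory such as the one above is actually on the PATH that Python sees, something like the following could be used. This is only a sketch using the standard library; `dir_on_path` is a hypothetical helper. The assumption is that this directory needs to be on PATH so that TensorFlow can find the CUDA and cuDNN libraries installed by conda.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of PATH.

    `path_value` defaults to the PATH of the current process.
    Hypothetical helper for illustration only.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalize separators, trailing slashes and (on Windows) letter case.
    target = os.path.normcase(os.path.normpath(directory))
    entries = [e for e in path_value.split(os.pathsep) if e]
    return any(os.path.normcase(os.path.normpath(e)) == target
               for e in entries)


if __name__ == "__main__":
    # Directory taken from the comment above; adjust it to your installation.
    env_bin = r"X:\anaconda3\envs\n2v_env\Library\bin"
    print(env_bin, "on PATH:", dir_on_path(env_bin))
```

Note that a change to the system PATH is only picked up by terminals and notebooks started after the change.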

Mrc010 commented Nov 26, 2021

I got a new GPU and can only use tensorflow==2.2, which is very slow, or tensorflow==1.15, which is slow:

conda create -n n2v python=3.7
conda activate n2v
conda install cudatoolkit=10.0 cudnn=7.6 tensorflow-estimator==1.15.1 keras==2.2.4 tensorflow-gpu==1.15
pip install n2v==0.2.1

Edit: I found a fast solution for CUDA 11.5 + TensorFlow 1.15:

conda create -n n2v python=3.8
conda activate n2v
pip install nvidia-pyindex
pip install nvidia-tensorflow
pip install nvidia-tensorboard
pip install n2v==0.2.1

cf. https://github.com/NVIDIA/tensorflow

Side note: this is on Ubuntu 20.04.

Edit 2: for TensorFlow 1.15, adding this to the notebook helps suppress annoying warnings and prevents excessive memory allocation:

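import tensorflow as tf

# Let TensorFlow grow GPU memory on demand instead of reserving it all up front
conf = tf.compat.v1.ConfigProto()
conf.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=conf)
# Only log errors, which hides the warning messages
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)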
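
For a TF2 environment (which the most recent N2V version requires, see above), a rough equivalent of the `allow_growth` setting might look like the sketch below. It is not taken from N2V; `enable_memory_growth` is a hypothetical helper, and it assumes TF 2.1 or later, where `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth` are available.

```python
def enable_memory_growth():
    """Ask TF2 to allocate GPU memory on demand.

    Returns the number of GPUs configured, or None if TensorFlow
    cannot be imported. Hypothetical helper, not part of N2V.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before the GPU is first used, otherwise TensorFlow
        # raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


if __name__ == "__main__":
    print("GPUs configured for memory growth:", enable_memory_growth())
```

It should be called at the very top of the notebook, before any model is built.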

Wuito commented Jan 19, 2023

The environment I am using is TF2 on Win11 with Anaconda:
python==3.9
tensorflow==2.7
CUDA 11.8
cuDNN 8.7
Refer to the author's README for the other environment requirements.
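
When comparing version combinations like the ones in this thread, it can help to print which CUDA and cuDNN versions the installed TensorFlow build reports. The following is a minimal sketch; `tf_build_summary` is a hypothetical helper, and `tf.sysconfig.get_build_info()` is assumed to be available (TF 2.3 or later). On older or CPU-only builds the CUDA fields are simply left out or set to None.

```python
def tf_build_summary():
    """Return the TensorFlow version and the CUDA/cuDNN versions it reports.

    Returns None if TensorFlow cannot be imported.
    Hypothetical helper for illustration only.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    summary = {"tensorflow": tf.__version__}
    # get_build_info is missing in older releases, so look it up defensively.
    get_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_info is not None:
        info = get_info()
        summary["cuda"] = info.get("cuda_version")
        summary["cudnn"] = info.get("cudnn_version")
    return summary


if __name__ == "__main__":
    print(tf_build_summary())
```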
