2020-01-15 15:34:26.438882: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-01-15 15:34:26.438956: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-01-15 15:34:26.438977: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[2020-01-15 15:34:27,223] [INFO] The trainer start
2020-01-15 15:34:27.225280: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-15 15:34:27.230277: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-01-15 15:34:27.230318: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: user-PowerEdge-T640
2020-01-15 15:34:27.230328: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: user-PowerEdge-T640
2020-01-15 15:34:27.230416: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 430.26.0
2020-01-15 15:34:27.230448: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 430.26.0
2020-01-15 15:34:27.230459: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 430.26.0
2020-01-15 15:34:27.231251: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-15 15:34:27.270280: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600000000 Hz
2020-01-15 15:34:27.277139: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56bbad0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-15 15:34:27.277196: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
[2020-01-15 15:34:27,287] [WARNING] Some requested devices in tf.distribute.Strategy are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0
[2020-01-15 15:34:27,290] [INFO] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
[2020-01-15 15:34:27,603] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,604] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,721] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,723] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,735] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,736] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,748] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,749] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,914] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,915] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:29,108] [INFO] [x] Get dataset from
[2020-01-15 15:34:49,291] [INFO] the datasets contains 116104 samples
[2020-01-15 15:36:17,209] [INFO] befor balance the dataset contains 116104 images
[2020-01-15 15:36:17,209] [INFO] after balanced the datasets contains 8150004 samples
[0115 15:36:21 @parallel.py:231] [MultiProcessRunner] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
[0115 15:36:21 @argtools.py:146] WRN Starting a process with 'fork' method is not safe and may consume unnecessary extra CPU memory. Use 'forkserver/spawn' method (available after Py3.4) instead if you run into any issues. See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
[2020-01-15 15:36:21,617] [INFO] [x] Get dataset from
[2020-01-15 15:36:22,250] [INFO] the datasets contains 6111 samples
[2020-01-15 15:36:26,920] [INFO] befor balance the dataset contains 6111 images
[2020-01-15 15:36:26,920] [INFO] after balanced the datasets contains 428541 samples
[0115 15:36:27 @parallel.py:231] [MultiProcessRunner] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
Traceback (most recent call last):
File "train.py", line 108, in
main()
File "train.py", line 105, in main
strategy)
File "/disk/wangpu/face_algo/face_landmark/face_landmark_tf2/lib/core/base_trainer/net_work.py", line 196, in custom_loop
train_dist_dataset,epoch)
File "/disk/wangpu/face_algo/face_landmark/face_landmark_tf2/lib/core/base_trainer/net_work.py", line 147, in distributed_train_epoch
for one_batch in ds:
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 565, in iter
self._input_workers)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 1011, in _create_iterators_per_worker
worker_devices)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 864, in init
self._make_iterator()
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 870, in _make_iterator
self._dataset, self._devices)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/multi_device_iterator_ops.py", line 292, in init
self._experimental_slack)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/multi_device_iterator_ops.py", line 202, in _create_device_dataset
ds = ds.prefetch(prefetch_buffer_size)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1013, in prefetch
return PrefetchDataset(self, buffer_size)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 4114, in init
buffer_size, dtype=dtypes.int64, name="buffer_size")
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
return constant_op.constant(value, dtype, name=name)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
allow_broadcast=True)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 266, in _constant_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.
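For context on where this fails: cuInit returns CUDA_ERROR_NO_DEVICE, so TensorFlow only registers a CPU device, yet MirroredStrategy is still built around /job:localhost/replica:0/task:0/device:GPU:0. When the distributed input pipeline tries to place its per-device iterator on that GPU, it raises the "unknown device" RuntimeError above. A minimal sketch (not the repository's actual train.py, just an illustration assuming TF 2.x) of building the strategy only from devices TensorFlow can actually see:

```python
import tensorflow as tf

# Hypothetical guard, not the repo's code: choose the distribution strategy
# based on the GPUs TensorFlow has registered, instead of hard-coding GPU:0.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Mirror across all visible GPUs; no device names are hard-coded.
    strategy = tf.distribute.MirroredStrategy()
else:
    # No CUDA-capable device was detected (CUDA_ERROR_NO_DEVICE above),
    # so fall back to a single-device strategy on the CPU.
    strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")

print("Replicas in sync:", strategy.num_replicas_in_sync)
```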
Hi, could you tell me what is causing this error?