2020-01-15 15:34:26.438882: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-01-15 15:34:26.438956: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-01-15 15:34:26.438977: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[2020-01-15 15:34:27,223] [INFO] The trainer start
2020-01-15 15:34:27.225280: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-15 15:34:27.230277: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-01-15 15:34:27.230318: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: user-PowerEdge-T640
2020-01-15 15:34:27.230328: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: user-PowerEdge-T640
2020-01-15 15:34:27.230416: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 430.26.0
2020-01-15 15:34:27.230448: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 430.26.0
2020-01-15 15:34:27.230459: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 430.26.0
2020-01-15 15:34:27.231251: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-15 15:34:27.270280: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2600000000 Hz
2020-01-15 15:34:27.277139: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56bbad0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-15 15:34:27.277196: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
[2020-01-15 15:34:27,287] [WARNING] Some requested devices in tf.distribute.Strategy are not visible to TensorFlow: /job:localhost/replica:0/task:0/device:GPU:0
[2020-01-15 15:34:27,290] [INFO] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
[2020-01-15 15:34:27,603] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,604] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,721] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,723] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,735] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,736] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,748] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,749] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,914] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:27,915] [INFO] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[2020-01-15 15:34:29,108] [INFO] [x] Get dataset from
[2020-01-15 15:34:49,291] [INFO] the datasets contains 116104 samples
[2020-01-15 15:36:17,209] [INFO] befor balance the dataset contains 116104 images
[2020-01-15 15:36:17,209] [INFO] after balanced the datasets contains 8150004 samples
[0115 15:36:21 @parallel.py:231] [MultiProcessRunner] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
[0115 15:36:21 @argtools.py:146] WRN Starting a process with 'fork' method is not safe and may consume unnecessary extra CPU memory. Use 'forkserver/spawn' method (available after Py3.4) instead if you run into any issues. See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
[2020-01-15 15:36:21,617] [INFO] [x] Get dataset from
[2020-01-15 15:36:22,250] [INFO] the datasets contains 6111 samples
[2020-01-15 15:36:26,920] [INFO] befor balance the dataset contains 6111 images
[2020-01-15 15:36:26,920] [INFO] after balanced the datasets contains 428541 samples
[0115 15:36:27 @parallel.py:231] [MultiProcessRunner] Will fork a dataflow more than one times. This assumes the datapoints are i.i.d.
Traceback (most recent call last):
File "train.py", line 108, in
main()
File "train.py", line 105, in main
strategy)
File "/disk/wangpu/face_algo/face_landmark/face_landmark_tf2/lib/core/base_trainer/net_work.py", line 196, in custom_loop
train_dist_dataset,epoch)
File "/disk/wangpu/face_algo/face_landmark/face_landmark_tf2/lib/core/base_trainer/net_work.py", line 147, in distributed_train_epoch
for one_batch in ds:
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 565, in iter
self._input_workers)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 1011, in _create_iterators_per_worker
worker_devices)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 864, in init
self._make_iterator()
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/distribute/input_lib.py", line 870, in _make_iterator
self._dataset, self._devices)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/multi_device_iterator_ops.py", line 292, in init
self._experimental_slack)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/multi_device_iterator_ops.py", line 202, in _create_device_dataset
ds = ds.prefetch(prefetch_buffer_size)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1013, in prefetch
return PrefetchDataset(self, buffer_size)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 4114, in init
buffer_size, dtype=dtypes.int64, name="buffer_size")
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
return constant_op.constant(value, dtype, name=name)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
allow_broadcast=True)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 266, in _constant_impl
t = convert_to_eager_tensor(value, ctx, dtype)
File "/home/puwang/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.
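For context on where this fails: cuInit returns CUDA_ERROR_NO_DEVICE, so TensorFlow only registers a CPU device, yet MirroredStrategy is still built around /job:localhost/replica:0/task:0/device:GPU:0. When the distributed input pipeline tries to place its per-device iterator on that GPU, it raises the "unknown device" RuntimeError above. A minimal sketch (not the repository's actual train.py, just an illustration assuming TF 2.x) of building the strategy only from devices TensorFlow can actually see:

```python
import tensorflow as tf

# Hypothetical guard, not the repo's code: choose the distribution strategy
# based on the GPUs TensorFlow has registered, instead of hard-coding GPU:0.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Mirror across all visible GPUs; no device names are hard-coded.
    strategy = tf.distribute.MirroredStrategy()
else:
    # No CUDA-capable device was detected (CUDA_ERROR_NO_DEVICE above),
    # so fall back to a single-device strategy on the CPU.
    strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")

print("Replicas in sync:", strategy.num_replicas_in_sync)
```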
Hi, could you tell me what is causing this error?