The initialization time is too long during mnist test #170

Natelu · 2022-11-23T09:41:49Z

In my MNIST training session, the time from the `Creating TensorFlow device` log line until the task actually starts running is too long (about 5 minutes before it is ready).

2022-11-23 08:15:22.173334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2022-11-23 08:15:22.173363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1
2022-11-23 08:15:22.173375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y
2022-11-23 08:15:22.173384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y
2022-11-23 08:15:22.173402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:3b:00.0, compute capability: 7.0)
2022-11-23 08:15:22.173450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla V100-PCIE-16GB, pci bus id: 0000:44:00.0, compute capability: 7.0)
[------------- TAKES ABOUT 2 MINUTES ---------------------]
Initialized!
[------------- TAKES ABOUT 3 MINUTES ---------------------]
Step 0 (epoch 0.00), 2118.7 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.5%
duration between initialized and running is %d s 210.556521893
duration between initialized and running is %d s 210.559849024
duration between initialized and running is %d s 210.563081026
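The two gaps marked in the log can be measured separately rather than estimated by eye. The sketch below is a minimal, hypothetical helper (`PhaseTimer` is not part of TensorFlow or of the test script) that records the wall-clock time of each startup phase. It also applies the format string with `%`, since the literal `%d s` in the last three log lines suggests the original print call did not interpolate its value; that reading is an assumption based only on the output shown.

```python
import time


class PhaseTimer:
    """Record wall-clock time spent in named startup phases (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()
        self.phases = []  # list of (phase name, elapsed seconds)

    def mark(self, name):
        """Close the current phase under `name` and start timing the next one."""
        now = self._clock()
        self.phases.append((name, now - self._last))
        self._last = now

    def report(self):
        """Return one line per phase, with the value actually interpolated."""
        return "\n".join("%s: %.3f s" % (name, secs) for name, secs in self.phases)


if __name__ == "__main__":
    timer = PhaseTimer()
    time.sleep(0.01)  # stand-in for session and device creation
    timer.mark("device creation -> Initialized!")
    time.sleep(0.01)  # stand-in for the first training step
    timer.mark("Initialized! -> Step 0")
    print(timer.report())
```

In the real script, the `mark` calls would sit right after variable initialization and right after the first step, which would show whether the delay is spent before or after `Initialized!`.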

Base environment

Device: Tesla V100-PCIE-16GB; Driver Version: 470.141.03; CUDA Version: 11.4

System ENV

KUBE: v1.23.10
RUNC: 1.1.1
Containerd: v1.6.4
OS Kernel: Linux 3.10.0-1160.el7.x86_64
OS version: CentOS Linux 7 (Core)
CPU: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Pod Resource:

kind: Deployment
metadata:
  labels:
    k8s-app: vcuda-test
    qcloud-app: vcuda-test
  name: vcuda-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: vcuda-test
  template:
    metadata:
      labels:
        k8s-app: vcuda-test
        qcloud-app: vcuda-test
    spec:
      containers:
      - command:
        - sleep
        - 360000s
        env:
        - name: PATH
          value: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        image: <internal-repository>/tensorflow-gputest:0.2
        imagePullPolicy: IfNotPresent
        name: tensorflow-test
        resources:
          limits:
            cpu: "4"
            memory: 8Gi
            tencent.com/vcuda-core: "200"
            tencent.com/vcuda-memory: "30"
          requests:
            cpu: "4"
            memory: 8Gi
            tencent.com/vcuda-core: "200"
            tencent.com/vcuda-memory: "30"
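
For readers unfamiliar with the `tencent.com/vcuda-*` resources, the request above can be translated into GPUs and memory. This is a rough sketch that assumes the units commonly documented for GPU Manager (100 `vcuda-core` units per whole GPU, 256 MiB per `vcuda-memory` unit); neither unit is stated in this report, so treat both constants as assumptions. The function name is hypothetical.

```python
# Assumed units (not stated in the report):
#   100 vcuda-core units  = one whole GPU
#   1 vcuda-memory unit   = 256 MiB
CORE_UNITS_PER_GPU = 100
MIB_PER_MEMORY_UNIT = 256


def describe_vcuda_request(vcuda_core, vcuda_memory):
    """Convert vcuda resource counts into (GPU count, memory in MiB)."""
    gpus = vcuda_core / CORE_UNITS_PER_GPU
    memory_mib = vcuda_memory * MIB_PER_MEMORY_UNIT
    return gpus, memory_mib


if __name__ == "__main__":
    gpus, memory_mib = describe_vcuda_request(200, 30)
    print("GPUs: %.1f, memory: %d MiB" % (gpus, memory_mib))
```

Under these assumptions the Pod asks for two whole GPUs, which matches the two `Tesla V100-PCIE-16GB` devices in the log, and 7680 MiB of GPU memory.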
