This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
This is my config:
experimentName: example_mnist
trialConcurrency: 1
maxTrialNumber: 1000
searchSpaceFile: search_space.json
experimentWorkingDirectory: ./nni_files/exps
trialCommand: python run_trainer.py
trialCodeDirectory: .
trialGpuNumber: 1
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: local
  useActiveGpu: true
Before I adapted my code for NNI, I had confirmed that my original model ran on the GPU. Now that I launch it through NNI for automatic hyperparameter tuning, the trial no longer uses the GPU.
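To narrow this down, a small check can be placed at the top of run_trainer.py to report which GPUs the trial process is allowed to see. This is only a minimal sketch: it assumes GPU visibility for the trial is controlled through the `CUDA_VISIBLE_DEVICES` environment variable, and `describe_gpu_visibility` is a hypothetical helper name, not part of NNI.

```python
import os


def describe_gpu_visibility(env=None):
    """Summarize what CUDA_VISIBLE_DEVICES lets this process see.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    Hypothetical helper for debugging, not an NNI API.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Variable not set: CUDA applies no restriction.
        return "CUDA_VISIBLE_DEVICES is unset: all GPUs are visible"
    devices = [d.strip() for d in raw.split(",") if d.strip()]
    if not devices or devices[0] == "-1":
        # Empty value or a leading -1 hides every device from CUDA.
        return "CUDA_VISIBLE_DEVICES hides every GPU"
    return "visible GPU ids: " + ",".join(devices)


if __name__ == "__main__":
    print(describe_gpu_visibility())
```

If the trial's output reports that every GPU is hidden, the problem is in how the trial was launched rather than in the model code.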
This is a partial log.
[2023-10-17 20:08:43] INFO (main) Start NNI manager
[2023-10-17 20:08:43] INFO (RestServer) Starting REST server at port 8080, URL prefix: "/"
[2023-10-17 20:08:43] INFO (RestServer) REST server started.
[2023-10-17 20:08:43] INFO (NNIDataStore) Datastore initialization done
[2023-10-17 20:08:43] INFO (NNIManager) Starting experiment: vns84rok
[2023-10-17 20:08:43] INFO (NNIManager) Setup training service...
[2023-10-17 20:08:43] INFO (NNIManager) Setup tuner...
[2023-10-17 20:08:43] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2023-10-17 20:08:44] INFO (NNIManager) Add event listeners
[2023-10-17 20:08:44] INFO (LocalV3.local) Start
[2023-10-17 20:08:44] INFO (NNIManager) NNIManager received command from dispatcher: ID,
[2023-10-17 20:08:44] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"weight_dn4": 0.8581659030726451, "weight_simsiam": 0.07937784001819415}, "parameter_index": 0}
[2023-10-17 20:08:44] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 0,
  hyperParameters: {
    value: '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"weight_dn4": 0.8581659030726451, "weight_simsiam": 0.07937784001819415}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2023-10-17 20:08:44] INFO (GpuInfoCollector) Forced update: {
  gpuNumber: 1,
  driverVersion: '528.49',
  cudaVersion: 12000,
  gpus: [
    {
      index: 0,
      model: 'NVIDIA GeForce RTX 3090',
      cudaCores: 10496,
      gpuMemory: 25769803776,
      freeGpuMemory: 24810237952,
      gpuCoreUtilization: 0.13,
      gpuMemoryUtilization: 0.15
    }
  ],
I have read many posts and still cannot solve my problem. T-T