Support mixed precision training #2455
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2455. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.
cibot: @jihochu, A builder check could not be completed because one of the checkers did not finish. To find out the reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2455-202402021547410.65952706336975-af2e8829e8e0ac70333370e438e9b7b37bc604f2/.
cibot: @jihochu, A builder check could not be completed because one of the checkers did not finish. To find out the reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2455-202402021637130.47981810569763-5ce56ff64b70e29561125de65169bff8ee06a41d/.
It checks derivative validity after backwarding, and applies the gradient only if the derivative validation succeeds. Signed-off-by: Jiho Chu <[email protected]>
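To illustrate the idea in this commit, a minimal, self-contained sketch follows; the plain float buffers and function names are illustrative only, not nntrainer's actual Tensor or RunLayerContext API. The gradient is scanned for NaN/Inf after backwarding, and the update is applied only when every value is finite.

#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Returns true only if every element of the derivative is finite (no NaN/Inf).
bool derivative_is_valid(const std::vector<float> &grad) {
  for (float g : grad)
    if (!std::isfinite(g))
      return false;
  return true;
}

// Apply the gradient only when the derivative validation succeeds;
// otherwise skip this update step.
void apply_gradient_if_valid(std::vector<float> &weight,
                             const std::vector<float> &grad, float lr) {
  if (!derivative_is_valid(grad)) {
    std::cout << "invalid derivative, skipping update\n";
    return;
  }
  for (std::size_t i = 0; i < weight.size(); ++i)
    weight[i] -= lr * grad[i];
}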
A clone method taking a tensor type is added for creating a tensor with a different data type. Also, some convenience methods for the loss scale are added. Signed-off-by: Jiho Chu <[email protected]>
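As a rough illustration of what cloning into a different data type amounts to (templated on plain element types purely for exposition, not nntrainer's Tensor::clone signature): the data is copied while each value is cast to the target type.

#include <vector>

// Illustrative only: copy a tensor's data into a new buffer whose element
// type differs from the source, casting each value on the way.
template <typename To, typename From>
std::vector<To> clone_as(const std::vector<From> &src) {
  std::vector<To> dst;
  dst.reserve(src.size());
  for (const From &v : src)
    dst.push_back(static_cast<To>(v));
  return dst;
}

// Usage: std::vector<double> wide = clone_as<double>(std::vector<float>{1.0f});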
It adds tests for conv2d fp16. Signed-off-by: Jiho Chu <[email protected]>
It fixes doxygen comments reported by the clang-format checker. Signed-off-by: Jiho Chu <[email protected]>
It installs the loss_layer header file for custom loss layers. Signed-off-by: Jiho Chu <[email protected]>
It is assumed that activations and weights are fully compatible, so no conversion is necessary. The input and loss layers are exceptions, because input data and label data are currently assumed to always be float32. Signed-off-by: Jiho Chu <[email protected]>
An internal tensor or a gradient may contain an invalid value. This patch checks the validity of the data and fixes it. Also, the sscal API is replaced with scopy for setZero, because scaling produces an invalid value when the input value is already invalid. Signed-off-by: Jiho Chu <[email protected]>
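The reason scaling cannot clear a buffer that may already hold invalid values: 0 * NaN is still NaN, so an sscal-based setZero leaves the garbage in place, while an overwrite always yields zeros. A small self-contained sketch, with plain loops standing in for the BLAS sscal/scopy calls:

#include <algorithm>
#include <vector>

// Mimics a setZero() built on sscal: x[i] *= 0. If x[i] is already NaN or
// Inf, the product is NaN, so the buffer is NOT cleaned.
void set_zero_by_scaling(std::vector<float> &x) {
  for (float &v : x)
    v *= 0.0f;
}

// Mimics a setZero() built on a copy/fill: the previous contents are ignored,
// so the buffer is guaranteed to hold zeros afterwards.
void set_zero_by_copy(std::vector<float> &x) {
  std::fill(x.begin(), x.end(), 0.0f);
}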
@jihochu, 💯 All CI checkers are successfully verified. Thanks.
Recommendation: Keep a PR with every related commit as a test basis and mark it "Do Not Merge" or "Draft PR".
if (num_w_opt_m > 0)
  run_context->getWeightOptMasterVar(i, j).read(file);
else
  run_context->getWeightOptVar(i, j).read(file);
This needs to be reversed. The base model data needs to be saved in FP16, not FP32. We could read the FP16 data and copy it into the master weight.
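A hedged sketch of the direction this comment asks for, with illustrative names rather than the PR's read()/master-weight API: the serialized copy is FP16, and on load it is widened back into the FP32 master weights. The half-to-float helper below handles normals, zero, and inf/NaN only, to keep the sketch short.

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Widen IEEE 754 half bits to float (subnormals are flushed to zero here).
float half_to_float(uint16_t h) {
  uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t mant = h & 0x3FFu;
  uint32_t bits;
  if (exp == 0)
    bits = sign;                              // +/-0 (subnormals dropped)
  else if (exp == 31)
    bits = sign | 0x7F800000u | (mant << 13); // inf / NaN
  else
    bits = sign | ((exp + 112u) << 23) | (mant << 13);
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// The checkpoint holds the FP16 weights; the FP32 master copy is rebuilt by
// widening them, instead of saving and reloading the FP32 master directly.
void restore_master_from_fp16(const std::vector<uint16_t> &fp16_weights,
                              std::vector<float> &fp32_master) {
  fp32_master.resize(fp16_weights.size());
  for (std::size_t i = 0; i < fp16_weights.size(); ++i)
    fp32_master[i] = half_to_float(fp16_weights[i]);
}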
                           weight_tensor_type);
TensorDim hidden_state_dim(batch_size, 1, max_timestep, unit,
                           weight_tensor_type);
hidden_state_dim.setDataType(context.getActivationDataType());
This could be just TensorDim hidden_state_dim(batch_size, 1, max_timestep, unit, context.getActivationDataType()).
void Manager::deallocateWeights() { weight_pool.deallocate(); }
void Manager::deallocateWeights() {
  weight_pool.deallocate();
  weight_master_pool.deallocate();
I don't think we need a separate pool for the weight master.
dim_a.setDataType(act_type);
var = weight_pool.requestOrExtend(shared_name, dim_a, var_exec_order,
                                  var_ls, t_initializer);
var_m = weight_master_pool.requestOrExtend(
I think the tensor pool can manage this if we just request it from weight_pool.
dim_a.setDataType(act_type);
var = weight_pool.request(name, dim_a, var_exec_order, var_ls,
                          t_initializer);
var_m = weight_master_pool.request(name, dim, var_exec_order, var_ls,
The execution order of var_m should be applyGradient_order only.
@@ -353,10 +363,15 @@ sharedConstTensors NetworkGraph::forwarding(
  bool training,
  std::function<void(std::shared_ptr<LayerNode>, bool)> forwarding_op,
  std::function<bool(void *userdata)> stop_cb, void *userdata) {

for (auto w : clip_weights) {
I wonder if the gradient clip property also has to be enabled to use mixed precision training. I guess this PR doesn't consider the case where mixed precision and gradient clipping are enabled together.
This PR is to update the mixed precision layer.
- integrate nnstreamer#2568 & nnstreamer#2455
- will update more tests

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>
closed by #2663
It adds a loss scale factor for removing invalid data while training.
The factor is dynamically calculated during the gradient clipping step, and it is initially disabled until
the loss scale property is set.
The fc/pooling/conv2d/softmax layers are modified for loss scale and mixed tensor types.
Signed-off-by: Jiho Chu [email protected]
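The description above says the loss scale factor is adjusted dynamically during the gradient clipping step and only takes effect once the loss scale property is set. A minimal, self-contained sketch of that kind of dynamic loss scaling follows; the initial scale, growth interval, and back-off policy are illustrative assumptions, not the values used in this PR.

#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// Illustrative dynamic loss scaler: the loss is multiplied by `scale` before
// backwarding; after backwarding, gradients are unscaled and checked. If any
// gradient is NaN/Inf, the step is skipped and the scale is halved; after
// enough consecutive good steps, the scale is doubled again.
struct LossScaler {
  float scale = 65536.0f;    // assumed initial scale, not the PR's default
  int good_steps = 0;
  int growth_interval = 2000; // assumed growth interval

  // Returns true if the (unscaled) gradients are usable for the update.
  bool unscale_and_check(std::vector<float> &grads) {
    bool finite = true;
    for (float &g : grads) {
      g /= scale;
      if (!std::isfinite(g))
        finite = false;
    }
    if (!finite) {
      scale = std::max(1.0f, scale * 0.5f); // back off on overflow
      good_steps = 0;
      return false;
    }
    if (++good_steps >= growth_interval) {
      scale *= 2.0f;                        // grow again when stable
      good_steps = 0;
    }
    return true;
  }
};

int main() {
  LossScaler scaler;
  std::vector<float> grads = {1.0f * scaler.scale, 2.0f * scaler.scale};
  if (scaler.unscale_and_check(grads))
    std::cout << "apply gradients, scale=" << scaler.scale << '\n';
  else
    std::cout << "skip step, scale lowered to " << scaler.scale << '\n';
}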