- Because the classifier weights are norm-normalized, the inter-class distance fluctuates around a constant. The value of this constant depends on the batch size. (A sketch for monitoring this is given after the list.)
- When fine-tuning the hyperparameter `alpha_start_value`, watch out for the A-Softmax loss exploding: too large a value makes the loss explode, while too small a value weakens the MHE regularization. (A sketch of how this term enters the total loss is given after the list.)
- Empirically, `alpha_start_value` around 1~10 suits batch size 256 and 0.1~2 suits batch size 128, though these ranges are not fully verified.
- Because of the trade-off between the A-Softmax loss and the MHE regularization, the A-Softmax loss of SphereFace+ with MHE is slightly larger than that of SphereFace. Don't worry; this is expected.
- If the A-Softmax loss explodes, kill the training and restart it. (You may need to fine-tune `alpha_start_value`; a simple explosion check is sketched after the list.)
- Larger values of the SphereFace margin parameter (e.g. m=4) make training less stable and yield smaller testing gains. With m=1 or m=2 the training process is noticeably more stable; results can be found in README.md.
- We highly recommend the noise-controlled dataset IMDb-Face (ECCV 2018). Interested users can try training SphereFace+ / SphereFace on the IMDb-Face dataset.
- (To be supplemented... :)
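
The inter-class distance mentioned in the first bullet can be monitored from the weights of the final fully-connected layer. Below is a minimal PyTorch-style sketch of one way to do this; it is illustrative only (the function name `min_interclass_angle` and the dummy shapes are made up here, not taken from this repository):

```python
import torch
import torch.nn.functional as F

def min_interclass_angle(weight: torch.Tensor) -> float:
    """Smallest pairwise angle (in radians) between class weight vectors.

    `weight` is the (num_classes, feat_dim) matrix of the final FC layer;
    A-Softmax and MHE both operate on the normalized rows of this matrix.
    """
    w = F.normalize(weight, dim=1)       # unit-norm rows on the hypersphere
    cos = (w @ w.t()).clamp(-1.0, 1.0)   # pairwise cosine similarities
    cos.fill_diagonal_(-1.0)             # exclude self-pairs
    return torch.acos(cos.max()).item()  # largest cosine = smallest angle

if __name__ == "__main__":
    fc_weight = torch.randn(100, 512)    # hypothetical 100-class, 512-dim layer
    print(min_interclass_angle(fc_weight))
```

Logged every few hundred iterations, this value should hover around the batch-size-dependent constant described above.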
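The trade-off controlled by `alpha_start_value` can be written out explicitly: the MHE term penalizes the hyperspherical energy of the normalized class weights, and `alpha_start_value` scales it against the A-Softmax loss. The sketch below uses the Riesz s-energy form (s=1) from the MHE paper; `mhe_energy`, `fc_weight`, and the placeholder loss value are assumptions for illustration, not this repository's code:

```python
import torch
import torch.nn.functional as F

def mhe_energy(weight: torch.Tensor, s: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Hyperspherical energy of the class weights (Riesz s-energy, s=1)."""
    w = F.normalize(weight, dim=1)                  # project rows onto the unit sphere
    dist = torch.cdist(w, w) + eps                  # pairwise Euclidean distances
    n = w.size(0)
    off_diag = ~torch.eye(n, dtype=torch.bool, device=w.device)  # drop self-distances
    return dist[off_diag].pow(-s).sum() / (n * (n - 1))

if __name__ == "__main__":
    fc_weight = torch.randn(100, 512, requires_grad=True)  # hypothetical FC weights
    a_softmax_loss = torch.tensor(9.0)  # stand-in for the A-Softmax loss value
    alpha_start_value = 1.0             # 1~10 for batch 256, 0.1~2 for batch 128 (per the notes above)
    total_loss = a_softmax_loss + alpha_start_value * mhe_energy(fc_weight)
    print(total_loss.item())
```

Reading the last line makes the tuning advice concrete: a large `alpha_start_value` lets the energy term dominate and can blow up the total loss, while a small one makes the regularizer negligible.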
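For the kill-and-restart advice, a simple watchdog can flag a diverging run early so it can be restarted with a smaller `alpha_start_value`. The threshold below is an illustrative assumption, not a value from this repository:

```python
import math

def loss_exploded(loss_value: float, threshold: float = 80.0) -> bool:
    """Heuristic divergence check: NaN, inf, or an implausibly large loss.

    Set `threshold` well above the loss range seen during healthy training.
    """
    return math.isnan(loss_value) or math.isinf(loss_value) or loss_value > threshold

# Inside a training loop (sketch):
#   if loss_exploded(loss.item()):
#       raise RuntimeError("A-Softmax loss exploded; restart with a smaller alpha_start_value")
```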