eval can be by iteration number, not epoch #1

Open
christ1ne opened this issue Oct 19, 2018 · 6 comments


@christ1ne

SSD runs evaluation at iteration counts rather than at epochs, so the has_eval_epoch check fails:

$ python mlp_compliance.py run.log.7
FAIL: Check has_eval_epoch failed on
:::MLPv0.5.0 ssd 1539850795.942373991 (train.py:134) eval_accuracy: {"iteration": 120000, "value": 0.1751619840501347}
FAILED: compliance errors.
$ grep -r has_eval_epoch .
./configs/v0.5.0_common.yaml: NAME: has_eval_epoch
./configs/v0.5.0_level1.yaml: NAME: has_eval_epoch
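
To make the failure concrete, here is a minimal sketch that parses the log line above and inspects its JSON payload. The regular expression and the `has_epoch_key` helper are assumptions for illustration; they are not the checker's actual implementation, which lives in the YAML configs listed above.

```python
import json
import re

# Log line copied verbatim from the failing run above.
LINE = (':::MLPv0.5.0 ssd 1539850795.942373991 (train.py:134) '
        'eval_accuracy: {"iteration": 120000, "value": 0.1751619840501347}')

# Assumed layout: prefix+version, benchmark, timestamp, (file:line), tag, optional JSON value.
PATTERN = re.compile(
    r'^:::MLPv(?P<version>\S+) (?P<benchmark>\S+) (?P<timestamp>\S+) '
    r'\((?P<file>[^:]+):(?P<lineno>\d+)\) (?P<tag>\w+)(?:: (?P<value>.*))?$'
)


def parse_line(line):
    """Split one log line into its fields; returns None if it does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields['value'] = json.loads(fields['value']) if fields['value'] else None
    return fields


def has_epoch_key(entry):
    """Hypothetical stand-in for has_eval_epoch: is there an "epoch" key in the value?"""
    return isinstance(entry['value'], dict) and 'epoch' in entry['value']


entry = parse_line(LINE)
print(entry['tag'], sorted(entry['value']), has_epoch_key(entry))
```

Under these assumptions the payload carries only "iteration" and "value", so any check that requires an "epoch" key on eval_accuracy would reject this line.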

@bitfort
Owner

bitfort commented Oct 19, 2018

I'll take a look at this and what it should print.

@christ1ne
Author

Please check https://github.com/mlperf/training/blob/ssd_logging_v2/single_stage_detector/ssd/train.py

The lr is adjusted at specific iteration numbers, so there won't be the same number of lr prints as epoch prints.

@bitfort
Owner

bitfort commented Oct 19, 2018

To clarify, is "iteration" counting batches or examples?

@bitfort
Owner

bitfort commented Oct 19, 2018

I've been talking with some engineers. Could we round to the epoch number for this print, and then print the exact iteration number in a separate tag? Do you think this could work?
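
A rough sketch of what that proposal could look like, assuming "iteration" counts batches (the open question above) and that "round" means rounding down to the epoch containing the iteration. The `eval_iteration` tag name, the batch size, and the dataset size are all hypothetical placeholders, not values from SSD or the compliance checker.

```python
def iteration_to_epoch(iteration, batch_size, dataset_size):
    """0-based epoch containing this iteration, assuming one iteration = one batch."""
    examples_seen = iteration * batch_size
    return examples_seen // dataset_size


def eval_log_values(iteration, accuracy, batch_size, dataset_size):
    """Build the two payloads: epoch-keyed accuracy plus the exact iteration."""
    epoch = iteration_to_epoch(iteration, batch_size, dataset_size)
    return {
        # Existing tag, now keyed by epoch so an epoch-based check can read it.
        'eval_accuracy': {'epoch': epoch, 'value': accuracy},
        # Hypothetical separate tag that preserves the exact iteration number.
        'eval_iteration': {'epoch': epoch, 'iteration': iteration},
    }


# Placeholder batch and dataset sizes, chosen only to make the arithmetic easy to follow.
values = eval_log_values(iteration=120000, accuracy=0.1751619840501347,
                         batch_size=32, dataset_size=100000)
for tag, payload in values.items():
    print(tag, payload)
```

If "iteration" turns out to count examples rather than batches, the multiplication by `batch_size` would simply be dropped.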

@christ1ne
Author

christ1ne commented Oct 22, 2018

@bitfort The following checks also failed because the evaluation does not start at epoch 0:
FAIL: Check each_eval_accuracy_has_0th_epoch failed.
FAIL: Check each_eval_start_has_0th_epoch failed.
FAIL: Check each_eval_stop_has_0th_epoch failed.

What's your suggestion on this?

@bitfort
Owner

bitfort commented Oct 22, 2018

I added an exception so that this check ignores SSD and ResNet:

CODE: "v['epoch'] == 0 and ll.benchmark not in ['resnet', 'ssd']"

Can you try again on the newest version of this repo?
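
For reference, a small sketch of how the quoted expression behaves as plain Python. The `v` and `ll` objects here are stand-ins built for illustration; how the checker actually constructs them and uses the result is an assumption, and `other_benchmark` is a placeholder name.

```python
from types import SimpleNamespace

# Expression copied from the CODE line above.
CODE = "v['epoch'] == 0 and ll.benchmark not in ['resnet', 'ssd']"


def evaluate(benchmark, epoch):
    """Evaluate the expression against stand-in objects (assumed shapes)."""
    ll = SimpleNamespace(benchmark=benchmark)  # stand-in for the parsed log line
    v = {'epoch': epoch}                       # stand-in for the logged value
    return eval(CODE, {}, {'v': v, 'll': ll})


for benchmark in ('ssd', 'resnet', 'other_benchmark'):
    print(benchmark, evaluate(benchmark, 0))
```

With these stand-ins the expression is False for ssd and resnet at epoch 0 and True for any other benchmark name, which matches the intent of skipping the 0th-epoch requirement for those two.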
