Refactor & standardize evaluation with Evaluator
#287
Conversation
@alexhernandezgarcia what was the reason for storing metrics as attributes to the
No important reason. I think Nikita started it that way at some point and we just never followed up on it. Like I said, we should not feel restricted by the way things are currently done.
Controversial change: @alexhernandezgarcia @michalkoziarski @carriepl @josephdviviano @AlexandraVolokhova, thoughts? If you agree this should be used in the
I am not against using a method to hide all the complexity in the tests or in the evaluation script. However, I think it is good that

Also, as a separate comment: I would try, if possible, not to include too many changes in a single PR, especially if they are out of the scope of the PR. In other words, I would try to spin off changes like this one into a separate PR. It's just to make the review process a tad easier.
Sure, I'll move that to another PR to merge after this one, then.
@alexhernandezgarcia I'm following these steps: https://stackoverflow.com/a/30893291/3867406 -> have you pulled or fetched from THIS branch? ( If I edit the commit history on MY machine and then FORCE push to GitHub, but you have a local version of
You can go ahead without impact on my work / local copies.
I can't see what this controversial change is
Great, that means I reverted the commit appropriately :p

I think we should not have multiple places in the code that instantiate a
Without total context I agree with @vict0rsch - this sounds like something that should be one and done.
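As an illustration of the "one place only" idea, a single shared helper plus a pytest fixture could be the only point where the agent gets built for tests. This is a minimal sketch: the names follow the ones mentioned in this PR's description (`common.py:gflownet_from_config()`, `gflownet_for_tests` in `conftest.py`), but the import path, config location, and constructor signature are assumptions, not the repo's confirmed API.

```python
import pytest
from omegaconf import OmegaConf


def gflownet_from_config(config):
    """Hypothetical single entry point: every test (and the evaluation script)
    would call this instead of instantiating GFlowNetAgent directly."""
    from gflownet.gflownet import GFlowNetAgent  # assumed import path

    return GFlowNetAgent(**OmegaConf.to_container(config, resolve=True))


@pytest.fixture
def gflownet_for_tests():
    """Shared fixture: one minimal, CPU-only agent reused across tests."""
    config = OmegaConf.load("config/main.yaml")  # assumed config location
    config.device = "cpu"
    config.logger.do.online = False
    return gflownet_from_config(config)
```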
for name, metric in results["metrics"].items():
    print(f"{name:20}: {metric:.4f}")

data = results.get("data", {})
Note to self: understand why the argument {} is needed.
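For context, the `{}` is just the fallback value that `dict.get` returns when the key is absent; without it, a missing "data" key yields `None` and any later iteration or indexing fails. A quick illustration:

```python
results = {"metrics": {"l1": 0.01}}  # suppose eval() returned no "data" entry

data = results.get("data")       # -> None; data.items() would raise AttributeError
data = results.get("data", {})   # -> {}; safe default, iteration just yields nothing

for name, value in data.items():
    print(f"{name}: {value}")
```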
Co-authored-by: Alex <[email protected]>
@@ -51,21 +52,21 @@ policy:
shared_weights: False
checkpoint: backward

# Evaluator
The format for these evaluator arguments is different from the one in the icml23/ctorus.yaml config file. Is that a problem?
I am unsure what this comment refers to exactly, but I would just say that the icml23/ctorus.yaml file is really old (January 2023) so it would be fine to deprecate it / adapt it if needed. Yes, it contains the experiments of a paper, but I believe it's ok to adapt it to the new state of the repo.
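For what it's worth, one way to tolerate both layouts while older configs are migrated is to read the evaluator section defensively and fall back to an empty block when it is missing. A minimal sketch, assuming an OmegaConf-loaded config and a `BaseEvaluator` that accepts keyword arguments; the file path, keys, and constructor are assumptions, not the repo's confirmed API.

```python
from omegaconf import OmegaConf

from gflownet.evaluator.base import BaseEvaluator  # assumed import path

config = OmegaConf.load("config/experiments/icml23/ctorus.yaml")  # assumed path

# Fall back to an empty section if an older config has no `evaluator` block yet
eval_cfg = config.get("evaluator", OmegaConf.create({}))

# Hypothetical construction; keys such as checkpoints_period mirror the CLI
# override shown elsewhere in this PR (evaluator.checkpoints_period=20)
evaluator = BaseEvaluator(**OmegaConf.to_container(eval_cfg, resolve=True))
```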
# def setup(sphinx):
#     sphinx.connect("autoapi-skip-member", skip_util_classes)
Not sure what this part is meant to do. Is that something outdated that should be removed from the PR? Or is this work in progress that should be finished and then uncommented?
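In case it helps decide: sphinx-autoapi emits an "autoapi-skip-member" event, and a connected handler that returns True skips that member in the generated docs. Below is a sketch of what finishing the commented-out hook could look like; the specific filtering rule (skipping private classes) is only an example, not what the author necessarily intended.

```python
# docs/conf.py


def skip_util_classes(app, what, name, obj, skip, options):
    """Skip private utility classes from the AutoAPI-generated docs.
    Returning True skips the member; returning `skip` keeps the default."""
    if what == "class" and name.split(".")[-1].startswith("_"):
        return True
    return skip


def setup(sphinx):
    sphinx.connect("autoapi-skip-member", skip_util_classes)
```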
Alright, at this point:
Thanks Pierre Luc!
Could this be done in a new PR or should it be done before merging?
Interesting... the CI is currently failing because of a test that was passing before my last commit, which only changes a comment. I guess this is a test that fails very infrequently. At this point, I don't think it's related to this PR, but I could be wrong.
Don't worry. I am pretty sure this is related to tests of the Batch class that, still in this branch, use
I had to make a few additional changes after checking the sanity runs (Tetris topK figures were missing). The solution is a bit of a quick fix.
I have realised that a bunch of things will need more work, but this is a great step forward since the Evaluator will give us the flexibility to extend the evaluation without having to do ugly additions to the former test() function of the GFN.
I have added a couple of quick issues about things that are needed as a reminder.
Great work everyone!!! I will merge.
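To illustrate the flexibility mentioned above: with an Evaluator class, adding a new metric becomes a small subclass rather than another edit to the agent's old test() function. This is a minimal sketch, assuming eval() returns the {"metrics": ..., "data": ...} dictionary visible in the diff earlier in this thread; the import path, class name, and the "rewards" key are illustrative assumptions.

```python
from gflownet.evaluator.base import BaseEvaluator  # assumed import path


class TopKEvaluator(BaseEvaluator):
    """Adds a top-k reward metric on top of whatever the base evaluation computes."""

    def eval(self, **kwargs):
        results = super().eval(**kwargs)
        rewards = results.get("data", {}).get("rewards", [])  # assumed key
        if rewards:
            k = 10
            topk = sorted(rewards, reverse=True)[:k]
            results["metrics"][f"top{k}_reward"] = sum(topk) / len(topk)
        return results
```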
- Refactor `GFlowNetAgent` `eval()` logic to be semantically distinct from (unit, integration etc.) tests
- `logger` vs `evaluator` roles
- Check out tutorial and docs @ https://gflownet.readthedocs.io/en/evaluator

Questions / Need help
- `BaseEvaluator.compute_density_metrics`
- `.eval()` in Scenario 1
- `test__gflownet_minimal_runs`? (for instance `gflownet_for_tests` in `conftest.py`, or at least `common.py:gflownet_from_config()`)

Scenario 1

Scenario 2

$ python main.py user=$USER +experiments=icml23/ctorus device=cpu logger.do.online=False evaluator.checkpoints_period=20