-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add leaderboard tasks tests #8
base: main
Are you sure you want to change the base?
Conversation
d6c0a94
to
085f1d0
Compare
|
||
for key, reference_val in reference.items(): | ||
if isinstance(reference_val, Dict): | ||
if recursive: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which cases would we not want to test the subcontents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used it in e.g. the first test function to check only the lowest level of the config. I am now trying to structure configs so that this is no longer used
subtasks_configs = ( | ||
filter_dict(task_configs.to_dict(), task_name) | ||
if is_multitask | ||
else { | ||
task_name: { | ||
k: v for k, v in task_configs.to_dict().items() if k != "limit" | ||
} | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ternary expression is a bit long
if is_multitask:
substasks_config = {k: v for k, v in task_configs.to_dict.items() if task_name in k}
else:
substasks_config = {task_name: {k: v for k, v in task_configs.to_dict().items() if k != "limit"}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is typically a function which could use more comments inside.
Why are you not removing limit in the first case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In multitask scenario, the limit is defined twice in the config: once as an argument for evaluation (which is removed to avoid mismatch), and once at a different level for output checking. In the first case, it can be used for both
|
||
|
||
class ConfigParser: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think it would be good to be homogeneous, and either use a ConfigParser everywhere, or to use nested dicts everywhere - (and I'm in favor of the latter as it's less likely to introduce issues)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by everywehre you mean for both expected results and generated results ?
for task_name, expected_n_samples in config.n_samples.items(): | ||
results = all_results[task_name] | ||
observed_n_samples = ( | ||
results["n-samples"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we choose the task name if the len is > 1 ?
- Evaluation settings | ||
- Expected results for the given configuration | ||
""" | ||
return request.param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
load_all_configs
returns a list, doesn't this function return a list as well ? Or does pytest automatically yields each element of the list ?
No description provided.