Add leaderboard tasks tests #8

Open · wants to merge 18 commits into main from add-leaderboard-tasks-tests

Conversation

KonradSzafer

No description provided.

KonradSzafer force-pushed the add-leaderboard-tasks-tests branch from d6c0a94 to 085f1d0 on July 23, 2024 at 09:28

for key, reference_val in reference.items():
    if isinstance(reference_val, Dict):
        if recursive:
Member:

In which cases would we not want to test the subcontents?

Author:

I used it in, e.g., the first test function to check only the lowest level of the config. I am now restructuring the configs so that this flag is no longer needed.
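For context, a minimal sketch of what such a comparison helper might look like; the name `compare_dicts` and its exact semantics are assumptions, only the `recursive` flag and the loop shape are taken from the snippet above:

```python
from typing import Any, Dict


def compare_dicts(reference: Dict[str, Any], candidate: Dict[str, Any], recursive: bool = True) -> bool:
    """Hypothetical sketch: check that `candidate` contains everything in `reference`.

    With recursive=False, nested dicts are only checked for key presence,
    so only the level of the config that was passed in is validated.
    """
    for key, reference_val in reference.items():
        if key not in candidate:
            return False
        if isinstance(reference_val, dict):
            if recursive:
                # Descend into the nested config section as well.
                if not compare_dicts(reference_val, candidate[key], recursive=True):
                    return False
        elif candidate[key] != reference_val:
            return False
    return True
```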

Comment on lines +304 to +312
subtasks_configs = (
    filter_dict(task_configs.to_dict(), task_name)
    if is_multitask
    else {
        task_name: {
            k: v for k, v in task_configs.to_dict().items() if k != "limit"
        }
    }
)
@clefourrier (Member), Jul 25, 2024:

The ternary expression is a bit long:

if is_multitask:
    subtasks_configs = {k: v for k, v in task_configs.to_dict().items() if task_name in k}
else:
    subtasks_configs = {task_name: {k: v for k, v in task_configs.to_dict().items() if k != "limit"}}

Member:

This is typically a function that could use more comments inside.

Why are you not removing `limit` in the first case?

Author:

In the multitask scenario, the limit is defined twice in the config: once as an argument for evaluation (which is removed to avoid a mismatch), and once at a different level for output checking. In the first case, it can be used for both.
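As an illustration, a hypothetical multitask config shape consistent with this description; the keys below are assumptions, not the repository's actual schema:

```python
# Hypothetical multitask config: "limit" appears both as a top-level
# evaluation argument and inside each subtask entry used for output checking.
task_config = {
    "limit": 10,                      # evaluation argument; stripped before comparison
    "leaderboard_subtask_a": {
        "limit": 10,                  # per-subtask value kept for checking the outputs
        "num_fewshot": 5,
    },
    "leaderboard_subtask_b": {
        "limit": 10,
        "num_fewshot": 3,
    },
}
```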



class ConfigParser:
    """
Member:

I still think it would be good to be homogeneous, and either use a ConfigParser everywhere or nested dicts everywhere (I'm in favor of the latter, as it's less likely to introduce issues).

Member:

By everywhere, do you mean for both expected results and generated results?
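For reference, a minimal sketch of the two options under discussion; the field names and classes below are hypothetical, not the PR's actual code:

```python
# Option 1: a ConfigParser-style wrapper with attribute access.
class ConfigParser:
    def __init__(self, raw: dict):
        self._raw = raw

    @property
    def n_samples(self) -> dict:
        return self._raw.get("n_samples", {})


# Option 2: plain nested dicts, used the same way for expected and generated results.
expected = {"leaderboard_subtask_a": {"n_samples": 10}}
generated = {"leaderboard_subtask_a": {"n_samples": 10}}
assert expected == generated  # comparison stays a plain dict equality check
```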

for task_name, expected_n_samples in config.n_samples.items():
    results = all_results[task_name]
    observed_n_samples = (
        results["n-samples"]
Member:

Why do we choose the task name if the len is > 1?
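The excerpt above is truncated; a plausible, purely hypothetical shape of the branch being asked about might be the following (keys and values are assumptions for illustration only):

```python
# Hypothetical shape: "n-samples" may be keyed per task in multitask runs,
# or contain a single entry when only one task was evaluated.
results = {"n-samples": {"leaderboard_subtask_a": 10, "leaderboard_subtask_b": 10}}
task_name = "leaderboard_subtask_a"

n_samples_entry = results["n-samples"]
observed_n_samples = (
    n_samples_entry[task_name]
    if len(n_samples_entry) > 1
    else next(iter(n_samples_entry.values()))
)
assert observed_n_samples == 10
```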

    - Evaluation settings
    - Expected results for the given configuration
    """
    return request.param
Member:

load_all_configs returns a list, so doesn't this function return a list as well? Or does pytest automatically yield each element of the list?
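For reference, when a list is passed as `params` to a fixture, pytest instantiates the fixture once per element, so each test receives a single config rather than the whole list. A minimal sketch, with a hypothetical stand-in for the real loader:

```python
import pytest


def load_all_configs():
    # Hypothetical stand-in for the real loader; returns one dict per task config.
    return [{"task": "leaderboard_subtask_a"}, {"task": "leaderboard_subtask_b"}]


@pytest.fixture(params=load_all_configs())
def task_config(request):
    # pytest calls this fixture once per element of `params`;
    # request.param is a single config dict, not the whole list.
    return request.param


def test_config_has_task(task_config):
    # This test runs once per config returned by load_all_configs().
    assert "task" in task_config
```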
