New submission format :( #6

Open
KyleKaminky opened this issue Mar 15, 2024 · 2 comments

Comments

@KyleKaminky

The Kaggle competition has changed its submission format. I'll try to start working on a PR for this.

@cshaley
Owner

cshaley commented Mar 19, 2024

Wow, just got around to reviewing the change. Complete change of submission format.

The file should contain a header and have the following format:

RowId,Tournament,Bracket,Slot,Team
1,M,1,R1W1,W01
2,M,1,R1W8,W08
3,M,1,R1W5,W05
...
Here, the RowId column is a dummy index required by the metric; it should be a simple enumeration of the rows. The Tournament column indicates either the Men's (M) tournament or the Women's (W) tournament. The Bracket column enumerates the brackets in each tournament, starting from 1; you should use a unique enumeration for each tournament. The Team column should contain the team you predict to win in that respective Slot.
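As a rough illustration of the format described above, here is a minimal sketch of reading and sanity-checking such a file with the standard library. The function name `read_submission` is hypothetical, and the checks (exact header, consecutive RowId values, `M`/`W` tournament codes) are only my reading of the description, not anything Kaggle publishes as a validator.

```python
import csv
import io

# Header taken from the format description quoted above.
EXPECTED_HEADER = ["RowId", "Tournament", "Bracket", "Slot", "Team"]


def read_submission(text):
    """Parse submission CSV text into a list of dict rows (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")

    rows = []
    previous_id = None
    for row in reader:
        row_id = int(row["RowId"])
        # Assumption: "simple enumeration" means consecutive integers.
        if previous_id is not None and row_id != previous_id + 1:
            raise ValueError(f"RowId {row_id} does not follow {previous_id}")
        if row["Tournament"] not in ("M", "W"):
            raise ValueError(f"unknown tournament: {row['Tournament']}")
        row["RowId"] = row_id
        row["Bracket"] = int(row["Bracket"])
        rows.append(row)
        previous_id = row_id
    return rows


# The three sample rows quoted above.
sample = (
    "RowId,Tournament,Bracket,Slot,Team\n"
    "1,M,1,R1W1,W01\n"
    "2,M,1,R1W8,W08\n"
    "3,M,1,R1W5,W05\n"
)
rows = read_submission(sample)
```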

So the updated design should:

  1. Allow the user to specify which bracket prediction (e.g. Men's, bracket 1) to generate a bracket for.
  2. Allow the user to generate multiple brackets, or all of the brackets in a submission.
  3. It'd be nice to implement the Brier probability scoring function to generate a bracket based on the mean prediction of a submission.
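Points 1 and 2 could be sketched roughly as below: group the submission rows by (Tournament, Bracket), then pick one bracket, several, or all of them. The names `group_brackets` and `select_brackets` are hypothetical and not part of the current package, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def group_brackets(rows):
    """Group rows into {(tournament, bracket): {slot: team}}."""
    grouped = defaultdict(dict)
    for row in rows:
        key = (row["Tournament"], int(row["Bracket"]))
        grouped[key][row["Slot"]] = row["Team"]
    return dict(grouped)


def select_brackets(rows, tournament=None, bracket=None):
    """Return only the requested brackets; None means "do not filter"."""
    return {
        key: picks
        for key, picks in group_brackets(rows).items()
        if (tournament is None or key[0] == tournament)
        and (bracket is None or key[1] == bracket)
    }


# Made-up rows in the new submission layout (not real competition data).
example_rows = [
    {"RowId": 1, "Tournament": "M", "Bracket": 1, "Slot": "R1W1", "Team": "W01"},
    {"RowId": 2, "Tournament": "M", "Bracket": 2, "Slot": "R1W1", "Team": "W16"},
    {"RowId": 3, "Tournament": "W", "Bracket": 1, "Slot": "R1W1", "Team": "W01"},
]

mens_first = select_brackets(example_rows, tournament="M", bracket=1)
everything = select_brackets(example_rows)
```

Leaving both filters as `None` returns every bracket in the submission, which covers the "all in a submission" case.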

New unit tests may require some new fake data, since Kaggle owns the real dataset.

We also need to update the dependencies and the build process, which are way out of date.

@cshaley
Owner

cshaley commented Mar 19, 2024

Pseudo code

Two externally accessible functions:
  - Build bracket
  - Build brackets

Build brackets
  - Just calls build bracket multiple times depending on args, passing only the relevant data to build bracket.
  - Load the submission file.
  - Filter to only the requested rows; iterate if multiple submissions.
  - Output dir param.

Build bracket
  - Output file param.
  - Bracket choices come from the submission df (param).
  - Map from slot data to team name for the image (test with slot data in the image first?).
  - Remove the probability comparison.
  - Determine the winner based on slot data instead.
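A rough Python sketch of the outline above, under several assumptions: it uses the standard library `csv` module rather than a DataFrame, it writes a plain text file as a stand-in for the bracket image, and it treats Team values such as `W01` as seed labels that get mapped to team names. The signatures, the `names_by_tournament` parameter, and the output file naming are all hypothetical.

```python
import csv
from pathlib import Path


def build_bracket(picks, seed_to_name, output_file):
    """Write one bracket. `picks` maps slot -> predicted winner for that slot.

    The winner is read straight from the slot data; there is no
    probability comparison.
    """
    lines = []
    for slot in sorted(picks):
        seed = picks[slot]
        # Fall back to the raw seed label if no team name is known.
        lines.append(f"{slot}: {seed_to_name.get(seed, seed)}")
    Path(output_file).write_text("\n".join(lines) + "\n")
    return lines


def build_brackets(submission_path, names_by_tournament, output_dir,
                   tournament=None, brackets=None):
    """Load a submission, filter to the requested rows, build each bracket."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    grouped = {}
    with open(submission_path, newline="") as handle:
        for row in csv.DictReader(handle):
            key = (row["Tournament"], int(row["Bracket"]))
            if tournament is not None and key[0] != tournament:
                continue
            if brackets is not None and key[1] not in brackets:
                continue
            grouped.setdefault(key, {})[row["Slot"]] = row["Team"]

    outputs = []
    for (tourney, number), picks in sorted(grouped.items()):
        # Hypothetical naming scheme for the per-bracket output files.
        out = output_dir / f"{tourney}_bracket_{number}.txt"
        # Pass only the data relevant to this bracket.
        build_bracket(picks, names_by_tournament.get(tourney, {}), out)
        outputs.append(out)
    return outputs
```

Calling `build_brackets(path, names, out_dir, tournament="M", brackets=[1])` would produce a single file for the first Men's bracket, while omitting the last two arguments would produce one file per bracket in the submission.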
