Feat/sort FSRSItem by length to speed up training #32
Conversation
Is the change I pushed what you had in mind? For a given review length, the items should now appear in a random order. |
Its effect on training is nuanced, because the order of items within the same batch doesn't matter. I plan to implement a shuffle at the batch level. https://github.com/burn-rs/burn/blob/main/burn-core/src/data/dataloader/strategy.rs |
So you want a batch of size 3, then a batch of size 10, then a batch of size 2, etc? |
I want a batch with seq_len = 3, then a batch with seq_len = 10, then a batch with seq_len = 2. |
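The idea discussed above can be sketched in plain Rust: group items by sequence length, cut each group into batches, then shuffle the order of the batches so every batch keeps a uniform seq_len while training still sees batches in a random order. This is a minimal illustration, not the actual burn dataloader; the `Item` type and the LCG-based shuffle are assumptions made for the sketch.

```rust
use std::collections::BTreeMap;

// A toy "item": a review history whose length is its sequence length.
type Item = Vec<u32>;

/// Group items by length, form fixed-size batches inside each group,
/// then shuffle the *order of batches* (not the items within a batch).
fn length_bucketed_batches(items: Vec<Item>, batch_size: usize, seed: u64) -> Vec<Vec<Item>> {
    // BTreeMap keeps the grouping itself deterministic.
    let mut buckets: BTreeMap<usize, Vec<Item>> = BTreeMap::new();
    for item in items {
        buckets.entry(item.len()).or_default().push(item);
    }
    let mut batches: Vec<Vec<Item>> = Vec::new();
    for (_len, group) in buckets {
        for chunk in group.chunks(batch_size) {
            batches.push(chunk.to_vec());
        }
    }
    // Batch-level Fisher-Yates shuffle with a tiny seeded LCG
    // (a stand-in for a real RNG, so the sketch has no dependencies).
    let mut state = seed;
    for i in (1..batches.len()).rev() {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = (state >> 33) as usize % (i + 1);
        batches.swap(i, j);
    }
    batches
}

fn main() {
    let items: Vec<Item> = vec![vec![1], vec![1, 2], vec![1], vec![1, 2, 3], vec![1, 2]];
    let batches = length_bucketed_batches(items, 2, 42);
    // Every batch contains items of a single length, so no padding is wasted.
    for b in &batches {
        assert!(b.iter().all(|it| it.len() == b[0].len()));
    }
    println!("{} batches", batches.len());
}
```

Re-running the shuffle with a fresh seed before each epoch would give the per-epoch randomness discussed later in this thread.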
Force-pushed from 0a1f15a to d99d925
Ok, how about now? A test is currently failing due to #31, as I rebased this PR on the main branch. |
I have seen your code, but I think we should write a new dataloader rather than manipulate the dataset before building the dataloader. I'm coding it now. |
The shuffle only applies once in your implementation (if I'm wrong, please feel free to correct me). Ideally, the shuffle should apply before each epoch of training. |
Ok, I see. Hopefully the code will be useful as a reference anyway. :-) |
@dae, I have a weird observation from my coding: the training results always vary from run to run, even when I remove the shuffle. Then I figured it out: it's caused by HashMap's non-deterministic iteration order. Then I implemented a fix for it. |
Wait for tracel-ai/burn#703 |
In most languages, HashMaps don't retain the order items are put into them - Python's dictionaries are a bit special in that regard. I think we can do this a bit more efficiently by sorting on the SQL end and then using .group_by() - I've pushed an update and some other tidyups. |
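The point about HashMap ordering can be shown with plain std types: grouping through a HashMap leaves key iteration order unspecified, while sorting upstream (on the SQL end) and then grouping sequentially, as itertools' `.group_by()` does, is deterministic. This is a sketch with made-up data, not the PR's actual code.

```rust
use std::collections::HashMap;

fn main() {
    // (card_id, review) rows, already sorted by card_id "on the SQL end".
    let rows = vec![(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")];

    // HashMap grouping: key iteration order is unspecified, so two runs
    // can emit the groups in different orders -> non-reproducible training.
    let mut by_id: HashMap<i64, Vec<&str>> = HashMap::new();
    for (id, r) in &rows {
        by_id.entry(*id).or_default().push(r);
    }

    // Sorted input + sequential grouping: deterministic, no hashing needed.
    // This is what itertools-style group_by does over a sorted iterator.
    let mut groups: Vec<(i64, Vec<&str>)> = Vec::new();
    for (id, r) in rows {
        match groups.last_mut() {
            Some((last_id, revs)) if *last_id == id => revs.push(r),
            _ => groups.push((id, vec![r])),
        }
    }
    assert_eq!(groups.len(), 3);
    assert_eq!(groups[0], (1, vec!["a", "b"]));
    println!("{:?}", groups);
}
```

Sequential grouping also avoids building the whole map in memory before batching, which is why sorting in SQL first is the cheaper path.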
Nice work on the batch shuffling! And thanks @nathanielsimard for the upstream fix. |
Would you like me to rebase this or would you like to? |
Would you mind doing it? Thanks. I'm out right now. |
This allows easily running all tests in one file at once, and lets tests share code that is not used in production. Also removed the duplicate test_next_stability/difficulty tests.
Co-authored-by: Asuka Minato <[email protected]>
Force-pushed from f6e2c90 to d6c71f6
If you guys are happy with the current state of this, I'd suggest we merge it in, even though the PR mentioned above hasn't landed yet. |
Should we update the dependencies of burn in Cargo.toml before merging? |
Ignore my last comment. I confused this PR with another PR. |
Note: this code removes the shuffling of items, which would break stochastic gradient descent. I wonder whether it is possible to shuffle at the level of batches rather than items.