Pull requests: huggingface/nanotron

Pull requests list

Fix wrong initialization of lr scheduler
#256 opened Nov 29, 2024 by kylematoba
[NEW] Llama3.2 weight converters 🦙
#255 opened Nov 28, 2024 by TJ-Solergibert
6 tasks
Fix initial_lr when resuming training
#243 opened Nov 17, 2024 by Lauler
Load random states from checkpoint
#238 opened Nov 2, 2024 by gritukan
lighteval support after checkpoint, UX refactor
#222 opened Aug 24, 2024 by eliebak
Refactor pre tokenization tool
#219 opened Aug 21, 2024 by eliebak
Created interconnect benchmark before the training
#200 opened Jun 22, 2024 by RamenBuddha
Move MoE Implementation into src/, add Load Balancing Losses
#192 opened Jun 6, 2024 by haeggee
1 task done
[Feature] Monitor model states during training
#183 opened May 25, 2024 by xrsrke
Fix overflow in nanosets with big datasets
#182 opened May 23, 2024 by jquesnelle
Ring attention
#181 opened May 23, 2024 by zzhhjjj
Llama3 conversion scripts 🦙
#174 opened May 20, 2024 by TJ-Solergibert
9 tasks done
[Feature] Mixture of Depths
#171 opened May 15, 2024 by xrsrke Draft
[Feature] Infini Attention
#169 opened May 14, 2024 by xrsrke
Core attention
#168 opened May 13, 2024 by zzhhjjj
llama tests
#157 opened Apr 30, 2024 by zzhhjjj
Fix TestContext warning
#156 opened Apr 29, 2024 by AleHD
Checkpoint 1.3 backwards compatibility
#152 opened Apr 25, 2024 by AleHD
3 tasks done
Use CUDA Events for measuring elapsed time
#143 opened Apr 20, 2024 by staghado