Add epochs to levanter #768

Merged on Nov 7, 2024 (38 commits)
Changes from 1 commit
Commits
234a945
wip epochs
ahmeda14960 Oct 16, 2024
f0b1eaa
fix
ahmeda14960 Oct 16, 2024
020a1b2
add epoch flag, sanity check tulu one epoch
ahmeda14960 Oct 16, 2024
50500b9
epochs work
ahmeda14960 Oct 16, 2024
49afb5d
minor fix
ahmeda14960 Oct 16, 2024
c2ed3ee
fix ci
ahmeda14960 Oct 16, 2024
667a5a3
fix ci
ahmeda14960 Oct 16, 2024
37e77fb
fix config file
ahmeda14960 Oct 17, 2024
7c195ba
add suggested fix from david
ahmeda14960 Oct 18, 2024
e71ed16
Merge remote-tracking branch 'origin/main' into sft
ahmeda14960 Oct 23, 2024
54a6007
restore toml
ahmeda14960 Oct 23, 2024
e2646d6
Update src/levanter/callbacks.py
ahmeda14960 Oct 23, 2024
fd18cae
refactor
ahmeda14960 Oct 23, 2024
1706803
add suggested fix from david
ahmeda14960 Oct 23, 2024
f0ca163
update for v4 so we don't crash
ahmeda14960 Oct 23, 2024
c971ebf
remove changes that break epochs
ahmeda14960 Oct 23, 2024
4733f3b
final fixes
ahmeda14960 Oct 23, 2024
e82eec2
final fixes
ahmeda14960 Oct 24, 2024
08fd427
substatial changes to save on epochs w callback
ahmeda14960 Oct 24, 2024
18a5352
epoch tracking still broken
ahmeda14960 Oct 24, 2024
f1ef2c7
Merge remote-tracking branch 'origin/main' into sft
ahmeda14960 Oct 25, 2024
c38b076
WIP
ahmeda14960 Oct 25, 2024
7331774
update epochs to save latest checkpoints
ahmeda14960 Oct 28, 2024
aa47d4e
Update src/levanter/checkpoint.py
ahmeda14960 Oct 28, 2024
0148cd0
update tulu config to match olmo sft
ahmeda14960 Oct 28, 2024
a7459e0
Merge remote-tracking branch 'origin/sft' into sft
ahmeda14960 Oct 28, 2024
dde75ac
Merge remote-tracking branch 'origin/main' into sft
ahmeda14960 Oct 28, 2024
5343096
pre commit
ahmeda14960 Oct 28, 2024
fd39828
fix sft bug caused by exemplar
ahmeda14960 Oct 29, 2024
313a3f4
add actual sft file
ahmeda14960 Oct 29, 2024
b3718c1
precommit
ahmeda14960 Oct 29, 2024
5f36eb8
sft working w levanter chkpt
ahmeda14960 Oct 29, 2024
f5533d6
add back option for hf models on sft
ahmeda14960 Oct 29, 2024
91fc5df
WIP for david
ahmeda14960 Oct 30, 2024
ba682ca
debug epochs
ahmeda14960 Oct 31, 2024
812accb
load data from marin sources
ahmeda14960 Nov 6, 2024
2d7170c
merge main
ahmeda14960 Nov 6, 2024
caf0a38
merge main
ahmeda14960 Nov 7, 2024
epoch tracking still broken
ahmeda14960 committed Oct 24, 2024
commit 18a535299275477039bd24e08df7f8d495304ef7
2 changes: 1 addition & 1 deletion src/levanter/callbacks.py
@@ -60,7 +60,7 @@ def log_length():
     import asyncio

     async def compute_length():
-        length = await ds.async_len()
+        length = await ds.dataset.async_len()
         return length

     # Run the async function synchronously in this thread
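The hunk above moves the length lookup from `ds` to `ds.dataset`, which suggests that `ds` wraps the underlying dataset at this point. Below is a minimal sketch of that pattern, assuming a wrapper that exposes its inner dataset as `.dataset` and that the coroutine is run to completion on a separate thread. `FakeDataset`, `EpochWrapper`, and `length_in_thread` are hypothetical names used for illustration, not Levanter classes.

```python
import asyncio
import threading


class FakeDataset:
    """Hypothetical async dataset; stands in for the real one."""

    def __init__(self, n):
        self._n = n

    async def async_len(self):
        await asyncio.sleep(0)  # yield once, as a real async lookup would
        return self._n


class EpochWrapper:
    """Hypothetical wrapper that exposes the wrapped dataset as `.dataset`."""

    def __init__(self, dataset):
        self.dataset = dataset


def length_in_thread(ds):
    """Run the async length lookup synchronously on a worker thread."""
    result = {}

    def worker():
        async def compute_length():
            # Ask the wrapped dataset, not the wrapper, for its length.
            return await ds.dataset.async_len()

        # asyncio.run creates a fresh event loop owned by this thread.
        result["length"] = asyncio.run(compute_length())

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result["length"]
```

Running the coroutine on its own thread avoids calling `asyncio.run` from a thread that may already have a running event loop.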
2 changes: 1 addition & 1 deletion src/levanter/checkpoint.py
@@ -292,7 +292,7 @@ def __call__(self, step_info):
         # Use existing checkpointer's save_checkpoint method
         self.checkpointer.save_checkpoint(
             step_info,
-            f"epoch-{current_epoch}"
+            f"epoch-{current_epoch}",
         )
         self._last_saved_epoch = current_epoch
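The surrounding callback is only partly visible in this hunk. As a rough sketch of how a save-once-per-epoch callback of this shape could work, the example below assumes the epoch is derived as `step // steps_per_epoch` and that `step_info` is a plain dict. `RecordingCheckpointer`, `EpochCheckpointCallback`, and `steps_per_epoch` are hypothetical and are not taken from the PR.

```python
class RecordingCheckpointer:
    """Hypothetical checkpointer that records calls instead of writing files."""

    def __init__(self):
        self.saved = []

    def save_checkpoint(self, step_info, destination):
        self.saved.append((step_info["step"], destination))


class EpochCheckpointCallback:
    """Save one checkpoint each time the completed-epoch count increases."""

    def __init__(self, checkpointer, steps_per_epoch):
        self.checkpointer = checkpointer
        self.steps_per_epoch = steps_per_epoch  # assumed fixed and known
        self._last_saved_epoch = 0

    def __call__(self, step_info):
        # Assumption: an epoch is complete every `steps_per_epoch` steps.
        current_epoch = step_info["step"] // self.steps_per_epoch
        if current_epoch > self._last_saved_epoch:
            self.checkpointer.save_checkpoint(
                step_info,
                f"epoch-{current_epoch}",
            )
            self._last_saved_epoch = current_epoch


checkpointer = RecordingCheckpointer()
callback = EpochCheckpointCallback(checkpointer, steps_per_epoch=4)
for step in range(10):
    callback({"step": step})
```

Tracking `_last_saved_epoch` keeps the callback from saving the same epoch repeatedly when it is invoked on every step.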
