Create interface for accessing the training code #12
Conversation
I haven't reviewed the code part, just pyproject/requirements/namespace package portions. That all looks good, though I gave some suggested edits for some minor cleanup. That includes dropping Python 3.9 since I think we landed on 3.10 as the minimum.
Just focused on the Python packaging stuff, will leave the training code itself to y'all for now 😄
Note: the current requirements will overwrite existing NVIDIA PyTorch installs. We need to ensure that if one of those exists, we are not installing our own torch.
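For illustration, one way to implement such a guard (a minimal sketch, not the mechanism this PR uses; it only checks whether any torch distribution is already importable before pulling in our own pin):

```python
import importlib.util


def torch_already_installed() -> bool:
    """Return True if any torch distribution is already importable,
    e.g. the build shipped in NVIDIA's PyTorch containers."""
    return importlib.util.find_spec("torch") is not None


if torch_already_installed():
    # Skip installing/pinning our own torch so we don't clobber
    # the existing (possibly NVIDIA-optimized) build.
    print("Existing torch install detected; not overwriting it.")
```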
Two thoughts, one being a minor nit-pick, but in general this is very good. I'll check that it runs.
Thank you for the review @JamesKunstle, I've created an issue about your quantization comment here: #29
Initial comments. Want to make sure this is aligned with the CLI implementation.
Can you squash the commit history here before merging?
Very good, thank you for all your work.
Biggest question here is about the use of class variables.
In terms of PR structure -- it would have been nice to have the PR that's doing the library setup separate from all the API introduction stuff.
LGTM, tested full training (no lora/offload) with both base scripts and updated interface, both functioning as expected
just to be super clear:
Squashed commits (each one Signed-off-by: Oleg S <[email protected]>):
- turn the training code into a library that can be invoked by another package
- update interface
- rename ilab_train -> train
- remove setup.py in favor of pyproject.toml
We talked about this, so I know your intent was to create this issue, but I can't find it anywhere. I went ahead and filed this one: #34
This PR turns the training code into a library.

View the following PR for an example: instructlab/instructlab#1329

The main idea is that we provide a function like:
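(The original code block here didn't survive; below is a minimal sketch built from the class and function names mentioned in this description. The field names and defaults are illustrative placeholders, not the exact ones defined in this PR.)

```python
from dataclasses import dataclass


@dataclass
class TorchrunArguments:
    """Arguments forwarded to the torchrun launcher itself."""
    nnodes: int = 1
    nproc_per_node: int = 1
    node_rank: int = 0
    rdzv_endpoint: str = "127.0.0.1:12345"


@dataclass
class FullTrainingArguments:
    """Arguments consumed by the training loop."""
    model_path: str = "path/to/model"
    data_path: str = "path/to/data.jsonl"
    ckpt_output_dir: str = "checkpoints/"
    num_epochs: int = 1
    learning_rate: float = 2e-5


def train_torchrun(
    torchrun_args: TorchrunArguments,
    training_args: FullTrainingArguments,
) -> None:
    """Build a torchrun command from torchrun_args and launch the
    training entry point with training_args."""
    ...
```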
Each class, `TorchrunArguments` and `FullTrainingArguments`, provides training-specific arguments. Then, from another library, you would simply provide these arguments to the `train_torchrun` function. For any other training method that we define, we could provide a similar interface depending on which arguments are needed.

We separate the arguments here because `TorchrunArguments` are the ones passed to `torchrun`, while the full training arguments are the ones that we actually train with. It's not crucial that these are different, but it makes our lives a lot easier from a maintenance standpoint.
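A hypothetical call from another package might then look like this (the `instructlab.training` import path is an assumption based on the namespace-package setup discussed above, and the field values are placeholders):

```python
from instructlab.training import (
    FullTrainingArguments,
    TorchrunArguments,
    train_torchrun,
)

# Launcher-level settings go to torchrun; everything the training
# loop needs lives in FullTrainingArguments.
train_torchrun(
    TorchrunArguments(nnodes=1, nproc_per_node=8),
    FullTrainingArguments(
        model_path="path/to/model",
        data_path="path/to/data.jsonl",
        ckpt_output_dir="checkpoints/",
        num_epochs=2,
    ),
)
```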