tests : add integration test for lora adapters #8957
Conversation
Force-pushed from 9b4f2ff to 35d04e7
@ngxson I've done it for Llama-3 and Gemma-2, and I'm planning to do Phi-3 next. Which other architectures are popular in llama.cpp, so I can add a couple more?
echo "Running llama-cli with exported lora for $model_name with size $size_matrix..." | ||
OUTPUT_LORA_MERGED=$(llama-cli -m $MODELS_REPO/$model_name/size=$size_matrix/base/Base-F32-lora-merged.gguf \ | ||
-p "I see a little silhouetto" -n 50 --seed 42) |
I think you're missing `--temp 0` for a consistent result.
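For illustration, the quoted invocation with the suggested flag appended would look like this (the same snippet as above, only `--temp 0` added):

```sh
# --temp 0 makes sampling greedy (always pick the most likely token), so the
# generated text is deterministic and comparable across runs.
OUTPUT_LORA_MERGED=$(llama-cli -m $MODELS_REPO/$model_name/size=$size_matrix/base/Base-F32-lora-merged.gguf \
    -p "I see a little silhouetto" -n 50 --seed 42 --temp 0)
```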
@ngxson Phi-3 is also working fine, and I added assertions in the .sh test file. I think we can finish off this PR by:

For Gemma-2 I'll try to figure out where the bug is this week (it could very well be in my code to reduce the size of the model) and add it as a todo in issue #8708.
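For reference, an output assertion in the .sh file might look roughly like this (a hedged sketch with illustrative variable names such as `OUTPUT_LORA_HOT`, not the exact code from the PR):

```sh
# The merged-lora model and the base model with a hot-loaded adapter should
# produce identical text; fail the test if they diverge.
if [[ "$OUTPUT_LORA_HOT" != "$OUTPUT_LORA_MERGED" ]]; then
    echo "FAIL: hot-loaded and merged lora outputs differ for $model_name"
    exit 1
fi
```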
Force-pushed from 9aa01b7 to 56a8992
Force-pushed from 5ebdefc to c19e4fa
@ngxson @mofosyne I fixed the bug and added the Gemma2 model to the tests (the bug was in my code, stemming from the fact that Gemma's embedding weights equal the lm_head weights, which I think can cause some issues when converting adapters; I'll open an issue about it this weekend). Anyway, to complete this PR I still need to:
Force-pushed from c19e4fa to 32ed624
Approved the PR so that the CI can run when you push new commits.
I'll add you to https://huggingface.co/ggml-org - let me know your HF username
@ggerganov Thanks! My HF username is ltoniazzi
@ggerganov Should I create a new model repo in ggml-org? I'm not sure what a good name would be, as I also want to make sure people do not think it's related to reducing size by quantization (basically these "reduced" models keep the original architecture the same, apart from reducing their size).
Yes, make a new repo. Can call it …
I'm not sure if you will need special permissions to create a new GitHub workflow. The code for CI can be based on the server workflow. But I think I can do this later on; I'll need to see what the conditions for triggering the workflow should be, because I think it's not useful to run it on all commits.
I see you now have access to ggml-org. Ideally, I think each model should be in a dedicated repo (since you're also uploading safetensors files, not just GGUF). The model repos can share a common prefix. For adapters, you can also upload the PEFT safetensors files, one repo per adapter, with their own prefix.
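For example, with one repo per base model and per adapter, the test script could fetch them along these lines (a hedged sketch; the `ggml-org/...` repo names are placeholders, since the actual prefixes are not settled above):

```sh
# One Hugging Face repo per base model and one per adapter (placeholder names).
huggingface-cli download ggml-org/reduced-llama-base --local-dir $MODELS_REPO/llama/base
huggingface-cli download ggml-org/reduced-llama-lora --local-dir $MODELS_REPO/llama/adapter
```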
Force-pushed from 32ed624 to 163f43f
@ngxson Thanks, but before refactoring the repo, can we make the current test run successfully? I added the test to the Make/CMake files so it runs locally.
@ngxson I am uploading only safetensors, as the test includes the conversion to GGUF of both the base model and the adapter.
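For context, that conversion step uses the repo's convert scripts, roughly like this (a hedged sketch; the paths are illustrative and the exact flags used in the test may differ):

```sh
# Convert the safetensors base model, then the PEFT adapter, to GGUF.
python convert_hf_to_gguf.py $MODELS_REPO/$model_name/base --outtype f32 \
    --outfile $MODELS_REPO/$model_name/base/Base-F32.gguf
python convert_lora_to_gguf.py $MODELS_REPO/$model_name/adapter --outtype f32 \
    --base $MODELS_REPO/$model_name/base
```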
@ngxson If you point me to a base workflow that I can mimic, I can have a go at it. Or do you prefer to merge the new test script alone and create the new workflow separately?
I think it's ok to merge this PR as-is; please tell me when the models are uploaded to Hugging Face so I can run the test locally. If you still want to try out the workflow: the server workflow already has python and cmake, so you can take it as the template.
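As an illustration, the steps such a workflow would run could look like this (a hedged sketch built on the idea of reusing the server workflow; the targets and setup commands are assumptions, not the actual CI config):

```sh
# Build the binaries the test script calls, set up the python env and git lfs,
# then run the integration test (illustrative steps only).
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j "$(nproc)" --target llama-cli llama-export-lora
pip install -r requirements.txt
git lfs install
./tests/test-lora-conversion-inference.sh
```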
Force-pushed from 163f43f to 9a8f050
@ngxson They are uploaded; you can run the script locally with ./tests/test-lora-conversion-inference.sh --verbose. Ok, let's merge this; then I'll have a go at the workflow in a separate PR, and we can discuss the model folder structure in the meantime.
I made some minor changes in this commit: d6f7b8f. I ran it locally and confirmed that it works, so it's good to merge now. Thanks for taking your time on this.
* Add printing to check weights match torch version
* Minor code style changes

Co-authored-by: Xuan Son Nguyen <[email protected]>
Context
Partially solving Issue #8708, by adding a test that:

- Uses reduced-size models in safetensors for several architectures (where each architecture is overfitted to some text), along with a corresponding adapter (overfitted to another text).
- Converts the base models and adapters to .gguf.
- Runs inference with llama-cli, with and without the adapter (--lora); see the sketch after the TODOs below.

TODOs
- Make models smaller (size 32? now 64 -> ~50/70MB). (Getting input size issues in gguf with llama at 32.)
- Figure out how sh files work in the actions. The script needs llama-cli, a python env and git lfs. In a separate PR, follow .github/workflows/server.yml.
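For orientation, the flow the test exercises looks roughly like this (a hedged sketch; file names such as Lora-F32-LoRA.gguf are illustrative, echoing the snippet quoted in the review above):

```sh
# Generate with the adapter hot-loaded, and with the adapter merged into the
# base weights, then compare the two outputs (deterministic via --temp 0).
OUTPUT_LORA_HOT=$(llama-cli -m Base-F32.gguf --lora Lora-F32-LoRA.gguf \
    -p "I see a little silhouetto" -n 50 --seed 42 --temp 0)
OUTPUT_LORA_MERGED=$(llama-cli -m Base-F32-lora-merged.gguf \
    -p "I see a little silhouetto" -n 50 --seed 42 --temp 0)
[[ "$OUTPUT_LORA_HOT" == "$OUTPUT_LORA_MERGED" ]] && echo "PASS" || echo "FAIL"
```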
Current output of ./tests/test-lora-conversion-inference.sh --verbose:

PR checks