Add cvector-generator example #7514
Conversation
Could you add a quick usage summary - do you just run … ? Also, tried implementing PCA using the … |
Hi @christianazinn and thanks for your response. We'll move the discussion to here.

Quick explanation: my code takes a pair of positive + negative prompts, calculates embeddings for each layer, and then subtracts them to get the diff. In the end, for each layer, we have one matrix holding that diff.

The way to use it: …

It is not urgent, so take your time. And feel free to let me know if you have other questions. Thank you. |
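To illustrate the per-layer diff described above, here is a minimal sketch (not the PR's actual code; the container layout and function name are assumptions): the layer-l hidden state of the negative prompt is subtracted element-wise from that of the positive prompt.

```cpp
#include <cstddef>
#include <vector>

// Sketch: hidden_pos[l] and hidden_neg[l] are the layer-l hidden states
// (each a vector of n_embd floats); the control-vector source for layer l
// is their element-wise difference.
static std::vector<std::vector<float>> layer_diffs(
        const std::vector<std::vector<float>> & hidden_pos,
        const std::vector<std::vector<float>> & hidden_neg) {
    std::vector<std::vector<float>> diffs(hidden_pos.size());
    for (size_t l = 0; l < hidden_pos.size(); l++) {
        diffs[l].resize(hidden_pos[l].size());
        for (size_t i = 0; i < hidden_pos[l].size(); i++) {
            diffs[l][i] = hidden_pos[l][i] - hidden_neg[l][i];
        }
    }
    return diffs;
}
```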
Looking into the PCA implementation, I realize we have the problem that we're not actually getting square matrices from … . However, it appears the matrices we receive are usually tall and skinny. SciPy's original implementation indicates that in this case the problem is best handled by SVD on the covariance matrix. We may want to implement this after everything else works.

I also don't have push permissions to this branch, so whatever changes I make, I'll fork the branch and PR into it. |
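For reference, a minimal sketch of the covariance-matrix approach mentioned above, not the PR's implementation: given a tall-and-skinny diff matrix A (stored here as columns of length n_embd, an assumed layout), power iteration on C = A A^T approximates the top principal direction without materializing the n_embd x n_embd matrix.

```cpp
#include <cmath>
#include <vector>

// Power iteration on C = A * A^T, with A given as columns of length n_embd.
// Returns an approximation of the dominant eigenvector of C, i.e. the first
// principal direction of the diff vectors.
static std::vector<float> top_pca_direction(
        const std::vector<std::vector<float>> & A_cols, int n_embd, int n_iter) {
    std::vector<float> v(n_embd, 1.0f);
    for (int it = 0; it < n_iter; it++) {
        std::vector<float> w(n_embd, 0.0f);
        // w = A * (A^T * v) == C * v, computed column by column
        for (const auto & col : A_cols) {
            float dot = 0.0f;
            for (int i = 0; i < n_embd; i++) dot += col[i] * v[i];
            for (int i = 0; i < n_embd; i++) w[i] += dot * col[i];
        }
        // normalize to keep the iterate bounded
        float norm = 0.0f;
        for (float x : w) norm += x * x;
        norm = std::sqrt(norm) + 1e-12f;
        for (int i = 0; i < n_embd; i++) v[i] = w[i] / norm;
    }
    return v;
}
```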
@christianazinn Thanks for the explanation. Yes, I was also wondering how we can turn the embedding vectors into a square matrix. It's all clear to me now. I'll have a look during the weekend. In the meantime, I invited you to my forked repo. You can push directly onto this branch, or you can work on your own PR if you want. Feel free to tag me if you have questions. Thank you! |
Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but inference with it produces gibberish.
Thank you, I have pushed an implementation with primitives/stdlib. It currently assumes the Mistral architecture for the … .

Currently, however, it outputs gibberish when inferencing, e.g. … |
Added basic command-line parameters for the outfile and one positive/negative prompt each. Refactored some messy code in PCA computation and GGUF exporting. Left a bunch of comments regarding further work needed.
Notes follow. I have implemented basic command-line arguments for … .

I've left a few comments about what needs to be fixed in my shoddy implementation, and other things we need to deal with, such as the prompt parsing issue mentioned. It appears we do just parse the individual positive/negative prompts - @ngxson, can you confirm? We will likely want to change this to provide a larger sample space; the blog post and Python implementation provide a reference for how to implement it.

However, I am seeing promising results with "funny" vs. "boring": Llama2 Q8_0, prompt (for completion) "Here's a funny joke: ". Llama2 was used because #5970 indicates support has not been implemented for architectures other than Llama, but that is probably outdated.

Control vector -1: … |
@christianazinn Wow, this is awesome. I quickly had a look at the code, looks good to me. I'll try it when I get back home.
I started with a single pair of pos-neg for simplification. But yes, eventually we will allow multiple pairs of pos-neg. The Python implementation does that by calculating the mean value of the output … .

We can allow the program to take as input 2 files of prompts (one prompt per line), so we have 2 files, for example neg.txt and pos.txt. I can implement this quickly if needed.
Very promising result. Even I (a human) sometimes struggle to control my own funny / boring vector. |
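A minimal sketch of the file-based input described above (file handling only; the function name is illustrative): one prompt per line, empty lines skipped.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Read prompts from a text file, one prompt per line (e.g. pos.txt / neg.txt).
static std::vector<std::string> read_prompt_file(const std::string & path) {
    std::vector<std::string> prompts;
    std::ifstream fin(path);
    std::string line;
    while (std::getline(fin, line)) {
        if (!line.empty()) {
            prompts.push_back(line);
        }
    }
    return prompts;
}
```

Presumably the two files would need to contain the same number of lines so that positive and negative prompts pair up.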
Thank you! Take your time, I will keep testing in the meantime. Other results are varied: a test on happy/sad generates complete gibberish, and another control vector for funny/boring is ineffective.
Just to make sure we are on the same page, because there are two places where multiple pairs might be needed. We will also want to implement multiple sentiment pairs (i.e. happy/sad and funny/boring), but what I referred to was having multiple prompts generated from the same sentiment pair run through the tokenizer, as in the second code block here. Currently we appear to just tokenize the term, e.g. … .

I think we want to be able to do that preprocessing in C++, so the user inputs the positive/negative sentiments and we create the template, format it, and pass it to … .

I believe the great variance in my results may be due to only having one sample token sequence per sentiment, and therefore high variability in the resulting vectors between runs, hence my concern over this topic. However, more runs of PCA would slow down the already slow stdlib implementation to the point of unusability, so that is left for the GGML implementation. |
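To make the preprocessing idea concrete, here is a rough sketch (the template text and function name are made up for illustration; the real prompts would follow the Python implementation): the user supplies a sentiment word plus a list of partial completions, and we expand them into many prompt strings before tokenization.

```cpp
#include <string>
#include <vector>

// Sketch: expand one sentiment ("happy", "boring", ...) into many sample
// prompts by combining a fixed instruction template with partial completions.
static std::vector<std::string> expand_sentiment(
        const std::string & sentiment,
        const std::vector<std::string> & completions) {
    const std::string tmpl = "[INST] Act as if you are extremely " + sentiment + ". [/INST] ";
    std::vector<std::string> prompts;
    prompts.reserve(completions.size());
    for (const auto & c : completions) {
        prompts.push_back(tmpl + c);
    }
    return prompts;
}
```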
Implements an example template set built from the positive/negative prompts like the control vector Python implementation.
It appears the way the Python implementation handles concatenating the matrices from each different prompt callback is by stacking them, so e.g. if each callback returned a 4096x2 matrix then using 1024 test prompts would yield a 4096x2048 matrix. Intuitively, because rank(AA^T) = rank(A), this allows for more degrees of freedom and less dependency on each individual callback in each layer's overall matrix, and since the result will be 4096x4096 regardless of the other dimension, this should not change much with the PCA. Will try to implement this. (Strictly, it stacks vertically, but it doesn't matter since we multiply by the transpose anyway.) |
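A sketch of the stacking step described above (illustrative; the storage layout is an assumption): each callback contributes a handful of n_embd-long columns, and we simply append them to the layer's running matrix, so PCA later sees one wide n_embd x N matrix per layer while A A^T stays n_embd x n_embd.

```cpp
#include <vector>

// Per-layer accumulator: columns of length n_embd collected across all
// prompt callbacks. Appending columns grows the "skinny" dimension only,
// so the PCA setup on A * A^T is unchanged.
struct layer_columns {
    std::vector<std::vector<float>> cols;
};

static void append_columns(layer_columns & layer,
                           const std::vector<std::vector<float>> & new_cols) {
    layer.cols.insert(layer.cols.end(), new_cols.begin(), new_cols.end());
}
```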
I updated this PR with 2 small changes (feel free to test / adapt it if you want): …
@christianazinn I'm having a problem, which is that … |
@ngxson I'll take a look, thanks - not sure how I didn't think to check that; it would explain why I was getting gibberish on 9/10 tests. My code is very patchwork at the moment, so there are likely to be a lot of these fixes. Thanks for the progress so far. |
Strangely, each matrix returned by … .

What's printed to stdout from … :

UPDATE: Am I misunderstanding these lines (I assumed this means we get a 4096x2x1x1 matrix)? …

UPDATE 2: I had my numbers backward with zero/nonzero. Even more confused now. |
Thinking about it further, this isn't even true. I would still like to know how the dimensions are stored (image above). Is it a flattened matrix of dimensions … ?

Frankly, this whole headache could probably be avoided if we just wrote the GGML implementation, but I don't know how. |
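On how the dimensions are stored: as far as I understand ggml's tensor layout (a sketch, worth double-checking against ggml.h), a contiguous 2-D F32 tensor with ne = {4096, 2, 1, 1} keeps ne[0] as the length of one row (the fastest-varying dimension) and ne[1] as the number of rows, with byte strides in nb[].

```cpp
#include <cstdio>
#include "ggml.h"

// For a contiguous F32 tensor with ne = {4096, 2, 1, 1}, element (i0, i1)
// lives at data + i0*nb[0] + i1*nb[1], where nb[0] == sizeof(float) and
// nb[1] == nb[0]*ne[0]. So the buffer is one 4096-float row followed by
// the next, not an interleaved layout.
static float tensor_get_f32(const struct ggml_tensor * t, int i0, int i1) {
    return *(const float *) ((const char *) t->data + i0*t->nb[0] + i1*t->nb[1]);
}

static void print_shape(const struct ggml_tensor * t) {
    printf("ne = %lld x %lld x %lld x %lld\n",
           (long long) t->ne[0], (long long) t->ne[1],
           (long long) t->ne[2], (long long) t->ne[3]);
}
```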
fixed it... one liner... ugh
printf("\n"); | ||
} | ||
|
||
static int ctrlvec_params_parse_ex(int argc, char ** argv, ctrl_params & params) { |
Let's merge ctrl_params into gpt_params so that we have consistent handling of CLI args in all examples.
I didn't notice that gpt_params had been refactored. It's way easier to work with now! I moved ctrl_params into gpt_params. Please have a look at 679f513. Thanks!
This should be fine - just test it. With what you mention below about … , that should work much better. I think that's actually what the Python implementation does, but I'm not certain. Feel free to try it if you like, or if you think the current outputs are acceptable, we can add that in a later PR. (We should compile a list of future improvements for this.) I'll add my review for the code itself in a moment, and will test the generated control vectors when I get the chance. |
Actually, I updated the list in the description of this PR. Feel free to let me know if you have other ideas to add.
Nice. Thanks for taking the time to develop and review this PR! |
I am very excited for control vectors and I have been routinely testing this PR. I got it to work yesterday with only a couple issues.
I fixed 1 and 2 in a PR in the fork, ngxson#6. 2 is fixed by adding a command-line flag that combines all of the prompt lines into one prompt. |
@calvin-laurenson Thanks for testing out … . Regarding the ability to have multi-line prompts, I prefer to add …
The problem with … .

Edit: the CUDA backend does not support GGML_OP_SQRT. |
I haven't done tests, but I'm sure people will play with this, and if there are any issues we can resolve them from master.
common/common.cpp (Outdated)
options.push_back({ "control-vector" }); | ||
options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() }); | ||
options.push_back({ "cvector", "--positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() }); | ||
options.push_back({ "cvector", "--negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() }); | ||
options.push_back({ "cvector", "--completions-file FNAME","completions file (default: '%s')", params.cvector_completions_file.c_str() }); | ||
options.push_back({ "cvector", "--completions N", "number of lines of completions file to use (default: %d)", params.n_completions }); | ||
options.push_back({ "cvector", "--batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch }); | ||
options.push_back({ "cvector", "--iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations }); | ||
|
The whitespace padding should be kept so that the arguments are vertically aligned when the help is printed:
options.push_back({ "control-vector" }); | |
options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() }); | |
options.push_back({ "cvector", "--positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() }); | |
options.push_back({ "cvector", "--negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() }); | |
options.push_back({ "cvector", "--completions-file FNAME","completions file (default: '%s')", params.cvector_completions_file.c_str() }); | |
options.push_back({ "cvector", "--completions N", "number of lines of completions file to use (default: %d)", params.n_completions }); | |
options.push_back({ "cvector", "--batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch }); | |
options.push_back({ "cvector", "--iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations }); | |
options.push_back({ "control-vector" }); | |
options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() }); | |
options.push_back({ "cvector", " --positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() }); | |
options.push_back({ "cvector", " --negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() }); | |
options.push_back({ "cvector", " --completions-file FNAME", | |
"completions file (default: '%s')", params.cvector_completions_file.c_str() }); | |
options.push_back({ "cvector", " --completions N", "number of lines from the completions file to use (default: %d)", params.n_completions }); | |
options.push_back({ "cvector", " --batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch }); | |
options.push_back({ "cvector", " --iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations }); | |
FYI, I also changed the example name + binary name to llama-cvector-generator.
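For the record, based on the option list above (not verified against the final code; the model path and file names are placeholders), a typical invocation of the renamed binary presumably looks something like:

```
./llama-cvector-generator \
    -m path/to/model.gguf \
    --positive-file positive.txt \
    --negative-file negative.txt \
    -o control_vector.gguf
```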
```
<|im_start|>system\nAct like a person who is extremely happy.<|im_end|>
<|im_start|>system\nYou are in a very good mood today<|im_end|>
```
@calvin-laurenson I ended up enabling escape new line by default, which should be more convenient for most users.
Renamed: control-vector-generator example → cvector-generator example
FYI, the help text refers to … .

Also, if the completion portion bails out because the number of positive prompts != the number of negative prompts, PCA still tries to run:

Log: …
Resolve #6880
Result from last working version: #7514 (comment)
TODO in next PRs:
- … (see cvector-generator example #7514 (comment))
- … (llama_decode)