
Add cvector-generator example #7514

Merged Jun 15, 2024 · 58 commits

Commits
0a46d73
add control-vector-generator
ngxson May 24, 2024
c31c118
calc diff
ngxson May 24, 2024
b30bea3
add comments
ngxson May 24, 2024
73747fe
proof-of-concept stdlib implementation
christianazinn May 30, 2024
f58f6af
param parsing, refactor, comments
christianazinn May 30, 2024
dc46264
example template completions
christianazinn May 30, 2024
447023f
add multi prompts, multi-thread for PCA
ngxson May 30, 2024
287da25
fix mem error
ngxson May 30, 2024
d446c6d
add debugs
ngxson May 30, 2024
31f153f
fix matrix transpose multiplication
christianazinn May 31, 2024
fa85ba6
preliminary template/multiprompt support
christianazinn May 31, 2024
4d88cd1
fix zero output & param parsing, functional templating
christianazinn May 31, 2024
4d7d71b
fix square_diff matmul index range and CRLF->LF line endings
christianazinn Jun 1, 2024
6256036
add command-line args for num threads, num completions file lines, al…
christianazinn Jun 1, 2024
db3ba10
code aestheticization
christianazinn Jun 1, 2024
86842b2
fix compiler warnings
christianazinn Jun 1, 2024
5442688
in-series multithreading for prompt embedding?
christianazinn Jun 1, 2024
3090c48
remove unnecessary multithreading
christianazinn Jun 1, 2024
df623ff
interim fix memory leak
christianazinn Jun 1, 2024
0e1f973
translated everything but PCA (I think)
christianazinn Jun 1, 2024
b67ea65
tentatively translate the rest
christianazinn Jun 2, 2024
a23c72e
fix ggml errors and make new ones
christianazinn Jun 2, 2024
15d5c25
fix cb_eval
ngxson Jun 2, 2024
07dba13
temporary commit while I move dev environments
christianazinn Jun 3, 2024
23fd1b5
update debug statements
christianazinn Jun 4, 2024
3815a0c
pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped
christianazinn Jun 4, 2024
a42e783
update comments
christianazinn Jun 4, 2024
a710df7
(wip) refactor
ngxson Jun 7, 2024
c241b50
clean up PCA ggml implementation
ngxson Jun 10, 2024
6a5adf3
fix shape of v_diff_original
ngxson Jun 10, 2024
9e39571
add n_batch for pca
ngxson Jun 11, 2024
1a088fb
working version
ngxson Jun 11, 2024
1639168
remember to copy back the last_eigenvector
ngxson Jun 11, 2024
446da90
fix n_completions
christianazinn Jun 11, 2024
d41c719
bring back n_completions
ngxson Jun 11, 2024
3223133
default n_pca_batch to 20
ngxson Jun 11, 2024
da6babd
fix macos build
ngxson Jun 11, 2024
85db22d
Merge branch 'master' into xsn/control-vector-generator
ngxson Jun 11, 2024
54f77e2
add to makefile all targets
ngxson Jun 11, 2024
04c91d2
use ggml_format_name
ngxson Jun 11, 2024
5ffba9e
add readme
ngxson Jun 11, 2024
e9cb3b3
fix .editorconfig
ngxson Jun 11, 2024
7297817
use ggml_backend_tensor_copy
ngxson Jun 12, 2024
e683b9a
attemp to fix compile problem on mac
ngxson Jun 12, 2024
8ee0c96
fix compile warn
ngxson Jun 12, 2024
f54cb8e
reuse allocr
ngxson Jun 12, 2024
679f513
move param parser to common
ngxson Jun 12, 2024
a2a5f1b
better error handling
ngxson Jun 12, 2024
b22c845
clean up a bit
ngxson Jun 12, 2024
c59bfa6
add print_usage
ngxson Jun 12, 2024
334dbae
shorten help msg
ngxson Jun 12, 2024
25fb0a6
beautify help msg
ngxson Jun 13, 2024
ca86d4f
escape prompt by default
ngxson Jun 13, 2024
2f05558
Merge branch 'master' into xsn/control-vector-generator
ngxson Jun 13, 2024
64cad20
change compile target to llama-cvector-generator
ngxson Jun 13, 2024
91f7dbf
typo
ngxson Jun 13, 2024
f99be2c
disable GPU for PCA
ngxson Jun 13, 2024
6d2464a
code style
ngxson Jun 13, 2024
3 changes: 3 additions & 0 deletions .editorconfig
@@ -26,3 +26,6 @@ indent_size = 2

[examples/llama.swiftui/llama.swiftui.xcodeproj/*]
indent_style = tab

[examples/cvector-generator/*.txt]
insert_final_newline = unset
5 changes: 5 additions & 0 deletions Makefile
@@ -38,6 +38,7 @@ BUILD_TARGETS = \
llama-tokenize \
llama-train-text-from-scratch \
llama-vdot \
llama-cvector-generator \
tests/test-c.o

# Binaries only useful for tests
@@ -922,6 +923,10 @@ llama-eval-callback: examples/eval-callback/eval-callback.cpp ggml.o llama.o $(C
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)

llama-cvector-generator: examples/cvector-generator/cvector-generator.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)

llama-train-text-from-scratch: examples/train-text-from-scratch/train-text-from-scratch.cpp ggml.o llama.o $(COMMON_DEPS) train.o $(OBJS)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
60 changes: 60 additions & 0 deletions common/common.cpp
@@ -1576,6 +1576,7 @@ bool gpt_params_find_arg(int argc, char ** argv, const std::string & arg, gpt_pa
return true;
}
params.out_file = argv[i];
params.cvector_outfile = argv[i];
return true;
}
if (arg == "-ofreq" || arg == "--output-frequency") {
@@ -1610,6 +1611,55 @@ bool gpt_params_find_arg(int argc, char ** argv, const std::string & arg, gpt_pa
params.i_chunk = std::stoi(argv[i]);
return true;
}
// cvector params
if (arg == "--completions-file") {
if (++i >= argc) {
invalid_param = true;
return true;
}
params.cvector_completions_file = argv[i];
return true;
}
if (arg == "--positive-file") {
if (++i >= argc) {
invalid_param = true;
return true;
}
params.cvector_positive_file = argv[i];
return true;
}
if (arg == "--negative-file") {
if (++i >= argc) {
invalid_param = true;
return true;
}
params.cvector_negative_file = argv[i];
return true;
}
if (arg == "--completions") {
if (++i >= argc) {
invalid_param = true;
return true;
}
params.n_completions = std::stoi(argv[i]);
return true;
}
if (arg == "--pca-batch") {
if (++i >= argc) {
invalid_param = true;
return true;
}
params.n_pca_batch = std::stoi(argv[i]);
return true;
}
if (arg == "--pca-iter") {
if (++i >= argc) {
invalid_param = true;
return true;
}
params.n_pca_iterations = std::stoi(argv[i]);
return true;
}
#ifndef LOG_DISABLE_LOGS
// Parse args for logging parameters
if (log_param_single_parse(argv[i])) {
@@ -1931,6 +1981,16 @@ void gpt_params_print_usage(int /*argc*/, char ** argv, const gpt_params & param
options.push_back({ "logging", " --log-append", "Don't truncate the old log file." });
#endif // LOG_DISABLE_LOGS

options.push_back({ "cvector" });
options.push_back({ "cvector", "-o, --output FNAME", "output file (default: '%s')", params.cvector_outfile.c_str() });
options.push_back({ "cvector", " --positive-file FNAME", "positive prompts file, one prompt per line (default: '%s')", params.cvector_positive_file.c_str() });
options.push_back({ "cvector", " --negative-file FNAME", "negative prompts file, one prompt per line (default: '%s')", params.cvector_negative_file.c_str() });
options.push_back({ "cvector", " --completions-file FNAME",
"completions file (default: '%s')", params.cvector_completions_file.c_str() });
options.push_back({ "cvector", " --completions N", "number of lines of completions file to use (default: %d)", params.n_completions });
options.push_back({ "cvector", " --batch-pca N", "batch size used for PCA. Larger batch runs faster, but uses more memory (default: %d)", params.n_pca_batch });
options.push_back({ "cvector", " --iter-pca N", "number of iterations used for PCA (default: %d)", params.n_pca_iterations });

printf("usage: %s [options]\n", argv[0]);

for (const auto & o : options) {
9 changes: 9 additions & 0 deletions common/common.h
@@ -232,6 +232,15 @@ struct gpt_params {

bool process_output = false; // collect data for the output tensor
bool compute_ppl = true; // whether to compute perplexity

// cvector-generator params
int n_completions = 64;
int n_pca_batch = 20;
int n_pca_iterations = 1000;
std::string cvector_outfile = "control_vector.gguf";
std::string cvector_completions_file = "examples/cvector-generator/completions.txt";
std::string cvector_positive_file = "examples/cvector-generator/positive.txt";
std::string cvector_negative_file = "examples/cvector-generator/negative.txt";
};

void gpt_params_handle_model_default(gpt_params & params);
1 change: 1 addition & 0 deletions examples/CMakeLists.txt
@@ -12,6 +12,7 @@ include_directories(${CMAKE_CURRENT_SOURCE_DIR})

if (EMSCRIPTEN)
else()
add_subdirectory(cvector-generator)
add_subdirectory(baby-llama)
add_subdirectory(batched-bench)
add_subdirectory(batched)
5 changes: 5 additions & 0 deletions examples/cvector-generator/CMakeLists.txt
@@ -0,0 +1,5 @@
set(TARGET llama-cvector-generator)
add_executable(${TARGET} cvector-generator.cpp pca.hpp)
install(TARGETS ${TARGET} RUNTIME)
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_11)
34 changes: 34 additions & 0 deletions examples/cvector-generator/README.md
@@ -0,0 +1,34 @@
# cvector-generator

This example demonstrates how to generate a control vector using gguf models.

Related PRs:
- [Add support for control vectors](https://github.com/ggerganov/llama.cpp/pull/5970)
- (Issue) [Generate control vector using llama.cpp](https://github.com/ggerganov/llama.cpp/issues/6880)
- [Add cvector-generator example](https://github.com/ggerganov/llama.cpp/pull/7514)

## Examples

```sh
# CPU only
./cvector-generator -m ./dolphin-2.0-mistral-7b.Q4_K_M.gguf

# With GPU
./cvector-generator -m ./dolphin-2.0-mistral-7b.Q4_K_M.gguf -ngl 99

# With advanced options
./cvector-generator -m ./dolphin-2.0-mistral-7b.Q4_K_M.gguf -ngl 99 --completions 128 --pca-iter 2000 --batch-pca 100

# To see help message
./cvector-generator -h
# Then, have a look at "cvector" section
```

## Tips and tricks

If you have multiple lines per prompt, you can escape the newline character (change it to `\n`). For example:

```
<|im_start|>system\nAct like a person who is extremely happy.<|im_end|>
<|im_start|>system\nYou are in a very good mood today<|im_end|>
```

Review comment from ngxson (Collaborator, Author):

> @calvin-laurenson I ended up enabling escape new line by default, which should be more convenient for most users.