Control vector loading fixes #8137

jukofyork · 2024-06-26T13:48:49Z

Fixed llama_control_vector_load() to allow result.data to be extended as needed.
Fixed leak before last return in llama_control_vector_load_one().
Changed llama_control_vector_load() to continue trying to load further control vectors on failure to load one.
I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

…_vector_load() to grow

slaren · 2024-06-26T14:07:44Z

There are still ggml and gguf context leaks in some cases. It has been a while since I took a look at #6289, but I think the changes there are correct and it would be better to use it as the baseline.

jukofyork · 2024-06-26T14:25:55Z

There are still ggml and gguf context leaks in some cases. It has been a while since I took a look at #6289, but I think the changes there are correct and it would be better to use it as the baseline.

Yeah, sorry I just pushed what I had so far but I'm busy incorporating those fixes now :)

jukofyork · 2024-06-26T15:51:01Z

I've completely refactored llama_control_vector_load_one() to remove the unnecessary 2-passes and introduce a single point of return to lessen the likelihood of leaks in the future.

Can you have a quick look over it to see if there is anything you can see that I've done wrong:

static llama_control_vector_data llama_control_vector_load_one(const llama_control_vector_load_info & load_info) {

    llama_control_vector_data result = { -1, {} };

    ggml_context * ctx = nullptr;
    struct gguf_init_params meta_gguf_params = {
        /* .no_alloc = */ false,
        /* .ctx      = */ &ctx,
    };
    struct gguf_context * ctx_gguf = gguf_init_from_file(load_info.fname.c_str(), meta_gguf_params);
    if (!ctx_gguf) {
        fprintf(stderr, "%s: failed to load control vector from %s\n", __func__, load_info.fname.c_str());
        ggml_free(ctx);
        return result;
    }

    int32_t n_tensors = gguf_get_n_tensors(ctx_gguf);
    if (n_tensors == 0) {
        fprintf(stderr, "%s: no direction tensors found in %s\n", __func__, load_info.fname.c_str());
    }

    for (int i = 0; i < n_tensors; i++) {
        std::string name = gguf_get_tensor_name(ctx_gguf, i);

        int layer_idx = -1;

        // split on '.'
        size_t dotpos = name.find('.');
        if (dotpos != std::string::npos && name.substr(0, dotpos) == "direction") {
            try {
                layer_idx = std::stoi(name.substr(dotpos + 1));
            } catch (...) {
                layer_idx = -1;
            }
        }
        if (layer_idx < 0) {
            fprintf(stderr, "%s: invalid/unparsable direction tensor layer index in %s\n", __func__, load_info.fname.c_str());
            continue;
        } else if (layer_idx == 0) {
            fprintf(stderr, "%s: invalid (zero) direction tensor layer index in %s\n", __func__, load_info.fname.c_str());
            continue;
        }

        struct ggml_tensor * tensor = ggml_get_tensor(ctx, name.c_str());
        if (tensor->type != GGML_TYPE_F32) {
            fprintf(stderr, "%s: invalid (non-F32) direction tensor type in %s\n", __func__, load_info.fname.c_str());
            continue;
        }
        if (ggml_n_dims(tensor) != 1) {
            fprintf(stderr, "%s: invalid (non-1D) direction tensor shape in %s\n", __func__, load_info.fname.c_str());
            continue;
        }

        if (result.n_embd == -1) {
            result.n_embd = ggml_nelements(tensor);
        } else if (ggml_nelements(tensor) != result.n_embd) {
            fprintf(stderr, "%s: direction tensor in %s does not match previous dimensions\n", __func__, load_info.fname.c_str());
            continue;
        }

        // extend if necessary - do not store data for layer 0 (it's not used)
        result.data.resize(std::max(result.data.size(), static_cast<size_t>(result.n_embd * layer_idx)), 0.0f);

        const float * src = (const float *) tensor->data;
        float * dst = result.data.data() + result.n_embd * (layer_idx - 1);  // layer 1 at [0]
        for (int j = 0; j < result.n_embd; j++) {
            dst[j] = src[j] * load_info.strength;
        }

    }

    gguf_free(ctx_gguf);
    ggml_free(ctx);

    return result;
}

I've tried to keep with the "continue" instead of break or hard-exit if possible for both functions now and tried to make the errors a little more informative about using layer 0, etc.

I've tested it very briefly on mistral using 3 sets of control vectors and it appears to work OK, but will double-check again later after some food when feeling fresher, so don't merge yet! :)

jukofyork · 2024-06-26T17:25:59Z

I've tested it some more using mistral with 4 sets of control vectors and some extra debugging output shows it's correctly resizing the .data vector in both locations.

The failed test looks to be something totally unrelated and is a missing "common.h" header.

Two final things to think about before merging:

If we should allow multiple direction vectors for the same layer in the same file?

IMO, this should be allowed and just have them summed as we do for the separate files. This will also tie in with my intention of adding "control projectors" next as these will be combined using V.V^T instead of summation as it does likely make sense to want to perform the orthogonal projection using a subspace of rank-2 or more. Allowing multiple directions in the "control vectors" file although redundant, doesn't really do any harm I think?

If it's decided against having this, then I think the code should probably warn the user as currently it just overwrites the old layer's data...

If we want to go with this change then I think the only change needed is this:

dst[j] += src[j] * load_info.strength;

as the result.data.resize(..., 0f) call above should have already initialized the data to zeros.

Can we change the name of llama.cpp::llama_control_vector_apply() to something more appropriate? I think possibly llama_control_vector_set might be the best choice as this is ultimately what it is doing:

    for (size_t il = 1; il < model.hparams.n_layer; il++) {
        assert(cvec.tensors[il] != nullptr);

        const size_t off = n_embd * (il - 1); // buffer doesn't have data for layer 0, since it's never present
        if (off + n_embd <= len) {
            ggml_backend_tensor_set(cvec.tensors[il], data + off, 0, n_embd * ggml_element_size(cvec.tensors[il]));
        }
    }

jukofyork · 2024-06-26T20:01:02Z

I pushed this change to see if it will clear the failed tests:

dst[j] += src[j] * load_info.strength;

slaren · 2024-06-27T00:24:32Z

The code looks good to me. I think it would be better to change the continue to break and abort the process in case of error, it seems very unlikely that the result will correct if there are errors in the control vector file, so it is better to report the error and abort. That's the way we handle models in llama.cpp, eg. a model will fail to load even if it has all the expected tensors, but it also has some additional unused ones.

1. If we should allow multiple direction vectors for the same layer in the same file?

I don't really have an opinion about this, it seems pointless if they are just going to be summed in the end, but if there are other cases where this separation may be necessary then I guess it is ok.

2. Can we change the name of llama.cpp::llama_control_vector_apply() to something more appropriate? I think possibly llama_control_vector_set might be the best choice as this is ultimately what it is doing:

I agree that llama_control_vector_set could be a better name, but I don't think it is an good idea to make a breaking API chance for this reason only. It might make more sense to do that if it is part of a larger refactor of the implementation.

…reak on error

jukofyork · 2024-06-27T07:51:41Z

The code looks good to me. I think it would be better to change the continue to break and abort the process in case of error, it seems very unlikely that the result will correct if there are errors in the control vector file, so it is better to report the error and abort. That's the way we handle models in llama.cpp, eg. a model will fail to load even if it has all the expected tensors, but it also has some additional unused ones.

I've made both functions abort loading any more control vectors and also clear the .data member on abort. This should hopefully avoid the case that confused me (as a user), of loading some control vector files but not others:

//
// Control vector utils
//

static llama_control_vector_data llama_control_vector_load_one(const llama_control_vector_load_info & load_info) {
    llama_control_vector_data result = { -1, {} };

    ggml_context * ctx = nullptr;
    struct gguf_init_params meta_gguf_params = {
        /* .no_alloc = */ false,
        /* .ctx      = */ &ctx,
    };
    struct gguf_context * ctx_gguf = gguf_init_from_file(load_info.fname.c_str(), meta_gguf_params);
    if (!ctx_gguf) {
        fprintf(stderr, "%s: failed to load control vector file from %s\n", __func__, load_info.fname.c_str());
        ggml_free(ctx);
        return result;
    }

    int32_t n_tensors = gguf_get_n_tensors(ctx_gguf);
    if (n_tensors == 0) {
        fprintf(stderr, "%s: no direction tensors found in %s\n", __func__, load_info.fname.c_str());
    }

    for (int i = 0; i < n_tensors; i++) {
        std::string name = gguf_get_tensor_name(ctx_gguf, i);

        int layer_idx = -1;

        // split on '.'
        size_t dotpos = name.find('.');
        if (dotpos != std::string::npos && name.substr(0, dotpos) == "direction") {
            try {
                layer_idx = std::stoi(name.substr(dotpos + 1));
            } catch (...) {
                layer_idx = -1;
            }
        }
        if (layer_idx < 0) {
            fprintf(stderr, "%s: invalid/unparsable direction tensor layer index in %s\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        } else if (layer_idx == 0) {
            fprintf(stderr, "%s: invalid (zero) direction tensor layer index in %s\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        }

        struct ggml_tensor * tensor = ggml_get_tensor(ctx, name.c_str());
        if (tensor->type != GGML_TYPE_F32) {
            fprintf(stderr, "%s: invalid (non-F32) direction tensor type in %s\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        }
        if (ggml_n_dims(tensor) != 1) {
            fprintf(stderr, "%s: invalid (non-1D) direction tensor shape in %s\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        }

        if (result.n_embd == -1) {
            result.n_embd = ggml_nelements(tensor);
        } else if (ggml_nelements(tensor) != result.n_embd) {
            fprintf(stderr, "%s: direction tensor in %s does not match previous dimensions\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        }

        // extend if necessary - do not store data for layer 0 (it's not used)
        result.data.resize(std::max(result.data.size(), static_cast<size_t>(result.n_embd * layer_idx)), 0.0f);

        const float * src = (const float *) tensor->data;
        float * dst = result.data.data() + result.n_embd * (layer_idx - 1);  // layer 1 at [0]
        for (int j = 0; j < result.n_embd; j++) {
            dst[j] += src[j] * load_info.strength;  // allows multiple directions for same layer in same file
        }

    }

    if (result.n_embd == -1) {
        fprintf(stderr, "%s: skipping %s due to invalid direction tensors\n", __func__, load_info.fname.c_str());
        result.data.clear();
    }

    gguf_free(ctx_gguf);
    ggml_free(ctx);

    return result;
}

llama_control_vector_data llama_control_vector_load(const std::vector<llama_control_vector_load_info> & load_infos) {
    llama_control_vector_data result = { -1, {} };

    for (const auto & info : load_infos) {
        auto cur = llama_control_vector_load_one(info);

        if (cur.n_embd == -1) {
            result.n_embd = -1;
            break;
        }
        if (result.n_embd != -1 && result.n_embd != cur.n_embd) {
            fprintf(stderr, "%s: control vectors in %s does not match previous dimensions\n", __func__, info.fname.c_str());
            result.n_embd = -1;
            break;
        }

        if (result.n_embd == -1) {
            result = std::move(cur);
        } else {
            result.data.resize(std::max(result.data.size(), cur.data.size()), 0.0f);  // extend if necessary
            for (size_t i = 0; i < cur.data.size(); i++) {
                result.data[i] += cur.data[i];
            }
        }
    }

    if (result.n_embd == -1) {
        fprintf(stderr, "%s: no valid control vector files passed\n", __func__);
        result.data.clear();
    }

    return result;
}

If we should allow multiple direction vectors for the same layer in the same file?

I don't really have an opinion about this, it seems pointless if they are just going to be summed in the end, but if there are other cases where this separation may be necessary then I guess it is ok.

It was more a case of should it trigger an error, but I think since it's just getting summed then it is probably valid/OK to have.

Can we change the name of llama.cpp::llama_control_vector_apply() to something more appropriate? I think possibly llama_control_vector_set might be the best choice as this is ultimately what it is doing:

I agree that llama_control_vector_set could be a better name, but I don't think it is an good idea to make a breaking API chance for this reason only. It might make more sense to do that if it is part of a larger refactor of the implementation.

I'll leave off this then, but my next task after this is to look at introducing the "control projectors" and may look to refactor that then.

I actually think this code for the llama_control_vector_data struct would have been better to use a std::map<int, std::vector<float>> for the .data member and will likely have to use std::map<int, std::vector<std::vector<float>>> for the "control projectors" to allow separation of the multiple vectors per layer for the V.V^T (or equivalent v_1.v_1^T + v_2.v_2^T + ...) calculation.

I know @ggerganov is very anti-template due to the increased compile times, so just checking if it is likely to be OK to use a std::map like this? I think the overhead will be minimal (each vector is likely 4-8k of contiguous 4-byte floats) and the non-contiguous memory layout not much of a factor either (ie: only a couple of MB and likely used to initialise a GGML tensor down the chain anyway).

I will leave any of this for the next PR anyway and I think this PR is done now unless you can see anything else?

common/common.cpp

slaren · 2024-06-27T13:15:51Z

I know @ggerganov is very anti-template due to the increased compile times, so just checking if it is likely to be OK to use a std::map like this?

std::map and other templates from the C++ standard library are used everywhere in llama.cpp, it is ok to use them.

jukofyork · 2024-06-27T13:47:07Z

I know @ggerganov is very anti-template due to the increased compile times, so just checking if it is likely to be OK to use a std::map like this?

std::map and other templates from the C++ standard library are used everywhere in llama.cpp, it is ok to use them.

I think this may not be such a good idea after all:

    LLAMA_API int32_t llama_control_vector_apply(
            struct llama_context * lctx,
                     std::map<int, std::pair<std::vector<float>, std::vector<std::vector<float>>>> data,
                         int32_t   n_embd,
                         int32_t   il_start,
                         int32_t   il_end);

😁

I think to start with I will just allow a single "control projector" per layer and think later how to do this without the map of doom...

If I keep 2 separate scale factors (called strength in the code) then I can actually use the same control vectors for both purposes and it will be very minimal changes up until the need to call the ggml_matmul(), etc code.

I think I'll leave it a couple of days and come back fresh though, so will look again next week at this.

slaren · 2024-06-27T13:49:37Z

Ah yes, the llama.cpp public API is strictly a C API, if this function is part of the llama.cpp API then it cannot take a C++ class as a parameter.

jukofyork · 2024-06-27T15:14:35Z

Ah yes, the llama.cpp public API is strictly a C API, if this function is part of the llama.cpp API then it cannot take a C++ class as a parameter.

Yeah, I think it wasn't a good idea trying to use a map even if not. I think I should be able to get this working very easily for a maximum of 1 projected direction, but might need some help again by the time I get the the ggml calls (likely early next week now).

* Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow * refactored `llama_control_vector_load_one()` * allow multiple directions for same layer in same file * llama_control_vector_load_one() and llama_control_vector_load() now break on error * removed unnecessary ggml_free() call

jukofyork · 2024-07-14T09:59:25Z

This paper:

https://arxiv.org/abs/2311.06668

has a couple of interesting points, even though it looks to be almost the same and barely references the Representation Engineering: A Top-Down Approach to AI Transparency paper:

[42] introduce “representation engineering” to align model behavior
with certain concepts.

I was thinking about scaling the sum between 1/n vs sqrt(1/n) depending on how correlated the directions are when doing this PR, but this seems a better idea.

This paper also seems to get task arithmetic of the vectors working, but so far I've not had much luck with this and am thinking about just creating a bunch of weights for each combination by enumerating equally spaced points on the n-simplex, and then blending the training data or hidden state samples when creating the direction vectors:

There is definitely a problem with non-independent traits like "Dark" and "Chaotic" creative writing styles overlapping and "over cooking" the hidden states.
I think the addition of the vectors is possibly also sending the hidden states "out of distribution":

I have the strange phenomenon of nicely working "Dark" and "Chaotic" creative writing style control vectors that work well on their own, but as soon as I try to mix them; the naturally "positive" models like mistral, miqu or wizard-lm-2 have the characters commit suicide about 75% of the time! :/

Fixed leak in llama_control_vector_load_one() and allow llama_control…

2b27675

…_vector_load() to grow

jukofyork mentioned this pull request Jun 26, 2024

fixes #7999 (adds control vectors to all build_XXX() functions in llama.cpp [needs testing] #8060

Merged

4 tasks

refactored llama_control_vector_load_one()

7b4d87a

allow multiple directions for same layer in same file

cb4bf40

Merge branch 'ggerganov:master' into fix-control-vector-loading

a4ab2a7

jukofyork added 2 commits June 27, 2024 07:27

Merge branch 'ggerganov:master' into fix-control-vector-loading

0395a3c

llama_control_vector_load_one() and llama_control_vector_load() now b…

a299709

…reak on error

Merge branch 'ggerganov:master' into fix-control-vector-loading

e778701

mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Jun 27, 2024

slaren reviewed Jun 27, 2024

View reviewed changes

common/common.cpp Outdated Show resolved Hide resolved

removed unnecessary ggml_free() call

271c511

slaren approved these changes Jun 27, 2024

View reviewed changes

slaren merged commit 97877eb into ggerganov:master Jun 27, 2024
51 of 52 checks passed

jukofyork deleted the fix-control-vector-loading branch July 27, 2024 14:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Control vector loading fixes #8137

Control vector loading fixes #8137

jukofyork commented Jun 26, 2024 •

edited

Loading

slaren commented Jun 26, 2024

jukofyork commented Jun 26, 2024

jukofyork commented Jun 26, 2024

jukofyork commented Jun 26, 2024 •

edited

Loading

jukofyork commented Jun 26, 2024

slaren commented Jun 27, 2024 •

edited

Loading

jukofyork commented Jun 27, 2024

slaren commented Jun 27, 2024

jukofyork commented Jun 27, 2024

slaren commented Jun 27, 2024

jukofyork commented Jun 27, 2024

jukofyork commented Jul 14, 2024 •

edited

Loading

Control vector loading fixes #8137

Control vector loading fixes #8137

Conversation

jukofyork commented Jun 26, 2024 • edited Loading

slaren commented Jun 26, 2024

jukofyork commented Jun 26, 2024

jukofyork commented Jun 26, 2024

jukofyork commented Jun 26, 2024 • edited Loading

jukofyork commented Jun 26, 2024

slaren commented Jun 27, 2024 • edited Loading

jukofyork commented Jun 27, 2024

slaren commented Jun 27, 2024

jukofyork commented Jun 27, 2024

slaren commented Jun 27, 2024

jukofyork commented Jun 27, 2024

jukofyork commented Jul 14, 2024 • edited Loading

jukofyork commented Jun 26, 2024 •

edited

Loading

jukofyork commented Jun 26, 2024 •

edited

Loading

slaren commented Jun 27, 2024 •

edited

Loading

jukofyork commented Jul 14, 2024 •

edited

Loading