Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Control vector loading fixes #8137

Merged
merged 8 commits into from
Jun 27, 2024
Merged

Control vector loading fixes #8137

merged 8 commits into from
Jun 27, 2024

Conversation

jukofyork
Copy link
Contributor

@jukofyork jukofyork commented Jun 26, 2024

  • Fixed llama_control_vector_load() to allow result.data to be extended as needed.

  • Fixed leak before last return in llama_control_vector_load_one().

  • Changed llama_control_vector_load() to continue trying to load further control vectors on failure to load one.

  • I have read the contributing guidelines

  • Self-reported review complexity:

    • Low
    • Medium
    • High

@slaren
Copy link
Collaborator

slaren commented Jun 26, 2024

There are still ggml and gguf context leaks in some cases. It has been a while since I took a look at #6289, but I think the changes there are correct and it would be better to use it as the baseline.

@jukofyork
Copy link
Contributor Author

There are still ggml and gguf context leaks in some cases. It has been a while since I took a look at #6289, but I think the changes there are correct and it would be better to use it as the baseline.

Yeah, sorry I just pushed what I had so far but I'm busy incorporating those fixes now :)

@jukofyork
Copy link
Contributor Author

I've completely refactored llama_control_vector_load_one() to remove the unnecessary 2-passes and introduce a single point of return to lessen the likelihood of leaks in the future.

Can you have a quick look over it to see if there is anything you can see that I've done wrong:

static llama_control_vector_data llama_control_vector_load_one(const llama_control_vector_load_info & load_info) {

    llama_control_vector_data result = { -1, {} };

    ggml_context * ctx = nullptr;
    struct gguf_init_params meta_gguf_params = {
        /* .no_alloc = */ false,
        /* .ctx      = */ &ctx,
    };
    struct gguf_context * ctx_gguf = gguf_init_from_file(load_info.fname.c_str(), meta_gguf_params);
    if (!ctx_gguf) {
        fprintf(stderr, "%s: failed to load control vector from %s\n", __func__, load_info.fname.c_str());
        ggml_free(ctx);
        return result;
    }

    int32_t n_tensors = gguf_get_n_tensors(ctx_gguf);
    if (n_tensors == 0) {
        fprintf(stderr, "%s: no direction tensors found in %s\n", __func__, load_info.fname.c_str());
    }

    for (int i = 0; i < n_tensors; i++) {
        std::string name = gguf_get_tensor_name(ctx_gguf, i);

        int layer_idx = -1;

        // split on '.'
        size_t dotpos = name.find('.');
        if (dotpos != std::string::npos && name.substr(0, dotpos) == "direction") {
            try {
                layer_idx = std::stoi(name.substr(dotpos + 1));
            } catch (...) {
                layer_idx = -1;
            }
        }
        if (layer_idx < 0) {
            fprintf(stderr, "%s: invalid/unparsable direction tensor layer index in %s\n", __func__, load_info.fname.c_str());
            continue;
        } else if (layer_idx == 0) {
            fprintf(stderr, "%s: invalid (zero) direction tensor layer index in %s\n", __func__, load_info.fname.c_str());
            continue;
        }

        struct ggml_tensor * tensor = ggml_get_tensor(ctx, name.c_str());
        if (tensor->type != GGML_TYPE_F32) {
            fprintf(stderr, "%s: invalid (non-F32) direction tensor type in %s\n", __func__, load_info.fname.c_str());
            continue;
        }
        if (ggml_n_dims(tensor) != 1) {
            fprintf(stderr, "%s: invalid (non-1D) direction tensor shape in %s\n", __func__, load_info.fname.c_str());
            continue;
        }

        if (result.n_embd == -1) {
            result.n_embd = ggml_nelements(tensor);
        } else if (ggml_nelements(tensor) != result.n_embd) {
            fprintf(stderr, "%s: direction tensor in %s does not match previous dimensions\n", __func__, load_info.fname.c_str());
            continue;
        }

        // extend if necessary - do not store data for layer 0 (it's not used)
        result.data.resize(std::max(result.data.size(), static_cast<size_t>(result.n_embd * layer_idx)), 0.0f);

        const float * src = (const float *) tensor->data;
        float * dst = result.data.data() + result.n_embd * (layer_idx - 1);  // layer 1 at [0]
        for (int j = 0; j < result.n_embd; j++) {
            dst[j] = src[j] * load_info.strength;
        }

    }

    gguf_free(ctx_gguf);
    ggml_free(ctx);

    return result;
}

I've tried to keep with the "continue" instead of break or hard-exit if possible for both functions now and tried to make the errors a little more informative about using layer 0, etc.

I've tested it very briefly on mistral using 3 sets of control vectors and it appears to work OK, but will double-check again later after some food when feeling fresher, so don't merge yet! :)

@jukofyork
Copy link
Contributor Author

jukofyork commented Jun 26, 2024

I've tested it some more using mistral with 4 sets of control vectors and some extra debugging output shows it's correctly resizing the .data vector in both locations.

The failed test looks to be something totally unrelated and is a missing "common.h" header.


Two final things to think about before merging:

  1. If we should allow multiple direction vectors for the same layer in the same file?

IMO, this should be allowed and just have them summed as we do for the separate files. This will also tie in with my intention of adding "control projectors" next as these will be combined using V.V^T instead of summation as it does likely make sense to want to perform the orthogonal projection using a subspace of rank-2 or more. Allowing multiple directions in the "control vectors" file although redundant, doesn't really do any harm I think?

If it's decided against having this, then I think the code should probably warn the user as currently it just overwrites the old layer's data...

If we want to go with this change then I think the only change needed is this:

dst[j] += src[j] * load_info.strength;

as the result.data.resize(..., 0f) call above should have already initialized the data to zeros.

  1. Can we change the name of llama.cpp::llama_control_vector_apply() to something more appropriate? I think possibly llama_control_vector_set might be the best choice as this is ultimately what it is doing:
    for (size_t il = 1; il < model.hparams.n_layer; il++) {
        assert(cvec.tensors[il] != nullptr);

        const size_t off = n_embd * (il - 1); // buffer doesn't have data for layer 0, since it's never present
        if (off + n_embd <= len) {
            ggml_backend_tensor_set(cvec.tensors[il], data + off, 0, n_embd * ggml_element_size(cvec.tensors[il]));
        }
    }

@jukofyork
Copy link
Contributor Author

I pushed this change to see if it will clear the failed tests:

dst[j] += src[j] * load_info.strength;

@slaren
Copy link
Collaborator

slaren commented Jun 27, 2024

The code looks good to me. I think it would be better to change the continue to break and abort the process in case of error, it seems very unlikely that the result will correct if there are errors in the control vector file, so it is better to report the error and abort. That's the way we handle models in llama.cpp, eg. a model will fail to load even if it has all the expected tensors, but it also has some additional unused ones.

1. If we should allow multiple direction vectors for the same layer in the same file?

I don't really have an opinion about this, it seems pointless if they are just going to be summed in the end, but if there are other cases where this separation may be necessary then I guess it is ok.

2. Can we change the name of llama.cpp::llama_control_vector_apply() to something more appropriate? I think possibly llama_control_vector_set might be the best choice as this is ultimately what it is doing:

I agree that llama_control_vector_set could be a better name, but I don't think it is an good idea to make a breaking API chance for this reason only. It might make more sense to do that if it is part of a larger refactor of the implementation.

@jukofyork
Copy link
Contributor Author

The code looks good to me. I think it would be better to change the continue to break and abort the process in case of error, it seems very unlikely that the result will correct if there are errors in the control vector file, so it is better to report the error and abort. That's the way we handle models in llama.cpp, eg. a model will fail to load even if it has all the expected tensors, but it also has some additional unused ones.

I've made both functions abort loading any more control vectors and also clear the .data member on abort. This should hopefully avoid the case that confused me (as a user), of loading some control vector files but not others:

//
// Control vector utils
//

static llama_control_vector_data llama_control_vector_load_one(const llama_control_vector_load_info & load_info) {
    llama_control_vector_data result = { -1, {} };

    ggml_context * ctx = nullptr;
    struct gguf_init_params meta_gguf_params = {
        /* .no_alloc = */ false,
        /* .ctx      = */ &ctx,
    };
    struct gguf_context * ctx_gguf = gguf_init_from_file(load_info.fname.c_str(), meta_gguf_params);
    if (!ctx_gguf) {
        fprintf(stderr, "%s: failed to load control vector file from %s\n", __func__, load_info.fname.c_str());
        ggml_free(ctx);
        return result;
    }

    int32_t n_tensors = gguf_get_n_tensors(ctx_gguf);
    if (n_tensors == 0) {
        fprintf(stderr, "%s: no direction tensors found in %s\n", __func__, load_info.fname.c_str());
    }

    for (int i = 0; i < n_tensors; i++) {
        std::string name = gguf_get_tensor_name(ctx_gguf, i);

        int layer_idx = -1;

        // split on '.'
        size_t dotpos = name.find('.');
        if (dotpos != std::string::npos && name.substr(0, dotpos) == "direction") {
            try {
                layer_idx = std::stoi(name.substr(dotpos + 1));
            } catch (...) {
                layer_idx = -1;
            }
        }
        if (layer_idx < 0) {
            fprintf(stderr, "%s: invalid/unparsable direction tensor layer index in %s\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        } else if (layer_idx == 0) {
            fprintf(stderr, "%s: invalid (zero) direction tensor layer index in %s\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        }

        struct ggml_tensor * tensor = ggml_get_tensor(ctx, name.c_str());
        if (tensor->type != GGML_TYPE_F32) {
            fprintf(stderr, "%s: invalid (non-F32) direction tensor type in %s\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        }
        if (ggml_n_dims(tensor) != 1) {
            fprintf(stderr, "%s: invalid (non-1D) direction tensor shape in %s\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        }

        if (result.n_embd == -1) {
            result.n_embd = ggml_nelements(tensor);
        } else if (ggml_nelements(tensor) != result.n_embd) {
            fprintf(stderr, "%s: direction tensor in %s does not match previous dimensions\n", __func__, load_info.fname.c_str());
            result.n_embd = -1;
            break;
        }

        // extend if necessary - do not store data for layer 0 (it's not used)
        result.data.resize(std::max(result.data.size(), static_cast<size_t>(result.n_embd * layer_idx)), 0.0f);

        const float * src = (const float *) tensor->data;
        float * dst = result.data.data() + result.n_embd * (layer_idx - 1);  // layer 1 at [0]
        for (int j = 0; j < result.n_embd; j++) {
            dst[j] += src[j] * load_info.strength;  // allows multiple directions for same layer in same file
        }

    }

    if (result.n_embd == -1) {
        fprintf(stderr, "%s: skipping %s due to invalid direction tensors\n", __func__, load_info.fname.c_str());
        result.data.clear();
    }

    gguf_free(ctx_gguf);
    ggml_free(ctx);

    return result;
}

llama_control_vector_data llama_control_vector_load(const std::vector<llama_control_vector_load_info> & load_infos) {
    llama_control_vector_data result = { -1, {} };

    for (const auto & info : load_infos) {
        auto cur = llama_control_vector_load_one(info);

        if (cur.n_embd == -1) {
            result.n_embd = -1;
            break;
        }
        if (result.n_embd != -1 && result.n_embd != cur.n_embd) {
            fprintf(stderr, "%s: control vectors in %s does not match previous dimensions\n", __func__, info.fname.c_str());
            result.n_embd = -1;
            break;
        }

        if (result.n_embd == -1) {
            result = std::move(cur);
        } else {
            result.data.resize(std::max(result.data.size(), cur.data.size()), 0.0f);  // extend if necessary
            for (size_t i = 0; i < cur.data.size(); i++) {
                result.data[i] += cur.data[i];
            }
        }
    }

    if (result.n_embd == -1) {
        fprintf(stderr, "%s: no valid control vector files passed\n", __func__);
        result.data.clear();
    }

    return result;
}
  1. If we should allow multiple direction vectors for the same layer in the same file?

I don't really have an opinion about this, it seems pointless if they are just going to be summed in the end, but if there are other cases where this separation may be necessary then I guess it is ok.

It was more a case of should it trigger an error, but I think since it's just getting summed then it is probably valid/OK to have.

  1. Can we change the name of llama.cpp::llama_control_vector_apply() to something more appropriate? I think possibly llama_control_vector_set might be the best choice as this is ultimately what it is doing:

I agree that llama_control_vector_set could be a better name, but I don't think it is an good idea to make a breaking API chance for this reason only. It might make more sense to do that if it is part of a larger refactor of the implementation.

I'll leave off this then, but my next task after this is to look at introducing the "control projectors" and may look to refactor that then.


I actually think this code for the llama_control_vector_data struct would have been better to use a std::map<int, std::vector<float>> for the .data member and will likely have to use std::map<int, std::vector<std::vector<float>>> for the "control projectors" to allow separation of the multiple vectors per layer for the V.V^T (or equivalent v_1.v_1^T + v_2.v_2^T + ...) calculation.

I know @ggerganov is very anti-template due to the increased compile times, so just checking if it is likely to be OK to use a std::map like this? I think the overhead will be minimal (each vector is likely 4-8k of contiguous 4-byte floats) and the non-contiguous memory layout not much of a factor either (ie: only a couple of MB and likely used to initialise a GGML tensor down the chain anyway).

I will leave any of this for the next PR anyway and I think this PR is done now unless you can see anything else?

@mofosyne mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Jun 27, 2024
common/common.cpp Outdated Show resolved Hide resolved
@slaren
Copy link
Collaborator

slaren commented Jun 27, 2024

I know @ggerganov is very anti-template due to the increased compile times, so just checking if it is likely to be OK to use a std::map like this?

std::map and other templates from the C++ standard library are used everywhere in llama.cpp, it is ok to use them.

@jukofyork
Copy link
Contributor Author

I know @ggerganov is very anti-template due to the increased compile times, so just checking if it is likely to be OK to use a std::map like this?

std::map and other templates from the C++ standard library are used everywhere in llama.cpp, it is ok to use them.

I think this may not be such a good idea after all:

    LLAMA_API int32_t llama_control_vector_apply(
            struct llama_context * lctx,
                     std::map<int, std::pair<std::vector<float>, std::vector<std::vector<float>>>> data,
                         int32_t   n_embd,
                         int32_t   il_start,
                         int32_t   il_end);

😁

I think to start with I will just allow a single "control projector" per layer and think later how to do this without the map of doom...

If I keep 2 separate scale factors (called strength in the code) then I can actually use the same control vectors for both purposes and it will be very minimal changes up until the need to call the ggml_matmul(), etc code.

I think I'll leave it a couple of days and come back fresh though, so will look again next week at this.

@slaren
Copy link
Collaborator

slaren commented Jun 27, 2024

Ah yes, the llama.cpp public API is strictly a C API, if this function is part of the llama.cpp API then it cannot take a C++ class as a parameter.

@slaren slaren merged commit 97877eb into ggerganov:master Jun 27, 2024
51 of 52 checks passed
@jukofyork
Copy link
Contributor Author

Ah yes, the llama.cpp public API is strictly a C API, if this function is part of the llama.cpp API then it cannot take a C++ class as a parameter.

Yeah, I think it wasn't a good idea trying to use a map even if not. I think I should be able to get this working very easily for a maximum of 1 projected direction, but might need some help again by the time I get the the ggml calls (likely early next week now).

Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jun 28, 2024
* Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow

* refactored `llama_control_vector_load_one()`

* allow multiple directions for same layer in same file

* llama_control_vector_load_one() and llama_control_vector_load() now break on error

* removed unnecessary ggml_free() call
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jun 28, 2024
* Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow

* refactored `llama_control_vector_load_one()`

* allow multiple directions for same layer in same file

* llama_control_vector_load_one() and llama_control_vector_load() now break on error

* removed unnecessary ggml_free() call
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024
* Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow

* refactored `llama_control_vector_load_one()`

* allow multiple directions for same layer in same file

* llama_control_vector_load_one() and llama_control_vector_load() now break on error

* removed unnecessary ggml_free() call
MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request Jul 1, 2024
* Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow

* refactored `llama_control_vector_load_one()`

* allow multiple directions for same layer in same file

* llama_control_vector_load_one() and llama_control_vector_load() now break on error

* removed unnecessary ggml_free() call
@jukofyork
Copy link
Contributor Author

jukofyork commented Jul 14, 2024

This paper:

https://arxiv.org/abs/2311.06668

has a couple of interesting points, even though it looks to be almost the same and barely references the Representation Engineering: A Top-Down Approach to AI Transparency paper:

[42] introduce “representation engineering” to align model behavior
with certain concepts.

Screenshot_20240714-103805

I was thinking about scaling the sum between 1/n vs sqrt(1/n) depending on how correlated the directions are when doing this PR, but this seems a better idea.


This paper also seems to get task arithmetic of the vectors working, but so far I've not had much luck with this and am thinking about just creating a bunch of weights for each combination by enumerating equally spaced points on the n-simplex, and then blending the training data or hidden state samples when creating the direction vectors:

  • There is definitely a problem with non-independent traits like "Dark" and "Chaotic" creative writing styles overlapping and "over cooking" the hidden states.
  • I think the addition of the vectors is possibly also sending the hidden states "out of distribution":

I have the strange phenomenon of nicely working "Dark" and "Chaotic" creative writing style control vectors that work well on their own, but as soon as I try to mix them; the naturally "positive" models like mistral, miqu or wizard-lm-2 have the characters commit suicide about 75% of the time! :/

@jukofyork jukofyork deleted the fix-control-vector-loading branch July 27, 2024 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants