
llama : move vocab, grammar and sampling into separate files #8508

Merged · 11 commits merged into master from gg/llama-reorganize on Jul 23, 2024

Conversation

@ggerganov (Owner) commented Jul 16, 2024

Some refactoring attempts, mainly trying to reorganize the llama code to prepare for #5214 and #5215

API Changes:

  • llama_sample_grammar -> llama_grammar_sample (call-site sketch below)

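For callers, the rename looks like this at a call site (a minimal sketch; the exact signature and argument order of the new function should be checked against llama.h):

    // hypothetical call site; ctx, candidates and grammar are assumed to
    // already be set up

    // before this PR:
    //   llama_sample_grammar(ctx, &candidates, grammar);

    // after this PR (assumed argument order, with the grammar first to
    // match the new llama_grammar_ naming):
    llama_grammar_sample(grammar, ctx, &candidates);
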
Summary:

  • move llama_vocab to llama-vocab.h/.cpp (see the sketch after this list)
  • move tokenizer implementations from llama.cpp to llama-vocab.cpp
  • move llama_sample_ implementation to llama-sampling.h/.cpp
  • move llama_grammar_ implementation to llama-grammar.h/.cpp
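
As a rough illustration of the first move, the new llama-vocab.h centers on a llama_vocab struct plus the tokenizer entry points moved out of llama.cpp. This is an abridged sketch, not the actual header; the member names shown are assumptions:

    // llama-vocab.h (abridged sketch)
    #pragma once

    #include "llama.h"

    #include <string>
    #include <unordered_map>
    #include <vector>

    struct llama_vocab {
        struct token_data {
            std::string text;
            float       score;
        };

        enum llama_vocab_type type = LLAMA_VOCAB_TYPE_SPM;

        std::unordered_map<std::string, llama_token> token_to_id;
        std::vector<token_data>                      id_to_token;

        // special token ids, BPE merges, etc. elided
    };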

TODO:

  • Fix Makefile header deps
  • Suffix all internal APIs with _impl for consistency
    The reason for this change is to make it easier to distinguish public from private calls without relying on function overloads. For example:
    // in llama.cpp:
    
    // without "_impl" suffix
    void llama_set_rng_seed(struct llama_context * ctx, uint32_t seed) {
        llama_set_rng_seed(&ctx->sampling, seed);
    }
    
    // with "_impl" suffix
    void llama_set_rng_seed(struct llama_context * ctx, uint32_t seed) {
        llama_set_rng_seed_impl(&ctx->sampling, seed);
    }
    And the reason to have this indirection (llama.cpp:llama_set_rng_seed -> llama-sampling.cpp:llama_set_rng_seed_impl) is to decouple llama-sampling.cpp from llama_context.
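
    To make the decoupling concrete, the internal side of that pair lives entirely in the sampling files and never sees llama_context. A minimal sketch (the real llama_sampling state is larger, and the member layout here is an assumption; LLAMA_DEFAULT_SEED comes from llama.h):

    // llama-sampling.h (internal header, sketch)
    #include "llama.h" // for LLAMA_DEFAULT_SEED, llama_token, ...

    #include <cstdint>
    #include <random>

    struct llama_sampling {
        std::mt19937 rng; // assumes the RNG is owned by the sampling state
    };

    void llama_set_rng_seed_impl(struct llama_sampling * smpl, uint32_t seed);

    // llama-sampling.cpp
    void llama_set_rng_seed_impl(struct llama_sampling * smpl, uint32_t seed) {
        if (seed == LLAMA_DEFAULT_SEED) {
            seed = std::random_device{}(); // pick a non-deterministic seed
        }
        smpl->rng.seed(seed);
    }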

Conflicting PRs:

Follow-up PRs:

@ggerganov ggerganov force-pushed the gg/llama-reorganize branch 2 times, most recently from d4f8f52 to 516746a on July 16, 2024 11:51
@github-actions github-actions bot added the "testing" (everything test related) and "examples" labels Jul 16, 2024
@ggerganov ggerganov force-pushed the gg/llama-reorganize branch 2 times, most recently from db39019 to 0049b1a on July 16, 2024 13:40
@ggerganov ggerganov force-pushed the gg/llama-reorganize branch 3 times, most recently from da7f831 to dc96d90 on July 19, 2024 13:33
Review thread on src/llama-impl.h (outdated, resolved)
@mofosyne mofosyne added the "Review Complexity : Medium" label (generally requires more time to grok, but manageable by beginner to medium expertise level) Jul 19, 2024
@ggerganov ggerganov force-pushed the gg/llama-reorganize branch 2 times, most recently from ec7c6d9 to 8c5f2c2 on July 19, 2024 15:16
@oldgithubman commented

> Some refactoring attempts, mainly trying to reorganize the llama code to prepare for #5214 and #5215 […]

Are we mitigating breakages this time?

@ggerganov ggerganov force-pushed the gg/llama-reorganize branch from 8c5f2c2 to 0c14b04 on July 22, 2024 14:17
@ggerganov ggerganov force-pushed the gg/llama-reorganize branch from 0c14b04 to 39fbaf9 on July 22, 2024 16:46
@ggerganov ggerganov marked this pull request as ready for review July 22, 2024 17:05
@ggerganov ggerganov requested a review from slaren July 22, 2024 17:05
@ggerganov (Owner, Author) commented

This is a first step in partitioning llama.cpp into multiple files for easier maintenance. If this seems reasonable, we can then do the same for the KV cache, LoRA, state, etc.

@slaren (Collaborator) left a comment

Looks good for a first step; it will probably need more work to completely decouple the different components. Some notes:

  • llama_get_vocab, llama_get_sampling are unused and probably should be removed
  • llm_load_vocab should eventually be moved to llama-vocab.cpp
  • The symbols in unicode.h and unicode-data.h should have a llama_ prefix, or be moved to a namespace (see the sketch below)
  • LLAMA_API_INTERNAL in llama.h should be removed, and tests should include the private headers instead
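
On the unicode.h point, the suggestion amounts to something like the following (an illustrative sketch; unicode_cpts_from_utf8 is one of the exported symbols, while the llama_-prefixed and namespaced variants are hypothetical):

    #include <cstdint>
    #include <string>
    #include <vector>

    // today: an unprefixed symbol that can collide when llama.cpp is
    // linked into a larger program
    std::vector<uint32_t> unicode_cpts_from_utf8(const std::string & utf8);

    // option 1: prefix
    std::vector<uint32_t> llama_unicode_cpts_from_utf8(const std::string & utf8);

    // option 2: namespace
    namespace llama {
        std::vector<uint32_t> unicode_cpts_from_utf8(const std::string & utf8);
    }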

Review thread on src/llama-impl.h (outdated, resolved)
@oldgithubman commented

> This is a first step in partitioning llama.cpp into multiple files for easier maintenance. If this seems reasonable, we can then do the same for the KV cache, LoRA, state, etc.

I think it's reasonable if you can avoid breakages.

@ggerganov (Owner, Author) commented

> LLAMA_API_INTERNAL in llama.h should be removed, and tests should include the private headers instead

I started implementing that in #8643. I'll look to merge this PR in the meantime to avoid having to resolve bigger conflicts.
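
In other words, instead of tests opting into internals through a macro in the public header, they would include the private headers directly. A hypothetical before/after for a test file:

    // before: internals exposed through the public header
    #define LLAMA_API_INTERNAL
    #include "llama.h"

    // after: the test reaches into the library sources directly
    // (assumes src/ is on the test's include path)
    #include "llama-grammar.h"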

@ggerganov ggerganov merged commit 938943c into master Jul 23, 2024
54 checks passed
@ggerganov ggerganov changed the title from "llama : refactoring" to "llama : move vocab, grammar and sampling into separate files" Jul 23, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024
llama : move vocab, grammar and sampling into separate files (ggerganov#8508)

* llama : move sampling code into llama-sampling

ggml-ci

* llama : move grammar code into llama-grammar

ggml-ci

* cont

ggml-ci

* cont : pre-fetch rules

* cont

ggml-ci

* llama : deprecate llama_sample_grammar

* llama : move tokenizers into llama-vocab

ggml-ci

* make : update llama.cpp deps [no ci]

* llama : redirect external API to internal APIs

ggml-ci

* llama : suffix the internal APIs with "_impl"

ggml-ci

* llama : clean-up