
llama : llama_perf + option to disable timings during decode #9355

Merged: 9 commits into master from gg/llama-perf on Sep 13, 2024

Conversation

ggerganov (Owner) commented Sep 7, 2024

  • Add an option to disable time-measurement system calls during decode (llama_context_params.no_perf). Performance measurements are disabled by default in libllama, but the examples in llama.cpp enable them by default.
  • Restore access to internal timing information via llama_perf_get (see the usage sketch below).
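
A minimal usage sketch of the two points above, written against the intermediate llama_perf_get API that appears later in this thread (the review below ends up splitting it into per-type functions). The model-loading calls are standard llama.h usage; the contents of llama_perf_data are not shown in this thread, so the struct is left opaque here.

    #include "llama.h"

    int main() {
        llama_model_params mparams = llama_model_default_params();
        llama_model * model = llama_load_model_from_file("model.gguf", mparams);

        llama_context_params cparams = llama_context_default_params();
        cparams.no_perf = false; // libllama disables timings by default; clear the flag to measure

        llama_context * ctx = llama_new_context_with_model(model, cparams);

        // ... tokenize and call llama_decode() as usual ...

        // query the accumulated timings for this context
        struct llama_perf_data data = llama_perf_get(ctx, LLAMA_PERF_TYPE_CONTEXT);
        (void) data; // field layout is not specified in this thread

        llama_free(ctx);
        llama_free_model(model);
        return 0;
    }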

TODO:


@ggerganov marked this pull request as ready for review September 8, 2024
@ggerganov mentioned this pull request Sep 8, 2024
common/common.cpp (outdated review thread, resolved)
@ggerganov added the breaking change label (changes that break ABIs, APIs, file formats, or other forms of backwards compatibility) Sep 10, 2024
Comment on lines 344 to 348
bool offload_kqv; // whether to offload the KQV ops (including the KV cache) to GPU
bool flash_attn; // whether to use flash attention [EXPERIMENTAL]
//bool no_perf; // whether to measure performance timings, TODO: implement
bool no_perf; // whether to measure performance timings

// Abort callback
ggerganov (Owner, Author):
This is a minor libllama API breaking change due to the addition of the no_perf parameter.

Collaborator:

I don't think this will be a breaking change, since struct llama_context_params is expected to be created by llama_context_default_params(), right?
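
For reference, the construction pattern this comment refers to; a minimal sketch in which model and the n_ctx override are placeholders, not taken from the PR:

    // Callers start from the defaults, so a newly added field such as no_perf
    // receives a sane default value and existing code keeps compiling.
    // A rebuild against the new header is still required because the struct grew.
    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 4096; // override only the fields you care about
    llama_context * ctx = llama_new_context_with_model(model, cparams);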


@ggerganov requested review from slaren and ngxson September 10, 2024
src/llama.cpp (outdated review thread, resolved)
src/llama.cpp (review thread, resolved)
Co-authored-by: Xuan Son Nguyen <[email protected]>
include/llama.h Outdated
enum llama_perf_type {
    LLAMA_PERF_TYPE_CONTEXT       = 0,
    LLAMA_PERF_TYPE_SAMPLER_CHAIN = 1,
};

LLAMA_API struct llama_perf_data llama_perf_get(const void * ctx, enum llama_perf_type type);
Collaborator:
I think it would be preferable to have two separate functions, just to remove the possibility of calling it with the wrong type of pointer.
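
A sketch of what that split could look like: one typed entry point per object, so passing the wrong pointer becomes a compile error rather than a silent misread. The data-struct fields below are assumptions, not taken from this thread.

    // Hypothetical split of llama_perf_get into typed entry points.
    struct llama_perf_context_data {
        double  t_load_ms; // assumed fields
        double  t_eval_ms;
        int32_t n_eval;
    };

    struct llama_perf_sampler_data {
        double  t_sample_ms; // assumed fields
        int32_t n_sample;
    };

    LLAMA_API struct llama_perf_context_data llama_perf_context(const struct llama_context * ctx);
    LLAMA_API struct llama_perf_sampler_data llama_perf_sampler(const struct llama_sampler * chain);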

src/llama.cpp Outdated
}

const auto * p = (const struct llama_sampler_chain *) chain->ctx;
Collaborator:
These casts are very error prone and should always be checked. To do so, I would suggest moving these functions to llama-sampling.cpp, and checking the interface pointer. The llama_sampler_chain struct could also be moved to llama-sampling.cpp.
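
A sketch of the safer pattern being suggested, as it might look once moved into llama-sampling.cpp where the chain's interface symbol is visible. The iface comparison, counter names, and abort message are illustrative assumptions (the abort also anticipates the discussion further down):

    struct llama_perf_sampler_data llama_perf_sampler(const struct llama_sampler * chain) {
        // validate the interface pointer before trusting the ctx cast
        if (chain == nullptr || chain->iface != &llama_sampler_chain_i) {
            GGML_ABORT("%s: invalid sampler - must be created with llama_sampler_chain_init()", __func__);
        }

        const auto * p = (const struct llama_sampler_chain *) chain->ctx;

        llama_perf_sampler_data data = {};
        data.t_sample_ms = 1e-3 * p->t_sample_us; // assumed counter names
        data.n_sample    = p->n_sample;

        return data;
    }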

Collaborator:
Additionally, since this only works with the chain sampler, it should be documented somewhere, either in the function/struct names, or with an explicit comment, otherwise the natural assumption is that it should work with any sampler.
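
For instance, the constraint could be carried in both the declaration and a comment; a sketch:

    // NOTE: works only with samplers created via llama_sampler_chain_init()
    LLAMA_API struct llama_perf_sampler_data llama_perf_sampler(const struct llama_sampler * chain);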

ggerganov (Owner, Author):
Upon passing a non-chain sampler, should it return empty data or call GGML_ABORT()?

Collaborator:
I think an abort would be better here until we can return status codes from functions, since it is most definitely not intended and the important part is that the programmer notices.

@ggerganov merged commit 0abc6a2 into master Sep 13, 2024
58 checks passed
@ggerganov deleted the gg/llama-perf branch September 13, 2024
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
llama : llama_perf + option to disable timings during decode (ggerganov#9355)

* llama : llama_perf + option to disable timings during decode

ggml-ci

* common : add llama_arg

* Update src/llama.cpp

Co-authored-by: Xuan Son Nguyen <[email protected]>

* perf : separate functions in the API

ggml-ci

* perf : safer pointer handling + naming update

ggml-ci

* minor : better local var name

* perf : abort on invalid sampler pointer

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024 (same commit message as above)

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024 (same commit message as above)
Labels: breaking change (changes that break ABIs, APIs, file formats, or other forms of backwards compatibility), examples

3 participants