server: add repeat penalty sigmoid #9076

Open
wants to merge 3 commits into
base: master

Conversation

z80maniac
Contributor

@z80maniac z80maniac commented Aug 18, 2024

Summary

This PR for the server applies a sigmoid function (more precisely, the logistic curve) to repeat_penalty over the repeat_last_n range.

It may be useful to penalize tokens closer to the end of the text more heavily, and tokens at the beginning of the penalty range less. This makes it possible to set higher penalty values that apply only to recent tokens, while older tokens receive a lower penalty and the model can use them more freely during inference. This feature was inspired by KoboldAI's repetition penalty slope parameter, which in turn came from NovelAI. However, the implementation in this PR works slightly differently (explained below), so I gave it a different name to avoid confusion.

Math

A new parameter is added to the server API: repeat_penalty_sigmoid_growth. It affects only repeat_penalty, not the other penalties. Wikipedia calls this parameter B, but let's call it growth here.

  • growth = 0 - the feature is disabled (default). The repetition penalty is constant across the entire penalty range.

  • growth = 1 - the penalty changes linearly within the repeat_last_n range, from 1 to repeat_penalty.

  • growth > 1 - the usual logistic curve is applied to the penalty, making it grow slowly at the start, rise rapidly in the middle, and slow down towards the end of the range. The formula is k = 1 / (1 + exp((-x + 0.5) * growth)), where x is the normalized token position from the start of the penalty range, and k is the coefficient to be applied to the penalty (explained below).

  • 0 < growth < 1 - a regular sigmoid function makes almost no difference within this range, but I wanted this range to be useful somehow. So I "invented" what the source code calls a "mirrored sigmoid": for growth in the range (0;1), the logistic function is mirrored relative to the k=x diagonal. The formula is k = 0.5 - log((1 - x) / x) / growth.

  • growth < 0 - basically the same as above, but mirrored vertically (relative to the k=0.5 line).
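The two formulas in the list above can be sketched in C++ as follows. This is a minimal illustration rather than the code from the PR: the function names are hypothetical, and the min-max rescaling in `logistic_k_normalized` is only one assumed way to make k run from 0 to 1 over the range. The raw mirrored formula diverges at x=0 and x=1, so the sketch restricts it to the open interval.

```cpp
#include <cassert>
#include <cmath>

// Raw logistic coefficient as written in the PR text:
//   k = 1 / (1 + exp((-x + 0.5) * growth))
// x is the normalized token position in [0, 1].
inline double logistic_k(double x, double growth) {
    assert(x >= 0.0 && x <= 1.0);
    return 1.0 / (1.0 + std::exp((-x + 0.5) * growth));
}

// Raw "mirrored sigmoid" as written in the PR text:
//   k = 0.5 - log((1 - x) / x) / growth
// This is the inverse of logistic_k for the same growth value, i.e. the
// logistic curve reflected across the k = x diagonal. It is only defined
// for 0 < x < 1 and growth != 0.
inline double mirrored_k(double x, double growth) {
    assert(x > 0.0 && x < 1.0 && growth != 0.0);
    return 0.5 - std::log((1.0 - x) / x) / growth;
}

// ASSUMPTION (not taken from the PR): min-max rescaling so that
// k(0) == 0 and k(1) == 1 for any non-zero growth.
inline double logistic_k_normalized(double x, double growth) {
    assert(growth != 0.0);
    const double lo = logistic_k(0.0, growth);
    const double hi = logistic_k(1.0, growth);
    return (logistic_k(x, growth) - lo) / (hi - lo);
}
```

Because `mirrored_k` is the inverse of `logistic_k`, calling `mirrored_k(logistic_k(x, g), g)` returns x up to rounding error, which is a quick way to check the "mirrored relative to the k=x diagonal" description.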

All x and k are normalized to the range [0;1]. In the current implementation the mirrored sigmoid is technically not smooth at x=0 and x=1, but I don't think it matters in practice.

The coefficient k is applied to the initial penalty so that the resulting penalty varies from 1 to repeat_penalty. For example, if k = 0.9 and repeat_penalty = 1.5 then the resulting penalty is 1.45. If k = 0.9 and repeat_penalty = 0.5 then the resulting penalty is 0.55.
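Both worked examples are consistent with a linear interpolation between 1 (no penalty) and repeat_penalty. A minimal sketch under that assumption, with a hypothetical function name that does not come from the PR:

```cpp
#include <cassert>

// ASSUMPTION: the coefficient k interpolates linearly between 1 (no penalty)
// and repeat_penalty. This matches both examples in the text, but the PR's
// actual code may be written differently.
inline double scaled_penalty(double k, double repeat_penalty) {
    assert(k >= 0.0 && k <= 1.0);
    return 1.0 + k * (repeat_penalty - 1.0);
}
```

With k = 0.9 this gives 1 + 0.9 * 0.5 = 1.45 for repeat_penalty = 1.5 and 1 + 0.9 * (-0.5) = 0.55 for repeat_penalty = 0.5. With k = 0 the penalty is always 1, and with k = 1 it equals repeat_penalty.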

Graphs

[Image: sigmoid_pr]

Notes

  • If the "mirrored sigmoid" is too weird, I can remove it.

  • I put all the code in the sigmoid struct to keep it organized. This also makes it easy to add the same sigmoid to the other penalties (presence and frequency) if needed. Since it is only used in one function, I put the struct right into that function.

  • In the sigmoid's constructor I initialize all the fields even if they are not used afterwards (when enabled=false), because otherwise the compiler will print lots of warnings about possibly uninitialized fields.

  • The new code uses a long identifier name, penalty_repeat_sigmoid_growth, and it does not align with some of the existing formatting.

  • The position of the penalized token (x) is the position of the last occurrence of this token in the penalty range.

  • I measured the sampling speed with and without this functionality and didn't observe any measurable impact.

  • Some tests are added to tests/test-sampling.cpp.
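The note above about the position of the penalized token (x) can be illustrated with a short sketch. The function name and the normalization rule are assumptions for illustration only: the sketch maps index i in a window of n tokens to i / (n - 1), which the PR text does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// For every distinct token id in the penalty window, return the normalized
// position (0 = oldest token, 1 = newest token) of its LAST occurrence.
// ASSUMPTION: index i of n maps to i / (n - 1); a one-token window maps to 1.
inline std::unordered_map<int, double> last_positions(const std::vector<int> & window) {
    std::unordered_map<int, double> pos;
    const std::size_t n = window.size();
    for (std::size_t i = 0; i < n; ++i) {
        const double x = n > 1
            ? static_cast<double>(i) / static_cast<double>(n - 1)
            : 1.0;
        assert(x >= 0.0 && x <= 1.0);
        pos[window[i]] = x; // a later occurrence overwrites an earlier one
    }
    return pos;
}
```

For the window {7, 3, 7, 9, 3}, token 7 last appears at index 2 and gets x = 0.5, token 9 gets x = 0.75, and token 3 gets x = 1, so a repeated token is penalized according to its most recent use.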

@z80maniac
Contributor Author

Added some tests in tests/test-sampling.cpp.

@z80maniac z80maniac marked this pull request as ready for review September 25, 2024 17:48
Labels: examples, server, testing