server: add repeat penalty sigmoid #9076

z80maniac · 2024-08-18T12:57:14Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Summary

This PR adds a server option that applies a sigmoid function (more precisely, the logistic curve) to repeat_penalty over the repeat_last_n range.

It can be useful to penalize tokens near the end of the text more strongly and tokens at the beginning of the penalty range less strongly. This makes it possible to set higher penalty values that apply only to recent tokens, while older tokens receive a lower penalty and the model is freer to reuse them. The feature was inspired by KoboldAI's repetition penalty slope parameter, which in turn came from NovelAI. However, the implementation in this PR works slightly differently (explained below), so I gave it a different name to avoid confusion.

Math

A new parameter is added to the server API: repeat_penalty_sigmoid_growth. It affects only repeat_penalty, not the other penalties. Wikipedia calls this parameter B, but let's call it growth here.
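As a rough illustration, a request body that uses the new parameter might look like the sketch below. It assumes the server's existing /completion endpoint and its prompt, n_predict, repeat_penalty and repeat_last_n fields. The values are arbitrary, and the body is only built here, not sent.

```python
import json

# Hypothetical request body for the server's /completion endpoint.
# The values are arbitrary examples, not recommended settings.
payload = {
    "prompt": "Once upon a time",
    "n_predict": 64,
    "repeat_penalty": 1.5,                 # base repetition penalty
    "repeat_last_n": 256,                  # size of the penalty range, in tokens
    "repeat_penalty_sigmoid_growth": 8.0,  # new parameter from this PR
}

body = json.dumps(payload)
print(body)
```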

growth = 0 - the feature is disabled (default). The repetition penalty is constant across the entire penalty range.
growth = 1 - the penalty changes linearly within the repeat_last_n range, from 1 to repeat_penalty.
growth > 1 - the usual logistic curve is applied to the penalty: it grows slowly at the start, rises rapidly in the middle, and slows down again towards the end of the range. The formula is k = 1 / (1 + exp((-x + 0.5) * growth)), where x is the normalized token position from the start of the penalty range and k is the coefficient applied to the penalty (explained below).
0 < growth < 1 - a regular sigmoid function would make almost no difference in this range, but I wanted the range to be useful somehow. So I "invented" what the source code calls a "mirrored sigmoid": for growth in (0;1), the logistic function is mirrored across the k=x diagonal. The formula is k = 0.5 - log((1 - x) / x) / growth.
growth < 0 - basically the same as above, but mirrored vertically (relative to the k=0.5 line).

All x and k are normalized to the range [0;1]. In the current implementation the mirrored sigmoid is technically not smooth at x=0 and x=1, but I don't think it matters in practice.
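The two formulas quoted above can be checked numerically with a small sketch. It evaluates them exactly as written and does not reproduce the normalization to [0;1] or the way the actual code chooses between branches; the function names logistic_k and mirrored_k are made up for this illustration.

```python
import math

def logistic_k(x: float, growth: float) -> float:
    """k = 1 / (1 + exp((-x + 0.5) * growth)), as quoted for growth > 1."""
    return 1.0 / (1.0 + math.exp((-x + 0.5) * growth))

def mirrored_k(x: float, growth: float) -> float:
    """k = 0.5 - log((1 - x) / x) / growth, as quoted for 0 < growth < 1.

    Only defined for 0 < x < 1; the endpoints need separate handling.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must be strictly between 0 and 1")
    return 0.5 - math.log((1.0 - x) / x) / growth

# Both curves pass through the midpoint (0.5, 0.5).
print(logistic_k(0.5, 8.0), mirrored_k(0.5, 0.5))

# With the same growth value, one formula undoes the other, which is what
# mirroring across the k = x diagonal means.
x = 0.3
print(mirrored_k(logistic_k(x, 8.0), 8.0))

# Negating growth in the logistic formula flips the curve around k = 0.5.
print(logistic_k(x, -8.0), 1.0 - logistic_k(x, 8.0))
```

The explicit range check in mirrored_k reflects that log((1 - x) / x) is undefined at x=0 and x=1, which is where the description says the curve is not smooth.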

The coefficient k is applied to the initial penalty so that the resulting penalty varies from 1 to repeat_penalty. For example, if k = 0.9 and repeat_penalty = 1.5, the resulting penalty is 1.45. If k = 0.9 and repeat_penalty = 0.5, the resulting penalty is 0.55.
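Both examples are consistent with a linear interpolation between 1 and repeat_penalty. A minimal sketch, assuming that is the rule in use (the name effective_penalty is made up for this illustration):

```python
import math

def effective_penalty(k: float, repeat_penalty: float) -> float:
    """Interpolate between 1 (no penalty) and repeat_penalty using k in [0, 1]."""
    return 1.0 + k * (repeat_penalty - 1.0)

# Approximately 1.45 and 0.55, matching the two examples in the text.
print(effective_penalty(0.9, 1.5))
print(effective_penalty(0.9, 0.5))
```

With k = 0 the result is 1 (no penalty), and with k = 1 it is repeat_penalty itself.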

Graphs

Notes

If the "mirrored sigmoid" is too weird, I can remove it.
I put all the code in a sigmoid struct to keep it organized. This also makes it easy to add the same sigmoid to the other penalties (presence and frequency) if needed. Since it is used in only one function, I put the struct directly inside that function.
In the sigmoid's constructor I initialize all the fields even if they are not used afterwards (when enabled=false), because otherwise the compiler will print lots of warnings about possibly uninitialized fields.
The new code uses a long identifier name penalty_repeat_sigmoid_growth and it does align with some of the existing formatting.
The position of the penalized token (x) is the position of the last occurrence of this token in the penalty range.
I measured the sampling speed with and without this functionality and didn't observe any measurable impact.
Some tests are added to tests/test-sampling.cpp.
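The note about token position can be illustrated with a short sketch. It is not the code from this PR: it assumes the penalty range is a list of token ids ordered oldest first, and that positions are scaled so the oldest token is at x = 0 and the newest at x = 1. The helper name last_occurrence_positions is made up for this illustration.

```python
def last_occurrence_positions(window: list[int]) -> dict[int, float]:
    """Map each token id to the normalized position of its last occurrence.

    Assumption: positions are scaled so the oldest token in the window is at
    x = 0 and the newest at x = 1. The PR text does not spell out the scaling.
    """
    n = len(window)
    positions: dict[int, float] = {}
    for i, token in enumerate(window):
        # Later occurrences overwrite earlier ones, so the last one wins.
        positions[token] = i / (n - 1) if n > 1 else 1.0
    return positions

window = [7, 3, 7, 9, 3]  # hypothetical token ids, oldest first
print(last_occurrence_positions(window))
```

Under these assumptions a token that appears several times is penalized according to its most recent position only.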

z80maniac · 2024-09-25T17:47:30Z

Added some tests in tests/test-sampling.cpp.

github-actions bot added the testing, examples, and server labels Aug 18, 2024

z80maniac force-pushed the penalty-sigmoid branch from 623ffb9 to db49390 September 15, 2024 12:05

z80maniac added 2 commits September 25, 2024 19:59

server: add repeat penalty sigmoid

3722c72

add tests

c795d8b

z80maniac force-pushed the penalty-sigmoid branch from db49390 to c795d8b September 25, 2024 17:45

z80maniac marked this pull request as ready for review September 25, 2024 17:48

fix formatting

8b0d3ab
