Releases: kalomaze/text-generation-webui
snapshot-2023-11-19
Full Changelog: exl2-entropy...snapshot-2023-11-19
snapshot-2023-11-12
Full Changelog: exl2-entropy...snapshot-2023-11-12
snapshot-2023-11-05
Full Changelog: exl2-entropy...snapshot-2023-11-05
snapshot-2023-10-29
MinP for Exllama 2
Min P sampling added. When Top P is set to 0.69, it will override the normal Top P behavior and scale based on 'Min P' instead. Replace the sampler.py in /text-generation-webui-main/installer_files/env/Lib/site-packages/exllamav2/generator
and it should function.
The way that it works is:
- Every possible token has a probability percentage attached to it.
- The base Min P value represents the starting required percentage. (For example, 0.05 = only include tokens that are at least 5% probable.)
- That value gets scaled by the probability of the top token in the list. So if your top token is 90% probable, the 5% is multiplied by 0.9, giving 4.5%.
- In other words, if the top token is 90% probable and your base_min_p is set to 0.05, only tokens that are at least 4.5% probable will be sampled from before temperature is applied.
This method seems more effective at selecting reasonable tokens than either Top P or Top K.
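As a rough illustration of the filtering step described above (this is a minimal sketch, not the actual code shipped in the patched sampler.py; the function name and tensor handling are my own assumptions):

```python
import torch

def min_p_filter(logits: torch.Tensor, base_min_p: float = 0.05) -> torch.Tensor:
    """Drop tokens whose probability falls below base_min_p * (top token's probability)."""
    probs = torch.softmax(logits, dim=-1)
    top_prob = probs.max(dim=-1, keepdim=True).values
    # e.g. top token at 90% with base_min_p = 0.05 -> threshold = 0.045 (4.5%)
    threshold = base_min_p * top_prob
    # Mask sub-threshold tokens out before temperature / sampling is applied
    return logits.masked_fill(probs < threshold, float("-inf"))
```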
Edit the SamplerBaseMinP.txt
file to change the base 'consideration' value. The default is 0.05 (5%), but lower values can work surprisingly well even with a high temperature.
To toggle Min P on, set Top P to 0.69 as described above.
Note: This is built off the ALT version of the Entropy sampling implementation, but the Dynamic Temp is still only applied if your temp is set to 1.84, so you are not forced to use it.
Graphic Explanation of Min P:
Exllama2 Entropy Sampling implementation for Text-Generation-WebUI
This is kind of a hackjob, but it works.
Replace the sampler.py in /text-generation-webui-main/installer_files/env/Lib/site-packages/exllamav2/generator
and it should work.
- Set the temperature to 1.84 to override the normal temperature behavior and enable Dynamic Temp sampling.
- You can edit the EntropyTemp.txt file that gets created in your
/text-generation-webui-main/
folder to change the minimum and maximum temperature. By default, these are 0.0 (minimum temp) and 2.0 (maximum temp).
The way Entropy sampling works is by measuring the amount of randomness (entropy) in the token probabilities. If there is high certainty in the predictions, it will use a value closer to your minimum temperature setting; the more good choices there are, the more random or 'creative' it will get. This also interacts with Top P and Top K: lowering them narrows the pool down to the more probable choices, which causes the temperature to scale higher.
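As a rough sketch of the idea (not the code in the modified sampler.py), an entropy-based dynamic temperature could look like this; the min_temp/max_temp defaults mirror the EntropyTemp.txt values above, while the normalization and interpolation are assumptions on my part:

```python
import torch

def dynamic_temperature(logits: torch.Tensor,
                        min_temp: float = 0.0,
                        max_temp: float = 2.0) -> float:
    probs = torch.softmax(logits, dim=-1)
    # Shannon entropy: low when the model is confident, high when many
    # tokens look comparably plausible.
    entropy = -(probs * torch.log(probs.clamp_min(1e-10))).sum()
    # Assumption: normalize by the maximum possible entropy (a uniform
    # distribution over the vocabulary) so the value lands in [0, 1],
    # then interpolate between the min and max temperature.
    max_entropy = torch.log(torch.tensor(float(probs.numel())))
    scale = (entropy / max_entropy).item()
    return min_temp + (max_temp - min_temp) * scale
```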
EDIT: Reuploaded to fix a 0-temp issue specific to ExLlama, which doesn't handle it as gracefully as kobold does.
EDIT EDIT: Reuploaded again to fix a random typo. Oops.
ADDITIONAL NOTE:
- sampler_ALT is an alternative version; you will need to replace 'sampler.py' with it in the same way. The difference is that it measures the raw probability distribution before it is changed, which may be more consistent across different sampler settings (e.g. Top P 0.90); see the sketch after this list. Your mileage may vary, as this is not tested. It's just a hunch I had that measuring consistently might improve the results across different sampler settings. Keep in mind that the 2.0 max temp might be too low considering how high the entropy can (theoretically) scale.
- EDIT: I'm being told that sampler_ALT trends toward being a lot more deterministic, but still seems to introduce a good amount of variation into extremely quantized (think 2.5bpw) 70b models. YMMV though!
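For illustration only (these helpers are not from the repo), the practical difference is where the entropy gets measured: sampler_ALT measures the raw softmax output, while the original version measures the distribution after truncation samplers like Top P have already reshaped it.

```python
import torch

def entropy(probs: torch.Tensor) -> torch.Tensor:
    return -(probs * torch.log(probs.clamp_min(1e-10))).sum()

def top_p_truncate(probs: torch.Tensor, top_p: float = 0.90) -> torch.Tensor:
    # Keep the smallest set of tokens whose cumulative probability reaches top_p,
    # then renormalize.
    sorted_probs, order = probs.sort(descending=True)
    keep = (sorted_probs.cumsum(-1) - sorted_probs) < top_p
    truncated = torch.zeros_like(probs)
    truncated[order[keep]] = sorted_probs[keep]
    return truncated / truncated.sum()

logits = torch.randn(32000)                # dummy vocabulary-sized logits
raw = torch.softmax(logits, dim=-1)

h_alt = entropy(raw)                        # sampler_ALT: raw distribution
h_original = entropy(top_p_truncate(raw))   # original: after Top P reshapes it
```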