Re AllTalk & updating the API #1249

erew123 · 2024-12-03T14:45:13Z

Hi @LostRuins, it's erew123 from AllTalk. Hope you are keeping well. Figured I would be ok to catch you here.

I've been very tempted to update the AllTalk integration at some point. There are quite a few more TTS engines built in to AllTalk, plus RVC and other things that it can take advantage of. I've also documented the API in detail, with code examples https://github.com/erew123/alltalk_tts/wiki#-api-documentation and best practices etc https://github.com/erew123/alltalk_tts/wiki/API-%E2%80%90-Best-Practices-AllTalk-API-Integration

I think it's the https://github.com/LostRuins/koboldcpp/blob/concedo/klite.embd file that holds the implementation code? I have to admit that's quite a large file, and I have no idea how to go about updating it or where to poke about in the code to build the interface bits and then push all the functions in. It's JavaScript too, right? (my nemesis). I do have full JavaScript examples of code working with AllTalk V2 in its entirety... What do you think would be the best way of going about attacking this (one day)?

Any thoughts/suggestions or advice would be appreciated.

Thanks

LostRuins · 2024-12-03T14:54:11Z

Hi @erew123, thanks for the message. I apologize for the code mess, the implementation is indeed somewhat scattered. The relevant bits are in the lite repo, mainly at these 2 lines:

Fetching Voices:
https://github.com/LostRuins/lite.koboldai.net/blob/main/index.html#L11950

Triggering Generation:
https://github.com/LostRuins/lite.koboldai.net/blob/main/index.html#L12180

Also for reference, the original PR is here although the implementation was subsequently moved
#719

As you can see, the current code is mostly expecting a synchronous JSON response for the payload. There are a few avenues we can consider to do streaming, but none of them are very straightforward except the XTTS hack (playing the streamed audio directly from XTTS) which will not work over the network.

The streamed audio source idea that you are currently using has merit, but swapping it in is not trivial. You're welcome to give it a look and I'll be glad to help however I can.

Another alternative to consider is HTTP SSE streaming, however that is also not trivial to implement.

erew123 · 2024-12-03T15:02:40Z

Does the kobold interface/code have support for an audio player, i.e. is it effectively just a web browser? And, I may be heading down the wrong path here, but if it does, at a quick glance, do you think this code would mostly just slot in? https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/SillyTavern%20Extension/For%20AllTalk%20V2/alltalk.js

If so, this would give both streaming and playback of wav/mp3/whatever, plus access to 5 TTS engines, RVC, narrator, etc.

It's one of those things: I've had a lot going on in life and I have a lot going on with AllTalk, but in the back of my mind I keep having this little niggle that I need to give kobold some love when I can find some time! :)

LostRuins · 2024-12-03T15:23:59Z

Yes, it is a web browser, and has access to HTML audio/video capabilities. The code you've linked, however, is probably a bit too verbose and detailed to fit into lite, considering how packed it is already. Perhaps you could just keep the current reference implementation and port over the streaming portion?

If you have an easy way to launch an AllTalk colab instance, I can help take a look with your current https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/admin.html#L804 page.

erew123 · 2024-12-03T15:35:47Z

Yep, colab is all sorted :) https://github.com/erew123/alltalk_tts/wiki/Google-COLAB

Though it will download the Piper model by default, and it's an extra step to download XTTS... though I just built a new bit into AllTalk for resolving first model downloads.

Unfortunately, the only engine I have that supports streaming is XTTS currently. The others don't, due to a manufacturer limitation. Well, kind of... I do have a kludge in the backend that might tell your browser it's streaming audio, but actually it's generating a full wav/mp3 file and will send it over when it's finished generating. That might provide a workaround for getting non-streaming engines working.
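For illustration only, the kludge described above can be sketched roughly as follows. This is a minimal Python sketch, not AllTalk's backend code: `pseudo_stream`, `fake_engine` and the chunk size are hypothetical names and values invented for the example. The point is that the consumer sees a chunked stream, but nothing is sent until the whole clip has been generated.

```python
from typing import Callable, Iterator


def pseudo_stream(generate_full_audio: Callable[[str], bytes],
                  text: str,
                  chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio in chunks, but only after the whole clip exists.

    To the consumer this looks like a stream; in reality the first
    chunk is not produced until generate_full_audio() has returned
    the complete file.
    """
    audio = generate_full_audio(text)  # blocks until generation finishes
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def fake_engine(text: str) -> bytes:
    # Hypothetical stand-in for a non-streaming TTS engine.
    return text.encode("utf-8") * 3


if __name__ == "__main__":
    for chunk in pseudo_stream(fake_engine, "hello", chunk_size=4):
        print(len(chunk), chunk)
```

The trade-off is latency: playback cannot begin until generation is complete, so this helps with compatibility rather than speed.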

Tell you what, you've pointed me in the right direction in the code. I'm very much in the middle of a few bits at the minute, but seeing as that support ticket came in, I thought I'd take the opportunity to touch base while it was fresh in my mind. I'm going away for a week tomorrow, but let me come back to this in 10 days, have a poke around in the code, see where I get to and give you a shout back.

Thanks for responding and catch you back soon! (hopefully with a full PR)

erew123 · 2024-12-03T15:36:50Z

P.S. thought it best to close the ticket, but will respond back here at some point. Doing my best to keep your ticket list clean!

LostRuins · 2024-12-03T15:48:54Z

cheers.

LostRuins · 2024-12-03T15:49:06Z

I'll leave it open so we can follow up

erew123 · 2024-12-11T19:22:29Z

Hi @LostRuins

I'm back (for now) and have managed to give this a shot:

I can submit this as a PR if you are ready for me to: https://github.com/erew123/koboldcpp/blob/concedo/index.html

The lines I have changed are:

Added extra interface settings https://github.com/erew123/koboldcpp/blob/concedo/index.html#L19727-L19744
Handle those extra interface elements https://github.com/erew123/koboldcpp/blob/concedo/index.html#L11884-L11999
Deal with Standard or Streaming generation https://github.com/erew123/koboldcpp/blob/concedo/index.html#L12118-L12214

Benefits to you/Kobold are:

Added Standard Generation mode as an alternative to Streaming
Integrated RVC (voice conversion) support with voice selection
Added RVC pitch adjustment (-24 to +24)
RVC controls automatically disable when using Streaming mode
Standard generation mode set as default

This now opens up Kobold to all the TTS engines that AllTalk supports, as well as the RVC/Voice2voice pipeline. So for example, you can use Piper TTS, which is very light on GPU/CPU RAM and other resources, then use the RVC/Voice2voice pipeline to change the TTS output to sound like any RVC based voice you want. Full details here https://github.com/erew123/alltalk_tts/wiki/RVC-(Retrieval%E2%80%90based-Voice-Conversion) along with a link to 100,000+ voices.

Streaming generation only works with the Coqui XTTS engine.
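The rules listed above (Standard mode as default, RVC only in Standard mode, pitch between -24 and +24) could be expressed roughly like this. It is a language-neutral sketch written in Python, not the JavaScript that went into Lite, and every name here (`TtsSettings`, `build_request`, the payload keys) is hypothetical rather than taken from the AllTalk API. Clamping out-of-range pitch values is also an assumption; the thread only states the allowed range.

```python
from dataclasses import dataclass
from typing import Optional

RVC_PITCH_MIN, RVC_PITCH_MAX = -24, 24  # range quoted in the list above


@dataclass
class TtsSettings:
    mode: str = "standard"           # default; "streaming" is XTTS-only
    voice: str = ""
    rvc_voice: Optional[str] = None  # None means RVC is switched off
    rvc_pitch: int = 0


def build_request(text: str, s: TtsSettings) -> dict:
    """Return a hypothetical request payload for the chosen mode."""
    if s.mode not in ("standard", "streaming"):
        raise ValueError(f"unknown mode: {s.mode!r}")
    payload = {"mode": s.mode, "text": text, "voice": s.voice}
    # RVC settings are only applied in standard mode; in streaming
    # mode they are dropped, mirroring the disabled UI controls.
    if s.mode == "standard" and s.rvc_voice:
        # Assumption: out-of-range pitch is clamped rather than rejected.
        pitch = max(RVC_PITCH_MIN, min(RVC_PITCH_MAX, s.rvc_pitch))
        payload["rvc_voice"] = s.rvc_voice
        payload["rvc_pitch"] = pitch
    return payload


if __name__ == "__main__":
    print(build_request("hi", TtsSettings(voice="v1", rvc_voice="r1", rvc_pitch=40)))
    print(build_request("hi", TtsSettings(mode="streaming", voice="v1", rvc_voice="r1")))
```

Keeping the mode check in one place means the UI and the request builder cannot disagree about whether RVC applies.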

Other bits

I think I mentioned that JavaScript/HTML isn't my favourite, and therefore it's highly possible that you may be able to make better tweaks or changes to the code so that things are placed in the file where you want them, or work in a slightly different way. But it is (as best I can test/tell) working now, and it's really opened up all the TTS engines that AllTalk will be supporting both now and moving forward, as well as the updated Voice2Voice pipeline (when I release that soon, though RVC is already supported and working).

The only other thing I considered was a Refresh button, to re-pull the voices and rvc_voices list down, should someone update them, but refreshing the entire page will do that, so I thought it best to add as little code as possible.....

Maybe take a quick look and I can post a PR if you are ready? Or you are welcome to tell me that it's a load of..... well, be gentle if my code is bad; as I say, JavaScript is not my favourite!

erew123 · 2024-12-11T19:27:01Z

@LostRuins Oh, forgot to say: using Standard generation will resolve any problems that Streaming generation suffers on certain browsers, i.e. it should always work in any browser if you use Standard generation! :)

LostRuins · 2024-12-12T16:36:41Z

Cool thanks! I will check it out once I can... appreciate the effort!

LostRuins · 2024-12-13T02:06:01Z

@erew123 could you create a PR to the Lite dev branch? It can be modified as we go, that way it's easier to visualize the changes

https://github.com/LostRuins/lite.koboldai.net/pulls

erew123 · 2024-12-13T17:37:50Z

@LostRuins Done LostRuins/lite.koboldai.net#98

Had to add a second update to match existing dev branch changes vs my index.html. You will see in my PR though. Needless to say, all should be correct.

LostRuins · 2024-12-20T06:06:27Z

Hi @erew123, alltalkv2 integration is now available in v1.80. Hope you're doing well and do get back again when things are better.

erew123 closed this as completed Dec 3, 2024

LostRuins reopened this Dec 3, 2024
