Re AllTalk & updating the API #1249

Open
erew123 opened this issue Dec 3, 2024 · 13 comments
@erew123

erew123 commented Dec 3, 2024

Hi @LostRuins, it's erew123 from AllTalk. Hope you are keeping well. Figured it would be OK to catch you here.

I've been very tempted to update the AllTalk integration at some point. There are quite a few more TTS engines built into AllTalk, plus RVC and other things that it can take advantage of. I've also documented the API in detail, with code examples (https://github.com/erew123/alltalk_tts/wiki#-api-documentation) and best practices (https://github.com/erew123/alltalk_tts/wiki/API-%E2%80%90-Best-Practices-AllTalk-API-Integration).

I think https://github.com/LostRuins/koboldcpp/blob/concedo/klite.embd is the file that holds the implementation code? I have to admit that's quite a large file, and I have no idea how to go about updating it or where to poke about in the code to build the interface bits and then push all the functions in. It's JavaScript too, right? (My nemesis.) I do have full JavaScript examples of code working with AllTalk V2 in its entirety. What do you think would be the best way to go about attacking this (one day)?

Any thoughts/suggestions or advice would be appreciated.

Thanks

@LostRuins
Owner

Hi @erew123, thanks for the message. I apologize for the code mess; the implementation is indeed somewhat scattered. The relevant bits are in the Lite repo, mainly at these two lines:

Fetching Voices:
https://github.com/LostRuins/lite.koboldai.net/blob/main/index.html#L11950

Triggering Generation:
https://github.com/LostRuins/lite.koboldai.net/blob/main/index.html#L12180

Also, for reference, the original PR is #719, although the implementation was subsequently moved.

As you can see, the current code is mostly expecting a synchronous JSON response for the payload. There are a few avenues we can consider to do streaming, but none of them are very straightforward except the XTTS hack (playing the streamed audio directly from XTTS) which will not work over the network.

The streamed audio source idea that you are currently using has merit, but swapping it in is not trivial. You're welcome to give it a look and I'll be glad to help however I can.

Another alternative to consider is HTTP SSE streaming, however that is also not trivial to implement.
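To make the contrast between the two approaches concrete, here is a minimal sketch (not the code in Lite or AllTalk). It describes the request each mode would make: a "standard" call that POSTs and waits for a JSON reply, and a "streaming" call whose URL is simply handed to an audio element as its source. The endpoint paths, form field names and port are assumptions for illustration and should be checked against the AllTalk API documentation; `buildTtsRequest` is a hypothetical helper.

```javascript
// Sketch only. Paths and field names are ASSUMED for illustration;
// verify them against the AllTalk API documentation before use.
const STANDARD_PATH = "/api/tts-generate";            // assumed
const STREAMING_PATH = "/api/tts-generate-streaming"; // assumed

// Hypothetical helper: describe the HTTP request for one TTS generation.
// Returns { method, url, body } without performing any network I/O.
function buildTtsRequest(baseUrl, mode, { text, voice, language = "en" }) {
  if (mode === "streaming") {
    // Streaming: all parameters go in the query string, so the resulting
    // URL can be assigned directly to an <audio> element's src.
    const url = new URL(STREAMING_PATH, baseUrl);
    url.search = new URLSearchParams({ text, voice, language }).toString();
    return { method: "GET", url: url.toString(), body: null };
  }
  // Standard: POST form fields, wait for the JSON reply, then play the
  // generated file that the reply points to.
  const body = new URLSearchParams({
    text_input: text,           // assumed field name
    character_voice_gen: voice, // assumed field name
    language,
  });
  return { method: "POST", url: new URL(STANDARD_PATH, baseUrl).toString(), body };
}

// Browser usage (illustrative, not executed here):
//   const req = buildTtsRequest("http://127.0.0.1:7851", "streaming",
//                               { text: "Hello", voice: "example.wav" });
//   new Audio(req.url).play();
```

Keeping request construction separate from playback means the existing synchronous JSON path can stay as it is, with the streaming URL added as a second branch.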

@erew123
Author

erew123 commented Dec 3, 2024

Does the Kobold interface/code have support for an audio player, i.e. is it effectively just a web browser? And I may be heading down the wrong path here, but if it does, at a quick glance, do you think this code would mostly just slot in? https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/SillyTavern%20Extension/For%20AllTalk%20V2/alltalk.js

If so, this would give both streaming and playback of wav/mp3/whatever, plus access to 5x TTS engines, RVC, narrator etc.

It's one of those things: I've had a lot going on in life and I have a lot going on with AllTalk, but in the back of my mind I keep having this little niggle that I need to give Kobold some love when I can find some time! :)

@LostRuins
Owner

Yes, it is a web browser, and has access to HTML audio/video capabilities. The code you've linked, however, is probably a bit too verbose and detailed to fit into lite, considering how packed it is already. Perhaps you could just keep the current reference implementation and port over the streaming portion?

If you have an easy way to launch an AllTalk Colab instance, I can help take a look with your current https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/admin.html#L804 page.

@erew123
Author

erew123 commented Dec 3, 2024

Yep, colab is all sorted :) https://github.com/erew123/alltalk_tts/wiki/Google-COLAB

It will download the Piper model by default, though, and it's an extra step to download XTTS... though I just built a new bit into AllTalk for resolving first model downloads.

Unfortunately, the only engine I have that supports streaming is XTTS currently. The others don't, as a limitation of the engines themselves. Well, kind of. I do have a kludge in the backend that can tell your browser it's streaming audio, when actually it is generating a full wav/mp3 file and will send it over once it has finished generating. That might provide a workaround for getting non-streaming engines working.

Tell you what: you've pointed me in the right direction in the code. I'm very much in the middle of a few bits at the minute, but seeing as that support ticket came in, I thought I'd take the opportunity to touch base while it was fresh in my mind. I'm going away for a week tomorrow, but let me come back to this in 10 days, have a poke around in the code, see where I get to and give you a shout back.

Thanks for responding and catch you back soon! (hopefully with a full PR)

erew123 closed this as completed Dec 3, 2024
@erew123
Author

erew123 commented Dec 3, 2024

P.S. thought it best to close the ticket, but will respond back here at some point. Doing my best to keep your ticket list clean!

@LostRuins
Owner

cheers.

LostRuins reopened this Dec 3, 2024
@LostRuins
Owner

I'll leave it open so we can follow up

@erew123
Author

erew123 commented Dec 11, 2024

Hi @LostRuins

I'm back (for now) and have managed to give this a shot:

image

I can submit this as a PR if you are ready for me to: https://github.com/erew123/koboldcpp/blob/concedo/index.html

The lines I have changed are:

Benefits to you/Kobold are:

  • Added Standard Generation mode as an alternative to Streaming
  • Integrated RVC (voice conversion) support with voice selection
  • Added RVC pitch adjustment (-24 to +24)
  • RVC controls automatically disable when using Streaming mode
  • Standard generation mode set as default
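The rules in this list can be sketched as a small settings-normalization function. This is an illustration of the described behaviour, not the code from the PR; `normalizeTtsSettings` and its field names are hypothetical.

```javascript
// Sketch of the rules listed above. All names here are hypothetical.
const RVC_PITCH_MIN = -24;
const RVC_PITCH_MAX = 24;

function normalizeTtsSettings(raw = {}) {
  // Standard generation is the default; anything other than an explicit
  // "streaming" value falls back to it.
  const mode = raw.mode === "streaming" ? "streaming" : "standard";

  // RVC controls are only active in standard mode.
  const rvcEnabled = mode === "standard";

  // Clamp the pitch into the supported -24..+24 range; non-numeric input becomes 0.
  const pitch = Number.isFinite(raw.rvcPitch) ? raw.rvcPitch : 0;
  const clamped = Math.min(RVC_PITCH_MAX, Math.max(RVC_PITCH_MIN, pitch));

  return {
    mode,
    rvcEnabled,
    rvcVoice: rvcEnabled ? (raw.rvcVoice ?? null) : null, // null = no RVC voice
    rvcPitch: rvcEnabled ? clamped : 0,
  };
}
```

In a UI, the returned `rvcEnabled` flag could drive the `disabled` attribute of the RVC voice and pitch inputs, so switching to streaming greys them out.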

This now opens up Kobold to all the TTS engines that AllTalk supports, as well as the RVC/Voice2voice pipeline. So, for example, you can use Piper TTS, which is very light on GPU/CPU RAM and resources, then use the RVC/Voice2voice pipeline to change the TTS output to sound like any RVC-based voice you want. Full details are here, along with a link to 100,000+ voices: https://github.com/erew123/alltalk_tts/wiki/RVC-(Retrieval%E2%80%90based-Voice-Conversion)

Streaming generation only works with the Coqui XTTS engine.

Other bits

I think I mentioned that JavaScript/HTML isn't my favourite, so it's highly possible that you can make better tweaks or changes to the code, so that things are placed in the file where you want them or work in a slightly different way. But it is (as best I can test/tell) working now, and it has really opened up all the TTS engines that AllTalk supports both now and moving forward, as well as the updated Voice2Voice pipeline (when I release that soon, though RVC is already supported and working).

The only other thing I considered was a Refresh button, to re-pull the voices and rvc_voices lists should someone update them, but refreshing the entire page will do that, so I thought it best to add as little code as possible.

Maybe take a quick look and I can post a PR if you are ready? Or you are welcome to tell me that it's a load of..... well, be gentle if my code is bad. As I say, JavaScript is not my favourite!

@erew123
Author

erew123 commented Dec 11, 2024

@LostRuins Oh, forgot to say: using Standard generation will resolve any problems that Streaming generation suffers on certain browsers, i.e. it should always work in any browser if you use Standard generation! :)

@LostRuins
Owner

Cool thanks! I will check it out once I can... appreciate the effort!

@LostRuins
Owner

LostRuins commented Dec 13, 2024

@erew123 could you create a PR to the Lite dev branch? It can be modified as we go, that way it's easier to visualize the changes

https://github.com/LostRuins/lite.koboldai.net/pulls

@erew123
Author

erew123 commented Dec 13, 2024

@LostRuins Done LostRuins/lite.koboldai.net#98

I had to add a second update to match the existing dev branch changes vs my index.html. You will see that in my PR though. Needless to say, all should be correct.

@LostRuins
Owner

Hi @erew123, AllTalk V2 integration is now available in v1.80. Hope you're doing well and do get back again when things are better.
