Re AllTalk & updating the API #1249
Comments
Hi @erew123, thanks for the message. I apologize for the code mess; the implementation is indeed somewhat scattered. The relevant bits are in the lite repo, mainly at these 2 lines: Fetching Voices: Triggering Generation: Also, for reference, the original PR is here, although the implementation was subsequently moved. As you can see, the current code mostly expects a synchronous JSON response for the payload. There are a few avenues we can consider for streaming, but none of them is very straightforward except the XTTS hack (playing the streamed audio directly from XTTS), which will not work over the network. The streamed audio source idea that you are currently using has merit, but swapping it in is not trivial. You're welcome to give it a look and I'll be glad to help however I can. Another alternative to consider is HTTP SSE streaming; however, that is also not trivial to implement.
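For readers following along, here is a minimal sketch of the "streamed audio source" idea mentioned above: instead of waiting for a JSON response, the page points an HTML audio element at a URL that streams audio as it is generated. The helper name `buildStreamingUrl`, the endpoint path `/api/tts-generate-streaming`, the query parameter names, and the example host and port are all assumptions for illustration and should be checked against the AllTalk API documentation; this is not the code used in lite.

```typescript
// Hypothetical request shape; field names are illustrative assumptions.
interface StreamRequest {
  text: string;
  voice: string;
  language: string;
  outputFile: string;
}

// Builds a GET URL that an <audio> element could use as its src.
// The endpoint path and parameter names are assumptions, not confirmed API.
function buildStreamingUrl(baseUrl: string, req: StreamRequest): string {
  const params: Array<[string, string]> = [
    ["text", req.text],
    ["voice", req.voice],
    ["language", req.language],
    ["output_file", req.outputFile],
  ];
  // Percent-encode every key and value so spaces and symbols survive the query string.
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  // Drop trailing slashes on the base so the path is not doubled up.
  return `${baseUrl.replace(/\/+$/, "")}/api/tts-generate-streaming?${query}`;
}

// Browser-only usage (shown as a comment, since it needs a DOM):
// const audio = new Audio(buildStreamingUrl("http://127.0.0.1:7851", request));
// audio.play();
```

The browser handles buffering and playback itself once the URL is set as the audio source, which is why this approach needs no SSE or chunk handling in the page code.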
Does the kobold interface/code have support for an audio player, i.e. is it effectively just a web browser? And, I may be shooting up the wrong path here, but if it does, at a quick glance, do you think this code would probably mostly just slot in? https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/SillyTavern%20Extension/For%20AllTalk%20V2/alltalk.js If so, this would give both streaming and playback of wav/mp3/whatever, plus access to 5x TTS engines, RVC, narrator, etc. It's one of those things: I've had a lot going on in life and I have a lot going on with AllTalk, but in the back of my mind I keep having this little niggle that I need to give kobold some love when I can find some time! :)
Yes, it is a web browser, and has access to HTML audio/video capabilities. The code you've linked, however, is probably a bit too verbose and detailed to fit into lite, considering how packed it is already. Perhaps you could just keep the current reference implementation and port over the streaming portion? If you have an easy way to launch an AllTalk colab instance, I can help take a look using your current https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/admin.html#L804 page.
Yep, colab is all sorted :) https://github.com/erew123/alltalk_tts/wiki/Google-COLAB Though it will download the Piper model by default, and it's an extra step to download XTTS... though I just built a new bit into AllTalk for resolving first model downloads. Unfortunately, the only engine I have that currently supports streaming is XTTS. The others don't, as a manufacturer limitation. Well, kind of... I do have a kludge in the backend that tells your browser it's streaming audio, but actually it generates a full wav/mp3 file and sends it over when it has finished generating. That might provide a workaround for getting non-streaming engines working. Tell you what, you've pointed me in the right direction in the code. I'm very much in the middle of a few bits at the minute, but seeing as that support ticket came in, I thought I'd take the opportunity to touch base while it was fresh in my mind. I'm going away for a week tomorrow, but let me come back to this in 10 days, have a poke around in the code, see where I get to and give you a shout back. Thanks for responding and catch you back soon! (hopefully with a full PR)
P.S. Thought it best to close the ticket, but will respond back here at some point. Doing my best to keep your ticket list clean!
Cheers.
I'll leave it open so we can follow up.
Hi @LostRuins, I'm back (for now) and have managed to give this a shot: I can submit this over as a PR if you are ready for me to. https://github.com/erew123/koboldcpp/blob/concedo/index.html The lines I have changed are:
Benefits to you/Kobold are:
This now opens up Kobold to all the TTS engines that AllTalk supports, as well as the RVC/Voice2voice pipeline. So, for example, you can use Piper TTS, which is very low on GPU/CPU RAM and resources, then use the RVC/Voice2voice pipeline to change the TTS output to sound like any RVC-based voice you want. Full details here https://github.com/erew123/alltalk_tts/wiki/RVC-(Retrieval%E2%80%90based-Voice-Conversion) along with a link to 100,000+ voices. Streaming generation only works with the Coqui XTTS engine.
Other bits: I think I mentioned that Javascript/HTML isn't my favourite, and therefore it's highly possible that you may be able to make better tweaks or changes to the code so that things are placed in the file where you want them, or work in a slightly different way. But it is (as best I can test/tell) working now, and it's really opened up all the TTS engines that AllTalk will be supporting both now and moving forward, as well as the updated Voice2Voice pipeline (when I release that soon, though RVC is already supported and working). The only other thing I considered was a Maybe take a quick look and I can post a PR if you are ready? Or you are welcome to tell me that it's a load of..... well, be gentle if my code is bad; as I say, Javascript is not my favourite!
@LostRuins Oh, forgot to say: using Standard generation will resolve any problems that Streaming generation suffers on certain browsers, i.e. it should always work in any browser if you use Standard generation! :)
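As a rough sketch of the Standard versus Streaming choice described above: streaming is only picked when the engine is XTTS, and everything else falls back to Standard generation, where the page posts the text, waits for a JSON reply, and then plays the finished file. All function names, the engine identifier `"xtts"`, the form field names, the `generate-success` status value, and the `output_file_url` field are assumptions for illustration; verify them against the AllTalk API documentation before relying on them.

```typescript
type GenerationMode = "streaming" | "standard";

// Assumed shape of the JSON reply from a standard (non-streaming) generation request.
interface GenerateResponse {
  status?: string;
  output_file_url?: string;
}

// Streaming is only offered for XTTS; the "xtts" identifier is an assumption.
function pickMode(engine: string, wantStreaming: boolean): GenerationMode {
  return wantStreaming && engine.toLowerCase() === "xtts" ? "streaming" : "standard";
}

// Builds a form-encoded POST body. Field names are assumptions, not confirmed API.
function buildGenerateBody(text: string, voice: string, language: string): string {
  const fields: Array<[string, string]> = [
    ["text_input", text],
    ["character_voice_gen", voice],
    ["language", language],
  ];
  return fields
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

// Returns a playable URL from the JSON reply, or null if generation did not succeed.
function audioUrlFromResponse(baseUrl: string, rawJson: string): string | null {
  let parsed: GenerateResponse | null;
  try {
    parsed = JSON.parse(rawJson) as GenerateResponse | null;
  } catch (err) {
    return null; // not valid JSON
  }
  if (!parsed || parsed.status !== "generate-success" || !parsed.output_file_url) {
    return null;
  }
  const url = parsed.output_file_url;
  if (/^https?:\/\//i.test(url)) {
    return url; // already absolute
  }
  // Otherwise treat it as a path relative to the server base URL.
  const base = baseUrl.replace(/\/+$/, "");
  return url.charAt(0) === "/" ? base + url : base + "/" + url;
}
```

Because the Standard path only plays a complete file, it avoids the browser-specific quirks of streamed playback, at the cost of waiting for the whole clip to be generated first.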
Cool, thanks! I will check it out once I can... appreciate the effort!
@erew123 could you create a PR to the Lite dev branch? It can be modified as we go; that way it's easier to visualize the changes.
@LostRuins Done: LostRuins/lite.koboldai.net#98. Had to add a second update to match existing dev branch changes vs my index.html. You will see it in my PR though. Needless to say, all should be correct.
Hi @erew123 , alltalkv2 integration is now available in v1.80. Hope you're doing well and do get back again when things are better. |
Hi @LostRuins, it's erew123 from AllTalk. Hope you are keeping well. Figured I would be OK to catch you here.
I've been very tempted to update the AllTalk integration at some point. There are quite a few more TTS engines built into AllTalk, plus RVC and other things that it can take advantage of. I've also documented the API in detail, with code examples https://github.com/erew123/alltalk_tts/wiki#-api-documentation and best practices etc. https://github.com/erew123/alltalk_tts/wiki/API-%E2%80%90-Best-Practices-AllTalk-API-Integration
I think it's the https://github.com/LostRuins/koboldcpp/blob/concedo/klite.embd file that holds the implementation code? I have to admit that's quite a large file, and I have no idea how to go about updating it or where to poke about in the code to build the interface bits and then push all the functions in. It's Javascript too, right? (my nemesis). I do have full Javascript examples of code working with AllTalk V2 in its entirety..... What do you think would be the best way of going about attacking this (one day)?
Any thoughts/suggestions or advice would be appreciated.
Thanks