Replies: 3 comments
-
Hey, thanks for bringing this up! You're right, we intentionally left out those parameters. There are two main reasons for this:
That being said, adding support for more parameters isn't really a big deal. We do plan to make the API support as many features as possible in future updates, including the parameters you mentioned.
-
Even when I passed the style param in the Google API, it did not work. I tried it on the playground, and all the styles produced the same voice.
-
"All the styles were the same voice" might just be a perception issue, but the control over speed and intonation should be effective. For instance, if you set the prefix to The weak style control ability is actually expected, and further improvements in control capabilities depend on the official release of more components by ChatTTS. Firstly, the current style (i.e., the prompt1 and prompt2 slots) can be seen as a mechanism similar to the system prompt in ChatGPT. However, the control ability is not strong because ChatTTS has not specifically fine-tuned for these parameters (at least not in the current released version). Therefore, future improvements might depend on the official release of new models or provide LoRA fine-tuning solutions for the community to address. Secondly, stronger style control relies on the ChatTTS encoder weights, which are also not open-sourced at the moment. As a result, we can only use "text as a system prompt" instead of "audio as a system prompt." Similarly, this also depends on further open-sourcing by the official team. |
-
Issue Description
Hello,
I have noticed a potential issue in the google_text_synthesize function within the google_api.py module of the ChatTTS-Forge project. Specifically, the ChatTTSConfig object is being instantiated without the prompt1, prompt2, and prefix parameters. The instantiation should include prompt1, prompt2, and prefix so that TTS generation can accurately reflect the desired emotional tone and style, roughly as sketched below.
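For illustration, the corrected instantiation might look roughly like the following; the surrounding code and the other fields shown are assumptions rather than the actual contents of google_api.py, and only prompt1, prompt2, and prefix are the point here:

```python
# Hypothetical sketch, not the actual code in google_api.py: the point is that
# prompt1, prompt2, and prefix should be forwarded into ChatTTSConfig.
# `voice_params` stands in for however the request parameters are resolved.
chattts_config = ChatTTSConfig(
    style=voice_params.style,      # existing field, shown for context
    prompt1=voice_params.prompt1,  # currently omitted
    prompt2=voice_params.prompt2,  # currently omitted
    prefix=voice_params.prefix,    # currently omitted
)
```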
Without these parameters, downstream components will not be able to access prompt1, prompt2, and prefix, which may result in the generated speech lacking the intended emotional variation corresponding to different styles.

Could you please confirm whether this omission was intentional or an oversight? If it was an oversight, could you update the code to include these parameters?
Thank you for your attention to this matter.