I'm interested in using the encoder to encode an audio fragment of a few seconds into a single codebook vector. However, the model returns a sequence of several `audio_codes` (which, of course, is the only way to successfully decode the audio afterwards).
How would you recommend using the encoder, and/or pre- or post-processing the audio input or `audio_codes`, to obtain a single audio code at the utterance level?
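To make the goal concrete, here is a minimal sketch of one naive approach: mean-pool the frame-level latents over time and pick the nearest codebook entry. It assumes the encoder's continuous latents (before quantization) and a codebook matrix can be accessed, which may not be the case. The function `utterance_code` and the inputs `frame_latents` and `codebook` are hypothetical names on toy data, not the library's API, and a code obtained this way would not decode back to the original audio.

```python
from typing import Sequence


def utterance_code(frame_latents: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> int:
    """Mean-pool frame latents over time and return the nearest codebook index.

    Hypothetical helper: frame_latents has shape (frames, dim) and
    codebook has shape (entries, dim), both as plain nested sequences.
    """
    if not frame_latents:
        raise ValueError("need at least one frame")
    n_frames = len(frame_latents)
    dim = len(frame_latents[0])

    # Average every latent dimension across all frames of the utterance.
    pooled = [sum(frame[d] for frame in frame_latents) / n_frames
              for d in range(dim)]

    # Squared Euclidean distance from the pooled vector to a codebook entry.
    def sq_dist(entry: Sequence[float]) -> float:
        return sum((p - e) ** 2 for p, e in zip(pooled, entry))

    # Index of the closest codebook entry.
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


# Toy data standing in for real encoder output (assumption, not model output).
frames = [[0.9, 0.1], [1.1, -0.1], [1.0, 0.2]]
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
code = utterance_code(frames, codebook)
```

Is something along these lines reasonable, or is there a better-supported way to get an utterance-level code?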
Thanks in advance.