Two outputs from gemini-2.0-flash #379
Comments
Audio output is only available to a select few early-access customers. At the moment you can only use the Live API, which outputs audio only.
I second this feature request! I suspect the reason it currently outputs only one or the other is that there probably isn't an intermediate text output head on the multimodal LLM, and there isn't an intermediate text representation when generating audio output. If that is the case and we can't expect matched text and audio output in the future, please let us know, since nowadays one can easily wire up a Whisper-type speech-to-text model to get the text from the audio, but of course this adds overhead.
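For anyone wanting to try that workaround, here is a minimal, hypothetical sketch of the glue step. The Live API's audio output is raw 16-bit mono PCM (reportedly at 24 kHz; treat the sample rate as an assumption), so before handing it to a speech-to-text model such as Whisper you typically need to wrap it in a WAV container. The helper name `pcm_to_wav` is illustrative, not part of any SDK; only the Python standard library is used.

```python
import io
import wave

def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM (as the Live API is said to return) in a
    WAV container so it can be fed to a speech-to-text model like Whisper.

    Note: the 24 kHz mono 16-bit format is an assumption here; check the
    current API docs for the actual output format.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)  # assumed Live API output rate
        wav.writeframes(pcm_bytes)
    return buf.getvalue()

# Example: wrap 0.1 s of silence (2400 frames at 24 kHz, 2 bytes each).
wav_bytes = pcm_to_wav(b"\x00\x00" * 2400)
```

The resulting `wav_bytes` could then be written to a file and passed to a local Whisper model (or any transcription API) to recover the text, at the cost of an extra transcription step and some added latency.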
I'll route the feature request to the product team. I also agree that it would be great to get both, but I'm not sure about the feasibility either, considering the model natively outputs audio.
Interesting... In December, I could only get audio or text from the Live API (Vertex AI mode), but not both, with no error when requesting both. From last week up until yesterday, I was getting both text and audio from the Live API by specifying both modalities. But today I ran the same code and got error 1007 (invalid frame payload data) and "generic::invalid_argument: Only one of text or audio output is allowed." It's fun to experiment with a rapidly changing tool! Here's one vote for letting us continue to get both audio and text. Otherwise, why would response_modalities be a list? 😃 And it was working fine yesterday, after all... 😸
We're also looking to use the Live API to return both audio and text in the response, and have not had luck specifying both in response_modalities=['AUDIO', 'TEXT'].
Description of the feature request:
Hello everyone!
I want gemini-2.0-flash to output both text and audio, but when I try to add TEXT to response_modalities I get this error:
[ORIGINAL ERROR] generic::invalid_argument: Error in program Instantiation for language; then sent 1007 (invalid frame payload data) Request trace id: 4ad28f357e6c292e, [ORIGINAL ERROR] generic::invalid_argument: Error in program Instantiation for language
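As a workaround until both modalities are supported together, requesting a single modality avoids the error. Below is a hedged sketch: the `validate_config` helper is purely illustrative (not part of the SDK) and just encodes the server's current one-modality rule; the commented-out connection code assumes the google-genai Python SDK and a valid API key, so it is not executed here.

```python
# Illustrative helper: catch the "only one modality" problem client-side
# before opening a Live API session. This mirrors the server error
# "Only one of text or audio output is allowed." but is NOT an official
# SDK function.
ALLOWED_MODALITIES = {"AUDIO", "TEXT"}

def validate_config(config: dict) -> dict:
    modalities = config.get("response_modalities", [])
    unknown = set(modalities) - ALLOWED_MODALITIES
    if unknown:
        raise ValueError(f"Unknown modalities: {sorted(unknown)}")
    if len(modalities) != 1:
        raise ValueError(
            "The Live API currently allows exactly one response modality; "
            "pass ['AUDIO'] or ['TEXT'], not both."
        )
    return config

config = validate_config({"response_modalities": ["AUDIO"]})

# The actual session would then look roughly like this (requires the
# google-genai SDK, an API key, and an async context, so it is only
# sketched in comments):
#
#   from google import genai
#   client = genai.Client(api_key="...")
#   async with client.aio.live.connect(
#       model="gemini-2.0-flash-exp", config=config
#   ) as session:
#       ...  # send input, receive audio frames
```

If you need the text as well, the Whisper-style transcription approach mentioned in the comments above is currently the only option, at the cost of an extra transcription pass.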
What problem are you trying to solve with this feature?
Two outputs for one response
Any other information you'd like to share?
I could not find any information related to this.