Two outputs from gemini-2.0-flash #379

Open
AleksNet5 opened this issue Dec 20, 2024 · 6 comments
Labels
component:other Issues unrelated to examples/quickstarts status:triaged Issue/PR triaged to the corresponding sub-team type:feature request New feature request/enhancement

Comments

@AleksNet5

Description of the feature request:

Hello everyone!

I want gemini-2.0-flash to output both text and audio, but when I try to add TEXT to response_modalities I get this error: [ORIGINAL ERROR] generic::invalid_argument: Error in program Instantiation for language; then sent 1007 (invalid frame payload data) Request trace id: 4ad28f357e6c292e, [ORIGINAL ERROR] generic::invalid_argument: Error in program Instantiation for language

What problem are you trying to solve with this feature?

Two outputs for one response

Any other information you'd like to share?

I could not find any information related to this

@Giom-V
Collaborator

Giom-V commented Dec 21, 2024

Audio-out is only available to a select few early-access customers. At the moment you can only use the Live API, which only outputs audio.

manojssmk added type:feature request New feature request/enhancement status:triaged Issue/PR triaged to the corresponding sub-team component:other Issues unrelated to examples/quickstarts labels Dec 24, 2024
@LarsDu

LarsDu commented Dec 29, 2024

I second this feature request!
I've been building a Unity game engine plugin for Gemini using the native audio feature. It would be incredibly useful to output both text and audio at the same time, and it would also avoid hitting Gemini twice with identical requests.

I suspect the reason it currently outputs only one or the other is that the multimodal LLM probably has no intermediate text output head, so there is no intermediate text representation when it produces audio output?

If this is the case, and we can't expect matched text and audio output in the future, please let us know. Nowadays one can easily wire up a Whisper-type speech-to-text model to get the text from the audio, but of course this adds overhead.
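
A minimal sketch of that speech-to-text workaround, assuming the audio arrives as raw 16-bit mono PCM at 24 kHz (an assumption about the stream format, so check it against your own setup). `pcm_chunks_to_wav` and `audio_with_transcript` are hypothetical helper names, not part of any SDK, and `transcribe` stands in for whatever speech-to-text callable you plug in:

```python
import io
import wave
from typing import Callable, Iterable, Tuple

# Assumed stream format: 16-bit mono PCM at 24 kHz. Adjust if yours differs.
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2
CHANNELS = 1


def pcm_chunks_to_wav(chunks: Iterable[bytes]) -> bytes:
    """Join raw PCM chunks and wrap them in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(b"".join(chunks))
    return buf.getvalue()


def audio_with_transcript(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
) -> Tuple[bytes, str]:
    """Return the WAV bytes together with text from the supplied transcriber."""
    wav_bytes = pcm_chunks_to_wav(chunks)
    return wav_bytes, transcribe(wav_bytes)
```

The transcriber is passed in as a plain callable so the same code works with a local model or a hosted service. The cost is the extra transcription pass mentioned above.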

@Giom-V
Collaborator

Giom-V commented Jan 6, 2025

I'll route the feature request to the product team. I agree that it would be great to get both, but I'm not sure about the feasibility either, considering the model natively outputs audio.

@mdailey

mdailey commented Jan 16, 2025

Interesting...

In December, I could get only audio or text from the Live API (Vertex AI mode), not both, though requesting both raised no error...

As of last week up until yesterday, I was getting both text and audio from the live API by specifying response_modalities=['AUDIO', 'TEXT'] in the config and also emphasizing that I want both audio and text in the system instruction.

But then today, I ran the same code and got error 1007 (invalid frame payload data) and "generic::invalid_argument: Only one of text or audio output is allowed."

It's fun to experiment with a rapidly changing tool! But here's one vote for allowing us to continue to be able to get both audio and text. Otherwise, why would response_modalities be a list? 😃 And it was working fine yesterday, after all... 😸
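
Until that changes, one way to fail fast on the client instead of waiting for the 1007 close is to check the list before connecting. This is only a sketch: `build_live_config` is a hypothetical helper, the one-modality limit simply mirrors the error message quoted above, and the plain-dict config shape is an assumption:

```python
from typing import Dict, List, Sequence

ALLOWED_MODALITIES = ("AUDIO", "TEXT")


def build_live_config(modalities: Sequence[str]) -> Dict[str, List[str]]:
    """Validate the requested modalities and return a config dict.

    Mirrors the server-side rule reported in this thread: exactly one of
    AUDIO or TEXT may be requested per session.
    """
    requested = [m.upper() for m in modalities]
    if not requested:
        raise ValueError("At least one response modality is required.")
    unknown = [m for m in requested if m not in ALLOWED_MODALITIES]
    if unknown:
        raise ValueError(f"Unknown response modalities: {unknown}")
    if len(set(requested)) > 1:
        raise ValueError("Only one of text or audio output is allowed.")
    return {"response_modalities": [requested[0]]}
```

With this, `build_live_config(['AUDIO'])` returns a config to pass along, while `build_live_config(['AUDIO', 'TEXT'])` raises a `ValueError` locally before any connection is opened.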

@Giom-V
Collaborator

Giom-V commented Jan 20, 2025

It's a list for things like native image-out or audio-out, but those features are not available yet.

@mikecpeck

We're also looking to use the Live API to return both audio and text in the response, and have had no luck specifying both via response_modalities=['AUDIO', 'TEXT'].
