Gemini 2.0 Flash structured output value repetition and missing fields #449

Yiling-J · 2025-02-09T09:37:23Z

Description of the bug:

When using Gemini 2.0 Flash with structured output, the model often repeats values indefinitely until it reaches the token limit.
When using Gemini 2.0 Flash with structured output, some expected fields are sometimes missing from the output.

Schema Used:

{
  "type": "object",
  "properties": {
    "data": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "UkLWZg": {
            "type": "string"
          },
          "gbHJdm": {
            "type": "string"
          },
          "EfhxLZ": {
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "VqXmZF": {
            "type": "string"
          },
          "uw2YK1": {
            "type": "integer"
          }
        }
      }
    }
  }
}

Prompt Used:

Give me 10 new rows for the table in JSON array format.  
<TableDescription>table of recipes</TableDescription>  
Generate values for the following missing columns:  
<MissingColumns>  
  <Column id="UkLWZg" name="Name" description="recipe name" type="string" >  
  <Column id="gbHJdm" name="Country" description="country" type="string" >  
  <Column id="EfhxLZ" name="Ingredients List" description="The list of ingredients for the recipe." type="array" >  
  <Column id="VqXmZF" name="Instructions" description="The instructions for preparing the recipe." type="string" >  
  <Column id="uw2YK1" name="Preparation Time" description="The total time required to prepare the recipe, in minutes." type="integer" >  
</MissingColumns>

Actual vs expected behavior:

Repeated Values:

Missing Fields:

Using the same schema and prompt, GPT-4o generates a structured response without value repetition or missing fields.
I didn't test o3-mini or other OpenAI models due to OpenAI's restrictions. This report is based on tests using the GPT-4o model on GitHub Models.
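Both failure modes can be caught client-side before the rows are used. A minimal sketch, assuming the response text should parse as the `{"data": [...]}` object defined by the schema above; `find_missing_fields` and the sample rows are illustrative only, not part of any SDK:

```python
import json

# Column ids from the schema above; every generated row is expected to contain all of them.
EXPECTED_FIELDS = ["UkLWZg", "gbHJdm", "EfhxLZ", "VqXmZF", "uw2YK1"]


def find_missing_fields(response_text: str, expected=EXPECTED_FIELDS) -> list[tuple[int, list[str]]]:
    """Return (row_index, missing_field_ids) for every row that lacks expected keys.

    json.loads raises json.JSONDecodeError if the text is not valid JSON, which is
    what happens when generation loops until the token limit and gets truncated.
    """
    payload = json.loads(response_text)
    problems = []
    for i, row in enumerate(payload.get("data", [])):
        missing = [field for field in expected if field not in row]
        if missing:
            problems.append((i, missing))
    return problems


# Hypothetical sample: the second row is missing "uw2YK1" (Preparation Time).
sample = json.dumps({"data": [
    {"UkLWZg": "Pad Thai", "gbHJdm": "Thailand", "EfhxLZ": ["rice noodles"], "VqXmZF": "Stir-fry.", "uw2YK1": 30},
    {"UkLWZg": "Paella", "gbHJdm": "Spain", "EfhxLZ": ["rice"], "VqXmZF": "Simmer."},
]})
print(find_missing_fields(sample))  # [(1, ['uw2YK1'])]
```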

Any other information you'd like to share?

No response


redwraith2 · 2025-02-10T06:45:01Z

I have also experienced missing fields using a relatively simple schema for a list of multiple choice quiz questions.

from typing import List

from pydantic import BaseModel, Field


class QuestionAnswerSet(BaseModel):
    question: str = Field(..., description="The quiz question based on the input image.")
    correct_answer: str = Field(..., description="The correct answer to the quiz question based on the input image.")
    false_answers: List[str] = Field(..., description="A list of exactly three false answers.")


class QuestionAnswerList(BaseModel):
    RootModel: List[QuestionAnswerSet]

felixvor · 2025-02-10T08:03:58Z

We had this exact problem in a few-shot document classification/entity extraction use case (with Gemini 1.5-pro). For now, we were able to fix it by downgrading google-cloud-aiplatform:

pip install "google-cloud-aiplatform==1.69.0" --force-reinstall

We can reproduce the error reliably in sandboxed environments by switching the package versions back and forth: the latest version produces looping text output until the token limit is reached, while version 1.69.0 produces straightforward, correct output. Sadly, the few-shot documents used to trigger this problem are proprietary, so I can't share them. I really can't imagine how the package could influence the model outputs this badly, but that's what we're working with...

Maybe this solves the issue for anyone else who has this problem!

VladimirArustamian · 2025-02-10T11:57:13Z

I have the same problem with structured output in gemini-1.5-pro-002, but it's even worse in Gemini 2.0 Flash.
It basically makes the new Flash model unusable for me.
I also see a tendency to go into repetition when I feed it German text.

KelSolaar · 2025-02-11T02:27:12Z

@felixvor : Out of curiosity, do you know what could be the difference between 1.69.0 and whatever version you had before?

felixvor · 2025-02-11T10:38:44Z

@felixvor : Out of curiosity, do you know what could be the difference between 1.69.0 and whatever version you had before?

No idea. I checked the changelogs, but they gave me no clue, since none of the changes seemed related to how the model generates text.

We are currently experimenting with the latest version again, combined with the frequency penalty and presence penalty parameters. Maybe that also helps in your case!

Update: The problem now appears with "google-cloud-aiplatform==1.69.0" as well and can easily be reproduced. So either I got extremely lucky in all my previous back-and-forth "version-switch" experiments, or they made further changes in the backend that now break our downgrade fix.
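For context on why the penalties might help: as OpenAI documents them, a frequency penalty lowers a token's logit in proportion to how often that token has already appeared, and a presence penalty subtracts a flat amount once it has appeared at all. Gemini also exposes `presence_penalty` and `frequency_penalty` in its generation config, but its exact formula is not published, so this sketch assumes the OpenAI-style formula and only illustrates the arithmetic on a toy logit table (`apply_penalties` is a hypothetical name):

```python
from collections import Counter


def apply_penalties(logits: dict[str, float], generated: list[str],
                    frequency_penalty: float, presence_penalty: float) -> dict[str, float]:
    """Penalize candidate tokens that already appear in the generated output.

    Assumed (OpenAI-documented) formula:
        logit[t] - count[t] * frequency_penalty - (count[t] > 0) * presence_penalty
    """
    counts = Counter(generated)  # missing tokens count as 0
    return {
        token: logit
        - counts[token] * frequency_penalty
        - (1.0 if counts[token] > 0 else 0.0) * presence_penalty
        for token, logit in logits.items()
    }


# A looping model keeps emitting "\n"; after five repeats its logit falls below "}".
logits = {"\n": 2.0, "}": 1.0}
history = ["\n"] * 5
adjusted = apply_penalties(logits, history, frequency_penalty=0.25, presence_penalty=0.5)
print(adjusted)  # {'\n': 0.25, '}': 1.0}
```

The same penalty also applies to JSON punctuation that legitimately repeats (quotes, commas, braces), which may be one reason penalties are a blunt tool for structured output.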

felixvor · 2025-02-13T17:14:11Z

I found a solution 🙌

Use the required attribute in your response schema and list the names of all fields of each object (Docs). Then give the model a way to answer optional fields, for example by telling it in your prompt to use the empty string "" when it has no value, or by allowing empty lists in your schema. In our case this stopped the model from generating repeating sequences.

{
  "type": "object",
  "properties": {
    "data": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "UkLWZg": {
            "type": "string"
          },
          "gbHJdm": {
            "type": "string"
          },
          "EfhxLZ": {
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "VqXmZF": {
            "type": "string"
          },
          "uw2YK1": {
            "type": "integer"
          }
        },
        "required": ["UkLWZg", "gbHJdm", "EfhxLZ", "VqXmZF", "uw2YK1"]
      }
    }
  },
  "required": ["data"]
}
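Writing the `required` lists by hand gets tedious for nested or generated schemas. A minimal sketch of a helper (the name `require_all_properties` is hypothetical) that walks a schema dict and marks every property of every object as required, including the row objects nested under `data.items`. It assumes all fields should be required and overwrites any existing `required` lists; it accepts both lowercase JSON-schema types and the uppercase `OBJECT`/`ARRAY` spelling the Gemini SDK prints:

```python
import copy
from typing import Any


def require_all_properties(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` in which every object lists all its properties as required."""
    schema = copy.deepcopy(schema)  # leave the caller's schema untouched

    def visit(node: dict[str, Any]) -> None:
        node_type = str(node.get("type", "")).lower()
        if node_type == "object" and "properties" in node:
            node["required"] = list(node["properties"].keys())
            for child in node["properties"].values():
                visit(child)
        elif node_type == "array" and isinstance(node.get("items"), dict):
            visit(node["items"])  # descend into the per-row object

    visit(schema)
    return schema


table_schema = {
    "type": "object",
    "properties": {
        "data": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"UkLWZg": {"type": "string"}, "uw2YK1": {"type": "integer"}},
            },
        }
    },
}
fixed = require_all_properties(table_schema)
print(fixed["required"])                                  # ['data']
print(fixed["properties"]["data"]["items"]["required"])   # ['UkLWZg', 'uw2YK1']
```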

I hope this helps others who encounter this problem!

Yiling-J · 2025-02-14T01:25:03Z

@felixvor Thank you for your suggestion! I’ve added the required field and tested it 10 times. Everything works as expected, and no errors were encountered. I do feel the quality difference with/without the required field is a bit dramatic. It seems the issue has already been triaged, so let’s wait for an official recommendation/reply from the Gemini team.

karayanni · 2025-02-15T02:47:20Z

Same issue here. We use Gemini 2.0 Flash. Apparently this problem has affected other models in the past:
GPT for instance

I'll go ahead and try the required attribute. Thanks @felixvor!

We currently use Python data classes (Pydantic models) for structuring:

from pydantic import BaseModel


class SomeObject(BaseModel):
    field_1: str
    field_2: str
    field_3: int


class ListOfFirstObject(BaseModel):
    objects_list: list[SomeObject]

Is there a way to add the required attribute using these Python data classes, or should we switch to a hard-coded schema?

karayanni · 2025-02-15T02:49:21Z

@markmcd Thought this might be worth taking a look at. Thanks

felix-vorlaender-mittelstand-ai · 2025-02-17T08:18:41Z

@karayanni Try using Pydantic's Field. Passing ... (Ellipsis) as the default explicitly marks a field as "required".

from pydantic import BaseModel, Field
class SomeObject(BaseModel):
    field_1: str = Field(...)

Let me know if that improves the model output!
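One way to confirm what Pydantic treats as required is to print the JSON schema it generates. A minimal sketch assuming Pydantic v2 (`model_json_schema()`): a field with no default is already required, so `Field(...)` only makes that explicit, and it is giving a field a default that drops it from `required`. The Gemini SDK converts models with its own logic, so its output can differ from Pydantic's own schema:

```python
from typing import Optional

from pydantic import BaseModel, Field


class SomeObject(BaseModel):
    field_1: str = Field(...)      # explicit Ellipsis: required
    field_2: str                   # no default: also required
    field_3: Optional[int] = None  # has a default: NOT required


schema = SomeObject.model_json_schema()
print(schema["required"])  # ['field_1', 'field_2']
```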

markmcd · 2025-02-18T10:19:35Z

class SomeObject(BaseModel):
    field_1: str
    field_2: str
    field_3: int

class ListOfFirstObject(BaseModel):
    objects_list: list[SomeObject]

Is there a way to add the required attribute using the Python data classes, or should we switch to a hard-coded schema?

The SDK does mark them as required by default. Using your schema:

>>> from pprint import pprint
>>> from google.genai import _transformers
>>> pprint(_transformers.t_schema(client, SomeObject).to_json_dict())
{'properties': {'field_1': {'type': 'STRING'},
                'field_2': {'nullable': True, 'type': 'STRING'},
                'field_3': {'type': 'INTEGER'}},
 'required': ['field_1', 'field_2', 'field_3'],
 'type': 'OBJECT'}

I'm not sure if there's a way to turn that off, though (Optional fields are represented as nullable). If you want to request it as an SDK feature, you can file a feature request on the SDK repo.

karayanni · 2025-02-18T18:32:29Z

Thanks @markmcd and @felix-vorlaender-mittelstand-ai

It appears that the default required behavior does not solve the structured-output edge case with the infinite loop.

We are adding a retry with variants of the prompt to work around this problem, but it seems like there is a fundamental problem with the LLM's structured output flow.

If there are any additional tips or upcoming improvements, we would love to hear about them so we can find better approaches.
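A retry loop like the one described can be kept small. A minimal sketch, assuming `generate` is whatever function sends a prompt to the model and returns the raw text (it is a placeholder here, not an SDK call), and that a response counts as good when it parses as JSON and every row in `data` has the required keys:

```python
import json
from typing import Callable, Iterator, Optional, Sequence


def generate_with_fallback(
    generate: Callable[[str], str],
    prompts: Sequence[str],
    required_fields: Sequence[str],
) -> Optional[dict]:
    """Try each prompt variant in order; return the first complete response, else None."""
    for prompt in prompts:
        text = generate(prompt)
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            continue  # truncated or looping output: try the next variant
        rows = payload.get("data", []) if isinstance(payload, dict) else []
        if rows and all(isinstance(row, dict) and all(f in row for f in required_fields) for row in rows):
            return payload
    return None


# Hypothetical stand-in for a model call: first a looping, truncated reply, then a complete one.
responses: Iterator[str] = iter([
    '{"data": [{"name": "Paella"\n\n\n\n',
    '{"data": [{"name": "Paella", "minutes": 45}]}',
])


def fake_generate(prompt: str) -> str:
    return next(responses)


result = generate_with_fallback(fake_generate, ["prompt A", "prompt B"], ["name", "minutes"])
print(result)  # {'data': [{'name': 'Paella', 'minutes': 45}]}
```

As the next comment points out, the model can fail on every variant, so the caller still needs to handle a `None` result.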

karayanni · 2025-02-20T00:56:50Z

@Yiling-J did you figure out a solution?

This is making Gemini 2.0 Flash unusable for us: almost 20% of the calls fail and waste a lot of tokens.
It's just an infinite stream of \n\n\n\n\n\n\n...

Our team now has a fallback strategy with 3 different prompts, including few-shot examples. When it fails, it fails on all 3 prompts!

FYI @markmcd, please let us know if there are any updates; we would much appreciate it.
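When the output degenerates into a stream of `\n`, aborting early at least limits the tokens wasted on the client side. A minimal sketch of a stream guard, assuming the response arrives as an iterable of text chunks (with the google-genai SDK, for example, the `.text` of each chunk from `client.models.generate_content_stream(...)`, treating a missing `.text` as ""); `guard_stream`, `RepetitionError`, and the 200-character default are hypothetical choices to tune:

```python
from typing import Iterable, Iterator


class RepetitionError(RuntimeError):
    """Raised when the streamed output looks like a runaway whitespace loop."""


def guard_stream(chunks: Iterable[str], max_whitespace_run: int = 200) -> Iterator[str]:
    """Yield text chunks, raising RepetitionError once the output ends in
    more than `max_whitespace_run` consecutive whitespace characters."""
    run = 0  # length of the whitespace run at the very end of the output so far
    for chunk in chunks:
        stripped = chunk.rstrip()
        if stripped:
            # Chunk contains non-whitespace: the run restarts with its trailing whitespace.
            run = len(chunk) - len(stripped)
        else:
            # Whitespace-only chunk: the trailing run keeps growing.
            run += len(chunk)
        if run > max_whitespace_run:
            raise RepetitionError(f"output ends in {run} whitespace characters")
        yield chunk


def looping_stream() -> Iterator[str]:
    # Simulates the reported failure: a valid JSON start, then endless newlines.
    yield '{"data": [{"name": "Paella"'
    while True:
        yield "\n\n\n\n"


received = []
try:
    for chunk in guard_stream(looping_stream(), max_whitespace_run=20):
        received.append(chunk)
except RepetitionError as err:
    print("aborted:", err)  # aborted: output ends in 24 whitespace characters
```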

Yiling-J · 2025-02-20T01:31:11Z

@karayanni Adding the required field works for me. Before doing so, around 80% of my calls would fail, either due to loops or missing fields. After explicitly adding the required field, I haven’t encountered any errors. The only remaining issue for me is the noticeable quality difference between calls with and without the required field.

I actually encountered the errors using the OpenAI Go SDK, not the Python Gemini SDK, but I was able to reproduce them in the Gemini web UI, as I reported in this issue. After applying the fix, both the SDK and the web UI work as expected.

xylophone21 · 2025-02-25T02:53:43Z

@karayanni Adding the required field works for me. Before doing so, around 80% of my calls would fail, either due to loops or missing fields. After explicitly adding the required field, I haven’t encountered any errors. The only remaining issue for me is the noticeable quality difference between calls with and without the required field.

I actually encountered the errors using the OpenAI Go SDK, not the Python Gemini SDK, but I was able to reproduce them in the Gemini web UI, as I reported in this issue. After applying the fix, both the SDK and the web UI work as expected.

Do you add the required field in code? Can you give an example?

class SomeObject(BaseModel):
    field_1: str
    field_2: str
    field_3: int

class ListOfFirstObject(BaseModel):
    objects_list: list[SomeObject]

Yiling-J · 2025-02-25T04:11:11Z

@xylophone21 According to #449 (comment), the google.genai SDK should add the required fields automatically; you can print the schema to verify. In my case, I add the required fields manually in code because I'm working with a dynamic schema (not derived from an existing class or struct). However, I'm using Go with the OpenAI Go SDK, so unfortunately I can’t offer much help with your Python question.
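For Python readers with a dynamic schema, here is a minimal sketch of building the row schema from column definitions like the ones in the original prompt, with every column listed as required inside the row object. The `build_row_schema` name, the column dict format, and mapping `"array"` to an array of strings are assumptions for illustration:

```python
import copy
from typing import Any

# Maps the column types used in the prompt's <Column type="..."> tags to JSON-schema types.
# Treating "array" as an array of strings is an assumption based on the Ingredients List column.
TYPE_MAP: dict[str, dict[str, Any]] = {
    "string": {"type": "string"},
    "integer": {"type": "integer"},
    "array": {"type": "array", "items": {"type": "string"}},
}


def build_row_schema(columns: list[dict[str, str]]) -> dict[str, Any]:
    """Build the {"data": [row, ...]} schema with every column required in each row."""
    properties = {col["id"]: copy.deepcopy(TYPE_MAP[col["type"]]) for col in columns}
    return {
        "type": "object",
        "properties": {
            "data": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": properties,
                    # The column ids are required inside each row object, not at the top level.
                    "required": [col["id"] for col in columns],
                },
            }
        },
        "required": ["data"],
    }


columns = [
    {"id": "UkLWZg", "type": "string"},
    {"id": "gbHJdm", "type": "string"},
    {"id": "EfhxLZ", "type": "array"},
    {"id": "VqXmZF", "type": "string"},
    {"id": "uw2YK1", "type": "integer"},
]
schema = build_row_schema(columns)
print(schema["properties"]["data"]["items"]["required"])
# ['UkLWZg', 'gbHJdm', 'EfhxLZ', 'VqXmZF', 'uw2YK1']
```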

Yiling-J changed the title from "Gemini Flash 2.0 structured output value repetition and missing fields" to "Gemini 2.0 Flash structured output value repetition and missing fields" on Feb 9, 2025.

Gunand3043 added labels on Feb 11, 2025: type:bug (Something isn't working), status:triaged (Issue/PR triaged to the corresponding sub-team), component:gemini 2.0 (Issues/PR referencing gemini2 folder), off-topic:model quality (Not cookbook related: model quality issue).
