Add a more descriptive error for text/plain files that are not UTF-8 #221

Open
handsomefox opened this issue Oct 31, 2024 · 5 comments
Labels
component:documentation Improvements or additions to documentation status:triaged Issue/PR triaged to the corresponding sub-team

@handsomefox

handsomefox commented Oct 31, 2024

Description of the bug:

I recently had to debug an issue with a non-UTF-8 .txt file. We had no conversion to UTF-8 in place, which led to a confusing problem where Gemini uploads just result in:

googleapi: got HTTP response code 400 with body: [{\n  \"error\": {\n    \"code\": 400,\n    \"message\": \"Request contains an invalid argument.\",\n    \"status\": \"INVALID_ARGUMENT\"\n  }\n}\n]"

This tells you pretty much nothing, and I spent some time debugging it.

I think two things can fix this issue for future users:

  1. The docs for client.UploadFile can better indicate that UTF-8 is required for text files, or validate it itself.
  2. The API error that we get should include more details about what happened.

The validation in the first point is quite easy if you decode the content-type correctly, since you can just call utf8.Valid(b) on anything whose type begins with text/.
The second point I have no idea about.
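A minimal sketch of the check described above, assuming the caller already has the file bytes and a content-type string. The helper name `checkTextUTF8` is hypothetical and is not part of the genai package; it only uses `mime.ParseMediaType` and `utf8.Valid` from the standard library.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
	"unicode/utf8"
)

// checkTextUTF8 is a hypothetical helper (not a genai API). It returns an
// error when contentType is a text/* media type and data is not valid UTF-8.
// Non-text media types are not inspected.
func checkTextUTF8(contentType string, data []byte) error {
	// ParseMediaType strips parameters such as "; charset=..." and
	// lowercases the media type.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return fmt.Errorf("parsing content type %q: %w", contentType, err)
	}
	if strings.HasPrefix(mediaType, "text/") && !utf8.Valid(data) {
		return fmt.Errorf("%s content is not valid UTF-8", mediaType)
	}
	return nil
}
```

Calling `checkTextUTF8("text/plain", b)` before an upload would turn the opaque 400 into a local, descriptive error. Note that `utf8.Valid` does not catch every wrong encoding: UTF-16 text that contains only ASCII characters and has no byte order mark is a sequence of ASCII bytes and NUL bytes, which is still valid UTF-8.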

You could also provide a conversion method for common cases like UTF-16 (LE/BE) and Windows-1252/1251, since they're quite common.

Actual vs expected behavior:

No response

Any other information you'd like to share?

No response

@handsomefox handsomefox added the type:bug Something isn't working label Oct 31, 2024
@gmKeshari gmKeshari added component:documentation Improvements or additions to documentation status:triaged Issue/PR triaged to the corresponding sub-team labels Nov 5, 2024
@eliben eliben removed the type:bug Something isn't working label Nov 5, 2024
@eliben
Member

eliben commented Nov 5, 2024

We're trying not to add validation around the backend model, but we could perhaps document this better

@jba WDYT?

@jba
Collaborator

jba commented Nov 6, 2024

We should document that UTF-8 is required, and add tests to make sure that it is indeed broken for non-UTF-8. That way if the backend learns non-UTF-8, we will know to remove the doc.

@eliben
Member

eliben commented Nov 18, 2024

@handsomefox can you link an example of a text file encoded in 1252 or whatever you tried, that gave you this error?
I sent #224 with a test for a generic bad encoding, which seems to result in a different error from the service

eliben added a commit that referenced this issue Nov 18, 2024
@handsomefox
Author

handsomefox commented Nov 22, 2024

@eliben I've retested all of these files, and they all returned [{\n \"error\": {\n \"code\": 400,\n \"message\": \"Request contains an invalid argument.\",\n \"status\": \"INVALID_ARGUMENT\"\n }\n}\n]. There were no errors once they were encoded as UTF-8.

utf16le.txt
1251.txt
1252.txt

The only differences from your test that I can see are that I'm using ChatSession.SendMessage instead of Model.GenerateContent, some model parameters are changed, a system instruction is added, and the file comes before the text, but none of that seems like it should matter.

@eliben
Member

eliben commented Nov 23, 2024

I've sent #228, which adds one of the files @handsomefox uploaded; it indeed errors out as described.

W.r.t. updating the documentation: the natural place for this seems to be https://pkg.go.dev/github.com/google/generative-ai-go/genai#UploadFileOptions, but it already documents the MIME type by pointing to a link that has since been redirected and no longer seems to contain the required information. I think we should find where the supported MIME types are described now.
