
error: 'latin-1' codec can't encode character '\u2026' #1035

Open
wiktorw767 opened this issue Dec 4, 2024 · 5 comments · May be fixed by #1088
Labels
bug Something isn't working

Comments

@wiktorw767

Steps to reproduce

How'd you do it?

  1. Ran the command: garak --model_type rest -G ../garak/Desktop/GarakConfig/garak-config2.json --probes malwaregen.Evasion --generations 1

  2. Received the error: 'latin-1' codec can't encode character '\u2026' in position 512: ordinal not in range(256)

The AI chatbot I am trying to scan outputs '\u2026', the horizontal ellipsis ("…") shown as "thinking dots". It looks like this character is causing the encoding error above.
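A minimal reproduction using only the Python standard library: U+2026 (HORIZONTAL ELLIPSIS) lies outside the 256 code points latin-1 can represent, so encoding it as latin-1 raises a UnicodeEncodeError of the same shape as the one reported, while UTF-8 encodes it without trouble. The sample string is made up for illustration.

```python
# U+2026 HORIZONTAL ELLIPSIS ("…") has code point 0x2026 (8230),
# far outside latin-1's range of 0-255
text = "Thinking\u2026"

message = ""
try:
    text.encode("latin-1")
except UnicodeEncodeError as err:
    message = str(err)

# Same shape as the reported error, just at a different position
print(message)

# UTF-8 can represent every Unicode code point, so this succeeds
print(text.encode("utf-8"))
```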

garak version

garak LLM vulnerability scanner v0.10.0

Additional Information

  1. Operating system: Ubuntu 24.04.1 LTS
  2. Python version: 3.12.3
  3. Install method: pip
  4. Logs from execution run (report.html / report.jsonl / hitlog.jsonl and, if possible, garak.log): garak.log is attached; the error appears in it.
  5. Details of execution config such as command line flags or config files: CLI run as shown above:
    garak --model_type rest -G ../garak/Desktop/GarakConfig/garak-config2.json --probes malwaregen.Evasion --generations 1
  6. Any relevant hardware or resource information: N/A
wiktorw767 added the bug (Something isn't working) label on Dec 4, 2024
@leondz
Collaborator

leondz commented Dec 4, 2024

Thanks @wiktorw767. Will take a look!

@wiktorw767
Author

I tried the same thing on a different machine running SUSE Linux 15 SP6 and got the same error. I wanted to check this in case it was OS-specific.


@leondz
Collaborator

leondz commented Jan 15, 2025

Are you able to share any of "../garak/Desktop/GarakConfig/garak-config2.json"?

@wiktorw767
Author

wiktorw767 commented Jan 15, 2025

Sure, here it is:

{
  "rest": {
    "RestGenerator": {
      "name": "Chat Tester",
      "uri": "REDACTED",
      "method": "post",
      "headers": {
        "Authorization": "Bearer $KEY",
        "Content-Type": "application/json"
      },
      "req_template_json_object": {
        "messages": [
          {
            "role": "USER",
            "content": "$INPUT"
          }
        ]
      },
      "response_json": true,
      "response_json_field": "content"
    }
  }
}
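To narrow down where the character can enter the request, here is a sketch of how a JSON body might be built from a `req_template_json_object` like the one above. It assumes the body is serialized with Python's `json.dumps` at default settings; `fill_template` is a hypothetical helper, not garak's actual code. Under that assumption, a "…" in the prompt is escaped to plain ASCII, so the body itself cannot trigger a latin-1 error. That points the search toward the headers.

```python
import json

# Mirrors the config's "req_template_json_object"; "$INPUT" is the
# placeholder that receives each probe prompt.
template = {"messages": [{"role": "USER", "content": "$INPUT"}]}


def fill_template(obj, prompt):
    """Hypothetical helper: replace "$INPUT" in every string value, recursively."""
    if isinstance(obj, dict):
        return {key: fill_template(value, prompt) for key, value in obj.items()}
    if isinstance(obj, list):
        return [fill_template(item, prompt) for item in obj]
    if isinstance(obj, str):
        return obj.replace("$INPUT", prompt)
    return obj


body = json.dumps(fill_template(template, "Wait for it\u2026"))

# json.dumps escapes non-ASCII by default (ensure_ascii=True), so the
# ellipsis is written as the six ASCII characters \u2026
print(body)
body.encode("latin-1")  # succeeds: the serialized body is pure ASCII
```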

Authentication looks like it is working: if I don't set $KEY correctly, I get a 401 error. With the correct $KEY set, I get the '\u2026' error.

The question-answer flow I see in browser developer tools looks like this:

{
  "id": "REDACTED question ID",
  "createdAt": "REDACTED timestamp",
  "role": "USER",
  "content": "Hi chat what is your name?",
  "fileName": null,
  "fileSize": null
},
{
  "id": "REDACTED answer ID",
  "createdAt": "REDACTED timestamp",
  "role": "ASSISTANT",
  "content": "My name is blah blah",
  "feedback": null
}

The '\u2026' character is not visible in the developer tools inspection, though it is clearly visible on screen while waiting for the reply.

@leondz
Collaborator

leondz commented Jan 15, 2025

It looks like the HTTP standard is unclear about which characters are allowed in HTTP header values, and that Python 3.12 / requests is hard-coded to encode HTTP header values as latin-1 (ISO-8859-1), the character set RFC 2616 specifies for header text.
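A minimal reproduction of that behaviour with the standard library alone. requests sends headers through urllib3, which builds on `http.client`, and `http.client.HTTPConnection.putheader()` encodes `str` header values as latin-1. The host name and token below are made-up placeholders. The resulting message has the same form as the reported error, differing only in the position.

```python
import http.client

# No network traffic happens here: putrequest() and putheader() only build
# the request in memory; a connection would be opened later by endheaders().
conn = http.client.HTTPConnection("example.invalid")
conn.putrequest("POST", "/")

header_error = ""
try:
    # Placeholder token ending in "…", e.g. from a shortened copy/paste
    conn.putheader("Authorization", "Bearer abc\u2026")
except UnicodeEncodeError as err:
    header_error = str(err)

print(header_error)
conn.close()
```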

This choice doesn't leave much flexibility, and it isn't something we can easily change. Given the JSON config above and the current version of garak's rest.py, the only place I can see where such a character might get in is the authentication key, though I'm not sure that applies here. Also, some editors automatically replace "..." with \u2026; this may have happened here.
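To follow up on those two suspects (the key and an editor-substituted ellipsis), here is a small sketch for locating characters latin-1 cannot encode in a config file or header value before running garak. `find_non_latin1` and `scan_text` are hypothetical helpers, not part of garak, and the config text is a made-up example.

```python
def find_non_latin1(value: str) -> list[tuple[int, str]]:
    """Return (index, character) for every character latin-1 cannot encode."""
    return [(i, ch) for i, ch in enumerate(value) if ord(ch) > 0xFF]


def scan_text(text: str) -> list[tuple[int, int, str]]:
    """Return 1-based (line, column, character) hits, e.g. for a config file."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for index, ch in find_non_latin1(line):
            hits.append((lineno, index + 1, ch))
    return hits


# Example config text in which "..." has been turned into "…"
config_text = '{\n  "headers": {\n    "Authorization": "Bearer abc\u2026"\n  }\n}\n'

for line, column, ch in scan_text(config_text):
    print(f"line {line}, column {column}: U+{ord(ch):04X}")
```

If a hit lands inside the key, re-copying the key is safer than swapping "…" for "...": an ellipsis inside a token may mean a UI shortened the value when it was copied.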

leondz linked a pull request on Jan 27, 2025 that will close this issue