Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image generation with Dall-e #398

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

Image generation with Dall-e #398

wants to merge 10 commits into from

Conversation

ceicke
Copy link
Contributor

@ceicke ceicke commented Jun 10, 2024

@krschacht this is wild... and so easy to implement.

On a more serious note, is this the direction you are looking for?

I would need help to get the current user in the Toolbox::Dalle class. Not sure how I can access it to get the openai_api_key.

@krschacht
Copy link
Contributor

@ceicke I know, right? I'm really excited about tools. I'm adding a bunch of them. I want to turn this project into my own personal Jarvis / Samantha / Computer (star trek) — pick your favorite sci fi computer reference. :)

On the API key, I already solved this in a branch so I just plucked it into main. Rebase and you can now access Current.user within your tool. commit: 00be174

@ceicke ceicke force-pushed the dalle branch 2 times, most recently from 195ebc4 to 38d6604 Compare June 11, 2024 01:45
@krschacht
Copy link
Contributor

krschacht commented Jun 13, 2024 via email

@ceicke
Copy link
Contributor Author

ceicke commented Jun 13, 2024

I just pushed my changes... I can't quite get the document attached to the message and I don't understand why... maybe you can spot it. Code is not really nice to look at.

document = Document.build(message: self)
p document

image_urls.each do |image_url|
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense yet to iterate over multiple image_urls and then only attaching one... WIP WIP WIP :-D

@krschacht
Copy link
Contributor

I'm taking a look. But the key thing is that the reason GPT-4o is including the image as markdown is just because it's doing it's best to be helpful. We don't want to parse that markdown and grab the image, instead we want to tell GPT not to include the image in markdown because we're going to pre-attach it.

I'll over-share a little bit here just in case the additional context helps you.

First, I had a simple conversation — 1 message and 1 response:

image

And in console I see there are 4 messages, as I expected. There are 2 hidden tool messages preceding the assistants reply:

> Conversation.find(7).messages.ordered
  Conversation Load (1.5ms)  SELECT "conversations".* FROM "conversations" WHERE "conversations"."id" = $1 LIMIT $2  [["id", 7], ["LIMIT", 1]]
  Message Load (0.9ms)  SELECT "messages"."index", "messages"."version" FROM "messages" WHERE "messages"."conversation_id" = $1 ORDER BY "messages"."version" DESC, "messages"."i
=> [#<Message:0x0000000109bab558
  id: 43,
  conversation_id: 7,
  role: "user",
  content_text: "can you generate a cartoon image of a cat",
  created_at: Thu, 13 Jun 2024 14:44:23.945717000 UTC +00:00,
  updated_at: Thu, 13 Jun 2024 14:44:23.945717000 UTC +00:00,
  content_document_id: nil,
  run_id: nil,
  assistant_id: 1,
  cancelled_at: nil,
  processed_at: nil,
  index: 0,
  version: 1,
  branched: false,
  branched_from_version: nil,
  content_tool_calls: {},
  tool_call_id: nil,
  versions: nil>,
 #<Message:0x0000000109baa3d8
  id: 44,
  conversation_id: 7,
  role: "assistant",
  content_text: "",
  created_at: Thu, 13 Jun 2024 14:44:23.958958000 UTC +00:00,
  updated_at: Thu, 13 Jun 2024 14:44:40.943010000 UTC +00:00,
  content_document_id: nil,
  run_id: nil,
  assistant_id: 1,
  cancelled_at: nil,
  processed_at: Thu, 13 Jun 2024 14:44:24.394137000 UTC +00:00,
  index: 1,
  version: 1,
  branched: false,
  branched_from_version: nil,
  content_tool_calls:
   [{:index=>0,
     :id=>"call_s3cYMxsn11IfPxEMsxMp9Ycb",
     :type=>"function",
     :function=>{:name=>"dalle_generate_an_image", :arguments=>"{\"image_generation_prompt\":\"cartoon image of a cat\"}"}}],
  tool_call_id: nil,
  versions: nil>,
 #<Message:0x0000000109baa298
  id: 45,
  conversation_id: 7,
  role: "tool",
  content_text:
   "{\"url\":\"https://oaidalleapiprodscus.blob.core.windows.net/private/org-a24zvqoU0AA6ZH1jX1jiJmzF/user-PfPIZghymt4g6e00oFxN9yvL/img-vi2fAv0uQ9EQTtyntuu0Qd1L.png?st=2024-06-13T13%3A44%3A40Z\\u0026se=2024-06-13T15%3A44%3A40Z\\u0026sp=r\\u0026sv=2023-11-03\\u0026sr=b\\u0026rscd=inline\\u0026rsct=image/png\\u0026skoid=6aaadede-4fb3-4698-a8f6-684d7786b067\\u0026sktid=a48cca56-e6da-484e-a814-9c849652bcb3\\u0026skt=2024-06-12T18%3A49%3A15Z\\u0026ske=2024-06-13T18%3A49%3A15Z\\u0026sks=b\\u0026skv=2023-11-03\\u0026sig=kSyBRJW4R%2BvRWNO0xQt8FJ38VBWAirUkx8T2CzKm/0g%3D\"}",
  created_at: Thu, 13 Jun 2024 14:44:40.861781000 UTC +00:00,
  updated_at: Thu, 13 Jun 2024 14:44:40.861781000 UTC +00:00,
  content_document_id: nil,
  run_id: nil,
  assistant_id: 1,
  cancelled_at: nil,
  processed_at: Thu, 13 Jun 2024 14:44:40.838686000 UTC +00:00,
  index: 2,
  version: 1,
  branched: false,
  branched_from_version: nil,
  content_tool_calls: {},
  tool_call_id: "call_s3cYMxsn11IfPxEMsxMp9Ycb",
  versions: nil>,
 #<Message:0x0000000109baa018
  id: 46,
  conversation_id: 7,
  role: "assistant",
  content_text:
   "Here's the cartoon image of a cat you requested:\n\n[Cartoon Cat](https://oaidalleapiprodscus.blob.core.windows.net/private/org-a24zvqoU0AA6ZH1jX1jiJmzF/user-PfPIZghymt4g6e00oFxN9yvL/img-vi2fAv0uQ9EQTtyntuu0Qd1L.png?st=2024-06-13T13%3A44%3A40Z&se=2024-06-13T15%3A44%3A40Z&sp=r&sv=2023-11-03&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-06-12T18%3A49%3A15Z&ske=2024-06-13T18%3A49%3A15Z&sks=b&skv=2023-11-03&sig=kSyBRJW4R%2BvRWNO0xQt8FJ38VBWAirUkx8T2CzKm/0g%3D)",
  created_at: Thu, 13 Jun 2024 14:44:40.892866000 UTC +00:00,
  updated_at: Thu, 13 Jun 2024 14:44:48.047174000 UTC +00:00,
  content_document_id: nil,
  run_id: nil,
  assistant_id: 1,
  cancelled_at: nil,
  processed_at: Thu, 13 Jun 2024 14:44:41.049892000 UTC +00:00,
  index: 3,
  version: 1,
  branched: false,
  branched_from_version: nil,
  content_tool_calls: {},
  tool_call_id: nil,
  versions: nil>]
[5] pry(main)> 

Message 2 of 4 is the API deciding to call a tool, which our system does. And then message 3 of 4 is our system creating the tool's response. This is the key one. This is the serialized hash that your function is returning as json, stored in the content text. Here is the key part:

> JSON.parse(Conversation.find(7).messages.ordered.third.content_text)
=> {"url"=>
  "https://oaidalleapiprodscus.blob.core.windows.net/private/org-a24zvqoU0AA6ZH1jX1jiJmzF/user-PfPIZghymt4g6e00oFxN9yvL/img-vi2fAv0uQ9EQTtyntuu0Qd1L.png?st=2024-06-13T13%3A44%3A40Z&se=2024-06-13T15%3A44%3A40Z&sp=r&sv=2023-11-03&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-06-12T18%3A49%3A15Z&ske=2024-06-13T18%3A49%3A15Z&sks=b&skv=2023-11-03&sig=kSyBRJW4R%2BvRWNO0xQt8FJ38VBWAirUkx8T2CzKm/0g%3D"}

First, I'll manually attach this image in console just to work through the code:

url = JSON.parse(Conversation.find(7).messages.ordered.third.content_text)["url"]
d = Message.last.documents.new
d.file.attach(io: URI.open(url), filename: "image.png")
d.save!

I just refreshed my conversation and now I see it:
image

Now, how do we do this automatically and get it to skip the markdown. It's all about handling the 3rd message differently. The place this third message got created was this line of code:
https://github.com/AllYourBot/hostedgpt/blob/main/app/jobs/get_next_ai_message_job.rb#L160

So here's my suggestion for how to get the image cleanly attached and then tell the AI not to do it as markdown:

  1. In your dalle toolbox class, the response only includes the "url" key. Let's change that key to "url_of_dalle_generated_image" (verbose on purpose, using words that match the function description and function call). Also add another key such as prompt_given and one more key: `note_to_assistant: "The image at the URL is already being shown on screen so reply with a nice message confirming the image has been generated, maybe re-describing it, but don't include the link to it."
  2. Inside this msgs.each is where we created message 3, which has the URL. So add an extra conditional check right after creation which looks for the key url_of_dalle_generated_image and if it's present it creates a NEW document with the attachment but doesn't save yet (and it does NOT attach the document to the tool message)
  3. Right outside this loop is where it stubs out the assistants' response (the 4th message in the sequence). Here is where it attaches the document that was created above and executes save! on the whole thing.

@ceicke
Copy link
Contributor Author

ceicke commented Jun 14, 2024

@krschacht again, thanks for your patience and guiding me through it. It works fine. Almost. There is an error happening and here is the important part of it:

base-1      | worker   | [SolidQueue] Claimed 1 jobs
base-1      | worker   | 
base-1      | worker   | ### Finished GetNextAIMessageJob attempt #1 with ERROR: #<ActionView::Template::Error: no implicit conversion of nil into Hash>
base-1      | worker   | /rails/lib/active_storage/service/postgresql_service.rb:106:in `block in generate_url'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications.rb:206:in `block in instrument'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications.rb:206:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/lib/active_storage/service.rb:165:in `instrument'
base-1      | worker   | /rails/lib/active_storage/service/postgresql_service.rb:93:in `generate_url'
base-1      | worker   | /rails/lib/active_storage/service/postgresql_service.rb:63:in `private_url'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/lib/active_storage/service.rb:125:in `block in url'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications.rb:206:in `block in instrument'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications.rb:206:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/lib/active_storage/service.rb:165:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/lib/active_storage/service.rb:120:in `url'
base-1      | worker   | /rails/lib/active_storage/service/postgresql_service.rb:71:in `url'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/app/models/active_storage/blob.rb:216:in `url'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/core_ext/module/delegation.rb:333:in `public_send'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/core_ext/module/delegation.rb:333:in `method_missing'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/core_ext/module/delegation.rb:333:in `public_send'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/core_ext/module/delegation.rb:333:in `method_missing'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/app/models/active_storage/variant_with_record.rb:36:in `url'
base-1      | worker   | /rails/app/models/document.rb:46:in `fully_processed_url'
base-1      | worker   | /rails/app/models/message/document_image.rb:23:in `document_image_path'
base-1      | worker   | /rails/app/views/messages/_message.html.erb:76:in `_app_views_messages__message_html_erb___4486497489240695771_23380'
...

I redacted a bunch of stuff.

I think what is happening is that it's trying to display the small version of the image, but that is not yet processed. I am unsure how to handle this, I saw that you can pass some sort of a fallback into the URL generation of the version in document_image.rb#document_image_path. But actually it's checking whether that version has been processed already. Weird. Maybe I am also interpreting it wrong.

@krschacht
Copy link
Contributor

krschacht commented Jun 14, 2024

Hmm, this is tricky. I'm scratching my head on this one. But when I initially wrote this document / file / active storage stuff I never fully wrapped my mind around file variants and how processing of that worked. The root thing that I got stuck on was:

Some methods of accessing a file variant (I think the typical methods) are blocking: if the variant doesn't exist it processes it right then and hangs until it's done. I didn't want this. It took me a bunch of trial and error to figure out how to check if a variant is processed in a non-blocking way so that I could show a fallback image / spinner and then just check again on some interval.

I got it working, but it was definitely a case where I wrote some code that I didn't fully understand what it was doing and when I finally got it working I was done staring at that problem. :) I've been wanting to come back to it. I even have a tiny bit of work I did on it in another branch:

There is a small diff on this file (linking straight to document.rb) and the one right below it:
https://github.com/AllYourBot/hostedgpt/pull/362/files#diff-893a2b609f3fc4aade494f624a20993eb6c32598512b5e70d9ad39866b8eb51d

But that diff is solving a different problem I ran into with images. I don't think it solves the problem you're bumping into.... But I'm not sure what problem you're bumping into... :)

I think you shoudl put a breakpoint in document.rb right here:

  def fully_processed_url(variant)
    binding.pry
    file.attached? && variant.present? && file.representation(variant.to_sym).processed.url
  end

I assume file.attached? is returning true and variant.present? are returning true. But they're somehow lying and the variant isn't fully done? It may take some trial and error on this command file.representation(variant.to_sym).processed.url to figure out what's accessible and what isn't. Maybe a & is needed after one of those periods? That doesn't make sense though. Maybe variant.present? becomes true when a variant has started processing but before it's finished processing? Ugh....

The problem with this breakpoint is that there may be a bug at the moment the breakpoint is hit, but by the time you get in console the image may have finished processing and when you execute file.representation(variant.to_sym).processed.url it may work fine.

I'm really not sure how to debug this...

Ohoh, did you try uploading an image yourself into a conversation, through the app's UI? Does that fully work? You should be 100% certain that the existing image upload flow works. Ideally try uploading a really big image (10-20mb so that it takes a few seconds to process) and confirm for yourself that you initially see a spinner on the image and then later it pops in. Note that there are two places to check: the image appearing in your conversation directly and, click the image, there is a different variant that opens int he modal. We want to make sure there isn't some deeper configuration issue in your environment which is breaking all image processing. Like maybe a corrupted libmagick install or something (I forget what library the app uses).

The last thing that occurs to me is: active job is pushing a _message to the client. Until now the only time the image checks of _message are executing is when the partial is embedded in a view. But within the context of a job, _message is being rendered — and hitting those image checks — within the context of a job so it's outside of the controller. Maybe there is some context about being outside of a controller that messes up these view checks?

We could try to re-write the logic of message.document_image_path(...) so that when this is rendering within a job (we pass in some extra param in_job: true) then when it hits document_image_path() we tell it to skip any variant processed checks and always give us the redirect URL. It would be something like:

  GetNextAIMessageJob.broadcast_updated_message(@message, thinking: false, outside_controller: true)
  message.document_image_path(..., force_redirect: outside_controller)
  def document_image_path(variant, fallback: nil, force_redirect: false)
    return nil unless has_document_image?

    if !force_redirect && documents.first.has_file_variant_processed?(variant)
      documents.first.fully_processed_url(variant)
    elsif force_redirect || fallback.nil?
      documents.first.redirect_to_processed_path(variant)
    else
      fallback
    end

If image uploads fully work for you when done manually, and if poking around that binding.pry does not yield any clues, then I'd try this solution. I don't love it because it's not actually fixing the problem it's just side stepping it. It would be forcing actioncable pushed messages to never render images and then letting the front-end refresh the image. But I still can't imagine why an image variant check would fail in a queue context but work in a controller context.

@@ -103,7 +103,8 @@ def generate_url(key, expires_in:, filename:, disposition:, content_type:)
)

generated_url = url_helpers.rails_postgresql_service_url(verified_key_with_expiration,
**url_options,
# **url_options,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krschacht had some time today to look into the exception again. It looks like the **url_options are nil. Replacing it with only_path: true solves the issue (at least in development). I am not sure if this is an acceptable fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting. That makes sense. I can imagine the URL or host configuration may be different when running in workers. Probably we can just leave that change in place for always. It’s not like it ever needed a full URL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure it will create a problem down the road for larger deployments with a CDN. But that's a future problem then.

@ceicke ceicke changed the title WIP: image generation with Dall-e Image generation with Dall-e Jun 27, 2024
@ceicke ceicke marked this pull request as ready for review June 27, 2024 07:19
@krschacht krschacht marked this pull request as draft July 1, 2024 15:29
@krschacht
Copy link
Contributor

I just converted the PR to a draft to help me stay organized in knowing which ones are waiting on me to review. But feel free to switch it back whenever you're ready.

@ceicke
Copy link
Contributor Author

ceicke commented Jul 16, 2024

Hey @krschacht, do you have an idea why the test is failing? I can't figure it out, especially since the test above it basically tests a very similar thing...

@krschacht
Copy link
Contributor

Hi @ceicke, I haven’t had a chance to dig in, sorry for the delay! I think I’ll have some time tomorrow AM. I’ve been doing some travel in England so a bit going on, but I’ll be catching up on things soon.

d = Document.new
d.file.attach(io: URI.open(url_of_dalle_generated_image), filename: 'image.png')
assistant_reply.documents << d
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should assume that @message.content_tool_calls can have multiple Dalle toolbox calls. This means that within msgs.each do |tool_message| each of those calls will set the url_of_dalle_generated_image overwriting the previous.

So let's collect multiple url_of_dalle_generated_image into an array so that in this block here we push multiple documents into the assistant_reply.

@krschacht
Copy link
Contributor

krschacht commented Jul 21, 2024

@ceicke I looked closely at the two failing test cases. In document_test.rb line 57, you just need to remove the assertion that the URL starts with "http" since it's now only_path: true. For the second test failure within get_next_ai_job, I added comments inline regarding that one.

Also, can you mirror the structure of the test get_next_ai_message_job_openai_test.rb:24 and add one that basically tests:

test "properly handles a tool response call from the assistant when images are included" do

After this, I think we can merge this in!

@@ -56,6 +65,52 @@ class GetNextAIMessageJobOpenaiTest < ActiveJob::TestCase
refute second_new_message.finished?, "This message SHOULD NOT be considered finished yet"
end

test "properly handles a tool response call from the assistant when images are included" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krschacht I tried to implement a test for the job, but I cannot get it to work. It already fails to add 2 new messages. Maybe I setup something wrong in the fixtures or in the setup...

@unameisfine
Copy link

Hello!

Any news about the image generation feature? I think this should be more of a priority than tokens, for example. Image generation based on LLM answers is the gold standard for chat apps nowadays.
I am really looking forward to support for not just DALL-E but also MJ and SD.
A simple use case: we ask GPT to describe the artistic approach of an image, and based on the description, generate an image in the same style as the original.

With this feature, hostedgpt will be one of the best open-source solutions for using key AI tools today — LLM and content generation. Both individually and as a team.

Especially if you also add a mode for simultaneous interaction with multiple neural networks (like in chathub).

@krschacht
Copy link
Contributor

Yes, sorry @unameisfine and @ceicke, this was on my plate. Ack. I'll make a point of digging in over the next couple days to unblock this. I know Chris has gotten it really close!

Not to get too distracted, but unameisfine — I've never had a desire for simultaneous chatting with multiple LLMs. Can you share more about why that's something you would want to do? I suppose when I'm doing a quick eval of which LLM to use as a developer, but in that case there isn't a huge motivation to set up your own open-source self-hosted version since it's not an on-going use-case. You can just visit https://sdk.vercel.ai/playground or openrouter.ai and do the quick test there.

@unameisfine
Copy link

Not to get too distracted, but unameisfine — I've never had a desire for simultaneous chatting with multiple LLMs. Can you share more about why that's something you would want to do? I suppose when I'm doing a quick eval of which LLM to use as a developer, but in that case there isn't a huge motivation to set up your own open-source self-hosted version since it's not an on-going use-case. You can just visit https://sdk.vercel.ai/playground or openrouter.ai and do the quick test there.

It can be found in the Chrome plugin Chathub. This feature, which allows simultaneous speaking with LLMs, can be used for the comparison of answers when dealing with difficult questions.

@jlvallelonga
Copy link
Contributor

continuing work on this here: #526

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants