feat: Add vision support #4076

Merged
merged 1 commit into from
Feb 20, 2024

@TheRamU TheRamU commented Feb 19, 2024

  • Gemini Pro Vision supported
  • GPT-4 Vision supported
  • Fixed Gemini request errors not being reported to the user
  • Image features are now globally supported in this application
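
As a rough illustration of what "vision support" means on the request side, here is a minimal sketch of a message that carries both text and images. The content-part shape (`text` and `image_url` entries) follows the OpenAI chat format used by GPT-4 Vision; the names `MultimodalContent`, `VisionMessage`, and `buildUserMessage` are assumptions made for this example and are not taken from the PR diff.

```typescript
// Hypothetical names for illustration; only the part shapes follow the OpenAI chat format.
type MultimodalContent =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionMessage {
  role: "user" | "assistant" | "system";
  content: string | MultimodalContent[];
}

// Build a user message. With no images, fall back to a plain string so
// text-only models keep receiving the payload they already understand.
function buildUserMessage(text: string, imageUrls: string[] = []): VisionMessage {
  if (imageUrls.length === 0) {
    return { role: "user", content: text };
  }
  const parts: MultimodalContent[] = [{ type: "text", text }];
  for (const url of imageUrls) {
    parts.push({ type: "image_url", image_url: { url } });
  }
  return { role: "user", content: parts };
}
```

Keeping `content` as `string | MultimodalContent[]` means existing text-only chats need no migration; only messages with attachments take the array form.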

vercel bot commented Feb 19, 2024

@TheRamU is attempting to deploy a commit to the NextChat Team on Vercel.

A member of the Team first needs to authorize it.

Contributor

Your build has completed!

Preview deployment

@H0llyW00dzZ

LGTM

@H0llyW00dzZ H0llyW00dzZ left a comment

Change request: summarization for gemini-pro and gemini-pro-vision

if (currentModel.startsWith("gpt")) {
  return SUMMARIZE_MODEL;
}
if (currentModel.startsWith("gemini-pro")) {
Contributor

Tip

It's more effective to summarize with Gemini's own models. For example, gemini-pro refers to the Gemini Pro model, and gemini-pro-vision refers to the Gemini Pro Vision model. These models are cheaper than OpenAI's (hahaha) and more efficient, since each model has its own token limit. For instance, the input token limit for the gemini-pro-vision model is around 12288 tokens.

Proof:

image

Contributor Author

Gemini Pro Vision does not support multi-turn chat, which limits its usefulness as a "Summarize Model".

Contributor

> Gemini Pro Vision does not enable multiturn chat, which limits its role as a "Summarize Model"

RIP. It's a better model, unlike OpenAI's gpt-4-vision-preview.
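
The review thread above can be summarized as a small model-selection function. This is only a sketch under stated assumptions: the reviewed snippet is truncated after the `gemini-pro` check, so the return value for that branch, the constant `GEMINI_SUMMARIZE_MODEL`, and both constant values are guesses made for illustration. The idea is that any `gemini-pro*` model, including `gemini-pro-vision`, summarizes with the text-only Gemini model, because the vision model does not support multi-turn chat.

```typescript
// Assumed values for illustration; the thread does not show the real constants.
const SUMMARIZE_MODEL = "gpt-3.5-turbo";
const GEMINI_SUMMARIZE_MODEL = "gemini-pro"; // hypothetical constant name

// Pick the model used to summarize a conversation, given the chat model in use.
function getSummarizeModel(currentModel: string): string {
  if (currentModel.startsWith("gpt")) {
    return SUMMARIZE_MODEL;
  }
  // Matches both gemini-pro and gemini-pro-vision. The vision model is
  // avoided for summaries because it does not support multi-turn chat.
  if (currentModel.startsWith("gemini-pro")) {
    return GEMINI_SUMMARIZE_MODEL;
  }
  // Any other provider summarizes with the model already in use.
  return currentModel;
}
```

Summaries therefore stay within the same provider as the chat model, which is the point the reviewer raised.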

@H0llyW00dzZ

Show off

image

@fred-bf fred-bf merged commit e2da340 into ChatGPTNextWeb:main Feb 20, 2024
1 of 2 checks passed
H0llyW00dzZ pushed a commit to H0llyW00dzZ/ChatGPT-Next-Web that referenced this pull request Feb 20, 2024
H0llyW00dzZ added a commit to H0llyW00dzZ/ChatGPT-Next-Web that referenced this pull request Feb 20, 2024
* Add vision support (ChatGPTNextWeb#4076)

* Refactor [UI/UX] [Front End] [Chat] Remove Duplicate "onUserInput"

- [+] refactor(chat.tsx): remove duplicate onUserInput call and localStorage.setItem in _Chat function

* Feat [UI/UX] [Chat List] Search Support for Multimodal Content

- [+] feat(chat-list.tsx): add search support for array of MultimodalContent in ChatList component

* Style [UI/UX] [Chat List] Linting

- [+] style(chat-list.tsx): improve readability by breaking down lengthy if condition into multiple lines

* Adding Back Text Moderation

- [+] feat(openai.ts): add support for text moderation in chat method

* Feat [LLM APIs] [Google] InlineData

- [+] feat(google.ts): add InlineData to MessagePart, refactor message construction
- [+] chore(google.ts): add comments for clarity

* Style [LLM APIs] [Google] InlineData

- [+] style(google.ts): update comment for InlineData interface

* Todo [LLM APIs] [Google] InlineData

- [+] todo(google.ts): add TODO comment to improve safety settings configuration

* Todo [UI/UX] [Front End] [Chat] Summarize

- [+] chore(chat.ts): add TODO comment to improve the summary for gemini-pro-vision

---------

Co-authored-by: TheRamU <[email protected]>
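
Two items in the commit message above, chat-list search over arrays of `MultimodalContent` and `InlineData` in the Google `MessagePart`, can be sketched roughly as follows. The part shapes (`text` and `inline_data` with `mime_type` and `data`) follow the Gemini REST API; the helper names `getTextContent`, `matchesSearch`, and `toGoogleParts` are hypothetical, as is the assumption that images are stored as base64 data URLs.

```typescript
// Hypothetical content type: OpenAI-style parts, assumed to be the app's stored format.
type MultimodalContent =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Part shapes accepted by the Gemini REST API.
interface InlineData {
  mime_type: string;
  data: string; // base64 payload without the "data:...;base64," prefix
}
type MessagePart = { text: string } | { inline_data: InlineData };

// Collect only the textual parts so search can treat both content forms alike.
function getTextContent(content: string | MultimodalContent[]): string {
  if (typeof content === "string") return content;
  const texts: string[] = [];
  for (const part of content) {
    if (part.type === "text") texts.push(part.text);
  }
  return texts.join(" ");
}

// Case-insensitive substring match over the text of a message.
function matchesSearch(content: string | MultimodalContent[], query: string): boolean {
  return getTextContent(content).toLowerCase().includes(query.toLowerCase());
}

// Convert stored content into Gemini parts; base64 data URLs become inline_data.
function toGoogleParts(content: string | MultimodalContent[]): MessagePart[] {
  if (typeof content === "string") return [{ text: content }];
  const parts: MessagePart[] = [];
  for (const part of content) {
    if (part.type === "text") {
      parts.push({ text: part.text });
      continue;
    }
    const match = /^data:([^;,]+);base64,(.+)$/.exec(part.image_url.url);
    if (match) {
      parts.push({ inline_data: { mime_type: match[1], data: match[2] } });
    }
    // Remote (non-data) URLs are skipped in this sketch.
  }
  return parts;
}
```

Routing both search and request building through one text-extraction helper keeps the `string | MultimodalContent[]` distinction out of the calling components.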
X-Zero-L pushed a commit to X-Zero-L/ChatGPT-Next-Web that referenced this pull request Feb 21, 2024

@fengzai6

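When a Google vision model is selected, could the conversation skip multi-turn chat and send only the single current message, so that the context does not have to be cleared before every use?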

@Issues-translate-bot

Bot detected the issue body's language is not English, translate it automatically.


Can the conversation not use multiple rounds of chat when the Google visual model is selected, and only a single conversation can be uploaded without the need to clear the context each time it is used?
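
The request above could look something like the following. This is a sketch of the proposed behavior only, not something the thread confirms was implemented; `ChatTurn` and `selectMessagesToSend` are hypothetical names, and matching on the exact model id `gemini-pro-vision` is an assumption.

```typescript
// Hypothetical message type for illustration.
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

// For the single-turn vision model, send only the most recent user message;
// every other model receives the full history.
function selectMessagesToSend(model: string, history: ChatTurn[]): ChatTurn[] {
  if (model !== "gemini-pro-vision") {
    return history;
  }
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].role === "user") {
      return [history[i]];
    }
  }
  return [];
}
```

With this approach the stored conversation is left untouched, so the user would not need to clear the context by hand before each image question.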

@PengLingJun

Why are my responses incomplete when using the vision-preview API?

image image

gaogao1030 pushed a commit to gaogao1030/ChatGPT-Next-Web that referenced this pull request May 16, 2024
6 participants