✨ Image input #389

Merged
merged 7 commits into from
Apr 18, 2024

Conversation

sudoskys
Member

@sudoskys sudoskys commented Apr 18, 2024

  • Collect photos in the sender
  • Change the schema

Summary by CodeRabbit

  • New Features
    • Enhanced file handling capabilities in private messages, allowing for efficient management and processing of files.
    • Introduces a new "Vision With Voice" feature in the demo section, expanding the app's functionality.
  • Refactor
    • Improved management of objects based on user IDs and timestamps to streamline operations.

- Add TimerObjectContainer class with methods to manage user objects
- Implement add_object, get_objects, and clear_objects methods
- Ensure objects are added, retrieved, and cleared based on time limits
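
The commit message names the class and its three methods but not their signatures. A minimal sketch of how such a container could work, assuming a per-user list of (timestamp, object) pairs and a `time_limit` given in seconds. The parameter names, the defaults, and the injectable `clock` are assumptions for illustration, not the code merged in this PR:

```python
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


class TimerObjectContainer:
    """Hypothetical sketch: keeps objects per user and drops expired ones."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        # user_id -> list of (timestamp, object); the clock is injectable for tests.
        self._users: Dict[str, List[Tuple[float, Any]]] = defaultdict(list)
        self._clock = clock

    def add_object(self, user_id: str, obj: Any) -> None:
        """Store obj for user_id, stamped with the current time."""
        self._users[user_id].append((self._clock(), obj))

    def get_objects(self, user_id: str, time_limit: float = 60.0) -> List[Any]:
        """Return the objects added within the last time_limit seconds."""
        self.clear_objects(user_id, time_limit)
        return [obj for _, obj in self._users.get(user_id, [])]

    def clear_objects(self, user_id: str, time_limit: float = 0.0) -> None:
        """Drop objects older than time_limit seconds (0 drops everything)."""
        now = self._clock()
        kept = [
            (ts, obj)
            for ts, obj in self._users.get(user_id, [])
            if now - ts < time_limit
        ]
        if kept:
            self._users[user_id] = kept
        else:
            # Remove the key entirely so idle users do not accumulate.
            self._users.pop(user_id, None)
```

Expiring entries lazily inside `get_objects` avoids a background timer; the trade-off is that stale objects stay in memory until the user is queried again.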
Contributor

coderabbitai bot commented Apr 18, 2024

Walkthrough

The recent update brings significant enhancements to the system's functionality. It introduces a new entity, TimerObjectContainer, for efficient object management based on user IDs and timestamps. Moreover, improvements in file processing and handling within private messages aim to provide a smoother user experience, particularly in the Telegram module.

Changes

| Files | Change Summary |
|---|---|
| `app/sender/telegram/__init__.py`, `app/sender/util_func.py` | Introduced TimerObjectContainer for efficient object management. Enhanced file handling and processing in private messages. |
| `app/middleware/llm_task.py`, `app/receiver/slack/__init__.py` | Updated message handling and error logging. Changes in function signatures and imports for better message processing. |
| `llmkira/openai/cell.py`, `llmkira/openai/request.py` | Added new classes and methods for content handling and image processing. Improved URL formatting and vision-related checks. |
| `llmkira/openai/utils.py`, `llmkira/task/schema.py` | Introduced image resizing function and significant modifications to message handling for different platforms. |
| `README.md` | Updated model adherence, new features, and roadmap details. |

Recent Review Details

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between ead4058 and 9677832.
Files selected for processing (1)
  • README.md (2 hunks)
Additional Context Used
LanguageTool (66)
README.md (66)

Near line 40: Possible spelling mistake found.
Context: ...️ > Python>=3.9 This project uses the ToolCall feature. It integrates a message queui...


Near line 42: ‘prior to’ might be wordy. Consider a shorter alternative.
Context: ...ng plugin mechanisms and authentication prior to plugin execution. The model adheres to...


Near line 45: Possible spelling mistake found.
Context: ...in execution. The model adheres to the Openai Format Schema. Please adapt using [gate...


Near line 46: Possible spelling mistake found.
Context: ...ps://github.com/Portkey-AI/gateway) or [one-api](https://github.com/songquanpeng/one-ap...


Near line 48: Possible typo: you repeated a whitespace
Context: ...quanpeng/one-api) independently. | Demo | Vision With Voice | |------...


Near line 48: Possible typo: you repeated a whitespace
Context: ... | Vision With Voice | |-----------------------------------|-...


Near line 79: Loose punctuation mark.
Context: ...s ### 🍔 Login Modes - Login via url: Use /login token$https://provider.com...


Near line 79: Possible spelling mistake found.
Context: ... 🍔 Login Modes - Login via url: Use /login token$https://provider.com to Login. The program p...


Near line 81: The currency mark is usually put at the beginning of the number.
Context: ...onfiguration information - Login: Use /login https://api.com/v1$key$model to login ### 🧀 Plugin Previ...


Near line 85: Possible typo: you repeated a whitespace
Context: ... 🧀 Plugin Previews | Sticker Converter | Timer Function | Tran...


Near line 85: Possible typo: you repeated a whitespace
Context: ...erter | Timer Function | Translate Function ...


Near line 85: Possible typo: you repeated a whitespace
Context: ...on | Translate Function | |-------------------------------------...


Near line 91: Possible typo: you repeated a whitespace
Context: ...atform | Support | File System | Remarks | |----------|---------|-------------|--...


Near line 93: Possible typo: you repeated a whitespace
Context: ...------------------------| | Telegram | ✅ | ✅ | ...


Near line 93: Possible typo: you repeated a whitespace
Context: ...--------------| | Telegram | ✅ | ✅ | ...


Near line 93: Possible typo: you repeated a whitespace
Context: ...--| | Telegram | ✅ | ✅ | | | Discord | ✅ | ✅ | ...


Near line 94: Possible typo: you repeated a whitespace
Context: ... | | Discord | ✅ | ✅ | ...


Near line 94: Possible typo: you repeated a whitespace
Context: ... | | Discord | ✅ | ✅ | ...


Near line 94: Possible typo: you repeated a whitespace
Context: ... | | Discord | ✅ | ✅ | ...


Near line 94: Possible typo: you repeated a whitespace
Context: ... | | Discord | ✅ | ✅ | | | Kook | ✅ | ✅ | D...


Near line 95: Possible typo: you repeated a whitespace
Context: ... | | Kook | ✅ | ✅ | Does not suppo...


Near line 95: Possible typo: you repeated a whitespace
Context: ... | | Kook | ✅ | ✅ | Does not support `trigge...


Near line 95: Possible typo: you repeated a whitespace
Context: ... | | Kook | ✅ | ✅ | Does not support triggering by reply...


Near line 96: Possible typo: you repeated a whitespace
Context: ... support triggering by reply | | Slack | ✅ | ✅ | Does not suppo...


Near line 96: Possible typo: you repeated a whitespace
Context: ...t triggering by reply | | Slack | ✅ | ✅ | Does not support `trigge...


Near line 96: Possible typo: you repeated a whitespace
Context: ...ing by reply| | Slack | ✅ | ✅ | Does not supporttriggering by reply`...


Near line 97: Possible spelling mistake found.
Context: ...s not support triggering by reply | | QQ | ❌ | | ...


Near line 97: Possible typo: you repeated a whitespace
Context: ...not support triggering by reply | | QQ | ❌ | | ...


Near line 97: Possible typo: you repeated a whitespace
Context: ...t triggering by reply | | QQ | ❌ | | ...


Near line 97: Possible typo: you repeated a whitespace
Context: ...ering by reply` | | QQ | ❌ | | ...


Near line 97: Possible typo: you repeated a whitespace
Context: ...` | | QQ | ❌ | | | | Wechat | ❌ | | ...


Near line 98: The official name of this popular chat service is spelled with a capital “C”.
Context: ... | | Wechat | ❌ | | ...


Near line 98: Possible typo: you repeated a whitespace
Context: ... | | Wechat | ❌ | | ...


Near line 98: Possible typo: you repeated a whitespace
Context: ... | | Wechat | ❌ | | ...


Near line 98: Possible typo: you repeated a whitespace
Context: ... | | Wechat | ❌ | | ...


Near line 98: Possible typo: you repeated a whitespace
Context: ... | | Wechat | ❌ | | | | Twitter | ❌ | | ...


Near line 99: Possible typo: you repeated a whitespace
Context: ... | | Twitter | ❌ | | ...


Near line 99: Possible typo: you repeated a whitespace
Context: ... | | Twitter | ❌ | | ...


Near line 99: Possible typo: you repeated a whitespace
Context: ... | | Twitter | ❌ | | ...


Near line 99: Possible typo: you repeated a whitespace
Context: ... | | Twitter | ❌ | | | | Matrix | ❌ | | ...


Near line 100: Possible typo: you repeated a whitespace
Context: ... | | Matrix | ❌ | | ...


Near line 100: Possible typo: you repeated a whitespace
Context: ... | | Matrix | ❌ | | ...


Near line 100: Possible typo: you repeated a whitespace
Context: ... | | Matrix | ❌ | | ...


Near line 100: Possible typo: you repeated a whitespace
Context: ... | | Matrix | ❌ | | | | IRC | ❌ | | ...


Near line 101: Possible typo: you repeated a whitespace
Context: ... | | IRC | ❌ | | ...


Near line 101: Possible typo: you repeated a whitespace
Context: ... | | IRC | ❌ | | ...


Near line 101: Possible typo: you repeated a whitespace
Context: ... | | IRC | ❌ | | ...


Near line 101: Possible typo: you repeated a whitespace
Context: ... | | IRC | ❌ | | | | ... | | | C...


Near line 102: Possible typo: you repeated a whitespace
Context: ... | | ... | | | Create Issue/P...


Near line 102: Possible typo: you repeated a whitespace
Context: ... | | ... | | | Create Issue/PR ...


Near line 102: Possible typo: you repeated a whitespace
Context: ... | | ... | | | Create Issue/PR ...


Near line 102: Possible typo: you repeated a whitespace
Context: ... | | Create Issue/PR | ## 📦 Quick Start Refer to the [🧀 D...


Near line 133: Unpaired symbol: ‘]’ seems to be missing
Context: ...pm2.json ``` ### 🥣 Docker Build Hub: [sudoskys/llmbot](https://hub.docker.com/...


Near line 133: Possible spelling mistake found.
Context: ...m2.json ``` ### 🥣 Docker Build Hub: [sudoskys/llmbot](https://hub.docker.com/reposito...


Near line 133: Possible spelling mistake found.
Context: ...`` ### 🥣 Docker Build Hub: [sudoskys/llmbot](https://hub.docker.com/repository/dock...


Near line 137: ‘brand new’ seems to be a compound adjective before a noun. Use a hyphen: “brand-new”.
Context: ...ompose Installation If you are using a brand new server, you can use the following shell...


Near line 140: Possible spelling mistake found.
Context: ...ng Docker methods. If you have deployed redis, rabbitmq, mongodb, please modify ...


Near line 140: Possible spelling mistake found.
Context: ... methods. If you have deployed redis, rabbitmq, mongodb, please modify the `docker-...


Near line 140: Possible spelling mistake found.
Context: ... you have deployed redis, rabbitmq, mongodb, please modify the `docker-compose.yml...


Near line 160: Possible spelling mistake found.
Context: ...image using docker-compose pull. Use docker exec -it llmbot /bin/bash to view Shell in Docker, ent...


Near line 183: Possible spelling mistake found.
Context: ...entation. ### Hooks Hooks control the EventMessage in sender and receiver. For example, we...


Near line 184: This sentence does not start with an uppercase letter.
Context: ...e have voice_hook in built-in hooks. you can enable it by setting `VOICE_REPLY_M...


Near line 191: This sentence does not start with an uppercase letter.
Context: ...O_VOICE_KEY= ``` use /env VOICE_REPLY_ME=NONE to disable t...


Near line 193: This sentence does not start with an uppercase letter.
Context: ...CE_REPLY_ME=NONEto disable this env. check the source code inllmkira/extra/voice...


Near line 194: Possible spelling mistake found.
Context: ...ble this env. check the source code in llmkira/extra/voice_hook.py, learn to write yo...


Near line 202: Possible spelling mistake found.
Context: ...m) ## 📜 Notice > This project, named OpenAiBot, signifying "Open Artificial Intelligen...

Additional comments not posted (3)
README.md (3)

45-46: Update the model to adhere to the Openai Format Schema and provide adaptation links.


48-50: Add the "Vision With Voice" feature to the demo section.


62-62: Add standalone support for gpt-4-turbo and vision to the roadmap.



- Fix indentation and spacing issues
- Add missing import statement
- Update method signatures
Add AssistantMessage type for Slack replies in the receiver module.
- Fixed indentation in telegram sender init file for better readability.
Adds a utility function to resize images for OpenAI models based on
specified mode (low, high, auto). Images are resized to meet specific
dimension requirements.

- Resizes images to 512x512 for 'low' mode
- Resizes images based on length and width limits for 'high' mode
- Automatically resizes images for 'auto' mode

https://platform.openai.com/docs/guides/vision
@sudoskys sudoskys merged commit 0d90606 into main Apr 18, 2024
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Comment on lines +7 to +48
def resize_openai_image(
    image_bytes: bytes, mode: Literal["low", "high", "auto"] = "auto"
) -> bytes:
    """
    If mode is "low", scale to 512*512. If mode is "high", the short side of the image should be under 768 pixels and the long side under 2,000 pixels; if the image is larger, scale it down proportionally to that size, keeping the aspect ratio. If mode is "auto": if the size is greater than 512 but less than 768, scale to 512; if the long side exceeds 2000 or the short side exceeds 768, scale proportionally to a suitable size.
    https://platform.openai.com/docs/guides/vision
    :param image_bytes: raw bytes of the image
    :param mode: resize mode
    :return: raw bytes of the processed image
    """
    # Convert the bytes into an image object
    image = Image.open(BytesIO(image_bytes))
    # Get the image dimensions
    width, height = image.size
    # Size thresholds
    limit_small = 512
    limit_short = 768
    limit_long = 2000
    # Flag that decides whether the image needs to be resized
    resize_flag = False
    new_size = width, height
    if mode == "low":
        if max(width, height) > limit_small:
            new_size = limit_small, limit_small
            resize_flag = True
    elif mode == "high":
        if min(width, height) > limit_short or max(width, height) > limit_long:
            new_size = min(limit_short, width), min(limit_long, height)
            resize_flag = True
    elif mode == "auto":
        if limit_small < max(width, height) < limit_short:
            new_size = limit_small, limit_small
            resize_flag = True
        elif min(width, height) > limit_short or max(width, height) > limit_long:
            new_size = min(limit_short, width), min(limit_long, height)
            resize_flag = True
    if resize_flag:
        image.thumbnail(new_size, Image.Resampling.BICUBIC)
    bytes_io = BytesIO()
    image.save(bytes_io, format="PNG")
    bytes_return = bytes_io.getvalue()
    return bytes_return

Add validation for the mode parameter in resize_openai_image.

+    if mode not in ["low", "high", "auto"]:
+        raise ValueError("Invalid mode. Allowed values are 'low', 'high', 'auto'.")

Consider handling exceptions that may arise from Image.open and image.save to enhance the robustness of this function.


Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def resize_openai_image(
    image_bytes: bytes, mode: Literal["low", "high", "auto"] = "auto"
) -> bytes:
    """
    If mode is "low", scale to 512*512. If mode is "high", the short side of the image should be under 768 pixels and the long side under 2,000 pixels; if the image is larger, scale it down proportionally to that size, keeping the aspect ratio. If mode is "auto": if the size is greater than 512 but less than 768, scale to 512; if the long side exceeds 2000 or the short side exceeds 768, scale proportionally to a suitable size.
    https://platform.openai.com/docs/guides/vision
    :param image_bytes: raw bytes of the image
    :param mode: resize mode
    :return: raw bytes of the processed image
    """
    if mode not in ["low", "high", "auto"]:
        raise ValueError("Invalid mode. Allowed values are 'low', 'high', 'auto'.")
    # Convert the bytes into an image object
    image = Image.open(BytesIO(image_bytes))
    # Get the image dimensions
    width, height = image.size
    # Size thresholds
    limit_small = 512
    limit_short = 768
    limit_long = 2000
    # Flag that decides whether the image needs to be resized
    resize_flag = False
    new_size = width, height
    if mode == "low":
        if max(width, height) > limit_small:
            new_size = limit_small, limit_small
            resize_flag = True
    elif mode == "high":
        if min(width, height) > limit_short or max(width, height) > limit_long:
            new_size = min(limit_short, width), min(limit_long, height)
            resize_flag = True
    elif mode == "auto":
        if limit_small < max(width, height) < limit_short:
            new_size = limit_small, limit_small
            resize_flag = True
        elif min(width, height) > limit_short or max(width, height) > limit_long:
            new_size = min(limit_short, width), min(limit_long, height)
            resize_flag = True
    if resize_flag:
        image.thumbnail(new_size, Image.Resampling.BICUBIC)
    bytes_io = BytesIO()
    image.save(bytes_io, format="PNG")
    bytes_return = bytes_io.getvalue()
    return bytes_return
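
One point the review does not raise: the bounding box `min(limit_short, width), min(limit_long, height)` ties the 768 limit to the width and the 2000 limit to the height. Because `Image.thumbnail` fits the image inside the box while keeping the aspect ratio, a landscape image would apparently be shrunk until its long side is at most 768. A minimal sketch of an orientation-independent size calculation follows, under the assumption that the docstring (short side at most 768, long side at most 2000) describes the intent. `compute_target_size` and `_fit` are hypothetical helper names, and the sketch deliberately leaves out Pillow so it only covers the arithmetic:

```python
from typing import Literal, Tuple

LIMIT_SMALL = 512   # box edge used for "low" mode
LIMIT_SHORT = 768   # maximum short side in "high" mode
LIMIT_LONG = 2000   # maximum long side in "high" mode


def _fit(width: int, height: int, max_short: int, max_long: int) -> Tuple[int, int]:
    """Scale down (never up) so the short and long sides fit their limits."""
    short, long_ = min(width, height), max(width, height)
    scale = min(1.0, max_short / short, max_long / long_)
    return max(1, round(width * scale)), max(1, round(height * scale))


def compute_target_size(
    width: int, height: int, mode: Literal["low", "high", "auto"] = "auto"
) -> Tuple[int, int]:
    """Return the (width, height) an image should be resized to for the mode."""
    if mode not in ("low", "high", "auto"):
        raise ValueError("Invalid mode. Allowed values are 'low', 'high', 'auto'.")
    if width <= 0 or height <= 0:
        raise ValueError("Image dimensions must be positive.")
    longest = max(width, height)
    # "auto" falls back to the 512 box for images only slightly above it.
    if mode == "low" or (mode == "auto" and LIMIT_SMALL < longest < LIMIT_SHORT):
        return _fit(width, height, LIMIT_SMALL, LIMIT_SMALL)
    return _fit(width, height, LIMIT_SHORT, LIMIT_LONG)
```

The returned size could then be passed to `image.thumbnail(...)` or `image.resize(...)`; since the aspect ratio is already preserved, both would give the same dimensions up to rounding.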

Comment on lines +165 to +180
@model_validator(mode="after")
def check_vision(self):
    if not self.model.startswith(("gpt-4-vision", "gpt-4-turbo", "claude-3")):
        logger.info(
            "Remove the image content part from the messages, because the model is not supported."
        )
        for message in self.messages:
            if isinstance(message, UserMessage) and isinstance(
                message.content, list
            ):
                message.content = [
                    content
                    for content in message.content
                    if content.type != "image_url"
                ]
    return self

Refactor the image content removal logic in the check_vision method for efficiency.

+                    message.content = [content for content in message.content if not isinstance(content, ImageContent)]

Suggested change
@model_validator(mode="after")
def check_vision(self):
    if not self.model.startswith(("gpt-4-vision", "gpt-4-turbo", "claude-3")):
        logger.info(
            "Remove the image content part from the messages, because the model is not supported."
        )
        for message in self.messages:
            if isinstance(message, UserMessage) and isinstance(
                message.content, list
            ):
                message.content = [
                    content
                    for content in message.content
                    if not isinstance(content, ImageContent)
                ]
    return self
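
The filtering step can be exercised in isolation. The sketch below swaps the Pydantic models for plain dataclasses so it runs with the standard library only; `TextPart`, `ImagePart`, `supports_vision`, and `strip_images` are hypothetical names, while the model prefixes are the ones listed in the PR:

```python
from dataclasses import dataclass
from typing import List, Union

# Model-name prefixes treated as vision-capable (taken from the PR).
VISION_PREFIXES = ("gpt-4-vision", "gpt-4-turbo", "claude-3")


@dataclass
class TextPart:
    text: str
    type: str = "text"


@dataclass
class ImagePart:
    url: str
    type: str = "image_url"


Part = Union[TextPart, ImagePart]


def supports_vision(model: str) -> bool:
    """True if the model name starts with one of the known vision prefixes."""
    return model.startswith(VISION_PREFIXES)


def strip_images(model: str, content: Union[str, List[Part]]) -> Union[str, List[Part]]:
    """Return content without image parts when the model cannot accept them."""
    if supports_vision(model) or not isinstance(content, list):
        return content
    return [part for part in content if part.type != "image_url"]
```

Filtering on the `type` field and filtering with `isinstance` should give the same result as long as every image part is an instance of the image class, so the suggested change reads as a style preference rather than an efficiency gain.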

Comment on lines +253 to +259
def add_image(
    self,
    image_url: Union[str, bytes],
    detail: Literal["low", "high", "auto"] = "auto",
):
    self.content.append(ContentPart.create_image(url=image_url, detail=detail))
    return self

Optimize the add_image method in the UserMessage class to handle image resizing more efficiently.

+            # Optimize resizing by directly using the resized image URL
+            url = f"data:image/jpeg;base64,{base64.b64encode(url).decode('utf-8')}"

Suggested change
def add_image(
    self,
    image_url: Union[str, bytes],
    detail: Literal["low", "high", "auto"] = "auto",
):
    # Optimize resizing by directly using the resized image URL
    url = f"data:image/jpeg;base64,{base64.b64encode(image_url).decode('utf-8')}"
    self.content.append(ContentPart.create_image(url=url, detail=detail))
    return self
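
Note that `image_url` is typed `Union[str, bytes]`, and `base64.b64encode` accepts only bytes-like input, so the suggestion as written would raise a `TypeError` for a plain URL string. A minimal sketch of a conversion that handles both cases is shown below. `to_image_url` is a hypothetical helper, and the `image/png` default is an assumption based on `resize_openai_image` saving its output as PNG:

```python
import base64
from typing import Union


def to_image_url(image: Union[str, bytes], mime: str = "image/png") -> str:
    """Pass URL strings through; wrap raw bytes in a base64 data URL."""
    if isinstance(image, str):
        # Already a URL (http(s) or data URL); leave it untouched.
        return image
    if isinstance(image, (bytes, bytearray)):
        encoded = base64.b64encode(image).decode("ascii")
        return f"data:{mime};base64,{encoded}"
    raise TypeError(f"Expected str or bytes, got {type(image).__name__}")
```

With a helper like this, `add_image` could call `to_image_url(image_url)` before building the content part, keeping the string case unchanged.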

@sudoskys sudoskys linked an issue Apr 19, 2024 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Image Input