feat: add support for multimodal #453
Conversation
Caution: Review failed. The pull request is closed.
Walkthrough: The changes introduce a new class.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
templates/components/settings/python/settings.py (2)
78-81: Improve multi-modal LLM initialization with documentation.
The multi-modal initialization could be more explicit and better documented:

```diff
+ # Initialize multi-modal LLM if the model supports it, otherwise keep it as None
  if model_name in GPT4V_MODELS:
      global multi_modal_llm
      multi_modal_llm = OpenAIMultiModal(model=model_name)
+ else:
+     global multi_modal_llm
+     multi_modal_llm = None
```

Also consider adding a docstring to the function explaining the side effect of modifying the global `multi_modal_llm`.
7-8: Consider future multi-modal support for other providers.
The current implementation only handles multi-modal capabilities for OpenAI. As other providers (like Gemini) add multi-modal support, consider creating a more generic pattern for multi-modal initialization across different providers.
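A provider-generic initializer of the kind this comment asks for could be sketched as below. All names here (`init_multi_modal_llm`, the factory registry, the model sets) are illustrative stand-ins rather than the template's actual API, and plain strings stand in for the real multi-modal client objects:

```python
from typing import Callable, Dict, Optional

# Illustrative vision-capable model set; the real template keeps an
# OpenAI-specific GPT4V_MODELS list.
OPENAI_VISION_MODELS = {"gpt-4o", "gpt-4-turbo"}

def _init_openai(model: str) -> Optional[str]:
    # Placeholder for constructing OpenAIMultiModal(model=model);
    # returns None when the model has no vision support.
    return f"openai-multimodal:{model}" if model in OPENAI_VISION_MODELS else None

# Registry of per-provider factories; a new provider (e.g. Gemini) would
# register a factory here instead of adding branches to the settings code.
_FACTORIES: Dict[str, Callable[[str], Optional[str]]] = {
    "openai": _init_openai,
}

def init_multi_modal_llm(provider: str, model: str) -> Optional[str]:
    factory = _FACTORIES.get(provider)
    return factory(model) if factory else None

print(init_multi_modal_llm("openai", "gpt-4o"))      # a multi-modal handle
print(init_multi_modal_llm("openai", "gpt-3.5"))     # no vision support
print(init_multi_modal_llm("gemini", "gemini-pro"))  # provider not registered
```

The registry keeps provider knowledge out of the settings module: adding a provider is a one-line dictionary entry rather than a new branch.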
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
- templates/components/engines/python/agent/tools/query_engine.py (2 hunks)
- templates/components/settings/python/settings.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
templates/components/engines/python/agent/tools/query_engine.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
templates/components/settings/python/settings.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
🔇 Additional comments (3)
templates/components/settings/python/settings.py (1)
2-8: LGTM! Clean implementation of multi-modal LLM singleton.
The imports and global variable declaration are well-structured with proper type hints.
templates/components/engines/python/agent/tools/query_engine.py (2)
134-137: Conflict with image node retrieval in 'auto_routed' mode
The comment on line 135 states that image nodes are not supported for `auto_routed` or `chunk` retrieval modes. However, `kwargs["retrieve_image_nodes"]` is set to `True` when `retrieval_mode` is `'auto_routed'`, which could lead to inconsistent behavior.
Consider adjusting the logic to ensure `retrieve_image_nodes` is only set to `True` when the retrieval mode supports image nodes. For example:

```diff
 if index.__class__.__name__ == "LlamaCloudIndex":
     retrieval_mode = kwargs.get("retrieval_mode")
     if retrieval_mode is None:
         kwargs["retrieval_mode"] = "auto_routed"
-    if multi_modal_llm:
+    if multi_modal_llm and retrieval_mode != "auto_routed":
         # Note: image nodes are not supported for auto_routed or chunk retrieval mode
         kwargs["retrieve_image_nodes"] = True
     return MultiModalQueryEngine(
         retriever=index.as_retriever(**kwargs),
         multi_modal_llm=multi_modal_llm,
     )
```
24-35: Initialize `self._multi_modal_llm` in the `__init__` method
The methods `synthesize` and `asynthesize` use `self._multi_modal_llm`, but it is not initialized in the `__init__` method. This could lead to an `AttributeError` at runtime.
Apply this diff to fix the issue:

```diff
 def __init__(
     self,
     text_synthesizer: Optional[BaseSynthesizer] = None,
+    multi_modal_llm=None,
     *args,
     **kwargs,
 ):
     super().__init__(*args, **kwargs)
+    self._multi_modal_llm = multi_modal_llm
     # Use a response synthesizer for text nodes summarization
     self._text_synthesizer = text_synthesizer or get_response_synthesizer(
         streaming=False,
         response_mode=ResponseMode.TREE_SUMMARIZE,
     )
```
Likely invalid or redundant comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)
19-22: Enhance class docstring for better documentation.
The current docstring could be more descriptive. Consider adding:
- Purpose of the class and when to use it
- Key features and capabilities
- Example usage

```diff
 class MultiModalQueryEngine(SimpleMultiModalQueryEngine):
     """
-    A multi-modal query engine that splits the retrieval results into chunks
-    then summarizes each chunk to reduce the number of tokens in the response.
+    A multi-modal query engine that processes both text and image nodes to generate comprehensive responses.
+
+    This engine extends SimpleMultiModalQueryEngine by adding text node summarization capabilities
+    to handle large text contexts efficiently. It splits retrieval results into chunks and
+    summarizes them to reduce token usage while maintaining context quality.
+
+    Key features:
+    - Efficient text node summarization using tree summarize mode
+    - Integrated handling of both text and image nodes
+    - Support for both synchronous and asynchronous operations
+
+    Example:
+        engine = MultiModalQueryEngine(
+            retriever=index.as_retriever(),
+            multi_modal_llm=multi_modal_llm
+        )
+        response = engine.query("Describe the image and relate it to the text context")
     """
```
51-115: Reduce code duplication in synthesis methods.
The `synthesize` and `asynthesize` methods share almost identical code. Consider extracting the common logic into a private helper method.

```diff
+    async def _prepare_synthesis(
+        self,
+        query_bundle: QueryBundle,
+        nodes: List[NodeWithScore],
+    ) -> tuple[str, List[NodeWithScore], List[NodeWithScore]]:
+        image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
+        text_response = self._summarize_text_nodes(
+            query_bundle=query_bundle,
+            nodes=text_nodes,
+        )
+        fmt_prompt = self._text_qa_template.format(
+            context_str=text_response,
+            query_str=query_bundle.query_str,
+        )
+        return fmt_prompt, image_nodes, text_nodes
+
     def synthesize(
         self,
         query_bundle: QueryBundle,
         nodes: List[NodeWithScore],
         additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
     ) -> RESPONSE_TYPE:
-        image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
-        text_response = self._summarize_text_nodes(
-            query_bundle=query_bundle,
-            nodes=text_nodes,
-        )
-        fmt_prompt = self._text_qa_template.format(
-            context_str=text_response,
-            query_str=query_bundle.query_str,
-        )
+        fmt_prompt, image_nodes, text_nodes = self._prepare_synthesis(query_bundle, nodes)
         llm_response = self._multi_modal_llm.complete(
             prompt=fmt_prompt,
             image_documents=[
                 image_node.node
                 for image_node in image_nodes
                 if isinstance(image_node.node, ImageNode)
             ],
         )
         return Response(
             response=str(llm_response),
             source_nodes=nodes,
             metadata={"text_nodes": text_nodes, "image_nodes": image_nodes},
         )

     async def asynthesize(
         self,
         query_bundle: QueryBundle,
         nodes: List[NodeWithScore],
         additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
     ) -> RESPONSE_TYPE:
-        image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
-        text_response = self._summarize_text_nodes(
-            query_bundle=query_bundle,
-            nodes=text_nodes,
-        )
-        fmt_prompt = self._text_qa_template.format(
-            context_str=text_response,
-            query_str=query_bundle.query_str,
-        )
+        fmt_prompt, image_nodes, text_nodes = await self._prepare_synthesis(query_bundle, nodes)
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
- templates/components/engines/python/agent/tools/query_engine.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
🔇 Additional comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)
24-35: Initialize missing `_text_qa_template` in constructor.
The `_text_qa_template` is used in the `synthesize` and `asynthesize` methods but is not initialized in the constructor. This could lead to runtime errors.
131-141: Clarify image node support limitations and verify retrieval modes.
The comment about image node support limitations could be more explicit. Also, we should verify the retrieval modes that don't support image nodes.
Consider updating the comment to be more explicit:

```diff
- # Note: image nodes are not supported for auto_routed or chunk retrieval mode
+ # Enable image node retrieval. Note: While we set this to True,
+ # image nodes may not be effectively retrieved in 'auto_routed' or
+ # 'chunk' retrieval modes due to underlying implementation limitations.
```
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
templates/components/engines/python/agent/tools/query_engine.py (1)
89-102: Simplify conditional logic and update docstring
The nested conditions could be simplified and the docstring should be updated to reflect new parameters:

```diff
 def create_query_engine(index, **kwargs) -> BaseQueryEngine:
     """
     Create a query engine for the given index.
     Args:
         index: The index to create a query engine for.
-        params (optional): Additional parameters for the query engine, e.g: similarity_top_k
+        **kwargs: Additional parameters including:
+            - retrieval_mode: The retrieval mode to use (default: "auto_routed" for LlamaCloudIndex)
+            - similarity_top_k: Number of top results to retrieve
+            - filters: Query filters to apply
     """
     # ... rest of the code ...
-    if index.__class__.__name__ == "LlamaCloudIndex":
-        retrieval_mode = kwargs.get("retrieval_mode")
-        if retrieval_mode is None:
-            kwargs["retrieval_mode"] = "auto_routed"
-            mm_model = multi_modal_llm.get()
-            if mm_model:
-                kwargs["retrieve_image_nodes"] = True
-                return RetrieverQueryEngine(
-                    retriever=index.as_retriever(**kwargs),
-                    response_synthesizer=MultiModalSynthesizer(
-                        multimodal_model=mm_model
-                    ),
-                )
+    if index.__class__.__name__ == "LlamaCloudIndex" and kwargs.get("retrieval_mode") is None:
+        kwargs["retrieval_mode"] = "auto_routed"
+        mm_model = multi_modal_llm.get()
+        if mm_model:
+            kwargs["retrieve_image_nodes"] = True
+            return RetrieverQueryEngine(
+                retriever=index.as_retriever(**kwargs),
+                response_synthesizer=MultiModalSynthesizer(
+                    multimodal_model=mm_model
+                ),
+            )
```

templates/components/settings/python/settings.py (1)
80-82: Consider future multi-modal support for other providers.
As other providers (like Gemini) may support multi-modal capabilities, consider abstracting the multi-modal initialization into a separate function that can be called from each provider's init function.
Example structure:

```python
def init_multi_modal(provider: str, model_name: str) -> Optional[MultiModalLLM]:
    """Initialize multi-modal LLM based on provider and model."""
    if not os.getenv("MODEL"):
        return None
    match provider:
        case "openai":
            if model_name in GPT4V_MODELS:
                return OpenAIMultiModal(model=model_name)
        # Add cases for other providers
    return None
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
- templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
- templates/components/settings/python/settings.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
templates/components/engines/python/agent/tools/query_engine.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
templates/components/settings/python/settings.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
🔇 Additional comments (6)
templates/components/engines/python/agent/tools/query_engine.py (4)
2-23: LGTM: Imports are well-organized and complete
All necessary dependencies for multi-modal functionality are properly imported.
26-41: LGTM: Class initialization is properly implemented
The class correctly:
- Extends TreeSummarize
- Initializes text_qa_template with a default value
- Stores the multimodal_model
42-48: Remove unused parameter
The `additional_source_nodes` parameter is not used in the implementation.
93-94: Verify multi_modal_llm initialization
Ensure that `multi_modal_llm` is properly initialized before `multi_modal_llm.get()` is used.
✅ Verification successful
Multi-modal LLM initialization is properly handled
The code is correctly implemented:
- `multi_modal_llm` is defined as a ContextVar in settings.py
- It's properly initialized in settings.py when the model is in GPT4V_MODELS
- The query engine safely checks for the model's existence with `if mm_model:` before using it
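The `ContextVar` mechanics described above can be shown in isolation. This is a minimal, hypothetical sketch: a plain string stands in for the real `OpenAIMultiModal` instance, and the `GPT4V_MODELS` set here is an illustrative subset, not the template's actual list:

```python
from contextvars import ContextVar
from typing import Optional

# Singleton holder for an optional multi-modal LLM; None until initialized.
multi_modal_llm: ContextVar[Optional[str]] = ContextVar(
    "multi_modal_llm", default=None
)

GPT4V_MODELS = {"gpt-4o", "gpt-4-turbo"}  # illustrative subset only

def init_multi_modal(model_name: str) -> None:
    # Only populate the singleton when the model supports vision.
    if model_name in GPT4V_MODELS:
        multi_modal_llm.set(f"multi-modal:{model_name}")

def describe_engine() -> str:
    mm_model = multi_modal_llm.get()
    # Safe guard: image retrieval is only enabled when a model was set.
    if mm_model:
        return f"multi-modal engine using {mm_model}"
    return "text-only engine"

init_multi_modal("gpt-4o")
print(describe_engine())
```

Because `ContextVar.get()` falls back to the declared default of `None`, the `if mm_model:` guard is safe even when `init_multi_modal` was never called.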
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check multi_modal_llm initialization
# Search for multi_modal_llm initialization in settings.py
rg -A 5 "multi_modal_llm\s*=|def\s+init.*modal"
# Search for potential usage before initialization
rg "multi_modal_llm\.get\(\)"
```
Length of output: 782
Script:
```shell
#!/bin/bash
# Check for multi_modal_llm class/module definition and initialization
ast-grep --pattern 'class $_ {
  $$$
  get() {
    $$$
  }
  $$$
}'
# Search for multi_modal_llm imports and related code
rg -B 2 -A 2 "multi_modal_llm|multimodal" --type python
```
Length of output: 158
Script:
```shell
#!/bin/bash
# Search for multi_modal_llm imports and related code without file type restriction
rg -B 2 -A 2 "multi_modal_llm|multimodal"
# Find Python files that might contain the implementation
fd "\.py$" --exec grep -l "multi_modal_llm\|multimodal" {} \;
```
Length of output: 5513
templates/components/settings/python/settings.py (2)
8-10: Improve documentation for the multi_modal_llm variable.
The docstring should clarify that this is a singleton for multi-modal LLM since `Settings` doesn't support setting `MultiModalLLM`.

```diff
 multi_modal_llm: ContextVar[Optional[MultiModalLLM]] = ContextVar(
-    "multi_modal_llm", default=None
+    "multi_modal_llm",  # Singleton for multi-modal LLM (as `Settings` does not support setting `MultiModalLLM`)
+    default=None
 )
```
73-73: Fix typo in default model name.
The default model name "gpt-4o-mini" appears to be a typo.

```diff
- model_name = os.getenv("MODEL", "gpt-4o-mini")
+ model_name = os.getenv("MODEL", "gpt-4")
```
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
121-134: Improve logging and model validation
Consider these improvements:
- Replace print statement with proper logging
- Add validation for multi-modal model capabilities

```diff
     mm_model = multi_modal_llm.get()
     if mm_model:
+        if not hasattr(mm_model, 'complete') or not hasattr(mm_model, 'acomplete'):
+            raise ValueError("Invalid multi-modal model: missing required methods")
         kwargs["retrieve_image_nodes"] = True
-        print("Using multi-modal model")
+        logger.info("Initialized query engine with multi-modal capabilities")
         return RetrieverQueryEngine(
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
- templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
🔇 Additional comments (3)
templates/components/engines/python/agent/tools/query_engine.py (3)
2-23: LGTM! Well-organized imports
The imports are properly organized and all are necessary for the multi-modal functionality implementation.
42-48: Remove unused parameter
The `additional_source_nodes` parameter is not used in either the `synthesize` or `asynthesize` method.

```diff
 def synthesize(
     self,
     query: QueryTextType,
     nodes: List[NodeWithScore],
-    additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
     **response_kwargs: Any,
 ) -> RESPONSE_TYPE:
```

Apply the same change to the `asynthesize` method.
Also applies to: 74-80
59-66: Add error handling for LLM operations
Both synchronous and asynchronous LLM calls lack error handling.

```diff
+    try:
         llm_response = self._multi_modal_llm.complete(
             prompt=fmt_prompt,
             image_documents=[
                 image_node.node
                 for image_node in image_nodes
                 if isinstance(image_node.node, ImageNode)
             ],
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to generate response: {str(e)}") from e
```

Apply similar error handling to the `acomplete` call in the `asynthesize` method.
Also applies to: 91-98
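The wrap-and-re-raise pattern this comment suggests can be sketched standalone. `fake_llm_acomplete` below is a hypothetical stand-in for the multi-modal client's `acomplete` call, used only so the example runs without the real library:

```python
import asyncio

async def fake_llm_acomplete(prompt: str) -> str:
    # Hypothetical stand-in for multi_modal_llm.acomplete; fails on empty prompts.
    if not prompt:
        raise ValueError("empty prompt")
    return f"response to: {prompt}"

async def synthesize(prompt: str) -> str:
    try:
        return await fake_llm_acomplete(prompt)
    except Exception as e:
        # Re-raise with context; `from e` chains the original exception
        # so the root cause remains visible in the traceback.
        raise RuntimeError(f"Failed to generate response: {e}") from e

print(asyncio.run(synthesize("describe the image")))
try:
    asyncio.run(synthesize(""))
except RuntimeError as e:
    # The chained original exception is available on __cause__.
    print(type(e.__cause__).__name__)
```

The key detail is `raise ... from e`: a bare `raise RuntimeError(...)` would still show the original traceback, but explicit chaining makes the causal relationship unambiguous to callers inspecting `__cause__`.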
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
26-30: Enhance class documentation
While the docstring provides a basic description, it would be helpful to include:
- Parameters description
- Usage examples
- Return type expectations

```diff
 class MultiModalSynthesizer(TreeSummarize):
     """
-    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+
+    Args:
+        multimodal_model (Optional[MultiModalLLM]): The multi-modal language model to use
+        text_qa_template (Optional[BasePromptTemplate]): Template for text QA formatting
+
+    Example:
+        >>> synthesizer = MultiModalSynthesizer(multimodal_model=llm)
+        >>> response = await synthesizer.asynthesize(query, nodes)
     """
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
- templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
- templates/components/settings/python/settings.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
templates/components/engines/python/agent/tools/query_engine.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
templates/components/settings/python/settings.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
🔇 Additional comments (4)
templates/components/engines/python/agent/tools/query_engine.py (1)
31-41: Add model validation and fix template initialization
The initialization needs two improvements:
- Validate the multi-modal model parameter
- Fix the template initialization as noted in the previous review

```diff
 def __init__(
     self,
     multimodal_model: Optional[MultiModalLLM] = None,
     text_qa_template: Optional[BasePromptTemplate] = None,
     *args,
     **kwargs,
 ):
     super().__init__(*args, **kwargs)
+    if multimodal_model is None:
+        raise ValueError("multimodal_model is required")
     self._multi_modal_llm = multimodal_model
-    self._text_qa_template = text_qa_template or DEFAULT_TREE_SUMMARIZE_PROMPT_SEL
+    self._text_qa_template = text_qa_template or DEFAULT_TREE_SUMMARIZE_PROMPT_SEL.get_prompt()
```
Likely invalid or redundant comment.
templates/components/settings/python/settings.py (3)
7-9: Improve documentation for global variable.
The comment should better explain why a global variable is needed instead of using the Settings class.

```diff
-# `Settings` does not support setting `MultiModalLLM`
-# so we use a global variable to store it
+# Singleton for multi-modal LLM (as `Settings` does not support setting `MultiModalLLM`)
```
83-85: Improve multi-modal initialization logic.
The multi-modal initialization should only occur if the model is explicitly set through the environment variable, not just based on model name matching.

```diff
-    if model_name in GPT4V_MODELS:
+    if os.getenv("MODEL") and model_name in GPT4V_MODELS:
         global _multi_modal_llm
         _multi_modal_llm = OpenAIMultiModal(model=model_name)
```
76-76: Fix typo in default model name.
The default model name "gpt-4o-mini" appears to be a typo.

```diff
- model_name = os.getenv("MODEL", "gpt-4o-mini")
+ model_name = os.getenv("MODEL", "gpt-4")
```
Actionable comments posted: 2
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
- templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
🔇 Additional comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)
1-23: LGTM! Well-organized imports
The imports are properly organized and include all necessary dependencies for multi-modal functionality.
42-72: Add comprehensive error handling and remove unused parameter
The `asynthesize` method needs several improvements:
- Remove the unused `additional_source_nodes` parameter
- Add error handling for text summarization and LLM calls
- Add type validation for the query

Apply these changes:

```diff
 async def asynthesize(
     self,
     query: QueryTextType,
     nodes: List[NodeWithScore],
-    additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
     **response_kwargs: Any,
 ) -> RESPONSE_TYPE:
+    if not nodes:
+        raise ValueError("No nodes provided for synthesis")
     image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
+    try:
         # Summarize the text nodes to avoid exceeding the token limit
         text_response = str(await super().asynthesize(query, nodes))
+    except Exception as e:
+        raise RuntimeError(f"Failed to summarize text nodes: {str(e)}") from e
+    if not hasattr(query, 'query_str'):
+        raise ValueError("Query must have 'query_str' attribute")
     fmt_prompt = self._text_qa_template.format(
         context_str=text_response,
-        query_str=query.query_str,  # type: ignore
+        query_str=query.query_str,
     )
+    try:
         llm_response = await self._multi_modal_llm.acomplete(
             prompt=fmt_prompt,
             image_documents=[
                 image_node.node
                 for image_node in image_nodes
                 if isinstance(image_node.node, ImageNode)
             ],
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e
     return Response(
         response=str(llm_response),
         source_nodes=nodes,
         metadata={"text_nodes": text_nodes, "image_nodes": image_nodes},
     )
```
Likely invalid or redundant comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
26-29: Enhance class docstring with more details
The current docstring could be more informative. Consider adding parameter descriptions, usage examples, and return type information.

```diff
 class MultiModalSynthesizer(BaseSynthesizer):
     """
-    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    A synthesizer that processes both text and image nodes to generate multi-modal responses.
+
+    Args:
+        multimodal_model (MultiModalLLM): The multi-modal language model for processing
+        response_synthesizer (Optional[BaseSynthesizer]): Synthesizer for text node processing
+        text_qa_template (Optional[BasePromptTemplate]): Template for formatting QA prompts
+
+    Example:
+        >>> synthesizer = MultiModalSynthesizer(
+        ...     multimodal_model=multi_modal_llm,
+        ...     response_synthesizer=get_response_synthesizer()
+        ... )
+        >>> response = await synthesizer.asynthesize(query, nodes)
     """
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
- templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
🔇 Additional comments (4)
templates/components/engines/python/agent/tools/query_engine.py (4)
44-50: 🛠️ Refactor suggestion
Remove unused parameter
The `additional_source_nodes` parameter is never used in the implementation.

```diff
 async def asynthesize(
     self,
     query: QueryTextType,
     nodes: List[NodeWithScore],
-    additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
     **response_kwargs: Any,
 ) -> RESPONSE_TYPE:
```
Likely invalid or redundant comment.
63-70: Add error handling for LLM operations
The multi-modal LLM call should include error handling.

```diff
+    try:
         llm_response = await self._multi_modal_llm.acomplete(
             prompt=fmt_prompt,
             image_documents=[
                 image_node.node
                 for image_node in image_nodes
                 if isinstance(image_node.node, ImageNode)
             ],
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e
```
Likely invalid or redundant comment.
53-61: Add error handling and type validation
The text summarization and prompt formatting need proper error handling and type validation.

```diff
     # Summarize the text nodes to avoid exceeding the token limit
+    try:
         text_response = str(
             await self._response_synthesizer.asynthesize(query, text_nodes)
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to summarize text nodes: {str(e)}") from e
+    if not hasattr(query, 'query_str'):
+        raise ValueError("Query must have 'query_str' attribute")
     fmt_prompt = self._text_qa_template.format(
         context_str=text_response,
-        query_str=query.query_str,  # type: ignore
+        query_str=query.query_str,
     )
```
Likely invalid or redundant comment.
93-102: 🛠️ Refactor suggestion
Improve multi-modal query engine creation
The current implementation has several limitations:
- Multi-modal support is restricted to LlamaCloudIndex
- Not using MultiModalSynthesizer for response synthesis

```diff
 if index.__class__.__name__ == "LlamaCloudIndex":
     retrieval_mode = kwargs.get("retrieval_mode")
     if retrieval_mode is None:
         kwargs["retrieval_mode"] = "auto_routed"
-    if get_multi_modal_llm():
-        kwargs["retrieve_image_nodes"] = True
-    return RetrieverQueryEngine(
-        retriever=index.as_retriever(**kwargs),
-        response_synthesizer=get_response_synthesizer(),
-    )
+    multi_modal_llm = get_multi_modal_llm()
+    if multi_modal_llm:
+        kwargs["retrieve_image_nodes"] = True
+    return RetrieverQueryEngine(
+        retriever=index.as_retriever(**kwargs),
+        response_synthesizer=MultiModalSynthesizer(
+            multimodal_model=multi_modal_llm,
+            response_synthesizer=get_response_synthesizer()
+        ),
+    )
```
Likely invalid or redundant comment.
3171d73 to 235159e
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
26-29: Enhance class docstring with more details
The current docstring could be more informative. Consider adding:
- Parameters description
- Return value description
- Usage example

```diff
 class MultiModalSynthesizer(BaseSynthesizer):
     """
-    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+
+    Args:
+        multimodal_model (MultiModalLLM): The multi-modal language model to use
+        response_synthesizer (Optional[BaseSynthesizer]): Synthesizer for text responses
+        text_qa_template (Optional[BasePromptTemplate]): Template for text QA
+
+    Returns:
+        RESPONSE_TYPE: A response containing both text and image analysis
+
+    Example:
+        >>> synthesizer = MultiModalSynthesizer(
+        ...     multimodal_model=get_multi_modal_llm(),
+        ...     response_synthesizer=get_response_synthesizer()
+        ... )
     """
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
- templates/components/engines/python/agent/tools/query_engine.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
Pattern `templates/**`: For files under the `templates` folder, do not report 'Missing Dependencies Detected' errors.
🔇 Additional comments (3)
templates/components/engines/python/agent/tools/query_engine.py (3)
31-43: Add parameter validation for required arguments
The constructor should validate required parameters and properly handle optional ones.

```diff
 def __init__(
     self,
     multimodal_model: MultiModalLLM,
-    response_synthesizer: Optional[BaseSynthesizer],
+    response_synthesizer: Optional[BaseSynthesizer] = None,
     text_qa_template: Optional[BasePromptTemplate] = None,
     *args,
     **kwargs,
 ):
     super().__init__(*args, **kwargs)
+    if multimodal_model is None:
+        raise ValueError("multimodal_model is required")
+    if response_synthesizer is None:
+        raise ValueError("response_synthesizer is required")
     self._multi_modal_llm = multimodal_model
     self._response_synthesizer = response_synthesizer
     self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL
```
44-76: Add comprehensive error handling for async operations
The method needs proper error handling for async operations and text processing.

```diff
 async def asynthesize(
     self,
     query: QueryTextType,
     nodes: List[NodeWithScore],
-    additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
     **response_kwargs: Any,
 ) -> RESPONSE_TYPE:
     image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
+    try:
         text_response = str(
             await self._response_synthesizer.asynthesize(query, text_nodes)
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to synthesize text response: {str(e)}") from e
     fmt_prompt = self._text_qa_template.format(
         context_str=text_response,
         query_str=query.query_str,  # type: ignore
     )
+    try:
         llm_response = await self._multi_modal_llm.acomplete(
             prompt=fmt_prompt,
             image_documents=[
                 image_node.node
                 for image_node in image_nodes
                 if isinstance(image_node.node, ImageNode)
             ],
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e
```
93-106: 🛠️ Refactor suggestion: Improve multi-modal support flexibility
The current implementation restricts multi-modal support to LlamaCloudIndex. Consider making it available for other index types when multi-modal LLM is available.
```diff
-    if index.__class__.__name__ == "LlamaCloudIndex":
-        retrieval_mode = kwargs.get("retrieval_mode")
-        if retrieval_mode is None:
-            kwargs["retrieval_mode"] = "auto_routed"
-            if get_multi_modal_llm():
-                kwargs["retrieve_image_nodes"] = True
-    return RetrieverQueryEngine(
-        retriever=index.as_retriever(**kwargs),
-        response_synthesizer=MultiModalSynthesizer(
-            multimodal_model=get_multi_modal_llm(),
-            response_synthesizer=get_response_synthesizer(),
-        ),
-    )
+    # Set auto-routed mode for LlamaCloudIndex
+    if index.__class__.__name__ == "LlamaCloudIndex" and kwargs.get("retrieval_mode") is None:
+        kwargs["retrieval_mode"] = "auto_routed"
+
+    # Enable multi-modal support if available
+    multi_modal_llm = get_multi_modal_llm()
+    if multi_modal_llm:
+        kwargs["retrieve_image_nodes"] = True
+        return RetrieverQueryEngine(
+            retriever=index.as_retriever(**kwargs),
+            response_synthesizer=MultiModalSynthesizer(
+                multimodal_model=multi_modal_llm,
+                response_synthesizer=get_response_synthesizer(),
+            ),
+        )
```
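The diffs above lean on a `_get_image_and_text_nodes` helper that is never shown in the review. A minimal sketch of what such a partition could look like, using stand-in node classes rather than the real llama_index types:

```python
# Stand-in node types; the real code uses llama_index's TextNode/ImageNode.
class TextNode:
    def __init__(self, text):
        self.text = text

class ImageNode:
    def __init__(self, image_path):
        self.image_path = image_path

class NodeWithScore:
    def __init__(self, node, score):
        self.node = node
        self.score = score

def get_image_and_text_nodes(nodes):
    """Partition scored nodes into (image_nodes, text_nodes) by wrapped type."""
    image_nodes = [n for n in nodes if isinstance(n.node, ImageNode)]
    text_nodes = [n for n in nodes if not isinstance(n.node, ImageNode)]
    return image_nodes, text_nodes
```

This keeps the `NodeWithScore` wrappers intact so scores survive the split; the synthesizer later unwraps `.node` only for the image documents it passes to the multi-modal LLM.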
🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
82-85: Enhance class documentation
The class documentation could be more descriptive about its functionality and requirements.
Consider expanding the docstring:
```diff
 class MultiModalSynthesizer(BaseSynthesizer):
     """
-    A synthesizer that summarizes text nodes and uses a multi-modal LLM to generate a response.
+    A synthesizer that processes both text and image nodes to generate multi-modal responses.
+
+    This synthesizer:
+    1. Separates text and image nodes from the input
+    2. Summarizes text nodes using a response synthesizer
+    3. Combines the summarized text with images using a multi-modal LLM
+
+    Args:
+        multimodal_model: The multi-modal LLM to use for generating responses
+        response_synthesizer: The synthesizer to use for text node summarization
+        text_qa_template: Optional template for formatting the text query
     """
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
templates/components/engines/python/agent/tools/query_engine.py
(2 hunks)
🔇 Additional comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
87-99: Improve constructor parameter validation and initialization
The constructor needs validation for required parameters and proper initialization of templates.
Apply these changes:
```diff
 def __init__(
     self,
     multimodal_model: MultiModalLLM,
     response_synthesizer: Optional[BaseSynthesizer],
     text_qa_template: Optional[BasePromptTemplate] = None,
     *args,
     **kwargs,
 ):
     super().__init__(*args, **kwargs)
+    if multimodal_model is None:
+        raise ValueError("multimodal_model is required")
+    if response_synthesizer is None:
+        raise ValueError("response_synthesizer is required")
     self._multi_modal_llm = multimodal_model
     self._response_synthesizer = response_synthesizer
-    self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL
+    self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL.get_prompt()
```
Likely invalid or redundant comment.
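The fail-fast pattern this diff proposes can be exercised on its own. A minimal sketch with placeholder names, independent of llama_index:

```python
# Sketch of the suggested fail-fast validation: required collaborators are
# checked in __init__, so a misconfigured synthesizer raises at construction
# time rather than on the first query. Class and attribute names are
# illustrative placeholders, not the real template code.
class MultiModalSynthesizer:
    def __init__(self, multimodal_model, response_synthesizer):
        if multimodal_model is None:
            raise ValueError("multimodal_model is required")
        if response_synthesizer is None:
            raise ValueError("response_synthesizer is required")
        self._multi_modal_llm = multimodal_model
        self._response_synthesizer = response_synthesizer
```

Raising in the constructor gives the caller a stack trace that points at the misconfigured call site, which is usually easier to debug than an `AttributeError` surfacing later inside a query.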
🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
81-188: Consider extracting common logic to reduce duplication
The `synthesize` and `asynthesize` methods share significant logic. Consider extracting the common parts into private helper methods.

```diff
+    def _prepare_prompt(
+        self,
+        query: QueryTextType,
+        text_response: str,
+    ) -> str:
+        if not hasattr(query, 'query_str'):
+            raise ValueError("Query must have 'query_str' attribute")
+        return self._text_qa_template.format(
+            context_str=text_response,
+            query_str=query.query_str,
+        )
+
+    def _prepare_image_documents(
+        self,
+        image_nodes: List[NodeWithScore],
+    ) -> List[ImageNode]:
+        return [
+            image_node.node
+            for image_node in image_nodes
+            if isinstance(image_node.node, ImageNode)
+        ]
```
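The deduplication idea can be sketched in isolation: one private helper feeds both the sync and async entry points, so the formatting logic lives in a single place. The template and class names here are hypothetical:

```python
import asyncio

class Synthesizer:
    # Illustrative template; the real code uses a BasePromptTemplate.
    TEMPLATE = "Context: {context_str}\nQuestion: {query_str}"

    def _prepare_prompt(self, context_str: str, query_str: str) -> str:
        # Shared helper: any change to prompt construction happens once.
        return self.TEMPLATE.format(context_str=context_str, query_str=query_str)

    def synthesize(self, context: str, query: str) -> str:
        return self._prepare_prompt(context, query)

    async def asynthesize(self, context: str, query: str) -> str:
        return self._prepare_prompt(context, query)
```

Since the helpers are pure (no awaits), they can be called unchanged from both the sync and async paths, which is what makes this refactor cheap.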
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
templates/components/engines/python/agent/tools/query_engine.py
(2 hunks)
🔇 Additional comments (3)
templates/components/engines/python/agent/tools/query_engine.py (3)
155-188: Apply the same improvements to the synchronous synthesize method
The synchronous method has the same issues as its asynchronous counterpart.
Apply the same improvements as suggested for the `asynthesize` method:
- Remove unused parameter
- Add error handling
- Fix type checking
86-98: Add parameter validation and fix template initialization
The constructor needs validation for the required `multimodal_model` parameter and proper initialization of the text QA template.
```diff
 def __init__(
     self,
     multimodal_model: MultiModalLLM,
     response_synthesizer: Optional[BaseSynthesizer] = None,
     text_qa_template: Optional[BasePromptTemplate] = None,
     *args,
     **kwargs,
 ):
     super().__init__(*args, **kwargs)
+    if multimodal_model is None:
+        raise ValueError("multimodal_model is required")
     self._multi_modal_llm = multimodal_model
     self._response_synthesizer = response_synthesizer or get_response_synthesizer()
-    self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL
+    self._text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT_SEL.get_prompt()
```
Likely invalid or redundant comment.
118-153: Improve error handling and remove unused parameter in asynthesize
The method needs several improvements:
- Remove unused `additional_source_nodes` parameter
- Add error handling for text summarization and LLM calls
- Fix type checking for query.query_str
```diff
 async def asynthesize(
     self,
     query: QueryTextType,
     nodes: List[NodeWithScore],
-    additional_source_nodes: Optional[Sequence[NodeWithScore]] = None,
     **response_kwargs: Any,
 ) -> RESPONSE_TYPE:
+    if not nodes:
+        raise ValueError("No nodes provided for synthesis")
+
     image_nodes, text_nodes = _get_image_and_text_nodes(nodes)
     if len(image_nodes) == 0:
         return await self._response_synthesizer.asynthesize(query, text_nodes)
-    # Summarize the text nodes to avoid exceeding the token limit
-    text_response = str(
-        await self._response_synthesizer.asynthesize(query, text_nodes)
-    )
+    try:
+        text_response = str(
+            await self._response_synthesizer.asynthesize(query, text_nodes)
+        )
+        if not hasattr(query, 'query_str'):
+            raise ValueError("Query must have 'query_str' attribute")
+        fmt_prompt = self._text_qa_template.format(
+            context_str=text_response,
+            query_str=query.query_str,
+        )
+        llm_response = await self._multi_modal_llm.acomplete(
+            prompt=fmt_prompt,
+            image_documents=[
+                image_node.node
+                for image_node in image_nodes
+                if isinstance(image_node.node, ImageNode)
+            ],
+        )
+    except Exception as e:
+        raise RuntimeError(f"Failed to generate response: {str(e)}") from e
```
Likely invalid or redundant comment.
Force-pushed from 7adbaf1 to a7f56a7
🧹 Outside diff range and nitpick comments (1)
templates/components/engines/python/agent/tools/query_engine.py (1)
35-47
: Simplify multi-modal configuration logicThe current implementation has nested conditions that make the code harder to follow. Consider separating the concerns:
multimodal_llm = get_multi_modal_llm() + # Configure multi-modal support if multimodal_llm: kwargs["response_synthesizer"] = MultiModalSynthesizer( multimodal_model=multimodal_llm, ) + kwargs["retrieve_image_nodes"] = True + # Configure LlamaCloudIndex settings if index.__class__.__name__ == "LlamaCloudIndex": - if kwargs.get("retrieval_mode") is None: - kwargs["retrieval_mode"] = "auto_routed" - if multimodal_llm: - kwargs["retrieve_image_nodes"] = True + kwargs.setdefault("retrieval_mode", "auto_routed")
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
templates/components/engines/python/agent/tools/query_engine.py
(2 hunks)
🔇 Additional comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)
133-136: Add type validation for query attributes
The type ignore comment suggests potential type issues that should be handled explicitly:
```diff
+    if not hasattr(query, 'query_str'):
+        raise ValueError("Query must have 'query_str' attribute")
     fmt_prompt = self._text_qa_template.format(
         context_str=text_response,
-        query_str=query.query_str,  # type: ignore
+        query_str=query.query_str,
     )
```
Likely invalid or redundant comment.
138-145: Add error handling for LLM operations
The LLM call could fail for various reasons (API errors, rate limits, etc.):
```diff
+    try:
         llm_response = await self._multi_modal_llm.acomplete(
             prompt=fmt_prompt,
             image_documents=[
                 image_node.node
                 for image_node in image_nodes
                 if isinstance(image_node.node, ImageNode)
             ],
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e
```
Likely invalid or redundant comment.
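The `hasattr` guard suggested in both comments can be exercised on its own. `QueryBundle` below is a stand-in for the real query type:

```python
class QueryBundle:
    # Stand-in for the real llama_index query bundle type.
    def __init__(self, query_str: str):
        self.query_str = query_str

def format_prompt(template: str, query, context_str: str) -> str:
    # Fail fast with a clear message instead of an AttributeError surfacing
    # deep inside template formatting.
    if not hasattr(query, "query_str"):
        raise ValueError("Query must have 'query_str' attribute")
    return template.format(context_str=context_str, query_str=query.query_str)
```

The explicit check also removes the need for the `# type: ignore` comment, since the attribute access is now guarded at runtime.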
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
templates/components/engines/python/agent/tools/query_engine.py
(2 hunks)
🔇 Additional comments (2)
templates/components/engines/python/agent/tools/query_engine.py (2)
134-137: 🛠️ Refactor suggestion: Add type checking for query.query_str access
Replace the type ignore comment with proper type checking:
```diff
+    if not hasattr(query, 'query_str'):
+        raise ValueError("Query must have 'query_str' attribute")
     fmt_prompt = self._text_qa_template.format(
         context_str=text_response,
-        query_str=query.query_str,  # type: ignore
+        query_str=query.query_str,
     )
```
Likely invalid or redundant comment.
139-146: 🛠️ Refactor suggestion: Add error handling for LLM operations
Add proper error handling for LLM operations:
```diff
+    try:
         llm_response = await self._multi_modal_llm.acomplete(
             prompt=fmt_prompt,
             image_documents=[
                 image_node.node
                 for image_node in image_nodes
                 if isinstance(image_node.node, ImageNode)
             ],
         )
+    except Exception as e:
+        raise RuntimeError(f"Failed to generate multi-modal response: {str(e)}") from e
```
Likely invalid or redundant comment.
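Why `raise ... from e` in the suggested handler: the wrapper keeps the original failure reachable as `__cause__`, so tracebacks show both the friendly message and the root cause. A runnable sketch with a stubbed LLM call:

```python
def call_llm():
    # Stub standing in for the real multi-modal LLM call, which might fail
    # for API errors, rate limits, etc.
    raise ConnectionError("rate limited")

def generate_response():
    try:
        return call_llm()
    except Exception as e:
        # `from e` chains the exceptions: the traceback prints the original
        # ConnectionError followed by this RuntimeError.
        raise RuntimeError(f"Failed to generate multi-modal response: {e}") from e
```

Callers can then branch on the root cause via `err.__cause__` (for example, retrying only on rate limits) while logging the wrapped message.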
Summary by CodeRabbit

New Features
- `MultiModalSynthesizer` for enhanced multi-modal query processing, integrating both text and image nodes.
- `asynthesize` method.

Bug Fixes

Documentation