This repository has been archived by the owner on Nov 13, 2024. It is now read-only.

context engine docstring #130

Merged
merged 3 commits into from
Nov 1, 2023
Changes from 2 commits
43 changes: 43 additions & 0 deletions src/canopy/context_engine/context_engine.py
@@ -24,6 +24,25 @@ async def aquery(self, queries: List[Query], max_context_tokens: int, ) -> Conte


class ContextEngine(BaseContextEngine):
"""
ContextEngine is responsible for providing context to the LLM, given a set of search queries.

Once called with a set of queries, the ContextEngine will go through the following steps:
1. Query the knowledge base for relevant documents
2. Build a context from the documents retrieved that can be injected into the LLM prompt

The context engine considers token budgeting when building the context, and tries to maximize the amount of relevant information that can be provided to the LLM within the token budget.
Comment on lines +28 to +34
Contributor
Love it! 😍


To create a context engine, you must provide a knowledge base and optionally a context builder.

Contributor
Maybe we should also mention that the returned Context object could be either structured or unstructured, but it always supports a .to_text() method dumping it as formatted text?

Contributor Author
IMO it's too deep to put in this layer of docstrings. I think it belongs in the context builder, because it's relevant only once you want to customize it.

Example:
>>> from canopy.context_engine import ContextEngine
>>> from canopy.models.data_models import Query
>>> context_engine = ContextEngine(knowledge_base=knowledge_base)
>>> context_engine.query([Query(text="What is the capital of France?")], max_context_tokens=1000)

To create a knowledge base, see the documentation for the knowledge base module (canopy.knowledge_base.knowledge_base).
""" # noqa: E501

_DEFAULT_COMPONENTS = {
'knowledge_base': KnowledgeBase,
@@ -36,6 +55,14 @@ def __init__(self,
context_builder: Optional[ContextBuilder] = None,
global_metadata_filter: Optional[dict] = None
):
"""
Initialize a new ContextEngine.

Args:
knowledge_base: The knowledge base to query for retrieving documents
context_builder: The context builder to use for building the context from the retrieved documents
acatav marked this conversation as resolved.
global_metadata_filter: A metadata filter to apply to all queries. See: https://docs.pinecone.io/docs/metadata-filtering
""" # noqa: E501

if not isinstance(knowledge_base, BaseKnowledgeBase):
raise TypeError("knowledge_base must be an instance of BaseKnowledgeBase, "
@@ -55,6 +82,22 @@ def __init__(self,
self.global_metadata_filter = global_metadata_filter

def query(self, queries: List[Query], max_context_tokens: int, ) -> Context:
"""
Query the knowledge base for relevant documents and build a context from the retrieved documents that can be injected into the LLM prompt.

Args:
queries: A list of queries to use for retrieving documents from the knowledge base
max_context_tokens: The maximum number of tokens to use for the context

Returns:
A Context object containing the retrieved documents and metadata

Example:
>>> from canopy.context_engine import ContextEngine
>>> from canopy.models.data_models import Query
>>> context_engine = ContextEngine(knowledge_base=knowledge_base)
>>> context_engine.query([Query(text="What is the capital of France?")], max_context_tokens=1000)
""" # noqa: E501
query_results = self.knowledge_base.query(
queries,
global_metadata_filter=self.global_metadata_filter)
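
The token budgeting mentioned in the class docstring can be illustrated with a rough sketch. This is not canopy's actual context builder: `build_context`, `count_tokens`, and the whitespace-based token count are hypothetical stand-ins chosen for illustration, and the greedy "skip what doesn't fit" policy is an assumption about one reasonable way to stay within `max_context_tokens`.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Hypothetical tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_context(
    documents: List[str],
    max_context_tokens: int,
    token_counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Greedily keep documents, in ranked order, while they fit the budget."""
    selected: List[str] = []
    used = 0
    for doc in documents:  # assumed to be sorted by relevance, best first
        cost = token_counter(doc)
        if used + cost > max_context_tokens:
            continue  # too large; a shorter, lower-ranked document may still fit
        selected.append(doc)
        used += cost
    return selected


ranked_docs = [
    "Paris is the capital of France.",
    "France is a country in Western Europe with a long history.",
    "The Eiffel Tower is in Paris.",
]
context = build_context(ranked_docs, max_context_tokens=12)
```

With a budget of 12 "tokens", the second document is skipped because it would overflow the budget, while the shorter third document still fits. A real implementation would use the LLM's own tokenizer rather than word counts.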