This repository has been archived by the owner on Nov 13, 2024. It is now read-only.

Commit

Merge remote-tracking branch 'origin/main' into include-config-in-pacakge
acatav committed Feb 8, 2024
2 parents 4302e60 + a2cf274 commit 65b85ee
Showing 19 changed files with 1,373 additions and 14 deletions.
19 changes: 9 additions & 10 deletions README.md
@@ -220,7 +220,7 @@ This will open a similar chat interface window, but will show both the RAG and n

## Considerations

* Canopy currently only supports OpenAI as the backend for both the embedding model and the LLM. Rate limits and pricing set by OpenAI will apply.
* Rate limits and pricing set by model providers apply to Canopy usage. Canopy currently works with OpenAI, Azure OpenAI, Anyscale, and Cohere models.
* More integrations will be supported in the near future.

## Contributing
@@ -248,19 +248,18 @@ client = OpenAI(base_url="http://localhost:8000/v1/my-namespace")

### Running Canopy server in production

Canopy uses FastAPI as the web framework and Uvicorn as the ASGI server. It is recommended to use Gunicorn as the production server, mainly because it supports multiple worker processes and can handle multiple requests in parallel; more details can be found [here](https://www.uvicorn.org/deployment/#using-a-process-manager).

To run the canopy server for production, please run:
Canopy uses FastAPI as the web framework and Uvicorn as the ASGI server.
To use Canopy in production, it is recommended to run Canopy's Docker image, available on
[GitHub Packages](https://github.com/pinecone-io/canopy/pkgs/container/canopy).
For guidance on deploying Canopy on the Google Cloud Platform (GCP), refer to the example provided in the
[Deployment to GCP](docs/deployment-gcp.md) documentation.

Alternatively, you can use Gunicorn as a production-grade process manager for Uvicorn workers; more details [here](https://www.uvicorn.org/deployment/#using-a-process-manager).
Set your desired `PORT` and `WORKER_COUNT` environment variables, and start the server with:
```bash
gunicorn canopy_server.app:app --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:PORT --workers WORKER_COUNT
gunicorn canopy_server.app:app --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:$PORT --workers $WORKER_COUNT
```

Alternatively, consider utilizing the Canopy Docker image available on [GitHub Packages](https://github.com/pinecone-io/canopy/pkgs/container/canopy)
for your production needs. For guidance on deploying Canopy on the Google Cloud Platform (GCP), refer to the example provided in the
[Deployment to GCP](docs/deployment-gcp.md) documentation.


> [!IMPORTANT]
> The server interacts with services like Pinecone and OpenAI using your own authentication credentials.
When deploying the server on a public web hosting provider, it is recommended to enable an authentication mechanism,
6 changes: 4 additions & 2 deletions pyproject.toml
@@ -18,7 +18,7 @@ openai = "^1.2.3"
tiktoken = "^0.3.3"
pydantic = "^1.10.7"
pandas-stubs = "^2.0.3.230814"
fastapi = ">=0.92.0, <1.0.0"
fastapi = ">=0.93.0, <1.0.0"
uvicorn = ">=0.20.0, <1.0.0"
tenacity = "^8.2.1"
sse-starlette = "^1.6.5"
@@ -77,8 +77,10 @@ module = [
'pinecone_datasets',
'pinecone',
'transformers.*',
'tokenizers.*',
'cohere.*',
'pinecone.grpc'
'pinecone.grpc',
'huggingface_hub.utils'
]
ignore_missing_imports = true

1 change: 1 addition & 0 deletions src/canopy/chat_engine/query_generator/__init__.py
@@ -2,3 +2,4 @@
from .function_calling import FunctionCallingQueryGenerator
from .last_message import LastMessageQueryGenerator
from .instruction import InstructionQueryGenerator
from .cohere import CohereQueryGenerator
42 changes: 42 additions & 0 deletions src/canopy/chat_engine/query_generator/cohere.py
@@ -0,0 +1,42 @@
from typing import List, Optional, cast

from canopy.chat_engine.query_generator import QueryGenerator
from canopy.chat_engine.history_pruner.raising import RaisingHistoryPruner
from canopy.llm import BaseLLM, CohereLLM
from canopy.models.data_models import Messages, Query


class CohereQueryGenerator(QueryGenerator):
"""
Query generator for LLM clients that have a built-in feature to
generate search queries from chat messages.
"""
_DEFAULT_COMPONENTS = {
"llm": CohereLLM,
}

def __init__(self,
*,
llm: Optional[BaseLLM] = None):
self._llm = llm or self._DEFAULT_COMPONENTS["llm"]()

if not isinstance(self._llm, CohereLLM):
raise NotImplementedError(
"CohereQueryGenerator only compatible with CohereLLM"
)

self._history_pruner = RaisingHistoryPruner()

def generate(self,
messages: Messages,
max_prompt_tokens: int) -> List[Query]:
messages = self._history_pruner.build(chat_history=messages,
max_tokens=max_prompt_tokens)
llm = cast(CohereLLM, self._llm)
queries = llm.generate_search_queries(messages)
return [Query(text=query) for query in queries]

async def agenerate(self,
messages: Messages,
max_prompt_tokens: int) -> List[Query]:
raise NotImplementedError
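
As a quick illustration of the new generator, here is a minimal usage sketch. It assumes a Cohere API key is available in the environment (so the default `CohereLLM` can be constructed without arguments) and that `UserMessage` from `canopy.models.data_models` is used to build the chat history; the question text is invented for the example.

```python
from canopy.chat_engine.query_generator import CohereQueryGenerator
from canopy.models.data_models import UserMessage

# Assumes a Cohere API key is configured in the environment,
# so the default CohereLLM can be created with no arguments.
query_generator = CohereQueryGenerator()

chat_history = [
    UserMessage(content="How do I create a serverless index in Pinecone?"),
]

# Cohere's chat API generates search queries directly from the chat
# history; max_prompt_tokens bounds the history passed to the model.
queries = query_generator.generate(chat_history, max_prompt_tokens=2048)
print([query.text for query in queries])
```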
83 changes: 83 additions & 0 deletions src/canopy/config/cohere.yaml
@@ -0,0 +1,83 @@
# ==================================================================
# Configuration file for Canopy Server with Cohere.
# ==================================================================

# ---------------------------------------------------------------------------------
system_prompt: &system_prompt |
  Use the documents to answer the user question in the next messages. The documents are retrieved from a knowledge
  database and you should use only the facts from the documents to answer. Always remember to include the source of
  the documents you used, from their 'source' field, in the format 'Source: $SOURCE_HERE'.
  If you don't know the answer, just say that you don't know; don't try to make up an answer, use the documents.
  Don't address the documents directly, but use them to answer the user question as if it were your own knowledge.


# -------------------------------------------------------------------------------------------
# Tokenizer configuration
# -------------------------------------------------------------------------------------------
tokenizer:
type: CohereHFTokenizer
params:
model_name: Cohere/Command-nightly


# -------------------------------------------------------------------------------------------------------------
# Chat engine configuration
# -------------------------------------------------------------------------------------------------------------
chat_engine:
params:
system_prompt: *system_prompt

# -------------------------------------------------------------------------------------------------------------
# LLM configuration
# -------------------------------------------------------------------------------------------------------------
llm: &llm
type: CohereLLM
params:
model_name: command
# You can add any additional parameters which are supported by the Cohere Co.Chat API. The values set
# here will be used in every Co.Chat API call. For example:
# prompt_truncation: "AUTO"
# citation_quality: "accurate"
# temperature: 0.85
# Specifying connectors is contrary to Canopy's purpose of searching the Pinecone knowledge base only,
# but technically can still be passed like this:
# connectors:
# - "web-search"
# Uncomment to suppress errors when unrecognized or unsupported model params are sent to CohereLLM.
# ignore_unrecognized_params: true

# --------------------------------------------------------------------
# Configuration for the QueryBuilder subcomponent of the chat engine.
# --------------------------------------------------------------------
query_builder:
type: CohereQueryGenerator
params: {}
llm:
<<: *llm


# -------------------------------------------------------------------------------------------------------------
# ContextEngine configuration
# -------------------------------------------------------------------------------------------------------------
context_engine:
# -----------------------------------------------------------------------------------------------------------
# KnowledgeBase configuration
# -----------------------------------------------------------------------------------------------------------
knowledge_base:
params:
default_top_k: 100

# --------------------------------------------------------------------------
# Configuration for the RecordEncoder subcomponent of the knowledge base.
# --------------------------------------------------------------------------
record_encoder:
type: CohereRecordEncoder
params:
model_name: # The name of the model to use for encoding
"embed-english-v3.0"
batch_size: 100 # The number of document chunks to encode in each call to the encoding model

reranker:
type: CohereReranker
params:
top_n: 5
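
This file is normally consumed by the Canopy server and CLI, but the sketch below shows roughly how it could be wired up programmatically. Treat it as an assumption-laden illustration: it presumes the `Tokenizer.initialize_from_config` and `ChatEngine.from_config` factories accept these sections as-is, that the attribute path down to the knowledge base is as shown, and that `PINECONE_API_KEY`, `INDEX_NAME`, and a Cohere API key are set in the environment.

```python
import yaml

from canopy.tokenizer import Tokenizer
from canopy.chat_engine import ChatEngine

# Load the Cohere config shipped with the package (path assumed).
with open("src/canopy/config/cohere.yaml") as f:
    config = yaml.safe_load(f)

# Initialize the singleton tokenizer from the 'tokenizer' section, then
# build the full chat engine (LLM, query builder, context engine,
# knowledge base) from the 'chat_engine' section.
Tokenizer.initialize_from_config(config.get("tokenizer", {}))
chat_engine = ChatEngine.from_config(config["chat_engine"])

# The knowledge base still needs to be connected to an existing
# Pinecone index before serving queries (attribute path assumed).
chat_engine.context_engine.knowledge_base.connect()
```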
1 change: 1 addition & 0 deletions src/canopy/knowledge_base/record_encoder/__init__.py
@@ -5,4 +5,5 @@
from .anyscale import AnyscaleRecordEncoder
from .azure_openai import AzureOpenAIRecordEncoder
from .jina import JinaRecordEncoder
from .sentence_transformers import SentenceTransformerRecordEncoder
from .hybrid import HybridRecordEncoder
2 changes: 1 addition & 1 deletion src/canopy/knowledge_base/record_encoder/dense.py
@@ -10,7 +10,7 @@
class DenseRecordEncoder(RecordEncoder):
"""
    DenseRecordEncoder is a subclass of RecordEncoder that generates dense vector representations of document chunks and textual queries.
The dense represntation generated by the `DenseRecordEncoder` is a list of floats in a given dimension.
The dense representation generated by the `DenseRecordEncoder` is a list of floats in a given dimension.
DenseRecordEncoder wraps a BaseDenseEncoder from the `pinecone-text` library to encode the text itself.
for more information about the BaseDenseEncoder see: https://github.com/pinecone-io/pinecone-text
""" # noqa: E501
57 changes: 57 additions & 0 deletions src/canopy/knowledge_base/record_encoder/sentence_transformers.py
@@ -0,0 +1,57 @@
from typing import Optional
from pinecone_text.dense import SentenceTransformerEncoder
from canopy.knowledge_base.record_encoder.dense import DenseRecordEncoder
from huggingface_hub.utils import RepositoryNotFoundError


class SentenceTransformerRecordEncoder(DenseRecordEncoder):
"""
SentenceTransformerRecordEncoder is a type of DenseRecordEncoder that uses a Sentence Transformer model.
The implementation uses the `SentenceTransformerEncoder` class from the `pinecone-text` library.
    For more information, see: https://github.com/pinecone-io/pinecone-text
""" # noqa: E501

def __init__(self,
*,
model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
query_encoder_name: Optional[str] = None,
batch_size: int = 400,
device: Optional[str] = None,
**kwargs) -> None:
"""
Initialize the SentenceTransformerRecordEncoder
Args:
model_name: The name of the embedding model to use for encoding documents.
See https://huggingface.co/models?library=sentence-transformers
for all possible Sentence Transformer models.
query_encoder_name: The name of the embedding model to use for encoding queries.
See https://huggingface.co/models?library=sentence-transformers
for all possible Sentence Transformer models.
Defaults to `model_name`.
batch_size: The number of documents or queries to encode at once.
Defaults to 400.
device: The local device to use for encoding, for example "cpu", "cuda" or "mps".
Defaults to "cuda" if cuda is available, otherwise to "cpu".
**kwargs: Additional arguments to pass to the underlying `pinecone-text.SentenceTransformerEncoder`.
""" # noqa: E501
try:
encoder = SentenceTransformerEncoder(
document_encoder_name=model_name,
query_encoder_name=query_encoder_name,
device=device,
**kwargs,
)
except RepositoryNotFoundError as e:
raise RuntimeError(
"Your chosen Sentence Transformer model(s) could not be found. "
f"Details: {str(e)}"
) from e
except ImportError:
raise ImportError(
f"{self.__class__.__name__} requires the `torch` and `transformers` "
f"extra dependencies. Please install them using "
f"`pip install canopy-sdk[torch,transformers]`."
)
super().__init__(dense_encoder=encoder, batch_size=batch_size)
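
A minimal usage sketch of the new encoder follows. It assumes the optional `torch`/`transformers` extras are installed and that `encode_queries` behaves as with the other dense encoders, returning `KBQuery` objects whose dense vector is in `values`; the query text is invented for the example.

```python
from canopy.knowledge_base.record_encoder import SentenceTransformerRecordEncoder
from canopy.models.data_models import Query

# Requires the optional extras: pip install canopy-sdk[torch,transformers]
# The default model runs locally, so no API key is needed.
encoder = SentenceTransformerRecordEncoder(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    batch_size=64,
)

# Encode a query; the resulting KBQuery carries the dense vector,
# whose dimension is determined by the chosen model (384 for MiniLM-L6).
encoded = encoder.encode_queries([Query(text="What is a vector database?")])
print(len(encoded[0].values))
```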
1 change: 1 addition & 0 deletions src/canopy/llm/__init__.py
@@ -2,3 +2,4 @@
from .openai import OpenAILLM
from .anyscale import AnyscaleLLM
from .azure_openai_llm import AzureOpenAILLM
from .cohere import CohereLLM
