Search prod ready #114 (Closed)

Changes from all 33 commits:
5cff1a2  Search Tool subclass (jacobvm04)
1ec91dd  Cleaning up search.py (jacobvm04)
eb8e576  Embedding page summarization (jacobvm04)
5dc9464  Search documentation (jacobvm04)
04e568a  Bloom integration + async refactor + local embeddings + reranking (jacobvm04)
7d8cac6  Embedding Search improvements (jacobvm04)
c26f948  Remove Todo 😎 (jacobvm04)
aa6b520  Update README and .env template (jacobvm04)
5780b78  await BloomChain in web (jacobvm04)
ccc5404  Update read me + gitignore (jacobvm04)
21ac962  Merge remote-tracking branch 'plastic-labs/staging' into staging (jacobvm04)
6d78022  Merge branch 'staging' into search-tool (jacobvm04)
818878a  Update poetry lock file (jacobvm04)
6491388  Quick refactors (jacobvm04)
aa1d717  Web UI Search fixes (jacobvm04)
f41f5c0  Search switch to gpt-3.5-turbo-instruct (jacobvm04)
a634abe  add faiss (hyusap)
976b03a  Merge pull request #4 from plastic-labs/search-tool (hyusap)
d777f11  Merge pull request #102 from jacobvm04/search-tool (hyusap)
156d373  Switch to openai embeddings (jacobvm04)
db2b902  Merge branch 'staging' into search-prod-ready (hyusap)
0edee99  lock update (hyusap)
4dcf397  tool llm name change (jacobvm04)
a132fea  Changing response prompt instructions for the search_ready prompt (jacobvm04)
9d0b71c  Refactoring the search tool, updating sync run (jacobvm04)
33478d6  Refactor search to be synchronous (jacobvm04)
a3c045b  Decouple search logic from respond (jacobvm04)
ece5151  Azure compatability (VVoruganti)
dac800f  Async refactor (VVoruganti)
67e3b2b  fix: need gcc, using build-essential to future proof (vintrocode)
04fc6e4  implement autoscroll (#112) (hyusap)
ac4addd  Search debugging! (jacobvm04)
d130b8e  Remove unnecessary playwright dependency (VVoruganti)
The diff adds a single new file (146 lines) implementing the search tool:

```python
import asyncio
import logging
import os
from typing import Optional, Type

from dotenv import load_dotenv
from langchain.callbacks.manager import (
    AsyncCallbackManagerForToolRun,
    CallbackManagerForToolRun,
)
from langchain.chains import LLMChain
from langchain.docstore.document import Document
from langchain.document_loaders import AsyncHtmlLoader
from langchain.document_transformers import Html2TextTransformer
from langchain.embeddings.base import Embeddings
from langchain.llms.base import BaseLLM
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from langchain.prompts import Prompt
from langchain.text_splitter import TokenTextSplitter
from langchain.tools.base import BaseTool
from langchain.utilities import GoogleSerperAPIWrapper
from langchain.vectorstores import FAISS

logger = logging.getLogger(__name__)
load_dotenv()  # Load environment variables

# import nest_asyncio
# nest_asyncio.apply()  # https://github.com/erdewit/nest_asyncio

# TODO: Store search results for entire conversation in vector store
# TODO: Add answerbox to search results when available


class SearchTool(BaseTool):
    name: str = "search"
    description: str = "useful for when you need to search for something on the internet"
    llm: BaseLLM
    embeddings: Embeddings
    search: GoogleSerperAPIWrapper

    @classmethod
    def from_llm(cls, llm: BaseLLM, embeddings: Embeddings):
        """Return a tool from a chat model."""
        search = GoogleSerperAPIWrapper()
        search.k = 3

        if os.environ.get("USE_RERANKER") == "true":
            from FlagEmbedding import FlagReranker

            model = "BAAI/bge-reranker-base"
            # Stored on the class so _aresearch_url can detect it via hasattr
            cls.reranker = FlagReranker(model)
            logger.info(f'Loaded reranker "{model}" for webpage search')

        return cls(llm=llm, embeddings=embeddings, search=search)

    def _run(
        self, query: str, run_manager: Optional[CallbackManagerForToolRun] = None
    ) -> str:
        """Use the tool."""
        return asyncio.run(self._arun(query=query))

    async def _arun(
        self, query: str, run_manager: Optional[AsyncCallbackManagerForToolRun] = None
    ) -> str:
        """Use the tool asynchronously."""
        # Remove surrounding quotes from the query if present
        if query[0] == '"' and query[-1] == '"':
            query = query[1:-1]

        results = await self.search.aresults(query=query)
        organic_results = results["organic"]

        # Research every organic result concurrently
        summaries = await asyncio.gather(
            *[self._aresearch_url(result["link"], query) for result in organic_results]
        )
        relevant_results = [
            {
                "title": result["title"],
                "snippet": result["snippet"],
                "link": result["link"],
                "summary": summary,
            }
            for result, summary in zip(organic_results, summaries)
        ]

        formatted_results = [
            f"{result['title']} - {result['link']}\nSnippet: {result['snippet']}\nPage Summary: {result['summary']}"
            for result in relevant_results
        ]
        return "Search Results:\n" + "\n----------------------\n\n".join(formatted_results)

    async def _aresearch_url(self, url: str, query: str):
        """Research a URL by embedding the web page, then use the sections most
        relevant to the query to generate a summary of the most important
        information on the page."""
        prompt = Prompt.from_template(
            "Your job is to summarize the information on the web page AS IT PERTAINS TO THE QUERY. "
            "You will be given a few selected sections of the web page to base your answer off of. "
            "\n\nQuestion: {query}\n\nBEGIN SELECTIONS\n{doc}\nEND SELECTIONS"
        )
        llm_chain = LLMChain(llm=self.llm, prompt=prompt)

        try:
            # Load HTML and convert it to plain-text chunks
            loader = AsyncHtmlLoader([url])
            html2text = Html2TextTransformer()
            text_splitter = TokenTextSplitter(chunk_size=300, chunk_overlap=0)

            html = loader.load()
            docs = html2text.transform_documents(html)
            docs = text_splitter.split_documents(docs)

            # Embedding search
            db = FAISS.from_documents(docs, self.embeddings)
            # Query prefix is used per instructions: https://github.com/FlagOpen/FlagEmbedding
            relevant_sections = await db.asimilarity_search(
                query="Represent this sentence for searching relevant passages: " + query,
                k=12,
            )

            # Rerank
            if hasattr(self, "reranker"):
                scores = self.reranker.compute_score(
                    [[query, section.page_content] for section in relevant_sections]
                )
                # If there's only one section, scores is a single score, not a list
                if isinstance(scores, float):
                    scores = [scores]

                scores_with_index = sorted(
                    zip(scores, range(len(scores))), key=lambda x: x[0], reverse=True
                )
                relevant_sections = [relevant_sections[index] for _score, index in scores_with_index]

                logger.info(
                    "Reranked webpage sections, different from original order: "
                    + str([index for _score, index in scores_with_index])
                    + " Chunk count: " + str(len(docs))
                )

            # Format the top sections together as input to the LLM
            relevant_sections = "\n".join(f'"{section.page_content}"' for section in relevant_sections[:3])

            # Summarize the relevant sections
            summary = await llm_chain.arun({"query": query, "doc": relevant_sections})
            return summary
        except Exception as e:
            logger.error("Error loading HTML: %s", e)
            return f"Error loading HTML: {e}"
```
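For orientation, a minimal usage sketch of the tool. The concrete model classes below are assumptions for illustration, not taken from this diff (though the commit log mentions gpt-3.5-turbo-instruct and OpenAI embeddings); `GoogleSerperAPIWrapper` additionally expects a `SERPER_API_KEY` environment variable.

```python
# Hypothetical usage sketch -- model choices are assumptions, not from this PR.
# Requires OPENAI_API_KEY and SERPER_API_KEY in the environment.
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI

llm = OpenAI(model_name="gpt-3.5-turbo-instruct")
embeddings = OpenAIEmbeddings()

# from_llm wires the Serper search wrapper (and the optional BGE reranker,
# when USE_RERANKER=true) around the provided models.
tool = SearchTool.from_llm(llm=llm, embeddings=embeddings)

# BaseTool.run drives the synchronous path, which wraps _arun via asyncio.run.
print(tool.run("latest FAISS release notes"))
```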
The module ends with structured-output schemas for generating a search query and for deciding whether a search is needed:

```python
search_generation_schemas = [
    ResponseSchema(
        name="Reasoning",
        description="Reasoning behind what google query would be best to find information to answer the question",
    ),
    ResponseSchema(
        name="Search Query",
        description="The google query that would be best to find information to answer the question. DO NOT USE ANY QUOTES OR OTHER SPECIAL CHARACTERS ANYWHERE.",
    ),
]
search_generation_output_parser = StructuredOutputParser.from_response_schemas(search_generation_schemas)

search_ready_schemas = [
    ResponseSchema(
        name="Reasoning",
        description="Reasoning behind whether or not a google search would be necessary to effectively answer the question.",
    ),
    ResponseSchema(
        name="Search",
        description="<true/false> whether or not a google search should be used to find information to answer the question.",
    ),
]
search_ready_output_parser = StructuredOutputParser.from_response_schemas(search_ready_schemas)
```
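A sketch of how these parsers are typically consumed: the parser supplies formatting instructions for the prompt and then parses the model's reply into a dict. The prompt wording and question below are placeholders, not the PR's actual search_ready prompt.

```python
# Hypothetical sketch of consuming the search_ready parser above; the prompt
# wording is a placeholder, not the PR's actual prompt.
from langchain.llms import OpenAI
from langchain.prompts import Prompt

llm = OpenAI(model_name="gpt-3.5-turbo-instruct")  # assumed model, as above

format_instructions = search_ready_output_parser.get_format_instructions()
prompt = Prompt.from_template(
    "Decide whether a Google search is needed to answer the question.\n"
    "{format_instructions}\n\nQuestion: {question}"
)
text = prompt.format(
    format_instructions=format_instructions,
    question="Who maintains the FlagEmbedding rerankers?",
)

# parse() returns a dict keyed by the ResponseSchema names
result = search_ready_output_parser.parse(llm(text))
print(result["Search"], "-", result["Reasoning"])
```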
Review conversation:
Comment: The inferences for the search tool are all done sequentially, so there isn't any added benefit to running it with async, and it changes the interface everywhere the respond method is used. For consistency we should remove those. I'm also wondering whether the logic for tools should be encoded within the respond method, or whether the BloomChain class should have a set of methods for tool support that we then add to the chat and stream methods.
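A rough sketch of that second option. This is entirely hypothetical: BloomChain's actual interface isn't shown in this PR, and every name below is invented for illustration.

```python
# Entirely hypothetical sketch of "a set of methods for tool support" on
# BloomChain -- not code from this PR; all names are invented for illustration.
from typing import Optional


class BloomChain:
    def __init__(self, llm, search_tool):
        self.llm = llm
        self.search_tool = search_tool

    def should_search(self, question: str) -> bool:
        # Placeholder heuristic; the real decision would come from the
        # search_ready prompt and its structured output parser.
        return question.endswith("?")

    def maybe_search(self, question: str) -> Optional[str]:
        # chat() and stream() would both call this helper before responding,
        # keeping tool logic out of respond() itself.
        if self.should_search(question):
            return self.search_tool.run(question)
        return None
```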
Reply: Hm, I did include a synchronous implementation for the search tool, but right now it's set up to use the async implementation, which runs each inference concurrently and is much faster in my testing. I agree on the interface issue; do you think using threads instead here might be cleaner?
Reply: I think for now, nest_asyncio should work to make the search_tool call sync. It seems like LangChain doesn't really work with threading yet.
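For reference, a minimal sketch of that nest_asyncio approach. The helper function and its placement are assumptions, not from this PR.

```python
# Minimal sketch of the nest_asyncio approach, assuming it is applied once at
# startup; the run_search_sync helper is invented for illustration.
import asyncio

import nest_asyncio

nest_asyncio.apply()  # allow asyncio.run() even when an event loop is already running


def run_search_sync(search_tool, query: str) -> str:
    # Without nest_asyncio, calling asyncio.run() from code already inside an
    # event loop (e.g. a web handler) raises a RuntimeError.
    return asyncio.run(search_tool._arun(query=query))
```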
Reply: I've also decoupled the search logic from respond, so feel free to let me know your thoughts on that too!