"RAG with unstructured data" uses documents instead of docs / unused docs variable? #183

Open
xarical opened this issue Aug 25, 2024 · 0 comments

xarical commented Aug 25, 2024

https://github.com/huggingface/cookbook/blob/main/notebooks/en/rag_with_unstructured_data.ipynb

Quote:

Setting up the retriever
This example uses ChromaDB as a vector store and BAAI/bge-base-en-v1.5 embeddings model, feel free to use any other vector store.

```python
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings

from langchain.vectorstores import utils as chromautils

# ChromaDB doesn't support complex metadata, e.g. lists, so we drop it here.
# If you're using a different vector store, you may not need to do this
docs = chromautils.filter_complex_metadata(documents)

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
vectorstore = Chroma.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
```
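For context, here is a minimal stdlib sketch of what the filtering step is described as doing: keep only metadata values that are str, bool, int, or float, and drop the rest (e.g. lists). The Document dataclass, filter_complex_metadata_sketch, and the sample metadata are hypothetical stand-ins, not LangChain's classes or the notebook's data. This version returns new objects instead of modifying its input; that is an assumption, not necessarily how LangChain's function behaves.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-in for a LangChain Document: text plus a metadata dict.
@dataclass
class Document:
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Scalar types assumed to be storable as metadata (per the notebook comment).
ALLOWED_TYPES = (str, bool, int, float)


def filter_complex_metadata_sketch(documents: list[Document]) -> list[Document]:
    """Hypothetical helper: return new Documents keeping only scalar metadata."""
    filtered = []
    for doc in documents:
        simple = {k: v for k, v in doc.metadata.items() if isinstance(v, ALLOWED_TYPES)}
        filtered.append(Document(doc.page_content, simple))
    return filtered


# Made-up sample: "languages" is a list, so it gets dropped.
documents = [
    Document("Intro paragraph",
             {"source": "guide.pdf", "page_number": 1, "languages": ["eng"]}),
]
docs = filter_complex_metadata_sketch(documents)

print(docs[0].metadata)       # {'source': 'guide.pdf', 'page_number': 1}
print(documents[0].metadata)  # the original list still has "languages"
```

In this sketch, docs and documents are different lists, so only docs is safe to hand to a store that rejects list metadata.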

Should documents be replaced with docs on the second-to-last line, i.e. vectorstore = Chroma.from_documents(docs, embeddings)? Or is this intentional? I'm not familiar with Chroma (which is why I was following this tutorial), but while working through it I wondered what the docs variable was for, since it doesn't seem to be used anywhere. docs appears to be a filtered version of documents, in which case it should be the one passed to from_documents (please correct me if that's not the case). Either way, it wasn't clear whether docs is actually used somewhere (in which case, my bad) or whether documents was meant to be replaced with docs, as I suspect.
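To illustrate why the distinction could matter, here is a small sketch under stated assumptions: a hypothetical StrictStoreSketch that, like ChromaDB as described in the notebook comment, rejects non-scalar metadata. Indexing the unfiltered documents fails, while indexing the filtered docs succeeds. None of these names (Document, filter_complex_metadata_sketch, StrictStoreSketch) are Chroma or LangChain APIs.

```python
from dataclasses import dataclass, field
from typing import Any

ALLOWED_TYPES = (str, bool, int, float)


# Hypothetical stand-in for a LangChain Document.
@dataclass
class Document:
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def filter_complex_metadata_sketch(documents: list[Document]) -> list[Document]:
    """Hypothetical helper: copy each Document, keeping only scalar metadata."""
    return [
        Document(d.page_content,
                 {k: v for k, v in d.metadata.items() if isinstance(v, ALLOWED_TYPES)})
        for d in documents
    ]


class StrictStoreSketch:
    """Hypothetical vector store that rejects non-scalar metadata values."""

    def __init__(self) -> None:
        self.rows: list[Document] = []

    @classmethod
    def from_documents(cls, documents: list[Document]) -> "StrictStoreSketch":
        store = cls()
        for doc in documents:
            for key, value in doc.metadata.items():
                if not isinstance(value, ALLOWED_TYPES):
                    raise ValueError(f"unsupported metadata {key!r}: {type(value).__name__}")
            store.rows.append(doc)
        return store


documents = [Document("Intro paragraph", {"source": "guide.pdf", "languages": ["eng"]})]
docs = filter_complex_metadata_sketch(documents)

try:
    StrictStoreSketch.from_documents(documents)  # unfiltered list: rejected
except ValueError as err:
    print("documents:", err)  # documents: unsupported metadata 'languages': list

store = StrictStoreSketch.from_documents(docs)  # filtered list: accepted
print("docs:", len(store.rows), "row(s) indexed")  # docs: 1 row(s) indexed
```

If LangChain's filter happens to modify the Document objects in place, passing documents would still work in practice, which might explain why the notebook runs without errors; passing docs is correct in either case.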
