[BUG]: Failure in the langchain multimodal rag example #390

Open
jperez999 opened this issue Jan 30, 2025 · 0 comments
Labels
bug Something isn't working

jperez999 (Collaborator) commented Jan 30, 2025

Version

main

Which installation method(s) does this occur on?

No response

Describe the bug.

Running the example https://github.com/NVIDIA/nv-ingest/blob/main/examples/langchain_multimodal_rag.ipynb in a brev.dev environment fails during ingestion with an error like this:

'text' parameter is deprecated and will be ignored. Future versions will remove this argument.
'tables' parameter is deprecated and will be ignored. Future versions will remove this argument.
Error while processing job ID 0: ../data/multimodal_test.pdf
[]: failed
Failed to process the message.
↪ Event that caused this failure: annotation::1bf595f6-2274-4097-8956-0d9b6841ed55 -> All images must have the same dimensions for gRPC batching. Found: [(532, 963, 3), (575, 970, 3)]
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[11], line 17
      1 from nv_ingest_client.client import Ingestor
      3 ingestor = (
      4     Ingestor(message_client_hostname="localhost")
      5     .files("../data/multimodal_test.pdf")
   (...)
     14     ).vdb_upload()
     15 )
---> 17 results = ingestor.ingest()

File ~/.local/lib/python3.10/site-packages/nv_ingest_client/client/interface.py:228, in Ingestor.ingest(self, **kwargs)
    226 result = self._client.fetch_job_result(self._job_ids, **fetch_kwargs)
    227 if self._vdb_bulk_upload:
--> 228     self._vdb_bulk_upload.run(result)
    229     # only upload as part of jobs user specified this action
    230     self._vdb_bulk_upload = None

File ~/.local/lib/python3.10/site-packages/nv_ingest_client/util/milvus.py:95, in MilvusOperator.run(self, records)
     93 if isinstance(collection_name, str):
     94     create_nvingest_collection(collection_name, **create_params)
---> 95     write_to_nvingest_collection(records, collection_name, **write_params)
     96 elif isinstance(collection_name, dict):
     97     split_params_list = _dict_to_params(collection_name, write_params)

File ~/.local/lib/python3.10/site-packages/nv_ingest_client/util/milvus.py:570, in write_to_nvingest_collection(records, collection_name, milvus_uri, minio_endpoint, sparse, enable_text, enable_charts, enable_tables, enable_images, bm25_save_path, compute_bm25_stats, access_key, secret_key, bucket_name)
    568 bm25_ef = None
    569 if sparse and compute_bm25_stats:
--> 570     bm25_ef = create_bm25_model(
    571         records,
    572         enable_text=enable_text,
    573         enable_charts=enable_charts,
    574         enable_tables=enable_tables,
    575         enable_images=enable_images,
    576     )
    577     bm25_ef.save(bm25_save_path)
    578 elif sparse and not compute_bm25_stats:

File ~/.local/lib/python3.10/site-packages/nv_ingest_client/util/milvus.py:456, in create_bm25_model(records, enable_text, enable_charts, enable_tables, enable_images)
    453 analyzer = build_default_analyzer(language="en")
    454 bm25_ef = BM25EmbeddingFunction(analyzer)
--> 456 bm25_ef.fit(all_text)
    457 return bm25_ef

File ~/.local/lib/python3.10/site-packages/milvus_model/sparse/bm25/bm25.py:126, in BM25EmbeddingFunction.fit(self, corpus)
    125 def fit(self, corpus: List[str]):
--> 126     self._rebuild(corpus)

File ~/.local/lib/python3.10/site-packages/milvus_model/sparse/bm25/bm25.py:112, in BM25EmbeddingFunction._rebuild(self, corpus)
    110 self._clear()
    111 corpus = self._tokenize_corpus(corpus)
--> 112 term_document_frequencies = self._compute_statistics(corpus)
    113 self._calc_idf(term_document_frequencies)
    114 self._calc_term_indices()

File ~/.local/lib/python3.10/site-packages/milvus_model/sparse/bm25/bm25.py:80, in BM25EmbeddingFunction._compute_statistics(self, corpus)
     78         term_document_frequencies[word] += 1
     79     self.corpus_size += 1
---> 80 self.avgdl = total_word_count / self.corpus_size
     81 return term_document_frequencies

ZeroDivisionError: division by zero

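The raised ZeroDivisionError comes from the BM25 fit in milvus_model during the sparse vector calculation: no chunks of text are passed to this stage, so corpus_size is still 0 when avgdl is computed. The real error seems to be the earlier job failure ("All images must have the same dimensions for gRPC batching"), which looks related to the resizing of images.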
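To illustrate the secondary failure, here is a minimal sketch of how an empty corpus leads to the division by zero, plus one possible guard. It is a simplified stand-in written for this issue, not the actual milvus_model or nv_ingest_client code; the names compute_statistics and fit_if_nonempty are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def compute_statistics(corpus: List[List[str]]) -> Tuple[Dict[str, int], float]:
    """Simplified stand-in for the failing step: document frequencies and avgdl."""
    term_document_frequencies: Dict[str, int] = defaultdict(int)
    total_word_count = 0
    corpus_size = 0
    for document in corpus:
        total_word_count += len(document)
        # Count each term once per document.
        for word in set(document):
            term_document_frequencies[word] += 1
        corpus_size += 1
    # With an empty corpus, corpus_size is 0 and this raises ZeroDivisionError.
    avgdl = total_word_count / corpus_size
    return dict(term_document_frequencies), avgdl


def fit_if_nonempty(corpus: List[List[str]]) -> Tuple[Dict[str, int], float]:
    """Hypothetical guard: fail with a descriptive message instead of dividing by zero."""
    if not corpus:
        raise ValueError(
            "No text chunks were extracted; check the ingest job for upstream "
            "failures before computing BM25 statistics."
        )
    return compute_statistics(corpus)
```

A guard like this would not fix the image batching failure, but it would surface a message that points at the failed job rather than at the BM25 statistics.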

@jperez999 jperez999 added the bug Something isn't working label Jan 30, 2025