[BUG]: Failure in the langchain multimodal rag example #390

jperez999 · 2025-01-30T18:59:44Z

Version

main

Which installation method(s) does this occur on?

No response

Describe the bug.

Running the example https://github.com/NVIDIA/nv-ingest/blob/main/examples/langchain_multimodal_rag.ipynb in a brev.dev environment fails during ingest with an error like this:

'text' parameter is deprecated and will be ignored. Future versions will remove this argument.
'tables' parameter is deprecated and will be ignored. Future versions will remove this argument.
Error while processing job ID 0: ../data/multimodal_test.pdf
[]: failed
Failed to process the message.
↪ Event that caused this failure: annotation::1bf595f6-2274-4097-8956-0d9b6841ed55 -> All images must have the same dimensions for gRPC batching. Found: [(532, 963, 3), (575, 970, 3)]
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[11], line 17
      1 from nv_ingest_client.client import Ingestor
      3 ingestor = (
      4     Ingestor(message_client_hostname="localhost")
      5     .files("../data/multimodal_test.pdf")
   (...)
     14     ).vdb_upload()
     15 )
---> 17 results = ingestor.ingest()

File ~/.local/lib/python3.10/site-packages/nv_ingest_client/client/interface.py:228, in Ingestor.ingest(self, **kwargs)
    226 result = self._client.fetch_job_result(self._job_ids, **fetch_kwargs)
    227 if self._vdb_bulk_upload:
--> 228     self._vdb_bulk_upload.run(result)
    229     # only upload as part of jobs user specified this action
    230     self._vdb_bulk_upload = None

File ~/.local/lib/python3.10/site-packages/nv_ingest_client/util/milvus.py:95, in MilvusOperator.run(self, records)
     93 if isinstance(collection_name, str):
     94     create_nvingest_collection(collection_name, **create_params)
---> 95     write_to_nvingest_collection(records, collection_name, **write_params)
     96 elif isinstance(collection_name, dict):
     97     split_params_list = _dict_to_params(collection_name, write_params)

File ~/.local/lib/python3.10/site-packages/nv_ingest_client/util/milvus.py:570, in write_to_nvingest_collection(records, collection_name, milvus_uri, minio_endpoint, sparse, enable_text, enable_charts, enable_tables, enable_images, bm25_save_path, compute_bm25_stats, access_key, secret_key, bucket_name)
    568 bm25_ef = None
    569 if sparse and compute_bm25_stats:
--> 570     bm25_ef = create_bm25_model(
    571         records,
    572         enable_text=enable_text,
    573         enable_charts=enable_charts,
    574         enable_tables=enable_tables,
    575         enable_images=enable_images,
    576     )
    577     bm25_ef.save(bm25_save_path)
    578 elif sparse and not compute_bm25_stats:

File ~/.local/lib/python3.10/site-packages/nv_ingest_client/util/milvus.py:456, in create_bm25_model(records, enable_text, enable_charts, enable_tables, enable_images)
    453 analyzer = build_default_analyzer(language="en")
    454 bm25_ef = BM25EmbeddingFunction(analyzer)
--> 456 bm25_ef.fit(all_text)
    457 return bm25_ef

File ~/.local/lib/python3.10/site-packages/milvus_model/sparse/bm25/bm25.py:126, in BM25EmbeddingFunction.fit(self, corpus)
    125 def fit(self, corpus: List[str]):
--> 126     self._rebuild(corpus)

File ~/.local/lib/python3.10/site-packages/milvus_model/sparse/bm25/bm25.py:112, in BM25EmbeddingFunction._rebuild(self, corpus)
    110 self._clear()
    111 corpus = self._tokenize_corpus(corpus)
--> 112 term_document_frequencies = self._compute_statistics(corpus)
    113 self._calc_idf(term_document_frequencies)
    114 self._calc_term_indices()

File ~/.local/lib/python3.10/site-packages/milvus_model/sparse/bm25/bm25.py:80, in BM25EmbeddingFunction._compute_statistics(self, corpus)
     78         term_document_frequencies[word] += 1
     79     self.corpus_size += 1
---> 80 self.avgdl = total_word_count / self.corpus_size
     81 return term_document_frequencies

ZeroDivisionError: division by zero

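The ZeroDivisionError is raised by milvus during the sparse vector (BM25) calculation because no text chunks are passed to this stage, so `corpus_size` is 0. The real error appears to be the earlier failure, "All images must have the same dimensions for gRPC batching", which seems to be related to the resizing of images.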
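To illustrate why an empty result set ends in a ZeroDivisionError, here is a minimal sketch that mirrors the averaging step shown in the traceback, without importing milvus. The names `compute_avgdl` and `safe_fit_corpus` are hypothetical and not part of nv-ingest or milvus_model. The guard is only one possible way to surface the upstream failure earlier, not the project's actual fix.

```python
from typing import List


def compute_avgdl(tokenized_corpus: List[List[str]]) -> float:
    """Mirror of the averaging step in the traceback (hypothetical helper)."""
    total_word_count = sum(len(doc) for doc in tokenized_corpus)
    corpus_size = len(tokenized_corpus)
    # With an empty corpus this is 0 / 0 and raises ZeroDivisionError.
    return total_word_count / corpus_size


def safe_fit_corpus(all_text: List[str]) -> float:
    """Hypothetical guard: fail early with a descriptive message."""
    if not all_text:
        raise ValueError(
            "No text chunks were extracted; check earlier job failures "
            "before computing BM25 statistics."
        )
    # Whitespace tokenization is an assumption made for this sketch only.
    return compute_avgdl([text.split() for text in all_text])
```

With a guard like this, a failed extraction would be reported as a missing-text problem rather than as a division by zero deep inside the BM25 code.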

jperez999 added the bug Something isn't working label Jan 30, 2025