
RAG with Retrieval of Top Documents and Answer Generation #33

Closed · wants to merge 18 commits

Conversation

mihir86 (Collaborator)

@mihir86 mihir86 commented Mar 28, 2024

Description

This pull request implements RAGLanguageModel, which retrieves the top documents for each query and passes them to the language model, so that the model generates an answer to the query given the top documents as context.
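The retrieve-then-generate flow described above can be sketched as follows. The class names mirror the PR's terminology, but the toy term-overlap scoring in SimpleDatastore is purely illustrative and stands in for the real Pyserini-backed retriever; this is a sketch, not the PR's actual implementation.

```python
class SimpleDatastore:
    """Toy document store; a real datastore would wrap a Pyserini index."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def retrieve(self, query: str, max_results: int = 1) -> list[str]:
        # Illustrative scoring: count overlapping query terms (not BM25).
        def score(text: str) -> int:
            return sum(term in text.lower() for term in query.lower().split())

        ranked = sorted(self.docs.values(), key=score, reverse=True)
        return ranked[:max_results]


class RAGLanguageModel:
    """Wrap a base LM so generation is conditioned on retrieved context."""

    def __init__(self, base, datastore, max_results: int = 1):
        self.base = base  # any callable that maps a prompt string to text
        self.datastore = datastore
        self.max_results = max_results

    def generate(self, condition: str) -> str:
        # Retrieve the top documents and join them into a single context.
        top_docs = self.datastore.retrieve(condition, self.max_results)
        context = " ".join(top_docs)
        prompt = (
            "Please answer the following question, given its context.\n"
            f"Question: {condition}\nContext: {context}\nAnswer: "
        )
        return self.base(prompt)
```

Passing an identity function as the base model makes it easy to inspect the final prompt that would be sent to the LM.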

References

@neubig neubig marked this pull request as draft March 28, 2024 13:27
@mihir86 mihir86 changed the title Retrieval of Top Documents using Pyserini RAG with Retrieval of Top Documents and Answer Generation Apr 1, 2024
@mihir86 mihir86 requested a review from neubig April 1, 2024 05:07
@neubig neubig (Contributor) left a comment


Thanks a lot! This looks good, but I would suggest moving all of the Pyserini-specific parameters into the initialization function of the datastore.

Resolved review threads (outdated):
examples/rag/rag_example.ipynb
llments/datastore/datastore.py
llments/lm/rag.py
@mihir86 mihir86 requested a review from neubig April 15, 2024 21:45
@neubig neubig (Contributor) left a comment


This looks great! I'm ready to merge if you are (it's still listed as a "draft" PR so I'm not sure if you're finished).

Comment on lines +19 to +21
base (LanguageModel): The base language model to be modified.
datastore (Datastore): The datastore object for document index.
max_results (int, optional): Maximum number of results to retrieve. Defaults to 1.

Suggested change
base (LanguageModel): The base language model to be modified.
datastore (Datastore): The datastore object for document index.
max_results (int, optional): Maximum number of results to retrieve. Defaults to 1.
base: The base language model to be modified.
datastore: The datastore object for document index.
max_results: Maximum number of results to retrieve.

Let's not put types in the comments, since they're already in the function signature and duplicating them could move the two out of sync.
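A minimal illustration of the suggested convention: the parameter types appear exactly once, in the function signature, and the docstring omits them so the two cannot drift out of sync. The function name and placeholder body are illustrative, not the PR's code.

```python
def retrieve(query: str, max_results: int = 1) -> list[str]:
    """Retrieve documents matching a query.

    Args:
        query: The search query.
        max_results: Maximum number of results to retrieve.

    Returns:
        The retrieved document contents, best match first.
    """
    # Placeholder body; a real implementation would query the datastore.
    return []
```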

"""Read JSONL file and convert it into a dictionary with document ID as keys and contents as values.

Args:
file_path (str): Path to the JSONL file.

Suggested change
file_path (str): Path to the JSONL file.
file_path: Path to the JSONL file.
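A hedged sketch of the JSONL reader this docstring describes. The "id" and "contents" field names follow Pyserini's corpus convention and are an assumption about the PR's actual data format.

```python
import json


def read_jsonl(file_path: str) -> dict[str, str]:
    """Read a JSONL file into a mapping from document ID to contents.

    Args:
        file_path: Path to the JSONL file.

    Returns:
        A dictionary with document IDs as keys and contents as values.
    """
    doc_dict: dict[str, str] = {}
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            # Each line is one JSON record; skip blank lines defensively.
            if not line.strip():
                continue
            record = json.loads(line)
            doc_dict[str(record["id"])] = record["contents"]
    return doc_dict
```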

context = ' '.join([self.doc_dict[str(key.docid)] for key in top_docs])
prompt = None
if condition is not None:
prompt = "Please answer the following question, given its context.\nQuestion: " + condition + "\nContext: " + context + "\nAnswer: "

We don't need to fix it in this PR, but could you open an issue to make the prompt for RAGLanguageModel configurable?
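One possible shape for the configurable prompt the review asks for, sketched with illustrative names (DEFAULT_TEMPLATE, build_prompt) that are not part of the PR: the template becomes a constructor argument instead of a hard-coded string.

```python
# Default template matching the prompt currently hard-coded in the PR;
# callers can override it with any template using the same placeholders.
DEFAULT_TEMPLATE = (
    "Please answer the following question, given its context.\n"
    "Question: {question}\nContext: {context}\nAnswer: "
)


def build_prompt(question: str, context: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill a prompt template with the question and retrieved context."""
    return template.format(question=question, context=context)
```

RAGLanguageModel could then accept `template` in `__init__` and call `build_prompt` in its generation path, keeping the default behavior unchanged.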

@mihir86 mihir86 closed this Apr 16, 2024