
RAG with Retrieval of Top Documents and Answer Generation #33

Closed · wants to merge 18 commits

Conversation

mihir86 (Collaborator)

@mihir86 mihir86 commented Mar 28, 2024

Description

This pull request implements RAGLanguageModel, which retrieves the top documents for each query and passes them to the language model, so that the model generates an answer to the query given the top documents as context.
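The retrieve-then-generate flow described above can be sketched as follows. The class names mirror the PR's terminology, but the toy term-overlap scoring in SimpleDatastore is purely illustrative and stands in for the real Pyserini-backed retriever; this is a sketch, not the PR's actual implementation.

```python
class SimpleDatastore:
    """Toy document store; a real datastore would wrap a Pyserini index."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def retrieve(self, query: str, max_results: int = 1) -> list[str]:
        # Illustrative scoring: count overlapping query terms (not BM25).
        def score(text: str) -> int:
            return sum(term in text.lower() for term in query.lower().split())

        ranked = sorted(self.docs.values(), key=score, reverse=True)
        return ranked[:max_results]


class RAGLanguageModel:
    """Wrap a base LM so generation is conditioned on retrieved context."""

    def __init__(self, base, datastore, max_results: int = 1):
        self.base = base  # any callable that maps a prompt string to text
        self.datastore = datastore
        self.max_results = max_results

    def generate(self, condition: str) -> str:
        # Retrieve the top documents and join them into a single context.
        top_docs = self.datastore.retrieve(condition, self.max_results)
        context = " ".join(top_docs)
        prompt = (
            "Please answer the following question, given its context.\n"
            f"Question: {condition}\nContext: {context}\nAnswer: "
        )
        return self.base(prompt)
```

Passing an identity function as the base model makes it easy to inspect the final prompt that would be sent to the LM.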

References

@neubig neubig marked this pull request as draft March 28, 2024 13:27
@mihir86 mihir86 changed the title Retrieval of Top Documents using Pyserini RAG with Retrieval of Top Documents and Answer Generation Apr 1, 2024
@mihir86 mihir86 requested a review from neubig April 1, 2024 05:07
@neubig neubig (Contributor) left a comment


Thanks a lot! This looks good, but I would suggest moving all of the Pyserini-specific parameters into the initialization function of the datastore.

Resolved review threads (outdated):
examples/rag/rag_example.ipynb
llments/datastore/datastore.py
llments/lm/rag.py
@mihir86 mihir86 requested a review from neubig April 15, 2024 21:45
@neubig neubig (Contributor) left a comment


This looks great! I'm ready to merge if you are (it's still listed as a "draft" PR so I'm not sure if you're finished).

Comment on lines +19 to +21
base (LanguageModel): The base language model to be modified.
datastore (Datastore): The datastore object for document index.
max_results (int, optional): Maximum number of results to retrieve. Defaults to 1.

Suggested change
base (LanguageModel): The base language model to be modified.
datastore (Datastore): The datastore object for document index.
max_results (int, optional): Maximum number of results to retrieve. Defaults to 1.
base: The base language model to be modified.
datastore: The datastore object for document index.
max_results: Maximum number of results to retrieve.

Let's not put types in the comments, since they're already in the function signature and duplicating them could move the two out of sync.
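A minimal illustration of the suggested convention: the parameter types appear exactly once, in the function signature, and the docstring omits them so the two cannot drift out of sync. The function name and placeholder body are illustrative, not the PR's code.

```python
def retrieve(query: str, max_results: int = 1) -> list[str]:
    """Retrieve documents matching a query.

    Args:
        query: The search query.
        max_results: Maximum number of results to retrieve.

    Returns:
        The retrieved document contents, best match first.
    """
    # Placeholder body; a real implementation would query the datastore.
    return []
```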

"""Read JSONL file and convert it into a dictionary with document ID as keys and contents as values.

Args:
file_path (str): Path to the JSONL file.

Suggested change
file_path (str): Path to the JSONL file.
file_path: Path to the JSONL file.
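A hedged sketch of the JSONL reader this docstring describes. The "id" and "contents" field names follow Pyserini's corpus convention and are an assumption about the PR's actual data format.

```python
import json


def read_jsonl(file_path: str) -> dict[str, str]:
    """Read a JSONL file into a mapping from document ID to contents.

    Args:
        file_path: Path to the JSONL file.

    Returns:
        A dictionary with document IDs as keys and contents as values.
    """
    doc_dict: dict[str, str] = {}
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            # Each line is one JSON record; skip blank lines defensively.
            if not line.strip():
                continue
            record = json.loads(line)
            doc_dict[str(record["id"])] = record["contents"]
    return doc_dict
```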

context = ' '.join([self.doc_dict[str(key.docid)] for key in top_docs])
prompt = None
if condition is not None:
prompt = "Please answer the following question, given its context.\nQuestion: " + condition + "\nContext: " + context + "\nAnswer: "

We don't need to fix it in this PR, but could you open an issue to make the prompt for RAGLanguageModel configurable?
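One possible shape for the configurable prompt the review asks for, sketched with illustrative names (DEFAULT_TEMPLATE, build_prompt) that are not part of the PR: the template becomes a constructor argument instead of a hard-coded string.

```python
# Default template matching the prompt currently hard-coded in the PR;
# callers can override it with any template using the same placeholders.
DEFAULT_TEMPLATE = (
    "Please answer the following question, given its context.\n"
    "Question: {question}\nContext: {context}\nAnswer: "
)


def build_prompt(question: str, context: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill a prompt template with the question and retrieved context."""
    return template.format(question=question, context=context)
```

RAGLanguageModel could then accept `template` in `__init__` and call `build_prompt` in its generation path, keeping the default behavior unchanged.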

@mihir86 mihir86 closed this Apr 16, 2024