RAG with Retrieval of Top Documents and Answer Generation #33
Conversation
Thanks a lot! This looks good, but I would suggest moving all of the pyserini-specific parameters into the initialization function of the datastore.
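As a rough sketch of that refactor, assuming a BM25-backed datastore: the parameter names (`index_path`, `k1`, `b`) and the `retrieve` method below are illustrative, not the PR's actual interface.

```python
from pyserini.search.lucene import LuceneSearcher


class Datastore:
    """Illustrative datastore that owns all pyserini-specific options."""

    def __init__(self, index_path: str, k1: float = 0.9, b: float = 0.4):
        # All pyserini-specific configuration lives here, so the RAG wrapper
        # never has to forward retrieval parameters itself.
        self.searcher = LuceneSearcher(index_path)
        self.searcher.set_bm25(k1, b)

    def retrieve(self, query: str, max_results: int = 1):
        # Return the top `max_results` hits for the query.
        return self.searcher.search(query, k=max_results)
```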
This looks great! I'm ready to merge if you are (it's still listed as a "draft" PR so I'm not sure if you're finished).
base (LanguageModel): The base language model to be modified.
datastore (Datastore): The datastore object for document index.
max_results (int, optional): Maximum number of results to retrieve. Defaults to 1.
Suggested change:
- base (LanguageModel): The base language model to be modified.
- datastore (Datastore): The datastore object for document index.
- max_results (int, optional): Maximum number of results to retrieve. Defaults to 1.
+ base: The base language model to be modified.
+ datastore: The datastore object for document index.
+ max_results: Maximum number of results to retrieve.
Let's not put types in the comments, since they're already in the function signature and duplicating them could let the two drift out of sync.
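For illustration only, a hypothetical function written in that style, with types living solely in the signature:

```python
def retrieve_top_docs(query: str, max_results: int = 1) -> list[str]:
    """Retrieve document contents for a query.

    Args:
        query: The query to search for.
        max_results: Maximum number of results to retrieve.

    Returns:
        The contents of the retrieved documents.
    """
    # Placeholder body; only the docstring convention is being illustrated.
    return []
```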
"""Read JSONL file and convert it into a dictionary with document ID as keys and contents as values. | ||
|
||
Args: | ||
file_path (str): Path to the JSONL file. |
Suggested change:
- file_path (str): Path to the JSONL file.
+ file_path: Path to the JSONL file.
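For reference, one possible implementation of the helper that docstring describes; the JSONL field names `id` and `contents` follow the pyserini corpus convention and are assumptions rather than the PR's exact code.

```python
import json


def read_jsonl(file_path: str) -> dict:
    """Read a JSONL file into a dict mapping document ID to contents."""
    doc_dict = {}
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # Keys are stringified document IDs so they match pyserini's
            # hit.docid values at lookup time.
            doc_dict[str(record["id"])] = record["contents"]
    return doc_dict
```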
context = ' '.join([self.doc_dict[str(key.docid)] for key in top_docs])
prompt = None
if condition is not None:
    prompt = "Please answer the following question, given its context.\nQuestion: " + condition + "\nContext: " + context + "\nAnswer: "
We don't need to fix it in this PR, but could you open an issue to make the prompt for RAGLanguageModel configurable?
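One way the follow-up issue could address this, sketched under the assumption of a plain format-string template (the `prompt_template` name is hypothetical, not part of this PR):

```python
# The default reproduces the hard-coded prompt from the snippet above.
DEFAULT_PROMPT_TEMPLATE = (
    "Please answer the following question, given its context.\n"
    "Question: {question}\nContext: {context}\nAnswer: "
)


def build_prompt(question: str, context: str,
                 prompt_template: str = DEFAULT_PROMPT_TEMPLATE) -> str:
    # RAGLanguageModel could accept prompt_template in its constructor and
    # pass it through here instead of hard-coding the string.
    return prompt_template.format(question=question, context=context)
```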
Description
This pull request provides an implementation of RAGLanguageModel, which retrieves the top documents for every query and passes them to the LM to generate an answer to the query, given the top documents as context.
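A minimal, self-contained sketch of that flow using stub classes in place of the PR's real ones (names and signatures below are assumptions, not the merged code):

```python
class StubDatastore:
    def retrieve(self, query: str, max_results: int = 1):
        # Pretend the index returned one relevant document ID.
        return ["doc0"][:max_results]


class StubLM:
    def generate(self, prompt: str) -> str:
        return "stub answer based on: " + prompt.splitlines()[2]


class StubRAGLanguageModel:
    """Toy stand-in showing retrieve-then-generate; not the PR's implementation."""

    def __init__(self, base, datastore, max_results: int = 1):
        self.base, self.datastore, self.max_results = base, datastore, max_results
        self.doc_dict = {"doc0": "The transformer architecture was proposed in 2017."}

    def generate(self, question: str) -> str:
        top_docs = self.datastore.retrieve(question, self.max_results)
        context = " ".join(self.doc_dict[d] for d in top_docs)
        prompt = ("Please answer the following question, given its context.\n"
                  f"Question: {question}\nContext: {context}\nAnswer: ")
        return self.base.generate(prompt)


rag = StubRAGLanguageModel(base=StubLM(), datastore=StubDatastore(), max_results=1)
print(rag.generate("When was the transformer proposed?"))
```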
References