[QUESTION] What are the retrieval datasets when evaluating downstream tasks? #1172

ZihaoLin0123 · 2024-02-27T19:20:26Z

I read the InstructRetro paper, which says:

For downstream task evaluation, we follow Retro (Borgeaud et al., 2022) and use task-specific corpus and state-of-the-art retrievers and DRAGON+ (Lin et al., 2023) to retrieve the most relevant and high-quality information for the task. For NQ and TriviaQA, we use DPR to retrieve from Wikipedia. For doc2dial and other long-form QA datasets, we use DRAGON+.

Does this mean that for each QA dataset, you use only that dataset's own corpus as the retrieval data? For example, when you evaluate NewsQA, do you first collect all the NewsQA articles (without including other tasks' corpora such as SQuAD or Wikipedia), chunk them, and embed them? Then, given a question, the retriever retrieves the relevant article chunks from that corpus, and finally the retrieved chunks are concatenated with the input and fed into the LLM. Is that correct?
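To make the question concrete, here is a minimal sketch of the pipeline as I understand it. It is only an illustration of my reading of the paper, not the authors' confirmed setup. The toy bag-of-words `embed` function is a stand-in for a dense retriever such as DRAGON+, and every name here (`chunk`, `build_index`, `retrieve`, `build_prompt`, the 8-word chunk size, the prompt layout) is a hypothetical choice made for the example.

```python
import math
import re
from collections import Counter


def chunk(text, size=8):
    """Split a document into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a dense retriever."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(articles, size=8):
    """Chunk and embed ONLY the task's own articles (no other corpora)."""
    chunks = [c for article in articles for c in chunk(article, size)]
    return [(c, embed(c)) for c in chunks]


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]


def build_prompt(question, retrieved):
    """Concatenate retrieved chunks with the question (hypothetical layout)."""
    context = "\n".join(retrieved)
    return f"{context}\n\nQuestion: {question}\nAnswer:"


# Made-up stand-ins for the articles of a single task such as NewsQA.
task_articles = [
    "The city council approved a new budget for public transport on Monday.",
    "Heavy rain caused flooding across the northern region over the weekend.",
]
index = build_index(task_articles)
question = "What did the city council approve?"
top_chunks = retrieve(index, question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The point I would like confirmed is the `build_index` step: whether the index for a task is built from that task's articles alone, or from a larger shared corpus.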

github-actions[bot] · 2024-04-28T18:20:08Z

Marking as stale. No activity in 60 days.
