diff --git a/README.md b/README.md
index cbf153965..ba96458d6 100644
--- a/README.md
+++ b/README.md
@@ -242,3 +242,8 @@ Many existing works greatly inspired AdalFlow library! Here is a non-exhaustive
   url = {https://github.com/SylphAI-Inc/LightRAG}
 }
 ```
+
+# Star History
+
+
+[![Star History Chart](https://api.star-history.com/svg?repos=SylphAI-Inc/AdalFlow&type=Date)](https://star-history.com/#SylphAI-Inc/AdalFlow&Date)
diff --git a/docs/source/tutorials/evaluation.rst b/docs/source/tutorials/evaluation.rst
index 0d470143f..7e63e037e 100644
--- a/docs/source/tutorials/evaluation.rst
+++ b/docs/source/tutorials/evaluation.rst
@@ -544,6 +544,7 @@ There is one new way is to indirectly use the ground truth answers from the gene
 
   Recall = [GT statements that can be attributed to the retrieved context] / [GT statements]
 
+There are also **Context Relevance** and **Context Precision** metrics in RAGAS.
 
 LLM or model based judge for Retriever Recall
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -577,6 +578,10 @@ See the evaluation on datasets at :doc:`Evaluating a RAG Pipeline <../tutorials/
 
 Additionally, there are more research for RAG evaluation, such as SemScore [13]_, ARES [14]_, RGB [15]_, etc.
 
+.. note::
+
+   GovTech Singapore provides a well-explained evaluation guideline [22]_ that aligns with ours but gives more theoretical explanation of some metrics.
+
 For Contributors
 ------------------------------------------
 
@@ -614,6 +619,7 @@ References
 .. [19] Liu, Yang, et al. "Datasets for large language models: A comprehensive survey." arXiv preprint arXiv:2402.18041 (2024).
 .. [20] ROUGE Deep dive: https://medium.com/nlplanet/two-minutes-nlp-learn-the-rouge-metric-by-examples-f179cc285499
 .. [21] Zhu, Kunlun, et al. "RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework." arXiv preprint arXiv:2408.01262 (2024).
+.. [22] GovTech Singapore evaluation guideline: https://playbooks.capdev.govtext.gov.sg/evaluation/
 
 .. admonition:: AdalFlow Eval API Reference
    :class: highlight
diff --git a/docs/source/use_cases/build_a_rag.rst b/docs/source/use_cases/build_a_rag.rst
index dad41e9c9..32435027f 100644
--- a/docs/source/use_cases/build_a_rag.rst
+++ b/docs/source/use_cases/build_a_rag.rst
@@ -48,6 +48,14 @@ For each use case, we need to answer:
 
 8. What about finetuning? How to do it and would it be more token efficient or more effective?
 
+First RAG Paper
+------------------
+RAG was introduced in 2020 by Lewis et al. [1]_ as an architecture that finetunes both the query encoder (a bi-encoder, like most embedding models) and the generator (LLM) jointly, with only the final answer as supervision.
+It did not mention document chunking, as most of the time the texts are short and fit into the context length of the embedding models.
+As both embedding models and LLMs have scaled up in knowledge and parameters (the paper used a 400M-parameter generator), RAG can achieve high performance in a few-shot (prompt engineering) setup without finetuning.
+
+RAG Playbook
+------------------
 
 References
 ------------------------------------------
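
As a side note on the context recall formula shown in the `evaluation.rst` hunk above, here is a minimal sketch of how that computation could look. It assumes a hypothetical `judge_attributable` callable (for example an LLM or NLI judge) that decides whether a single ground-truth statement is supported by the retrieved context; it is an illustration only, not the RAGAS or AdalFlow implementation.

```python
from typing import Callable, List


def context_recall(
    gt_statements: List[str],
    retrieved_context: str,
    judge_attributable: Callable[[str, str], bool],
) -> float:
    """Recall = [GT statements attributable to the retrieved context] / [GT statements].

    `judge_attributable(statement, context)` is a hypothetical judge (e.g. an
    LLM or NLI model call) that returns True when the statement can be
    attributed to the retrieved context.
    """
    if not gt_statements:
        return 0.0
    attributable = sum(
        judge_attributable(statement, retrieved_context)
        for statement in gt_statements
    )
    return attributable / len(gt_statements)


# Usage with a trivial keyword-overlap "judge" standing in for an LLM judge:
score = context_recall(
    gt_statements=["Paris is the capital of France."],
    retrieved_context="Paris has been the capital of France since 508 AD.",
    judge_attributable=lambda s, ctx: any(w in ctx for w in s.split()[:3]),
)
print(score)  # 1.0 under this toy judge
```

A higher score means more of the ground-truth statements are recoverable from what the retriever returned; the judge is the component you would swap for an LLM prompt in practice.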