add govtech evaluation guideline and star history in readme
liyin2015 committed Sep 11, 2024
1 parent 53d5bb0 commit 0a77b4b
Showing 3 changed files with 19 additions and 0 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -242,3 +242,8 @@ Many existing works greatly inspired AdalFlow library! Here is a non-exhaustive
url = {https://github.com/SylphAI-Inc/LightRAG}
}
```

# Star History


[![Star History Chart](https://api.star-history.com/svg?repos=SylphAI-Inc/AdalFlow&type=Date)](https://star-history.com/#SylphAI-Inc/AdalFlow&Date)
6 changes: 6 additions & 0 deletions docs/source/tutorials/evaluation.rst
@@ -544,6 +544,7 @@ There is one new way is to indirectly use the ground truth answers from the gene
Recall = [GT statements that can be attributed to the retrieved context] / [GT statements]
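A minimal sketch of this recall computation is shown below; ``judge_attributable`` is a hypothetical stand-in for an LLM judge call (not an AdalFlow API), and naive substring matching is used only as a placeholder.

.. code-block:: python

    from typing import Callable, List

    def judge_attributable(statement: str, context: str) -> bool:
        # Placeholder judge: a real setup would ask an LLM whether `statement`
        # is supported by `context`; substring matching is only a stand-in.
        return statement.lower() in context.lower()

    def context_recall(
        gt_statements: List[str],
        retrieved_context: str,
        judge: Callable[[str, str], bool] = judge_attributable,
    ) -> float:
        # Recall = [GT statements attributable to the retrieved context] / [GT statements]
        if not gt_statements:
            return 0.0
        attributed = [s for s in gt_statements if judge(s, retrieved_context)]
        return len(attributed) / len(gt_statements)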
There are also **Context Relevance** and **Context Precision** metrics in RAGAS.
LLM or model-based judge for Retriever Recall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -577,6 +578,10 @@ See the evaluation on datasets at :doc:`Evaluating a RAG Pipeline <../tutorials/
Additionally, there is more research on RAG evaluation, such as SemScore [13]_, ARES [14]_, and RGB [15]_.
.. note::
   GovTech Singapore provides a well-explained evaluation guideline [22]_ that aligns with ours but offers more theoretical explanation of some metrics.
For Contributors
------------------------------------------
@@ -614,6 +619,7 @@ References
.. [19] Liu, Yang, et al. "Datasets for large language models: A comprehensive survey." arXiv preprint arXiv:2402.18041 (2024).
.. [20] ROUGE Deep dive: https://medium.com/nlplanet/two-minutes-nlp-learn-the-rouge-metric-by-examples-f179cc285499
.. [21] Zhu, Kunlun, et al. "RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework." arXiv preprint arXiv:2408.01262 (2024).
.. [22] GovTech Singapore evaluation guideline: https://playbooks.capdev.govtext.gov.sg/evaluation/
.. admonition:: AdalFlow Eval API Reference
   :class: highlight
8 changes: 8 additions & 0 deletions docs/source/use_cases/build_a_rag.rst
@@ -48,6 +48,14 @@ For each use case, we need to answer:

8. What about finetuning? How would we do it, and would it be more token-efficient or more effective?

First RAG Paper
------------------
RAG was introduced in 2020 by Lewis et al. [1]_ as an architecture that jointly finetunes both the query encoder (a bi-encoder, like most embedding models) and the generator (LLM) with only final-answer supervision.
It did not address document chunking, as the texts used were usually short and fit within the context length of the embedding models.
As both embedding models and LLMs scale up in knowledge and parameters (the paper used a roughly 400M-parameter generator), RAG can achieve high performance in a few-shot (prompt engineering) setup without finetuning.
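A minimal retrieve-then-generate sketch of this frozen (no finetuning) setup is below; ``call_llm`` and the lexical-overlap scoring are hypothetical placeholders, not the paper's bi-encoder retriever or an AdalFlow API.

.. code-block:: python

    from typing import Callable, List

    def rag_answer(
        query: str,
        documents: List[str],
        call_llm: Callable[[str], str],  # any frozen LLM endpoint; nothing is finetuned
        top_k: int = 3,
    ) -> str:
        # Naive lexical-overlap retrieval as a stand-in for embedding similarity.
        def score(doc: str) -> float:
            q, d = set(query.lower().split()), set(doc.lower().split())
            return len(q & d) / (len(q) or 1)

        retrieved = sorted(documents, key=score, reverse=True)[:top_k]
        context = "\n".join(retrieved)
        prompt = (
            "Answer the question using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )
        return call_llm(prompt)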

RAG Playbook
------------------

References
------------------------------------------
