add govtech evaluation guideline and star history in readme
liyin2015 committed Sep 11, 2024
1 parent 53d5bb0 commit 0a77b4b
Showing 3 changed files with 19 additions and 0 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -242,3 +242,8 @@ Many existing works greatly inspired AdalFlow library! Here is a non-exhaustive
url = {https://github.com/SylphAI-Inc/LightRAG}
}
```

# Star History


[![Star History Chart](https://api.star-history.com/svg?repos=SylphAI-Inc/AdalFlow&type=Date)](https://star-history.com/#SylphAI-Inc/AdalFlow&Date)
6 changes: 6 additions & 0 deletions docs/source/tutorials/evaluation.rst
@@ -544,6 +544,7 @@ There is one new way is to indirectly use the ground truth answers from the gene
Recall = [GT statements that can be attributed to the retrieved context] / [GT statements]
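A minimal sketch of this recall computation is shown below; ``judge_attributable`` is a hypothetical stand-in for an LLM judge call (not an AdalFlow API), and naive substring matching is used only as a placeholder.

.. code-block:: python

    from typing import Callable, List

    def judge_attributable(statement: str, context: str) -> bool:
        # Placeholder judge: a real setup would ask an LLM whether `statement`
        # is supported by `context`; substring matching is only a stand-in.
        return statement.lower() in context.lower()

    def context_recall(
        gt_statements: List[str],
        retrieved_context: str,
        judge: Callable[[str, str], bool] = judge_attributable,
    ) -> float:
        # Recall = [GT statements attributable to the retrieved context] / [GT statements]
        if not gt_statements:
            return 0.0
        attributed = [s for s in gt_statements if judge(s, retrieved_context)]
        return len(attributed) / len(gt_statements)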
There are also **Context Relevance** and **Context Precision** metrics in RAGAS.
LLM or model-based judge for Retriever Recall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -577,6 +578,10 @@ See the evaluation on datasets at :doc:`Evaluating a RAG Pipeline <../tutorials/
Additionally, there is more research on RAG evaluation, such as SemScore [13]_, ARES [14]_, and RGB [15]_.
.. note::
   GovTech Singapore provides a well-explained evaluation guideline [22]_ that aligns with ours but offers more theoretical explanation of some metrics.
For Contributors
------------------------------------------
@@ -614,6 +619,7 @@ References
.. [19] Liu, Yang, et al. "Datasets for large language models: A comprehensive survey." arXiv preprint arXiv:2402.18041 (2024).
.. [20] ROUGE Deep dive: https://medium.com/nlplanet/two-minutes-nlp-learn-the-rouge-metric-by-examples-f179cc285499
.. [21] Zhu, Kunlun, et al. "RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework." arXiv preprint arXiv:2408.01262 (2024).
.. [22] GovTech Singapore evaluation guideline: https://playbooks.capdev.govtext.gov.sg/evaluation/
.. admonition:: AdalFlow Eval API Reference
   :class: highlight
8 changes: 8 additions & 0 deletions docs/source/use_cases/build_a_rag.rst
@@ -48,6 +48,14 @@ For each use case, we need to answer:

8. What about finetuning? How would we do it, and would it be more token-efficient or more effective?

First RAG Paper
------------------
RAG was introduced in 2020 by Lewis et al. [1]_ as an architecture that jointly finetunes both the query encoder (a bi-encoder, like most embedding models) and the generator (LLM) with only final-answer supervision.
It did not address document chunking, as the texts used were usually short and fit within the context length of the embedding models.
As both embedding models and LLMs scale up in knowledge and parameters (the paper used a roughly 400M-parameter generator), RAG can achieve high performance in a few-shot (prompt engineering) setup without finetuning.
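A minimal retrieve-then-generate sketch of this frozen (no finetuning) setup is below; ``call_llm`` and the lexical-overlap scoring are hypothetical placeholders, not the paper's bi-encoder retriever or an AdalFlow API.

.. code-block:: python

    from typing import Callable, List

    def rag_answer(
        query: str,
        documents: List[str],
        call_llm: Callable[[str], str],  # any frozen LLM endpoint; nothing is finetuned
        top_k: int = 3,
    ) -> str:
        # Naive lexical-overlap retrieval as a stand-in for embedding similarity.
        def score(doc: str) -> float:
            q, d = set(query.lower().split()), set(doc.lower().split())
            return len(q & d) / (len(q) or 1)

        retrieved = sorted(documents, key=score, reverse=True)[:top_k]
        context = "\n".join(retrieved)
        prompt = (
            "Answer the question using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )
        return call_llm(prompt)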

RAG Playbook
------------------

References
------------------------------------------
