modelscope · DavdGao · Jun 12, 2024 · Apr 11, 2024 · Apr 11, 2024 · Apr 15, 2024
diff --git a/README.md b/README.md
@@ -31,6 +31,8 @@ Welcome to join our community on
 
 ## News
 
+- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-06-11]** The RAG functionality is available for agents in **AgentScope** now! [**A quick introduction to RAG in AgentScope**](https://modelscope.github.io/agentscope/en/tutorial/210-rag.html) can help you equip your agent with external knowledge!
+
 - <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-06-09]** We release **AgentScope** v0.0.5 now! In this new version, [**AgentScope Workstation**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html) is open-sourced with the refactored [**AgentScope Studio**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html)!
 
 <h5 align="center">

diff --git a/README_ZH.md b/README_ZH.md
@@ -27,6 +27,7 @@
 | <img src="https://gw.alicdn.com/imgextra/i1/O1CN01hhD1mu1Dd3BWVUvxN_!!6000000000238-2-tps-400-400.png" width="100" height="100"> | <img src="https://img.alicdn.com/imgextra/i2/O1CN01tuJ5971OmAqNg9cOw_!!6000000001747-0-tps-444-460.jpg" width="100" height="100"> |
 
 ## News
+- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-06-11]** RAG functionality is now integrated into **AgentScope**! Follow [**A quick introduction to RAG in AgentScope**](https://modelscope.github.io/agentscope/en/tutorial/210-rag.html) to let your agent use external knowledge!
 
 - <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-06-09]** AgentScope v0.0.5 has been released! In this new version, we open-sourced [**AgentScope Workstation**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html)!
 

diff --git a/docs/sphinx_doc/en/source/tutorial/210-rag.md b/docs/sphinx_doc/en/source/tutorial/210-rag.md
@@ -0,0 +1,197 @@
+(210-rag-en)=
+
+# A Quick Introduction to RAG in AgentScope
+
+This tutorial introduces three concepts related to RAG in AgentScope: Knowledge, KnowledgeBank and RAG agent.
+
+### Knowledge
+The Knowledge modules (now only `LlamaIndexKnowledge`; support for LangChain will come soon) are responsible for handling all RAG-related operations.
+
+#### How to create a Knowledge object
+  A Knowledge object is created from a JSON configuration that specifies 1) the data path, 2) the data loader, 3) the data preprocessing methods, and 4) the embedding model (by model config name).
+  The following shows a detailed example:
+  <details>
+  <summary> A detailed example of Knowledge object configuration </summary>
+
+  ```json
+  [
+  {
+    "knowledge_id": "{your_knowledge_id}",
+    "emb_model_config_name": "{your_embed_model_config_name}",
+    "data_processing": [
+      {
+        "load_data": {
+          "loader": {
+            "create_object": true,
+            "module": "llama_index.core",
+            "class": "SimpleDirectoryReader",
+            "init_args": {
+              "input_dir": "{path_to_your_data_dir_1}",
+              "required_exts": [".md"]
+            }
+          }
+        }
+      },
+      {
+        "load_data": {
+          "loader": {
+            "create_object": true,
+            "module": "llama_index.core",
+            "class": "SimpleDirectoryReader",
+            "init_args": {
+              "input_dir": "{path_to_your_python_code_data_dir}",
+              "recursive": true,
+              "required_exts": [".py"]
+            }
+          }
+        },
+        "store_and_index": {
+          "transformations": [
+            {
+              "create_object": true,
+              "module": "llama_index.core.node_parser",
+              "class": "CodeSplitter",
+              "init_args": {
+                "language": "python",
+                "chunk_lines": 100
+              }
+            }
+          ]
+        }
+      }
+    ]
+  }
+  ]
+  ```
+
+  </details>
+
+#### More about knowledge configurations
+The configuration above is usually saved as a JSON file. It must
+contain the following key attributes:
+* `knowledge_id`: a unique identifier of the knowledge;
+* `emb_model_config_name`: the name of the embedding model;
+* `chunk_size`: default chunk size for the document transformation (node parser);
+* `chunk_overlap`: default chunk overlap for each chunk (node);
+* `data_processing`: a list of data processing methods.
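
A minimal sketch of checking a loaded configuration against the key attributes listed above. The helper `missing_keys` and the sample config are hypothetical and not part of AgentScope; whether AgentScope itself enforces every one of these keys (or falls back to defaults) is not stated in this tutorial.

```python
import json

# Key attributes named in the tutorial text (illustrative check only).
REQUIRED_KEYS = (
    "knowledge_id",
    "emb_model_config_name",
    "chunk_size",
    "chunk_overlap",
    "data_processing",
)


def missing_keys(config: dict) -> list:
    """Return the required keys that are absent from one knowledge config."""
    return [key for key in REQUIRED_KEYS if key not in config]


# A made-up config file content: a JSON list with one knowledge entry.
raw = """
[
  {
    "knowledge_id": "demo",
    "emb_model_config_name": "my_emb_config",
    "data_processing": []
  }
]
"""

configs = json.loads(raw)
report = {c.get("knowledge_id", "<unknown>"): missing_keys(c) for c in configs}
print(report)
```

Running a check like this before creating Knowledge objects surfaces incomplete entries early, instead of failing in the middle of data loading.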
+
+##### Using LlamaIndexKnowledge as an example
+
+Regarding the last attribute, `data_processing`: each entry of the list (a dict) configures a data
+loader object that loads the needed data (`load_data`)
+and a transformation object that processes the loaded data (`store_and_index`).
+Accordingly, one may load data from multiple sources (with different data loaders),
+process each source in its own way (i.e., with its own transformation or node parser),
+and merge the processed data into a single index for later retrieval.
+For more information about these components, please refer to
+[LlamaIndex-Loading Data](https://docs.llamaindex.ai/en/stable/module_guides/loading/).
+In general, the following attributes need to be set for each object:
+* `create_object`: indicates whether to create a new object, must be true in this case;
+* `module`: where the class is located;
+* `class`: the name of the class.
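
One way such a `create_object` / `module` / `class` / `init_args` entry could be turned into an object is a dynamic import. The sketch below is an assumption about the general mechanism, not AgentScope's actual implementation; `build_from_entry` is a hypothetical helper, and a standard-library class stands in for a LlamaIndex loader so the example runs without extra dependencies.

```python
import importlib


def build_from_entry(entry: dict):
    """Hypothetical resolver: import `module`, look up `class`, call it with `init_args`."""
    if not entry.get("create_object"):
        raise ValueError("create_object must be true for this kind of entry")
    module = importlib.import_module(entry["module"])
    cls = getattr(module, entry["class"])
    return cls(**entry.get("init_args", {}))


# Stand-in entry: fractions.Fraction plays the role of a data loader class.
entry = {
    "create_object": True,
    "module": "fractions",
    "class": "Fraction",
    "init_args": {"numerator": 3, "denominator": 4},
}

obj = build_from_entry(entry)
print(type(obj).__name__, obj)
```

With a real configuration, `module` and `class` would name something like `llama_index.core` and `SimpleDirectoryReader`, and `init_args` would carry the loader's own parameters.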
+
+More specifically, for `load_data`, you can use any of the data loaders provided by LlamaIndex,
+such as `SimpleDirectoryReader` (set in `class`), to load a wide variety of data types
+(e.g., txt, pdf, html, py, md). For this data loader, you can set the following attributes:
+* `input_dir`: the path to the data directory;
+* `required_exts`: the file extensions that the data loader will load.
+
+For more information about the data loaders, please refer to the [`SimpleDirectoryReader` documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
+
+`store_and_index` is optional; if it is not specified, the default transformation (a.k.a. node parser) is `SentenceSplitter`. For specific node parsers such as `CodeSplitter`, users can set the following attributes:
+* `language`: the language of the code;
+* `chunk_lines`: the number of lines in each code chunk.
+
+For more information about the node parsers, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/).
+
+
+For users who want to avoid the detailed configuration, `KnowledgeBank` also provides a shortcut (see the Knowledge Bank section below).
+
+#### How to use a Knowledge object
+After a Knowledge object is created successfully, users can retrieve information related to their queries by calling its `.retrieve(...)` function.
+The `.retrieve` function accepts at least three basic parameters:
+* `query`: the input that will be matched against the knowledge;
+* `similarity_top_k`: how many of the most similar "data blocks" will be returned;
+* `to_list_strs`: whether to return the retrieved information as strings.
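
To make the roles of the three parameters concrete, here is a toy stand-in for retrieval. It is only a sketch: `toy_retrieve` is a hypothetical function, the word-overlap score replaces the embedding similarity a real Knowledge object would use, and the return shapes are assumptions rather than the actual `LlamaIndexKnowledge.retrieve` behavior.

```python
def toy_retrieve(query, chunks, similarity_top_k=2, to_list_strs=False):
    """Rank chunks by word overlap (Jaccard) with the query and keep the top k."""
    query_words = set(query.lower().split())
    scored = []
    for text in chunks:
        words = set(text.lower().split())
        union = query_words | words
        score = len(query_words & words) / len(union) if union else 0.0
        scored.append((score, text))
    # Highest score first; ties keep their original order (sort is stable).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:similarity_top_k]
    if to_list_strs:
        return [text for _, text in top]  # plain strings only
    return top  # (score, text) pairs


chunks = [
    "agents reply to messages",
    "the knowledge bank stores knowledge objects",
    "pipelines connect agents",
]

hits = toy_retrieve("knowledge bank", chunks, similarity_top_k=1, to_list_strs=True)
print(hits)
```

A larger `similarity_top_k` trades prompt length for recall, and `to_list_strs=True` is convenient when the result is pasted straight into a prompt.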
+
+*Advanced:* `LlamaIndexKnowledge` also supports users passing their own retriever to retrieve from the knowledge.
+
+#### More details inside `LlamaIndexKnowledge`
+Here, we use `LlamaIndexKnowledge` as an example to illustrate the operations within the `Knowledge` module.
+When a `LlamaIndexKnowledge` object is initialized, `LlamaIndexKnowledge.__init__` goes through the following steps:
+  *  It processes the data and prepares it for retrieval in `LlamaIndexKnowledge._data_to_index(...)`, which includes
+      * loading the data with `LlamaIndexKnowledge._data_to_docs(...)`;
+      * preprocessing the data with the preprocessing methods (e.g., splitting) and the embedding model in `LlamaIndexKnowledge._docs_to_nodes(...)`;
+      * getting ready to be queried, i.e., generating the index for the processed data.
+  * If the index already exists, `LlamaIndexKnowledge._load_index(...)` is invoked to load it and avoid repeating the embedding calls.
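
The build-or-load behavior described above follows a common caching pattern. The sketch below illustrates that pattern only: `build_or_load_index`, the JSON file format and `fake_embed` are all hypothetical and do not reflect how `LlamaIndexKnowledge` persists its index.

```python
import json
import os
import tempfile


def build_or_load_index(docs, index_path, embed):
    """Load a persisted index if present; otherwise embed the docs and persist the result."""
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            return json.load(f), "loaded"
    index = [{"text": doc, "vector": embed(doc)} for doc in docs]
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index, "built"


embed_calls = []


def fake_embed(text):
    """Stand-in for an embedding model call; records each invocation."""
    embed_calls.append(text)
    return [float(len(text))]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.json")
    _, first = build_or_load_index(["a", "bb"], path, fake_embed)
    _, second = build_or_load_index(["a", "bb"], path, fake_embed)

print(first, second, len(embed_calls))
```

The second call reads the persisted index, so the (potentially costly) embedding function is not invoked again.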
+<br/>
+
+### Knowledge Bank
+The knowledge bank maintains a collection of Knowledge objects (e.g., built on different datasets) as a set of *knowledge*. Thus,
+different agents can reuse the same Knowledge object without unnecessary "re-initialization".
+Considering that configuring a Knowledge object may be too complicated for most users, the knowledge bank also provides an easy function call to create Knowledge objects.
+  * `KnowledgeBank.add_data_as_knowledge`: creates a Knowledge object. The easy way only requires `knowledge_id`, `emb_model_name` and `data_dirs_and_types`.
+    As the knowledge bank processes files as `LlamaIndexKnowledge` by default, all text file types are supported, such as `.txt`, `.html`, `.md`, `.csv` and `.pdf`, as well as code files like `.py`. For file types other than text, refer to the [LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
+  ```python
+  knowledge_bank.add_data_as_knowledge(
+        knowledge_id="agentscope_tutorial_rag",
+        emb_model_name="qwen_emb_config",
+        data_dirs_and_types={
+            "../../docs/sphinx_doc/en/source/tutorial": [".md"],
+        },
+    )
+  ```
+  For more advanced initialization, users can still pass a knowledge config as the parameter `knowledge_config`:
+  ```python
+  # load knowledge_config as dict
+  knowledge_bank.add_data_as_knowledge(
+      knowledge_id=knowledge_config["knowledge_id"],
+      emb_model_name=knowledge_config["emb_model_config_name"],
+      knowledge_config=knowledge_config,
+  )
+  ```
+* `KnowledgeBank.get_knowledge`: accepts two parameters, `knowledge_id` and `duplicate`.
+  It returns the Knowledge object with the provided `knowledge_id`; if `duplicate` is true, the returned object is a deep copy.
+* `KnowledgeBank.equip`: accepts three parameters, `agent`, `knowledge_id_list` and `duplicate`.
+ The function looks up the Knowledge objects listed in `knowledge_id_list` and puts them into `agent.knowledge_list`. If `duplicate` is true, each assigned Knowledge object is deep copied first.
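
The effect of `duplicate` can be illustrated with a toy bank. This is a sketch under assumptions: `ToyKnowledgeBank` and `ToyAgent` are hypothetical classes that mirror only the method names described above, with plain dicts standing in for Knowledge objects.

```python
import copy


class ToyKnowledgeBank:
    """Toy stand-in: stores objects by id and hands them out shared or deep copied."""

    def __init__(self):
        self.stored = {}

    def add(self, knowledge_id, knowledge):
        self.stored[knowledge_id] = knowledge

    def get_knowledge(self, knowledge_id, duplicate=False):
        knowledge = self.stored[knowledge_id]
        return copy.deepcopy(knowledge) if duplicate else knowledge

    def equip(self, agent, knowledge_id_list, duplicate=False):
        for knowledge_id in knowledge_id_list:
            agent.knowledge_list.append(self.get_knowledge(knowledge_id, duplicate))


class ToyAgent:
    """Toy agent that only holds a knowledge list."""

    def __init__(self):
        self.knowledge_list = []


bank = ToyKnowledgeBank()
bank.add("docs", {"chunks": ["AgentScope tutorial text"]})

shared = bank.get_knowledge("docs")
clone = bank.get_knowledge("docs", duplicate=True)

agent = ToyAgent()
bank.equip(agent, ["docs"])

print(shared is bank.stored["docs"], clone is shared, clone == shared)
```

Sharing avoids re-initialization, while a deep copy lets one agent modify its knowledge (for example, refresh its index) without affecting other agents.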
+
+
+
+
+### RAG agent
+A RAG agent is an agent that can generate answers based on the retrieved knowledge.
+  * Agent using RAG: a RAG agent has a list of Knowledge objects (`knowledge_list`).
+    * A RAG agent can be initialized with a `knowledge_list`:
+      ```python
+        knowledge = knowledge_bank.get_knowledge(knowledge_id)
+        agent = LlamaIndexAgent(
+            name="rag_worker",
+            sys_prompt="{your_prompt}",
+            model_config_name="{your_model}",
+            knowledge_list=[knowledge], # provide knowledge object directly
+            similarity_top_k=3,
+            log_retrieval=False,
+            recent_n_mem_for_retrieve=1,
+        )
+      ```
+    * If a RAG agent is built with a configuration that specifies `knowledge_id_list`, the agent can load the specified knowledge from a `KnowledgeBank` by passing the agent and the list of ids to the `KnowledgeBank.equip` function.
+      ```python
+        # Before equipping, the agent has no knowledge:
+        # >>> agent.knowledge_list
+        # []
+        knowledge_bank.equip(agent, agent.knowledge_id_list)
+        # After equipping:
+        # >>> agent.knowledge_list
+        # [<LlamaIndexKnowledge object at 0x16e516fb0>]
+      ```
+  * The agent can use the retrieved knowledge in its `reply` function to compose the prompt sent to the LLM.
+
+
+
+**Building a RAG agent yourself.** As long as your agent provides a list of knowledge ids (`knowledge_id_list`), you can pass the agent together with that list to `KnowledgeBank.equip`.
+Your agent will then be equipped with a list of Knowledge objects according to the `knowledge_id_list`.
+You can decide how to use the retrieved content, and even update and refresh the index, in your agent's `reply` function.
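
As one possible way to use retrieved content inside a custom `reply`, the sketch below assembles a prompt from a system prompt, the retrieved chunks and the user question. `compose_prompt` and the prompt layout are hypothetical; the tutorial leaves the prompt format to the agent author.

```python
def compose_prompt(sys_prompt, question, retrieved):
    """Build a single prompt string with numbered retrieved chunks as context."""
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved, start=1)
    )
    return (
        f"{sys_prompt}\n\n"
        f"Retrieved context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = compose_prompt(
    "You are a helpful assistant.",
    "What is a knowledge bank?",
    ["The knowledge bank maintains a collection of Knowledge objects."],
)
print(prompt)
```

Numbering the chunks makes it easy to ask the model to cite which piece of retrieved knowledge supports its answer.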
+
+
+[[Back to the top]](#210-rag-en)
+
+
+
diff --git a/docs/sphinx_doc/en/source/tutorial/main.md b/docs/sphinx_doc/en/source/tutorial/main.md
@@ -22,6 +22,7 @@ AgentScope is an innovative multi-agent platform designed to empower developers
 - [Pipeline and MsgHub](202-pipeline.md)
 - [Distribution](208-distribute.md)
 - [AgentScope Studio](209-gui.md)
+- [Retrieval Augmented Generation (RAG)](210-rag.md)
 - [Logging](105-logging.md)
 - [Monitor](207-monitor.md)
 - [Example: Werewolf Game](104-usecase.md)