modelscope · DavdGao · Jun 12, 2024 · Apr 11, 2024 · Apr 11, 2024 · Apr 15, 2024
diff --git a/docs/sphinx_doc/en/source/tutorial/209-rag.md b/docs/sphinx_doc/en/source/tutorial/209-rag.md
@@ -0,0 +1,126 @@
+(209-rag-en)=
+
+# A Quick Introduction to RAG in AgentScope
+
+This tutorial introduces three RAG-related concepts in AgentScope: Knowledge, Knowledge Bank, and RAG agent.
+
+### Knowledge
+The Knowledge modules (currently only `LlamaIndexKnowledge`; support for LangChain is planned) are responsible for handling all RAG-related operations.
+
+Here, we use `LlamaIndexKnowledge` as an example to illustrate the operations within the `Knowledge` module.
+When a `LlamaIndexKnowledge` object is initialized, `LlamaIndexKnowledge.__init__` goes through the following steps:
+  *  It processes the data and prepares it for retrieval in `LlamaIndexKnowledge._data_to_index(...)`, which includes
+      * loading the data with `LlamaIndexKnowledge._data_to_docs(...)`;
+      * preprocessing the data with preprocessing methods (e.g., splitting) and the embedding model in `LlamaIndexKnowledge._docs_to_nodes(...)`;
+      * getting ready to be queried, i.e., generating an index for the processed data.
+  * If the index already exists, `LlamaIndexKnowledge._load_index(...)` is invoked instead to load the index and avoid repeating embedding calls.
+
+  A Knowledge object can be created with a JSON configuration that specifies 1) the data path, 2) the data loader, 3) the data preprocessing methods, and 4) the embedding model (model config name).
+  A detailed example is given below:
+  <details>
+  <summary> A detailed example of Knowledge object configuration </summary>
+
+  ```json
+  [
+  {
+    "knowledge_id": "{your_knowledge_id}",
+    "emb_model_config_name": "{your_embed_model_config_name}",
+    "data_processing": [
+      {
+        "load_data": {
+          "loader": {
+            "create_object": true,
+            "module": "llama_index.core",
+            "class": "SimpleDirectoryReader",
+            "init_args": {
+              "input_dir": "{path_to_your_data_dir_1}",
+              "required_exts": [".md"]
+            }
+          }
+        }
+      },
+      {
+        "load_data": {
+          "loader": {
+            "create_object": true,
+            "module": "llama_index.core",
+            "class": "SimpleDirectoryReader",
+            "init_args": {
+              "input_dir": "{path_to_your_python_code_data_dir}",
+              "recursive": true,
+              "required_exts": [".py"]
+            }
+          }
+        },
+        "store_and_index": {
+          "transformations": [
+            {
+              "create_object": true,
+              "module": "llama_index.core.node_parser",
+              "class": "CodeSplitter",
+              "init_args": {
+                "language": "python",
+                "chunk_lines": 100
+              }
+            }
+          ]
+        }
+      }
+    ]
+  }
+  ]
+  ```
+
+  </details>
+
+If users want to avoid the detailed configuration, `KnowledgeBank` also provides a quicker way (see the next section).
+  <br/>
+
+### Knowledge Bank
+The knowledge bank maintains a collection of Knowledge objects (e.g., built on different datasets) as a set of *knowledge*. Thus,
+different agents can reuse the same Knowledge object without unnecessary re-initialization.
+Since configuring a Knowledge object may be too complicated for most users, the knowledge bank also provides a simple function call to create Knowledge objects.
+* `KnowledgeBank.add_data_as_knowledge`: creates a Knowledge object. The simplest usage only requires `knowledge_id`, `emb_model_name` and `data_dirs_and_types`:
+  ```python
+  knowledge_bank.add_data_as_knowledge(
+        knowledge_id="agentscope_tutorial_rag",
+        emb_model_name="qwen_emb_config",
+        data_dirs_and_types={
+            "../../docs/sphinx_doc/en/source/tutorial": [".md"],
+        },
+    )
+  ```
+  For more advanced initialization, users can instead pass a knowledge config via the `knowledge_config` parameter:
+  ```python
+  # load knowledge_config as dict
+  knowledge_bank.add_data_as_knowledge(
+      knowledge_id=knowledge_config["knowledge_id"],
+      emb_model_name=knowledge_config["emb_model_config_name"],
+      knowledge_config=knowledge_config,
+  )
+  ```
+* `KnowledgeBank.get_knowledge`: accepts two parameters, `knowledge_id` and `duplicate`.
+  It returns the knowledge object with the provided `knowledge_id`; if `duplicate` is true, the returned object is a deep copy.
+* `KnowledgeBank.equip`: accepts two parameters, `agent` and `duplicate`.
+  The function first checks whether the agent has a `rag_config`; if so, it provides the knowledge according to the
+  `knowledge_id` in the `rag_config` and initializes the retriever(s) for the agent. As in `get_knowledge`, `duplicate` decides whether the knowledge is deep copied.
+
+
+
+### RAG agent
+A RAG agent is an agent that can generate answers based on the retrieved knowledge.
+  * Agent using RAG: a RAG agent requires a `rag_config` in its configuration, which contains a list of `knowledge_id`.
+  * The agent can load specific knowledge from a `KnowledgeBank` by being passed to the `KnowledgeBank.equip` function.
+  * The agent can use the retrievers in its `reply` function to retrieve from the `Knowledge` and compose its prompt for the LLM.
+
+
+
+**Building a RAG agent yourself.** As long as your agent config has a `rag_config` attribute that is a dict containing a list of `knowledge_id`, you can pass the agent to `KnowledgeBank.equip`.
+Your agent will then be equipped with the knowledge specified by the list of `knowledge_id` and with the corresponding retrievers.
+You can decide how to use the retrievers, and even update and refresh the index, in your agent's `reply` function.
+
+
+[[Back to the top]](#209-rag-en)
+
+
+
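Reviewer sketch for the English tutorial above: the `duplicate` semantics of `KnowledgeBank.get_knowledge` and the `rag_config` check in `KnowledgeBank.equip` can be illustrated with a small stand-alone model. Everything below is hypothetical and is not the AgentScope implementation: `ToyKnowledge`, `ToyAgent`, `ToyKnowledgeBank`, the `add` helper, the `knowledge_list` attribute, and the `duplicate=False` default are assumptions made only for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyKnowledge:
    """Hypothetical stand-in for a Knowledge object (not the AgentScope class)."""
    knowledge_id: str
    index: dict = field(default_factory=dict)


@dataclass
class ToyAgent:
    """Hypothetical agent that carries a `rag_config` dict."""
    rag_config: dict
    knowledge_list: list = field(default_factory=list)


class ToyKnowledgeBank:
    """Hypothetical registry mimicking the behaviour the tutorial describes."""

    def __init__(self):
        self.stored = {}

    def add(self, knowledge):
        # Register an already-built knowledge object under its id.
        self.stored[knowledge.knowledge_id] = knowledge

    def get_knowledge(self, knowledge_id, duplicate=False):
        # duplicate=True returns a deep copy; otherwise the shared object.
        knowledge = self.stored[knowledge_id]
        return copy.deepcopy(knowledge) if duplicate else knowledge

    def equip(self, agent, duplicate=False):
        # Only agents with a dict-valued `rag_config` are equipped.
        rag_config = getattr(agent, "rag_config", None)
        if not isinstance(rag_config, dict):
            return
        for kid in rag_config.get("knowledge_id", []):
            agent.knowledge_list.append(self.get_knowledge(kid, duplicate))


bank = ToyKnowledgeBank()
bank.add(ToyKnowledge("agentscope_tutorial_rag", {"chunk-0": [0.1, 0.2]}))

shared = bank.get_knowledge("agentscope_tutorial_rag")
private = bank.get_knowledge("agentscope_tutorial_rag", duplicate=True)

agent = ToyAgent(rag_config={"knowledge_id": ["agentscope_tutorial_rag"]})
bank.equip(agent)
```

In this sketch, agents equipped without `duplicate` share one object, so an index refresh by one agent is visible to the others, while a duplicated copy can be modified in isolation. Whether the real classes behave identically should be checked against the PR code.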
diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md b/docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md
@@ -0,0 +1,121 @@
+(209-rag-zh)=
+
+# 简要介绍AgentScope中的RAG
+
+我们在此介绍AgentScope与RAG相关的三个概念：知识（Knowledge），知识库（Knowledge Bank）和RAG 智能体。
+
+### Knowledge
+知识模块（目前仅有“LlamaIndexKnowledge”；即将提供对LangChain的支持）负责处理所有与RAG相关的操作。
+
+在这里，我们将使用`LlamaIndexKnowledge`作为示例，以说明在`Knowledge`模块内的操作。
+当初始化`LlamaIndexKnowledge`对象时，`LlamaIndexKnowledge.__init__`将执行以下步骤：
+  *  它处理数据并生成检索索引 (`LlamaIndexKnowledge._data_to_index(...)`中完成) 其中包括
+      * 加载数据 `LlamaIndexKnowledge._data_to_docs(...)`;
+      * 对数据进行预处理，使用预处理方法（比如分割）和向量模型生成向量  `LlamaIndexKnowledge._docs_to_nodes(...)`;
+      * 基于生成的向量做好被查询的准备， 即生成索引。
+  * 如果索引已经存在，则会调用 `LlamaIndexKnowledge._load_index(...)` 来加载索引，并避免重复的嵌入调用。
+
+ 用户可以使用JSON配置来创建一个Knowledge模块，以指定1）数据路径，2）数据加载器，3）数据预处理方法，以及4）嵌入模型（模型配置名称）。
+一个详细的示例可以参考以下内容：
+  <details>
+  <summary> 详细的配置示例 </summary>
+
+  ```json
+  [
+  {
+    "knowledge_id": "{your_knowledge_id}",
+    "emb_model_config_name": "{your_embed_model_config_name}",
+    "data_processing": [
+      {
+        "load_data": {
+          "loader": {
+            "create_object": true,
+            "module": "llama_index.core",
+            "class": "SimpleDirectoryReader",
+            "init_args": {
+              "input_dir": "{path_to_your_data_dir_1}",
+              "required_exts": [".md"]
+            }
+          }
+        }
+      },
+      {
+        "load_data": {
+          "loader": {
+            "create_object": true,
+            "module": "llama_index.core",
+            "class": "SimpleDirectoryReader",
+            "init_args": {
+              "input_dir": "{path_to_your_python_code_data_dir}",
+              "recursive": true,
+              "required_exts": [".py"]
+            }
+          }
+        },
+        "store_and_index": {
+          "transformations": [
+            {
+              "create_object": true,
+              "module": "llama_index.core.node_parser",
+              "class": "CodeSplitter",
+              "init_args": {
+                "language": "python",
+                "chunk_lines": 100
+              }
+            }
+          ]
+        }
+      }
+    ]
+  }
+  ]
+  ```
+
+  </details>
+
+如果用户想要避免详细的配置，我们也在`KnowledgeBank`中提供了一种快速的方式（请参阅以下内容）。
+  </br>
+
+### Knowledge Bank
+知识库将一组Knowledge模块（例如，来自不同数据集的知识）作为知识的集合进行维护。因此，不同的智能体可以在没有不必要的重新初始化的情况下重复使用知识模块。考虑到配置Knowledge模块可能对大多数用户来说过于复杂，知识库还提供了一个简单的函数调用来创建Knowledge模块。
+
+* `KnowledgeBank.add_data_as_knowledge`: 创建Knowledge模块。一种简单的方式只需要提供knowledge_id、emb_model_name和data_dirs_and_types。
+  ```python
+  knowledge_bank.add_data_as_knowledge(
+        knowledge_id="agentscope_tutorial_rag",
+        emb_model_name="qwen_emb_config",
+        data_dirs_and_types={
+            "../../docs/sphinx_doc/en/source/tutorial": [".md"],
+        },
+    )
+  ```
+  对于更高级的初始化，用户仍然可以将一个知识模块配置作为参数knowledge_config传递：
+  ```python
+  # load knowledge_config as dict
+  knowledge_bank.add_data_as_knowledge(
+      knowledge_id=knowledge_config["knowledge_id"],
+      emb_model_name=knowledge_config["emb_model_config_name"],
+      knowledge_config=knowledge_config,
+  )
+  ```
+* `KnowledgeBank.get_knowledge`: 它接受两个参数，knowledge_id和duplicate。
+  它返回提供的knowledge_id对应的知识对象；如果duplicate为true，则返回深拷贝的对象。
+* `KnowledgeBank.equip`: 它接受两个参数，`agent`和`duplicate`。
+该函数首先会检查智能体是否具有rag_config；如果有，则根据rag_config中的knowledge_id提供相应的知识，并为智能体初始化检索器。
+`duplicate` 同样决定是否是深拷贝。
+
+
+### RAG 智能体
+RAG 智能体是可以基于检索到的知识生成答案的智能体。
+  * 让智能体使用RAG: RAG agent在其配置中需要`rag_config`，其中有一个`knowledge_id`的列表
+  * Agent可以通过将其传递给`KnowledgeBank.equip`函数来从`KnowledgeBank`加载特定的知识。
+  * Agent 智能体可以在`reply`函数中使用检索器(retriever)从`Knowledge`中检索，并将其提示组合到LLM中
+
+**自己搭建 RAG 智能体.** 只要您的智能体配置具有`rag_config`属性并且是字典型，里面有一个`knowledge_id`列表，您就可以将其传递给`KnowledgeBank.equip`,
+为它配置`knowledge_id`列表和相应的知识和检索器（retriever），您的智能体将配备一系列知识。
+您可以在`reply`函数中决定如何使用检索器，甚至更新和刷新索引。
+
+[[Back to the top]](#209-rag-zh)
+
+
+
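Reviewer sketch for the initialization steps described in both tutorials (build the index from data, or load it if it already exists so embedding calls are not repeated). This is a minimal toy model, not the `LlamaIndexKnowledge` code: `ToyIndexedKnowledge`, the JSON persistence format, the `persist_root` argument, the blank-line splitter, and the fake `_embed` function are all assumptions for illustration; only the method names `_data_to_index` and `_load_index` mirror the tutorial.

```python
import json
import tempfile
from pathlib import Path


class ToyIndexedKnowledge:
    """Hypothetical illustration of 'load the index if present, else build it'."""

    def __init__(self, knowledge_id, docs, persist_root):
        self.persist_path = Path(persist_root) / f"{knowledge_id}.json"
        self.embed_calls = 0
        if self.persist_path.exists():
            # Index already persisted: reuse it, no embedding calls needed.
            self.index = self._load_index()
        else:
            self.index = self._data_to_index(docs)

    def _embed(self, text):
        # Fake embedding; a real setup would call an embedding model here.
        self.embed_calls += 1
        return [float(len(text)), float(sum(map(ord, text)) % 97)]

    def _data_to_index(self, docs):
        # Split each document on blank lines, embed the chunks, then persist.
        chunks = [c for doc in docs for c in doc.split("\n\n") if c.strip()]
        index = {chunk: self._embed(chunk) for chunk in chunks}
        self.persist_path.write_text(json.dumps(index))
        return index

    def _load_index(self):
        return json.loads(self.persist_path.read_text())


with tempfile.TemporaryDirectory() as root:
    docs = ["RAG basics\n\nKnowledge bank", "Agents"]
    first = ToyIndexedKnowledge("demo", docs, root)   # builds and persists
    second = ToyIndexedKnowledge("demo", docs, root)  # loads from disk
```

The second construction performs no embedding calls because the persisted index is found first, which is the cost saving the tutorial attributes to `_load_index`.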
diff --git a/examples/conversation_with_RAG_agents/README.md b/examples/conversation_with_RAG_agents/README.md
@@ -7,7 +7,6 @@ you will obtain three different agents who can help you answer different questio
 * **What is this example for?** By this example, we want to show how the agent with retrieval augmented generation (RAG)
 capability can be used to build easily.
 
-**Notice:** This example is a Beta version of the AgentScope RAG agent. A formal version will soon be added to `src/agentscope/agents`, but it may be subject to changes.
 
 ## Prerequisites
 * **Cloning repo:** This example requires cloning the whole AgentScope repo to local.
@@ -23,35 +22,28 @@ capability can be used to build easily.
 **Note:** This example has been tested with `dashscope_chat` and `dashscope_text_embedding` model wrapper, with `qwen-max` and `text-embedding-v2` models.
 However, you are welcome to replace the Dashscope language and embedding model wrappers or models with other models you like to test.
 
-## Start AgentScope Consultants
+## Start AgentScope Copilots
 * **Terminal:** The most simple way to execute the AgentScope Consultants is running in terminal.
   ```bash
   python ./rag_example.py
   ```
-  Setting `log_retrieval` to `false` in `agent_config.json` can hide the retrieved information and provide only answers of agents.
+
 
 * **AS studio:** If you want to have more organized, clean UI, you can also run with our `as_studio`.
   ```bash
   as_studio ./rag_example.py
   ```
 
-### Customize AgentScope Consultants to other consultants
+### Agents in the example
+
+After running the example, you may notice that it consists of five RAG agents:
-* `AgentScope Tutorial Assistant`: responsible for answering questions based on AgentScope tutorials (markdown files).
-* `AgentScope Framework Code Assistant`: responsible for answering questions based on AgentScope code base (python files).
-* `Summarize Assistant`: responsible for summarize the questions from the above two agents.
-
-These agents can be configured to answering questions based on other GitHub repo, by simply modifying the `input_dir` fields in the `agent_config.json`.
-
-For more advanced customization, we may need to learn a little bit from the following.
+* `Tutorial-Assistant`: responsible for answering questions based on AgentScope tutorials (markdown files).
+* `Code-Search-Assistant`: responsible for answering questions based on AgentScope code base (python files).
+* `API-Assistant`: responsible for answering questions based on AgentScope API documents (html files, generated by `sphinx`).
+* `Searching-Assistant`: responsible for general search in tutorial and code base (markdown files and code files).
+* `Agent-Guiding-Assistant`: responsible for referring the correct agent(s) among the above ones.
 
-**RAG modules:** In AgentScope, RAG modules are abstract to provide three basic functions: `load_data`, `store_and_index` and `retrieve`. Refer to `src/agentscope/rag` for more details.
+Except for the last one, `Agent-Guiding-Assistant`, all agents can be configured to answer questions based on other GitHub repos by replacing the `knowledge`.
 
-**RAG configs:** In the example configuration (the `rag_config` field), all parameters are optional. But if you want to customize them, you may want to learn the following:
-*  `load_data`: contains all parameters for the the `rag.load_data` function.
-Since the `load_data` accepts a dataloader object `loader`, the `loader` in the config need to have `"create_object": true` to let a internal parse create a LlamaIndex data loader object.
-The loader object is an instance of `class` in module `module`, with initialization parameters in `init_args`.
+For more details about how to use the RAG module in AgentScope, please refer to the tutorial.
 
-* `store_and_index`: contains all parameters for the the `rag.store_and_index` function.
-For example, you can pass `vector_store` and `retriever` configurations in a similar way as the `loader` mentioned above.
-For the `transformations` parameter, you can pass a list of dicts, each of which corresponds to building a `NodeParser`-kind of preprocessor in Llamaindex.
diff --git a/examples/conversation_with_RAG_agents/agent_config.json b/examples/conversation_with_RAG_agents/agent_config.json