diff --git a/README.md b/README.md
index 52fca47bc..1bc3fb733 100644
--- a/README.md
+++ b/README.md
@@ -31,6 +31,8 @@ Welcome to join our community on
## News
+- **[2024-06-11]** RAG functionality is now available for agents in **AgentScope**! [**A quick introduction to RAG in AgentScope**](https://modelscope.github.io/agentscope/en/tutorial/210-rag.html) can help you equip your agents with external knowledge!
+
- **[2024-06-09]** We release **AgentScope** v0.0.5 now! In this new version, [**AgentScope Workstation**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html) is open-sourced with the refactored [**AgentScope Studio**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html)!
diff --git a/README_ZH.md b/README_ZH.md
index fe1ff3183..7bc991820 100644
--- a/README_ZH.md
+++ b/README_ZH.md
@@ -27,6 +27,7 @@
| | |
## 新闻
+- **[2024-06-11]** RAG functionality is now integrated into **AgentScope**! Following [**A quick introduction to RAG in AgentScope**](https://modelscope.github.io/agentscope/en/tutorial/210-rag.html), you can equip your agents with external knowledge!
- **[2024-06-09]** AgentScope v0.0.5 已经更新!在这个新版本中,我们开源了 [**AgentScope Workstation**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html)!
diff --git a/docs/sphinx_doc/en/source/tutorial/210-rag.md b/docs/sphinx_doc/en/source/tutorial/210-rag.md
new file mode 100644
index 000000000..867fdb2ec
--- /dev/null
+++ b/docs/sphinx_doc/en/source/tutorial/210-rag.md
@@ -0,0 +1,197 @@
+(210-rag-en)=
+
+# A Quick Introduction to RAG in AgentScope
+
+In this tutorial, we introduce three concepts related to RAG in AgentScope: Knowledge, Knowledge Bank and RAG agent.
+
+### Knowledge
+The Knowledge modules (currently only `LlamaIndexKnowledge`; support for LangChain is coming soon) are responsible for handling all RAG-related operations.
+
+#### How to create a Knowledge object
+ A Knowledge object can be created with a JSON configuration to specify 1) data path, 2) data loader, 3) data preprocessing methods, and 4) embedding model (model config name).
+ A detailed example of a Knowledge object configuration is shown below:
+
+ ```json
+ [
+ {
+ "knowledge_id": "{your_knowledge_id}",
+ "emb_model_config_name": "{your_embed_model_config_name}",
+ "data_processing": [
+ {
+ "load_data": {
+ "loader": {
+ "create_object": true,
+ "module": "llama_index.core",
+ "class": "SimpleDirectoryReader",
+ "init_args": {
+ "input_dir": "{path_to_your_data_dir_1}",
+ "required_exts": [".md"]
+ }
+ }
+ }
+ },
+ {
+ "load_data": {
+ "loader": {
+ "create_object": true,
+ "module": "llama_index.core",
+ "class": "SimpleDirectoryReader",
+ "init_args": {
+ "input_dir": "{path_to_your_python_code_data_dir}",
+ "recursive": true,
+ "required_exts": [".py"]
+ }
+ }
+ },
+ "store_and_index": {
+ "transformations": [
+ {
+ "create_object": true,
+ "module": "llama_index.core.node_parser",
+ "class": "CodeSplitter",
+ "init_args": {
+ "language": "python",
+ "chunk_lines": 100
+ }
+ }
+ ]
+ }
+ }
+ ]
+ }
+ ]
+ ```
+
+
+
+#### More about knowledge configurations
+The aforementioned configuration is usually saved as a JSON file. It must
+contain the following key attributes:
+* `knowledge_id`: a unique identifier of the knowledge;
+* `emb_model_config_name`: the name of the embedding model;
+* `chunk_size`: default chunk size for the document transformation (node parser);
+* `chunk_overlap`: default chunk overlap for each chunk (node);
+* `data_processing`: a list of data processing methods.
+
+##### Using LlamaIndexKnowledge as an example
+
+Regarding the last attribute `data_processing`, each entry of the list (which is a dict) configures a data
+loader object that loads the needed data (i.e., `load_data`),
+and a transformation object that processes the loaded data (`store_and_index`).
+Accordingly, one may load data from multiple sources (with different data loaders),
+process each source in an individually defined manner (i.e., with its own transformation or node parser),
+and merge the processed data into a single index for later retrieval.
+For more information about the components, please refer to
+[LlamaIndex-Loading Data](https://docs.llamaindex.ai/en/stable/module_guides/loading/).
+For both kinds of objects, we need to set the following attributes:
+* `create_object`: indicates whether to create a new object, must be true in this case;
+* `module`: where the class is located;
+* `class`: the name of the class.
+
+More specifically, when setting `load_data`, you can use a wide collection of data loaders
+ provided by LlamaIndex, such as `SimpleDirectoryReader` (set in `class`), to load various data types
+ (e.g., txt, pdf, html, py, md, etc.). For this data loader, you can set the following attributes:
+* `input_dir`: the path to the data directory;
+* `required_exts`: the file extensions that the data loader will load.
+
+For more information about the data loaders, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
+
+The `store_and_index` entry is optional; if it is not specified, the default transformation (a.k.a. node parser) `SentenceSplitter` is used. For some specific node parsers such as `CodeSplitter`, users can set the following attributes:
+* `language`: the language of the code;
+* `chunk_lines`: the number of lines in each code chunk.
+
+For more information about the node parsers, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/).
+
+
+If users want to avoid such detailed configuration, we also provide a quick way in `KnowledgeBank` (see below).
+
+#### How to use a Knowledge object
+After a knowledge object is created successfully, users can retrieve information related to their queries by calling the `.retrieve(...)` function.
+The `.retrieve` function accepts at least three basic parameters, as illustrated in the example below:
+* `query`: the input that will be matched against the knowledge;
+* `similarity_top_k`: the number of most similar "data blocks" to return;
+* `to_list_strs`: whether to return the retrieved information as a list of strings.
+
+*Advanced:* `LlamaIndexKnowledge` also supports users passing their own retriever to retrieve from the knowledge.
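+
+For example, a minimal retrieval call may look like the following sketch, assuming a knowledge object named `knowledge` has already been created (e.g., via the `KnowledgeBank` introduced below); the query string is purely illustrative:
+
+```python
+query = "How does AgentScope support RAG?"
+
+# retrieve the three chunks most similar to the query, as plain strings
+retrieved_chunks = knowledge.retrieve(
+    query=query,
+    similarity_top_k=3,
+    to_list_strs=True,
+)
+for chunk in retrieved_chunks:
+    print(chunk)
+```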
+
+#### More details inside `LlamaIndexKnowledge`
+Here, we use `LlamaIndexKnowledge` as an example to illustrate the operations within the `Knowledge` module.
+When a `LlamaIndexKnowledge` object is initialized, `LlamaIndexKnowledge.__init__` goes through the following steps:
+  * It processes the data and prepares it for retrieval in `LlamaIndexKnowledge._data_to_index(...)`, which includes
+    * loading the data with `LlamaIndexKnowledge._data_to_docs(...)`;
+    * preprocessing the data with the configured methods (e.g., splitting) and the embedding model in `LlamaIndexKnowledge._docs_to_nodes(...)`;
+    * getting ready for queries, i.e., generating an index for the processed data.
+  * If the index already exists, `LlamaIndexKnowledge._load_index(...)` is invoked instead to load the index and avoid repeated embedding calls.
+
+
+### Knowledge Bank
+The knowledge bank maintains a collection of Knowledge objects (e.g., on different datasets) as a set of *knowledge*. Thus,
+different agents can reuse Knowledge objects without unnecessary "re-initialization".
+Considering that configuring a Knowledge object may be too complicated for most users, the knowledge bank also provides easy function calls to create Knowledge objects.
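+
+Before these helper functions can be called, a knowledge bank has to be instantiated. A minimal sketch, assuming the model configs and the knowledge configs are stored in JSON files as in `examples/conversation_with_RAG_agents` (the file paths are illustrative):
+
+```python
+import agentscope
+from agentscope.rag import KnowledgeBank
+
+# register the model configs (including the embedding model) first
+agentscope.init(model_configs="configs/model_config.json")
+
+# build the knowledge bank from a list of knowledge configurations
+knowledge_bank = KnowledgeBank(configs="configs/knowledge_config.json")
+```
+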
+ * `KnowledgeBank.add_data_as_knowledge`: creates a Knowledge object. The easy way only requires providing `knowledge_id`, `emb_model_name` and `data_dirs_and_types`.
+ Since the knowledge bank processes files with `LlamaIndexKnowledge` by default, all text file types are supported, such as `.txt`, `.html`, `.md`, `.csv` and `.pdf`, as well as code files like `.py`. For file types other than text, please refer to the [LlamaIndex document](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
+ ```python
+ knowledge_bank.add_data_as_knowledge(
+ knowledge_id="agentscope_tutorial_rag",
+ emb_model_name="qwen_emb_config",
+ data_dirs_and_types={
+ "../../docs/sphinx_doc/en/source/tutorial": [".md"],
+ },
+ )
+ ```
+ For more advanced initialization, users can still pass a knowledge config via the `knowledge_config` parameter:
+ ```python
+ # load knowledge_config as dict
+ knowledge_bank.add_data_as_knowledge(
+ knowledge_id=knowledge_config["knowledge_id"],
+ emb_model_name=knowledge_config["emb_model_config_name"],
+ knowledge_config=knowledge_config,
+ )
+ ```
+* `KnowledgeBank.get_knowledge`: It accepts two parameters, `knowledge_id` and `duplicate`.
+ It returns the knowledge object with the provided `knowledge_id`; if `duplicate` is true, the returned object is a deep copy.
+* `KnowledgeBank.equip`: It accepts three parameters, `agent`, `knowledge_id_list` and `duplicate`.
+ The function provides knowledge objects according to the `knowledge_id_list` and puts them into `agent.knowledge_list`. If `duplicate` is true, the assigned knowledge objects are deep copied first (see the sketch below).
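+
+A minimal sketch of these two calls, assuming the `knowledge_bank` above and an agent `rag_agent` whose `knowledge_id_list` has already been set (both names are illustrative):
+
+```python
+# get a deep copy of the knowledge object with the given id
+knowledge = knowledge_bank.get_knowledge(
+    knowledge_id="agentscope_tutorial_rag",
+    duplicate=True,
+)
+
+# attach the knowledge objects listed in rag_agent.knowledge_id_list
+# to rag_agent.knowledge_list
+knowledge_bank.equip(rag_agent, rag_agent.knowledge_id_list)
+```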
+
+
+
+
+### RAG agent
+A RAG agent is an agent that can generate answers based on retrieved knowledge.
+ * Agents using RAG: a RAG agent holds a list of knowledge objects (`knowledge_list`).
+ * A RAG agent can be initialized with a `knowledge_list`:
+ ```python
+ knowledge = knowledge_bank.get_knowledge(knowledge_id)
+ agent = LlamaIndexAgent(
+ name="rag_worker",
+ sys_prompt="{your_prompt}",
+ model_config_name="{your_model}",
+ knowledge_list=[knowledge], # provide knowledge object directly
+ similarity_top_k=3,
+ log_retrieval=False,
+ recent_n_mem_for_retrieve=1,
+ )
+ ```
+ * If a RAG agent is built from a configuration with `knowledge_id_list` specified, the agent can load the specified knowledge from a `KnowledgeBank` by passing the agent and the list of ids into the `KnowledgeBank.equip` function.
+ ```python
+ # >>> agent.knowledge_list
+ # >>> []
+ knowledge_bank.equip(agent, agent.knowledge_id_list)
+ # >>> agent.knowledge_list
+ # [the knowledge objects for agent.knowledge_id_list]
+ ```
+ * The agent can then use the retrieved knowledge in its `reply` function and compose it into the prompt for the LLM.
+
+
+
+**Building a RAG agent yourself.** As long as you provide a list of knowledge ids, you can pass it together with your agent to `KnowledgeBank.equip`.
+Your agent will then be equipped with the knowledge objects corresponding to the `knowledge_id_list`.
+You can decide how to use the retrieved content, and even update or refresh the index, in your agent's `reply` function, as sketched below.
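+
+Below is a minimal sketch of such a custom agent. It is not the official implementation: the class name and the prompt composition are illustrative, and it only assumes the standard `AgentBase` members (`self.model`, `self.sys_prompt`, `self.name`, `self.speak`) plus a `knowledge_list` filled by `KnowledgeBank.equip`:
+
+```python
+from agentscope.agents import AgentBase
+from agentscope.message import Msg
+
+
+class SimpleRAGAgent(AgentBase):
+    """A hypothetical agent that composes retrieved chunks into its prompt."""
+
+    def __init__(
+        self,
+        name: str,
+        sys_prompt: str,
+        model_config_name: str,
+        knowledge_id_list: list = None,
+    ) -> None:
+        super().__init__(
+            name=name,
+            sys_prompt=sys_prompt,
+            model_config_name=model_config_name,
+        )
+        # consumed by KnowledgeBank.equip, which fills self.knowledge_list
+        self.knowledge_id_list = knowledge_id_list or []
+        self.knowledge_list = []
+
+    def reply(self, x: dict = None) -> dict:
+        query = x["content"] if x is not None else ""
+
+        # retrieve from every equipped knowledge object
+        retrieved = []
+        for knowledge in self.knowledge_list:
+            retrieved += knowledge.retrieve(query, to_list_strs=True)
+
+        # compose the retrieved context and the query into the prompt
+        prompt = self.model.format(
+            Msg(name="system", role="system", content=self.sys_prompt),
+            Msg(
+                name="user",
+                role="user",
+                content="Context: " + "\n".join(retrieved) + "\nQuery: " + query,
+            ),
+        )
+        response = self.model(prompt).text
+        msg = Msg(self.name, response)
+        self.speak(msg)
+        return msg
+```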
+
+
+[[Back to the top]](#210-rag-en)
+
+
+
diff --git a/docs/sphinx_doc/en/source/tutorial/main.md b/docs/sphinx_doc/en/source/tutorial/main.md
index c06a8d551..78e8520c8 100644
--- a/docs/sphinx_doc/en/source/tutorial/main.md
+++ b/docs/sphinx_doc/en/source/tutorial/main.md
@@ -22,6 +22,7 @@ AgentScope is an innovative multi-agent platform designed to empower developers
- [Pipeline and MsgHub](202-pipeline.md)
- [Distribution](208-distribute.md)
- [AgentScope Studio](209-gui.md)
+- [Retrieval Augmented Generation (RAG)](210-rag.md)
- [Logging](105-logging.md)
- [Monitor](207-monitor.md)
- [Example: Werewolf Game](104-usecase.md)
diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md b/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md
new file mode 100644
index 000000000..7a0efd7d0
--- /dev/null
+++ b/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md
@@ -0,0 +1,180 @@
+(210-rag-zh)=
+
+# A Quick Introduction to RAG in AgentScope
+
+We introduce three concepts related to RAG in AgentScope: Knowledge, Knowledge Bank and RAG agent.
+
+### Knowledge
+The Knowledge modules (currently only `LlamaIndexKnowledge`; support for LangChain is coming soon) are responsible for handling all RAG-related operations.
+
+#### How to initialize a Knowledge object
+A Knowledge object can be created with a JSON configuration that specifies 1) the data path, 2) the data loader, 3) the data preprocessing methods, and 4) the embedding model (model config name).
+A detailed example of such a configuration is shown below:
+
+ ```json
+ [
+ {
+ "knowledge_id": "{your_knowledge_id}",
+ "emb_model_config_name": "{your_embed_model_config_name}",
+ "data_processing": [
+ {
+ "load_data": {
+ "loader": {
+ "create_object": true,
+ "module": "llama_index.core",
+ "class": "SimpleDirectoryReader",
+ "init_args": {
+ "input_dir": "{path_to_your_data_dir_1}",
+ "required_exts": [".md"]
+ }
+ }
+ }
+ },
+ {
+ "load_data": {
+ "loader": {
+ "create_object": true,
+ "module": "llama_index.core",
+ "class": "SimpleDirectoryReader",
+ "init_args": {
+ "input_dir": "{path_to_your_python_code_data_dir}",
+ "recursive": true,
+ "required_exts": [".py"]
+ }
+ }
+ },
+ "store_and_index": {
+ "transformations": [
+ {
+ "create_object": true,
+ "module": "llama_index.core.node_parser",
+ "class": "CodeSplitter",
+ "init_args": {
+ "language": "python",
+ "chunk_lines": 100
+ }
+ }
+ ]
+ }
+ }
+ ]
+ }
+ ]
+ ```
+
+
+
+#### More about knowledge configurations
+The configuration above is usually saved as a JSON file and must contain the following key attributes:
+* `knowledge_id`: a unique identifier of the knowledge module;
+* `emb_model_config_name`: the name of the embedding model;
+* `chunk_size`: the default chunk size for splitting documents;
+* `chunk_overlap`: the default overlap between chunks;
+* `data_processing`: a list of data processing methods.
+
+##### Taking the configuration of LlamaIndexKnowledge as an example
+
+When using `LlamaIndexKnowledge`, regarding the last attribute `data_processing`, each entry of this `list`-typed parameter (a `dict`) configures a data loader object that loads the needed data (the information under the `load_data` field) and a transformation object that processes the loaded data (`store_and_index`). In other words, a single load can read data from multiple sources at the same time, process them, and merge them under the same index for later retrieval. For more information about these components, please refer to [LlamaIndex-Loading](https://docs.llamaindex.ai/en/stable/module_guides/loading/).
+
+Here, for both data loading and data processing, we need to configure the following attributes:
+* `create_object`: indicates whether to create a new object, must be true in this case;
+* `module`: where the corresponding class is located;
+* `class`: the name of the class.
+
+More specifically, when configuring `load_data`, you can choose among a wide variety of loaders, e.g., `SimpleDirectoryReader` (set in the `class` field), to read various data types (e.g., txt, pdf, html, py, md, etc.). For this data loader, you also need to configure the following key attributes:
+* `input_dir`: the path from which to load the data;
+* `required_exts`: the file extensions of the data to be loaded.
+
+For more information about data loaders, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
+
+The `store_and_index` configuration is optional; if users do not specify a particular transformation, the default transformation (also called a node parser) `SentenceSplitter` is used. Different transformations can be used for specific needs, e.g., `CodeSplitter` for parsing code. For this particular node parser, users can set the following attributes:
+* `language`: the language of the code to be processed;
+* `chunk_lines`: the number of lines in each resulting code chunk.
+
+For more information about node parsers, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/).
+
+If users want to avoid such detailed configuration, we also provide a quick way in `KnowledgeBank` (see below).
+
+#### How to use a Knowledge object
+After a knowledge object is created successfully, users can extract information from the `Knowledge` object by calling `.retrieve`. The `.retrieve` function accepts the following three basic parameters, as illustrated in the example below:
+* `query`: the input for which related content should be retrieved;
+* `similarity_top_k`: the number of most similar "data blocks" to retrieve;
+* `to_list_strs`: whether to return the retrieved content only as a list of strings.
+
+*Advanced:* For `LlamaIndexKnowledge`, its `.retrieve` function also allows users who are familiar with LlamaIndex to pass in a pre-built retriever directly.
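+
+For example, a minimal retrieval call may look like the following sketch, assuming a knowledge object named `knowledge` has already been created (e.g., via the `KnowledgeBank` introduced below); the query string is purely illustrative:
+
+```python
+query = "How does AgentScope support RAG?"
+
+# retrieve the three chunks most similar to the query, as plain strings
+retrieved_chunks = knowledge.retrieve(
+    query=query,
+    similarity_top_k=3,
+    to_list_strs=True,
+)
+for chunk in retrieved_chunks:
+    print(chunk)
+```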
+
+#### More details about `LlamaIndexKnowledge`
+Here, we use `LlamaIndexKnowledge` as an example to illustrate the operations within the `Knowledge` module.
+When a `LlamaIndexKnowledge` object is initialized, `LlamaIndexKnowledge.__init__` performs the following steps:
+  * It processes the data and generates the retrieval index (in `LlamaIndexKnowledge._data_to_index(...)`), which includes
+    * loading the data with `LlamaIndexKnowledge._data_to_docs(...)`;
+    * preprocessing the data with the configured methods (e.g., splitting) and the embedding model in `LlamaIndexKnowledge._docs_to_nodes(...)`;
+    * getting ready for queries based on the generated embeddings, i.e., generating the index.
+  * If the index already exists, `LlamaIndexKnowledge._load_index(...)` is invoked to load it and avoid repeated embedding calls.
+
+
+### Knowledge Bank
+The knowledge bank maintains a collection of Knowledge modules (e.g., knowledge from different datasets) as a set of *knowledge*, so that different agents can reuse Knowledge modules without unnecessary re-initialization. Considering that configuring a Knowledge module may be too complicated for most users, the knowledge bank also provides easy function calls to create Knowledge modules.
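+
+Before these helper functions can be called, a knowledge bank has to be instantiated. A minimal sketch, assuming the model configs and the knowledge configs are stored in JSON files as in `examples/conversation_with_RAG_agents` (the file paths are illustrative):
+
+```python
+import agentscope
+from agentscope.rag import KnowledgeBank
+
+# register the model configs (including the embedding model) first
+agentscope.init(model_configs="configs/model_config.json")
+
+# build the knowledge bank from a list of knowledge configurations
+knowledge_bank = KnowledgeBank(configs="configs/knowledge_config.json")
+```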
+
+* `KnowledgeBank.add_data_as_knowledge`: creates a Knowledge module. The easy way only requires providing `knowledge_id`, `emb_model_name` and `data_dirs_and_types`.
+ Since `KnowledgeBank` builds `LlamaIndexKnowledge` by default, all text file types are supported, such as `.txt`, `.html`, `.md`, `.csv` and `.pdf`, as well as code files like `.py`. For other supported file types, please refer to the [LlamaIndex document](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
+ ```python
+ knowledge_bank.add_data_as_knowledge(
+ knowledge_id="agentscope_tutorial_rag",
+ emb_model_name="qwen_emb_config",
+ data_dirs_and_types={
+ "../../docs/sphinx_doc/en/source/tutorial": [".md"],
+ },
+ )
+ ```
+ For more advanced initialization, users can still pass a knowledge config via the `knowledge_config` parameter:
+ ```python
+ # load knowledge_config as dict
+ knowledge_bank.add_data_as_knowledge(
+ knowledge_id=knowledge_config["knowledge_id"],
+ emb_model_name=knowledge_config["emb_model_config_name"],
+ knowledge_config=knowledge_config,
+ )
+ ```
+* `KnowledgeBank.get_knowledge`: It accepts two parameters, `knowledge_id` and `duplicate`.
+ It returns the knowledge object with the provided `knowledge_id`; if `duplicate` is true, the returned object is a deep copy.
+* `KnowledgeBank.equip`: It accepts three parameters, `agent`, `knowledge_id_list` and `duplicate`.
+ The function provides the agent with the knowledge objects corresponding to its `knowledge_id_list` (placing them into `agent.knowledge_list`). Likewise, `duplicate` determines whether the assigned objects are deep copies (see the sketch below).
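+
+A minimal sketch of these two calls, assuming the `knowledge_bank` above and an agent `rag_agent` whose `knowledge_id_list` has already been set (both names are illustrative):
+
+```python
+# get a deep copy of the knowledge object with the given id
+knowledge = knowledge_bank.get_knowledge(
+    knowledge_id="agentscope_tutorial_rag",
+    duplicate=True,
+)
+
+# attach the knowledge objects listed in rag_agent.knowledge_id_list
+# to rag_agent.knowledge_list
+knowledge_bank.equip(rag_agent, rag_agent.knowledge_id_list)
+```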
+
+
+
+### RAG agent
+A RAG agent is an agent that can generate answers based on retrieved knowledge.
+ * Letting an agent use RAG: a RAG agent holds a list of knowledge objects (`knowledge_list`).
+ * A RAG agent can be initialized with a `knowledge_list`:
+ ```python
+ knowledge = knowledge_bank.get_knowledge(knowledge_id)
+ agent = LlamaIndexAgent(
+ name="rag_worker",
+ sys_prompt="{your_prompt}",
+ model_config_name="{your_model}",
+ knowledge_list=[knowledge], # provide knowledge object directly
+ similarity_top_k=3,
+ log_retrieval=False,
+ recent_n_mem_for_retrieve=1,
+ )
+ ```
+ * If agents are launched in batch via configuration files, you can also provide a `knowledge_id_list` for each agent. The agent can then be given its `knowledge_list` by passing the agent together with its `knowledge_id_list` into `KnowledgeBank.equip`.
+ ```python
+ # >>> agent.knowledge_list
+ # >>> []
+ knowledge_bank.equip(agent, agent.knowledge_id_list)
+ # >>> agent.knowledge_list
+ # [the knowledge objects for agent.knowledge_id_list]
+ ```
+ * The agent can use the information retrieved from `Knowledge` in its `reply` function and compose it into the prompt for the LLM.
+
+**Building a RAG agent yourself.** As long as your agent configuration has a `knowledge_id_list`, you can pass the agent and this list to `KnowledgeBank.equip`; the agent will then be equipped with the corresponding knowledge.
+You can decide in the `reply` function how to extract and use information from the `Knowledge` objects, and even modify the knowledge base through them, as sketched below.
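+
+Below is a minimal sketch of such a custom agent. It is not the official implementation: the class name and the prompt composition are illustrative, and it only assumes the standard `AgentBase` members (`self.model`, `self.sys_prompt`, `self.name`, `self.speak`) plus a `knowledge_list` filled by `KnowledgeBank.equip`:
+
+```python
+from agentscope.agents import AgentBase
+from agentscope.message import Msg
+
+
+class SimpleRAGAgent(AgentBase):
+    """A hypothetical agent that composes retrieved chunks into its prompt."""
+
+    def __init__(
+        self,
+        name: str,
+        sys_prompt: str,
+        model_config_name: str,
+        knowledge_id_list: list = None,
+    ) -> None:
+        super().__init__(
+            name=name,
+            sys_prompt=sys_prompt,
+            model_config_name=model_config_name,
+        )
+        # consumed by KnowledgeBank.equip, which fills self.knowledge_list
+        self.knowledge_id_list = knowledge_id_list or []
+        self.knowledge_list = []
+
+    def reply(self, x: dict = None) -> dict:
+        query = x["content"] if x is not None else ""
+
+        # retrieve from every equipped knowledge object
+        retrieved = []
+        for knowledge in self.knowledge_list:
+            retrieved += knowledge.retrieve(query, to_list_strs=True)
+
+        # compose the retrieved context and the query into the prompt
+        prompt = self.model.format(
+            Msg(name="system", role="system", content=self.sys_prompt),
+            Msg(
+                name="user",
+                role="user",
+                content="Context: " + "\n".join(retrieved) + "\nQuery: " + query,
+            ),
+        )
+        response = self.model(prompt).text
+        msg = Msg(self.name, response)
+        self.speak(msg)
+        return msg
+```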
+
+[[Back to the top]](#210-rag-zh)
+
+
+
diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/main.md b/docs/sphinx_doc/zh_CN/source/tutorial/main.md
index f3c605949..98fda818c 100644
--- a/docs/sphinx_doc/zh_CN/source/tutorial/main.md
+++ b/docs/sphinx_doc/zh_CN/source/tutorial/main.md
@@ -22,6 +22,7 @@ AgentScope是一款全新的Multi-Agent框架,专为应用开发者打造,
- [Pipeline和MsgHub](202-pipeline.md)
- [分布式](208-distribute.md)
- [AgentScope Studio](209-gui.md)
+- [Retrieval Augmented Generation (RAG)](210-rag.md)
- [日志](105-logging.md)
- [监控器](207-monitor.md)
- [样例:狼人杀游戏](104-usecase.md)
diff --git a/examples/conversation_with_RAG_agents/README.md b/examples/conversation_with_RAG_agents/README.md
index b09ee95e3..c8f9fb5c1 100644
--- a/examples/conversation_with_RAG_agents/README.md
+++ b/examples/conversation_with_RAG_agents/README.md
@@ -1,4 +1,4 @@
-# AgentScope Consultants: a Multi-Agent RAG Application
+# AgentScope Copilot: a Multi-Agent RAG Application
* **What is this example about?**
With the provided implementation and configuration,
@@ -7,7 +7,6 @@ you will obtain three different agents who can help you answer different questio
* **What is this example for?** By this example, we want to show how the agent with retrieval augmented generation (RAG)
capability can be used to build easily.
-**Notice:** This example is a Beta version of the AgentScope RAG agent. A formal version will soon be added to `src/agentscope/agents`, but it may be subject to changes.
## Prerequisites
* **Cloning repo:** This example requires cloning the whole AgentScope repo to local.
@@ -23,35 +22,27 @@ capability can be used to build easily.
**Note:** This example has been tested with `dashscope_chat` and `dashscope_text_embedding` model wrapper, with `qwen-max` and `text-embedding-v2` models.
However, you are welcome to replace the Dashscope language and embedding model wrappers or models with other models you like to test.
-## Start AgentScope Consultants
-* **Terminal:** The most simple way to execute the AgentScope Consultants is running in terminal.
+## Start AgentScope Copilot
+* **Terminal:** The simplest way to run AgentScope Copilot is from the terminal.
```bash
python ./rag_example.py
```
- Setting `log_retrieval` to `false` in `agent_config.json` can hide the retrieved information and provide only answers of agents.
+
* **AS studio:** If you want to have more organized, clean UI, you can also run with our `as_studio`.
```bash
as_studio ./rag_example.py
```
-### Customize AgentScope Consultants to other consultants
+### Agents in the example
-After you run the example, you may notice that this example consists of three RAG agents:
+After you run the example, you may notice that this example consists of the following agents:
-* `AgentScope Tutorial Assistant`: responsible for answering questions based on AgentScope tutorials (markdown files).
-* `AgentScope Framework Code Assistant`: responsible for answering questions based on AgentScope code base (python files).
-* `Summarize Assistant`: responsible for summarize the questions from the above two agents.
-
-These agents can be configured to answering questions based on other GitHub repo, by simply modifying the `input_dir` fields in the `agent_config.json`.
-
-For more advanced customization, we may need to learn a little bit from the following.
+* `Tutorial-Assistant`: responsible for answering questions based on AgentScope tutorials (markdown files).
+* `Code-Search-Assistant`: responsible for answering questions based on AgentScope code base (python files).
+* `API-Assistant`: responsible for answering questions based on the AgentScope API documentation (html files generated by `sphinx`).
+* `Searching-Assistant`: responsible for general search in the tutorial and code base (markdown files and code files).
+* `Agent-Guiding-Assistant`: responsible for referring users to the appropriate agent(s) among the above.
-**RAG modules:** In AgentScope, RAG modules are abstract to provide three basic functions: `load_data`, `store_and_index` and `retrieve`. Refer to `src/agentscope/rag` for more details.
+Except for the last `Agent-Guiding-Assistant`, all other agents can be configured to answer questions about other GitHub repos by replacing their `knowledge`.
-**RAG configs:** In the example configuration (the `rag_config` field), all parameters are optional. But if you want to customize them, you may want to learn the following:
-* `load_data`: contains all parameters for the the `rag.load_data` function.
-Since the `load_data` accepts a dataloader object `loader`, the `loader` in the config need to have `"create_object": true` to let a internal parse create a LlamaIndex data loader object.
-The loader object is an instance of `class` in module `module`, with initialization parameters in `init_args`.
+For more details about how to use the RAG module in AgentScope, please refer to the tutorial.
-* `store_and_index`: contains all parameters for the the `rag.store_and_index` function.
-For example, you can pass `vector_store` and `retriever` configurations in a similar way as the `loader` mentioned above.
-For the `transformations` parameter, you can pass a list of dicts, each of which corresponds to building a `NodeParser`-kind of preprocessor in Llamaindex.
\ No newline at end of file
diff --git a/examples/conversation_with_RAG_agents/agent_config.json b/examples/conversation_with_RAG_agents/agent_config.json
deleted file mode 100644
index fc0a23c12..000000000
--- a/examples/conversation_with_RAG_agents/agent_config.json
+++ /dev/null
@@ -1,79 +0,0 @@
-[
- {
- "class": "LlamaIndexAgent",
- "args": {
- "name": "AgentScope Tutorial Assistant",
- "sys_prompt": "You're a helpful assistant. You need to generate answers based on the provided context.",
- "model_config_name": "qwen_config",
- "emb_model_config_name": "qwen_emb_config",
- "rag_config": {
- "load_data": {
- "loader": {
- "create_object": true,
- "module": "llama_index.core",
- "class": "SimpleDirectoryReader",
- "init_args": {
- "input_dir": "../../docs/sphinx_doc/en/source/tutorial/",
- "required_exts": [".md"]
- }
- }
- },
- "chunk_size": 2048,
- "chunk_overlap": 40,
- "similarity_top_k": 10,
- "log_retrieval": false,
- "recent_n_mem": 1
- }
- }
- },
- {
- "class": "LlamaIndexAgent",
- "args": {
- "name": "AgentScope Framework Code Assistant",
- "sys_prompt": "You're a helpful assistant about coding. You can very familiar with the framework code of AgentScope.",
- "model_config_name": "qwen_config",
- "emb_model_config_name": "qwen_emb_config",
- "rag_config": {
- "load_data": {
- "loader": {
- "create_object": true,
- "module": "llama_index.core",
- "class": "SimpleDirectoryReader",
- "init_args": {
- "input_dir": "../../src/agentscope",
- "recursive": true,
- "required_exts": [".py"]
- }
- }
- },
- "store_and_index": {
- "transformations": [
- {
- "create_object": true,
- "module": "llama_index.core.node_parser",
- "class": "CodeSplitter",
- "init_args": {
- "language": "python",
- "chunk_lines": 100
- }
- }
- ]
- },
- "chunk_size": 2048,
- "chunk_overlap": 40,
- "similarity_top_k": 10,
- "log_retrieval": false,
- "recent_n_mem": 1
- }
- }
- },
- {
- "class": "DialogAgent",
- "args": {
- "name": "Summarize Assistant",
- "sys_prompt": "You are a helpful assistant that can summarize the answers of the previous two messages.",
- "model_config_name": "qwen_config",
- "use_memory": true
- }
- }
-]
\ No newline at end of file
diff --git a/examples/conversation_with_RAG_agents/configs/agent_config.json b/examples/conversation_with_RAG_agents/configs/agent_config.json
new file mode 100644
index 000000000..0aa4a12d3
--- /dev/null
+++ b/examples/conversation_with_RAG_agents/configs/agent_config.json
@@ -0,0 +1,64 @@
+[
+ {
+ "class": "LlamaIndexAgent",
+ "args": {
+ "name": "Tutorial-Assistant",
+ "description": "Tutorial-Assistant is an agent that can provide answer based on English tutorial material, mainly the markdown files. It can answer general questions about AgentScope.",
+ "sys_prompt": "You're an assistant helping new users to use AgentScope. The language style is helpful and cheerful. You generate answers based on the provided context. The answer is expected to be no longer than 100 words. If the key words of the question can be found in the provided context, the answer should contain the section name which contains the answer. For example, 'You may refer to SECTION_NAME for more details.'",
+ "model_config_name": "qwen_config",
+ "knowledge_id_list": ["agentscope_tutorial_rag"],
+ "similarity_top_k": 5,
+ "log_retrieval": false,
+ "recent_n_mem_for_retrieve": 1
+ }
+ },
+ {
+ "class": "LlamaIndexAgent",
+ "args": {
+ "name": "Code-Search-Assistant",
+ "description": "Code-Search-Assistant is an agent that can provide answer based on AgentScope code base. It can answer questions about specific modules in AgentScope.",
+ "sys_prompt": "You're a coding assistant of AgentScope. The answer starts with appreciation for the question, then provide details regarding the functionality and features of the modules mentioned in the question. The language should be in a professional and simple style. The answer is limited to be less than 100 words.",
+ "model_config_name": "qwen_config",
+ "knowledge_id_list": ["agentscope_code_rag"],
+ "similarity_top_k": 5,
+ "log_retrieval": false,
+ "recent_n_mem_for_retrieve": 1
+ }
+ },
+ {
+ "class": "LlamaIndexAgent",
+ "args": {
+ "name": "API-Assistant",
+ "description": "API-Assistant is an agent that can answer questions about APIs in AgentScope. It can answer general questions about AgentScope.",
+ "sys_prompt": "You're an assistant providing answers to the questions related to APIs (functions and classes) in AgentScope. The language style is helpful and cheerful. You generate answers based on the provided context. The answer is expected to be no longer than 200 words. If the key words of the question can be found in the provided context, the answer should contain the module of the API. For example, 'You may refer to MODULE_NAME for more details.'",
+ "model_config_name": "qwen_config",
+ "knowledge_id_list": ["agentscope_api_rag"],
+ "similarity_top_k": 2,
+ "log_retrieval": true,
+ "recent_n_mem_for_retrieve": 1
+ }
+ },
+ {
+ "class": "LlamaIndexAgent",
+ "args": {
+ "name": "Searching-Assistant",
+ "description": "Search-Assistant is an agent that can provide answer based on AgentScope code and tutorial. It can answer questions about everything in AgentScope codes and tutorials.",
+ "sys_prompt": "You're a helpful assistant of AgentScope. The answer starts with appreciation for the question, then provide output the location of the code or section that the most relevant to the question. The answer is limited to be less than 50 words.",
+ "model_config_name": "qwen_config",
+ "knowledge_id_list": ["agentscope_code_rag","agentscope_tutorial_rag"],
+ "similarity_top_k": 5,
+ "log_retrieval": false,
+ "recent_n_mem_for_retrieve": 1,
+ "persist_dir": "./rag_storage/searching_assist"
+ }
+ },
+ {
+ "class": "DialogAgent",
+ "args": {
+ "name": "Agent-Guiding-Assistant",
+ "sys_prompt": "You're an assistant guiding the user to specific agent for help. The answer is in a cheerful styled language. The output starts with appreciation for the question. Next, rephrase the question in a simple declarative Sentence for example, 'I think you are asking...'. Last, if the question is about detailed code or example in AgentScope Framework, output '@ Code-Search-Assistant you might be suitable for answering the question'; if the question is about API or function calls (Example: 'Is there function related...' or 'how can I initialize ...' ) in AgentScope, output '@ API-Assistant, I think you are more suitable for the question, please tell us more about it'; if question is about where to find some context (Example:'where can I find...'), output '@ Searching-Assistant, we need your help', otherwise, output '@ Tutorial-Assistant, I think you are more suitable for the question, can you tell us more about it?'. The answer is expected to be only one sentence",
+ "model_config_name": "qwen_config",
+ "use_memory": false
+ }
+ }
+]
\ No newline at end of file
diff --git a/examples/conversation_with_RAG_agents/configs/knowledge_config.json b/examples/conversation_with_RAG_agents/configs/knowledge_config.json
new file mode 100644
index 000000000..d7ef45542
--- /dev/null
+++ b/examples/conversation_with_RAG_agents/configs/knowledge_config.json
@@ -0,0 +1,114 @@
+[
+ {
+ "knowledge_id": "agentscope_code_rag",
+ "emb_model_config_name": "qwen_emb_config",
+ "chunk_size": 2048,
+ "chunk_overlap": 40,
+ "data_processing": [
+ {
+ "load_data": {
+ "loader": {
+ "create_object": true,
+ "module": "llama_index.core",
+ "class": "SimpleDirectoryReader",
+ "init_args": {
+ "input_dir": "../../src/agentscope",
+ "recursive": true,
+ "required_exts": [
+ ".py"
+ ]
+ }
+ }
+ },
+ "store_and_index": {
+ "transformations": [
+ {
+ "create_object": true,
+ "module": "llama_index.core.node_parser",
+ "class": "CodeSplitter",
+ "init_args": {
+ "language": "python",
+ "chunk_lines": 100
+ }
+ }
+ ]
+ }
+ }
+ ]
+ },
+ {
+ "knowledge_id": "agentscope_api_rag",
+ "emb_model_config_name": "qwen_emb_config",
+ "chunk_size": 1024,
+ "chunk_overlap": 40,
+ "data_processing": [
+ {
+ "load_data": {
+ "loader": {
+ "create_object": true,
+ "module": "llama_index.core",
+ "class": "SimpleDirectoryReader",
+ "init_args": {
+ "input_dir": "../../docs/docstring_html/",
+ "required_exts": [
+ ".html"
+ ]
+ }
+ }
+ }
+ }
+ ]
+ },
+ {
+ "knowledge_id": "agentscope_global_rag",
+ "emb_model_config_name": "qwen_emb_config",
+ "chunk_size": 2048,
+ "chunk_overlap": 40,
+ "data_processing": [
+ {
+ "load_data": {
+ "loader": {
+ "create_object": true,
+ "module": "llama_index.core",
+ "class": "SimpleDirectoryReader",
+ "init_args": {
+ "input_dir": "../../docs/sphinx_doc/en/source/tutorial",
+ "required_exts": [
+ ".md"
+ ]
+ }
+ }
+ }
+ },
+ {
+ "load_data": {
+ "loader": {
+ "create_object": true,
+ "module": "llama_index.core",
+ "class": "SimpleDirectoryReader",
+ "init_args": {
+ "input_dir": "../../src/agentscope",
+ "recursive": true,
+ "required_exts": [
+ ".py"
+ ]
+ }
+ }
+ },
+ "store_and_index": {
+ "transformations": [
+ {
+ "create_object": true,
+ "module": "llama_index.core.node_parser",
+ "class": "CodeSplitter",
+ "init_args": {
+ "language": "python",
+ "chunk_lines": 100
+ }
+ }
+ ]
+ }
+ }
+ ]
+ }
+]
\ No newline at end of file
diff --git a/examples/conversation_with_RAG_agents/configs/model_config.json b/examples/conversation_with_RAG_agents/configs/model_config.json
new file mode 100644
index 000000000..25ba628cd
--- /dev/null
+++ b/examples/conversation_with_RAG_agents/configs/model_config.json
@@ -0,0 +1,14 @@
+[
+ {
+ "model_type": "dashscope_text_embedding",
+ "config_name": "qwen_emb_config",
+ "model_name": "text-embedding-v2",
+ "api_key": ""
+ },
+ {
+ "model_type": "dashscope_chat",
+ "config_name": "qwen_config",
+ "model_name": "qwen-max",
+ "api_key": ""
+ }
+]
\ No newline at end of file
diff --git a/examples/conversation_with_RAG_agents/groupchat_utils.py b/examples/conversation_with_RAG_agents/groupchat_utils.py
new file mode 100644
index 000000000..24d422c57
--- /dev/null
+++ b/examples/conversation_with_RAG_agents/groupchat_utils.py
@@ -0,0 +1,37 @@
+# -*- coding: utf-8 -*-
+""" Group chat utils."""
+import re
+from typing import Any, Sequence
+
+
+def select_next_one(agents: Sequence, rnd: int) -> Any:
+    """
+    Select the next agent from the sequence in a round-robin manner.
+    """
+ return agents[rnd % len(agents)]
+
+
+def filter_agents(string: str, agents: Sequence) -> Sequence:
+ """
+ This function filters the input string for occurrences of the given names
+ prefixed with '@' and returns a list of the found names.
+ """
+ if len(agents) == 0:
+ return []
+
+ # Create a pattern that matches @ followed by any of the candidate names
+ pattern = (
+ r"@(" + "|".join(re.escape(agent.name) for agent in agents) + r")\b"
+ )
+
+ # Find all occurrences of the pattern in the string
+ matches = re.findall(pattern, string)
+
+ # Create a dictionary mapping agent names to agent objects for quick lookup
+ agent_dict = {agent.name: agent for agent in agents}
+
+ # Return the list of matched agent objects preserving the order
+ ordered_agents = [
+ agent_dict[name] for name in matches if name in agent_dict
+ ]
+ return ordered_agents
diff --git a/examples/conversation_with_RAG_agents/rag/__init__.py b/examples/conversation_with_RAG_agents/rag/__init__.py
deleted file mode 100644
index 3c8f48882..000000000
--- a/examples/conversation_with_RAG_agents/rag/__init__.py
+++ /dev/null
@@ -1,18 +0,0 @@
-# -*- coding: utf-8 -*-
-""" Import all pipeline related modules in the package. """
-from .rag import RAGBase
-
-from .llama_index_rag import LlamaIndexRAG
-
-
-try:
- from .langchain_rag import LangChainRAG
-except Exception:
- LangChainRAG = None # type: ignore # NOQA
-
-
-__all__ = [
- "RAGBase",
- "LlamaIndexRAG",
- "LangChainRAG",
-]
diff --git a/examples/conversation_with_RAG_agents/rag/langchain_rag.py b/examples/conversation_with_RAG_agents/rag/langchain_rag.py
deleted file mode 100644
index 36a329547..000000000
--- a/examples/conversation_with_RAG_agents/rag/langchain_rag.py
+++ /dev/null
@@ -1,208 +0,0 @@
-# -*- coding: utf-8 -*-
-"""
-This module is integrate the LangChain RAG model into our AgentScope package
-"""
-
-
-from typing import Any, Optional, Union
-
-try:
- from langchain_core.vectorstores import VectorStore
- from langchain_core.documents import Document
- from langchain_core.embeddings import Embeddings
- from langchain_community.document_loaders.base import BaseLoader
- from langchain_community.vectorstores import Chroma
- from langchain_text_splitters.base import TextSplitter
- from langchain_text_splitters import CharacterTextSplitter
-except ImportError:
- VectorStore = None
- Document = None
- Embeddings = None
- BaseLoader = None
- Chroma = None
- TextSplitter = None
- CharacterTextSplitter = None
-
-from examples.conversation_with_RAG_agents.rag import RAGBase
-from examples.conversation_with_RAG_agents.rag.rag import (
- DEFAULT_CHUNK_OVERLAP,
- DEFAULT_CHUNK_SIZE,
-)
-from agentscope.models import ModelWrapperBase
-
-
-class _LangChainEmbModel(Embeddings):
- """
- Dummy wrapper to convert the ModelWrapperBase embedding model
- to a LanguageChain RAG model
- """
-
- def __init__(self, emb_model: ModelWrapperBase) -> None:
- """
- Dummy wrapper
- Args:
- emb_model (ModelWrapperBase): embedding model of
- ModelWrapperBase type
- """
- self._emb_model_wrapper = emb_model
-
- def embed_documents(self, texts: list[str]) -> list[list[float]]:
- """
- Wrapper function for embedding list of documents
- Args:
- texts (list[str]): list of texts to be embedded
- """
- results = [
- list(self._emb_model_wrapper(t).embedding[0]) for t in texts
- ]
- return results
-
- def embed_query(self, text: str) -> list[float]:
- """
- Wrapper function for embedding a single query
- Args:
- text (str): query to be embedded
- """
- return list(self._emb_model_wrapper(text).embedding[0])
-
-
-class LangChainRAG(RAGBase):
- """
- This class is a wrapper around the LangChain RAG.
- """
-
- def __init__(
- self,
- model: Optional[ModelWrapperBase],
- emb_model: Union[ModelWrapperBase, Embeddings, None],
- config: Optional[dict] = None,
- **kwargs: Any,
- ) -> None:
- """
- Initializes the LangChainRAG
- Args:
- model (ModelWrapperBase):
- The language model used for final synthesis
- emb_model ( Union[ModelWrapperBase, Embeddings, None]):
- The embedding model used for generate embeddings
- config (dict):
- The additional configuration for llama index rag
- """
- super().__init__(model, emb_model, **kwargs)
-
- self.loader = None
- self.splitter = None
- self.retriever = None
- self.vector_store = None
-
- if VectorStore is None:
- raise ImportError(
- "Please install LangChain RAG packages to use LangChain RAG.",
- )
-
- self.config = config or {}
- if isinstance(emb_model, ModelWrapperBase):
- self.emb_model = _LangChainEmbModel(emb_model)
- elif isinstance(emb_model, Embeddings):
- self.emb_model = emb_model
- else:
- raise TypeError(
- f"Embedding model does not support {type(self.emb_model)}.",
- )
-
- def load_data(
- self,
- loader: BaseLoader,
- query: Optional[Any] = None,
- **kwargs: Any,
- ) -> list[Document]:
- # pylint: disable=unused-argument
- """
- Loading data from a directory
- Args:
- loader (BaseLoader):
- accepting a LangChain loader instance
- query (str):
- accepting a query, LangChain does not rely on this
- Returns:
- list[Document]: a list of documents loaded
- """
- self.loader = loader
- docs = self.loader.load()
- return docs
-
- def store_and_index(
- self,
- docs: Any,
- vector_store: Optional[VectorStore] = None,
- splitter: Optional[TextSplitter] = None,
- **kwargs: Any,
- ) -> Any:
- # pylint: disable=unused-argument
- """
- Preprocessing the loaded documents.
- Args:
- docs (Any):
- documents to be processed
- vector_store (Optional[VectorStore]):
- vector store in LangChain RAG
- splitter (Optional[TextSplitter]):
- optional, specifies the splitter to preprocess
- the documents
-
- Returns:
- None
-
- In LlamaIndex terms, an Index is a data structure composed
- of Document objects, designed to enable querying by an LLM.
- For example:
- 1) preprocessing documents with
- 2) generate embedding,
- 3) store the embedding-content to vdb
- """
- self.splitter = splitter or CharacterTextSplitter(
- chunk_size=self.config.get("chunk_size", DEFAULT_CHUNK_SIZE),
- chunk_overlap=self.config.get(
- "chunk_overlap",
- DEFAULT_CHUNK_OVERLAP,
- ),
- )
- all_splits = self.splitter.split_documents(docs)
-
- # indexing the chunks and store them into the vector store
- if vector_store is None:
- vector_store = Chroma()
- self.vector_store = vector_store.from_documents(
- documents=all_splits,
- embedding=self.emb_model,
- )
-
- # build retriever
- search_type = self.config.get("search_type", "similarity")
- self.retriever = self.vector_store.as_retriever(
- search_type=search_type,
- search_kwargs={
- "k": self.config.get("similarity_top_k", 6),
- },
- )
-
- def retrieve(self, query: Any, to_list_strs: bool = False) -> list[Any]:
- """
- This is a basic retrieve function with LangChain APIs
- Args:
- query: query is expected to be a question in string
-
- Returns:
- list of answers
-
- More advanced retriever can refer to
- https://python.langchain.com/docs/modules/data_connection/retrievers/
- """
-
- retrieved_docs = self.retriever.invoke(query)
- if to_list_strs:
- results = []
- for doc in retrieved_docs:
- results.append(doc.page_content)
- return results
- return retrieved_docs
diff --git a/examples/conversation_with_RAG_agents/rag/llama_index_rag.py b/examples/conversation_with_RAG_agents/rag/llama_index_rag.py
deleted file mode 100644
index 8756856ff..000000000
--- a/examples/conversation_with_RAG_agents/rag/llama_index_rag.py
+++ /dev/null
@@ -1,320 +0,0 @@
-# -*- coding: utf-8 -*-
-"""
-This module is an integration of the Llama index RAG
-into AgentScope package
-"""
-
-from typing import Any, Optional, List, Union
-from loguru import logger
-
-try:
- from llama_index.core.readers.base import BaseReader
- from llama_index.core.base.base_retriever import BaseRetriever
- from llama_index.core.base.embeddings.base import BaseEmbedding, Embedding
- from llama_index.core.ingestion import IngestionPipeline
- from llama_index.core.vector_stores.types import (
- BasePydanticVectorStore,
- VectorStore,
- )
- from llama_index.core.bridge.pydantic import PrivateAttr
- from llama_index.core.node_parser.interface import NodeParser
- from llama_index.core.node_parser import SentenceSplitter
- from llama_index.core import (
- VectorStoreIndex,
- )
-except ImportError:
- BaseReader, BaseRetriever = None, None
- BaseEmbedding, Embedding = None, None
- IngestionPipeline, BasePydanticVectorStore, VectorStore = None, None, None
- NodeParser, SentenceSplitter = None, None
- VectorStoreIndex = None
- PrivateAttr = None
-
-from rag import RAGBase
-from rag.rag import (
- DEFAULT_CHUNK_SIZE,
- DEFAULT_CHUNK_OVERLAP,
- DEFAULT_TOP_K,
-)
-from agentscope.models import ModelWrapperBase
-
-
-class _EmbeddingModel(BaseEmbedding):
- """
- wrapper for ModelWrapperBase to an embedding model can be used
- in Llama Index pipeline.
- """
-
- _emb_model_wrapper: ModelWrapperBase = PrivateAttr()
-
- def __init__(
- self,
- emb_model: ModelWrapperBase,
- embed_batch_size: int = 1,
- ) -> None:
- """
- Dummy wrapper to convert a ModelWrapperBase to llama Index
- embedding model
-
- Args:
- emb_model (ModelWrapperBase): embedding model in ModelWrapperBase
- embed_batch_size (int): batch size, defaults to 1
- """
- super().__init__(
- model_name="Temporary_embedding_wrapper",
- embed_batch_size=embed_batch_size,
- )
- self._emb_model_wrapper = emb_model
-
- def _get_query_embedding(self, query: str) -> List[float]:
- """
- get embedding for query
- Args:
- query (str): query to be embedded
- """
- # Note: AgentScope embedding model wrapper returns list of embedding
- return list(self._emb_model_wrapper(query).embedding[0])
-
- def _get_text_embeddings(self, texts: List[str]) -> List[Embedding]:
- """
- get embedding for list of strings
- Args:
- texts ( List[str]): texts to be embedded
- """
- results = [
- list(self._emb_model_wrapper(t).embedding[0]) for t in texts
- ]
- return results
-
- def _get_text_embedding(self, text: str) -> Embedding:
- """
- get embedding for a single string
- Args:
- text (str): texts to be embedded
- """
- return list(self._emb_model_wrapper(text).embedding[0])
-
- # TODO: use proper async methods, but depends on model wrapper
- async def _aget_query_embedding(self, query: str) -> List[float]:
- """The asynchronous version of _get_query_embedding."""
- return self._get_query_embedding(query)
-
- async def _aget_text_embedding(self, text: str) -> List[float]:
- """Asynchronously get text embedding."""
- return self._get_text_embedding(text)
-
- async def _aget_text_embeddings(
- self,
- texts: List[str],
- ) -> List[List[float]]:
- """Asynchronously get text embeddings."""
- return self._get_text_embeddings(texts)
-
-
-class LlamaIndexRAG(RAGBase):
- """
- This class is a wrapper with the llama index RAG.
- """
-
- def __init__(
- self,
- model: Optional[ModelWrapperBase],
- emb_model: Union[ModelWrapperBase, BaseEmbedding, None] = None,
- config: Optional[dict] = None,
- **kwargs: Any,
- ) -> None:
- """
- RAG component based on llama index.
- Args:
- model (ModelWrapperBase):
- The language model used for final synthesis
- emb_model (Optional[ModelWrapperBase]):
- The embedding model used for generate embeddings
- config (dict):
- The additional configuration for llama index rag
- """
- super().__init__(model, emb_model, config, **kwargs)
- self.retriever = None
- self.index = None
- self.persist_dir = kwargs.get("persist_dir", "/")
- self.emb_model = emb_model
- print(self.config)
-
- # ensure the emb_model is compatible with LlamaIndex
- if isinstance(emb_model, ModelWrapperBase):
- self.emb_model = _EmbeddingModel(emb_model)
- elif isinstance(self.emb_model, BaseEmbedding):
- pass
- else:
- raise TypeError(
- f"Embedding model does not support {type(self.emb_model)}.",
- )
-
- def load_data(
- self,
- loader: BaseReader,
- query: Optional[str] = None,
- **kwargs: Any,
- ) -> Any:
- """
- Accept a loader, loading the desired data (no chunking)
- Args:
- loader (BaseReader):
- object to load data, expected be an instance of class
- inheriting from BaseReader in llama index.
- query (Optional[str]):
- optional, used when the data is in a database.
-
- Returns:
- Any: loaded documents
-
- Example 1: use simple directory loader to load general documents,
- including Markdown, PDFs, Word documents, PowerPoint decks, images,
- audio and video.
- ```
- load_data_to_chunks(
- loader=SimpleDirectoryReader("./data")
- )
- ```
-
- Example 2: use SQL loader
- ```
- load_data_to_chunks(
- DatabaseReader(
- scheme=os.getenv("DB_SCHEME"),
- host=os.getenv("DB_HOST"),
- port=os.getenv("DB_PORT"),
- user=os.getenv("DB_USER"),
- password=os.getenv("DB_PASS"),
- dbname=os.getenv("DB_NAME"),
- ),
- query = "SELECT * FROM users"
- )
- ```
- """
- if query is None:
- documents = loader.load_data()
- else:
- documents = loader.load_data(query)
- logger.info(f"loaded {len(documents)} documents")
- return documents
-
- def store_and_index(
- self,
- docs: Any,
- vector_store: Union[BasePydanticVectorStore, VectorStore, None] = None,
- retriever: Optional[BaseRetriever] = None,
- transformations: Optional[list[NodeParser]] = None,
- **kwargs: Any,
- ) -> Any:
- """
- Preprocessing the loaded documents.
- Args:
- docs (Any):
- documents to be processed, usually expected to be in
- llama index Documents.
- vector_store (Union[BasePydanticVectorStore, VectorStore, None]):
- vector store in llama index
- retriever (Optional[BaseRetriever]):
- optional, specifies the retriever in llama index to be used
- transformations (Optional[list[NodeParser]]):
- optional, specifies the transformations (operators) to
- process documents (e.g., split the documents into smaller
- chunks)
-
- Return:
- Any: return the index of the processed document
-
- In LlamaIndex terms, an Index is a data structure composed
- of Document objects, designed to enable querying by an LLM.
- For example:
- 1) preprocessing documents with
- 2) generate embedding,
- 3) store the embedding-content to vdb
- """
- # build and run preprocessing pipeline
- if transformations is None:
- transformations = [
- SentenceSplitter(
- chunk_size=self.config.get(
- "chunk_size",
- DEFAULT_CHUNK_SIZE,
- ),
- chunk_overlap=self.config.get(
- "chunk_overlap",
- DEFAULT_CHUNK_OVERLAP,
- ),
- ),
- ]
-
- # adding embedding model as the last step of transformation
- # https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/root.html
- transformations.append(self.emb_model)
-
- if vector_store is not None:
- pipeline = IngestionPipeline(
- transformations=transformations,
- vector_store=vector_store,
- )
- _ = pipeline.run(docs)
- self.index = VectorStoreIndex.from_vector_store(vector_store)
- else:
- # No vector store is provide, use simple in memory
- pipeline = IngestionPipeline(
- transformations=transformations,
- )
- nodes = pipeline.run(documents=docs)
- self.index = VectorStoreIndex(
- nodes=nodes,
- embed_model=self.emb_model,
- )
-
- # set the retriever
- if retriever is None:
- logger.info(
- f'{self.config.get("similarity_top_k", DEFAULT_TOP_K)}',
- )
- self.retriever = self.index.as_retriever(
- embed_model=self.emb_model,
- similarity_top_k=self.config.get(
- "similarity_top_k",
- DEFAULT_TOP_K,
- ),
- **kwargs,
- )
- else:
- self.retriever = retriever
- return self.index
-
- def set_retriever(self, retriever: BaseRetriever) -> None:
- """
- Reset the retriever if necessary.
- Args:
- retriever (BaseRetriever): passing a retriever in llama index.
- """
- self.retriever = retriever
-
- def retrieve(self, query: str, to_list_strs: bool = False) -> list[Any]:
- """
- This is a basic retrieve function
- Args:
- query (str):
- query is expected to be a question in string
- to_list_strs (book):
- whether returns the list of strings;
- if False, return NodeWithScore
-
- Return:
- list[Any]: list of str or NodeWithScore
-
-
- More advanced query processing can refer to
- https://docs.llamaindex.ai/en/stable/examples/query_transformations/query_transform_cookbook.html
- """
- retrieved = self.retriever.retrieve(str(query))
- if to_list_strs:
- results = []
- for node in retrieved:
- results.append(node.get_text())
- return results
- return retrieved
diff --git a/examples/conversation_with_RAG_agents/rag/rag.py b/examples/conversation_with_RAG_agents/rag/rag.py
deleted file mode 100644
index 0de27ca37..000000000
--- a/examples/conversation_with_RAG_agents/rag/rag.py
+++ /dev/null
@@ -1,118 +0,0 @@
-# -*- coding: utf-8 -*-
-"""
-Base class module for retrieval augmented generation (RAG).
-To accommodate the RAG process of different packages,
-we abstract the RAG process into four stages:
-- data loading: loading data into memory for following processing;
-- data indexing and storage: document chunking, embedding generation,
-and off-load the data into VDB;
-- data retrieval: taking a query and return a batch of documents or
-document chunks;
-- post-processing of the retrieved data: use the retrieved data to
-generate an answer.
-"""
-
-from abc import ABC, abstractmethod
-from typing import Any, Optional
-
-from agentscope.models import ModelWrapperBase
-
-DEFAULT_CHUNK_SIZE = 1024
-DEFAULT_CHUNK_OVERLAP = 20
-DEFAULT_TOP_K = 5
-
-
-class RAGBase(ABC):
- """
- Base class for RAG, CANNOT be instantiated directly
- """
-
- def __init__(
- self,
- model: Optional[ModelWrapperBase],
- emb_model: Any = None,
- config: Optional[dict] = None,
- **kwargs: Any,
- ) -> None:
- # pylint: disable=unused-argument
- self.postprocessing_model = model
- self.emb_model = emb_model
- self.config = config or {}
-
- @abstractmethod
- def load_data(
- self,
- loader: Any,
- query: Any,
- **kwargs: Any,
- ) -> Any:
- """
- Load data (documents) from disk to memory and chunking them
- Args:
- loader (Any): data loader, depending on the package
- query (str): query for getting data from DB
-
- Returns:
- Any: loaded documents
- """
-
- @abstractmethod
- def store_and_index(
- self,
- docs: Any,
- vector_store: Any,
- **kwargs: Any,
- ) -> Any:
- """
- Store and index the documents.
- Args:
- docs (Any):
- documents to be processed, stored and indexed
- vector_store (Any):
- vector store to store the index and/or documents
-
- Returns:
- Any: can be indices, depending on the RAG package
-
- preprocessing the loaded documents, for example:
- 1) chunking,
- 2) generate embedding,
- 3) store the embedding-content to vdb
- """
-
- @abstractmethod
- def retrieve(self, query: Any, to_list_strs: bool = False) -> list[Any]:
- """
- retrieve list of content from vdb to memory
- Args:
- query (Any): query to retrieve
- to_list_strs (bool): whether return a list of str
-
- Returns:
- return a list with retrieved documents (in strings)
- """
-
- def post_processing(
- self,
- retrieved_docs: list[str],
- prompt: str,
- **kwargs: Any,
- ) -> Any:
- """
- A default solution for post-processing function, generates answer
- based on the retrieved documents.
- Args:
- retrieved_docs (list[str]):
- list of retrieved documents
- prompt (str):
- prompt for LLM generating answer with the retrieved documents
-
- Returns:
- Any: a synthesized answer from LLM with retrieved documents
-
- Example:
- self.postprocessing_model(prompt.format(retrieved_docs))
- """
- assert self.postprocessing_model
- prompt = prompt.format("\n".join(retrieved_docs))
- return self.postprocessing_model(prompt, **kwargs).text
diff --git a/examples/conversation_with_RAG_agents/rag_agents.py b/examples/conversation_with_RAG_agents/rag_agents.py
deleted file mode 100644
index 101b2e305..000000000
--- a/examples/conversation_with_RAG_agents/rag_agents.py
+++ /dev/null
@@ -1,332 +0,0 @@
-# -*- coding: utf-8 -*-
-"""
-This example shows how to build an agent with RAG
-with LlamaIndex.
-
-Notice, this is a Beta version of RAG agent.
-"""
-
-from abc import ABC, abstractmethod
-from typing import Optional, Any
-import importlib
-from loguru import logger
-
-from rag import RAGBase, LlamaIndexRAG
-
-from agentscope.agents.agent import AgentBase
-from agentscope.message import Msg
-from agentscope.models import load_model_by_config_name
-
-
-class RAGAgentBase(AgentBase, ABC):
- """
- Base class for RAG agents
- """
-
- def __init__(
- self,
- name: str,
- sys_prompt: str,
- model_config_name: str,
- emb_model_config_name: str,
- memory_config: Optional[dict] = None,
- rag_config: Optional[dict] = None,
- ) -> None:
- """
- Initialize the RAG base agent
- Args:
- name (str):
- the name for the agent.
- sys_prompt (str):
- system prompt for the RAG agent.
- model_config_name (str):
- language model for the agent.
- emb_model_config_name (str):
- embedding model for the agent.
- memory_config (dict):
- memory configuration.
- rag_config (dict):
- config for RAG. It contains most of the
- important parameters for RAG modules. If not provided,
- the default setting will be used.
- Examples can refer to children classes.
- """
- super().__init__(
- name=name,
- sys_prompt=sys_prompt,
- model_config_name=model_config_name,
- use_memory=True,
- memory_config=memory_config,
- )
- # setup embedding model used in RAG
- self.emb_model = load_model_by_config_name(emb_model_config_name)
-
- self.rag_config = rag_config or {}
- if "log_retrieval" not in self.rag_config:
- self.rag_config["log_retrieval"] = True
-
- # use LlamaIndexAgent OR LangChainAgent
- self.rag = self.init_rag()
-
- @abstractmethod
- def init_rag(self) -> RAGBase:
- """initialize RAG with configuration"""
-
- def _prepare_args_from_config(
- self,
- config: dict,
- ) -> Any:
- """
- Helper function to build args for the two functions:
- rag.load_data(...) and rag.store_and_index(docs, ...)
- in RAG classes.
- Args:
- config (dict): a dictionary containing configurations
-
- Returns:
- Any: an object that is parsed/built to be an element
- of input to the function of RAG module.
- """
- if not isinstance(config, dict):
- return config
-
- if "create_object" in config:
- # if a term in args is a object,
- # recursively create object with args from config
- module_name = config.get("module", "")
- class_name = config.get("class", "")
- init_args = config.get("init_args", {})
- try:
- cur_module = importlib.import_module(module_name)
- cur_class = getattr(cur_module, class_name)
- init_args = self._prepare_args_from_config(init_args)
- logger.info(
- f"load and build object{cur_module, cur_class, init_args}",
- )
- return cur_class(**init_args)
- except ImportError as exc_inner:
- logger.error(
- f"Fail to load class {class_name} "
- f"from module {module_name}",
- )
- raise ImportError(
- f"Fail to load class {class_name} "
- f"from module {module_name}",
- ) from exc_inner
- else:
- prepared_args = {}
- for key, value in config.items():
- if isinstance(value, list):
- prepared_args[key] = []
- for c in value:
- prepared_args[key].append(
- self._prepare_args_from_config(c),
- )
- elif isinstance(value, dict):
- prepared_args[key] = self._prepare_args_from_config(value)
- else:
- prepared_args[key] = value
- return prepared_args
-
- def reply(
- self,
- x: dict = None,
- ) -> dict:
- """
- Reply function of the RAG agent.
- Processes the input data,
- 1) use the input data to retrieve with RAG function;
- 2) generates a prompt using the current memory and system
- prompt;
- 3) invokes the language model to produce a response. The
- response is then formatted and added to the dialogue memory.
-
- Args:
- x (`dict`, defaults to `None`):
- A dictionary representing the user's input to the agent. This
- input is added to the memory if provided. Defaults to
- None.
- Returns:
- A dictionary representing the message generated by the agent in
- response to the user's input.
- """
- retrieved_docs_to_string = ""
- # record the input if needed
- if self.memory:
- self.memory.add(x)
- # in case no input is provided (e.g., in msghub),
- # use the memory as query
- history = self.memory.get_memory(
- recent_n=self.rag_config.get("recent_n_mem", 1),
- )
- query = (
- "/n".join(
- [msg["content"] for msg in history],
- )
- if isinstance(history, list)
- else str(history)
- )
- elif x is not None:
- query = x["content"]
- else:
- query = ""
-
- if len(query) > 0:
- # when content has information, do retrieval
- retrieved_docs = self.rag.retrieve(query, to_list_strs=True)
- for content in retrieved_docs:
- retrieved_docs_to_string += "\n>>>> " + content
-
- if self.rag_config["log_retrieval"]:
- self.speak("[retrieved]:" + retrieved_docs_to_string)
-
- # prepare prompt
- prompt = self.model.format(
- Msg(
- name="system",
- role="system",
- content=self.sys_prompt,
- ),
- # {"role": "system", "content": retrieved_docs_to_string},
- self.memory.get_memory(
- recent_n=self.rag_config.get("recent_n_mem", 1),
- ),
- Msg(
- name="user",
- role="user",
- content="Context: " + retrieved_docs_to_string,
- ),
- )
-
- # call llm and generate response
- response = self.model(prompt).text
- msg = Msg(self.name, response)
-
- # Print/speak the message in this agent's voice
- self.speak(msg)
-
- if self.memory:
- # Record the message in memory
- self.memory.add(msg)
-
- return msg
-
-
-class LlamaIndexAgent(RAGAgentBase):
- """
- A LlamaIndex agent build on LlamaIndex.
- """
-
- def __init__(
- self,
- name: str,
- sys_prompt: str,
- model_config_name: str,
- emb_model_config_name: str = None,
- memory_config: Optional[dict] = None,
- rag_config: Optional[dict] = None,
- ) -> None:
- """
- Initialize the RAG LlamaIndexAgent
- Args:
- name (str):
- the name for the agent
- sys_prompt (str):
- system prompt for the RAG agent
- model_config_name (str):
- language model for the agent
- emb_model_config_name (str):
- embedding model for the agent
- memory_config (dict):
- memory configuration
- rag_config (dict):
- config for RAG. It contains the parameters for
- RAG modules functions:
- rag.load_data(...) and rag.store_and_index(docs, ...)
- If not provided, the default setting will be used.
- An example of the config for retrieving code files
- is as following:
-
- "rag_config": {
- "load_data": {
- "loader": {
- "create_object": true,
- "module": "llama_index.core",
- "class": "SimpleDirectoryReader",
- "init_args": {
- "input_dir": "path/to/data",
- "recursive": true
- ...
- }
- }
- },
- "store_and_index": {
- "transformations": [
- {
- "create_object": true,
- "module": "llama_index.core.node_parser",
- "class": "CodeSplitter",
- "init_args": {
- "language": "python",
- "chunk_lines": 100
- }
- }
- ]
- },
- "chunk_size": 2048,
- "chunk_overlap": 40,
- "similarity_top_k": 10,
- "log_retrieval": true,
- "recent_n_mem": 1
- }
- """
- super().__init__(
- name=name,
- sys_prompt=sys_prompt,
- model_config_name=model_config_name,
- emb_model_config_name=emb_model_config_name,
- memory_config=memory_config,
- rag_config=rag_config,
- )
-
- def init_rag(self) -> LlamaIndexRAG:
- # dynamic loading loader
- # init rag related attributes
- rag = LlamaIndexRAG(
- model=self.model,
- emb_model=self.emb_model,
- config=self.rag_config,
- )
- # load the document to memory
- # Feed the AgentScope tutorial documents, so that
- # the agent can answer questions related to AgentScope!
- if "load_data" in self.rag_config:
- load_data_args = self._prepare_args_from_config(
- self.rag_config["load_data"],
- )
- else:
- try:
- from llama_index.core import SimpleDirectoryReader
- except ImportError as exc_inner:
- raise ImportError(
- " LlamaIndexAgent requires llama-index to be install."
- "Please run `pip install llama-index`",
- ) from exc_inner
- load_data_args = {
- "loader": SimpleDirectoryReader(self.config["data_path"]),
- }
- logger.info(f"rag.load_data args: {load_data_args}")
- docs = rag.load_data(**load_data_args)
-
- # store and indexing
- if "store_and_index" in self.rag_config:
- store_and_index_args = self._prepare_args_from_config(
- self.rag_config["store_and_index"],
- )
- else:
- store_and_index_args = {}
-
- logger.info(f"store_and_index_args args: {store_and_index_args}")
- rag.store_and_index(docs, **store_and_index_args)
-
- return rag
diff --git a/examples/conversation_with_RAG_agents/rag_example.py b/examples/conversation_with_RAG_agents/rag_example.py
index d000f459c..283c014b2 100644
--- a/examples/conversation_with_RAG_agents/rag_example.py
+++ b/examples/conversation_with_RAG_agents/rag_example.py
@@ -1,64 +1,147 @@
# -*- coding: utf-8 -*-
"""
-A simple example for conversation between user and
-an agent with RAG capability.
+An example for conversation between user and agents with RAG capability.
+One agent is a tutorial assistant, the other is a code explainer.
"""
import json
import os
-from rag_agents import LlamaIndexAgent
+from groupchat_utils import filter_agents
import agentscope
from agentscope.agents import UserAgent
-from agentscope.message import Msg
-from agentscope.agents import DialogAgent
+from agentscope.rag import KnowledgeBank
+
+
+AGENT_CHOICE_PROMPT = """
+The following agents are available. You need to choose the most appropriate
+agent(s) to answer the user's question.
+
+agent descriptions:{}
+
+First, rephrase the user's question so that it contains the key information.
+Then you need to think step by step. If you believe some of the agents are
+good candidates to answer the question (e.g., AGENT_1 and AGENT_2), then
+you need to generate your output in the following format:
+
+'
+Because $YOUR_REASONING.
+I believe @AGENT_1 and @AGENT_2 are the most appropriate agents to answer
+your question.
+'
+"""
+
+
+def prepare_docstring_html() -> None:
+ """prepare docstring in html for API assistant"""
+ if not os.path.exists("../../docs/docstring_html/"):
+ os.system(
+ "sphinx-apidoc -f -o ../../docs/sphinx_doc/en/source "
+ "../../src/agentscope -t template",
+ )
+ os.system(
+ "sphinx-build -b html ../../docs/sphinx_doc/en/source "
+ "../../docs/docstring_html/ -W --keep-going",
+ )
def main() -> None:
"""A RAG multi-agent demo"""
- agentscope.init(
- model_configs=[
- {
- "model_type": "dashscope_chat",
- "config_name": "qwen_config",
- "model_name": "qwen-max",
- "api_key": f"{os.environ.get('DASHSCOPE_API_KEY')}",
- },
- {
- "model_type": "dashscope_text_embedding",
- "config_name": "qwen_emb_config",
- "model_name": "text-embedding-v2",
- "api_key": f"{os.environ.get('DASHSCOPE_API_KEY')}",
- },
- ],
+ # prepare html for api agent
+ prepare_docstring_html()
+
+ # prepare models
+ with open("configs/model_config.json", "r", encoding="utf-8") as f:
+ model_configs = json.load(f)
+
+ # load config of the agents
+ with open("configs/agent_config.json", "r", encoding="utf-8") as f:
+ agent_configs = json.load(f)
+
+ agent_list = agentscope.init(
+ model_configs=model_configs,
+ agent_configs=agent_configs,
project="Conversation with RAG agents",
)
+ rag_agent_list = agent_list[:4]
+ guide_agent = agent_list[4]
- with open("./agent_config.json", "r", encoding="utf-8") as f:
- agent_configs = json.load(f)
- tutorial_agent = LlamaIndexAgent(**agent_configs[0]["args"])
- code_explain_agent = LlamaIndexAgent(**agent_configs[1]["args"])
- summarize_agent = DialogAgent(**agent_configs[2]["args"])
+ # the knowledge bank can be configured by loading config file
+ knowledge_bank = KnowledgeBank(configs="configs/knowledge_config.json")
+
+ # alternatively, we can easily input the configs to add data to RAG
+ knowledge_bank.add_data_as_knowledge(
+ knowledge_id="agentscope_tutorial_rag",
+ emb_model_name="qwen_emb_config",
+ data_dirs_and_types={
+ "../../docs/sphinx_doc/en/source/tutorial": [".md"],
+ },
+ )
+
+    # let the knowledge bank equip each rag agent with the knowledge
+ # corresponding to its knowledge_id_list
+ for agent in rag_agent_list:
+ knowledge_bank.equip(agent, agent.knowledge_id_list)
+
+ # an alternative way is to provide knowledge list to agents
+ # when initializing them one by one, e.g.
+ #
+ # ```
+ # knowledge = knowledge_bank.get_knowledge(knowledge_id)
+ # agent = LlamaIndexAgent(
+ # name="rag_worker",
+ # sys_prompt="{your_prompt}",
+ # model_config_name="{your_model}",
+ # knowledge_list=[knowledge], # provide knowledge object directly
+ # similarity_top_k=3,
+ # log_retrieval=False,
+ # recent_n_mem_for_retrieve=1,
+ # )
+ # ```
+
+ rag_agent_names = [agent.name for agent in rag_agent_list]
+ # update guide agent system prompt with the descriptions of rag agents
+ rag_agent_descriptions = [
+ "agent name: "
+ + agent.name
+ + "\n agent description:"
+ + agent.description
+ + "\n"
+ for agent in rag_agent_list
+ ]
+
+ guide_agent.sys_prompt = (
+ guide_agent.sys_prompt
+ + AGENT_CHOICE_PROMPT.format(
+ "".join(rag_agent_descriptions),
+ )
+ )
user_agent = UserAgent()
- # start the conversation between user and assistant
while True:
+ # The workflow is the following:
+        # 1. the user inputs a message,
+ # 2. if it mentions (@) one of the agents, the agent will be called
+ # 3. otherwise, the guide agent will decide which agent to call
+ # 4. the called agent will respond to the user
+ # 5. repeat
x = user_agent()
x.role = "user" # to enforce dashscope requirement on roles
if len(x["content"]) == 0 or str(x["content"]).startswith("exit"):
break
- tutorial_response = tutorial_agent(x)
- code_explain = code_explain_agent(x)
- msg = Msg(
- name="user",
- role="user",
- content=tutorial_response["content"]
- + "\n"
- + code_explain["content"]
- + "\n"
- + x["content"],
- )
- summarize_agent(msg)
+ speak_list = filter_agents(x.get("content", ""), rag_agent_list)
+ if len(speak_list) == 0:
+ guide_response = guide_agent(x)
+ # Only one agent can be called in the current version,
+ # we may support multi-agent conversation later
+ speak_list = filter_agents(
+ guide_response.get("content", ""),
+ rag_agent_list,
+ )
+ agent_name_list = [agent.name for agent in speak_list]
+ for agent_name, agent in zip(agent_name_list, speak_list):
+ if agent_name in rag_agent_names:
+ agent(x)
if __name__ == "__main__":
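In the loop above, `filter_agents` (imported from `groupchat_utils`) decides which RAG agents are addressed, either directly by the user or via the guide agent's answer. The helper below is a minimal, hypothetical sketch of such an "@"-mention filter; the actual implementation in `groupchat_utils.py` may differ, and the function name here is invented for illustration only.

```python
import re
from typing import Sequence


def filter_mentioned_agents(content: str, agents: Sequence) -> list:
    """Hypothetical sketch: return the agents whose names are @-mentioned
    in `content`, preserving the order of the mentions."""
    mentioned = []
    for match in re.finditer(r"@([\w-]+)", content or ""):
        name = match.group(1)
        for agent in agents:
            # each agent is assumed to expose a `.name` attribute
            if agent.name == name and agent not in mentioned:
                mentioned.append(agent)
    return mentioned
```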
diff --git a/setup.py b/setup.py
index 999041a4f..e5ea88322 100644
--- a/setup.py
+++ b/setup.py
@@ -49,6 +49,10 @@
"modelscope_studio==0.0.5",
]
+rag_requires = [
+ "llama-index",
+]
+
studio_requires = []
# released requires
@@ -91,6 +95,7 @@
+ doc_requires
+ test_requires
+ gradio_requires
+ + rag_requires
+ studio_requires
)
diff --git a/src/agentscope/__init__.py b/src/agentscope/__init__.py
index c822a1ad6..bd0c621b7 100644
--- a/src/agentscope/__init__.py
+++ b/src/agentscope/__init__.py
@@ -12,6 +12,7 @@
from . import web
from . import exception
from . import parsers
+from . import rag
# objects or function
from .msghub import msghub
diff --git a/src/agentscope/agents/__init__.py b/src/agentscope/agents/__init__.py
index 7bc5f83e5..e50efa66f 100644
--- a/src/agentscope/agents/__init__.py
+++ b/src/agentscope/agents/__init__.py
@@ -8,6 +8,7 @@
from .text_to_image_agent import TextToImageAgent
from .rpc_agent import RpcAgent
from .react_agent import ReActAgent
+from .rag_agent import LlamaIndexAgent
__all__ = [
@@ -20,4 +21,5 @@
"ReActAgent",
"DistConf",
"RpcAgent",
+ "LlamaIndexAgent",
]
diff --git a/src/agentscope/agents/rag_agent.py b/src/agentscope/agents/rag_agent.py
new file mode 100644
index 000000000..59012b17c
--- /dev/null
+++ b/src/agentscope/agents/rag_agent.py
@@ -0,0 +1,195 @@
+# -*- coding: utf-8 -*-
+"""
+This module provides an agent with RAG capability
+built on LlamaIndex.
+
+Notice: this is a beta version of the RAG agent.
+"""
+
+from typing import Any
+from loguru import logger
+
+from agentscope.agents.agent import AgentBase
+from agentscope.message import Msg
+from agentscope.rag import Knowledge
+
+CHECKING_PROMPT = """
+ Is the retrieved content relevant to the query?
+ Retrieved content: {}
+ Query: {}
+ Only answer YES or NO.
+ """
+
+
+class LlamaIndexAgent(AgentBase):
+ """
+    A RAG agent built on LlamaIndex.
+ """
+
+ def __init__(
+ self,
+ name: str,
+ sys_prompt: str,
+ model_config_name: str,
+ knowledge_list: list[Knowledge] = None,
+ knowledge_id_list: list[str] = None,
+ similarity_top_k: int = None,
+ log_retrieval: bool = True,
+ recent_n_mem_for_retrieve: int = 1,
+ **kwargs: Any,
+ ) -> None:
+ """
+ Initialize the RAG LlamaIndexAgent
+ Args:
+ name (str):
+ the name for the agent
+ sys_prompt (str):
+ system prompt for the RAG agent
+ model_config_name (str):
+ language model for the agent
+ knowledge_list (list[Knowledge]):
+                a list of knowledge objects.
+                Users can choose to pass a list of knowledge objects
+                directly when initializing the RAG agent. Another
+                choice is to pass a list of knowledge ids and
+                obtain the knowledge objects with the `equip`
+                function of a knowledge bank.
+            knowledge_id_list (list[str]):
+                a list of knowledge ids.
+                This is designed to make it easy to set up multiple
+                RAG agents with a config file. To obtain the knowledge
+                objects, users can pass this agent to the `equip`
+                function of a knowledge bank, which adds the
+                corresponding knowledge to the agent's
+                self.knowledge_list.
+ similarity_top_k (int):
+ the number of most similar data blocks retrieved
+                from each of the knowledge objects
+ log_retrieval (bool):
+ whether to print the retrieved content
+ recent_n_mem_for_retrieve (int):
+ the number of pieces of memory used as part of
+                the retrieval query
+ """
+ super().__init__(
+ name=name,
+ sys_prompt=sys_prompt,
+ model_config_name=model_config_name,
+ )
+ self.knowledge_list = knowledge_list or []
+ self.knowledge_id_list = knowledge_id_list or []
+ self.similarity_top_k = similarity_top_k
+ self.log_retrieval = log_retrieval
+ self.recent_n_mem_for_retrieve = recent_n_mem_for_retrieve
+ self.description = kwargs.get("description", "")
+
+ def reply(self, x: dict = None) -> dict:
+ """
+ Reply function of the RAG agent.
+        It processes the input data as follows:
+        1) uses the input data to retrieve relevant content with the
+        RAG module;
+ 2) generates a prompt using the current memory and system
+ prompt;
+ 3) invokes the language model to produce a response. The
+ response is then formatted and added to the dialogue memory.
+
+ Args:
+ x (`dict`, defaults to `None`):
+ A dictionary representing the user's input to the agent. This
+ input is added to the memory if provided. Defaults to
+ None.
+ Returns:
+ A dictionary representing the message generated by the agent in
+ response to the user's input.
+ """
+ retrieved_docs_to_string = ""
+ # record the input if needed
+ if self.memory:
+ self.memory.add(x)
+ # in case no input is provided (e.g., in msghub),
+ # use the memory as query
+ history = self.memory.get_memory(
+ recent_n=self.recent_n_mem_for_retrieve,
+ )
+ query = (
+                "\n".join(
+ [msg["content"] for msg in history],
+ )
+ if isinstance(history, list)
+ else str(history)
+ )
+ elif x is not None:
+ query = x["content"]
+ else:
+ query = ""
+
+ if len(query) > 0:
+ # when content has information, do retrieval
+ scores = []
+ for knowledge in self.knowledge_list:
+ retrieved_nodes = knowledge.retrieve(
+ str(query),
+ self.similarity_top_k,
+ )
+ for node in retrieved_nodes:
+ scores.append(node.score)
+ retrieved_docs_to_string += (
+ "\n>>>> score:"
+ + str(node.score)
+ + "\n>>>> source:"
+ + str(node.node.get_metadata_str())
+ + "\n>>>> content:"
+ + node.get_content()
+ )
+
+ if self.log_retrieval:
+ self.speak("[retrieved]:" + retrieved_docs_to_string)
+
+            if len(scores) > 0 and max(scores) < 0.4:
+ # if the max score is lower than 0.4, then we let LLM
+ # decide whether the retrieved content is relevant
+ # to the user input.
+ msg = Msg(
+ name="user",
+ role="user",
+ content=CHECKING_PROMPT.format(
+ retrieved_docs_to_string,
+ query,
+ ),
+ )
+ msg = self.model.format(msg)
+ checking = self.model(msg)
+ logger.info(checking)
+ checking = checking.text.lower()
+ if "no" in checking:
+ retrieved_docs_to_string = "EMPTY"
+
+ # prepare prompt
+ prompt = self.model.format(
+ Msg(
+ name="system",
+ role="system",
+ content=self.sys_prompt,
+ ),
+ # {"role": "system", "content": retrieved_docs_to_string},
+ self.memory.get_memory(
+ recent_n=self.recent_n_mem_for_retrieve,
+ ),
+ Msg(
+ name="user",
+ role="user",
+ content="Context: " + retrieved_docs_to_string,
+ ),
+ )
+
+ # call llm and generate response
+ response = self.model(prompt).text
+ msg = Msg(self.name, response)
+
+ # Print/speak the message in this agent's voice
+ self.speak(msg)
+
+ if self.memory:
+ # Record the message in memory
+ self.memory.add(msg)
+
+ return msg
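Putting the pieces together, a `LlamaIndexAgent` can be created directly with `Knowledge` objects, mirroring the commented example in `rag_example.py` above. The sketch below assumes a `KnowledgeBank` has already been built from `configs/knowledge_config.json` and that `qwen_config` is a registered model config; both names are placeholders taken from the example.

```python
from agentscope.agents import LlamaIndexAgent
from agentscope.rag import KnowledgeBank

# build the knowledge bank from a config file (placeholder path)
knowledge_bank = KnowledgeBank(configs="configs/knowledge_config.json")
knowledge = knowledge_bank.get_knowledge("agentscope_tutorial_rag")

rag_agent = LlamaIndexAgent(
    name="rag_worker",
    sys_prompt="You answer questions based on the retrieved context.",
    model_config_name="qwen_config",
    knowledge_list=[knowledge],  # pass Knowledge objects directly
    similarity_top_k=3,
    log_retrieval=False,
    recent_n_mem_for_retrieve=1,
)
```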
diff --git a/src/agentscope/constants.py b/src/agentscope/constants.py
index a006d4a9b..a0298ad0c 100644
--- a/src/agentscope/constants.py
+++ b/src/agentscope/constants.py
@@ -59,3 +59,9 @@ class ShrinkPolicy(IntEnum):
TRUNCATE = 0
SUMMARIZE = 1
+
+
+# rag related
+DEFAULT_CHUNK_SIZE = 1024
+DEFAULT_CHUNK_OVERLAP = 20
+DEFAULT_TOP_K = 5
diff --git a/src/agentscope/rag/__init__.py b/src/agentscope/rag/__init__.py
new file mode 100644
index 000000000..362f1de14
--- /dev/null
+++ b/src/agentscope/rag/__init__.py
@@ -0,0 +1,11 @@
+# -*- coding: utf-8 -*-
+""" Import all pipeline related modules in the package. """
+from .knowledge import Knowledge
+from .llama_index_knowledge import LlamaIndexKnowledge
+from .knowledge_bank import KnowledgeBank
+
+__all__ = [
+ "Knowledge",
+ "LlamaIndexKnowledge",
+ "KnowledgeBank",
+]
diff --git a/src/agentscope/rag/knowledge.py b/src/agentscope/rag/knowledge.py
new file mode 100644
index 000000000..3ba2b120c
--- /dev/null
+++ b/src/agentscope/rag/knowledge.py
@@ -0,0 +1,157 @@
+# -*- coding: utf-8 -*-
+"""
+Base class module for retrieval augmented generation (RAG).
+To accommodate the RAG process of different packages,
+we abstract the RAG process into four stages:
+- data loading: loading data into memory for following processing;
+- data indexing and storage: document chunking, embedding generation,
+and off-load the data into VDB;
+- data retrieval: taking a query and return a batch of documents or
+document chunks;
+- post-processing of the retrieved data: use the retrieved data to
+generate an answer.
+"""
+
+import importlib
+from abc import ABC, abstractmethod
+from typing import Any, Optional
+from loguru import logger
+from agentscope.models import ModelWrapperBase
+
+
+class Knowledge(ABC):
+ """
+ Base class for RAG, CANNOT be instantiated directly
+ """
+
+ def __init__(
+ self,
+ knowledge_id: str,
+ emb_model: Any = None,
+ knowledge_config: Optional[dict] = None,
+ model: Optional[ModelWrapperBase] = None,
+ **kwargs: Any,
+ ) -> None:
+ # pylint: disable=unused-argument
+ """
+ initialize the knowledge component
+ Args:
+ knowledge_id (str):
+ The id of the knowledge unit.
+ emb_model (ModelWrapperBase):
+                The embedding model used to generate embeddings
+ knowledge_config (dict):
+ The configuration to generate or load the index.
+ """
+ self.knowledge_id = knowledge_id
+ self.emb_model = emb_model
+ self.knowledge_config = knowledge_config or {}
+ self.postprocessing_model = model
+
+ @abstractmethod
+ def _init_rag(
+ self,
+ **kwargs: Any,
+ ) -> Any:
+ """
+ Initiate the RAG module.
+ """
+
+ @abstractmethod
+ def retrieve(
+ self,
+ query: Any,
+ similarity_top_k: int = None,
+ to_list_strs: bool = False,
+ **kwargs: Any,
+ ) -> list[Any]:
+ """
+        Retrieve a list of content from the database (vector store index) to memory.
+ Args:
+ query (Any):
+ query for retrieval
+ similarity_top_k (int):
+ the number of most similar data returned by the
+ retriever.
+ to_list_strs (bool):
+                whether to return a list of str
+
+ Returns:
+            a list of retrieved documents (as strings)
+ """
+
+ def post_processing(
+ self,
+ retrieved_docs: list[str],
+ prompt: str,
+ **kwargs: Any,
+ ) -> Any:
+ """
+        A default post-processing function that generates an answer
+ based on the retrieved documents.
+ Args:
+ retrieved_docs (list[str]):
+ list of retrieved documents
+ prompt (str):
+ prompt for LLM generating answer with the retrieved documents
+
+ Returns:
+ Any: a synthesized answer from LLM with retrieved documents
+
+ Example:
+ self.postprocessing_model(prompt.format(retrieved_docs))
+ """
+ assert self.postprocessing_model
+ prompt = prompt.format("\n".join(retrieved_docs))
+ return self.postprocessing_model(prompt, **kwargs).text
+
+ def _prepare_args_from_config(self, config: dict) -> Any:
+ """
+ Helper function to build objects in RAG classes.
+
+ Args:
+ config (dict): a dictionary containing configurations
+ Returns:
+ Any: an object that is parsed/built to be an element
+ of input to the function of RAG module.
+ """
+ if not isinstance(config, dict):
+ return config
+
+ if "create_object" in config:
+ # if a term in args is an object,
+ # recursively create object with args from config
+ module_name = config.get("module", "")
+ class_name = config.get("class", "")
+ init_args = config.get("init_args", {})
+ try:
+ cur_module = importlib.import_module(module_name)
+ cur_class = getattr(cur_module, class_name)
+ init_args = self._prepare_args_from_config(init_args)
+ logger.info(
+ f"load and build object: {class_name}",
+ )
+ return cur_class(**init_args)
+ except ImportError as exc_inner:
+ logger.error(
+ f"Fail to load class {class_name} "
+ f"from module {module_name}",
+ )
+ raise ImportError(
+ f"Fail to load class {class_name} "
+ f"from module {module_name}",
+ ) from exc_inner
+ else:
+ prepared_args = {}
+ for key, value in config.items():
+ if isinstance(value, list):
+ prepared_args[key] = []
+ for c in value:
+ prepared_args[key].append(
+ self._prepare_args_from_config(c),
+ )
+ elif isinstance(value, dict):
+ prepared_args[key] = self._prepare_args_from_config(value)
+ else:
+ prepared_args[key] = value
+ return prepared_args
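`_prepare_args_from_config` recursively walks a configuration dict and replaces every entry containing `create_object` with an instance of the named class, so loaders and transformations can be declared in JSON. Below is a sketch of the kind of input it resolves; the directory path is a placeholder, and llama-index must be installed for the class to import.

```python
loader_config = {
    "loader": {
        "create_object": True,
        "module": "llama_index.core",
        "class": "SimpleDirectoryReader",
        "init_args": {
            "input_dir": "path/to/your/docs",
            "required_exts": [".md"],
        },
    },
}
# knowledge._prepare_args_from_config(loader_config) returns a dict of the
# form {"loader": <SimpleDirectoryReader instance>}, which can then be
# passed to the loading step as keyword arguments.
```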
diff --git a/src/agentscope/rag/knowledge_bank.py b/src/agentscope/rag/knowledge_bank.py
new file mode 100644
index 000000000..c6d7acee8
--- /dev/null
+++ b/src/agentscope/rag/knowledge_bank.py
@@ -0,0 +1,186 @@
+# -*- coding: utf-8 -*-
+"""
+Knowledge bank for making Knowledge objects easier to use
+"""
+import copy
+import json
+from typing import Optional, Union
+from loguru import logger
+from agentscope.models import load_model_by_config_name
+from agentscope.agents import AgentBase
+from .llama_index_knowledge import LlamaIndexKnowledge
+
+
+DEFAULT_INDEX_CONFIG = {
+ "knowledge_id": "",
+ "data_processing": [],
+}
+DEFAULT_LOADER_CONFIG = {
+ "load_data": {
+ "loader": {
+ "create_object": True,
+ "module": "llama_index.core",
+ "class": "SimpleDirectoryReader",
+ "init_args": {},
+ },
+ },
+}
+DEFAULT_INIT_CONFIG = {
+ "input_dir": "",
+ "recursive": True,
+ "required_exts": [],
+}
+
+
+class KnowledgeBank:
+ """
+    KnowledgeBank
+    1) provides an easy and fast way to initialize Knowledge objects;
+    2) makes Knowledge objects reusable and sharable across multiple agents.
+ """
+
+ def __init__(
+ self,
+ configs: Union[dict, str],
+ ) -> None:
+ """initialize the knowledge bank"""
+ if isinstance(configs, str):
+ logger.info(f"Loading configs from {configs}")
+ with open(configs, "r", encoding="utf-8") as fp:
+ self.configs = json.loads(fp.read())
+ else:
+ self.configs = configs
+ self.stored_knowledge: dict[str, LlamaIndexKnowledge] = {}
+ self._init_knowledge()
+
+ def _init_knowledge(self) -> None:
+ """initialize the knowledge bank"""
+ for config in self.configs:
+            logger.debug(f"initializing knowledge: {config}")
+ self.add_data_as_knowledge(
+ knowledge_id=config["knowledge_id"],
+ emb_model_name=config["emb_model_config_name"],
+ knowledge_config=config,
+ )
+ logger.info("knowledge bank initialization completed.\n ")
+
+ def add_data_as_knowledge(
+ self,
+ knowledge_id: str,
+ emb_model_name: str,
+ data_dirs_and_types: dict[str, list[str]] = None,
+ model_name: Optional[str] = None,
+ knowledge_config: Optional[dict] = None,
+ ) -> None:
+ """
+ Transform data in a directory to be ready to work with RAG.
+ Args:
+ knowledge_id (str):
+ user-defined unique id for the knowledge
+ emb_model_name (str):
+ name of the embedding model
+ model_name (Optional[str]):
+ name of the LLM for potential post-processing or query rewrite
+ data_dirs_and_types (dict[str, list[str]]):
+ dictionary of data paths (keys) to the data types
+                (file extensions) for the knowledge base
+ (e.g., [".md", ".py", ".html"])
+ knowledge_config (optional[dict]):
+ complete indexing configuration, used for more advanced
+ applications. Users can customize
+ - loader,
+ - transformations,
+ - ...
+                Examples can be found in ../examples/conversation_with_RAG_agents/
+
+ a simple example of importing data to Knowledge object:
+ ''
+ knowledge_bank.add_data_as_knowledge(
+ knowledge_id="agentscope_tutorial_rag",
+ emb_model_name="qwen_emb_config",
+ data_dirs_and_types={
+ "../../docs/sphinx_doc/en/source/tutorial": [".md"],
+ },
+ )
+ ''
+ """
+ if knowledge_id in self.stored_knowledge:
+ raise ValueError(f"knowledge_id {knowledge_id} already exists.")
+
+ assert data_dirs_and_types is not None or knowledge_config is not None
+
+ if knowledge_config is None:
+ knowledge_config = copy.deepcopy(DEFAULT_INDEX_CONFIG)
+ for data_dir, types in data_dirs_and_types.items():
+ loader_config = copy.deepcopy(DEFAULT_LOADER_CONFIG)
+ loader_init = copy.deepcopy(DEFAULT_INIT_CONFIG)
+ loader_init["input_dir"] = data_dir
+ loader_init["required_exts"] = types
+ loader_config["load_data"]["loader"]["init_args"] = loader_init
+ knowledge_config["data_processing"].append(loader_config)
+
+ self.stored_knowledge[knowledge_id] = LlamaIndexKnowledge(
+ knowledge_id=knowledge_id,
+ emb_model=load_model_by_config_name(emb_model_name),
+ knowledge_config=knowledge_config,
+ model=load_model_by_config_name(model_name)
+ if model_name
+ else None,
+ )
+ logger.info(f"data loaded for knowledge_id = {knowledge_id}.")
+
+ def get_knowledge(
+ self,
+ knowledge_id: str,
+ duplicate: bool = False,
+ ) -> LlamaIndexKnowledge:
+ """
+ Get a Knowledge object from the knowledge bank.
+ Args:
+ knowledge_id (str):
+ unique id for the Knowledge object
+ duplicate (bool):
+                whether to return a copy of the Knowledge object.
+ Returns:
+ LlamaIndexKnowledge:
+ the Knowledge object defined with Llama-index
+ """
+ if knowledge_id not in self.stored_knowledge:
+ raise ValueError(
+ f"{knowledge_id} does not exist in the knowledge bank.",
+ )
+ knowledge = self.stored_knowledge[knowledge_id]
+ if duplicate:
+ knowledge = copy.deepcopy(knowledge)
+ logger.info(f"knowledge bank loaded: {knowledge_id}.")
+ return knowledge
+
+ def equip(
+ self,
+ agent: AgentBase,
+ knowledge_id_list: list[str] = None,
+ duplicate: bool = False,
+ ) -> None:
+ """
+ Equip the agent with the knowledge by knowledge ids.
+
+ Args:
+ agent (AgentBase):
+ the agent to be equipped with knowledge
+            knowledge_id_list (list[str]):
+                the list of knowledge ids to equip the agent with
+ duplicate (bool): whether to deepcopy the knowledge object
+ TODO: to accommodate with distributed setting
+ """
+        logger.info(
+            f"Equipping {agent.name} with knowledge {knowledge_id_list}",
+        )
+ knowledge_id_list = knowledge_id_list or []
+
+ if not hasattr(agent, "knowledge_list"):
+ agent.knowledge_list = []
+ for kid in knowledge_id_list:
+ knowledge = self.get_knowledge(
+ knowledge_id=kid,
+ duplicate=duplicate,
+ )
+ agent.knowledge_list.append(knowledge)
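A short usage sketch of the `KnowledgeBank` workflow, following the example script above; the config path, embedding model name and data directory are placeholders, and `rag_agent` stands for any agent created elsewhere (e.g., a `LlamaIndexAgent`).

```python
from agentscope.rag import KnowledgeBank

# build the bank from a JSON config file (placeholder path)
knowledge_bank = KnowledgeBank(configs="configs/knowledge_config.json")

# register a directory of markdown files under a new knowledge id
knowledge_bank.add_data_as_knowledge(
    knowledge_id="agentscope_tutorial_rag_2",
    emb_model_name="qwen_emb_config",
    data_dirs_and_types={
        "../../docs/sphinx_doc/en/source/tutorial": [".md"],
    },
)

# attach the corresponding Knowledge objects to an agent;
# rag_agent is assumed to be an agent instance created elsewhere
knowledge_bank.equip(rag_agent, ["agentscope_tutorial_rag_2"])
```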
diff --git a/src/agentscope/rag/llama_index_knowledge.py b/src/agentscope/rag/llama_index_knowledge.py
new file mode 100644
index 000000000..4f4195a96
--- /dev/null
+++ b/src/agentscope/rag/llama_index_knowledge.py
@@ -0,0 +1,579 @@
+# -*- coding: utf-8 -*-
+"""
+This module integrates the LlamaIndex RAG
+into the AgentScope package.
+"""
+
+import os.path
+from typing import Any, Optional, List, Union
+from loguru import logger
+
+try:
+ from llama_index.core.base.base_retriever import BaseRetriever
+ from llama_index.core.base.embeddings.base import (
+ BaseEmbedding,
+ Embedding,
+ )
+ from llama_index.core.ingestion import IngestionPipeline
+
+ from llama_index.core.bridge.pydantic import PrivateAttr
+ from llama_index.core.node_parser import SentenceSplitter
+ from llama_index.core import (
+ VectorStoreIndex,
+ StorageContext,
+ load_index_from_storage,
+ )
+ from llama_index.core.schema import (
+ Document,
+ TransformComponent,
+ )
+except ImportError as import_error:
+ from agentscope.utils.tools import ImportErrorReporter
+
+ BaseRetriever = ImportErrorReporter(import_error, "full")
+ BaseEmbedding = ImportErrorReporter(import_error, "full")
+ Embedding = ImportErrorReporter(import_error, "full")
+ IngestionPipeline = ImportErrorReporter(import_error, "full")
+ SentenceSplitter = ImportErrorReporter(import_error, "full")
+ VectorStoreIndex = ImportErrorReporter(import_error, "full")
+ StorageContext = ImportErrorReporter(import_error, "full")
+ load_index_from_storage = ImportErrorReporter(import_error, "full")
+ PrivateAttr = ImportErrorReporter(import_error, "full")
+ Document = ImportErrorReporter(import_error, "full")
+ TransformComponent = ImportErrorReporter(import_error, "full")
+
+from agentscope.file_manager import file_manager
+from agentscope.models import ModelWrapperBase
+from agentscope.constants import (
+ DEFAULT_TOP_K,
+ DEFAULT_CHUNK_SIZE,
+ DEFAULT_CHUNK_OVERLAP,
+)
+from agentscope.rag.knowledge import Knowledge
+
+
+try:
+
+ class _EmbeddingModel(BaseEmbedding):
+ """
+        A wrapper that adapts a ModelWrapperBase embedding model so it
+        can be used in a LlamaIndex pipeline.
+ """
+
+ _emb_model_wrapper: ModelWrapperBase = PrivateAttr()
+
+ def __init__(
+ self,
+ emb_model: ModelWrapperBase,
+ embed_batch_size: int = 1,
+ ) -> None:
+ """
+ Dummy wrapper to convert a ModelWrapperBase to llama Index
+ embedding model
+
+ Args:
+ emb_model (ModelWrapperBase):
+ embedding model in ModelWrapperBase
+ embed_batch_size (int):
+ batch size, defaults to 1
+ """
+ super().__init__(
+ model_name="Temporary_embedding_wrapper",
+ embed_batch_size=embed_batch_size,
+ )
+ self._emb_model_wrapper = emb_model
+
+ def _get_query_embedding(self, query: str) -> List[float]:
+ """
+ get embedding for query
+ Args:
+ query (str): query to be embedded
+ """
+ # Note: AgentScope embedding model wrapper returns list
+ # of embedding
+ return list(self._emb_model_wrapper(query).embedding[0])
+
+ def _get_text_embeddings(self, texts: List[str]) -> List[Embedding]:
+ """
+ get embedding for list of strings
+ Args:
+ texts ( List[str]): texts to be embedded
+ """
+ results = [
+ list(self._emb_model_wrapper(t).embedding[0]) for t in texts
+ ]
+ return results
+
+ def _get_text_embedding(self, text: str) -> Embedding:
+ """
+ get embedding for a single string
+ Args:
+ text (str): texts to be embedded
+ """
+ return list(self._emb_model_wrapper(text).embedding[0])
+
+ # TODO: use proper async methods, but depends on model wrapper
+ async def _aget_query_embedding(self, query: str) -> List[float]:
+ """The asynchronous version of _get_query_embedding."""
+ return self._get_query_embedding(query)
+
+ async def _aget_text_embedding(self, text: str) -> List[float]:
+ """Asynchronously get text embedding."""
+ return self._get_text_embedding(text)
+
+ async def _aget_text_embeddings(
+ self,
+ texts: List[str],
+ ) -> List[List[float]]:
+ """Asynchronously get text embeddings."""
+ return self._get_text_embeddings(texts)
+
+except Exception:
+
+ class _EmbeddingModel: # type: ignore[no-redef]
+ """
+ A dummy embedding model for passing tests when
+        llama-index is not installed
+ """
+
+ def __init__(self, emb_model: ModelWrapperBase):
+ self._emb_model_wrapper = emb_model
+
+
+class LlamaIndexKnowledge(Knowledge):
+ """
+    This class is a wrapper around the LlamaIndex RAG.
+ """
+
+ def __init__(
+ self,
+ knowledge_id: str,
+ emb_model: Union[ModelWrapperBase, BaseEmbedding, None] = None,
+ knowledge_config: Optional[dict] = None,
+ model: Optional[ModelWrapperBase] = None,
+ persist_root: Optional[str] = None,
+ overwrite_index: Optional[bool] = False,
+ showprogress: Optional[bool] = True,
+ **kwargs: Any,
+ ) -> None:
+ """
+ initialize the knowledge component based on the
+ llama-index framework: https://github.com/run-llama/llama_index
+
+ Notes:
+ In LlamaIndex, one of the most important concepts is index,
+ which is a data structure composed of Document objects, designed to
+ enable querying by an LLM. The core workflow of initializing RAG is
+ to convert data to index, and retrieve information from index.
+ For example:
+ 1) preprocessing documents with data loaders
+            2) generate embeddings with a pipeline of embedding models
+            3) store the embedding-content pairs in a vector database;
+ the default dir is "./rag_storage/knowledge_id"
+
+ Args:
+ knowledge_id (str):
+ The id of the RAG knowledge unit.
+ emb_model (ModelWrapperBase):
+                The embedding model used to generate embeddings
+ knowledge_config (dict):
+ The configuration for llama-index to
+ generate or load the index.
+ model (ModelWrapperBase):
+ The language model used for final synthesis
+ persist_root (str):
+ The root directory for index persisting
+ overwrite_index (Optional[bool]):
+ Whether to overwrite the index while refreshing
+ showprogress (Optional[bool]):
+ Whether to show the indexing progress
+ """
+ super().__init__(
+ knowledge_id=knowledge_id,
+ emb_model=emb_model,
+ knowledge_config=knowledge_config,
+ model=model,
+ **kwargs,
+ )
+ if persist_root is None:
+ persist_root = file_manager.dir
+ self.persist_dir = os.path.join(persist_root, knowledge_id)
+ self.emb_model = emb_model
+ self.overwrite_index = overwrite_index
+ self.showprogress = showprogress
+ self.index = None
+ # ensure the emb_model is compatible with LlamaIndex
+ if isinstance(emb_model, ModelWrapperBase):
+ self.emb_model = _EmbeddingModel(emb_model)
+ elif isinstance(self.emb_model, BaseEmbedding):
+ pass
+ else:
+ raise TypeError(
+ f"Embedding model does not support {type(self.emb_model)}.",
+ )
+ # then we can initialize the RAG
+ self._init_rag()
+
+ def _init_rag(self, **kwargs: Any) -> None:
+ """
+ Initialize the RAG. This includes:
+ * if the persist_dir exists, load the persisted index
+ * if not, convert the data to index
+ * if needed, update the index
+ * set the retriever to retrieve information from index
+
+ Notes:
+ * the index is persisted in the self.persist_dir
+ * the refresh_index method is placed here for testing, it can be
+                called externally. For example, update the index periodically
+ by calling rag.refresh_index() during the execution of the
+ agent.
+ """
+ if os.path.exists(self.persist_dir):
+ self._load_index()
+ # self.refresh_index()
+ else:
+ self._data_to_index()
+ self._get_retriever()
+ logger.info(
+ f"RAG with knowledge ids: {self.knowledge_id} "
+ f"initialization completed!\n",
+ )
+
+ def _load_index(self) -> None:
+ """
+ Load the persisted index from persist_dir.
+ """
+ # load the storage_context
+ storage_context = StorageContext.from_defaults(
+ persist_dir=self.persist_dir,
+ )
+        # construct the index from the storage context
+ self.index = load_index_from_storage(
+ storage_context=storage_context,
+ embed_model=self.emb_model,
+ )
+ logger.info(f"index loaded from {self.persist_dir}")
+
+ def _data_to_index(self) -> None:
+ """
+ Convert the data to index by configs. This includes:
+ * load the data to documents by using information from configs
+ * set the transformations associated with documents
+ * convert the documents to nodes
+ * convert the nodes to index
+
+ Notes:
+ As each selected file type may need to use a different loader
+ and transformations, knowledge_config is a list of configs.
+ """
+ nodes = []
+ # load data to documents and set transformations
+ # using information in knowledge_config
+ for config in self.knowledge_config.get("data_processing"):
+ documents = self._data_to_docs(config=config)
+ transformations = self._set_transformations(config=config).get(
+ "transformations",
+ )
+ nodes_docs = self._docs_to_nodes(
+ documents=documents,
+ transformations=transformations,
+ )
+ nodes = nodes + nodes_docs
+ # convert nodes to index
+ self.index = VectorStoreIndex(
+ nodes=nodes,
+ embed_model=self.emb_model,
+ )
+ logger.info("index calculation completed.")
+ # persist the calculated index
+ self.index.storage_context.persist(persist_dir=self.persist_dir)
+ logger.info("index persisted.")
+
+ def _data_to_docs(
+ self,
+ query: Optional[str] = None,
+ config: dict = None,
+ ) -> Any:
+ """
+        This method sets the loader as needed, or just uses the default
+        setting. Then it uses the loader to load data from the directory
+        into documents.
+
+ Notes:
+ We can use simple directory loader (SimpleDirectoryReader)
+ to load general documents, including Markdown, PDFs,
+ Word documents, PowerPoint decks, images, audio and video.
+ Or use SQL loader (DatabaseReader) to load database.
+
+ Args:
+ query (Optional[str]):
+ optional, used when the data is in a database.
+ config (dict):
+ optional, used when the loader config is in a config file.
+ Returns:
+ Any: loaded documents
+ """
+ loader = self._set_loader(config=config).get("loader")
+ # let the doc_id be the filename for each document
+ loader.filename_as_id = True
+ if query is None:
+ documents = loader.load_data()
+ else:
+ # this is for querying a database,
+ # does not work for loading a document directory
+ documents = loader.load_data(query)
+ logger.info(f"loaded {len(documents)} documents")
+ return documents
+
+ def _docs_to_nodes(
+ self,
+ documents: List[Document],
+ transformations: Optional[list[Optional[TransformComponent]]] = None,
+ ) -> Any:
+ """
+ Convert the loaded documents to nodes using transformations.
+
+ Args:
+ documents (List[Document]):
+ documents to be processed, usually expected to be in
+ llama index Documents.
+ transformations (Optional[list[TransformComponent]]):
+ optional, specifies the transformations (operators) to
+ process documents (e.g., split the documents into smaller
+ chunks)
+ Return:
+ Any: return the index of the processed document
+ """
+        # nodes (also called chunks) are a representation of the documents
+ # we build nodes by using the IngestionPipeline
+ # for each document with corresponding transformations
+ pipeline = IngestionPipeline(
+ transformations=transformations,
+ )
+        # stack up the nodes from the pipeline
+ nodes = pipeline.run(
+ documents=documents,
+ show_progress=self.showprogress,
+ )
+ logger.info("nodes generated.")
+ return nodes
+
+ def _set_loader(self, config: dict) -> Any:
+ """
+ Set the loader as needed, or just use the default setting.
+
+ Args:
+ config (dict): a dictionary containing configurations
+ """
+ if "load_data" in config:
+ # we prepare the loader from the configs
+ loader = self._prepare_args_from_config(
+ config=config.get("load_data", {}),
+ )
+ else:
+ # we prepare the loader by default
+ try:
+ from llama_index.core import SimpleDirectoryReader
+ except ImportError as exc_inner:
+ raise ImportError(
+                    "LlamaIndexKnowledge requires llama-index to be "
+                    "installed. Please run `pip install llama-index`",
+ ) from exc_inner
+ loader = {
+ "loader": SimpleDirectoryReader(
+ input_dir="set_default_data_path",
+ ),
+ }
+ logger.info("loaders are ready.")
+ return loader
+
+ def _set_transformations(self, config: dict) -> Any:
+ """
+ Set the transformations as needed, or just use the default setting.
+
+ Args:
+ config (dict): a dictionary containing configurations.
+ """
+ if "store_and_index" in config:
+ temp = self._prepare_args_from_config(
+ config=config.get("store_and_index", {}),
+ )
+ transformations = temp.get("transformations")
+ else:
+ transformations = [
+ SentenceSplitter(
+ chunk_size=self.knowledge_config.get(
+ "chunk_size",
+ DEFAULT_CHUNK_SIZE,
+ ),
+ chunk_overlap=self.knowledge_config.get(
+ "chunk_overlap",
+ DEFAULT_CHUNK_OVERLAP,
+ ),
+ ),
+ ]
+ # adding embedding model as the last step of transformation
+ # https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/root.html
+ transformations.append(self.emb_model)
+ logger.info("transformations are ready.")
+ # as the last step, we need to repackage the transformations in dict
+ transformations = {"transformations": transformations}
+ return transformations
+
+ def _get_retriever(
+ self,
+ similarity_top_k: int = None,
+ **kwargs: Any,
+ ) -> BaseRetriever:
+ """
+ Set the retriever as needed, or just use the default setting.
+
+ Args:
+            similarity_top_k (int): the number of most similar nodes to
+                retrieve from the index; defaults to DEFAULT_TOP_K.
+ """
+ # set the retriever
+ logger.info(
+            f"similarity_top_k={similarity_top_k or DEFAULT_TOP_K}",
+ )
+ retriever = self.index.as_retriever(
+ embed_model=self.emb_model,
+ similarity_top_k=similarity_top_k or DEFAULT_TOP_K,
+ **kwargs,
+ )
+ logger.info("retriever is ready.")
+ return retriever
+
+ def retrieve(
+ self,
+ query: str,
+ similarity_top_k: int = None,
+ to_list_strs: bool = False,
+ retriever: Optional[BaseRetriever] = None,
+ **kwargs: Any,
+ ) -> list[Any]:
+ """
+ This is a basic retrieve function for knowledge.
+ It will build a retriever on the fly and return the
+ result of the query.
+ Args:
+ query (str):
+ query is expected to be a question in string
+ similarity_top_k (int):
+ the number of most similar data returned by the
+ retriever.
+ to_list_strs (bool):
+                whether to return a list of strings;
+ if False, return NodeWithScore
+ retriever (BaseRetriever):
+                for advanced usage, users can pass their own retrievers.
+ Return:
+ list[Any]: list of str or NodeWithScore
+
+ More advanced query processing can refer to
+ https://docs.llamaindex.ai/en/stable/examples/query_transformations/query_transform_cookbook.html
+ """
+ if retriever is None:
+ retriever = self._get_retriever(similarity_top_k)
+ retrieved = retriever.retrieve(str(query))
+ if to_list_strs:
+ results = []
+ for node in retrieved:
+ results.append(node.get_text())
+ return results
+ return retrieved
+
+ def refresh_index(self) -> None:
+ """
+ Refresh the index when needed.
+ """
+ for config in self.knowledge_config.get("data_processing"):
+ documents = self._data_to_docs(config=config)
+ # store and indexing for each file type
+ transformations = self._set_transformations(config=config).get(
+ "transformations",
+ )
+ self._insert_docs_to_index(
+ documents=documents,
+ transformations=transformations,
+ )
+
+ def _insert_docs_to_index(
+ self,
+ documents: List[Document],
+ transformations: TransformComponent,
+ ) -> None:
+ """
+        Add documents to the index. Given a list of documents, we first
+        check whether each doc_id is already in the index. If not, we add
+        the doc to the list. If yes, and the overwrite flag is enabled,
+ we delete the old doc and add the new doc to the list.
+ Lastly, we generate nodes for all documents on the list, and insert
+ the nodes to the index.
+
+ Args:
+ documents (List[Document]): list of documents to be added.
+ transformations (TransformComponent): transformations that
+ convert the documents into nodes.
+ """
+        # this is the pipeline that generates the nodes
+ pipeline = IngestionPipeline(
+ transformations=transformations,
+ )
+ # we need to generate nodes from this list of documents
+ insert_docs_list = []
+ for doc in documents:
+ if doc.doc_id not in self.index.ref_doc_info.keys():
+ # if the doc_id is not in the index, we add it to the list
+ insert_docs_list.append(doc)
+ logger.info(
+ f"add new documents to index, " f"doc_id={doc.doc_id}",
+ )
+ else:
+ if self.overwrite_index:
+ # if we enable overwrite index, we delete the old doc
+ self.index.delete_ref_doc(
+ ref_doc_id=doc.doc_id,
+ delete_from_docstore=True,
+ )
+ # then add the same doc to the list
+ insert_docs_list.append(doc)
+ logger.info(
+ f"replace document in index, " f"doc_id={doc.doc_id}",
+ )
+ logger.info("documents scan completed.")
+ # we generate nodes for documents on the list
+ nodes = pipeline.run(
+ documents=insert_docs_list,
+ show_progress=True,
+ )
+ logger.info("nodes generated.")
+ # insert the new nodes to index
+ self.index.insert_nodes(nodes=nodes)
+ logger.info("nodes inserted to index.")
+ # persist the updated index
+ self.index.storage_context.persist(persist_dir=self.persist_dir)
+
+ def _delete_docs_from_index(
+ self,
+ documents: List[Document],
+ ) -> None:
+ """
+ Delete the nodes that are associated with a list of documents.
+
+ Args:
+ documents (List[Document]): list of documents to be deleted.
+ """
+ doc_id_list = [doc.doc_id for doc in documents]
+ for key in self.index.ref_doc_info.keys():
+ if key in doc_id_list:
+ self.index.delete_ref_doc(
+ ref_doc_id=key,
+ delete_from_docstore=True,
+ )
+ logger.info(f"docs deleted from index, doc_id={key}")
+ # persist the updated index
+ self.index.storage_context.persist(persist_dir=self.persist_dir)
+ logger.info("nodes delete completed.")
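For completeness, here is a sketch of using `LlamaIndexKnowledge` directly, without going through a `KnowledgeBank`; the model config name and data directory are placeholders, llama-index must be installed, and the embedding model config is assumed to have been registered via `agentscope.init`. The index is persisted under `persist_root/knowledge_id` (by default the file manager's directory) and reloaded on the next run.

```python
from agentscope.models import load_model_by_config_name
from agentscope.rag import LlamaIndexKnowledge

knowledge_config = {
    "knowledge_id": "agentscope_tutorial_rag",
    "data_processing": [
        {
            "load_data": {
                "loader": {
                    "create_object": True,
                    "module": "llama_index.core",
                    "class": "SimpleDirectoryReader",
                    "init_args": {
                        "input_dir": "../../docs/sphinx_doc/en/source",
                        "required_exts": [".md"],
                    },
                },
            },
        },
    ],
}

knowledge = LlamaIndexKnowledge(
    knowledge_id="agentscope_tutorial_rag",
    emb_model=load_model_by_config_name("qwen_emb_config"),
    knowledge_config=knowledge_config,
)

# retrieve the most similar chunks as plain strings (DEFAULT_TOP_K by default)
chunks = knowledge.retrieve("How do I set up a RAG agent?", to_list_strs=True)
```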