Reformat and improve RAG module and agents #184

Merged
merged 101 commits into from
Jun 12, 2024
Changes from 77 commits
d4731d0
new testing code
FredericW Apr 11, 2024
9b4b060
test the idea that using two agents to analyze different aspects of c…
FredericW Apr 11, 2024
7551c2f
handle irrelevant question and simplify the setting first
ZiTao-Li Apr 15, 2024
86fac88
fix dir
ZiTao-Li Apr 15, 2024
0687138
function added for persisting the index.
FredericW Apr 16, 2024
c598310
Merge pull request #1 from FredericW/testing
FredericW Apr 16, 2024
ddddbe4
Merge branch 'modelscope:main' into main
FredericW Apr 16, 2024
2b1617b
json files are deleted.
FredericW Apr 16, 2024
8afab8f
Merge branch 'modelscope:main' into main
FredericW Apr 16, 2024
6c11924
Merge branch 'modelscope:main' into testing
FredericW Apr 16, 2024
dc5b3e5
Merge branch 'main' into testing
FredericW Apr 16, 2024
fc9066c
Merge pull request #2 from FredericW/testing
FredericW Apr 16, 2024
8daaf4b
Delete rag_storage directory
FredericW Apr 16, 2024
47a9e22
Merge pull request #3 from FredericW/main
ZiTao-Li Apr 16, 2024
a890ac1
merge
ZiTao-Li Apr 16, 2024
702c5c0
add mention function
ZiTao-Li Apr 16, 2024
1e4b4c5
add ui
ZiTao-Li Apr 16, 2024
70bb7bb
fix bugs
ZiTao-Li Apr 16, 2024
484fb4d
runnable ui with flask
ZiTao-Li Apr 18, 2024
99d5a73
The agent dialog flow is modified. We remove the summary agent, and a…
Apr 18, 2024
b3011b1
add docstring agent
ZiTao-Li Apr 18, 2024
449c4dc
update info
ZiTao-Li Apr 18, 2024
e55be4a
Changes are made to improve the performance
Apr 19, 2024
133dbc9
Changes are made to improve the performance
Apr 19, 2024
0fc5e16
Merge pull request #1 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 19, 2024
5d6c0fe
config modified
Apr 19, 2024
dd0d328
Merge pull request #5 from FredericW/dev_copilot_zitao
ZiTao-Li Apr 19, 2024
3d80561
Merge branch 'main' into zitao/dev_copilot
ZiTao-Li Apr 19, 2024
ccc5b46
add file
ZiTao-Li Apr 19, 2024
ec3d8c0
add api assistant
ZiTao-Li Apr 19, 2024
3007809
New feature added: now we allow user to load and index multiple files…
Apr 22, 2024
d7ae87c
Changes made: To reorganize the work flow, we made the following chan…
Apr 25, 2024
1f37b96
Changes made: the major components of init_rag is now moved to rag_ag…
Apr 25, 2024
a55eaf0
Merge pull request #6 from FredericW/dev/as_copilot
ZiTao-Li Apr 26, 2024
85401a8
Changes made: refactor the load_data method in LlamaIndexRAG, now the…
Apr 26, 2024
b8c6eac
Changes made: new method load_index for load stored index from persis…
Apr 26, 2024
06d9ebb
reformat code
ZiTao-Li Apr 26, 2024
2ab69fa
Merge branch 'main' into zitao/dev_copilot
ZiTao-Li Apr 26, 2024
f9f912c
Merge branch 'main' into zitao/dev_copilot
ZiTao-Li Apr 28, 2024
986e1b6
fix merge
ZiTao-Li Apr 28, 2024
fc91107
fix
ZiTao-Li Apr 28, 2024
df7a79e
Merge branch 'main' into dev/as_copilot
ZiTao-Li Apr 28, 2024
82eff0c
Merge branch 'dev/as_copilot' into zitao/dev_copilot
ZiTao-Li Apr 28, 2024
32adc04
Merge pull request #7 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 28, 2024
3dc828e
fix langchain_rag.py
ZiTao-Li Apr 28, 2024
130c235
Merge pull request #8 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 28, 2024
a414a7c
Merge branch 'modelscope:main' into dev/as_copilot_index_manage
FredericW Apr 28, 2024
d41b3f8
Changes made: new method refresh_index, _insert_docs_to_index, and _d…
Apr 29, 2024
f533612
Quick fix: re-enable modules in rag_example.py
Apr 29, 2024
e89591c
Merge branch 'main' into dev/as_copilot
ZiTao-Li Apr 29, 2024
421e021
Merge branch 'modelscope:main' into dev/as_copilot_index_manage
FredericW Apr 29, 2024
c9902a5
sync with dev/as_copilot
Apr 29, 2024
07480d5
comments and docstrings are updated.
Apr 29, 2024
1640146
comments and docstrings are updated.
Apr 29, 2024
0895b10
Merge pull request #10 from FredericW/dev/as_copilot_index_manage
ZiTao-Li Apr 29, 2024
3ee3266
add knowledge bank
ZiTao-Li Apr 29, 2024
0b29773
Merge pull request #11 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 30, 2024
e94cb0c
comments and docstrings are updated.
Apr 30, 2024
83f5aec
fix configs, add docstring, modify names
ZiTao-Li Apr 30, 2024
2ef4641
make requirements to pass tests
ZiTao-Li Apr 30, 2024
4205be6
Merge pull request #12 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 30, 2024
67a1cf3
update docstrings
ZiTao-Li Apr 30, 2024
0b52ef8
Merge branch 'dev/as_copilot' into dev/as_copilot_fei
May 6, 2024
31c3aa4
New Feature: the RAG agent could load multiple rag modules with corre…
May 9, 2024
df18555
Minor edits.
May 10, 2024
56090eb
Minor edits.
May 10, 2024
db2f7a9
Minor edits.
May 10, 2024
d68b1dd
Minor edits.
May 10, 2024
48bf108
Minor edits.
May 10, 2024
8b75360
Minor edits regarding "persist_dir"
May 10, 2024
847f3b5
Minor edits.
May 10, 2024
55c34fa
Merge pull request #13 from FredericW/dev/as_copilot_fei
ZiTao-Li May 10, 2024
6186188
move emb_model_config_name to knowledge_config
ZiTao-Li May 15, 2024
123cd06
Merge pull request #14 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 15, 2024
4f024b1
improve after discussion, change names, add equip function for knowle…
ZiTao-Li May 16, 2024
0c25ada
fix init function bug
ZiTao-Li May 16, 2024
06ce25d
Merge pull request #15 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 16, 2024
94dcc9b
merged main
ZiTao-Li May 20, 2024
4ff401d
update as comments suggest
ZiTao-Li May 20, 2024
88e64cd
Merge pull request #16 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 20, 2024
5ab6dde
update as comments suggest (for docs)
ZiTao-Li May 20, 2024
e704fd7
Merge pull request #17 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 20, 2024
0257667
update as comments suggest 3
ZiTao-Li May 22, 2024
63648fc
add RAGConfig to regulate rag config input
ZiTao-Li May 22, 2024
f9a1308
Merge pull request #18 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 22, 2024
b337a34
[Fix] remove "emb_model_config_name" from agent_config.json, which wa…
May 23, 2024
6f6b0d4
Merge branch 'main' into dev/as_copilot
ZiTao-Li May 24, 2024
2a72fda
Merge branch 'main' into dev/as_copilot
ZiTao-Li May 24, 2024
e6849e5
update as comments
ZiTao-Li Jun 3, 2024
cea6afb
[Fix] remove "emb_model_config_name" from agent_config.json, which wa…
Jun 6, 2024
0b9fbe4
Address the comments
Jun 7, 2024
f497985
Address the comments by zitao
Jun 7, 2024
c1c0814
Merge pull request #19 from FredericW/dev/as_copilot_fei
FredericW Jun 7, 2024
4a7a852
Merge remote-tracking branch 'modelscope/main' into dev/as_copilot
DavdGao Jun 9, 2024
93aaf0f
Fix typo
DavdGao Jun 9, 2024
78b9ba9
Fix typo
DavdGao Jun 9, 2024
d3f4684
fix import error
ZiTao-Li Jun 9, 2024
aaff5bd
Merge remote-tracking branch 'refs/remotes/origin/main' into dev/as_c…
ZiTao-Li Jun 9, 2024
88a9478
merge 2
ZiTao-Li Jun 9, 2024
a49791b
update tutorial
ZiTao-Li Jun 9, 2024
46334c5
update README
ZiTao-Li Jun 12, 2024
126 changes: 126 additions & 0 deletions docs/sphinx_doc/en/source/tutorial/209-rag.md
@@ -0,0 +1,126 @@
(209-rag-en)=

# A Quick Introduction to RAG in AgentScope

This tutorial introduces three concepts related to RAG in AgentScope: Knowledge, KnowledgeBank and RAG agent.

### Knowledge
The Knowledge modules (now only `LlamaIndexKnowledge`; support for LangChain will come soon) are responsible for handling all RAG-related operations.

Here, we use `LlamaIndexKnowledge` as an example to illustrate the operations within the `Knowledge` module.
When a `LlamaIndexKnowledge` object is initialized, `LlamaIndexKnowledge.__init__` goes through the following steps:
* It processes the documents and generates an index for retrieval in `LlamaIndexKnowledge._data_to_index(...)`, which includes
    * loading the documents in `LlamaIndexKnowledge._data_to_docs(...)`;
    * preprocessing the documents into nodes with the preprocessing methods and the embedding model in `LlamaIndexKnowledge._docs_to_nodes(...)`;
    * generating the index from the processed nodes.
* If the index already exists, `LlamaIndexKnowledge._load_index(...)` is invoked to load it and avoid repeating the embedding calls.
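
The build-or-load flow described above can be sketched in a few lines. This is only an illustration of the idea, not the `LlamaIndexKnowledge` implementation: the function names, the `index.json` file name and the toy `embed` function are all assumptions made for this sketch, and no LlamaIndex call is used.

```python
import json
import tempfile
from pathlib import Path


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model call (hypothetical)."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def data_to_index(docs: list[str], persist_dir: Path) -> dict:
    """Build an index from raw documents and persist it to disk."""
    # "docs -> nodes": split each document into chunks (here: non-empty lines).
    nodes = [line for doc in docs for line in doc.splitlines() if line.strip()]
    # "nodes -> index": embed every node once and store the result.
    index = {"nodes": nodes, "embeddings": [embed(node) for node in nodes]}
    persist_dir.mkdir(parents=True, exist_ok=True)
    (persist_dir / "index.json").write_text(json.dumps(index))
    return index


def load_index(persist_dir: Path) -> dict:
    """Load a persisted index without calling the embedding model again."""
    return json.loads((persist_dir / "index.json").read_text())


def init_knowledge(docs: list[str], persist_dir: Path) -> tuple[dict, str]:
    """Reuse the stored index when it exists; otherwise build it."""
    if (persist_dir / "index.json").exists():
        return load_index(persist_dir), "loaded"
    return data_to_index(docs, persist_dir), "built"


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "rag_storage"
    first_index, first_status = init_knowledge(["alpha\nbeta", "gamma"], store)
    second_index, second_status = init_knowledge(["alpha\nbeta", "gamma"], store)
```

In this sketch the second call finds the persisted file and skips the embedding step entirely, which is the saving the last bullet refers to.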

A RAG module can be created with a JSON configuration to specify 1) data path, 2) data loader, 3) data preprocessing methods, and 4) embedding model (model config name).
Refer to the following for a detailed example:
<details>
<summary> A detailed example of RAG module configuration </summary>

```json
[
    {
        "knowledge_id": "{your_knowledge_id}",
        "emb_model_config_name": "{your_embed_model_config_name}",
        "data_processing": [
            {
                "load_data": {
                    "loader": {
                        "create_object": true,
                        "module": "llama_index.core",
                        "class": "SimpleDirectoryReader",
                        "init_args": {
                            "input_dir": "{path_to_your_data_dir_1}",
                            "required_exts": [".md"]
                        }
                    }
                }
            },
            {
                "load_data": {
                    "loader": {
                        "create_object": true,
                        "module": "llama_index.core",
                        "class": "SimpleDirectoryReader",
                        "init_args": {
                            "input_dir": "{path_to_your_python_code_data_dir}",
                            "recursive": true,
                            "required_exts": [".py"]
                        }
                    }
                },
                "store_and_index": {
                    "transformations": [
                        {
                            "create_object": true,
                            "module": "llama_index.core.node_parser",
                            "class": "CodeSplitter",
                            "init_args": {
                                "language": "python",
                                "chunk_lines": 100
                            }
                        }
                    ]
                }
            }
        ]
    }
]
```

</details>
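
To make the `create_object` / `module` / `class` / `init_args` convention in the configuration more concrete, here is a minimal sketch of how such an entry could be resolved into an object. The helper name `build_from_config` is hypothetical and this is not necessarily how AgentScope parses the config internally; the demo uses the standard-library `fractions.Fraction` class in place of a LlamaIndex loader so that it runs without extra packages.

```python
import importlib
from typing import Any


def build_from_config(spec: Any) -> Any:
    """Recursively turn {"create_object": true, ...} dicts into objects."""
    if isinstance(spec, list):
        return [build_from_config(item) for item in spec]
    if not isinstance(spec, dict):
        return spec
    if spec.get("create_object"):
        # Import the module by name, look up the class, then instantiate it.
        module = importlib.import_module(spec["module"])
        cls = getattr(module, spec["class"])
        init_args = build_from_config(spec.get("init_args", {}))
        return cls(**init_args)
    return {key: build_from_config(value) for key, value in spec.items()}


# Stand-in spec: a stdlib class replaces "llama_index.core" / "SimpleDirectoryReader".
demo_spec = {
    "loader": {
        "create_object": True,
        "module": "fractions",
        "class": "Fraction",
        "init_args": {"numerator": 3, "denominator": 4},
    }
}
resolved = build_from_config(demo_spec)
```

With LlamaIndex installed, the same helper applied to a `loader` entry from the configuration would be expected to construct the named reader class with the given `init_args`.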

If users want to avoid the detailed configuration, we also provide a quick way in `KnowledgeBank` (see the following).
</br>

### Knowledge Bank
The knowledge bank maintains a collection of Knowledge objects (e.g., on different datasets) as a set of *knowledge*. Thus,
different agents can reuse the RAG modules without unnecessary "re-initialization".
Considering that configuring the RAG module may be too complicated for most users, the knowledge bank also provides an easy function call to create RAG modules.
* `KnowledgeBank.add_data_as_knowledge`: creates a RAG module. The simplest usage only requires `knowledge_id`, `emb_model_name` and `data_dirs_and_types`:
```python
knowledge_bank.add_data_as_knowledge(
    knowledge_id="agentscope_tutorial_rag",
    emb_model_name="qwen_emb_config",
    data_dirs_and_types={
        "../../docs/sphinx_doc/en/source/tutorial": [".md"],
    },
)
```
For more advanced initialization, users can still pass a knowledge config as the `knowledge_config` parameter:
```python
# load knowledge_config as dict
knowledge_bank.add_data_as_knowledge(
    knowledge_id=knowledge_config["knowledge_id"],
    emb_model_name=knowledge_config["emb_model_config_name"],
    knowledge_config=knowledge_config,
)
```
* `KnowledgeBank.get_knowledge`: It accepts two parameters, `knowledge_id` and `duplicate`.
It returns the knowledge object with the provided `knowledge_id`; if `duplicate` is true, the returned object is a deep copy.
* `KnowledgeBank.equip`: It accepts two parameters, `agent` and `duplicate`.
The function first checks whether the agent has a `rag_config`; if so, it provides the knowledge according to the
`knowledge_id` in the `rag_config` and initializes the retriever(s) for the agent. `duplicate` again determines whether the knowledge is deep copied.
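
The sharing-versus-copying semantics of `duplicate` can be illustrated with a toy bank. This is a sketch under assumptions, not the AgentScope `KnowledgeBank`: the classes `ToyKnowledge` and `ToyKnowledgeBank`, the `knowledge_list` attribute, and the default `duplicate=False` are all made up for illustration, and retriever initialization is omitted.

```python
import copy
from types import SimpleNamespace


class ToyKnowledge:
    """Hypothetical stand-in for a Knowledge object."""

    def __init__(self, knowledge_id, docs):
        self.knowledge_id = knowledge_id
        self.docs = list(docs)


class ToyKnowledgeBank:
    """Hypothetical stand-in for a knowledge bank."""

    def __init__(self):
        self.stored = {}

    def add(self, knowledge):
        self.stored[knowledge.knowledge_id] = knowledge

    def get_knowledge(self, knowledge_id, duplicate=False):
        # duplicate=True hands out an independent deep copy;
        # otherwise every caller shares the same object.
        knowledge = self.stored[knowledge_id]
        return copy.deepcopy(knowledge) if duplicate else knowledge

    def equip(self, agent, duplicate=False):
        # Only agents that carry a rag_config are equipped.
        rag_config = getattr(agent, "rag_config", None)
        if not rag_config:
            return
        agent.knowledge_list = [
            self.get_knowledge(kid, duplicate)
            for kid in rag_config["knowledge_id"]
        ]


bank = ToyKnowledgeBank()
bank.add(ToyKnowledge("tutorial", ["doc a", "doc b"]))

shared = bank.get_knowledge("tutorial")
private = bank.get_knowledge("tutorial", duplicate=True)

agent = SimpleNamespace(rag_config={"knowledge_id": ["tutorial"]})
bank.equip(agent)
```

Sharing one object avoids re-initialization across agents, while a deep copy lets one agent modify its knowledge without affecting the others.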



### RAG agent
A RAG agent is an agent that can generate answers based on the retrieved knowledge.
* Agent using RAG: a RAG agent requires `rag_config` in its configuration, which contains a list of `knowledge_id`.
* An agent can load specific knowledge from a `KnowledgeBank` by being passed to the `KnowledgeBank.equip` function.
* An agent can use the retrievers in its `reply` function to retrieve from the `Knowledge` and compose its prompt to the LLM.



**Building RAG agent yourself.** As long as your agent config has a `rag_config` attribute that is a dict containing a list of `knowledge_id`, you can pass the agent to `KnowledgeBank.equip`.
Your agent will then be equipped with the knowledge named in the `knowledge_id` list and the corresponding retrievers.
You can decide how to use the retrievers, and even update and refresh the index, in your agent's `reply` function.
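
As a rough illustration of what using retrievers inside `reply` can look like, the sketch below retrieves context and composes a prompt. Everything here is hypothetical: `KeywordRetriever`, `ToyRAGAgent`, the `retrieve(query, top_k)` signature and the prompt layout are assumptions for this example and do not reflect the AgentScope or LlamaIndex APIs; the "model" is simply a function that echoes the prompt.

```python
class KeywordRetriever:
    """Toy retriever: ranks documents by the number of words shared with the query."""

    def __init__(self, docs):
        self.docs = docs

    def retrieve(self, query, top_k=2):
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(doc.lower().split())), doc)
            for doc in self.docs
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        # Keep at most top_k documents that share at least one word.
        return [doc for score, doc in ranked[:top_k] if score > 0]


class ToyRAGAgent:
    """Toy agent: retrieve first, then compose the prompt for the model."""

    def __init__(self, retrievers, model):
        self.retrievers = retrievers
        self.model = model

    def reply(self, question):
        retrieved = []
        for retriever in self.retrievers:
            retrieved.extend(retriever.retrieve(question))
        context = "\n".join(f"- {chunk}" for chunk in retrieved)
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
        return self.model(prompt)


docs = [
    "agents exchange messages",
    "knowledge bank stores knowledge objects",
    "pipelines chain agents",
]
# The "model" just echoes the prompt so the composed text can be inspected.
agent = ToyRAGAgent([KeywordRetriever(docs)], model=lambda prompt: prompt)
answer = agent.reply("what does the knowledge bank store")
```

A real agent would replace the echo function with an LLM call and could also decide in `reply` when to refresh its index.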


[[Back to the top]](#209-rag-en)



121 changes: 121 additions & 0 deletions docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md
@@ -0,0 +1,121 @@
(209-rag-zh)=

# A Quick Introduction to RAG in AgentScope

Here we introduce three RAG-related concepts in AgentScope: Knowledge, Knowledge Bank and RAG agent.

### Knowledge
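The Knowledge modules (currently only `LlamaIndexKnowledge`; support for LangChain is coming soon) are responsible for handling all RAG-related operations.

Here, we use `LlamaIndexKnowledge` as an example to illustrate the operations within the `Knowledge` module.
When a `LlamaIndexKnowledge` object is initialized, `LlamaIndexKnowledge.__init__` performs the following steps:
* It processes the documents and generates the retrieval index (done in `LlamaIndexKnowledge._data_to_index(...)`), which includes
    * loading the documents in `LlamaIndexKnowledge._data_to_docs(...)`;
    * preprocessing the documents into nodes with the preprocessing methods and the embedding model in `LlamaIndexKnowledge._docs_to_nodes(...)`;
    * generating the index from the processed nodes.
* If the index already exists, `LlamaIndexKnowledge._load_index(...)` is called to load it and avoid repeated embedding calls.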

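Users can create a RAG module with a JSON configuration that specifies 1) the data path, 2) the data loader, 3) the data preprocessing methods, and 4) the embedding model (model config name).
Refer to the following for a detailed example: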
<details>
<summary> A detailed configuration example </summary>

```json
[
    {
        "knowledge_id": "{your_knowledge_id}",
        "emb_model_config_name": "{your_embed_model_config_name}",
        "data_processing": [
            {
                "load_data": {
                    "loader": {
                        "create_object": true,
                        "module": "llama_index.core",
                        "class": "SimpleDirectoryReader",
                        "init_args": {
                            "input_dir": "{path_to_your_data_dir_1}",
                            "required_exts": [".md"]
                        }
                    }
                }
            },
            {
                "load_data": {
                    "loader": {
                        "create_object": true,
                        "module": "llama_index.core",
                        "class": "SimpleDirectoryReader",
                        "init_args": {
                            "input_dir": "{path_to_your_python_code_data_dir}",
                            "recursive": true,
                            "required_exts": [".py"]
                        }
                    }
                },
                "store_and_index": {
                    "transformations": [
                        {
                            "create_object": true,
                            "module": "llama_index.core.node_parser",
                            "class": "CodeSplitter",
                            "init_args": {
                                "language": "python",
                                "chunk_lines": 100
                            }
                        }
                    ]
                }
            }
        ]
    }
]
```

</details>

If users want to avoid the detailed configuration, we also provide a quick way in `KnowledgeBank` (see the following).
</br>

### Knowledge Bank
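The knowledge bank maintains a set of Knowledge modules (e.g., knowledge built from different datasets) as a collection of knowledge. Thus, different agents can reuse the knowledge modules without unnecessary re-initialization. Considering that configuring a RAG module may be too complicated for most users, the knowledge bank also provides a simple function call to create RAG modules.

* `KnowledgeBank.add_data_as_knowledge`: creates a RAG module. The simplest usage only requires `knowledge_id`, `emb_model_name` and `data_dirs_and_types`.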
```python
knowledge_bank.add_data_as_knowledge(
    knowledge_id="agentscope_tutorial_rag",
    emb_model_name="qwen_emb_config",
    data_dirs_and_types={
        "../../docs/sphinx_doc/en/source/tutorial": [".md"],
    },
)
```
For more advanced initialization, users can still pass a knowledge module configuration as the `knowledge_config` parameter:
```python
# load knowledge_config as dict
knowledge_bank.add_data_as_knowledge(
    knowledge_id=knowledge_config["knowledge_id"],
    emb_model_name=knowledge_config["emb_model_config_name"],
    knowledge_config=knowledge_config,
)
```
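* `KnowledgeBank.get_knowledge`: It accepts two parameters, `knowledge_id` and `duplicate`.
It returns the knowledge object corresponding to the provided `knowledge_id`; if `duplicate` is true, the returned object is a deep copy.
* `KnowledgeBank.equip`: It accepts two parameters, `agent` and `duplicate`.
The function first checks whether the agent has a `rag_config`; if so, it provides the corresponding knowledge according to the `knowledge_id` in the `rag_config` and initializes the retriever(s) for the agent.
`duplicate` likewise determines whether the knowledge is deep copied.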


### RAG agent
A RAG agent is an agent that can generate answers based on the retrieved knowledge.
* Agent using RAG: a RAG agent requires `rag_config` in its configuration, which contains a list of `knowledge_id`.
* An agent can load specific knowledge from a `KnowledgeBank` by being passed to the `KnowledgeBank.equip` function.
* An agent can use the retrievers in its `reply` function to retrieve from the `Knowledge` and compose its prompt to the LLM.

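**Building RAG agent yourself.** As long as your agent config has a `rag_config` attribute that is a dict containing a list of `knowledge_id`, you can pass the agent to `KnowledgeBank.equip`,
which equips it with the knowledge and the corresponding retrievers according to the `knowledge_id` list.
You can decide how to use the retrievers, and even update and refresh the index, in the `reply` function.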

[[Back to the top]](#209-rag-zh)



32 changes: 12 additions & 20 deletions examples/conversation_with_RAG_agents/README.md
@@ -7,13 +7,12 @@ you will obtain three different agents who can help you answer different questio
* **What is this example for?** With this example, we want to show how agents with retrieval-augmented generation (RAG)
capability can be built easily.

**Notice:** This example is a Beta version of the AgentScope RAG agent. A formal version will soon be added to `src/agentscope/agents`, but it may be subject to changes.

## Prerequisites
* **Cloning repo:** This example requires cloning the whole AgentScope repo to local.
* **Packages:** This example is built on the LlamaIndex package. Thus, some packages need to be installed before running the example.
```bash
pip install llama-index tree_sitter tree-sitter-languages
pip install llama-index llama-index-readers-docstring-walker tree_sitter tree-sitter-languages
```
* **Model APIs:** This example uses Dashscope APIs. Thus, we also need an API key for DashScope.
```bash
@@ -23,35 +22,28 @@ capability can be used to build easily.
**Note:** This example has been tested with `dashscope_chat` and `dashscope_text_embedding` model wrapper, with `qwen-max` and `text-embedding-v2` models.
However, you are welcome to replace the Dashscope language and embedding model wrappers or models with other models you like to test.

## Start AgentScope Consultants
## Start AgentScope Copilots
* **Terminal:** The simplest way to run the AgentScope Consultants is in a terminal.
```bash
python ./rag_example.py
```
Setting `log_retrieval` to `false` in `agent_config.json` can hide the retrieved information and provide only answers of agents.


* **AS studio:** If you want a more organized, cleaner UI, you can also run the example with our `as_studio`.
```bash
as_studio ./rag_example.py
```

### Customize AgentScope Consultants to other consultants
### Agents in the example
Customize AgentScope Consultants to other consultants
After you run the example, you may notice that this example consists of three RAG agents:
* `AgentScope Tutorial Assistant`: responsible for answering questions based on AgentScope tutorials (markdown files).
* `AgentScope Framework Code Assistant`: responsible for answering questions based on AgentScope code base (python files).
* `Summarize Assistant`: responsible for summarize the questions from the above two agents.

These agents can be configured to answering questions based on other GitHub repo, by simply modifying the `input_dir` fields in the `agent_config.json`.

For more advanced customization, we may need to learn a little bit from the following.
* `Tutorial-Assistant`: responsible for answering questions based on AgentScope tutorials (markdown files).
* `Code-Search-Assistant`: responsible for answering questions based on AgentScope code base (python files).
* `API-Assistant`: responsible for answering questions based on AgentScope API documents (html files, generated by `sphinx`)
* `Searching-Assistant`: responsible for general search in tutorial and code base (markdown files and code files)
* `Agent-Guiding-Assistant`: responsible for referring the correct agent(s) among the above ones.

**RAG modules:** In AgentScope, RAG modules are abstract to provide three basic functions: `load_data`, `store_and_index` and `retrieve`. Refer to `src/agentscope/rag` for more details.
Except for the last one, `Agent-Guiding-Assistant`, all agents can be configured to answer questions based on other GitHub repos by replacing the `knowledge`.

**RAG configs:** In the example configuration (the `rag_config` field), all parameters are optional. But if you want to customize them, you may want to learn the following:
* `load_data`: contains all parameters for the the `rag.load_data` function.
Since the `load_data` accepts a dataloader object `loader`, the `loader` in the config need to have `"create_object": true` to let a internal parse create a LlamaIndex data loader object.
The loader object is an instance of `class` in module `module`, with initialization parameters in `init_args`.
For more details about how to use the RAG module in AgentScope, please refer to the tutorial.

* `store_and_index`: contains all parameters for the the `rag.store_and_index` function.
For example, you can pass `vector_store` and `retriever` configurations in a similar way as the `loader` mentioned above.
For the `transformations` parameter, you can pass a list of dicts, each of which corresponds to building a `NodeParser`-kind of preprocessor in Llamaindex.
79 changes: 0 additions & 79 deletions examples/conversation_with_RAG_agents/agent_config.json

This file was deleted.
