Reformat and improve RAG module and agents #184

Merged
merged 101 commits into from
Jun 12, 2024
Commits
d4731d0
new testing code
FredericW Apr 11, 2024
9b4b060
test the idea that using two agents to analyze different aspects of c…
FredericW Apr 11, 2024
7551c2f
handle irrelevant question and simplify the setting first
ZiTao-Li Apr 15, 2024
86fac88
fix dir
ZiTao-Li Apr 15, 2024
0687138
function added for persisting the index.
FredericW Apr 16, 2024
c598310
Merge pull request #1 from FredericW/testing
FredericW Apr 16, 2024
ddddbe4
Merge branch 'modelscope:main' into main
FredericW Apr 16, 2024
2b1617b
json files are deleted.
FredericW Apr 16, 2024
8afab8f
Merge branch 'modelscope:main' into main
FredericW Apr 16, 2024
6c11924
Merge branch 'modelscope:main' into testing
FredericW Apr 16, 2024
dc5b3e5
Merge branch 'main' into testing
FredericW Apr 16, 2024
fc9066c
Merge pull request #2 from FredericW/testing
FredericW Apr 16, 2024
8daaf4b
Delete rag_storage directory
FredericW Apr 16, 2024
47a9e22
Merge pull request #3 from FredericW/main
ZiTao-Li Apr 16, 2024
a890ac1
merge
ZiTao-Li Apr 16, 2024
702c5c0
add mention function
ZiTao-Li Apr 16, 2024
1e4b4c5
add ui
ZiTao-Li Apr 16, 2024
70bb7bb
fix bugs
ZiTao-Li Apr 16, 2024
484fb4d
runnable ui with flask
ZiTao-Li Apr 18, 2024
99d5a73
The agent dialog flow is modified. We remove the summary agent, and a…
Apr 18, 2024
b3011b1
add docstring agent
ZiTao-Li Apr 18, 2024
449c4dc
update info
ZiTao-Li Apr 18, 2024
e55be4a
Changes are made to improve the performance
Apr 19, 2024
133dbc9
Changes are made to improve the performance
Apr 19, 2024
0fc5e16
Merge pull request #1 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 19, 2024
5d6c0fe
config modified
Apr 19, 2024
dd0d328
Merge pull request #5 from FredericW/dev_copilot_zitao
ZiTao-Li Apr 19, 2024
3d80561
Merge branch 'main' into zitao/dev_copilot
ZiTao-Li Apr 19, 2024
ccc5b46
add file
ZiTao-Li Apr 19, 2024
ec3d8c0
add api assistant
ZiTao-Li Apr 19, 2024
3007809
New feature added: now we allow user to load and index multiple files…
Apr 22, 2024
d7ae87c
Changes made: To reorganize the work flow, we made the following chan…
Apr 25, 2024
1f37b96
Changes made: the major components of init_rag is now moved to rag_ag…
Apr 25, 2024
a55eaf0
Merge pull request #6 from FredericW/dev/as_copilot
ZiTao-Li Apr 26, 2024
85401a8
Changes made: refactor the load_data method in LlamaIndexRAG, now the…
Apr 26, 2024
b8c6eac
Changes made: new method load_index for load stored index from persis…
Apr 26, 2024
06d9ebb
reformat code
ZiTao-Li Apr 26, 2024
2ab69fa
Merge branch 'main' into zitao/dev_copilot
ZiTao-Li Apr 26, 2024
f9f912c
Merge branch 'main' into zitao/dev_copilot
ZiTao-Li Apr 28, 2024
986e1b6
fix merge
ZiTao-Li Apr 28, 2024
fc91107
fix
ZiTao-Li Apr 28, 2024
df7a79e
Merge branch 'main' into dev/as_copilot
ZiTao-Li Apr 28, 2024
82eff0c
Merge branch 'dev/as_copilot' into zitao/dev_copilot
ZiTao-Li Apr 28, 2024
32adc04
Merge pull request #7 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 28, 2024
3dc828e
fix langchain_rag.py
ZiTao-Li Apr 28, 2024
130c235
Merge pull request #8 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 28, 2024
a414a7c
Merge branch 'modelscope:main' into dev/as_copilot_index_manage
FredericW Apr 28, 2024
d41b3f8
Changes made: new method refresh_index, _insert_docs_to_index, and _d…
Apr 29, 2024
f533612
Quick fix: re-enable modules in rag_example.py
Apr 29, 2024
e89591c
Merge branch 'main' into dev/as_copilot
ZiTao-Li Apr 29, 2024
421e021
Merge branch 'modelscope:main' into dev/as_copilot_index_manage
FredericW Apr 29, 2024
c9902a5
sync with dev/as_copilot
Apr 29, 2024
07480d5
comments and docstrings are updated.
Apr 29, 2024
1640146
comments and docstrings are updated.
Apr 29, 2024
0895b10
Merge pull request #10 from FredericW/dev/as_copilot_index_manage
ZiTao-Li Apr 29, 2024
3ee3266
add knowledge bank
ZiTao-Li Apr 29, 2024
0b29773
Merge pull request #11 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 30, 2024
e94cb0c
comments and docstrings are updated.
Apr 30, 2024
83f5aec
fix configs, add docstring, modify names
ZiTao-Li Apr 30, 2024
2ef4641
make requirements to pass tests
ZiTao-Li Apr 30, 2024
4205be6
Merge pull request #12 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li Apr 30, 2024
67a1cf3
update docstrings
ZiTao-Li Apr 30, 2024
0b52ef8
Merge branch 'dev/as_copilot' into dev/as_copilot_fei
May 6, 2024
31c3aa4
New Feature: the RAG agent could load multiple rag modules with corre…
May 9, 2024
df18555
Minor edits.
May 10, 2024
56090eb
Minor edits.
May 10, 2024
db2f7a9
Minor edits.
May 10, 2024
d68b1dd
Minor edits.
May 10, 2024
48bf108
Minor edits.
May 10, 2024
8b75360
Minor edits regarding "persist_dir"
May 10, 2024
847f3b5
Minor edits.
May 10, 2024
55c34fa
Merge pull request #13 from FredericW/dev/as_copilot_fei
ZiTao-Li May 10, 2024
6186188
move emb_model_config_name to knowledge_config
ZiTao-Li May 15, 2024
123cd06
Merge pull request #14 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 15, 2024
4f024b1
improve after discussion, change names, add equip function for knowle…
ZiTao-Li May 16, 2024
0c25ada
fix init function bug
ZiTao-Li May 16, 2024
06ce25d
Merge pull request #15 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 16, 2024
94dcc9b
merged main
ZiTao-Li May 20, 2024
4ff401d
update as comments suggest
ZiTao-Li May 20, 2024
88e64cd
Merge pull request #16 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 20, 2024
5ab6dde
update as comments suggest (for docs)
ZiTao-Li May 20, 2024
e704fd7
Merge pull request #17 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 20, 2024
0257667
update as comments suggest 3
ZiTao-Li May 22, 2024
63648fc
add RAGConfig to regulate rag config input
ZiTao-Li May 22, 2024
f9a1308
Merge pull request #18 from ZiTao-Li/zitao/dev_copilot
ZiTao-Li May 22, 2024
b337a34
[Fix] remove "emb_model_config_name" from agent_config.json, which wa…
May 23, 2024
6f6b0d4
Merge branch 'main' into dev/as_copilot
ZiTao-Li May 24, 2024
2a72fda
Merge branch 'main' into dev/as_copilot
ZiTao-Li May 24, 2024
e6849e5
update as comments
ZiTao-Li Jun 3, 2024
cea6afb
[Fix] remove "emb_model_config_name" from agent_config.json, which wa…
Jun 6, 2024
0b9fbe4
Address the comments
Jun 7, 2024
f497985
Address the comments by zitao
Jun 7, 2024
c1c0814
Merge pull request #19 from FredericW/dev/as_copilot_fei
FredericW Jun 7, 2024
4a7a852
Merge remote-tracking branch 'modelscope/main' into dev/as_copilot
DavdGao Jun 9, 2024
93aaf0f
Fix typo
DavdGao Jun 9, 2024
78b9ba9
Fix typo
DavdGao Jun 9, 2024
d3f4684
fix import error
ZiTao-Li Jun 9, 2024
aaff5bd
Merge remote-tracking branch 'refs/remotes/origin/main' into dev/as_c…
ZiTao-Li Jun 9, 2024
88a9478
merge 2
ZiTao-Li Jun 9, 2024
a49791b
update tutorial
ZiTao-Li Jun 9, 2024
46334c5
update README
ZiTao-Li Jun 12, 2024
2 changes: 2 additions & 0 deletions README.md
@@ -31,6 +31,8 @@ Welcome to join our community on

## News

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-06-11]** The RAG functionality is available for agents in **AgentScope** now! [**A quick introduction to RAG in AgentScope**](https://modelscope.github.io/agentscope/en/tutorial/210-rag.html) can help you equip your agent with external knowledge!

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-06-09]** We release **AgentScope** v0.0.5 now! In this new version, [**AgentScope Workstation**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html) is open-sourced with the refactored [**AgentScope Studio**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html)!

<h5 align="center">
1 change: 1 addition & 0 deletions README_ZH.md
@@ -27,6 +27,7 @@
| <img src="https://gw.alicdn.com/imgextra/i1/O1CN01hhD1mu1Dd3BWVUvxN_!!6000000000238-2-tps-400-400.png" width="100" height="100"> | <img src="https://img.alicdn.com/imgextra/i2/O1CN01tuJ5971OmAqNg9cOw_!!6000000001747-0-tps-444-460.jpg" width="100" height="100"> |

## News
- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-06-11]** RAG functionality is now integrated into **AgentScope**! Follow [**A Brief Introduction to RAG in AgentScope**](https://modelscope.github.io/agentscope/en/tutorial/210-rag.html) to let your agent use external knowledge!

- <img src="https://img.alicdn.com/imgextra/i3/O1CN01SFL0Gu26nrQBFKXFR_!!6000000007707-2-tps-500-500.png" alt="new" width="30" height="30"/>**[2024-06-09]** AgentScope v0.0.5 has been released! In this new version, we open-sourced [**AgentScope Workstation**](https://modelscope.github.io/agentscope/en/tutorial/209-gui.html)!

197 changes: 197 additions & 0 deletions docs/sphinx_doc/en/source/tutorial/210-rag.md
@@ -0,0 +1,197 @@
(210-rag-en)=

# A Quick Introduction to RAG in AgentScope

This tutorial introduces three concepts related to RAG in AgentScope: Knowledge, Knowledge Bank, and RAG agent.

### Knowledge
The Knowledge modules (now only `LlamaIndexKnowledge`; support for LangChain will come soon) are responsible for handling all RAG-related operations.

#### How to create a Knowledge object
A Knowledge object can be created from a JSON configuration that specifies 1) the data path, 2) the data loader, 3) the data preprocessing methods, and 4) the embedding model (by model config name).
A detailed example follows:
<details>
<summary> A detailed example of Knowledge object configuration </summary>

```json
[
  {
    "knowledge_id": "{your_knowledge_id}",
    "emb_model_config_name": "{your_embed_model_config_name}",
    "data_processing": [
      {
        "load_data": {
          "loader": {
            "create_object": true,
            "module": "llama_index.core",
            "class": "SimpleDirectoryReader",
            "init_args": {
              "input_dir": "{path_to_your_data_dir_1}",
              "required_exts": [".md"]
            }
          }
        }
      },
      {
        "load_data": {
          "loader": {
            "create_object": true,
            "module": "llama_index.core",
            "class": "SimpleDirectoryReader",
            "init_args": {
              "input_dir": "{path_to_your_python_code_data_dir}",
              "recursive": true,
              "required_exts": [".py"]
            }
          }
        },
        "store_and_index": {
          "transformations": [
            {
              "create_object": true,
              "module": "llama_index.core.node_parser",
              "class": "CodeSplitter",
              "init_args": {
                "language": "python",
                "chunk_lines": 100
              }
            }
          ]
        }
      }
    ]
  }
]
```

</details>

#### More about knowledge configurations
The configuration above is usually saved as a JSON file. It must
contain the following key attributes:
* `knowledge_id`: a unique identifier of the knowledge;
* `emb_model_config_name`: the name of the embedding model;
* `chunk_size`: default chunk size for the document transformation (node parser);
* `chunk_overlap`: default chunk overlap for each chunk (node);
* `data_processing`: a list of data processing methods.
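
A quick sanity check on a loaded configuration can catch missing attributes before any embedding call is made. The following is a minimal sketch using only the standard library; `check_knowledge_config` is a hypothetical helper, not part of AgentScope. Because the example configuration above omits `chunk_size` and `chunk_overlap`, this sketch assumes those two fall back to defaults and only treats the other three keys as required.

```python
import json
from typing import List

# Keys taken from the attribute list above. Treating chunk_size and
# chunk_overlap as optional is an assumption of this sketch.
REQUIRED_KEYS = ("knowledge_id", "emb_model_config_name", "data_processing")


def check_knowledge_config(config: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if not isinstance(config.get("data_processing", []), list):
        problems.append("data_processing must be a list")
    return problems


# The file content is inlined here so that the sketch is self-contained.
raw = """
[
  {
    "knowledge_id": "demo",
    "emb_model_config_name": "my_emb_config",
    "data_processing": []
  },
  {"knowledge_id": "broken"}
]
"""
configs = json.loads(raw)
for cfg in configs:
    print(cfg["knowledge_id"], check_knowledge_config(cfg))
```

The first entry passes, while the second one reports the two attributes it lacks.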

##### Using LlamaIndexKnowledge as an example

Regarding the last attribute, `data_processing`, each entry of the list (a dict) configures a data
loader object that loads the needed data (`load_data`)
and a transformation object that processes the loaded data (`store_and_index`).
Accordingly, one may load data from multiple sources (with different data loaders),
process each source in its own way (i.e., with its own transformation or node parser),
and merge the processed data into a single index for later retrieval.
For more information about these components, please refer to
[LlamaIndex-Loading Data](https://docs.llamaindex.ai/en/stable/module_guides/loading/).
In general, the following attributes need to be set for each object:
* `create_object`: indicates whether to create a new object, must be true in this case;
* `module`: where the class is located;
* `class`: the name of the class.
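
To make the role of `module`, `class` and `init_args` concrete, here is a minimal sketch of how such an entry could be turned into an object with `importlib`. It is an illustration under assumptions, not AgentScope's actual implementation: `build_from_config` is a hypothetical helper, and a standard-library class stands in for a LlamaIndex loader so that the snippet runs without extra dependencies.

```python
import importlib
from typing import Any


def build_from_config(entry: dict) -> Any:
    """Instantiate the class named by `module` and `class` with `init_args`."""
    if not entry.get("create_object", False):
        # Without the flag, this sketch hands the entry back untouched.
        return entry
    module = importlib.import_module(entry["module"])
    cls = getattr(module, entry["class"])
    return cls(**entry.get("init_args", {}))


# A standard-library class stands in for a LlamaIndex loader here.
entry = {
    "create_object": True,
    "module": "fractions",
    "class": "Fraction",
    "init_args": {"numerator": 3, "denominator": 4},
}
obj = build_from_config(entry)
print(type(obj).__name__, obj)
```

With the same pattern, an entry naming `llama_index.core` and `SimpleDirectoryReader` would resolve to that reader class, provided LlamaIndex is installed.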

More specifically, for `load_data`, you can use any of the data loaders provided by LlamaIndex,
such as `SimpleDirectoryReader` (set in `class`), to load a wide variety of data types
(e.g., txt, pdf, html, py, md). For this data loader, you can set the following attributes:
* `input_dir`: the path to the data directory;
* `required_exts`: the file extensions that the data loader will load.
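
The effect of `input_dir` and `required_exts`, together with the `recursive` flag used in the example configuration, can be approximated with the standard library. The sketch below is not `SimpleDirectoryReader` itself; `select_files` is a hypothetical helper that only shows which files such settings would pick up.

```python
import tempfile
from pathlib import Path
from typing import List


def select_files(
    input_dir: str,
    required_exts: List[str],
    recursive: bool = False,
) -> List[Path]:
    """List files under input_dir whose suffix is in required_exts."""
    root = Path(input_dir)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix in required_exts)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "intro.md").write_text("# Intro")
    (root / "notes.txt").write_text("notes")
    (root / "src").mkdir()
    (root / "src" / "agent.py").write_text("print('hi')")

    md_files = [p.name for p in select_files(tmp, [".md"])]
    py_flat = [p.name for p in select_files(tmp, [".py"])]
    py_deep = [p.name for p in select_files(tmp, [".py"], recursive=True)]

print(md_files, py_flat, py_deep)
```

Only the recursive call finds `agent.py`, which is why the Python-code entry in the example configuration sets `"recursive": true`.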

For more information about the data loaders, please refer to the [SimpleDirectoryReader documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).

`store_and_index` is optional; if it is not specified, the default transformation (a.k.a. node parser) is `SentenceSplitter`. For a specific node parser such as `CodeSplitter`, users can set the following attributes:
* `language`: the language of the code;
* `chunk_lines`: the number of lines in each code chunk.

For more information about the node parsers, please refer to [here](https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/).


To avoid the detailed configuration, users can take the shortcut provided by `KnowledgeBank` (described in the Knowledge Bank section).

#### How to use a Knowledge object
After a Knowledge object is created successfully, users can retrieve information related to their queries by calling its `.retrieve(...)` function.
The `.retrieve` function accepts at least three basic parameters:
* `query`: the input that will be matched against the knowledge;
* `similarity_top_k`: how many of the most similar "data blocks" will be returned;
* `to_list_strs`: whether to return the retrieved information as a list of strings.
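
The three parameters can be illustrated with a toy retriever. This is a sketch under loud assumptions: `ToyKnowledge` is hypothetical, it scores chunks with bag-of-words cosine similarity instead of a real embedding model, and returning `(score, chunk)` pairs when `to_list_strs` is false is a choice made for this example only.

```python
import math
import re
from collections import Counter
from typing import List


def _vectorize(text: str) -> Counter:
    """Bag-of-words counts; a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyKnowledge:
    """Hypothetical class; only the parameter names come from the tutorial."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = chunks

    def retrieve(self, query: str, similarity_top_k: int = 2, to_list_strs: bool = False):
        query_vec = _vectorize(query)
        scored = [(_cosine(query_vec, _vectorize(c)), c) for c in self.chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        top = scored[:similarity_top_k]
        # Either plain strings or (score, chunk) pairs, depending on the flag.
        return [chunk for _, chunk in top] if to_list_strs else top


knowledge = ToyKnowledge([
    "A Knowledge object handles retrieval.",
    "The knowledge bank stores Knowledge objects.",
    "Agents talk through message hubs.",
])
hits = knowledge.retrieve("knowledge bank", similarity_top_k=2, to_list_strs=True)
print(hits)
```

The chunk mentioning both query words ranks first, and the unrelated chunk is cut off by `similarity_top_k=2`.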

*Advanced:* `LlamaIndexKnowledge` also lets users pass their own retriever to retrieve from the knowledge.

#### More details inside `LlamaIndexKnowledge`
Here, we use `LlamaIndexKnowledge` as an example to illustrate the operations within the `Knowledge` module.
When a `LlamaIndexKnowledge` object is initialized, `LlamaIndexKnowledge.__init__` goes through the following steps:
* It processes the data and prepares for retrieval in `LlamaIndexKnowledge._data_to_index(...)`, which includes
    * loading the data with `LlamaIndexKnowledge._data_to_docs(...)`;
    * preprocessing the data with the preprocessing methods (e.g., splitting) and the embedding model in `LlamaIndexKnowledge._docs_to_nodes(...)`;
    * getting ready to be queried, i.e., generating an index for the processed data.
* If the index already exists, `LlamaIndexKnowledge._load_index(...)` is invoked to load it and avoid repeating embedding calls.
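
The build-or-load behavior described in these steps can be sketched with a JSON file as the persisted index. Everything here is an assumption for illustration: `load_or_build_index`, the `index.json` file name, and the fake embedding function are hypothetical and much simpler than what `LlamaIndexKnowledge` stores.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def load_or_build_index(
    persist_dir: str,
    docs: List[str],
    embed: Callable[[str], List[float]],
) -> Dict[str, List[float]]:
    """Load a persisted index if present; otherwise embed, persist, return."""
    index_file = Path(persist_dir) / "index.json"
    if index_file.exists():
        # Reuse the stored vectors, so no embedding call is repeated.
        return json.loads(index_file.read_text())
    index = {doc: embed(doc) for doc in docs}
    index_file.parent.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(index))
    return index


calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    """Records each call so that repeated embedding would be visible."""
    calls.append(text)
    return [float(len(text))]


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build_index(tmp, ["alpha", "beta"], fake_embed)
    second = load_or_build_index(tmp, ["alpha", "beta"], fake_embed)

print(len(calls), first == second)
```

The second call returns the same index without invoking `fake_embed` again, which is the saving the tutorial refers to.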
</br>

### Knowledge Bank
The knowledge bank maintains a collection of Knowledge objects (e.g., built on different datasets) as a set of *knowledge*. Thus,
different agents can reuse the same Knowledge object without unnecessary "re-initialization".
Because configuring a Knowledge object may be too complicated for most users, the knowledge bank also provides an easy function call to create Knowledge objects.
* `KnowledgeBank.add_data_as_knowledge`: creates a Knowledge object. The easy way only requires `knowledge_id`, `emb_model_name` and `data_dirs_and_types`.
Because the knowledge bank processes files as `LlamaIndexKnowledge` by default, all text file types are supported, such as `.txt`, `.html`, `.md`, `.csv`, `.pdf`, and code files like `.py`. For file types other than text, refer to the [LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/).
```python
knowledge_bank.add_data_as_knowledge(
    knowledge_id="agentscope_tutorial_rag",
    emb_model_name="qwen_emb_config",
    data_dirs_and_types={
        "../../docs/sphinx_doc/en/source/tutorial": [".md"],
    },
)
```
For more advanced initialization, users can still pass a knowledge config as the `knowledge_config` parameter:
```python
# load knowledge_config as dict
knowledge_bank.add_data_as_knowledge(
    knowledge_id=knowledge_config["knowledge_id"],
    emb_model_name=knowledge_config["emb_model_config_name"],
    knowledge_config=knowledge_config,
)
```
* `KnowledgeBank.get_knowledge`: accepts two parameters, `knowledge_id` and `duplicate`.
It returns the Knowledge object with the given `knowledge_id`; if `duplicate` is true, the returned object is a deep copy.
* `KnowledgeBank.equip`: accepts three parameters, `agent`, `knowledge_id_list` and `duplicate`.
The function looks up the Knowledge objects listed in `knowledge_id_list` and puts them into `agent.knowledge_list`. If `duplicate` is true, each assigned Knowledge object is deep copied first.
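
The difference that `duplicate` makes is easiest to see with a toy model. The classes below are hypothetical stand-ins written for this sketch (including the `add` method); they mimic only the behavior described above and are not the AgentScope implementation.

```python
import copy
from typing import Dict, List


class ToyKnowledge:
    def __init__(self, knowledge_id: str) -> None:
        self.knowledge_id = knowledge_id


class ToyAgent:
    def __init__(self, knowledge_id_list: List[str]) -> None:
        self.knowledge_id_list = knowledge_id_list
        self.knowledge_list: List[ToyKnowledge] = []


class ToyKnowledgeBank:
    def __init__(self) -> None:
        self.stored: Dict[str, ToyKnowledge] = {}

    def add(self, knowledge: ToyKnowledge) -> None:
        self.stored[knowledge.knowledge_id] = knowledge

    def get_knowledge(self, knowledge_id: str, duplicate: bool = False) -> ToyKnowledge:
        knowledge = self.stored[knowledge_id]
        # A deep copy isolates the caller from later changes to the shared one.
        return copy.deepcopy(knowledge) if duplicate else knowledge

    def equip(self, agent: ToyAgent, knowledge_id_list: List[str], duplicate: bool = False) -> None:
        for knowledge_id in knowledge_id_list:
            agent.knowledge_list.append(self.get_knowledge(knowledge_id, duplicate))


bank = ToyKnowledgeBank()
bank.add(ToyKnowledge("tutorial"))

shared_agent = ToyAgent(["tutorial"])
private_agent = ToyAgent(["tutorial"])
bank.equip(shared_agent, shared_agent.knowledge_id_list)
bank.equip(private_agent, private_agent.knowledge_id_list, duplicate=True)

print(shared_agent.knowledge_list[0] is bank.stored["tutorial"])
print(private_agent.knowledge_list[0] is bank.stored["tutorial"])
```

Sharing avoids re-initialization, whereas a duplicate is the safer choice when an agent intends to modify its own copy.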




### RAG agent
A RAG agent is an agent that can generate answers based on the retrieved knowledge.
* Agent using RAG: a RAG agent has a list of knowledge objects (`knowledge_list`).
* A RAG agent can be initialized with a `knowledge_list`:
```python
knowledge = knowledge_bank.get_knowledge(knowledge_id)
agent = LlamaIndexAgent(
    name="rag_worker",
    sys_prompt="{your_prompt}",
    model_config_name="{your_model}",
    knowledge_list=[knowledge],  # provide knowledge object directly
    similarity_top_k=3,
    log_retrieval=False,
    recent_n_mem_for_retrieve=1,
)
```
* If a RAG agent is built from a configuration with `knowledge_id_list` specified, the agent can load the corresponding knowledge from a `KnowledgeBank` by passing the agent and the list of ids to the `KnowledgeBank.equip` function.
```python
# >>> agent.knowledge_list
# []
knowledge_bank.equip(agent, agent.knowledge_id_list)
# >>> agent.knowledge_list
# [<LlamaIndexKnowledge object at 0x16e516fb0>]
```
* The agent can use the retrieved knowledge in its `reply` function to compose the prompt sent to the LLM.



**Building a RAG agent yourself.** As long as you provide a list of knowledge ids, you can pass it together with your agent to `KnowledgeBank.equip`.
Your agent will then be equipped with the Knowledge objects named in the `knowledge_id_list`.
You can decide how to use the retrieved content, and even update and refresh the index, in your agent's `reply` function.
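
As one possible way to use the retrieved content inside `reply`, the sketch below composes a prompt from a system prompt, the retrieved chunks, and the user question. `compose_rag_prompt` and the prompt layout are assumptions made for illustration; the actual agent may format its prompt differently.

```python
from typing import List


def compose_rag_prompt(sys_prompt: str, question: str, retrieved: List[str]) -> str:
    """Join the system prompt, numbered context chunks and the question."""
    if retrieved:
        context = "\n".join(
            f"[{i}] {chunk}" for i, chunk in enumerate(retrieved, start=1)
        )
    else:
        # Make the absence of context explicit instead of leaving a blank.
        context = "(nothing retrieved)"
    return f"{sys_prompt}\n\nContext:\n{context}\n\nQuestion: {question}"


prompt = compose_rag_prompt(
    sys_prompt="Answer using only the context.",
    question="What does the knowledge bank store?",
    retrieved=["The knowledge bank stores Knowledge objects."],
)
print(prompt)
```

Numbering the chunks keeps it simple to ask the model to cite which piece of context it relied on.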


[[Back to the top]](#210-rag-en)



1 change: 1 addition & 0 deletions docs/sphinx_doc/en/source/tutorial/main.md
@@ -22,6 +22,7 @@ AgentScope is an innovative multi-agent platform designed to empower developers
- [Pipeline and MsgHub](202-pipeline.md)
- [Distribution](208-distribute.md)
- [AgentScope Studio](209-gui.md)
- [Retrieval Augmented Generation (RAG)](210-rag.md)
- [Logging](105-logging.md)
- [Monitor](207-monitor.md)
- [Example: Werewolf Game](104-usecase.md)