🌻 KMatrix-2: A Comprehensive Heterogeneous Knowledge Collaborative Enhancement Toolkit for Large Language Model
We present KMatrix-2, an open-source toolkit that supports comprehensive heterogeneous knowledge collaborative enhancement for LLMs. Our main contributions are:
- We propose a comprehensive heterogeneous knowledge collaborative enhancement toolkit (KMatrix-2) for LLMs. Compared with previous toolkits, which mainly focused on descriptive knowledge, KMatrix-2 specifically considers the enhancement of procedural knowledge.
- KMatrix-2 offers a rich selection of modular components and several typical enhancement patterns to support convenient construction of mainstream heterogeneous K-LLM systems.
- KMatrix-2 integrates systematic knowledge conflict resolution solutions for better knowledge integration, including inter-context and context-memory conflict resolution.
- We provide comparative performance results of heterogeneous knowledge access and collaborative enhancement to demonstrate the capabilities of KMatrix-2.
Installation
To get started with KMatrix-2, clone it from GitHub and install it (requires Python 3.7+; Python 3.10 is recommended):
$ git clone https://github.com/NLPerWS/KMatrix-2
# It is recommended to use a virtual environment for installation
$ conda create -n KMatrix2 python=3.10
$ conda activate KMatrix2
# Install backend environment
$ cd KMatrix-2
$ pip install -r requirements.txt
# Install frontend environment
# You need a Node.js environment; nvm is recommended for managing it
# Recommended Node.js version: 16.20.2
# You can refer to the following websites to install nvm
# https://nvm.uihtm.com/#nvm-linux
# https://github.com/nvm-sh/nvm
# After installing the node environment, execute:
$ cd font_kmatrix2
$ npm install
# Then, you need to install some third-party tools required by our toolkit
# Install ES database using Docker
$ docker pull elasticsearch:8.11.1
$ docker run -idt \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-e "xpack.security.enabled=true" \
-e "xpack.security.enrollment.enabled=true" \
-e "ELASTIC_PASSWORD=yourpassword" \
-v $(pwd)/elasticsearch_data:/usr/share/elasticsearch/data \
-v $(pwd)/esplugins:/usr/share/elasticsearch/plugins \
--name elasticsearch elasticsearch:8.11.1
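Once the container is running, it can help to confirm that Elasticsearch accepts the password set above before continuing. The following is a minimal sketch using only the Python standard library. The function names are hypothetical and not part of KMatrix-2, and the `http` scheme is an assumption: depending on how security is configured in the container, the endpoint may be served over `https` instead, so the scheme is left as a parameter.

```python
import base64
import json
import urllib.request


def build_health_request(password, host="localhost", port=9200, scheme="http"):
    """Build an authenticated request for the Elasticsearch cluster health endpoint."""
    # "elastic" is the built-in superuser whose password ELASTIC_PASSWORD sets.
    token = base64.b64encode(f"elastic:{password}".encode("utf-8")).decode("ascii")
    url = f"{scheme}://{host}:{port}/_cluster/health"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def cluster_status(password, **kwargs):
    """Query the cluster and return its status string ("green", "yellow", or "red")."""
    request = build_health_request(password, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["status"]
```

Calling `cluster_status("yourpassword")` on the deployment server should return a status string once the node has finished starting. A connection error usually means the container is still starting up or that the scheme assumption does not match your setup.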
# Additional models
Download the text2natsql_schema_item_classifier and text2natsql-t5-base models from the following two links.
https://drive.google.com/file/d/1UWNj1ZADfKa1G5I4gBYCJeEQO6piMg4G/view
https://drive.google.com/file/d/1QyfSfHHrxfIM5X9gKUYNr_0ZRVvb1suV/view
These two models are used to convert text into SQL statements (you can skip this if you don't need this feature).
Once the download is complete, please place the model files into the dir_model/ directory.
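To check that both models ended up where the toolkit expects them, a small script such as the one below can be run from the repository root. It is only a sketch: the function name is hypothetical, and it assumes each model is stored as an entry named after the model directly under dir_model/, which the text above does not spell out.

```python
from pathlib import Path

# Model names as given in the text above. Treating each one as an entry
# directly under dir_model/ is an assumption about the expected layout.
EXPECTED_MODELS = ("text2natsql_schema_item_classifier", "text2natsql-t5-base")


def missing_models(model_dir="dir_model"):
    """Return the expected model names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_MODELS if not (root / name).exists()]
```

An empty list from `missing_models()` means both entries were found. Any names returned are the models that still need to be downloaded or moved.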
# Additional datasets and knowledge bases
We have uploaded all the datasets and knowledge bases used in the experiments.
You can run the following commands to download them:
$ git lfs install
$ git clone https://www.modelscope.cn/datasets/zhangyujie/KMatrix2_Rep.git
Then upload them to the system as needed.
StartUp
If you have successfully installed the environment, a quick start will be easy.
1. Set the baseURL in font_kmatrix2/src/axios/index.js to the IP address of the deployment server.
2. Start the toolkit by executing the following commands:
# Terminal 1: start the frontend
$ cd KMatrix-2/font_kmatrix2
$ npm run dev
# Terminal 2: start the backend
$ cd KMatrix-2
$ python flask_server.py
Open the KMatrix-2 toolkit in a browser: http://yourserverip:8010
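If the page does not load, a quick first check is whether the port in the URL above accepts TCP connections at all. The helper below is a minimal sketch with a hypothetical name. It only tests reachability and says nothing about whether the application behind the port is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("yourserverip", 8010)` returning False suggests that the service has not started or that a firewall is blocking the port.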
Alternatively, deploy KMatrix-2 with Docker:
$ git clone https://github.com/NLPerWS/KMatrix-2.git
$ chmod +x -R KMatrix-2
Set the configuration values that need to be modified in root_config.py.
Set the baseURL in font_kmatrix2/src/axios/index.js to the IP address of the deployment server.
# Install ES database using Docker
$ docker pull elasticsearch:8.11.1
$ docker run -idt \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-e "xpack.security.enabled=true" \
-e "xpack.security.enrollment.enabled=true" \
-e "ELASTIC_PASSWORD=yourpassword" \
-v $(pwd)/elasticsearch_data:/usr/share/elasticsearch/data \
-v $(pwd)/esplugins:/usr/share/elasticsearch/plugins \
--name elasticsearch elasticsearch:8.11.1
$ docker pull leap233/kmatrix2:v1
$ cd KMatrix-2
$ sh docker_start.sh
KMatrix-2 is an open-source toolkit that supports comprehensive heterogeneous knowledge collaborative enhancement for Large Language Models (K-LLMs). We inherit the main framework of KMatrix but move it to the background to hide K-LLM design details. A rich set of modular components (such as Retrievers, Generators, and the Conflict Resolver) and several typical enhancement patterns (such as loop and adaptive patterns) are encapsulated and can be combined to conveniently construct mainstream heterogeneous K-LLM systems.
Our toolkit consists of three sections: Knowledge Base Management, System Design and Interaction, and Task Management and Evaluation. A screencast video of the toolkit, which gives detailed instructions for using KMatrix-2, is available here.