The point is to run an LLM to chat with, just by adding a cog to the bot.
But here are several constraints that I applied:
- To run on AMD, NVIDIA or CPU
- To run on a 'normal' GPU (8 GB VRAM), so using a GGUF (quantized) version of the model (previously GGML)
- To not use slash-command interactions
- To work with any GPT4All-compatible model, like Google Gemma or any other good model
- To be able to generate images using the bot
So, I went with GPT4All.
And here come even more constraints, lol.
So at this point, this is just a draft, a proof of concept.
Here is the current state of the Chatbot part of the bot:
- Set for Meta-Llama-3-8B-Instruct.Q4_0.gguf, but it can fit any model.
- Only one session, for all users and channels.
- Only one response at a time.
- The bot will respond only when you tag it or reply to it.
- There are 3 additional "!" commands (no need to tag the bot):
  - !stop, to stop the current text generation.
  - !reset, to reset the current conversation/session.
  - !generate, to generate a prompt, then an image.
- When the current response from the bot reaches the Discord 2000-character limit, a new message is created, to continue smoothly.
- The author is tagged in the response, to highlight the message.
- The System Prompt is 'reintroduced' in the next message when reaching the context limit, so the bot never forgets its instructions in a long conversation, without a session reset.
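To give an idea of the 2000-character splitting, here is a minimal sketch of how such a helper could look. This is a hypothetical function for illustration, not the cog's exact code (the real cog streams tokens and continues in a new message as the current one fills up):

```python
def split_for_discord(text: str, limit: int = 2000) -> list[str]:
    """Split a long reply into Discord-sized chunks, preferring to break
    at a newline before the limit so the conversation continues smoothly.
    Hypothetical helper, shown for illustration only."""
    chunks = []
    while len(text) > limit:
        # Break at the last newline before the limit; hard-cut if none.
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Each chunk can then be sent as its own message, with the author tagged in the first one.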
To get this functional, you need to:
- add the chatbotcog.py file into the core folder
- add gpt4all to requirements.txt, after the torch line, so it gets installed (the model itself should be downloaded automagically later; see commit wizz13150@2da0c5e):
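Concretely, that's just one extra line in requirements.txt, right after torch (the surrounding lines are illustrative, not the fork's exact file):

```text
torch
gpt4all
```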
With this code, the bot will load the Llama-3-8B-Instruct model on an 8 GB GPU.
VRAM usage screenshot (8K-token context here; 16-18K max with 8 GB; can be extended with another model and more VRAM):
With this code, it's using a second AMD GPU (an RX 570, generating at ~8 tokens/sec), while the webui uses the main RTX 3060 for SD.
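The AMD/NVIDIA/CPU flexibility comes from the gpt4all package itself, which uses a Vulkan (Kompute) backend for GPU inference. A minimal loading sketch could look like the following; the parameter names follow the gpt4all Python bindings, but the exact values here are my assumptions, not the cog's real settings:

```python
def load_chat_model(device: str = "gpu", n_ctx: int = 8192):
    """Load the quantized Llama 3 model via GPT4All.

    gpt4all's Vulkan backend runs on AMD as well as NVIDIA; pass "cpu"
    to stay off the GPU entirely. Sketch only -- values are assumptions.
    """
    # Imported lazily so this file parses even without the package installed.
    from gpt4all import GPT4All
    return GPT4All(
        "Meta-Llama-3-8B-Instruct.Q4_0.gguf",
        device=device,        # "gpu", "cpu", or a specific device name
        n_ctx=n_ctx,          # ~8K tokens of context fits in 8 GB VRAM here
        allow_download=True,  # the model is fetched automatically on first run
    )
```

Picking a larger `n_ctx` (or a bigger model) is what pushes VRAM usage up, which is why more context needs more VRAM.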
Here is a screenshot of an interaction with the Chatbot:
Here is a screenshot example of image generation from the Chatbot:
Live Preview:
Result:
Many ideas come to mind, like letting the bot search the internet.
But any other new feature implies a structure/logic change, or additional resources.
Fully implementing this in the latest version of the bot would also require many changes.
So... We'll see.
I'll edit this post when more comes to my mind.
If any peep wants to play with this, I'd be happy to get any advice or feedback.
The Chatbot cog I added to my fork lives here:
https://github.com/wizz13150/aiyabot/blob/Full_bot/core/chatbotcog.py
Cheers! 🥂