feat: implement the use of tools #49
Conversation
@nuvic and @lazymaniac - As two awesome contributors, I thought this might be something you'd be interested in. It's not quite ready to test yet, but the idea is to add the ability for the plugin to execute code remotely (in this case, in a Docker container) and then put the output into the chat to verify whether it's correct. I'll be keen to get your ideas on features/functionality in the coming days.
2024-05-07.21_46_04.-.WezTerm.mp4

@nuvic and @lazymaniac - A video of the progress. Would love to get your initial thoughts. Behind the scenes, the plugin is parsing the XML that the LLM has generated and using it to initiate a job. In the video we ask the LLM to write some code in Python and Ruby and then share the outputs back with it.
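For the curious, a minimal sketch of what that execution step might look like. This assumes Neovim 0.10's `vim.system`; the Docker image, the `run_in_container` helper, and the callback wiring are hypothetical illustrations, not the plugin's actual code:

```lua
-- Hypothetical helper: run LLM-generated code inside a throwaway Docker
-- container and hand the output back to a callback (e.g. to paste into
-- the chat buffer for the LLM to self-reflect on). Requires Neovim 0.10+.
local function run_in_container(image, cmd, code, on_done)
  vim.system(
    { "docker", "run", "--rm", "-i", image, cmd, "-" },
    { stdin = code, text = true },
    function(result)
      -- result.code / result.stdout / result.stderr come back when the
      -- container exits; schedule so we can safely touch buffers
      vim.schedule(function()
        on_done(result.code, result.stdout, result.stderr)
      end)
    end
  )
end

-- Example: execute a Python snippet the LLM generated
run_in_container("python:3.12-alpine", "python", 'print("hello from the sandbox")',
  function(exit_code, stdout, stderr)
    print(("exit=%d stdout=%s"):format(exit_code, stdout))
  end)
```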
This looks really cool, but I think manually implementing tools and RAG might require significant work. A simple code executor is fine, but LLMs are often just stupid. Moving the RAG and code-execution components to a Python app might be better in the long term, since the best frameworks are written in Python (AutoGen, LangChain, LlamaIndex, CrewAI).

RAG itself is quite complex if you're aiming for good results, especially when working with source code. The workflow is as follows: load data (project code, web scraping); preprocess the data (splitting and adding metadata); enhance the data by extracting, for instance, keywords or summaries; feed it into an embedding model; store the output in a vector database; and then create a retriever with suitable post-processors such as a reranker. This provides a strong foundation for RAG, but it's only the first step: retrieving relevant data that fits within the context window of the LLM.

Next is the agent flow, which can take various forms. If you want to focus on changes within a single file, it's simpler. However, if the task is larger, it might require breaking it down into smaller tasks and using more complex agent combinations: one agent orchestrating, another searching for documentation via web search, another managing the backlog, and so forth.

Tool usage is also quite tricky because LLMs are unreliable in this respect. They tend to either add too much or omit critical parts of the tool schema. In such cases, self-correcting tools are helpful: the output from the LLM is validated against the schema and, if invalid, the LLM is asked to correct it (a sketch of this idea follows below).

So, to sum up, I'm not sure what the ultimate goal is. If this is where you want to stop, then it's more than enough. If you want to be more adventurous and try something more complex, it might be good to explore other options. Of course, using the frameworks mentioned earlier is not mandatory (they are just abstractions over other tools and frameworks), and many different approaches are possible.
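To make the self-correction idea concrete in the plugin's own terms, here is a rough Lua sketch. The schema fields, the `validate` helper, and the `request_correction` callback are all hypothetical, not part of any framework mentioned above:

```lua
-- Hypothetical self-correction loop: check a parsed tool call against a
-- simple schema; if fields are missing, ask the LLM to resend its output.
local schema = { "tool", "lang", "code" } -- fields we expect the tool call to carry

local function validate(tool_call)
  local errors = {}
  for _, field in ipairs(schema) do
    if tool_call[field] == nil or tool_call[field] == "" then
      table.insert(errors, ("missing required field: %s"):format(field))
    end
  end
  return #errors == 0, errors
end

-- `request_correction` would send the errors back to the LLM as a new prompt
local function handle_tool_call(tool_call, request_correction)
  local ok, errors = validate(tool_call)
  if not ok then
    request_correction(
      "Your tool call was invalid:\n- " .. table.concat(errors, "\n- ")
      .. "\nPlease resend the XML with all required fields."
    )
    return
  end
  -- valid: safe to hand off for execution
end
```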
@lazymaniac - Thanks for such a beautifully crafted and insightful response. I've explored LangChain and LlamaIndex in some detail and realise that this is one complex and rapidly evolving field, something which is definitely out of scope for this plugin. In this PR I wanted to see how easily I could build in the ability to get an LLM to run external commands and then share the outputs with it for self-reflection purposes. Initially, I envisage this only for basic code execution.
Tools.mp4
In Part 3 of his Agentic Design Patterns series, Andrew Ng brilliantly outlines the power of combining tools with LLMs: from simple tasks such as executing code that an LLM has generated, through to browsing the web for up-to-date information (RAG).
Below I outline the approach I've taken:
The LLM is prompted to respond with a `## tools` heading followed by some XML describing the command to run, which the plugin then parses (an illustrative example is below).
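As a rough illustration (the tag names and structure here are hypothetical, not the plugin's actual schema), the LLM's response might carry an XML block like the one embedded in this Lua sketch, which the plugin could pick apart with simple pattern matching:

```lua
-- Hypothetical XML payload found under a "## tools" heading in the LLM's reply
local response = [[
<tool name="code_runner">
  <lang>python</lang>
  <code>print("hello world")</code>
</tool>
]]

-- Naive extraction with Lua patterns; a real parser would be more robust
local name = response:match('<tool name="(.-)">')
local lang = response:match("<lang>(.-)</lang>")
local code = response:match("<code>(.-)</code>")

print(name, lang, code) --> code_runner  python  print("hello world")
```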
Considerations

Giving an LLM the power to run commands on your machine may as well be accompanied by yelling "YOLO!". This implementation allows any execution to take place in a remote environment such as a Docker container.
At present, this implementation keeps tools separate from regular chat buffers, with no ability to add them in.