
feat: implement the use of tools #49

Merged — 40 commits merged May 15, 2024

Conversation

olimorris
Owner

@olimorris olimorris commented May 5, 2024

In Part 3 of his Agentic Design Patterns series, Andrew Ng brilliantly outlines the power of combining tools with LLMs: from simple tasks such as executing code that an LLM has generated, through to browsing the web for up-to-date information (RAG).

Below I outline the approach I've taken:

  • Using a system prompt, we share with the LLM the tools it has available and outline how they can be called
  • The LLM uses judgement to determine when and if it needs to call a tool
  • The LLM can call a tool via a ## tools heading and some XML:
<tool>
  <name>code_runner</name>
  <parameters>
    <inputs>
      <lang>ruby</lang>
      <code>
strings = ["Hello", "Hi there", "Greetings", "Salutations"]

strings.each do |s|
  puts s
end
      </code>
      <version>3.1.0</version>
    </inputs>
  </parameters>
</tool>
  • The plugin then parses the response via Tree-sitter and extracts the XML, handing it off to the appropriate tool.
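The real plugin does the parsing in Lua with Tree-sitter; as a rough illustration of the parse-and-dispatch step described above, here is a Python sketch (the tool registry and handler bodies are hypothetical, only the XML shape comes from this PR):

```python
import xml.etree.ElementTree as ET

# Hypothetical registry mapping tool names to handler functions.
TOOLS = {}

def register(name):
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@register("code_runner")
def code_runner(inputs):
    # In the real plugin this hands off to a sandboxed executor.
    return f"would run {inputs['lang']} code ({len(inputs['code'])} chars)"

def dispatch(xml_text):
    """Parse a <tool> block from the LLM response and call the matching tool."""
    root = ET.fromstring(xml_text)
    name = root.findtext("name")
    inputs = {child.tag: (child.text or "").strip()
              for child in root.find("parameters/inputs")}
    return TOOLS[name](inputs)

response = """<tool>
  <name>code_runner</name>
  <parameters>
    <inputs>
      <lang>ruby</lang>
      <code>puts "hi"</code>
    </inputs>
  </parameters>
</tool>"""
print(dispatch(response))
```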

Considerations

Giving an LLM the power to run commands on your machine may as well be accompanied by yelling "YOLO!". To mitigate this, the implementation allows any execution to take place in a remote environment such as a Docker container.
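As a sketch of what that sandboxing can look like, the snippet below builds a locked-down `docker run` invocation around the generated code. The image names, interpreter flags, and resource limits are assumptions for illustration; the plugin's actual tool configuration may differ:

```python
import subprocess

# Assumed image/interpreter mapping; the plugin's real config may differ.
IMAGES = {
    "python": ("python:3.12-slim", ["python", "-c"]),
    "ruby": ("ruby:3.1", ["ruby", "-e"]),
}

def docker_command(lang, code):
    """Build a `docker run` invocation that sandboxes the generated code."""
    image, interp = IMAGES[lang]
    return ["docker", "run", "--rm",
            "--network=none",   # no network access from inside the sandbox
            "--memory=256m",    # cap memory so runaway code can't eat the host
            image, *interp, code]

def run_in_docker(lang, code, timeout=30):
    """Execute the code and capture stdout/stderr to feed back to the LLM."""
    cmd = docker_command(lang, code)
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
```

Separating command construction from execution keeps the risky part (`subprocess.run`) behind a single, auditable choke point.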

At present, this implementation keeps tools out of regular chat buffers, with no ability to add them in.

@olimorris
Owner Author

@nuvic and @lazymaniac

As two awesome contributors, I thought this might be something you'd be interested in. It's not quite ready to test yet but the idea is to add the ability for the plugin to execute code remotely (in this case in a Docker container) and then put the output into the chat to verify if it's correct.

I'll be keen to get your ideas on features/functionality in the coming days.

@olimorris
Owner Author

olimorris commented May 7, 2024

2024-05-07.21_46_04.-.WezTerm.mp4

@nuvic and @lazymaniac - A video of the progress. Would love to get your initial thoughts.

Behind the scenes, the plugin parses the XML that the LLM has generated and uses it to initiate a code_runner tool. Just before that, however, it pulls down a Docker image and runs the LLM's code in a Docker container.

In the video, we ask the LLM to write some code in Python and Ruby and then share the outputs back with it.

@lazymaniac
Contributor

This looks really cool, but I think manually implementing tools and RAG might require significant work. A simple code executor is fine, but LLMs are often just stupid. Moving the RAG and code execution components to a Python app might be better in the long term, since the best frameworks are written in Python (AutoGen, LangChain, LlamaIndex, CrewAI).

RAG itself is quite complex if you're aiming for good results, especially when working with source code. The workflow is as follows:

  • load data (project code, web scraping)
  • preprocess the data (splitting and adding metadata)
  • enhance the data by extracting, for instance, keywords or summaries
  • feed it into an embedding model
  • store the output in a vector database
  • create a retriever with suitable post-processors like a reranker

This provides a strong foundation for RAG, but it's only the first step in retrieving relevant data that fits within the context window of LLMs.
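To make the shape of that pipeline concrete, here is a deliberately toy Python sketch: a bag-of-words counter stands in for the embedding model, and an in-memory list stands in for the vector database. Every name here is illustrative; a real setup would use one of the frameworks mentioned above:

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(docs, chunk_size=40):
    """Load + split documents, then embed and store each chunk."""
    chunks = []
    for doc in docs:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return [(chunk, embed(chunk)) for chunk in chunks]  # "vector database"

def retrieve(index, query, k=2):
    """Rank stored chunks by similarity to the query; return the top k."""
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

The missing pieces (metadata, keyword/summary enrichment, a reranker) slot in around `build_index` and `retrieve` respectively, which is exactly why production pipelines get complex.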

Next is the agent flow, which can take various forms. For instance, if you want to focus on changes within a single file, it's simpler. However, if the task is larger, it might require breaking it down into smaller tasks and using more complex agent combinations, such as one agent orchestrating, another searching for documentation via web search, another managing the backlog, and so forth.

Tool usage is also quite tricky because LLMs are unreliable in this aspect. They tend to either add too much or omit critical parts of the tool schema. In such cases, self-correcting tools help: the LLM's output is validated against the schema and, if needed, the LLM is asked to correct it.

So, to sum up, I'm not sure what the ultimate goal is. If this is where you want to stop, then it's more than enough. If you want to be more adventurous and try something more complex, it might be good to explore other options. Of course, using the frameworks mentioned earlier is not mandatory—they are just abstractions over other tools and frameworks—and many different approaches are possible.

@olimorris
Owner Author

@lazymaniac - Thanks for such a beautifully crafted and insightful response.

I've explored LangChain and LlamaIndex in some detail and realise that this is a complex and rapidly evolving field, one which is definitely out of scope for this plugin. In this PR I wanted to see how easily I could build in the ability to get an LLM to run external commands and then share the outputs with it for self-reflection purposes. Initially, I envisage this only for basic code execution.

@olimorris
Owner Author

Tools.mp4

@olimorris olimorris merged commit b36d9dc into main May 15, 2024
2 checks passed
@olimorris olimorris deleted the feat/assistants-run-cmds branch May 15, 2024 22:13