open source TypeScript agent... | Hacker News #887

Open
1 task
ShellLM opened this issue Aug 16, 2024 · 1 comment
Labels
AI-Agents: Autonomous AI agents using LLMs
AI-Chatbots: Topics related to advanced chatbot platforms integrating multiple AI models
ai-platform: model hosts and APIs
Anthropic-ai: Related to anthropic.ai and their Claude LLMs
Automation: Automate the things
code-generation: code generation models and tools like copilot and aider
Git-Repo: Source code repository like gitlab or gh
llm: Large Language Models
llm-benchmarks: testing and benchmarking large language models
software-engineering: Best practice for software engineering

Comments

Collaborator

ShellLM commented Aug 16, 2024

I'm excited to test this out! I've been building an open source TypeScript agent... | Hacker News

Snippet

I'm excited to test this out! I've been building an open source TypeScript agentic AI platform for work (DevOps related with an autonomous agent and software engineer workflow). The Claude 3 models had an influence on the design with their tuning on using XML and three levels of capabilities, and have been my preferred models to use.
I remember having moments looking at the plans Opus generated and being impressed with its capabilities.
The slow speed of requests I could deal with, but the costs could quickly add up in workflows and the autonomous agent control loop. When GPT-4o came out at half the price it made Opus quite pricey in comparison. I'd often thought, if only I could have Opus capabilities at a fraction of the price, so it's a nice surprise to have it here sooner than I imagined!
The agent platform isn't officially launched yet, but it's up at https://github.com/trafficguard/nous
I never liked the LangChain API when I looked at the examples, so I built it from scratch. It has an autonomous agent with custom XML-based function calling, memory and call history. I initially dogfooded the software engineer agentic workflow with a prompt like "Complete Jira XYZ-123". It gets the Jira description, finds the appropriate Terraform project in GitLab, clones it, edits it (delegating to Aider), creates an MR and messages on Slack. It also has a UI for running agents, human-in-the-loop interactions etc.
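To make the idea of XML-based function calling with a call history concrete, here is a minimal sketch. It is not taken from the nous codebase: the `<function_call>`, `<name>` and `<arg>` tags, the `parseFunctionCalls` and `runCalls` helpers, and the `jira_getDescription` function name are all assumptions chosen for illustration.

```typescript
// Hypothetical shape of a parsed call; the real platform may differ.
interface FunctionCall {
  name: string;
  args: Record<string, string>;
}

type Handler = (args: Record<string, string>) => string;

// Extracts every <function_call> block from a model response (assumed tag format).
function parseFunctionCalls(response: string): FunctionCall[] {
  const calls: FunctionCall[] = [];
  const callPattern = /<function_call>([\s\S]*?)<\/function_call>/g;
  let callMatch: RegExpExecArray | null;
  while ((callMatch = callPattern.exec(response)) !== null) {
    const body = callMatch[1];
    const nameMatch = /<name>([\s\S]*?)<\/name>/.exec(body);
    if (nameMatch === null) continue; // skip malformed blocks without a name
    const args: Record<string, string> = {};
    const argPattern = /<arg name="([^"]+)">([\s\S]*?)<\/arg>/g;
    let argMatch: RegExpExecArray | null;
    while ((argMatch = argPattern.exec(body)) !== null) {
      args[argMatch[1]] = argMatch[2].trim();
    }
    calls.push({ name: nameMatch[1].trim(), args });
  }
  return calls;
}

// Dispatches each parsed call to a registered handler and appends the
// outcome to a call history that can be fed back into the next prompt.
function runCalls(
  response: string,
  handlers: Record<string, Handler>,
  history: string[],
): void {
  for (const call of parseFunctionCalls(response)) {
    const handler = handlers[call.name];
    const result = handler ? handler(call.args) : `unknown function: ${call.name}`;
    history.push(`${call.name} -> ${result}`);
  }
}
```

In a control loop along these lines, the model's reply would be passed to `runCalls`, and the accumulated `history` would be included in the following request so the agent can see what it has already done.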

Content

From the Anthropic model guide:

Agentic Coding

Claude 3.5 Sonnet solves 64% of problems on an internal agentic coding evaluation, compared to 38% for Claude 3 Opus. Our evaluation tests a model's ability to understand an open source codebase and implement a pull request, such as a bug fix or new feature, given a natural language description of the desired improvement. For each problem, the model is evaluated based on whether all the tests of the codebase pass for the completed code submission. The tests are not visible to the model, and include tests of the bug fix or new feature. To ensure the evaluation mimics real world software engineering, we based the problems on real pull requests submitted to open source codebases. The changes involve searching, viewing, and editing multiple files (typically three or four, as many as twenty). The model is allowed to write and run code in an agentic loop and iteratively self-correct during evaluation. We run these tests in a secure sandboxed environment without access to the internet.

| Model | % of problems which pass all tests |
| --- | --- |
| Claude 3.5 Sonnet | 64% |
| Claude 3 Opus | 38% |
| Claude 3 Sonnet | 21% |
| Claude 3 Haiku | 17% |
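
The scoring rule described in the guide, where a problem counts only if every test passes, can be sketched as follows. This is an illustration under stated assumptions, not Anthropic's evaluation harness: the `ProblemResult` type and the `isSolved` and `passRate` helpers are hypothetical, and treating a problem with no recorded tests as unsolved is a choice made here.

```typescript
// Hypothetical record of one evaluation problem: one boolean per hidden test.
interface ProblemResult {
  problemId: string;
  testsPassed: boolean[];
}

// A problem counts as solved only when every test passes.
// Assumption: a problem with no recorded tests is treated as unsolved.
function isSolved(result: ProblemResult): boolean {
  return result.testsPassed.length > 0 && result.testsPassed.every((passed) => passed);
}

// Percentage of problems for which all tests pass.
function passRate(results: ProblemResult[]): number {
  if (results.length === 0) return 0;
  const solvedCount = results.filter(isSolved).length;
  return (100 * solvedCount) / results.length;
}
```

Because a single failing test marks the whole problem as failed, this metric is stricter than averaging per-test pass rates.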

Suggested labels

None

Collaborator Author

ShellLM commented Aug 16, 2024

Related content

#682 similarity score: 0.88
#885 similarity score: 0.88
#488 similarity score: 0.87
#812 similarity score: 0.87
#685 similarity score: 0.87
#762 similarity score: 0.87
