open source TypeScript agent... | Hacker News #887
Labels
AI-Agents
Autonomous AI agents using LLMs
AI-Chatbots
Topics related to advanced chatbot platforms integrating multiple AI models
ai-platform
model hosts and APIs
Anthropic-ai
Related to anthropic.ai and their Claude LLMs
Automation
Automate the things
code-generation
Code generation models and tools like Copilot and Aider
Git-Repo
Source code repository like GitLab or GitHub
llm
Large Language Models
llm-benchmarks
Testing and benchmarking large language models
software-engineering
Best practices for software engineering
I'm excited to test this out! I've been building an open source TypeScript agent... | Hacker News
Snippet
Content
From the Anthropic model guide:
Agentic Coding
Claude 3.5 Sonnet solves 64% of problems on an internal agentic coding evaluation, compared to 38% for Claude 3 Opus. Our evaluation tests a model's ability to understand an open source codebase and implement a pull request, such as a bug fix or new feature, given a natural language description of the desired improvement.
For each problem, the model is evaluated based on whether all the tests of the codebase pass for the completed code submission. The tests are not visible to the model, and include tests of the bug fix or new feature. To ensure the evaluation mimics real world software engineering, we based the problems on real pull requests submitted to open source codebases. The changes involve searching, viewing, and editing multiple files (typically three or four, as many as twenty).
The model is allowed to write and run code in an agentic loop and iteratively self-correct during evaluation. We run these tests in a secure sandboxed environment without access to the internet.
Suggested labels
None