open source TypeScript agent... | Hacker News #887
Labels
AI-Agents
Autonomous AI agents using LLMs
AI-Chatbots
Topics related to advanced chatbot platforms integrating multiple AI models
ai-platform
model hosts and APIs
Anthropic-ai
Related to anthropic.ai and their Claude LLMs
Automation
Automate the things
code-generation
Code generation models and tools like Copilot and Aider
Git-Repo
Source code repository like GitLab or GitHub
llm
Large Language Models
llm-benchmarks
Testing and benchmarking large language models
software-engineering
Best practices for software engineering
I'm excited to test this out! I've been building an open source TypeScript agent... | Hacker News
Snippet
Content
From the Anthropic model guide:
Agentic Coding
Claude 3.5 Sonnet solves 64% of problems on an internal agentic coding evaluation, compared to 38% for Claude 3 Opus. Our evaluation tests a model's ability to understand an open source codebase and implement a pull request, such as a bug fix or new feature, given a natural language description of the desired improvement.
For each problem, the model is evaluated based on whether all the tests of the codebase pass for the completed code submission. The tests are not visible to the model, and include tests of the bug fix or new feature. To ensure the evaluation mimics real world software engineering, we based the problems on real pull requests submitted to open source codebases. The changes involve searching, viewing, and editing multiple files (typically three or four, as many as twenty).
The model is allowed to write and run code in an agentic loop and iteratively self-correct during evaluation. We run these tests in a secure sandboxed environment without access to the internet.
Suggested labels
None