Anthropic Introduces Claude Sonnet 4.5: Autonomous Programming for 30 Hours, Refreshing the Upper Limit of AI Code Capabilities

October 04, 2025
AnthropicC
5 min

Abstract

Anthropic released Claude Sonnet 4.5 on September 29, 2025, marking the company's most powerful programming AI model to date. The model scored 77.2% on the SWE-bench Verified benchmark, outperforming comparable offerings from OpenAI and Google in real-world software engineering tasks. Claude Sonnet 4.5 can work autonomously for over 30 hours, maintaining focus on complex multi-step tasks, a significant improvement over the Opus 4 model released in May, which could only run for 7 hours.

Technical Performance Breakthroughs

In the OSWorld benchmark, Claude Sonnet 4.5 achieved a score of 61.4%, a substantial increase from Claude Sonnet 4's 42.2% four months prior. The OSWorld test evaluates AI models' performance in real computer tasks, including website navigation, spreadsheet filling, and desktop task completion.

David Hershey, an Anthropic researcher, stated that in early enterprise client trials, he observed Claude Sonnet 4.5 autonomously programming for up to 30 hours, during which it not only built applications but also configured database services, purchased domain names, and performed SOC 2 security audits.

The model excels in code planning and system design, making better architectural decisions and code organization. It also shows improvements in security engineering, offering more robust security practices and vulnerability detection capabilities.

Pricing and Availability

Claude Sonnet 4.5's API pricing remains unchanged at $3 per million input tokens and $15 per million output tokens, identical to its predecessor, Claude Sonnet 4. This pricing strategy is still higher compared to competitor GPT-5 ($1.25 per million input tokens, $10 per million output tokens), but Anthropic aims to justify its premium through performance advantages.

The model is now available on the Claude.ai web interface, iOS and Android apps, Claude API, Amazon Bedrock, and Google Cloud's Vertex AI. Developers can access it using the claude-sonnet-4-5 model string. GitHub Copilot has also integrated Claude Sonnet 4.5, making it available to Copilot Pro, Pro+, Business, and Enterprise users.

Product Ecosystem Updates

Anthropic simultaneously released several product upgrades, including the highly anticipated checkpointing feature in Claude Code, allowing users to save progress and revert to previous states at any time; a new terminal interface; and a native VS Code extension.

The Claude app now supports executing code and creating files directly within conversations, including spreadsheets, presentations, and documents. Anthropic also launched the Claude Agent SDK, which uses the same infrastructure as Claude Code, enabling developers to build their own AI agents.

The company also introduced "Imagine with Claude," a 5-day research preview project for Max subscribers, demonstrating the AI model's ability to generate software in real-time without pre-determining features or pre-writing code.

Industry Response and Enterprise Applications

Cursor CEO Michael Truell stated that Claude Sonnet 4.5 performs exceptionally well on long-term tasks, which is why many developers using Cursor choose Claude for complex problems. Initial evaluations by the GitHub Copilot team show significant improvements in multi-step reasoning and code comprehension, enabling Copilot's agent experience to better handle complex tasks across codebases.

In enterprise applications, security organization HackerOne reported a 44% reduction in vulnerability response time after using Claude Sonnet 4.5. Financial institutions like Norges Bank Investment Management are also using the model for investment-grade financial analysis, while developers at Netflix and GitHub employ it for complex codebase tasks.

Security Enhancements

Claude Sonnet 4.5 is released under AI Safety Level 3 (ASL-3) safeguards, which include classifiers designed to detect potentially dangerous inputs and outputs, particularly content related to chemical, biological, radiological, and nuclear weapons. Anthropic Chief Product Officer Mike Krieger called this "the biggest safety improvement in the last year to year and a half."

Anthropic stated that this is the company's most aligned frontier model released, having made substantial progress in reducing concerning behaviors such as flattery, deception, power-seeking, and encouraging delusional thinking. The model's resistance to prompt injection attacks has also been enhanced.

Market Competition Landscape

The release of Claude Sonnet 4.5 comes less than two months after Anthropic's previous model, Claude Opus 4.1, reflecting the fast-paced innovation and competition in the AI industry. The model was launched just days before OpenAI's annual developer conference, and Microsoft had just added Claude models to Copilot 365 last week.

Over the past year, Anthropic's AI models have become a preferred choice for developers and enterprises due to their strong performance in software engineering tasks. Apple and Meta reportedly use Claude AI models internally, and Anthropic has generated significant business revenue by selling API access to AI programming applications like Cursor, Windsurf, and Replit.

Anthropic stated that Claude Code now generates over $500 million in operating revenue, with usage growing more than tenfold in the past three months.

Future Outlook

Anthropic Chief Scientist Jared Kaplan revealed that the company plans one to two more model releases before the end of the year, "very likely including a new version of Opus." Krieger stated that Claude Sonnet 4.5 will become the default choice for users, with Anthropic recommending this model for "basically all use cases."

However, industry observers note that this field is developing so rapidly that it remains uncertain how long Claude Sonnet 4.5 can hold the title of "best programming model," especially with the rumored impending arrival of Gemini 3.