Claude 4 Leads Software Engineering
In May 2025, Anthropic launched the Claude 4 family, positioning it as a new reference point among state-of-the-art language models. Claude Opus 4 and Claude Sonnet 4 improve on their predecessors' benchmark results and introduce new capabilities aimed at making the models more productive, interactive, and efficient.
The highlight is Claude Opus 4, Anthropic's most advanced model to date. It leads in coding and reasoning tasks and posts strong results on key industry benchmarks, including SWE-bench (72.5% of software engineering tasks resolved) and HumanEval (94.4% accuracy on programming problems). It also achieved 87.1% on the MATH benchmark and 75.4% on GPQA, putting it ahead of competitors such as GPT-4 Turbo on multiple tests.
Opus 4 has also held up well in extended sessions, sustaining stable performance for hours, which is essential for software engineering tasks and complex analyses.
Meanwhile, Claude Sonnet 4 is a significant upgrade over Claude Sonnet 3.7, with improvements in instruction following and code quality. It is positioned as a lighter but still powerful option.
Another central advancement is "extended thinking with tool use," a beta feature that lets the models alternate between reasoning and calls to external tools, such as web search. This enables more sophisticated workflows in which the model searches for information, analyzes the results, and makes better-informed decisions throughout a complex task.
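
The alternation between reasoning and tool calls can be pictured as a simple loop. The sketch below is only an illustration under stated assumptions: `scripted_model`, `fake_web_search`, `Step`, and `run_agent` are hypothetical names invented for this example, the "model" is a hard-coded stub, and nothing here reflects Anthropic's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One model turn: either a tool request or a final answer."""
    thought: str
    tool: Optional[str] = None        # name of the tool to call, if any
    tool_input: Optional[str] = None  # argument passed to that tool
    answer: Optional[str] = None      # set only on the final step


def fake_web_search(query: str) -> str:
    # Hypothetical stand-in for a real search tool.
    return f"results for '{query}'"


def scripted_model(history: List[str]) -> Step:
    # Hypothetical stand-in for the model: search once, then answer.
    if not any(entry.startswith("tool_result:") for entry in history):
        return Step(thought="I need external information.",
                    tool="web_search", tool_input=history[0])
    return Step(thought="I have enough information.",
                answer=f"Answer based on {history[-1]}")


def run_agent(question: str,
              model: Callable[[List[str]], Step],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Alternate between model reasoning and tool calls until an answer appears."""
    history = [question]
    for _ in range(max_steps):
        step = model(history)
        history.append(f"thought: {step.thought}")
        if step.answer is not None:
            return step.answer
        # The model asked for a tool: run it and feed the result back.
        result = tools[step.tool](step.tool_input)
        history.append(f"tool_result: {result}")
    raise RuntimeError("no answer within the step budget")


final = run_agent("Claude 4 release date", scripted_model,
                  {"web_search": fake_web_search})
```

The step budget (`max_steps`) is a design choice of this sketch: it keeps a misbehaving loop from calling tools indefinitely.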

Claude 4 also brings a more refined persistent memory capability. The model can store and retrieve facts across interactions, maintaining continuity and personalization over time. This is especially useful in corporate settings or long-term projects where history must be maintained reliably.
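
To make the idea of storing and retrieving facts across interactions concrete, here is a minimal sketch of a file-backed fact store. It is an assumption-laden illustration, not a description of how Claude 4 implements memory: the `FileMemory` class, its methods, and the `memory.json` file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical JSON-file-backed fact store that survives across sessions."""

    def __init__(self, path: Path):
        self.path = path
        # Load previously saved facts if the file already exists.
        self.facts = json.loads(path.read_text()) if path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        # Update the in-memory copy and persist it immediately.
        self.facts[key] = value
        self.path.write_text(json.dumps(self.facts, indent=2))

    def recall(self, key: str, default=None):
        return self.facts.get(key, default)


# Usage: two separate "sessions" sharing one file.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "memory.json"
    FileMemory(store).remember("preferred_language", "Python")
    recalled = FileMemory(store).recall("preferred_language")
```

The second `FileMemory` instance starts with no in-process state, so the recalled value can only come from the file written by the first one.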
Programming support has also advanced with Claude Code, now generally available. Through native integrations with VS Code and JetBrains IDEs, it can collaborate directly with developers in real time, suggesting changes, explaining logic, and editing files as an engineering assistant. Support for background tasks via GitHub Actions further extends its use to automated pipelines.
