2 Comments
Pawel Jozefiak

Alex, the framing of Opus 4.6 being number one for exactly one hour captures the pace perfectly. But the agent teams feature outlasted the benchmark wars. I tested 4 Opus 4.6 agents in parallel on launch day building different parts of a game project simultaneously. The agents coordinated through filesystem-based task boards and the quality held across extended sessions. Your point about it being Anthropic's built-in Ralph is accurate. The difference between manually delegating to subagents and having a coordinated team that self-organizes is substantial. Documented the full parallel experiment: https://thoughts.jock.pl/p/opus-4-6-agent-experiment-2026

The AI Architect

Incredible how fast the infrastructure conversation shifted from "can we run this" to "should we run this". The Codex agent security point about .md files being the new .exe really landed with me. Last month I was excited about letting agents touch my file system, but now the 1Password writeup has me rethinking every permission I granted. At least the skill marketplace approach could sandbox some of this risk if done right.