📆 ThursdAI - Feb 5 - Opus 4.6 was #1 for ONE…

Feb 6

From Weights & Biases - a hell of a week to be covering the AI news, with 2 big model drops live during the show, 1 interview with VB from OpenAI about the Codex app and the new model, Voxtral and more AI

2 Comments

Pawel Jozefiak

Feb 6

Alex, the framing of Opus 4.6 being number one for exactly one hour captures the pace perfectly. But the agent teams feature outlasted the benchmark wars. I tested 4 Opus 4.6 agents in parallel on launch day building different parts of a game project simultaneously. The agents coordinated through filesystem-based task boards and the quality held across extended sessions. Your point about it being Anthropic's built-in Ralph is accurate. The difference between manually delegating to subagents and having a coordinated team that self-organizes is substantial. Documented the full parallel experiment: https://thoughts.jock.pl/p/opus-4-6-agent-experiment-2026

Incredible how fast the infrastructure conversation shifted from "can we run this" to "should we run this". The Codex agent security point about .md files being the new .exe really landed with me. Last month I was excited about letting agents touch my file system, but now the 1Password writeup has me rethinking every permission I granted. At least the skill marketplace approach could sandbox some of this risk if done right.
