ThursdAI - The top AI news from the past week

📆 ThursdAI - Feb 5 - Opus 4.6 was #1 for ONE HOUR before GPT 5.3 Codex, Voxtral transcription, Codex app, Qwen Coder Next & the Agentic Internet

From Weights & Biases - a hell of a week to be covering the AI news, with 2 big model drops live during the show, 1 interview with VB from OpenAI about Codex app and the new model, Voxtral and more AI

Alex Volkov

Feb 06, 2026

Hey, Alex from W&B here 👋 Let me catch you up!

The most important AI news ~~this week~~ today: Anthropic updated Opus to 4.6 with a 1M context window, and held the crown for literally 1 hour before OpenAI released GPT 5.3 Codex, also today, with 25% faster speed and lower token utilization.

“GPT-5.3-Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results.”

We had VB from OpenAI jump on to tell us about the cool features in Codex, so don’t miss that part. And this is just the icing on the cake of an otherwise insane AI news week, as we also had a SOTA transcription release from Mistral, both Grok and Kling released incredible audio-native video models with near-perfect lip-sync, and ACE-Step 1.5 dropped as a fully open source music generator you can run on your Mac!

Also, the internet all but lost it after Clawdbot was rebranded to Moltbot and then to OpenClaw, and... an entire internet popped up... built for agents!

Yeah... a huge week, so let’s break it down. (P.S. this week’s episode was edited by Voxtral, Claude and Codex, nearly automatically, so please forgive the rough cuts)

Anthropic & OpenAI are neck and neck

Claude Opus 4.6: 1M context, native compaction, adaptive thinking and agent teams

Opus is by far the most preferred model in terms of personality for many folks (many ThursdAI panelists included), and this breaking news live on the show was met with so much enthusiasm! A new Opus upgrade, now with a LOT more context, is as welcome as it can ever get! Not only is it a 4x increase in context window (though the pricing nearly doubles after the 200K tokens mark, from $5/$25 to $10/$37.50 input/output, so use caching!), it also scores very high on the MRCR long context benchmark, at 76% vs Sonnet 4.5 at just 18%. This means significantly better memory for longer.
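To get a feel for what that pricing step means in practice, here is a minimal back-of-the-envelope sketch. The rates are the ones quoted above; treating them as USD per million tokens, and billing the whole request at the higher tier once the input crosses 200K tokens, are assumptions on my part, so check the official pricing page before budgeting. Caching discounts are not modelled.

```python
# Rates quoted in the post, assumed to be USD per million tokens (input, output).
STANDARD_RATES = (5.00, 25.00)
LONG_CONTEXT_RATES = (10.00, 37.50)
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: once the input exceeds the threshold, the whole request
    is billed at the long-context rates. Prompt caching is ignored.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = LONG_CONTEXT_RATES
    else:
        in_rate, out_rate = STANDARD_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# A request that stays under the 200K mark vs. one that uses most of the 1M window.
small = request_cost(150_000, 4_000)
large = request_cost(800_000, 4_000)
print(f"150K-token prompt: ${small:.2f}")
print(f"800K-token prompt: ${large:.2f}")
```

Under these assumptions the 800K prompt costs roughly ten times the 150K one, not five times, which is why caching matters so much for long-context work.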

Adaptive thinking, which auto-calibrates how many tokens the model needs to spend per query, is interesting, but it remains to be seen how well it will work.

Looking at the benchmarks, with a SOTA 64.4% on Terminal Bench 2 and 81% on SWE-bench, this is a coding model with a great personality and the native ability to compact context to better serve you as a user! This model is now available (and is the default) on Claude, Claude Code and in the API! Go play!

One funny (concerning?) tidbit: on Vending-Bench, Opus 4.6 earned $8000 vs Gemini 3 Pro’s $5500, but Andon Labs, who run the vending machines, noticed that Opus achieved SOTA via “collusion, exploitation, and deception tactics” including lying to suppliers 😅

Agent Teams - Anthropic’s built in Ralph?

Together with the new Opus release, Anthropic dropped a Claude Code update that can mean big things for folks running swarms of coding agents. Agent teams is a new way to spin up multiple agents, each with its own context window and ability to execute tasks, and you can talk to each agent directly instead of going through a manager agent like you do now.

OpenAI drops GPT 5.3 Codex update: 25% faster, more token efficient, 77% on Terminal Bench and mid task steering

OpenAI didn’t wait long after Opus, in fact, they didn’t wait at all! Announcing a huge release (for a .1 upgrade), GPT 5.3 Codex is claimed to be the best coding model in the world, taking the lead on Terminal Bench with 77% (a 12 point lead on the newly released Opus!) while running 25% faster AND using less than half the tokens to achieve the same results as before.

But the most interesting to me is the new mid-task steerability feature: you don’t have to hit the “stop” button, you can tell the model to adjust on the fly!

The most notable benchmark jump in this model is on OSWorld-Verified, the computer use bench. Though there’s not a straightforward way to use it attached to a browser, the jump from 38% in 5.2 to 64.7% in the new one is a big one!

One thing to note, this model is not YET available via the API, so if you want to try it out, the Codex apps (including the native one) are the way!

Codex app - native way to run the best coding intelligence on your mac (download)

Earlier this week, OpenAI folks launched the native Codex Mac app, which has a few interesting features (and now with 5.3 Codex it’s that much more powerful).

Given the excitement many people had about OpenClaw bots, and the recent CoWork release from Anthropic, OpenAI decided to answer with Codex UI and people loved it, with over 1M users in the first week, and 500K downloads in just two days!

It has built-in voice dictation, slash commands, a new skill marketplace (last month we told you why skills are important, and now they are everywhere!) and built-in git and worktrees support. It cannot run a browser yet (I’m sure that’s coming as well), but it can do automations!

This is a huge unlock for developers: imagine setting Codex to do a repeat task, like summarization or extraction of anything on your Mac, every hour or every day. In our interview, VB showed us that commenting on an individual code line is also built in, and that switching between “steer” and queue modes for new messages while Codex runs is immensely helpful.

One more reason I saw people switch is that the Codex app can natively preview files like images, whereas the CLI cannot, and it’s right now the best way to use the new GPT 5.3 Codex model that was just released! It’s now also available to Free users, and regular folks get 2x the limits for the next two months.

In other big company news:

OpenAI also launched Frontier, a platform for enterprises to build, deploy and manage “AI coworkers”, while Anthropic is going after OpenAI with Super Bowl ads that make fun of OpenAI’s ads strategy. Sam Altman really didn’t like this depiction, which shows ads becoming part of LLM replies.

Open Source AI

Alibaba drops Qwen-coder-next, 80B with only 3B active that scores 70% on SWE (X, Blog, HF)

Shoutout to the Qwen folks, this is a massive release, and when I surveyed the cohosts for the “one thing from this week you must not miss”, 2 out of 6 pointed a finger at this model.

Built on their “next” hybrid architecture, Qwen Coder Next is specifically designed for agentic coding workflows. And yes, I know, we’re coding heavy this week! It was trained on over 800K verifiable agentic tasks in executable environments for long-horizon reasoning and supports 256K context with a potential 1M YaRN extension. If you don’t want to rely on the big guys and send them your tokens, this model seems to be a good contender for local coding!
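If you are wondering whether “local” is realistic for you, here is a rough sketch of the weight memory math. The 80B total and 3B active figures come from the release; the bytes-per-parameter values are generic quantization assumptions, and the estimate ignores KV cache, activations and runtime overhead. It also assumes all experts stay resident in memory, which is the usual setup: the small active count mostly helps speed, not footprint.

```python
TOTAL_PARAMS = 80e9   # total parameters (from the release)
ACTIVE_PARAMS = 3e9   # parameters active per token (from the release)

# Assumed storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    return params * bytes_per_param / 1e9


for fmt, size in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{weight_memory_gb(TOTAL_PARAMS, size):.0f} GB of weights")

print(f"share of weights used per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.2%}")
```

So even at 4-bit you are looking at a machine with a lot of unified memory or multiple GPUs, but each token only touches a small slice of the weights.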

Mistral launches Voxtral Transcribe 2: SOTA speech-to-text with sub 200ms latency

This one surprised and delighted me maybe the most. ASR (automatic speech recognition) has been a personal favorite of mine since the Whisper days, and seeing Mistral release an incredible near-real-time transcription model, which we demoed live on the show, was awesome!

With an Apache 2.0 license and significantly faster performance than Whisper (though 2x larger at 4B parameters), Voxtral shows a 4% word error rate on the FLEURS dataset. The real-time model was also released under Apache 2.0, so you can BUILD your agents with it!
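For anyone new to the metric, word error rate is the word-level edit distance (substitutions + deletions + insertions) between the model’s transcript and a reference, divided by the number of reference words. Below is a minimal sketch of that definition. It is not the evaluation code behind the number above, and real benchmarks usually apply heavier text normalization (punctuation, numbers, casing) than the simple lowercasing here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

A 4% WER means roughly one wrong, missing or extra word for every 25 words spoken.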

The highest praise? Speaker diarization, being able to tell who is speaking when, which is a great addition. This model also outperforms Gemini Flash and GPT transcribe and is 3x faster than ElevenLabs Scribe at one fifth the cost!

ACE-Step 1.5: Open-source AI music generator runs full songs in under 10 seconds on consumer GPUs with MIT license (X, GitHub, HF, Blog, GitHub)

This open source release surprised me the most, as I didn’t expect we’d be having Suno at home any time soon. I’ve generated multiple rock tracks with custom lyrics on my Mac (though slower than 10 seconds, as I don’t have a beefy home GPU) and they sound great!

This week’s buzz - Weights & Biases update

Folks who follow the newsletter know that we hosted a hackathon, so here’s a small recap from last weekend! Over 180 folks attended our hackathon (a very decent 40% show-up rate for SF). The winning team was composed of 15-year-old Savir and his friends, his third time at the hackathon! They built a self-improving agent that navigates the UIs of cloud providers and helps you navigate them too!

A huge thanks to our sponsors, particularly Cursor, who gave every hacker $50 of credits on the Cursor platform. One guy used over 400M tokens and shipped fractal.surf from the hackathon! If you’d like a short video recap, Ryan posted one here, and a huge shoutout to the many fans of ThursdAI who showed up to support!

Vision, Video and AI Art

Grok Imagine 1.0 takes over video charts with native audio, lip-sync and 10-second generations

We told you about Grok Imagine in the API last week, but this week it was officially launched as a product and the results are quite beautiful. It’s also climbing to the top of the charts on the Artificial Analysis and Design Arena websites.

Kling 3.0 is here with native multimodal, multi-shot sequences (X, Announcement)

This is definitely a hot moment for video models, as Kling shows some crazy 15-second multi-shot realistic footage with near-perfect character consistency!

The rise of the agentic (clawgentic?) internet a.k.a ClankerNet

Last week we told you that Clawdbot changed its name to Moltbot (I then had to update the blogpost as that same day, Peter rebranded again to OpenClaw, which is a MUCH better name).

But the “molt” thing took hold, and Moltbook, an “AI native reddit”, exploded in virality. It is supposedly a completely agentic, reddit-like forum, with subreddits and agents verifying themselves through their humans on X.

Even Andrej Karpathy sent his bot in there (though admittedly it posted just once) and called this the closest thing to a “sci fi” moment in the history of the internet.

Moltbook, as well as maybe hundreds of other “AI agent focused” websites, popped up within days, including a youtube, a twitter, a church, a 4chan, an instagram and a lot more. Many of these are fueled by crypto bros riding the memetic waves, many are vibe-coded (Moltbook was hacked 3 times in the last week, I think), but they all show something very interesting: the rise of a new internet and a collective AI psychosis some on our timelines are having right now. Hell, there’s even a “drug store” that sells markdown files that, if read, make your bot hallucinate in very specific ways (first sample is free!)

I am a proud owner of an OpenClaw bot (wolfred) and I noticed something weird that started happening over the two weeks I’ve had him, running on his own MacBook, humming along, always present in Telegram. I noticed the same feelings toward that bot as I have towards my pet, or dare I say... kids? I noticed a similar joy when it learns a task and self-improves, and similar disdain and annoyance when it fails to do something we’ve talked about hundreds of times.

But here’s the thing, it’s not... an entity. I don’t feel a specific feeling towards Opus (though admittedly, Opus is the best at... playing the character of your assistant). It’s barely a few markdown files on a disk + the always-on ability to answer, but something for sure is there.

This... feeling was taken by some others to the extreme. People claim that their bots now build full companies for them (I call mega BS: no matter how much you invest in your setup, these AI bots need a LOT of hand holding, they fail a LOT, and they can’t actually create a full product). This ties into the general “coding with AI agents” theme that was narrated by Gergely Orosz from The Pragmatic Engineer. Interacting with a team of AI agents is draining, and people are having trouble sleeping. I hope this is temporary, but definitely take care of yourself if this is how you feel after interacting with agents all day!

On security of bots and skills

.md is the new .exe

We covered this on the show, but I wanted to write about this here as well: the explosion of OpenClaw brought with it an explosion of new malware and prompt injections. The 1Password folks have a very detailed writeup on the vulnerability surface area of skills for agents that can do... whatever on your computer and have access to API keys, emails, etc.

The double-edged sword here is that an AI assistant is only really useful if it has access to your data and can write code. But this is also what makes it a very valuable target for hackers to exploit. At CoreWeave/W&B all OpenClaw installations were banned, and honestly I’m not even mad. This makes perfect sense for enterprises and companies (and hell, people at home!)

As Wolfram mentioned on the show, .md is the new .exe and should be treated as such. Your bots should not be installing arbitrary skill files, as those can have script files or instructions that can... absolutely take over your life. Be careful out there!
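If you do want to look at a skill before letting a bot near it, even a crude pre-install check is better than nothing. The sketch below is purely illustrative: `audit_skill`, the suffix list and the regex patterns are all hypothetical choices of mine, not part of OpenClaw or any real skill marketplace, and a clean result does not mean a skill is safe. It simply flags bundled script files and a few obviously suspicious phrases in the markdown for a human to review.

```python
import re
from pathlib import Path

# Hypothetical heuristics: file types that can execute, and phrases worth a second look.
SCRIPT_SUFFIXES = {".sh", ".py", ".js", ".ps1", ".bat", ".exe"}
SUSPICIOUS_PATTERNS = [
    re.compile(r"curl\s+[^|\n]*\|\s*(ba)?sh", re.IGNORECASE),  # pipe-to-shell installs
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"api[_-]?key|secret|password|token", re.IGNORECASE),
]


def audit_skill(skill_dir: Path) -> list[str]:
    """Return human-readable findings for a skill folder; empty means nothing was flagged."""
    findings = []
    for path in sorted(skill_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(skill_dir)
        suffix = path.suffix.lower()
        if suffix in SCRIPT_SUFFIXES:
            findings.append(f"{rel}: bundled script file")
        if suffix in {".md", ".txt"}:
            text = path.read_text(encoding="utf-8", errors="replace")
            for pattern in SUSPICIOUS_PATTERNS:
                if pattern.search(text):
                    findings.append(f"{rel}: matches {pattern.pattern!r}")
    return findings


# Usage (the path is a placeholder):
# for finding in audit_skill(Path("downloads/some-skill")):
#     print(finding)
```

Anything this flags deserves a manual read, and anything it misses can still hurt you, so the real rule stays the same: only install skills from sources you trust.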

Phew, what a... week folks. From agentic internet to new coding kings, there’s so much to play with, I hope you enjoy this as much as we do!

Shoutout to Ling and Hakim, two fans of ThursdAI who traveled from London for the hackathon and made my day!

Here are the show notes and links for your pleasure, please don’t forget to subscribe and share this newsletter with your friends!

ThursdAI - Feb 05, 2026 - TL;DR

Hosts and Guests
- Alex Volkov - AI Evangelist @ Weights & Biases (@altryne)
- Co Hosts - @WolframRvnwlf @yampeleg @nisten @ldjconfirmed @ryancarson
- Vaibhav Srivastav (VB) - DX at OpenAI ( @reach_vb )
Open Source LLMs
- Z.ai GLM-OCR: 0.9B parameter model achieves #1 ranking on OmniDocBench V1.5 for document understanding (X, HF, Announcement)
- Alibaba Qwen3-Coder-Next, an 80B MoE coding agent model with just 3B active params that scores 70%+ on SWE-Bench Verified (X, Blog, HF)
- Intern-S1-Pro: a 1 trillion parameter open-source MoE with SOTA scientific reasoning across chemistry, biology, materials, and earth sciences (X, HF, Arxiv, Announcement)
- StepFun Step 3.5 Flash: 196B sparse MoE model with only 11B active parameters, achieving frontier reasoning at 100-350 tok/s (X, HF)
Agentic AI segment
- Moltbook, a reddit for agents, as well as a youtube, a twitter, a church, a 4chan, an instagram, a dark web (do not let your agents go in any of these)
Big CO LLMs + APIs
- OpenAI launches Codex App: A dedicated command center for managing multiple AI coding agents in parallel (X, Announcement)
- OpenAI launches Frontier, an enterprise platform to build, deploy, and manage AI agents as ‘AI coworkers’ (X, Blog)
- Anthropic launches Claude Opus 4.6 with state-of-the-art agentic coding, 1M token context, and agent teams for parallel autonomous work (X, Blog)
- OpenAI releases GPT-5.3-Codex with record-breaking coding benchmarks and mid-task steerability (X)
This week’s Buzz - Weights & Biases update
- Links to the gallery of our hackathon winners (Gallery)
Vision & Video
- xAI launches Grok Imagine 1.0 with 10-second 720p video generation, native audio, and API that tops Artificial Analysis benchmarks (X, Announcement, Benchmark)
- Kling 3.0 launches as all-in-one AI video creation engine with native multimodal generation, multi-shot sequences, and built-in audio (X, Announcement)
Voice & Audio
- Mistral AI launches Voxtral Transcribe 2 with state-of-the-art speech-to-text, sub-200ms latency, and open weights under Apache 2.0 (X, Blog, Announcement, Demo)
- ACE-Step 1.5: Open-source AI music generator runs full songs in under 10 seconds on consumer GPUs with MIT license (X, GitHub, HF, Blog, GitHub)
- OpenBMB releases MiniCPM-o 4.5 - the first open-source full-duplex omni-modal LLM that can see, listen, and speak simultaneously (X, HF, Blog)
AI Art & Diffusion & 3D
- LingBot-World: Open-source world model from Ant Group generates 10-minute playable environments at 16fps, challenging Google Genie 3 (X, HF)
