📆 ThursdAI - Oct 16 - VEO3.1, Haiku 4.5, ChatGPT adult mode, Claude Skills, NVIDIA DGX Spark, World Labs RTFM & more AI news

From Weights & Biases - Google AI Finds Cancer Clue, VEO 3.1, Haiku 4.5, ChatGPT Erotica, Agentic Win 11, Claude Skills, NVIDIA DGX, World Labs RTFM, Ads in Amp & SWE-grep Cognition + 4 interviews

Hey folks, Alex here.

Can you believe it's already the middle of October? This week's show was a special one, not just because of the mind-blowing news, but because we set a new ThursdAI record with four incredible interviews back-to-back!

We had Jessica Gallegos from Google DeepMind walking us through the cinematic new features in VEO 3.1. Then we dove deep into the world of Reinforcement Learning with my new colleague Kyle Corbitt from OpenPipe. We got the scoop on Amp's wild new ad-supported free tier from CEO Quinn Slack. And just as we were wrapping up, Swyx (from Latent.Space, now with Cognition!) jumped on to break the news about their blazingly fast SWE-grep models.

But the biggest story? An AI model from Google and Yale made a novel scientific discovery about cancer cells that was then validated in a lab. This is it, folks. This is the "let's fucking go" moment we've been waiting for. So buckle up, because this week was an absolute monster. Let's dive in!


Open Source: An AI Model Just Made a Real-World Cancer Discovery

We always start with open source, but this week felt different. This week, open source AI stepped out of the benchmarks and into the biology lab.

Our friends at Qwen kicked things off with new 3B and 8B parameter versions of their Qwen3-VL vision model. It's always great to see powerful models shrink down to sizes that can run on-device. What's wild is that these small models are outperforming last generation's giants, like the 72B Qwen2.5-VL, on a whole suite of benchmarks. The 8B model scores a 33.9 on OSWorld, which is incredible for an on-device agent that can actually see and click things on your screen. For comparison, that's getting close to what we saw from Sonnet 3.7 just a few months ago. The pace is just relentless.

But then, Google dropped a bombshell. A 27-billion parameter Gemma-based model they developed with Yale, called C2S-Scale, generated a completely novel hypothesis about how cancer cells behave. This wasn't a summary of existing research; it was a new idea, something no human scientist had documented before. And here's the kicker: researchers then took that hypothesis into a wet lab, tested it on living cells, and proved it was true.

This is a monumental deal. For years, AI skeptics like Gary Marcus have said that LLMs are just stochastic parrots, that they can't create genuinely new knowledge. This feels like the first, powerful counter-argument. Friend of the pod, Dr. Derya Unutmaz, has been on the show before saying AI is going to solve cancer, and this is the first real sign that he might be right. The researchers noted this was an "emergent capability of scale," proving once again that as these models get bigger and are trained on more complex data (in this case, turning single-cell RNA sequences into "sentences" for the model to learn from), they unlock completely new abilities. This is AI as a true scientific collaborator. Absolutely incredible.

Big Companies & APIs

The big companies weren't sleeping this week, either. The agentic AI race is heating up, and we're seeing huge updates across the board.

Claude Haiku 4.5: Fast, Cheap Model Rivals Sonnet 4 Accuracy (X, Official blog, X)

First up, Anthropic released Claude Haiku 4.5, and it is a beast. It's a fast, cheap model that's punching way above its weight. On the SWE-bench Verified benchmark for coding, it hit 73.3%, putting it right up there with giants like GPT-5 Codex, but at a fraction of the cost and twice the speed of previous Claude models. Nisten has already been putting it through its paces and loves it for agentic workflows because it just follows instructions without getting opinionated. It seems like Anthropic has specifically tuned this one to be a workhorse for agents, and it absolutely delivers.

Also notable is the very impressive jump on OSWorld (50.7%), a computer-use benchmark; at this price and speed ($1/$5 per MTok input/output), Haiku 4.5 is going to make computer-use agents much more streamlined and speedy!

ChatGPT will loosen restrictions; age-gating enables "adult mode" with new personality features coming (X)

Sam Altman set X on fire with a thread announcing that ChatGPT will start loosening its restrictions. They're planning to roll out an "adult mode" in December for age-verified users, potentially allowing for things like erotica. More importantly, they're bringing back more customizable personalities, trying to recapture some of the magic of GPT-4o that so many people missed. It feels like they're finally ready to treat adults like adults, letting us opt in to R-rated conversations while keeping strong guardrails for minors. This is a welcome change we've been advocating for a while, and it's a notable contrast to the xAI approach I covered last week: opt-in for verified adults with precautions, versus engagement bait in the form of a flirty animated waifu with engagement mechanics.

Microsoft is making every Windows 11 PC an AI PC with Copilot voice input and agentic powers (Blog, X)

And in breaking news from this morning, Microsoft announced that every Windows 11 machine is becoming an AI PC. They're building a new Copilot agent directly into the OS that can take over and complete tasks for you. The really clever part? It runs in a secure, sandboxed desktop environment that you can watch and interact with. This solves a huge problem with agents that take over your mouse and keyboard, locking you out of your own computer. Now, you can give the agent a task and let it run in the background while you keep working. This is going to put agentic AI in front of hundreds of millions of users, and it's a massive step towards making AI a true collaborator at the OS level.

NVIDIA DGX Spark - the tiny personal supercomputer at $4K (X, LMSYS Blog)

NVIDIA finally delivered their promised AI supercomputer, and excitement was in the air as Jensen hand-delivered the DGX Spark to OpenAI and Elon (recreating that historic picture of Jensen hand-delivering a signed DGX workstation back when Elon was still affiliated with OpenAI). The workstation sold out almost immediately. Folks from LMSys did a great deep dive into the specs. All the while, folks on our feeds are saying that if you want the maximum possible open source LLM inference speed, this machine is probably overpriced: an M3 Ultra Mac Studio with 128GB of RAM or an RTX 5090 GPU can get you similar if not better speeds at significantly lower price points.

Anthropic's "Claude Skills": Your AI Agent Finally Gets a Playbook (Blog)

Just when we thought the week couldn't get any more packed, Anthropic dropped "Claude Skills," a huge upgrade that lets you give your agent custom instructions and workflows. Think of them as expertise folders you can create for specific tasks. For example, you can teach Claude your personal coding style, how to format reports for your company, or even give it a script to follow for complex data analysis.

The best part is that Claude automatically detects which "Skill" is needed for a given task, so you don't have to manually load them. This is a massive step towards making agents more reliable and personalized, moving beyond just a single custom instruction and into a library of repeatable, expert processes. It's available now for all paid users, and it's a feature I've been waiting for. Our friend Simon Willison thinks Skills may be a bigger deal than MCPs!
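
To picture how a library of skills might get matched to a task, here's a hypothetical sketch. The skill names, descriptions, and the naive keyword scorer are all invented for illustration; Anthropic's actual implementation has Claude itself decide which skill applies.

```python
# Hypothetical sketch of matching a "skills" library to a task description.
# Skill names/descriptions and the word-overlap scorer are illustrative only.

skills = {
    "brand-reports": "Format quarterly reports using company templates and tone",
    "code-style": "Apply our Python coding conventions and lint rules",
    "data-analysis": "Run the standard pandas pipeline for sales data",
}

def pick_skill(task: str):
    """Pick the skill whose description shares the most words with the task."""
    task_words = set(task.lower().split())
    scores = {
        name: len(task_words & set(desc.lower().split()))
        for name, desc in skills.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None: no skill applies

print(pick_skill("format this quarterly report for the company"))
```

A real implementation would let the model reason over the skill descriptions rather than count word overlap, but the shape is the same: metadata in, one skill folder out.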

🎬 Vision & Video: Veo 3.1, Sora Gets Longer, and Real-Time Worlds

The AI video space is exploding. We started with an amazing interview with Jessica Gallegos, a Senior Product Manager at Google DeepMind, all about the new Veo 3.1. This is a significant 0.1 update, not a whole new model, but the new features are game-changers for creators.

The audio quality is way better, and they've massively improved video extensions. The model now conditions on the last second of a clip, including the audio. This means if you extend a video of someone talking, they keep talking in the same voice! This is huge, saving creators from complex lip-syncing and dubbing workflows. They also added object insertion and removal, which works on both generated and real-world video. Jessica shared an incredible story about working with director Darren Aronofsky to insert a virtual baby into a live-action film shot, something that's ethically and practically very difficult to do on a real set. These are professional-grade tools that are becoming accessible to everyone. Definitely worth listening to the whole interview with Jessica, starting at 00:25:44.

I've played with the new VEO in Google Flow, and I was somewhat (still) disappointed with the UI itself: it froze sometimes and didn't play. I wasn't yet able to upload my own videos to use the insert/remove features Jessica mentioned, but I saw examples online and they looked great!

Ingredients were also improved with VEO 3.1: you can add up to 3 references, and rather than using them as a first frame, the model will use them to condition the video generation. Jessica clarified that if you upload sound, as in your voice, the model won't reproduce your voice yet, but maybe they will add this in the future (at least that was my feedback to her).

Sora 2 extends video gen to 15s for all, 25s for Pro users with a new storyboard feature

Not to be outdone, OpenAI pushed a bit of an update for Sora. All users can now generate up to 15-second clips (up from 8-10), and Pro users can go up to 25 seconds using a new storyboard feature. I've been playing with it, and while the new scene-based workflow is powerful, I've noticed the quality can start to degrade significantly in the final seconds of a longer generation (I posted my experiments here). As you can see, the last few shots of the cowboy don't have any action, and the face is a blurry mess.

World Labs RTFM: Real-Time Frame Model renders 3D worlds at interactive speeds on a single H100 (X, Blog, Demo)

And just when we thought we'd seen it all, World Labs dropped a breaking news release: RTFM, the Real-Time Frame Model. This is a generative world model that renders interactive, 3D-consistent worlds on the fly, all on a single H100 GPU. Instead of pre-generating a 3D environment, it's a "learned renderer" that streams pixels as you move. We played with the demo live on the show, and it's mind-blowing. The object permanence is impressive; you can turn 360 degrees and the scene stays perfectly coherent. It feels like walking around inside a simulation being generated just for you.


This Weekโ€™s Buzz: RL Made Easy with Serverless RL + interview with Kyle Corbitt

It was a huge week for us at Weights & Biases and CoreWeave. I was thrilled to finally have my new colleague Kyle Corbitt, founder of OpenPipe, back on the show to talk all things Reinforcement Learning (RL).

RL is the technique behind the massive performance gains we're seeing in models for tasks like coding and math. At a high level, it lets a model try things, and then you "reward" it for good outcomes and penalize it for bad ones, allowing it to learn strategies that are better than what was in its original training data. The problem is, it's incredibly complex and expensive to set up the infrastructure. You have to juggle an inference stack for generating the "rollouts" and a separate training stack for updating the model weights.
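
To make that loop concrete, here's a toy sketch in Python: a made-up "policy" over two answer strategies, rollouts, a reward function, and an update step. Everything here (the strategies, the reward, the learning rate) is invented for illustration; real RL stacks like the one Kyle described train actual model weights, with a separate inference stack generating the rollouts.

```python
import random

# Toy sketch of the RL loop described above (NOT OpenPipe's actual stack):
# the "policy" is a preference over two answer strategies; we sample rollouts,
# score them with a reward function, and nudge the policy toward what worked.

def rollout(strategy: str) -> str:
    # "compute" reliably answers 2+2; "guess" answers randomly.
    return "4" if strategy == "compute" else random.choice(["3", "4", "5"])

def reward(answer: str) -> float:
    """Reward correct answers, penalize wrong ones."""
    return 1.0 if answer == "4" else -1.0

random.seed(0)
policy = {"guess": 0.5, "compute": 0.5}  # sampling probabilities
lr = 0.05  # how hard each outcome nudges the policy

for _ in range(200):
    strategy = random.choices(list(policy), weights=list(policy.values()))[0]
    r = reward(rollout(strategy))
    policy[strategy] = max(0.01, policy[strategy] + lr * r)
    total = sum(policy.values())  # renormalize back to a distribution
    policy = {k: v / total for k, v in policy.items()}

print(policy)  # "compute" should end up with most of the probability mass
```

The juggling act in a real system is that `rollout` is a full inference server and the policy update is a distributed training job, which is exactly the infrastructure headache Serverless RL takes off your plate.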

This is the problem Kyle and his team have solved with Serverless RL, which we just launched and covered last week. It's a new offering that lets you run RL jobs without managing any servers or GPUs. The whole thing is powered by the CoreWeave stack, with tracing and evaluation beautifully visualized in Weave.

We also launched a new model from the OpenPipe team on our inference service: a fine-tune-friendly "instruct" version of Qwen3 14B. The team is not just building amazing products, they're also contributing great open-source models. It's awesome to be working with them.


๐Ÿ› ๏ธ Tools & Agents: Free Agents & Lightning-Fast Code Search

The agentic coding space saw two massive announcements this week, and we had the representatives of both companies on the show to discuss them!

First, Quinn Slack from Amp announced that they're launching a completely free, ad-supported tier. I'll be honest, my first reaction was, "Ads? In my coding agent? Eww." But the more I thought about it, the more it made sense. My AI subscriptions are stacking up, and this model makes powerful agentic coding accessible to students and developers who can't afford another $20/month. The ads are contextual to your codebase (think Baseten or Axiom), and they're powered by a rotating mix of models using surplus capacity from providers. It's a bold and fascinating business model.

This move was met with generally positive responses, though folks from a competing agent claim that Amp is serving Grok-4-fast, which xAI is giving out for free anyway. We'll see how this shakes out.

Cognition announces SWE-grep: RL-powered multi-turn context retriever for agentic code search (Blog, X, Playground, Windsurf)

Then, just as we were about to sign off, friend of the pod Swyx (now at Cognition) dropped in with breaking news about SWE-grep. It's a new, RL-tuned sub-agent for their Windsurf editor that makes code retrieval and context gathering ridiculously fast. We're talking over 2,800 tokens per second (yes, they are using Cerebras under the hood).

The key insight from Swyx is that their model was trained for natively parallel tool calling, running up to eight searches on a codebase simultaneously. This speeds up the "read" phase of an agent's workflow (which is 60-70% of the work) by 3-5x. It's all about keeping the developer in a state of flow, and this is a huge leap forward in making agent interactions feel instantaneous. Swyx also dropped a hint that CodeMaps are coming next, and they will make these retrievers look trivial!
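
The parallel "read" phase can be sketched like this: fire several regex searches over a codebase at once instead of one at a time. This is an illustrative stand-in using Python threads, not Cognition's RL-trained SWE-grep model; the throwaway repo, the patterns, and the worker count are invented for the example.

```python
import concurrent.futures
import pathlib
import re
import tempfile

# Illustrative sketch of the parallel "read" phase: several codebase searches
# in flight at once, mimicking natively parallel tool calls. This is NOT
# Cognition's actual SWE-grep model, just the concurrency idea behind it.

def search(root: pathlib.Path, pattern: str):
    """Return (file, line_number) hits for a regex across Python files."""
    rx = re.compile(pattern)
    hits = []
    for path in root.rglob("*.py"):
        for i, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if rx.search(line):
                hits.append((str(path), i))
    return hits

def parallel_search(root: pathlib.Path, patterns):
    # Up to 8 searches at once, like the agent's parallel tool calls.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        futures = {p: pool.submit(search, root, p) for p in patterns}
        return {p: f.result() for p, f in futures.items()}

# Demo on a throwaway repo with one file.
repo = pathlib.Path(tempfile.mkdtemp())
(repo / "app.py").write_text("import os\n\ndef main():\n    pass  # TODO\n")
results = parallel_search(repo, [r"^import", r"def \w+", r"TODO"])
print({p: len(hits) for p, hits in results.items()})
```

The win is latency, not throughput: eight half-second searches finish in roughly half a second instead of four, which is why the agent's read phase stops feeling like waiting.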


This was one for the books, folks. An AI making a novel cancer discovery, video models taking huge leaps, and the agentic coding space on fire. The pace of innovation is just breathtaking. Thank you for being a ThursdAI subscriber, and as always, here's the TL;DR and show notes for everything that happened in AI this week.

TL;DR and Show Notes

  • Hosts and Guests

  • Open Source LLMs

    • KAIST KROMo - bilingual Korean/English 10B (HF, Paper)

    • Qwen3-VL 3B and 8B (X post, HF)

    • Google's C2S-Scale 27B: AI Model Validates Cancer Hypothesis in Living Cells (X, Blog, Paper)

  • Big CO LLMs + APIs

    • Claude Haiku 4.5: Fast, Cheap Model Rivals Sonnet 4 Accuracy (X, Official blog)

    • ChatGPT will loosen restrictions; age-gating enables "adult mode" with new personality features coming (X)

    • OpenAI updates memory management - no more "memory full" (X, FAQ)

    • Microsoft is making every Windows 11 PC an AI PC with Copilot voice input (X)

    • Claude Skills: Custom instructions for AI agents now live (X, Anthropic News, YouTube Demo)

  • Hardware

    • NVIDIA DGX Spark: desktop personal supercomputer for AI prototyping and local inference (LMSYS Blog)

    • Apple announces M5 chip with double AI performance (Apple Newsroom)

    • OpenAI and Broadcom set to deploy 10 gigawatts of custom AI accelerators (Official announcement)

  • This Week's Buzz

    • New model - OpenPipe Qwen3 14B instruct (link)

    • Interview with Kyle Corbitt - RL, Serverless RL

    • W&B Fully Connected London & Tokyo in 20 days - SIGN UP

  • Vision & Video

    • Veo 3.1: Google's Next-Gen Video Model Launches with Cinematic Audio (Developers Blog)

    • Sora: up to 15s for all, Pro up to 25s, with a new storyboard feature

    • Baidu's MuseStreamer has >20 second generations (X)

  • AI Art & Diffusion & 3D

    • World Labs RTFM: Real-Time Frame Model renders 3D worlds at interactive speeds on a single H100 (Blog, Demo)

    • DiT360: SOTA Panoramic Image Generation with Hybrid Training (Project page, GitHub)

    • Riverflow 1 tops the image-editing leaderboard (Sourceful blog)

  • Tools

    • Amp launches a Free tier - powered by ads and surplus model capacity (Website)

    • Cognition SWE-grep: RL-powered multi-turn context retriever for agentic code search (Blog, Playground)
