Hey everyone, Happy Halloween! Alex here, coming to you live from my mad scientist lair! For the first-ever live video stream of ThursdAI, I dressed up as a mad scientist and had my co-host, Fester the AI-powered Skeleton, join me (as well as my usual co-hosts haha) in a very energetic and hopefully entertaining video stream!
Since it's Halloween today, Fester (and I) have a very busy schedule, so no super long ThursdAI newsletter today. We're still not in the realm of Gemini being able to write a decent draft that takes everything we talked about and covers all the breaking news, so I'm afraid I will have to wish you a Happy Halloween and ask that you watch/listen to the episode.
The TL;DR and show links from today don't cover all the breaking news, but the major things we saw today (and caught live on the show as Breaking News) were: ChatGPT now has search, and Gemini has grounded search as well (seems like OpenAI's "release something before Google announces it" streak continues).
Here's a quick trailer of the major things that happened:
This week's buzz - Halloween AI toy with Weave
In this week's buzz, my long-awaited Halloween project is finally live and operational!
I've posted a public Weave dashboard here and the code (that you can run on your Mac!) here.
Really looking forward to seeing all the amazing costumes the kiddos come up with and how Gemini will be able to respond to them, follow along!
OK, and finally, my raw TL;DR notes and links for this week. Happy Halloween everyone, I'm running off to spook the kiddos (and of course record and post about it!)
ThursdAI - Oct 31 - TL;DR
TL;DR of all topics covered:
Open Source LLMs:
Microsoft's OmniParser: SOTA UI parsing (MIT Licensed) 𝕏
Groundbreaking model for web automation (MIT license).
State-of-the-art UI parsing and understanding.
Outperforms GPT-4V in parsing web UI.
Designed for web automation tasks.
Can be integrated into various development workflows.
ZhipuAI's GLM-4-Voice: End-to-end Chinese/English speech 𝕏
End-to-end voice model for Chinese and English speech.
Open-sourced and readily available.
Focuses on direct speech understanding and generation.
Potential applications in various speech-related tasks.
Meta releases LongVU: Video LM for long videos 𝕏
Handles long videos with impressive performance.
Uses DINOv2 for downsampling, eliminating redundant scenes.
Fuses features using DINOv2 and SigLIP.
Select tokens are passed to Qwen2/Llama-3.2-3B.
Demo and model are available on HuggingFace.
Potential for significant advancements in video understanding.
OpenAI's new factuality benchmark (Blog, GitHub)
Introducing SimpleQA: new factuality benchmark
Goal: high correctness, diversity, challenging for frontier models
Question Curation: AI trainers, verified by second trainer
Quality Assurance: 3% inherent error rate
Topic Diversity: wide range of topics
Grading Methodology: "correct", "incorrect", "not attempted"
Model Comparison: smaller models answer fewer correctly
Calibration Measurement: larger models more calibrated
Limitations: only for short, fact-seeking queries
Conclusion: drive research on trustworthy AI
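To make the three-way grading a bit more concrete, here's a minimal sketch of how you could tally a run once every answer has been labeled "correct", "incorrect" or "not attempted". This is my own illustration, not code from the SimpleQA repo: the `summarize` function, the label strings and the metric names are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical label strings for the three grades mentioned above.
GRADES = ("correct", "incorrect", "not_attempted")


def summarize(grades):
    """Turn a list of per-question grades into simple rates.

    Assumption: every grade is one of GRADES. The metric names below
    are illustrative labels, not an official schema.
    """
    counts = Counter(grades)
    unknown = set(counts) - set(GRADES)
    if unknown:
        raise ValueError(f"unknown grade(s): {sorted(unknown)}")

    total = len(grades)
    attempted = counts["correct"] + counts["incorrect"]
    return {
        # Share of all questions answered correctly.
        "correct": counts["correct"] / total if total else 0.0,
        # Share of questions the model declined to answer.
        "not_attempted": counts["not_attempted"] / total if total else 0.0,
        # Accuracy only over the questions the model actually tried.
        "correct_given_attempted": (
            counts["correct"] / attempted if attempted else 0.0
        ),
    }


if __name__ == "__main__":
    example = ["correct"] * 3 + ["incorrect"] + ["not_attempted"]
    print(summarize(example))
```

Keeping "not attempted" separate from "incorrect" is what lets you tell a model that knows when to abstain apart from one that confidently guesses wrong.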
Big CO LLMs + APIs:
ChatGPT now has Search! (X)
Grounds its answers in search results from browsing the web
Still hallucinates
Reincarnation of SearchGPT inside ChatGPT
Apple Intelligence Launch: Image features for iOS 18.2
Officially launched for developers in iOS 18.2.
Includes Image Playground and Genmoji.
Aims to enhance image creation and manipulation on iPhones.
GitHub Universe AI News: Copilot expands, new Spark tool 𝕏
GitHub Copilot now supports Claude, Gemini, and OpenAI models.
GitHub Spark: Create micro-apps using natural language.
Expanding the capabilities of AI-powered coding tools.
Copilot now supports multi-file edits in VS Code, similar to Cursor, and faster code reviews.
GitHub Copilot extensions are planned for release in 2025.
Grok Vision: Image understanding now in Grok 𝕏
Finally has vision capabilities (currently via 𝕏, API coming soon).
Can now understand and explain images, even jokes.
Early version, with rapid improvements expected.
OpenAI advanced voice mode updates (X)
70% cheaper in input tokens because of automatic caching (X)
Advanced voice mode is now on desktop app
Claude this morning - new Mac / PC app
This week's Buzz:
My AI Halloween toy skeleton is greeting kids right now (and is reporting to a Weave dashboard)
Vision & Video:
Voice & Audio:
MaskGCT: New SoTA Text-to-Speech 𝕏
New open-source state-of-the-art text-to-speech model.
Zero-shot voice cloning, emotional TTS, long-form synthesis, variable speed synthesis, bilingual (Chinese & English).
Available on Hugging Face.
ZhipuAI's GLM-4-Voice: End-to-end Chinese/English speech 𝕏 (see Open Source LLMs for details)
Advanced Voice Mode on Desktops: 𝕏 (See Big CO LLMs + APIs for details).
AI Art & Diffusion:
Recraft Red Panda: new SOTA image diffusion 𝕏
High-performing image diffusion model, beating Black Forest Labs Flux.
72% win rate, higher Elo than competitors.
Creates SVG files, editable as vector files.
From Recraft V3.
Tools:
Bolt.new by StackBlitz: In-browser full-stack dev environment 𝕏
Platform for prompting, editing, running, and deploying full-stack apps directly in your browser.
Uses WebContainers.
Supports npm, Vite, Next.js, and integrations with Netlify, Cloudflare, and Supabase.
Free to use.
Jina AI's Meta-Prompt: Improved LLM Codegen 𝕏
📆 ThursdAI - Spooky Halloween edition with Video!