Hey everyone, sending a quick one today, no deep dive, as I'm still in the middle of AI Engineer World's Fair 2024 in San Francisco. (In fact, I'm writing this from the incredible floor 32 presidential suite that the team here got for interviews, media and podcasting. And hey to all the new folks I've just met during the last two days!)
It's been an incredible few days meeting so many ThursdAI community members, listeners and folks who came on the pod! The list is honestly too long, but I got to meet friends of the pod Maxime Labonne, Wing Lian, Joao Moura (CrewAI), Vik from Moondream and Stefania Druga, not to mention the countless folks who came up, gave high fives and introduced themselves. It was honestly a LOT of fun. (And it's still not over: if you're here, please come and say hi, and let's take an LLM judge selfie together!)
We recorded today's show extra early because I had to run and play dress up, and boy am I relieved now that both the show and the talk are behind me, and I can go and enjoy the rest of the conference 🔥 (I will bring you the talk here in full once I get the recording!)
On today's show, we had the awesome pleasure of hosting Surya Bhupatiraju, a research engineer at Google DeepMind, who talked to us about their newly released, amazing Gemma 2 models! It was very technical, and a super great conversation to check out!
Gemma 2 came out in 2 sizes, 9B and 27B parameters, with 8K context (we addressed this on the show). The 27B model's performance is incredible: it beats Llama 3 70B on several benchmarks and even beats Nemotron 340B from NVIDIA!
The model is also now available in Google AI Studio to play with, and on the Hugging Face Hub!
We also covered the renewal of the HuggingFace open LLM leaderboard with their new benchmarks in the mix and normalization of scores, and how Qwen 2 is again the best model that's tested!
It was a very insightful conversation, so if you're interested in benchmarks, definitely give it a listen.
Last but not least, we had a conversation with Ethan Sutin, the co-founder of Bee Computer. At the AI Engineer speakers dinner, all the speakers received a wearable AI device as a gift, and I onboarded (cause Swyx asked me) and kinda forgot about it. On the way back to my hotel I walked with a friend and chatted about my life.
When I got back to my hotel, the app prompted me with "hey, I now know 7 new facts about you" and it was incredible to see how much of the conversation it was able to pick up, extracting facts and even TODOs!
So I had to have Ethan on the show to try and dig a little bit into the privacy and the use-cases of these hardware AI devices, and it was a great chat!
Sorry for the quick one today. If this is your first newsletter after you just met me and registered, there's usually a deeper dive here. Expect more in-depth write-ups in the upcoming issues, as right now I have to run down and enjoy the rest of the conference!
Here's the TL;DR and my RAW show notes for the full show, in case it's helpful!
AI Engineer is happening right now in SF
Tracks include Multimodality, Open Models, RAG & LLM Frameworks, Agents, AI Leadership, Evals & LLM Ops, CodeGen & Dev Tools, AI in the Fortune 500, GPUs & Inference
Open Source LLMs
HuggingFace - LLM Leaderboard v2 - (Blog)
Old Benchmarks sucked and it's time to renew
New Benchmarks
MMLU-Pro (Massive Multitask Language Understanding - Pro version, paper)
GPQA (Google-Proof Q&A Benchmark, paper). GPQA is an extremely hard knowledge dataset
MuSR (Multistep Soft Reasoning, paper).
MATH (Mathematics Aptitude Test of Heuristics, Level 5 subset, paper)
IFEval (Instruction Following Evaluation, paper)
BBH (Big Bench Hard, paper). BBH is a subset of 23 challenging tasks from the BigBench dataset
The community will be able to vote for models, and we will prioritize running models with the most votes first
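On the score normalization we talked about for the new leaderboard, here's a rough sketch of the idea, not the leaderboard's actual code. It assumes normalization means rescaling a raw accuracy so that a random-guess baseline lands at 0 and a perfect score lands at 100. The function name `normalizeScore` and the 25% baseline in the usage note are my own illustrative picks.

```typescript
// Rescale a raw score (0-100) so that random guessing maps to 0 and a
// perfect score maps to 100. Assumption: this is a sketch of baseline
// normalization in general, not the leaderboard's real implementation.
function normalizeScore(
  raw: number,
  randomBaseline: number,
  maxScore: number = 100,
): number {
  if (maxScore <= randomBaseline) {
    throw new RangeError("maxScore must be greater than randomBaseline");
  }
  const scaled = ((raw - randomBaseline) / (maxScore - randomBaseline)) * 100;
  // Scores below the random baseline are clamped to 0, above max to 100.
  return Math.min(100, Math.max(0, scaled));
}
```

For a hypothetical 4-option multiple-choice benchmark, random guessing gets about 25%, so `normalizeScore(62.5, 25)` comes out at 50 and anything at or below 25 comes out at 0. That's why a model that looked decent on raw numbers can look a lot weaker once the "free" points from guessing are removed.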
Mozilla announces Builders Accelerator @ AI Engineer (X)
Theme: Local AI
$100K in non-dilutive funding
Big CO LLMs + APIs
UMG, Sony, Warner sue Udio and Suno for copyright (X)
The labels were able to recreate some songs
Both companies are being sued
10 unnamed individuals are also included in the suit
Google Chrome Canary has Gemini Nano (X)
Super easy to use window.ai.createTextSession()
Nano 1 and Nano 2, 4-bit quantized at 1.8B and 3.25B parameters, have decent performance relative to Gemini Pro
Behind a feature flag
Most text gen under 500ms
Unclear re: hardware requirements
Someone already built extensions
Someone already posted this on HuggingFace
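To give a feel for how simple the Chrome Canary API mentioned above is, here's a minimal sketch. Only `window.ai.createTextSession()` comes from the notes. The `prompt()` and `destroy()` methods on the session are my recollection of the experimental build and should be treated as assumptions, since this API sits behind a flag and can change. The interfaces and the `promptNano` helper are hypothetical names for illustration.

```typescript
// Hand-written typings for the experimental API (assumed shape, not official types).
interface TextSession {
  prompt(input: string): Promise<string>;
  destroy?(): void; // assumption: may not exist in every build
}

interface ExperimentalAI {
  createTextSession(): Promise<TextSession>;
}

// Sends one prompt to the on-device model.
// Returns null when the experimental API is not exposed at all.
async function promptNano(
  input: string,
  ai: ExperimentalAI | undefined = (globalThis as any).ai,
): Promise<string | null> {
  if (!ai || typeof ai.createTextSession !== "function") {
    return null; // flag off, unsupported browser, or the API has changed
  }
  const session = await ai.createTextSession();
  try {
    return await session.prompt(input);
  } finally {
    session.destroy?.(); // release the session if destroy() is available
  }
}
```

In a Canary tab with the flag enabled, something like `promptNano("Summarize this paragraph: ...")` should resolve to the model's reply, and everywhere else it resolves to null, so a page can fall back to a hosted model.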
Anthropic Claude shareable projects (X)
Snapshots of Claude conversations shared with your team
Can share custom instructions
Anthropic has released new "Projects" feature for Claude AI to enable collaboration and enhanced workflows
Projects allow users to ground Claude's outputs in their own internal knowledge and documents
Projects can be customized with instructions to tailor Claude's responses for specific tasks or perspectives
"Artifacts" feature allows users to see and interact with content generated by Claude alongside the conversation
Claude Team users can share their best conversations with Claude to inspire and uplevel the whole team
North Highland consultancy has seen 5x faster content creation and analysis using Claude
Anthropic is committed to user privacy and will not use shared data to train models without consent
Future plans include more integrations to bring in external knowledge sources for Claude
OpenAI voice mode update - not until Fall