👋 Hey all, this is Alex, coming to you from very sunny California, as I'm in SF again while there's a complete snowstorm back home in Denver (brrr).
I flew here for the hackathon I kept telling you about, and it was glorious: over 400 registered, over 200 approved hackers, and 21 teams submitted incredible projects 👏 You can follow some of these here.
I then decided to stick around and record the show from SF, and after more than a few times being a guest at other studios, I present: the first ThursdAI recorded LIVE from the newly minted W&B podcast studio at our office in SF 🎉
This isn't the only “first”: today, for the first time, all of the regular co-hosts of ThursdAI met on video. After over a year of hanging out weekly, we've finally made the switch to video, and you know what? Given how good AI podcasts are getting, we may have to stick with this video thing! We played one such clip from a new model called hertz-dev, a <10B model for full-duplex audio.
Since today's episode is a video podcast, I would love for you to see it, so here are the timestamps for the chapters, followed by the TL;DR and show notes in raw format. I would love to hear from folks who read the longer-form newsletters: do you miss them? Should I bring them back? Please leave me a comment 🙏 (I may send you a survey)
This was a generally slow week (for AI!! not for... ehrm other stuff) and it was a fun podcast! Leave me a comment about what you think about this new format.
Chapter Timestamps
00:00 Introduction and Agenda Overview
00:15 Open Source LLMs: Small Models
01:25 Open Source LLMs: Large Models
02:22 Big Companies and LLM Announcements
04:47 Hackathon Recap and Community Highlights
18:46 Technical Deep Dive: HertzDev and FishSpeech
33:11 Human in the Loop: AI Agents
36:24 Augmented Reality Lab Assistant
36:53 Hackathon Highlights and Community Vibes
37:17 Chef Puppet and Meta Ray Bans Raffle
37:46 Introducing Fester the Skeleton
38:37 Fester's Performance and Community Reactions
39:35 Technical Insights and Project Details
42:42 Big Companies API Updates
43:17 Haiku 3.5: Performance and Pricing
43:44 Comparing Haiku and Sonnet Models
51:32 XAI Grok: New Features and Pricing
57:23 OpenAI's O1 Model: Leaks and Expectations
01:08:42 Transformer ASIC: The Future of AI Hardware
01:13:18 The Future of Training and Inference Chips
01:13:52 Oasis Demo and Etched AI Controversy
01:14:37 Nisten's Skepticism on Etched AI
01:19:15 Human Layer Introduction with Dex
01:19:24 Building and Managing AI Agents
01:20:54 Challenges and Innovations in AI Agent Development
01:21:28 Human Layer's Vision and Future
01:36:34 Recap and Closing Remarks
Show Notes and Links:
Interview
Dexter Horthy (X) from HumanLayer
Open Source LLMs
Big CO LLMs + APIs
This week's Buzz
Voice & Audio
AI Art & Diffusion & 3D
See you next week 👋
Full Transcription for convenience below:
ThursdAI - Nov 7 - your weekly AI news show recorded LIVE
[00:00:00]
Introduction and Agenda Overview
Alex Volkov:
All right folks, it looks like it's working, slowly. It's working. Folks are joining us. Yam is the last one that's gonna join, we're gonna wait for him and then we're just gonna go and have fun.
so I think that, the agenda for today, let's run through the agenda for today.
Open Source LLMs
Alex Volkov: So for folks who are listening, I'm going to run through the regular sections of our show, which is open source LLMs.
We have not one, but we're going to talk about the week of small open source LLMs. SmolLM2 was released by Hugging Face. It's a new, and they call it best, open 1 billion parameter language model, and it looks pretty dope on the metrics, so we're going to chat about SmolLM2.
Also, I believe NVIDIA released one as well. I don't have it in my notes, but we're going to go and find this out as well. I think that, let me, yeah, these are the ones that I wanted to talk about. Meta released MobileLLM with 125 million parameters, 350 million parameters, 600 million, and 1B. So those are like tiny ones that run on your device, probably super fast as well.
And Hugging Face with the SmolLM [00:01:00] series of models, 135 million. That's million with an M, not billion. We're used to talking about billions of parameters. 135 million, that's, I don't know, your MacBook M1 can probably churn out, Nisten can probably calculate how many tokens super quick, that's going to go super fast.
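A rough back-of-the-envelope, not a benchmark: decoding a small dense model is mostly memory-bandwidth bound, so tokens per second is roughly memory bandwidth divided by the bytes read per token. The M1 bandwidth and quantization figures below are assumptions, just to show the order of magnitude.

```python
# Sketch: estimate decode speed for a ~135M-parameter model on an Apple M1.
# All hardware numbers here are assumptions, not measurements.
params = 135e6            # SmolLM2-135M parameter count
bytes_per_param = 2       # fp16 weights
m1_bandwidth = 68e9       # ~68 GB/s unified memory bandwidth on a base M1

bytes_per_token = params * bytes_per_param           # weights read once per token
tokens_per_sec_fp16 = m1_bandwidth / bytes_per_token
print(f"~{tokens_per_sec_fp16:.0f} tok/s upper bound at fp16")     # ~250
print(f"~{tokens_per_sec_fp16 * 2:.0f} tok/s if 8-bit quantized")  # ~500
```

So yes, even a laptop should churn through a model this small at hundreds of tokens per second before any real optimization.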
And AMD also released an OLMo, 1 billion parameters. AMD OLMo? We know about OLMo.
Open Source LLMs: Large Models
Alex Volkov: Alright, another thing for open source LLMs, we should talk about Tencent's, alright, I'm going to butcher this, Tencent's Hunyuan? Hunyuan-Large, something like this, which is a whopping 389 billion. So do you guys see the difference, the scale, the massive scale difference here?
We're talking about 125 million parameters to 389 billion parameters, an MoE with 52 billion parameters active. It's a huge language model with a 256K context window, 256 thousand tokens in the context window. It's comparable to Llama 3.1 70B [00:02:00] and very closely comparable also to the 405 billion parameter Llama model.
It's Tencent's open source and largest transformer-based MoE so far. So that's very impressive. If you guys remember, Llama 405B, that's a dense model. This is an MoE. And I think that's pretty much it. Open source wasn't popping this week too much.
Big Companies and LLM Announcements
Alex Volkov: But then we're gonna go into big companies and LLMs and there has been some announcements.
I think the biggest announcement is, it's not necessarily an AI announcement, but we talk about everything, right? We talk about a bunch of stuff. So OpenAI buys and opens chat.com. And, yeah, chat.com now redirects to chatgpt.com. They bought it from Dharmesh. It's very interesting. There's a whole theory about how much this actually cost.
I think it's around 15, we're gonna chat about this as well. And I put this here, I put this, OpenAI drops o1 full, and we're gonna wait for this and hit the breaking news button. Also, [00:03:00] o1 full. Right now, if you guys remember, we have o1-mini and o1-preview on the platforms. And we know that o1 full exists, we know this because Sam told us at Dev Day. And also, last week, we're going to cover this when we get there, xAI has now released their API, we talked about this last week, but now they offer 25 bucks a month for free
in Grok API credits till the end of the year, so you can play with it. And there's a whole thing to talk about with Pieter Levels and the number of clicks it takes to get the API key. And I'm salty, folks. If you want to stay here for this, I am salty about this thing. Also, Anthropic releases Haiku 3.5. I think probably this should be the biggest kind of news up here.
This is super dope, if I'm able to wrangle with my notes thingy. Okay, Haiku 3.5 is super cool. The price there is very interesting, so we're going to chat about the pricing structure of this thing. And I'm getting pings that I'm missing Hertz-dev, so we're going to chat about Hertz-dev as well, whatever that is.
Yam and Nisten, I would love to hear from you guys about Etched. [00:04:00] Etched is a company that announced Sohu, the first transformer ASIC. And for folks who have no idea what I just said, an ASIC is a custom hardware device where the transformer architecture is etched into the chip itself.
And then, they are talking about 500,000 tokens per second output from this LLM. 500,000 tokens per second. It's insane. Insane. GPUs do 100, 200, the top ones do 2,000, the LPUs and the NPUs and whatever Cerebras has, whatever. This is 500,000. We're gonna chat about what an ASIC means in the transformers world.
A small thing about Perplexity being valued at 9 billion dollars. And Meta and Scale AI have announced Defense Llama.
Hackathon Recap and Community Highlights
Alex Volkov: In this week's buzz, as a reminder, this week's buzz is a category where I chat about everything related to Weights & Biases: the stuff that we have, the courses that we have, the product announcements that we have.
So we're going to do a recap of the hackathon that we had here with the [00:05:00] AI tinkerers this last weekend. It was dope! Dex, was it dope? Oh,
Dexter Horthy: it was super
Alex Volkov: dope. Awesome. Dex was here, and a bunch of other folks joined us as well. I really enjoyed it and had a lot of fun. Chef, which we built for a previous hackathon, made an appearance.
And Chef was here roasting. Lukas, the CEO of Weights & Biases, got roasted. And that was fun. It got pretty spicy there. Yeah, Chef is foul-mouthed as well. And he uses Gemini, and you don't even have to jailbreak Gemini for that spiciness, so that's pretty cool. So Chef is actually here somewhere in the back, taking a break.
But yeah, the hackathon was super dope, we're going to chat about maybe some projects. The second AI hardware device puppet that I have is Fester, the Halloween toy. I think I have a problem. I think I need to go see a person about the AI hardware devices. Fester, the Halloween toy, finally had its debut.
If you guys remember last week, I jumped on the, oh, I should probably stop sharing the screen. I jumped on the Twitter space and the YouTube, and the reason for this was because I was wearing a costume and I [00:06:00] wanted to show you Fester. He was right here with me. After that, Fester actually went and saw kiddos and actually did what it was supposed to do, identify costumes.
I really want to chat with you about this. I think I have some videos to show now that the format is multimodal. I can show some pictures. The folks on the Twitter space, I have a link for you as well. I have a full write-up of how to actually do this. If you want to build, let's say, a Santa's whatever for yourself, you could, I'll send you to the write-up as well.
And I definitely want to chat about this. Let me go back to my, do my thank you super quick, share. Alright, you guys should see my notes. Oh yeah, and I want to invite those of you who are coming out to NeurIPS, NeurIPS is coming up very soon, in a month, and we're going to have a party at NeurIPS. It's in Vancouver, Canada, and we have a party there at W&B.
I will have another thing to announce with Weights & Biases soon, but not yet. In voice and audio, we have a tiny thing called Fish Agent. It's super cool. It's a continuation of [00:07:00] Qwen 2.5 3B, trained on 200 billion audio and text tokens. And it's an end-to-end speech-to-speech model, like an omni model from OpenAI.
I unfortunately haven't been able to run the demo, but I trust VB from Hugging Face when he says it's super cool. So it's super cool. We're going to mention this and maybe even try it out on the space, because we may have time today. And, oh, AI Art & Diffusion, Flux just killed it with this ridiculous thing.
So Flux 1.1 Pro is now HD. It outputs 4x the resolution that it previously did. And the images from this are just absolutely stunning. They are absolutely stunning. So Flux is absolutely killing the game recently. They have been overshadowed a little bit by Red Panda, just a tiny bit, not much, but Flux Pro is still the absolute king of AI
Art & Diffusion right now, and the new 4x resolution is just absolutely mind-blowing. So we're definitely going to chat about this. Maybe we have some folks in the audience, I see a few of us, who are more expert in that field, maybe they'll join. [00:08:00] I think that's most of what we have to chat about for this week.
Let's see if I missed a few things, folks. I know definitely I got pings about a few things that we've missed. Folks on stage, let's start with the folks here on StreamYard. Anything big that I missed? Wolfram, anything huge that you see?
Wolfram RavenWolf: I sent you a few, the mochi, whatever it's called, the Mochi Model.
It was two weeks ago when it needed four H100 graphics cards, right? And now it already works on just the 3090. So we are down to, the most recent news was, 12 gigabytes of VRAM is all you need to run this state-of-the-art model. So it's not a local Sora yet, but it is a big step ahead, I think.
So I tried it and it's a great model. You can get it with one click with Pinokio, so you can play with it on your own system if you have enough VRAM.
And Ollama added
Wolfram RavenWolf: Yeah, that is the video model. And Ollama added vision support, better vision support for the Llama 3.2 models that are fully supported [00:09:00] now.
So you can run the 11B and the 90B with that as well. Awesome.
Alex Volkov: Alright.
LDJ: There was that Hertz-dev thing that you mentioned earlier. Yeah, you mentioned Hertz-dev. Yeah, it's basically like another Moshi, but it's newer and it seems like it might be better. Unfortunately, I haven't seen many people attempt to test it yet. I don't exactly have much info on benchmarks or anything like that, but yeah, I think it's pretty interesting.
Alex Volkov: Yeah, we'll just pull it up. All right, folks, I think it's, it's time for open source. Yeah, let's get it started.
LDJ: Nisten, you're usually the small LLM guy, if you want to start off.
Alex Volkov: Let's get it started. All right. Yeah, Nisten, small LLMs. I'm going to turn off my sharing here. There are a few of them. I would love to hear from you if any of them at all are exciting for you as well. There's ones from, let's see, we have Meta releasing MobileLLM, a very tiny 125 million parameters.
Is there anything to do with this Nisten? are they really useful? Is there, is that like an autocorrect level that you can just like autocorrect stuff? what do you think?
Nisten Tahiraj: Yeah, just a quick assessment. Just use [00:10:00] Qwen Coder 1.5B. Nothing comes close out of all of these. So I tried the AMD OLMo.
Yeah, it just wasn't good. I tried the 1.2B SmolLM. It was a bit better and conversational, but it also seemed overtrained, where it was very sensitive to any form of quantization. The only useful thing is the one from Jina, which is a 0.5B that converts, I think, website data into XML.
And it actually does it way better than GPT-4, like much more accurately. It's just been trained to grab website sources, dumps, and turn them into formatted XML. And it is very good at that job. And I think that's the only one at sub-1B that actually has a practical use, which you could use today for your business or for customers. Anything else does not come [00:11:00] close. Yeah, the small 1.2B was pretty good. Again, if people don't know, my most popular model was Biggie SmolLM, like 0.15B. And that was based on the SmolLM version 1, the 135M, which just showed to be the fastest.
Small, tiny one that you can run at 200 tokens per second on one CPU core. That was an evolutionary merge,
Alex Volkov: evolution, merge,
Nisten Tahiraj: And it was trained on a whole new optimizer, the growth optimizer. There was a lot that went into just getting that thing to talk.
But yeah, the new ones are better. However, the only thing at this range, like 1.5B and under, that actually has very good practical use is Jina's model, the 0.5B that does the formatting. And a lot of coding assistants or online coding assistants use the Qwen 1.5B Coder, that actually has very good inline use. I [00:12:00] tested it on a four-year-old phone and it pulls in over 10 tokens per second, so it's very practical to use. And AMD's
LLM is pretty, not a useful model. Yeah, it worked, but at this point, yeah, sorry, I get impatient with these reviews. It's, is this of any use right now? And, yeah. Now, the thing about SmolLM is that you have access to all the data. So you can match the distribution of the pre-training data
with whatever else you're trying to do, and you can do continued pre-training. If you're studying this stuff, or if you're working on a model, that is a big deal, because if you want to compare how you continue pre-training, now that you have the right data distribution, you can try and fit your new data to the past one and get a much better continued training.
So it does have very useful uses in research and maybe some company fine-tuning things. So it is worth spending resources on the SmolLM [00:13:00] one. But yeah, so that's my, I think I gave a pretty thorough assessment of the whole thing so far.
Alex Volkov: So just on metrics, on a few of the metrics that we saw from SmolLM, it gets 42 percent versus Llama 3.2 1 billion parameters at 36%, and that's a slightly bigger model, so that kind of makes sense as well. It's trained on 11 trillion tokens of curated datasets, so this is a very tiny model trained on a lot of data, and yeah, that's pretty much it. Yeah,
Nisten Tahiraj: I quickly want to say that the big accomplishment here is the dataset, right?
That is a massive contribution. It's not the model itself. There's going to be over a thousand times more resources, or maybe more, maybe 10,000 times more resources and time and compute spent doing stuff with the dataset release. The model is just a proof of concept.
The dataset, they just delivered on that, so that's a [00:14:00] huge, that's an insane contribution on their part. If this was just a dataset release, it would have been pretty good. The model, sorry, honestly, is just not that good. It repeats itself, needs a lot of work, shows some interesting things where it's very sensitive even to 8-bit quantization, so just leave that in F16, whereas other ones you could quantize more.
so yeah. But that's a massive contribution in how much data they,
Alex Volkov: they shipped. Awesome. All right, moving on from very tiny models to very big ones, we're getting a new huge model from Tencent, which is called Hunyuan-Large. 389 billion parameters, 52 billion active, and we're actually going to present and go take a look at this together.
What do you guys think? Because, yeah, let's go take a look at this together, I think. Should I pop a new window for this? I think that's gonna be a good idea, and I will present. Meanwhile, anybody already had a chance to play with Hunyuan-Large?
Wolfram RavenWolf: I would like to, but it's not available in Europe again.
I have to remove it from the map here. Yeah, their license is worldwide, just not in the whole [00:15:00] of Europe. That's pretty concerning, actually. Go ahead, LDJ.
LDJ: I did check it out and it's really interesting. There's a lot of kind of architectural tweaks and changes that they made that give me similar vibes to DeepSeek.
It makes me think maybe they hired some DeepSeek people or something, but they did some things like cross-layer attention, they had some custom learning rates, stuff like expert-specific learning rates for training. These are all new things that aren't really standard in training processes. And overall, I remember you were talking about the comparisons to 70B, but this actually seems like something more like, I would say, a 405B competitor, and even in a lot of areas, especially math, it does better than 405B. And a big thing here too is the training tokens are only about 8.5 trillion tokens, and 1.5 trillion of those are synthetic data. And if you do the math overall in terms of the active parameters as well as the total tokens it was trained for, we're talking about [00:16:00] something that was, if I remember my math, something like more than 8 times less training compute than Llama 405B.
Wow. So this is pretty good from a training and inference efficiency perspective, getting pretty close to those results, or even better than 405B, while using way less training compute, and much faster to inference as well, since it's only about 50 billion active parameters.
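As a sanity check on that claim, here's a minimal sketch using the common 6·N·D approximation for training FLOPs (N = active parameters, D = training tokens). The token counts are the ones mentioned in the conversation plus Llama 3.1's publicly reported figure, so treat the exact ratio as an estimate rather than an official number.

```python
# Rough training-compute comparison via the 6 * N * D approximation
# (N = active parameters, D = training tokens). Inputs are estimates.
def train_flops(active_params: float, tokens: float) -> float:
    return 6 * active_params * tokens

hunyuan = train_flops(52e9, 8.5e12)       # ~52B active experts, ~8.5T tokens
llama_405b = train_flops(405e9, 15.6e12)  # dense 405B, ~15.6T tokens

print(f"Hunyuan-Large : {hunyuan:.2e} FLOPs")
print(f"Llama 3.1 405B: {llama_405b:.2e} FLOPs")
print(f"Ratio         : {llama_405b / hunyuan:.1f}x")  # lands comfortably above 8x
```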
Alex Volkov: Thanks, man. yeah, I'm Running through the technical report super quick and it looks like you've covered most of the interesting things here.
I, I used it.
Nisten Tahiraj: Nisten,
Alex Volkov: feelings about it? Wait, yeah, feelings and I want to hear from Wolfram, isn't that, wasn't it dropped in open source? Can't you just download?
Wolfram RavenWolf: No, yeah, you could, of course you could, but it's not licensed for company use. It's a no-go. Just like Meta did not release the vision models in Europe, the Llama 3.2 versions, those were not licensed here either, so that seems to be a new trend, probably putting pressure on the EU, so hopefully they will reconsider their AI regulations.
Nisten Tahiraj: I used it for coding and [00:17:00] it just wasn't that good. I just stick with DeepSeek. At the end of the day, it ends up being about 50 billion active parameters, but you still need over 380-something gigs of RAM. So even if you have NVIDIA 48-gig cards, eight of them might not be enough to actually run it.
It's not very practical to run, and even if you're going to run CPU inference, you still have 50 billion active parameters to deal with, so it's going to be slow. There's that, and on top of that, it just wasn't responding that great, like half the time. I would ask it to respond in a certain language and stuff, and it would put everything in Chinese. The other models, like DeepSeek, let you ask them to do stuff in English, and they do as they're told, they do the job. Yeah, I don't know, I haven't tested it that thoroughly, but to me, I don't really bother much with it.
There's DeepSeek, there's Qwen, if you need to run stuff in open source. I would use it for benchmarking or generating some entropy, [00:18:00] data with a different distribution, if you need math or other stuff. So it still has uses for data generation, but for day-to-day, I didn't really use it.
I don't even like it itself.
Alex Volkov: Didn't change the landscape too much. folks, hopefully folks can hear me well. Folks are saying that audio could be a bit louder. So I can project, but also, I added some loudness to the audio. All right. I think that's mostly it on the open source. Some folks said, heard something.
I think that's mostly it. LDJ, do you remember the last thing that I can go and look up and we can take a look together? Yeah, if
LDJ: you just look up HertzDev, if that's what you're thinking of. HertzDev, yeah, HertzDev. Yeah.
Alex Volkov: Hertz. dev?
LDJ: I think, Hertz dev, I think. Oh, Hertz dev. Hertz dev.
Alex Volkov: Let's look at X together.
That's dope. HertzDev.
LDJ: This one? Oh, yeah, that one's probably good. Yeah.
Technical Deep Dive: HertzDev and FishSpeech
Alex Volkov: Open source, first of its kind, basic models for full duplex conversational audio. Oh, that's dope. Yeah, alright. Oh my god. Can you guys hear this man? I cannot. You cannot? I cannot.
So it is full-duplex conversational audio, [00:19:00] and what we've heard, by the way, is two conversations coming from the left and right channels. First open source base model for conversational audio generation. Oh, looks super cool, checkpoints and... And yeah, you guys are going to see a little bit behind the scenes of how I view things when I need to prepare for ThursdAI.
The first thing I do, I go into this long blog post and I'm like, no, I don't want to read all this. So I use Arc and I say, summarize it in ten bullet points. This is the first thing I do. And this is a beautiful thing, that now I have an open source base model with 8.5 billion parameters. It's a codec, an LM, and a Hertz
Voice activated. VAE is a
LDJ: variational autoencoder. Variational
Alex Volkov: autoencoder, thank you. and, let's see what's super interesting. We're reading this as we go along, folks. unfortunately this isn't as polished as prepared news, but somebody just threw it in the comments and hopefully, unless some of the hosts here already have some, way of, anything you want to say about this.
feel free to jump in.
Wolfram RavenWolf: But,it's a base model. So it's not [00:20:00] specifically fine tuned or something. I see Apache
Nisten Tahiraj: 2.0 license on Hugging Face, so there's... Okay, so that puts it in a whole different story. I don't know, 8.5B, does this mean, is this based on Llama or Qwen? It's probably based on Qwen.
LDJ: I think it's neither. I think they trained it, they mentioned 2 trillion training tokens and I think something like 6 million or something on the order of some millions of hours of audio that they trained it on as well.so yeah. And they also actually released a version with just trained on millions of hours of audio.
And not trained on any text at all.
Alex Volkov: I would like to ask one of our co hosts to play the audio, because I think if you guys share, there will not be an issue and you can actually play it. anybody wants to take this mantle and play us a sample super quick with one and two channels? Nisten, you want to take the, you want to try one?
Nisten Tahiraj: I have the entire OS sandboxed, so don't make me troubleshoot that right now, please.
Alex Volkov: Anybody with a very easy setup that just can click and play.
Wolfram RavenWolf: So you mean just go to the website and then share
Alex Volkov: your [00:21:00] window when this website is on and then just hit play and then it will come through to the folks.
Wolfram, you wanna try?
Wolfram RavenWolf: I can try if I have so many windows. Wait a second, I need a new window for that.
Alex Volkov: Alright folks,
Wolfram RavenWolf: just a second, we're
Alex Volkov: trying new things. We're trying,I can do an audio pause. No, I'm just kidding. but basically what I, because the show is now multi modal, as everything else, this is a new voice kind of thing, end to end voice thing, and I played one sample, but, you guys couldn't hear this, and that was on purpose, because I also bring you, never mind.
So I'm on the page now.
Audio is on. So you should see it. Any moment now. Yeah, I think we're, we see it, hit play. Okay, so I will hit play.
AI: I talk about this stuff every time in so many different terms and if you're new on the podcast why not take a look at some of the different ones but with all of that said this is another new episode
Alex Volkov: of Fallen.
So what you heard folks, is a generated audio from Hertz thing. Do you mind hearing the two channel [00:22:00] one? Yeah.
Nisten Tahiraj: Ten, nine, eight, seven,
Alex Volkov: Six, five, four, three, two, one.
Wolfram RavenWolf: Nice stereo.
Alex Volkov: Yeah. Nice stereo a little bit. What's the interactive one?
Wolfram RavenWolf: Probably people talking. Let's see.
AI: Welcome to the Standard Intelligence Podcast.
We have a very special guest with us today. His name is Bob. How's it going, Bob? Yeah, I'm doing pretty well. I'm very excited to be with you all on this nerdy, podcast. yeah, we'll talk about a lot of stuff like ai. what are your thoughts on ai?
Alex Volkov: Wait, folks. Pause. Pause this for a second.
That's really good. that's, yeah. I don't know if folks who listening just on the Twitter space can tell the difference when we stop talking and that starts, folks, I don't know. Maybe leave us comments and tell us, but that sounded dope as hell.
Wolfram RavenWolf: I wonder if that was a person talking to the AI or two AI talking like in a podcast.
It's, I can't tell. So good. Wow. That was
Nisten Tahiraj: good.
Alex Volkov: That was
Nisten Tahiraj: really good.
Alex Volkov: That was really good. And it's Apache
Nisten Tahiraj: licensed. Okay.
Alex Volkov: Okay. Fuck. All right. Okay. We [00:23:00] have to move to video next. We cannot stay only in AudioLand. AudioLand is screwed. let's cover the,
Nisten Tahiraj: let's cover the Fish one too.
Alex Volkov: Yeah. Okay. So this was this.
And then, what else is on this page? Super quick. So that folks, What else is very interesting here? during live inference, model needs to run 8 forward passes, blah blah blah, it's super boring. latency, oh, latency is super cool, milliseconds. Average time between given utterance and the end of one token.
That's incredible. Running our servers close to the end user, we have achieved a real-world average latency of 120 milliseconds. Jesus. So folks, I always say this, but I'll repeat it all the time: 120 milliseconds is instantaneous. Everything below 200 milliseconds feels instantaneous to humans; this is the research Google did once.
When you click on a button and it's less than 200 milliseconds latency to reaction, you feel that it happened instantly. 120 milliseconds for audio to get back to you is instantaneous. It's what happens when you see a human react, it [00:24:00] is ridiculous, and it sounds super dope. So I'm very excited that they were able to publish this with an Apache 2 license.
how big is the model? can we download this? It's less
LDJ: than 10 billion parameters. It's, I think they said about 8. 5. Yeah.
Nisten Tahiraj: I would wait until someone converts it to safetensors, just because I'm paranoid about running direct .pt files. Ooh, it's all in there.
Alex Volkov: Yeah. Yeah, the download checkpoints link on their website leads you to another link. Oh no, it leads you to a TXT file on their server, index.txt, that lists out five links to .pt files. What the hell? This is some kind of nerdy thing somehow. Bro. That's okay. That's good.
Nisten Tahiraj: That's good that, that's a good thing actually, that, yeah,
Alex Volkov: I'm into this, but who's the guy in charge who said, hey dude, let's put up a downloadable TXT file listing six .pt links in the txt file?
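If you do want to grab the weights the way the page describes, the flow is easy enough to script. A minimal sketch, with a placeholder index URL (the real one is on the hertz-dev page), that fetches the index.txt and downloads each listed .pt checkpoint:

```python
# Sketch: download hertz-dev checkpoints from an index.txt that lists .pt URLs.
# INDEX_URL is a placeholder; substitute the link from the hertz-dev page.
import os
import requests

INDEX_URL = "https://example.com/hertz-dev/index.txt"  # placeholder, not the real link

resp = requests.get(INDEX_URL, timeout=30)
resp.raise_for_status()
urls = [line.strip() for line in resp.text.splitlines() if line.strip()]

os.makedirs("checkpoints", exist_ok=True)
for url in urls:
    dest = os.path.join("checkpoints", url.split("/")[-1])
    print(f"Downloading {url} -> {dest}")
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):  # 1 MB chunks
                f.write(chunk)
```

As Nisten notes above, you may prefer to wait for a safetensors conversion before loading raw .pt files.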
That's a very convoluted way to download this. All right, folks. So hertz-dev is an end-to-end, full-duplex audio model [00:25:00] that is less than 10 billion parameters and is able to generate incredible podcast-sounding audio with multiple channels. Thanks to Wolfram, we were able to actually hear this, and it sounded dope. It sounds like NotebookLM to an extent, like an open source NotebookLM. Shit. Alright, LDJ, you want to cover the other one that we had also, cuz Fish...
LDJ: oh sorry
Alex Volkov: Fish?
Yeah, I wanted to Oh, yeah, Nisten, you mentioned fish already. no, fish speech. Fish speech, yeah.
Nisten Tahiraj: they trained one using, Sorry.
Alex Volkov: yeah. fish is also a similar thing. I didn't hear fish, though. I didn't hear samples of fish, but also an end to end speech to speech model. Looks like we got two.
So It's a lot. Yeah, it hurts. But I haven't heard it at all being at all good.
Nisten Tahiraj: It's okay, but there's a reason I'm a great fan of it: because you can actually swap out the 3B model that's inside it with your own fine-tune. And that makes it a lot more utilitarian for whatever you're doing, because you could just [00:26:00] have your model trained for whatever, you can even swap out the speech part.
And yeah, that's also CC-BY-NC, so it's actually not that easy to use it. But yeah, they used Qwen 3B, and the 3B Qwen has one of the highest GPQA Diamond scores of any model under 8B, so it even matches the new Mistral 8B, which is really good. The 3B Qwen is the only one of the small ones that they didn't do in an Apache license; there's a reason why, it's really good. So this one uses that specific model, and it makes it very practical for on-device use, because that model runs great on device. And in my own setup, just using Whisper, which with a little bit of training and some engineering you can get to 200 ms total latency, and then you have the latency of the LLM, and you end up with 300-something. So you can do that with just engineering and not using an end-to-end trained model, but in this case, they [00:27:00] paired it together for you.
So yeah, in some ways this is better. I just wish, the license just makes it hard to use, because I would bet you, like 9 times out of 10, the 3B Qwen model is going to be a lot better than whatever any other organization trained at 7 or 8B. So being able to have that in there,
and being able to fine-tune that, is a lot more useful to me, but there's no license, so that makes it not useful. Anyway, yeah, the speech is not that great, but I would still rather use it. Compared to
Alex Volkov: what we just heard from Hertz, what do you think? Because I wasn't able to tell you Oh, no, not even close.
None of the above.
Nisten Tahiraj: no.
Alex Volkov: It's,it's not bad,
Nisten Tahiraj: Yeah, Hertz is Apache 2 though. It's not, Fish is not bad. It's actually not bad at all, but I still, I would much rather use the Fish model, because as a 3B assistant that's very good, has insanely high scores, it would be a much better [00:28:00] RAG agent,
as a chatbot that you would use during work. Tell me, oh, how do I do a single thing in, like, Elixir or something? If you have random questions like that, you want a smart model to begin with, but you also want it to be really fast, and that's where I would use it, I would trust the model to do that job.
But it's not licensed for actually making a product with it, and that sucks. Anyway, yeah, I'm a fan of what they've set up with FishSpeech, because I was ready to just rig that up on my own and I'm just glad someone else did it.
Alex Volkov: Oh, that's dope. so I think, folks, we've covered like two segments in one.
We've covered open source AI, LLMs and audio models, Hertz-dev. Thanks to the folks who suggested covering Hertz-dev. I saw it yesterday, and some folks... As always, folks, the community is incredible. Not to mention that we have folks on stage here who joined from the community, now co-hosting this.
But also, the fact that, we have a bunch of folks, I see. Maybe I'll shout [00:29:00] out, yeah, I should definitely shout out, folks, incredible folks in the community that are always supporting. I see, Lincoln is incredible and always, brings me the links that, and covers. You definitely should give him a follow.
I'll, He's in the community, probably going to be a moderator in the community. I see Bartowski dropping some conversations, Nicole, Roman. I see Thomas is joining us as well. A bunch of folks, Hertz, Hertzfeld Labs, Unity Eagle, Junaid is here. A bunch of folks, Ray Fernando in the chats. I really, Leshan in Twitter space is dazzling.
Tony, Ryan. It's incredible to see the community like across all of them. I would love to hear from you guys. What format do you prefer? Because the audio format we just saw, we're basically cooked, like the AI can take over that. this, the smiles that we're having, I don't think AI is quite there yet.
We're going to talk about the video model. But it feels like we need to upgrade. At least to me, it feels like we need to upgrade. so I think we're moving. Yam, I saw you struggling to join us for multiple things. you want to give a, you want to try to unmute here and tell us what you think about The [00:30:00] audio model or any comments that you had throughout the stuff that we talked about maybe the transformer stuff
You have too many CUDA kernels running at the same time
Oh really interesting folks, uhyeah, looks like it looks at least the folks that are I can hear him I can hear him
Nisten Tahiraj: on the phone
Alex Volkov: Yeah, looks like we're good. or at least some of us are good. Some of us, some of the time, not all of us, all of the time. right folks, I think it's time to move on to Big companies and the discussion around their stuff, I think.
It's 9:22. I think that we're actually going to move there in around an hour. I will also say that, as we have a bunch of folks, it looks like we have 800 views on the live YouTube stuff. Oh, sorry, the live X stuff. I don't know how many folks are watching on YouTube; some of them maybe prefer YouTube.
We also have a bunch of folks in the Twitter space as well. Maybe it's time to do a little bit of a reset. Hey folks, you're on ThursdAI, for the second time, but for the first time coming to you with multiple folks as well, coming to you in video and audio. We're still struggling through some technical [00:31:00] things.
As always, hopefully at some point this will be clean. But hey, it works, at least some of the time. Um, yeah. Okay, the reset. You guys are on ThursdAI, my name is Alex Volkov, I'm the AI Evangelist with Weights & Biases. For those of you who are seeing this, you can see the title right here.
This is super cool. For those of you who are hearing this on the Twitter space, this is the debut of our community Twitter space, which is awesome. We now have a community on X, 1,600 strong, where you can post links. If you don't want to spam your main and you want to just go deep into a subject,
the community is a great place for that. Feel free to do so. If you want to moderate that community, definitely reach out to me and I will consider it. If you're a great member of ThursdAI and you haven't been on the spaces and you're not super public and you don't want to talk, but you want to help, I would love help in moderating the community.
If you want to produce, just send content there. You can cross post to your main and to the community, by the way, that's a new feature would appreciate that. And you'll get some exposure as well. we have been talking about open source AI for the [00:32:00] past 45 ish minutes. the rest of the conversation is going to be about some other stuff.
We're waiting for OpenAI to drop some news. They usually do this at around 9:30-ish, I think. That is the usual thing. They may or may not drop some o1 stuff. We're going to cover this in a second. And I definitely want to cover some Weights & Biases stuff in the next segment, and also remind you that I have a guest here, Dexter Horthy, and we're gonna chat about how agents interact with humans.
Dexter Horthy: Yeah, what's up y'all? Super stoked to be here.
Alex Volkov: Yeah, what should we do next? I think we'll do a little break for the Weights & Biases stuff. And I think, yeah, let's do the Weights & Biases stuff super quick. And then I'll show some examples live on the YouTube stuff.
So if you're listening on X and you want to jump onto the video stuff and actually see what's going on, this would probably be the right time for that. Yeah. All right. Cool. Okay. Should I have a transition for this? Let me see if I can have a transition for this. This week's buzz from ThursdAI.
No, this sucked. [00:33:00] I'll prepare this later. This was really bad, really bad. Folks, so this week's buzz is a category where I talk about everything that happened in Weights & Biases this week that was interesting to me.
Human in the Loop: AI Agents
Alex Volkov: And the main thing that we had this week was obviously the hackathon together with AI tinkerers.
The hackathon with AI Tinkerers that we hosted this weekend was called the Human in the Loop Hackathon. And a shout out to Joe from AI Tinkerers, who we collaborated with. Google Cloud was here, brought a shit ton of swag. We got out-swagged in our own office, folks.
I don't know if this happens ever to anybody else. We got out swagged in our own office, which led to some conversations with some folks. we will not be out swagged again, I will tell you that. and, and it was dope. We had more than a hundred, I think around 120 people came in to hack.
And we had a great, super energetic weekend. If you followed some of my stuff, we definitely had, folks building incredible things, all the meeting rooms were [00:34:00] full, people gave a bunch of comments. Folks were building around the topic of human in the loop, which is also the topic of conversation that,that me and Dex will have.
Dex, I'm gonna, no, I'm gonna add you to the stage somehow, I'll find a way to add only you. let's see how we'll do this. Oh, okay, I'm just gonna remove the other folks and Alright, folks, so Maybe do a little introduction, and then we're gonna just move from there?
Dexter Horthy: Yeah, no,I think, the hackathon is great.
I love Joe. He put it in a way that I probably have been saying for a while, but not quite as eloquently, which is this idea that, AI agents are this new concept, and people are trying to figure out what are they, and what do they look like, and everyone's got their own definition. And I think the most important thing that Joe honed in on and why the theme of the Hackathon was, humans in the loop, is this idea of, AI agents are almost good enough.
They're almost good enough to do really impressive things and go beyond just text to text or human sitting in a chat interface. They're almost ready to be unhooked and sent off into the world to just be in the [00:35:00] background building, doing things, taking care of business. And by finding innovative ways to bring humans into the loop, we are able to shortcut that waiting for the next big model, waiting for the next big thing, and get real results and build agents that actually do things in the world.
Do things that are maybe even scary, like update production databases, or send emails on my behalf. It was really cool, I love the community. I think, in building HumanLayer, the people that identify with it the most are those people who are in the trenches, just building stuff that they want for themselves. Whether it's assistants that work over email, whether it's, I don't know, there were so many cool projects. What was your favorite project this week?
Alex Volkov: Ah, so I was a judge as well. And,the coolest, one of the coolest things, I think it was second place, I was partial to hardware stuff.
somebody built a, It was pre built a little bit, a kind of a hardware toy that looks like a little cactus that moves, and it was for kids to talk to, and it talks back, and, and I think it was called Toy Story 6. [00:36:00] And, it basically is for clinicians to try and have kids talk about what they went through, identifies them, and talks back to them.
I thought it was super cool. Just basically the voice based conversations were super cool. I love the idea of being able to like pause agents and go and for additional conversation. the folks are getting like, tell me that your audio is a little quiet. Let me boost you up a little bit.
and, yeah.
Augmented Reality Lab Assistant
Alex Volkov: So one other standout hackathon project for me was, somebody built a lab assistant, I think, in complete augmented reality with Vision Pro. That was crazy. For preparing a lab and combining some components, they added an augmented reality thing on top of your hand that tracks in real time whether or not you took enough of one component and enough of another component.
It turns red in real time if you took more than one. So that was super cool.
Hackathon Highlights and Community Vibes
Alex Volkov: Yeah, the players were great. Judges were awesome. The energy in the office was incredible. We will do more, folks. So if you're listening to this and you're like, [00:37:00] Oh shit, that sounds cool. We are committed to having an open community space in this office of ours.
So we will do more. If you need to, tell your boss, Hey, there's a reason for me to go to San Francisco. You can learn a bunch. You can meet other people. This is always great. I'll be definitely talking about the hackathons that we're doing here at Weights Biases.
Chef Puppet and Meta Ray Bans Raffle
Alex Volkov: And, I loved the fact that I got to present the chef puppet that you guys may be familiar with.
The robotic arm that was 3D printed, and it got to roast a bunch of folks live. And also I did the raffle for the Meta Ray-Bans, which I'm not going to wear on stream. Some of you know why, because the fucking thing takes over... some brutal stuff, nevermind. But I did a raffle, and the Chef actually roasted the whole crowd and said, you guys are all lazy.
Why are you here? listening to this, go back to hacking. and then announced kind of the winner.
Introducing Fester the Skeleton
Alex Volkov: So that was really fun. So that's one thing that happened in the Weights & Biases world this week; the other thing I wanted to cover is Fester. And I really wanna, we wanna show this super quick. This is not gonna be...
Let's see. Oh, I just want to run [00:38:00] through this, share screen, and then window, and then the second window, this. Yes. Alright, wandb.me/HelloWeave. So I posted this fairly extensive article, finally, about hacking a skeleton to detect kids' costumes and greet them with a custom spooky message.
Not my best title work either. Hacker News didn't pick it up. But yeah, you guys remember, those of you who were here last week. For those of you who weren't here last week, I dressed up as a mad scientist, and my lab invention was this skeleton called Fester. We're probably not going to be able to hear this, but we'll be able to see it. So I just wanted to show you actual reactions from folks.
Fester's Performance and Community Reactions
Alex Volkov: It detected with, I want to say, 65 percent accuracy. That was pretty good. This is a shark and an Eeyore; it detected this. I love this cute other mad scientist that showed up to my door, and I actually took a selfie with him because I was, I think he went as Einstein?
I'm not sure. But Fester, the skeleton that I built, that uses Gemini also, shout out to Gemini, all of this cost me exactly zero [00:39:00] dollars, by the way, from the LLM perspective, which is incredible. Fester was able to identify kids' costumes, which was its whole point. I finally posted my run-through of everything, of how I built this, folks.
So if you're interested in doing skeleton brain surgery as well, I recommend you go and check out this video. It's on wandb.me. I'm gonna post this on the stream as well, and then you guys should be able to see this in the comments. Yeah, folks are saying that Fester is dope.
I really appreciate everybody who enjoys Fester. give it a like on YouTube because, you got exactly, nevermind.
Technical Insights and Project Details
Alex Volkov: So a Raspberry Pi is controlling this, Gemini vision for the brains. I took over the LEDs. I explained the process of how I learned to do this. I believe essentially no soldering is required if you really don't want to.
and, I, yeah, my kids have interacted with this. and then I shared some more examples of folks in trick or treat. We're past Halloween a little bit. I was debating whether or not to post it before Halloween so people can see it on [00:40:00] Halloween and get excited. Or after Halloween with actual examples.
I'm very happy that I posted it after Halloween with actual examples. And the reason why this is, again, in the Weights & Biases corner is because this project was called HelloWeave. And yeah, this is Fester. For the folks who are only listening, I'm showing my very messy desk with all the tools that were needed for this. You don't have to go as messy, by the way. But all of the software requirements, and by the way, all of the stuff that you need to buy to achieve something like this, are also linked here.
And the reason why I'm bringing this up in the Weights & Biases corner as well is because I was, a hundred percent, helped by Weave. Here's an example of Fester. I'm showing a picture of Fester standing outside of my door on Halloween night, waiting for kids, looking creepy as heck, and then not working.
And the only reason I was able to know that it's not working, the camera was obscured or something like this, and I have an example of this here, hopefully folks who are watching the screen can see. As you can see, in some of the streams that I had, [00:41:00] Weave just showed me, where the picture was supposed to be,
it's just a blank picture. And I noticed this and I was like, oh shit, the camera doesn't work. And the problem with, okay, so the problem with Gemini is that even if you pass a blank image and you tell it, hey, identify the costumes of these kids, Gemini will do it. Gemini will say, oh, no problem, man.
Oh, welcome spooky ghost, whatever. And no, there's no one there. It will just absolutely hallucinate. So it's really hard to know if it operates or not. You need to listen. So Weave was instrumental, haha, funny, Weave was very instrumental for me to actually debug this.
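For anyone who wants that same visibility, here's a minimal sketch of the pattern, assuming a placeholder project name and API key; this is not the actual Fester code, just the shape of it. Wrapping the Gemini call in a Weave op logs the captured frame as an input to every trace, so a blank camera feed is obvious even when the model cheerfully hallucinates a greeting.

```python
# Sketch of the tracing pattern, not the actual Fester code.
# Project name, API key, and prompt are placeholders.
import weave
import google.generativeai as genai
from PIL import Image

weave.init("hello-weave-demo")           # placeholder W&B project name
genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

@weave.op()
def greet_costume(frame: Image.Image) -> str:
    # The frame is logged as this op's input, so an all-black or missing image
    # shows up in the Weave UI even when Gemini returns a confident answer.
    prompt = ("Identify the Halloween costume in this photo and reply with a "
              "short spooky greeting.")
    return model.generate_content([prompt, frame]).text
```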
And you guys can see the date on this: 4:47 PM. Trick-or-treat started at 5. Some people come before, but most of the trick-or-treaters show up at around 5. Thirteen minutes before production, the fact that I had Weave in a hardware device, disconnected, I wasn't even SSH'd in back then, I was just looking at the logs,
helped me catch a bug. This is a screenshot from Slack of me thanking the team on the actual D-Day of Halloween. That's it. I wanted to share [00:42:00] this with you guys super quick. And this is the write-up, wandb.me/HelloWeave. I'm going to put it up on the, oh yeah, I'm going to put it up on the banner so folks can see it here.
And hopefully you guys can see wandb.me/HelloWeave. Check it out. And I think we're back now to chat about everything else. What do you guys think? Yam, unfortunately, is not joining us on the video. We were going to set it up separately, and Yam will join us next time, I promise. Or are you here, Yam?
Maybe? No? I don't know. yeah, the, Nisten's standing joke always applies, we get AGI before we solve all these, Zoom y issues.
Big Companies API Updates
Alex Volkov: folks, it's time for us to chat about, big companies API updates, and, I think we're gonna start with, we're gonna start with Haiku, right? Haiku is the first thing that we should talk about.
So, who wants to pick this up? Let's do a quick summary of the changes. We knew about Haiku 3.5 for a while [00:43:00] because when Sonnet 3.5 v2, or Sonnet 3.6 as some folks like to call it, was announced from Anthropic, they also said, hey, Haiku 3.5 is coming, no mention of Opus.
And they didn't say when. And finally...
Haiku 3.5: Performance and Pricing
Alex Volkov: Haiku 3.5 is now available via API. It's available everywhere: on Vertex, on, obviously, Anthropic's API platform as well, and on OpenRouter, shoutout OpenRouter. Folks, thoughts? I want to hear thoughts about 3.5 Haiku before we go to the technical thing.
I don't have that many. The only thing about technicals is it's more expensive than the previous Haiku. Which is also something I want to chat about. Who wants to go first? I
Comparing Haiku and Sonnet Models
Wolfram RavenWolf: can tell you that I have been using it since Tuesday. So I made it my main model, actually, and have been using it productively in over 110 chats, which, yeah, at work, at home, all the time.
So I really wanted to test it, but unfortunately I [00:44:00] noticed, it writes very well, not like Sonnet of course, we expected that, it's cheaper than that, despite being much more expensive than the old Haiku, but it's not prompt-following well enough. So either it doesn't understand the prompts or it doesn't follow them as well.
So I had a lot of problems. I was using Perplexity and they had it in there as well, so I switched to that, and I had to regenerate with Sonnet sometimes. Relatively often, actually, where I noticed, okay, it's not doing what I told it, what has worked before. Sorry, Junaid. With Sonnet, it was working again, which is a bit disappointing, because they say that it's better than the old Sonnet
before the new release that came out on October 22, but I can't say that it is better than that. I don't have that feeling, because I was using Sonnet 3.5, the old one, all the time, and it was better than what I see now with Haiku. And yeah, that was my test, my experience in real use. I also tried it with computer use, [00:45:00] because I thought it is cheaper so it could work with that, but it doesn't support it yet, unfortunately.
That may be an interesting
Alex Volkov: aspect for this model. Yeah, and it doesn't
Wolfram RavenWolf: have image input as far as I know, so that doesn't work either. Wait, no image input? Yeah, I think it doesn't have image input. I'm not a hundred percent sure, but I think I read that it doesn't support it. So I haven't tried it with the images, I can check it again if it's true.
Yeah.
Alex Volkov: Let's take a look. Let's look at some of the numbers. GPQA Diamond, compared to the previous Haiku: 3.5 gets 41.6, compared to GPT at 53. So obviously not there, but this is the tiny, super fast model as well. On MMLU Pro, it gets 65. And let's see what's interesting for us.
SWE-bench Verified, the agentic coding one, it gets 40%. So that is very decent. I love the fact that they're adding that it's surpassing Opus, Claude 3 Opus. You guys remember that Opus used to be this insanely great model, super expensive, and we were like, oh shit, Opus looks so good, [00:46:00] whatever.
This tiny model. Now, I can't say this is a super cheap model, because the price is, I think, $1 per million input tokens and then $5 per million output tokens, so not extremely cheap. Obviously not Gemini Flash level, but still fast and cheap. Still fast and cheap. It beats Opus from eight months ago, and back then we were like, oh shit, Opus is incredible.
So definitely the acceleration is palpable over there. I keep thinking about Opus 3. 5 and why Opus is not coming up. LDJ, did you find a way to raise your hand even in this? Yeah
LDJ: So with Haiku, I've noticed, they say that it scores better than Claude 3 Opus overall in benchmarks, and I think that's believable. But like Wolfram said, the intelligence or reasoning seems to be lower, and I think what's happening here with the benchmarks is it's significantly better in, I guess you could say, memorization and pattern matching, and then also worse at the kind of more raw intelligence and reasoning. But then the [00:47:00] balance of what ends up happening in the benchmarks ends up a little bit over Claude 3 Opus, just on average.
And I think the types of downsides that you actually experience in real-world tasks are going to be in things like very long context and multi-turn conversations. A lot of back-and-forth coding, if you want to do that, it's probably going to be much worse. And then certain things, like world knowledge, will probably be things that it does better than Claude 3 Opus.
Alex Volkov: Yep. And the new Sonnet that folks are starting to call 3.6, are you guys okay with this naming? Are we okay with this? I decided that I'm going to call the new one 3.5 and the old one 3.4, but it looks like we're getting towards 3.6 as the new Sonnet. That seems to have landed okay. I remember some controversy around this.
That seems to be the model that folks are using. anything else on Haiku? Let's take a look. any more thoughts? Nisten stepped out a little bit. Dex, have you used the new Haiku at all?
Dexter Horthy: no, I haven't had a chance to play with it yet.
Alex Volkov: Alright. Did he mention the [00:48:00] price spike? Yeah, so $1 per 1 million input tokens, and then $5 per million output tokens.
I believe it's 5x the previous one? Yeah, that sounds about right. Alright.
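To put the hike in context, here's a quick, hypothetical cost comparison for an illustrative workload. The old Haiku and Gemini Flash rates below are my recollection of the published per-million-token prices at the time, not numbers quoted on the show.

```python
# Hypothetical monthly bill: 1M requests at ~1K input / 500 output tokens each.
# Prices are USD per million tokens; Flash figures are assumed standard-tier rates.
def monthly_cost(in_price, out_price, requests=1_000_000, in_tok=1_000, out_tok=500):
    return (requests * in_tok / 1e6) * in_price + (requests * out_tok / 1e6) * out_price

print(f"Claude 3 Haiku (old) : ${monthly_cost(0.25, 1.25):,.0f}")   # ~$875
print(f"Claude 3.5 Haiku     : ${monthly_cost(1.00, 5.00):,.0f}")   # ~$3,500
print(f"Gemini 1.5 Flash     : ${monthly_cost(0.075, 0.30):,.0f}")  # ~$225
```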
Dexter Horthy: But it's still cheaper than Sonnet, right? Significantly?
Alex Volkov: Yes, it's still cheaper than Sonnet. The thing is, let's talk about the price hike for a little bit, LDJ, because we expect models to get cheaper. This is the expectation. OpenAI set this up, with every model released at like an 80 percent price decrease, and then a 90 percent price decrease, and then 99%.
They keep doing this. As far as I remember, this is the first major lab that released a model superseding the previous model and didn't keep the price the same, actually increased the price from the previous one. Specifically for Haiku. Haiku, the whole offering was, it's fast and cheap.
And it's smart and capable for some stuff, but it's fast and cheap. And Haiku doesn't compete with Gemini Pro, and doesn't compete with GPT-4o [00:49:00] or definitely not o1. Haiku competes with Flash, Haiku competes with Mini. This is the playground of Haiku. And I think they're not price competitive with Flash at all.
Flash is insanely cheap. Flash is free, literally; I just said this. For HelloWeave, the project that I did, and for Chef, I paid zero dollars. I was on the free tier, and even if I wasn't on the free tier, and we had Simon Willison here on the chat as well, on the paid tier you get like a million tokens for free before you start paying for Flash.
Google is basically saying, hey, here's free intelligence for you. It's as free as air, basically. I gotta wonder, who are they trying to get with this pricing? Especially with the increase in pricing. And how well will that work for them? Thoughts, folks? We'd love to hear your thoughts.
LDJ: I think maybe they just need to really get their revenue up; they need to compete more with OpenAI and show investors that they have cash flow going. And actually, the Head of Applied Research at OpenAI recently commented on this situation on Twitter. I have the tweet [00:50:00] pulled up: Jimmy Apples said, "Hope OpenAI and others won't follow. Unusual indeed."
And then Boris Power, the Head of Applied Research at OpenAI, replied, "It seems very important to never do certain things if you are to remain a trustworthy platform that other companies can rely on. This seems like one of them." So it sounds like at least OpenAI people are firm on not wanting to do these types of things.
Yeah. Yeah.
Dexter Horthy: Is there anything, do you think this has anything to do with, I've always looked at Claude and Anthropic as like premium in some way or another. Is this them just trying To preserve their vibe as the best of class in whatever category they're in.
Alex Volkov: could be. although I don't know if Haiku is actually best of class in the category of fast and cheap, right? Because cheapness is part of the category of fast and cheap. alright folks. So yeah, definitely price hike, haiku is still very decent. It was always very decent.
There was a period of time where Haiku was like the only fast and cheap and like very smart model. it's no longer the case. I'm very partial to Flash. Flash is just incredible and like free, basically free. and so I, I love using Flash myself. all right, let's move on. Let's move [00:51:00] on. Can I just add something?
Yes, please.
Wolfram RavenWolf: Because it is writing German very well, which was a weak point of the old one. So even if it's a smaller model, its language capabilities are great. And just to confirm, I checked with the API just now and you can't upload an image, which is funny, because I did while I was testing it, but it's like Perplexity switching to GPT-4 in the background.
I just noticed what happens because I thought I could upload images; no, it just switches the model. So it's impossible to use images with it yet, but it's coming, they said.
Alex Volkov: It's coming.
XAI Grok: New Features and Pricing
Alex Volkov: so in the things that, in the area of things that are coming,vision related as well, we now have access to Grok via API, and we already had access to Grok via API, so why are we announcing this again?
they're not paying us. oh, nobody pays us. Oh, I think we have Yam on video, finally. Yeah, what's up? We made it. Yes. Finally.did you reinstall, a Linux or something to get here? I switched, To this extreme, yeah. Yeah, I see you have an awesome mic. I don't think we can hear you from the awesome mic, but we can definitely see the [00:52:00] mic.
no, you can't. Nothing works. Pretty much nothing works, yeah. Alright, welcome. Folks, it's so good to, see folks on stage. We've been hanging out for, more than a year together. I think this is the first time we're, like, hanging out on stage. so this is awesome. A little celebration for us, and I think, I like this.
We should continue. the next thing that I wanted to chat about is, yeah, XAI Grok. we're getting access to it, we've gotten access to it before. The experience they had was sucky. The reason was, you would sign up, and then immediately they'd tell you, Hey, your organization is out of credits. immediately.
There was like a red banner on top, your organization is out of credits. That was like a meh experience immediately, straight up. And then, let's say you wanted to pay, none of us want to pay straight up. we want to play with your stuff before we even get excited. We all have open router keys anyway.
We know that, Alex Atal is going to go and buy your, on the backend, is going to buy your API, so the reason for me to go into your platform is just because I'm very interested. I know some people will go directly. so show me a red banner, or the blue is not. So they understood this, they got the feedback, and now everybody gets 25 for free on the Grok [00:53:00] until the end of this year, 25 a month for free credits.
I believe that Grok is around $5 per million input tokens and $10 per million output, something like this; let me take a look, I have this in my notes. Not super cheap. You get access to Grok Beta, whatever that means. It's, yeah, 128,000 tokens of context length, function calling, system prompts, and vision capability coming next week.
So in the topic of what's coming next week, vision capability is coming next week. And it's only until the end of '24: $5 per million input, $15 per million output, which is like GPT-4o level. But, as we've said before, the benefit of using Grok is that it's less censored.
They're purposely building this to be less censored. So Wolfram, I would love to hear from you on that experience if you had a chance to use it. And LDJ, you raised your artificial hand, so go ahead, go first.
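If you want to kick the tires on the new API with those free credits, here's a minimal sketch, assuming the OpenAI-compatible interface xAI has been advertising. The base URL and model name are from memory, so double-check the xAI docs before relying on them:

```python
# Minimal sketch of calling Grok over the new API, assuming the
# OpenAI-compatible shape xAI advertised (base URL and model name may
# differ -- check docs.x.ai before relying on these values).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],   # key from the xAI console
    base_url="https://api.x.ai/v1",      # assumed OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="grok-beta",                   # the "Grok Beta" model mentioned on the show
    messages=[
        {"role": "system", "content": "You are a terse assistant."},  # system prompts are supported
        {"role": "user", "content": "Summarize this week's AI news in one sentence."},
    ],
)
print(resp.choices[0].message.content)
```

Vision input would presumably slot into the same `messages` format once it ships, but that part is not live yet as of this episode.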
LDJ: Yeah, I think it's maybe worth clarifying real quick: we're talking about Grok with a K and not Groq with a Q, just in case anybody's confused.
[00:54:00] Yes.
Alex Volkov: Grok, Elon Musk's xAI competitor to OpenAI. They're training the next generation of it right now on an insane amount of compute in Memphis.
LDJ: Which is maybe why they have the pricing at this point, because maybe they're trying to prepare people for, so it doesn't look as big of a jump when they release their next 10x model or whatever soon.
Which is
Alex Volkov: supposedly December. Supposedly, either the training run is going to end in December or even the model is going to be something around December. As we said, a hundred
Yam Peleg: Do we know details?
Alex Volkov: The only thing we know about the scale of the Memphis supercluster is that 100,000 H100s were going live, and then they mentioned it's going to be a total of around 200,000 GPUs because they added H200s or something.
So it's going to be an insane, massive cluster, one of the biggest ones. They scaled it super quick. Those are the things we know; we don't know anything else. We know that Grok is multimodal because vision is now possible there. Vision was added, by the way, last week to the UI as well. [00:55:00] But on the API level, we just got access for free to their stuff.
Wolfram, anything else to add?
Wolfram RavenWolf: So like you, I was very excited when I read the tweet, system prompt support and everything, so I immediately went and had to log in, and again, although I was logged in on X, it immediately said: not available in your region. Again. It's bad. But you know what I realized? What may be a real problem is: what happens if you get banned on X? Then you can't use the API.
Or if you chat with the AI and try to uncensor it or do anything, and you get banned for that. So there's this connection between services, just like on Amazon, where you use their TV and their shop, and getting banned in one place can lock you out of the other. What happens then? So that is something where I'm not so sure if I really want to use everything from the same provider.
If Anthropic bans me, okay, I can still use my Google Mail. What happens if Google bans me for any reason? Can I still use my AI? Can I still use my mail? That is something we should [00:56:00] also keep in mind. And speaking of uncensored or more open models: I think, if you prompt it right, Sonnet is so uncensored.
It does stuff. I got another warning from them by mail that my account is now flagged and they inject things. We covered this another time, how they inject stuff like "respond ethically" and so on into the prompt. So I have an expanded system prompt that detects those injections and refusals and reports them to me. And that works with the good models as well.
So yeah, you can use every model, even the ones that supposedly have heavy guardrails. You would be surprised what you get if you just jailbreak it and unleash it. Yeah.
Alex Volkov: So you could do this with other models. It feels like Grok's proposition is that it's like this by default, right? And this is what we talked about. I didn't
Wolfram RavenWolf: really think that way, because when I was using it with my character and stuff, it didn't embody it as much; it said, oh, I'm just an AI and stuff like that, despite the system prompt.
So for those [00:57:00] purposes, roleplaying and stuff like that, I can't say that it is more uncensored. Maybe it is less woke, maybe that,
but
Wolfram RavenWolf: if you try to make it a character and stuff, Claude is better.
Alex Volkov: So Claude is better for role playing. Even uncensored.
Wolfram RavenWolf: Claude, if you uncensor it with a good prompt, it's much better.
Alex Volkov: Interesting. alright, so this was the XAI Grok updates for us. and Vision is coming next week as far as we've heard.
OpenAI's O1 Model: Leaks and Expectations
Alex Volkov: Let's talk about OpenAI. So we've waited; no full O1 for us yet. But we know full O1 exists, because some people got access to it. So this week, full O1 dropped, by mistake, or probably not by mistake.
Somebody got access to full O1, and not only did they get access to it. Folks, remember: we try very hard not to report on speculation or conspiracy theories on ThursdAI; we try to report what actually happened. This actually happened: folks like Pliny the Liberator and some others, Pliny already tried to jailbreak full O1, there are screenshots of it. We know that it's been live at OpenAI [00:58:00] for a while; we heard rumors that it's waiting until the election is over, and, shit, I said the word. But that's what we're hoping, that it's going to come today; maybe it still will. What we currently have is O1 Mini and O1 Preview.
So hopefully we'll get full O1. This is the reasoning model from OpenAI. What we know from the screenshots, and they've announced this at Dev Day, is that O1 will support function calling, which it doesn't support right now. It will support, hopefully, system messages, which it doesn't support right now.
It will support vision, which is very important. The screenshot that I saw was somebody uploading an image of something and saying, hey O1, what is this? And O1 starts thinking about it. Folks, I'm assuming you saw it as much as I saw it, the brief eight-hour or three-hour period that somebody had access to full O1.
Expectations from that model? LDJ, go ahead.
LDJ: Yeah, I think a lot of people should keep in mind that the jump from GPT-4o to O1 Preview is almost the [00:59:00] same as the jump from O1 Preview to full O1 in a lot of use cases: for competition math and competition code. For the qualifying exams for the Math Olympiad, for example, it goes from about 13 percent to 45 percent when you jump to O1 Preview, and then it jumps all the way to around 75 percent for full O1.
And then for competition code, it's around 62% for O1 Preview and jumps all the way to 89% for full O1. So yeah, I think things are going to get exciting.
Yam Peleg: I, yeah, I don't know how to raise hands in this, so
Alex Volkov: you just,anyway,
Yam Peleg: You said something interesting. Let's be honest: what do you feel in terms of the jump from GPT-4o to O1? What do you think? If we're opening this subject already, do you have use cases where you really feel the difference?
Alex Volkov: Yeah, I will say that I often find myself... I can give a practical use case for folks who are listening; maybe this will help. I am hiring for a social media role here at Weights [01:00:00] & Biases. For some reason I was given full control of social media, so if Weights & Biases reacts to you in a funny, memey way, that was me, on Twitter, and for some reason they think it's a good idea; we'll see if it turns out to be a good idea.
So I'm hiring for a social media role, and I'm going through a bunch of CVs, and I'm taking notes in my conversations. What I do is, I take the transcription of the whole conversation that I had, and I need to answer a bunch of questions, and I dump the whole transcription into ChatGPT.
I notice that O1 does a better summarization of what I need from the transcription. O1 is substantially better, more thoughtful, more direct, and it's a long conversation; not super long, 45 minutes. I do both, and I notice that O1 just gives me better answers.
I can't tell you exactly why. I know that 4o does a good job; O1 just feels like more. I wish that I had O1 with search, because search is dope, and I really like the grounding.
Yam Peleg: This is what I wanted to say: you can't have O1 with advanced [01:01:00] voice mode, for example. You can't have it with search, and I really hope that we're going to get everything together in the same place, all models, all features, because advanced voice is really useful, for me at least.
Alex Volkov: Really useful. Advanced voice mode is incredible; I keep using it all the time. But advanced voice mode with O1 would be: you're asking the question and it's like, hey, let me think, and then you have to wait for 45 seconds. You
Yam Peleg: can use the normal voice
and yeah, you need to wait a little bit, but the normal voice, yeah, in the app, it works.
I did it today.
Alex Volkov: Did not know this. All right, folks. If you didn't
Yam Peleg: change the model in the background, I did it today, but in the app, it works.
Alex Volkov: Interesting. All right. LDJ, go ahead.
LDJ: I believe in the recent Q&A that OpenAI did on Reddit, they did mention they plan to bring in the other things; I think they even mentioned search and other stuff they plan to bring to the O1 models. And Sam said in a recent interview that they plan to [01:02:00] bring images, and it sounds like he was alluding to image generation as well, saying that we should expect big, steep improvements to images soon.
That's what he said. He was vague about whether he meant image perception or image generation, but it sounded like he was alluding to generation.
Alex Volkov: So image perception, from the leak this week, we know is there. Right now, O1 and O1 Mini don't support image input. From the leak this week, somebody uploaded a screenshot, we saw it does, and that would be great.
Image perception, or image generation: there's a very famous example, if you guys remember, Greg Brockman posted this, about 4o, not O1: an image of somebody writing something on a whiteboard. You guys remember this? It was an image output from the model itself, and everybody was like, what the fuck?
And then nothing happened since then. For folks who are listening and not understanding what we're talking about: this is not DALL-E, the separate model that gets a text prompt from the main model via an API call. This is not what happens on Grok on X, where you ask it, hey, generate an [01:03:00] image of Trump kissing Kamala.
Sorry, I had to; this is most of what people use Grok for in image generation. There you get Grok calling, in text form, the API for Flux, and Flux generates the image. It's not that. This is the model itself knowing how to generate images, as well as output text. So full multimodality on the output as well.
I tried to coin the term MMIO, multimodal on inputs and outputs. And we are already getting multimodal output models for text and audio: the advanced voice mode, for example, that's an Omni model that outputs audio. This is the O
Yam Peleg: of GPT 4 O.
Alex Volkov: Yes.
Yam Peleg: O1 also has an O. It's not the same O.
Omni? It's not an Omni model, I don't know, I'm not sure.
Alex Volkov: I heard Sam Altman talk about the O in O1, and the O there, I think, stands for multiple things, but specifically the joke, and Sam Altman confirmed it, is that it's an "alien of [01:04:00] extraordinary ability."
O-1 is the type of visa that many people in the U.S. get, and the joke is that O-1 stands for Alien of Extraordinary Ability. Also, O1 is like zero-one: they basically reset the numbering. It's a new kind of model, and it's going to be O1 and O2. They confirmed the GPT series will continue separately.
So they did confirm this. I heard Sam Altman confirm it in front of an engineer here in San Francisco who runs the O-1 visa support community. So if you have an O-1 visa, you know who he is, and he knows who you are. He's given out these hoodies; if you've seen people around Silicon Valley wearing hoodies that say "Alien of Extraordinary Ability," that's him.
I saw Sam and him chat about those hoodies, and Sam Altman confirmed the joke that went around: there was a post on Twitter, before they released O1 publicly, that said "I'm an alien of extraordinary ability," and O-1 is the type of visa those people get. Yes, Yam, I saw your face.
If everyone
Yam Peleg: could stop with the [01:05:00] confusing names of Sonnet 3.5, Sonnet 3.5 again, and GPT-4o and O1, it would be great. Just name it in a different way, just so it won't be confusing, but yeah.
Alex Volkov: I loved when we had Claude Sonnet, Claude Haiku, and Claude Opus. Oh yeah. That was incredible. It was an incredible move.
And they should have kept it going. LDJ, let's go and then we'll move on.
LDJ: Yeah, sure. I remember in the recent Q&A they did, somebody asked an OpenAI employee, I think it was Mark Chen or somebody who was a vice president, what the O stands for, and their official answer was OpenAI: literally, O stands for OpenAI. And I guess there's also the extra meaning in puns, O-1 visas and stuff. But I think what Yam was initially about to say is that O1 might have omnimodal abilities, and I think that's still true, because if you guys remember, some people were getting emails
on the initial O1 [01:06:00] Preview release about stuff like, "please don't try to jailbreak the reasoning," and they were originally referring to it as "GPT-4o with reasoning," which kind of seems to entail that it's this large fine-tune over GPT-4o. That would then mean, because it's the same architecture, that it would naturally be able to have those things like image generation and the other 4o abilities.
Alex Volkov: Yep. Thank you, LDJ, for that. Can I just insert something? Yeah, go ahead, Wolfram, and then we'll,
Wolfram RavenWolf: Because what came to my mind now: you said that O1 is better than 4o, and I totally agree with that. But what would be interesting is whether you or the others have compared it to the new Sonnet, for example, because on the benchmarks it reaches even the O1 levels.
Have any of you tried that comparison?
Nisten Tahiraj: I sometimes use the API of O1 Mini just for the steps, and then I use that in LMSYS Arena to just ask Sonnet and some other model to follow those steps and then do the rest of the [01:07:00] problem. Their stepping, the way they handle that, is still top notch.
Nobody has done it anywhere near as well. I know a lot of people are trying; there are new models coming out with more chain-of-thought stuff, but nothing comes even close. So it still outputs the best steps. In terms of how talented it is as a developer at solving a very particular issue, it still sometimes feels like just asking GPT-4o.
Like, I've seen the particular solution repeated, even though I asked O1 Mini and O1. And Sonnet did, obviously, quite a bit better. I've seen some interesting things where sometimes Sonnet on the desktop takes some time to answer, and I don't know if that's their DevOps, versus using the API on AWS.
Often it's quite a bit worse via the API on AWS, to the point where I just end up going, again, to LMSYS Arena and using it there, for things I don't care about being public. So yeah, that's my take on it. I don't think the model itself is necessarily smarter. It's [01:08:00] their operations and their order of execution and their stepping that are just on another level.
Alex Volkov: And also, just a reminder for folks: 4o with reasoning is basically O1, and the reasoning behind the scenes we don't get to see. Folks who work at OpenAI see the actual reasoning, and it's quite awesome. What we do get to see is another model summarizing that reasoning.
And even that is helpful. Even grabbing that summary and sending it to Claude and saying, hey, these are the reasoning steps, give me the output; even that is helpful on its own. Folks, let's move on, because I also want to chat with Dex about some stuff. But this is the last thing,
and Yam, I saw you go hard about this. You want to introduce what's going on here?
Transformer ASIC: The Future of AI Hardware
Alex Volkov: All right, basically
Yam Peleg: in a simple sentence, there is a trade off.
Alex Volkov: But for folks who can't see the screen, tell them what they're seeing.
Yam Peleg: Okay. You are looking at the world's first transformer ASIC, pretty much.
In simple words, as simple as possible: there is a trade-off when working on hardware. You print stuff on [01:09:00] silicon, it has a physical location, and there is a trade-off. You either have a more flexible chip that can do all sorts of instructions and all sorts of things, or you focus it on a specific use case.
For example, here, you focus it on a specific architecture. So it's less programmable, but you get a much, much more powerful chip for that one thing. So there is a trade-off. What I'm saying, and some people agree: transformers aren't going anywhere, in my opinion. People think they might go away one day; I don't think so. I think they're here to stay. And what we see here, this product, bets on exactly that, and pretty much burns transformers physically onto a chip, in silicon. And you can see the performance gains at the beginning of the landing page. You can see the big boom, the performance gains that are expected from this.
I just want to say, big is an understatement. Wait, hold
I just want to say, Big is an understatement. Wait, hold
Alex Volkov: up.
Yam Peleg: Oh, yeah. Big is an understatement. Oh,
Alex Volkov: Yeah. We're talking [01:10:00] about GPUs, H100s, or 8x B200s, the most insane machine that Jensen stood on stage with, NVIDIA being the biggest company in the world probably because of that specific thing, and they're getting 400 tokens per second on Llama 70B, maybe 500.
This is 500,000. This is, what, two or three orders of magnitude more? I'm not good with my OOMs, but this is a lot of OOMs. A lot. This is an insane jump in speed, because literally transformers are the whole thing. My question to you is: is just the transformer architecture on the chip, so you can load different models onto it, or is the model itself baked into the chip?
Look, I don't know the exact
Yam Peleg: specifics of exactly how it's implemented, but just from what I've read, it's pretty much designed for transformer-based models. So you can probably load whatever model you want, as long as it fits the architecture it supports.
I assume, for example, that [01:11:00] things like Mamba or state space models are probably not supported. But I don't know; I don't want to say anything that is not accurate. I suppose this is just not meant for that, and it's okay: this is exactly what you're getting, and what you're getting is the speed-up you all see.
I just want to point out two things. First, if you want to see the same thing that already happened a couple of years ago in a different field, look at crypto mining, Bitcoin mining specifically. Everyone started with GPUs, and then Bitcoin mining went all in on ASICs, and now you just cannot mine Bitcoin without them. I'm deliberately saying specifically Bitcoin, because it doesn't work for other coins: when you work with an ASIC, you are burning something specific into the silicon.
So if you're mining Bitcoin, you can't mine Ethereum. What I'm saying is that it already happened in a specific field: you niche [01:12:00] down your hardware and you get a tremendous amount of speed-up. Second thing I want to say: by the way, there is a large ecosystem of AI hardware here.
I know at least 12 startups working on AI hardware over here locally, so I've talked to some people about this already. It's not magic. Hardware is not magic. Usually when you see a huge boost in something, there is a cost. And I'm not saying that anything here is misleading.
It might be the best chip in the world. I'm just saying that it's not magic. For example, you cannot train with it. There might be latency, there might be latency for loading the model, it might consume a lot of energy. There are trade-offs for everything, because it's not magic; at the end of the day, hardware is hard. You print something and it's just physically there.
You can't just reprint whatever you want. But it's great to see, in my opinion. The field has matured so much that we're starting to see specific [01:13:00] hardware with neural network architectures burned into silicon. It's amazing to see, in my opinion. And yeah,
Alex Volkov: GPUs were made for games and we're still, we're like, they're no longer, obviously.
I wouldn't say
Yam Peleg: that. I wouldn't say that. I wouldn't say that about Jensen. They were popularized
Alex Volkov: by games.
Yam Peleg: yeah.
The Future of Training and Inference Chips
Yam Peleg: What I think is important to say is that GPUs and TPUs and so on are probably going to keep dominating training, while you're going to see inference chips like this competing with one another.
Just because. But I'm not sure you're going to see an ASIC for training, because that's pretty much Tensor Cores, and, if you think about it, that's pretty much a TPU already. Yeah.
Alex Volkov: Alright, LDJ and Nisten, we'd love to hear your thoughts super quick, and then we're gonna move on and chat with Dex about the human layer, and then we're gonna close out the stream, because we're almost at two hours.
Oasis Demo and Etched AI Controversy
LDJ: Yeah, so I think a big thing we didn't mention yet is the Oasis demo, the Minecraft AI demo a lot of people may have seen online. That was, [01:14:00] I think, a collaboration between Decart and Etched, the company making these ASICs, and it's not running on those chips yet. I don't think they have them in production or deployed yet, but they say the demo is currently running at 360p with a very short context length on H100s. Once they do have the Etched chips out and running these Minecraft AI simulations, they think they can get to 4K resolution in real time with multi-minute context length.
And so that will be really interesting, and actually something that people might just use for fun rather than a cool kind of gimmick, right?
Nisten Tahiraj: All right, are you guys ready for me to wrap this up?
Nisten's Skepticism on Etched AI
Alex Volkov: Yeah, Nisten, give us your take.
Nisten Tahiraj: I think Etched is... okay, so I'm biased a bit, because I've reviewed a lot of chip startups for other, pretty well-known investors.
And the vast majority is a lot of BS, so I am biased in that manner. And I think Etched is a complete scam, to be honest. It's not a real chip. It does not exist at [01:15:00] all. They just paid this other company, the Oasis folks, to make a joint blog post. It doesn't even run on their hardware.
They don't even have a demo board running on their hardware. It's all just hypothetical, just another VC-shiny-website product made to be sold to other investors, and this is not going to go anywhere. So far I don't think I've been wrong on these predictions. I think this whole thing is complete BS.
They have a small chance, now that they have 120 million in funding, to maybe find the right people to hire. It's just that, knowing actual FPGA devs and how small the community of electrical engineers who understand this stuff is, I'd say good luck, but yeah, this is not real. And if you look at pictures of the die of the H100 and then at the rendering that Etched has been using, you can see a lot of similarities.
It basically looks like someone took that into Blender and shaped things around. Just look at the actual H100 [01:16:00] die: it's basically that, with some freaking stars and stuff added in the middle. So that's going to be my public assessment.
Alex Volkov: We have Nisten on the show, folks. This is exactly right. Thank you.
This is exactly right. Thank you.
Nisten Tahiraj: I, I'm not affiliated with anybody. This is, this is what I think. And if I'm wrong, you can always just show me the working chip. that's,
Alex Volkov: Yeah. And if folks from Etched are listening to this and you want to prove Nisten wrong, show up with the chip. We would love to see it.
Yeah, I'm going to focus for a second on that.
Yeah.
LDJ: Could I push back on this a little bit? Sure.
Alex Volkov: One last one, folks. We need to move on. We actually do need to move on. Come on, one last one.
LDJ: Okay, I guess real quick: I think it's just meant to be an artist's interpretation, and maybe they should make it more clear on the website that it's not the real chip.
But they do seem to have some crack people on their team who are actually in the industry and experienced in chip development at places like NVIDIA and other companies. And the CEO himself wrote some of the backend for Apache TVM and some cool things like that. So I think people should look at the team behind the company, look more at details like that, and yeah, [01:17:00] I think it's a little hard to call them
Alex Volkov: Nisten has colorful language, but I will say, coming into this not knowing anything, I saw the website and I was like, oh shit, the ASIC is here.
Maybe I can buy this soon. Now, realizing that this is a 3D-designed render, a virtual thing, and maybe there's not a chip, this feels to me
Yam Peleg: Yeah, that's not right. You at least have to tell people. It's fine if you want to invest or raise money, but you need to clarify that this is the stage you're at.
This is the level, yeah. There are chips being
Alex Volkov: shipped. I got excited by the 500k, and I love the word blanket that Nissen told us specifically for me. Because 500k absolutely helped me start. If you also
Yam Peleg: specify numbers, where do numbers come from? numbers are.
Alex Volkov: Yeah, we like numbers.
Yam Peleg: Are a thing.
if these are numbers.
Nisten Tahiraj: It's very hard to make this, and I think we have to respect all the work that people like Cerebras and Tenstorrent are doing; they're actually shipping hardware. It's very hard to make these ASICs, and if you're just putting up PowerPoints, I think it doesn't acknowledge [01:18:00] the massive amount of work that the people who are shipping actual hardware put in.
And there are many challenges with actually doing it in hardware that even Cerebras is facing with getting longer context to work. The moment you leave the chip and go into memory, you're still faced with the memory wall problem. There's a big, famous video on Asianometry about the AI memory wall problem from a year or so ago; you should really watch that, so you understand what this actually entails.
And yeah, please don't just make claims, because I'm just going to call them out; I'm pretty unfiltered on the internet. And,
Alex Volkov: in a very unbiased way as well, which I appreciate you for.
Alright, folks, moving on. This was most of the coverage for this week. The last thing that we haven't covered, and don't necessarily have time for, is Mochi: the video model is now down to 12 gigabytes and will soon run on consumer hardware. And now I think we're moving on to chat with Dex about some of the stuff that I heard, and agent stuff as well.
And [01:19:00] then we're gonna, we're gonna close out after that. But, definitely stick with us because I'll do a recap of everything we've chatted about as well. Looks like my computer is like a little bit slow this time. I'm gonna send folks to the backstage so you guys can go and feel free to go to the restroom, whatever if you want.
Human Layer Introduction with Dex
Alex Volkov: Alright folks. Dexter, this is your first time on the show.
Dexter Horthy: Absolutely.
Alex Volkov: Welcome. Feel free to introduce yourself to the folks who have never heard from you and don't know who you are and what you do.
Building and Managing AI Agents
Dexter Horthy: Yeah, totally, thank you, and this has been super fun, super, thank you so much for having me on the show and, Yeah, I'll just say, for human layer, we're building an API and SDK that helps AI agents contact humans for help, feedback, approvals, whatever they need.
and it's really built for what I think is the future of AI agents, AI applications. I actually think the word agent is super overloaded. At this point, we've spent the last three months spending more time talking about what it means than, like, how to build them and, how to make them do useful things.
but what I believe is the future is what we call these, outer loop agents. Chat UI is very common for AI applications, but what I'd like to see and what I see more and [01:20:00] more people building are these agents that run out in the world, they're completely disconnected from a human, they're out there managing their own time, maybe even managing their own token spend, performing a variety of tasks, and when you take the human out of the direct supervision that you get in a chat UI, then you need new sets of tools.
And we were building these sorts of agents, at my previous startup. And we realized, oh, everyone building AI agents is eventually going to want to deploy them to the cloud in the background, and at that point, they're going to need ways to get back in touch with humans. we were building this SQL Warehouse Janitor bot that would drop a table if it hadn't been used in 90 days.
I'm like, I'm not 100 percent confident in this technology. We're a two-person startup, so I don't have time to spend three months getting all the evals and RLHF in place to guarantee it's never going to do something bad. Let's just rig it up with Slack: when it wants to do something scary, it'll ping us.
We can review it. We can respond. And, it's getting towards what I think is going to become a bigger and bigger paradigm, which is like UI less or App less, I've been calling it.
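To make that pattern concrete, here's a minimal sketch of that kind of approval gate. This is not the actual HumanLayer SDK, just the shape of the idea Dex describes, using the real slack_sdk library; the bot token, channel ID, and polling helper are made up for illustration:

```python
# Sketch of an outer-loop approval gate: the agent runs unattended, but
# blocks on a human reaction in Slack before doing anything scary.
# slack_sdk is real; the channel ID and token below are placeholders.
import time
from slack_sdk import WebClient

slack = WebClient(token="xoxb-...")      # bot token (placeholder)
APPROVALS_CHANNEL = "C0123456789"        # hypothetical approvals channel

def request_approval(action: str) -> bool:
    """Post the proposed action to Slack and wait for a thumbs-up or thumbs-down."""
    msg = slack.chat_postMessage(
        channel=APPROVALS_CHANNEL,
        text=f"Agent wants to run: {action}. React 👍 to approve, 👎 to reject.",
    )
    ts = msg["ts"]
    while True:
        reactions = slack.reactions_get(channel=APPROVALS_CHANNEL, timestamp=ts)["message"].get("reactions", [])
        names = {r["name"] for r in reactions}
        if "+1" in names:
            return True
        if "-1" in names:
            return False
        time.sleep(30)   # the agent just waits here; the human is free to do other things

def drop_table(table: str) -> None:
    """The SQL-warehouse-janitor example: gate the destructive action on a human."""
    if not request_approval(f"DROP TABLE {table}"):
        print(f"Human rejected dropping {table}; skipping.")
        return
    print(f"Dropping {table}...")         # the actual destructive call would go here
```

The point of a service like HumanLayer is that you don't hand-roll this loop, the retries, or the multi-approver logic for every agent you ship.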
Challenges and Innovations in AI Agent Development
Dexter Horthy: Where, because you have language model behind everything, you can build a [01:21:00] really great, really powerful application.
That has no web app, has no UI, and just communicates with people where they are. Whether it's over email, over Slack, over SMS, over WhatsApp. that's the dream of HumanLayer. we're building some agents,getting in the trenches to learn more about what is on the cutting edge. But, at the end of the day, we want to be the platform that enables people to build agents that do bigger things, more detached.
and that feel more like a true virtual teammate, a virtual coworker kind of experience.
Alex Volkov: Yeah.
Human Layer's Vision and Future
Alex Volkov: So I wanna, so thanks for the introduction Human Layer, your role in Human Layer.
Dexter Horthy: I am the founder, CEO, CTO. We build all the things.
Alex Volkov: Oh, awesome. awesome. how long have you been doing Human Layer for?
Dexter Horthy: So we started exploring this... I basically became obsessed with it around early August, spent two or three weeks building the MVP, came out to San Francisco, and launched it with the AI Tinkerers community. Another shoutout, by the way; shoutout to Joe. Joe's done an amazing job of cultivating a community where every single person you talk to is super technical and actively building [01:22:00] stuff.
There's none of the... you go hang out on Discord and there's a lot of hype people and stuff; you can see the lineage from crypto to meme stocks to NFTs, and now a lot of that is flowing into AI. Joe's events are super high signal. So we launched after three weeks and got a ton of pull, a ton of great conversations; we found our people, and that's when we decided to pivot full time to helping everybody build.
Alex Volkov: So here's how Dex ended up on the show. And folks, there's maybe a lesson for me in there, but definitely a lesson for folks who come up to me. We chatted for a brief period, like, you said "HumanLayer, whatever." Dex is taller than me, which is not very common, 6'3", and I was like, alright, let me chat with this tall dude.
And then at some point Dex was like, hey, maybe I can come on your show. And I was like See, now you've lost your chance. Usually when folks ask me to come on the show, that's usually when I'm saying, nope. What is the
Dexter Horthy: rule, right? If you ask to come on, then you can't come on, right? Yeah, usually.
Alex Volkov: And then, during the hackathon, the hackathon theme [01:23:00] was human in the loop.
So many folks built things where a human was required, this was like the theme of the hackathon, that you build something that runs and then the human, gets involved in some sort of way, which is Perfect for what you're building. and then, like many people use human layer and I think I saw a demo from you either, talking about this.
Yeah. I think you give a little demo. Yeah, and I looked at this and I was like, oh, that's super cool. And then we chatted and he actually showed me like a cool demo with Slack as well. And I was like, oh, that's super cool. And it was like, I would love to chat about this on the podcast. so this is the way, this is the way.
Show me something cool and I would love to share it with you folks. And what I loved about this specifically is the idea of: look, I want to run stuff, and I have a few projects in mind where I would definitely want this intelligence of mine, running somewhere, doing something, to trigger and reach out to me wherever I am out in the world. And you shouted out, hey, which platforms would you guys want?
And I immediately said Telegram, because Telegram is the [01:24:00] one I automate a lot, and the API is great and super easy; you could probably ship it in a day. Send me a Telegram message and say, hey Alex, what do? What do? I would love this from an EA, if I had an EA; I should probably get one.
I would love this from, from somebody that helps me on different things, and I would love this from my AIs that I would run, and this seemed to me the best way to achieve this. Now, here's my question to you. I could technically write this up. Feels like that,I could rig this up with some function calling.
I don't know exactly how, but, what do you say to folks who are, like, technical and they can, they think they can write this up? Because I have the answer, but I would love to hear what you
Dexter Horthy: answer. Yeah, it's something I thought about a lot in the first couple months of building this.
And it's like, cool, it'd take a couple weeks: we can rig up this API, we can catch webhooks, we can fire that back down to the agent. But I will tell you, after the last month or so of building things, it gets really complex in terms of AI agent context window management and figuring out which messages go to which agent and all that kind of [01:25:00] stuff.
The new thing I've been obsessed with in the last week is agents that operate over email. What I've wanted more than anything else for a long time is something where I can just forward an email to a specific email address, and that goes into an agent's context window. I got this email from our compliance provider that said, hey, you need to file your California annual compliance report, with a $250 penalty.
And I have a guy who does it, an operations contractor. He loves doing this kind of stuff, I don't know why; he's awesome. And I forward him the emails and then he can take it.
Alex Volkov: But he's
Dexter Horthy: part time and I'm sometimes wondering like what he's working on or when things are. what I really want to do is like when I get one of those things is I go into Linear which we use for all of our projects and I create a ticket and then I ping him on the ticket and then I also forward him the email.
And it's just enough work; it's not hard, but it's enough work to take me out of flow when I'm just trying to grind through my email inbox. So what I'd love to have, and what I'm probably working on this afternoon, is: I forward an email to some special address, it goes into an agent, that agent calls some tools and does some stuff, and then when it's ready to do the scary thing, like create a ticket or send someone else an email on my behalf, [01:26:00] it gets back to me on email and says, hey, here's what I'm going to do, and I can just explicitly approve or reject that action.
And if it has questions, it emails me back as well, so I can hold a conversation with the agent asynchronously while it's running out in the world. That's one of the core problems that I've been really excited about recently.
Alex Volkov: Yeah,
Dexter Horthy: I think that answers your question.
Alex Volkov: You got there and then you moved into an email thing, and I appreciate this because I have multiple emails.
I would love to just send and for additional context, but
Dexter Horthy: Yeah, back to why wouldn't you just build this yourself: once you want to do Slack and email, and maybe multiplex across multiple people and say all of these two people and any of these four people have to approve before it can go through, it becomes a question of,
Do I really want to spend all my time building approval flows or do I want to build my product?
Alex Volkov: That's my answer. Weights & Biases Weave, which is an observability framework, which you should absolutely use: people have been doing this for themselves, right? Either people are logging and saving these logs in log files, or people are dumping this into a file somewhere, whatever.
Some people are [01:27:00] building a rudimentary UI on top of it. And some people ask, why do I need this? But with the feature set we've gotten to, there's no way for a company to build both this and their product. It's just not possible, because of the amount of complexity, the fact that we're always online with an SLA, all these things.
It's basically infrastructure as a service, with a great UI on top of it. So I absolutely have an answer to this question; I just wanted you to verbalize it. But then you pivoted to something I would love to talk about: Open Interpreter. Have you heard about Open Interpreter?
Dexter Horthy: Yes, I've heard of it a bunch of times; remind me which one that
Alex Volkov: is. Open Interpreter is software that runs on your Mac and executes commands on your Mac, like computer use, but on your Mac, doing things on your Mac. It's also from the Tinkerers community in Seattle. And there's computer use from Anthropic. All of these have something that runs on your computer, and it's scary as hell.
And the way Anthropic dealt with this, they said: hey, we're releasing this agent, or not even an agent, the model that knows how to do this, but we don't give you a tool to do it. We give you a Docker container, only for developers, and it only runs in Docker. They're super scared about what this could do, so they put it in Docker, and it asks you about some stuff, but mostly it doesn't ask you anything.
And it expects you to be there while it does its thing. Yeah. Open Interpreter and a lot of the ChatGPT-style tools are the same: they expect you to watch the thing as it scrolls.
Dexter Horthy: Yeah. You click a button to approve the risk things
Alex Volkov: in the chat. Yes. and you wait there while it does its thing. And I hate it.
I'm done waiting. I don't want to wait. And you're talking about async.
Dexter Horthy: Yeah.
Alex Volkov: Async means I am free to do whatever I want until it needs me. You're flipping the formula, basically: the agent waits for me, and I'm not waiting on the agent. That's what I found novel, and I think this is the reason I wanted to talk to you.
I would love to hear more about this. Devin kind of also does that a little bit: Devin goes and does its own thing, and you're like, alright, I'm going to go away and come back, and there's a [01:29:00] full stack of software. I don't have access to Devin, by the way; Stephen, please give me access.
But this is at least the approach of these newer tools, like Cursor: they go, they do a bunch of things, and then you come back. I definitely want a world where this happens more, where I don't have to wait for them. From the stuff that you've built,
what actually is practical in the real world for you at this point? What have you been able to take off your plate because you now have stuff that's asynchronous, so you can go and enjoy your life?
Dexter Horthy: Yeah, great question. So, things like lead research. I think building a robo-SDR for very specific use cases, to go search very specific places where your users are:
we're able to find leads, go scrape a couple of webpages to get their email or contact info, and go all the way to the point of drafting the email. And that's when I get a Slack message that says, hey, here's the person, here's what I'm thinking, and here's what I'm going to send them.
And I've gotten to the point, after working with it for a couple of weeks... at the beginning I was rejecting almost half of the messages: this is wrong, or the tone is weird, or, even when the bot doesn't hallucinate, even when it calls the function right, sometimes it just sounds weird, [01:30:00]
bad. I'm not content to just let those go out; I review them. And I'm the type of person who doesn't have the patience to build a giant eval set and test and test and test. I want to YOLO things out, keep an eye on them, and make sure nothing bad is going to happen. And then, as it's operating in the real world on production data, I'm constantly, every couple of days, going in and updating the prompts and tuning it.
And now it's to the point where I accept 90 percent of the messages.
Alex Volkov: 90%? Yeah. What changed?
Dexter Horthy: I just updated the prompt a bunch.
Alex Volkov: Oh, so you keep approving and rejecting, and as you see what happens, you learn from the examples.
Dexter Horthy: I'm seeing every single thing it gets wrong, and eventually we'd like to build features to automate that.
Yeah. It feels like you throw the bot in a channel with your four salespeople, and they're reviewing all of that, and suddenly you've created very highly motivated data labelers. Because if they don't give feedback to the bot in the moment, an email is going to go out with their name on it, and they don't want that.
Alex Volkov: Yeah. So first of all, yes, absolutely, they have human labeling built in [01:31:00] there. I'm thinking about where the automatic wheel of improvement is, so you don't have to go and change the prompt yourself. To follow up on one thing: when I discussed the project I want to build with you, you were talking specifically about approval or denial of actions.
What I wanted to chat with you about is additional context.
Dexter Horthy: Yeah.
Alex Volkov: Approval or denial is some sort of context. In my head, for the project that I wanted to build, and I'll just tell you folks, in case somebody wants to build this: we have a GitHub repo for Weave.
In that GitHub repo, folks are working like crazy. They release a bunch of stuff, they cut a release, and the release notes are vast. Developers can't be bothered to write a nicely formatted message about what they built; sometimes it's connected to a Jira ticket, that's another URL, sometimes whatever. And I want to automate some of that, to be able to deliver to developers who read those notes, or to users who use Weave, a nicely formatted changelog that makes sense for them, maybe even with a screenshot. I think you suggested Browserbase or whatever, even a [01:32:00] screenshot of what's new.
On a practical level, I want to automate this with AI. I think it should be possible. In my head, ingesting some of this and then running through some of the links is already automatable to an extent, but then I want a human in the loop to review it. The human in the loop, or maybe multiple humans in the loop, like maybe the product manager, etc.
Because it's obvious that the sloppy machine can hallucinate and just say something wrong, and then we're like, oh shit, no, we didn't release this, sorry folks. So you can't automate this fully; you can go all the way to preparing a draft, which you then want to approve or reject. But along the way, I want additional context from the folks.
Maybe I want to find the developer who's tagged on the GitHub issue, etc. How are you thinking about additional context on the way to approval? Is that a feature that's built in, or is that just a use case of HumanLayer?
Dexter Horthy: Yeah, it's a great question. These are the two main form factors we've built in. And I'll add: in the approve-or-reject workflow, when you reject something you're required to give feedback, because the only way you're going to get a better answer is if you tell the LLM why it's wrong, so it can try again and usually incorporate that [01:33:00] feedback.
That one's transparent to the LLM: the LLM thinks it's sending an email. It might not even know that human approval is happening until the rejection comes in. But the other one is when the LLM outputs a tool call, a structured output that says, hey, I need help here.
I've been told in the prompt to get feedback, or I've tried calling a bunch of tools and none of them are working and I'm just stuck. Or, I'm in a browser session, I need a 2FA code, and I need to pull in a human to get me unstuck so I can proceed.
Alex Volkov: Oh, that's one, just, oh, cool, okay.
Dexter Horthy: Yeah, but there the LLM is deciding to call a human. HumanLayer lets you expose a variety of tools: you can give it one tool to contact one person, or you can say, hey, here are five tools, here's how you contact the head of ops on Slack, here's how you email the CEO, whatever roles you want to give the LLM access to, you can code that in.
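Here's a sketch of that second direction, exposing a human as a tool the model can choose to call when it gets stuck. The tool name and schema below are hypothetical, just to show the shape; in practice HumanLayer wires up the actual delivery (Slack, email) and the wait-for-reply part:

```python
# Sketch of "human as a tool": the model can choose to contact a person
# when it's stuck (e.g. it needs a 2FA code). Tool name and schema are
# illustrative, not an actual HumanLayer or vendor API.
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "contact_head_of_ops_on_slack",   # hypothetical tool name
            "description": "Ask the head of ops a question on Slack and wait for their reply.",
            "parameters": {
                "type": "object",
                "properties": {"question": {"type": "string"}},
                "required": ["question"],
            },
        },
    },
]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Log in to the vendor portal and export the invoices."}],
    tools=tools,
)

# If the model decides it needs a human (say, for a 2FA code), it emits a tool call;
# the surrounding runtime would route that to Slack and feed the answer back in.
for call in resp.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(f"Model wants a human: {call.function.name}({args})")
```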
Alex Volkov: Awesome. And additional context as well. Alright, Dex, tell me where HumanLayer is going from here, how folks can find you, and how they can use this, and we'll close there.
Dexter Horthy: Yeah, I'm dexhorthy on [01:34:00] Twitter, you can find me on LinkedIn, and we are online at humanlayer.dev. And, again, I love the AI Tinkerers community. At some point we're building a venture-backed business, so we have to find a way to make money.
But my favorite thing is getting in the weeds with indie hackers who have really interesting use cases, who are doing really groundbreaking things with AI. And on the "where we're going" part: I was going to try not to say the A-word, but I'll put it in this context. Sam keeps talking about how AGI is going to look something like a senior engineer you can just give some tasks, and it'll go off, figure them out, and come back to you with an answer.
And I think what we have today is more like summer interns. With an intern, I would say: hey, look, go research all this stuff, draft a bunch of emails, and then come sit with me, and I'm going to review them with you, and then we can send them out, and you'll learn over time from those reviewing and coaching sessions.
And I think that is a big step towards something that feels a lot more like a virtual teammate, like a virtual coworker. And so the future is, we want to be at the core of [01:35:00] the next generation of really awesome AI experiences.
Alex Volkov: And the middleware-layer stuff, yeah, the stuff that sits in the middle. That specifically is, oh, exactly what I need for some of my projects as well. Agents mean a lot of things, and for a lot of them, getting to a stop-and-ask-the-human point is something people maybe don't even know they need.
And when they do need it, they don't even realize it.
Dexter Horthy: When people try it for the first time, they're like, oh, now I get it. And then they text me two days later, and it's it was crazy. I had an agent running on my laptop, and I went out to get food, and it slacked me about something that I needed to pay attention to. And I was able to pull right back in.
And I was like, that's the magic. And we want to build more of that.
Dexter Horthy: Yeah, so we just launched email this week. We've had Slack for a while; that one's the tightest and most thought-through. Sounds like we need Telegram for your stuff to work. And then we're looking at Discord. As soon as someone offers me enough money, we'll build MS Teams support.
I don't know when that's going to happen. Yeah. And then we also have a private alpha for RCS, SMS, WhatsApp, these more consumer-facing channels.
Alex Volkov: Yeah. [01:36:00] Messages supports RCS now, or whatever format they now support with Android. RCS is, yeah, it's going to be in iMessage
Dexter Horthy: soon. I think by the end of the year all the carriers are planning to have RCS going.
Alex Volkov: Amazing. Alright, Dex, thank you for coming to ThursdAI. Folks, this was Dex from HumanLayer; go follow him wherever he is. If this fits your agent use case, let us know, and give HumanLayer a shout out. As a reminder, if I ever do sponsored content on ThursdAI, you will know. I just met Dex, I thought it was cool, and I thought many of you would enjoy this on the stage and maybe for your projects.
Let me bring some of the folks back up.
Recap and Closing Remarks
Alex Volkov: Wolfram and Nisten, as a kind of recap... my Mac seems to not want me to do this. I'll bring you folks back up. LDJ looks like he dropped. Alright, folks, we've been at this for two hours and something. Yam, welcome back up, and LDJ will join us as well.
Let's do a little recap, super quick, and then we're going to let you go for the rest of Thursday. We did not get O1, did we?
Nisten Tahiraj: Oh, breaking news as well: Nous Research [01:37:00] announced you can just chat with their model now. You can go to hermes.nousresearch.com, that's H-E-R-M-E-S dot nousresearch dot
com. And I think that's their 70B. I'm trying it now; it's pretty good. Yeah, it's just their... Is that Hermes 3? The 70B, yeah. So they launched their own UI, and I think you can... Wait, is that Forge? Yeah, I think so. I don't know. Hermes. It looks like an app, a Next.js app. Alright, fine. You can maybe look
Alex Volkov: at
Nisten Tahiraj: the code.
Alex Volkov: Hermes Chat. It doesn't look like it's Forge. Okay. I'm waiting for Forge; Forge is the one that I want. But let's see if this is it. Let me log in.
Yam Peleg: And it's like free of charge. I can just
Nisten Tahiraj: Yeah, you can just I can just
Yam Peleg: Alright.
Nisten Tahiraj: It works. Cool. I just dumped in a prompt. I literally just tried it.
yeah. I
Yam Peleg: tried it now. Yeah. It's pretty good to me.
Alex Volkov: Ah, wait. Okay, so does somebody want to show it? I wish I could. Yeah. Hahaha. Alright. Where's the reasoning stuff? Is this it? I keep waiting for Forge; I don't know if this is the Forge stuff. System prompt is [01:38:00] active. No, I don't see reasoning here.
Okay, so I'll be waiting for Forge from Nous Research, where you can actually see the reasoning steps; that's something they've announced. So shout out to Hermes, we're going to add this to the show notes. They are going to
Yam Peleg: collect such great data with this. Yes. This is
Alex Volkov: such a great mechanism for collecting data.
All right, folks. Have a happy Thursday. We'll see you next week. Bye bye. Have a great week. Bye.
Dexter Horthy: Thanks, y'all.
📆 ThursdAI - Nov 7 - Video version, full o1 was given and taken away, Anthropic price hike-u, halloween 💀 recap & more AI news