ThursdAI - The top AI news from the past week

🔥 ThursdAI - Feb 15, 2024 - OpenAI changes the Video Game, Google changes the Context game, and other AI news from past week

0:00

-1:57:36

🔥 ThursdAI - Feb 15, 2024 - OpenAI changes the Video Game, Google changes the Context game, and other AI news from past week

From Weights & Biases, I bring you one of the craziest AI weeks (well days TBH) since I started writing this newsletter. Also, notes from my SF trip, 3 deeper conversations and more AI news

Alex Volkov

Feb 16, 2024

Holy SH*T,

These two words have been said on this episode multiple times, way more than ever before I want to say, and it's because we got 2 incredible exciting breaking news announcements in a very very short amount of time (in the span of 3 hours) and the OpenAI announcement came as we were recording the space, so you'll get to hear a live reaction of ours to this insanity.

We also had 3 deep-dives, which I am posting on this weeks episode, we chatted with Yi Tay and Max Bane from Reka, which trained and released a few new foundational multi modal models this week, and with Dome and Pablo from Stability who released a new diffusion model called Stable Cascade, and finally had a great time hanging with Swyx (from Latent space) and finally got a chance to turn the microphone back at him, and had a conversation about Swyx background, Latent Space, and AI Engineer.

I was also very happy to be in SF today of all days, as my day is not over yet, there's still an event which we Cohost together with A16Z, folks from Nous Research, Ollama and a bunch of other great folks, just look at all these logos! Open Source FTW 👏

This picture is taken as I’m writing this! 🎉

TL;DR of all topics covered:

Breaking AI News
- 🔥 OpenAI releases SORA - text to video generation (Sora Blogpost with examples)
- 🔥 Google teases Gemini 1.5 with a whopping 1 MILLION tokens context window (X, Blog)
Open Source LLMs
- Nvidia releases Chat With RTX local models (Blog, Download)
- Cohere open sources Aya 101 - 101 languages supporting 12.8B model (X, HuggingFace)
- Nomic releases Nomic Embed 1.5 + with Matryoshka embeddings (X)
Big CO LLMs + APIs
- Andrej Karpathy leaves OpenAI (Announcement)
- OpenAI adds memory to chatGPT (X)
This weeks Buzz (What I learned at WandB this week)
- We launched a new course with Hamel Husain on enterprise model management (Course)
Vision & Video
- Reka releases Reka-Flash, 21B & Reka Edge MM models (Blog, Demo)
Voice & Audio
- WhisperKit runs on WatchOS now! (X)
AI Art & Diffusion & 3D
- Stability releases Stable Casdade - new AI model based on Würstchen v3 (Blog, Demo)
Tools & Others
- Goody2ai - A very good and aligned AI that does NOT want to break the rules (try it)

🔥 Let's start with Breaking News (in the order of how they happened)

Google teases Gemini 1.5 with a whopping 1M context window

This morning, Jeff Dean released a thread, full of crazy multi modal examples of their new 1.5 Gemini model, which can handle up to 1M tokens in the context window. The closest to that model so far was Claude 2.1 and that was not multi modal. They also claim they are researching up to 10M tokens in the context window.

The thread was chock full of great examples, some of which highlighted the multimodality of this incredible model, like being able to pinpoint and give a timestamp of an exact moment in an hour long movie, just by getting a sketch as input. This, honestly blew me away. They were able to use the incredible large context window, break down the WHOLE 1 hour movie to frames and provide additional text tokens on top of it, and the model had near perfect recall.

They used Greg Kamradt needle in the haystack analysis on text, video and audio and showed incredible recall, near perfect which highlights how much advancement we got in the area of context windows. Just for reference, less than a year ago, we had this chart from Mosaic when they released MPT. This graph Y axis at 60K the above graph is 1 MILLION and we're less than a year apart, not only that, Gemini Pro 1.5 is also multi modal

I got to give promps to the Gemini team, this is quite a huge leap for them, and for the rest of the industry, this is a significant jump in what users will expect going forward! No longer will we be told "hey, your context is too long" 🤞

A friend of the pod Enrico Shipolle joined the stage, you may remember him from our deep dive into extending Llama context window to 128K and showed that a bunch of new research makes all this possible also for open source, so we're waiting for OSS to catch up to the big G.

I will sum up with this, Google is the big dog here, they invented transformers, they worked on this for a long time, and it's amazing to see them show up like this, like they used to do, and blow us away! Kudos 👏

OpenAI teases SORA - a new giant leap in text to video generation

You know what? I will not write any analysis, I will just post a link to the blogpost and upload some videos that the fine folks at OpenAI just started releasing out of the blue.

You can see a ton more videos on Sam twitter and on the official SORA website

Honestly I was so impressed with all of them, that I downloaded a bunch and edited them all into the trailer for the show!

Open Source LLMs

Nvidia releases Chat With RTX

Chat With Notes, Documents, and Video

Using Gradio interface and packing 2 local modals, Nvidia releases a bundle with open source AI packaged, including RAG and even Youtube transcriptions chat!

Chat with RTX supports various file formats, including text, pdf, doc/docx, and xml. Simply point the application at the folder containing your files and it'll load them into the library in a matter of seconds. Additionally, you can provide the url of a YouTube playlist and the app will load the transcriptions of the videos in the playlist, enabling you to query the content they cover.

Chat for Developers

The Chat with RTX tech demo is built from the TensorRT-LLM RAG developer reference project available from GitHub. Developers can use that reference to develop and deploy their own RAG-based applications for RTX, accelerated by TensorRT-LLM.

This weeks Buzz (What I learned with WandB this week)

We just released a new course! Hamel Hussein released a course on enterprise model management!

Course name: Enterprise Model Management
Course Link: wandb.me/emm-course
Who is this for: The course is targeted at enterprise ML practitioners working with models: MLOps engineers, ML team leaders, ML engineers. It shows both at conceptual and technical level how to get the most value of W&B Model Registry and automations. Attached is also a screenshot of a slide from the course on what different personas (MLOps, ML exec etc) get from Model Registry.
What can they expect: Learn how to store, version, and evaluate models like top enterprise companies today, using an LLM training & evaluation example. Big value props: improved compliance, collaboration, and disciplined model development.

Vision & Video

Reka releases Reka Flash and Reka Edge multimodal models

Reka was co-founded by Yi Tay, previously from DeepMind, trained and released 2 foundational multimodal models, I tried them and was blown away by the ability of the multi-modals to not only understand text and perform VERY well on metrics (73.5 MMLU / 65.2 on HumanEval) but also boasts incredible (honestly, never before seen by me) multi modal capabilities, including understanding video!

Here's a thread of me getting my head continuously blown away by the quality of the tonality of this multimodality (sorry...😅)

I uploaded a bunch of video examples and was blown away, it understands tonality (with the dive dive Diiiiive example) understands scene boundaries, and does incredible OCR between scenes (the Jason/Alex example from speakers)

AI Art & Diffusion

Stable Cascade (link)

Stability AI introduced a new text-to-image generation model called Stable Cascade that uses a three-stage approach to produce high-quality images with a compressed latent space, making it more efficient to train and use than previous models. It achieved better results than other models in evaluations while having faster inference speeds. The company released code to train, fine-tune, and use control models like inpainting with Stable Cascade to enable further customization and experimentation. Stability AI aims to lower barriers to AI development through models like this one.

Nate did a comparison between a much slower SDXL and Stable Cascade here:

Here’s the transcript for the whole episode, you definitely should check it out! It was really one of the coolest shows we had, and we had over 2K folks listening in!

[00:00:00] Alex Volkov: Hey, this is Alex Volkov, you're on ThursdAI, and I just gotta record this intro real quick, because today marks one of the more singular days in AI that I remember since I started recording ThursdAIs, which was itself a singular day, March 14th, 11 months ago, when GPT 4 was released and announced. We since then had a few days like this GPT Dev Day was one such day, and today marks another one.

[00:00:38] Alex Volkov: Google has released an update to their model, talking about 1 million tokens in the context window, basically unlimited. And then, just a few, just an hour or two later, OpenAI said, you know what, we also have something in store, and released the most incredible jump. Incapability of video generation, text to video generation.

[00:01:02] Alex Volkov: It's called SORA, and what you hear is us recording live, knowing only about Google, which came out an hour and a half before we started recording, and then somewhere in the middle, I think minute 35 or something, you'll hear our live reaction to the Incredibly mind blowing advancement in text to video that OpenAI just released.

[00:01:31] Alex Volkov: And I just wanted to record this as I'm finishing up the editing and about to start writing the newsletter, to say, days like this really are the reason why I'm all in on AI and I'm very excited about the changes and advancements.

[00:01:49] Alex Volkov: And I'm sure there will be more days like this going forward. We've yet to see what Apple came up with, we've yet to really see what Meta comes up with Llama 3, etc. And, yeah, I just wish you enjoyed this and I don't have a lot of words here besides just letting you listen to the rest of the episode and say that I was very happy to be in San Francisco for this, the place where most of this happens, and I was very happy to be in company of good friends, both in the virtual world those on stage in our Twitter live recording, and I was sitting across from Swyx, a friend of mine with whom I recorded an interview at the end of this, you can hear.

[00:02:30] Alex Volkov: I just couldn't let go of this chance. We also had a conversation, besides the updates and the breaking news, we also had conversations with the folks who worked on some of the stuff we talked about. I interviewed Yi Te and Max Bain from RECA, which you'll hear later, and the deep dive into RECA multimodal models, which blew me away just yesterday.

[00:02:52] Alex Volkov: And so my head kept getting blown away this week. And I also interviewed The folks who built Stable Cascade, a new stability model that outperforms the existing stability models. Dome, and Pablo. And all of those were great conversations, in addition to just generally the folks who joined me from week to week, Nisten and Far El and Alignment Lab, and we had Robert Scoble join us, with whom I've been buddying up since Vision Pro was released, as he was expecting, and that blew me away just a week ago.

[00:03:23] Alex Volkov: And I'm very excited to share with you this whole thing, and I hope that Yeah, I hope you enjoyed this as much as I do, and I hope that you enjoyed listening to these as much as I enjoy making them. And if you are, just share them with a friend, it would really help. And give us a 5 star review on Apple.

[00:03:38] Alex Volkov: This would great, gratefully help. With that, I'll give you the ThursdAI thing.

[00:03:43] Alex Volkov: All right, let's go. How's it going, everyone? Welcome to ThursdAI. Today is February 15th, and it's quite a day in the AI updates that we've had so far. Quite a day. Even today, this morning, we had like a bunch of updates. But besides those, we had quite a crazy week as well very interesting show today, very interesting show today.

[00:04:13] Alex Volkov: My name is Alex Volkov, I'm an AI evangelist with Weights Biases, and right now I'm getting my picture selfie taken by my today's co host, Swyx. Welcome,

[00:04:23] Swyx: Hey, hey, hey. Good morning, everyone.

[00:04:25] Alex Volkov: And we're in the Latent Space Studio in San Francisco. I flew in just last night. And as I was flying in, there was more news happening. So we're going to cover all of this.

[00:04:34] Alex Volkov: We have a very exciting show today. We have a bunch of guests, special guests that are coming on the second hour of this. So hopefully we'll see folks from the RECA models, and hopefully we'll see some folks from Stability. We're going to get to talk about Google and everything in between. So meanwhile, settle in.

[00:04:50] Alex Volkov: This is going to be a great show today in San Francisco. And maybe I'll also probably share with you why I Flew in here today. That's gonna come up next. So welcome to ThursdAI and we're gonna get started. All right there. Let's get started. Let me Smoothly fade out the music, say hi to everyone here on stage. Hey, Nisten, welcome. We have Robert Skobul over here, folks. We've been, we've been more, more friendly lately than usual because Robert and I are both members of the VisionPro cult. I think that's what you call it, Robert.

[00:05:37] Alex Volkov: But today is, today's the space for, for AI. But Robert you've been covering AI on your feed as well for, for a long time. We have, obviously Swyx is on stage, but also in front of me, which is super cool. And it's been a while, brother. It's great, you just flew back from

[00:05:51] Swyx: Singapore.

[00:05:52] Swyx: Yeah, Chinese New Year.

[00:05:53] Alex Volkov: Are you jet lagged at all or are you good?

[00:05:55] Swyx: I'm good actually. I have had very little sleep, but for some reason that always helps with the jet lag.

[00:06:00] Alex Volkov: Yes, awesome. And I also want to say hi to Alignment Labs, Austin and Far El as well, folks who are working on open source models, and we usually cover a bunch of stuff that they're doing, and usual co hosts and experts here on ThursdAI.

[00:06:11] Alex Volkov: So if you never joined ThursdAI before, just a brief kind of recap of what we're doing. As I said before, my name is Alex Volkov. I'm an AI evangelist with Weights Biases. It's always so fun to say. And Weights Biases is a company that is basically helping all these companies build their AI models, and it's super cool.

[00:06:26] Alex Volkov: And I flew in, I went to the office last night, and I have some cool videos to share with you from the office as well.

[00:06:32] Alex Volkov: and this

[00:06:33] Alex Volkov: is ThursdAI. ThursdAI is a Twitter space and newsletter and podcast that I started a year ago. And then slowly this built a community of fine folks who show up to talk about everything that happened in the world of AI for the past week.

[00:06:46] Alex Volkov: And there hasn't been many weeks like this last week that highlight how important and how cool ThursdAI actually is. Because we just had So much, so much to cover today and usually I start the space with a roundup of the stuff that we're going to run through just for folks who are not patient, don't have a lot of time and we're going to just run through everything we're going to talk about and then we're going to dive deep because we have some breaking news and I even have, hopefully, I have my breaking news button.

[00:07:16] Alex Volkov: Oh, I don't. Oh my God. Okay.

[00:07:17] Swyx: Oh no.

[00:07:17] Alex Volkov: I'm not set up for a breaking news button, but it's fine.

[00:07:20] Alex Volkov: We'll imagine this. I'm going to put this in the, in the, in the post edit. With that said, are you guys ready for a brief recap? Let's go to a brief recap.

[00:07:27] Recap and TL;DR

[00:07:27] Alex Volkov: Alright, folks, back for the recap. Today is Thursday. ThursdAI, February 15th. This is a recap of everything we talked about. And, ooh, boy, this was one of the worst days to be caught outside of my own personal production studio because my, my breaking news button didn't make it all the way here. And there was so much breaking news.

[00:07:57] Alex Volkov: So obviously as I woke up, the biggest breaking news of today was Ai. Actually cannot decide what was the biggest breaking news. So the first piece of breaking news from today was Google releasing a teaser of Gemini 1. 5. And 1. 5 was not only a continuation of Gemini Pro that we got last week, 1. 5 actually was teased with up to 1 million, a whopping 1 [00:08:20] million tokens in the context window, which is incredible.

[00:08:23] Alex Volkov: It's just for comparison, JGPT is currently at 128 and cloud to the best. Highest offering up until Gemini was 200k with Entropic Cloud Advanced and Google teased this out of the gate with 1 million token and their claim they have up to 10 million tokens of context window in in in the demos, which is incredible.

[00:08:44] Alex Volkov: And they've shown a bunch of demos. They did the needle in the haystack analysis that we've talked about from Greg Cumbrand and just quite an incredible release from them. They talked about that you can put a whole like hour of a movie of Dustin Keaton, I think it's called. And then you can actually ask questions about the movie and we'll give you the exact.

[00:09:03] Alex Volkov: Timestamp of something happens. They talked about it being multimodal where you can provide a sketch and say, Hey, when this, this scene happened, it will pull out just like incredibly like magic, mind blowing, mind blowing stuff. And all of this needs a lot of context because you take this, you take this video, you turn it into images, you send this into context.

[00:09:22] Alex Volkov: They also talked about, you can send 10 hours of audio within one prompt and then some ad, And the quality of retrieval is very, very high. You're talking about like 90 plus percentage, 95 plus percentage in the haystack, which is incredible. Again, we had Enrico Cipolla, a friend of the pod who worked on the Yarn paper and the rope methods before extending the LLAMA context.

[00:09:46] Alex Volkov: And he brought like four papers or something that show that open source is actually unlocking this ability as well. And not only today was a credible day just generally, but not only Google talked about a large context window, we also saw that Nat Friedman and Daniel Gross just invested 100 million in a company called Magic, that they also talk about multimodality and large context window up to 1 million as well.

[00:10:08] Alex Volkov: So it was very interesting. To see both of them release on the same day as well. We then geeked out about Gemini. We talked about Andre Karpathy leaving open AI and, and invited him to come to Thursday AI and latent space as well. And then we also mentioned the OpenAI ads, memory and personalization to charge G bt, which is super cool.

[00:10:25] Alex Volkov: They didn't release it to many people. Yeah, but personalization is my personal thread of 2024 because these models, especially with the larger, larger context window with personal per perfect recall, these models will. become our buddies that will remember everything about us, specifically, especially tied into different devices.

[00:10:43] Alex Volkov: Like the tab that's somewhere here behind me is getting built in San Francisco. We, we briefly mentioned that NVIDIA released the chat with RTX local models that you can download and run your NVIDIA GPUs. It has rack built in. It has a chat with YouTube videos and super cool. We talked about Cohere release and AYA 101 multimodal.

[00:11:01] Alex Volkov: And our friend of the pod Far El was talking about how he wasn't finding like super impressive. Unfortunately, He dropped in the middle of this. Apologies for El, but Cohere released a big multi model, which is also pretty cool. We mentioned that NOMIC, our friends at NOMIC, which we mentioned last week, released open source embeddings.

[00:11:17] Alex Volkov: If you guys remember, they released an update to those embeddings, NOMIC Embed 1. 5 with Matryoshka embeddings. Matryoshka. is obviously the name for the Russian doll that like sits one inside each other. And we're going to actually talk with the authors of the Matryoshka paper in not the next Thursday, the next after that.

[00:11:34] Alex Volkov: So we're going to cover Matryoshka but it's what OpenAI apparently used, not apparently, confirmed they used to reduce dimensions in the API for embeddings. Super cool. We're going to dive deep into this. As we're going to learn, I'm going to learn, you're going to learn. It's going to be super cool.

[00:11:48] Alex Volkov: And as we're talking about OpenAI I got a ping on my phone because I'm subscribed to all updates from their main account and we had a collective holy shit moment. Everybody's jaw was on the floor because OpenAI just released Sora, which is a foundational video model, text to video model, that just blew us the F away, pardon my French, because of the consistency.

[00:12:08] Alex Volkov: So if and if you've seen The how should I say the area of video generation has been has been evolving fairly quickly, but not as quick as what we just saw. We saw first we saw attempts at taking stable diffusion rendering frame by frame and the consistency wasn't there. It was moving from one to to another, like the face would change and everything.

[00:12:30] Alex Volkov: You guys saw this, right? So we moved from the hallucinatory kind of videos to Towards consistency videos where stable diffusion recently released and gave us SVD, which was like one to two to three seconds videos. Runway ML gives you the option to choose where the video is going to go. If it's going to be zoom in like brushes, all these things.

[00:12:49] Alex Volkov: And now all of them seem just so futile because open the eyes, Sora, can generate up to 60 seconds of a video. And honestly, we were sitting here just watching all of us just open the Sora website, and we were just mind blown away by the consistency and the complexity of the scenes that you can generate, the reflections.

[00:13:06] Alex Volkov: There was one scene where a woman was walking through the, a very busy street in Japan, and her coat stays the same, her face stays the same. There's another where a Dalmatian dog climbs out of one window and jumps into another. All the spots on the Dalmatian are perfect. perfectly in balance the legs are it's it's really unbelievable how high quality of a thing OpenAI released and what's unbelievable to me also is that The jump from what we saw in video to the open source stuff, or even the runway stuff and Pico stuff, the jump in fidelity, in quality, in consistency, is so much higher than the jump from like 200, 000 tokens to 1 million tokens that Google did.

[00:13:44] Alex Volkov: So it does feel like some folks in OpenAI sat there and said, Hey, Google just released something. It's super cool. It's picking up attention on Twitter. Let's release something else that we have behind the scenes. It looked super polished. So shout out to the folks who worked on Sora. It's really, if you haven't seen The videos, you'll see them in show notes and definitely you'll see them everywhere because Hollywood is about to get seriously, seriously disrupted with the, just the level of quality is amazing.

[00:14:08] Alex Volkov: Compare this with all the vision and, and, and sound stuff. I, moving back to the recap, I'm getting excited again. We also, then we talked about Reka and Reka Flash and Reka Edge from a company called Reka AI. And then, as I love bringing the people who actually built. the thing to talk about the thing.

[00:14:23] Alex Volkov: So we had Yitei and we had Max as well from Reka. Max made for Reka to talk to us about their multimodels. I was very, very impressed with Reka's multimodal understanding. And I think this model compared to Gemini Pro, which is probably huge and runs all the GPUs and TPUs. This model is 21 billion and Reka Edge is even smaller.

[00:14:41] Alex Volkov: And yet it was able to understand my videos to an extent that even surprised the guys who were the co founders of the company. It understood tonality, understood text. And audio in a very specific and interesting way. So we had a conversation with the RECA folks and continuing on this thread. We also had a new model from Stability called Stable Cascade that is significantly faster than SDXL and generates hands and text out of the blue.

[00:15:07] Alex Volkov: It's based on something called Worst Chen, which we learned is a hot dog today. And we had the folks that work behind this, Dom and I'm blanking on the name of the other author that joined. I apologize. It was a very exciting day. So we had a conversation with the guys behind Worshen and Stable Cascade as well.

[00:15:24] Alex Volkov: So definitely check this out. We mentioned that WhisperKid runs now on watchOS, which is quite incredible because Siri's voice to text is still not that great. And I think that's mostly of what we discussed. And then I flipped the mic on my, on my friend here that sits in front of me and I just had a deep dive interview with Swyx.

[00:15:41] Alex Volkov: In the latent space, he just posted a few images as well, and it was a great conversation as well, so definitely worth a follow and a listen if you haven't listened to this. With that, I think we recap ThursdAI on one of the more seminal days that I remember in the AI one after another, and we all hope that, Meta will just release Llama 3

[00:16:01] Investments updates from Swyx

[00:16:01] Alex Volkov: Unless I missed some stuff that's very important. I'll just double check. Nisten, out of the stuff that we've sent, did I miss anything else? Swyx, did I miss anything else?

[00:16:10] Swyx: Today there was also a LangChain Series A. True. With LangSmith.

[00:16:13] Swyx: Yes. There was Magic. dev, Series A with Nat Friedman.

[00:16:16] Alex Volkov: So I was thinking to cover this around the Google stuff because they also announced a longer context craziness.

[00:16:21] Alex Volkov: But definitely, definitely both of those.

[00:16:23] Swyx: Lambda Labs, Alonzo 300 million, Series C.

[00:16:26] Alex Volkov: Oh, wow, yeah, I even commented. I said, hey, Mitesh good. So we love Lambda, definitely. Most of the stuff that we play around with is happening in Lambda. And

[00:16:34] Swyx: Lindy also had their GA launch today.

[00:16:37] Alex Volkov: nice. Okay. Today

[00:16:38] Swyx: Today was a very bad day to launch [00:16:40] things, because everyone else launched

[00:16:41] Swyx: things.

[00:16:41] Swyx: Yes. If you're not Gemini, it's going to be a struggle

[00:16:44] Alex Volkov: I was just thinking, magic. dev, and I guess let's move to just discussing kind of the breaking news of the hour, as we already is. Let's talk about Google, and Gemina 1. 5.

[00:16:55] Google teases Gemini Pro 1.5 with 1M context windows

[00:16:55] Alex Volkov: Do we do a musical transition? Sure, let's do a musical News. This is not the Breaking News music. By not even a stretch, this is not a Breaking News music. But, imagine that we have Breaking News right now, because we do. Just an hour or so ago, we had an update from Jeff Dean and then Sundar Pichai and then a blog post and then a whole thread and a bunch of videos from Google.

[00:17:27] Alex Volkov: And if you guys remember some Google videos from before, these seem more authentic than the kind of the quote unquote fake video that we got previously with Gemini Ultra. So just a week after Google released Gemini Ultra, which is now available as aka Gemini Advance. And just a week after they killed Bard almost entirely as a concept they're now teasing.

[00:17:48] Alex Volkov: Teasing did not release, teasing. Gemini 1. 5, 1. 5, they're teasing it and they're coming out with a bang. Something that honestly, folks at least for me, that's how I expect Google to show up. Unlike before, where they're like lagging after GPT 4 by eight months or nine months, what they're doing now is that they're leading a category, or at least they're claiming they are.

[00:18:07] Alex Volkov: And so they released Gemini 1. 5, and they're teasing this with a whopping 1 million tokens. in context window on production and up to 10 million tokens in context window in research. And just to give a context, they put like this nice animated video where they put Gemini Pro, which they have currently, not 1.

[00:18:26] Alex Volkov: 5, the Pro version. is around 32, I think, and then they have GPT 4 with 128 and then they show Cloud 2 is at 200k and then Gemini 1. 5 is a whopping 1 million tokens, which is ridiculous. Not only that, they also came a little bit further and they released it with the Needle in Haystack analysis from our friend Greg Kambrad, which usually does this.

[00:18:50] Alex Volkov: We'll not be able to pronounce his name. I asked Greg to join us. Maybe he will. A needle in a haystack analysis that analyzes the ability of the model to recall whether or not it's able to actually process all these tokens and actually get them and understand what happens there. And quite surprisingly, they show like 99 percent recall, which is incredible.

[00:19:10] Alex Volkov: And we all know, previously in long context windows, we had this dip in the middle. We've talked about the The butter on toast analogy, where the context or attention is like the butter and context window is the toast and you spread and you don't have enough for the whole toast to spread evenly.

[00:19:27] Alex Volkov: We've talked about this. It doesn't seem, at least

[00:19:30] Alex Volkov: on

[00:19:30] Alex Volkov: the face of it, that they are suffering from this problem. And that's quite exciting. It is exciting because also this model is multi modal, which is very important to talk about. They definitely show audio and they are able to scrub through, I said, they said, I think they said 10 hours of audio or so.

[00:19:47] Alex Volkov: Which is quite incredible. Imagine this is going 10 hours of audio and say hey, when When did Alex talk about Gemini in ThursdAI? That would be super dope and Quite incredible. They also did video. They showed a hour of video of Buster Keaton's something and because the model is multi modal the cool thing they did is that they provided this model with a reference of with a sketch.

[00:20:11] Alex Volkov: So they drew a sketch of something that happened during this video, not even talking about this, just like a sketch. And they provided this multimodal with an image of this and said, when did this happen in the video? And it found the right timestamp. And so I'm very, very excited about this. If you can't hear from my voice, Swyx can probably tell you that it looks like I'm excited as well, because it's, it's quite.

[00:20:31] Alex Volkov: As far as I'm considering a breakthrough for multiple reasons. And now we're gonna have a short discussion.

[00:20:35] Enrico taking about open source alternatives to long context

[00:20:35] Alex Volkov: I want to say hi to Enrico here. Enrico welcome up on stage. Enrico Cipolli, one of the authors of the Yarn paper. And like we've had Enrico before talk to us about long context. Enrico, as we send this news in DMs, you replied that there have been some breakthroughs lately that kind of point to this.

[00:20:51] Alex Volkov: And you want to come up and say hi and introduce us briefly. And let's chat about the long context.

[00:20:57] Enrico Shipolle: Hi, Alex. Yeah, so there actually have been a lot of research improvements within the last couple months, even from before we submitted YARN. You could still scale even transformers to millions of essentially context. length back then. We previously in YARN worked on scaling the rotary embeddings, which was a traditional issue in long context.

[00:21:19] Enrico Shipolle: So I, if you don't mind, I'll probably go through some of the research really quickly because unfortunately,

[00:21:25] NA: so on January 2nd, there was one called, it's called LLM, maybe long LLM. That's a mouthful essentially, but they were showing that you can process these long input sequences during inference using something called self extend, which it allows you to basically manage the context window without even fine tuning these models.

[00:21:48] NA: And then on January 7th, 2024, there was another paper that released, it's called Soaring from 4k to 400k, which allows you to extend like the LLM's context with something called an activation beacon. With these activation beacons, they essentially condense raw activation functions in these models to a very like compact form, which essentially the large language model can perceive this longer context.

[00:22:14] NA: Even in a smaller context window, the great thing about these activation beacons or the LLM, maybe long LLM, is essentially they only take a few lines of code to modify the transformer architecture and get all these massive performance benefits for long context inference.

[00:22:33] Alex Volkov: Are

[00:22:33] Alex Volkov: you serious? Are we getting one of those breakthroughs that take two lines of code, kind

[00:22:37] NA: No so basically all of these require minimal code changes to even be able to scale to, to long, like token counts, whether it's audio, video, image, or text. Text is. Generally, like the shortest token count, if you look at something like RefinedWeb or SlimPajama the, the average token count of a piece of text in that is only anywhere from 300 to 500 tokens.

[00:23:02] NA: So this is actually generally a data centric issue too, when you're talking about long context with even training a standard natural language processing model. The thing about audio and video is, is these have a ton of tokens in them. And the one good thing, and then? the final note, I'm, I'm going to put in, unfortunately, before I have to head out, I know this was a lot of information.

[00:23:22] NA: I can link these

[00:23:24] Alex Volkov: Yeah, we're gonna add some, some of this, we're gonna add some, some links, the links that I'd be able to find, Enrique, if you can send

[00:23:29] NA: Yeah, I'll, I'll send you all the research papers.

[00:23:32] Alex Volkov: Yeah, you want to lend one last thing before we move on? Yeah, go ahead.

[00:23:36] NA: Yeah, So, just the last thing on January 13th is there was this paper called Extending LLM's Context Window with only a hundred samples and they were essentially able to show that even in a very limited amount of long context samples, you're able to massively improve the context lengths of these models. I should mention these are the papers that I found did pretty rigorous evaluation overall, because a lot of them, there's a huge problem in long context evaluation. But I feel these authors generally applied their knowledge pretty well, and these results are really impactful. so, even for the open source community, because you don't need a lot of computational power to be able to scale these context windows massively now.

[00:24:24] NA: And

[00:24:24] NA: that's basically everything I wanted to

[00:24:26] NA: say.

[00:24:27] Alex Volkov: Thank you, Enrico. Thank you, folks. Folks, definitely give Enrico a follow. And we have quite a few conversations with Enrico. If somebody in the open source community knows about Long Contacts, Enrico is that guy. And we're definitely going to follow up with the links in the show notes for a bunch of this research.

[00:24:41] Alex Volkov: And I think just to sum up, Enrico There have been breakthroughs, and it doesn't look like Google is the only folks who come up today. Nat Nat Friedman and Daniel Gross, the guys who have AI grant, they have the Vesuvius Challenge recently, and invest in everything AI possibly. They just announced an investment in magic, that they have a hundred million dollars investment, [00:25:00] quote unquote.

[00:25:00] Alex Volkov: We were so impressed with these guys when we decided to give them a hundred million dollars from Nat Friedman, and they also talk about the model that does. Something like 10 million context windows. Swyx, you wanna, you wanna talk about the magic thing?

[00:25:12] Swyx: They first talked about this last year, like six months ago, and then went completely silent. So we didn't really know what was going on with them. So it's good to see that this is at least real because six months ago they were talking about 5 million token context model.

[00:25:28] Swyx: But no, nothing was demoed. Not even like a little teaser graphic or anything like that. But for Nat to have invested in this amount, I think it's a huge vote of confidence. And it basically promises that you can do proper codebase embedding and reasoning over an entire codebase. Which, it's funny to have a code model that specially does this, because Gemini could also potentially do this.

[00:25:58] Alex Volkov: They showed in their examples 3JS. Did you see this?

[00:26:01] Swyx: No, I didn't see the 3JS, but okay, yeah. And we have a pretty consistent result from what we've seen so far that GPT 4 is simultaneously the best LLM, but also the best code model. There's a lot of open source code models, CodeLlama, DeepSeaCoder, all these things.

[00:26:18] Swyx: They're not as good as GPT So I think there's a general intelligence lesson to be learned here. That it remains to be seen because we, Magic did not release any other details today. Whether or not it can actually do better than just a general purpose Gemini.

[00:26:34] Alex Volkov: Yeah, and so the example that they showed is actually they took 3JS, if you folks know the 3JS library from Mr.

[00:26:40] Alex Volkov: Doob and they, embedded all of this in the context window and then asked questions and it was able to understand all of it Including, finding incredibly huge codebase. And I think I want to just move this conversation.

[00:26:52] Alex Volkov: Yeah, Nisten, go ahead. I see you, I see you unmuting. And folks on the stage, feel free to raise your hands if if you want to chime in. We'll hopefully get to some of you, but we have a bunch of stuff to chat about as well.

[00:27:01] Nisten Tahiraj: I'll just quickly say that there are still some drawbacks to these systems. And by systems the long context models where you dump in a whole code base or entire components in. And the drawbacks, even from the demos, still seem to be that. Yes, now they do look like they're much better at reading and intaking the information, but they're not yet much better at outputting similar length output, so they're still gonna only output, I think, up to 8, 000 tokens or so, and I don't know if that's that's a byproduct of of the training, or they could be trained to re output much longer, much longer context.

[00:27:43] Nisten Tahiraj: However, the benefit now is that unlike Retrieval augmentation system, unlike a RAG the, the drawback with a RAG was that yes, it could search over the document, but it would only find maybe two or three or a couple of points and bring them up. Whereas this one is more holistic understanding of the, of the entire input that you've dumped in.

[00:28:03] Nisten Tahiraj: But again, we're not quite there yet where they can just output a whole textbook. That's, that's what I mean. So that's the thing. That's the next challenge

[00:28:11] Far El: to solve.

[00:28:12] Alex Volkov: So I think, I think the, the immediate reaction that I had is very similar to what you had, Nisten. RAG is something everybody uses right now. And we've talked about long context versus, versus something like a RAG before, and the usual conversation we have is usually about cost. How much does it cost you pair these tokens, right?

[00:28:30] Alex Volkov: If you send 10 million tokens and each token is like a cent, you're basically paying 10 million cents for every back and forth. Also speed and, and user experience. If your users are sitting there and waiting for 45, 60 seconds because they sent a bunch of contacts, if you can solve this with RAG, then RAG is probably a better approach for you.

[00:28:48] Alex Volkov: However, however this specifically looks like. At least from the examples that the Google did, they showed the video transparently, they sped up the inference, but I saw something where with at least the video question, it took them around 40 seconds. to extract a frame of a video of an hour. They sent an hour worth of context of a video within this thing, and it took them 40 seconds for this inference.

[00:29:13] Alex Volkov: Folks, like I said before, and I'm going to say this again, regular ChatGPT, not even crazy context, queries took me sometimes 40 seconds. Now, you may say, okay, Alex they show the demo of their environment, and ChatGPT is in production environment. Yes, but the possibility is, if I can send I don't know, 500, 000 tokens in the context window, and then within 40 seconds get a response which is equivalent to what I get from GPT 4.

[00:29:38] Alex Volkov: Then I think that a bunch of the conversation about RAG being better just from a speed of inference perspective are slowing down. An additional thing I want to say before I get to you, Yam, just a second the immediate response in my head was, okay, RAG is done for, or at least not done for, but definitely the kind of the crown on RAG's head.

[00:29:56] Alex Volkov: Everybody's talking about RAG. There's vector databases everywhere. We just had folks talk about Colbert and different things. RAG is, okay, RAG is now shaky. But the other thing I started to think is, is fine tuning. also under risk. And Swyx, I think this goes back to what you just said about like the general models versus the maybe the Finetune or very specific models, because if a general model can take a whole book, and they had an example about this where there was a very low resource language, Kalamathi, Kalabathi, something like this, and there's only one book that's a dictionary for this language, they literally threw the book in the context window, and the model was able to, from context learning, to generalize and understand this and perform better than fine tuned models.

[00:30:37] Alex Volkov: And I'm thinking here okay, rag is the first thing to go. Is fine tuned second? Are we going to stop fine tuning and sending contexts? So Swyx, I want to hear your reaction about, about the language thing and then we're going to get to Yam and then we're going to ask some more folks.

[00:30:48] Discussion about effects of longer context windows

[00:30:48] Swyx: Yeah, I think there's generalizable insights about learning about language. And it's not surprising that throwing that into the context window works, especially if it's a cognate language of something that it already knows. So then you're just learning substitutions, and don't forget that transformers are initially trained to do language translation, like this is like bread and butter stuff for transformers.

[00:31:12] Swyx: The second thing I would respond to is, I have to keep saying and banging this drum, long context does not kill RAG because of cost. Imagine if every time you throw 10 million tokens of context in there, you have to pay like a thousand dollars. Because unless something fundamentally is very, very different about this paradigm, you still pay to ingest those tokens of cost.

[00:31:39] Swyx: So ultimately, people want to still reg for cost and then for attribution reasons, like debuggability attribution, which is something that's still valuable. So I think long context is something that I have historically quite underweighted for this reasons. I'm looking to change those assumptions, of course, because obviously this is magical capabilities if you can use

[00:32:03] Alex Volkov: this is magical capabilities if you can use

[00:32:10] Far El: Yeah, I just want to say on the topic of of latency and ingesting a lot of context. I think that there is a solution that we didn't talk about it here and will be something that is going to be incorporated in all the flagship models, which is embedding embedding knowledge into the KB cache, which is something that many of the inference engines today can do.

[00:32:34] Far El: And you simply just prefix the context beforehand, and then you don't need to process it through your model. So you're not sending the whole database each time you are calling your model. It's just saved. Imagine that OpenAI have some sort of API that you embed. The KD cache beforehand, and it's reduced price, of course, and then it uses that as, as your context.

[00:32:59] Far El: Basically, somewhere in the middle between the two. And the reason that it's not supported now in flagship models, because the first flagship model that supports a million tokens came out today. But I think that if we see this this, if we go there, this is something that we're going to see in all of the APIs.

[00:33:18] Far El: Moreover, I also don't [00:33:20] think that RUG is done for it because RUG is explaining to you very, very clearly and very simply. Where the information is coming from, what the model is basing itself on. You can claim that the model with the attention you can do it as well, but it's not like RUG. RUG, you're just showing the clients, the people, exactly where it comes from.

[00:33:40] Far El: And there are use cases where this is absolutely a must. So I think that there will always be room for RUG for these specific use

[00:33:49] NA: cases and long

[00:33:50] Far El: context. With KVCaching is going to be, I think, I think the methods for embedding, for example, a full database, or a book, or something big, and using it multiple times, with many different

[00:34:05] Far El: prompts.

[00:34:06] Alex Volkov: Or also multimodality, right? So thank you for this. Definitely, definitely makes sense. And I think somebody in the comment also left a similar comment as well. So we want to dive into the KVCache stuff maybe in the next one. But I want to talk about the multimodality part of this because, um We've, we've multiple times mentioned.

[00:34:25] Alex Volkov: I think we did this every Thursday. I sense GPT 4 launched because we were waiting for the vision part of GPT 4. And we've talked about 2024 being the year of multimodal. And we're going to have to talk about a bunch of multimodal stuff today, specifically with the RECA folks and the RECA flash, which understands videos.

[00:34:40] Alex Volkov: They basically, so I'm going to have to see whether RECA understands videos better than Gemini, but the Gemini folks talked about there's a specifically. A bunch of multi model effect on the context window where if you send videos, you, at least the way they did this was just frames. They broke down this movie to a bunch of 500, 000 frames or so and just sent it in context window.

[00:35:04] Alex Volkov: And they basically said we have all this video in the context window and then we have a little bit of text. And I think context window expansions like this will just allow for incredibly multi modal use cases, not only video, audio, they talked about, we've talked about previously with the folks from

[00:35:20] Alex Volkov: Prophetic about different fMRI and EEG signals that they're getting like multi modal like applications as well and Context window enlargement for these things, Google specifically highlighted.

[00:35:32] Alex Volkov: And I want to highlight this as well because it's definitely coming. I'm waiting for being able to live stream video, for example. And I know some folks from like 12 Labs are talking about almost live live stream embedding. So definitely multimodal from Google. I think, folks, we've been at this for 30 minutes.

[00:35:48] Andrej Karpathy leaves OpenAI

[00:35:48] Alex Volkov: Alright, so folks, I think we're going to move on and talk about the next kind of a couple of stuff that we've already covered to an extent, but there's some news from OpenAI, specifically around Andrej Karpathy leaving, and this was announced, I think broke in the information, and Karpathy, some folks here call them senpai, Karpathy is a very Very legit, I don't know, top 10, top 5, whatever, researchers, and could potentially have been listening to the space that we had with LDJ after he left, or, yeah, I think it says, it was clear that he left it was the information kind of announcement didn't have a bunch of stuff, but then Andrei just As, as a transparent dude himself, he came and said, hey, this wasn't the reaction to anything specific that happened because speculations were flying.

[00:36:33] Alex Volkov: And I think at least, at least to some extent, we were in charge of some of these speculations because we did a whole space about this that he could have just listened to. But as speculation was flying, maybe this was ILLIA related, maybe this was open source related, like all of these things.

[00:36:46] Alex Volkov: Andre basically Helped start OpenAI, then left and helped kickstart the Tesla Autopilot program, scaled that to 1500, then left. On the chat with Lex Friedman, Andrei said that Basically, he wanted to go back to hands on coding, and in OpenAI, his bio at least said that he's working on a kind of Jarvis within OpenAI, and definitely Andrei has been also talking about the AI as an OS, Swyx, you wanna, you wanna cover like his OS approach?

[00:37:14] Alex Volkov: I think you talked about this. He had a whole outline, I think you

[00:37:17] Swyx: also

[00:37:17] Swyx: talked about this. LLM OS.

[00:37:18] Swyx: Yeah. He wasn't working on it so much as thinking about it.

[00:37:21] Swyx: Thinking about it,

[00:37:21] Swyx: yeah. And maybe now that he's independent, he might think about it. The main thing I will offer as actual alpha rather than speculation is I did speak to friends at OpenAI who reassured us that it really was nothing negative at OpenAI when he left.

[00:37:40] Swyx: Apparently because they spoke to him before he left.

[00:37:43] Swyx: So yeah, he's for the way I described it is he's following his own internal North Star and every time he does that the rest of us

[00:37:51] Alex Volkov: And definitely the rest of us win.

[00:37:53] Alex Volkov: the open source community is hoping, or I've seen many, many multiple things that say, hey, Andre will unite like the, the, the bands of open source, the different bands of open source.

[00:38:02] Alex Volkov: Andre posted this thing. on his ex, where like his calendar was just free, which shows maybe part of the rationale why he left, because meetings and meetings and meetings and everything and now he can actually work. So shout out to Andrej Karpathy for all he did in OpenAI and for all he's going to continue to do.

[00:38:16] Alex Volkov: We're going to definitely keep up to date with the stuff that he releases. Andrej, if you're listening to this, you're more than welcome to join. We're here on every Thursday. You don't have to have a calendar meeting for this. You can hop on the space and just join. Also on the topic of OpenAI, they've added memory to ChatGPT, which is super cool.

[00:38:31] Alex Volkov: They released a teaser, this, I didn't get into the beta, so they released it to a limited amount of people. They added memory to ChatGPT, and memory is very, very cool, the way they added this as well. So I've said for a long time that 2024 is not only about multimodality, that's obviously going to come, but also it's about time we have personalization.

[00:38:51] Alex Volkov: I'm getting tired of opening a ChatGPT. Chat, and have to remember to say the same things on, it doesn't remember the stuff that previously said. The folks in OpenAI are working on the differentiator, the moat, and different other things, especially now where Google is coming after them with the 10 million context window tokens.

[00:39:08] Alex Volkov: And, they're now adding memory, where ChatGPT itself, like the model, will manage memory for you, and will try to figure out, oh, OpenAI, oh my god, breaking news. OpenAI just shared something. As I'm talking about them, you guys want to see this? Literally, I got a

[00:39:28] Alex Volkov: notification from OpenAI as I'm talking about this.

[00:39:30] Swyx: What?

[00:39:32] Alex Volkov: Let's look at this. I, dude, I needed my, my breaking news button today. Opening, I said, introducing Sora, our text to video model. Sora can create videos for up to 60 seconds.

[00:39:44] Alex Volkov: Holy shit, this looks incredible. Oh my god, somebody please pin this to the, to the, Nisten, you have to see, there's a video, 60 second video, folks.

[00:39:54] Alex Volkov: Like, all of the, oh my god, breaking, I have to put the breaking news button here, holy shit. So folks, just to describe what I'm seeing, cause somebody please pin this to the top of the space every video model we had so far, every video model that we had so far does 3 to 4 seconds, Pica the other labs, I forgot their name now, Runway, all of these models,

[00:40:16] Swyx: they

[00:40:16] Swyx: do

[00:40:16] Swyx: Oh my god, Runway.

[00:40:18] Alex Volkov: They

[00:40:18] Alex Volkov: do three to five seconds and it looks like wonky, this thing just that they show generates a 60 second featuring highly detailed scenes and the video that they've shared, I'm going to repost and somebody already put it up on space has folks walking hand in hand throughout a There's a zoomed in, like behind the scenes camera zooming in.

[00:40:39] Alex Volkov: There's a couple Consistent I cannot believe this is January. Holy shit The consistency is crazy. Nothing changes. You know how like previously video would jump frames and faces and things would shift

[00:40:52] Alex Volkov: Wow, okay, so I guess we should probably talk about this. Reactions from folks. I saw LDJ wanted to come up to see the reaction I'm

[00:41:00] Far El: just wild. Honestly, it looks crazy. It looks really good quality. Better than most text to video models that I've seen.

[00:41:08] Alex Volkov: Holy shit okay, so I'm scrolling through the page, folks,

[00:41:13] Alex Volkov: those who are listening, openai. com slash Sora, Sora is their like text to video I'm seeing a video of a model walking through like a Japan street, whatever, the prompt is, a stylish woman walks down a Tokyo street filled with warm glowing neon animated city signage, she wears a black leather jacket, long red dress, and black boots, and the consistency here is insane.

[00:41:35] Alex Volkov: I do

[00:41:35] Far El: out the mammoths. Or actually go on their websites. On the Sora, [00:41:40] on OpenAI's website. They've got a

[00:41:42] Far El: few examples. It's crazy. It's crazy. I've

[00:41:45] Far El: never seen a

[00:41:48] Alex Volkov: the if you showed me this yesterday, Far El, if you showed me this yesterday and said this is generated, I would not believe you. So what happens is, now the same video of this woman walking, they have a video camera zooming in, into her eyeglasses, her face stays the same, the same consistency, you can see reflection in the, in the sunglasses.

[00:42:08] Far El: Alex, you have to go on the website. There's like this video of, oh like literally the prop is reflections in the window of a train traveling through the Tokyo suburbs. And

[00:42:19] Far El: honestly, it looks, it looks like someone captured this no way this is AI

[00:42:23] Far El: generated. It's, it's crazy

[00:42:27] Alex Volkov: Wow,

[00:42:27] Alex Volkov: folks. What's the availability of this? Let's, let's see, what do we know? So we know safety. We'll be taking several important safety steps ahead of making SORA available on OpenAI's products, so it's not available yet. Working with Red Teamers, they don't want this to be used in deepfakes for porn, obviously.

[00:42:43] Alex Volkov: That's like the first thing that the waifus are going to use it for. The C2PA metadata that, if you guys remember, we've talked about that they started including in DALI, they're going to probably include this as well. And new techniques prepared for deployment, leveraging the existing safety methods.

[00:42:56] Alex Volkov: Okay research techniques.

[00:42:58] Far El: Crazy.

[00:43:00] Alex Volkov: Consistency is crazy, right folks?

[00:43:02] Swyx: Yeah, it's not available it looks like.

[00:43:03] Swyx: Not available

[00:43:04] Swyx: yet.

[00:43:04] Swyx: To answer your question. They released some details about it being a diffusion model. They also talked about it having links to DALI 3 in the sense that Honestly, I don't know if people know that there was a DALI 3 paper, which is very, very rare in this age of Not close.

[00:43:22] Swyx: Not open ai.

[00:43:23] Alex Volkov: Yeah, not

[00:43:24] Swyx: open AI.

[00:43:24] Swyx: And so they doing this like synthetic data captioning thing for the DO three model and they're referencing the same method for soa. I would just go read the Dolly three paper

[00:43:37] Alex Volkov: Wow. I, I, the consistency has been the biggest kind of problem with these LDJ.

[00:43:41] Alex Volkov: Go ahead, please. As I'm reading this and reacting and, and my mind is literally blown the demo of the doggy. Hold on nj one second. There's a demo. There's a video of the dog, like walking from one window and jumping to another window and the pause, they look like it's a video, like folks like literally does not look like generated, like anything we've seen before.

[00:44:02] Far El: This, is going to disrupt Hollywood immediately we're talking about, text to video disrupting media content creation and so on this is it, this is like the mid journey moment of, of text to video that same feeling that we had when we were able to crop mid journey and get some really high quality images this is the same but for video, essentially.

[00:44:23] Alex Volkov: This, this breaks reality for me right now. Literally I'm watching this video multiple times. I cannot believe that the dog's paws are not shaping in different shapes. The spots on this Dalmatian dog stay in the same place throughout the video. It, it don't make sense. Alright, LDJ, go. I think, I think,

[00:44:37] Far El: Yeah so

[00:44:38] Far El: Sam here, I'll post it on the, on the ding board. Sam said that that certain select creators have access now. And, oh, I just lost the tweet. I'll, I'll get it. But yeah, he says that some creators already have access and I guess they're going to slowly expand it out to like beta users or whatever.

[00:44:59] Alex Volkov: Wow, so Sam asked for some we can show you what Sora can do. Please reply with captions for videos you'd like to see and we'll start making some.

[00:45:06] Alex Volkov: So

[00:45:06] Swyx: Oh yeah, basically give him some really complicated prompt, and let's, let's go, let's go.

[00:45:12] Alex Volkov: A bunch of podcasters sitting, watching Sora and reacting in real time and their heads are blown.

[00:45:17] Alex Volkov: Not literally, because this is insane. How's that for a prompt? I'm gonna post it. Hopefully some will get it.

[00:45:25] NA: Just opening a portal through Twitter, through OpenAI to the Munich and then string

[00:45:31] Alex Volkov: Oh, there's, there's also, I don't wanna spend the rest of Thursday. 'cause we still have a bunch of talk about folks.

[00:45:38] Alex Volkov: Is anybody not scrolling through examples right now? And you definitely should. There's an example of a

[00:45:43] Swyx: there's only nine examples.

[00:45:45] Alex Volkov: What, what

[00:45:45] Far El: This is insane.

[00:45:46] Alex Volkov: The whole, no website has a bunch of, scroll down.

[00:45:48] Alex Volkov: There's like every, every kind of example has

[00:45:51] Alex Volkov: more scrollies. So I'm looking at an example of a chameleon, which, has a bunch of spots and has guys, the spots are in the same place. What the fuck? It doesn't move. it does not look like honestly, let's do this. Everybody send this to your mom and say, Hey mom, is this AI generator?

[00:46:07] Alex Volkov: Or not? Like older folks will not believe this shit, like

[00:46:10] Swyx: I, I will

[00:46:13] Far El: What's the most impressive

[00:46:14] Swyx: compare this to Google

[00:46:15] Far El: right? Like humans,

[00:46:17] Swyx: don't know, I think you guys

[00:46:18] Alex Volkov: hold on. Pharrell, I think, I think we're talking over each other. Give us a one sec. Swix and then Farrell.

[00:46:22] Swyx: Oh, sorry, yeah, there's a bit of a lag. Oh, no, nothing. Just compare this to Google Lumiere where they release a bunch of sample videos as well.

[00:46:29] Swyx: But you could, the, the, I was impressed by the consistency of the Lumiere demo videos. They would, they demoed sort of pouring syrup onto a pancake and then infilling the syrup and showing that, it would be pretty realistic in pouring all that syrup stuff. Didn't really see that kind of very technical test here.

[00:46:49] Swyx: But the resolution of these videos and the consistency of some of these movements between frames, and the ability to cut from scene to scene is way better. Instantly way better. I was thinking that Lumiere was, like, state of the art a few weeks ago, and now it is completely replaced by Sora.

[00:47:08] Swyx: This is a way better demo. I think OpenAI is showing Google how to ship.

[00:47:12] Alex Volkov: eye. Decided to say, you know what, Google, you think you can one up us with the context window?

[00:47:18] Alex Volkov: We got another thing coming, because I've

[00:47:20] Swyx: just pull up the Lumiere page, and then pull up the Sora page, and just look at them side by side, and you can see how much better they

[00:47:26] Alex Volkov: Lumiere

[00:47:26] Alex Volkov: was mind blowing as well. Go ahead, Far El. Go ahead, because we're still reacting in real time to this whole ridiculously impressive.

[00:47:32] Far El: Yeah, I was just saying that the the most impressive thing are, is like how alive these video shots feel, right? Humans talking action scenes like, all the text to video models that I've seen so far and I've used were very very simplistic, right? It felt like more like you're animating an image to do very minor movements.

[00:47:55] Far El: It wasn't actually alive in any way, but Sora's text to videos is, is nuts, the quality, the consistency, the action, like the actual action of the characters. I wonder how much like granular control do you have on a scene to scene basis. I know that Google released like a paper I think a few months back where they had a basically like a script that allowed the, like for much more long form.

[00:48:27] Far El: video content, but I'm not sure if that's the case here. It's just, it's just really impressive. It's, it's really impressive.

[00:48:35] Alex Volkov: I want to say one of our friends, LaChanze, just sent, at the bottom of the page, it says, Sora serves as a foundation model that can understand and simulate the real world. I can it's really hard for me to even internalize what I'm reading right now, because the simulation of the real world, it triggers something in me, tingles the simulation hypothesis type of thing, and this can regenerate the map of the world and then zoom in and then generate all the videos.

[00:48:58] Alex Volkov: And I'm wearing this Mixed, slash, augmented, slash, spatial reality headset that just generates and this happens on the fly, and what am I actually watching here? So this says Sura serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI.

[00:49:15] Alex Volkov: Yeah. Alright, folks. I will say, let's do two more minutes, cause this is I can't believe we got both of them the same day today, holy shit, we got 10 million contacts window from Google announcement, which is incredible, multi modal as well, I like, my whole thing itches right now to take the videos that OpenAI generated and shove them into, into a Gemini to understand what it sees and see if if it understands, it probably will.

[00:49:40] Alex Volkov: Wow.

[00:49:40] Far El: Thing that would make this Thursday a tiny bit even more awesome is if Meta comes out with telemetry. Too much, too much, too much.

[00:49:51] Alex Volkov: It's

[00:49:51] Alex Volkov: gonna be too much. We need, we need a second to like breathe. Yeah, definitely folks. This is a Literally like singular day. Again, we've [00:50:00] had a few of those. We had one on March 14th when ThursdAI started, OpenAI released GPT 4, Entropic released Cloud, I think on the same day. We had another one when OpenAI Dev Day came about, and I think there's a bunch of other stuff.

[00:50:12] Alex Volkov: I consider this to be another monumental day. We got Gemini 1. 5 with a potential 10 million context window, including incredible results in understanding multimodality in video, up to an hour of video. And then we also have some folks from RECA that's gonna come up soon and talk about their stuff, which is, they just with all due respect with RECA folks this news seems bigger, but they still launched something super, super cool we're gonna chat about, and now we're getting, it's just, the distance, we're used to jumps, we're used to state of the art every week, we're used to this, we're used to this model beats this model by Finetune, whatever, we're used to the OpenAI leaderboard, this is

[00:50:53] Alex Volkov: such a

[00:50:53] Alex Volkov: big jump on top of everything we saw.

[00:50:55] Alex Volkov: From Stable Visual Diffusion. From what are they called again? I just said their name, Runway. I forgot their always forget their name.

[00:51:02] Swyx: Poor guys.

[00:51:04] Alex Volkov: Poor Runway. From Pica Labs. From folks who are generating videos. This is just such a huge jump in capability. They're talking about 60 seconds.

[00:51:14] Alex Volkov: Oh, Meta just announced JEPA. Yeah, I don't know if JEPA is enough. People are commenting about JEPA, and I'm like, okay wait, hold

[00:51:21] Swyx: You, you spiked my heart rate when you said Meta just announced. I was like, what the fuck?

[00:51:25] Alex Volkov: the fuck? Meta literally just came out with an announcement, VJEPA, supervised learning for videos.

[00:51:29] Alex Volkov: But, folks unless they come out with Lama 3 and it's multimodal and it's available right now, not Meta is not participating in the

[00:51:35] Swyx: thing

[00:51:36] Alex Volkov: day

[00:51:36] Far El: Oh wait, this is actually cool. So this is this is something,

[00:51:39] Far El: actually a paper they came out with like about a month ago, but this is for video understanding. So this is pretty much like for input of video, while OpenAI's model is for output of video.

[00:51:51] Alex Volkov: It just, I will say it's a research thing, right? So they're not showing anything there, unless I'm mistaken. Um So, I kinda, so I still have a bunch of stuff to give you updates for, and I still have a bunch of interviews as well, there's a new stability model, but I'm still like, blown away, and I just wanna sit here and watch the videos,

[00:52:07] Alex Volkov: Is this what Ilya saw? Yeah, somebody reacted like, what did Ilya see? Did Ilya see a generated video and the model understanding this and that's why, that's why?

[00:52:16] Far El: No, I think, I think, I think AGI has been achieved internally at

[00:52:21] Far El: this rate.

[00:52:22] Alex Volkov: Wow. I, I'm, I'm still blown away. Like I, if a model can generate this level of detail in very soon, I just wanna play with this. I wish, I wish we had some time to, to, to, I, I was one of the artists and I hope that somebody in the audience here is, and that they will come to talk about this on Thursday.

[00:52:43] Alex Volkov: I and because I'm, yeah. I'm still mind blown. So I see. Quite a few folks that I invited that I wanna, I wanna welcome to the stage. VJEP understands the world while Sora generates one. That's the comment that some folks led. And okay, okay. VJEP is going to be something we definitely cover because Meta released this and Meta are the GOATs, even though yeah, no, Meta's definitely GOATs. I'm just a little bit lost for words right now.

[00:53:06] Nisten Tahiraj: Yeah, so if people have watched a lot of speeches from Yann LeCun is the, the main idea is that these AI models are not very good at understanding the world around them or thinking in 3D. So in some ways, you could reason out that A cat is a lot more intelligent even if it was blind and it couldn't smell, it could still figure out where to go and find its letterbox stuff like that.

[00:53:30] Nisten Tahiraj: This is one part that's missing from the world model that they get purely just from word relationships or word vectors. And so this is a step in that direction, it seems. Again, I haven't read the paper, so I'm Half making stuff up here but it feels like this is a step in, in that direction towards AI models that understand what's going on like us and animals do.

[00:53:56] Nisten Tahiraj: So that, that's the main, the gist of it for, the audience.

[00:54:04] Alex Volkov: Oh, what, what a what A Thursday. What A Thursday. I gotta wonder how am I'm gonna summarize this, all of this. And I just wanna invite, we have here in the audience and I sent you a request to join. If you didn't get it. Make sure that you're looking at requests and then accept. And then we should have, we should have Max as well at some point.

[00:54:20] Alex Volkov: Lemme text Max. 'cause we have guest speakers here from, from Breca that we wanna chat with. Meanwhile I'm gonna continue and, and move forward in some of the conversations. Let's roll back. Okay, while we're still super excited and I can't wait for this to come out, this is an announcement that they did.

[00:54:35] Alex Volkov: It's very polished. We haven't seen we didn't see any access or anything about when it's going to come out. I do feel that this is a breakthrough moment. from Google and from OpenAI. And it does look like it's reactionary to an extent. The folks in OpenAI were sitting on this and saying, Hey, what's a good time to release this?

[00:54:52] Alex Volkov: And, actually now, to let's steal some thunder from Google and they're like 10 million thing that also not many people can use. And let's show whatever we have that not many people can use which, which is an interesting. Think, to think about, because, again, the pressure is on a bunch of other labs, on Meta, to release something, we know Lama3 is coming at some point, will it be multi modal, will it be able to generate some stuff every

[00:55:16] NA: Really, really quick, sorry to interrupt

[00:55:18] Alex Volkov: Go

[00:55:19] NA: the thing about VJEBA seems to be good at is understanding video instructions I guess you could point the camera to something you're doing with your hands and arts and crafts things, or repairing something, and it understands what you're doing, so that, that's actually very easy.

[00:55:36] NA: Powerful for what data sets data sets of skills that will come, because then you can generate actions. I, I think that, that will apply a lot to robotics, what they're doing.

[00:55:48] Alex Volkov: Oh, alright, yeah. And they also have the Ego4D datasets of robotics as well, and they've talked about this.

[00:55:55] Nvidia relases chat with RTX

[00:55:55] Alex Volkov: so let's go to open source like super quick. NVIDIA released a chat with RTX for local models. And it's actually like very, very cool. So a few things about the chat with RTX. First of all, NVIDIA packed a few, a few models for you. It's 38 gigabytes or something download. And they, they have they have quite a few I think they have two models packed in there.

[00:56:16] Alex Volkov: I wasn't sure which ones. And this, this is basically a, a package you download. I don't know if a doc or not. That runs on any desktop PC with RTX 30 or 40 series with at least 8 gigabytes of RAM. And it gives you a chatbot that's fully local. And we love talking about open source and local stuff as well.

[00:56:33] Alex Volkov: And it Not only that, they give you a rag built in. So you can actually run this on some of the documents that you have. They also have something that runs through a YouTube. You can give it like a YouTube playlist or a video link, and it will it will have you talk to YouTube video. So it has built in rag, built in Tensor rt, LLM, which runs on their, on their stuff RTX acceleration and.

[00:56:56] Alex Volkov: I think it's pretty cool, like it works only on the very specific types of devices, only for like gamers or folks who run these things but I think it's pretty cool that that folks are, that NVIDIA is releasing this. They also have something for developers as well to be able to build on top of this.

[00:57:11] Alex Volkov: And I think the last thing I'll say about this is that it's a Gradio interface, which is really funny to me that people are shipping Gradio interfaces on production. It's super cool.

[00:57:18] Cohere releases Aya 101 12.8B LLM with 101 language understanding

[00:57:18] Alex Volkov: Cohere releases an open source called AYA 101, a model that's like 12. 8 billion parameters model with understanding of multilingual 101 languages from Cohere. It's, it's honestly pretty cool because Cohere has been done doing a bunch of stuff. AYA outperforms the Bloom's model and MT0 on wide, a variety of automatic evaluations despite covering double the number of languages.

[00:57:41] Alex Volkov: And what's interesting as well, they released a dataset together with AYA and then what is interesting here? Yeah, just, oh, Apache 2 license, which is super cool as well. Apache 2 license for, for this model. Let me invite Yi as a co host, maybe this can, join. Far El, go ahead.

[00:57:58] Alex Volkov: Did you see, do you want to talk about Yi Aya?

[00:58:00] Far El: Yeah first off, I I appreciate and commend Cohere to building a multilingual open source data set and so on. That's awesome. We need more of that. But unfortunately, With the first few questions that I asked in Arabic specifically most of the answers were complete. [00:58:20] nonsense on their train model.

[00:58:23] Far El: Yeah. And to, to the point that it's it's laughable, right? For instance in Arabic, I asked who was the who was the first nation that

[00:58:32] NA: had astronauts on the moon. I

[00:58:38] Alex Volkov: Yes.

[00:58:39] NA: think, I think you cut out for a sec.

[00:58:43] Alex Volkov: I think he dropped. I don't see him anymore.

[00:58:45] NA: He might have

[00:58:46] NA: His phone might have

[00:58:47] Alex Volkov: yeah, we're gonna have to

[00:58:48] NA: I can briefly

[00:58:50] NA: comment on it. Yeah, we're pretty happy now that also Kahira has started contributing,

[00:58:56] NA: To open source because datasets are very important. And yeah, I think the reason it wasn't performing so well In other languages, it's just because some languages do not have there wasn't enough data in that for it to be, to be trained.

[00:59:12] NA: But the beautiful thing is that it is Apache 2. 0. You can just add your own languages data set and it will. Literally, make the whole thing better. And yeah, that's, those are my comments on it.

[00:59:22] Interview with Yi Tay and Max Baine from Reka AI

[00:59:22] Alex Volkov: Awesome. All right, folks. So now we're moving into the interview stage, and we have quite a few folks. As one of the most favorite things that I want to do in ThursdAI, and it's been an hour since we've been here, is to actually talk with the folks who released the stuff that we're talking about.

[00:59:35] Alex Volkov: So the next thing I'm going to announce, and then we're going to talk with Yitei and Max, and then after that, we're going to talk with Dom as well. Earlier this week, a company named Reka AI released two models, or at least released a demo of two models, right? I don't think API is still available.

[00:59:51] Alex Volkov: We're going to talk about this as well. Called Reka Flash and Reka Edge. And Reka Flash and Reka Edge are both multimodal models that understand text, understand video, understand audio as well, which is like very surprising to me as well. And I had a thread where I just geeked out and my head was blown to the level of understanding of multimodality.

[01:00:09] Alex Volkov: And I think some of the folks here had, had had talked about Sorry, let me reset. Some of the folks here on stage have worked on these multi models models. And so with this I want to introduce Yi Tei and Max Bain. Please feel free to unmute and introduce yourself briefly and then we're going to talk about some record stuff.

[01:00:25] Alex Volkov: Yi first maybe and then Max.

[01:00:27] Yi Tay: Yeah, thanks thanks Alex for inviting me here. Can people hear me actually?

[01:00:31] Alex Volkov: Yeah, we can hear you

[01:00:32] Yi Tay: okay, great, great. Because this is the first, hey this is the first time using space, so yeah, try to figure out how to use it. But thanks for the invite, alex, and so I'll just introduce myself. I'm Yi Teh, and I'm one of the co founders of RectorAI.

[01:00:45] Yi Tay: We're like a new startup in the LMS space. We train multi modal models. Previously I worked at Google Brain working on Flan stuff like that. So yeah, that's just a short introduction about myself. And maybe Max, do you want to introduce yourself? Yeah,

[01:00:59] Alex Volkov: Yeah, Max, go ahead, please.

[01:01:00] Max Bain: thanks Ian. Yeah.

[01:01:01] Max Bain: Thanks Alex for having me. So yeah, as you said yeah, I'm part of Wrecker. So I joined more recently, like six months ago. I just finished my PhD and that was all my video, audio, speech understanding. I've done a bit of work in open source. So if you use WhisperX that was like something I worked on and yeah, now working more on part of Wrecker and really enjoying it.

[01:01:22] Max Bain: yeah, that's pretty much

[01:01:23] Alex Volkov: First of all, let me just say, thank you for WhisperX, I did use this, and it was awesome, and I think this is how we connected before or at least, to some extent, I think this is the reason maybe I follow you, I was really surprised that you were Reka. Let's talk about the models that you guys just released, and because Very impressive on the multimodality part, but also very impressive on just the regular comparative benchmark, and I think you guys released the comparisons to just regular MMLU scores, so Wreck A Flash gets 73.

[01:01:52] Alex Volkov: 5 on MMLU and 65 on Human EVAL, and GPT 4 is at 67, at least, and Gemini Ultra, they claim is 74, but your guy's model is like significantly smaller. What can you tell us about, and I know you said before there's like a bunch of stuff that you won't be able to talk about what can you tell us about the performance just on the textual kind of comparison, even though this is a multimodal model and there's a bunch more that we will talk about?

[01:02:17] Yi Tay: Yeah, thanks so I'll just I can't really say that much, but I can say that there's quite a lot of headroom in pre training just for language alone, and I think that we're still not near the headroom yet for pre training, and I think even for us, actually, we have a better version of RecoFlash internally right now, but we've not even published metrics for that because while we were preparing for the launch we actually have even a better model now.

[01:02:39] Yi Tay: So I think actually there's still quite a lot of headroom for pushing that and there's quite a lot of things to do in pre training but I can't really wouldn't be able to say much about? About like more details, yeah.

[01:02:48] Alex Volkov: About specifics. I did see the comments that you left in your thread, that you talked about the folks who do foundational models from scratch, they, there's a lot of banging a lot of creation they have to do in the process as well, and it looks like at least some of this amount, some of this amount of hard work you guys had to go through in order to train these foundational models.

[01:03:09] Alex Volkov: So let's talk about the multimodality, what what can this model do? And I think I have a

[01:03:15] Alex Volkov: good idea, but can you talk to us on the multimodal part? What can those models do in terms of multimodality?

[01:03:23] Max Bain: Yeah, so in terms of multimodal yeah, if you just, you can use it actually on chat. reco. ai, and I would say the image understanding's pretty good, so people have noticed, you can recognize text pretty well. Yeah, more nuanced details, which tended to be a big issue with VLMs, like they used to be quite biased or it'd hallucinate a lot.

[01:03:41] Max Bain: I think in Rekka Fafri noticed that dropped a lot. So I think kind of image understanding is, I'd say, yeah, pretty on par with Gemini Pro or a bit better. But yeah, that's up to the jury. The video understands also pretty good. We limit it to a one minute input. We do have internally like better things and like bounded by how much we can run like for free. So, yeah, I'd say yeah, overall pretty good video understanding and image. We haven't focused too much on audio right now, but that's like definitely on the, on the roadmap.

[01:04:14] Alex Volkov: I did run into the audio stuff, and I ran a few videos through the demo, and folks definitely should check out the demo. I'll add this in the show notes, and hopefully some folks will add this to the space as well. I just started uploading like short clips, and it's great to hear that you're saying, you guys are limited, you're limiting on the demo, but you can, if I'm hearing correctly, you can The model can understand longer videos as well.

[01:04:39] Alex Volkov: So I uploaded a video of a trip that I took to Hawaii and there's a submarine there and somebody was narrating in the submarine and he yelled something like, there, there, there's the submarine goes, dive, dive, dive, something like this. Very excitedly. And the model really understood this, and actually it said, the commenter said, Dive, dive, dive, like this, with a bunch of I's in it.

[01:05:00] Alex Volkov: And to me, this was like the, the holy shit moment. I uploaded this video. The narrator for this video was very excited. I did not expect the model to actually pick up on the excitement. And, It was very surprising to me because if you use something like Whisper and you just extract the audio from the, from the video, you would not get this result.

[01:05:20] Alex Volkov: You would not get like the, the excitement in this person's voice. And while we try to get max back in, could you, so could you mention stuff about audio? Do you train this specifically for audio as much as you can share, obviously. Or is it like a, a, a byproduct of, of just this model being multimodal and understanding and can listen as well?

[01:05:39] Yi Tay: Wait, so let me take a step back. Actually, thanks for sharing that example because I

[01:05:43] Yi Tay: actually had to watch your example to find that, that dive, dive, dive. I actually watched the entire video to find that, that clip. So I think it was a pretty Good clip. To be honest, it also surprised me that you found this example.

[01:05:56] Yi Tay: I, I think I was not also expecting this but I, we, we, we co trained this with many modalities. We are not sure, like, why this this specific case is like this. I think that's all I can say, but probably

[01:06:09] Yi Tay: yeah, next one

[01:06:09] Alex Volkov: I can definitely, definitely add one thing that this video wasn't for sure not in your training data set because it was a private video of mine that didn't exist on the internet before. So it wasn't like a result of this video being in a training set. Max, you rejoined. I hope you heard some of this question as well, attributed to you.

[01:06:26] Alex Volkov: Did you see this example? Did it cut you off guard as well? Do you see other examples like this that were like very, very surprising in how this model performs?

[01:06:33] Max Bain: Yeah, I saw that. I was surprised. To be honest, one thing I've noticed is that video benchmarks are quite poor. So [01:06:40] we, in the question answering datasets, we don't really get a chance to see this, especially ones that use like the speech information and things like that. So I guess really, I'm glad you like tested it a lot.

[01:06:50] Max Bain: Cause yeah, like internally we maybe haven't had a chance to I think but it's the benefit of kind of, yeah, training everything from scratch and adding all the modalities

[01:06:58] Yi Tay: and yeah

[01:06:58] Alex Volkov: That's awesome. So I also want to talk about the fact that you guys raised two models and you talked about there's a bigger one. Let's talk about the edge model. Can you talk about Are we going to be able to use this on device? I assume what's the play here? At least from what you can say, what's the play in terms of using the smaller models?

[01:07:14] Alex Volkov: Obviously, smaller models, the benefit of them is using them closer on the edge and device, and that's how you named it. What's the, what's the thinking about releasing, these two models in different sizes? And and what's your plans for those?

[01:07:26] Yi Tay: Oh yeah, sounds good. Yeah, that's a great question. So for the H model, 7B model, it's I think it's it's at a size that it's possible to run it locally, but we are thinking also along the lines of okay, it's actually Faster, like it's just for latency sensitive applications sometimes you just need certain things like this Slightly faster than the 21b model and it's also cheaper to to to host for for a lot of applications So I think that's mainly like this one of the reasons why seven.

[01:07:55] Yi Tay: We also ran lots of ablations at low smaller scale. So this, this turns out to be just the size that we have. And I, I think it's mostly, mainly for latency sensitive stuff. And then like for people who are like for businesses and stuff, like they might just choose to deploy the smaller model if they don't like, need a larger models like the.

[01:08:13] Yi Tay: Flash or the, the core model. So I think that's really like the idea behind it. And then from the research point of view, or at least from the playground point of view, right? Like the, the demo point of view is that people get to, to, to, to get a sense of the view of the model at the seven B scale and the 21 B scale, right?

[01:08:28] Yi Tay: So there's kind some kind of you might be able to, to get a sense of like how this setup looks at the different scale. I think that's mainly like why we deployed two models in the background just so that people can play with. Two variants and the stuff. Actually not much thought here.

[01:08:42] Yi Tay: I mean it's not like super complicated, it just happened this way, but yeah, that's all I can say, yeah.

[01:08:48] Alex Volkov: Awesome. And so folks can go check out the demo. It looks like you guys are set up for API keys as far as I understood. So will developers be able, be, be able to build with this? What stage are you in? I think you, you invited to a disco or something. Could you talk about how we can play with these models, what we can do, and if there's any expected open source, because we'll have open source here on ThursdAI.

[01:09:08] Alex Volkov: If there's anything to talk about there as well, please, please feel free to, to tell us how to actually try these models beyond the demo. Build with them.

[01:09:16] Yi Tay: Yeah, sounds, sounds good. So for API, actually, we, we, we have our API as a system already like working and then some people are already using it. We are like rolling out access coupling without the billing and everything, like we're just making sure everything is running very well.

[01:09:29] Yi Tay: And then we will roll it out soon. So I think that's mainly like the, the idea behind the slightly stitch. API release yeah, so that's for APIs. And then for open source, we I'll just be candid here, we are constantly, we're not sure yet about whether we want to do it or we don't want to do it.

[01:09:44] Yi Tay: It's always a question we have but we're not promising anything, but we're also not saying no yet. So it's a, it's a competition we have very regularly about about this kind of thing. So I, I, so yeah, that's currently the stance we have right now. But we are, we are

[01:09:55] Yi Tay: writing a we are writing a tech report it's not like a paper paper, but it's also not going to be that there'll, there'll be some details in the tech report, but not complete details, but some details.

[01:10:04] Yi Tay: But yeah, so I think that's mainly like the extent of like how we're thinking about things right now, yeah.

[01:10:09] Alex Volkov: Awesome. So first of all, I want to consider you guys friends of ThursdAI. Thanks for coming on the pod. And here, we definitely love open source. We talk about it all the time. And we're just like Champions of Open Source, so if you do release anything Open Source, you're welcome to come back as well. Yi and Max, we have Swyx here, I'm actually in Swyx's audience, so you can hear them from my microphone.

[01:10:29] Alex Volkov: And Swyx has a few follow up questions for Yi and Max as well, so Swyx, go ahead.

[01:10:32] Swyx: Oh, sure. Yeah. Hey I actually tried to set up a chat with you when I was in Singapore, but it didn't happen.

[01:10:39] Swyx: So sorry about that. But I actually wanted to just chat with you more about something that you hinted on your announcement post. You talked about how much of the infra you had to rebuild, you Reka. Everything, you said everything from robust training infra. Proper Human Evaluation Pipelines and Proper RLHF Setups.

[01:11:00] Swyx: I was wondering if you can just give us like a preview of What did you miss? What does Google have? And then what do you think like the industry could innovate on?

[01:11:09] Yi Tay: Okay. That's a very interesting question. I need to be, need to think about what I can say and what I cannot say. But so definitely, definitely I miss GPUs credit to GPUs and being like a, a Googler for all my. Professional life, definitely the infra was completely new to me, and then at Rekka, we have a lot of people from GTM and, and Google in Alphabet in general I think a lot of us could, I feel the same way and then, I think in terms of infra, I think GPU tooling is not as robust as at least what I experienced for TPU Infra back at, at, at Google. So I think that's mainly the first thing is the robustness of the the, the training the, the, the, the, the, the accelerators itself, right? And then also even things like FileIO is something that people take for granted. At Google, the file systems, the X Manager box and stuff orchestrators and stuff like that are, like, just so well designed at Google.

[01:12:02] Yi Tay: And then externally, it's a lot of them are just missing. So I think yeah, I, I, yeah, I think that's basically on the training infrasight and yeah, so I think, I think the tooling for like training like large models is not really super like robust externally, like you're, you're, it's not easy to like just pick off something and then like train like.

[01:12:26] Yi Tay: Like a 100 bit model easily without actually making sure your checkpointing is you're, you're, you're resuming your checkpointing, your, your notes failing and stuff like that. I think those are, like, hard, hard stuff things that, that need to be taken care of but at, at, at Google some, some team Does that for you.

[01:12:43] Yi Tay: Yeah, TLDR of the training infrastructure, yeah.

[01:12:48] Swyx: Does Google have the equivalent of Weights and Biases?

[01:12:51] Yi Tay: TensorBoard, I think, yeah.

[01:12:53] Swyx: Oh yeah, yeah, yeah, of course.

[01:12:55] Yi Tay: Yeah yeah, yeah, yeah yeah.

[01:12:58] Alex Volkov: So

[01:12:58] Alex Volkov: we don't work with Google yet, but hopefully if if folks at Google are listening to us and you want to use kind of Weights Biases, definitely reach out. But at least you guys, now that you're out of Google, you definitely can. You want to follow up with Swyx, or are you,

[01:13:10] Swyx: are you Oh,

[01:13:10] Swyx: I don't know. Did you guys talk about Ricoh Core already?

[01:13:13] Alex Volkov: Yeah, so I think, Yi, there's not a lot of stuff that you can say about the bigger model that you guys have, but give us a little teaser live for a few folks here on stage, like what can we expect from the bigger model, maybe when, what can you tell us?

[01:13:28] Yi Tay: So the bigger model, okay, so I can just say that we, we ourselves are quite impressed by the results and it's if, if if you try to extrapolate from our 7 and 21 based on relative to other models of the scale you can. Try to imagine like what the type of metrics look like, right? But I think we are, we ourselves are, ourselves, we are quite impressed by, by the, the, the, the, the metrics.

[01:13:49] Yi Tay: So like we are I think that's all we can say. I think in the polls, we say that coming out in coming weeks is around that ballpark. It's not like next week, the kind of thing. It's also not like one, two weeks. It's probably like a couple of weeks. But we still, we also kind of like a bit tired after the release.

[01:14:05] Yi Tay: Take

[01:14:05] Yi Tay: a few days light break and then start working again, that kind of thing. So Yeah. I think that that's, that's basically what I can say, but it's, I, we are, we are very happy in the model and as well, yeah.

[01:14:17] Alex Volkov: All right, so we're excited to see this. I want to flip back to Max just for a second. Max as we just talked covered, there's some stuff that I use that you guys are watching. Oh, find somebody test this out. When folks interact with your demo, first of all, I'll just say, definitely folks should do the thumbs up, thumbs down, and reply, so you guys will get some nice RLHF.

[01:14:35] Alex Volkov: What other venues of giving you guys feedback would folks can go? Is there a Discord you want to call out, or anything else you want to add to this as we move on?

[01:14:44] Max Bain: Yeah, thanks guys. We, we actually have a discord channel and if people post, use cases where maybe our model is doing well, or could do better, you can post that, or maybe there's something you're not happy with the current models, like GPT 4V also. And like, I guess, cause we're [01:15:00] such a small team in an early stage, like we'd.

[01:15:02] Max Bain: We're taking a lot of that on board and yeah if you can point any of that stuff, if you have stuff in more detail, you can put that on the Discord and yeah, we're like, really happy for any feedback,

[01:15:10] Alex Volkov: awesome. Are you guys distributed, by the way? Are you working co located? Like, where's, where's RECA located?

[01:15:16] Max Bain: Like, all over the globe, yeah, So he's in Singapore, I'm, like London, sometimes the West Coast, but yeah, it's like a remote first

[01:15:23] Max Bain: company.

[01:15:25] Max Bain: and also, yeah, sorry. Another thing is if we have, do you have job posting? So if you guys would Yeah, like the sound of record, you can also apply to join. We have yeah, quite a few

[01:15:35] Max Bain: positions open.

[01:15:42] Alex Volkov: friends of the pod from now on. E, anything else you wanna, you wanna add as, as we finish up and then move to the next

[01:15:49] Yi Tay: No, thanks. Yeah, really thanks for inviting. It's really nice chatting with you. And yeah, it's been great. Yeah.

[01:15:56] Alex Volkov: I'm, I was, like, like I said, I was blown away by the performance of the multimodality. I was blown away by the tonality understanding, which I've never experienced in any model so far. I heard that it's possible and I saw some technical stuff. I never experienced this on something like my videos as well.

[01:16:11] Alex Volkov: Definitely folks should play around with, with the demo. I'll add this in the show notes and follow Yi and Reka and, oh yeah, one last thing Yi, before you go. What's the meaning of Reka? I know this is a word in Hebrew that I know, but what's, what's the meaning of this word? Like, where, where did this come from?

[01:16:24] Alex Volkov: I was really curious.

[01:16:26] Yi Tay: I think one of the meanings, it's not official, it's not canon, but like one of the meaning it comes from Reka in Eureka, like Eureka, like the Reka

[01:16:35] Yi Tay: in Eureka, but it's not Okay, this is not canon, it's just one of the interpretations of that but it's a bit reverse engineered where people ask us, we just, this is what we say, but that's actually I think that that's it's not really like canon, yeah.

[01:16:49] Alex Volkov: Awesome. Thank you guys for joining and folks, definitely should go check out the demo. And I think the tradition continues because now we have we're moving on to the diffusion area and we have the, the, the, the awesome, the awesome chance to have Dome here. And we. Just released, or I guess we saw this week, a new release from Stable Diffusion called Stable Cascade.

[01:17:09] Alex Volkov: And Dom, I reacted to Imad's tweet about this hey Imad, you want to come to ThursdAI? And he said, Dom, and I think did you say Rodrigo was the other guy? Are the real heroes. And I want to welcome Dom to the stage. Dom, welcome. Feel free to unmute yourself, give a brief introduction. Let's talk about, let's talk about Stable Cascade. .

[01:17:25] Dome: So yeah, my, my name's Dom. I joined stability a couple, actually a couple of months only ago. And I'm currently enrolled in, in Germany in a in a degree. I'm currently finishing that up and I've met Pablo more than a year ago. And ever since that we started working on, generative models, mostly in vision. So image modality and also slowly moving into video stuff. And yeah, at some point, so pretty early, we already connected to stability via Lyon. And at some point they liked what we were doing and liked the progress of how the paper that we called Verstehen was going, which is German and means sausage.

[01:18:09] Dome: I can tell more about that

[01:18:10] Alex Volkov: Oh, that's what it means! Okay.

[01:18:13] Dome: yeah, yeah, yeah. And yeah, so then we joined, we joined and we joined the apply team and we were able to, to work on the third version of it which in the end then was called Stable Cascade, just to make it fit in more, not to confuse people where that name comes from, what's this third version about.

[01:18:31] Dome: And yeah.

[01:18:34] Dome: That's bad.

[01:18:34] Alex Volkov: Awesome. So let's, let's say hi to Pablo as well. Welcome, Pablo. Feel free to unmute yourself. Brief intro from you as well. And let's talk about what makes Cascade different than SDXL or even the V2.

[01:18:45] Pablo: Hey, hi, Alex. A bit about myself. I am a machine learning researcher. I used to work before working at Stability. I used to work at Disney. So I was able to bring a lot of interesting ideas from there. And then I, yeah, I joined Dom and we have been working on very cool things since, since I met him.

[01:19:03] Pablo: And the latest is, is our new stable cascade.

[01:19:08] Alex Volkov: That's awesome. Let's talk about Stable Cascade. I've been able to test this out, and the things I was able to, the things that blew me away were, like, speed, inference speed as well, but also the base model already has hands built in, and they're fine. You guys said you're working with Worshen for a couple iterations, and this became Stable Cascade?

[01:19:26] Alex Volkov: Like, where talk to me about the history, and why is it so good, and so fast?

[01:19:30] Dome: Okay. Yeah. Yeah. So basically the, the biggest difference, and I think that's what it boils down eventually is the, the, the space or the dimension where stuff is generated for, for the text conditional part and for Stable Diffusion XL is, that they have this thing called the VAE, which takes images and just compresses it down to a smaller space.

[01:19:53] Dome: And the only reason to do that is. Just that you work at a smaller resolution, which then gives you faster training and faster inference. Imagine training or generating stuff at a pixel resolution of 1024, so one megapixel. This will be a lot slower than if you try to do the same, try to trying the same model at what, 32 by 32, for example.

[01:20:15] Dome: So the idea is you still want to have high, high quality, high resolution images, but you don't want to generate at that very high pixel space. So you just try to find something, how you can compress it even further. And up, up until now, people always use VAEs, VQGANs, normal autoencoders and so on but they reach limits very early on.

[01:20:34] Dome: So you can get to an spatial compression of eight. So Pablo had this incredible idea of using it. diffusion model to increase that compression, basically, and long story short by using a diffusion model on top of a normal VAE, or you could also leave the VAE away and just start at pixel space, you can achieve much, much higher compressions because you have the diffusion model that can iteratively at first at the lower frequency, so the, the the rough details, and then later on at the high frequency.

[01:21:04] Dome: So at all the details. And so it has just a lot more space to reconstruct an image. And with that it's possible to, to compress images a lot further. And the version that we have now achieves a compression of 42. And that makes a huge difference in terms of training and inference time. And That's probably what you saw because then

[01:21:24] Dome: the big model, the 3.

[01:21:26] Dome: 6 billion, which is. quite big for images. So stable diffusion XL is 2. 2 billion. We're not in the, in the large language models. So yeah, this makes it just a lot faster. And then you have this diffusion decoder, which works at at a higher resolution, but needs a lot less steps and combining this just gives results in making the model very fast.

[01:21:49] Alex Volkov: That's super cool. I want to switch back to Pablo just real quick. So I'm looking at this graph for inference speed, but also checked out some of the examples. One thing that I noticed is the real time rendering basically of how the model kind of searches through the diffusion space. And the last step just like kicks into like super high resolution.

[01:22:09] Alex Volkov: Pablo, what can you tell us from some exciting or maybe surprising results that you've seen or people using it and Yeah, feel free to speak about your cool model a little bit more.

[01:22:18] Pablo: Yeah, I actually I have been really surprised on how well this model could, could could be. We, we, we're not expecting it to be as good as it is. We started this more as an like a, an experimental idea of trying to achieve the same quality of existing models but focusing on, on speed on performance.

[01:22:39] Pablo: But then somehow we ended up with a model that was like very competitive and yeah, I don't know. I think this last step as, as you mentioned, is the the, the upsampling stage. Which is this diffusion model that Dominic mentioned that can bring the image from 24 by 24 latent to a one megapixel.

[01:23:00] Pablo: And that's why you see this like very big difference between the previous to last and the last step.

[01:23:06] Alex Volkov: Yeah, the last step is poof, high quality. I love it.

[01:23:11] Dome: Yeah, we, we, yeah, we, we actually provided a previewer. So when we work in this very highly compressed latent space, In order to be able [01:23:20] to see what the model is doing, we have this very tiny convolutional model that can preview what's going on. That's what you're seeing, which looks pretty blurry. And then yeah, the final step does that.

[01:23:33] Dome: And yeah, why the model can make We're also pretty surprised. The, the big

[01:23:41] Alex Volkov: Text is also very impressive. I think let's not skip over this. The out of the box text. is so good. Compared to, let's say, the Stable Diffusion 1. 4, which it released was, which was bigger, right? I think it was like five gigabytes or something. This is just miles, miles, miles better. And the text out of the box, hands out of the box is very impressive.

[01:23:59] Alex Volkov: Text is super cool as well. Very surprising. Yeah, go ahead, please.

[01:24:02] Pablo: The, the, the biggest difference compared to V2, which was our previous iteration of the model was the size of the architecture of the model and the quality of the data, which I think. It shows how important that, that is, and I think probably, since, since our model is able to work on this very, very highly compressed space, it can learn much more efficiently if, if it has good data, it can learn much more efficiently these, these kind of things.

[01:24:30] Pablo: Maybe it learns them faster than other models which is why Yeah, we're able to have this kind of results.

[01:24:39] Alex Volkov: Awesome. Thank you guys for coming up. I really wanted to make sure that, yeah, you guys get the recognition because like really, really cool. This is under the stability membership, right? This is not like fully, fully open source, but folks are going to be able to use this model for, for their stuff and maybe keep training.

[01:24:55] Alex Volkov: Does it support all of the, the, the fine tuning and the LoRa ecosystem as well?

[01:24:59] Pablo: Yeah, one detail, it's not yet on the the subscription. It's still for only for research but it, it will change probably in, in the following weeks, you asked about the Loras and Control Nets. Yeah, we

[01:25:13] Pablo: we

[01:25:13] Pablo: we we made sure to provide some example code for training Loras, Control Nets, and the full, full fine tunings on, on our repository. We also provide some pre trained Control Nets for in painting, for canny edges for super resolution, which is not the best super resolution model out there, but it's, it's interesting enough to, to share with the community, and we provided Tiny Laura with Dom's dog which is, it's pretty and,

[01:25:44] Alex Volkov: Nice.

[01:25:45] Dome: yeah, and I think that's it for now, that, that's

[01:25:48] Yi Tay: all the

[01:25:49] Alex Volkov: Awesome. Thank you for joining and folks, definitely give Dom and Pablo a follow. Folks, really great shout out for building this and releasing this from Stability and it looks really good and I'm sure the community will adopt this. I've already seen a bunch of AI artists in my, in my kind of field.

[01:26:02] Alex Volkov: field are getting very excited about the possibilities here. Thank you for your work and thank you for coming for Thursday. I please feel free to stay because we're going to cover a bunch of other stuff as well, like super quick. Meanwhile, I just want to do a quick reset. It's been an hour and let's say 35 minutes since we're here.

[01:26:20] Alex Volkov: If you're just joining us, you're on the Thursday I X space, which is live recording for the Thursday I podcast and newsletter. I'm your host,

[01:26:28] Alex Volkov: Alex Volkov, I'm here joined by a co host, Nisten is here on stage, Yamil Spokin, and we have Swyx here, who dropped off the stage, but he's in the microphone, and I will move towards a corner that I have, and then

[01:26:40] This weeks Buzz

[01:26:40] Alex Volkov: I have a surprise for Swyx I'm moving towards a corner that I have usually, which is called This Week's Buzz, where I talk about the stuff that we have, or I learn in Weights Biases every week, so if you are subscribed to the newsletter, you definitely already know this, I just learn as I go and talk about this.

[01:26:55] Alex Volkov: If you're not subscribed to the newsletter, Why not? I guess you'll be up to date with everything that happens in the world of AI. So definitely check out thursdai. news. This is the URL, HTTPS, thursdai. news. And this week's buzz is all about this new course that we released with Hamil Hussain about putting models in production.

[01:27:13] Alex Volkov: I think I've spoken about this before. Weights Biases has an academy. We release courses and the courses are free for you. There's a bunch of knowledge. The last one we've talked about was the, with Jason Liu about the instructor. And we also have Hamel Hussain who released a course about model management and in production as well.

[01:27:29] Alex Volkov: And this is definitely A very illuminating one, including how to use weights and biases for the, like the best companies do, OpenAI does, and like Microsoft and Meta, and hopefully we'll get Google at some point. Definitely, of course, it's worth checking out and signing up for. This will be in the show notes as well, and I'll post the link as well here.

[01:27:47] Interview with Swyx from Latent Space

[01:27:47] Alex Volkov: And now I'm gonna Actually yeah, Swyx is now back on stage, and here's my surprise, if you guys follow and Swyx's voice, you know that he's a co host of Latentspace together with Alessio, and we're now sitting in the Latentspace pod studio, which looks incredible the surprise is, I don't remember you being on the other side of the mic, so this is like a surprise interview with Alex and Swyx, but you're gonna be a guest and not a host, and I just wanted to hear about some stuff that you guys are doing, and how Latentspace is going, like all these things.

[01:28:14] Alex Volkov: So this turns from ThursdAI into ThursdAI, like deep dive interview, just a brief

[01:28:18] Alex Volkov: one.

[01:28:19] Alex Volkov: I figured I'd use the opportunity to give you a surprise. This was not staged. Swix told me he may not be able to even join. 'cause you just flew back from

[01:28:26] Swyx: Singapore. Singapore, yeah. Yeah.

[01:28:27] Swyx: Yeah.

[01:28:28] Swyx: Cool, okay,

[01:28:29] Alex Volkov: So as,

[01:28:30] Swyx: I feel like we talk so much and you've been a guest on our pod like five times, so

[01:28:35] Alex Volkov: and

[01:28:36] Alex Volkov: I, I would wanna start with how you would introduce yourself to the audience that doesn't know you.

[01:28:41] Swyx: you so I'm Swyx, I mostly work on developer tooling, and, and, mostly known as the editor or podcaster of Latent Space, which has done pretty well.

[01:28:51] Swyx: I think we're celebrating our first year anniversary pretty soon. And on the the other half of my life is I'm working on small AI and AI Engineer Conference, which we just, which we just announced for June 25th to 27th. Yeah.

[01:29:05] Alex Volkov: Yeah. You've had quite a long career in DX as well. I think Netlify, you had a stint in

[01:29:09] Swyx: Netlify

[01:29:09] Swyx: Yeah, I was one of their earliest employees slash dev rel of Netlify. That's where a lot of people know me. That's where I became quote unquote famous in developer tooling and in React specifically. Because I did a lot of content on React and serverless speaking and writing. And then I've been head of developer experience for Temporal, Airbyte, and then also spent a year at AWS working on the same thing.

[01:29:34] Alex Volkov: Hmm. Awesome. I also from that kind of that side of your career, you work with the Chroma guys as well.

[01:29:40] Alex Volkov: And Chroma

[01:29:41] Alex Volkov: just announced that they have been a year around and looked like millions of companies that probably had

[01:29:48] Alex Volkov: something to do with that. So shout out Jeff. And and, I'm blanking out on the

[01:29:53] Swyx: name, Anton. Yeah, yeah. I so I consulted for them on their DevRel when they were doing their, their first hackathon a year ago, actually. And yeah, I

[01:30:03] Alex Volkov: think

[01:30:04] Swyx: It seems like they are the leaders in open source vector databases. Retool, we did a chat or interview with David Hsu, the founder of Retool, and Retool did a state of AI survey among their customers what they're using.

[01:30:18] Swyx: And Chroma was, like, up and to the right in terms of the adoption and the NPS score, which I think NPS is actually a very important metric to keep tracking. Yeah. Really, really cool. Glad to be involved with Chroma.

[01:30:30] Alex Volkov: Glad to be involved with Chroma. You've been also prolific in writing, like I know many people go to your blogs and like the stuff that you have, how many publications in total are you like, publishing your content in right now?

[01:30:46] Alex Volkov: You have your own personal

[01:30:47] Swyx: one, Yeah, I have three blogs. Three blogs. But Latentspace is the currently primary active blog. I have a personal one and then I have a developer tools advising one because I do a bunch of angel investing and advising for people.

[01:31:01] Swyx: And I don't know. I think More people should blog! It helps you think through what you think that and share your knowledge with other people.

[01:31:10] Swyx: And also, actually the most valuable thing is the most embarrassing thing, which is when you get things wrong. People will come out and correct you, and you will be embarrassed for a second, but then you'll remember the lesson forever.

[01:31:21] Alex Volkov: Can you give me an example of something that you went wrong and people corrected you, and then this improved your thinking?

[01:31:28] Swyx: improved thinking?

[01:31:31] Swyx: Yesterday or into coming into today, right? Because I do a monthly recap where I think what ThursdAI does is [01:31:40] recap news every week and then other people like NLW from the breakdown recaps news every day. And I think the lower frequency granularity of a month means that I only get to do 12 of these a

[01:31:53] Alex Volkov: year.

[01:31:54] Swyx: And that. forces me to think through okay, what is really actually important when you step back and think about it. And for my January recap, January was a slow month, to be honest. Today was more news than January. So I was like, I was trying to recap January, and I was like, okay nothing super interesting this month.

[01:32:11] Swyx: What Do we, if we step back, it's important for AI progress. And I listed a bunch of things, long inference and all that. One thing I specifically said was not interesting for state of the art models was long context.

[01:32:26] Alex Volkov: was, long context. It

[01:32:28] Swyx: I said that yesterday. It's published, I sent it out to 35, 000 people, including Satya Nadella, Drew Houston, and all the people who read the newsletter.

[01:32:36] Alex Volkov: Satya doesn't read, he also participates, like he clicks on

[01:32:39] Swyx: links,

[01:32:39] Swyx: Yeah.

[01:32:40] Alex Volkov: there's an engagement, active engagement from Satya from Lydian Space.

[01:32:43] Swyx: so it's, so it's embarrassing, but also it just forces me to think about okay, how much do I really believe in million token and ten million token context? And I know now, today I learned that Nat Friedman strongly disagrees.

[01:32:58] Swyx: And that's good. That's, that's useful to update. And Google, of course. Yeah, yeah. I think It's, it's a, basically, so it's not about that specific point because we can always debate the pros and cons of that, but the act of writing down what you believe and taking strong opinions instead of saying that everything is awesome, instead of celebrating every little bit of progress as equally important, you have to rank them, and being wrong in your rankings gives you information to update your rankings, and if you don't give yourself the chance to be wrong, then you don't really learn.

[01:33:36] Alex Volkov: You

[01:33:37] Alex Volkov: publish a bunch of stuff. Some of the stuff that you publish turns into more than just an article. You have essays, and I think that the one essay that I remember specifically, obviously, is about the AI engineer essay. Talk to me about thinking about how you approach writing this. Is that stuff that you saw?

[01:33:51] Alex Volkov: And I think as background for folks who are not familiar with you and where you are in, in, you're sitting in the middle of the arena that you helped also coin in San Francisco, right? We're in the middle of Soma Mission, Hayes Valley, somewhere there, if I'm not confusing. We're in this space it's called Newton that you're also like I think you're plugging in latent space where Tons of companies that we know from the Twittersphere are just literally behind us here.

[01:34:15] Alex Volkov: There's Tab with Avi and Julius with Rahul like like a bunch of other companies like sitting right here building like very cool things and And this is an example of one of those so actually I think it was very natural to put those kind of hubs within the bigger bubble of San Francisco. And you, as far as I'm concerned, it was very plugged in to this even before coming to AiEngineer, right?

[01:34:34] Alex Volkov: And potentially, this is the reason why the engineer the conference had so many amazing speakers on stage because very I think you told me back then a lot of like personal favors were pulled to get some folks to show up on that on that. And As somebody who's an outsider from Denver, what I said, right?

[01:34:48] Alex Volkov: This is, this is incredible to see, but also it's very hard to penetrate and understand like what's going on and where the trends are. And this is part of the reason for ThursdAI. So you're sitting in the middle of this, you have all these connections, you said you're an angel investor as well. How does this shape your thinking about the AI engineer?

[01:35:02] Alex Volkov: Do these old people talk in like the hackathons? How do you draw to create something like this that's fairly seminal that now people are considering themselves AI

[01:35:11] Swyx: engine. Okay. Oh. Okay. So there's, there's two questions here.

[01:35:15] Swyx: If I can do rag on your questions. Yeah, please. Which is that one, how do you write impactful perspectives or come up with interesting ideas that will stick around? And two, how do you make sense of San Francisco? Especially as an outsider. And people, I think people can hear in my voice that I'm not American.

[01:35:34] Swyx: I'm Singaporean. And the last seven years of my developer career, I did not spend in San Francisco. I only moved here in April of last year. You don't have to be an SF to have a background in tech. Oh, I think the other the other thing I should offer as context is that I, I have been blogging for quite a bit.

[01:35:57] Swyx: I often say that you have to blog 50 times a year, but in order to get like one post a year that it, that makes up the entire year, it's the one that people know you for. So this is my sort of fourth or fifth Quote, unquote, industry defining blog posts. So I, I've done this for serverless, runtimes and cloud orchestration and AWS, so I've done this before and I knew the work that goes into writing something like this. Rise of the AI Engineer took two months. I had a few potential collaborators

[01:36:35] Swyx: who ultimately did not co author but were heavily involved.

[01:36:43] Swyx: And I can talk about the writing of the post, but the main inspiration is trying to figure out what is important directions.

[01:36:48] Swyx: And it is not purely about coining a term, which I think is a very vanity metric, but it is about picking directions in terms of identifying what is wrong about the zeitgeist. At if you rewind this time one year ago, people were very much focusing on prompt engineering. People were worried about the end of jobs for AI, for, for engineers, for software engineers.

[01:37:13] Swyx: And I think both have been proven wrong in terms of the scope of the prompt engineer. Now, like now you're no longer really here about. Professional prompt engineers, because it's been replaced by the AI engineer who can code. And I think the importance of the ability to code to wield AI makes you a thousand times more effective than people who use AI without the ability to code.

[01:37:37] Swyx: And I think identifying this core difference in ability, understanding that this stack is starting pretty thin and small, but it's going to grow over time, understanding that it is fundamentally very different from the ML engineer stack is a part of the mix that made me convinced that AI engineer would be a category to invest in which is why I started the conference and then pivoted the newsletter and podcast.

[01:38:04] Alex Volkov: Yeah, so let's talk about that as well. So definitely the audience that ThursdAI draws, at least in part, is AI engineers, but also in part, like folks who are trained in Finetune models. And I've noticed like a little bit of a AI engineering is almost like the gateway drug into the larger AI stuff, because at least the folks that I'm familiar with, the folks who are like JSTS devs, that did the Netlify stint, that did React, etc.,

[01:38:27] Alex Volkov: they started to build with these tools. The tools are like significantly easier to get into than ML, than traditional ML. You just do some API calls open AI exposes a bunch of stuff, and suddenly you're like, oh, okay. I have, I've tapped all this power, this incredible power. I'm building intuitions about how to use this power.

[01:38:42] Alex Volkov: I'm building intuitions, how to put this power in production for my users. They tell me some feedback. How do I do more of this? Am I only limited to open ai? Or maybe I can go to the open source. Try some stuff like this. Maybe I can do Olama, which, by the way, shout out to Olama, our friends, just released the Windows thing.

[01:38:56] Alex Volkov: Maybe I can do this like locally on device. Maybe you can do this on Edge, on Cloudflare, for example. All these new tools are popping up, and these people are sounding like from a very limited scope of API users, are growing into API users who also have an intuition about prompting is just one of those things, embedding in RAG and better RAG systems, like we've seen some folks going there.

[01:39:14] Alex Volkov: Definitely the scope grows, and as every category, like frontend was a very tiny scope, JavaScript, HTML, and the client, and suddenly like it became a full stack, you have prompt and like frontend, ops, and like all of these like things. So scope grows.

[01:39:30] Alex Volkov: Where do people learn about this new and upcoming thing?

[01:39:32] Alex Volkov: And I think like the conference is one such way. So we've talked about the conference. This is actually not your first time. I just remembering I interviewed you after the conference for a full hour that we had a full conversation. It wasn't about Swyx. So how was the conference after the conference received?

[01:39:46] Alex Volkov: How did your direction into thinking about latent space and kind of exposing AI in San Francisco to the world? And let's take this to the kind of the next conference where you want to take us. What happened to the AI engineer?

[01:39:59] Alex Volkov: I think I asked

[01:39:59] Swyx: three

[01:39:59] Swyx: or

[01:39:59] Swyx: four. [01:40:00] Yeah, I know.

[01:40:00] Alex Volkov: Break them down however you want.

[01:40:02] Swyx: So the conference was really good, but I would actually classify that as the end of a process rather than the start of a process. It basically recaps

[01:40:10] Swyx: the work

[01:40:11] Swyx: that people are doing in the industry over the past year.

[01:40:14] Swyx: And then, I get to curate and pick and invite people to present, the best of their work and their thought. And I think that's a very privileged position. And then for me, The work begins after the conference for the next the next thing. And I picking directions and having so last year was like a single track conference, this year for World's Fair we're doing nine

[01:40:36] Alex Volkov: When is that, just for the

[01:40:38] Swyx: June 25th to 27th. Yeah.

[01:40:40] Alex Volkov: make sure you sign up.

[01:40:41] Alex Volkov: It's gonna

[01:40:42] Swyx: yeah, yeah. We're going four times bigger this year, 2, 000 people, and last year, 17, 000 people tuned in on the livestream, and hopefully we'll have, we'll have more impact this year. But yeah I think For me, actually, it's a really good way to think about okay, who do people want to hear from, who actually did impactful work that I will be proud to showcase 10 years from now.

[01:41:04] Swyx: I'm always thinking about the test of time. And I was very inspired by NeurIPS, where they actually had a test of time award. And I was like,

[01:41:10] Alex Volkov: man, that's Did Jeremy Howard get it or something, if I remember

[01:41:13] Alex Volkov: correctly?

[01:41:13] Alex Volkov: No, Jeff Dean. Jeff Dean.

[01:41:14] Swyx: Jeff Dean. Yeah.

[01:41:16] Alex Volkov: Shoutout Jeff Dean for today, by the way.

[01:41:17] Swyx: Yeah, yeah, for Word2Vec. I, I always said some people are speculating what is Test of Time for next year, and it was like Ilyas Oskarver, if he ever shows his face

[01:41:25] Swyx: again.

[01:41:26] Swyx: And then I was like, but I know what's gonna win the Test of Time for 2027. Which is attention is all you need.

[01:41:32] Swyx: Yeah, yeah. But basically it's a flex for any, any conference to say okay, Test of Time award goes to something that was presented here 10 years ago. And that and Neuros has been going on for 37 years.

[01:41:46] Alex Volkov: what of the AI engineer presentations would stand the test of

[01:41:50] Swyx: question. I think the audience has voted. It looks like Pydantic and Jason Liu's Instructure is very, very, very, very popular. And I think he's just fundamentally correct that every model, instead of there's like some table six versions of every model. You have the base model when you train it, then you have the chat tune model.

[01:42:07] Swyx: And now I think it's going to be table stakes that every model should have structured output or function calling as, as they call it. And it's even useful if you're not actually using it to, to generate code or call code because it's very good for chain of thought. And so Max Wolf mini maxer on Twitter and on Hacker News actually wrote a really influential post that I'm going to try to showcase.

[01:42:27] Swyx: Yeah, for me as a conference curator that's what I do. Read a lot of stuff and then I try to try to feature like the best of things and also try to make bets that are important. I do think as content creators, like we're like the end of the food chain and not the value chain.

[01:42:45] Swyx: And it's always important to understand like even stuff that we don't pick is very important and substantial and it's

[01:42:53] Swyx: You're, you're picking for an audience to use at work, which is a small subset of the total progress that humanity can make.

[01:43:01] Alex Volkov: Interesting, interesting. Tell

[01:43:02] Alex Volkov: me

[01:43:03] Swyx: I just people, you want to engage in philosophical conversation, you go to Lex Friedman or Dorkesh Patel.

[01:43:11] Swyx: And then if you want to Think, talk about things that you can use in open source. You go to Thursday, ai. And then we have less of an open source focus. We are, we're very much focused on enterprise and things you, things you can use at work to code and to build products and startups with.

[01:43:26] Swyx: And so like I, whatever you do, as, as long as you have a clear focus for the, of the audience that you serve and you know how to reach them, then they will love you because you are, you're making literally the thing for them. And you don't have to appeal to everyone. And I think that's fine.

[01:43:40] Alex Volkov: switching gears from the kind of the conference.

[01:43:43] Alex Volkov: How did the podcast came about? It's you said you're coming up on the year

[01:43:46] Alex Volkov: of

[01:43:46] Alex Volkov: the

[01:43:46] Alex Volkov: podcast. And you also said you moved here in April. I did not know this.

[01:43:49] Alex Volkov: I

[01:43:49] Alex Volkov: thought you're here for SF Native. So how did the podcast came about? How you and Alessia met? Let's talk about

[01:43:54] Swyx: later. Yeah. And we should talk about doing well in San Francisco and like the taxi in, in Ingra, I think, which I, which I think is important and something I'm.

[01:44:01] Swyx: going through but have also done well at. So the podcast specifically was because I started the newsletter writing opinion pieces on just AI stuff. It was actually inspired by Stable Diffusion at the time which was sort of August 2022 ish.

[01:44:16] Alex Volkov: My life changed after that open sourcing.

[01:44:19] Swyx: Yeah, and then you you really run out of opinions very

[01:44:22] Alex Volkov: and

[01:44:24] Swyx: and then you're like, oh, I need to generate unique or new tokens.

[01:44:29] Swyx: The only way to do that is to get source material by interviewing people and putting a microphone in front of them. When you put microphones in front of people, they get more chatty. And sometimes they break news. For us, the big breakthrough was George Hotz when he talked about GPT 4 with being a mixture of experts.

[01:44:44] Swyx: Yeah, that was, that was a surprise, but he likes to do that sort of thing, just drop random alpha.

[01:44:49] Alex Volkov: he dropped it and then you guys posted it and then I had no idea what Mixture of Experts is as well as like most of us and then it turns out to be like a true and now we we

[01:44:59] Swyx: saw it. Now Gemini is

[01:44:59] Alex Volkov: Gemini's Mixture of Experts the 1.

[01:45:01] Alex Volkov: 5 which is quite incredible so that was like a big thing did was this like natural to you to start turning on the microphone did you have to do an

[01:45:08] Alex Volkov: adjustment period

[01:45:09] Swyx: another thing that people don't know is that I started four podcasts before.

[01:45:13] Swyx: So I'm not new to the conversation game, and I'm not new to like audacity and like editing and publishing, but I think, Having taken a few runs at it helps to prep you for, like, when something actually has audience fit.

[01:45:26] Swyx: Because all the others were very small. There were maybe like a few hundred listeners each time. This one went to number 10 on the U. S. tech charts.

[01:45:33] Alex Volkov: Yes, I saw that. That was incredible. Is that the top, top,

[01:45:36] Swyx: I think that's the highest it's been. Recently when it was like as high as 16 over the holidays, and then now it's dropped back down again. It's very, very volatile.

[01:45:44] Alex Volkov: But it's like very clear that you're in the top 50 like tech podcasts in the world, even though AI is Fairly niche. And the topics you discuss are fairly technical.

[01:45:52] Alex Volkov: Like when you talk with folks, it's not a general appeal audience for like Sweden does, or the, the guys from the four guys, the VCs, right? It's very technical. So very impressive that like you broke the top 50 charts and it wasn't by chance you bring like great guests. Like, how do you, is the same approach that you have for the engineer you do for guests as well?

[01:46:13] Alex Volkov: Or are you now getting like requests to come on the podcast from some other

[01:46:15] Swyx: We get requests but you usually, for the, the people that draw the audiences, you have to go reach out to them. It's obviously, that's how it is. I

[01:46:24] Alex Volkov: I heard one such person now does not work in OpenAI, so he can

[01:46:28] Alex Volkov: potentially, potentially join

[01:46:29] Alex Volkov: podcasts as

[01:46:30] Swyx: yeah, he's a, he's a he's a listener and he has said that he'll come on at some point.

[01:46:35] Alex Volkov: We're talking about bad Mephisto for folks in the

[01:46:37] Swyx: Mephisto for Fortunyaga. So yeah,

[01:46:41] Swyx: I don't think it's actually just guests. I think it's also about focus on topics and then being engaged enough with the material that you get to ask questions that no one else asks.

[01:46:51] Swyx: Because, for example, if you have a VC asking questions, they often ask about market and business. But if you're an engineer, you're really asking about API and limitations and trade offs, stuff like that. Things that you don't really get into unless you're, like, actually evaluating it to use something at work.

[01:47:09] Swyx: And I think that's important. And also, I think a lot of guests For us, we try to be like the first podcast that somebody has done. Like we're the first podcast for for Fine, for Cursor, for a bunch of these guys. So they're not experienced speakers. They're not some of them are good speakers.

[01:47:25] Swyx: But they're not experienced at the whole telling their story and all that. So you have to help them. But it doesn't matter because I think that you just try to serve your audience at the end of the day, right? What do people want to know? Ask those questions and then get out of their way and let them talk.

[01:47:38] Swyx: I think that the other thing that we do, the reason I say it's not just GUESS. is because we do special episodes where we have breaking news. We haven't done one in a while because I don't know. I think, I think you got, you have taken that spot of, of the breaking news guy. We

[01:47:50] Alex Volkov: got

[01:47:51] Alex Volkov: the, we got three breaking news, you were here. This is kind of like, that as

[01:47:54] Swyx: that as well. And then we also do like events recaps. Like we did Dev Day we did NeurIPS and that is like a really big sort of editing process work that I really like to do where you're basically performing the work of summarization and Curation, instead of doing long form interviews, and people really like that.

[01:48:13] Alex Volkov: summarization part, like the multiple folks, I think I participated in one, you did one in DevDay NeurIPS as well. So what's, what's [01:48:20] now that we're coming up on an annual kind of thing for, for Latentspace, what's next for Latentspace?

[01:48:24] Swyx: More conversations? That's the weird thing we think that we've done and have done as well as a technical podcast can do in the general podcasting space.

[01:48:36] Swyx: The ultimate number of people who listen to podcasts is still very low. compared to the general audience that might be interested in the same kind of content. That's why I branch out into a conference where you produce talks and very highly polished and all that. We The way to grow a podcast is to not just podcast it's to actually write, where, my essays still get a lot more readers than listeners than to grow on YouTube or whatever, and that's fine.

[01:49:05] Swyx: I think ultimately, podcasting is a mix of entertainment and Education, right? You have to be attached to some kind of story, some kind of personality, and, and then learn something along the way that might be useful at work. So I think personally, I growing as a podcaster is about just growing your influence or understanding of an industry in general and the ability to serve an audience.

[01:49:29] Swyx: And then maybe opening up as hosts and as industry experts as we gain knowledge and understanding. So that people come to us not just for access to guests, but access to us as well, which people have when we did the end of year listener survey people actually requested for us to have more mic time.

[01:49:47] Swyx: Alessio and I did our first just the two of us conversation in a year and that was really good.

[01:49:52] Alex Volkov: Wow. So are you playing more, more of those?

[01:49:54] Swyx: Yeah, yeah, we, so we used to do these one on one episodes where we do Introductions to a topic, like we did Datasets 101, Benchmarks 101, and we did Transformer Math 101, and then we also did RLHF 201.

[01:50:07] Swyx: And so we want to do more of those, where it's like it's like inspired by Acquired FM. And the work for this kind of episode is so different than a normal chat, because in normal chat you just sit down and you, you, maybe you prep a bit, a bit of question, you, you research the other guy's background, and then you just have a nice conversation, and that's it.

[01:50:23] Swyx: Whereas for a content heavy episode like that one, you do

[01:50:27] Swyx: a

[01:50:27] Swyx: week of research. And you compile a whole bunch of stuff, and you simmer it in your mind, and then you try to rehash it and introduce it for an audience who hasn't done that amount of work. Yeah, that, that is a lot more work up front, but obviously it's very high value, and, and also I, I like to call it evergreen.

[01:50:43] Swyx: Evergreen content, meaning, like You want to build up something that will still be useful and relevant in a year.

[01:50:48] Alex Volkov: Yeah. So definitely let me, let me just take a personal position here with Latentspace.

[01:50:53] Alex Volkov: I've been a guest host, in Latentspace a couple of times, in special episodes as well. I, now this, this studio is like super cool, like a home away from home. They're able to come here to the spaces, Alessio on Tap into the AI scene in San Francisco. And I've learned a bunch from just the way you render.

[01:51:11] Alex Volkov: Latentspace, for folks who are listening, is not only just a podcast. If you're subscribing on just your Spotify or Apple News, you're missing a big part of it, which is the newsletter that you send, which has a bunch of links and show notes and folks that you talk

[01:51:23] Swyx: about.

[01:51:23] Swyx: There's one more part. Discord.

[01:51:26] Alex Volkov: Oh, there's also Discord.

[01:51:27] Alex Volkov: You do paper readings as well, right? There's a whole community that you're building.

[01:51:30] Swyx: community The Discord is surprisingly good. For the zero effort that I put into it, people just show up, and then they ask really very good questions, they drop things that I don't know, and then I learn from the Discord, and then I talk about it later. But, yeah, Discord has a lot of alpha.

[01:51:47] Swyx: And it's surprising because I have this newsletter that, I have this bot, That summarizes all the top AI discords, right? Obviously the top ones are, like, Eleuther, TheBloke what else?

[01:51:55] Swyx: Yeah, mid, mid, yeah, but it's not, that's not very technical. That's mostly just prompting.

[01:52:00] Swyx: Midrani is 8 million members. That's something like 13 percent of total Discord membership. Ha ha ha ha ha. That's freaking crazy. But anyway, so like the Discord is the community attachment to the podcast and the newsletter. And then it's, people interacting with each other, some people getting jobs, some people getting investments, I have founders coming in and VCs there also funding them.

[01:52:22] Swyx: And like I, I really think that every every piece of content is a minimum viable community, right? People gather, they're chatting in the Twitter space comments right now. They're chatting in your newsletter comment section. But if you let people gather together live, whether it's online or in person we also have in person meetups.

[01:52:40] Swyx: I just had one in Singapore. We have one in San Francisco, I think, monthly.

[01:52:45] Swyx: I hope to have it monthly. And then obviously once a year you get people together for a really big conference where like they put out their best work. So I call this community annealing, right? You have cold community, like podcasts are cold.

[01:52:58] Swyx: Newsletters are cold because they're asynchronous. There's not somebody there, you don't expect to respond to the other person. Twitter spaces are warm because they're live and, there's some chance of live feedback. Discords are live, but when you, when you, when they're hot, it's when like everyone is on the same call and you're looking in each other's eyes.

[01:53:16] Swyx: And you're conversing and you're, you're having like a real bond and relationship there. And so like communities need this whole range of like warm and hot and cold. And I try to build that for Dane Space.

[01:53:28] Alex Volkov: So for folks who are just listening on podcasts, you're missing several parts of the space. Newsletter is definitely worth checking out. Latent. space is actually a URL.

[01:53:38] Swyx: And that was donated by a reader. Not donated. Sold to us for cheap.

[01:53:42] Alex Volkov: You can consider this a donation but also the Discord part speaking of work that I think we need to wrap up because like we're after two hours and I want to let you go back to work. I also need to edit this and send this. I also want to check out the stuff that we did. Any last kind of parting things here?

[01:53:56] Alex Volkov: Maybe let's touch briefly or is that a bigger conversation? How to succeed in SF or is that for a later

[01:54:02] Swyx: Oh yeah, yeah, yeah. Oh man. This is such an interesting topic, especially for people who are not in sf, right?

[01:54:06] Swyx: Yeah. I think SF is a group of humans and not a place, and they are mostly available on Twitter. Yeah. But then sometimes they, they often gather in San Francisco and Yes, when you meet them in person. There are some people that are not famous online or not fully consistently candid online that you talk to them in person and you're like, Oh, okay, I fully understand you now and everything that you've done and everything that you're going to do, I understand where you're coming

[01:54:33] Swyx: from.

[01:54:34] Swyx: And to me, that is obviously a very high offer, that's why I moved here. But you don't have to go there directly, right? One of my mentors And the last one that I want to talk about is in career is Andrew Chen, who basically blogs his way into being a general partner at Andrews and Horowitz.

[01:54:49] Swyx: Like he runs one of their top three funds, the consumer fund. And he consistently is Hey, just Put out your best work, learn in public, tweet a lot and instead of going to all these parties, there's always, there's always a party every week in San Francisco

[01:55:03] Alex Volkov: Every day, multiple stacks a day sometimes, yeah.

[01:55:06] Swyx: There was one Thursday last year with 10 AI meetups in San Francisco.

[01:55:09] Alex Volkov: So

[01:55:10] Swyx: can go through the motions of networking, but you still end up with a smaller network than you would if you stayed at home. And you just wrote a lot, or you thought a lot, or you did quality work. And so then you don't have to be in San Francisco to do that. You can just, you can keep doing that online.

[01:55:27] Swyx: And then, take advantage of a big conference or something to come into San Francisco and actually meet people in person. And that's totally fine. I don't intend to stay in San Francisco forever, right? I have, once I know enough people, I can just come here like once a quarter and people will still think that I'm in San Francisco.

[01:55:41] Swyx: And that's fine.

[01:55:41] Alex Volkov: I get this question quite a lot. I've been here, maybe this is the fourth or fifth time for the past six months, and I get this question, do you live here? I was

[01:55:48] Swyx: Yeah. I think, I think people are just like borders. I, I'm, I'm a border disrespector and I think I hope more people do that. But do come into San Francisco every now and then maybe for a big conference that's happening June 25th to 27th.

[01:56:02] Swyx: But otherwise do great work online and people will notice it and find you and chat with you. And the in person component doesn't matter so much as plugging into the mentality and the community online.

[01:56:12] Alex Volkov: Yeah. SWIX, it's been a surprising interview. I didn't plan on this.

[01:56:15] Alex Volkov: I just thought we're here. I haven't heard you in a while. The anniversary of latency is coming up a huge kudos for this effort. Like huge kudos. Big, big, big, big. Thank you for me because a lot of what the stuff that you did, you and Alessio pulled me through. I, I still get like a bunch of listeners for Thursday.

[01:56:30] Alex Volkov: I, from the Latan space work on Substack. And so a huge thanks for me because you, you kinda shaped. what I'm doing as well. The newsletter and the podcast combo that I forced myself to doing every [01:56:40] week. This was, this was based on the Substack stuff from you as well. And I really appreciate your, your friendship as well.

[01:56:45] Alex Volkov: So thank you for coming up on Thursday. I thank you for hosting us in Latentspace. And with that, I think I'll move on to the last piece of what we have on Thursday, iFolks, which is a recap of everything we've talked about. And then I'll just briefly run through recap and I'll let you go to your day. We haven't, let me just start with the music, obviously, because like, how else would this work?

[01:57:02] Alex Volkov: However, with that, I just want to wish you a great Thursday. Thank you for joining us from week to week. I want to thank the co hosts that I had on stage. Thank you, Nisten. Thank you, Jan. Thank you, LDJ. Far El was here. Alignment was here. Thank you. A huge thank you for Swyx, Alessio, and the Latentspace folks for hosting me here.

[01:57:19] Alex Volkov: A shout out to a bunch of friends in Silicon Valley who I'm gonna meet. And with that, we'll see you next week. I'm gonna go and try to somehow summarize this all in the newsletter and podcast for you. And we'll see you folks next week. From San Francisco. This has been Alex Volkov. Cheers, everyone.

[01:57:34] Alex Volkov: Not this one. Bye bye.