ThursdAI - Recaps of the most high signal AI weekly spaces
ThursdAI - The top AI news from the past week
🎙️ThursdAI - LLM Finetuning deep dive, current top OSS LLMs (Platypus 70B, OrctyPus 13B) authors & what to look forward to

Excited to bring you this exclusive two-part interview with the authors and open source contributors of the world's top performing open source models. The episode is free, while the full transcript is for paid subscribers.

Brief outline for your convenience:

[00:00] Introduction by Alex Volkov

[06:00] Discussing the Platypus models and data curation process by Ariel, Cole and Nataniel

[15:00] Merging Platypus with OpenOrca model by Alignment Labs

  • Combining strengths of Platypus and OpenOrca

  • Achieving state-of-the-art 13B model

[40:00] Mixture of Experts (MOE) models explanation by Prateek and Far El

[47:00] Ablation studies on different fine-tuning methods by Teknium

Full transcript is available for our paid subscribers 👇 Why don’t you become one?

I want to thank you for reading / listening and I hope you enjoy this one. Sundays will be our special episode time. This one took a lot of fixing and editing; I must have deleted 500 uhhms and uhs 😅

Here’s a list of folks and models that appear in this episode:

OpenOrca + Platypus = OrcaPlaty!

I am recording this on August 18th, which marks the one month birthday of the Llama 2 release from Meta. It was the first commercially licensed large language model of its size and quality, and we want to thank the great folks at Meta AI: Yann LeCun, BigZuck and the whole FAIR team. Thank you guys. It's been an incredible month since it was released.

We saw a Cambrian explosion of open source communities who make this world better, even since Llama 1. For example, llama.cpp by Georgi Gerganov is such an incredible example of how the open source community comes together: this one guy, over a weekend, took the open source weights and made them run on CPUs, and much, much faster.

Mark Zuckerberg even talked about this: how amazingly the open source community has adopted Llama, and that Meta is now adopting many of those techniques and developments back to run their own models cheaper and faster. And so it's been exactly one month since Llama 2 was released.

And literally every ThursdAI since then, we have covered a new state-of-the-art open source model, all based on Llama 2, that topped the open source model charts on Hugging Face.

Many of these top models were fine tuned by Discord organizations of super smart folks who just like to work together in the open and open source their work.

Many of whom are great friends of the pod.

Nous Research, with whom we had a special episode a couple of weeks back (Teknium seems to be part of every org), Alignment Labs and GarageBaind are the latest folks topping the charts.

I'm very excited not only to bring you an interview with Alignment Labs and GarageBaind, but also to give you a hint of two additional very exciting efforts that are happening in some of these Discords.

I also want to highlight how many of those folks do not have data science backgrounds. Some of them do, so we had a few PhDs or PhD students, but some of them studied all this at home with the help of GPT-4. And some of them even connected via the ThursdAI community and space, which I'm personally very happy about.

So this special episode has two parts. In the first part, we talk with Ariel, Cole and Nataniel, currently known as GarageBaind. Get it? bAInd, GarageBaind, because they're doing AI in their garage. I love it.

🔥 They now hold the record for the best performing open source model, Platypus2-70B-Instruct.

And then, joining them, Austin from Alignment Labs, the authors of OpenOrca (also a top performing model), will talk about how they've merged and joined forces and trained the best performing 13B model, called Open Orca Platypus 13B, or Orctypus 13B.

This 13B parameter model comes very close to the base Llama 70B. So, I will say this again: just one month after Llama 2 was released by the great folks at Meta, we now have a 13 billion parameter model, which is way smaller and cheaper to run, that comes very close to the performance benchmarks of a way bigger 70B model that is very expensive to train and run.

And I find it incredible. And we've only just started; it's been a month. In the second part you will hear about two additional efforts. One is run by Far El, Prateek and Alpay from the SkunksWorks OS Discord, which is an effort to bring everyone an open source mixture of experts model, and you'll hear about what mixture of experts is.

The other effort is run by a friend of the pod, Teknium, previously a chart topper himself with the Nous Hermes models and many others, to figure out which of the fine-tuning methods are the most efficient, fast and cheap to run. You will hear several mentions of LoRAs, which stands for Low-Rank Adaptation: a method that keeps the huge weights of Llama and other models frozen and trains only a small set of added low-rank weights on new data to fine-tune and align the model, which is a method we know from the diffusion world.

It's now being applied to the LLM world and showing great promise in how fast, easy and cheap it is to fine-tune these huge models with significantly less hardware cost and time. Specifically, Nataniel Ruiz, the guy who helped Ariel and Cole train Platypus and a co-author of DreamBooth, StyleDrop and many other diffusion methods, mentioned that it takes around five hours on a single A100 GPU to fine-tune the 13B parameter model. If you can find an A100 GPU, that's around $10.

That's incredible.
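To make the LoRA idea a bit more concrete, here is a minimal sketch in plain Python. It is not the Platypus training code: the layer sizes, the rank, the zero initialization of one factor, and all function names are assumptions chosen for illustration. It shows a frozen weight matrix that is never modified, a small pair of low-rank matrices added next to it, and how few parameters that pair contains.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, w_frozen, lora_a, lora_b, scale=1.0):
    """Compute x @ W + scale * (x @ A @ B) without ever modifying W."""
    base = matmul(x, w_frozen)                   # frozen pretrained path
    update = matmul(matmul(x, lora_a), lora_b)   # small trainable low-rank path
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]

def full_params(d_in, d_out):
    """Parameters updated when fine-tuning the whole d_in x d_out matrix."""
    return d_in * d_out

def trainable_params(d_in, d_out, rank):
    """Parameters in the two factors A (d_in x rank) and B (rank x d_out)."""
    return rank * (d_in + d_out)

# Toy sizes (hypothetical); real model layers are thousands of units wide.
d_in, d_out, rank = 8, 8, 2
rng = random.Random(0)
w_frozen = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
lora_a = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
lora_b = [[0.0] * d_out for _ in range(rank)]    # assumed zero init: adapter starts as a no-op

x = [[1.0] * d_in]
y = lora_forward(x, w_frozen, lora_a, lora_b)

# For one hypothetical 4096 x 4096 layer at rank 16:
ratio = full_params(4096, 4096) // trainable_params(4096, 4096, 16)
```

With these assumed numbers, the low-rank pair for a single 4096 x 4096 layer has 128 times fewer trainable parameters than the full matrix. That gap is the intuition behind the lower hardware cost and time mentioned above.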

I hope you enjoy listening and learning from these great folks, and please don't forget to check out our website at thursdai.news for all the links, socials and podcast feeds.



Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists and prompt spellcasters on Twitter Spaces to discuss everything major and important that happened in the world of AI over the past week.
Topics include LLMs, Open source, New capabilities, OpenAI, competitors in AI space, new LLM models, AI art and diffusion aspects and much more.
Appears in episode
Alex Volkov
Ariel N. Lee
Autometa
Cole Hunter
Far El
Nataniel Ruiz
Prateek Yadav
Teknium
yam