Hey everyone, we have an exciting interview today with Maxime Labonne.
Maxime is a senior Machine Learning Scientist at JPMorgan, the author of Hands on GNNs book and his own ML Blog, creator of LazyMergeKit (which we cover on the pod) and holds a PHD in Artificial Intelligence from the Institut Polytechnique de Paris.
Maxime has been mentioned on ThursdAI a couple of times before, as he released the first Phi mixture-of-experts, and has previously finetuned OpenHermes using DPO techniques which resulted in NeuralChat7B
For the past couple of months, following AI on X, it was hard not to see Maxime's efforts show up on the timeline, and one of the main reasons I invited Maxime to chat was the release of NeuralBeagle7B, which at the time of writing was the top performing 7B model on the LLM leaderboard, and was specifically a merge of a few models.
Model merging has been around for a while but recently has been heating up, and Maxime has a lot to do with that, as he recently checked, and his wrapper on top of MergeKit by Charles Goddard (which is the library that put model merging into the mainstream) called LazyMergeKit was in charge of >50% of the merged models on HuggingFace hub leaderboard.
Maxime also authored a model merging blogpost on Hugging Face and wrote quite a few articles and shared code that helped others to put merged models out.
Modern day Alchemy
This blogpost is a great resource on what model merging actually does, so I won't go into depth of what the algorithms are, please refer to that if you want a deep dive, but in a nutshell, model merging is a technique to apply algorithms to the weights of a few models, even a few instances of the same model (like Mistral7B) and create a new model, that often performs better than the previous ones, without additional training!
Since this is algorithmic, it doesn't require beefy GPUs burning power to keep training or finetuning, and since the barrier of entry is very low, we get some cool and crazy results as you'll see below.
Yeah, quite crazy as it sounds, this method can also create models of non standard sizes, like 10B or 120B models, since it's slicing pieces of other models and stitching them together in new ways.
If you recall, we had a deep dive with Jon Durbin who released Bagel, and Jon specifically mentioned that he created Bagel (based on everything everywhere all at once) as a good base for merges, that will include all the prompt formats, you can read and listen to that episode here
This merge frenzy, made HuggingFace change the leaderboard, and add a checkbox that hides model merges, because they are flooding the leaderboard, and often, and require much smaller effort than actually pre-training or even finetuning a model
And quite often the top of the leaderboard was overrun with model merges like in this example of Bagel and it's merges by CloudYu (which are not the top ones but still in the top 10 as I write this)
ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
On why it works?
Nisten summarized this pretty well in this now famous copypasta tweet and I've confirmed with Maxime that this is his current understanding as well, it's quite unclear why this seems to perform so well, but it of course doesn't stop the "folks who look for AI Waifus" to keep merging.
Following folks like Nathan Lambert from interconnects.ai to start paying attention even though he didn't want to! (
Still waiting on your writeup Nathan!)
UPDATE: As of today Monday Jan 29th,just released a super comprehensive deep dive into merges, which you can read here 👇👏
YALL + Automated LLM Evaluation
Maxime as also worked on so many models of his own, that he built a convenient little tracking leaderboard to track their performance, which he called YALL, Yet Another LLM Leaderboard and it's on HuggingFace.
You can see that NeuralBeagle is the top dog (sorry, I literally could not resist)
It uses the Nous evaluations, and Maxime has created an automation called LLM AutoEval that makes it really simple to run evaluations, which you can run in a Colab super easily.
LLM AutoEval is on Github.
Since chatting, Maxime has released a Colab and later a HuggingFace space that takes models names, and shows the genealogy, nay, Merge-aology of the models, which models it was merged from and it's pretty crazy how deep this rabbit hole goes, and crazier even still that these models perform very well after all of these lobotomies!
Try it out here: https://huggingface.co/spaces/mlabonne/model-family-tree
I really hope you enjoy this special deep dive, I definitely learned a BUNCH from this conversation with Maxime, and I'm very happy that he came on!