Happy Sunday, everyone! I am very excited to bring you this interview with the folks who took LLaMa 2 and made it LLoooooongMa!
Having extended LLaMa 2's context window from 4,000 to a whopping 128,000 tokens (Yarn-Llama-2-13b-128k on Hugging Face), these guys also wrote a paper, YaRN: Efficient Context Window Extension of Large Language Models, showing that YaRN not only requires 10x fewer tokens than previous methods to create these long contexts, but also 2.5x fewer training steps!
And the models generalize, so there's no longer a need to collect extremely long sequences (think book-length sequences) for the models to understand those context lengths.
I have also decided to do something different (it took me half of Sunday, so I can't promise or commit to this format): premium subscribers can now watch this interview with running karaoke-style subtitles and improved audio! This will be uploaded to YouTube in a week, but aren't you glad you subscribed and are getting this first?
Here’s a teaser preview:
And here are the chapters for your convenience (the only thing that's AI-generated 😂):
0:00 - Introduction
3:08 - Discussion of extending LLaMa 2's context length from 4,000 tokens to 128,000 tokens using the YaRN method
8:23 - Explanation of RoPE (rotary position embedding) scaling for positional encodings in transformers
13:21 - How RoPE scaling allows for longer context through positional interpolation
18:51 - Using in-context learning to train models on shorter sequences but still handle long contexts
25:18 - Sourcing long-form data like books to train 128k-token models
31:21 - Whether future models will natively support longer contexts
37:33 - New model from Adept with 16k context using RoPE scaling
42:46 - Attention is quadratic: better algorithms are needed to make long context usable
49:39 - Open-source community pushing the state of the art alongside big labs
52:34 - Closing thoughts
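For readers new to the topic, here is a minimal Python sketch of RoPE with plain linear positional interpolation, the simpler idea discussed around 13:21 that YaRN builds on. It is not YaRN itself: YaRN interpolates the rotation frequencies non-uniformly and also adjusts the attention temperature, and this sketch does neither. The function names, the interleaved pairing of dimensions, and base=10000 are illustrative assumptions, not code from the paper or from the Hugging Face model. The scale factor assumes LLaMa 2's trained window of 4,096 tokens (the "4,000" above) and a 128k (131,072-token) target, which gives 32.

```python
import math

def rope_frequencies(head_dim: int, base: float = 10000.0) -> list[float]:
    # One rotation frequency per pair of dimensions: theta_i = base^(-2i/d).
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def rope_angles(position: float, head_dim: int, scale: float = 1.0) -> list[float]:
    # Linear positional interpolation: dividing the position by `scale`
    # squeezes positions up to scale * trained_length back into the
    # range of angles the model saw during pretraining.
    pos = position / scale
    return [pos * theta for theta in rope_frequencies(head_dim)]

def apply_rope(vec: list[float], position: float, scale: float = 1.0) -> list[float]:
    # Rotate each consecutive (x0, x1) pair by its position-dependent angle.
    angles = rope_angles(position, len(vec), scale)
    out: list[float] = []
    for i, a in enumerate(angles):
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(a), math.sin(a)
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

# Hypothetical numbers: 4,096-token trained window stretched to 131,072 tokens.
TRAINED_CTX = 4096
TARGET_CTX = 131072
SCALE = TARGET_CTX / TRAINED_CTX  # 32.0

# With interpolation, the last new position gets the same angles the
# unscaled model used at its trained limit.
stretched = rope_angles(TARGET_CTX, 128, SCALE)
original = rope_angles(TRAINED_CTX, 128)
```

The design trade-off is that linear interpolation compresses every frequency equally, so nearby tokens become harder to tell apart; that loss of high-frequency detail is part of what YaRN's non-uniform scheme is meant to address.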
Subscribe to ThursdAI - Recaps of the most high signal AI weekly spaces to listen to this post and get 7 days of free access to the full post archives.