A little guide to building Large Language Models in 2024
A short guide through everything you need to know to train a high-performing large language model in 2024.
This is an introductory talk, with links to references for further reading.
This is the first video of a two-part series:
- Video 1 (this video): covering all the concepts needed to train a high-performing LLM in 2024
- Video 2 (next video): hands-on application of all these concepts, with code examples
This video is adapted from a talk I gave in 2024 at an AI/ML winter school for graduate students. When I shared the slides online, people kept asking for a recording of the (unrecorded) class, so I decided to spend a morning recording it to share it more widely along with the slides.
Link to the slides: docs.google.com/presentation/...
Chapters:
00:00:00 Intro
00:00:59 Workflow for LLMs
Part 1: Training: data
00:01:17 Data preparation - intro and good recent resources on data preparation
00:05:28 A web scale pretraining corpus - goals and challenges
00:11:29 Web scale data sources - Focus on recent datasets
00:18:01 Language and quality filtering
00:24:34 Diving into data deduplication (sketch after Part 1's chapters)
00:27:40 Final data preparation for training
00:31:31 How to evaluate data quality at scale
00:36:29 The datatrove and lighteval libraries
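
For readers who want a concrete picture of the deduplication chapter, here is a minimal MinHash sketch in plain Python. This is illustrative only, not the datatrove API; the function names and parameters are made up for this example. It shows the core idea: approximate each document's set of word n-grams with a small hash signature, then estimate Jaccard similarity by comparing signatures.

```python
# Minimal, illustrative MinHash near-deduplication sketch (not the datatrove API).
import hashlib

NUM_HASHES = 128  # size of each MinHash signature

def shingles(text: str, n: int = 5) -> set[str]:
    """Split a document into overlapping word n-grams (shingles)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def minhash_signature(text: str) -> list[int]:
    """Approximate the shingle set with NUM_HASHES minimum hash values."""
    sig = []
    for seed in range(NUM_HASHES):
        sig.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles(text)
        ))
    return sig

def jaccard_estimate(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_HASHES

doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the river shore"
print(jaccard_estimate(minhash_signature(doc_a), minhash_signature(doc_b)))
```

At web scale you would bucket signatures with locality-sensitive hashing instead of comparing all pairs, but the similarity estimate above is the building block.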
Part 2: Training: modeling
00:38:18 Introduction to modeling techniques for LLM training
00:39:09 When the model is too big: parallelism
00:40:00 Data parallelism (sketch after Part 2's chapters)
00:41:18 Tensor parallelism
00:44:38 Pipeline parallelism
00:47:00 Sequence parallelism and references on 4D parallelism
00:47:52 Synchronisation: GPU-CPU and GPU-GPU challenges
00:52:14 Flash attention v1 and v2
00:56:23 Stable training recipes
00:59:12 New architectures: Mixture-of-experts
01:03:13 New architectures: Mamba
01:04:49 The nanotron library
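
To make the data-parallelism chapter concrete, here is a minimal sketch using PyTorch's DistributedDataParallel. This is not nanotron code; the model and the random batches are stand-ins. It shows the standard pattern: every rank holds a full model replica, each rank sees a different data shard, and gradients are averaged across ranks during backward.

```python
# Minimal data-parallel training sketch. Launch with e.g.:
#   torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank holds a full replica of the model (a toy stand-in here)...
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # ...but sees a different shard of the data (random stand-in batches).
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()          # DDP all-reduces (averages) gradients across ranks
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Tensor, pipeline, and sequence parallelism then split the model itself across GPUs once a replica no longer fits on one device; see the chapters above for references.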
Part 3: Fine-tuning: RLHF and alignment
01:06:15 RLHF in 2024
01:08:23 PPO, DPO and REINFORCE (DPO sketch below)
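
As a pointer for the DPO part of the chapter above: the loss from the DPO paper (Rafailov et al., 2023) is simple enough to sketch in a few lines of PyTorch. The tensors below are assumed to be per-sequence log-probabilities already summed over the response tokens; all names are illustrative.

```python
# Minimal DPO loss sketch: -log sigmoid(beta * (policy log-ratio - reference log-ratio)).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # How much more the policy prefers the chosen answer than the rejected one...
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    # ...relative to the frozen reference model.
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()

# Toy usage with made-up summed log-probs for a batch of 4 preference pairs:
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
```

Unlike PPO, there is no reward model or sampling loop: the preference pairs are scored directly, which is why DPO is so much simpler to run.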
Part 4: Fast inference techniques
01:11:23 Quantization, speculative decoding and compilation: overview and resources (quantization sketch below)
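
And a tiny sketch of the simplest idea behind the quantization part of the chapter above: round-to-nearest absmax int8. Production methods (per-channel scales, GPTQ, AWQ, etc.) are more involved; this only shows the core trade-off of storing int8 weights plus a float scale.

```python
# Minimal absmax int8 quantization sketch.
import torch

def quantize_int8(w: torch.Tensor):
    """Map float weights to int8 plus a float scale for dequantization."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 256)
q, scale = quantize_int8(w)
print("max abs error:", (w - dequantize(q, scale)).abs().max().item())
```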
End
01:14:36 Sharing your model, datasets and demo - final words
This is why I love YouTube. Getting to hear the thoughts of the CSO of one of the hottest startups around! Thomas, I'll be at the HuggingFace x Mixtral hackathon in Paris next month, hope to see you there!
Thanks for posting this. Lots of customers have been asking us how they can understand the process of creating LLMs.
Thank you so much for this extensive overview of the complete pipeline of LLM training and inference.
Thank you for this! A very good introduction to the whole LLM training ecosystem for beginners.
Brilliant lecture! Please continue recording and sharing your knowledge; it's an invaluable resource for everyone in this field.
Thank you very much for your effort. Awaiting Video 2.
Thank you for sharing this amazing video!
Brilliant lecture! Just so much information and insights! Thanks a lot for this!
This was wonderful; spending this much time talking about data preparation is key!
Very insightful. Thank you for sharing.
Really insightful 🔥🔥🔥
Thank you, Thom.
Thanks so much!!! Much appreciated.
This is really helpful! Thank you very much.
Thank you for this video
Gold 🥇🥇🥇
Thank you very much, Thomas!!
Very interesting, thank you.
Amazing!
Thanks a lot for this. Nanotron is really useful
amazing lecture
Thank you so much :)
Great video! When is the second one coming out?
A very nice video
🎉❤
What has become of the retentive network architecture, which was touted as an alternative to transformers? Why have no published LLMs been trained using it?
slides link? :)
33:23, what's his example of the noisier dataset? It sounds like he's saying "Zopalé" or something 😄
"The Pile" - it's on the slide...
@clray123 Hah! Duh -- thank you 😅