The Attention Mechanism in Large Language Models

Jul 24, 2023
72,362 views

Attention mechanisms are central to the recent boom in large language models (LLMs).
In this video you'll see a friendly pictorial explanation of how attention mechanisms work in Large Language Models.
This is the first of a series of three videos on Transformer models.
Video 1: The attention mechanism in high level (this one)
Video 2: The attention mechanism with math: • The math behind Attent...
Video 3: Transformer models • What are Transformer M...
Learn more in LLM University! llm.university

Comments
  • I have been reading the "Attention Is All You Need" paper for like 2 years and never properly understood it before this 😮. I'm so happy now 🎉

    @arvindkumarsoundarrajan9479 • 3 months ago
  • Your videos in the LLM uni are incredible. They build up true understanding after tons of other material that was all a bit loose at the ends. Thank you!

    @RG-ik5kw • 9 months ago
  • Best teacher on the internet, thank you for your amazing work and the time you took to put those videos together

    @malikkissoum730 • 5 months ago
  • This is one of the best videos on YouTube for understanding ATTENTION. Thank you for creating such outstanding content. I am waiting for the upcoming videos of this series. Thank you ❤

    @gunjanmimo • 8 months ago
  • Truly amazing video! The published papers never bother to explain things with this level of clarity and simplicity, which is a shame because if more people outside the field understood what is going on, we may have gotten something like ChatGPT about 10 years sooner! Thanks for taking the time to make this - the visual presentation with the little animations makes a HUGE difference!

    @EricMutta • 5 months ago
  • The way you break down these concepts is insane. Thank you

    @nealdavar939 • 17 days ago
  • So glad to see you're still active, Luis! You and StatQuest's Josh Starmer really are the backbone of more ML professionals than you can imagine.

    @apah • 9 months ago
  • Just THANK YOU. This is by far the best video on the attention mechanism for people that learn visually

    @JyuSub • a month ago
  • I appreciate your videos, especially how you offer a good perspective for understanding the high-level concepts before getting too deep into the maths.

    @calum.macleod • 9 months ago
  • One of the best explanations of attention I have seen without getting lost in the forest of computations. Looking forward to future videos.

    @mohandesai • 9 months ago
    • Thank you so much!

      @SerranoAcademy • 9 months ago
  • I always struggled with KQV in the attention paper. Thanks a lot for this crystal-clear explanation! Eagerly looking forward to the next videos on this topic.

    @pruthvipatel8720 • 8 months ago
  • Best description ever! Easy to understand. I've been struggling to understand attention. Finally I can say I know it!

    @bobae1357 • a month ago
  • One of the best intuitions for understanding multi-head attention. Thanks a lot!❣

    @aadeshingle7593 • 8 months ago
  • These videos where you explain the transformers are excellent. I have gone through a lot of material; however, it is your videos that have allowed me to understand the intuition behind these models. Thank you very much!

    @TheMircus224 • 4 months ago
  • I really enjoyed how you give a clear explanation of the operations and the representations used in attention

    @mohameddjilani4109 • 6 months ago
  • This is one of the clearest, simplest and most intuitive explanations of the attention mechanism. Thanks for making such a tedious and challenging concept relatively easy to understand 👏 Looking forward to the remaining 2 videos of this series on attention.

    @sayamkumar7276 • 9 months ago
  • THE best explanation of this concept. That was genuinely amazing.

    @saeed577 • 2 months ago
  • Thank you for making this video series for the sake of a learner and not to show off your own knowledge!! Great anecdotes and simple examples really helped me understand the key concepts!!

    @amoghjain • 4 months ago
  • Wow, clearest example yet. Thanks for making this!

    @kevon217 • 8 months ago
  • This is amazingly clear! Thanks for your work!

    @RamiroMoyano • 8 months ago
  • Fantastic!!! The explanation itself is a piece of art. The step-by-step approach, the abstractions... Kudos!! Please make more of these.

    @ajnbin • 4 months ago
  • That's an awesome explanation! Thanks!

    @aaalexlit • 7 months ago
  • This video helps to explain the concept in a simple way.

    @caryjason4171 • a month ago
  • Omg this video is on a whole new level. This is probably the best intuition behind transformers and attention, and the best way to understand them. I went through a couple of videos online and finally found the best one. Thanks a lot! Helped me understand the paper easily.

    @anipacify1163 • 2 months ago
  • best explanation of embeddings I've seen, thank you!

    @karlbooklover • 9 months ago
  • Thank you so much for making these videos!

    @notprof • 7 months ago
  • Nicely done! This gives a great explanation of the function and value of the projection matrices.

    @dr.mikeybee • 9 months ago
  • Hey Luis, you are AMAZING! Your explanations are incredible.

    @soumen_das • 8 months ago
  • Great explanation. After watching a handful of videos, this one makes it really easy to understand.

    @arulbalasubramanian9474 • 6 months ago
  • The best video I have seen on the subject. Thank you very much for this great work.

    @JorgeMartinez-xb2ks • 5 months ago
  • This is the most amazing video on "Attention is all you need"

    @satvikparamkusham7454 • 8 months ago
  • This video really teaches you the intuition. Much better than the others I went through that just throw formulas at you. Thanks for the great job!

    @docodemo727 • 5 months ago
  • This is such a good, clear and concise video. Great job!

    @ccgarciab • a month ago
  • Amazing explanation, Luis. Can't thank you enough for your amazing work. You have a special gift for explaining things. Thanks.

    @abu-yousuf • 5 months ago
  • Incredible explanation. Thank you so much!!!

    @debarttasharan • 9 months ago
  • Amazing! Loved it! Thanks a lot Serrano!

    @prashant5611 • 8 months ago
  • Deep respect, Luis Serrano! Thank you so much!

    @dragolov • 9 months ago
  • If I understand correctly, the transformer is basically an RNN model intercepted by a bunch of different attention layers. The attention layers redo the embeddings every time a new word comes in; the new embeddings are calculated based on the current context and the new word, and then the embeddings are sent to the feed-forward layer, which behaves like the classic RNN model.

    @hyyue7549 • 4 months ago
  • This is wonderful!!

    @thelookerful • 8 months ago
  • Wooow, thanks so much. You are a treasure to the world. An amazing teacher of our time.

    @agbeliemmanuel6023 • 9 months ago
  • I've watched a lot about attention. You are the best. Thank you, thank you. I am also learning from you how to explain a subject 😊

    @orcunkoraliseri9214 • a month ago
  • Kudos to your efforts in clear explanation!

    @pranayroy • 2 months ago
  • The easiest-to-understand video on the subject I've seen.

    @sari54754 • 4 months ago
  • This is a great video (as are the other 2), but one thing that needs to be clarified is that the embeddings themselves do not change (by attention @10:49). The gravity-pull analogy is apt, but the visuals give the impression that the embedding weights change. What changes is the context vector.

    @drdr3496 • 2 months ago
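A minimal NumPy sketch of the point made above, with toy 2-D embeddings invented purely for illustration: attention only reads the stored embeddings, and it is the resulting context vector that changes with the surrounding words.

    import numpy as np

    # Toy 2-D embeddings, invented for illustration only.
    emb = {
        "apple":  np.array([0.9, 0.9]),   # ambiguous: company or fruit
        "orange": np.array([0.2, 1.0]),
        "phone":  np.array([1.0, 0.1]),
    }

    def context_vectors(E):
        # Simplified self-attention with no learned Q, K, V matrices:
        # softmax(E E^T) E. Note that E is only read, never modified.
        scores = E @ E.T
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ E

    ctx_fruit = context_vectors(np.stack([emb["apple"], emb["orange"]]))
    ctx_tech = context_vectors(np.stack([emb["apple"], emb["phone"]]))

    print(ctx_fruit[0])  # "apple" pulled toward "orange" (the fruit)
    print(ctx_tech[0])   # same stored embedding, pulled toward "phone" instead
    print(emb["apple"])  # the embedding itself is unchanged: [0.9 0.9]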
  • Well, the gravity example is how I finally understood this after a long time. You are a true legend.

    @kafaayari • 9 months ago
  • Explained very well. Thank you so much.

    @bananamaker4877 • 5 months ago
  • Great explanation. Thank you very much for sharing this.

    @erickdamasceno • 9 months ago
  • What a great explanation on this topic! Great job!

    @justthefactsplease • a month ago
  • You're my fav teacher. Thank you Luis 😊

    @perpetuallearner8257 • 9 months ago
  • This is a great explanation of the attention mechanism. I have enjoyed your maths for machine learning course on Coursera. Thank you for creating such wonderful videos.

    @tvinay8758 • 9 months ago
  • Yeah!!!! Looking forward to the second one!! 👍🏻😎

    @bengoshi4 • 9 months ago
  • Thanks for the amazing videos! I am eagerly waiting for the third video. If possible, please explain how the K, Q, V matrices are used on the decoder side. That would be a great help.

    @bankawat1 • 7 months ago
  • What a great video man!!! Thanks for making such videos.

    @alijohnnaqvi6383 • 2 months ago
  • amazing, love your channel. It's certainly underrated.

    @davutumut1469 • 9 months ago
  • Luis Serrano, you have a gift for explaining! Thank you for sharing!

    @LuisOtte-pk4wd • 3 months ago
  • This clarifies embedding matrices. In particular, the point that a book isn't just a random array of words, and matrices are not random arrays of numbers. The visualization of the transformations and shearing really drives home the V, Q, K aspect of the attention matrix that I had been struggling to internalize. Big, big thanks for putting together this explanation!

    @MikeTon • 3 months ago
  • It's so great, I finally understand these QKVs. It bothered me for so long. Thank you so much!!!

    @user-dg2gt2yq3c • a month ago
  • I did not even realize this video is 21 minutes long. Great explanation.

    @DeepakSharma-xg5nu • a month ago
  • Outstanding video. Amazing for gaining intuition.

    @ignacioruiz3732 • a month ago
  • You are great at teaching, Mr. Luis!

    @vishnusharma_7 • 9 months ago
  • This was great - really well done!

    @jeffpatrick787 • 4 months ago
  • you are a great teacher. Thank you

    @maysammansor • 2 months ago
  • Very impressed with this channel and presenter

    @cyberpunkbuilds • 2 months ago
  • I subscribed to your channel immediately after watching this video, the first video I've watched from your channel but also the first that made me understand why attention needs to be multi-headed. 👍🏻👍🏻👍🏻👍🏻

    @hkwong74531 • 3 months ago
  • Great video and a very intuitive explanation of the attention mechanism.

    @eddydewaegeneer9514 • 20 days ago
  • Amazing explanation 🎉

    @SulkyRain • 4 months ago
  • Excellent description.

    @drintro • 3 months ago
  • Very well explained ❤

    @traveldiaries347 • 6 months ago
  • Wooow. Such a good explanation of embeddings. Thanks 🎉

    @orcunkoraliseri9214 • a month ago
  • This video is really clear!

    @user-uq7kc2eb1i • 4 months ago
  • Your videos are so awesome, please upload more videos. Thanks a lot!

    @ProgrammerRajaa • 9 months ago
  • Brilliant explanation.

    @khameelmustapha • 9 months ago
  • Wow wow wow! I enjoyed the video. Great teaching sir❤❤

    @jayanthkothapalli9.2 • a month ago
  • Thanks a lot Sir, clearly understood.

    @surajprasad8741 • 4 months ago
  • super good job guys!

    @naimsassine • 4 months ago
  • Excellent job

    @serkansunel • 2 months ago
  • Amazing explanation Luis! As always...

    @WhatsAI • 9 months ago
    • Thank you, Louis! :)

      @SerranoAcademy • 9 months ago
  • Thanks my friend.

    @muhammadsaqlain3720 • 5 months ago
  • Great video!

    @EigenA • 4 months ago
  • Oh my god, I never understood V, K, Q as matrix transformations before. Thanks Luis, love from India!

    @TemporaryForstudy • 8 months ago
  • Amazing

    @preetijani9658 • 5 months ago
  • wonderful!

    @ernesttan8090 • 4 months ago
  • My comment is just an array of letters for our algorithmic gods... Good stuff.

    2 months ago
  • 13:32 "Feel free to pause the video" reminds me of chess YouTuber agadmator 🤣

    @BigAsciiHappyStar • 2 days ago
  • You are amazing !

    @junaidfayaz8323 • 2 months ago
  • thank you sir 🙏, love from india💌

    @shashankshekharsingh9336 • 5 days ago
  • First of all, thank you for making these great walkthroughs of the architecture. I would really like to support your effort on this channel. Let me know how I can do that. Thanks!

    @sukhpreetlotey1172 • a month ago
    • Thank you so much, I really appreciate that! Soon I'll be implementing subscriptions, so you can subscribe to the channel and contribute (also get some perks). Please stay tuned, I'll publish it here and also on social media. :)

      @SerranoAcademy • a month ago
  • Fantastic.

    @liminal6823 • 9 months ago
  • 7:00 Even with word embeddings, a word can be missing context and there's no way to tell, like with the word "apple": are you talking about the company or the fruit? Attention matches each word of the input with every other word in order to transform it, pulling it toward a different location in the embedding space based on the context. So when the sentence is "buy apple and orange", the word "orange" will pull the word "apple" toward an embedding (vector representation) that's closer to the fruit. 8:00

    @mostinho7 • 4 months ago
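As a follow-up to the walkthrough above, here is a compact NumPy sketch of scaled dot-product attention. The Q, K, V projection matrices below are random stand-ins for the learned ones the paper describes, so the numbers are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    # Stand-in embeddings for the 4 tokens of "buy apple and orange".
    d_model = 8
    X = rng.normal(size=(4, d_model))

    # Random stand-ins for the learned projection matrices.
    W_q = rng.normal(size=(d_model, d_model))
    W_k = rng.normal(size=(d_model, d_model))
    W_v = rng.normal(size=(d_model, d_model))

    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    # Each word is scored against every other word, then the scores
    # weight the value vectors to give context-aware representations.
    weights = softmax(Q @ K.T / np.sqrt(d_model))
    output = weights @ V

    print(weights[1].round(2))  # how much "apple" attends to each of the 4 words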
  • The great Luis!

    @samirelzein1095 • 9 months ago
  • Thanks. I also saw your "Math behind" video, but the third in the series is still missing.

    @bravulo • 5 months ago
    • Thanks! The third video is out now! kzhead.info/sun/pMWQfbORnWaonHA/bejne.html

      @SerranoAcademy • 4 months ago
  • love the video

    @angminhquan1491 • 12 days ago
  • Oh, thank you very much. This is a lot better than simply talking about the paper without really explaining it.

    @Tony-tu8uz • 9 months ago
  • Thanks!

    @s.chandrasekhar8290 • 6 months ago
    • Thank you so much for your contribution!!! How kind!

      @SerranoAcademy • 6 months ago
  • Thanks for your great effort to make people understand it. However, I would like to ask one thing: you explained that V is the scores, but scores of what? My opinion is that V is the key vector, so that V maps the QK^T matrix back to vector space. Please make it clear for better understanding. Thanks!

    @today-radio-in-the-zone • 4 days ago
  • Thanks so much for making this video. It's difficult to find people explaining these concepts at a higher level. One thing I missed was how we are able to have two different "apples" in the matrix. If something like this is possible, then I'm guessing we have several instances of every single word floating around: the ones with several different contextual potentials scattered widely in the matrix, while the ones without so much variation in meaning sit closer together. So is the process where the positions of the words in the matrix are re-evaluated based on the "gravitational pull" from the associations of the other words in the sentence also a process that decides whether to continue using an existing instance of the word or to create an entirely new version of the word in a new position in the matrix?

    @tristanwheeler2300 • 7 months ago
  • Thanks for this amazing video. I have a question: would you consider making a video teaching how to create videos like yours, the software and everything, from zero? It would be most helpful. Or at least please reply with the software you use to make these kinds of beautiful animated presentations. Thanks.

    @AlirezaGolabvand • 4 months ago
  • Amazing explanation! What software is used to make the visuals (graphs, transformations, etc.)? Thanks!

    @ramelgov7891 • 3 months ago
    • Thank you so much! I use Keynote for the slides.

      @SerranoAcademy • 3 months ago
  • Unless I'm mistaken, I think the linear transformations in this video incorrectly show the 2D axes changing position along with the object; in fact the axes would stay exactly the same while, for example, the 2D object rotates within them.

    @benhargreaves5556 • 3 months ago
  • At last someone explained the meaning of Q, K and V. I read the original article and it just says "OK, let's have 3 additional matrices Q, K and V to transform the input embedding"... What for? Thanks for the explanation, this video really helps!

    @SergeyGrebenkin • 26 days ago