Transformers for beginners | What are they and how do they work

29 Jun 2023
27,504 views

Over the past five years, Transformers, a neural network architecture, have completely transformed state-of-the-art natural language processing.
*************************************************************************
For queries: comment below or email me at aarohisingla1987@gmail.com
*************************************************************************
The encoder takes the input sentence and converts it into a series of numbers called vectors, which represent the meaning of the words. These vectors are then passed to the decoder, which generates the translated sentence.
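For intuition, here is a minimal sketch of that embedding step in Python (the tiny vocabulary, the 4-dimensional embedding size, and the random weights are illustrative assumptions; a real model learns these values during training):

import numpy as np

# Illustrative toy vocabulary; real models use tens of thousands of tokens.
vocab = {"i": 0, "love": 1, "cats": 2}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))  # one learned row per word

sentence = ["i", "love", "cats"]
vectors = embedding_table[[vocab[w] for w in sentence]]  # shape: (3, 4)
print(vectors)  # each word is now a vector of numbers the model can work with
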
Now, the magic of the transformer network lies in how it handles attention. Instead of looking at each word one by one, it considers the entire sentence at once. It calculates a similarity score between each word in the input sentence and every other word, giving higher scores to the words that are more important for translation.
To do this, the transformer network uses a mechanism called self-attention. Self-attention allows the model to weigh the importance of each word in the sentence based on its relevance to other words. By doing this, the model can focus more on the important parts of the sentence and less on the irrelevant ones.
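A minimal single-head version of this can be written in a few lines of NumPy (the projection matrices are random here purely to make the sketch runnable; in a trained transformer they are learned):

import numpy as np

rng = np.random.default_rng(0)
d_model = d_k = 4
# Learned in practice; random here so the example runs.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

def self_attention(X):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v   # project each word into query/key/value spaces
    scores = Q @ K.T / np.sqrt(d_k)       # similarity of every word with every other word
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)    # softmax: each row of weights sums to 1
    return w @ V                          # each output is a weighted mix of all words

X = rng.normal(size=(3, d_model))         # 3 words, 4-dimensional embeddings
print(self_attention(X).shape)            # (3, 4)
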
In addition to self-attention, transformer networks also use something called positional encoding. Since the model treats words as individual entities, it doesn't have any inherent understanding of word order. Positional encoding helps the model to understand the sequence of words in a sentence by adding information about their position.
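The original "Attention Is All You Need" paper uses fixed sine and cosine waves of different frequencies for this; a short sketch (sizes are illustrative):

import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]          # positions 0 .. seq_len-1
    i = np.arange(d_model)[None, :]            # embedding dimensions
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])      # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])      # odd dimensions: cosine
    return pe                                  # added to the word vectors, position by position

print(positional_encoding(seq_len=3, d_model=4))
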
Once the encoder has added positional encoding to the word vectors and calculated the attention scores, the resulting vectors are passed to the decoder. The decoder uses a similar attention mechanism to generate the translated sentence, one word at a time.
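That word-at-a-time loop can be sketched as follows (`decode_step` here is a hypothetical stand-in for the full decoder stack, which returns a probability for every vocabulary word given the encoder output and the words generated so far):

# Hypothetical greedy decoding loop; real systems often use beam search instead.
def greedy_translate(encoder_vectors, decode_step, max_len=20):
    output = ["<start>"]
    for _ in range(max_len):
        probs = decode_step(encoder_vectors, output)  # dict: word -> probability
        next_word = max(probs, key=probs.get)         # take the most likely word
        if next_word == "<end>":
            break
        output.append(next_word)
    return output[1:]  # the translated sentence, without the start token
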
Transformers are the architecture behind GPT, BERT, and T5.
#transformers #naturallanguageprocessing #nlp

Comments
  • This is the only video around that REALLY EXPLAINS the transformer! I immensely appreciate your step by step approach and the use of the example. Thank you so much 🙏🙏🙏

    @lyeln • 3 months ago
    • Glad it was helpful!

      @CodeWithAarohi • 3 months ago
    • exactly

      @eng.reemali9214 • 15 days ago
  • Very well explained! I could instantly grasp the concept! Thank you, Miss!

    @exoticcoder5365 • 10 months ago
    • Glad it was helpful!

      @CodeWithAarohi • 10 months ago
  • Very nice high-level description of the Transformer.

    @mdfarhadhussain • 4 months ago
    • Glad you think so!

      @CodeWithAarohi • 4 months ago
  • Great explanation! Keep uploading such nice informative content.

    @aditichawla3253 • 5 months ago
    • Thank you, I will

      @CodeWithAarohi • 5 months ago
  • I came across this video by accident; it is very well explained. You are doing an excellent job.

    @PallaviPadav • 1 month ago
    • Glad it was helpful!

      @CodeWithAarohi • 1 month ago
  • Hello, and thank you so much. One question: I don't understand where the numbers in the word embeddings and positional encodings come from.

    @user-mv5bo4vf2v • 5 months ago
  • This is a fantastic, very good explanation. Thank you so much!

    @servatechtips • 10 months ago
    • Glad it was helpful!

      @CodeWithAarohi • 10 months ago
  • Best explanation; I watched multiple videos, but this one made the concept clear. Keep it up!

    @imranzahoor387 • 3 months ago
    • Glad to hear that

      @CodeWithAarohi • 3 months ago
  • Thank you so much

    @_Who_u_are • 12 hours ago
  • Just amazing explanation 👌

    @MAHI-kj5tg • 6 months ago
    • Thanks a lot 😊

      @CodeWithAarohi • 6 months ago
  • Well explained. Before watching this video I was very confused about how transformers work, but your video helped me a lot.

    @VishalSingh-wt9yj • 4 months ago
    • Glad my video is helpful!

      @CodeWithAarohi • 4 months ago
  • I had watched 3 or 4 videos about transformers before this tutorial. Finally, this tutorial made me understand the concept of transformers. Thanks for your complete and clear explanations and your illustrative example. Especially, your description of query, key, and value was really helpful.

    @MrPioneer7 • 6 days ago
    • You're very welcome!

      @CodeWithAarohi • 6 days ago
  • Best video ever, explaining the concepts in a really lucid way, ma'am. Thanks a lot, please keep posting. I subscribed 😊🎉

    @user-dl4jq2yn1c • 11 days ago
    • Thanks and welcome

      @CodeWithAarohi • 10 days ago
  • It's great. I have only one query: what is the input to the masked multi-head attention? It's not clear to me; kindly guide me on it.

    @user-kx1nm3vw5s • 8 days ago
  • Great Explanation, Thanks

    @BharatK-mm2uy • 2 months ago
    • Glad it was helpful!

      @CodeWithAarohi • 1 month ago
  • Very well explained, even with such a niche viewer base. Please keep making more of these.

    @harshilldaggupati • 9 months ago
    • Thank you, I will

      @CodeWithAarohi • 9 months ago
  • Thanks for making such an informative video. Could you please make a video on transformers for image classification or image segmentation applications?

    @vimalshrivastava6586 • 10 months ago
    • Will cover that soon

      @CodeWithAarohi • 10 months ago
  • Excellent explanation, madam... thank you so much!

    @pandusivaprasad4277 • 4 months ago
    • Thanks and welcome

      @CodeWithAarohi • 4 months ago
  • Hello Ma'am! Your AI and Data Science content is consistently impressive! Thanks for making complex concepts so accessible. Keep up the great work! 🚀 #ArtificialIntelligence #DataScience #ImpressiveContent 👏👍

    @soravsingla6574 • 7 months ago
    • Thank you!

      @CodeWithAarohi • 7 months ago
  • Thank you very much for explaining and breaking it down 😀 So far, your explanation has been easier to understand than other channels'. Thank you very much for making this video and sharing it with everyone ❤

    @satishbabu5510 • 12 days ago
    • Glad it was helpful!

      @CodeWithAarohi • 12 days ago
  • Thanks. The concept is explained very well. Could you please add one custom example (e.g. finding similar questions) using Transformers?

    @thangarajerode7971 • 10 months ago
    • Will try

      @CodeWithAarohi • 10 months ago
  • Very good video, Ma'am. Love from Gujarat, keep it up!

    @vasoyarutvik2897 • 6 months ago
    • Thanks a lot

      @CodeWithAarohi • 6 months ago
  • Thank you. The concept has been explained very well. Could you please also explain how these query, key and value vectors are calculated?

    @akshayanair6074 • 10 months ago
    • Sure, Will cover that in a separate video.

      @CodeWithAarohi • 10 months ago
  • Ma'am, we are eagerly hoping for a comprehensive Machine Learning and Computer Vision playlist. Your teaching style is unmatched, and I truly wish your channel reaches 100 million subscribers! 🌟

    @AbdulHaseeb091 • 1 month ago
    • Thank you so much for your incredibly kind words and support!🙂 Creating a comprehensive Machine Learning and Computer Vision playlist is an excellent idea, and I'll definitely consider it for future content.

      @CodeWithAarohi • 1 month ago
  • The best explanation of transformers that I have found on the internet. Can you please make a detailed, long video on transformers with theory, mathematics, and more examples? I am not clear about the linear and softmax layers, what is done after them, how training happens, and how transformers work on test data. Can you please make a detailed video on this?

    @sahaj2805 • 2 months ago
    • I will try to make it after finishing the work already in the pipeline.

      @CodeWithAarohi • 2 months ago
    • @CodeWithAarohi Thanks, will wait for the detailed transformer video :)

      @sahaj2805 • 2 months ago
  • Nice explanation Ma'am.

    @TheMayankDixit • 7 months ago
    • Thank you! 🙂

      @CodeWithAarohi • 7 months ago
  • excellent explanation

    @bijayalaxmikar6982 • 4 months ago
    • Glad you liked it!

      @CodeWithAarohi • 4 months ago
  • Very well explained

    @soravsingla6574 • 7 months ago
    • Thanks for liking

      @CodeWithAarohi • 7 months ago
  • Great explanation, ma'am!

    @debarpitosinha1162 • 1 month ago
    • Glad you liked it

      @CodeWithAarohi • 1 month ago
  • Can you also talk about the purpose of the 'feed forward' layer? It looks like it's only there to add non-linearity. Is that right?

    @_seeker423 • 3 months ago
    • Yes, you can say that... but maybe also to make the key, query, and value projections trainable.

      @abirahmedsohan3554 • 2 months ago
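
For reference, a minimal sketch of the position-wise feed-forward block being discussed: two linear layers with a ReLU between them, applied to each position independently (sizes here are illustrative; the original paper uses 512 and 2048):

import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 4, 16
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

def feed_forward(x):
    return np.maximum(0, x @ W1 + b1) @ W2 + b2   # ReLU supplies the non-linearity

print(feed_forward(rng.normal(size=(3, d_model))).shape)  # (3, 4)

Note that the query, key, and value projections are separate learned matrices inside the attention sub-layer, not part of this block.
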
  • Really very nice explanation ma'am!

    @minalmahala5260 • 1 month ago
    • Glad my video is helpful!

      @CodeWithAarohi • 1 month ago
  • Thanks Aarohi 😇

    @manishnayak9759 • 6 months ago
    • Glad it helped!

      @CodeWithAarohi • 6 months ago
  • Ma'am, can you please make one video on classification using multi-head attention with a custom dataset?

    @mahmudulhassan6857 • 9 months ago
    • Will try

      @CodeWithAarohi • 9 months ago
  • Can you please upload the presentation?

    @burerabiya7866 • 3 months ago
  • Can you please make a detailed video explaining the Attention is all you need research paper line by line, thanks in advance :)

    @sahaj2805 • 2 months ago
    • Noted!

      @CodeWithAarohi • 2 months ago
  • A question about query, key, and value dimensionality. Given that a query is a word that is looking for other words to pay attention to, and a key is a word that is being looked at by other words, shouldn't the query and key be vectors whose size equals the number of input tokens, so that the dot product between query and key lines up positionally with each word and yields the self-attention value for that word?

    @_seeker423 • 3 months ago
    • The dimensionality of query, key, and value vectors in transformers is a hyperparameter, not directly tied to the number of input tokens. The dot product operation between query and key vectors allows the model to capture relationships and dependencies between tokens, while positional information is often handled separately through positional embeddings.

      @CodeWithAarohi • 3 months ago
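
A quick shape check of that answer (random values and illustrative sizes only): the attention-score matrix is always seq_len × seq_len, regardless of the chosen key dimension.

import numpy as np

seq_len, d_k = 5, 64      # d_k is a hyperparameter, unrelated to the number of tokens
rng = np.random.default_rng(0)
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))

scores = Q @ K.T / np.sqrt(d_k)
print(scores.shape)       # (5, 5): one score per pair of tokens, whatever d_k is
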
  • Can you please let us know the input to the masked multi-head attention? You just said 'decoder'. Can you please explain? Thanks.

    @user-wh8vy9ol8w • 14 days ago
  • Could you make a video on image classification with vision transformers, madam?

    @palurikrishnaveni8344 • 10 months ago
    • Sure, soon

      @CodeWithAarohi • 10 months ago
  • Great video, ma'am! Could you please clarify what you said at 22:20 once again? I think there was a bit of confusion there.

    @sukritgarg3175 • 2 months ago
    • same here

      @AyomideFagoroye-oe2hd • 28 days ago
  • I didn't understand what the input to the masked multi-head self-attention layer in the decoder is. Can you please explain?

    @tss1508 • 6 months ago
    • In the Transformer decoder, the masked multi-head self-attention layer takes three inputs: queries (Q), keys (K), and values (V).
      Queries (Q): vectors representing the current positions in the sequence. They are used to determine how much attention each position should give to other positions.
      Keys (K): vectors representing all positions in the sequence. They are used to calculate the attention scores between the current position (represented by the query) and all other positions.
      Values (V): vectors containing information from all positions in the sequence. The values are combined based on the attention scores to produce the output for the current position.
      The masking ensures that during training a position cannot attend to future positions, preventing information leakage from the future. In short, the masked multi-head self-attention layer helps the decoder focus on relevant parts of the input sequence while generating the output sequence, and the masking ensures it doesn't cheat by looking at future information during training.

      @CodeWithAarohi • 6 months ago
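
The masking described above can be illustrated in a few lines (random scores, illustrative size): future positions are set to -inf before the softmax, so they receive zero attention weight.

import numpy as np

seq_len = 4
mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)  # True above the diagonal

scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))
scores[mask] = -np.inf    # position i can only attend to positions 0..i

w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)
print(np.round(w, 2))     # upper triangle is all zeros: no peeking at future words
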
  • Thank you, ma'am!

    @niluthonte45 • 7 months ago
    • Most welcome 😊

      @CodeWithAarohi • 7 months ago
  • Hello ma'am, is this transform concept the same as transformers in NLP?

    @techgirl6451 • 6 months ago
    • The concept of "transform" in computer vision and "transformers" in natural language processing (NLP) are related but not quite the same.

      @CodeWithAarohi • 6 months ago
  • How can I get the PDFs, ma'am?

    @user-gf7kx8yk9v • 7 months ago
  • Can you please make a video on BERT?

    @KavyaDabuli-ei1dr • 3 months ago
    • I will try!

      @CodeWithAarohi • 3 months ago
  • Could you explain it with Python code? That would be more practical. Thanks for sharing your knowledge.

    @kadapallavineshnithinkumar2473 • 10 months ago
    • Sure, will cover that soon.

      @CodeWithAarohi • 10 months ago
  • Can you please explain from 22:07 onward?

    @akramsyed3628 • 6 months ago
  • I thought this was about transformers in CV; all the explanations were in NLP.

    @saeed577 • 3 months ago
    • I recommend understanding this video first and then checking this one: kzhead.info/sun/p8-Tfc5pjX16bKs/bejne.html. After watching these two videos, you will properly understand the concept of transformers used in computer vision. Transformers in CV are based on the idea of transformers in NLP, so it's better for understanding if you learn them in that order.

      @CodeWithAarohi • 3 months ago
  • Gonna tell my kids this was Optimus Prime.

    @Red_Black_splay • 1 month ago
    • Haha, I love it! Optimus Prime has some serious competition now :)

      @CodeWithAarohi • 1 month ago
  • Use a mic; the background noise is irritating.

    @jagatdada2.021 • 6 months ago
    • Noted! Thanks for the feedback.

      @CodeWithAarohi • 6 months ago
  • Speaking in Hindi would be better.

    @_Who_u_are • 12 hours ago