The Absolutely Simplest Neural Network Backpropagation Example
May 22, 2024
147,669 views
I'm (finally after all this time) thinking of new videos. If I get attention in the donate button area, I will proceed:
www.paypal.com/donate/?busine...
sorry there is a typo: @3:33 dC/dw should be 4.5w - 2.4, not 4.5w - 1.5
NEW IMPROVED VERSION AVAILABLE: The Absolut...
The absolutely simplest gradient descent example, with only two layers and a single weight. Comment below and click like!
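For anyone who wants to follow along at a code level, here is a minimal sketch of the single-weight setup in plain Python. The input i = 1.5 and target y = 0.8 come from the video (per the typo note above); the starting weight and learning rate here are assumptions for illustration:

i, y = 1.5, 0.8
w = 0.5          # hypothetical starting weight
lr = 0.1         # hypothetical learning rate
for step in range(5):
    a = i * w                # forward pass: a = i*w
    C = (a - y) ** 2         # cost: C = (a - y)^2
    dC_dw = 2 * (a - y) * i  # chain rule: dC/da * da/dw = 4.5w - 2.4
    w -= lr * dC_dw          # gradient descent update
    print(step, round(w, 4), round(C, 4))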
Dude, this was just what I needed to finally understand the basics of Back Propagation
if you _really_ liked his video, just click the first link he put in the description 👍
GREAT, it was a perfect inspiration for me to explain this critical subject in a class. Thank you!
Really nice work. Thank you so much for your help.
@8:06 this was super useful. That's a fantastic shorthand. That's exactly the kind of thing I was looking for, something quick I can iterate over all the weights and find the most significant one for each step.
My long search ends here, you simplified this a great deal. Thanks!
Best video about backpropagation on the internet 🛜
Very clearly explained and easy to understand. Thank you!
just perfect and simple, and with this we can extrapolate more easily to layers with more than one neuron! thaaaaankksss!!
This was great. Removing nonlinearity and using basic numbers as context helped drive this material home.
If you use ReLU, there is nothing more than that
Fantastic. This is the most simple and lucid way to explain backprop. Hats off
I had to write a comment and thank you for your very precise yet simple explanation, just what I needed. Thank you sir.
To understand mathematics, I need to see an example. And this video, from start to end, is awesome, with a quality presentation. Thank you so much.
Thanks, very helpful.
Unreal explanation
You made this concept very simple. Thank you
After a long frantic search, I stumbled upon this gold. Thank you so much!
I was just looking for this explanation to align derivatives with gradient descent. Now it is crystal clear. Thanks Miakel
The best short video explanation of the concept on KZhead till now...
Not kidding. This is the best explanation of backpropagation on the internet. The way you're able to simplify this "complex" concept is *chef's kiss* 👌
GOD BLESS YOU DUDE! SUBSCRIBED!!!!
Great video
finally, a proper explanation.
I watched almost every video on backpropagation, even Stanford's, but never got such a clear idea until I saw this one ☝️. Best and cleanest explanation. My first 👍🏼, which I rarely give.
a 👍 is very good, but if you click on the first link in the description, it would be even better 👍
@@webgpu 🆗
Thanks for making this
dude please make more videos. this is amazing
Thank you
Hi, I have a question for you: at 3:42 you have 1.5*2(a-y) = 4.5*w - 1.5; how did you get this result?
... in case someone missed it like me - it's in the description (it's a typo). y=0.8; a=i*w = 1.5*w, so 1.5*2(a-y) =3*(1.5*w - 0.8) = 4.5*w - 3*0.8 = 4.5*w - 2.4 is the correct formula.
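A quick way to sanity-check the corrected formula is a central finite difference. A minimal sketch in Python (the test point w = 1.0 is arbitrary):

i, y = 1.5, 0.8
C = lambda w: (i * w - y) ** 2
w = 1.0
eps = 1e-6
numeric = (C(w + eps) - C(w - eps)) / (2 * eps)  # central finite difference
analytic = 4.5 * w - 2.4                         # corrected formula from above
print(round(numeric, 6), round(analytic, 6))     # both ~2.1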
Thanks for the video! Awesome explanation
I'm currently programming a neural network from scratch, and I am trying to understand how to train it, and your video somewhat helped (didn't fully help cuz I'm dumb)
That's sick bro, I just implemented it
Absolutely simple. A very useful illustration, not only to understand backpropagation but also to show gradient descent optimization. Thanks a lot.
best on internet.
Thank you bro! It's so much easier to visualize when it's presented like that.
Very helpful tutorial. Thanks!
What a breakthrough, thanks to you. BTW, not to nitpick, but you are missing a close paren on f(g(x), which should be f(g(x)).
excellent video, simple & clear, many thanks
Thanks
Good content sir, keep making these, I subscribed
My maaaaaaaannnnn TYYYY
It clicked after just 3 minutes. Thanks a lot!!
This makes more sense than anything I ever heard in the past! Thank you! 🥂
It beats the 1002165794 thing and 1001600474 jumping and calculating with 1000325836 and 1000564416. Much easier 😊
you are wrong: tell me, what is deltaW?
this was kicking my a$$ until i watched this video. thanks
Great illustration, thanks
Thanks for a very explanatory video.
Thank you for sharing this video!
very clear
Absolutely amazing 🏆
Great video. Going to spend some time working out how it looks for multiple neurons, but a demonstration of that would be awesome.
I have to say it: you have made the best video about backpropagation, because you chose to explain the easiest example; no one out there did that!! Congrats prof 😊
did you _really_ like his video? Then I'd suggest you click the first link he put in the description 👍
thank you, this is exactly what I was looking for, very useful!
Exactly what i needed
Excellent , please continue we need this kind of simplicity in NN
Helped me so much!
Bro this is awesome, I was struggling to understand chain rule, now it is clear
Awesome dude. Much appreciate your effort.
This video is very well done. Just need to understand implementation when there is more than one node per layer
Have you looked at my other videos? I have a two-dimensional case in this video: kzhead.info/sun/dcirnZGahnVreKM/bejne.html
I am so happy that I can't even express myself right now
there's a way you can express your happiness AND express your gratitude: by clicking on the first link in the description 🙂
Bro, I just worked it through, and it makes so much sense once you do the partial derivatives step by step and show all the working.
best explanation I have ever seen, thanks.
Thank you so much!
Thank you for the easiest explanation of backpropagation, dude
Nice and clean. Helped me a lot!
This is the best tutorial on back prop👏
man, thanks!
You made it easy to understand. Really appreciated it. You also earned my first KZhead comment.
Great video. Just one question: this is for a 1x1 input and a batch size of 1, right? If we have, let's say, a batch size of 2, we just add (b-y)^2 to the loss function (C = (a-y)^2 + (b-y)^2), don't we? With b = w * j and j = the input of the second batch element. Then you just perform the backpropagation with partial derivatives. Is that correct?
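In case it helps anyone else: yes, that's the usual convention (often the mean rather than the sum is used). A minimal sketch of the summed version, where the second input j is made up for illustration and both examples share the target y as in the question:

i, j = 1.5, 2.0   # j = hypothetical second input
y = 0.8           # shared target, as in the question
w = 0.5
a, b = i * w, j * w
C = (a - y) ** 2 + (b - y) ** 2            # batch loss: sum of per-example losses
dC_dw = 2 * (a - y) * i + 2 * (b - y) * j  # gradients of a sum simply add
print(round(C, 4), round(dC_dw, 4))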
I don't get it. You write 1.5*2(a-y) = 4.5w - 1.5. But why? It should be 4.5w - 2.4, because 2*0.8*(-1.5) = -2.4. Where am I wrong?
4:03 Shouldn't 3(a - y) be 3(1.5*w - 0.8) = 4.5w - 2.4? Where have you got -1.5 from?
thanks a lot... a great start for me to learn NNs :)
thanks a lot for that explanation :)
Perfect
Thank you for your video. But I'm a bit confused about 1.5*2(a-y) = 4.5w - 1.5. Could you please explain that? Thank you so much!
I think this is how he got there : 1.5 * 2(a - y) = 1.5 * 2 (iw - 0.5) = 1.5 * 2 (1.5w - 0.5) = 1.5 * (3w - 1) = 4.5w - 1.5
@@user-gq7sv9tf1m dude thanks for that, I was really scratching my head over how he got there too
I am also confused by this error
@@user-gq7sv9tf1m y is 0.8 not 0.5
Thank you
I see. As previously mentioned, there are a few typos. For anyone watching, please note there are a few places where 0.8 and 0.5 are swapped for each other. That being said, this explanation has opened my eyes to the fully intuitive explanation of what is going on...

Put simply, we can view each weight as an "input knob", and we want to know how each one affects the overall Cost/Loss. To do this, we link (chain) each component's local influence together until we have created a function that describes how a weight maps to the overall cost. Once we have found that, we can adjust that knob with the aim of lowering total loss by a small amount, based on what we call the "learning rate".

Put even more succinctly, we are converting each weight's "local frame of reference" to the "global loss" frame of reference and then adjusting each weight with that knowledge. We would only need to find these functions once for a network. Once we know how every knob influences the cost, we can tweak them based on the next training input using this knowledge. The only difference between training examples will be the model's actual output, which is then used to adjust the weights and lower the total loss.
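A tiny Python sketch of that "chaining local influences" idea with two knobs instead of one (all numbers here are made up for illustration):

i, y = 1.5, 0.8
w1, w2 = 0.5, 0.7        # two weights ("knobs"), hypothetical values
h = i * w1               # first layer
a = h * w2               # second layer (network output)
C = (a - y) ** 2
dC_da = 2 * (a - y)      # local influence of the output on the cost
dC_dw2 = dC_da * h       # chain to knob 2: da/dw2 = h
dC_dw1 = dC_da * w2 * i  # chain to knob 1: da/dh = w2, dh/dw1 = i
print(round(dC_dw1, 4), round(dC_dw2, 4))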
I think there is a mistake. 4.5w -1.5 is correct. On the first slide you said 0.5 is the expected output. So "a" is the computed output and "y" is the expected output. 0.5 * 1.5 * 2 = 1.5 is correct. You need to correct the "y" next to the output neuron to 0.5.
THIS IS SOO FKING GOOD!!!!
Great video! One thing to mention is that the cost function is not always convex; for real multi-layer networks it generally is not. However, as an example this is really well explained.
Brilliant. What would be awesome is to then expand further, if you would, and explain multiple rows of nodes, in order to visualize, if possible, multiple routes to a node and so on... I stress "if possible".
ECE 449 UofA
@Mikael Laine even though you say that @3:33 has a typo, I can't see the typo. 1.5 is correct, because y is the actual desired output and it is 0.5, so 3.0 * 0.5 = 1.5.
The video shows what is perhaps the simplest case of a feedforward network, with all the advantages and limitations that extreme simplicity can have. From here to full generalization several steps are involved.
1. More general processing units. Any continuously differentiable function of inputs and weights will do; these inputs and weights can belong not only to Euclidean spaces but to any Hilbert spaces as well. Derivatives are linear transformations, and the derivative of a unit is the direct sum of the partial derivatives with respect to the inputs and with respect to the weights.
2. Layers with any number of units. Single-unit layers can create a bottleneck that renders the whole network useless. Putting together several units in a layer is equivalent to taking their product (as functions, in the set-theoretical sense). Layers are functions of the totality of inputs and weights of the various units. The derivative of a layer is then the product of the derivatives of the units. This is a product of linear transformations.
3. Networks with any number of layers. A network is the composition (as functions, in the set-theoretical sense) of its layers. By the chain rule, the derivative of the network is the composition of the derivatives of the layers. Here we have a composition of linear transformations.
4. Quadratic error of a function.
This comment is becoming too long. But a general viewpoint clarifies many aspects of BPP. If you are interested in the full story and have some familiarity with Hilbert spaces, please Google for papers dealing with backpropagation in Hilbert spaces. Daniel Crespin
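For readers who want point 3 in symbols: writing the network as a composition of layers, the chain rule gives the derivative as a composition of linear maps, which is exactly what backpropagation evaluates layer by layer:

$$N = L_k \circ \cdots \circ L_1, \qquad DN(x) = DL_k(x_{k-1}) \circ \cdots \circ DL_1(x_0), \quad x_0 = x,\; x_j = L_j(x_{j-1}).$$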
Great video. I believe there is a typo at 1:10. y should be 0.5 and not 0.8. That might cause some confusion, especially at 3:34, when we use numerical values to calculate the slope (C) / slope (w)
Thanks for pointing that out; perhaps time to make a new video!
yes, that should say a=1.2
+Mikael Laine I would be so glad if you could make more videos explaining these kinds of concepts and how they actually work at a code level.
Did you have any particular topic in mind? I'm planning to make a quick video about the mathematical basics of backpropagation: automatic differentiation. Also I can make a video about how to implement the absolutely simplest neural network in Tensorflow/Python. Let me know if you have a specific question. I do have quite a bit of experience in TF.
@@mikaellaine9490 How about adding that to description? Someone else asked that question.
Very helpful
Thanks a lot :)
This video is gold.
Thank you so much! I'm 14 years old and I'm now trying to build a neural network in Python without using any libraries, and this video made me understand everything much better.
No way me too
Brooo WW, I ended up coding something which looked good to me, but for some reason it didn't work, so I just gave up on it. I wish you good luck man @@Banana-anim8ions
If we take the derivative dC/dw directly from C = (a-y)^2, it is the same thing, right? Do we really have to split it into da/dw and dC/da individually???
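For what it's worth: yes, substituting a = i·w and differentiating directly gives exactly the same result; the split only becomes necessary in deeper networks where da/dw itself factors through earlier layers. Worked out both ways: directly, C(w) = (i·w − y)² gives dC/dw = 2i(i·w − y); via the chain rule, dC/da · da/dw = 2(a − y) · i = 2i(i·w − y). Identical.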
man, at 4:08 I don't understand how you find the value 4.5 in the expression 4.5w - 1.5
So what is the clever part of backprop? Why does it have a special name, and why isn't it just called "gradient estimation"? How does it save time? It looks like it just calculates all the derivatives one by one.
Thanks bro, I finally figured out what happens after the last layer :)
Thank you. Here is a PyTorch implementation:

import torch
import torch.nn as nn

# Single-weight "network": output = r * input
class C(nn.Module):
    def __init__(self):
        super(C, self).__init__()
        r = torch.zeros(1)
        r[0] = 0.8  # initial weight
        self.r = nn.Parameter(r)

    def forward(self, i):
        return self.r * i

# Squared-error loss: (prediction - target)^2
class L(nn.Module):
    def __init__(self):
        super(L, self).__init__()

    def forward(self, p, t):
        return (p - t) * (p - t)

# Hand-rolled optimizer
class Optim(torch.optim.Optimizer):
    def __init__(self, params, lr):
        defaults = {"lr": lr}
        super(Optim, self).__init__(params, defaults)
        self.state = {}
        for group in self.param_groups:
            for par in group["params"]:
                self.state[par] = {"mom": torch.zeros_like(par.data)}

    def step(self):
        for group in self.param_groups:
            for par in group["params"]:
                grad = par.grad.data
                mom = self.state[par]["mom"]
                mom = mom - group["lr"] * grad
                par.data = par.data + mom
                print("Weight: ", round(par.data.item(), 4))

x = torch.zeros(1)
x[0] = 1.5  # input i
y = torch.zeros(1)
y[0] = 0.5  # target
c = C()
o = Optim(c.parameters(), lr=0.1)
l = L()
print("x:", x.item(), "y:", y.item())
for j in range(5):
    print("_____Iter ", str(j), " _______")
    o.zero_grad()
    p = c(x)
    loss = l(p, y).mean()
    print("prediction: ", round(p.item(), 4), "loss: ", round(loss.item(), 4))
    loss.backward()
    o.step()
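Two notes on the script above: since step never writes the updated mom back into self.state, the momentum buffer stays at zero and each step is plain SGD, w ← w − lr·grad, which is exactly the update rule in the video. And because this script uses y = 0.5, the weight converges toward w = 1/3, the minimizer of (1.5w − 0.5)².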
This is absolutely awesome. Except..... Where did that 4.5 come from???
You’ve probably figured it out by now but just in case: i = 1.5, y=0.8, a = i•w. This means the expression for dC/dw = 1.5 • 2(1.5w - 0.8). Simplify this and you get 4.5w - 2.4. This is where the 4.5 comes from. Extra note: in the description it says -1.5 was a typo and the correct number is -2.4.
Thanks! This is awesome. I have a question: if we make the NN a little more complicated (adding an activation function to each layer), what will be the difference?
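In case it helps: an activation just adds one more factor to the chain. With a = σ(i·w), dC/dw = dC/da · dσ/dz · dz/dw = 2(a − y) · σ′(i·w) · i. A minimal Python sketch with a sigmoid (input, target, and weight values assumed as elsewhere in this thread):

import math

i, y, w = 1.5, 0.8, 0.5
sigmoid = lambda z: 1 / (1 + math.exp(-z))
z = i * w                       # pre-activation
a = sigmoid(z)                  # activation replaces a = i*w
dsig = a * (1 - a)              # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
dC_dw = 2 * (a - y) * dsig * i  # one extra chain-rule factor vs. the linear case
print(round(a, 4), round(dC_dw, 4))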
Wow, you did not lie in the title.
Where and how did you get the learning rate?
Amazing
Are you able to briefly describe how the calculation at 8:20 works for a network with multiple neurons per layer?
I like this video
hmm, if y = 0.8 then dC/dw should be 4.5w - 2.4, because 0.8 * 3 = 2.4, not 1.5. What am I missing?
In the final equation, why is it 4.5w - 1.5? It should be 4.5w - 2.4, since y = 0.8, so 3*0.8 = 2.4
Yes, you are right. I noticed it too.