Stanford CS229: Machine Learning - Linear Regression and Gradient Descent | Lecture 2 (Autumn 2018)
May 17, 2024
1,163,196 views
For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/3pqkTry
This lecture covers supervised learning and linear regression.
Andrew Ng
Adjunct Professor of Computer Science
www.andrewng.org/
To follow along with the course schedule and syllabus, visit:
cs229.stanford.edu/syllabus-au...
#andrewng #machinelearning
Chapters:
00:00 Intro
00:45 Motivate Linear Regression
03:01 Supervised Learning
04:44 Designing a Learning Algorithm
08:27 Parameters of the learning algorithm
14:44 Linear Regression Algorithm
18:06 Gradient Descent
33:01 Gradient Descent Algorithm
42:34 Batch Gradient Descent
44:56 Stochastic Gradient Descent
Dude is a multi-millionaire and took valuable time meticulously teaching students and us. Legend.
Bro needs to train his future employees
yes bro. i think the more people with the knowledge, the faster the breakthroughs in the field
...and FOR FREE.
This course saves my life! The lecturer of the ML course I'm attending right now just goes through those crazy math derivations, presuming that all the students have mastered them before 😂
0:41: 📚 This class will cover linear regression, batch and stochastic gradient descent, and the normal equations as algorithms for fitting linear regression models.
5:35: 🏠 The speaker discusses using multiple input features, such as size and number of bedrooms, to estimate the price of a house.
12:03: 📝 The hypothesis is defined as the sum of features multiplied by parameters.
18:40: 📉 Gradient descent is a method to minimize a function J of Theta by iteratively updating the values of Theta.
24:21: 📝 Gradient descent is a method used to update values in each step by calculating the partial derivative of the cost function.
30:13: 📝 The partial derivative of a term with respect to Theta J is equal to XJ, and one step of gradient descent updates Theta J accordingly.
36:08: 🔑 The choice of learning rate in the algorithm affects its convergence to the global minimum.
41:45: 📊 Batch gradient descent is a method in machine learning where the entire training set is processed as one batch, but it has a disadvantage when dealing with large datasets.
47:13: 📈 Stochastic gradient descent allows for faster progress on large datasets but never fully converges.
52:23: 📝 Gradient descent is an iterative algorithm used to find the global optimum, but for linear regression the normal equation can be used to jump directly to the global optimum.
58:59: 📝 The derivative of a matrix function with respect to the matrix itself is a matrix with the same dimensions, where each element is the derivative with respect to the corresponding element in the original matrix.
1:05:51: 📝 The speaker discusses properties of matrix traces and their derivatives.
1:13:17: 📝 The cost function is written as one-half times (X Theta minus y) transpose times (X Theta minus y), and its derivative is taken with respect to Theta.
Recap by Tammy AI
How much would we have to pay for your valuable overview of the entire class? Kudos to your efforts 👍
Thank you so much 👍🫡
when you're paying 12k a year to your own university just so you can look up a course from a better school for free
University costs need to be as low as possible.
while YouTube has unlimited free information, and courses better than the tech universities and colleges 🙂
Hahahahaha, fucking hell, that's what I am doing right fucking now.
which uni is that...
@@preyumkumar7404 University of Toronto
Feels like sitting in a Stanford classroom from India... Thanks Stanford, you guys are the best
for real bro, me sitting in Punjab would have never come across how the top uni profs are, this is surreal.
@@gurjotsingh3726 Sat Sri Akaal, good luck!
We learn, and teachers give us the information in a way that can help stimulate our learning abilities. So, we always appreciate our teachers and the facilities contributing to our development. Thank you.
I am not good at math anymore, but I think math is simple if you get the right teachers like you. Thanks.
Thank you to Stanford and Andrew for a wonderful series of lectures!
8:50 notations and symbols 13:08 how to choose theta 17:50 Gradient descent
52:50 Normal equations
Andrew Ng you are the best
Hey, can I point out what an amazing teacher Professor Andrew is?! Also, I love how excited he is about the lesson he is giving! It just makes me feel even more interested in the subject. Thanks for this awesome course!
Look at Coursera, he founded that and has many free courses.
One of the greats, a legend in AI & Machine Learning. Up there with Prof. Strang and Prof. LeCun.
the best professor in the world.
Thank you so much Dr. Andrew! It took me some time but your stepwise explanation and notes have given me a proper understanding. I'm learning this to make a presentation for my university club. We all are very grateful!
Hi, I was not able to download the notes (404 error) from the course page in the description. Other PDFs are available on the course page. Are you enrolled, or where did you download the notes from?
@@Amit_Kumar_Trivedi cs229.stanford.edu/lectures-spring2022/main_notes.pdf
@@anushka.narsima thanks
8:50 notations and symbols
13:08 how to choose theta
17:50 Gradient descent
8:42 - 14:42 - Terminologies completion
51:00 - batch
55:00 problem 1 set
57:00 for p 0
notes are not available on the website ???
We define a cost function based on the sum of squared errors. The job is to minimise this cost function with respect to the parameters. First, we look at (batch) gradient descent. Second, we look at stochastic gradient descent, which does not give us the exact value at which the minimum is achieved, but is much more effective at dealing with big data. Third, we look at the normal equation. This equation directly gives us the value at which the minimum is achieved! Linear regression is one of the few models for which such an equation exists. (See the sketch below this thread.)
I wish you sat next to me in class 😂
Bro who named that equation as normal equation?
@@rajvaghasia9942 the name "normal equation" comes from the concept of the perpendicular (normal to something means perpendicular to something). In fact, the normal equation represents the projection of the actual sampled data onto the straight line I draw as a starting point (in the case of LINEAR regression). This projection obviously carries information about the distances between the real (sampled) data and my "starting line"... hence, to find the optimal curve that fits my data, I have to find weights and a bias (in this video Theta0, Theta1, and so on) that minimize this distance. You can minimize this distance using gradient descent (very costly), stochastic gradient descent (taking a set of partial derivatives rather than computing the full gradient of the loss function), or using the normal equations... understand? Here is an image from Wikipedia to understand better (the green lines are the famous distances): en.wikipedia.org/wiki/File:Linear_least_squares_example2.svg
@@rajvaghasia9942 because we're in the matrix now bro! ha. For real though. It's about the projection matrix and the matrix representation/method of acquiring the beta coefficients.
I have been wondering why we need such an algorithm when we could just derive the least squares estimators. Have you seen any research comparing the gradient descent method of selection of parameters with the typical method of deriving the least squares estimators of the coefficient parameters?
Really easy to understand. Thanks a lot for sharing!
sure it is, it's a high school topic, at least in Italy
@@massimovarano407 I'm pretty sure multivariate calculus is not a high-school topic in Europe
8:42 - 14:42 - Terminologies completion
17:51 -- Checkpoint
57:00 - run1
Fantastic. Thank you deeply for sharing
Thank you Stanford for this amazing resource. Please can I get a link to the lecture notes? Thanks.
Loving the lectures!!
I love you Sir Andrew, you inspire me a lot haha
I really don't have a clue about this stuff, but it's interesting, and I can concentrate a lot better when I listen to this lecture, so I like it
You can watch his Machine Learning lectures on Coursera. Then you will surely get what he is saying in this video.
@@FA-BCS-MUHAMMADHAMIDSAEEDUnkno yes, that course is beginner-friendly. Everyone with basic high school math can take that course even without knowledge of calculus.
Simple and understandable
Very clear explanations. Extra points for sounding like Stewie Griffin
this man is a great teacher
I need those lecture notes ASAP, professor
May I ask, at 7:50, what does θ (theta) represent?
39:38 we're subtracting because, to minimize the cost function, the two vectors must be at 180°; that's where the negative comes from.
Knowledge is power
thanks a lot Andrew Ng (吴恩达), I learned a lot
Attending Stanford University from Nairobi, Kenya.
my machine learning lecturer is so dogshit I thought this unit was impossible to understand. Now I'm following these on study break before midsems, and this guy is the best. I'd prefer my uni just refer to these lectures rather than making their own
47:00
51:00 - batch
55:00 problem 1 set
57:00 for p 0
I didn't understand the linear regression algorithm. Is there any way to understand it better?
Can I get notes for these lectures?
Why do we take the transpose of each row, wouldn't it be stacking columns on top of each other?
This is really cool. ❤
The partial derivative seemed incomplete to me. Shouldn't we take the derivative ∂/∂theta of the other factor as well? Or is that term a constant? Shouldn't we go with the product rule?!
"Wait, AI is just math?" "Always has been"
Where can I find the notes and other videos and any material related to this class!?
at 40:10, what if we set the initial value at a point where the gradient is in the negative direction? Then we should increase theta rather than decrease theta?
Where do i get the assignments for these lecture series?
Why aren't we using the usual numerical methods(least squares) to fit a straight line to a given set of data points?
Andrew Ng, FTW!
Andrew teaches it so well!
Which book is he using? and where do we find the homework?
Hi. Can anyone recommend any textbook that can help in further study of this course. Thank you
Absolutely awesome!
BRILLIANT TEACHER
why, in the cost function, did he use 1/2 and not 1/(2m)?
Does anyone know which textbook goes well with these lectures?
Thank you!
it's hard, but everything that's worth doing is
Dear Dr. Andrew, I saw your other video where the cost function for linear regression uses 1/(2m), but this video uses 1/2. What is the difference between them? (see 16:00)
I don't really understand what you mean by 1/2m. However, from my understanding, the 1/2 is just for simplicity: when taking the derivative of the cost function, the power of 2 is multiplied into the equation and cancelled by the half.
It should be 1/(2m), where m is the size of the data set. That's because we'd like to take the average sum of squared differences and not have the cost function depend on the size of the data set m. He explains it here at 6:30: kzhead.info/sun/jd6edNiLpKSIoo0/bejne.html
@@googgab Shouldn't it be OK if J depends on m, since m isn't changing?
same question
1:01:06 Didn't know Darth Vader attended these lectures
thank you
Very impressive. Does somebody know where the lecture notes are?
There is a link in the video description, make sure you click where it says "...more"
i wish i had access to the problem sets for this course
can anyone please explain what we mean by the "parameters" denoted by theta here?
Parameters are the TRAINABLE numbers in the model, such as weights and biases, since the prediction of the model is based on some combination of weight and bias values. So when the "parameters" theta are changed or "trained", it means that the weights and biases are changed or trained.
if the board is full, slide it up; if it refuses to go up, pull it back down, erase it, and continue writing on it.
Are the lecture notes available publicly for this? I have been going through this playlist, and I think the lecture notes will be very helpful.
cs229.stanford.edu/main_notes.pdf
Can we have the class outside?
can't download the course class notes, please look into it
The PDF link to the problem set says "Error: Not Found". Can someone help, please?
how do I access the lecture notes :( they have been removed from Stanford's website.
In the very last equation (normal equation, 1:18:06), transpose(X) appears on both sides of the equation; can't this be simplified by dropping transpose(X)?
no, because X is not necessarily a square matrix
Difficult words: cost function, gradient descent, convex optimization, hypothesis, f(x), target, J of theta = cost/loss function, partial derivatives, chain rule, global optimum, batch gradient descent, stochastic gradient descent, mini-batch gradient descent, decreasing learning rate, parameters, oscillating, iterative algorithm, normal equation, trace of a matrix
Can you update the lecture notes and assignments on the website for the course? Most of the links to the documents are broken
Hi there, thanks for your comment and feedback. The course website may be helpful to you cs229.stanford.edu/ and the notes document docs.google.com/spreadsheets/d/12ua10iRYLtxTWi05jBSAxEMM_104nTr8S4nC2cmN9BQ/edit?usp=sharing
@@stanfordonline Where can I access the problem sets?
@@stanfordonline Please post this in the description to every video. Having this in an obscure reply to a comment will only lead to people missing it while scrolling.
Does someone know how to get the lecture notes? They are not available on stanford's website.
Same issue for me also...
Would anyone please share the lecture notes? On clicking the link for the PDF notes on the course website, it's showing an error that the requested URL was not found on the server. It would really be great if someone could help me with finding the class notes.
I think I found them here: cs229.stanford.edu/main_notes.pdf
Wondering: is m equal to n+1? n stands for the number of inputs, while m stands for the number of rows, which includes x0 in addition.
n actually stands for the number of attributes here, or the number of features (columns)
No, not necessarily. m is the number of rows, and n is the number of columns, or features. In his example n is equal to two (size and bedrooms); m can be any number, but I think in the example m is 50.
@Louis Aballea yeah I got it. Thanks !
The notes from the description seem to have vanished. Does anyone have them?
same problem
1:14:54 my answer is (X^T Xθ) + (X^T θ^T X) - (X^T Y) - (Y X^T). Is it the same, or is my answer wrong?
is the explanation at 40:00 correct?
28:51, what are x0 and x1? If we have a single feature, say # of bedrooms, how can we have x0 and x1? Wouldn't x0 just be nothing? I'm confused. Or, in other words, if my theta0 update rule relies on x0 for the update, but x0 doesn't exist, theta0 will always be the initial theta0...
The value of x0 is always 1, so theta0 can rely on x0 for the update. If we have a single feature, then h(x) = x0*theta0 + x1*theta1 (which is ultimately equal to theta0 + x1*theta1, as x0 = 1). theta0 can also be referred to as the intercept and theta1 as the slope if you compare it with the equation of a straight line, such that the price of the house is a linear function of the # of bedrooms.
@@MahakYadav12 thank you!!
Seems like the Lagrangian, or the path-of-least-action principle in physics, can be applied to algorithmic manipulations in machine learning, as well as in economics, where isoquant curves and marginal analysis depend on many variables... not being an expert in any field, the topics seem very similar and some correlation may exist... perhaps it is already being used.
Do you speak English?
Took me quite some time to realize this class was not being taught to Darth Vader
Had to study basic calculus and linear algebra at the same time to understand a bit, but I don't fully get it yet.
Could you please tell me the actual use of gradient descent in minimizing J(theta)?
Gradient descent is basically the optimization method that helps minimize the cost of the model. We obtain the cost by calculating the MSE (mean squared error).
anybody know where the notes are? the link doesn't work for me
Wondering if lecture notes are also available to download from somewhere ?
hey bro I found them: cs229.stanford.edu/lectures-spring2022/main_notes.pdf
@@williambrace6885thanks a lot!
where do I find the lecture notes? Help
I asked ChatGPT how to learn machine learning. #1 Coursera: Course: "Machine Learning" by Andrew Ng (Stanford University)
Very clear, but what I don't get is: for the multiple data sets, when I sum the errors, do I do two passes through the data and choose the error that is less?
Just continue changing theta until the cost function reduces to its optimum
Yes, the goal is to reach a lower error, and by tweaking theta you can achieve that; just make sure you don't overshoot
How can I implement this?? any references??
why is it that the cost function has the constant 1/2 before the summation and not 1/2m?
I think it's because he is taking one learning example and not m learning examples
@@ihebbibani7122 ah I see
hey where can i get the notes?
What's the difference between his course on coursera and the videos that are posted on here ?
Hi Preksha, great question! These videos are lectures from the graduate course at Stanford. Here is a link to the course if you are interested: online.stanford.edu/courses/cs229-machine-learning His courses on Coursera are more introductory than this graduate-level course. Hope this helps; don't hesitate to get in touch with our team if you have more questions: online.stanford.edu/contact-us
Short answer: The coursera version is much easier
is there thanks sitting in the class??
54:13 Normal Equation
has someone (possibly a newbie like me) gone through all the videos and learnt enough to pursue an ML career or create a project? Wondering if a paid class should be taken or these free videos are enough.
I also want to know, have you gone through all the videos?
why even go to uni, wtf this is so much better than my lectures and it's free and it's recorded lmao wtf unis be doing they are dying fr
Fred has a one-hundred-sided die. Fred rolls the die once and gets side i. Fred then rolls the die again (a second roll) and gets side j, where side j is not side i. What is the probability of this event e? Assume the one hundred sides of the die all have an equal probability of facing up.
1 - (1/10000) = 9999/10000
the probability of getting the same result for two rolls, where both are specified, is 1/10000, so we subtract that from 1
Wouldn't it be 99/100? The first roll can be any number so it doesn't really matter what's there. The second roll just needs to be one of the other 99 numbers. The first roll doesn't really change the probability. Of course, I barely know any math so I'm no expert lol
@@billr5842 you're right; the probability calculated above as 1/10000 is the probability of getting the same result for a "specific side", like getting "side 3" twice. But there are 100 different sides that each have a 1/10000 probability of occurring twice, so 1/10000 is multiplied by the 100 different sides, which makes the probability of getting the same result on two rolls equal to 1/100. Then 1 - 1/100 = 99/100.
where can i find the notes?
How can I get the lecture notes PDF?
Google
Can we get access to class lecture notes? @Stanford Online
click on "show more" in the description of the video; the link to the class notes is the last link.
@@smn7074 Does that still work for you? It says "Not Found" when I click on a pdf link.
@@afmirror01 some of the stuff still existed but there were things removed from that website.
@@videowatching9576 those aren't the notes for the class of autumn 2018
Free at coursera
Where can I find the lecture notes? Thank you. Edit: Reading through the comments, I got the answer :)
Where could I get that?
@govardhansathvik5897 the links are broken :(
That feeling when you need to pause the video every n minutes to google the terminology because high school was too long ago
miss the sound quality 😕😕