Machine Learning Fundamentals: Bias and Variance
Bias and Variance are two fundamental concepts for Machine Learning, and their intuition is just a little different from what you might have learned in your statistics class. Here I go through two examples that make these concepts super easy to understand.
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
KZhead Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
0:00 Awesome song and introduction
0:29 The data and the "true" model
1:23 Splitting the data into training and testing sets
1:40 Least Squares fit to the training data
2:16 Definition of Bias
2:33 Squiggly Line fit to the training data
3:40 Model performance with the testing dataset
4:06 Definition of Variance
5:10 Definition of Overfit
Correction:
4:06 I say that the difference in fits between the training dataset and the testing dataset is called Variance. However, I should have said that the difference is a consequence of variance. Technically, variance refers to the amount by which the predictions would change if we fit the model to a different training data set.
#statquest #biasvariance #ML
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
And at 4:55, why do you say the straight line has low variance? That isn't necessarily true, since those points on the graph could be anywhere else, and if they were farther from the line, the sum of squares could easily be much greater.
@@leif1075 Given this dataset, the straight line has lower variance than the squiggly line. Given another dataset, things could be very different.
@@statquest Ok, so you were only referring to this dataset then? Sorry. What I said is correct in general though, right?
@@leif1075 Regardless of the models and the data, you always have to test to see which one has the least variance.
@@statquest So what I said was correct then?
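The point in the thread above ("you always have to test to see which one has the least variance") can be checked empirically: fit both models to many training sets drawn from the same process and measure how much their predictions change. This is a minimal pure-Python sketch, not from the video — the quadratic "truth", the Gaussian noise, and the 1-nearest-neighbor model (a stand-in for a very flexible, squiggly fit) are all made-up assumptions.

```python
import random
from statistics import pstdev

def fit_line(xs, ys):
    # ordinary least-squares fit of y = a + b*x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    # 1-nearest-neighbor: predict the y of the closest training point
    return lambda x: min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]

random.seed(42)
true = lambda x: x ** 2                  # hypothetical "true" relationship
grid = [0.0, 1.0, 2.0, 3.0, 4.0]

line_preds, nn_preds = [], []
for _ in range(200):                     # 200 different noisy training sets
    ys = [true(x) + random.gauss(0, 1) for x in grid]
    line_preds.append(fit_line(grid, ys)(1.5))
    nn_preds.append(fit_nearest(grid, ys)(1.5))

# the straight line's predictions at x = 1.5 vary much less across
# training sets than the flexible model's predictions do
print(pstdev(line_preds), pstdev(nn_preds))
```

The spread (standard deviation) of each model's predictions across the 200 training sets is a direct estimate of its variance in the technical sense from the correction above: how much the predictions would change if the model were fit to a different training set.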
4 hours of lectures with a lot of complicated math: got nothing. 6 minutes with the singing guy: *DOUBLE BAM*
Hooray! :)
Math is important. Go learn the math.
You can't get anywhere without the math
@@fluxqubit ima jus import da python library my G. math is for fools
Better than lots of courses on Udemy. I really like your humor
Thanks! :)
@@statquest BAMMMM!!!!!!
@@Ex_Arc :)
@@statquest DOUBLE BAMMM
@@prashdash112 Thanks! :)
This guy has united his two passions-Machine Learning and guitar.
Yes! :)
and mice :)
and singing & composing! Loved the intro in this video :)
and saying "Bam"
Josh! How about that transformers video? Eagerly awaiting your humorous and mad explanation skills. Perhaps how it relates to its predecessor models? Key Query Value bit would be great as well. Keep on rocking it.
LOL What a way to present dry material with a dry approach yet making it interesting and easy to follow :-) Great job!
His dryness went full circle XD
I went from BUMMED to DOUBLE BAM in six and a half minutes. God bless you!
Hooray! :)
I did the same in just over three minutes with increased playback speed! BAM
Notes for myself: Def. of Bias: The inability of a machine learning method to capture the true relationship is called Bias. Def. of Variance: The difference in fits between data sets is called Variance.
M-m-Morty huh? Learning some m-m-machine learning? Your grandpa rick would be p-p-proud of you **burp**, Morty.
Thanks, Morty, for your short note, which helps me understand the definitions more clearly. Good luck on your adventures with your crazy Grandpa
Thank you Morty
@@BrandonSLockey That doesn't sound like something Rick would say! He'd probably berate Morty for trying to learn this and then go on a soliloquy of how nothing actually will ever matter :D
These two definitions are completely counter-intuitive for me; I have to re-define them for myself constantly. Bias sounds like the model is biased toward the training data, but the bias in the definition comes from the model's assumptions (i.e., a linear model is biased toward linearity). Variance sounds like the model's variation from the training-set data (creating high variance), but the definition refers to the large variance of the error values (i.e., residuals) when the model is fit to new data. Hope this helps if your intuition is similar to mine.
I wish professors taught like this!! Such clarity - I am so thankful to you.
Wow, thank you!
You're like the postal mailman of online videos. Neither snow nor rain nor heat nor gloom of night can stop StatQuest!
i read it as post malone
you have watched enough Seinfeld , haven't you?
Wow this was so straight to the point with great visuals that I managed to figure out all in one go! Great stuff!
Awesome!!! :)
I am currently in a trainee program to learn machine learning...my teachers suggested this channel. This is awesome
Welcome aboard!
I have watched many of Josh's videos several times. Whenever I find myself trying to remember a concept, I know that a StatQuest video will sort me out in 10 minutes or less
BAM!
*Opens StatQuest Videos* -> Automatically clicks 'Like'
So much quality content on Machine Learning!! I wish I had known about this channel a bit earlier. A must-follow channel for ML & DS enthusiasts. Great job Josh :) Please continue the good work and serve humanity!!
Thank you very much! :)
The best and most interesting videos combine fundamental statistics and machine learning for beginners. The heavy statistics textbooks are so boring, and after watching your series of videos I have a better understanding of many abstract things. Thanks, tons!!!
BAM ! Mindblowing how clearly explained these videos are, with even a sense of humour and some home made music. Really nice work, hats off.
Wow, thanks!
you just did it in a perfect way. I've read blogs, "best ML books", and other resources, but you just nailed this. thank you!
Thank you!
After watching more than a hundred videos on machine learning, I find your way of explanation very easy to understand and digest. Plus, I am really amazed by the way you start your lectures and wait for the 'BAM' to come.
Wow, thanks!
The world of learning is still enjoyable because people like you are still in it
Thank you! :)
From Intro to Statistical Learning with Applications in R. I fully grasp the picture of Bias and Variance. In addition, flexible techniques vs. less flexible techniques are now cemented into my memory; before, I just crammed the terminology without knowing exactly what it meant. I will be a regular visitor to this channel
I have paid for courses on edX and also have many free resources available to me through school- nothing has explained Bias and Variance as quickly and efficiently as you have in this video. Thank you, thank you, THANK YOU!
Hooray! I'm glad my video was helpful.
My master's course in ML has been challenging. Getting washed over with lots of math in Greek (I've only taken Calc I) and statistical jargon (I've never taken stats) when I am a simple computer science pleb has made class really hard. These videos are making light work of seeing past the confusing figures and long-winded, over-technical lectures! Thank you, Josh. Thanks, StatQuest!
Hooray! I'm glad my videos are helpful! :)
How on earth did you get into a masters of ML without more background in relevant subjects?
@@mitchellsteindler I'm looking back at my previous reply and see that it sounds like I'm doing a masters program in ML. What I was trying to say is that I was taking an ML course in my masters program. My program is just computer science :) But I passed my class with an A with big thanks to these awesome videos!
@@BenStoneking ah okay
Best, most intuitively understood explanation of this that I've ever seen!
BAM! :)
Amazing video, love the clarity and simplicity.
It couldn't have been made easier to understand these concepts. Great job! I hope your journey of making abstruse concepts easy to understand doesn't end here
I hope not! :)
You have a gift for turning unclear concepts into pretty clear ones. Baaam!
Thank you! :)
This guy is awesome... this video finally explained bias and variance to me. I have watched lots of other videos, but it was this video that taught me this concept
Awesome!!! Thank you so much! :)
the best video so far on bias-variance tradeoff.
Thanks!
Thanks man, i do not know what the start was about, but your video really helped me. Thanks
I have understood not only Bias and Variance, but also even more ML terminology that had been quite difficult for me to understand until this point! Keep it up, brother! Very good job :)
Awesome!!! :)
You are probably the best resource when it comes to understanding the fundamentals of Machine Learning... like it's not even close
Thank you! :)
hands down the best explanation that i have ever seen. plus the humour is soo good
Thanks!
Thank you, Josh, for this wonderful video on Bias and Variance in ML. It was a great visual-heavy explanation and the explanations were made very clear for these two concepts!
Thank you very much! :)
Josh , I don't know which i love more, your songs or your lessons on stats. You're amazing.
bam! :)
Very concise and easily understandable video. In the past I have read this topic in books and seen other videos but never understood bias variance so clearly earlier.
Thanks! I'm glad my video is helpful. :)
This is some quality educational content...Keep up the good work brother!! Definitely gonna buy some merch to support the channel!!
Awesome! Thank you! :)
Brilliant and clear and concise explanation: the best i have seen!!! Congrats and many thanks.
Thank you! :)
Currently reading the Intro to Statistical Learning with Applications in R, and I can't tell you the number of times I've loaded up one of your videos to help me understand one of the concepts, such as Bias and Variance, because the book does a poor job of explaining them for a broader audience. Please keep it up!
Hooray! One of my long term goals is to "translate" most of that book into StatQuest videos. This was the first, but I also just put out a video on Ridge Regression and will soon put out a video on Lasso Regression.
Literally doing exactly the same thing
I was searching Bias and Variance for the same reason. Thankfully I found this channel!
Came here for the exact same reason lol
I do love the way you explain and the way you keep people alert to upcoming information
Thank you! :)
This is absolutely brilliant M8, crisp, clear and very concise. Well Done!! You've got one more stat fan now!
Hooray! Thank you very much! :)
Wow... I went through many blogs, watched many videos, and asked countless questions on Quora and other platforms, but your single video (less than 7 minutes) explained it well. Really, thanks man. You've done a great job.
Thank you!!! :)
Awesome and very clear explanation!
My new favourite channel to learn the fundamentals of ML. Plus you use R!!! 🔥
Great explanation in simple terminologies , Thanks !
Thank you so much for this video at this special moment! I hope you can keep safe during Florence hurricane! Good luck to you and the Carolinas!
Thank you! We got a lot of flooding, but I stayed dry and now the sun is shining again. :)
Just found this channel today. Also making my way through ISLR. They have a great video series to go along with the book, but still pretty technical. This channel is a god send. Thank you!
You're welcome!
I will comment on every single video of yours. Just to show how much I love your teaching style.
bam!
Sir, I must say you are a gem. This 6:35 video has taught me what our PhD couldn't in 3 hours with 50 slides... Hats off!
bam!
You should sell these videos as DVD sets. I bet a lot of educators would buy them.
Great video, very clear. Also, the graphics are intuitive. Thank you!
Thanks! :)
Thank you for your work Josh, I learn more from your six and a half minute videos than I do from six and a half of hours of textbooks and classwork
Glad to help!
Thank God I found this channel! I understand 2 hour lectures under 10 minutes - Thanks StatQuest!!
Happy to help! :)
Such a GREAT video on bias-variance trade-off. Looking forward to your lectures on regularization and boosting~
Great job man! Seriously, you made my journey in data science easier 👍
BAM! :)
this is probably how education is gonna be in the future, thanks a lot!
:)
WHAT IS STATQUEST? MY LIFE SAVER
:)
very clear, no extra unnecessary "noise". I really enjoyed this lesson.
Awesome, thank you!
I loved your composition Miss Carolina. You have amazing voice Sir!
Thank you very, very much!!!! :)
Dude you are awesome, this is my first video that I have seen from your channel. Plan on watching your other videos as well. Such great visualizations. just wow.
Thank you very much! :)
Very simply and amazingly explained, saw many tutorials but this was by far the best. Thank you :)
Thank you!
Thank you very much for this video! I am learning a lot from it and it helps me understand what people mean by Bias-Variance tradeoff!
Thank you very much! :)
3:09 psst. I can listen to this all day.
:)
Thank you so much for all of your videos. I'm watching them all in a row. All the subjects are so clearly explained ! Thank you very much from France !
Thank you very much!!! :)
I got to the point where I first check statquest if I come across unfamiliar topics. Thank you so much for all of your hard work!!!
Bam! :)
Paid thousands of dollars on Udacity, but ALWAYS have to come to your channel for a clear explanation. Love the way you explained all these complicated concepts Josh :) (Btw, we met at IVADO's 100 Days Event haha:) )
Hooray! I'm so glad my videos are helpful and IVADO's 100 Days Event was super cool. :)
Your videos are AMAZING!! Thank you Josh for being such an inspiration :) Have a wonderful weekend! :)
Tons of Thanks for You..your videos are really nice..pls do the video on regularization soon..
I should have the first video on Regularization out in the next week or two. :)
👍
MAN!!! I was reading about the bias and variance trade-off, but not a word got into my head... this video made it beyond clear!! Thanks a ton!!
Hooray! I'm glad the video was helpful. :)
Thanks for all your videos, I will go through all of them! You are the best!
PERFECT AND CLEAR!
Awesome!
YouTube should give an option to add a thousand likes. Your channel beats the paid ML courses out there hands down.
Thank you very much! :)
You are a scholar and a gentleman. Thank you for explaining what my lecturer tried to explain in 2 hours in 6 minutes.
You're very welcome!
What an outstandingly simple and intuitive explanation, bravo!
Thank you! :)
You’re on my list of guys I’ll buy a beer for if I ever see in a bar. You, Jeremy Howard, and the folks over at Deep Lizard.
Wow! Thank you very much! :)
Hi Josh! You are the "God of ML and Stats". You really made me fall in love with these subjects. I had a query: if we cut the data into training and testing sets, what % should be assigned to testing? I think it should vary with the amount of data, but is there a rule of thumb?
There are a handful of "rules of thumb". One simple one is if you do 10 fold cross validation, then you divide your data into 10 equally sized bins (see the StatQuest on cross validation: kzhead.info/sun/mbeypN5_rql4ia8/bejne.html ). Another standard is to use 75% for training and 25% for testing. This is the default setting for Python's scikit-learn function train_test_split().
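The 75%/25% rule of thumb from the reply above can be mimicked in a few lines of plain Python. This is a minimal sketch, not scikit-learn itself — the `train_test_split` function below is a hypothetical stand-in named after the scikit-learn function the reply mentions.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    # shuffle a copy of the data, then hold out test_fraction of the rows
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, testing set)

rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))  # 75 25
```

Shuffling before splitting matters: if the rows are ordered (say, by date or by measurement size), taking the last 25% without shuffling would give a testing set that doesn't look like the training set.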
Simple yet concise explanation, thank you! Very helpful.
Thank you! :)
this was so straight to the point, with some great visuals that I managed to figure out all in one go! BAM!!!!!
Hooray! :)
Man, you're very didactic! For each statement there is a 'because', so your students never end up with a question mark in their heads. Besides that, you don't mind repeating the becauses again and again in different ways, and that's what makes things clearer. Why can't other teachers, coaches, and tutors realize that? Triple BAMMM!
Thank you very much!! :)
Have watched many of your videos, and that has forced me to write a comment: StatQuest is AWESOME!! @Josh Starmer, I am your fan. The way you begin your videos and go about explaining some of the most difficult concepts in Statistics and Machine Learning is GREAT. Many books and tutorials promise to make the complex simple, but rarely do so. This channel truly makes things simple to understand. I have just one request (I think most of your followers would agree on this point): please write a book on Machine Learning and its applications of various algorithms (maybe a series of books).
Thanks so much! If I ever have time, I'll write a book, but right now I only have time to do the videos.
It's amazing how a 6-minute video did a far (and I mean really far) better job of explaining the concepts than hours spent on articles that did nothing but increase confusion. Thanks a lot for sharing this... much luv
Wow, thanks!
Woah, your original songs are beautiful too!
BAM. Subscribed.
One of the best videos I have come so far
Thank you!
I love that you explained why you square the differences! Most people don't bother explaining that and it always seemed strange to me.
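The point about squaring can be shown with toy residuals (the numbers here are made up for illustration): misses above the line and misses below it cancel in a raw sum, but every miss counts once the residuals are squared.

```python
# hypothetical residuals: two points above the fitted line, two below
residuals = [2.0, -2.0, 1.0, -1.0]

raw_sum = sum(residuals)                        # 0.0 — cancellation hides the misses
sum_of_squares = sum(r * r for r in residuals)  # 10.0 — every miss contributes
print(raw_sum, sum_of_squares)
```

A raw sum of zero would make a badly fitting line look perfect, which is why the video measures fit with the sum of squared residuals instead.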
Thanks!
great work, so funny
StatQuest terminology: Bam with a high tone means this is the point you should understand. Little bam means something more important is coming. Double bam means at this point you should be enlightened.
That's perfect!!! You made me laugh out loud. :)
@@samarthgoel1671 I think Tiny Bam means "boring but important."
@@statquest TRIPLE BAM
People are letting youtube rabbit holes radicalize them and here I am finally gaining a deeper understanding of complex stats
bam! :)
Man, I have been trying to learn this for a month; finally I found this video. No video on the Internet beats this.
bam! :)
@@statquest Double Bam ! AI in Nepal has your spark of knowledge.
@@aashishkarn That is awesome! Go for it! :)
perfect video doesn't exist... wait nvm, found it!
bam! :)
Who on the EARTH disliked this video? Probably other content creators...
It's always a mystery why someone doesn't like StatQuest. Maybe they couldn't handle the BAM! :)
@@statquest could't agree more xD
Wonderful presentation, explanation and the effort you put in visualising every step...Thank you Josh!!
Glad you enjoyed it!
You are so good, like a one-stop destination for learning these sorts of statistical concepts in ML... eagerly waiting for the regularization and boosting videos. Thanks a ton!
Linear regression (aka least squares), finally. Now I can die in peace. You explain things in a very nice way.
Thanks!
Also, can YouTube customize the like button to BAM!!? That would be great... ;)
That would be awesome! :)
Thank you very much, your explanations are really clear and to the point. Looking forward to the regularization lecture.
The first one, one Ridge Regression, is coming along. It should be ready by either this coming monday or the next. Lasso and Elastic Net Regression will follow.
All your intro music gives me the feeling that the concepts are easy to understand... thank you for building that confidence.
Hooray! :)
I could simply replace my tuition payments with payments for a KZhead Premium subscription. Much cheaper and easier to study :D
bam! :)
You are the male version of Phoebe Buffay!!! 😁
you are seriously the absolute BEESSTT when it comes to teaching this things... thank you very much
Wow, thanks!
This is an excellent series of machine learning, and I especially like the song at the starting of the video. Thank you statquest❤️
Glad you enjoy it!
Bam ! Double bam!!
Hooray!!!! :)