Thanks to Secret Lab for sponsoring today's video! Check them out at lmg.gg/SecretLabTQ
Learn how a CPU core works.
Leave a reply with your requests for future episodes, or tweet them here: / jmart604
►GET MERCH: www.LTTStore.com/
►SUPPORT US ON FLOATPLANE: www.floatplane.com/
►LTX EXPO: www.ltxexpo.com/
AFFILIATES & REFERRALS
---------------------------------------------------
►Affiliates, Sponsors & Referrals: lmg.gg/sponsors
►Private Internet Access VPN: lmg.gg/pialinus2
►MK Keyboards: lmg.gg/LyLtl
►Secretlabs Gaming Chairs: lmg.gg/SecretlabLTT
►Nerd or Die Stream Overlays: lmg.gg/avLlO
►Green Man Gaming lmg.gg/GMGLTT
►Amazon Prime: lmg.gg/8KV1v
►Audible Free Trial: lmg.gg/8242J
►Our Gear on Amazon: geni.us/OhmF
FOLLOW US ELSEWHERE
---------------------------------------------------
Twitter: / linustech
Facebook: / linustech
Instagram: / linustech
Twitch: / linustech
FOLLOW OUR OTHER CHANNELS
---------------------------------------------------
Linus Tech Tips: lmg.gg/linustechtipsyt
Mac Address: lmg.gg/macaddress
TechLinked: lmg.gg/techlinkedyt
ShortCircuit: lmg.gg/shortcircuityt
LMG Clips: lmg.gg/lmgclipsyt
Channel Super Fun: lmg.gg/channelsuperfunyt
Carpool Critics: lmg.gg/carpoolcriticsyt
Fun fact! The Harvard Mark I from 1944 used a 3.5 hp electric motor running at a constant speed to act as the computer's "clock" and could manage up to 3 (THREE!) additions or subtractions a second. A single multiplication took around 6 seconds, a division 15 seconds, and a log or trig function about a minute. The miracle of the age!
Talk about perfect timing. We were going over ALUs and registers in one of my classes just last week.
Yo same wtf
Perfect timing is the 4:20 runtime this video has 😏
Good luck on ur studies!
hey fellow microprocessor learner
Hi microprocessor learners, I have already learnt x86_16 bit architecture and currently learning MIPS architecture.... Which one(s) are you guys learning?
i think an interesting video would be how a core is different from a thread, and why some CPUs support 2 threads per core and other CPUs do not.
Some server CPUs (IBM's POWER line, for example) can actually do 4 or even 8 threads per core.
The principle runs as follows:
* A core has both an FPU and an ALU, each of which can do one thing per clock cycle.
* With a single thread, you have a single instruction per clock cycle. So either the ALU or the FPU will do something, but not both.
* It would be great if we could keep both the ALU and the FPU busy/active on every clock cycle.
* So, add a second thread. This means a second instruction register, so we can load two instructions at once. If one of the instructions needs the ALU and the other needs the FPU, hooray! We can do two things in the same clock cycle within a single core.
* The obvious catch is that if both instructions need the ALU (or FPU) then one of them is going to have to wait a cycle. That's what makes this different from having two threads on different cores (which would execute independently of one another, in some vague sense). This is why sometimes it's smart to turn off threading.

Real CPUs take this a lot further, by having multiple ALUs/FPUs and other such units, inter-core handoffs to avoid stalling, etc. Related areas relevant to understanding this:
* Superscalar processing
* Pipelined architectures (all but the smallest embedded cores are pipelined these days)
* Branch prediction
* Instruction predication
* Vector processing
* SIMD/MIMD instructions
* CISC vs RISC memory instruction design (not the commonly misunderstood CISC/RISC debate)

And a whole bunch of other stuff.
(I agree, these would make great videos :) )
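The ALU/FPU sharing described above can be sketched as a toy scheduler. This is a deliberately simplified model with exactly one ALU and one FPU; real SMT schedulers track many more resources:

```python
# Toy model of 2-way SMT: two threads share one ALU and one FPU.
# Each cycle the core issues at most one instruction per unit; if
# both threads need the same unit, one of them stalls. This is a
# hypothetical simplification, not a real scheduler.

def run_smt(thread_a, thread_b):
    """Each thread is a list of 'alu'/'fpu' instruction tags.
    Returns the number of cycles needed to retire both threads."""
    a, b = list(thread_a), list(thread_b)
    cycles = 0
    while a or b:
        cycles += 1
        used = set()                 # execution units taken this cycle
        for t in (a, b):
            if t and t[0] not in used:
                used.add(t.pop(0))   # issue and retire the instruction
    return cycles

# Mixed workloads overlap perfectly...
print(run_smt(["alu", "alu"], ["fpu", "fpu"]))  # 2
# ...identical workloads serialize on the shared unit.
print(run_smt(["alu", "alu"], ["alu", "alu"]))  # 4
```

The second call shows exactly the catch described above: two ALU-heavy threads gain nothing from sharing a core.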
@@EdNutting I appreciate that explanation. Makes sense to me.
In a nutshell, one core can support multiple threads when there is more than one instruction pointer. As far as I know every other component inside a core is then shared between those threads, that is, when one of the threads is not using parts of the core, the others are free to use it, if they can. But keep in mind that one thread can be executing a completely different program than the others.
"more cores more better" -linus a long time ago
“Fast enough to help you watch this video” *Wise words*
@Prince Cooper Everyone knows that.
This should be expanded upon in very technical detail... we need an hour+ long CPU core video on LTT (or a series of shorter vids)
True we really need a "Turbo Nerd edition" of this video.
We need a TechLong
An hour+? Honestly, that's nowhere near enough considering today's CPUs' complexity! Just the complexity of modern pipelines with execution units, macro- and micro-op elision (or, fwiw, the concept of microarchitectures), dependency resolution, out-of-order/parallel execution, branch prediction, vectorization, TLBs, and so on and so on and so on... Modern CPU architectures are insanely complex, and giving you "very technical details" would result in an insanely long video series. Just sayin'. :-D Also, if you want to get into CPU architectures, I fully agree with the above comment: Ben Eater builds a complete 8-bit processor in a video series of just over 24 hours.
If you would like a few hours of content introducing all this, and 24hrs of livestream building a complete simple CPU in Minecraft (to a real specification), please consider watching my videos.
I would want something that will explain how the whole CPU works, easy enough for everyone. GPU could be nice too.
In the early days things were easier: one core running very slowly, directly connected to memory through motherboard traces. Cache? What was that! That aside ;) yeah, it is good that we now have I/O controllers, memory controllers, TLBs and a whole lot of cache, from L1 and L2 to L3. All to make sure your fancy CPUs can actually push out data as quickly as possible. Without cache we would still be going very slow indeed.
Have you seen IBM's new Power10 architecture? It is a nice demonstration that the assumption RAM is slower than L2 cache is highly questionable, at least when looking at the cutting edge of design and technology.
I think if this video had been presented based on a very old school CPU then it could have been much clearer
@@sandmaster4444 It is based on old CPU technology. I learned most of this in the late 1970s in college. We had a lab in which we had to build our own processors using ALUs, FPUs, I/O and a basic CPU.
It's kind of a shame though that many programming languages (most prominently Java) force the programmer to allocate every object on the heap, thus reducing cache locality, so that the cache cannot actually be used to its full potential.
@@ThePC007 Now read The Garbage Collection Handbook, along with a few Intel, ARM, RISC-V and Java manuals, the papers on the Java GC designs over the years and the reports on cache locality for Java V8. Heap allocation isn't the problem. It's a very naive understanding of the hardware and software stack (firmware to hypervisor to OS kernel to Java to application characteristics) that leads people to make this age old C-based assertion about performance that hasn't held true for a long time. Source: I read the above, studied (by measurement and simulation) and designed/implemented hardware and software GCs for my PhD.
I'm reminded of the time when we had to build a rudimentary 8-bit ALU out of individual DIP-type ICs of logic gates and memory. Needing to wire up power, a clock signal and 8 data wires between each IC. Good times.
Sounds fun. No sarcasm.
We did that in my class but with FPGAs. Sounds easier, in theory. But we spent most of the time debugging the firmware to get the things to actually flash. I miss the days of physical transistors and solder.
Btw this falls under which subject? Computer science?
@@naono9715 Electrical engineering and computer engineering. Edit: computer science is more about the theory of computing things.
@@coder0xff Thanks.
There are different implementations of CPUs. The accumulator type is typical of x86 (Intel and AMD) CPUs. Other implementations store data in RAM entirely, without the need for an accumulator or registers; others, like MIPS (old game consoles and other computers), primarily use registers only. There are also different implementations of how instructions are read: one instruction at a time keeps things simple but backs up waiting instructions; multicycle allows reuse of parts of the CPU for each instruction but in some cases can end up increasing instruction time; and pipelining divides the CPU into stages so that multiple instructions can run, one in each stage of the CPU, but with added complexity and the need to ensure an instruction isn't waiting on the result of an earlier instruction that's still being completed.
Agree with most of this, but modern x86 can use many registers to store results, which is not the same as the original "accumulators", where you only had 1-4 registers to store results from an operation.
How would it work in ARM cpus?
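For a concrete feel of the accumulator style described a couple of comments up, here's a toy fetch-decode-execute loop. The three opcodes are made up for illustration, not a real ISA:

```python
# Minimal accumulator-style machine: every ALU result lands in a
# single accumulator register, and memory is touched only by
# explicit LOAD/STORE instructions. A sketch, not a real CPU.

def run(program, memory):
    acc = 0
    pc = 0                       # program counter: next instruction
    while pc < len(program):
        op, arg = program[pc]    # "fetch" and "decode"
        if op == "LOAD":         # memory -> accumulator
            acc = memory[arg]
        elif op == "ADD":        # accumulator += memory
            acc += memory[arg]
        elif op == "STORE":      # accumulator -> memory
            memory[arg] = acc
        pc += 1
    return memory

mem = {0: 7, 1: 5, 2: 0}
run([("LOAD", 0), ("ADD", 1), ("STORE", 2)], mem)
print(mem[2])  # 12
```

A register-file machine like MIPS would instead name two source registers and a destination register in each instruction, rather than funneling everything through one accumulator.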
I never knew CPUs had a separate ALU for floating-point numbers. It actually makes sense: both an integer and a float are represented as a 32-bit number; the difference is the mathematical operations on them.
Things are even more interesting when you add SIMD to the mix. SIMD is a very "novel" way to use an ALU, where you use a 128-bit ALU that can process, for example, four 32-bit numbers in parallel. Instead of telling it to just add two numbers, you tell it to add two "packs" of four 32-bit numbers, and it will perform four additions in the time a 32-bit ALU would do one. It takes a lot less space on the CPU than having four separate 32-bit ALUs because it's only one instruction.
In modern systems, they're even more divided than just ALU and FPU, and sometimes single cores have multiple of each to use with out-of-order execution (and programs that don't work well with this can reclaim lost efficiency with hyperthreading).
Fun fact, the FPU works on 80 bits and usually converts back and forth during use. (This is specific to the x86 ISA.)
Integers can be 64 bits just like floating point numbers. They're not limited to 32 bits.
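The packed-add idea a few comments up can be sketched in plain Python by treating one big integer as four 32-bit lanes. This mimics the behavior of an instruction like SSE2's PADDD: each lane wraps independently and no carry crosses lane boundaries:

```python
MASK32 = (1 << 32) - 1

def pack4(lanes):
    """Pack four 32-bit values into one 128-bit integer."""
    x = 0
    for i, v in enumerate(lanes):
        x |= (v & MASK32) << (32 * i)
    return x

def unpack4(x):
    return [(x >> (32 * i)) & MASK32 for i in range(4)]

def paddd(a, b):
    """Lane-wise 32-bit add: one 'instruction', four independent
    additions, no carry between lanes (toy model of SSE2 PADDD)."""
    return pack4([(la + lb) & MASK32
                  for la, lb in zip(unpack4(a), unpack4(b))])

a = pack4([1, 2, 3, 0xFFFFFFFF])
b = pack4([10, 20, 30, 1])
print(unpack4(paddd(a, b)))  # [11, 22, 33, 0]  (last lane wraps)
```

Real SIMD hardware does the lane masking with physically segmented carry chains instead of bit arithmetic, but the semantics are the same.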
You should do a video explaining the “bridge mode” on modems and the pros and cons
good Idea. Would be aswsome. Also could we get a video explaining how a gpu works
It'd be good to explain it, and specifically it'd be useful to know why "pros and cons" isn't actually a relevant question when deciding whether to use bridge mode or not.
All it does is disable built-in routing functions such as NAT, the firewall, DHCP, and the Wi-Fi AP.
I needed this video exactly last year today.
Where is the smoke kept? I've accidentally let that out before and couldn't figure out where to put it back in.
I hear IBM sells magic smoke refills. Good to have a couple of bottles on hand.
It *is* the smoke.
What do you have, a sorcerer's computer? lol
Electrons!
Don't forget the cpu 'quar'ks as well!
@@PlanetXtreme Haha I love Science puns
Topic suggestion: how high level code gets converted to CPU instructions. Always fun to see under the hood.
Whatever code you write, you compile it to a program or the code gets compiled during runtime. The compiled code is machine code, 0s and 1s, and is stored in memory like any other program. That process is completely separate from how the CPU receives instructions. So what you are asking is a three-part question: "How do compilers work?", "How does a program get loaded into RAM?" and "How does the CPU get its instructions?" But it would be a nice video, not sure if it would be short enough for a Techquickie though.
@@theFirstAidKit i was about to say this
Converting high level code to machine instructions is a nightmare. First you have to interpret the meaning of the code, then convert that into basic manually coded blocks, then optimize the code by checking against known patterns which have simpler solutions, then finally output the final code. Don't even get me started on vectorization, garbage collection, Virtual Machines, register allocation, system calls, etc which fall in between there. I'm no expert by any means, but I know enough to know it is astoundingly complex.
FWIW, contrary to other commenters, your topic is relevant and not a thing purely of parts. One cannot understand computer architecture without understanding the programs the architecture will execute and thus the compilation process and practical compilers. Similarly, you cannot understand a compiler properly without an understanding of architecture. The two topics have to be addressed in tandem and coherently. Sadly, places like The University of Bristol have forgotten this, leaving students in a right old mess! Here's hoping channels like LTT, if they tackle this topic, don't make the same mistakes.
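To give a tiny taste of the "how do compilers work?" question raised above, here's a toy translator from an arithmetic expression to stack-machine instructions. The opcodes are invented for the example; real compilers add many passes (optimization, register allocation, code generation for a real ISA) on top of this basic shape:

```python
import ast

def compile_expr(src):
    """Compile a Python arithmetic expression into instructions for
    a toy stack machine (made-up opcodes, just to show the idea)."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    code = []
    def emit(node):
        if isinstance(node, ast.Constant):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp):
            emit(node.left)          # operands first (postorder walk)...
            emit(node.right)
            code.append((ops[type(node.op)], None))  # ...then the operator
        else:
            raise ValueError("unsupported syntax")
    emit(ast.parse(src, mode="eval").body)
    return code

def execute(code):
    """A matching toy stack machine to run the generated code."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[op])
    return stack[0]

print(execute(compile_expr("2 + 3 * 4")))  # 14
```

Note that operator precedence comes for free here because the parser already built the tree; that's one reason compilers are usually described as a front end (parsing) feeding a back end (code generation).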
Great topic and coverage that explains the basics about a CPU. I love these types of videos because Enthusiast and students alike will be referring to your knowledge for a very long time. Great work!! Cheers
It was pretty interesting to watch. Really informative. Though I would say the music was a bit loud, but it keeps you awake ;D Keep it up!
me an illiterate in cpu: core is brain, but not really
I was just talking about how CPU does things with my friend, and was searching for resources to explain. This was the perfect timing.
Bro you don’t need to lie
Bringing back my (vague) memories studying macro assembler language. Masm. All that pushing and popping! Thanks, nice piece.
When you process it just makes everything better. 🤗 Thank you guys for such awesome channels!
The Techquickie on IPS, VA, and other display panel types needs an update
crt
@@KokoroKatsura grandma forgot her meds again I see
I work in this industry so I know almost everything in these videos, but I LOVE seeing the simple explanations. It takes a lot of skill to teach complex things. For anyone wanting to learn more, computer architecture is a fantastic course.
It's funny to me that in computing, "high level" actually means more basic and accessible amount of knowledge or communication with the computer
I'm curious what LTT's thoughts are on the RISC-V architecture. I've started learning about it and how the architecture deals with memory, and I'd like to hear your opinions.
I love these types of videos
taught my first 2 weeks of computer engineering lectures in 5 minutes, gotta love it
The parts of the CPU that are not the actual processing core are called "uncore" ( en.wikipedia.org/wiki/Uncore )
That's actually kinda funny.
We need an encore video about uncores!
Encore encore encore!
First, you should fix the link because it accidentally included the ')'. Second, that term is only used by Intel and no other CPU company.
I do recall doing some assembler on the Acorn Archimedes (A3000) back in the day. It was so simple, and elegant - that programming in assembler was a joy! Nowadays I can imagine it would be hell!!!
Now explain Pipelining and Branch Prediction. ;) Anyway, good job explaining without the giant flowcharts of LC-3 that you study in college.
There is a really interesting talk by Chandler Carruth called "Going Nowhere Faster" here on KZhead which covers those topics in a really practical way
lc3.... curious how many schools actually teach LC-3. My computer architecture professor was one of the creators (Patt), is it used at other schools other than University of Texas?
@@Brandonforty2 UIUC taught it in the computer architectures course.
That's cruel :D
I like the fact that most of these videos are 4:20 in length :`D
Given the time constraints, you did a wonderful job. I'll hold it down in the comments.
Quick correction: individual cores have L1/L2 cache attached to them, not L3 as shown at 2:45. L3 cache is shared by all cores and can in some cases even be on the motherboard instead of on the CPU
So happy to see better titles in these videos
One of the best ways, I think, to learn how a processor works is to learn some 8-bit assembly. I haven't delved into z80 assembly, but 6502 assembly is pretty nice.
I found resources on the instructions, but idk how to find anything about using 6502 assembly in practice, like with a specific computer model where it needs to do i/o and communicate with the ppu. And I also have no clue what software even lets you write and emulate the code
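One low-stakes way to get started, before picking a specific emulator, is to interpret a tiny 6502-flavored subset yourself. This sketch supports only immediate operands and the carry flag, so it's a learning toy rather than a faithful emulator:

```python
# A few 6502-flavored opcodes, enough to watch the accumulator at
# work. Simplified on purpose: immediate addressing only, carry
# flag only, no other status flags.

def run6502(program):
    acc, carry = 0, 0
    for op, arg in program:
        if op == "LDA":          # LoaD Accumulator with a value
            acc = arg & 0xFF
        elif op == "ADC":        # ADd with Carry
            total = acc + arg + carry
            carry = 1 if total > 0xFF else 0
            acc = total & 0xFF   # accumulator is only 8 bits wide
        elif op == "CLC":        # CLear Carry before a fresh addition
            carry = 0
    return acc, carry

acc, carry = run6502([("CLC", None), ("LDA", 0xF0), ("ADC", 0x20)])
print(hex(acc), carry)  # 0x10 1  (0xF0 + 0x20 overflows 8 bits)
```

The CLC-before-ADC dance is real 6502 practice, since the chip has no plain ADD; multi-byte arithmetic chains ADC calls and lets the carry ripple through.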
Amazing video. As a tech savvy guy you never really think about details like this.
this guy literally taught me more engineering in 4 mins than my stupid teachers did
it's quite simplified but yeah. They didn't mention the reset vector (an address hard-coded in the CPU where it will start executing code when it first powers on; usually the BIOS is mapped at that location!) nor the difference between register- vs stack-based operation
ow the edge?!
Too bad about the English, though
i have a test on this next week thank you for this
That's a pretty good explanation. Thank you.
I think the music is a bit much, kind of distracting. Would be nice with some calmer music and maybe slightly lower volume. :)
This is the first video where i noticed, and got distracted, by the music. It’s definitely a bit much.
vsauce music is so well thought that even if it's really loud it's not distracting at all
Please cover this more in depth, maybe as an LTT video. Very fascinating stuff.
Cool Informative Video
this video has attracted more electrical/comp engineer students who most likely know most of this stuff from a digital systems class than people who are not familiar with these concepts. lmao happy studying my fellow nerds
Lol, this comment has identified me ;p (I'm a former PhD student in formal verification of computer architectures. I loved this video :) )
I've always described a core like this: the CPU is a call center building and each core is like a cubicle where the agents carry out tasks 😂
The MMU is one of the most underrated parts of the CPU. It's the only thing physically stopping programs from reading and writing each other's memory. Without it, programs could just steal passwords from each other, and there would be a total system crash if a program started using more memory than the OS allocated for it.
Hehe, I like this comment. You've hit on my PhD topic ;) Fun fact: Modern MMUs suck for modern programs, but are great for the C-style programs of the 80s. Intel x86 and ARM are holding us back from the next leap forward in performance. The research exists proving it, and IBM Power10 has taken an interesting step towards resolving the underlying dilemma. If they carry on in that direction, we could see IBM Power architecture leapfrog Intel and ARM. Now there's an interesting prospect ;)
@@EdNutting Now I'm curious what it is that IBM Power10 has done to fix the problems with x86 and ARM. I've always seen ARM as a big improvement, since programs can work with more than just 6 or so registers. I hope we some day get a better instruction set, as I can easily see a future where we are still using (or at least emulating) x86 in a hundred years from now.
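The isolation described above can be sketched with a toy single-level page table. Real x86/ARM tables are multi-level and walked by hardware, so this is just the shape of the idea:

```python
PAGE_SIZE = 4096  # 4 KiB pages, as on x86

def translate(page_table, vaddr):
    """Look up a virtual address in a per-process page table.
    A missing entry is a page fault -- the hardware's way of saying
    'this memory isn't yours'. (Toy single-level table.)"""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise MemoryError(f"page fault at {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset

# Two processes with disjoint mappings: the *same* virtual address
# lands in *different* physical memory, so neither can read the
# other's passwords.
proc_a = {0: 7}   # virtual page 0 -> physical frame 7
proc_b = {0: 9}
print(hex(translate(proc_a, 0x0042)))  # 0x7042
print(hex(translate(proc_b, 0x0042)))  # 0x9042
```

The page-fault path is also how the OS catches a program wandering outside its allocation, instead of letting it silently corrupt someone else's memory.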
I also highly recommend reading The Manga Guide to Microprocessors!! It's so concise and makes everything easy to understand.
sounds cool except it's manga
@@FlagerMiszcz Well, it just makes things easier, that's all. The manga story is pretty shit and bare-bones, but it does the job of teaching the basics of microprocessors very well.
I just got it from my local library days ago. XD
What's inside a CPU core? There's a factory of dwarves that runs your PC
There is a book called "The Elements of Computing Systems: Building a Modern Computer from First Principles", better known as From Nand to Tetris. That book starts from boolean logic and logic gates, then has you develop your own architecture, then your own assembler, your own high-level language and its compiler, then a graphical operating system that runs on your own architecture, and finishes with developing games (Tetris) for that OS. It's 200% recommended if you want to really understand how computers work
This is a pretty OK explanation of how a cpu works
dude perfect timing i started doing computer science and we started this last week
The cores are the CPU. Hell, every operating system out there sees cores as separate CPUs anyway.
The OS *does* distinguish between logical and physical cores, though. I'm not sure how well you could even fake that (or if it's just a convention to "report the correct numbers").
This video taught me more about CPU's than my 1 hour CS lecture at uni
just read Structured Computer Organization by Tanenbaum smh, pretty short book, only like eight hundred pages
Registers are usually both input and output, not just input. They are general-purpose registers. The result of a multiplication is stored in a register; that's where the application gets it from, not the cache.
I still don't know what a core is? I just learned a lot about everything in the CPU, but still have no idea what the core is.
Bro, that was the core. Having multiple cores means there are multiples of the groups that were described. 4 cores means 4 of these. 6 means 6. 64 means there are 64 of these.
yeah, actually this video tells almost nothing about what CPU core actually is xD empty content
It executes a sequence of instructions provided by a programmer (a.k.a software engineer) to move and copy data around, do math, and react to different states by switching between different sequences of instructions. Through these activities, the computer as a whole (in addition to the CPU chip) interacts with us humans via Input and Output (IO) like a keyboard and monitor, stores data, and carries out algorithms to solve a given problem.
In much, much, much older processors these functions - if they existed at all! - were either just placed on the memory address space and directly accessed, or provided by accessory microchips. SRAM memory was fast enough, and processors slow enough, that it only took one clock cycle to get data from a register to RAM, or from RAM to a register, rendering cache unnecessary. It took until about the 386 for "RAM wait states" to become a problem, and the 486 for the first motherboards that supported cache. Because yes, cache was originally actual DIP microchips that you had to plug into your motherboard.

If you wanted DRAM, you needed to have a separate control chip. Fun fact! The DRAM memory controller did not become part of the processor in desktop PCs until AMD's "HyperTransport" era and Intel's original Core i-series processors.

If you wanted an FPU, you needed a separate chip. In fact, in the earliest examples, even some ALU-style functions were external to the processor. I mean, it could DO them, but it was VERY slow. For faster mathematical operations, and if you wanted floating point at all, you had to outsource it to your numeric coprocessor. This was brought into the processor with the 486. However, due to yield concerns, a design defect, and market segmentation, some of those processors still shipped without functional onboard FPUs.

And the vast majority of I/O that wasn't directly on the bus (which, due to rapidly increasing core/"frontside" bus speeds, quickly became "almost all of it") had to be handled by external chips as well. Fun fact! Your PC has about six different buses in it, all handled by their own little control chips.
Actually quite an OK explanation. No mistakes that I noticed
I have my IT Exam tomorrow and this video drops now lol.
My last cpu had 8mb of level 3 cache. I remember thinking that was heaps back in the day, now I just got a cpu with 32mb of L3 cache! Insane!
Lmao those memes are just *chef kiss* amazing.
There's a great book called "But How Do It Know?". It really gets to the core of understanding a PC.
I think a better explanation of a core is to talk about a basic pipeline because I think that’s a better high level idea of a CPU core than this. Thank you for making this video though, I love hardware engineering content!
This is way better than my school's outdated material on the von Neumann architecture
Nothing mentioned in the video is new.
Good content. Not too jokey. I like it.
I would like to learn more about the history of mobile processors. All the way from texas instruments to spreadtrum to tegra to snapdragon, mediatek etc
Acorn Computers UK invented ARM in the 80s (Acorn RISC Machine). The film Micro Men covers that time very well. One person wrote the entire instruction set by herself. Then Olivetti bought the computer part of the business, and ARM (Advanced RISC Machines) was formed to license out ARM designs. Acorn ARM desktops were in schools in the UK during the 80s and 90s. That's the whole story (none of those companies you mention had any part in it). They won't cover it because every Englishman over 35 would be able to poke holes in every detail they missed, but this should give you a good start.
Can you put links in the description for the videos you refer to or point at.
I love how Techquickie breaks down complicated tech topics like an ELI5 subreddit 🤣😉☺️
The code you showed at 1:03 is C#... That made me happy
I love videos like this
These video times are unbeatable
i study computer engineering; i love knowing these things and seeing them again in an entertaining video
Ayo that music a bit annoying
I could be wrong, but I don't think your "output" of the equation is accurate. The output doesn't get sent to the cache on its own, but only when the data is copied to RAM (via another instruction). So the next instruction could either use the data straight from the accumulator, or pull data from RAM (which would inherently pull from the cache first)
You're both slightly wrong. In general, it can be implementation-dependent. ALU output does typically go to a register. Although common opcodes write results to a register, some instruction sets have opcodes that write directly to RAM, which would get cached in a data cache if there is one.
Computer Organization class is just now having us makes Datapaths using these elements, convenient timing!
Why pick an accumulator architecture? Neither x86 nor ARM is an accumulator architecture.
this video doesn't even try to explain how much variation there are in CPU architectures, it's too dumbed down
after seeing this, now i have a question: what does the graphics core inside an APU do when we use an external GPU? does it do nothing, or does it handle other things?
Something I've always wanted to know is how we teach computers what binary means. Like at its very basic level 0 = off and 1 = on, but how do we make a piece of silicon understand that?
If you send a voltage beyond the threshold voltage, the gate turns on; otherwise it's off
That's electrical engineering and digital circuit design. Each transistor is like a light switch, with the critical difference being that instead of turning on and off lights, they turn on and off *each other*. Teaching them on and off isn't a good way to describe it in the same way that you don't teach a light switch how to be on or off. But if you connect enough of these transistors together in just the right way you get a CPU.
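The switch analogy above can be pushed all the way to arithmetic: model a gate as a function of switch states and compose from there. A sketch of the idea, not how gates are physically laid out:

```python
# Gates are just wiring, not 'taught': model a transistor as a
# switch and everything else falls out of how switches connect.

def nand(a, b):
    # Two series switches pull the output low only when both are on.
    return 0 if (a and b) else 1

# Every other gate is NAND wired to itself or to other NANDs:
def not_(a):     return nand(a, a)
def and_(a, b):  return not_(nand(a, b))
def or_(a, b):   return nand(not_(a), not_(b))
def xor(a, b):   return and_(or_(a, b), nand(a, b))

# From there, arithmetic: a half adder is one XOR and one AND.
def half_adder(a, b):
    return xor(a, b), and_(a, b)   # (sum bit, carry bit)

print(half_adder(1, 1))  # (0, 1): 1 + 1 = binary 10
```

Nothing "understands" 0 and 1 anywhere in this; the meaning lives entirely in how the pieces are wired together, which is exactly the point of the comment above.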
This was cool :)
2:08 I have one. I think the seat cushion is too hard. No more comfortable than sitting on a sheet of plywood.
Impressed, no criticism. Thumbs up.
Teaching computer science to tech enthusiasts and selling sponsorships... Clapping for the business model 👏👏
this is a very interesting video i am amazed
Can you guys do either a techquickie or a turbo nerd dive into both how a cpu actually works and/or what exactly a cpu die is?
I am prepping my uni presentation about 'what a CPU core does' based entirely on this
That might not get you a very good grade... safer to go by textbooks and Wikipedia articles. This video is so dumbed down, some of it is wrong. And it misses a lot of important details.
make a video on all the steps that happen from pressing the power button to os booting
".. fast enough to help you watch this video" Sir you underestimate the power of my PC!
nice video length
Hi James!
The ALU doesn't store its result in "cache"... there's a register for that.
they didn't do a good job with the details there, and they should have said "data cache" (to distinguish it from an "instruction cache")
Can we talk about the music at 1:09? LOL
Basically a simplified version of learning about the Von Neumann architecture in College
Had to look in the comments for mentions to Ben Eater. Was not disappointed.
So which way does the cpu go in?
Literally this is basically what I wanted explained by a teacher the first time I went into the programming class.
why is this music so funky!
Hmm, is it true that the MMU actually resides off core? That does not seem like a typical configuration in any microarchitectures I'm aware of -- perhaps the video meant to refer to an integrated memory controller (which is definitely off core), rather than the MMU?
Is it normal for my CPU to become sentient and hold my family hostage? What am I supposed to do when this happens?
How about a video on high precision event timers and what the hell they are.
Designed a core for my thesis and have to submit it tmrw. Coincidence? xD
WHERE DO RISC-V, CISC, ARM, AND x86 FIT INTO THIS PUZZLE?!?
The music in this episode made me feel like I was in a Guy Richie film.