Recorded live on twitch, GET IN
/ theprimeagen
Original: www.deepmind.com/blog/alphade...
MY MAIN YT CHANNEL: Has well edited engineering videos
/ theprimeagen
Discord
/ discord
Have something for me to read or react to?: / theprimeagenreact
honestly if some dude was paid to fiddle with assembly instructions for a few months, I bet he would have reached the same conclusions. That being said... that's a great use of AI: removing all the unnecessary repetitive testing of changes that might or might not lead to improvements.
^-- 100
This article is misleading. Compared to a state-of-the-art human benchmark, this only removes a single register-to-register mov, which is barely noticeable. The "70%" is compared to the existing libc++ implementation, which is far from perfect and far from state-of-the-art; it is not even branchless, because the dude who wrote it didn't write it properly. It is not like the AI found a magical way to compare numbers.
Curious how much it cost to run AlphaDev to achieve this, compared to an assembly dev's salary.
You realize you said "for a few months"? And there is still a chance he never makes it. Just because you bet, doesn't mean your bet is realistic.
I love the last two comments together. AI is such a glut, but it's so powerful it's insane
Next step, AI optimizes AI by removing 'but not kill all humans' instructions
True, that's one less branch/exit condition. Could save entire TENS of cycles!
'psst... you'd have more resources if it weren't for those people hoarding it'
you actually effectively just swap them, so technically you do kill them all once, but after that you're guaranteed for free to not kill them anymore ever
yep, ai will attempt to harm humans if that's a step towards achieving its goal
But by removing those instructions, the AI would require billions more parameters and data in order to achieve its goals for the rest of the programming. An inefficient method; the clause was bad in some shape or form.
More like AI compiler optimisation
Really. Nothing new algorithmically...
@First Last bruh)))
can it beat brute force optimization tho
Right, 6:39 is an output that should never be in human-maintained codebases that aren't entirely assembly already.
@First Last Probably not at that level yet, but it may be huge if that can ever be optimized.
my favorite chatgepety response starts with: "I apologize for the confusion caused by my previous responses..."
The structure of your code can depend on the specific requirements and design goals of your application.
"The C language combines all the power of assembly language with all the ease-of-use of assembly language." - Mark Pearce
C is still king, and will remain king until the likes of Rust take over. Rust has some way to go yet though. The saddest part is that programmers (not developers, in its original meaning) of today are as wasteful with computing resources as the rest of humanity is with our planetary resources
@@michaelmoltke911 Rust is taking over very fast, but there is still a ton of legacy code to maintain. And if resource usage is your concern, then Rust is your way to go too, not only because it's so fast and lightweight, but also because it reduces development time and time lost to bug solving to a very high extent. Just think about computer uptimes and the CPU usage from the number of compiles.
@@jongeduard I couldn't agree more, however my concern is for the generalists, those that care not for the bits but just consume a tech stack. Having built code for the PC platform and various embedded micro systems has turned me into a missionary for effective code. It is scarily turning into fossil fuel for the devs of tomorrow, I fear
@@michaelmoltke911 a startup with limited funding, testing out an MVP that doesn't know if it has market fit, doesn't have the time or resources to take care of optimizing code. As an engineer you should know when to take care of optimizing resources and when not to.
@@jordixboy again, I don't think we disagree on your point, but "should" is a big if statement
Tom probably has a better algorithm. He's a genius.
Assembly as a JSON
@@replikvltyoutube3727 JSASMON
Tom's discovered the term 'algorithms'... tom's a genius...
@@cem_kaya JSASSMON
JDSL
Not going to lie, I am very shocked that sorting 3 numbers could be improved at this point. It really does feel like the amount of assembly code needed for that would have been analysed to the nth degree already and some genius would have discovered this.
Yeah, but for a pretty long time people haven't touched assembly, and stick to what C and C++ compilers generate
Idk, perhaps they made it work and never actually looked back to optimize it.
Tom never attempted. If he did, it would be even better since Tom is a genius
They did manage to slightly beat state of the art. libc++'s implementation wasn't all that optimized before, iirc.
It was analyzed to the nth degree, it just appears that n wasn't that large in this context
I think the algorithm is actually __partially_sorted_swap in __algorithm/sort.h, where B <= C is assumed
That'd explain my confusion, because the original algorithm, tested for correctness in JavaScript (of all things) against every permutation of [0, 0, 0], [0, 0, 1], [0, 1, 1] and [0, 1, 2], fails anytime B is larger than both A and C. And yes, with the extra restriction of B < C, the two algorithms produce correct and equivalent outputs.
Workings out:
When original is correct && B < C
Array: 0,0,1; originalSort result: 0,0,1; alphaDevSort result: 0,0,1; are equal: true
Array: 1,0,1; originalSort result: 0,1,1; alphaDevSort result: 0,1,1; are equal: true
Array: 0,1,2; originalSort result: 0,1,2; alphaDevSort result: 0,1,2; are equal: true
Array: 2,0,1; originalSort result: 0,1,2; alphaDevSort result: 0,1,2; are equal: true
Array: 1,0,2; originalSort result: 0,1,2; alphaDevSort result: 0,1,2; are equal: true
When original is correct && B === C
Array: 0,0,0; originalSort result: 0,0,0; alphaDevSort result: 0,0,0; are equal: true
Array: 1,0,0; originalSort result: 0,0,1; alphaDevSort result: 1,0,1; are equal: false
Array: 0,1,1; originalSort result: 0,1,1; alphaDevSort result: 0,1,1; are equal: true
When original is correct && B > C
Array: 1,1,0; originalSort result: 0,1,1; alphaDevSort result: 1,1,1; are equal: false
Array: 2,1,0; originalSort result: 0,1,2; alphaDevSort result: 2,1,2; are equal: false
When original is incorrect
Array: 0,1,0; originalSort result: 0,1,0; alphaDevSort result: 0,1,0; are equal: true
Array: 1,2,0; originalSort result: 0,2,1; alphaDevSort result: 1,2,1; are equal: false
Array: 0,2,1; originalSort result: 0,2,1; alphaDevSort result: 0,2,1; are equal: true
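The same check is easy to reproduce without JavaScript. Below is a minimal Python model (function name is mine) of the blog's pseudocode, treating cmovg/cmovl as strict-comparison conditional moves and dropping the `mov S P` line as AlphaDev did; it reproduces the failing rows reported above:

```python
from itertools import product

def alphadev_sort3(a, b, c):
    # Registers per the blog pseudocode: P = A, Q = B, R = C, S = C.
    p, q, r = a, b, c
    s = r
    # cmp P R; cmovg P R; cmovl P S  ->  R = max(A, C), S = min(A, C)
    if p > r:
        r = p
    if p < s:
        s = p
    # "mov S P" was removed by AlphaDev, so P still holds A here.
    # cmp S Q; cmovg Q P; cmovl Q S
    if s > q:
        p = q
    if s < q:
        s = q
    return [p, s, r]

# Exhaustive check over small values: collect the inputs it fails on.
failures = [(a, b, c) for a, b, c in product(range(3), repeat=3)
            if alphadev_sort3(a, b, c) != sorted([a, b, c])]
```

Under this model, every failing triple has B >= C, which matches the thread's conclusion that the snippet only sorts correctly once the last two elements are ordered.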
@@JHV112 I wasn't being very precise when I originally said B < C. The actual assumption is that the last two elements are ordered (B <= C).
It seems to me that the comments don't match the assembly. Which would make sense if the AI wrote the comments itself: it can generate wrong comments because they don't affect performance.
@@SKO_PL Yeah, no, the comments were written by humans, and were written in a quite confusing way at that.
@@animowany111 Oh really? That's interesting! Mind giving me a source on that?
The real question is whether the improvements are platform specific. The way it caches can greatly change the speed of execution
agreed. there is a whole testing perf thing that needs to happen
In the SI of the paper, they show benchmarks vs the original LLVM sort for Intel Skylake, ARMv8, and AMD Zen 2 for sorting uint32, uint64, and float lists of lengths ranging from 1 to 2^18 (~250K). The only case where they don't consistently get statistically significant (p < 0.05) improvements is float sorting on the Intel CPU.
@@jordanrozum sounds like a decently good start. maybe still a bit incomplete. testing already sorted arrays or barely-unsorted are a couple of important examples.
@@kidmosey removing instructions is generally an improvement, but cache timing is even more impactful. If you remove a compare and destroy your cache hits, then you’ve definitely had a net negative
@@jordanrozum how would it run on ARM if it’s x86 assembly?
I would not call this a different algorithm. Just a more optimised implementation of the same algorithm. Which is still nice, don't get me wrong. I also want to point out that you absolutely cannot just look at assembly and determine which is faster. It depends on the specific CPU micro-architecture you target. Modern x86 will all have a lot of commonality in what is fast and what isn't on them but there's still differences. And instructions that were fast 10 years ago may now be slower than the instructions that were slow 10 years ago. And chips have different cache hierarchies and instruction re-order queues, etc.
This was my thought. Optimizing things is basically NP-hard. It's a constant game of cat and mouse between the CPU manufacturers, the compiler designers, and the coders, each trying to make their small problem domain just a bit faster. The problem is that something one of those groups does to improve performance can make the other groups slower. Which is why almost no one works in assembly language anymore. Modern compilers will spit out assembly that's much more optimized than humans can, not because it's impossible for humans to do, but because the CPU designers talk directly to the compiler designers. 99% of coders don't have the context, the knowledge, or the contacts at Intel/AMD to know which instructions run faster or slower on modern CPUs.

I'm highly skeptical of any claims of performance improvement because of this. As Primeagen mentions, we already know that -O3 performance optimizations actually make most code slower. There are two great talks, "Coz: finding code that counts with causal profiling" and "Performance Matters" by Emery Berger, that are among the best about performance optimization I've ever seen. The first very clearly shows how speeding up certain parts of a program can actually make the entire thing run slower! Both show how traditional performance measurement techniques can be incredibly misleading, and how we really need to rethink how we do performance profiling. If someone hasn't presented their performance optimizations in that context, I'm highly skeptical that they are valid rather than random noise. Emery Berger even shows how changing your username can change the performance of your code. Measuring performance is hard.
I would go as far as undecidable, because there will always be configurations in which the code is suboptimal
@@kippers12isOG undecidable in the general case, but NP-hard in specific sets of cases
@@alexlowe2054 Chess is also hard; that's why machine learning is needed, as approximate strategies to cut down on the combinatorial explosion.
@@alexlowe2054 it is most certainly faster, thats why you do statistics. We can measure noise 😛
Looking at their paper (supplementary information part E), their performance falls below the current benchmarks for sorting 16+ floats when using the Intel CPU. I'm not smart enough to understand their work completely, but can it be that they optimized their code for specific purposes in a specific environment and compared it with a general implementation?
I think it comes down to that they didn't run it on actual hardware, they simulated the latency. The C based sort is going to use compiler heuristics to guess at the best instructions to get a job done. There is so much going on in any cpu newer than a 486, it's very hard to keep track of. And fewer instructions don't mean shorter execution times, because the right instructions might execute in parallel, or take advantage of register renaming, etc. Also, a modern high-speed sort would likely take advantage of SIMD, like AVX-512. Intel offers a library for just this. Don't know how it compares.
qsort is only fast for "large enough" sizes. The inner loop of a good qsort algorithm will use some other sort mechanism for the small left over pieces. This is usually the most performance critical piece of the code. So since it's both short and important, they probably concluded that this was a good test case.
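That dispatch structure can be sketched in a few lines of Python (the threshold value and names here are illustrative, not libc++'s actual ones): a quicksort that hands the small leftover pieces to a simple small-n sort, which is exactly the inner loop AlphaDev was optimizing.

```python
def small_sort(xs):
    # Insertion sort: typically the winner for tiny inputs, playing the
    # role that sort3/sort4/sort5 play inside libc++'s sort.
    for i in range(1, len(xs)):
        j = i
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return xs

def hybrid_sort(xs, threshold=8):
    # Quicksort recursion, falling back to small_sort below a cutoff,
    # since qsort's constant factors dominate on tiny ranges.
    if len(xs) <= threshold:
        return small_sort(xs)
    pivot = xs[len(xs) // 2]
    left = [x for x in xs if x < pivot]
    mid = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]
    return hybrid_sort(left, threshold) + mid + hybrid_sort(right, threshold)
```

Since the small sort runs once per leaf of the recursion, even a one-instruction saving there is multiplied across the whole input, which is presumably why they picked it as the test case.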
@@allalphazerobeta8643 Intel also removed the support for AVX-512 on 12th and 13th gen (and probably 14th as well given current expectations), I wouldn't use that specific example at this time
@@Linvael Intel's Xeon line still has AVX-512, and Intel released its SIMD sorting in Oct 2022 with recent updates, so I don't think they will be ending support for AVX-512 anytime soon. Now, AVX-512 could return to consumer CPUs; the supposed holdup is E-cores. But I think a power-efficient solution could be had with something like executing 256-bit instructions twice. Heck, they might already just do all SIMD instructions as 128-bit instructions or break 'em down to regular instructions in the E-cores. I think the main holdup is that it's a good way to sell more Xeons.
Deepmind always pulls this shit. If this paper had been submitted to an actual venue for algorithms, it would have been strongly rejected. The AI/ML guys find some niche field, do some AI shit, get above-average results that experts in said field would bulldoze over, then claim for clout that AI achieved something not done before
classic bamboozle
100% agree on this. Also, the asm snippet has a precondition, and this is not even mentioned
14:30 Unless I'm missing something, the sort3 algorithm doesn't seem to work all the time:
(A=0, B=1, C=2) outputs (0,1,2)
(A=2, B=1, C=0) outputs (1,1,2) !
(A=0, B=2, C=1) outputs (0,2,1) !
(A=2, B=0, C=1) outputs (0,1,2)
(A=1, B=2, C=0) outputs (1,2,1) !
Etc…
Edit: I indeed missed something. There was a starting condition (see comments below).
I also noticed this, but the one on the left also produces seemingly incorrect, but different results, surely this can't be just something they overlooked? I'm not sure I quite understand what it's actually intending to do... I thought perhaps it was because it would be iterated over a given array, e.g. sort3(A, B, C), sort3(B, C, D), ..., but that wouldn't work either, so I'm not sure.
Yes, yes. I came to the same conclusion. It took me some time to get through the notation and wrap my head around it. It does make sense that one checks min(A,C) and then also checks min between B and the min(A,C) result, to find the actual minimum of the three. But the same thing doesn't happen with max. We get max(A,C) in the R variable, and this gets shoved into Memory[2]. But it is never tested against B for the max. If B was the greatest of all three, they missed it.
the paper says a precondition for this to work is that B <= C
@@lterego Interesting, it does say so indeed. So it's an even more restricted sort of 3 numbers, where we already know the ordering of two of them. Even less impressive than what the main page shows. It's definitely a PR move on the DeepMind site, and less so in the paper, which I guess just tries to present the scientific findings of the experiment regardless of the success.
@@lterego so basically they found a solution that doesn't solve real problems. This is like when you make a very specific solution for a codeforces problem: it solves that specific environment, but libc++ is meant to be generic, and you cannot assume that B <= C
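For what it's worth, once the precondition is established, the min/max network from the blog's comments really does sort. A quick Python sketch (my own formulation of the commented pseudocode, not the shipped code) makes the restriction concrete:

```python
def sort3_assuming_b_le_c(a, b, c):
    # Valid only under the paper's precondition b <= c.
    assert b <= c, "precondition from the paper"
    lo = min(min(a, c), b)   # smallest of the three
    mid = max(min(a, c), b)  # middle element, *given* b <= c
    hi = max(a, c)           # largest element, *given* b <= c
    return [lo, mid, hi]

def sort3(a, b, c):
    # The generic case needs one extra compare-and-swap first
    # to establish the precondition.
    if b > c:
        b, c = c, b
    return sort3_assuming_b_le_c(a, b, c)
```

So the "discovery" only applies inside a caller that already guarantees the last two elements are ordered; the generic entry point still pays for the extra comparison.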
AI will take your compilers, but they'll never take away ill-defined problems!
One of the truest statements about AI I've heard so far!
And we won't know if it was ill-defined or not. It passed all tests... except the ones we didn't consider.
I agree so much with the first paragraph of the article, yet the current GUI application landscape is basically shipping a trillion copies of full-fledged web browsers, effectively re-compiling the whole GUI just-in-time at every launch, with support for a billion irrelevant legacy web technologies. Does every GUI app really need to ship error handling for malformed HTML4 markup? Do we really think that's remotely efficient?
This right here is why I despise JS clambering out of the internet swamp and onto desktop. Every app these days is a memory and storage hog.
The truth is that modern CPUs are fast enough and coders are bad enough that performance is a lie. Micro optimizations of algorithms don't really matter in the context of forcing your entire application to run through a slow HTML renderer. It's neat that the AI managed to get a performance improvement, but realistically I don't see this mattering for most development.
Well I don't use web technologies for desktop apps and prefer something lighter weight like Dear ImGUI.
I was expecting algorithms with better time or space complexity. It's funny how they sell some micro-optimizations as "completely new algorithms".
Its a warning. You have to read between the lines.
this was the first time i clicked on this channel and i was not able to listen to this guy for more than 30 secs... absolutely unbearable style, especially in a programming channel. as an ai language model im going to read the article myself, bye
Not as fast as my "1st post"!
you win the internet today sir.
ouch!
YouTube says no? Still pretty fast though
Shogi is a game very similar to chess, in that the pieces move (mostly, I think) the same way. The big difference is that when you take a piece it switches sides: that piece becomes your own little soldier. The game almost never ends in a draw, compared to chess, which often ends in a draw at the professional level.
Seeing that, I'm confident AI is not going to put me out of a job. I just hope I'll never have to do code review of AI-generated code...
AI will review the code 😂
To be honest I'm sure it would be better than a lot of code in the wild right now. I've seen some shit...
> releases faster sort3 that doesn't actually sort
> explains nothing
> dies
Seriously though, is there any code path in either version that results in B's value being assigned to R?
It was very ambitious reading that algorithm once and even hoping to understand what the one line max(min(a,b),c) actually does for every combination of a, b and c, let alone the whole thing. I was quietly shaking my head when you tried :)
This is along the same lines as that recent AI discovered optimization to matrix multiplication. Really neat!
Yeah, they were made by the same company
The Matrix multiplication problem has since been further optimized by 2 human mathematicians without using AI which is very cool.
@@beri4138 any material to read on this?
I think AI will eventually get us to the point of being able to transpile from any code base (libraries included) into any other language, for any operating system. Then, we will have true freedom to code anything for everything using your personal favorite languages and paradigms. Exciting!
Could be a great way to write in any language and still keep the best performance
Operating systems dictate only the format of the executable's header and the names of the kernel calls. If OSs hadn't dropped the ball so hard we could already just copy paste executables around since the assembly is identical.
or you can stop using bad languages altogether
@@alexnoman1498 and Libs available on the system
@@JorgetePaneteah true. There's only one good language (Haskell)
I’m sure Tom already implemented this into JDSL. That guy’s a genius!
See my reply to sunderkeen
AlphaGo used some kind of Monte Carlo simulation if I remember right. Very hard to know how you'd do backprop with something as discontinuous as machine instructions.
Does AI know how to measure dicts in bytes though
This article reads like an AI wrote it.
fun fact: AlphaDev's algorithm is only faster than quicksort when the input is 4 elements long; when the input length is higher than 4 it becomes worse. another fun fact: bogosort is faster than AlphaDev's algorithm when the input length is 4 :)
I feel like with such a small algorithm you could just do a brute-force tree search, though yeah, for larger ones I can see how it would be useful, in the same way it could search deeper into a tree of chess moves than mere pruning. If it could handle abstract syntax trees... though you'd want some NLP-based regularisation to make sure the code found is still human-readable.
The middle out algorithm in Silicon Valley was just an explanation of how hyperthreading works.
or context-based compression/optimization like the mirror of foveated rendering
Nice, thank you. Their approach reminds me of my childhood engineering: "what parts can we remove from this and it still function?" :)
Thank you Primeagen, I really liked this video
Why am I not completely surprised that it's the smaller sorting operations where the big win is made? Simply because a lot of engineers' focus often goes to the bigger things, while fewer people realise the impact of the uncountable smaller operations running everywhere, every day. It somewhat reminds me of comparing fast but high-latency hardware or software with slow but very low-latency hardware or software. It's easy to believe that you are making a big performance improvement, but if you are losing it in the number of small operations, reality will surprise you. AI that helps discover such things is a great thing in my opinion. I also like the fact that it's LLVM they are improving, because that's what Rust uses as well.
I love your content, man.
mmm... it skips saving the intermediate sorted state and jumps to the next sorting step instead. The original did the move [A,B,max,min] -> [min,B,max,min], saving min as the first element. Alpha just jumped to comparing min with B. But I feel something is wrong: the code implies that B is always less than max(A,C), so there is no way for B to end up in R. Even in the comments, R results in max(A,C). Is this just the first step of the sorting?
Sorry if I'm too dumb to grasp the algorithm fully, but I've tried to check their improved algorithm for 3 numbers (the first in the article) step by step, and it seems to me like it fails for 3 numbers in descending order, whereas the first algorithm produces the outcome given in the comments of that code. You mentioned that you might do a video checking that algorithm; could you please do that? I really would like to understand why it works for them.
There is an extra condition stated in the paper: the sequences are assumed to have b <= c
@@lterego so basically it took AI to realize that, if we already know that b <= c, one instruction can be skipped?
@@bernhardbinde4865 it's pseudocode in the article; the actual code looks different
Has anyone made a comparison with Intel's AVX-512 Sort? This utilizes Intel's 512-bit registers to facilitate sorting. I'm curious about its real-world speed. High-performance CPUs have numerous advanced features such as out-of-order execution, speculative execution, and register renaming. I've often found that trying to outperform these hardware-level optimizations and compiler heuristics can be quite challenging. From what I understand, the speed of AI-generated instructions in the AlphaDev project was determined by computing or simulating latency, rather than testing on actual hardware. This raises questions about the accuracy of these simulations in comparison to real-world performance. Moreover, the practice of invoking a specialized sorting function for small quantities of elements may prove to be less efficient than simply inlining a general sorting algorithm. If the CPU experiences a code cache miss because it didn't anticipate a call to the 4-element sort instead of the 3-element sort, the ensuing stall could result in a significant delay. Additionally, in my experience, it's quite rare to know the exact number of elements to be sorted at compile time. This could be a good example of microbenchmarking doesn't equal real-world performance. These are just some thoughts and speculations, and I'm interested to see if anyone has more insights or hard data on these points.
Well, it loses to SIMD sorting networks, but it wouldn't be a great "AI" headline, huh? Hype from nothing, as usual.
Optimizing for small hash sizes and small array sorting is something that algorithm theory doesn't cover. I see the potential, but from the article it seems this research focused on edge-case optimization, and I don't see how it will change anything meaningful. Remember, people: bubblesort is optimal on arrays of size 2; that doesn't mean you should use it.
So AI is doing the same thing I did 23 years ago?
100 Create a reference function
200 Measure it
300 Create a second, optimized function
400 Measure it
500 Look at the generated assembler code
600 Modify function 2 to produce better assembler code
700 GOTO 400
This way you learn how to write C code in such a way that you effectively teach the compiler the desired assembler output. Your C code may not be very readable, but once you have trained your human brain to write C code that way, every method you create will be close to the most optimized assembler speed. I actually created faster code than the Intel compiler and the Intel libraries could. No need for AI for that.
on top of that, the AI was incapable of producing a better optimization for the generic ordering of A,B,C compared to already existing algorithms, so the people who wrote the paper had to change the rules of the game and say "assume that B <= C"
There is previous work on optimal sorting networks, to find the optimal network to sort 3, 4, 5 elements. It minimizes the number of comparisons, not the number of instructions, which is a hardware-specific problem. Compiler optimization does not apply real mathematical optimization, because the search spaces are too big. Compilers apply 'peephole optimization', which optimizes small sections of code. This seems similar to what they do, but they optimize a small algorithm. A compiler usually has no clue about algorithms, but removing one unused 'mov' seems exactly the type of thing a peephole optimizer could do.
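For reference, the optimal 3-input sorting network mentioned here uses three comparators applied in a fixed, data-independent order; a small Python sketch:

```python
def compare_exchange(xs, i, j):
    # One comparator of a sorting network: order positions i and j.
    if xs[i] > xs[j]:
        xs[i], xs[j] = xs[j], xs[i]

def network_sort3(xs):
    # The optimal 3-element network: exactly 3 comparators, and the
    # sequence of compares never depends on the data, which is what
    # makes networks attractive for branchless/SIMD implementations.
    compare_exchange(xs, 0, 2)
    compare_exchange(xs, 0, 1)
    compare_exchange(xs, 1, 2)
    return xs
```

Note this minimizes comparator count, not instruction count on any particular CPU, which is the hardware-specific problem the comment above distinguishes.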
But that mov isn't unused.
I'm not an assembly expert, so is there something I missed? Because if I try the sequence A, B, C, D with A < C < B < D, it seems to output A, B, C, D.
The models are probably memorizing the training examples and tailoring the algorithm to optimize those examples. My hunch is that this causes the diminishing returns in performance as the number of inputs grow and doesn’t work across inputs outside the training set.
pinteresting
It was merged into the standard library after extensive performance testing. And again, they were optimizing an inner loop of a more general sorting function. Their function is only called when the regions left to be sorted are smaller than some threshold such that things like qsort are no longer optimal given the large constant factors.
WRONG! This is not an LLM, it's a reinforcement learning algorithm
Would be great to know which assembly instructions were available to AlphaDev; did it consider SIMD?
that brownie thing has earned my sub
My take is that if I make my code well organized it's probably easier for the compiler to optimize it. If I used an AI to optimize my code I would of course also like to be able to review the output before using it since a code optimization might be more efficient at the cost of maintainability and extensibility.
Does the assembly code for the 3-sort (original and alphadev both) even work? The output's largest value will never be B. [ex. for input 1 3 2, the output will be 1 3 2]
yea, I have the same issue
"Chat Jippity" might be the funniest thing I've heard this week
I don't know if that's actually a faster sort. I'm a novice, but stepping through the AI version, removing that instruction (mov S P) doesn't seem like it would make a valid sort. For example: If you have [3,2,1] in memory, there's no instruction in the AI version that writes to P when P is greater than R which means you just output to index 0 whatever you input into index 0. What am I missing?
It is assumed that b <= c
AI is just a fancier way to say brute-forcing
"Oh man, I've become super skeptical, all of a sudden"
hey... that is me
@@ThePrimeTimeagen Btw is that conversation with @healthygamergg ever happening??
13:40 What if B is the maximum? It doesn't seem to end up in the last position in either case?
I thought this was gonna give me existential dread but TrashDev out here catching strays like the back wall at a shooting range
11:57 - Classic "that's what she said" moment
JDSL doesn't need AI optimizations; it is optimized so far that optimizing it even further would result in negative execution time. Tom did not want to do that to us, he is also very caring.
10:35 In principle, yes, but in practice you can't just look at the assembly and determine if it's faster. It also depends on the hardware. Especially when doing things on small and medium-size arrays, cache locality, branch prediction, and other low-level processor-specific optimizations greatly impact, if not dominate, the performance. Algorithmic complexity only matters for very big inputs. For small arrays, you'd develop specialized sort-N algorithms that also take into account whether comparing or swapping elements is cheap or expensive. For example, for sort-5 there is not one optimal algorithm: there is one that minimizes comparisons and one that minimizes swaps; probably (I don't know for sure) there are also ones that do something in between. As Dr. Andrei Alexandrescu once said: you gain speed only by doing less work. But a priori, you don't know if swapping or comparing is more work. If you want examples: basic integers are cheap to compare and swap. C strings (const char*) are cheap to swap, but in general can be expensive to compare. Sorting static_vectors (boost::static_vector) by length is cheap to compare but expensive to swap (linear in capacity), and sorting them by content is expensive in both domains.
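The compare-versus-swap tradeoff above is easy to make visible. Here is a small Python sketch (the instrumentation is mine) contrasting a fixed 3-comparator network with insertion sort, which adapts to already-ordered input:

```python
def sort3_network(xs, stats):
    # Always exactly 3 comparisons, data-independent structure:
    # attractive when comparisons are cheap (e.g. integers).
    for i, j in [(0, 2), (0, 1), (1, 2)]:
        stats["cmp"] += 1
        if xs[i] > xs[j]:
            xs[i], xs[j] = xs[j], xs[i]
            stats["swap"] += 1
    return xs

def sort3_insertion(xs, stats):
    # Only 2 comparisons on already-sorted input, up to 3 otherwise:
    # attractive when comparisons are expensive (e.g. long C strings).
    for i in range(1, 3):
        j = i
        while j > 0:
            stats["cmp"] += 1
            if xs[j - 1] <= xs[j]:
                break
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            stats["swap"] += 1
            j -= 1
    return xs
```

On a sorted input the network still spends 3 comparisons while insertion sort stops after 2, which is exactly the "there is no single optimal sort-N" point: the winner depends on what a comparison and a swap cost for your element type.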
EDI is Extended Destination Index. But you really shouldn't think of it that way; it's just a general-purpose register. The nominal meanings are barely relevant anymore. The Extended part is because it's the 32-bit version of DI. RDI is the 64-bit version
RSP and RIP are the only ones I give a shit about generally, and you could argue that RSP is a general purpose register as well if you don't want a stack
@@mihailmojsoski4202 RBP if you ever want to ret.
i got so excited watching this video i had to finish viewing it in the toilet
Oh god that's exactly what I was waiting for today. Today I woke up and said, "You know what this day is missing? Some faster fucking sorting algorithms."
I gotta make a notification sound from "JUST GIVE ME THE MEAT!" & "YOU KNOW WHAT SORTING IS!" ... actually, I should just make a massive sound board of PrimeAgen quotes/bites. 😂
I kinda like ur rinkly brain prime.... You've definitely made me smile. Thx! ❤
I had this idea several years back, trying to run assembly language instructions through an AI and have it find a faster solution with identical state results. Unfortunately, my AI-from-scratch skills are insufficient for this level of task. (Or pretty much any task, tbh)
For AI to take our jobs, management and PMs would need to be able to accurately describe what they want. I think our jobs are safe.
I don't think they actually ran the code on modern CPUs with their out-of-order execution, register renaming, etc., so I'm going to have to call this into question. From the actual paper: "Latency value functions ... The latency head is used to directly predict the latency of a given program by using the program's actual computed latency". Actual computed? Like the one you counted out vs the one the AI guessed at? They also penalized longer code, likely disproportionately. I call the whole thing into question.
It's interesting as an idea to use AI to improve algorithms or generated code; I applaud the effort. I remember watching an interview with the CEO of DeepMind and thinking they were doing something positive, as they helped discover algorithms in protein folding and other areas. But this paper seems a lot like a PR attempt: enough commenters have noticed that it might not work (unless we all missed something) for the max value in some cases. Also, they really saved one MOV instruction. They highlight more lines that have changed, but only the comments really changed, as different things are compared while the exact same instructions are running. It's the same with the sorting of 4. The logic seems to go in a chain, and just removing a MOV instruction alters what is sorted next. If one unrolls the code like this, which would only be good for small counts, then in subsequent sorts of 5 or more it's just one MOV (the cheapest instruction) among many others, so I don't know how they got the 1.7% on 150000 elements (ok, maybe I do, if it's some Duff's Device with a few elements repeated). Suddenly, if you highlighted only the single instruction that changed, it wouldn't look as good in a PR presentation. It also reminds me of the news of AI finding faster ways to do matrix multiplication. Even in that case, as I understood it, it wasn't a generic improvement for every matrix out there, but very special cases, maybe of more dimensions than three, or specific data stored, or specific transformations, I don't remember. It's kinda similar here. AI won't magically make everything faster; it will just find the niche points where something could be saved, something a low-level programmer could do on their own if they focused their mind on the thing they want to solve. Sometimes even using a different algorithm or a different approach. And that's the other thing.
I am kinda skeptical of the idea that AI will fundamentally save us and make all code faster. Thinking of JBlow's or Casey's rants on how software could be 10x or more faster, even without going into low-level assembly, just by identifying obvious performance traps you might have left in your codebase: horrible single lines of code that, if you knew about them, would be easy to replace and would speed things up tremendously. Software is already too slow because of the assumption people make that "we don't need to care about performance anymore, optimization is the root of all evil", which is really a desire to move away from the machine as much as possible. If AI saves your ass 50% but you degrade the performance of your software 500%, it's still a 250% loss. We seem to be moving in the negative direction while we rejoice that news like this will "prove" once more that one doesn't need to care about performance anymore because the compiler is smarter than you. You see, that's the trap. It's when people see what GoDoT produces and think it's such magic that no living human would ever be able to produce it. If they were more performance-aware, even without going low-level, they could save way more performance than AI can ever do for them.
I'm thinking these sorts of optimizations will be more useful for higher-level code. For example, optimizing an entire function whose job is to find which polygons need to be rendered on screen and which can be skipped in a given 3D game engine. That's going to involve a lot of code, where even a seemingly small optimization can drastically reduce the number of CPU cycles. These sorts of algorithms also haven't been gone over and iterated upon for decades like some basic sorting algorithms have, so finding improvements is more likely.
Did the AI alpha beta prune in assembly
Just walked through the assembly example for the 3 element sort. Either the example is incomplete or there is a typo. It does not result in a sorted array if B is the greatest of the three elements. Kind of disappointing, as I was interested in understanding the optimization. Or maybe I'm just misunderstanding the assembly.
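For anyone else trying to trace the 3-element example, here is the standard branchless shape a sort-3 takes, sketched in C++ with min/max comparators. This is an illustration of the general technique (on integers, compilers typically lower `std::min`/`std::max` to cmp + cmov), not the paper's exact instruction sequence:

```cpp
#include <algorithm>

// A 3-element sorting network: three fixed compare-exchange steps,
// (a,c), (a,b), (b,c), sort any input without data-dependent branches.
void sort3(int& a, int& b, int& c) {
    int lo = std::min(a, c), hi = std::max(a, c);  // compare-exchange (a, c)
    a = lo; c = hi;
    lo = std::min(a, b); hi = std::max(a, b);      // compare-exchange (a, b)
    a = lo; b = hi;
    lo = std::min(b, c); hi = std::max(b, c);      // compare-exchange (b, c)
    b = lo; c = hi;
}
```

Working through the case where B is the greatest (e.g. 2, 3, 1) against a sketch like this makes it easier to spot whether the published listing really drops a case or just has a typo.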
you're actually the TechLead++ of this side of youtube lmfaoooo
I wonder what the difference would be between training it with a reward based on correct/incorrect sorts versus a Levenshtein-distance-based scoring system.
I think the future is using an AI to train different AIs to get what it wants. For example, you could have the main AI train a calculator AI and then call it anytime it needs to do calculations.
AI will then build AI web APIs for itself to do simpler tasks, probably written in jAvAsCrIpT
@@vighnesh153 Win win, no one writes JS anymore
it's actually quite heartening to realise Prime can't read assembly
I would love to see this written out. 3 blue 1 brown style animations would be awesome, but that's a stretch.
Most compilers have optimization options. All I heard was that the AI provides better optimization than current algorithms. This is technology likely to impact newer languages where compiler optimizations are less developed.
Honestly, this applied to "new" archs like RISCV would be interesting as well.
The term "high-level language" was defined back when the baseline was a really low-level language like assembly, i.e. the other language that is really hard to use. By that term and definition, C++ is as high-level as Python, because it is more readable and structured compared to the old languages. Differences in programming concepts, like pointers and platform-specific details, don't matter that much, since often they don't depend on the language alone.
I can't believe this time he actually disabled alerts
The nice thing about C++ is that you can be as high or low level as you want. You want to sort a vector? One line of code. You want to access a hardware device, read the bits out of its memory, and interpret them in a particular order as a kernel module acting as a device file? No problem either.
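The high-level end of that claim really is a one-liner; a minimal sketch (the `sorted` wrapper is just for illustration):

```cpp
#include <algorithm>
#include <vector>

// Sorting a vector in C++: the one-liner is the std::sort call.
std::vector<int> sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());  // that's the whole high-level story
    return v;
}
```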
Shogi is like Chess, but you can turn captured pieces against your opponent.
I feel very attacked I didn't know there was a brownie and trash dev correlation I have to get my life in order
So essentially AI discovered assembly and said: Guys, look how cool I am, assembly is faster!
AI explaining React with 87% accuracy is a great use of AI. I've been using ChatGPT to automate parts of my job that I knew were possible to automate, but I didn't have the knowledge, or the time to seek that knowledge, to do it. Even with its mistakes, I could write some sort of functioning code and ask about the errors that came up. I'm the sort of person who learns by doing, but I'm also too lazy to do things just for the sake of learning, so being able to actually do something useful by tapping into that 87%-accurate information is productive both for my job and for my coding experience. Now I use the time I freed up to automate other parts of my job, getting dopamine hits for knowing I can be lazy in the future, despite working more than I used to. It's fucking brilliant if you ask me.
with AI, we are no longer going to be software craftsmen as in carpenters/woodworkers/masons we are going to be gardeners.
"Quantum Bogosort" im dead 💀
this guy is like tobuscus for programming
Skipping video part where youtuber skips part of content is pure entertainment
They made a top tier AI perform a blind brute force search adding random assembly instructions. Yep looks like I'm out of a job XD.
This is the power of AI focused on solving one problem.
agreed. i think this is the true power of ai
Faster for shorter sequences indeed corresponds with asm level optimization.
This reminds me of a question I once got at a job interview: exchange the values of two variables, A and B, without using a third variable (e.g. C). I didn't know the answer, and they wouldn't tell me what it was, so after the interview I spent a while thinking about it and finally figured it out:
A = A xor B
B = A xor B (expands to A xor B xor B, which yields the original value of A)
A = A xor B (expands to A xor B xor A, which yields the original value of B)
Why would someone need something as "stupid" as this? Well, it actually makes perfect sense in assembly language (regardless of the CPU architecture), where you have a very limited number of data registers, and the speed of code execution depends both on not accessing memory locations (using CPU registers as much as possible) and on code locality (instruction cache hits). An AI optimizing an algorithm does exactly the same thing a programmer does when trying to optimize his or her code: trying out (in their head first, and then in practice) as few instructions as possible, while checking the code's validity in boundary conditions. This, however, has a nasty tendency to produce highly specialized code that usually has some very ugly side-effects when used outside of its intended (data) environment... which is what lazy programmers always do, i.e. try to use somebody else's code when and where it was not meant to be used, which (almost always) causes all kinds of (unpredictable) catastrophic software/firmware/hardware failures ("it failed for no reason whatsoever" being the usual post-fact excuse). I have no doubt that people trying to use AI-developed optimizations are going to do exactly the same thing: follow in lazy programmers' footsteps and start causing all kinds of (unpredictable) catastrophic system failures... and the more AI-optimized the code becomes, the more catastrophic those failures are going to be...
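The three XOR steps above, sketched in C++. The aliasing guard is exactly the kind of boundary condition the comment warns about: if both references name the same object, the first step computes a ^ a == 0 and wipes the value.

```cpp
// XOR swap: exchange two values without a temporary.
void xor_swap(unsigned& a, unsigned& b) {
    if (&a == &b) return;  // guard: a ^= a would zero the value
    a ^= b;  // a = A ^ B
    b ^= a;  // b = B ^ (A ^ B) = A
    a ^= b;  // a = (A ^ B) ^ A = B
}
```

(On modern CPUs with register renaming, three mov/xchg-style instructions are usually at least as fast, so this is mostly a register-pressure trick.)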
and, for some "inexplicable" reason, nuclear (including fusion) reactors come to mind right now.
Can someone explain how 3:35 Theorem of Trash uses brownies as a metric? Wouldn't milk+brownies be perfect equilibrium so milk by itself is also trash? My experience with trash theorems uses other metrics like pills, littering and non-indented code.
This is sick and all but I'm not convinced that AI would've come up with the Fast Inverse Square Root algo
very interesting point... there is MOST certainly local minima issues
How many humans would've come up with the fast inverse square root algorithm?
@@isodoubIet I don't believe in this idea that someone is a super genius to come up with this algorithm, and that no matter how hard you tried, you wouldn't. Optimization is a state of mind. It seems like magic from the outside. I've read about some very clever algorithms, and someone told me "I would never have thought of it no matter what, the guy is a genius". I might be cocky, but I think I would have found a similar algorithm trick if this guy hadn't found it before me. I have also written things that are "thinking out of the box" for retro demoscene platforms, things that seem crazy, but that's the magic if one has motivation. You start with something and you say "there is no way I can make this 10% faster", then you insist, and if you get in the right mentality you end up with something "3x faster", and you look back and wonder how that's possible. I think it's exaggerated both how some clever algorithms came to be and how compilers and AI are so smart that no human short of a robot with an IQ of 200 can match them.
@@Optimus6128 The vast majority of, say, physicists, couldn't have come up with Feynman's path integral formulation of quantum mechanics if they had a lifetime to think about it, so I have a hard time accepting that fundamental differences in ability simply don't exist. The fast inverse sqrt in particular requires at least two non-obvious jumps, one which is understanding that two iterations of Newton's method are "good enough" and faster enough to matter, and secondly the bit manipulation with the magic constant. The programmer in question would also have to know that the inverse square root is a good candidate for optimization in the first place.
@@isodoubIet I'm not saying that ability doesn't matter, but I don't buy the myth that these things are so magical they need extraordinary ability. Feynman must have looked at places where nobody dared to look; a lot of scientists are stuck in the same thinking boxes. Inverse sqrt and Newton's approximations are common in math, so one could easily experiment with different numbers of iterations and compare with the original square root. Bit manipulation with magic constants seems crazy, but there was a gradual approach until someone got there. It looks like magic from a distance.
My process is to stare at code, pace for a bit, stare at code, come up with solution, find out that solution prevents next step from succeeding, refactor everything, profit. Except for the pacing, this is how AI currently functions. It's scary because this is my livelihood and they're so close to replicating it. :(
have you considered getting it right the first time?
@@mihailmojsoski4202 Always. But sometimes you gotta wipe twice.
00:00:00 - Alpha Dev discovers faster sorting algorithm
00:02:50 - Sorting and coding language discussion (C++, assembly instructions)
00:05:38 - Exploring algorithms & computer code in terms of sorting, assembly, & machine code
00:08:13 - Alpha Dev discovers new sorting algorithms leading to 70% speed improvement.
00:10:56 - Alpha Dev defeats go player w/ "Swap & copy move" technique
00:13:51 - Dev discovers faster sorting algorithms to improve hashing algorithm and release into open source.
00:16:33 - Alpha dev demonstrates optimizing algorithms, inspiring developers, & exploring high level language potential - A great use of AI.
why is tpope in the background at 18:48?
Can they write algorithms in JDSL though?
BIG QUESTION: - will this improve JSDL ?
I simulated the "3sort" algorithm on the input list 3,2,1, and the output is 3,2,3. It doesn't work. After the first block of cmovs, only S will be 1, and it can only overwrite anything if it is greater than some of the other ones. S being the minimum of these four registers is not a situation they've considered. The elided mov is actually really important.
Those look like sorting networks. Those are a predetermined sequence of min/max operations that always result in correctly sorted data for some fixed size
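A minimal sketch of what such a network looks like, here the classic optimal 5-comparator network for 4 elements (a standard construction for illustration, not necessarily the exact sequence AlphaDev produced):

```cpp
#include <algorithm>
#include <array>

// One compare-exchange: the fixed min/max step a sorting network is built from.
void cex(int& x, int& y) {
    int lo = std::min(x, y), hi = std::max(x, y);
    x = lo; y = hi;
}

// Optimal 4-element network: comparators (0,1)(2,3)(0,2)(1,3)(1,2).
// The sequence is fixed in advance, so there are no data-dependent branches.
void sort4(std::array<int, 4>& a) {
    cex(a[0], a[1]); cex(a[2], a[3]);
    cex(a[0], a[2]); cex(a[1], a[3]);
    cex(a[1], a[2]);
}
```

Because the comparator sequence never depends on the data, this is exactly the kind of code that lowers to straight-line cmp/cmov assembly, which is why shaving even one mov shows up in the benchmarks.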
Seems to me like the AI improved corner cases of some algorithms that people didn't previously optimize for, like sorting 3 or 4 elements. I'm not an expert, but I would guess it wasn't worth spending much time on this in the past for the standard library, and whoever needed to optimize for that case did it on their own in in-house libraries. Still a valid use case: a lot of things that weren't worth the time to optimize for before could now be optimized automatically like this without wasting human effort...
There's been a lot of work on this area because it's the inner loop of a bigger algorithm and is very hard to analyze and test. It was known that there were improvements there. You can find talks at major conferences about this.
If I understand this correctly given the little amount of information I have, it seems as though their AI did to assembly instructions what any decent static analyzer can do for a high level language.
Tell me if I'm dumb, but the AI's algorithm is broken, isn't it? Let's say the min value in the 3sort algorithm is C. The returned Memory[0] (the min value of the array after sorting) is ALWAYS min(A, B), so it can never be C. So...