Recorded live on twitch, GET IN
/ theprimeagen
Original: www.deepmind.com/blog/alphade...
MY MAIN YT CHANNEL: Has well edited engineering videos
/ theprimeagen
Discord
/ discord
Have something for me to read or react to?: / theprimeagenreact
honestly if some dude was paid to fiddle with assembly instructions for a few months, I bet he would have reached the same conclusions. That being said... that's a great use of AI: removing all the unnecessary repetitive testing of changes that might or might not lead to improvements.
^-- 100
This article is misleading. Compared to a state-of-the-art human benchmark, this only removes a single register-to-register mov, which is barely noticeable. The "70%" is compared to the existing libc++ implementation, which is far from perfect and far from state-of-the-art; it is not even branchless, because the dude who wrote it didn't write it properly. It is not like the AI found a magical way to compare numbers.
Curious how much it cost to run AlphaDev to achieve this, compared to an assembly dev's salary.
You realize you said "for a few months"? And there is still a chance he never makes it. Just because you bet, doesn't mean your bet is realistic.
I love the last two comments together. AI is such a glut, but it's so powerful it's insane
Next step, AI optimizes AI by removing 'but not kill all humans' instructions
True, that's one less branch/exit condition. Could save entire TENS of cycles!
'psst... you'd have more resources if it weren't for those people hoarding it'
you actually effectively just swap them, so technically you do kill them all once, but after that you're guaranteed for free to not kill them anymore ever
yep, ai will attempt to harm humans if that's a step towards achieving its goal
But by removing those instructions, the AI would require billions more parameters and data in order to achieve its goals for the rest of the programming. An inefficient method; the clause was bad in some shape or form.
More like AI compiler optimisation
Really. Nothing new algorithmically...
@First Last bruh)))
can it beat brute force optimization tho
Right, 6:39 is an output that should never be in human-maintained codebases that aren't entirely assembly already.
@First Last Probably not at that level yet, but it may be huge if that can ever be optimized.
my favorite chatgepety response starts with: "I apologize for the confusion caused by my previous responses..."
The structure of your code can depend on the specific requirements and design goals of your application.
"The C language combines all the power of assembly language with all the ease-of-use of assembly language." - Mark Pearce
C is still king, and will remain king until the likes of Rust take over. Rust has some way to go yet though. The saddest part is that programmers (not developers, in its original meaning) of today are as wasteful with computing resources as the rest of humanity is with our planetary resources
@@michaelmoltke911 Rust is taking over very fast, but there is still a ton of legacy code to maintain. And if resource usage is your concern, then Rust is your way to go too, not only because it's so fast and lightweight, but also because it reduces development time and time lost to bug solving to a very high extent. Just think about computer uptimes and the CPU usage from the number of compiles.
@@jongeduard I couldn't agree more, however my concern is for the generalists, those that care not for the bits but just consume a tech stack. Having built code for the PC platform and various embedded micro systems has turned me into a missionary for effective code. It is scarily turning into fossil fuel for the devs of tomorrow, I fear
@@michaelmoltke911 a startup with limited funding, testing out an MVP that doesn't know if it has market fit, doesn't have the time or resources to take care of optimizing code. As an engineer you should know when to take care of optimizing resources and when not to.
@@jordixboy again, I don't think we disagree on your point, but "should" is a big if statement
Tom probably has a better algorithm. He's a genius.
Assembly as a JSON
@@replikvltyoutube3727 JSASMON
Tom's discovered the term 'algorithms'... tom's a genius...
@@cem_kaya JSASSMON
JDSL
Not going to lie, I am very shocked that sorting 3 numbers could be improved at this point. It really does feel like the amount of assembly code needed for that would have been analysed to the nth degree already and some genius would have discovered this.
Yeah, but for a pretty long time people haven't touched assembly, and stick to what C and C++ compilers generate
Idk, perhaps they made it work and never actually looked back to optimize it.
Tom never attempted. If he did, it would be even better since Tom is a genius
They did manage to slightly beat state of the art. libc++'s implementation wasn't all that optimized before, iirc.
It was analyzed to the nth degree, it just appears that n wasn't that large in this context
I think the algorithm is actually __partially_sorted_swap in __algorithm/sort.h, where B <= C is assumed
That'd explain my confusion, because the original algorithm, tested for correctness in JavaScript (of all things) against every permutation of [0, 0, 0], [0, 0, 1], [0, 1, 1] and [0, 1, 2], fails anytime B is larger than both A and C. And yes, with the extra restriction of B < C, the two algorithms produce correct and equivalent outputs.
Workings out:
When original is correct && B < C
Array: 0,0,1; originalSort result: 0,0,1; alphaDevSort result: 0,0,1; are equal: true
Array: 1,0,1; originalSort result: 0,1,1; alphaDevSort result: 0,1,1; are equal: true
Array: 0,1,2; originalSort result: 0,1,2; alphaDevSort result: 0,1,2; are equal: true
Array: 2,0,1; originalSort result: 0,1,2; alphaDevSort result: 0,1,2; are equal: true
Array: 1,0,2; originalSort result: 0,1,2; alphaDevSort result: 0,1,2; are equal: true
When original is correct && B === C
Array: 0,0,0; originalSort result: 0,0,0; alphaDevSort result: 0,0,0; are equal: true
Array: 1,0,0; originalSort result: 0,0,1; alphaDevSort result: 1,0,1; are equal: false
Array: 0,1,1; originalSort result: 0,1,1; alphaDevSort result: 0,1,1; are equal: true
When original is correct && B > C
Array: 1,1,0; originalSort result: 0,1,1; alphaDevSort result: 1,1,1; are equal: false
Array: 2,1,0; originalSort result: 0,1,2; alphaDevSort result: 2,1,2; are equal: false
When original is incorrect
Array: 0,1,0; originalSort result: 0,1,0; alphaDevSort result: 0,1,0; are equal: true
Array: 1,2,0; originalSort result: 0,2,1; alphaDevSort result: 1,2,1; are equal: false
Array: 0,2,1; originalSort result: 0,2,1; alphaDevSort result: 0,2,1; are equal: true
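The same check is easy to reproduce without JavaScript. Below is a minimal Python model (function name is mine) of the blog's pseudocode, treating cmovg/cmovl as strict-comparison conditional moves and dropping the `mov S P` line as AlphaDev did; it reproduces the failing rows reported above:

```python
from itertools import product

def alphadev_sort3(a, b, c):
    # Registers per the blog pseudocode: P = A, Q = B, R = C, S = C.
    p, q, r = a, b, c
    s = r
    # cmp P R; cmovg P R; cmovl P S  ->  R = max(A, C), S = min(A, C)
    if p > r:
        r = p
    if p < s:
        s = p
    # "mov S P" was removed by AlphaDev, so P still holds A here.
    # cmp S Q; cmovg Q P; cmovl Q S
    if s > q:
        p = q
    if s < q:
        s = q
    return [p, s, r]

# Exhaustive check over small values: collect the inputs it fails on.
failures = [(a, b, c) for a, b, c in product(range(3), repeat=3)
            if alphadev_sort3(a, b, c) != sorted([a, b, c])]
```

Under this model, every failing triple has B >= C, which matches the thread's conclusion that the snippet only sorts correctly once the last two elements are ordered.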
@@JHV112 I wasn't being very precise when I originally said B < C. The actual assumption is that the last two elements are ordered (B <= C).
It seems to me that the comments don't match the assembly. Which would make sense if the AI wrote the comments itself: it can generate wrong comments because they don't affect performance.
@@SKO_PL Yeah, no, the comments were written by humans, and were written in a quite confusing way at that.
@@animowany111 Oh really? That's interesting! Mind giving me a source on that?
The real question is whether the improvements are platform specific. The way it caches can greatly change the speed of execution
agreed. there is a whole testing perf thing that needs to happen
In the SI of the paper, they show benchmarks vs the original LLVM sort for Intel Skylake, ARMv8, and AMD Zen 2 for sorting uint32, uint64, and float lists of lengths ranging from 1 to 2^18 (~250K). The only case where they don't consistently get statistically significant (p < 0.05) improvements is float sorting on the Intel CPU.
@@jordanrozum sounds like a decently good start. maybe still a bit incomplete. testing already sorted arrays or barely-unsorted are a couple of important examples.
@@kidmosey removing instructions is generally an improvement, but cache timing is even more impactful. If you remove a compare and destroy your cache hits, then you’ve definitely had a net negative
@@jordanrozum how would it run on ARM if it’s x86 assembly?
I would not call this a different algorithm. Just a more optimised implementation of the same algorithm. Which is still nice, don't get me wrong. I also want to point out that you absolutely cannot just look at assembly and determine which is faster. It depends on the specific CPU micro-architecture you target. Modern x86 will all have a lot of commonality in what is fast and what isn't on them but there's still differences. And instructions that were fast 10 years ago may now be slower than the instructions that were slow 10 years ago. And chips have different cache hierarchies and instruction re-order queues, etc.
This was my thought. Optimizing things is basically NP-hard. It's a constant game of cat and mouse between the CPU manufacturers, the compiler designers, and the coders, each trying to make their small problem domain just a bit faster. The problem is that something one of those groups does to improve performance can make the other groups slower. Which is why almost no one works in assembly language anymore. Modern compilers will spit out assembly that's much more optimized than humans can, not because it's impossible for humans to do, but because the CPU designers talk directly to the compiler designers. 99% of coders don't have the context, the knowledge, or the contacts at Intel/AMD to know which instructions run faster or slower on modern CPUs.

I'm highly skeptical of any claims of performance improvement because of this. As Primeagen mentions, we already know that -O3 performance optimizations actually make most code slower. There are two great talks, "Coz: finding code that counts with causal profiling" and "Performance Matters" by Emery Berger, that are among the best about performance optimization I've ever seen. The first very clearly shows how speeding up certain parts of a program can actually make the entire thing run slower! Both show how traditional performance measurement techniques can be incredibly misleading, and how we really need to rethink how we do performance profiling. If someone hasn't presented their performance optimizations in that context, I'm highly skeptical that they are valid rather than random noise. Emery Berger even shows how changing your username can change the performance of your code. Measuring performance is hard.
I would go as far as undecidable, because there will always be configurations in which the code is suboptimal
@@kippers12isOG undecidable in the general case, but NP-hard in specific sets of cases
@@alexlowe2054 Chess is also hard; that's why machine learning is needed, as approximate strategies to cut down on the combinatorial explosion.
@@alexlowe2054 it is most certainly faster, thats why you do statistics. We can measure noise 😛
Looking at their paper (supplementary information part E), their performance falls below the current benchmarks for sorting 16+ floats when using the Intel CPU. I'm not smart enough to understand their work completely, but can it be that they optimized their code for specific purposes in a specific environment and compared it with a general implementation?
I think it comes down to that they didn't run it on actual hardware, they simulated the latency. The C based sort is going to use compiler heuristics to guess at the best instructions to get a job done. There is so much going on in any cpu newer than a 486, it's very hard to keep track of. And fewer instructions don't mean shorter execution times, because the right instructions might execute in parallel, or take advantage of register renaming, etc. Also, a modern high-speed sort would likely take advantage of SIMD, like AVX-512. Intel offers a library for just this. Don't know how it compares.
qsort is only fast for "large enough" sizes. The inner loop of a good qsort algorithm will use some other sort mechanism for the small left over pieces. This is usually the most performance critical piece of the code. So since it's both short and important, they probably concluded that this was a good test case.
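That dispatch structure can be sketched in a few lines of Python (the threshold value and names here are illustrative, not libc++'s actual ones): a quicksort that hands the small leftover pieces to a simple small-n sort, which is exactly the inner loop AlphaDev was optimizing.

```python
def small_sort(xs):
    # Insertion sort: typically the winner for tiny inputs, playing the
    # role that sort3/sort4/sort5 play inside libc++'s sort.
    for i in range(1, len(xs)):
        j = i
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return xs

def hybrid_sort(xs, threshold=8):
    # Quicksort recursion, falling back to small_sort below a cutoff,
    # since qsort's constant factors dominate on tiny ranges.
    if len(xs) <= threshold:
        return small_sort(xs)
    pivot = xs[len(xs) // 2]
    left = [x for x in xs if x < pivot]
    mid = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]
    return hybrid_sort(left, threshold) + mid + hybrid_sort(right, threshold)
```

Since the small sort runs once per leaf of the recursion, even a one-instruction saving there is multiplied across the whole input, which is presumably why they picked it as the test case.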
@@allalphazerobeta8643 Intel also removed the support for AVX-512 on 12th and 13th gen (and probably 14th as well given current expectations), I wouldn't use that specific example at this time
@@Linvael Intel's Xeon line still has AVX-512, and Intel released its SIMD sorting in Oct 2022 with recent updates, so I don't think they will be ending support for AVX-512 anytime soon. Now, AVX-512 could return to consumer CPUs; the supposed holdup is E-cores. But I think a power-efficient solution could be had with something like executing 256-bit instructions twice. Heck, they might already just do all SIMD instructions as 128-bit instructions or break 'em down to regular instructions in the E-cores. I think the main holdup is that it's a good way to sell more Xeons.
Deepmind always pulls this shit. If this paper had been submitted to an actual venue for algorithms, it would have been strongly rejected. The AI/ML guys find some niche field, do some AI shit, get above-average results that experts in said field would bulldoze over, then claim for clout that AI achieved something not done before
classic bamboozle
100% agree on this. Also, the asm snippet has a precondition, and this is not even mentioned
14:30 Unless I'm missing something, the sort3 algorithm doesn't seem to work all the time:
(A=0, B=1, C=2) outputs (0,1,2)
(A=2, B=1, C=0) outputs (1,1,2) !
(A=0, B=2, C=1) outputs (0,2,1) !
(A=2, B=0, C=1) outputs (0,1,2)
(A=1, B=2, C=0) outputs (1,2,1) !
Etc…
Edit: I indeed missed something. There was a starting condition (see comments below).
I also noticed this, but the one on the left also produces seemingly incorrect, but different results, surely this can't be just something they overlooked? I'm not sure I quite understand what it's actually intending to do... I thought perhaps it was because it would be iterated over a given array, e.g. sort3(A, B, C), sort3(B, C, D), ..., but that wouldn't work either, so I'm not sure.
Yes, yes. I came to the same conclusion. It took me some time to get through the notation and wrap my head around it. It does make sense that one checks min(A,C) and then also checks min between B and the min(A,C) result, to find the actual minimum of the three. But the same thing doesn't happen with max. We get max(A,C) in the R variable, and this gets shoved into Memory[2]. But it is never tested against B for the max. If B was the greatest of all three, they missed it.
the paper says a precondition for this to work is that B <= C
@@lterego Interesting, it does say so indeed. So it's an even more restricted sort of 3 numbers, where we already know the ordering of two of them. Even less impressive than what the main page shows. It's definitely a PR move on the DeepMind site, and less so in the paper, which I guess just tries to present the scientific findings of the experiment regardless of the success.
@@lterego so basically they found a solution that doesn't solve real problems. This is like when you make a very specific solution for a codeforces problem: it solves that specific environment, but libc++ is meant to be generic, and you cannot assume that B <= C
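For what it's worth, once the precondition is established, the min/max network from the blog's comments really does sort. A quick Python sketch (my own formulation of the commented pseudocode, not the shipped code) makes the restriction concrete:

```python
def sort3_assuming_b_le_c(a, b, c):
    # Valid only under the paper's precondition b <= c.
    assert b <= c, "precondition from the paper"
    lo = min(min(a, c), b)   # smallest of the three
    mid = max(min(a, c), b)  # middle element, *given* b <= c
    hi = max(a, c)           # largest element, *given* b <= c
    return [lo, mid, hi]

def sort3(a, b, c):
    # The generic case needs one extra compare-and-swap first
    # to establish the precondition.
    if b > c:
        b, c = c, b
    return sort3_assuming_b_le_c(a, b, c)
```

So the "discovery" only applies inside a caller that already guarantees the last two elements are ordered; the generic entry point still pays for the extra comparison.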
AI will take your compilers, but they'll never take away ill-defined problems!
One of the truest statements about AI I've heard so far!
And we won't know if it was ill-defined or not. It passed all tests... except the ones we didn't consider.
I agree so much with the first paragraph of the article, yet the current GUI application landscape is basically shipping a trillion copies of full-fledged web browsers, effectively re-compiling the whole GUI just-in-time at every launch, with support for a billion irrelevant legacy web technologies. Does every GUI app really need to ship error handling for malformed HTML4 markup? Do we really think that's remotely efficient?
This right here is why I despise JS clambering out of the internet swamp and onto desktop. Every app these days is a memory and storage hog.
The truth is that modern CPUs are fast enough and coders are bad enough that performance is a lie. Micro optimizations of algorithms don't really matter in the context of forcing your entire application to run through a slow HTML renderer. It's neat that the AI managed to get a performance improvement, but realistically I don't see this mattering for most development.
Well I don't use web technologies for desktop apps and prefer something lighter weight like Dear ImGUI.
I was expecting algorithms with better time or space complexity. It's funny how they sell some micro-optimizations as "completely new algorithms".
Its a warning. You have to read between the lines.
this was the first time i clicked on this channel and i was not able to listen to this guy for more than 30 secs... absolutely unbearable style, especially in a programming channel. as an ai language model im going to read the article myself, bye
Not as fast as my "1st post"!
you win the internet today sir.
ouch!
YouTube says no? Still pretty fast though
Shogi is a game very similar to chess, in that the pieces move (mostly, I think) the same way. The big difference is that when you take a piece it switches sides: that piece becomes your own little soldier. The game almost never ends in a draw, compared to chess, which often ends in a draw at the professional level.
Seeing that, I'm confident AI is not going to put me out of a job. I just hope I'll never have to do code review of AI-generated code...
AI will review the code 😂
To be honest I'm sure it would be better than a lot of code in the wild right now. I've seen some shit...
> releases faster sort3 that doesn't actually sort
> explains nothing
> dies
Seriously though, is there any code path in either version that results in B's value being assigned to R?
It was very ambitious reading that algorithm once and even hoping to understand what the one line max(min(a,b),c) actually does for every combination of a, b and c, let alone the whole thing. I was quietly shaking my head when you tried :)
This is along the same lines as that recent AI discovered optimization to matrix multiplication. Really neat!
Yeah, they were made by the same company
The Matrix multiplication problem has since been further optimized by 2 human mathematicians without using AI which is very cool.
@@beri4138 any material to read on this?
I think AI will eventually get us to the point of being able to transpile from any code base (libraries included) into any other language, for any operating system. Then, we will have true freedom to code anything for everything using your personal favorite languages and paradigms. Exciting!
Could be a great way to write in any language and still keep the best performance
Operating systems dictate only the format of the executable's header and the names of the kernel calls. If OSs hadn't dropped the ball so hard we could already just copy paste executables around since the assembly is identical.
or you can stop using bad languages altogether
@@alexnoman1498 and Libs available on the system
@@JorgetePaneteah true. There's only one good language (Haskell)
I’m sure Tom already implemented this into JDSL. That guy’s a genius!
See my reply to sunderkeen
AlphaGo used some kind of Monte Carlo simulation if I remember right. Very hard to know how you'd do backprop with something as discontinuous as machine instructions.
Does AI know how to measure dicts in bytes though
This article reads like an AI wrote it.
fun fact: AlphaDev's algorithm is only faster than quicksort when the input is 4 elements long; when the input length is higher than 4 it becomes worse. another fun fact: bogosort is faster than AlphaDev's algorithm when the input length is 4 :)
I feel like with such a small algorithm you could just do a brute-force tree search, though yeah, for larger ones I can see how it would be useful, in the same way it could search deeper into a tree of chess moves than mere pruning. If it could handle abstract syntax trees... though you'd want some NLP-based regularisation to make sure the code found is still human-readable.
The middle out algorithm in Silicon Valley was just an explanation of how hyperthreading works.
or context-based compression/optimization like the mirror of foveated rendering
Nice, thank you. Their approach reminds me of my childhood engineering: "what parts can we remove from this and it still function?" :)
Thank you Primeagen, I really liked this video
Why am I not completely surprised that it's the smaller sorting operations where the big win is made? Simply because a lot of engineers' focus often goes to the bigger things, while fewer people realise the impact of the uncountable smaller operations running everywhere, every day. It somewhat reminds me of comparing fast but high-latency hardware or software with slow but very low-latency hardware or software. It's easy to believe that you are making a big performance improvement, but if you are losing it in the number of small operations, reality will surprise you. AI that helps discover such things is a great thing in my opinion. I also like the fact that it's LLVM they are improving, because that's what Rust uses as well.
I love your content, man.
mmm... it skips saving the intermediate sorted state and jumps to the next sorting step instead. The original did the move [A,B,max,min] -> [min,B,max,min], saving min as the first element. Alpha just jumped to comparing min with B. But I feel something is wrong: the code implies that B is always less than max(A,C), so there is no way for B to end up in R. Even in the comments, R results in max(A,C). Is this just the first step of the sorting?
Sorry if I'm too dumb to grasp the algorithm fully, but I've tried to check their improved algorithm for 3 numbers (the first in the article) step by step, and it seems to me like it fails for 3 numbers in descending order, whereas the first algorithm produces the outcome given in the comments of that code. You mentioned that you might do a video checking that algorithm; could you please do that? I really would like to understand why it works for them.
There is an extra condition stated in the paper: the sequences are assumed to have b <= c
@@lterego so basically it took AI to realize that, if we already know that b <= c, one instruction can be skipped?
@@bernhardbinde4865 it's pseudocode in the article; the actual code looks different
Has anyone made a comparison with Intel's AVX-512 Sort? This utilizes Intel's 512-bit registers to facilitate sorting. I'm curious about its real-world speed. High-performance CPUs have numerous advanced features such as out-of-order execution, speculative execution, and register renaming. I've often found that trying to outperform these hardware-level optimizations and compiler heuristics can be quite challenging. From what I understand, the speed of AI-generated instructions in the AlphaDev project was determined by computing or simulating latency, rather than testing on actual hardware. This raises questions about the accuracy of these simulations in comparison to real-world performance. Moreover, the practice of invoking a specialized sorting function for small quantities of elements may prove to be less efficient than simply inlining a general sorting algorithm. If the CPU experiences a code cache miss because it didn't anticipate a call to the 4-element sort instead of the 3-element sort, the ensuing stall could result in a significant delay. Additionally, in my experience, it's quite rare to know the exact number of elements to be sorted at compile time. This could be a good example of microbenchmarking doesn't equal real-world performance. These are just some thoughts and speculations, and I'm interested to see if anyone has more insights or hard data on these points.
Well, it loses to SIMD sorting networks, but it wouldn't be a great "AI" headline, huh? Hype from nothing, as usual.
Optimizing for small hash sizes and small array sorting is something that algorithm theory doesn't cover. I see the potential, but from the article it seems this research focused on edge-case optimization, and I don't see how it will change anything meaningful. Remember, people: bubblesort is optimal on arrays of size 2; that doesn't mean you should use it.
So AI is doing the same thing I did 23 years ago?
100 Create a reference function
200 Measure it
300 Create a second, optimized function
400 Measure it
500 Look at the generated assembler code
600 Modify function 2 to produce better assembler code
700 GOTO 400
This way you learn how to write C code in such a way that you effectively teach the compiler the desired assembler output. Your C code may not be very readable, but once you have trained your human brain to write C code that way, every method you create will be close to the most optimized assembler speed. I actually created faster code than the Intel compiler and the Intel libraries could. No need for AI for that.
on top of that, the AI was incapable of producing a better optimization for the generic ordering of A,B,C compared to already existing algorithms, so the people who wrote the paper had to change the rules of the game and say "assume that B <= C"
There is previous work on optimal sorting networks, to find the optimal network to sort 3, 4, 5 elements. It minimizes the number of comparisons, not the number of instructions, which is a hardware-specific problem. Compiler optimization does not apply real mathematical optimization, because the search spaces are too big. Compilers apply 'peephole optimization', which optimizes small sections of code. This seems similar to what they do, but they optimize a small algorithm. A compiler usually has no clue about algorithms, but removing one unused 'mov' seems exactly the type of thing a peephole optimizer could do.
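For reference, the optimal 3-input sorting network mentioned here uses three comparators applied in a fixed, data-independent order; a small Python sketch:

```python
def compare_exchange(xs, i, j):
    # One comparator of a sorting network: order positions i and j.
    if xs[i] > xs[j]:
        xs[i], xs[j] = xs[j], xs[i]

def network_sort3(xs):
    # The optimal 3-element network: exactly 3 comparators, and the
    # sequence of compares never depends on the data, which is what
    # makes networks attractive for branchless/SIMD implementations.
    compare_exchange(xs, 0, 2)
    compare_exchange(xs, 0, 1)
    compare_exchange(xs, 1, 2)
    return xs
```

Note this minimizes comparator count, not instruction count on any particular CPU, which is the hardware-specific problem the comment above distinguishes.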
But that mov isn't unused.
I'm not an assembly expert, so is there something I missed? Because if I try the sequence A, B, C, D with A < C < B < D, it seems to output A, B, C, D.
The models are probably memorizing the training examples and tailoring the algorithm to optimize those examples. My hunch is that this causes the diminishing returns in performance as the number of inputs grow and doesn’t work across inputs outside the training set.
pinteresting
It was merged into the standard library after extensive performance testing. And again, they were optimizing an inner loop of a more general sorting function. Their function is only called when the regions left to be sorted are smaller than some threshold such that things like qsort are no longer optimal given the large constant factors.
WRONG! This is not an LLM, it's a reinforcement learning algorithm
Would be great to know which assembly instructions were available to AlphaDev; did it consider SIMD?
that brownie thing has earned my sub
My take is that if I make my code well organized it's probably easier for the compiler to optimize it. If I used an AI to optimize my code I would of course also like to be able to review the output before using it since a code optimization might be more efficient at the cost of maintainability and extensibility.
Does the assembly code for the 3-sort (original and alphadev both) even work? The output's largest value will never be B. [ex. for input 1 3 2, the output will be 1 3 2]
yea, I have the same issue
"Chat Jippity" might be the funniest thing I've heard this week
I don't know if that's actually a faster sort. I'm a novice, but stepping through the AI version, removing that instruction (mov S P) doesn't seem like it would make a valid sort. For example: If you have [3,2,1] in memory, there's no instruction in the AI version that writes to P when P is greater than R which means you just output to index 0 whatever you input into index 0. What am I missing?
It is assumed that b <= c
AI is just a fancier way to say brute-forcing
"Oh man, I've become super skeptical, all of a sudden"
hey... that is me
@@ThePrimeTimeagen Btw is that conversation with @healthygamergg ever happening??
13:40 What if B is the maximum? It doesn't seem to end up in the last position in either case?
I thought this was gonna give me existential dread but TrashDev out here catching strays like the back wall at a shooting range
11:57 - Classic "that's what she said" moment
JDSL doesn't need AI optimizations; it is optimized so far that optimizing it even further would result in negative execution time. Tom did not want to do that to us, he is also very caring.
10:35 In principle, yes, but in practice you can't just look at the assembly and determine if it's faster. It also depends on the hardware. Especially when doing things on small and medium-size arrays, cache locality, branch prediction, and other low-level processor-specific optimizations greatly impact, if not dominate, the performance. Algorithmic complexity only matters for very big inputs. For small arrays, you'd develop specialized sort-N algorithms that also take into account whether comparing or swapping elements is cheap or expensive. For example, for sort-5 there is not one optimal algorithm: there is one that minimizes comparisons and one that minimizes swaps; probably (I don't know for sure) there are also ones that do something in between. As Dr. Andrei Alexandrescu once said: you gain speed only by doing less work. But a priori, you don't know if swapping or comparing is more work. If you want examples: basic integers are cheap to compare and swap. C strings (const char*) are cheap to swap, but in general can be expensive to compare. Sorting static_vectors (boost::static_vector) by length is cheap to compare but expensive to swap (linear in capacity), and sorting them by content is expensive in both domains.
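The compare-versus-swap tradeoff above is easy to make visible. Here is a small Python sketch (the instrumentation is mine) contrasting a fixed 3-comparator network with insertion sort, which adapts to already-ordered input:

```python
def sort3_network(xs, stats):
    # Always exactly 3 comparisons, data-independent structure:
    # attractive when comparisons are cheap (e.g. integers).
    for i, j in [(0, 2), (0, 1), (1, 2)]:
        stats["cmp"] += 1
        if xs[i] > xs[j]:
            xs[i], xs[j] = xs[j], xs[i]
            stats["swap"] += 1
    return xs

def sort3_insertion(xs, stats):
    # Only 2 comparisons on already-sorted input, up to 3 otherwise:
    # attractive when comparisons are expensive (e.g. long C strings).
    for i in range(1, 3):
        j = i
        while j > 0:
            stats["cmp"] += 1
            if xs[j - 1] <= xs[j]:
                break
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            stats["swap"] += 1
            j -= 1
    return xs
```

On a sorted input the network still spends 3 comparisons while insertion sort stops after 2, which is exactly the "there is no single optimal sort-N" point: the winner depends on what a comparison and a swap cost for your element type.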
EDI is Extended Destination Index. But you really shouldn't think of it that way; it's just a general-purpose register. The nominal meanings are barely relevant anymore. The Extended part is because it's the 32-bit version of DI. RDI is the 64-bit version
RSP and RIP are the only ones I give a shit about generally, and you could argue that RSP is a general purpose register as well if you don't want a stack
@@mihailmojsoski4202 RBP if you ever want to ret.
i got so excited watching this video i had to finish viewing it in the toilet
Oh god that's exactly what I was waiting for today. Today I woke up and said, "You know what this day is missing? Some faster fucking sorting algorithms."
I gotta make a notification sound from "JUST GIVE ME THE MEAT!" & "YOU KNOW WHAT SORTING IS!" ... actually, I should just make a massive sound board of PrimeAgen quotes/bites. 😂
I kinda like ur rinkly brain prime.... You've definitely made me smile. Thx! ❤
I had this idea several years back, trying to run assembly language instructions through an AI and have it find a faster solution with identical state results. Unfortunately, my AI-from-scratch skills are insufficient for this level of task. (Or pretty much any task, tbh)
For AI to take our jobs, management and PMs would need to be able to accurately describe what they want. I think our jobs are safe.
I don't think they actually ran the code on modern CPUs with their out-of-order execution, register renaming, etc., so I'm going to have to call this into question. From the actual paper: "Latency value functions ... The latency head is used to directly predict the latency of a given program by using the program's actual computed latency". Actual computed? Like the one you counted out vs the one the AI guessed at? They also penalized longer code, likely disproportionately. I call the whole thing into question.
It's interesting as an idea to use AI to improve algorithms or generated code; I applaud the effort. I remember watching an interview with the CEO of DeepMind and thinking they were doing something positive, as they helped discover algorithms in protein folding and other areas. But this paper seems a lot like a PR attempt: enough commenters have noticed that it might not work (unless we all missed something) for the max value in some cases. Also, they really saved one MOV instruction. They highlight more lines that have changed, but only the comments really changed, as different things are compared while the exact same instructions are running. It's the same with the sorting of 4. The logic seems to go in a chain, and just removing a MOV instruction alters what is sorted next. If one unrolls the code like this, which would only be good for small counts, then in subsequent sorts of 5 or more it's just one MOV (the cheapest instruction) among many others, so I don't know how they got the 1.7% on 150000 elements (ok, maybe I do, if it's some Duff's Device with a few elements repeated). Suddenly, if you highlighted only the single instruction that changed, it wouldn't look as good in a PR presentation. It also reminds me of the news of AI finding faster ways to do matrix multiplication. Even in that case, as I understood it, it wasn't a generic improvement for every matrix out there, but very special cases, maybe of more dimensions than three, or specific data stored, or specific transformations, I don't remember. It's kinda similar here. AI won't magically make everything faster; it will just find the niche points where something could be saved, something a low-level programmer could do on their own if they focused their mind on the thing they want to solve. Sometimes even using a different algorithm or a different approach. And that's the other thing.
I am kinda skeptical of the idea that AI will fundamentally save us and make all code faster. Thinking of JBlow's or Casey's rants on how software could be 10x or more faster, even without going into low-level assembly, just by identifying obvious performance traps you might have left in your codebase: horrible single lines of code that, if you knew about them, would be easy to replace and would speed things up tremendously. Software is already too slow because of the assumption people make that "we don't need to care about performance anymore, optimization is the root of all evil", which is really a desire to move away from the machine as much as possible. If AI saves your ass 50% but you degrade the performance of your software 500%, it's still a 250% loss. We seem to be moving in the negative direction while we rejoice that news like this will "prove" once more that one doesn't need to care about performance anymore because the compiler is smarter than you. You see, that's the trap. It's when people see what GoDoT produces and think it's such magic that no living human would ever be able to produce it. If they were more performance-aware, even without going low-level, they could save way more performance than AI can ever do for them.
I'm thinking these sorts of optimizations will be more useful for higher-level code. For example, optimizing an entire function whose job is to find which polygons need to be rendered on screen and which can be skipped in a given 3D game engine. That's going to involve a lot of code, where even a seemingly small optimization can drastically reduce the number of CPU cycles. These sorts of algorithms also haven't been gone over and iterated upon for decades like some basic sorting algorithms have, so finding improvements is more likely.
Did the AI alpha beta prune in assembly
Just walked through the assembly example for the 3 element sort. Either the example is incomplete or there is a typo. It does not result in a sorted array if B is the greatest of the three elements. Kind of disappointing, as I was interested in understanding the optimization. Or maybe I'm just misunderstanding the assembly.
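For anyone else trying to trace the 3-element example, here is the standard branchless shape a sort-3 takes, sketched in C++ with min/max comparators. This is an illustration of the general technique (on integers, compilers typically lower `std::min`/`std::max` to cmp + cmov), not the paper's exact instruction sequence:

```cpp
#include <algorithm>

// A 3-element sorting network: three fixed compare-exchange steps,
// (a,c), (a,b), (b,c), sort any input without data-dependent branches.
void sort3(int& a, int& b, int& c) {
    int lo = std::min(a, c), hi = std::max(a, c);  // compare-exchange (a, c)
    a = lo; c = hi;
    lo = std::min(a, b); hi = std::max(a, b);      // compare-exchange (a, b)
    a = lo; b = hi;
    lo = std::min(b, c); hi = std::max(b, c);      // compare-exchange (b, c)
    b = lo; c = hi;
}
```

Working through the case where B is the greatest (e.g. 2, 3, 1) against a sketch like this makes it easier to spot whether the published listing really drops a case or just has a typo.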
you're actually the TechLead++ of this side of youtube lmfaoooo
I wonder what the difference would be between training it with a reward based on correct/incorrect sorts versus a Levenshtein-distance-based scoring system.
I think the future is using an AI to train different AIs to get what it wants. For example, you could have the main AI train a calculator AI and then call it anytime it needs to do calculations.
AI will then build AI web APIs for itself to do simpler tasks, probably written in jAvAsCrIpT
@@vighnesh153 Win win, no one writes JS anymore
it's actually quite heartening to realise Prime can't read assembly
I would love to see this written out. 3 blue 1 brown style animations would be awesome, but that's a stretch.
Most compilers have optimization options. All I heard was that the AI provides better optimization than current algorithms. This is technology likely to impact newer languages where compiler optimizations are less developed.
Honestly, this applied to "new" archs like RISCV would be interesting as well.
The term "high-level language" was defined back when the baseline was a really low-level language like assembly, i.e. the other language that is really hard to use. By that term and definition, C++ is as high-level as Python, because it is more readable and structured compared to the old languages. Differences in programming concepts, like pointers and platform-specific details, don't matter that much, since often they don't depend on the language alone.
I can't believe this time he actually disabled alerts
The nice thing about C++ is that you can be as high or low level as you want. You want to sort a vector? One line of code. You want to access a hardware device, read the bits out of its memory, and interpret them in a particular order as a kernel module acting as a device file? No problem either.
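The high-level end of that claim really is a one-liner; a minimal sketch (the `sorted` wrapper is just for illustration):

```cpp
#include <algorithm>
#include <vector>

// Sorting a vector in C++: the one-liner is the std::sort call.
std::vector<int> sorted(std::vector<int> v) {
    std::sort(v.begin(), v.end());  // that's the whole high-level story
    return v;
}
```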
Shogi is like Chess, but you can turn captured pieces against your opponent.
I feel very attacked I didn't know there was a brownie and trash dev correlation I have to get my life in order
So essentially AI discovered assembly and said: Guys, look how cool I am, assembly is faster!
AI explaining React with 87% accuracy is a great use of AI. I've been using ChatGPT to automate parts of my job that I knew were possible to automate, but I didn't have the knowledge, or the time to seek that knowledge, to do it. Even with its mistakes, I could write some sort of functioning code and ask about the errors that came up. I'm the sort of person who learns by doing, but I'm also too lazy to do things just for the sake of learning, so being able to actually do something useful by tapping into that 87%-accurate information is productive both for my job and for my coding experience. Now I use the time I freed up to automate other parts of my job, getting dopamine hits for knowing I can be lazy in the future, despite working more than I used to. It's fucking brilliant if you ask me.
with AI, we are no longer going to be software craftsmen as in carpenters/woodworkers/masons we are going to be gardeners.
"Quantum Bogosort" im dead 💀
this guy is like tobuscus for programming
Skipping video part where youtuber skips part of content is pure entertainment
They made a top tier AI perform a blind brute force search adding random assembly instructions. Yep looks like I'm out of a job XD.
This is the power of AI focused on solving one problem.
agreed. i think this is the true power of ai
Faster for shorter sequences indeed corresponds with asm level optimization.
This reminds me of a question I once got at a job interview: exchange the values of two variables, A and B, without using a third variable (e.g. C). I didn't know the answer, and they wouldn't tell me what it was, so after the interview I spent a while thinking about it and finally figured it out:
A = A xor B
B = A xor B (expands to A xor B xor B, which yields the original value of A)
A = A xor B (expands to A xor B xor A, which yields the original value of B)
Why would someone need something as "stupid" as this? Well, it actually makes perfect sense in assembly language (regardless of the CPU architecture), where you have a very limited number of data registers, and the speed of code execution depends both on not accessing memory locations (using CPU registers as much as possible) and on code locality (instruction cache hits). An AI optimizing an algorithm does exactly the same thing a programmer does when trying to optimize his or her code: trying out (in their head first, and then in practice) as few instructions as possible, while checking the code's validity in boundary conditions. This, however, has a nasty tendency to produce highly specialized code that usually has some very ugly side-effects when used outside of its intended (data) environment... which is what lazy programmers always do, i.e. try to use somebody else's code when and where it was not meant to be used, which (almost always) causes all kinds of (unpredictable) catastrophic software/firmware/hardware failures ("it failed for no reason whatsoever" being the usual post-fact excuse). I have no doubt that people trying to use AI-developed optimizations are going to do exactly the same thing: follow in lazy programmers' footsteps and start causing all kinds of (unpredictable) catastrophic system failures... and the more AI-optimized the code becomes, the more catastrophic those failures are going to be...
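The three XOR steps above, sketched in C++. The aliasing guard is exactly the kind of boundary condition the comment warns about: if both references name the same object, the first step computes a ^ a == 0 and wipes the value.

```cpp
// XOR swap: exchange two values without a temporary.
void xor_swap(unsigned& a, unsigned& b) {
    if (&a == &b) return;  // guard: a ^= a would zero the value
    a ^= b;  // a = A ^ B
    b ^= a;  // b = B ^ (A ^ B) = A
    a ^= b;  // a = (A ^ B) ^ A = B
}
```

(On modern CPUs with register renaming, three mov/xchg-style instructions are usually at least as fast, so this is mostly a register-pressure trick.)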
and, for some "inexplicable" reason, nuclear (including fusion) reactors come to mind right now.
Can someone explain how 3:35 Theorem of Trash uses brownies as a metric? Wouldn't milk+brownies be perfect equilibrium so milk by itself is also trash? My experience with trash theorems uses other metrics like pills, littering and non-indented code.
This is sick and all but I'm not convinced that AI would've come up with the Fast Inverse Square Root algo
very interesting point... there is MOST certainly local minima issues
How many humans would've come up with the fast inverse square root algorithm?
@@isodoubIet I don't believe in this idea that someone is a super genius to come up with this algorithm, and that no matter how hard you tried, you wouldn't. Optimization is a state of mind. It seems like magic from the outside. I've read about some very clever algorithms, and someone told me "I would never have thought of it no matter what, the guy is a genius". I might be cocky, but I think I would have found a similar algorithm trick if this guy hadn't found it before me. I have also written things that are "thinking out of the box" for retro demoscene platforms, things that seem crazy, but that's the magic if one has motivation. You start with something and you say "there is no way I can make this 10% faster", then you insist, and if you get in the right mentality you end up with something "3x faster", and you look back and wonder how that's possible. I think it's exaggerated both how some clever algorithms came to be and how compilers and AI are so smart that no human short of a robot with an IQ of 200 can match them.
@@Optimus6128 The vast majority of, say, physicists, couldn't have come up with Feynman's path integral formulation of quantum mechanics if they had a lifetime to think about it, so I have a hard time accepting that fundamental differences in ability simply don't exist. The fast inverse sqrt in particular requires at least two non-obvious jumps, one which is understanding that two iterations of Newton's method are "good enough" and faster enough to matter, and secondly the bit manipulation with the magic constant. The programmer in question would also have to know that the inverse square root is a good candidate for optimization in the first place.
@@isodoubIet I'm not saying that ability doesn't matter, but I don't buy the myth that these things are so magical they need extraordinary ability. Feynman must have looked at places where nobody dared to look; a lot of scientists are stuck in the same thinking boxes. Inverse sqrt and Newton's approximations are common in math, so one could easily experiment with different numbers of iterations and compare with the original square root. Bit manipulation with magic constants seems crazy, but there was a gradual approach until someone got there. It looks like magic from a distance.
My process is to stare at code, pace for a bit, stare at code, come up with solution, find out that solution prevents next step from succeeding, refactor everything, profit. Except for the pacing, this is how AI currently functions. It's scary because this is my livelihood and they're so close to replicating it. :(
have you considered getting it right the first time?
@@mihailmojsoski4202 Always. But sometimes you gotta wipe twice.
00:00:00 - Alpha Dev discovers faster sorting algorithm
00:02:50 - Sorting and coding language discussion (C++, assembly instructions)
00:05:38 - Exploring algorithms & computer code in terms of sorting, assembly, & machine code
00:08:13 - Alpha Dev discovers new sorting algorithms leading to 70% speed improvement.
00:10:56 - Alpha Dev defeats go player w/ "Swap & copy move" technique
00:13:51 - Dev discovers faster sorting algorithms to improve hashing algorithm and release into open source.
00:16:33 - Alpha dev demonstrates optimizing algorithms, inspiring developers, & exploring high level language potential - A great use of AI.
why is tpope in the background at 18:48?
Can they write algorithms in JDSL though?
BIG QUESTION: - will this improve JSDL ?
I simulated the "3sort" algorithm on the input list 3,2,1, and the output is 3,2,3. It doesn't work. After the first block of cmovs, only S will be 1, and it can only overwrite anything if it is greater than some of the other ones. S being the minimum of these four registers is not a situation they've considered. The elided mov is actually really important.
Those look like sorting networks. Those are a predetermined sequence of min/max operations that always result in correctly sorted data for some fixed size
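A minimal sketch of what such a network looks like, here the classic optimal 5-comparator network for 4 elements (a standard construction for illustration, not necessarily the exact sequence AlphaDev produced):

```cpp
#include <algorithm>
#include <array>

// One compare-exchange: the fixed min/max step a sorting network is built from.
void cex(int& x, int& y) {
    int lo = std::min(x, y), hi = std::max(x, y);
    x = lo; y = hi;
}

// Optimal 4-element network: comparators (0,1)(2,3)(0,2)(1,3)(1,2).
// The sequence is fixed in advance, so there are no data-dependent branches.
void sort4(std::array<int, 4>& a) {
    cex(a[0], a[1]); cex(a[2], a[3]);
    cex(a[0], a[2]); cex(a[1], a[3]);
    cex(a[1], a[2]);
}
```

Because the comparator sequence never depends on the data, this is exactly the kind of code that lowers to straight-line cmp/cmov assembly, which is why shaving even one mov shows up in the benchmarks.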
Seems to me like the AI improved corner cases of some algorithms that people didn't previously optimize for, like sorting 3 or 4 elements. I'm not an expert, but I would guess it wasn't worth spending much time on this in the past for the standard library, and whoever needed to optimize for that case did it on their own in in-house libraries. Still a valid use case: a lot of things that weren't worth the time to optimize for before could now be optimized automatically like this without wasting human effort...
There's been a lot of work on this area because it's the inner loop of a bigger algorithm and is very hard to analyze and test. It was known that there were improvements there. You can find talks at major conferences about this.
If I understand this correctly given the little amount of information I have, it seems as though their AI did to assembly instructions what any decent static analyzer can do for a high level language.
Tell me if I'm dumb, but the AI's algorithm is broken, isn't it? Let's say the min value in the 3sort algorithm is C. The returned Memory[0] (the min value of the array after sorting) is ALWAYS min(A, B), so it can never be C. So...