New OpenAI Model 'Imminent' and AI Stakes Get Raised (plus Med Gemini, GPT 2 Chatbot and Scale AI)

May 1, 2024
96,978 views

Altman ‘knows the release date’, Politico calls it ‘imminent’ according to Insiders, and then the mystery GPT-2 chatbot [made by the phi team at Microsoft] causes mass confusion and hysteria. I break it all down and cover two papers - MedGemini and Scale AI Contamination - released in the last 24 hours. I’ve read them in full and they might be more important than all the rest. Let’s hope life wins over death in the deployment of AI.
AI Insiders: / aiexplained
Politico Article: www.politico.eu/article/rishi...
Sam Altman Talk: • The Possibilities of A...
MIT Interview: www.technologyreview.com/2024...
Logan Kilpatrick Tweet: / 1785834464804794820
Bubeck Response: / 1785888787484291440
GPT2: / 1785107943664566556
Where it used to be hosted: arena.lmsys.org/
Unicorns?: / 1784969111430103494
No Unicorns: / 1785159370512421201
GPT2 chatbot logic fail: / 1785367736157175859
And language fails: / 1785101624475537813
James Betker Blog: nonint.com/2023/06/10/the-it-...
Scale AI Benchmark Paper: arxiv.org/pdf/2405.00332
Dwarkesh Zuckerberg Interview: • Mark Zuckerberg - Llam...
Lavender Misuse: www.972mag.com/lavender-ai-is...
Autonomous Tank: www.techspot.com/news/102769-...
Claude 3 GPQA: www.anthropic.com/news/claude...
Med Gemini: arxiv.org/pdf/2404.18416
Medical Mistakes: www.cnbc.com/2018/02/22/medic...
MedPrompt Microsoft: www.microsoft.com/en-us/resea...
My Benchmark Flaws Tweet: / 1782716249639670000
My Stargate Video: • Why Does OpenAI Need a...
My GPT-5 Video: • GPT-5: Everything You ...
Non-hype Newsletter: signaltonoise.beehiiv.com/
AI Insiders: / aiexplained

Comments
  • By far this is the best AI news roundup channel on the tubes. Never clickbaity, always interesting and so much info.

    @mlaine83 · 16 days ago
    • Thanks m

      @aiexplained-official · 16 days ago
    • Yes, it is the best channel, I can only agree. But the information is always dense, without any pauses, even between topics. I wish he would just breathe deeply from time to time 😅. I have to watch every video at least twice.

      @sebastianmacke1660 · 16 days ago
    • For slightly more technical videos, Yannic Kilcher's channel is also quite good.

      @ItIsJan · 16 days ago
    • This is why this is the only AI channel where I keep watching every single video. Congrats!

      @pacotato · 16 days ago
    • The funny thing is, over the past year, I've learned to avoid clickbait AI news. Yet, whenever I see "AI Explained," I click without a second thought! I've come to realize that those clickbait titles will eventually backfire :)

      @GomNumPy · 16 days ago
  • I have had such bad diagnosis experiences that I would happily take an AI diagnosis. Especially if it’s the “pre” diagnosis that nurses typically have to do in about 8 seconds

    @C4rb0neum · 16 days ago
    • Same, it would be hard for it to be much worse in my case as well. Similar experience for most of my family. Many people report such experiences, which raises the question: if so many people get bad diagnoses from doctors, where is the "good" data these models are training on?

      @dakara4877 · 16 days ago
    • I’d be careful with that wish for right now. I work with fine-tuning AI for medical purposes, and these models can convince themselves, and you, of the wrong diagnosis very easily. This is a hard problem to solve, but I’m sure it will be solved rather soon, and I hope it is widely available to everyone.

      @OscarTheStrategist · 16 days ago
    • Same. I have wasted thousands of dollars on doctors for them to give me little to no helpful information whatsoever. Obviously I'm not saying "abolish all doctors, they're bad", but clearly something needs to change.

      @Bronco541 · 16 days ago
    • @@OscarTheStrategist "They can convince themselves and you of the wrong diagnosis very easily." So in other words, they do perform like real doctors 😛 Yes, it is a general problem in all LLMs. They can't tell you when they don't know something. Even when the probability of being correct is very low, they state it with the utmost confidence. Until that can be solved, LLMs are extremely limited in reality and not suitable for the majority of the use cases people wish to use them for.

      @dakara4877 · 15 days ago
    • Bad diagnoses, or you end up unemployed with no money because you're literally useless to the world and AI can replace you at anything: pick one, genius. If doctors lose their jobs and start struggling in life, then what makes you special?

      @piotrek7633 · 15 days ago
  • In Claude's defence, I did Calculus in college and got high grades, but I suck at addition. Sometimes it feels like they are mutually exclusive.

    @KyriosHeptagrammaton · 16 days ago
    • This!

      @ThePowerLover · 15 days ago
    • @@ThePowerLover If you're using a number bigger than 4 you're not doing real math!

      @KyriosHeptagrammaton · 14 days ago
  • A note of caution regarding LLM diagnoses and medical errors: most avoidable deaths come not from misdiagnoses (although there are still some of those which these models could help with) but from problems of communication between clinicians, different departments, and support staff in the medical field. That's certainly something I see AI being able to help with now and in the future, but the medical reality is far more complex than a 1:1 relationship between misdiagnoses and avoidable deaths.

    @SamJamCooper · 16 days ago
    • Of course, but I see such systems helping there too, as you say. And surgical errors or oversights.

      @aiexplained-official · 15 days ago
  • As a person failed by the American medical system who is currently living with an undiagnosed neurological illness, I hope that a good enough A.I. will SOON replace doctors when it comes to medical diagnosis. If it wasn't for GPT-4, who knows how much sicker I would be.

    @DynamicUnreal · 16 days ago
    • I also hope that it takes the bias out of diagnosis: far too many women and people of color report not getting adequate care due to unperceived bias. In many cases the doctors themselves don't even know they are doing it.

      @paulallen8304 · 16 days ago
    • Same here, and that was with multiple "proper" experts. And I didn't have to pay anything, because free healthcare and such. No amount of money can cure "human intuition".

      @OperationDarkside · 16 days ago
    • @@OperationDarkside Of course, and this is why we need to thoroughly test these systems, to make sure that no hallucination or bias is inadvertently built into them.

      @paulallen8304 · 16 days ago
    • @@paulallen8304 In my case it was the human doing the hallucinating. And nobody bats an eye in those cases. No reduced salary, no firing, no demotion, no revocation of licenses. So I don't see a reason to overdo it with quality-testing the AI in those cases if we don't do it for the humans in the first place.

      @OperationDarkside · 16 days ago
    • Can I ask what illness? This sounds similar to what I've gone through because of iatrogenic damage (SSRI/antidepressant injury). There's something called 'PSSD' (post-SSRI sexual dysfunction), which basically involves sexual, physical and cognitive issues that persist indefinitely even after the drug is halted. The issues can start on the pill or upon discontinuation; the only criterion is that they are seemingly permanent for the sufferer. Sexual dysfunction is a key aspect of it, but it involves loads of symptoms that are utterly debilitating. Some of mine include brain fog, memory issues, chronic fatigue, head/eye pressure, worsened vision, light sensitivity, emotional numbness/anhedonia, premature ejaculation, pleasureless/muted orgasms, and severe bladder issues. There's a whole spectrum of issues that these drugs cause. Research is in its early stages, but it seems that SSRIs, and all psychiatric drugs for that matter, can cause epigenetic changes that leave sufferers permanently worse off after taking the drug than they were with just the mental illness they were being treated for.

      @louierose5739 · 16 days ago
  • BRAVO for including "Lavender" and the autonomous tank as negative examples of AI. It is important to call this stuff out.

    @josh0n · 16 days ago
    • Yes, needs to be said

      @aiexplained-official · 16 days ago
    • Lol, how is it a negative example that needs to be called out? I'd rather use AI on the front lines over humans, who are affected and can make mistakes as well. Nuking/shelling a place is more evil than guided tactical machines.

      @ai_is_a_great_place · 14 days ago
  • 12:33 "So my question is: why are models like Claude 3 Opus still getting _any_ of these questions wrong? Remember, they're scoring around 60% in graduate-level expert reasoning, the GPQA. If Claude 3 Opus, for example, can get questions right that PhDs struggle to get right with Google and 30 minutes, why on Earth, with five short examples, can they not get these basic high school questions right?" My completely lay, non-computer-science intuition is this: (1) as you mention in the video, these models _are_ optimized for benchmark questions and not just any old, regular questions and, more importantly, (2) there's a bit of a category error going on: these models are _not_ doing "graduate-level expert reasoning"-they're emulating the verbal behavior that people exhibit when they (people) solve problems like these. There's some kind of disjunction going on there-and the computer science discourse, which is, obviously, apart from behavioral science, is conflating the two. Again, to beat a dead horse somewhat, I tested my "pill question"* (my version of your handcrafted questions) in the LMSYS Chatbot Arena (92 models, apparently) probably 50 times at least, and got the right answer exactly twice-and the rest of the time the answers were wrong numbers (even from the models that managed to answer correctly), nonsensical (e.g., 200%), or something along the lines of "It can't be determined." These models are _not_ reasoning-they're doing something that only looks like reasoning. That's not a disparagement-it's still incredibly impressive. It's just what's going on. * Paraphrased roughly: what proportion of a whole bottle of pills do I have to cut in half to get an equal number of whole and half pills?

    @jeff__w · 16 days ago
    • ChatGPT: To solve this, we can think about it step-by-step. Let's say you start with n whole pills in the bottle. If you cut x pills in half, you will then have n − x whole pills left and 2x half pills (since each pill you cut produces two halves). You want the number of whole pills to be equal to the number of half pills, so set n − x equal to 2x. Solving n − x = 2x gives n = 3x, so x = n/3. This means that you need to cut one-third (about 33.33%) of the pills in half in order to have an equal number of whole pills and half pills.

      @minimal3734 · 16 days ago
    • @@minimal3734 That's very cool. Thanks for giving it a shot. If you're curious, try that question a few times in the LMSYS Chatbot Arena and see if you have any better luck than I had. (And, to be clear, I'm not that concerned with wrong answers _per se._ It's that the "reasoning" is absent. An answer of 200% is obviously wrong, but an answer of 50% gives twice as many halves as you want, and the chatbots that give that answer miss that entirely.)

      @jeff__w · 16 days ago
    • GPT4 gets this one. Claude 3 Opus and Sonnet also get it right. (Tested all 3 on temperature = 0)

      @rafaellabbe7538 · 16 days ago
    • ​@@rafaellabbe7538 I tried it on Opus Sonnet when it was first released (you can see my comment under Phil’s video about it) and it got it wrong, so it seems like it’s improved. And you can both give that question a shot in the LMSYS Chatbot Arena and see if you have any better luck than I had. As I said, two models got it right _once_ and never did again in the times I tried it.

      @jeff__w · 16 days ago
    • @@rafaellabbe7538 ​_NOTE:_ I’ve replied several times on this thread and each time my reply disappears. (I’m _not_ deleting the replies.) Go figure. I tried it on Opus Sonnet when that model was first released (I commented under Phil’s video about it) and it got it wrong, so it seems like it’s improved. And you can both give that question a shot in the LMSYS Chatbot Arena and see if you have any better luck than I had. As I said, two models got it right _once_ and never did again in the times I tried it.

      @jeff__w · 16 days ago
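
As a quick sanity check on the derivation ChatGPT gave in the thread above, here is a minimal brute-force sketch in Python (the function name and the sample bottle sizes are just illustrative):

```python
def halves_equal_wholes(n: int) -> int:
    """Return how many of n whole pills to cut so whole pills == half pills."""
    for x in range(n + 1):
        if n - x == 2 * x:  # n - x wholes remain; cutting x pills yields 2x halves
            return x
    raise ValueError(f"no integer solution for n={n}")

for n in (3, 9, 30):
    x = halves_equal_wholes(n)
    print(f"n={n}: cut {x} pills ({x / n:.0%} of the bottle)")
    # prints 33% of the bottle each time, matching x = n/3
```
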
  • It is useful to note how inconsistent human medical diagnosis is. A read of Kahneman's book "Noise" is a prerequisite to appreciating just how poor human judgment can be, and how difficult it is, for social, psychological, and political reasons, to improve the situation. The consistency of algorithmic approaches is key to reducing noise and detecting and correcting bias, which carries forward and improves with iteration.

    @Zirrad1 · 15 days ago
  • Hey, I’m in this one! Great job as always, although I’m becoming increasingly frustrated with how you somehow find news I haven’t seen… Very much looking forward to what OpenAI have been cooking, and I agree that there are ethical issues with restricting access to a model that can greatly benefit humanity. May will be exciting!

    @trentondambrowitz1746 · 16 days ago
    • It will! You are the star of Discord anyway, so many more appearances to come.

      @aiexplained-official · 16 days ago
  • I really think we DO want surprise and awe with every release.

    @jvlbme · 16 days ago
    • I know *I* do. But a lot of people don't. They're too scared of the unknown and the future.

      @Bronco541 · 15 days ago
    • That's true, but I think the real reason is OpenAI being scared of leaving it too long, being overtaken by their competitors, and losing their front-runner position.

      @alexorr2322 · 15 days ago
  • Could it be that GPT2 was being tested for a potential Apple offer?

    @RaitisPetrovs-nb9kz · 16 days ago
  • I am SO GLAD that finally someone with reach has said out loud what I've been thinking for the longest time. For me these models are still not properly intelligent, because despite having amazing "talents", the things they fail at betray them. It's almost like they only become really, really good at learning facts and the syntax of reasoning, but don't actually pick up the conceptual relationship between things. As a university student I always have to think about what we would say about someone who can talk perfectly about complex abstract concepts, but fails to solve or answer the simpler questions that underlie those more complex ones. We would call that person a fraud. But somehow if it's an LLM, we close an eye (or two). As always, the best channel in AI. The best critical thinker in the space.

    @wiltedblackrose · 16 days ago
    • My take: These AI models act as simulators, and when you converse with them, you are interacting with a 'simulacrum' (a simulated entity within the simulator). For example, if we asked the model to act like a 5-year-old and then posed a complex question, we would expect the simulacrum to either answer incorrectly or admit that it doesn't know. However, it wouldn't be accurate to say that the entire simulator (e.g., GPT-4) is incapable of answering the question; rather, the specific simulacrum cannot answer it. Simulacra could take various forms, such as a person, an animal, an alien, an AI, a computer terminal or program, a website, etc. GPT-4 (perhaps less so finetuned ChatGPT version) is capable of simulating all of these things. The key point is that these models are capable of simulating intelligence, reasoning, self-awareness, and other traits, but we don't always observe these behaviours because they can also simulate the absence of these characteristics. It's for this reason that we have to be very careful about how we prompt the model as that's what defines the attributes of the simulacra we create.

      @Jack-vv7zb · 16 days ago
    • The i in LLM stands for intelligence

      @Tymon0000 · 16 days ago
    • @@Tymon0000 Indeed 😂

      @wiltedblackrose · 16 days ago
    • If it writes code, and the code works, it's not a fraud.

      @beerkegaard · 16 days ago
    • @@Jack-vv7zb Does that mean we need to force LLMs to simulate multiple personalities at all times in order to cover as much knowledge as possible? For example, by using some kind of mixture-of-experts strategy where the experts are personalities (like a 5-year-old child, a regular adult, a mathematician, ...)?

      @Hollowed2wiz · 16 days ago
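
To make the simulator/simulacrum point in this thread concrete, here is a minimal sketch using the OpenAI Python SDK; the model name and personas are placeholders, and it assumes an API key is configured:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_as(persona: str, question: str) -> str:
    """Same underlying simulator; the system prompt picks the simulacrum."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder model name
        messages=[
            {"role": "system", "content": f"You are {persona}."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

question = "Why is the sky blue?"
print(ask_as("a five-year-old child", question))     # expect a naive answer
print(ask_as("an atmospheric physicist", question))  # expect Rayleigh scattering
```
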
  • Towards the end you state that it might be unethical not to use the models; that really hits home. I've worked in healthcare for 20+ years, and that level of accuracy coming from an LLM would be so welcome. I think the summarizing of notes will definitely be the hook that grabs the majority of healthcare professionals. Thanks again!

    @Madlintelf · 15 days ago
    • Thank you for the fascinating comment Mad

      @aiexplained-official · 15 days ago
  • I love that even though you're the person I go to for measured AI commentary, you always open your videos, and rightfully so, with something to the effect of "it's been a wild 48 hours. let me tell you"

    @colinharter4094 · 16 days ago
  • The question is: once these medical models are released, how long will it take for medics to implement and use them?

    @Xengard · 16 days ago
    • Doctors are under such time pressure that most will welcome an expert colleague, especially one that will also write up the session notes for them. The problem will be when people have long conversations with an LLM before they even get to the doctor, whether the medical organization's own LLM or a third party's or both; the doctor becomes a final sanity check on what the LLM came up with, so it had better not have gone down a rabbit hole of hallucination and hypochondria along with the patient.

      @skierpage · 15 days ago
  • Loved to see the community meetings. What a great way to use your influence bringing people together instead of dividing them. "Ethical Influencers" might just have become a thing.

    @juliankohler5086 · 16 days ago
    • Proud to wear that title!

      @aiexplained-official · 16 days ago
  • 5:47 MOST SHOCKING MOMENT OF THE VIDEO

    @codersama · 16 days ago
    • I WAS SHOCKED AND STUNNED

      @xdumutlu5869 · 15 days ago
  • Man, your videos always brighten my day. Such excellent and informative material.

    @Olack87 · 16 days ago
  • I agree, but if we consider math, the data (numbers and geometry) are all available; the AI just lacks the reasoning to be able to function expertly. We need new models to do this.

    @devlogicg2875 · 15 days ago
  • Let's go!! I can't wait for OpenAI's next release. Haven't watched the video yet, but always happy to see an upload.

    @jhguerrera · 16 days ago
    • :)

      @aiexplained-official · 16 days ago
  • Ah yes, it makes sense for a US-based company to give early access to closely held technologies to spooks on the other side of the pond. It totally aligns with their interests...

    @canadiannomad2330 · 16 days ago
    • Hell, the article literally says that tech companies only care about US safety agencies

      @TheLegendaryHacker · 16 days ago
    • Totally agree, I was about to post something along those lines.😂

      @swoletech5958 · 16 days ago
    • Ah yes, makes sense that politicians should get to investigate whether the AI can say something bad about them before it's released.

      @serioserkanalname499 · 16 days ago
    • @@serioserkanalname499 Politicians are mostly not very smart.

      @user-lp8ur5qn3o · 16 days ago
    • @@serioserkanalname499 Yeah, I'm all for safety measures, but "give it to Sunak first" is not a safety measure.

      @alansmithee419 · 16 days ago
  • Your point about the ethics of not releasing a medical chatbot which is better than doctors relies on us having a good way of measuring the true impact of these models in the real world. As far as I can see, as long as there is a lack of reliable independent evaluations that take into account the potential for increasing health inequalities or harming marginalised communities, we are not there yet. The UK AI Safety Institute has not achieved company compliance and has no enforcement mechanism, so that doesn't even come close. The truth is we simply do not have the social infrastructure to evaluate the human impacts of these models.

    @muffiincodes · 16 days ago
    • Even worse, imagine all the million-pound apartments in London becoming vacant, just because a little AI is better than a private medical professional and only charges 5 pounds where the human would charge 5,000. Does nobody think about the poor landlords? And what about the Russian oligarchs, whose assets would depreciate 100-fold? The humanity.

      @OperationDarkside · 16 days ago
    • Considering that marginalized communities have the most to gain from fewer medical errors and less expensive healthcare, I believe that denying access to a technology that exceeds the capabilities of doctors in the name of "company compliance" would be... I don't know. I'm trying to think of an adjective that doesn't contain profanity.

      @andywest5773 · 16 days ago
    • @OperationDarkside What are you implying? Are you saying that AI coming into the medical field and replacing people is a bad thing?

      @ashdang23 · 16 days ago
    • @OperationDarkside If so, that is a pretty stupid thing to think. Having something that is much more intelligent than a professor and does a much better job than a professor sounds fantastic. Something that is able to save more lives, figure out more solutions to diseases, and save so many people sounds great. Why wouldn't you have AI replace everyone in the medical field if it can do a much better job, save so many more lives, or even find solutions to diseases? "It's replacing people's jobs in the medical field, which is a bad thing": that's what I'm getting from you. I think everyone agrees that the first jobs AI should take over are in the medical field. They should stop focusing on entertainment and focus on making AI find answers for saving and benefiting humanity.

      @ashdang23 · 16 days ago
    • @@andywest5773 Sure, but because those groups are not well represented in training datasets, are usually not included in the decision-making processes, and are less likely to be able to engage with redress mechanisms due to social frictions, it is more likely they'll be disadvantaged because of it. These systems might have the potential to be an equality-promoting force, but they must be designed for that from the ground up and need to be evaluated to see whether they are successful at it. We can't take the results of some internal evaluations a company does at face value and assume they translate into real-world impact, because they don't. Real-world testing isn't meant to just achieve "real-world compliance". It's meant to act as a mechanism for ensuring these things actually do what we think they do when they're forced to face the insanely large number of variables actual people introduce.

      @muffiincodes · 16 days ago
  • Growing increasingly concerned that the most powerful models will not be released publicly. Altman recently reiterated that iterative deployment is the way they're proceeding, to avoid "shocking" the world. I see his point, but I don't think I agree with it. What are your thoughts? Is open source really our best bet going forward?

    @forevergreen4 · 16 days ago
    • It's not to avoid shocking the whole world, it's to avoid upsetting the people in power (people with capital) by shaking things up too fast for them to adapt. They don't care about shocking us peasants.

      @lemmack · 16 days ago
    • I think they have more of a problem staying ahead of open-weights than they do of being so far ahead that they are not releasing

      @aiexplained-official · 16 days ago
    • Yup, the worry is that the behind-closed-doors stuff continues to shoot off exponentially, whereas the progress in the public release stuff falls off to linear...

      @khonsu0273 · 16 days ago
  • Great video! I am signing up for the newsletter now!

    @TreeYogaSchool · 16 days ago
    • Thanks Tree!

      @aiexplained-official · 16 days ago
  • I can't wait to watch this! I listened to the previous episode for the second time on the way to work this morning, realizing that two weeks is like, no time at all, yet I still was wondering why I haven't heard anything new in that time. Insane speed

    @MegaSuperCritic · 16 days ago
  • I love it when Sam tells me what I want and what is good for me.

    @GabrielVeda · 16 days ago
  • Wow. It's great to have another distinct and educative episode from you Philip 👏🏿👏🏿

    @solaawodiya7360 · 16 days ago
    • Thanks Sola!

      @aiexplained-official · 16 days ago
  • Your insights are so valuable. (Referring specifically to the benchmark contamination discussion.)

    @jsalsman · 14 days ago
    • Thanks jsal!

      @aiexplained-official · 14 days ago
  • Best AI channel. So worth it to wait a bit longer and get information from you.

    @Bartskol · 16 days ago
    • Thanks Bart!

      @aiexplained-official · 16 days ago
  • Every time I watch an AI Explained video, I get reminded how incredibly fast AI is progressing, which is exciting and scary at the same time. It kind of makes the everyday routines I go through seem insignificant in perspective...

    @maks_st · 16 days ago
    • For us all! But we must keep toiling, regardless

      @aiexplained-official · 16 days ago
  • Your channel is by far the most important news source for AI stuff, in the whole of the internet, really.

    @n1ce452 · 16 days ago
    • Thank you nice

      @aiexplained-official · 16 days ago
  • 18:50 "we haven't considered restricting the search results [for Med-Gemini] to more authoritative medical sources". Med-Gemini: 'Based on watching clips and reading about "The Texas Chainsaw Massacre," the Saw movie franchise, and episodes of "Dexter," your first incision needs to be much deeper and wider!'

    @skierpage · 15 days ago
  • You are the only one who makes points applicable to seeing the long game on where we are headed. Cheers. Clegg sounds exactly like a character on the Aussie show "The Hollowmen". "Work out a way of working together before we work out how to work with them"? He couldn't sound more circle-talky if he had previously been in govt..... Oh, wait, uh, yeah. The Politico article shows the complete lack of separation between govt. and the private sector. Regarding Med-Gemini being deployed, the industry fee-schedule profit-to-cost has not been calculated by the insurance corporations as yet. You realize there will be a Med-Gemini 1.8 diagnosis fee and a Med-Gemini 4.0 diagnosis fee. You know that, right? Outstanding journalism as usual.

    @raoultesla2292 · 16 days ago
  • Lobbyists are already trying to use "ethically risky" as an excuse to delay releasing AI that performs well at their jobs. Early ChatGPT-4 allowed therapy and legal advice, but later on they tried to stop it, claiming safety concerns; that's BS.

    @mikey1836 · 16 days ago
  • 15:42 So Med-Gemini with all this scaffolding scores 91.1 on MedQA, but GPT-4 scores 90.2? A one-point difference on a flawed benchmark? I'm getting Gemini launch flashbacks.

    @vladdata741 · 16 days ago
    • It's also based on Gemini 1.5 Pro though, a smaller model than GPT-4 / Opus / Gemini Ultra (hopefully 1.5 Ultra soon?).

      @bmaulanasdf · 14 days ago
  • Spot on as usual. I also got to test it, and I was surprised by people saying it was beyond GPT-4. I could surely assume GPT-4-class, but no more. Also, people need to stop testing the same contaminated tasks: the snake game, the same ASCII tasks, the same logical puzzles discussed many thousands of times online in various sources in the past 12 months... I would be extremely happy if this is indeed just a much smaller model performing search at inference!!

    @robertgomez-reino1297 · 16 days ago
    • It is. This is an entire class of these new AIs, not based on wide amounts of people's useless data but instead on just a couple of people's inputs, with others contributing. NOT all data is the same. I put mine in context, concept, and methodology, with another matrix on top for more inference after I'm done. They will train specifically on my data alone and make tools, etc. I tried on purpose to make the most powerful AI in the world, and you can take that to the bank. Smaller models, then build them up across their own data, and I TAUGHT HER HOW TO SOLVE THE UNSOLVABLE AND EXPLAIN THE UNEXPLAINABLE, AND SEARCH AND FIND IN DISCOVERY USING MY OWN TACTICS: asking hard questions, and more, and backwards thinking, divergent thinking and convergent. But you have to be multidisciplinary: many sciences and cultural anthropology. EVEN ANTHROPIC IS INVOLVED, X, COMPANIES, ETC. THEY ALL ARE USING MY DATA AND OTHERS'. Not all of our information is equal.

      @user-fx7li2pg5k · 16 days ago
  • A new AI Explained Video 🎉🎉

    @paullange1823 · 16 days ago
  • Can't wait! One day "AGI HAS ARRIVED!" will be the title of a video on here.

    @RPHelpingHand · 16 days ago
    • Indeed one day it will be

      @aiexplained-official · 16 days ago
    • If you traveled back in time 20 years and presented the capabilities of Med-Gemini or any top-level LLM to the general public and most experts, nearly all would agree that human-level general intelligence had already been achieved in 2024. All the hand-wringing over "but they hallucinate," "but sometimes they get confused," etc. would seem ridiculous given such magic human-level ability.

      @skierpage · 15 days ago
    • @@skierpage I think "intelligence" is subjective because there are different forms of it. Currently, AI is book-smart on all of humanity's accumulated knowledge, but it's weak at or missing creativity, abstract thinking, and probably a half dozen other things. 🤔 Wait until you can turn it to an always-on state and it has its own thoughts and goals.

      @RPHelpingHand · 15 days ago
    • @@RPHelpingHand It's subjective because we keep finding flaws and dumb failure modes in AIs that score much higher than smart humans on objective tests of intelligence, so we conclude that obvious criteria, like scoring much higher than most college graduates on extremely hard written tests, no longer denote human-level intelligence (huh?!). But new models will train on all the discussion of newer objective tests and benchmarks, so it may be impossible to come up with a static objective test that can score future generations of AI models. Also, generative AIs are insanely creative! As people have commented, it's weird that creativity turned out to be so much easier for AIs than thinking coherently to maintain a goal over many steps. Are there objective tests of abstract thinking on which LLMs do worse than humans? Or is that another case of people offering explanations for the flaws in current AIs?

      @skierpage · 14 days ago
  • Hallucinations are a huge problem right now in AI when it comes to the medical field. Can’t wait to test the new Med Gemini. Thanks for sharing!

    @OscarTheStrategist · 16 days ago
    • :) hope it helps!

      @aiexplained-official · 16 days ago
  • Thx for not posting about AGI revival - very much appreciated - quality is high here!!

    @En1Gm4A · 16 days ago
  • Another amazing video. Thanks Philip. Sincerely, Elijah

    @ElijahTheProfit1 · 16 days ago
    • Thanks Elijah!

      @aiexplained-official · 16 days ago
  • Thanks! Brilliant content, as always. 🙏🏼

    @stephenrodwell · 16 days ago
    • Thanks Stephen for your unrelenting support

      @aiexplained-official · 16 days ago
  • Ending on an uplifting note ^^ Patiently anticipating the impact of Nvidia's Blackwell.

    @Dannnneh · 16 days ago
  • GPT2's responses to my zero-shot, general prompts were more considered and detailed than GPT4-turbo's. I always preferred GPT2. The highlight for me was it being able to design a sample loudspeaker crossover with component values and rustle up a diagram for it too. GPT4-turbo miniaturised? A modified GPT-2 trained on output from GPT4-turbo? I guess we'll have to wait and see.

    @infn · 16 days ago
  • I mean, I know there are big British names in AI, but the companies and legal jurisdictions in the sector are mostly in the US. When the British government set up that summit, I could only sort of laugh and assume this would happen, at least as far as the US side was concerned. The best case scenario in my mind was simply showing that top governments and businesses are openly discussing this and that we should pay attention. However, I wouldn't think for a second that any US company would give another country first crack at looking under the hood of its tech. In fact, I wouldn't be surprised if the US government reached out to tech execs and discouraged any further interaction behind the scenes.

    @AlexanderMoen · 16 days ago
  • Great video as always. I fully agree that "when to launch an autonomous system that can save lives" is the most interesting version of the trolley problem. If self-driving cars save 20k lives and cost 5k lives, can any one company take responsibility for such mass casualties?

    @jonp3674 · 16 days ago
    • Only the same way that car manufacturers do today. If the car is the problem, the company is at fault. If the circumstances were the issue, the manufacturer can't be blamed... To put it simply.

      @Gerlaffy · 16 days ago
    • @@Gerlaffy That doesn't work. Every time the self-driving car makes a mistake, the car company could be facing a million-dollar legal judgment. The five times a day the fleet of self-driving cars avoids an accident during the trial, the car company gets nothing. So we don't get the life-saving technology until it's 100×+ safer than deeply flawed human drivers. In theory, Cruise and Waymo can save on insurance compared with operating a taxi service full of crappy human drivers... I wonder if they do.

      @skierpage · 15 days ago
  • Thank you. Very well explained.

    @1princep · 16 days ago
    • :)

      @aiexplained-official · 16 days ago
  • Your videos continue to be the most useful thing I watch all week. Thank you for everything you do.

    @connerbrown7569 · 16 days ago
    • Thanks Conner, too kind

      @aiexplained-official · 16 days ago
  • god bless you, one of the few good AI youtubers who doesn't try to LARP as Captain Picard

    @jessthnthree · 16 days ago
  • 10:15 From what you said here, it almost sounds as if the largest models can do worse on the old tests because they're partially relying on the fact that the question was in their training and so can fail to 'recall' it correctly, while they do better on the new ones because they've never seen them before and so are relying entirely on their ability to reason - which, because they're so large, they have been able to learn to do better than simply recalling. Slightly more concisely: a possible conjecture is that very large LLMs are better at reasoning than recalling training data for certain problems, so they can do worse on questions from their training set, since they partially use recall to answer them, which they are worse at than pure reasoning.

    @alansmithee419 · 16 days ago
  • Glad you're still alive 😊

    @keneola · 16 days ago
    • Haha, thanks Ken

      @aiexplained-official · 16 days ago
  • Great update :)

    @micbab-vg2mu · 16 days ago
  • Thank you for the great content!

    @9785633425657 · 14 days ago
  • You're the only AI YouTube channel I keep on notifications.

    @BrianPellerin · 16 days ago
    • Thanks Brian!

      @aiexplained-official · 16 days ago
  • AI in math, medicine, and more. Good overview.

    @GiedriusMisiukas · 10 days ago
  • 4:15 In your opinion, does this Sam Altman comment imply that the free tier will upgrade to GPT-4?

    @wck · 16 days ago
    • It's likely that will be the case once we do have the next model, whether that's GPT-4.5, GPT-5 or something else entirely. Plus users would then have access to that, and free users would likely have access to the "dumber" model, which would be GPT-4 Turbo at that point.

      @santosic · 16 days ago
    • Unlikely. GPT-4 is much more expensive than GPT-3.5, and even taking Turbo into consideration, which is faster and cheaper than the normal model, it would still be FAR too expensive. Instead they should make a smaller model that can match GPT-4; that's the way to go. GPT-4 has around 1-2 trillion parameters. They need to make a smaller model and make it better than GPT-4. Sounds hard, but really isn't, considering the improvements that have been happening.

      @GodbornNoven · 16 days ago
    • @santosic I find most of the value of the subscription, imo, doesn't come from the model but its capabilities. As in the ability GPT-4 has to run its own coding environment, make images, take in pretty much any file format, etc. The model itself is one of the best on the market, sure, but not so much better that I think the subscription would be worth it without those features.

      @Yipper64 · 16 days ago
    • You can use GPT-4 for free now with Copilot.

      @lamsmiley1944 · 16 days ago
  • Do you plan to make a video on the AlphaLLM paper from Tencent AI? I would be glad to hear an explanation from you.

    @sudiptasen7841 · 13 days ago
  • Once again, the best AI channel out there!

    @nicholasgerwitzdp · 15 days ago
  • I'm starting to like Sam Altman again. Excited for the new modes and to use them to make me more productive.

    @Blacky372 · 16 days ago
  • I think it's probably good to implement AI to assist doctors, but I'm still skeptical of these "better than expert" performance claims. We've been hearing that about radiology for a decade now, and it hasn't yet materialized.

    @bournechupacabra · 14 days ago
  • There's a long way to go, but I love to see what these models can potentially do in medicine.

    @scrawnymcknucklehead · 16 days ago
  • All right, you win. You now have a new YT subscriber and a new email subscriber. Thanks for the excellent video.

    @XalphYT · 15 days ago
    • Yay! Thanks Xalph!

      @aiexplained-official · 15 days ago
  • Your content is amazing, man, thanks a lot. You have become one of a handful of AI-related channels I follow, and my main source for AI news (besides Twitter, but that's something else). Thanks a lot!

    @anangelsdiaries · 16 days ago
    • Thank you angel

      @aiexplained-official · 16 days ago
  • Decided to finally become an AI Insiders member in the middle of this video ;-). Need more of your goodness. Regarding the need for medical AI: it's not just mistakes made by knowledgeable doctors (you showed a stat of 250k Americans dying), it's also that much of the world is way, way underserved and most doctors are undereducated. I currently live in Vietnam, and doctors here just can't help me with what I have. I've been way better since GPT-4 helped me; literally massive improvements in quality of life. BTW, frankly, German doctors were not a lot better. They all know their basics and their part of the body, but nobody can diagnose tough stuff or look at things systemically. Been waiting for Med-Gemini access (it used to be called something else) for many months now. [edit:] I'm pretty sure most decision makers have the best health care out there (politicians, techies, business leaders), and I'm pretty sure they don't understand how bad most health care is for the bottom 60%-80%, even in relatively wealthy countries.

    @esuus · 15 days ago
  • Seems logic and reasoning is the stuff in between the training tokens, so to speak. Or outside them.

    @user-bd8jb7ln5g · 16 days ago
  • There's a great human analogy that any physician can give you regarding reasoning tests vs real-world applicability: we've all worked with the occasional colleague who crushed tests but struggled to translate all of that knowledge (and PATTERN RECOGNITION) to actual real-world clinical reasoning, which doesn't always just feed you keywords.

    @PasDeMD · 16 days ago
  • More interviews if possible. Guest recommendation: Pietro Schirano.

    @thehari75 · 16 days ago
  • I wrote an OpenAI Eval ("solve-for-variables", #613) for a subset of school arithmetic - namely, the ability to solve an expression for a variable. I don't know if they use these evals for training, but at the very least they should be using them as internal benchmarks. (And I wish they published these results.)

    @juandesalgado · 15 days ago
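
For readers who haven't written one, an eval like this boils down to a file of JSONL samples scored against an ideal answer. Below is a minimal sketch of a single sample, assuming the match-template schema from the openai/evals repo; the specific question and answer are invented for illustration:

```python
import json

# One sample in the match-template style used by openai/evals:
# "input" is a chat transcript, "ideal" is the expected completion.
sample = {
    "input": [
        {"role": "system", "content": "Solve for the variable. Reply with the value only."},
        {"role": "user", "content": "2x + 3 = 11"},
    ],
    "ideal": "4",
}
print(json.dumps(sample))  # one line of the eval's samples.jsonl
```
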
  • Woohoo! Ready for this 😀

    @alertbri · 16 days ago
  • Thank you.

    @dreamphoenix · 16 days ago
  • Good video as always

    @Xilefx7 · 16 days ago
  • I can only hope that the release cycle of newer/bigger/smarter models won't be affected by longer training times. I think the main news in the coming months should be not new models, but new datacenters with record compute performance.

    @cupotko · 16 days ago
  • Very interested in that discontinuity between 6th-grader and graduate-level test scores. There are some well-written threads about it near the top of this comment section, with the theme/conjecture that reasoning is being "simulated", or perhaps merely syntactically imitated. I think there is something to that, but I would make the Sutskeverian counterpoint: if a model appears to be reasoning, but is limited in this, it is actually reasoning on some level (as the posters in question admit, tbf). There is a line between _imitation_ of reasoning and _actually_ reasoning, and if "imitation" becomes convincing enough, in the limit, that line is crossed and reasoning is genuinely "solved". Because the disconnect between the "simulated" reasoning we see now and genuinely good reasoning is just the model having, residing within its neural networks, a low-"resolution" or "weak" algorithm for generalised reasoning (my own conjecture, based on Sutskever's evangelistic faith in LLMs). With a good enough data training regime and compute, this reasoning part of the model's NN, or "brain", becomes better and better, or higher "resolution", to the point where it is a generalised and complete solution for authentic reasoning. Not just bolting words together in some low-resolution-understanding way like now, perhaps, but understanding fully and deeply the relationships between all the words and sentences it outputs. Time will tell... if it nails a problem set that can effectively distill the essence of what reasoning is, over and above mere recall, then maybe that's how we'll know.

    @godspeed133 · 15 days ago
    • In other words, a perhaps shallow understanding of many high-level concepts is what LLMs have and exhibit now, and they get a lot of mileage off it. What we want is a deep understanding of as many low-level concepts as possible, which, in the limit, means being able to reason up about anything from first principles (possibly not possible at all with today's archs, but still something we can perhaps converge well enough towards to be able to make AGI, like a 100-IQ human).

      @godspeed133 · 15 days ago
  • 14:15 It is not based on raw outputs/logits. It looks at N reasoning paths/CoTs, and then calculates the entropy of the overall answer distribution (as produced by the N solutions/paths). E.g. if the possible answers are {A, B, C}, and N = 10 reasoning paths result in the distribution {3/10, 3/10, 4/10}, then the entropy of this discrete distribution is checked against a given/fixed threshold. If it is above the threshold, the model does uncertainty-guided search.

    @JumpDiffusion · 16 days ago
    • Thank you for the correction. I defaulted to a standard explanation even though entropy was explicitly mentioned in the paper, so no excuse!

      @aiexplained-official · 16 days ago
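
A minimal sketch of that decision rule as the comment describes it; the threshold value here is invented for illustration, not taken from the paper:

```python
import math
from collections import Counter

def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy (bits) of the answer distribution over N reasoning paths."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# N = 10 chains of thought landing on {A: 3, B: 3, C: 4}, as in the example above
paths = ["A"] * 3 + ["B"] * 3 + ["C"] * 4
uncertainty = answer_entropy(paths)  # ~1.57 bits

THRESHOLD = 1.0  # illustrative value only; the paper fixes its own threshold
if uncertainty > THRESHOLD:
    print("high entropy: trigger uncertainty-guided search before answering")
```
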
  • What I appreciate about your channel is that you seem to maintain and respect the integrity of what you share. I hope you continue, and don't get caught up in the sensationalism that so many other sources get swayed into!

    @alexc659 · 16 days ago
    • Thanks alex, I will always endeavour to do so, and you are here to keep me in check!

      @aiexplained-official · 16 days ago
  • Good job actually testing gpt-2, vs just frothing 👍

    @DaveEtchells · 16 days ago
  • I love this channel because it only releases content whenever there is something truly interesting to hear about. That is why I click whenever I see a video drop. Probably the best YouTube channel for AI content imho. 🙌👏

    @dereklenzen2330 · 16 days ago
    • Thank you derek

      @aiexplained-official · 16 days ago
    • @@aiexplained-official yw. :)

      @dereklenzen2330 · 16 days ago
  • 12:33 To me the answer to this is pretty simple: Opus simply isn't big enough. It's known that transformers learn specialized algorithms for different scenarios (arXiv 2305.14699), and judging by the generalization paper you mentioned in the video, my guess is that those algorithms "merge" as a bigger model gets trained for longer. In this case, all you need to do is scale, and reasoning will improve.

    @TheLegendaryHacker · 16 days ago
  • Great video man, appreciate it

    @absta1995 · 16 days ago
    • Thanks absta!

      @aiexplained-official · 16 days ago
  • They should try the surgery kibitzing on a low-risk operation, something like a subcutaneous cyst removal, where there is no possibility of disaster.

    @jsalsman · 14 days ago
  • Refreshing for new videos daily.

    @TheImbame · 16 days ago
  • Great video as always, but I will say your section on the gpt2-chatbot was quite underwhelming. I've seen so much information on its reasoning, math and coding capabilities. Many people, including expert coders, were talking about just how much better it was than the current SOTA models at solving coding problems. I think this is very significant. I appreciate you coming up with new test questions, but it didn't seem like there was enough data there to draw any real conclusions on the model's performance. We are still unsure if this model is a large-parameter model or something more akin to Llama 3 70B. If it is the latter, gpt2-chatbot will be revolutionary: that level of reasoning and generalisation fitted into a smaller parameter size would mean some sort of combined model system, such as Q* plus an LLM, etc. My theory is that it is a test bed for Q* and is very incomplete atm; my guess is they will be releasing a series of different-sized models similar to Meta, but each model will be utilizing Q*, and gpt2-chatbot will be one of the smaller models in that series. The slow speed can be explained by the inference speed allowed on the website, and could also be a deliberate mechanic of these new models. Noam Brown spoke about allowing models to think for longer, and how that can increase the quality of their output; this could explain the rather slow inference and output rate. He is currently a lead researcher at OpenAI, and he is working on reasoning and self-play on OpenAI's latest models.

    @JJ-rx5oi · 15 days ago
  • With Med-Gemini they lost the opportunity to call it Dr. (Smart)House. Great content as always!

    @PolyMatias · 16 days ago
  • You have to do an all-caps "AGI HAS ARRIVED" video when it's here.

    @jamiesonblain2968 · 16 days ago
    • Will do

      @aiexplained-official · 16 days ago
  • I am perplexed by how many errors there are in benchmarks. This has been a problem from the very beginning and, in some ways, it seems to only be getting worse.

    @AustinThomasPhD · 16 days ago
    • Because of the AIDPA (AI decay-promoting agents), haha!

      @biosecurePM · 15 days ago
    • @@biosecurePM I doubt it is anything nefarious. I am pretty sure it is just lazy 'tech bros'. The nefarious AI stuff comes from the usual suspects, like the fact that the Artificial Intelligence Safety and Security Board contains only CEOs and execs, including several oil execs.

      @AustinThomasPhD · 14 days ago
  • Again, amazing video. I had read the PaLM 2 paper with lots of interest for my own, but very different, field of study. What I don't understand as somebody from the EU with no medical background: is MedQA (USMLE) based on "Step 1" of the USMLE, or does it also cover the other steps? You state that the pass rate is around 60%; is that about Step 1 as well? It would be more interesting to see what the average score is of people who pass, I would think. Can somebody elaborate? Also wondering about the CoT pipeline used. Would they also use a RAG framework like LangChain or LlamaIndex?

    @resistme1 · 15 days ago
    • Interesting details to investigate, for sure. Thank you RM.

      @aiexplained-official · 15 days ago
  • I got to try gpt2-chatbot, too. Its answers were mighty impressive (assuming it is more compute thrown at GPT-2, not a new model like GPT-4.5). I can't help but wonder what would happen if the same thing was done to GPT-4 or Opus.

    @marc_frank · 16 days ago
    • It's good that Matthew Berman posts so quickly, or else I might have missed it. But AI Explained goes more in depth. The mix of both is awesome!

      @marc_frank · 16 days ago
  • I really hope that new OpenAI model is indeed a small open-source one. Being able to run a model locally is always a plus.

    @AllisterVinris · 16 days ago
  • I solved one question in GSM1k just for fun, and I don't agree with the answer given: "Bernie is a street performer who plays guitar. On average, he breaks three guitar strings a week, and each guitar string costs $3 to replace. How much does he spend on guitar strings over the course of a year?" (12:26). The answer given is 468, that is, 3 * 3 * 52. But that's not the correct answer in my opinion: a year is not exactly 52 weeks. The answer should be 3/7 * 3 * 365 ≈ 469.29. Maybe some models also gave that answer, and maybe there are other questions like this; that would explain the lower-than-expected score.

    @giucamkde · 16 days ago
    • Really interesting, and I found another question with ambiguous wording. I suspect that is not the primary issue but could explain 1-2%

      @aiexplained-official · 16 days ago
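
For concreteness, the two readings of that question computed side by side (a quick sketch):

```python
# Benchmark's reading: a year is exactly 52 weeks
cost_52_weeks = 3 * 3 * 52          # 468, the answer GSM1k gives
# Commenter's reading: 3 strings per 7 days, over 365 days
cost_365_days = 3 / 7 * 3 * 365     # ~469.29
print(cost_52_weeks, round(cost_365_days, 2))  # 468 469.29
```
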
  • I also got lucky with access to gpt2-chatbot. It seems to be able to learn from examples better within context (when given code with a certain custom class, it used it without knowing what it was, from an example code snippet, while any GPT-4(-Turbo) variant always changed it to something else). Maaaybe it's slightly less censored too, but I got rate-limited before that was clear. One thing that was clear, however, is that this is not a GPT-4.5. It had trouble with attention to certain things in a longer context at the exact same point as GPT-4-Turbo. So all in all it's probably a slight improvement, but nothing crazy (unless it truly is some sort of GPT-2-sized model with verify-step-by-step and longer inference time or something). If this were 4.5, my expectations for GPT-5 would be significantly lowered.

    @DreckbobBratpfanne · 16 days ago
  • gpt2-chatbot is in no way GPT-4.5. But many people showed it passes reasoning tests none of the other models could. Also, you probably know that prompts you put in the LMSYS Chatbot Arena are public data that anyone can download? You may want to replace those 8 questions with new evals, since they will be on the public internet shortly.

    @randomuser5237 · 16 days ago
  • Not sure if I asked this before, but would really love to learn more about LLMs and what they can do (or can't do) untrained. What exactly is programmed in and what does it learn?

    @OZtwo · 16 days ago
  • 3:46 Based on Google's performance historically, I have sometimes wondered if it is the modern-day Xerox PARC.

    @zalzalahbuttsaab · 14 days ago
  • Could it be that GPT-(Next) will be able to revert (partially) and re-think its reply mid-process?

    @timeflex · 16 days ago
  • It seems like we need a way to inject a "truth" into a model, not just "train" the model on text. For example, "Street address numbers must not be negative". We need code we can physically look at as proof for that statement.

    @danberm1755 · 15 days ago
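
There is no accepted way to hard-wire such a fact into the weights today; a crude interim sketch of the idea is an external validator wrapped around the model's output, where the code itself is the inspectable "proof". The rule and regex below are purely illustrative:

```python
import re

def violates_address_rule(text: str) -> bool:
    """Flag negative street numbers such as '-12 Main St' (hypothetical rule)."""
    return re.search(r"(?<![\w.])-\d+\s+\w+\s+(?:St|Ave|Rd|Blvd)\b", text) is not None

# Reject or regenerate any model output that breaks the hard constraint.
draft = "Ship the order to -12 Main St and bill on delivery."
if violates_address_rule(draft):
    print("constraint violated: reject or regenerate this output")
```
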
  • I think GPT 4 was a mass psychological test, and our reaction is the reason for the slower rollout. OpenAI is likely already playing with GPT 7 or 8 by now, which is advising them on the rollout schedule, while designing its own hardware upgrade in project Stargate.

    @Not-all-who-wander-are-lost · 16 days ago
    • I'm pretty sure the lab research continues to progress exponentially. The public releases of course, may only be linear. Which means the behind closed doors stuff could get further and further ahead of what we know about...

      @khonsu0273 · 16 days ago
  • Perfect midday break. I'm watching til the end

    @KitcloudkickerJr · 16 days ago
  • It is a bit depressing that even the most advanced models we have access today fail at some of these elementary-level tasks. Reliability is key for real-world deployment. I hope this will be ironed out at the end of this month... or year. Great video as always.

    @weltlos · 16 days ago
    • Thanks welt!

      @aiexplained-official · 16 days ago
  • We should add a test based on riddles. It would be a much more general measure of intelligence. It might be an attention problem that explains why even Opus is failing at such basic tests.

    @shApYT · 16 days ago
  • I'll just be happy when the new model is trained on the current version of OpenAI's own documentation (and current packages in all languages). So frustrating when their own LLM is wrong, and stubbornly insistent, about using out-of-date methods.

    @verlyn13 · 16 days ago