Language or Vision - What's Harder? (Ilya Sutskever) | AI Podcast Clips

16 May 2024
32,319 views

Full episode with Ilya Sutskever (May 2020): • Ilya Sutskever: Deep L...
Clips channel (Lex Clips): / lexclips
Main channel (Lex Fridman): / lexfridman
(more links below)
Podcast full episodes playlist:
• Lex Fridman Podcast
Podcasts clips playlist:
• Lex Fridman Podcast Clips
Podcast website:
lexfridman.com/ai
Podcast on Apple Podcasts (iTunes):
apple.co/2lwqZIr
Podcast on Spotify:
spoti.fi/2nEwCF8
Podcast RSS:
lexfridman.com/category/ai/feed/
Ilya Sutskever is the co-founder of OpenAI, is one of the most cited computer scientists in history with over 165,000 citations, and to me, is one of the most brilliant and insightful minds ever in the field of deep learning. There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence, and life than Ilya, on and off the mic.
Subscribe to this KZhead channel or connect on:
- Twitter: / lexfridman
- LinkedIn: / lexfridman
- Facebook: / lexfridman
- Instagram: / lexfridman
- Medium: / lexfridman
- Support on Patreon: / lexfridman

Comments
  • "where the vision ends, language begins" this line touches my heart!

    @darshantank554 · 2 years ago
  • Not only is this guy brilliant, he’s just such a nice guy

    @JonKroeker · 1 year ago
  • Vision ends when the viewer (agent 1) sees the words. Language begins when the viewer (agent 1) combines the words it has seen with "prior knowledge" and then communicates "value added" information to a listener (agent 2). For example, when agent 1 "sees" (vision) the name Lewis Hamilton, it must be able to use its knowledge about Hamilton to effectively engage in a coherent conversation with an expert about this great F1 driver. At the moment, state-of-the-art models like GPT-3 can fake a coherent conversation only when communicating with non-experts.

    @bubelevakalisa7313 · 3 years ago
    • Vision takes the visual input, the brain caches the sentences, and NLP begins? If the cache is out of memory, vision goes back and queries the same input again?

      @sidgirase · 1 year ago
  • "i am going to explain why"...opens by asking a question, nice!

    @adambrickley1119 · 4 years ago
  • The question "Where does vision end and language start?" was intriguing. It shows a potential final destination that needs to be achieved for DL-based AI.

    @nikhilvarmakeetha3917 · 4 years ago
    • ✔️John Venn liked this.

      @breakawaybooks4752 · 4 years ago
    • Windows start, should be, On/off, it is here Your illiteracy begin, better open some windows and get some fresh air, and understand the nature of dictator-principle. AI, is illiteracy and superstition, - intelligence can never be artificial. Repeating dead mantras, is Not individual thinking. The development of Consciousness and Language is two sides of the very same development, based on Eternal Principles.

      @holgerjrgensen2166 · 4 years ago
    • @@holgerjrgensen2166 more like returnal than eternal 🤔😘

      @olivercroft5263 · 4 years ago
    • What do You mean, if You know what You're saying.

      @holgerjrgensen2166 · 4 years ago
    • Ru-Mu, okai, can means allmost any thing, in danish, it is åvkæj, just a sound-combination.

      @holgerjrgensen2166 · 4 years ago
  • This is about semantic interpretation: whether image recognition and natural language processing could share the same "back end" for semantic interpretation and abstraction. I wonder if one could train a convolutional NN and a transformer to spit out the same semantic vector, so that a natural language description of a picture and the picture itself would be compressed into the same (or similar) vector space coordinates? :/

    @DamianReloaded · 4 years ago
    • There are already datasets for this task: you build a net for NLP and a net for CV, and minimize the KL divergence between the two hidden spaces.

      @Gyringag · 4 years ago
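A rough sketch of the idea in the thread above, for readers who want to see what "minimize KL-div between two hidden spaces" could look like. Everything here is an assumption made for illustration: the two short vectors stand in for the hidden outputs of an image encoder and a text encoder (no real CNN or transformer is involved), and the names `softmax`, `kl_divergence`, and `alignment_loss` are hypothetical. The hidden vectors are turned into probability distributions with a softmax so that the KL divergence is defined.

```python
import math

def softmax(xs):
    # Turn a raw hidden vector into a probability distribution.
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_loss(image_hidden, text_hidden):
    # Hypothetical training signal: small when both encoders map the
    # picture and its description to similar distributions.
    return kl_divergence(softmax(image_hidden), softmax(text_hidden))

# Stand-ins for encoder outputs (made-up numbers, not real model activations).
image_hidden = [2.0, 0.5, -1.0]
matching_caption = [1.9, 0.6, -0.9]
unrelated_caption = [-1.0, 0.5, 2.0]

loss_match = alignment_loss(image_hidden, matching_caption)
loss_mismatch = alignment_loss(image_hidden, unrelated_caption)
print(loss_match < loss_mismatch)
```

In an actual training loop this loss would be backpropagated through both encoders; the sketch only shows that a matching pair scores lower than a mismatched one.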
  • You should get a linguist on, Lex. It might be interesting to talk about the hermeneutic aspect of language learning and interpretation for AGI.

    @Ross-nd6xi · 4 years ago
  • 0:49 The Word is "Interdisciplinary"

    @user-my5qk5xu1d · 4 years ago
    • Man. I hate that word.... It stems from artificial boundaries that we've created due to historical happenstance.

      @ssssssstssssssss · 4 years ago
  • This was an interesting conversation! Lex - I wonder if the title should be "Language vs. Vision" instead. 6:56 - In terms of Generative AI, can Language and Vision both work to improve each other, like an arms race? How will the AI model and algorithm decide when to determine a pass or fail result for either/or?

    @PrinceKoopa · 1 year ago
  • I literally suffer from the same cosmetic matter this respectable person suffers from. I use a solution daily. I understand how you can get used to it, but please, for the sake of other people, research a solution too. I felt embarrassed to mention it and not many will, but I care about AI and those pushing it forward. Beyond being highly intelligent, you are an attractive person👍

    @FromFame · 4 years ago
  • i believe cnn and nlp should stand as inputs for decision making systems and reinforcement learning should explore space for actions, state and targets states. so the 2 first are more like perception constructor and the last as decision space explorer

    @johnniefujita · 4 years ago
  • Man this is going to finally get watched by people

    @chrisbarry9345 · 1 year ago
  • This conversation really seemed to enlighten me on how language would have been impossible with sight and hearing. I can see that a word can have many definitions without the presence of a visual or tone of voice. So for the computer to learn. If we relate these few in the algorithm things so that the computer can as we did. If the computer is a rigid piece of electronics, isn't that how life began billions of years ago? Maybe with a better architect.

    @jamesblankenship3077 · 4 years ago
  • 0:54 - Does anyone know what those principles are??

    @NoOne-uz4vs · 4 years ago
    • This is just my ballpark guess, but i think it should be empirical risk minimization and something around no free lunch theorem. i dont know the third one

      @nobodykid23 · 4 years ago
  • once the vision can read the language, the loop is complete

    @TimoNineSix · 3 years ago
  • I think the wife example is quite bad because there is a sexual component in the perception of the other; with a friend there will probably be more objectivity. Also, yes, if you have human-level speech recognition and understanding you'll have the vision for free. Understanding text is just a primitive form of acquiring information: replace objects in a picture with words and voila.

    @justinkiff4159 · 4 years ago
  • Great Illya

    @pratik245 · 2 years ago
  • So interesting to look back at this interview now, in the wake of GPT4.

    @stevee5718 · 1 year ago
  • I thought the interviewer was smart, but Ilya is on a different level.

    @jonomichi2262 · 5 months ago
  • Hey lex, one cool thing would be to add some more media to the conversations. Show the guests some clips, read them news, and then we would like to hear their opinion. Great job ✋🏻👏🏻

    @mohammadaminparchami7462 · 4 years ago
  • Yes, the manmade world (physicality) our thought and action is primarily governed by language. So, language is fundamental.

    @shreeyatyagi · 4 years ago
  • Lex, how about a podcast with Shai Ben-David on advances on the theoretical side of ML?

    @timdh100 · 4 years ago
    • YES, THIS

      @nobodykid23 · 4 years ago
  • Which field has more jobs (NLP or CV)? It seems to me that so far there are a lot more applications for CV, and therefore CV has more job opportunities than NLP. Simply search “computer vision job USA” and “NLP jobs USA” on Google; comparing the results will show that CV has more jobs. I wonder what your 2 cents on it are? Maybe I am wrong?

    @leecharlie2513 · 3 years ago
    • It will change this year or maybe in 2022.

      @MrSchweppes · 3 years ago
  • Language has much higher dimensionality than vision. Vision has three basic dimensions and that could probably be abstracted up to thousands or millions. Language has over 6,500 basic dimensions. The abstraction of these basic dimensions may go into the trillions

    @joshuaerkman1444 · 2 years ago
  • Throwing out a question here, as there are some clever people in the thread. Anyone care to help me understand why "natural language" (and does that exclude body language and tone of voice?) would be important for AI? As an example; IKEA furniture assembly instructions don't need words to explain stuff to humans. And being a poet is not a requirement for human level intelligence, right?

    @henrikbergman4055 · 4 years ago
    • Your examples are more about language generation. Even if that is important, the hot topic nowadays is language understanding. Understanding language hides a lot of very difficult challenges, and reasoning about entities is one of the most difficult ones. Each time we speak we refer to events that happened in the past and in the present, make implicit relations between entities, and talk about abstract things. Language is the description of the world in which we live and of the abstract world we have created (the concept of nations, politics, jokes, etc.). To understand language, a machine first needs to understand the world we have built. We are far from achieving something like that with AI. How can we pretend to have an "intelligent" machine if it cannot understand us?

      @seo95 · 3 years ago
  • The problem is that we are trying to make a robotic brain from scratch. Maybe the solution is to give it initial steps so that it doesn't start from 0. It's like when you learn another language: you already know what a dog is, but you need to learn how to say it in another "way" and when you should say it.

    @IsmaelAlvesBr · 4 years ago
    • How does that apply? When a baby is born he does not know what a dog is. The only things he starts with are unconscious behaviors, like "cry if hungry". In that sense starting from scratch seems very similar.

      @maxsnts · 4 years ago
  • Hey Lex

    @styles9783 · 4 years ago
  • Computer vision fascinating more

    @Priyanka-us8rw · 1 year ago
  • I don't understand the difficulty in the "Where vision ends and language starts" question. I imagine an advanced enough vision system can just recognize that a particular region of pixels assortment represents text, from that point it can be converted to raw text (which is a decades-old solved problem) and then fed to an NLP pipeline for interpretation. Imu, it's not a vision system's role to accomplish language understanding, but it would be ideal if it could at least identify what is text and relay it to the NLP component.

    @AM-qx3bq · 3 years ago
  • I think vision lags language because it doesn't have a lot of labeled data

    @pawarboy7 · 2 years ago
  • man.. I find it interesting how I really respect Ilya for what he achieved but I just don't agree with his views on things most of the time.

    @danielcogzell4965 · 4 years ago
  • hmmm

    @ko95 · 1 year ago
  • JUST

    @chocolategolemofroidgutand2839 · 4 years ago
  • 8:15 such a blue pilled Lex

    @BaikalLV · 4 years ago
  • Vision ends when language starts.

    @luisselvera9878 · 2 years ago
  • Rezpect ze russians🇷🇺

    @olivercroft5263 · 4 years ago
  • You are the most nicest cutest thing!

    @michaelpetronzio6557 · 4 years ago
  • 🐸🐸🐸🐸🐸🐸

    @jefferysherwood7424 · 4 years ago
  • Language

    @shreeyatyagi · 3 years ago
    • Why?

      @leecharlie2513 · 3 years ago
    • @@leecharlie2513 because language is a representation.

      @shreeyatyagi · 3 years ago
    • @@shreeyatyagi But isn’t the recent GPT-3 demonstrating very promising results in generating meaningful text and dialog?

      @leecharlie2513 · 3 years ago
  • hav 2 say that the dumbest animals hav vision but not langwage

    @henrychoy2764 · 2 years ago
  • Read what Lacan says about language. Not Chomsky.

    @enriquemartinez5647 · 4 years ago
  • This was an interesting conversation! Lex - I wonder if the title should be "Language vs. Vision" instead. 6:56 - In terms of Generative AI, can Language and Vision both work to improve each other, like an arms race? How will the AI model and algorithm decide when to determine a pass or fail result for either/or?

    @burkebaby · 1 year ago