JEPA Architectures - How neural networks learn abstract concepts about images (IJEPA)

May 24, 2024
2,268 views

This video explains the paper "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture" (I-JEPA), which proposes a new approach to more "human-like" Machine Learning training. The video dives into the ideas behind JEPA methods, the network architecture, results, and comparisons with existing generative methods (like Masked Autoencoders) and contrastive learning methods (like SimCLR).
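As a companion to the description above, here is a minimal sketch of the core I-JEPA idea: predict the embeddings of masked target blocks from the embeddings of a visible context block, so the loss is computed entirely in representation space rather than in pixels. The names and signatures (context_encoder, target_encoder, predictor, jepa_loss) are illustrative assumptions, not the paper's code.

```python
# Minimal PyTorch sketch of an I-JEPA-style training objective (assumed module
# names and shapes; not the authors' implementation, which uses ViT encoders
# with multi-block masking and an EMA-updated target encoder).
import torch
import torch.nn.functional as F

def jepa_loss(context_encoder, target_encoder, predictor,
              patches, context_idx, target_idx):
    # patches: (B, N, patch_dim) image patches; *_idx: LongTensors of patch indices.
    # Encode only the visible "context" patches.
    ctx = context_encoder(patches[:, context_idx])        # (B, Nc, D)
    # Target embeddings come from a separate encoder and receive no gradients.
    with torch.no_grad():
        tgt = target_encoder(patches)[:, target_idx]      # (B, Nt, D)
    # Predict the embeddings of the masked target blocks from the context.
    pred = predictor(ctx, target_idx)                     # (B, Nt, D)
    # The loss lives in representation space -- no pixel reconstruction.
    return F.mse_loss(pred, tgt)
```

Contrast this with a masked autoencoder, where the predictor would regress the raw pixel values of the masked patches; the comparison in the video hinges on that difference.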
Follow on Twitter: @neural_avb
To support me, consider JOINING the channel. Members get access to code, project files, scripts, slides, animations, and illustrations for most of the videos on my channel! Learn more about perks below.
Join and support the channel - www.youtube.com/@avb_fj/join
Learn more about contrastive learning in my breakdown video about Multimodal ML:
• Multimodal AI from Fir...
Papers referenced:
I-JEPA: arxiv.org/pdf/2301.08243.pdf
Yann LeCun's original paper on human-like AI: openreview.net/pdf?id=BZ5a1r-...
SimCLR: arxiv.org/pdf/2002.05709.pdf
Masked AE: arxiv.org/pdf/2111.06377.pdf
RCDM: arxiv.org/pdf/2112.09164.pdf
Timestamps:
0:00 - Intro
1:05 - Why IJEPA?
5:22 - Network architecture
7:43 - Results
8:50 - Summary
#deeplearning #computervision #ai #machinelearning

Comments
  • nicely explained!

    @josephsueke 2 months ago
  • There's something quite interesting about predicting embeddings from embeddings: it feels like you give the model an extra degree of freedom in designing its representation space, rather than training it on reconstruction and hoping that indirectly you achieve a nice representation space. Both CLIP and the strategy used for the DALL-E prior also learn to base their predictions on the positions of other embeddings, and their continued success makes me think this is a promising area of research.

    @ethansmith7608 10 months ago
  • If "like humans do" just means "using latent representations," that's definitely just attention-grabbing, imo. Nevertheless, taking the prediction into latent space is definitely the right direction.

    @ControllerQuickSwaps 7 months ago
    • Yeah, I agree! I do think there's a bit of a marketing slogan involved here... but as a concept it makes a ton of sense as a research initiative.

      @avb_fj 6 months ago
    • @avb_fj I'm still trying to figure out what part of the idea is actually novel. I brought it up in a lab meeting today, and people said that self-supervised losses are already often computed in latent space?

      @ControllerQuickSwaps 6 months ago