RAG vs Context Window - Gemini 1.5 Pro Changes Everything?

May 16, 2024
17,896 views

RAG vs Context Window - Gemini 1.5 Pro Changes Everything?
👊 Become a member and get access to GitHub:
/ allaboutai
📧 Join the newsletter:
www.allabtai.com/newsletter/
🌐 My website:
www.allabtai.com
In this video I talk about RAG vs the context window, and whether Gemini 1.5 Pro with its 1M-10M token context window is challenging RAG.
00:00 RAG vs Context Window Intro
00:20 Context Window
02:15 RAG
03:52 Gemini 1.5 Pro Context
05:26 Groq 500 t/s
07:25 Price
07:51 RAG vs Context Window Examples
10:28 Multimodal
11:04 RAG Use Case
12:23 Conclusion

Comments
  • Kris, with Groq, speed gains are specifically limited to output tokens. There are no similar speed gains for input tokens. They discuss this.

    @avi7278 • 2 months ago
  • The videos on this channel are so relatable and accessible, with a friendly vibe. Looking forward to the next one

    @FredPauling • 2 months ago
  • Not even 3 minutes in and I've already learned new things. Thanks for this!

    @adventurelens001 • 2 months ago
    • It's racist

      @TheZumph • 2 months ago
  • People have been so preoccupied with laughing off Google's AI misses that everyone completely missed how Google might actually have silently taken the AI lead.

    @The-Rest-of-Us • 2 months ago
    • monkeys and typewriters....

      @bennie_pie • 7 days ago
  • Thank you for your wonderful videos, keep up with the hot topics

    @kate-pt2ny • 2 months ago
  • Thanks, this was very helpful

    @avgplayer • 2 months ago
  • Kris, I'd love to see what tools you use (if any) for the content creation process

    @BillyRybka • 2 months ago
  • Fantastic video! Really interesting with it understanding a codebase in context!

    @darrylrogue7729 • 2 months ago
  • Asking a question like "what does this text mean?" in RAG is entirely pointless. If you have a basic RAG setup, which I surmise you do, it's going to find the chunk of text closest to your input "what does this text mean", which has no relation at all to the intention of your question. So it's going to find a random chunk of your text and then tell you what that random chunk means. In a RAG setup like this, the LLM has no context of what "this text" is. For this to work, you would need to add another layer of inference where the LLM is given context that there is a corpus of text it can search, and instead of "this text" you reference it the same way: "What does the entire corpus of text mean?" It would then need to use its reasoning abilities to generate one or more search queries that would let it retrieve enough text to apply your actual intention and answer the question. Even with this additional context and query-generation layer, a question like "what does the entire corpus of text mean?" would be difficult, because there is no definite or obvious set of queries it could make to retrieve most of the text, nor can it even know whether the results of those queries contain all of the text. Imagine a book with five chapters and you ask it what this text means: what set of queries does it generate to retrieve all five chapters without even knowing how many chapters there are? In RAG you need to give it that additional context. This experiment is really quite pointless and doesn't illustrate the capability of RAG systems.

    @avi7278 • 2 months ago
    • Do you have any good videos / tutorials on this topic to recommend?

      @yorgohoebeke • 2 months ago
    • @yorgohoebeke The LangChain KZhead channel has a really good 9-part series called "RAG from Scratch". I would start there; each video is about 5-10 minutes and very digestible. Then tell ChatGPT that you want to learn RAG starting with the basics, ask it to generate a crash-course study list of topics for learning RAG, and throw that list into a Perplexity search. Go through all the resources it finds (websites and videos) that you find helpful, then repeat that process recursively with the various parts of the RAG implementation steps until you're an expert. Voilà.

      @avi7278 • 2 months ago
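The retrieval failure described in the thread above is easy to reproduce. Below is a minimal sketch of basic single-chunk RAG retrieval, using a toy bag-of-words cosine similarity in place of real embeddings; the corpus, chunking, and scoring are all illustrative assumptions, not anyone's actual setup:

```python
from collections import Counter
import math

def similarity(a: str, b: str) -> float:
    """Cosine similarity between two bag-of-words vectors (toy stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Hypothetical corpus, already split into chunks as a RAG pipeline would do.
chunks = [
    "Chapter 1: the history of steam engines.",
    "Chapter 2: what does a governor valve mean for this text on engines?",
    "Chapter 3: decline of steam power.",
]

query = "what does this text mean"
# Basic RAG returns only the single closest chunk -- it matches the
# query's surface words, not the intent of summarizing the whole corpus.
best = max(chunks, key=lambda c: similarity(query, c))
print(best)
```

The chunk that happens to share words with the question wins, even though the user meant "summarize everything", which is exactly the point the comment makes.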
  • Great stuff. What route would you advise for going from an unstructured individual PDF to structured JSON output with the help of an API?

    @InnocenceVVX • 2 months ago
  • In my case, a broader context window changed everything. I am using your prompts that you presented 6 months ago, and by including text for analysis as part of the prompt, the accuracy of the output is almost 100%.

    @micbab-vg2mu • 2 months ago
    • Is there a usable limit to the amount of text you can enter into a prompt? Or do you link to local txt files?

      @jakey8 • 2 months ago
    • @jakey8 Currently, I analyze medical publications ranging from 10 to 20 pages of text at a time using ChatGPT-4. However, with Gemini 1.5, I will be able to analyze an entire book.

      @micbab-vg2mu • 2 months ago
  • Is there no link to the Discord? I was wondering if you could also explain how to convert the code you made to transformer.js and such, like the realtime faster-whisper from a month ago

    @TheWorldGameGeneral • 2 months ago
  • I think RAG is still needed; I want local models that access my personal data. For security, I can't be sending my personal data to Google or OpenAI. But for use cases like understanding git repositories, this is a game changer.

    @julian-fricker • 2 months ago
  • Could you give a link showing how to use Gemini 1.5 Pro?

    @sardormamarasulov3352 • 2 months ago
  • 🎯 Key takeaways for quick navigation:
    00:00 *📈 Intro to RAG and the context window: explaining the difference between RAG and the context window in language models.*
    - A model with an 8K-token context window processes both the input and the output tokens.
    - Example of how the context window can limit the inclusion of relevant information.
    02:05 *🧩 How RAG works: how RAG solves the problem of a limited context window.*
    - Text tokens are transformed into vector embeddings stored in a database.
    - The user's query is compared against the stored texts to find the closest match.
    03:57 *🚀 Impressions of Gemini 1.5 Pro: discussion of Gemini 1.5 Pro's capabilities.*
    - Ability to process entire codebases and identify urgent problems.
    - Comparison with RAG in terms of context and efficiency.
    05:04 *💻 Groq news and processing speed: presentation of Groq's new hardware.*
    - Processing 500 tokens per second, a significant advance in speed.
    - Potential to combine high speed with advanced language models.
    06:09 *💰 Costs and efficiency: cost-benefit analysis of RAG vs Gemini 1.5.*
    - The price difference between API calls for the different models.
    - Speculation about future cost reductions and their impact on model choice.
    07:34 *🧪 Practical tests with RAG and context: comparative experiments between RAG and in-context use.*
    - Demonstration of how different questions yield different answers across the models.
    - Analysis of the strengths and weaknesses of each approach.
    10:32 *🎥 Multimodal features of Gemini 1.5 Pro: exploring Gemini 1.5 Pro's multimodal capabilities.*
    - Ability to process long videos and answer queries about specific moments in the video.
    - Comparison with RAG's ability to handle multimedia.
    11:54 *🤔 Final thoughts and future outlook: closing reflections on RAG and Gemini 1.5 Pro.*
    - Ideal use cases for each technology.
    - Expectations for future innovations and OpenAI's response to Gemini 1.5 Pro's advance.
    Made with HARPA AI

    @FelipeDiPaula • 2 months ago
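The first point in the summary above, that input and output tokens share one window, comes down to simple arithmetic. A sketch, assuming a hypothetical 8K-token model (real tokenizers and limits vary):

```python
# Sketch of the shared input/output budget in a fixed context window.
CONTEXT_WINDOW = 8_192  # e.g. an 8K-token model (illustrative)

def max_output_tokens(input_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Whatever the prompt uses is no longer available for the response."""
    return max(window - input_tokens, 0)

print(max_output_tokens(6_000))  # 2192 tokens left for the answer
print(max_output_tokens(9_000))  # 0 -- the prompt alone overflows the window
```

This is why a large document pasted into the prompt can silently squeeze out room for the model's reply, which is the limitation RAG and larger context windows both address.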
  • Is this a local model or Google's? You didn't put any links on how to do this.

    @JNET_Reloaded • 2 months ago
  • I made a similar comment a couple of days ago about the Gemini context window replacing RAG. However, we don't know how DeepMind implemented the new context window, specifically how costly it is. I personally hope it's something like a split, overlapping series of smaller context windows, but don't quote me on this; I have no idea if that is technically possible.

    @user-bd8jb7ln5g • 2 months ago
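The commenter's guess about a "split, overlapping series of smaller context windows" can be sketched as plain overlapping chunking, a standard RAG preprocessing step. To be clear, this illustrates the idea only; how Gemini 1.5 Pro's context window is actually implemented is not public, and the window sizes below are arbitrary:

```python
def overlapping_windows(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Slide a window of `size` tokens forward by `size - overlap` each step.
    Assumes overlap < size."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = [f"t{i}" for i in range(10)]
for w in overlapping_windows(tokens, size=4, overlap=2):
    print(w)
```

Each window shares its last `overlap` tokens with the next window's start, so no boundary-spanning sentence is ever cut off from both sides, which is the usual motivation for overlap in chunking.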
  • Am I correct in assuming that RAG is necessary for fetching information that isn't in context? An increased context window would then be greatly beneficial to a RAG model, because you can create larger documents that can be pulled into context for generating appropriate answers. I don't see how it's an either/or between in-context and RAG.

    @deino4753 • 2 months ago
  • In my testing of Gemini Pro, it doesn't behave at all like it has a huge context window; it 'forgets' details constantly. Curiously, Gemini behaves very much like my personal experiments with auto-summarization and RAG...

    @googleyoutubechannel8554 • 2 months ago
    • Gemini _1.5_ Pro is the model with the big context window, and it isn't generally available yet.

      @somdudewillson • 2 months ago
    • @somdudewillson Correct.

      @googleyoutubechannel8554 • 2 months ago
  • I trust this man because of his beard

    @knthyl • 2 months ago
  • I literally just had this conversation with my data engineer. What happens when we hit 1-billion and even 1-trillion-token context windows? Unstructured data just gets ingested, structured, and retrieved in real time at nearly unlimited rates.

    @brentmoreno3773 • 2 months ago
    • Actually, the longer the context window, the more computation the model needs to perform, so it's really not computationally feasible. Then again, with bigger compute power nothing is impossible.

      @XOXO-dv5vv • 2 months ago
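The computational cost mentioned in the reply above can be made concrete: standard self-attention compares every token with every other token, so the pairwise score matrix grows with the square of the context length. A back-of-the-envelope sketch (counts only the score entries; real models add heads, layers, and optimizations that change the constants):

```python
def attention_pairs(n_tokens: int) -> int:
    """Number of query-key score entries in one full self-attention pass (n^2)."""
    return n_tokens * n_tokens

for n in (8_192, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} tokens -> {attention_pairs(n):,} pairwise scores")
```

Going from an 8K window to a 1M window multiplies the score matrix by roughly 15,000x, which is why naive scaling to billion-token windows is considered infeasible without algorithmic changes.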
  • Share the method of using artificial intelligence to count cards using the High-Low system and win at blackjack in a casino.

    @GrigoriyMa • 2 months ago
  • So for now this video is essentially RAG versus Riches?

    @drlordbasil • a month ago
  • COST and SPEED, that is the QUESTION.

    @yurijmikhassiak7342 • 2 months ago
    • Costs will continue to decrease just as speed will increase, assuming all trend lines remain constant.

      @brentmoreno3773 • 2 months ago
  • Thanks!

    @kamelirzouni4730 • 2 months ago
  • Nice technology, yet Google tortured Gemma into near idiocy with their woke stuff. We need a better fine-tune to be able to appreciate that LLM.

    @user-uc2qy1ff2z • 2 months ago