Retrieval-Augmented Generation chatbot, part 1: LangChain, Hugging Face, FAISS, AWS

2023 ж. 23 Қаз.

19 506 Рет қаралды

In this video, I'll guide you through the process of creating a Retrieval-Augmented Generation (RAG) chatbot using open-source tools and AWS services, such as LangChain, Hugging Face, FAISS, Amazon SageMaker, and Amazon TextTract.
Part 2: • Retrieval-Augmented Ge... - scaling indexing and search with Amazon OpenSearch Serverless!
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.substack.com. ⭐️⭐️⭐️
We begin by working with PDF files in the Energy domain. Our first step involves leveraging Amazon TextTract to extract valuable information from these PDFs. Following the extraction, we break down the text into smaller, more manageable chunks. These chunks are then enriched using a Hugging Face feature extraction model before being organized and stored within a FAISS index for efficient retrieval.
To ensure a seamless workflow, we employ LangChain to orchestrate the entire process. With LangChain as our backbone, we query a Mistral Large Language Model (LLM) deployed on Amazon SageMaker. These queries include semantically relevant context retrieved from our FAISS index, enabling our chatbot to provide accurate and context-aware responses.
- Notebook: gitlab.com/juliensimon/huggin...
- LangChain: www.langchain.com/
- FAISS: github.com/facebookresearch/f...
- Embedding leaderboard: huggingface.co/spaces/mteb/le...
- Embedding model: huggingface.co/BAAI/bge-small...
- LLM: huggingface.co/mistralai/Mist...

Пікірлер

always making great and timely videos.
@jacehua73347 ай бұрын
- Glad you like them!
  @juliensimonfr7 ай бұрын
The RAG chatbot you demonstrate is an excellent lesson with HuggingFaceEmbeddings. Regarding how to do it outside GPT being generic enough to have your own vectorDB on demand for any model I had wondered how that was done. Thanks for covering this really great stuff!
@AaronWacker4 ай бұрын
- Glad it was helpful!
  @juliensimonfr4 ай бұрын
Thank you for your lectures.
@caiyu5387 ай бұрын
- You are very welcome
  @juliensimonfr7 ай бұрын
Hey Julien, great job with the video. For QnA on corpus I'd recommend to generate hypothetical questions for each paragraph & ingesting them as well since those would have better similarity to the user input which is usually a question & can also help constrain the model to answer only closed domain questions.
@iAkashPaul7 ай бұрын
- Yes, that's a nice trick. I tried to keep things simple here ;)
  @juliensimonfr7 ай бұрын
thanks julien, one can learn so much from these!
@justwest6 ай бұрын
- That's the idea 😀
  @juliensimonfr6 ай бұрын
Hey Julien, Thanks for an insightful talk last night at the AWS center!
@devilliersduplessis79046 ай бұрын
- You're welcome. Thanks for coming!
  @juliensimonfr6 ай бұрын
Thank you, gonna check it out tomorrow!
@DCTekkieАй бұрын
- Have fun!
  @juliensimonfrАй бұрын
Thanks a lot! It was very, very helpful.
@ComFomeTo5 ай бұрын
- You're welcome.
  @juliensimonfr4 ай бұрын
Thanks for this clear explanation.
@kuzeyiyidiker134425 күн бұрын
- Glad it was helpful!
  @juliensimonfr24 күн бұрын
Thanks for the video!
@badbaboye17 күн бұрын
- You're welcome!
  @juliensimonfr16 күн бұрын
Hi Julien, thanks for your video, pretty clear explained ;-)
@edinsonriveraaedo2924 ай бұрын
- Glad it was helpful!
  @juliensimonfr4 ай бұрын
Superr video, Thanks for trying using open source solutions...
@VenkatesanVenkat-fd4hg7 ай бұрын
- Glad you liked it
  @juliensimonfr7 ай бұрын
Hi Julien, Thank you for this video. It's helping me learn a lot. I was trying to run the code. When I attempt the zero shot example, my output is quite different from whats shown in the video. I tried to split it, but I get something like this - [answers: * 1) The trend is to invest more in solar energy in China. * 2) The trend is to invest less in solar energy in China. * 3) The trend is to invest the same amount of money in solar energy in China. * 4) The trend is to invest more in solar energy in the United States. * 5) The trend is to invest less in solar energy in the United States. ] Can you please explain why this is happening and how it can be fixed?
@aishwaryakumar65045 ай бұрын
Thanks Julien! very nice video. very curious if there are some compare between bge-small with ada-002 when used in RAG.
@jingqiwu28656 ай бұрын
- Hi, please check our embeddings leaderboard at huggingface.co/spaces/mteb/leaderboard. ada-002 is #15, bge-small is #8 :)
  @juliensimonfr6 ай бұрын
Thanks Julien, for the good tutorial! Some use pinecone, do you see differences/advantages of using faiss over pinecone? Thank you
@GeigenAkademie6 ай бұрын
- FAISS is a simple lightweight open source solution. Pinecone is a fully managed, closed source DB running in the cloud. Depends what you're looking for, and how much work you want to do on managing the solution :)
  @juliensimonfr4 ай бұрын
Thanks!
@ccc_ccc7892 ай бұрын
- You bet!
  @juliensimonfr2 ай бұрын
Sagemaker with langchain streaming option is generating output
@anserali5512 ай бұрын
Thank you Julien - this is super useful and comes at the right time during my writing season (you know what I'm talking about :-) ) As someone else mentioned in the comment, I also received an error when calling Textract. I solved it by adding `pip install amazon-textract-textractor -qU` - hope it might help others
@SebastienStormacq6 ай бұрын
- Ok, good to know. Thanks Seb and good luck with the writing ;)
  @juliensimonfr6 ай бұрын
- also 'pip install pip install faiss-cpu' :-)
  @SebastienStormacq6 ай бұрын
thanks it was really informative, can do demonstrate fine tuning LLM's with lora and Qlora? In your experience, RAG has better performer over fine tuning ?
@krishnasunder949119 күн бұрын
- Llama2 fine-tuning with Qlora: kzhead.info/sun/jcmvZqpoi2OCZpE/bejne.html. IMHO RAG and fine-tuning solve different problems and are complementary. RAG lets you access fresh company data and gives you some domain adaptation. Fine-tuning gives you better domain adaptation and lets you customize guardrails and tone of voice.
  @juliensimonfr17 күн бұрын
It throws : KeyError: 'Blocks' after running the cell with boto3.client('textrac') thrown by the loader.load(), from parser in langchain
@coolcurly97365 ай бұрын
godlike
@whemmakatatt53116 ай бұрын
Thanks for the tutorial.In my case,i can't use Mistral somehow due to some restrictions on AWS test account.I have used FLAN-T5 but it is giving this error.ValueError: Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (422) from primary with message "Failed to deserialize the JSON body into the target type: missing field `inputs` at line 1 column 503".
@Invincible6154 ай бұрын
- The input format for T5 is quite different, so sending a Mistral-formatted message won't work. Not sure what restriction you're facing, but maybe TinyLlama would work? I think you would only have to adapt the prompting format in the content handler.
  @juliensimonfr4 ай бұрын
I want to embed large data. In this case, if I want to embed document without a GPU notebook ml.t3.medium, is it possible to deploy the embedding model as well in some ml.g5.large GPU instance to make the processing faster?
@AbhisekgevАй бұрын
- Sure, it's what you would do for production.
  @juliensimonfrАй бұрын
Great video, But what if user question is related to chat history and it may contain short cuts like he/she/that/it etc then how to handle such cases
@Thirumalesh1003 ай бұрын
- Langchain has different ways to handle this, e.g. python.langchain.com/docs/modules/memory/types/buffer
  @juliensimonfr3 ай бұрын
- Thanks@@juliensimonfr , Basically it is question rephrase request by passing entire chart history, tried this approach which has cost and token limit problem Looking for other alternative for the same
  @Thirumalesh1003 ай бұрын
Hi Julien. The code is not working when I try to run it. I think the error I am getting is related to Sagemaker credentials. I made an account just now but don't know where to get information where I can plug into your code to make this work.
@kevinngo37226 ай бұрын
- Start here: docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html. Create a notebook instance and make sure its IAM role includes the SageMakerFullAccess and TexttractFullAccess managed policies. Once you've done that, the notebook will run as is.
  @juliensimonfr6 ай бұрын
- Thanks for your reply! It seems that this leads me to make a Jupyter notebook. How do I integrate this to do what you're showing on Colab in the tutorial?@@juliensimonfr
  @kevinngo37226 ай бұрын
can you tell how to get the key for sagemaker to work here?
@rnronie38Ай бұрын
- Not sure what you mean. Are you looking for a SageMaker tutorial ? See docs.aws.amazon.com/sagemaker/latest/dg/gs.html
  @juliensimonfrАй бұрын
Thx for the video :) can you update your vector database by a few lines ( if you want to add data to your knowledge base) automatically by running a python script or something like that?
@da-bb2up6 ай бұрын
- Sure, you can keep adding embeddings anytime you want.
  @juliensimonfr6 ай бұрын
- oh thats nice :) thx for the answer@@juliensimonfr
  @da-bb2up6 ай бұрын
how can I call onto my react frontend?
@rnronie38Ай бұрын
- A SageMaker endpoint is an HTTPS API, so you can plug it in anything. You should be able to find lots of examples out there.
  @juliensimonfrАй бұрын
Y r u deploying first in sage maker
@debojitmandal86705 ай бұрын
- Because I don't want to manage any infrastructure :)
  @juliensimonfr4 ай бұрын