In this video, we will look at all the exciting updates to the LocalGPT project, which lets you chat with your documents. The new updates include support for GGUF-format models via llama.cpp, a better prompt template that uses the Llama-2 chat format to restrict answers to the given context, and a lot more!
If you like the repo, don't forget to give it a ⭐
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
▬▬▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Support my work on Patreon: Patreon.com/PromptEngineering
🦾 Discord: / discord
▶️️ Subscribe: www.youtube.com/@engineerprom...
📧 Business Contact: engineerprompt@gmail.com
💼Consulting: calendly.com/engineerprompt/c...
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
LINKS:
Github Link: github.com/PromtEngineer/loca...
LocalGPT- Detailed walkthrough: • LocalGPT: OFFLINE CHAT...
LocalGPT with Llama2: • Llama-2 with LocalGPT:...
LocalGPT with Memory: • LocalGPT API: Build Po...
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Timestamps:
[00:00] Intro
[00:54] LlamaCpp with GPU
[04:54] GPU VRAM Required for LLMs
[06:33] Which LLMs are supported
[07:00] Adding Documents to Vector Store
[08:08] Chatting with Documents
[11:25] Limit Answers to the given context
[14:18] Where are the models downloaded?
[15:23] Define Context Window
[16:00] Change N_GPU_layers
[16:22] Change the Embedding model
[16:50] Change the LLM
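The last few chapters cover editing settings in localGPT's `constants.py`. As a rough sketch of those knobs (the model names below are examples, not necessarily the repo's current defaults):

```python
# Sketch of the settings discussed in the chapters above, as they appear in
# localGPT's constants.py (values here are illustrative examples).
CONTEXT_WINDOW_SIZE = 4096  # how many tokens the model can attend to
N_GPU_LAYERS = 100          # how many layers llama.cpp offloads to the GPU

EMBEDDING_MODEL_NAME = "hkunlp/instructor-large"

# Swap the LLM by pointing at a different Hugging Face repo and file.
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
```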
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...
Want to connect?
💼 Consulting: calendly.com/engineerprompt/consulting-call
🦾 Discord: discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Join Patreon: Patreon.com/PromptEngineering
▶ Subscribe: www.youtube.com/@engineerprompt?sub_confirmation=1
It'd be great if you could create a step-by-step series for all of this aimed at complete novices such as myself, starting from the very beginning and assuming no prior knowledge or other supporting software already installed (e.g. git or conda).
I support this request. Thank you
There are many tutorials on how to get git, conda, and Python set up.
Excellent work. Thank you for the walkthrough, especially on the Mac/Linux side. The modular approach will make it much easier to maintain moving forward.
Glad you found it helpful. I agree, that’s the plan
Great update. I appreciate the video format for the updates like this. Thank you.
Thank you 🙏
thank you for sharing, have a great day 🙂
Thank you very much for this project, it's such a pleasure to use it !
Glad you like it!
Congrats on the great content, it is flawless! 🎉 Are we able to connect the project with Slack and build a chatbot?
Great new changes. I have customized a lot of the core scripts myself, but maybe you could add an example that just shows how to access the persisted database standalone? I was trying to run some data visualization on my Chroma DB, but I'm running into issues understanding how to access it on its own.
Hello, thank you for this video and the brilliant work. How is it possible to force the model to respond in a language other than English, in my case French? The model I use, Mistral 7B, already knows how to respond in French, and yet most answers are in English.
Please share: what is the device_type for an Intel Iris GPU?
Hi, I like this project and want to try it on my notebook. However, it has only 6 GB of VRAM. Should I use the OpenAI API key instead? Being a layman in machine learning and Python, I would like to know the VRAM requirements for embeddings. If there is no need for VRAM, I may try it on an old notebook without a dedicated GPU. Thanks a lot. :)
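On the VRAM question (also covered at [04:54] in the video), a rough weight-only estimate is parameters × bits ÷ 8; real usage adds KV cache and runtime overhead on top. A back-of-envelope sketch:

```python
def approx_vram_gb(params_billion: float, bits: int) -> float:
    """Weight-only memory estimate in GB; actual usage adds KV cache and overhead."""
    # params * (bits / 8) bytes; billions of parameters map directly to GB.
    return params_billion * bits / 8

# A 4-bit quantized 7B model needs roughly 3.5 GB for weights alone,
# so a 6 GB card can be enough for small quantized models.
print(approx_vram_gb(7, 4))
```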
Great! Thanks for sharing! Does LocalGPT support code-tuned LLMs such as Codellama?
No reason it shouldn't, as long as you're using the right format (GGUF, GGML, GPTQ); no clue about others like gpt4.
Yes
This is a very good project. How do we do fine-tuning using quantization?
Great, great, great!
Thanks for sharing this. When is support for Falcon expected?
You should be able to run GGUF and GGML models, including Falcon, even now.
For installing llama.cpp on Windows, this worked for me:
setx CMAKE_ARGS "-DLLAMA_CUBLAS=on"
setx FORCE_CMAKE 1
pip install llama-cpp-python==0.1.83 --no-cache-dir
Also, if your computer defaults to using the CPU, use --device_type cuda on Windows. Even with all that, it kicks me out with BLAS=0.
Same here... I haven't investigated why yet.
@@kevinfutero7166 Any idea why it's not using the GPU?
How can I clear the cache from the last time I ran the model? I swapped all the docs with a new set of documents, and my localGPT model keeps giving me answers from the last set of docs, which are no longer relevant for this version.
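localGPT persists the embeddings in a Chroma index on disk (a `DB/` folder by default, via `PERSIST_DIRECTORY` in `constants.py`; the path below is an assumption, check your repo). One way to drop the stale index before re-ingesting, as a minimal sketch:

```python
import shutil
from pathlib import Path

def reset_vector_store(persist_dir: str = "DB") -> None:
    """Delete the old Chroma index so stale documents stop showing up.

    After calling this, re-run `python ingest.py` to index the new documents.
    """
    path = Path(persist_dir)
    if path.exists():
        # Remove the whole persisted index directory.
        shutil.rmtree(path)
```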
What are the factors to consider when choosing a cloud server for this project? Say I am using a Llama-2-7B-Chat-GGUF model, which instance is best? How much GPU memory is required?
Do we need a ton of video RAM for localGPT? Doesn't look like a Lenovo P51 can cope...
Can I use this with PDFs and docs containing complex tables and images?
This is exactly what I have been looking for for quite some time. I am just wondering if I can use it for generating code in a specific structure: ingest documents as .py or .java files, use one of the code-generation models, and have it generate code in that structure as well as spot a snippet of code that implements a particular functionality?
This is more of a search feature. Basically, it will be looking for specific information in the documents. You could use it to retrieve a certain function or code snippet, but then you would need a subsequent LLM call to use it.
Can this be used with dual K80 GPUs?
Thank you!
Thank you 🙏
What if I use Ubuntu with a CPU? Is there any change? I am struggling with llama-cpp.
Would love the code walk through!
Can it produce structured articles from prompts?
Thanks, this video was extremely helpful! Could you make a video on how to use a language other than English with localGPT? For example, if I feed it documents in Swedish, can I ask questions in Swedish and also get answers in Swedish? Is that possible?
Should be the same, just use the multilingual embedding models.
Yes, as mentioned above, you need an embedding model that supports the language you are working with, as well as an LLM that does.
@@engineerprompt Do you know which LLM from TheBloke (or on Hugging Face) provides answers in Spanish? My docs are in Spanish. I tried some models called Falcon, RoBERTa, BERT, but they are not compatible with localGPT. Thanks in advance. Amazing project!
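For non-English documents, one common pairing is a multilingual embedding model (e.g. `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`, which covers Spanish, French, and Swedish) plus an explicit language instruction in the prompt. The helper below is a made-up illustration of the prompt-side nudge, not part of localGPT:

```python
# Hypothetical helper: prepend an explicit language instruction so the LLM
# answers in the document's language instead of defaulting to English.
def localize_query(question: str, language: str = "Spanish") -> str:
    return f"Answer only in {language}.\n\nQuestion: {question}"

# Example: ask about a Spanish document and request a Spanish answer.
prompt = localize_query("¿De qué trata el documento?")
print(prompt)
```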
Hi, great video! You said that if the "BLAS" variable is set to 1, llama.cpp uses my GPU, and if it is set to 0, it does not. I have an M1 Mac and I want to run it on the CPU, which I specify with --device_type cpu. However, BLAS is still set to 1. Can someone explain? The Llama-2 7B Chat model also takes forever to answer, and if I load other models there is no answer at all.
BLAS=1 means that llama.cpp is able to see your GPU. If you explicitly set device_type to cpu, then the code will use the CPU. That might explain why it's running so slow. How much RAM do you have on your system, and what quantization level are you using?
How well does it work on Excel or CSV files? Overall great info, and thanks for sharing an update.
This setup will work with CSV and Excel files, but you will need to experiment with embeddings and models for better performance.
With regard to the splitting operation, I want to split on paragraphs, not random chunks. Can localGPT accommodate that out of the box, or would I need to hack the source code? Could I achieve my result with a decorator? Thx
It’s using the recursive character text splitter which uses paragraphs for splitting. So I think it will work for your use case
Amazing work! It would be really awesome if you could give it a GUI and a 1-click installer, like other tools already have, such as GPT4All or Subtitle Edit, for Mac, Windows, and Linux. This would extend the user base dramatically and get you more coffees ;)
Thanks for the idea! Will see what I can put together.
@@engineerprompt Seconding this request. I've spent the last 18 hours installing (and uninstalling and reinstalling and then uninstalling and reinstalling again) literally hundreds of gigabytes of CUDA and Visual Studio nonsense trying to get this thing to work.
@@engineerprompt I agree, it needs a GUI. No one prefers a command prompt over a GUI.
@@BabylonBaller There are two UI options now. One is via the API, and the other is a dedicated UI via Streamlit. I am working on a Gradio one that will make things much easier.
@@engineerprompt Much appreciated, will look into Streamlit.
Excellent updates. Now can you write it in Node.js?
I don't have experience with Node.js, but hopefully someone can implement it.
Hey, I have an M1 with 8 GB of RAM. Can I run a 4-bit quantized model?
If the PDF has tables in it, they are not extracted in the same format as the original. What is the best way to ingest a PDF with tables that have a lot of missing values?
Look at the unstructured loader for working with PDFs.
@@engineerprompt Tried it. Row/column alignments are mismatched after loading. Because of that, the LLM gives incorrect responses when asked n×m table-lookup questions.
Can localGPT be installed on Windows 11? Very exciting video.
How is localGPT different from the Quivr project?
I'm still getting answers from the LLM's training corpus when I ask about things outside the source documents.
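The video's "Limit Answers to the given context" section addresses exactly this via a stricter Llama-2-style prompt template. A rough sketch of the idea (the exact strings in the repo differ; these are illustrative):

```python
# A system prompt that restricts the model to the retrieved context,
# wrapped in the Llama-2 chat template ([INST] ... <<SYS>> ... [/INST]).
SYSTEM_PROMPT = (
    "Use ONLY the following context to answer the question. "
    "If the answer is not in the context, say you don't know; "
    "do not use your own knowledge."
)

def build_prompt(context: str, question: str) -> str:
    return (
        f"[INST] <<SYS>>\n{SYSTEM_PROMPT}\n<</SYS>>\n\n"
        f"Context: {context}\n\nQuestion: {question} [/INST]"
    )
```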
I'd like to have a cup of coffee with you. Great.
Can you release the code you are showing? Every example I am finding uses OpenAI. I made several changes locally to localGPT, but I would like to see this spin on it.
Code is in the localgpt repo
Oh ok, my bad, sorry. Busy day writing code; when I saw this video notification I thought it was new code. Sorry @@engineerprompt
Is anyone else struggling to run llama.cpp on Windows using cuBLAS? No matter what, BLAS is always 0 😐
If you have an NVIDIA GPU, my recommendation is to use GPTQ models.
@@engineerprompt GPTQ is giving nice inference speed on my NVIDIA 3070 Ti, but I am struggling to use ConversationBufferMemory with it. What options do we have for memory with GPTQ models?
I'm using the CPU and getting an error like "'int' object is not callable".
setx in the Visual Studio prompt gives a syntax error!
Just use "set VARIABLE_NAME=value" instead.
Dude, you gotta start putting relevant links in your videos.
Are you Indian, bro?
I think YES!
How can I use an Arabic LLM with this? How would I set that up, and what steps would I need to take? This is awesome!!