CODE-LLAMA For Talking to Code Base and Documentation
Learn how to chat with your code base using the power of Large Language Models and Langchain. In this video we will use CODE-Llama to talk to the GitHub repo of LangChain.
"🚀 Dive into Advanced Source Code Analysis with LLMs! | How GitHub Co-Pilot & Others Transform Coding 🧠 | Step-by-Step Guide on QA Pipeline & Innovative Splitting Strategies! 🔍 #CodeAnalysis #LLMmagic"
CONNECT:
☕ Buy me a Coffee: ko-fi.com/promptengineering
|🔴 Support my work on Patreon: Patreon.com/PromptEngineering
🦾 Discord: / discord
▶️️ Subscribe: www.youtube.com/@engineerprom...
📧 Business Contact: engineerprompt@gmail.com
💼Consulting: calendly.com/engineerprompt/c...
LINKS:
Code link: python.langchain.com/docs/use...
Timestamps:
00:00 Intro
01:22 Setup - LangChain
01:51 Doc loader and Text Splitter
03:03 Retriever and Vector Store
03:57 Code Llama with LlamaCpp
04:40 Where to Download Code-Llama From?
05:55 Setting up LlamaCpp
06:55 Running Code-llama
08:00 RAG with Code-llama
09:17 Wrong way of using Code-llama
10:40 Correct Template for Code-llama
12:43 GPU speed to expect with Llama-Cpp
It's one of the rare channels where I've been happy to see a new video notification for a long time. I'm waiting for the next time to come. I feel so lucky that you post videos very often and tell us the latest information perfectly. We don't want to lose you, keep it up bro
This is really great!
Thank you 🙏
Amazing mate!
Thank you! Cheers!
Nice video! I had a question. What would be the best and fastest small model to use on an old cpu? Like 6th generation i5 for example. I heard of orcamini3b and falcon1b what do you think?
First second I hear Indian accent = Imma'bout to get pure useful information from this video, this is for sure. Thx for the video though 1 thing - when you talk about "previous video" at the beginning of the video, I'd consider to provide this "leading link" that shows in the corner because I'm not familiar and have to dig into your feed to find it. Not a great issue but still - just a tip to consider.
Thank you for your videos. I would appreciate if you publish the notebook
Thanks for the Amazing tutorial, Not able to get BLAS=1, created a ticket for the same. Looking forward to hearing from you
谢谢!
Thank you 🙏
What's the difference between those two prompts you defined: sys prompt way and the simple one?
Thanks brother
Can you share the notebook link?
Hi Prompt Engineer. Amazing stuff as always. I just have a question about embedding in the Chroma DB in general. Where can i find the different Embedding models and how can i speed the embedding process up. Because for lets say 100 documents i think the app would crash. Doesnt seems scalable. Or am i getting something wrong here?
you can use something like this : embeddings = SentenceTransformerEmbeddings( model_name= "sentence-transformers/all-MiniLM-l6-v2", model_kwargs= {'device':'cuda'}, encode_kwargs= {'normalize_embeddings': False}) Chroma.from_documents( chunks, embeddings)
Are the models from hugging face the same as those released by Meta?
can we run llama2 model in windows machine with gpu with own documents. let me suggest which video will give information about this. mostly i am getting error while downloading data from meta llm using hugging face api.
Please do a video to setup GPU on a windows machine. I've been trying for days to use my GPU and can't get it to work. I've installed CMAKE, CUDA developer tools and etc. Nothing works. I have two GPUs on my laptop.
When I'm trying with your template that you describe in the video. I'm getting strange answer when I print the output from: # Docs question = "{Current Question}" docs = retriever.get_relevant_documents(question) print(docs) [Document(page_content='ans = qa.run(\'Based on the stg_jaffle_shop.yml file generate sample csv data for each table with minimum 100 rows. The sample data should be with foreign key constraint.\') # \'Write a test case for the database connection using unittest.\' #\'Write a test case for the code in connection.py using unittest.\' #\'Based on the .yml files generate sample csv data for jaffle_shop_customers table with 100 rows\' I see that they are in comments ... but the answer from model is the following..... "This request is not clear to me. Please provide more details so we can help you better. Do you want us to generate sample CSV files or do you want to create a test case using unittest? " it looks like the LLM model doesn't recognize the right question.
Thank you for the video. I get 0 documents loaded when integrating the code to my Python script. does this work on Windows?
How do I ask questions that hit the same file, for example if I want to know how many different constructors a certain class has? I tried everthing and it always came back to me with an inferior number to the actual ones
Can I use GGUF model my gpu doesnt have enough VRAM
thanks! if we do not want to use OpenAI embedding, what open source one you recommend for python code in this case? FAISS will be fine?
For embeddings, I like to use the instructors embeddings. FAISS will be fine as well.
Thanks! @@engineerprompt
How we can restrict the answers to the given context? No matter one what codebase I am creating embedding, the model LLM is given responses based on it pretrained data
You will need to provide a system prompt. Check out my latest video on localgpt. There is an example prompt
How to generate code documentation from codellama? The input will be a class or method in java.
Here also openAPI key is required. Is there anyway to access hugging face models without API key or with some free Key?
Yes, should have highlighted more, the openai api key is required for the embedding model, you can replace his by any other open source embedding model and this should work
Hi, thanks for your response, I have just started learning the long-chain and for the time being, I am not able to purchase chatPGPT for API key but I want to develop some apps using hugging face models to develop the apps, like 'Ask the doc' to read PDF and answer the questions. I will buy chatGPT if my app goes through the testing. Best Regards,
Hi,thanks for the tutorial but I am facing a RateLimitError while executing this particular line of code :"db = Chroma.from_documents(texts, OpenAIEmbeddings(disallowed_special=()))" for both Code-LLAMA and GPT-4 LLM .How can I solve this error?
use a different embeddings, like embeddings = SentenceTransformerEmbeddings( model_name= "sentence-transformers/all-MiniLM-l6-v2", model_kwargs= {'device':'cuda'}, encode_kwargs= {'normalize_embeddings': False})
Can you please give an example that loads 34b instead of 13b? For some reason, I can't get it to work with 34b,
I will have a look at it today, are you trying gguf format?
i did not see the notebook attachment. will you provide it? is this applicable to link to ui?
The link to documentation is in the description where the code is, should work with UI as well
Is it possible to use your localGPT project with Code-LLAMA (and ingest code to ChromaDB)?
Yes but you will need to change the document loader part in the code. Will also need to make changes to the splitter part, the rest will work just fine.
Could you do a video about Cursor? So far to me seems very useful and better than copilot
what is differrence form this and copilot
@@parasetamol6261 easier and more integrated compared to tabnine/copilot
Do we have to use LlamaCpp for RAG?
Not really, it’s just for using the ggml/gguf models
@@engineerprompt Hm yeah I couldn't get RAG to work properly with my org's source code. Even tried to use it with LlamaCpp and the llm = pipeline() -> llm("my prompt") method
Can I use codellama-7b-hf model with langchain
Is it possible to run this on Google colab?
Yes, you probably want to use the 7B model
all examples are on python code base, trying to make it work for Csharp code base but it doesnt work
Can you list system specs, rather than just stating that you used a GPU?
I have M2 Max 96GB
Nice vid, but you can't just use any generic embeddings model, you'd need something that works well for code and that's tough. OpenAI is probably one of the best still. Also anyone who uses GPT4 for code and tries 13B CodeLlama, or 34B even, is going to be sad. It's as bad as GPT3.5, mostly worse. Kind of useless for any higher level reasoning. Also if you're using it for Python why not use the Python-specific CodeLlama model that was optimized exactly for Python use?
Add collaboration file please
Can u share the codelab code ?
Yes? Will put together it in a colab and share
@@engineerprompt where can i find your github repo Sir?
Why are you saying "GPU" in relation to M2? It's arm CPU and llama.cpp project just uses very bespoke optimizations for that CPU, but not the GPU
M2 has integrated GPU.
@@erikjohnson9112 and the whole thing is having power supply and LCD screen. But what does it have to do with inference?
Again very useful video, Thank you very much!! I'm new in this area and it's not clear .... I have a concern about embedding function/model - if you use OpenAIEmbeddings should I worry about some privacy concerns in cases involving sensitive data. I have already tried with open source model embedding like HuggingFaceInstructEmbeddings model_name = "hkunlp/instructor-large" but when I try to load it into Chroma gives me the following error message: chromadb.errors.InvalidDimensionException: Embedding dimension 384 does not match collection dimensionality 1536
Yes, you are sharing your data with openai. I regards to that error, simply delete your vector store and then rerun the embedding computation, this should work. Basically you have existing vector store where embeddings have different dimensions. You want to recompute everything from scratch
It works - Thank you a lot again!! You save me from headache!!@@engineerprompt
My comments get deleted not sure what is happing
hello
I actually wanted to talk to my code to ask why it's so bad.
docs = retriever.get_relevant_documents(question), not sure what is this "retriever". getting NameError: name 'retriever' is not defined
had to include previous video code to make it work. Thanks
This is very insightful tutorial I have applied this but I have doubt how you have created retriever object for Llama? retriever.get_relevant_documents(questions)