All You Need To Know About Running LLMs Locally

22 May 2024
94,694 views

RTX4080 SUPER giveaway!
Sign-up for NVIDIA's GTC2024: nvda.ws/48s4tmc
Giveaway participation link: forms.gle/2w5fQoMjjNfXSRqf7
Please read all the rules & steps carefully!!
1. Sign-up for NVIDIA's Virtual GTC2024 session between Mar 18 - 21st
2. Participate in the giveaway DURING Mar 18 - 21st
3. ???
4. Profit
TensorRT LLM
[Code] github.com/NVIDIA/TensorRT-LLM
[Getting Started Blog] nvda.ws/3O7f8up
[Dev Blog] nvda.ws/490uadi
Chat with RTX
[Download] nvda.ws/3OHPRHE
[Blog] nvda.ws/3whKZTb
Links:
[Oobabooga] github.com/oobabooga/text-gen...
[SillyTavern] github.com/SillyTavern/SillyT...
[LM Studio] lmstudio.ai/
[Axolotl] github.com/OpenAccess-AI-Coll...
[Llama Factory] github.com/hiyouga/LLaMA-Factory
[HuggingFace] huggingface.co/models
[AWQ] github.com/mit-han-lab/llm-awq
[ExLlamav2] github.com/turboderp/exllamav2
[GGUF] github.com/ggerganov/ggml/blo...
[GPTQ] github.com/IST-DASLab/gptq
[LlamaCpp] github.com/ggerganov/llama.cpp
[vllm] github.com/vllm-project/vllm
[TensorRT LLM] github.com/NVIDIA/TensorRT-LLM
[Chat with RTX] www.nvidia.com/en-us/ai-on-rt...
[LlamaIndex] github.com/run-llama/llama_index
[Continue.dev] continue.dev/
Model recommendations:
[Nous-Hermes-llama-2-7b] huggingface.co/NousResearch/N...
[Openchat-3.5-0106] huggingface.co/openchat/openc...
[SOLAR-10.7B-Instruct-v1.0] huggingface.co/upstage/SOLAR-...
[Google Gemma] huggingface.co/google/gemma-7b
[Mixtral-8x7B-Instruct-v0.1] huggingface.co/mistralai/Mixt...
[Deepseek-coder-33b-instruct] huggingface.co/deepseek-ai/de...
[Madlad-400] huggingface.co/jbochi/madlad4...
[Colbertv2.0] huggingface.co/colbert-ir/col...
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] massobeats - magic carousel
[Profile & Banner Art] / pygm7
[Video Editor] maikadihaika

Comments
  • stay up-to-date on the latest AI research with my newsletter! → mail.bycloud.ai/ Minor correction: GGUF is not the predecessor to GGML, GGUF is the successor to GGML. (thanks to danielmadstv)

    @bycloudAI · 15 days ago
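A quick way to check the correction in the pinned comment in practice: GGUF files start with the 4-byte magic `b"GGUF"`, which distinguishes them from older GGML files. A minimal sketch (the path you pass in would be your own downloaded model file, not anything from the video):

```python
# GGUF (the successor format) begins with the magic bytes b"GGUF";
# older GGML-family files do not. Handy sanity check before loading.
def looks_like_gguf(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```
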
    • Please make a step-by-step guide on how to install, for example, Mistral-7B locally and privately. I'm trying to do this with multiple guides and every time I get stuck at something.

      @Sebastian-oz1lj · a day ago
  • I hoooonestly don't know how to feel about the thumbnails looking so similar to you-know-who's that I accidentally clicked this video, but meh... One's gotta do what one's gotta do, I guess.

    @noobicorn_gamer · 2 months ago
    • Same

      @EdissonReinozo · 2 months ago
    • I don't know who, who?

      @Deathington. · 2 months ago
    • @@Deathington. Fireship

      @nathanfrandon2798 · 2 months ago
    • Bycloud removed the frame and the grid background on his thumbnails, I think those work great as his signature style. I hope he keeps them

      @NIkolla13 · 2 months ago
    • Let's just hope he doesn't get _burned~_

      @seanrodrigues8184 · 2 months ago
  • Thanks for the video! Minor correction: GGUF is not the predecessor to GGML, GGUF is the successor to GGML.

    @danielmadstv · 2 months ago
  • The amount of info you give both in the videos and the descriptions is insane, dude! Keep up the good work!

    @ambinintsoahasina · 2 months ago
  • Stop using Fireship thumbnails😭

    @christianremboldt1557 · 2 months ago
  • A thousand thanks! Finding a good LLM model was a complete nightmare for me, plus it is difficult to figure out which formats are outdated and which are the new hot stuff.

    @RetroPolly · 2 months ago
  • You can also use ollama. It even runs on a raspberry pi 5 (although slow)

    @Leo_Aqua · 2 months ago
  • Poor Faraday nearly always gets overlooked when people talk about local LLMs, but it is without a doubt the easiest "install and run" solution. Unlike nearly all other options, it's near-impossible to mess something up, and the default settings out of the box are not sub-par.

    @flexoo7 · 2 months ago
    • How much is Faraday?

      @hablalabiblia · 2 months ago
    • @@hablalabiblia Like all the best things in life - it's free.

      @flexoo7 · 2 months ago
    • @@hablalabiblia It's free and very easy to use! It's really meant just for chatting, it's basically a Silly Tavern kind of app, just not with that many options but it has its own back end with a focus on GGML models. If you're looking to just run models through character cards I'd say, give it a go!

      @joure.v · 2 months ago
  • that was awesome, thanks for the concise information bycloud! 🔥

    @lunadelinte · 2 months ago
  • Immensely helpful video. I hope the future has tons of user-controlled, locally run LLMs in store for us!

    @papakamirneron2514 · 2 months ago
  • Thanks, this is great. Please make a comprehensive video on fine-tuning locally 101. Cheers

    @johnsarvosky533 · a month ago
  • Absolutely fantastic and informative video. Well done! I will say I feel like the information certainly speaks to the grip that OpenAI has, especially from a development standpoint, despite the whole video being about open-source models. The procedures, time, research, and money required for any rando or small (even mid size) business owners to integrate open-source and local AI without any practical knowledge about it is near impossible. OpenAI wraps up RAG, "fine-tuning", and memory nice and neat into Assistants which can be easily called via the API. It would be amazing to have a completely standardized system that allows for the same type of application, but geared towards the variety of open-source models out there. Some platforms like NatDev let you compare multiple models based on the same input. Being able to see how RAG and fine tuning affects different models, both open-source and non, from the same platform would be unreal.

    @trolik9113 · a month ago
  • I love your adhd-friendly edits cloudy.

    @H1kari_1 · 2 months ago
  • I was pretty sure this was a Fireship video, but the video is great and informative. Exactly what I was looking for.

    @robertmazurowski5974 · 27 days ago
  • Nice video! Can you do a video about fine tuning a model?

    @bossdaily5575 · 2 months ago
  • Your videos are way more fun than my algebra homework

    @juanantonionieblafigueroa377 · 2 months ago
  • But anyway, this video was very helpful, because no one else made it clear what the best front-end interfaces to install are. I kept trying to set one up myself to no avail, and gave up after a while of testing stuff in the command prompt.

    @NeostormXLMAX · 2 months ago
  • Very nice, tons of useful info. Thank you!

    @vladislava5237 · 2 months ago
  • Thank you. Very interesting. Is it possible to work with your own files in LM Studio? Or create your own LLM, or extend an LLM for your own use cases?

    @aketo8082 · 27 days ago
  • In regards to context, would LLM LoRAs help with that? Let's say I'm busy with a story-writer LLM, and the fantasy world I'm working with is as big as something like Middle-earth from LOTR. Would a LoRA help with that? Like if I train a LoRA on all our past chat history about the story, plus more text regarding the lore of places, the history of characters, and family trees. Taking that into consideration, would that assist in keeping the context low, so I don't need to keep a detailed summarized chat history? What would the requirements be for training such a LoRA, and what would the minimum text dataset be for coherent training?

    @kernsanders3973 · 2 months ago
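For readers new to the LoRA idea raised above: instead of updating a full weight matrix W during fine-tuning, LoRA trains two small matrices A (r×in) and B (out×r) and applies W' = W + (alpha/r)·(B·A). A tiny pure-Python illustration (toy numbers, not a real training setup; actual fine-tuning would use a library such as peft):

```python
# Toy LoRA merge: W' = W + (alpha / r) * (B @ A), with plain lists.
def matmul(X, Y):
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_merge(W, A, B, alpha):
    r = len(A)                      # LoRA rank = number of rows of A
    scale = alpha / r
    delta = matmul(B, A)            # (out x in) low-rank update
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]        # 2x2 base weight
A = [[1.0, 1.0]]                    # 1x2 (rank 1)
B = [[0.5], [0.5]]                  # 2x1
print(lora_merge(W, A, B, alpha=1.0))  # -> [[1.5, 0.5], [0.5, 1.5]]
```

The point for the context question: a LoRA bakes knowledge into the weights, so it can reduce how much lore you must restate in the prompt, but it does not extend the context window itself.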
  • Now we just need a cheap inference card with 128GB memory to run 70B models locally... Maybe we can hope for Qualcomm

    @Veptis · 2 months ago
    • I’d love to see AI inference accelerator cards with dual or quad channel DIMM slots.

      @cbuchner1 · 2 months ago
    • @@cbuchner1 Qualcomm AI 100 Ultra is using LPDDR5

      @Veptis · 2 months ago
    • groq is using something of the sort, an LPU. although only usable through an api. no consumer cards yet that i know of, but it shows the trend towards it

      @nyxilos9167 · 2 months ago
    • @@nyxilos9167 you can buy a single groq card right now. it costs 21k and has 230MB on board. So to run 70B models at fp16 you need like 572 cards.... which is several racks. 14+ million to buy and 30kW to power. It will run the model at 400 tok/s easily. You can buy a ready made 8x H100 box for maybe 350k and run that with like 8kW and it might be slower than the groq card. none of that are consumer solutions. The one I am hoping for is Qualcomm AI 100 Ultra. Which comes with 128GB LPDDR5 and 150W. They say it's for edge inference, but it would be perfect for workstation.

      @Veptis · 2 months ago
    • idk Qualcomm SoCs are for phones mostly... maybe iPhone 30 will have it XD

      @Vifnis · 2 months ago
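The memory figures traded in this thread come from a simple back-of-the-envelope rule: weight memory ≈ parameter count × bytes per parameter. A sketch of that estimate (it ignores KV cache and activation overhead, so treat the result as a lower bound):

```python
# Lower-bound weight memory for a model: params * bits / 8, in decimal GB.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{weight_gb(70, bits):.0f} GB")
# fp16 gives the ~140 GB figure behind "you need racks of 230MB cards"
```
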
  • Boy, Chat with RTX is my personal oracle from now on. Its RAG really indexes local documents without all the hallucination from previous tools.

    @Paulo-ut1li · 2 months ago
  • Thanks, this video is very funny and helpful!

    @rougeseventeen · 6 hours ago
  • The one thing I hope to see soon is offloading different layers to different GPUs. I have a 4090 mobile in my laptop and an RX 6800 in my eGPU. I do have 96GB of system memory in addition to these two 16GB cards, so I can do some fun stuff already.

    @magfal · 2 months ago
  • I've been ham-fisting my way through LLMs for over a year, just ramming squares into circles till it worked, since information's so sporadic. 100% checking out your other videos. Learned more in 5 minutes than in 4 hours of reading GitHub docs.

    @a.........1._.2..__..._.....__ · 2 months ago
  • With local models, are you able to get much longer responses, given that you have enough RAM and VRAM?

    @joseph-ianex · 2 months ago
  • LM Studio and Trinity 1.2 are my favorite non-GPT entities!

    @WINTERMUTE_AI · 2 months ago
  • Does the giveaway have country restrictions? I mean, maybe you can't send it overseas due to shipping costs or something else.

    @shoddits2156 · 2 months ago
    • That's a great question.

      @krzysztofmaliszewski2589 · 2 months ago
  • What about Ollama as a backend, what is your take on that? Thank you so much for the video, sending love from Switzerland

    @frazuppi4897 · 2 months ago
  • Just to clarify then: for inference, is memory speed more important (GDDR6 over GDDR5), but for fine-tuning, is having 2x the amount of GDDR5 better than the faster GDDR6?

    @u13e12 · 12 days ago
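The intuition behind this question: token generation is usually memory-bandwidth-bound, because producing each new token reads all the weights once. So bandwidth sets an upper bound on tokens/sec, while capacity decides whether the model fits at all. A rough sketch (illustrative numbers, not benchmarks):

```python
# Upper bound on generation speed: every token streams the full weights,
# so tokens/sec <= memory bandwidth / model size.
def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

# e.g. a ~4 GB 7B Q4 model on ~500 GB/s of VRAM bandwidth
print(round(max_tokens_per_sec(500, 4)))  # -> 125 (real speed is lower)
```
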
  • Where ollama?

    @pedrogorilla483 · 2 months ago
    • Agree, with the new Windows installer it's so easy for everyone to get local models

      @sZenji · 2 months ago
    • For a while it was only Mac-based, so it saw limited use with most AI folks who have Nvidia cards. If you're stuck on a Mac I hear it's really the better one for that.

      @4.0.4 · 2 months ago
    • Wow, now it supports Windows too? @@sZenji

      @zikwin · 2 months ago
    • I use it on my Raspberry Pi 5 to run LLMs, which is seriously cool, er, hot when working.

      @babbagebrassworks4278 · 2 months ago
  • Step 4 is clear, but how can I unlock step 3? I only see question marks. Do I have to do steps 1 and 2 to unlock what I have to do at step 3, or do I just need to gain more XP for the unlock? Maybe I just have to do step 4 twice to make up for the missing third step...

    @dustindustir521 · 2 months ago
  • Where do I upload the photo once GTC comes around?

    @christopheralvarez1090 · 2 months ago
  • I'm a noob when it comes to this. I've come across Ollama and started using it. Can I upload multiple things, texts, and possibly images, to Chat with RTX and create my own data? And will it be uncensored? What are some other good alternatives to Chat with RTX?

    @fxstation1329 · 2 months ago
  • I can finally start my side project to take over the world, thanks!

    @the_gobbo · 2 months ago
  • i run LM Studio and i think its great, good video my dude

    @squfucs · 2 months ago
  • How did you miss Faraday? Very easy to use and runs faster than LM Studio

    @bigglyguy8429 · 2 months ago
  • Hope this works better than the time I tried to download more RAM

    @valeriapadilla5860 · 2 months ago
  • Ollama + openwebui is the way to go. Same ui as ChatGPT, plenty of convenient functions. It's a no brainer.

    @plagiats · 2 months ago
  • What 3 models do you recommend with 24 GB VRAM? Preferably 21-22GB / 24GB in practical usage.

    @RedOneM · 2 months ago
    • huggingface lists models with their respective memory requirements. any 7b model will likely work very well and be under 21gb. you could also go with a bigger model but at a lower quantization. mistral models are among the most popular, open source, and very competitive.

      @nyxilos9167 · 2 months ago
  • Koboldcpp crying in the corner

    @fennecthechoosenone5189 · 2 months ago
  • EXL2 does support AMD GPUs. Turbo bought a couple just to make sure it runs with ROCm

    @Anthonyg5005 · 2 months ago
  • what happened to the newsletter ????

    @narpwa · 2 months ago
  • Please make a video on how to fine tune a model using local documents.

    @Paulo-ut1li · 2 months ago
  • I don't have a strong GPU; do you recommend any services that I can run models on?

    @user-ud9su7qn1x · 27 days ago
  • Curious headcount 🙋 How many of us watching these types of videos are not developers?

    @jawbone1218 · 2 months ago
  • I guess my machine is not good enough, a 2019 Intel iMac, because running any model locally usually lags way behind ChatGPT 3, Gemini, Perplexity, etc.

    @cristianionascu · 2 months ago
  • as a car content creator i approve this video

    @itisallaboutspeed · 2 months ago
  • Good to know.

    @user-js4mt5cd2t · 2 months ago
  • Basically to understand this video one should already know everything mentioned in this video by heart.

    @rotors_taker_0h · 2 months ago
    • Eh, it provides terms to hunt for and sometimes that's all someone needs, a starting point. The video is short and covers a lot of ground.

      @Trahloc · 2 months ago
    • Dude wants a 16 part lecture to explain it all😂

      @MonkeeGeenyuss · 2 months ago
    • @@MonkeeGeenyuss I mean, I can only follow because I know it all and cannot imagine someone unfamiliar to understand anything from this firehose, lol.

      @rotors_taker_0h · 2 months ago
  • We love LM Studio 😫

    @harrychaklader · 2 months ago
  • Please make a video about making our locally running LLMs available for others to use, maybe as our own API that people can call, or a web UI for our local LLM.

    @mzafarr · 12 days ago
  • Where is the diagram at 8:50 from?

    @ericcadena2030 · a month ago
  • How do local models compare to cloud ones like OpenAI's? Wouldn't a local PC have way worse results? A server farm can have way more VRAM and hence is better?

    @D0J0Master · 2 months ago
    • I'm getting ~GPT-3.5 performance on my laptop with 16GB RAM and an RTX 3060. I'm primarily using it because I feel like commercial AI chatbots are getting more and more censored

      @joseph-ianex · 2 months ago
    • @@joseph-ianex Can you share which model you are using? I have a laptop with those exact specs

      @MrBoxerbone · a month ago
    • @@MrBoxerbone *RTX 3050 Ti. Most 7B models run fine; you can try Mistral, Gemma, or Llama 2. Get either Ollama (command line) or LM Studio (UI) to run the model. If you are new to running models, I would recommend LM Studio. The models are a bit slow and the context window is pretty small, but they run. Pinokio is another cool tool if you want to test out open-source AI art 👍

      @joseph-ianex · a month ago
  • what about ollama

    @dungeon4971 · 2 months ago
  • What do you think of the Phi model?

    @rfffffffff · 24 days ago
  • The best RP model atm is Kunoichi v2

    @Frab1985 · 2 months ago
  • I keep canceling my GPT4 subscription and then renewing it... 'Just when I thought I was out, they pull me back in.' GPT4 reminded me of that phrase from The Godfather. :)

    @WINTERMUTE_AI · 2 months ago
  • 2:17 Bro lives in the future where M4 is already released

    @hyposlasher · 2 months ago
  • How hard is it to run an LLM with an AMD GPU? Is it still Linux-only hell because of no driver support?

    @WW-ir7sm · 2 months ago
    • ROCm works for some stuff on Windows, just don't expect to be on the bleeding edge with new features

      @diewindows5628 · 2 months ago
  • I spent so much time trying to get something like this set up, but ended up back on GPT. Most of these models are also censored just like GPT, and unlike GPT they are much slower. On top of that, they cannot use plugins or special APIs that let you access the internet or generate images, etc. It's sad, but currently GPT has no peer

    @NeostormXLMAX · 2 months ago
  • Dunno why my comment isn't going through, but try Kobold! Better for GGUF. Current fav is "Crunchy Onion" Q4_K_M GGUF. Give it a taste! 10t/s on a 3090 and pretty smart.

    @4.0.4 · 2 months ago
  • 📝 Summary of Key Points: 📌 The video discusses the landscape of AI services in 2024, highlighting the abundance of hiring freezes and the prevalence of subscription-based AI services. 🧐 Various user interfaces for running AI chatbots and language models locally are explored, including Oobabooga, SillyTavern, LM Studio, and Axolotl. 🚀 The importance of choosing the right model format, understanding context length, and utilizing CPU offloading for running local language models efficiently is emphasized. 💡 Additional Insights and Observations: 💬 "Garbage in, garbage out" is a crucial principle highlighted when fine-tuning AI models, emphasizing the significance of quality training data. 📊 Different model formats like GGUF, AWQ, and EXL2 are explained, showcasing how they optimize model size and performance. 📣 Concluding Remarks: The video provides a comprehensive guide on running AI chatbots and language models locally, emphasizing the importance of model selection, context length, and fine-tuning techniques. Understanding these key aspects can help individuals navigate the AI landscape effectively and optimize performance while saving costs. Generated using TalkBud

    @abdelkaioumbouaicha · 2 months ago
  • Is it possible, though, to run an LLM on iOS?

    @Rookie_AI · 2 months ago
  • I just have a question: why is this channel so similar to Fireship? Are you the same person? : )

    @adamofigueroa · 2 months ago
  • How can I uninstall text-generation-webui? Anyone know?

    @Lakosta826 · a month ago
  • Besides saving money, are there any other reasons to do it locally vs spending $20 a month for ChatGPT?

    @YouTube-Administrator · 2 months ago
    • privacy mainly

      @nyxilos9167 · 2 months ago
    • Privacy and reliability, as with a local LLM you don't depend on anyone else's infrastructure

      @lodyllog · 2 months ago
    • Privacy, it's not filtered so you can do more things with it, won't see random dips in quality based on the whims of investors.

      @voidsofold · 2 months ago
  • You pay $20 for convenience. Spending a day to set up the flow, waiting 2 minutes every time for your model to load when you have a quick question, your GPU + CPU setting your room on fire because of how hot they're running... Unless you have some really specific use case that cloud models censor, it's just easier to pay those $20 for instant access

    @artursvancans9702 · 2 months ago
    • Patience is a virtue. I got Mistral 7B running on a 2018 laptop, and it takes two minutes to respond, but it works well. Why have 8 GB of RAM if I don't use all 8 GB? The AI uses all my RAM. :) But for people who have to use AI for a job, $20 is cheap, and workplaces cover the cost. For AI at home, a fast enough computer could work.

      @thatguyalex2835 · 2 months ago
  • Make a video about Stable Diffusion like this

    @wevii9043 · 2 months ago
  • Your thumbnail reminds me of fireship

    @exaltedjoseph7963 · 2 months ago
  • I am from Russia, can I participate in the contest?

    @Dilfin90 · 2 months ago
  • lm studio/ollama are probably the simplest ways to get started, not sure why you picked these ones

    @knoopx · 2 months ago
  • Fireship thumbnail is working for me

    @Necessarius · 2 months ago
  • You did not name the countries you are able to ship to for the giveaway. Is it worldwide?

    2 months ago
    • i’ll pay for whatever shipping it costs unless the country is unshippable like north korea

      @bycloudAI · 2 months ago
    • @@bycloudAI Thank you for this information, and also for the amazing content that you are putting out ♥

      2 months ago
  • Finally!!!!

    @user-uo1fj8mo1l · 2 months ago
  • 🎯 Key Takeaways for quick navigation: 00:28 *🤖 Running AI chatbots and LLMs locally provides flexibility and avoids subscription costs.* 00:43 *📊 Choosing the right user interface (UI) for local AI model usage is crucial, depending on individual needs.* 02:05 *🖥️ Oobabooga is a versatile UI choice for running AI models locally, supported across various operating systems and hardware.* 02:33 *💡 Installing Oobabooga enables access to free and open-source models on Hugging Face, simplifying the model selection process.* 05:18 *🤔 Context length is crucial for AI models' effectiveness, affecting their ability to process prompts accurately.* 06:12 *⚙️ CPU offloading allows running large models even with limited VRAM, leveraging CPU and system RAM resources.* 06:52 *🚀 Hardware acceleration frameworks like the vLLM inference engine and TensorRT-LLM enhance model inference speed significantly.* 07:36 *🎓 Fine-tuning models with tools like QLoRA enables customization for specific tasks, enhancing AI capabilities.* 08:47 *💰 Running local LLMs offers cost-saving benefits and customization options, making it an attractive option in the AI landscape.* Made with HARPA AI

    @alan_yong · 2 months ago
  • Stanford's open-source LLaMA model is free. 🎉

    @gregNFL · 26 days ago
  • 06:14 How do you have Mixtral 8x7B set up to use just that much VRAM and run that fast? On Oobabooga, with just 9-10 layers I'm already risking running out of VRAM on my 16GB GPU, and the thing still takes long enough to finish writing moderate-length replies that often my ADHD kicks in and I go do something else while it finishes writing... Which of the GGUF quantizations are you using? Is the issue that I'm using the Dolphin version instead of the raw Mixtral? Should I ditch the Dolphin variants? It's been getting so hard to keep up with which models are currently the most well regarded by the community, with so many models coming out all the time...

    @TiagoTiagoT · 2 months ago
    • For mixtral 8x7B you're going to want 2 3090s.

      @voidsofold · 2 months ago
    • @@voidsofold 16GB is not enough?

      @TiagoTiagoT · 2 months ago
    • @@TiagoTiagoT Not enough for 8x7B

      @voidsofold · 2 months ago
    • @@voidsofold The example he shows after mentioning using 10GB VRAM for quantized Mixtral was not actually that being done in practice, but just some unrelated LLM output clip?

      @TiagoTiagoT · 2 months ago
    • @@TiagoTiagoT Oh, if you offload it, sure, you can run 8x7B; it'll just be very slow and have barely any token context

      @voidsofold · 2 months ago
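The offloading tradeoff discussed in this thread boils down to: given a VRAM budget, how many transformer layers can stay on the GPU, with the rest running from system RAM (much slower). A sketch of that split (the layer sizes and reserve are illustrative assumptions, not measurements of Mixtral):

```python
# How many layers fit in VRAM, leaving some headroom for KV cache etc.
def gpu_layers(vram_gb: float, n_layers: int, layer_gb: float,
               reserve_gb: float = 1.0) -> int:
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable // layer_gb))

# e.g. a 16 GB card, 32 layers of ~0.8 GB each at 4-bit
print(gpu_layers(16, 32, 0.8))  # -> 18 layers on GPU, 14 offloaded
```

In llama.cpp-based backends this is what the GPU-layers setting (`n_gpu_layers` in Oobabooga's loader options) controls.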
  • this isn't fireship.. where am I?

    @FlafyDev · 2 months ago
    • Same. The thumbnail got me, and then I realised this guy took Fireship's entire style

      @myname-mz3lo · 2 months ago
  • I just really really like how many serious people have to say ooobabooga. It's like, almost as good of a joke on science as when that guy named the seventh planet.

    @zyxwvutsrqponmlkh · 2 months ago
  • you're awesome

    @harrychaklader · 2 months ago
  • Nice

    @mrrespected5948 · 2 months ago
  • Which model is best for uh... y'know... stuff...

    @twelvecatsinatrenchcoat · 2 months ago
    • idk if you still need this, but one of the most "fun" models is MLewd

      @user-yj2tc8xu1f · a month ago
    • @@user-yj2tc8xu1f I don't know what you're talking about, but thank you. This conversation didn't happen.

      @twelvecatsinatrenchcoat · a month ago
  • So, AI is the new computer; everyone will have one? Seems good to me. I wonder how the job market will be. Hardware will be on top for sure, and OpenAI will still be a giant. But the thing is how other industries will be affected.

    2 months ago
  • Easy, Oobabooga has Apple M4 CPU support before release 😂

    @donciutino7490 · 2 months ago
  • I was highly disappointed, Shōji Kawamori and Kazutaka Miyatake are not on the panel about Transformers... ;)

    @Cergorach · 2 months ago
  • Are you the same as fireship?

    @AshishKumar-kv4hr · 2 months ago
    • Different human being

      @swaggitypigfig8413 · 2 months ago
    • it’s fireship experimenting with 100% channel automation

      @I_SEE_RED · 2 months ago
  • We're gonna need a bigger GPU

    @AleNovelasLigeras · 2 months ago
  • I thought it was a video from Fireship 😂

    @kingki1953 · 3 days ago
  • oobaGooba, OOGABOOBA

    @remi.scarlet. · 2 months ago
  • OOOGABOOOOGAAAAH 💪😎🍺

    @keffbarn · a month ago
  • Hope it works on my toaster too

    @MoisesMendes-hb1ee · 2 months ago
  • >this model list

    @bibr2393 · 2 months ago
  • Much cheaper too!

    @ivonnemontes8813 · 2 months ago
  • Ollama?

    @neutrino2211_ · 2 months ago
  • My brain melted

    @elasdebastos235 · 9 days ago
  • timecode 1:18 is a very questionable use of footage

    @tja9212 · 2 months ago
  • Wish you made a more down-to-earth guide on how best to chat with waifus in SillyTavern; the community is super small for what you can achieve with minimal knowledge, running something like Noromaid on Google Colab, for completely free and uncensored roleplay. It needs to get more well known. Plus, I don't really know my way around the different settings and models, and I'm having a hard time getting the waifus to put in more dialogue over descriptions, for example.

    @Zonca2 · 2 months ago
    • jesus I truly hate how intertwined the ai community is with the anime community

      @Quell__ · 2 months ago
  • Tavern not Tarvern

    @BigHammer_ · a month ago
  • Meanwhile, using my gtx 1060 3gb

    @avizi_ · 2 months ago
    • 😢

      @pedrogorilla483 · 2 months ago