Diffusion models from scratch in PyTorch

23 May 2024
224,783 views

▬▬ Resources/Papers ▬▬▬▬▬▬▬
- Colab Notebook: colab.research.google.com/dri...
- DDPM: arxiv.org/pdf/2006.11239.pdf
- DDPM Improved: arxiv.org/pdf/2105.05233.pdf
- Awesome Diffusion Models Github: github.com/heejkoo/Awesome-Di...
- Outlier Diffusion Model Video: • Diffusion Models | Pap...
- Positional Embeddings: machinelearningmastery.com/a-...
▬▬ Used Icons ▬▬▬▬▬▬▬▬▬▬
All Icons are from flaticon: www.flaticon.com/authors/freepik
▬▬ Used Music ▬▬▬▬▬▬▬▬▬▬▬
Music from Uppbeat (free for Creators!):
uppbeat.io/t/prigida
Song: Spooky Loops
License code: QKVNF1BODEDX33HO
▬▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
00:30 Generative Deep Learning
02:58 Diffusion Models Papers / Resources
04:06 What are diffusion models?
05:06 How to implement them?
05:29 [CODE] Cars Dataset
06:50 Forward process
10:15 Closed form sampling
12:15 [CODE] Noise Scheduler
16:10 Backward process (U-Net)
19:32 Timestep Embedding
20:52 [CODE] U-Net
25:35 Loss
26:28 [CODE] Loss
28:53 Training and Results
30:05 Final remarks
▬▬ Support me if you like 🌟
►Support me on Patreon: bit.ly/2Wed242
►Buy me a coffee on Ko-Fi: bit.ly/3kJYEdl
►Coursera: imp.i384100.net/b31QyP
►Link to this channel: bit.ly/3zEqL1W
►E-Mail: deepfindr@gmail.com
▬▬ My equipment 💻
- Microphone: amzn.to/3DVqB8H
- Microphone mount: amzn.to/3BWUcOJ
- Monitors: amzn.to/3G2Jjgr
- Monitor mount: amzn.to/3AWGIAY
- Height-adjustable table: amzn.to/3aUysXC
- Ergonomic chair: amzn.to/3phQg7r
- PC case: amzn.to/3jdlI2Y
- GPU: amzn.to/3AWyzwy
- Keyboard: amzn.to/2XskWHP
- Bluelight filter glasses: amzn.to/3pj0fK2

Comments
  • Extremely fantastic implementation. I understood the whole idea of diffusion, and all the mathematical details made sense to me just from your code.

    @tanbui7569@tanbui7569 Жыл бұрын
  • Really well explained, and a compact notebook. It's basically all written directly in PyTorch, which is very refreshing to see when so much content relies heavily on high-level APIs.

    @kiunthmo@kiunthmo Жыл бұрын
  • Absolutely phenomenal content! Love it ❤️

    @MassivaRiot@MassivaRiot Жыл бұрын
  • Thank you! I really liked your graphic interpretation of the beta scheduling. It's missing in many other videos about diffusion.

    @sergiobromberg9233@sergiobromberg9233 Жыл бұрын
  • Loved the simple implementation. Also, thanks for sharing the additional articles.

    @LiquidMasti@LiquidMasti Жыл бұрын
  • Oh my god, this explanation is SUPER CLEAR! 🤯

    @peterthegreat7125@peterthegreat7125 Жыл бұрын
  • Great video! For me, the code makes it easier to understand the math than the actual formulas, so videos like these really help.

    @rafa_br34@rafa_br344 күн бұрын
  • Great effort, thank you! The simplified version is still complicated though :D I probably need to watch this a couple more times after reading the resources you attached.

    @cankoban@cankoban Жыл бұрын
  • Amazing video! Highly suggested before diving into the paper

    @ioannisd2762@ioannisd27622 ай бұрын
  • Thanks! Great animation and explanations..amazing 🙏

    @orrimoch5226@orrimoch5226 Жыл бұрын
  • what a great explanation, I will take a deeper look at the code. Thanks

    @shakibyazdani9276@shakibyazdani9276 Жыл бұрын
    • You're welcome! :)

      @DeepFindr@DeepFindr Жыл бұрын
  • What a great video! Loved it!

    @ShobeirKSMazinani@ShobeirKSMazinani Жыл бұрын
  • Very good job! One note: in my opinion you should add torch.clamp(image, -1.0, 1.0) after each forward_diffusion_sample() call. You can compare the behavior with and without the clamp when simulating forward diffusion. The images shown without the clamp seem "not naturally noisy", since the pixel range is no longer between -1 and 1. I don't know how much this affects the final training result; it would have to be tried. (See the sketch after this thread.)

    @ViduzTube@ViduzTube Жыл бұрын
    • Yes very good point. Later I also realized this and it actually led to an improvement (on a different dataset however). :)

      @DeepFindr@DeepFindr Жыл бұрын
    • Thank you! When I was watching I was wondering what would happen as you add the variance and the value exceeds 1. Your answer helped me understand it.

      @oliverliu9248@oliverliu9248 Жыл бұрын
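  A minimal, self-contained sketch of the suggested clamp, using an illustrative linear beta schedule and a random stand-in image (not the notebook's exact code):

      import torch

      # Closed-form forward diffusion of a toy image, with and without clamping.
      T = 300
      betas = torch.linspace(1e-4, 0.02, T)
      alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

      image = torch.rand(3, 64, 64) * 2 - 1   # stand-in image scaled to [-1, 1]
      t = 150
      noise = torch.randn_like(image)
      noisy = alphas_cumprod[t].sqrt() * image + (1 - alphas_cumprod[t]).sqrt() * noise

      print(noisy.min().item(), noisy.max().item())  # values can leave [-1, 1]
      noisy = torch.clamp(noisy, -1.0, 1.0)          # keep the displayed range natural
      print(noisy.min().item(), noisy.max().item())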
  • The explanation was awesome 🔥

    @harsh9558@harsh95586 ай бұрын
  • That was really clear. Thank you !

    @tidianec@tidianec Жыл бұрын
  • Thanks for putting this together

    @andreray6562@andreray6562 Жыл бұрын
  • Thanks for this amazing video! Do you plan on extending this video to include conditional generation at some point in the future? I would love to see an implementation of the SR3/Palette models that use DDPM for image to image translation tasks such as super-resolution, JPEG restoration etc. In this case, the reverse diffusion process is conditioned on the input image.

    @anonymousperson9757@anonymousperson9757 Жыл бұрын
  • Excellent explanation. Learned a lot from your video, thank you~

    @user-yp4ye9kf3b@user-yp4ye9kf3b Жыл бұрын
  • Thank you! This is the best video I've ever seen

    @user-wn3hb3vc1v@user-wn3hb3vc1v Жыл бұрын
    • Glad that you liked it!

      @DeepFindr@DeepFindr Жыл бұрын
  • Really good introduction! Thanks!

    @senpeng6441@senpeng6441 Жыл бұрын
  • Thanks a lot for the video, really helpful for someone trying to grasp these models. Also, a little typo I noticed: at 16:06, in the cell "# Simulate forward diffusion", noise is being added a little faster than intended. The culprit is the line "image, noise = forward_diffusion_sample(image,t)": it overwrites the variable "image" at each step of the loop, even though forward_diffusion_sample was built expecting the initial, non-noisy image. So from the second iteration onwards we're adding noise to an already noisy image. (A corrected sketch of the loop follows after this thread.)

    @LuisPereira-bn8jq@LuisPereira-bn8jq Жыл бұрын
    • Hehe, thanks for this finding, this is indeed a bug. I just checked, and it doesn't look very different with the correction (assigning to a new variable). If I'm not mistaken, the bug led to roughly a multiplication by 2, as in every step the pre-computed noise for this t is added, plus the cumulative noise up to t (which should be the same as the pre-computed one), hence leading to twice the noise intended. Anyways, thanks for this comment! :)

      @DeepFindr@DeepFindr Жыл бұрын
    • @@DeepFindr Hi again. Yeah, the bug didn't really affect the images much, but it might confuse some viewers about whether you're computing x_t from x_0 or from x_{t-1}. As for the "multiplication by 2" bit, it's not going to be exactly that, since the betas are changing and you're adding (t-1)-step noise to t-step noise. Moreover, adding a N(0,1) to another independent N(0,1) gives a N(0,2), whose standard deviation is sqrt(2), so what was happening should be closer to a multiplication by sqrt(2), even if also not exactly that. Anyway, since my previous comment I've finished the video and trained it for 100 epochs so far (with comparable results to yours). I have two more comments on the latter bits of the video, namely the "sample_timestep" function at 26:59:
      - I was rather confused for a while as to why we were returning "model_mean" rather than just "x" for t=0. Eventually I realized that the t's in the code are offset from the t's in the paper: the code is 0-indexed but the paper is 1-indexed. So the t=0 case in sample_timestep is really inferring x_0 from x_1 in terms of the paper. It might be worth adding a comment about this in either the video or the code.
      - It took me quite a bit to understand the output of the sample_timestep function. I think I mostly get it now, but this is a really subtle step that is worth demystifying. Here's my current understanding: in effect our model is always trying to predict x_0 from x_t, but we don't expect the prediction to be great for large t. However, the distribution p(x_{t-1} | x_t, x_0) is a fully known normal distribution, so we instead use the predicted x_0 to approximate this p(...), then sample from it to get x_{t-1}. In retrospect, I've seen multiple videos on diffusion try to describe this process in words as "we predict the full noise, but then we add some of the noise back", but that vague description never made sense to me. So maybe an extra comment on this could help a future viewer as well.
      Anyway, let me thank you again for the video. My hope is to eventually understand stuff like Stable Diffusion with all its bells and whistles, and this already helped a lot. And on that note, I noticed that the weights for the network in the video take up 700 MB, compared to something like 4 GB for Stable Diffusion, so it's maybe not so surprising that this would take a while to train from scratch.

      @LuisPereira-bn8jq@LuisPereira-bn8jq Жыл бұрын
    • @Luis Pereira yes, I totally agree, in retrospect some things could've been more in-depth. Meanwhile I've also experimented more and read other papers about these models (and also the connection to score-based approaches), which could be added here as well. Maybe I'll make an update video some day :)

      @DeepFindr@DeepFindr Жыл бұрын
    • @@DeepFindr No worries. My experience is that, in retrospect, nearly everything could have been improved in some way or another. And if you ever find the time for another video, I at least would be interested. There are a decent number of good YouTube videos on this topic, but this is one of the best I've found.

      @LuisPereira-bn8jq@LuisPereira-bn8jq Жыл бұрын
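  A corrected sketch of the loop mentioned above, self-contained with an illustrative schedule and a random stand-in image (the notebook's dataset and helpers are not used here): the clean image stays in its own variable, and every plotted step is computed from it with the closed form.

      import torch
      import matplotlib.pyplot as plt

      T, num_images = 300, 10
      betas = torch.linspace(1e-4, 0.02, T)
      alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

      image = torch.rand(3, 64, 64) * 2 - 1   # stand-in for a dataset image in [-1, 1]
      stepsize = T // num_images

      plt.figure(figsize=(15, 2))
      for idx, t in enumerate(range(0, T, stepsize)):
          noise = torch.randn_like(image)
          # x_t is always built from the clean image x_0, never from an already noisy one
          noisy = alphas_cumprod[t].sqrt() * image + (1 - alphas_cumprod[t]).sqrt() * noise
          plt.subplot(1, num_images, idx + 1)
          plt.imshow(((noisy.clamp(-1, 1) + 1) / 2).permute(1, 2, 0))
          plt.axis("off")
      plt.show()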
  • Thanks so much !! This is gold! Keep them coming. I’m curious to see the results of the model, can you share some more pictures?

    @erank3@erank3 Жыл бұрын
    • Thank you! Happy that you liked it! I only have the pictures at the end of this video. Unfortunately I didn't save the model weights after the longer training, because I thought I wouldn't need them anymore :/

      @DeepFindr@DeepFindr Жыл бұрын
  • Very nice video with a good explanation. I would like to point out that in your Block class, the same BatchNorm is used in different places. BatchNorm is trainable and has weights, so you might want to treat it more like an actual layer rather than a memory-less operation like pooling or ReLU. (See the sketch after this thread.)

    @roblee5721@roblee5721 Жыл бұрын
    • Hi, thanks for pointing that out. This was a little bug, which I've corrected in the original notebook. :)

      @DeepFindr@DeepFindr Жыл бұрын
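  A sketch of a Block with one BatchNorm per convolution, as discussed above (the time-embedding handling from the video is left out to keep it short; this is not the notebook's exact code):

      import torch
      from torch import nn

      class Block(nn.Module):
          def __init__(self, in_ch, out_ch):
              super().__init__()
              self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
              self.bnorm1 = nn.BatchNorm2d(out_ch)   # one BatchNorm for conv1 ...
              self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
              self.bnorm2 = nn.BatchNorm2d(out_ch)   # ... and a separate one for conv2
              self.relu = nn.ReLU()

          def forward(self, x):
              x = self.bnorm1(self.relu(self.conv1(x)))
              x = self.bnorm2(self.relu(self.conv2(x)))
              return x

      print(Block(3, 64)(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])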
  • Damnnn I wish I watched your video first thing when trying to understand this. Great explanation

    @tensenpark@tensenpark Жыл бұрын
  • Thanks for the amazing video !

    @xczhou3340@xczhou33409 ай бұрын
  • Really nice exposition. Can you please elaborate on the specifications of the machine it was trained on and approx how long the training took?

    @sohampyne8009@sohampyne8009 Жыл бұрын
  • great video, very clear

    @alexvass@alexvass Жыл бұрын
  • Thank you! It was helpful

    @usama57926@usama57926 Жыл бұрын
  • Thank you for the awesome guide :D Just one simple question: in the plotted image, are we looking at x0, x1, ..., x10, where x0 is the image at the very left (the denoised version) and x10 the image at the very right (the most noised)?

    @user-fu3jx7mj2o@user-fu3jx7mj2o Жыл бұрын
  • love the content brother

    @adamtran5747@adamtran5747 Жыл бұрын
  • Quite good!

    @AndrejKarpathy@AndrejKarpathy Жыл бұрын
    • Thank you!

      @DeepFindr@DeepFindr Жыл бұрын
  • very nice video and very easy to understand

    @sienloonglee4238@sienloonglee4238 Жыл бұрын
  • Amazing explanation

    @user-dc2vc5ju3m@user-dc2vc5ju3mАй бұрын
  • When you were explaining the code for the noise scheduler, the T value changed from 200 to 300, which I think should also be reflected in different (smaller) betas, because otherwise we end up with a smaller cumulative alpha.

    @marcinwaesa8713@marcinwaesa8713 Жыл бұрын
    • Yes good point!

      @DeepFindr@DeepFindr Жыл бұрын
  • Really interesting explanation, thank you for doing this.

    @CyberwizardProductions@CyberwizardProductions Жыл бұрын
    • You're welcome!

      @DeepFindr@DeepFindr Жыл бұрын
  • Thank you for the video.

    @frederictost6659@frederictost6659 Жыл бұрын
  • The forward process is very clear. Could you categorize the code blocks of the backward process as well?

    @Zindit@Zindit Жыл бұрын
  • Hello thank you for the video and code. I have two questions: Q1- In the Block module 24:16 why is the input channel in `self.conv1` multiplied by 2? The input channel is twice the size of the output based on the `up_channels` list in the `init` of your SimpleUnet class. Is this related to adding "residual x as additional channels" at 24:50? Q2- How do you direct which way the diffusion direction goes in? I know this is a very simplified example model but how would you add the ability to direct the generation towards making a certain class of car, or a car based on text descriptions like "red SUV"? Is there a good explanatory paper or blog post or video on this matter that you can recommend (preferably practical without a lot of math)? (Thank you again for the video)

    @MonkkSoori@MonkkSoori Жыл бұрын
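  Regarding Q1 above, a minimal, self-contained illustration (assumed shapes, not the notebook's code) of why conv1 in an "up" block takes twice the channels: the decoder feature map is concatenated with the skip connection from the encoder along the channel dimension before the first convolution.

      import torch
      from torch import nn

      out_ch = 64
      up_feat = torch.randn(1, out_ch, 32, 32)     # decoder features
      skip_feat = torch.randn(1, out_ch, 32, 32)   # skip connection from the down path
      x = torch.cat((up_feat, skip_feat), dim=1)   # -> (1, 2*out_ch, 32, 32)
      conv1 = nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1)
      print(conv1(x).shape)                        # torch.Size([1, 64, 32, 32])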
  • Beautifully explained

    @user-nn5fp7tl2j@user-nn5fp7tl2j27 күн бұрын
  • Thank you!

    @user-co6pu8zv3v@user-co6pu8zv3v Жыл бұрын
  • Many thanks. Excellent explanation. Can we use diffusion models for deblurring images? They are generative models, and I want to use them for image restoration problems. Thanks

    @infocus2160@infocus2160 Жыл бұрын
    • Hi! Yes they can be used for image restoration as well. Have you seen this paper: arxiv.org/abs/2201.11793 :)

      @DeepFindr@DeepFindr Жыл бұрын
    • @@DeepFindr Excellent thanks. You are amazing.

      @infocus2160@infocus2160 Жыл бұрын
  • Thanks for the good explanation and code. But I wonder: how can I use the trained model to generate images? Can you advise? (See the sketch after this comment.)

    @lchunleo@lchunleo6 ай бұрын
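  A sketch of how generation could look once the model is trained, assuming the notebook's names (model, sample_timestep(x, t), T, IMG_SIZE); treat it as an outline rather than the notebook's exact code:

      import torch

      @torch.no_grad()
      def generate(num_images=4, device="cuda"):
          # start from pure Gaussian noise and denoise step by step, t = T-1 ... 0
          x = torch.randn(num_images, 3, IMG_SIZE, IMG_SIZE, device=device)
          for i in reversed(range(T)):
              t = torch.full((num_images,), i, device=device, dtype=torch.long)
              x = sample_timestep(x, t)
              x = torch.clamp(x, -1.0, 1.0)   # stay in the training pixel range
          return (x + 1) / 2                  # map [-1, 1] back to [0, 1]

      # e.g. torchvision.utils.save_image(generate(), "samples.png")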
  • Thank you very much for this video

    @mehdidehghani7706@mehdidehghani7706 Жыл бұрын
    • You're welcome! :)

      @DeepFindr@DeepFindr Жыл бұрын
  • Thanks! From the training result, what I saw was an image that went from its original version to a less noisy one? I was expecting to see a noisy image converted to a less noisy one or its original version?

    @michael2826@michael28262 ай бұрын
  • Awesome video! What software are you using to draw these examples?

    @FelipeOliveira-gt9bf@FelipeOliveira-gt9bf Жыл бұрын
    • Thanks! It's nothing fancy - a mix of PowerPoint and DaVinci Resolve. :)

      @DeepFindr@DeepFindr Жыл бұрын
  • Bro, one day in the future when this channel becomes famous, don't forget I am one of your early fans!

    @leonliang9185@leonliang9185 Жыл бұрын
    • Haha :D I won't

      @DeepFindr@DeepFindr Жыл бұрын
  • Thank you for the video, mate. Can you make one for "conditional diffusion" too? Thanks

    @AI_Financier@AI_Financier Жыл бұрын
  • Hi, thank you for the well-explained video! I've been following your code and training the same model on the StanfordCars dataset. At epoch 65, the sampled images from my training just come out as grey images. Is there something wrong with my training? Should I adjust the learning rate?

    @user-sc8hg7lw8t@user-sc8hg7lw8t6 ай бұрын
    • also having this issue, did you figure it out?

      @neelsortur1036@neelsortur10364 ай бұрын
  • Amazing work, thanks for sharing!

    @chyldstudios@chyldstudios Жыл бұрын
    • Thanks!!

      @DeepFindr@DeepFindr Жыл бұрын
  • Thanks for a great tutorial. I think there is a small bug, though, in the implementation of the output layer of the U-Net: the output channel dimension is swapped with the kernel size and fixed to 3. Shouldn't it instead look like this: self.output = nn.Conv2d(up_channels[-1], out_dim, 3)

    @cerann89@cerann89 Жыл бұрын
    • Oh yes :D Bugs everywhere. With output dim 1 it would just produce black-and-white images, so this bug led to color ;-) Have you tried it with another kernel size? Did it make a difference?

      @DeepFindr@DeepFindr Жыл бұрын
    • @@DeepFindr I actually tried it on medical MRI images which have only one color dim (greyscale). That is where the error was triggered. I kept the kernel size at 3, so no I can’t give any input on the influence of the kernel size.

      @cerann89@cerann89 Жыл бұрын
  • Question: at 15:18, why did we not directly scale between -1 and 1? Or are there two different tensors being scaled, one between 0 and 1 and the other between -1 and 1? (See the sketch after this comment.)

    @curiousseeker3784@curiousseeker37849 ай бұрын
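  A sketch of the two-stage scaling referred to at that point (an assumption about the notebook's transform, not a verbatim copy): ToTensor first maps pixels into [0, 1], and a Lambda then maps that range linearly into [-1, 1], matching the range of the Gaussian noise.

      from torchvision import transforms

      data_transform = transforms.Compose([
          transforms.ToTensor(),                     # uint8 [0, 255] -> float [0, 1]
          transforms.Lambda(lambda t: (t * 2) - 1),  # [0, 1] -> [-1, 1]
      ])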
  • Shouldn't you use separate BatchNorm layers for the 1st and 2nd convolution in a block? In your implementation the batch statistics are shared between the two layers, which seems to be a bug.

    @adamgrygielski7395@adamgrygielski7395 Жыл бұрын
    • Yep, you are right. I updated the notebook. Actually I also found that bug in a local version of the code and forgot to adjust the notebook. Bnorm layers can't be shared as each layer learns individual normalization coefficients. Thanks for pointing this out :)

      @DeepFindr@DeepFindr Жыл бұрын
  • Thanks for sharing this tutorial. It's very beginner-friendly.

    @kidzheng8531@kidzheng8531 Жыл бұрын
  • goated content

    @xingyubian5654@xingyubian5654 Жыл бұрын
  • How can we save all the generated images? As far as I understand, at the end of training there should be generated images of Stanford cars produced from completely noised images.

    @rajatsubhrachakraborty6767@rajatsubhrachakraborty6767 Жыл бұрын
  • Great video! Can you give any tips for generating 128x128 images with this model, please?

    @kudre302@kudre3024 ай бұрын
  • Thanks a lot for your contribution. But I'm a bit confused: at 7:30, is q(Xt | Xt-1) the distribution that "the sampled noise" follows, OR the one that "the noised image" follows?

    @SeonhoonKim@SeonhoonKim Жыл бұрын
    • It's the distribution of the noised image :) the distribution of the noise is always gaussian. This formula expresses the mixture of the original input and the noise distribution, hence the distribution of the noised image.

      @DeepFindr@DeepFindr Жыл бұрын
    • @@DeepFindr Thanks for your reply!! Just one more, please? Then q(Xt | Xt-1) = N(Xt; ..., Bt I) means the variance of Xt is Bt? Someone says V(Xt) eventually becomes 1 at every step, so I'm a bit confused...

      @SeonhoonKim@SeonhoonKim Жыл бұрын
    • @@SeonhoonKim Bt is just the variance of this single step. Have a look at the "closed form" part with alpha: ideally alpha bar (the cumulative product) becomes 0 at the end, which leads to a variance of 1. (The two distributions are written out after this thread.)

      @DeepFindr@DeepFindr Жыл бұрын
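  For reference, the two distributions discussed in this thread, in the DDPM paper's notation (\bar{\alpha}_t is the cumulative product of the alphas):

      q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)

      q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),
      \qquad \bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s = \prod_{s=1}^{t}(1-\beta_s)

      x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
      \qquad \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})

  So \beta_t is only the variance of a single step, while as \bar{\alpha}_t approaches 0 the variance of x_t approaches 1.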
  • The Stanford Cars dataset is no longer available through the PyTorch datasets. Do you have any alternate locations for the same data?

    @SandeepSinghPlus@SandeepSinghPlus10 ай бұрын
  • Best explanation! And perhaps the sigma in the normal distribution graph should be sigma^2.

    @jeonghwanh8617@jeonghwanh8617 Жыл бұрын
  • Great walkthrough! I just want to point out a missing term in the implementation of sample_timestep: when returning the result, you forgot to multiply model_mean by one over the square root of alpha_t (i.e. 1/sqrt(1 - beta_t)). To match Algorithm 2 from the paper, it should be: return model_mean / torch.sqrt(1. - betas_t) + torch.sqrt(posterior_variance_t) * noise. However, even after plugging this term back into the return statement, I did not see much difference in the training result ;P So the missing term might not be a big deal. (See the sketch after this comment.)

    @junpengqiu4054@junpengqiu4054 Жыл бұрын
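  A sketch of one reverse step following Algorithm 2 of the DDPM paper, written for a scalar timestep t and precomputed 1-D schedule tensors; the noise-predicting U-Net (model) and posterior_variance (beta_t * (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t)) are assumed, so this is an illustration rather than the notebook's exact code:

      import torch

      @torch.no_grad()
      def sample_timestep(x, t, model, betas, alphas_cumprod, posterior_variance):
          # x_{t-1} = 1/sqrt(alpha_t) * (x_t - beta_t/sqrt(1 - alpha_bar_t) * eps_theta(x_t, t))
          #           + sqrt(posterior_variance_t) * z
          beta_t = betas[t]
          sqrt_one_minus_alpha_bar_t = (1.0 - alphas_cumprod[t]).sqrt()
          sqrt_recip_alpha_t = (1.0 / (1.0 - beta_t)).sqrt()   # 1 / sqrt(alpha_t)

          t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
          model_mean = sqrt_recip_alpha_t * (
              x - beta_t * model(x, t_batch) / sqrt_one_minus_alpha_bar_t
          )
          if t == 0:
              return model_mean                                # no noise at the final step
          return model_mean + posterior_variance[t].sqrt() * torch.randn_like(x)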
  • At 10:19, is q the noise, and the next forward image is x + q? Do I understand it right? Or are we just using q and x interchangeably?

    @jby1985@jby198511 ай бұрын
  • You are seeing something that's going to change the way we see our universe in the upcoming 2-3 years! Save my comment!

    @arnabkumarpan5615@arnabkumarpan561510 ай бұрын
  • How can a model that is only 3.2 GB produce an almost infinite number of image combinations from just a simple text prompt, with so many language variables? What I am interested in is how a prompt like "a monkey riding a bicycle" can produce something that visually represents the prompt. How are the training images tagged and categorized to make this possible? As creative people we often say that an idea is still misty and has not formed yet. What strikes me about this diffusion process is its similarity to how our minds seem to work at a creative level: we iterate and de-noise the concept until it becomes concrete, using a combination of imagination and logic. It is the same process you described to arrive at the finished formula. What also strikes me about the images produced by these diffusion algorithms is that they look so creative and imaginative. Even artists are shocked when they see them for the first time and realize a machine made them. My line of thinking here is that we use two main tools to acquire and simulate knowledge and experience: images and language. Maybe this input is then stored in a similar way as a diffusion model within our memory. Logic, creativity and ideas are just a consequence of reconstituting this data according to our current social or environmental needs. This could explain our thinking process and why our memory is of such low resolution. The de-noising process could also explain many human conditions such as depression, and even why we dream, etc. This brings up the interesting question: "Could a diffusion model be created to simulate a human personality?" Or provide new speed-think concepts and formulas for solving a multitude of complex problems, for that matter. The path would be 1) diffusion model idea/concept, 2) ask a GAN like GPT-3 to check if it works, 3) feed back to the diffusion model and keep iterating until it does, in much the same way as de-noising a picture. Just a thought from a diffusion brain.

    @chrislloyd1734@chrislloyd1734 Жыл бұрын
    • It's because the subset of possible images we humans are interested in is actually very specific. If you think about it, infinite combinations isn't that complicated. It's when we want specific things that you need more information. It only takes a few KB of code to make a pseudorandom number generator that can theoretically output every possible image, but we would see the vast majority of those permutations as boring rainbow noise. Ironically, the storage space used by generative models is needed to essentially explain what we DON'T want, so that we are left with the very specific subset that does meet our requirements.

      @flubnub266@flubnub266 Жыл бұрын
  • So, stupid question: in the SimpleUnet class we define the output layer with a parameter of 3 to regain the number of channels our image has. Couldn't we just pass the image_channels variable there? What if my image is grayscale and has only 1 channel? (See the sketch after this comment.)

    @nicolasf1219@nicolasf121911 ай бұрын
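  A sketch of the change suggested by the question (assumed channel list, not the notebook's exact code): pass the image channel count to the final convolution so the same U-Net works for RGB or grayscale images; a kernel size of 1 keeps the spatial size unchanged.

      import torch
      from torch import nn

      up_channels = (1024, 512, 256, 128, 64)   # channel counts of the decoder blocks
      image_channels = 1                        # e.g. 1 for grayscale, 3 for RGB
      output = nn.Conv2d(up_channels[-1], image_channels, kernel_size=1)
      print(output(torch.randn(1, up_channels[-1], 32, 32)).shape)  # torch.Size([1, 1, 32, 32])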
  • Say I have a still image x0 and a pre-initialized noisy image N. I think I can apply noise to x0 via "(1-B)x0 + BN". When B=1 the output is N, the noisy image; when B=0 the output is the still image. But that's just the linear version.

    @int16_t@int16_t8 ай бұрын
  • Thanks a lot. Can I ask how to choose the number of timesteps in diffusion? Is a larger number of timesteps always better?

    @yangjun330@yangjun330 Жыл бұрын
    • Basically it's a hyperparameter. Not only the number of steps is relevant, but also the beta schedule (i.e. the start and end values). In my experiments I simply visualized the data distributions to determine a good value: you have a good schedule if the last distribution follows a standard Gaussian with zero mean and std 1. Also, I have the feeling that a higher number of steps leads to higher fidelity, but I didn't look into this further. (A small check of this is sketched after this thread.)

      @DeepFindr@DeepFindr Жыл бұрын
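  A small, self-contained check along those lines (illustrative linear schedule, not the notebook's exact values): if alpha_bar_T is close to 0, then x_T = sqrt(alpha_bar_T)*x_0 + sqrt(1 - alpha_bar_T)*eps has roughly zero mean and unit standard deviation regardless of x_0.

      import torch

      def terminal_stats(T, beta_start=1e-4, beta_end=0.02):
          betas = torch.linspace(beta_start, beta_end, T)
          alpha_bar_T = torch.cumprod(1.0 - betas, dim=0)[-1]
          return alpha_bar_T.item(), (1.0 - alpha_bar_T).sqrt().item()

      for T in (200, 300, 1000):
          alpha_bar, noise_std = terminal_stats(T)
          print(f"T={T:4d}  alpha_bar_T={alpha_bar:.4f}  noise std at T={noise_std:.4f}")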
  • Looks like the torchvision dataset for StanfordCars is now deprecated or something; the original URL from which the function pulls the data is no longer reachable.

    @derekyun5109@derekyun5109 Жыл бұрын
  • Hi, thanks for the awesome work. I would like to reduce the image size, but when I changed it, training stopped working. Could you give me some info? I would also like to adapt your code to the DDIM method. Is it enough to only change the sampling part? Could you give me the details?

    @jungminhwang8115@jungminhwang811511 ай бұрын
  • that's insane math

    @curiousseeker3784@curiousseeker378410 ай бұрын
  • In the Google Colab there are log and exp in the sinusoidal embedding block. You did not explain where those come from; I don't see them in the formula at 20:26.

    @hilmiyafia@hilmiyafia Жыл бұрын
    • Hi :) Some implementations of positional embeddings are calculated in log space, that's why you see exp and log there. This usually improves numerical stability and is sometimes also done for loss functions. (A sketch of this formulation follows after this thread.)

      @DeepFindr@DeepFindr Жыл бұрын
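  A sketch of the common log-space formulation (one of several variants; assumed here, not necessarily the notebook's exact code): the frequencies 1/10000^(i/(d/2-1)) are computed as exp(-log(10000) * i/(d/2-1)), which avoids taking large powers directly.

      import math
      import torch
      from torch import nn

      class SinusoidalPositionEmbeddings(nn.Module):
          def __init__(self, dim):   # dim is assumed to be even
              super().__init__()
              self.dim = dim

          def forward(self, time):
              # time: (batch,) tensor of integer timesteps
              half_dim = self.dim // 2
              freqs = torch.exp(
                  -math.log(10000) * torch.arange(half_dim, device=time.device) / (half_dim - 1)
              )
              args = time[:, None].float() * freqs[None, :]
              return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

      print(SinusoidalPositionEmbeddings(32)(torch.arange(8)).shape)  # torch.Size([8, 32])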
  • Will it be possible to generate new images using this model if it is saved after training? Please share how to generate new images, if possible.

    @arymansrivastava6313@arymansrivastava631311 ай бұрын
  • At 13:09, why isn't sqrt_recip_alphas used anywhere? Also, why do you calculate sqrt_one_minus_alphas_cumprod, when the equation only has 1 - alphas_cumprod? Is that a typo? What is alphas_cumprod_prev exactly? Can someone please explain what is being done here? And what is posterior_variance? Thanks a lot in advance.

    @amortalbeing@amortalbeing2 ай бұрын
  • Thanks for the great video! I trained the model on a human face dataset using your code, but the sampling results show checkerboard (grid-pattern) artifacts. How can I solve this?

    @catfood7859@catfood7859 Жыл бұрын
    • Hi! Make sure to train the model long enough (e.g. set 1000 epochs and see what happens). Also, you might want to fine-tune the model architecture and add more advanced layers like attention. I also encountered weird patterns at first, but after training longer the quality got better.

      @DeepFindr@DeepFindr Жыл бұрын
    • @@DeepFindr Thanks for the advice, I'll try it : )

      @catfood7859@catfood7859 Жыл бұрын
    • It helps if you set the final layer's kernel size to 1.

      @KJPCox@KJPCox Жыл бұрын
  • Can you make a video on *conditional generation in diffusion models*?

    @usama57926@usama57926 Жыл бұрын
  • How many epochs does it take to produce anything that does not look like noise? I've downloaded the dataset from Kaggle and replaced the data loader code in the Colab. The forward process works, however training doesn't seem to work: the loss is stuck at ~0.81 from the very beginning, it doesn't go down, and the sampled pictures still look like noise. I am at epoch 65 and it does not seem to improve at all.

    @DmitryFink@DmitryFink10 ай бұрын
  • Why is the time embedding added to the features after the first conv layer in the U-Net? Why not add the time embedding at the initial step (before the U-Net)?

    @sriharsha580@sriharsha580 Жыл бұрын
    • You could also do that, but I added the timestep in each of the Unet blocks. I think that there are many possibilities to try things out :)

      @DeepFindr@DeepFindr Жыл бұрын
  • How do you make sure that the cars generated at the end are truly original generations and not just copies of cars in the dataset?

    @JDechnics@JDechnics Жыл бұрын
    • This actually relates to all generative models - how to make sure that the model doesn't simply memorize the train set. For example I've also seen this discussion for GANs: www.lesswrong.com/posts/g9sQ2sj92Nus9DeKX/gan-discriminators-don-t-generalize There is also some research in that direction: openreview.net/forum?id=PlGSgjFK2oJ To answer your question: you need to sample some data points and compare them with the nearest matches in the Dataset to be sure the model didn't overfit. More data always helps of course, to make it less likely that the model memorizes the whole dataset.

      @DeepFindr@DeepFindr Жыл бұрын
  • If anyone wants to implement DDPM for time-series data, which model would be a good choice instead of a U-Net? Any suggestions?

    @sanjeevlvac1784@sanjeevlvac17845 ай бұрын
  • Could you help me with this: if I want to stop the training and resume it later, how can I save and load it? (See the sketch after this comment.)

    @tamascsepely235@tamascsepely2354 ай бұрын
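  A sketch of one way to stop and resume (model and optimizer are the notebook's objects; the file name is arbitrary):

      import torch

      def save_checkpoint(model, optimizer, epoch, path="ddpm_checkpoint.pt"):
          torch.save({
              "epoch": epoch,
              "model_state": model.state_dict(),
              "optimizer_state": optimizer.state_dict(),
          }, path)

      def load_checkpoint(model, optimizer, path="ddpm_checkpoint.pt"):
          ckpt = torch.load(path, map_location="cpu")
          model.load_state_dict(ckpt["model_state"])
          optimizer.load_state_dict(ckpt["optimizer_state"])
          return ckpt["epoch"] + 1   # epoch to resume from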
  • Thx

    @tilkesh@tilkesh9 ай бұрын
  • Really helpful content, and the recommended resources are very good, thanks

    @terguunzoregtiin8791@terguunzoregtiin8791 Жыл бұрын
  • Hello, I am confused about a few things. 1. Why did you choose T=300, when generally it's T=1000? What decides the number of timesteps? 2. There is a variable num_images in the "simulate forward diffusion" section. Why are we dividing T by num_images, and what does that mean?

    @user-bh8kn3zt5z@user-bh8kn3zt5z5 ай бұрын
  • Always been told after math classes: "You won't need that in real life anyways" xd

    @michakowalczyk7411@michakowalczyk7411 Жыл бұрын
    • I can relate xD

      @DeepFindr@DeepFindr Жыл бұрын
    • but what is real life?

      @snoosri@snoosri10 ай бұрын
    • ​@@snoosri Don't take words so literally, especially on media. I believe you can grasp the meaning from context my friend :)

      @michakowalczyk7411@michakowalczyk741110 ай бұрын
    • @@snoosri Underrated comment

      @superpie0000@superpie00009 ай бұрын
  • I have a question, sir. At 13:22 the formula has (1 - alpha_bar), so why does the code use sqrt(1 - alpha_bar)?

    @bluebear7870@bluebear7870 Жыл бұрын
  • Maybe it's because I came to this video too late, but the Stanford Cars dataset link is now invalid: a 404 error.

    @henriwang8603@henriwang860311 ай бұрын
  • The StanfordCars dataset is no longer available. What alternative can I use?

    @nikhilprem7998@nikhilprem79983 ай бұрын
    • ^ having the same problem

      @henrysun6430@henrysun64302 ай бұрын
  • Unable to access the dataset - stanford-cars.

    @sachinmotwani2905@sachinmotwani2905 Жыл бұрын
    • same here

      @gabrieldib@gabrieldib6 ай бұрын
  • Hey I have a question: I think in the Colab notebook you only sample one time step from each image in a batch, but I was wondering why we don't take more intermediate time steps from each sample?

    @FrankWu-hc1dl@FrankWu-hc1dlАй бұрын
  • How do you save the model and generate images after training?

    @itsthenial@itsthenial Жыл бұрын
  • How do I plug my own training dataset into the ipynb script?

    @glacialclaw1211@glacialclaw12116 ай бұрын
  • The dataset is no longer available.

    @egoistChelly@egoistChelly6 ай бұрын
  • You saved my PhD

    @tendocat8778@tendocat87784 ай бұрын
  • Excellent tutorial, but it looks like the code needs to be updated: the Stanford Cars dataset is gone. It is available on Kaggle; could you please update the notebook accordingly?

    @asheeshmathur@asheeshmathur9 ай бұрын
  • At 11:45, the third line of the formula should have a bar on top of alpha_t.

    @xhinker@xhinker10 ай бұрын
  • Do you know how to modify this diffusion model to accept a custom data set?

    @isaacsalvador4188@isaacsalvador4188 Жыл бұрын
    • Yes, simply exchange the Dataset class with a custom dataset from PyTorch. As long as it's images, the rest should work fine :) (See the sketch after this thread.)

      @DeepFindr@DeepFindr Жыл бұрын
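  A sketch of such a swap using a plain image folder (for example a local copy of the Stanford Cars images downloaded from Kaggle); the paths and sizes are placeholders, not the notebook's exact code:

      from torch.utils.data import DataLoader
      from torchvision import datasets, transforms

      IMG_SIZE, BATCH_SIZE = 64, 128
      data_transform = transforms.Compose([
          transforms.Resize((IMG_SIZE, IMG_SIZE)),
          transforms.RandomHorizontalFlip(),
          transforms.ToTensor(),                     # [0, 255] -> [0, 1]
          transforms.Lambda(lambda t: (t * 2) - 1),  # [0, 1]   -> [-1, 1]
      ])
      # expects a layout like ./my_images/<some_class>/xxx.jpg
      dataset = datasets.ImageFolder(root="./my_images", transform=data_transform)
      dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)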
  • How long did it take to train 500 epochs on your RTX 3060?

    @arnob3196@arnob3196 Жыл бұрын
    • Hi! Good question, it was certainly several hours. I ran it overnight

      @DeepFindr@DeepFindr Жыл бұрын
  • Also, StanfordCars is no longer available. Can you please change it?

    @namirahrasul@namirahrasulАй бұрын
  • That is cute, though.

    @vladilek@vladilek2 ай бұрын
  • I love how you blur your search history

    @user-kv2jk8vc1l@user-kv2jk8vc1l Жыл бұрын
    • Where? xD

      @DeepFindr@DeepFindr Жыл бұрын
  • Did anyone get an output? I don't know why I am getting only noisy images at epoch 0.

    @AbhishekSingh-hz6rv@AbhishekSingh-hz6rv7 ай бұрын