The Reparameterization Trick

May 11, 2024
14,566 views

This video covers what the reparameterization trick is and when we use it. It also explains the trick from a mathematical/statistical perspective.
CHAPTERS:
00:00 Intro
00:28 What/Why?
08:17 Math

Comments
  • This was the analogy I got from ChatGPT to understand the problem 😅. Hope it's useful to someone: "Certainly, let's use an analogy involving shooting a football and the size of a goalpost to explain the reparameterization trick: Imagine you're a football player trying to score a goal by shooting the ball into a goalpost. However, the goalpost is not of a fixed size; it varies based on certain parameters that you can adjust. Your goal is to optimize your shooting technique to score as many goals as possible. Now, let's draw parallels between this analogy and the reparameterization trick:
    1. **Goalpost Variability (Randomness):** The size of the goalpost represents the variability introduced by randomness in the shooting process. When the goalpost is larger, it's more challenging to score, and when it's smaller, it's easier.
    2. **Shooting Technique (Model Parameters):** Your shooting technique corresponds to the parameters of a probabilistic model (such as `mean_p` and `std_p` in a VAE). These parameters affect how well you can aim and shoot the ball.
    3. **Optimization:** Your goal is to optimize your shooting technique to score consistently. However, if the goalpost's size (randomness) changes unpredictably every time you shoot, it becomes difficult to understand how your adjustments to the shooting technique (model parameters) are affecting your chances of scoring.
    4. **Reparameterization Trick:** To make the optimization process more effective, you introduce a fixed-size reference goalpost (a standard normal distribution) that represents a known level of variability. Every time you shoot, you still adjust your shooting technique (model parameters), but you compare your shots to the reference goalpost.
    5. **Deterministic Transformation:** This reference goalpost allows you to compare and adjust your shooting technique more consistently. You're still accounting for variability, but it's structured and controlled. Your technique adjustments are now more meaningful because they're not tangled up with the unpredictable variability of the changing goalpost.
    In this analogy, the reparameterization trick corresponds to using a reference goalpost with a known size to stabilize the optimization process. This way, your focus on optimizing your shooting technique (model parameters) remains more effective, as you're not constantly grappling with unpredictable changes in the goalpost's size (randomness)."

    @mohammedyasin2087 • 8 months ago
    • oh my god !! So good.

      @safau • 6 months ago
    • damn nice bro, thank you for this

      @metalhead6067 • 4 months ago
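In code, the deterministic transformation the analogy above points at is usually a single line: z = mu + sigma * eps, with eps drawn from a standard normal. A minimal PyTorch-style sketch (the function and variable names are illustrative, not taken from the video):

```python
import torch

def reparameterize(mu, log_var):
    """Draw z ~ N(mu, sigma^2) as a deterministic function of parameter-free noise."""
    std = torch.exp(0.5 * log_var)   # sigma = exp(log_var / 2)
    eps = torch.randn_like(std)      # eps ~ N(0, I): the fixed "reference goalpost"
    return mu + std * eps            # z = g(eps; mu, sigma), differentiable in mu and log_var

# Gradients reach mu and log_var because only eps is random, and eps has no parameters.
mu = torch.zeros(4, requires_grad=True)
log_var = torch.zeros(4, requires_grad=True)
z = reparameterize(mu, log_var)
z.pow(2).sum().backward()            # any downstream loss would do
print(mu.grad, log_var.grad)
```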
  • Sometimes understanding the complexity makes a concept clearer. This was one such example. Thanks a lot.

    @advayargade746 • 3 months ago
  • Very nice video, it helped me a lot. Finally someone explaining math without leaving the essential parts aside.

    @user-hy4kl3my6h • 7 months ago
  • This is a life changing video, thank you very much 😊 🙏🏻

    @abdelrahmanahmad3054 • 10 months ago
  • Thank you for this video, this has helped a lot in my own research on the topic

    @chasekosborne2 • 6 months ago
  • Thank you for your effort, it all tied up nicely at the end of the video. This was clear and useful.

    @MonkkSoori • 1 year ago
    • Thank you for the positive feedback

      @raneisenberg2155 • 1 year ago
  • WOW! THANK U. FINALLY MAKING IT EASY TO UNDERSTAND. WATCHED SO MANY VIDEOS ON VAE AND THEY JUST BRIEFLY GO OVER THE EQUATION WITHOUT EXPLAINING

    @s8x. • 1 month ago
  • Thank you for this video, this has helped me a lot

    @ettahiriimane4480 • 5 months ago
  • Good job, and thanks for your effort. I hope we see more videos in the future.

    @amaramouri9137 • 1 year ago
    • Thanks! I'll do my best

      @raneisenberg2155 • 1 year ago
  • Thank you, I liked your intuition, amazing effort.

    @amirnasser7768 • 7 months ago
    • Also, please correct me if I am wrong, but I think at minute 17 you should not use the same theta notation for both "g_theta()" and "p_theta()", since you assumed that you do not know the theta parameters for "p()" (the main cause of the differentiation problem) but you do know the parameters for "g()".

      @amirnasser7768 • 7 months ago
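For reference, the usual way the VAE literature keeps these two parameter sets apart (as the comment above suggests) is to write the encoder/sampling parameters as phi and the decoder/model parameters as theta. A sketch of the gradient identity in that notation, assuming a Gaussian encoder as in the video:

```latex
% Variational (encoder) parameters \phi vs. generative (decoder) parameters \theta:
\nabla_{\phi}\,\mathbb{E}_{q_{\phi}(z\mid x)}\!\left[f_{\theta}(z)\right]
  \;=\; \mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)}\!\left[\nabla_{\phi}\,f_{\theta}\!\left(g_{\phi}(\epsilon,x)\right)\right],
\qquad g_{\phi}(\epsilon,x) \;=\; \mu_{\phi}(x) + \sigma_{\phi}(x)\odot\epsilon .
```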
  • Thanks, this is a good explanation of the trickiest point of VAEs

    @slimanelarabi8147 • 4 months ago
  • Holy God. What a great teacher..

    @liuzq7 • 10 months ago
  • Beautifully said. Love how you laid out things, both the architecture and math. Thanks a million.

    @Gus-AI-World • 1 year ago
    • Glad you enjoyed it!

      @raneisenberg2155 • 1 year ago
  • Thank you so much! Please continue with more videos on ML.

    @salahaldeen1751 • 1 year ago
    • Will do :) let me know if you have a specific topic in mind.

      @raneisenberg2155 • 1 year ago
  • I have a small question about the video that slightly bothers me. What does the normal distribution we are sampling from consist of? If it's the distribution of latent vectors, how do we collect them during training?

    @user-td8vz8cn1h • 2 months ago
  • Your explanation is brilliant! We need more things like this. Thank you!

    @HuuPhucTran-jt4rk • 1 year ago
    • Thank you very much for the positive feedback!

      @raneisenberg2155 • 1 year ago
  • Thank you so much for your video! It definitely saved my life :)

    @jinyunghong • 1 year ago
    • You are most welcome :)

      @raneisenberg2155 • 1 year ago
  • It is cool although I don't really understand the second half. 😅

    @wilsonlwtan3975 • 1 month ago
  • Thanks for the vid 👋 Actually lost the point in the middle of the math explanation, but that's prob because I'm not that familiar with VAEs and don't know some skipped tricks 😁 I guess for the field guys it's a bit more clear :)

    @my_master55 • 1 year ago
    • Thank you very much for the positive feedback 😊. Yes, the math part is difficult to understand and took me a few tries until I eventually figured it out. Feel free to ask any question about unclear aspects and I will be happy to answer here in the comments section.

      @raneisenberg2155 • 1 year ago
  • 16:27 It's unclear to me (in the context of the gradient operator and the expectation) why f_theta(z) can't be differentiated, and WHY replacing f_theta with g_theta(eps, x) allows us to move the gradient operator inside the expectation and "make something differentiable" (from a math point of view). P.S. In practice we train with MSE and the KL divergence between two Gaussians (q(z|x) and p(z), where p_mean = 0 and p_sigma = 1), which lets us "train" the mean and variance vectors in a VAE.

    @tempdeltavalue • 1 year ago
    • Thank you for the feedback :) I will try to address both items: 1. The replacement makes the function (or the neural network) deterministic and thus differentiable and smooth. Looking at the definition of the derivative can help understand this better: lim h->0 ( (f(x+h) - f(x)) / h ). When a slight change in x produces a small change in f(x), the function is continuously differentiable. This is the case for the g function we defined in the video: a slight change in epsilon produces a slightly different z. On the other hand, i.i.d. sampling has, by definition, no relation between two subsequent samples, so the result is not smooth enough for the model to actually learn. 2. Yes, I've considered adding an explanation for the VAE loss function (ELBO), but I wanted the focus of the video to be solely on the trick itself, since it can also be used for other things like the Gumbel-Softmax distribution. I will consider making future videos on both the ELBO loss and the Gumbel-Softmax distribution.

      @raneisenberg2155 • 1 year ago
    • @@raneisenberg2155 Thanks for the answer! ❤ Ohh, I just missed that we draw a random sample. My confusion was that at 15:49 you have E_p_theta = "sum of terms" which contain z (the sample), and on the next slide you just remove them (by replacing z with epsilon and f with g).

      @tempdeltavalue • 1 year ago
    • Yes, I understand your confusion. The next slide, after the reparameterization, does not split into two terms like the "sum of terms" you described. This is because the distribution is no longer parametrized, so when calculating the gradient the situation changes: instead of a product of two functions (p_theta(z) * f_theta(z), like we had in the first slide) we now have only one function, with the distribution parameters encapsulated inside of it (f_theta(g_theta(eps, x)), like we had in the second slide). Hope this helps :)

      @raneisenberg2155 • 1 year ago
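As a worked restatement of the exchange above (using the video's notation, with g_theta(eps, x) = mu + sigma * eps and eps ~ N(0, I)):

```latex
% Before the trick the density itself depends on \theta, so the product rule
% produces the two terms mentioned above:
\nabla_{\theta}\,\mathbb{E}_{p_{\theta}(z)}\!\left[f_{\theta}(z)\right]
  = \int \nabla_{\theta}\!\left[p_{\theta}(z)\,f_{\theta}(z)\right] dz
  = \int \Big( f_{\theta}(z)\,\nabla_{\theta}p_{\theta}(z) + p_{\theta}(z)\,\nabla_{\theta}f_{\theta}(z) \Big)\, dz .

% After the trick the expectation is over p(\epsilon), which does not involve \theta,
% so the gradient moves inside and a Monte Carlo estimate is straightforward:
\nabla_{\theta}\,\mathbb{E}_{p(\epsilon)}\!\left[f_{\theta}\!\left(g_{\theta}(\epsilon,x)\right)\right]
  = \mathbb{E}_{p(\epsilon)}\!\left[\nabla_{\theta}\,f_{\theta}\!\left(g_{\theta}(\epsilon,x)\right)\right]
  \approx \frac{1}{L}\sum_{l=1}^{L}\nabla_{\theta}\,f_{\theta}\!\left(g_{\theta}(\epsilon^{(l)},x)\right).
```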
  • The derivative of the expectation is the expectation of the derivative? That's surprising to my feeble mind.

    @dennisestenson7820 • 4 months ago
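The swap is only straightforward because, after reparameterization, the distribution being averaged over (the standard normal noise) no longer depends on the parameters. A small PyTorch check that illustrates the difference (torch.distributions implements the trick as rsample; the variable names are illustrative):

```python
import torch
from torch.distributions import Normal

mu = torch.zeros(3, requires_grad=True)
sigma = torch.ones(3, requires_grad=True)
dist = Normal(mu, sigma)

z_plain = dist.sample()   # plain draw: detached from the graph, no gradient path
z_rep = dist.rsample()    # reparameterized draw: mu + sigma * eps, keeps the graph

print(z_plain.requires_grad)  # False -> cannot differentiate through the sample
print(z_rep.requires_grad)    # True  -> gradient can move inside the expectation

z_rep.sum().backward()
print(mu.grad, sigma.grad)    # both populated via the reparameterized path
```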