Encoder Decoder Network - Computerphile
Deep Learning continued - the Encoder-Decoder network - Dr Mike Pound. For a background on CNNs it's worth watching this first: • CNN: Convolutional Neu...
Google Deep Dream • Deep Dream (Google) - ...
Password Cracking: • Password Cracking - Co...
Deep Learning & CNNs: • Deep Learning - Comput...
3D from Selfie: • Selfie to 3D Model - C...
Papers included in this Computerphile:
bit.ly/C_FaceAlignment
bit.ly/C_Landmarks
bit.ly/C_AaronLongForm
FCNs, and in a sense encoder-decoder networks, were first presented here: bit.ly/C_JohnLong
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
I would love a Mike Pound playlist. Or at least I would have if I hadn't already watched all the videos with him.
You can feel the passion when he speaks until nearly out of breath
Great animation work on this episode Sean.
Thanks :)
I'm writing a proposal reviewing the CodeT5 neural architecture and am so confused about the encoder-decoder technique mentioned there. Super stoked to see a Computerphile video on it!
lol, dat face at 5:08 when he wanted to mention the use for military reasons :D
This guy is the best
Love this channel. Every concept is so intuitively explained.
Another awesome lecture by Dr Mike Pound :D. Dang I wish you were my ML/AI lecturer back when I was learning this stuff.
great talk. if Mike could discuss the model interpretability in deep learning models for the next one, that would make my day!
Whoa! What an amazing explanation of such a complex topic! Loved the articulation!!
I love the increasing collection of twisty puzzles on the shelf in the background
This is the best explanation about U-net I've ever seen.
You guys remembered to make this video! Nice!
Best and briefest description ever!
you are the best, I can't find content like this outside of this awesome channel
It seems like a way to distill an image of identifiable objects into their most basic forms and then use that information to once again layer the identified objects onto less compressed versions of the image. An analog reverse to this might be to have a completed puzzle of an image where you'd identify a few key objects and tag them on a few pieces, then you'd take the puzzle apart, hold on to the key objects and place them in their respective locations on the table. From there, you can start to place the surrounding pieces around each key piece until it's once again understandable.
yeah, that pretty much sums it up. The other use of encoder-decoder networks is in generating synthetic images (by learning the representation in the middle, given by the encoder)
And then feeding that into a GAN 😈
Very serious key pieces would be the borders and especially the corners. And the sky is blue, so blue pieces would usually sit at the top of the puzzle.
The GAN relation at the end was pretty helpful
Downsampling by choosing the best of them? The max of them? No: first the image must be low-pass filtered, then simply downsampled by discarding pixels. But then I see that here you really do want to take the max when downsampling. Very interesting. Your GAN analogy at the end is excellent: the interior is like a generator and the higher-resolution layers are like a discriminator.
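The contrast in that comment, max pooling versus plain decimation, can be sketched in a few lines of framework-free Python (a toy illustration on a 4x4 "image" stored as nested lists, not any library's actual API):

```python
def max_pool_2x2(img):
    """Downsample by taking the max of each non-overlapping 2x2 window."""
    h, w = len(img), len(img[0])
    return [
        [
            max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]

def decimate_2x2(img):
    """Downsample by simply discarding pixels (keep top-left of each window)."""
    return [row[::2] for row in img[::2]]

img = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 2],
    [3, 2, 1, 1],
]
print(max_pool_2x2(img))   # [[4, 5], [3, 9]] -- keeps the strongest response
print(decimate_2x2(img))   # [[1, 2], [0, 9]] -- just drops rows/columns
```

Max pooling keeps the strongest activation in each window regardless of its exact position, which is what the network wants; decimation without a low-pass filter can throw that response away entirely.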
So basically the down-then-up sampling is doing what two separate systems working collaboratively could do: one to physically locate the item of interest and another to work on it? I'm working on speech recognition from 'images' generated using the fast Fourier transform. Part of the solution involves locating the part of the image that contains the relevant information before inputting it into the recognition neural net. Why would the procedure outlined in the video outperform two independent processes?
That’s an awesome explanation. Thanks!
Please Computerphile, can we have a playlist of all Dr. Mike Pound's videos? :)
Very interesting!
Teaching is an art. Thank you so much for this video!
great channel
Plant science sounds rad! Also, two Mike Pound videos in one week, I'd rather this type of pound than to win the national lottery!
Great work. Keep going.
thank you for such great content
Great video. You remind me so much of James Acaster.
GIVE ME THE KNOWLEDGE DOCTOR POUND
By the way, the reason data is brought from the encoder to the decoder is unpooling, which is the (partial) reverse of pooling. Pooling takes the maximum pixel in its window. In normal convnets that's fine; we don't really need to know which pixel exactly got transferred to the next layer. When unpooling in the decoder, however, we need to know where that pixel was in the pooling "window" to upsample more accurately. To accomplish this, we record the index of the pixel that got pooled and pass it to the unpooling layer.
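That index-passing idea can be sketched in plain Python (a 1D toy with window size 2; real frameworks expose this as paired pool/unpool operations, but none of their APIs are used here):

```python
def max_pool_with_indices(xs, window=2):
    """Max-pool a 1D signal, also returning where each max came from."""
    pooled, indices = [], []
    for start in range(0, len(xs), window):
        chunk = xs[start:start + window]
        best = max(range(len(chunk)), key=lambda i: chunk[i])
        pooled.append(chunk[best])
        indices.append(start + best)   # remember where the max lived
    return pooled, indices

def max_unpool(pooled, indices, length):
    """Put each pooled value back at its recorded position, zeros elsewhere."""
    out = [0] * length
    for value, idx in zip(pooled, indices):
        out[idx] = value
    return out

xs = [1, 7, 3, 2, 5, 6]
pooled, idx = max_pool_with_indices(xs)
print(pooled)                      # [7, 3, 6]
print(idx)                         # [1, 2, 5]
print(max_unpool(pooled, idx, 6))  # [0, 7, 3, 0, 0, 6]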
Oscar Mulin no, the one shown here works differently, read Jonathan Long's paper about Fully Convolutional Networks
I think you forgot some colour correction
This is fascinating
Holy bananas... now that whole stacked restricted Boltzmann machine stuff makes sense to me! In the slide deck from my prof there was always this double-pyramid structure depicted and I was like WHAAAT? You might literally have saved me exam points here!
Well, this makes more sense to me: outline the raw sketch before you look for objects, like the room, windows, edges of the bookshelf, desk, drawers and so on. Mike is the central object that shades the room view, and then it breaks things down from there. Mike is the blob obscuring the view ;), the neural network is not quite sure what he is, but it will find out.
While expanding the image from a smaller to a larger size... how do we map the image?
It is essentially the inverse of the encoder. For images, in the encoder we have 2D convolutional layers and 2D max-pool layers. In the decoder they are replaced with 2D deconvolutional layers (essentially the transpose of conv2d), while for max pooling we can just copy the intensity of the pixel over to the pixels in the next layer that the max pooling would have been responsible for, if it were facing the other direction.
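The "transpose of conv2d" part can be illustrated with a 1D toy in plain Python: each input value paints a scaled copy of the kernel into a larger output, and overlapping contributions add. The kernel values below are arbitrary, chosen only for illustration:

```python
def conv_transpose_1d(xs, kernel, stride=2):
    """Transposed ('de')convolution: scatter scaled kernels into the output."""
    out_len = (len(xs) - 1) * stride + len(kernel)
    out = [0] * out_len
    for i, x in enumerate(xs):
        for j, k in enumerate(kernel):
            out[i * stride + j] += x * k   # overlapping windows accumulate
    return out

# 3 inputs, stride 2, kernel width 3 -> 7 outputs; adjacent kernels overlap
print(conv_transpose_1d([1, 2, 3], [1, 1, 1]))  # [1, 1, 3, 2, 5, 3, 3]
```

This is the learned counterpart of simple upsampling: with a stride of 2 the spatial size roughly doubles, and the network learns the kernel weights instead of using fixed interpolation.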
brilliant idea
I usually just wipe the server with a cloth or something. What difference at this point does it make?
How can I make this same animation myself for a similar video? The ones at 2:05?
1:16 A Max Pool layer cannot move the representation of a dog from the left side of the image to the right. Max pool layers only gather adjacent pixels.
helpful thank you!
u are the best !
Mike Pound: Teaching noobs about computers, when he's not teaching computers about plants. What an interesting person.
next video about GAN please !
Oh, wheat! Lots of wheat... fields of wheat... a tremendous amount of wheat!
fburton8 Perfect for running through.
That's what we eat. Wheat!
When talking about segmentation, I was hoping he'd mention YOLO (You Only Look Once). It's such an interesting bit of technology, which performs object detection (bounding boxes rather than per-pixel segmentation) on each frame of a video in near-realtime, processing each frame only once, hence its name. And it performs quite well for what it's doing! You can find videos of it on YouTube.
Dr. Pound looks like the child of Zach Woods and Elijah Wood. "Dr. Mike Pounds Wood"
I always notice the cubes in the background.
Is this the same thing as a UNet?
I did not understand anything, but it's very interesting
Where can i watch previous video?
+1
I think this video was heavily manipulated, it is almost like a green screen is being used.
levmatta Yes - on the far right through the window is a white plane with his reflection. Visible intermittently.
Him: "this is only one dimension I've drawn here but it's actually two dimensions" Me: "okay I give up!"
NotMarkKnopfler lol it's not that hard. The width of the tip of the marker is the width itself, despite him only drawing a "single" line with seemingly no intended width.
It's actually 4 dimensions because you also have the colour channels and the data batch
He just drew it 1d because it is easier to draw. Just imagine the 2d thing that corresponds to the 1d thing.
Oskar Keurulainen not really, because he is only representing the spatial dimensions as he is talking about spatial downsizing.
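As the replies note, the tensor a CNN actually sees is 4D: (batch, channels, height, width), even though only the two spatial dimensions matter for the downsizing discussion. A toy shape check in plain Python (nested lists standing in for a real array; the helper name is just illustrative):

```python
batch, channels, height, width = 2, 3, 4, 4

# A batch of 2 RGB (3-channel) 4x4 "images", all zeros.
images = [[[[0 for _ in range(width)] for _ in range(height)]
           for _ in range(channels)] for _ in range(batch)]

def shape(t):
    """Walk the nesting to recover the tensor's dimensions."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

print(shape(images))  # (2, 3, 4, 4)
```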
Do a video on ML solving captchas?
so thats basically a u-net?
color correction
with color correction, aside from semantic segmentation, you'd also want gradient information to avoid that aliasing when you apply some filter. In this case, it's probably easier to use traditional image processing techniques as gradient and color information is available before you build that convolution pyramid.
I think he is referring to the unusual color calibration of the video.
can you add subtittles?
This is such beautiful, interesting and useful engineering but I cannot for one second stop thinking of the millions of ways it can be wrongfully used. It's a shame really.
Been watching too much dystopian sci-fi?
sci-fi? You're funny. Actually a couple of weeks back the BBC did a program about how police in the US are using computer software (I assume neural networks) to predict crimes. Search for "BBC The Enquiry: can computers predict crime?"
Why is that so bad? That can lead to a decrease in crime. As long as the agencies are bound by law to keep that information to themselves I don't see a problem with it.
I wish he could be my professor. If so, I would sleep on his office couch and learn great stuff.
Those making the move from analog to IP video, specifically in regulated industries, would benefit from using this video to explain to their cheap-ass check writers why bubble gum and duct tape is not a sustainable solution.
ok