[P] StyleGAN Encoder - from real images to latent representation

eyaler · 2019-02-17T09:06:29+00:00

You are awesome. I created some Colabs for your code

Get a smiling video from:

uploaded files - https://eyalgruss.com/smile

camera snapshot - https://eyalgruss.com/smiley

gohu_cd · 2019-02-16T20:16:04+00:00

Is what you provide a way to: for an image A, find a latent code such that the image B associated to the latent code is most resembling to A ?

Because I tested it, and it works so well that I'm skeptical.

JackDT · 2019-02-14T02:30:28+00:00

So you trained a different network to find single features like age, smiling, gender, etc in Stylegan trained face model, and that basically turns Stylegan into magic Photoshop?

This is SUPER cool!

_1427_ · 2019-02-13T14:41:27+00:00

Thank you, this shows me a new way to retrieve latent representation for a real image. Is this your own method or is it already published somewhere? Do you know if there are any other methods for getting the latent representation of a given image?

_1427_ · 2019-02-13T14:53:13+00:00

How do you find the "smiling direction"?

___mlm___ · 2019-02-16T12:38:39+00:00

BTW I made several GIFs with transformations

https://gph.is/g/46g879E

https://gph.is/g/aXmx6xZ

https://gph.is/g/ZWM3nLE

Current status: Tomorrow I'm going to push some new code which improves quality of observed latent representation (I've added some tricky regularization) which provides more stable transformations.

NewFolgers · 2019-02-13T18:03:50+00:00

I just looked around to try to confirm the dimensions/size of the latent vector. Is it basically 512 floating-point values? (and specifically, it's 512 FP32 values?) And that latent vector is the only input to the generator model that's specific to the image you generate?

wookie_44 · 2019-02-13T19:29:14+00:00

Is this GAN applicable only to images?

TotesMessenger · 2019-02-13T20:43:29+00:00

I'm a bot, bleep, bloop. Someone has linked to this thread from another place on reddit:

[/r/u_pikachuisop] [P] StyleGAN Encoder - from real images to latent representation

^{If you follow any of the above links, please respect the rules of reddit and don't vote in the other threads.} ^(Info ^/ ^Contact)

[deleted] · 2019-02-13T23:17:06+00:00

is that the Neurosky EEG chip in the picture?

zergling103 · 2019-02-14T06:19:07+00:00

How do you run https://github.com/Puzer/stylegan/blob/master/Play_with_latent_directions.ipynb to play with it?

I managed to get it running in https://colab.research.google.com/drive/1Mc8lZ7De-HjDVHAZFVENEl_3nRfYXgq6 but I get an error: ModuleNotFoundError: No module named 'dnnlib'

What code would I need to write in a new cell above to get the necessary stuff installed?

sovsem_ohuel · 2019-02-14T09:02:13+00:00

Looks nice! Btw, I have a couple of questions:

What do you use as a latent code? I see that it has shape (18, 512) which is quite strange.
You say, that you take interpolation direction from logreg coefficients. So you just take positive weights, multiply them on coef and add to your latent vector?
Did you try to just train an encoder?

sigh_ence · 2019-02-14T17:35:38+00:00

Amazing stuff. May I ask why you use the VGG16 representation, rather than optimising directly for pixel alignment? As in: optimise mse(R_pixels, G_pixels)? Thanks!

theredknight · 2019-02-15T11:17:21+00:00

What hardware did you use to do this? Did you have the 8 recommended Tesla V100s sitting around? Also what were your training / rendering times?

phelogges · 2019-02-20T02:42:18+00:00

Why there is a slice to 8 in function move_and_show of Play_with_latent_directions.ipynb? Do other slices work even the whole feature?

kreyio3i · 2019-02-20T04:04:42+00:00

How do I use this for style transfer?

danielhanley · 2019-03-04T05:53:08+00:00

Inspired by this, I trained a model (a slightly modified resnet50) to infer high-scale latent space features from a portrait photo, training the model on thousands of universally unique image-dlatent pairs. This approach may also work on the mid and low scale features as well, but I haven't tested it yet. It doesn't yield the same detail as your awesome input optimization trick, but the model outputs vectors that land safely in the dense parts of the latent space, making interpolations more stable. It performs very well for me in transferring face position from a video in real-time. The detection and alignment bit is actually the performance bottleneck that I'm working on now. Here's a video: https://twitter.com/calamardh/status/1102441840752713729

Maybe this approach could be used alongside input optimization for faster results.

biela7x · 2019-04-05T18:08:58+00:00

Nice work!

I have a question: why in the loss function you divide the MSE by a constant number 82890.0?
Where that number came from?

Thank you!

altanhaider · 2019-04-25T06:19:46+00:00

Can I perform image morphing with this repo? if yes which code to run? I want to make a video in which the image morphs into next, but it's a non-face image

plz help

manubider · 2019-04-27T21:23:28+00:00

hey, awesome work! i have a question, when i try to use the latent representation .npy file in another instance of stylegan i get other face, do you know why that could be??

jxcode · 2019-05-28T06:53:28+00:00

Any suggestion on optimizer, learning rate and iterations required?

kooro1 · 2019-02-13T14:04:40+00:00

You should show real images, not reconstructed images.

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS