
[–]eyaler 3 points4 points  (0 children)

You are awesome. I created some Colabs for your code

Get a smiling video from:

uploaded files - https://eyalgruss.com/smile

camera snapshot - https://eyalgruss.com/smiley

[–]gohu_cd 2 points3 points  (7 children)

Is what you provide a way to, for an image A, find a latent code such that the image B generated from that latent code most closely resembles A?

Because I tested it, and it works so well that I'm skeptical.

[–]___mlm___[S] 5 points6 points  (6 children)

R - a real image

Gen(latent) - a generated image from some latent vector using pre-trained generator

VGG16 - a pre-trained model used for the perceptual loss (the 9th layer in my implementation, but layer 5 can also be used)

R_features = VGG16(R)

G_features = VGG16(Gen(latent))

We want to minimize the loss mse(R_features, G_features) while changing only the latent variable. The generator and perceptual model weights are completely frozen during optimization.
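A rough sketch of the idea (PyTorch here just for illustration, this is not the repo's actual code; `generator` and `real_image` stand in for a frozen pre-trained generator and a preprocessed target image):

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Truncated VGG16 as the perceptual model; its weights are frozen.
vgg = models.vgg16(pretrained=True).features[:9].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

real_features = vgg(real_image).detach()           # R_features (real_image is assumed)

latent = torch.zeros(1, 512, requires_grad=True)   # the only variable being optimized
opt = torch.optim.Adam([latent], lr=0.01)

for step in range(1000):
    opt.zero_grad()
    generated = generator(latent)                     # frozen pre-trained generator (assumed)
    loss = F.mse_loss(vgg(generated), real_features)  # perceptual loss
    loss.backward()                                   # gradients flow only into `latent`
    opt.step()
```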

[–]gohu_cd 7 points8 points  (5 children)

Wow... Did you see how well it works? I'm baffled. I've tried it with many faces and it always works. So does this mean that the generator has learned to generate every face there is, using only the FFHQ dataset? Wtf

[–]gebrial 1 point2 points  (3 children)

So I could use this to find the latent vector for my face and a friend's face and see a smooth transformation. Sounds cool.

[–]dodocatheon 2 points3 points  (2 children)

Yep. https://cdn.discordapp.com/attachments/175942948914069504/546381231911075841/Z.png is a transformation of my wife's face into mine.

[–]gebrial 1 point2 points  (0 children)

Wow very impressive

[–]natureboy-sickflair 0 points1 point  (0 children)

how did you get this to work? I'm having difficulties

[–]ryanbuck_ 0 points1 point  (0 children)

lol I’m not sure of the technical details here but your expression of being baffled has me giggling.

[–]JackDT 2 points3 points  (0 children)

So you trained a different network to find individual features like age, smiling, and gender in the StyleGAN-trained face model, and that basically turns StyleGAN into magic Photoshop?

This is SUPER cool!

[–]_1427_ 1 point2 points  (4 children)

Thank you, this shows me a new way to retrieve the latent representation of a real image. Is this your own method or is it already published somewhere? Do you know of any other methods for getting the latent representation of a given image?

[–]___mlm___[S] 1 point2 points  (3 children)

So the closest idea is described in "Optimizing the Latent Space of Generative Networks" https://arxiv.org/abs/1707.05776 I'll add it to the repo description a bit later.

I also made a toy example of it some time ago (without perceptual loss) https://colab.research.google.com/drive/1VnlboAKDTrXPh32QiVHii7oUY0GBx7iC

[–]shortscience_dot_org 0 points1 point  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Optimizing the Latent Space of Generative Networks

Summary by Min Lin

An algorithm named GLO is proposed in this paper. The objective function of GLO:

$$\min_{\theta}\frac{1}{N}\sum_{i=1}^{N}\left[\min_{z_i}\,\ell\big(g_{\theta}(z_i),x_i\big)\right]$$

This idea dates back to Dictionary Learning.

It can be viewed as a nonlinear version of dictionary learning by:

  1. replacing the dictionary $D$ with the function $g_{\theta}$,

  2. replacing $r$ with $z$,

  3. using the $\ell_2$ loss function.

Although in this way, the generator could be learned without the hassles caused by GAN objective, ... [view more]

[–]_1427_ 0 points1 point  (0 children)

Thanks.

[–]namangoyal2707 0 points1 point  (0 children)

There's also the Generative Feature Matching Network (GFMN) from ICLR 2019, although they also used VGG features to train their model. There are other papers that used a similar idea for inference or training. Right now I only remember the linked one, but you can look into their cited papers and their open reviews.

[–]_1427_ 1 point2 points  (6 children)

How do you find the "smiling direction"?

[–]___mlm___[S] 11 points12 points  (5 children)

1) Using the pre-trained generator, I sampled a number of fake images from random noise

2) Then I manually classified them as smiling / not smiling, young / not young, male / female

3) A linear model was trained to predict those labels using the latent vectors as features

4) The weights of the trained linear model represent the direction in the feature space (see the sketch below)


In general it's possible to project an existing dataset with labeled facial attributes into the latent space, in which case manual labeling isn't needed, but I still have to check that.
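Roughly, steps 3-4 look like this (an illustrative scikit-learn sketch, not my exact code; the file names and the dlatent variables are placeholders):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed inputs: dlatents of the sampled fake images plus the manual smile labels.
dlatents = np.load('sampled_dlatents.npy')        # shape (N, 18, 512), placeholder file
labels = np.load('smile_labels.npy')              # shape (N,), 1 = smiling, 0 = not smiling

X = dlatents.reshape(len(dlatents), -1)           # flatten to (N, 18*512) feature vectors
clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)

smile_direction = clf.coef_.reshape(18, 512)      # linear-model weights = latent direction

# Editing: shift a recovered dlatent along the direction and feed it back to the generator.
edited_dlatent = some_dlatent + 2.0 * smile_direction
```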

[–]invertedpassion 2 points3 points  (0 children)

This is a really clever way to find direction in the latent space. Good stuff.

[–]_1427_ 0 points1 point  (0 children)

Thank you. I know there are works on image-to-image translation that transform a non-smiling image into a smiling one, and recently there's this paper called GANimation that shows how to learn to transform human expressions. But I never thought of an approach like yours.

[–]hanyuqn 0 points1 point  (1 child)

How many images did you generate and manually classify?

[–]___mlm___[S] 0 points1 point  (0 children)

200 images for the positive class and approximately the same for the negative. I also tried to use as diverse a set of positive/negative images as possible. It seems to me that StyleGAN generates more women than men, for example.

[–]___mlm___[S] 1 point2 points  (0 children)

BTW I made several GIFs with transformations

https://gph.is/g/46g879E

https://gph.is/g/aXmx6xZ

https://gph.is/g/ZWM3nLE

Current status: tomorrow I'm going to push some new code that improves the quality of the recovered latent representation (I've added some tricky regularization), which gives more stable transformations.

[–]NewFolgers 0 points1 point  (1 child)

I just looked around to try to confirm the dimensions/size of the latent vector. Is it basically 512 floating-point values? (and specifically, it's 512 FP32 values?) And that latent vector is the only input to the generator model that's specific to the image you generate?

[–]Phylliida 1 point2 points  (0 children)

yes

[–]wookie_44 0 points1 point  (0 children)

Is this GAN applicable only to images?


[–][deleted] 0 points1 point  (0 children)

is that the Neurosky EEG chip in the picture?

[–]zergling103 0 points1 point  (2 children)

How do you run https://github.com/Puzer/stylegan/blob/master/Play_with_latent_directions.ipynb to play with it?

I managed to get it running in https://colab.research.google.com/drive/1Mc8lZ7De-HjDVHAZFVENEl_3nRfYXgq6 but I get an error: ModuleNotFoundError: No module named 'dnnlib'

What code would I need to write in a new cell above to get the necessary stuff installed?

[–]___mlm___[S] 0 points1 point  (1 child)

I suspect you're running the notebook from outside the cloned repository folder. You can change directory to the repository root and then re-run the notebook.

Alternatively, you can append the path of the cloned repository to your PYTHONPATH: import sys; sys.path.append("path/to/stylegan")
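For example, in a cell at the top of the notebook (the path is a placeholder for wherever you cloned the repo):

```python
import sys
sys.path.append("path/to/stylegan")  # make dnnlib and the rest of the repo importable

import dnnlib  # should now resolve
```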

[–]literatim 0 points1 point  (0 children)

Can you break it down like I am brand new to this ecosystem?

Will I need the System Requirements listed in order to align + encode my own raw images?

Can I clone this repository and upload it to Jupyter or CoLab if I don't have the hardware?

Thanks for your great work by the way!

[–]sovsem_ohuel 0 points1 point  (2 children)

Looks nice! Btw, I have a couple of questions:

  1. What do you use as a latent code? I see that it has shape (18, 512) which is quite strange.
  2. You say that you take the interpolation direction from the logreg coefficients. So you just take the positive weights, multiply them by a coefficient and add them to your latent vector?
  3. Did you try to just train an encoder?

[–]___mlm___[S] 3 points4 points  (1 child)

1) So StyleGAN generator actually contains 2 components:

Generator:

qlatent - normally distributed noise with shape (512)

dlatent = mapping_network(qlatent), with shape (18, 512)

where mapping_network is a fully connected network which transforms qlatent into dlatent

generator(mapping_network(qlatent)) = image

So during optimization we fit dlatent instead of qlatent (see the sketch at the end of this comment). Optimizing qlatent leads to bad results (I can elaborate on that). dlatent is used for feature-wise transformation of the generator's convolution layers: https://distill.pub/2018/feature-wise-transformations/

2) dlatent + multiplier * logreg_coeff; Yes, but I use the raw coefficients from the logreg, so it doesn't matter whether they are positive or not.

3) Yes. It somewhat works and we can get relatively similar faces, but fewer details are preserved. It's still in progress.
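Roughly, the structure from (1) looks like this (illustrative names only, not the actual API of the official implementation):

```python
import numpy as np

qlatent = np.random.randn(512)        # z: normally distributed noise, shape (512,)
dlatent = mapping_network(qlatent)    # w: disentangled latent, shape (18, 512) (hypothetical call)
image = synthesis_network(dlatent)    # dlatent modulates each convolution block (hypothetical call)

# The encoder optimizes dlatent directly; qlatent and the weights of
# mapping_network / synthesis_network are never touched.
```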

[–]joshuacpeterson 1 point2 points  (0 children)

Can you elaborate on why you optimize dlatent as opposed to qlatent?

[–]sigh_ence 0 points1 point  (1 child)

Amazing stuff. May I ask why you use the VGG16 representation, rather than optimising directly for pixel alignment? As in: optimise mse(R_pixels, G_pixels)? Thanks!

[–]nicdahlquist 1 point2 points  (0 children)

Comparing extracted VGG features ("perceptual loss") usually produces sharper image outputs than L1 or L2 image losses (see https://arxiv.org/abs/1603.08155 for details)

[–]theredknight 0 points1 point  (2 children)

What hardware did you use to do this? Did you have the 8 recommended Tesla V100s sitting around? Also what were your training / rendering times?

[–]___mlm___[S] 0 points1 point  (1 child)

Unfortunately I have only an old 1080 Ti :)

I didn't try to train StyleGAN, by the way. I'm just creating an encoder. Rendering time: 170 ms to convert a latent representation to an image. One minute to find the latent representation of an image (but I'm trying to reduce this time).

[–]hpstrgod 0 points1 point  (0 children)

Is there any way you could explain in a little more detail how to use both files to get latent representations from an image?

[–]phelogges 0 points1 point  (0 children)

Why is there a slice to 8 in the move_and_show function of Play_with_latent_directions.ipynb? Do other slices work, or even the whole feature vector?

[–]kreyio3i 0 points1 point  (0 children)

How do I use this for style transfer?

[–]danielhanley 0 points1 point  (1 child)

Inspired by this, I trained a model (a slightly modified resnet50) to infer high-scale latent space features from a portrait photo, training the model on thousands of universally unique image-dlatent pairs. This approach may also work on the mid and low scale features as well, but I haven't tested it yet. It doesn't yield the same detail as your awesome input optimization trick, but the model outputs vectors that land safely in the dense parts of the latent space, making interpolations more stable. It performs very well for me in transferring face position from a video in real-time. The detection and alignment bit is actually the performance bottleneck that I'm working on now. Here's a video: https://twitter.com/calamardh/status/1102441840752713729

Maybe this approach could be used alongside input optimization for faster results.
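Roughly, the setup looks like this (a simplified sketch, not my exact code, regressing the full dlatent; the dataloader of image-dlatent pairs is assumed):

```python
import torch
import torch.nn as nn
import torchvision.models as models

class LatentRegressor(nn.Module):
    # ResNet-50 backbone regressing an (18, 512) dlatent from a face image.
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet50(pretrained=True)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 18 * 512)

    def forward(self, x):
        return self.backbone(x).view(-1, 18, 512)

model = LatentRegressor()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Train on (image, dlatent) pairs produced by sampling the GAN (dataloader assumed).
for images, dlatents in dataloader:
    optimizer.zero_grad()
    loss = criterion(model(images), dlatents)
    loss.backward()
    optimizer.step()
```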

[–]vedenev 0 points1 point  (0 children)

That is interesting!

Could you give the code?

Or could you explain how you did this in more detail?

Did you use precalculated generated faces?

Did you predict only the first several rows of the dlatent matrix? How many?

[–]biela7x 0 points1 point  (0 children)

Nice work!

I have a question: why do you divide the MSE in the loss function by the constant 82890.0? Where did that number come from?

Thank you!

[–]altanhaider 0 points1 point  (0 children)

Can I perform image morphing with this repo? If yes, which code should I run? I want to make a video in which one image morphs into the next, but it's a non-face image.

Please help.

[–]manubider 0 points1 point  (0 children)

Hey, awesome work! I have a question: when I try to use the latent representation .npy file in another instance of StyleGAN I get a different face. Do you know why that could be?

[–]jxcode 0 points1 point  (0 children)

Any suggestion on optimizer, learning rate and iterations required?

[–]kooro1 0 points1 point  (1 child)

You should show real images, not reconstructed images.

[–]___mlm___[S] 0 points1 point  (0 children)

The original images which were used:

https://gist.github.com/Puzer/b3eeda3e0e53d5462c7ac96b876362db

I agree that I should add a script that demonstrates the reproducibility of the approach more explicitly.