
[–]klendool 14 points15 points  (11 children)

This might be a dumb question, but why are the faces morphing into one another?

[–]otsukarekun 36 points37 points  (8 children)

The generator part of a GAN produces an image from a random vector. The reason for the random vector (or latent vector) is so that there is variation in the output. The reason for the morphing is because the author is walking through the latent vector in small increments. Because each increment is only slightly different from the previous one, the output of each increment is only slightly different.
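
To make that concrete, here is a minimal NumPy sketch of such a latent walk. The generate() function, latent size, and step count are placeholders standing in for whatever trained generator and framework is actually being used:

import numpy as np

def latent_walk(generate, dim=512, steps=60, seed=0):
    # Interpolate between two random latent vectors in small increments.
    # `generate` is assumed to map a latent vector of shape (dim,) to an image;
    # it stands in for the trained GAN generator.
    rng = np.random.RandomState(seed)
    z_start, z_end = rng.randn(2, dim)
    frames = []
    for t in np.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * z_start + t * z_end   # one small step along the line between the two points
        frames.append(generate(z))            # so each frame differs only slightly from the last
    return frames

Stitching the returned frames together is what produces the kind of morphing video shown in the post.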

[–]vobject 4 points5 points  (1 child)

So, is the number of possible outputs a GAN can generate determined by the size of the latent vector? How large are the latent vectors for this type of image generation in general?

[–]mreeman 15 points16 points  (0 children)

Usually it's a 128-float vector, so a pretty massive latent space. These morphing images are typically produced by interpolating between random points in that space.

[–]albion_m 1 point2 points  (4 children)

Is it possible to reverse this, i.e. use a GAN for latent space embedding of new images or would you have to train a separate encoder on random vector + generated image pairs?

[–]otsukarekun 9 points10 points  (0 children)

If you use the original GAN, the generator's input is only the latent vector. So, if you want to create an embedding for images, you might as well use an autoencoder-based model. Autoencoders by definition embed data into latent space. However, there have been many newer GAN models which do use autoencoders as the generator. There are also models like CycleGAN which use encoders and decoders as part of their structure.
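
For illustration only, a bare-bones PyTorch autoencoder (not any particular GAN's encoder; the layer sizes and image dimensions are made up). The encoder half is the latent-space embedding being described:

import torch.nn as nn

class TinyAutoencoder(nn.Module):
    # Maps flattened images to a latent vector and back again.
    def __init__(self, image_dim=64 * 64 * 3, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(image_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, image_dim))

    def forward(self, x):
        z = self.encoder(x)          # the latent-space embedding of the input image
        return self.decoder(z), z    # reconstruction plus the embedding itself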

[–]Ending_Credits 2 points3 points  (1 child)

One method you can use is to try and minimize the L2 loss between the output and the target image. Don't know how well this does in practice though.

[–]fortunateevents 0 points1 point  (0 children)

Here is an example of doing something similar in practice, I think. It's not a simple L2 minimization though, as far as I can tell.

https://twitter.com/quasimondo/status/1065893396475199488

[–]AnvaMiba 1 point2 points  (0 children)

There are several possibilities:

  • Do gradient descent (or L-BFGS or something) in the latent space to minimize the L2 or whatever distance to the target image. Doesn't require you to train anything, but it can be slow for each image and might fail due to local optima (see the sketch after this list).

  • Train an image->latent reconstructor. Probably the best solution. You can either train it after you have a pre-trained generator, or train them jointly as an autoencoder.

  • Design the generator of the GAN to be reversible. This imposes severe architecture restrictions, notably the latent space has to have the same dimension as the image.
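
The sketch referenced in the first bullet, assuming a differentiable PyTorch generator G; the optimizer, step count, and learning rate here are arbitrary choices, not anything prescribed above:

import torch

def invert_image(G, target, latent_dim=512, steps=500, lr=0.05):
    # Search the latent space for a vector whose generated image is close to
    # `target` in L2. G is assumed to be a differentiable generator mapping a
    # (1, latent_dim) tensor to an image tensor shaped like `target`.
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((G(z) - target) ** 2)  # L2 distance to the target image
        loss.backward()
        opt.step()
    return z.detach()

As the bullet notes, this needs no extra training but is slow per image and can get stuck in a local optimum far from the target.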

[–]CaptainBlob 12 points13 points  (0 children)

So many waifus

[–]Ending_Credits 16 points17 points  (8 children)

Currently still training mine, but it seems like the success of the algorithm is highly dependent on how much the face occupies the image. If you 'zoom out' too far you start getting weird artifacting (this may correct itself, but I'm not going to wait a week to find out). In my case I have 768x768 images extracted from Danbooru. If you resize them to 512 it performs poorly, but if you just crop the center it does really well. You can actually get good results with the cropped images with just ProGAN too.
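
For reference, the crop-versus-resize preprocessing is easy to reproduce; a small Pillow sketch (the file name, sizes, and resampling filter are just examples):

from PIL import Image

def center_crop(path, size=512):
    # Keep the central size x size region, so the face fills more of the frame.
    img = Image.open(path)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))

# Resizing instead keeps the full 768x768 frame but shrinks the face:
# Image.open("face_768.png").resize((512, 512), Image.LANCZOS)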

I'll post some examples when I get back home this evening.

[–]Ending_Credits 13 points14 points  (5 children)

As promised: https://imgur.com/a/hmksTek

As you can see, ProGAN (i.e. progressive growing of GANs) does a decent job, but StyleGAN is slightly higher quality. Also, both fail on the 'zoomed out' images.

[–]Leglipa 1 point2 points  (2 children)

How come StyleGAN produces worse results with more iterations?

[–]Ending_Credits 8 points9 points  (0 children)

Look closely, different input images. First one is zoomed in.

[–]auxiliary-character 0 points1 point  (0 children)

I feel like some of those really fucked up ones would make for good profile pics.

[–]DANDANTHEDANDAN 0 points1 point  (0 children)

It's like someone put a bad Snapchat filter over everyone

[–]gwern 3 points4 points  (0 children)

If you 'zoom out' too far you start getting weird artifacting (this may correct itself, but I'm not going to wait a week to find out).

I think this is probably just a lack of compute on your part. After all, their 1024px Flickr headshots take over a week on 8 V100s. Although it would be interesting if the results really were that dramatically different, since it seems like such a small expansion of the image domain.

[–]ImJustAsBad 0 points1 point  (0 children)

What kind of PC do you have to run this? What are your specs?

[–]vwvwvvwwvvvwvwwv 6 points7 points  (0 children)

I also stumbled upon this thread which trained on some more varied portrait styles.

[–]laser_velociraptor 11 points12 points  (2 children)

I once made a project based on this. I produced Pokémon with GANs. You can read more here: https://medium.com/infosimples/creating-pokemon-with-artificial-intelligence-d080fa89835b

[–]veqtor 4 points5 points  (1 child)

StyleGAN was released a week ago... the code, that is. I think the paper was in December.

[–]laser_velociraptor 2 points3 points  (0 children)

This same guy had another repo called Anime GAN

[–]veqtor 2 points3 points  (1 child)

What kind of setup is required to train one of these?

[–]Ending_Credits 6 points7 points  (0 children)

A few days on a 1080

[–]Ending_Credits 1 point2 points  (3 children)

Fine-tuning on a small dataset (in this case 500 images) seems to work really well. I retrained my model for an extra 'tick' on Zuihou and got these results:

Samples

https://i.imgur.com/lhKbMky.jpg

Some morphin:

https://i.imgur.com/rhedp4l.mp4

More morphin:

https://i.imgur.com/sCn11bE.mp4

[–]gwern 2 points3 points  (2 children)

I'm impressed just 500 images works that well. By 500, you mean 500 originals? If so, perhaps you could use aggressive data augmentation to improve the finetuning. (Or the final face StyleGAN model.)

I have a ghetto data augmentation script using ImageMagick & parallel which appears to work well:

dataAugment () {
    image="$@"
    target=$(basename "$@" | cut -c 1-200) # avoid issues with filenames so long that they can't be appended to
    suffix="png"
    # nice convert -flop                          "$image" "$target".flipped."$suffix"
    nice convert -background black -deskew 50                     "$image" "$target".deskew."$suffix"
    nice convert -fill red -colorize 3%        "$image" "$target".red."$suffix"
    nice convert -fill orange -colorize 3%     "$image" "$target".orange."$suffix"
    nice convert -fill yellow -colorize 3%     "$image" "$target".yellow."$suffix"
    nice convert -fill green -colorize 3%      "$image" "$target".green."$suffix"
    nice convert -fill blue -colorize 3%       "$image" "$target".blue."$suffix"
    # nice convert -fill purple -colorize 3%     "$image" "$target".purple."$suffix"
    nice convert -adaptive-sharpen 4x2          "$image" "$target".sharpen."$suffix"
    nice convert -brightness-contrast 10        "$image" "$target".brighter."$suffix"
    # nice convert -brightness-contrast -10       "$image" "$target".darker."$suffix"
    # nice convert -brightness-contrast -10x10    "$image" "$target".darkerlesscontrast."$suffix"
    nice convert +level 3%                     "$image" "$target".contraster."$suffix"
    # nice convert -level 3%\!                   "$image" "$target".lesscontrast."$suffix"
  }
export -f dataAugment
find . -type f | parallel dataAugment

[–]Ending_Credits 2 points3 points  (1 child)

No data augmentation beyond the standard mirror used during training. My dataset is split into folders by character (500 images from each of the top 500 character tags, although in practice it tends to be 200-400 due to face detection failure). I just grab one or more of those folders, remake the dataset, and then train for one more tick (60k iterations).
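
A rough sketch of that 'grab a folder and remake the dataset' step, assuming the per-character folder layout described above (paths, the file extension, and the naming scheme are placeholders):

import shutil
from pathlib import Path

def collect_character(src_root, character, dst):
    # Copy one character's folder into a flat directory to fine-tune on.
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for i, img in enumerate(sorted(Path(src_root, character).glob("*.png"))):
        shutil.copy(img, dst / f"{character}_{i:04d}.png")

# The flat directory can then be converted with the repo's dataset tool, e.g.
#   python dataset_tool.py create_from_images datasets/zuihou ~/zuihou-faces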

More samples:

Saberfaces (about 4000 images)

https://i.imgur.com/Q65jElX.mp4

Louise Francoise (just 350 images)

https://i.imgur.com/ouGdWbu.mp4

[–]gwern 0 points1 point  (0 children)

I'm not surprised those work (or that you got so many Sabers out). If Asuka & Holo work, why not them? Data augmentation would probably allow better results from training longer before you get artifacts from overfitting.

[–]Artgor 0 points1 point  (0 children)

Amazing! I'm really excited to see such progress with GANs in recent years.

[–]columbus8myhw 0 points1 point  (0 children)

StyleGAN is this thing, right? https://youtu.be/kSLJriaOumA

[–]FireBendingKorra 0 points1 point  (0 children)

Is this using eigenfaces to make a basis or something similar?

[–]sketchfag 0 points1 point  (0 children)

thank you AI, very cool

[–]aratnagrid 0 points1 point  (0 children)

You are now my favorite AI inventor. Love you.
I mean your work, don't get me wrong.

[–]TheCow01 0 points1 point  (0 children)

Neat

[–]Strayo 0 points1 point  (3 children)

How do I train it? Are there any helpful video tutorials for beginners?

[–]gwern 1 point2 points  (2 children)

I've been working on a tutorial here: https://gwern.net/Faces EDIT: tutorial's finished.

[–]Strayo 0 points1 point  (0 children)

Thanks, will check out when I get back from baseball

[–]guster7458 0 points1 point  (0 children)

Hey gwern, awesome tutorial. Just a question (I'm new to GANs). Say I did want to do transfer learning: where would I place the pickle file, e.g. stylegan-ffhq-1024x1024, to resume from? Do I have to rename it to match my new dataset, and which dir does it get placed in? I know it's mentioned to edit the training_loop and resume_kimg, but I was unsure where to place the pickle. TIA

[–]kvgamecube 0 points1 point  (2 children)

Is there a link to the anime pre-trained models somewhere? I only see them for cars, beds, cats, and people.

[–]gwern 0 points1 point  (1 child)

They're scattered throughout the Twitter threads, but I've put up download links for all 3 of mine (the all-anime-faces one, and the 2 finetuned ones for Asuka & Holo faces).

[–]TotesMessenger 0 points1 point  (0 children)

I'm a bot, bleep, bloop. Someone has linked to this thread from another place on reddit:

 If you follow any of the above links, please respect the rules of reddit and don't vote in the other threads. (Info / Contact)

[–][deleted] 0 points1 point  (18 children)

This is great and all, but can I actually 'use' this in any way, shape, or form? I have yet to see any open-source project aside from Style2paints or waifu2x that I can actually use as an end-user.

[–]gwern 0 points1 point  (17 children)

What do you mean by 'use'?

[–][deleted] 0 points1 point  (16 children)

I mean something with a GUI so normies like me can give the technology a try. I want to be able to press a few buttons and make my very own character, or at least something of the sort.

You can make Danbooru2025 Platinum Edition until the cows come home, but it's not going to matter if I need a PhD in computer science to enjoy any of it.

[–]gwern 0 points1 point  (15 children)

Hm... Well, if you just want to run it without installing stuff yourself, there's a Google Colab notebook I was using: https://colab.research.google.com/gist/kikko/d48c1871206fc325fa6f7372cf58db87/stylegan-experiments.ipynb

Push the 'connect' button and it'll spin up a VM with a free GPU; then click on the []s to run the code in each block, and it'll generate videos you can download, etc. skylion had a version which would even play the videos inside the browser.

Web interfaces haven't been a concern of mine, because until now none of the results were nearly good enough to be worth even thinking about how to make a web interface... (And I think there are more web devs out there than ML researchers, so I've focused on enabling the former, reasoning that if anything awesome happened worth slapping a web GUI on, web devs would come out of the woodwork; it's more important to work on the prerequisites like large, easily available datasets and proofs of concept.)

[–][deleted] 0 points1 point  (13 children)

I understand. I suppose it's just a matter of time until a passionate person or group of people gets together to provide a quality GUI for all of this tech. I just dream of the day I get a "Text-to-Drawing" program; just imagine the possibilities for professional or creative workers like myself!

[–]gwern 0 points1 point  (11 children)

I just dream of the day I get a "Text-to-Drawing" program; just imagine the possibilities for professional or creative workers like myself!

If you want some lulz, there is a good text-to-drawing web interface...

[–][deleted] 0 points1 point  (10 children)

Hah, that's pretty interesting. I'm thinking of something more 'high quality' though.

I want to be able to describe a meadow or character, and get as accurate of a result as possible and in any art style I want. If I want my picture to look like a Van Gogh, I just type it. Or even the possibility of mixing art styles together to create a more unique result. The possibilities really are endless, if it's put together correctly.

[–]gwern 0 points1 point  (4 children)

As I always say, my ultimate ambition with the Danbooru corpus is to create a modern StackGAN: a conditional GAN, conditioned on the text tags from Danbooru describing each image. So then you simply type in the tags which describe the image and it generates dozens of images satisfying the description, and they can be randomized to appear in different styles, or you can control the style by changing tags like the artist tag to imitate a specific artist's style etc.

I originally thought this was necessary simply to make the GAN work at all on complex realistic images, but StyleGAN is so good that I no longer think the tags are necessary for learning to generate anime images (although it should still make training much faster). Now it's more about control: it's not easy to generate a specific image in StyleGAN rather than random samples. You can see the trouble people have been having with the reverse encoder for StyleGAN.
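
A minimal sketch of what 'conditioned on the text tags' could look like mechanically, i.e. a generic conditional generator rather than StackGAN or StyleGAN specifically; the layer sizes, tag vocabulary, and embedding scheme are all made up for illustration:

import torch
import torch.nn as nn

class TagConditionedGenerator(nn.Module):
    # Concatenates the usual random latent vector with an embedding of the tag set,
    # so the same z produces different images under different tag descriptions.
    def __init__(self, latent_dim=128, num_tags=1000, tag_dim=64, out_dim=64 * 64 * 3):
        super().__init__()
        self.tag_embed = nn.EmbeddingBag(num_tags, tag_dim)  # averages the embeddings of the given tag ids
        self.net = nn.Sequential(nn.Linear(latent_dim + tag_dim, 512), nn.ReLU(),
                                 nn.Linear(512, out_dim), nn.Tanh())

    def forward(self, z, tag_ids):
        cond = self.tag_embed(tag_ids)                # one vector summarizing the tag set
        return self.net(torch.cat([z, cond], dim=1))  # generate an image matching z and the tags

In a conditional GAN the discriminator would see the same tag embedding, so it can penalize images that don't match their description.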

[–][deleted] 0 points1 point  (1 child)

Is this regarding Nvidia's new open-source machine learning tech? If so, that sounds pretty cool!

I just hope a "Text-to-Drawing" program is where it ends up in a few years or so. The level of freedom this would provide people like myself, who can barely even draw a stick figure and don't have the free time to spend learning years of material, is nothing short of astounding. Creative work is just a start; rapid prototyping is another field that would benefit immensely from such a program.

I guess we'll see.

[–]gwern 0 points1 point  (0 children)

Is this regarding Nvidia's new open-source machine learning tech? If so, that sounds pretty cool!

Yes, that's what I've been using ever since they released the source code back on the 4th or so. (Progress has been fast.)

[–]Ending_Credits 0 points1 point  (1 child)

I'd quite like to try few-shot generation based on samples with StyleGAN. Essentially you condition the generator (and discriminator) on an input set of images that it should try and match the distribution of.

I tried it with DCGAN here https://github.com/EndingCredits/Set-CGAN

[–]gwern 0 points1 point  (0 children)

I thought FIGR was interesting. A meta-trained StyleGAN would probably be even better at being finetuned to single characters. Wish I could make an attempt at programming one. I know now that face StyleGAN can get decent results down to n=500, but could a meta-trained StyleGAN drop that down to 50 or 5?

[–]gwern 0 points1 point  (4 children)

You might like Nvidia's new "GauGAN" better than StyleGAN. It doesn't do style transfer but of course you can just apply another style transfer NN to it.

[–][deleted] 0 points1 point  (3 children)

Incredible! Are there any plans to make it usable for the general public? It's not 'exactly' what I had in mind, but it's a great leap in that sort of direction.

Thanks for bringing this to my attention.

[–]gwern 0 points1 point  (2 children)

I'm not sure. The interface they demo looks very usable for regular programmers but they may not release that GUI (they didn't release what they used in their StyleGAN video, for example). There's a placeholder repo for GauGAN but not yet any actual source code. It could be like pix2pix, just another waypoint to an eventual commercial product.

[–]mhdempsey 0 points1 point  (0 children)

I think www.runwayapp.ai is what you're looking for :)

[–]apaidXian 0 points1 point  (0 children)

I think you should make a public Slack channel so people can discuss code and debug.

[–]joshoctober16 0 points1 point  (0 children)

God, I wish I could somehow make my own 'it's not real' generator (tornado, monster, landscape, for example). Is there an easy way to make a simple version like this?

[–]ExpertSuccotash3 0 points1 point  (0 children)

I have some questions about StyleGAN:

  1. If I just use the pretrained model to generate images, are they unique every time, or, since the model is pretrained, could someone somewhere in the world possibly get the same image?

  2. If I use the pretrained model, can I continue training it with my own dataset?

  3. When creating your own datasets, e.g. turning them into tfrecords, do I need a folder for every resolution (e.g. 4x4, 8x8, up to 1024x1024)? I run "python dataset_tool.py create_from_images datasets/custom-dataset ~/custom-images" and seem to get errors. I'm not sure how the folder structure is supposed to be set up when running this. Can't give the exact error as I'm not near my DL computer.

TIA

ES3