
[–]gwern[S] 2 points (0 children)

Latest batch of StyleGAN face samples. I thought I'd get in on the 'X Does Not Exist' snowclone. :)

[–][deleted] 1 point (0 children)

Hey, this is pretty neat.

[–][deleted]  (1 child)

[deleted]

    [–]gwern[S] 1 point (0 children)

    I have no excuse. I increased psi (the truncation parameter) for the 20–40k range, but only to 0.7.
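
    (For anyone unfamiliar: psi is the StyleGAN truncation-trick parameter, which pulls each sampled latent toward the average face, trading diversity for fidelity. A minimal numpy sketch of the idea, with made-up shapes and names:)

    ```python
    import numpy as np

    # Truncation trick: interpolate each sampled latent w toward the mean w.
    # psi = 0 always yields the average face; psi = 1 leaves samples untruncated.
    def truncate(w, w_avg, psi=0.7):
        return w_avg + psi * (w - w_avg)

    # Hypothetical batch of 512-dim latents and their running average.
    w = np.random.randn(8, 512)
    w_avg = np.zeros(512)
    w_trunc = truncate(w, w_avg, psi=0.7)
    ```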

    [–]thatweeblife 1 point (0 children)

    this amuses me lol

    [–]paintstransfer 1 point (11 children)

    So... how many GPUs are you using?

    [–]gwern[S] 1 point (10 children)

    2x 1080 Ti.

    The more relevant parameter is total training time, though: this was approximately 11 days on the 2x 1080 Ti, i.e. 22 GPU-days, or π GPU-weeks. (The source code came out on the 4th and I immediately began running it, but I spent the other days training variant models from the general face model: Holo, Asuka, FFHQ, FFHQ+faces, and whole-Danbooru2018 images.)

    Note that if you start from a pretrained model, you need much less time. For example, the character-specific-face models like the Holo or Asuka-only models can be trained in just a few hours if you start from my face model.
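
    (Mechanically, with the official TF StyleGAN repo, transfer learning just means unpickling an existing snapshot and resuming training from it instead of from random init; a minimal sketch of the loading step, modeled on the repo's pretrained_example.py, with a hypothetical filename:)

    ```python
    import pickle
    import dnnlib.tflib as tflib  # ships with the official StyleGAN repo

    tflib.init_tf()
    # Hypothetical snapshot name; each pickle holds the (G, D, Gs) triple.
    with open('network-snapshot-anime-faces.pkl', 'rb') as f:
        G, D, Gs = pickle.load(f)
    # Pointing the training loop at this snapshot, rather than starting from
    # scratch, is what turns weeks of training into hours.
    ```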

    [–]paintstransfer 1 point (1 child)

    Wow, I haven't tried training a StyleGAN yet because the official documentation says it needs many 16GB GPUs... It seems that 2x 1080 Ti is enough? That's exciting!

    [–]gwern[S] 1 point (0 children)

    You only need 8 V100s if you want to train a 1024px StyleGAN for a week to full paper-level results. 512px is much faster: you get great results within the first few days (it's really amazing how much time my face StyleGAN spends just gradually refining the shoulders and things like ties), and transfer learning accelerates it even further.

    [–]FairFolk 1 point (7 children)

    How many training images would you expect to need for a person-specific model (1–3 people), based on the model proposed in the paper?

    [–]gwern[S] 1 point (6 children)

    Do you mean training from scratch or finetuning either my anime face model or Nvidia's FFHQ face model?

    Training from scratch is unclear, but Nvidia's FFHQ uses 70k images, and with ProGAN I found that 5k led to overfitting/memorization, so that is inadequate.

    For finetuning my anime face model, 500 works, but 5000 is much better. You can use aggressive data augmentation, of course. Roadrunner01 on Twitter tried with 50 but the results were pretty meh: very recognizable, but also very crummy in terms of artifacting and distortions.
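
    (By "aggressive data augmentation" I mean cheap label-preserving transforms such as mirroring, small rotations, and color jitter; a rough Pillow sketch, with hypothetical parameters and filenames:)

    ```python
    import random
    from PIL import Image, ImageEnhance, ImageOps

    def augment(img):
        # Horizontal flips are essentially free for faces.
        if random.random() < 0.5:
            img = ImageOps.mirror(img)
        # Small random rotation.
        img = img.rotate(random.uniform(-5, 5), resample=Image.BILINEAR)
        # Mild brightness/color jitter.
        img = ImageEnhance.Brightness(img).enhance(random.uniform(0.9, 1.1))
        img = ImageEnhance.Color(img).enhance(random.uniform(0.9, 1.1))
        return img

    # Emit several augmented copies per source image to stretch a tiny dataset.
    src = Image.open('face.png').convert('RGB')
    for i in range(8):
        augment(src).save(f'face_aug_{i}.png')
    ```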

    [–]FairFolk 1 point (5 children)

    I meant finetuning the FFHQ one, thanks.

    [–]gwern[S] 1 point (4 children)

    I haven't done any finetuning with FFHQ on real people, but from experimenting with mergers of a 512px rescaling of FFHQ and my anime faces, the overall behavior of FFHQ + StyleGAN seems similar to the anime faces, so I would guess that the orders of magnitude are very similar: 50 images per person finetuned will give you recognizable but crummy results, 500 will be OK, and 5000+ will be fantastic. You could experiment with celebrities or something to see how quality scales with n. (Does that CelebA/B include metadata indicating which celebrity is in each photo? That'd be a starting point. Otherwise, you could do something like, I dunno, scrape Google Images for Taylor Swift photos & run a face-cropping script.)
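
    (For the face-cropping script, OpenCV's stock Haar cascade would be the quick-and-dirty choice; a rough sketch, with hypothetical paths:)

    ```python
    import cv2

    # OpenCV ships a stock frontal-face Haar cascade.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

    img = cv2.imread('scraped/taylor_swift_001.jpg')  # hypothetical scraped photo
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                     minSize=(128, 128))

    for i, (x, y, w, h) in enumerate(faces):
        # StyleGAN wants square, fixed-size inputs, e.g. 512x512.
        crop = cv2.resize(img[y:y + h, x:x + w], (512, 512),
                          interpolation=cv2.INTER_AREA)
        cv2.imwrite(f'cropped/face_{i}.png', crop)
    ```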

    [–]FairFolk 1 point (3 children)

    The plan was to use images of some well-known politicians. Do you know how well it works to use multiple people for finetuning (maybe 2–5 different ones)?
    Also, do you just train the existing model with the new images, or do you add the new images to the previous dataset and continue training on everything? (I'd guess it's the first.)

    On an unrelated note, do you have any example images of your FFHQ/anime merger? It sounds like it would be somewhere between ridiculous and horrifying.

    [–]gwern[S] 1 point (2 children)

    I don't see any inherent reason why, e.g., 3 people with 5k images each wouldn't work about as well as 1 person with just 5k images. After all, the FFHQ StyleGAN is clearly capable of handling 70k different people!

    Also, do you just train the existing model with the new images, or do you add the images to the previous dataset and continue training with everything? (I'd guess it's the first.)

    The former. (In fact, in the case of my Asuka/Holo finetuning, they're already in the combined dataset, so it's essentially equivalent to 'deleting everything which is not Asuka [or Holo]' to concentrate the model on that one character.)
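
    (Concretely, that "deleting" is just filtering the combined face corpus down to the one character before rebuilding the training set; a toy sketch, assuming hypothetical per-image JSON tag sidecars derived from the Danbooru metadata:)

    ```python
    import json
    import shutil
    from pathlib import Path

    SRC = Path('faces/all')         # hypothetical combined face dataset
    DST = Path('faces/asuka_only')  # hypothetical filtered output
    DST.mkdir(parents=True, exist_ok=True)

    # Keep only crops whose source image carries the character's tag.
    for meta in SRC.glob('*.json'):
        tags = json.loads(meta.read_text()).get('tags', [])
        if 'souryuu_asuka_langley' in tags:
            shutil.copy(meta.with_suffix('.png'), DST)
    ```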

    It sounds like it would be somewhere between ridiculous and horrifying.

    Sure is: https://twitter.com/gwern/status/1096483918960881665

    [–]FairFolk 1 point (1 child)

    Sorry to bother you for every little thing, but while we're at it:
    What did you use to crop and align the faces?

    [–]gwern[S] 1 point (0 children)

    Nagadomi's anime-face cropper. No alignment was done at all.
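
    (For reference, Nagadomi's cropper is an OpenCV cascade, lbpcascade_animeface; typical usage looks roughly like this, with hypothetical paths:)

    ```python
    import cv2

    # https://github.com/nagadomi/lbpcascade_animeface
    cascade = cv2.CascadeClassifier('lbpcascade_animeface.xml')

    img = cv2.imread('danbooru/example.jpg')  # hypothetical input
    gray = cv2.equalizeHist(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                     minSize=(64, 64))

    for i, (x, y, w, h) in enumerate(faces):
        crop = cv2.resize(img[y:y + h, x:x + w], (512, 512),
                          interpolation=cv2.INTER_AREA)
        cv2.imwrite(f'faces/face_{i}.png', crop)  # crop only; no alignment
    ```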