all 34 comments

[–]ituki[S] 14 points15 points  (25 children)

Technical report is available here http://make.girls.moe/technical_report.pdf

[–]gwern 14 points15 points  (15 children)

I'm surprised how well this works: all it takes is a conditional GAN? But the faces are so good and they don't have the usual 'watercolor' or blurring effect I see in all the earlier anime face GANs! (And not even using good labels, since Illustration2Vec is a bad classifier.) What makes this work so well? Nothing stands out as looking like a key ingredient (no complicated architectures like sketching out an image with an RNN and filling in regions), so have you done any ablation runs to figure out why the faces are so clean and crisp? Is it the consistent dataset, the labels, using the SRResNet residual layers rather than the usual upscaling convolutions, the DRAGAN loss, or what?
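
(For context on the DRAGAN loss mentioned here: it is a gradient penalty applied in a neighborhood of the real data, rather than along real-fake interpolation lines as in WGAN-GP. A minimal PyTorch sketch, assuming a discriminator D that maps an image batch to scalars; the constants follow the common reference implementation, not necessarily this project's exact settings:)

    import torch

    def dragan_penalty(D, real, lambda_=10.0, c=0.5):
        # Perturb each real sample within a local region whose scale is
        # set by the batch standard deviation (Kodali et al. 2017).
        noise = c * real.std() * torch.rand_like(real)
        alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        x_hat = (real + alpha * noise).detach().requires_grad_(True)
        grads = torch.autograd.grad(D(x_hat).sum(), x_hat,
                                    create_graph=True)[0]
        grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
        # Penalize discriminator gradient norms that deviate from 1.
        return lambda_ * ((grad_norm - 1) ** 2).mean()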

[–]MolochHASME 4 points5 points  (4 children)

It sounds like the other anime GANs had inconsistent quality in their datasets.

[–]gwern 6 points7 points  (3 children)

Well, it could be. I'm not convinced it should make that much of a difference. The faces in CelebA differ tremendously in hair color, skin color, lighting, glasses, jewelry, etc., but that doesn't stop people from generating highly realistic samples. I also don't see why it would make the watercolor problem go away. When I was trying anime faces with WGAN, I prepared a face-only dataset much like OP did, with girls sourced from Danbooru, and did some cleanup by deleting the faces rated least face-like by the discriminator, but it didn't get me any noticeable quality improvement. I suspect it is probably the labels and SRResNet (conditional GANs usually seem to improve a great deal over unsupervised ones, and residuals give better training & higher model capacity), but I can only speculate; this project would be a lot more informative if they could figure out exactly where the quality improvements are coming from.
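
(To make the 'conditional' part concrete: the attribute labels are fed to the generator alongside the noise, so it learns G(z, y) rather than G(z). A minimal sketch of conditioning by concatenation in PyTorch; the tag count and layer sizes are illustrative, not the project's actual configuration:)

    import torch
    import torch.nn as nn

    n_tags = 32                                    # illustrative tag count
    z = torch.randn(16, 128)                       # noise vectors
    y = torch.randint(0, 2, (16, n_tags)).float()  # e.g. Illustration2Vec tags
    # Conditioning by concatenation: the label vector simply becomes
    # part of the generator's input.
    g_in = torch.cat([z, y], dim=1)                # shape [16, 128 + n_tags]
    fc = nn.Linear(128 + n_tags, 64 * 16 * 16)     # project to a feature map
    h = fc(g_in).view(16, 64, 16, 16)              # feeds the upsampling blocks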

[–][deleted]  (2 children)

[deleted]

[–]gwern 10 points11 points  (1 child)

I wouldn't say 'too good to be true'. I mean, it's right there, in your browser. They went to considerable effort to put it there. It literally could not be easier for you to generate samples to see for yourself how representative the paper samples are. The GAN really is producing those nice thumbnails. (As for memorizing the dataset & overfitting - we wish GANs could do that! I've never seen an image GAN do that. If only I had a nickel for every time I saw a GAN paper worry about memorizing datapoints...)

[–][deleted] 1 point2 points  (0 children)

What really surprised me is that the eyes are colored the same (shapes can be a little bit wonky) and hair color stays consistent despite the hair region being highly irregular. Surely being a conditional GAN helps?

[–]NotAlphaGo 0 points1 point  (6 children)

Residual layers + pixel shuffle possibly. Original DCGAN uses transposed convolution.

[–]gwern 1 point2 points  (5 children)

Meh. The papers looking at pixel shuffle show modest improvements compared to a standard transposed convolution, and IIRC I did in fact try pixel shuffle with only a small gain.

[–][deleted] 1 point2 points  (2 children)

I'm not well versed in GAN terminology, so pardon me if this is a dumb question. When people say pixel shuffle, do they mean distributing the depth dimension to height and width, like transposing and reshaping [N, C*k*k, H, W] to [N, C, H*k, W*k]?

[–][deleted] 2 points3 points  (1 child)

Yes, that's correct. Here's the paper if you want to read more: https://arxiv.org/pdf/1609.05158.pdf
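
(That reshape written out as a minimal PyTorch sketch; the assertion checks it against the built-in pixel_shuffle:)

    import torch

    def pixel_shuffle(x, k):
        # Rearrange [N, C*k*k, H, W] -> [N, C, H*k, W*k] by moving channel
        # blocks into the spatial dimensions: the sub-pixel convolution
        # layer of Shi et al. 2016 (arXiv:1609.05158).
        n, ckk, h, w = x.shape
        c = ckk // (k * k)
        x = x.reshape(n, c, k, k, h, w)
        x = x.permute(0, 1, 4, 2, 5, 3)  # -> [N, C, H, k, W, k]
        return x.reshape(n, c, h * k, w * k)

    x = torch.randn(1, 3 * 2 * 2, 4, 4)
    assert torch.equal(pixel_shuffle(x, 2),
                       torch.nn.functional.pixel_shuffle(x, 2))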

[–][deleted] 0 points1 point  (0 children)

Thank you!

[–]NotAlphaGo 0 points1 point  (1 child)

Well, OK, fair point. But what about residual layers? Not sure that would get you anything in a DCGAN setting going from z to G(z).

[–]gwern 0 points1 point  (0 children)

They are better, more easily trained models; both of those could help - fixing artifacts in just a few layers, and extracting better gradients from the limited supervision. I'm sure they help at least a little.
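
(For reference, the kind of block under discussion: a minimal SRResNet-style residual block (Ledig et al. 2016) in PyTorch, with illustrative sizes:)

    import torch.nn as nn

    class ResBlock(nn.Module):
        # conv-BN-PReLU-conv-BN with an identity skip, as in SRResNet.
        def __init__(self, ch=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1),
                nn.BatchNorm2d(ch),
                nn.PReLU(),
                nn.Conv2d(ch, ch, 3, padding=1),
                nn.BatchNorm2d(ch),
            )

        def forward(self, x):
            # The identity path keeps gradients flowing through a deep
            # generator: the "better gradients" point above.
            return x + self.body(x)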

[–]mimighost 0 points1 point  (0 children)

Their data is pretty good.

[–]a_marklar 2 points3 points  (1 child)

Nice project. Just FYI, the first sentence in the technical report introduction looks incomplete.

[–]ituki[S] 0 points1 point  (0 children)

Thanks for pointing that out; we have fixed the typo.

[–]madebyollin 2 points3 points  (0 children)

Really nice! Is there a way to record noise vectors (e.g. copy/paste a bit of text)? Since the noise vector impacts the sample quality, it would be nice to be able to share good noise vectors.

[–]marcjschmidt 6 points7 points  (0 children)

FYI: 13 MB PDF, downloading at 24.0 KB/s.

[–][deleted] 0 points1 point  (1 child)

I loathe reading papers, but yours is a very good read. I really enjoyed it, and it's also more informative than most.

Can you tell me the size of the model that gets loaded? I have a shitty internet connection.

Thank you

[–]madebyollin 3 points4 points  (0 children)

The model on the site is ~18 MB.

[–][deleted]  (2 children)

[deleted]

[–]ryches 0 points1 point  (1 child)

I am getting that as well

[–]madebyollin 1 point2 points  (0 children)

Using Chrome should fix it, I think?

[–]zergling103 2 points3 points  (2 children)

These results look way too good. What technique/architecture are you using?

[–]mustafaihssan 6 points7 points  (0 children)

It's in the technical report linked in the top comment.

[–]L43 0 points1 point  (0 children)

Yeah, that was my first reaction. Pretty incredible if true!

[–]guillefix3 2 points3 points  (0 children)

GANime girls? Really cool, though.

[–][deleted] 2 points3 points  (2 children)

Incredibly impressive. Are you planning to make the Chainer training code available too?

[–]ituki[S] 3 points4 points  (1 child)

It will take some time to clean up the training scripts. All of us are busy with the website right now because of the tremendous number of visits...

[–]BoldFontOfYouth 0 points1 point  (0 children)

Is it possible to distribute an offline version as a torrent? Just localhost the files or something like that?

[–]muvb 0 points1 point  (0 children)

Too bad it only generates half a face:

http://i.cubeupload.com/LVdJX7.png