
[–]gwern[S] 50 points51 points  (11 children)

Since both SSC and /r/slatestarcodex are all about all things generative of late, this seems like an appropriate submission. It's also gone viral in China: since I made this website yesterday afternoon, there have been 430,932 unique visitors.

[–][deleted] 25 points26 points  (6 children)

What are the other degenerative posts?

Sorry, bad joke. But seriously, posts please?

[–]PuzzledOpinion 2 points3 points  (1 child)

I don't understand what you mean by "degenerative". Could you explain?

[–]JustLookingToHelp 180 LSAT but not accomplishing much yet 9 points10 points  (0 children)

Because "degenerate weebs."

[–]erwgv3g34 1 point2 points  (3 children)

It's also gone viral in China: since I made this website yesterday afternoon, there's been 430,932 unique visitors.

Are you going to monetize it?

[–]gwern[S] 11 points12 points  (2 children)

Nah, not my style. It's only going to cost like $80 total anyway (most of that being the domain name). I'd rather enjoy the lulz and cross-traffic to gwern.net than discourage visitors.

[–]aliens_ate_my_cat 0 points1 point  (1 child)

Didn't training the model cost something? Not that I'm encouraging you to become some shady waifu dealer, but I'm curious how long it took, on what hardware, etc.

[–]gwern[S] 4 points5 points  (0 children)

I haven't estimated it myself, but apparently running 2x1080ti for a week and a half is like $20 in electricity. Complicating things further, I use electrical heating and it's been quite cold, so arguably the true marginal cost of the training has been much less, since the waste heat from the GPUs simply substitutes for the electrical heating during the day.
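As a rough sanity check of that figure (assuming ~250 W per 1080 Ti under load and ~$0.13/kWh, both assumptions rather than measured numbers):

```latex
2 \times 250\,\mathrm{W} \times 10.5\,\mathrm{days} \times 24\,\mathrm{h/day} = 126\,\mathrm{kWh},
\qquad 126\,\mathrm{kWh} \times \$0.13/\mathrm{kWh} \approx \$16
```

which is in the right neighborhood of the quoted $20.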

Regardless, I'd say the cost of the training is definitely principally in terms of my time & effort & opportunity cost of training other things, and the direct material-financial costs are trivial in comparison.

Worth it, of course! 720,000 unique visitors and counting, and many hilarious reactions on Twitter & Tumblr & 4chan and elsewhere; see https://gwern.net/TWDNE

[–]duskulldoll pneumatically deficient 44 points45 points  (0 children)

for those who haven't seen: the gif

2020: psychedelic meta-waifus colonize the entirety of human memespace, infest your frontal lobe, puppeteer your palsied fingers. inhuman moe will-to-power sucks civilization dry like a kid with an orange slice, spits out pips, carves Holo's face on the moon.

2021: small groups of survivors poke heads out of wheatfield bunkers. the amish inherit the earth.

[–]BaronVSS 33 points34 points  (5 children)

[–]NacatlGoneWild NMDA receptor 9 points10 points  (2 children)

The curse of the sixth layer.

[–]roystgnr 8 points9 points  (1 child)

I Googled this to try to learn exactly what apparently-now-colloquially-famous problems deep neural networks run into when they have six (or more?) layers. I did not find the arXiv paper I had been hoping to. I did find a lot of "This anime is a full on display of body horror" links alongside that wiki page, though!

[–]gwern[S] 0 points1 point  (0 children)

I believe it's a Made in Abyss reference to what happens when one tries to return from as deep as the 6th layer of said abyss - 'loss of one's humanity or life' where loss=melted-flesh-blob.

[–]Gen_McMuster Instructions unclear, patient on fire 4 points5 points  (0 children)

Jesus Christ, how horrifying

[–]remdos 29 points30 points  (4 children)

What are the weird white spots on some of the waifus?

[–]gwern[S] 72 points73 points  (0 children)

Vanilla ice cream.

[–]HalloweenSnarry 7 points8 points  (1 child)

You know exactly what they are.

(Or maybe not, I assume Gwern's been working purely with the SFW bits of Danbo for training)

[–]ColonCaretCapitalP I defect in prisoner's dilemmas. 1 point2 points  (0 children)

Maybe it's supposed to simulate lighting. Someone's cheek should be a little lighter in the sun, but a tiny white spot isn't a realistic way of drawing that in.

[–]nerfviking 19 points20 points  (7 children)

Wow, this is leaps and bounds more detailed than a similar one I saw a couple of months ago. Neural network waifu generation technology is really taking off.

[–]Klokinator 22 points23 points  (6 children)

Remember: Some of the greatest inventions in history were adopted and popularized for the sake of porn. These are the first step toward ridiculously realistic simulated waifu VR sex.

[–]nerfviking 33 points34 points  (5 children)

Honestly, as someone who occasionally dabbles in game development, the idea of being able to generate character portraits without having to have any art skill is really appealing to me.

[–]Klokinator 12 points13 points  (0 children)

Agreed. For me, I would like to tweak and play with models, then hand them off to my artist. Having an early mockup for him to work with would speed up artwork tremendously. If I could mock up a decent character design (human, alien, or otherwise) in less than an hour, that would highly appeal to me.

[–]FCfromSSC 3 points4 points  (3 children)

Speaking as an artist in the games industry who's spent about ten years honing his craft, it's distantly terrifying.

[–]HalloweenSnarry 1 point2 points  (2 children)

Mind if I ask what you've worked on?

[–]FCfromSSC 0 points1 point  (1 child)

Sorry, anonymity is pretty important. Stuff I say here could get me fired.

[–]HalloweenSnarry 0 points1 point  (0 children)

DMs?

[–]enimodas 21 points22 points  (0 children)

[–]thebastardbrasta Fiscally liberal, socially conservative 15 points16 points  (6 children)

When will the bottom fall out of the anime market? I've seen NN waifus, tweening, and backgrounds. How long will it be until I can show an NN season 2 of KonoSuba and have it make season 3 for me?

[–]gwern[S] 19 points20 points  (2 children)

/r/MemeEconomy says short all current anime memes, hold cash, and wait to go long all new AI-memes!*

* NOTE: This is not financial advice, past memeturns do not predict future karmaence, please consult a licensed marketing guru for all your memenial needs.

[–]thebastardbrasta Fiscally liberal, socially conservative 5 points6 points  (1 child)

Isn't MemeEconomy almost purely sell-side? I have zero faith in advice from people with so little skin in the game.

[–]gwern[S] 15 points16 points  (0 children)

Au contraire, mon frère: MemeEconomy being entirely a collective meme, they have the most skin in the game of MemeEconomy it is possible to have.

[–]you-get-an-upvote Certified P Zombie 5 points6 points  (2 children)

I can't tell if this is joking or not, but handling the kind of long-term dependencies involved in writing a coherent season-long plot (e.g. what happens in episode 1 affecting episode 2) would be ground-breaking and is very unlikely to happen in the next year or two. And this ignores the challenge of converting the script to an animated series.

I'd say it's unlikely (say ~10%?) that an ML-generated manga will sell over a million copies in the next 5 years -- let alone an appealing anime.

A human-guided manga seems more feasible, and one thing I'm really interested in seeing more of is ML models that can be corrected or prompted by humans -- e.g. a human writes the manga script and rough sketches of the frames and the AI tries to finish it. The human iterates on the frames that look bad, etc.

[–]thebastardbrasta Fiscally liberal, socially conservative 3 points4 points  (1 child)

Was the idea actually so sensible that you didn't consider it a joke? It seemed ludicrous enough to me.

[–]you-get-an-upvote Certified P Zombie 3 points4 points  (0 children)

I had/have no idea how familiar you are with machine learning. I think it's possible that someone who has only seen the publicized results of ML could think it is sensible.

¯\_(ツ)_/¯

[–]Edmund-Nelson Filthy Anime Memester 13 points14 points  (3 children)

I saw an ahegao face; was there hentai in the training set or something?

[–]gwern[S] 22 points23 points  (1 child)

Yes; hentai makes up ~9% of the original images the faces were cropped from. I've deleted the most obviously pornographic faces I've found, but in a dataset of 220k faces...

(Note for other readers: 'ahegao' != 'ahoge'. Ahoges, incidentally, won't show up because the face-cropping script would typically crop the top of the head away.)

[–]Edmund-Nelson Filthy Anime Memester 4 points5 points  (0 children)

I guess since ahegao has very distinct color patterns (pink where there is normally skin tone, and white where there normally isn't), it's likely that the NN would pick it up, especially since the colors correlate strongly.

[–]untrustable2 10 points11 points  (10 children)

Can this easily be adapted to make multiple images of the same 'character'? Would be revolutionary to be able to draw/animate a character in a pose/emotion in a few simple commands.

[–]gwern[S] 15 points16 points  (9 children)

You can easily specialize the face model to a single character such as Asuka/Holo: https://twitter.com/gwern/status/1094626477633617922 You just swap out the dataset for an all-$CHARACTER dataset and keep training. Then you can generate random samples and random interpolations of just that character. Works beautifully if you have n>5000 of that character, but can still work reasonably well down into the 100s if you're careful to not overtrain.
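In outline, that recipe is just "load the pretrained checkpoint, point the data loader at the character subset, and keep training". A minimal sketch in generic PyTorch (the actual codebase is TensorFlow; the checkpoint path, data folder, and tiny stand-in G/D here are all placeholders, not the real StyleGAN):

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Stand-ins for the real generator/discriminator; in practice you would
# load the pretrained anime-face StyleGAN checkpoint here instead.
G = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Tanh())
D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
ckpt = torch.load("anime_faces_pretrained.pt")  # hypothetical checkpoint
G.load_state_dict(ckpt["G"]); D.load_state_dict(ckpt["D"])

# The only real change for transfer learning: swap in an all-$CHARACTER folder.
data = datasets.ImageFolder(
    "data/holo_only",  # hypothetical path to the character subset
    transform=transforms.Compose([transforms.Resize(64),
                                  transforms.CenterCrop(64),
                                  transforms.ToTensor()]))
loader = DataLoader(data, batch_size=32, shuffle=True)
opt_g = optim.Adam(G.parameters(), lr=1e-4)
opt_d = optim.Adam(D.parameters(), lr=1e-4)

for real, _ in loader:
    z = torch.randn(real.size(0), 512)
    fake = G(z).view(-1, 3, 64, 64)
    # Standard hinge-loss GAN steps; stop early on small datasets to avoid overtraining.
    d_loss = (torch.relu(1 - D(real)) + torch.relu(1 + D(fake.detach()))).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    g_loss = -D(G(z).view(-1, 3, 64, 64)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```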

You don't get control by default, since you can't easily turn an arbitrary image back into the original latent vector/random noise you generate with, and you don't know what the latent vector dimensions mean. But people are starting to work on ways around this for StyleGAN: https://www.reddit.com/r/MachineLearning/comments/aq6jxf/p_stylegan_encoder_from_real_images_to_latent/ https://www.reddit.com/r/AnimeResearch/comments/asl4gw/fully_custom_waifu/ So if someone writes the code to hook it all together, it should be possible to do something like 'take this arbitrary picture of Holo, find the latent vector which makes the StyleGAN generate almost exactly that picture, then use the "smile" dimension to modify the image to make it smile or stop smiling'.
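At its simplest, that "find the latent vector" step is just gradient descent on the latent with the generator frozen. A toy sketch of the idea (the generator is a stand-in; real encoders like the one linked above use perceptual/VGG losses rather than raw pixels, and a learned attribute axis rather than a random one):

```python
import torch
from torch import nn

G = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Tanh())  # stand-in generator
G.requires_grad_(False)  # generator stays frozen; only the latent is optimized

target = torch.rand(3 * 64 * 64)          # the arbitrary Holo image, flattened
z = torch.zeros(512, requires_grad=True)  # latent to be recovered
opt = torch.optim.Adam([z], lr=0.05)

for step in range(1000):
    loss = nn.functional.mse_loss(G(z), target)  # match the generated image to the target
    opt.zero_grad(); loss.backward(); opt.step()

# Then nudge z along a known attribute direction, e.g. a "smile" axis found
# by comparing latents of smiling vs. non-smiling samples (placeholder here):
smile_direction = torch.randn(512)
edited = G(z + 2.0 * smile_direction)
```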

[–]sonyaellenmann 6 points7 points  (1 child)

but gwern, who is best girl

[–]gwern[S] 23 points24 points  (0 children)

GWERNAN: "Best Girl is StyleGAN-chan! To crush the artists, to see them fall at your feet—to take their sketches and faces, and hear the lamentation of their fans & the normies. That is Best Girl."

[–]CHRISKOSS 2 points3 points  (2 children)

Seems like we are only 5ish years away from software that can turn a text script into video anime.

[–]cosmicrush 6 points7 points  (0 children)

Then we can make a Dungeons & Dragons-type game with this where everyone creates their own stories.

[–]ff29180d Ironic. He could save others from tribalism, but not himself. 1 point2 points  (0 children)

or into a live-action movie.

AutomateHollywood

[–]alexmlamb 0 points1 point  (3 children)

What if you just do a conditional GAN? (Which kind of assumes that you know the character identity for the images in your training set; I don't know if you have that.)

But if you just know which images are Asuka in the training set, you could make it conditional with the labels "asuka" and "not asuka".

[–]gwern[S] 2 points3 points  (2 children)

I do have something like that. For example, I used the character tags to take the top 1000 character tags by frequency, grabbed their file IDs, and dumped the cropped-faces into 1000 corresponding directories. Not perfect since a lot of extraneous faces would get dumped in thanks to multiple-character images, but overall reasonably accurate, and of course I could make it much more accurate at the cost of size by including a filter on 1boy/1girl tags, which will ensure almost 100% accuracy. (I was experimenting with one of the BigGAN implementations to see if class-conditional helped. It didn't, at least as far as I got in training.)
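For concreteness, that bucketing can be done with a short script over the Danbooru metadata, folding in the 1girl/rating filter mentioned above as optional; the JSON field names and file layout below are assumptions about the metadata format, not guaranteed:

```python
import json, shutil, collections, pathlib

META = pathlib.Path("danbooru2018/metadata.jsonl")  # hypothetical paths
CROPS = pathlib.Path("faces")                       # cropped faces named <post_id>.jpg
OUT = pathlib.Path("faces-by-character")

# Pass 1: count character tags (category "4" in Danbooru's tagging scheme)
# on safe-rated, single-girl images only, to cut down on wrong-face noise.
counts, char_of = collections.Counter(), {}
for line in META.open():
    post = json.loads(line)
    names = {t["name"] for t in post["tags"]}
    chars = [t["name"] for t in post["tags"] if t["category"] == "4"]
    if post.get("rating") == "s" and "1girl" in names and len(chars) == 1:
        counts[chars[0]] += 1
        char_of[post["id"]] = chars[0]

top1000 = {c for c, _ in counts.most_common(1000)}

# Pass 2: copy each cropped face into its character's directory.
for post_id, char in char_of.items():
    src = CROPS / f"{post_id}.jpg"
    if char in top1000 and src.exists():
        dest = OUT / char
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, dest / src.name)
```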

The problem, of course, is that it's unclear how to use it in StyleGAN. The codebase claims to support labels/categories but doesn't use them anywhere according to the README, the paper doesn't include any class-conditional training, all of their examples like LSUN-cats are either limited to a single category or just unconditional originally like FFHQ, and I don't understand how to use the dataset_tool.py add_label function to turn my 1k directories+images into a working set of .tfrecords.

[–]alexmlamb 0 points1 point  (1 child)

Well, in terms of modeling, I think what you should do is compute an embedding of size (batch, n_channels) for each layer using the label, and then combine it into the hidden state using equation 7 in this paper:

https://arxiv.org/pdf/1802.05637.pdf

Adding it in might also work okay, I'm not 100% sure.
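Concretely, that per-layer label embedding is in the spirit of conditional batch normalization as used in class-conditional GAN generators; whether or not this is exactly the paper's equation 7, the usual PyTorch version looks like this:

```python
import torch
from torch import nn

class ConditionalBatchNorm2d(nn.Module):
    """BatchNorm whose scale/shift are predicted from a class label."""
    def __init__(self, n_channels: int, n_classes: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(n_channels, affine=False)
        self.gamma = nn.Embedding(n_classes, n_channels)
        self.beta = nn.Embedding(n_classes, n_channels)
        nn.init.ones_(self.gamma.weight)   # start as an identity transform
        nn.init.zeros_(self.beta.weight)

    def forward(self, h, y):
        g = self.gamma(y)[:, :, None, None]  # (batch, n_channels, 1, 1)
        b = self.beta(y)[:, :, None, None]
        return g * self.bn(h) + b

# Usage: one of these per generator layer, each fed the character label.
cbn = ConditionalBatchNorm2d(64, n_classes=1000)
h = torch.randn(8, 64, 32, 32)            # a hidden feature map
y = torch.randint(0, 1000, (8,))          # character labels
out = cbn(h, y)
```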

Regarding the data/coding questions, they look pretty disgusting given how tensorflow works.

[–]gwern[S] 0 points1 point  (0 children)

Sounds overly complex, IMO. Characters are high-level; they don't much affect details like hair-noise. No reason not to just add a one-hot encoding into the 8x512FC embedding stack, or concatenate it to the embedding input into the StyleGAN proper. This is more or less how I would like to do conditional StyleGAN: encode the text tags (all of them, not just the categorical character tag) into a 256-dim embedding, concatenate with an 8x256FC embedding of the noise, and feed the resulting 512-dim vector into StyleGAN. You get all the semantics of the tags as supervision, control of all traits defined by the tags like character or pose, plus StyleGAN can still learn in an unsupervised way, without inflating the model parameter size.
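A sketch of that concatenation scheme with the sizes above (the tag vocabulary size and the toy mapping network are placeholders, not StyleGAN's real mapping network):

```python
import torch
from torch import nn

N_TAGS = 1000  # assumed tag vocabulary size
tag_encoder = nn.EmbeddingBag(N_TAGS, 256, mode="mean")  # multi-hot tags -> 256-dim

layers = []
for _ in range(8):  # the "8x256FC" noise-embedding stack
    layers += [nn.Linear(256, 256), nn.LeakyReLU(0.2)]
mapping = nn.Sequential(*layers)

def make_latent(tag_ids, offsets):
    z = torch.randn(offsets.size(0), 256)       # per-image noise
    w_noise = mapping(z)                        # unsupervised half
    w_tags = tag_encoder(tag_ids, offsets)      # supervised half (tag semantics)
    return torch.cat([w_tags, w_noise], dim=1)  # 512-dim, fed to StyleGAN proper

# Two images: the first has tags {3, 17}, the second has tag {42}.
latent = make_latent(torch.tensor([3, 17, 42]), torch.tensor([0, 2]))
print(latent.shape)  # torch.Size([2, 512])
```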

Regarding the data/coding questions, they look pretty disgusting given how tensorflow works.

Precisely. ProGAN/StyleGAN are both kinda hideous codebases, as beautiful as their outputs may be. There are lots of things one might like to do with them, but...

[–]_-Thoth-_ 9 points10 points  (1 child)

Your waifu doesn’t even exist and yet she still a shit

[–]gwern[S] 3 points4 points  (0 children)

"You FOOL! This isn't even her final form!"

[–]hh26 7 points8 points  (2 children)

For some reason it kind of upsets me when I see some of the cuter girls there and I realize that there is no actual content that has them as a character. Like, it's wasted potential. Even OC characters that someone draws just for fun at least have a creator who knows they exist and could make up stories about them, and maybe already thought up a back-story for them in their head while drawing them. These girls have nothing. Nobody made them on purpose, nobody put forth time or effort or love into their creation. Nobody has written stories about them, and probably nobody ever will.

I kind of want to start or fund a group that takes these girls and makes stories about them, just so that all of them at least have the potential of someday having someone read about them and love them and they can become an actual waifu in someone's heart.

On the other hand, there are 256^(3n) RGB images with n pixels in conceptspace, a number which includes a ridiculously large number of cute anime girls. All of these girls made by this AI, and all other anime girls made by humans, have been in there all along, just waiting to be discovered/invented. I'm not sure that the AI generating these particular ones instantiates them in some way that makes them different from the ones that have never been drawn or seen and merely exist in conceptspace.

[–]gwern[S] 2 points3 points  (1 child)

I can't help but ask: what's your position on the Repugnant Conclusion or the Logic of the Larder?

[–]hh26 0 points1 point  (0 children)

I generally don't buy it, in that I don't think hypothetical people have rights if they never come into existence, and I don't think our moral utility functions should scale linearly with sheer numbers of people. But I'm not quite sure. I think that a billion very happy people is better than a trillion mildly content people, but that two billion very happy people is better than one billion equally very happy people. Maybe it's best approximated by some sort of concave down function, like Sum(individual utility)/sqrt(n), where n is the total number of people? So that if you want to halve the average person's utility you have to quadruple the total number of people to make it worth it. I'm not quite sure if that's right though.
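Checking that claim against the function as stated (just algebra on the formula above):

```latex
U(n,\bar{u}) = \frac{\sum_i u_i}{\sqrt{n}} = \frac{n\,\bar{u}}{\sqrt{n}} = \bar{u}\sqrt{n},
\qquad
U\!\left(4n,\tfrac{\bar{u}}{2}\right) = \tfrac{\bar{u}}{2}\sqrt{4n} = \bar{u}\sqrt{n} = U(n,\bar{u})
```

So halving the average utility while exactly quadrupling the population leaves total value unchanged, consistent with the intuition above.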

Maybe what you want to maximize is just the average utility, and the number of people is useful only in so far as interacting with and helping other people and knowing that other people exist increases the utility of everyone involved.

I'm definitely of the opinion that people with serious genetic diseases or in extreme poverty should choose not to have children, and that it's wrong to knowingly and deliberately bring people into the world just to suffer, but I'm not sure if that opinion is founded on them being below a certain threshold of happiness/sadness so that their utility is negative, or being below average in utility and just decreasing the average, or being so far below average that the resources they consume replace the opportunity for someone else to be born instead.

I haven't come to a definite conclusion either way, but in general I don't care about the rights of people who don't actually and never will exist in any way, shape, or form except as hypothetical mathematical sequences of numbers that nobody has even explicitly calculated. Once someone has been conceived and drawn, though, they acquire the "rights" of a fictional character, which are slim and mostly informal and emotional rather than actual real utility values, and lose in comparison to any real person's rights. But I'm still willing to suspend enough disbelief to feel sorry for them for being thought of but not written about.

[–]mystikaldanger 15 points16 points  (14 children)

I can't be the only one who finds anime girls really generic and boring-looking. I swear, it's the same girl with swapped out hairstyles and eye colors.

This overwhelming sameness actually gave me an urgent need to look at some real girls' faces.

I guess the above mentioned qualities are a selling point with some people.

[–]gwern[S] 26 points27 points  (0 children)

Personally, I'm astonished by the sheer variety of art styles the model learns, quite aside from the general quality. If you compare it to, say, MakeGirls.Moe's old model or even my ProGAN models, there's really no comparison. Random samples from those GANs truly 'all look the same', but not my StyleGAN samples. I could look through samples with psi=1.2 all day just because they look so different. Even the 'media' seem to differ, I could swear many of these are trying to imitate watercolors, charcoal sketching, and oil painting.

[–]Ghenlezo 21 points22 points  (4 children)

This overwhelming sameness actually gave me an urgent need to look at some real girls' faces.

This should help: https://thispersondoesnotexist.com/

[–]baj2235 Fluorine Uranium Carbon Potassium 7 points8 points  (0 children)

It seems to me that the generated women look much more real than the men, and the young much more real than the old.

Overall, though, I am mildly creeped out that the images seem to have actually crossed the uncanny valley. There are artifacts in all of the images, but at a glance nothing seems untoward. I might be fooled if I didn't know to pay attention.

[–]sonyaellenmann 6 points7 points  (0 children)

how do you do, fellow real humans

[–]satanistgoblin 1 point2 points  (1 child)

They seem to have weird, overly asymmetric teeth. The ones not showing teeth are usually very realistic, though.

[–]roystgnr 5 points6 points  (0 children)

Or top teeth that blend into bottom teeth, in the first picture I pulled up after your comment made me want to pay close attention to the teeth. Everything else about this fake girl's face, from the cheekbones to the traditionally hard stuff like hair and eyes, is just indistinguishable from reality.

Before you primed me on teeth, though, I could already tell that many of the backgrounds looked very off, and this pic turned out to be a most extreme case of that too. The blurry beige on most sides is fine. The quasi-human monstrosity cuddled up next to the passably-human girl on one side, not so fine. I'm guessing there were multi-person pictures in the training data that got cropped to focus on just one face? There must have been enough pictures for the network to learn that sometimes faces are next to mostly-cropped-away other faces, but not enough pictures for the network to learn that the mostly-cropped-away other faces need to also resemble reality rather than resembling the nightmare I now expect to have tonight.

[–]JustAWellwisher 12 points13 points  (2 children)

Yeah. You're describing what some people like to call the "moeblob", which carries a similar negative connotation. e.g. "They just all look like blobs".

This artstyle is popular for many reasons and has a long history, cultural significance and so on, but one of them is because it allows for the easy animation, exaggeration and recognition of facial expressions.

[–]mystikaldanger 4 points5 points  (1 child)

it allows for the easy animation, exaggeration and recognition of facial expressions

Does this account for anime's popularity with people of an autistic persuasion?

[–]erwgv3g34 5 points6 points  (0 children)

This is definitely one of the biggest reasons why I prefer animation over live action. Like, I hear people praising "subtle" and "realistic" acting in live-action and all I can think of is that I have no idea what the characters are supposed to be feeling. Give me a giant sweatdrop and a luminescent blush any day, or at least an actor hamming it up like Raul Julia as Bison in Street Fighter.

[–]7thHanyou 9 points10 points  (3 children)

Most anime styles emphasize features and expressions that I find attractive. I suppose it depends what you think looks good, but I don't get tired of things I find visually pleasing.

[–]Faceh 14 points15 points  (2 children)

Yes, I thought a lot of the appeal of 2D was the 'superstimulus' effect.

Humans naturally find things with big eyes, button noses, large heads, etc. cute and/or feminine. So here's a drawing of a female that maximizes for cuteness/femininity along those dimensions and takes it as far as it can go without wrapping back around into creepy.

See also: porn actresses getting boob, butt, and lip enhancement surgery.

From my view, I get the appeal of the style, but if you actually THINK about the features and how they interact, it really does start to seem like it's the exact same face with only the hairstyle and hair/eye color swapped out.

[–]rolabond 7 points8 points  (1 child)

But they don't have button noses; they don't have noses at all! Tbh, I think you have to grow up with anime to find it attractive; my grandparents and parents dislike how a lot of it looks. They think it looks weird.

[–]ShardPhoenix 2 points3 points  (0 children)

I didn't much like the aesthetic until I started watching anime, but that was in my 20's.

[–]erwgv3g34 3 points4 points  (2 children)

How does copyright work on these? Like, I know that if an animal takes a selfie, that's not copyrightable. But what happens when an algorithm makes a waifu? Do these images belong to Gwern? If I download the code and make my own waifus, can I use them as characters for a light novel or something?

[–]gwern[S] 6 points7 points  (0 children)

What Crypko/MGM argues is that the model trained on the dataset is a transformative use, so the data's copyright does not matter; that the person who trains or makes the model owns the copyright to the model itself; and that model samples generated mechanically at random have no copyright, but anyone who 'creatively' uses the MGM model by tweaking settings etc. has engaged in the de minimis level of input to qualify as a 'creator' and thus owns the copyright to that particular sample. I am a little more agnostic.

But I would say that even in the somewhat unlikely case that the specific images you generated by using my StyleGAN turn out to be copyrighted and I didn't release it under CC-0 like all my stuff, you could always just have an artist redo them for your LN since only that exact image would be copyrighted and not the general appearance/style of it (no trademarks are involved). So there should be no issue with using it as a 'character generator' for inspiration.

[–]doremitard 1 point2 points  (0 children)

I'm not a lawyer, but I think it would be down to a court in whatever jurisdiction to determine whether Gwern had made enough of a creative contribution to each image that he owns the copyright. It seems like this varies between jurisdictions and cases, and may not even be decided in the US.

If you download the code and make your own output and use it in a creative project, and don't tell anyone exactly how you generated it, it seems unlikely that anybody could make a copyright claim against you. It would presumably be very hard to prove exactly which code and training data you used, or even that you had used code rather than commissioned an image from an artist as work-for-hire.

[–]Aegeus 2 points3 points  (1 child)

Also relevant: AI-Powered Image Colorizer

It takes a line drawing and then adds colors to it. Results aren't always perfect, but it has a lot of tools to nudge the AI toward what you want.

[–]gwern[S] 7 points8 points  (0 children)

I believe https://github.com/lllyasviel/style2paints is actually a better colorizer. (Sadly, their web interface is down so you can't test it yourself. GPU servers are expensive. I did use it several times and was impressed.)

[–]gwern[S] 2 points3 points  (2 children)

Just for kicks, I added GPT-2 outputs based on an anime prompt. :)

[–]Plint 1 point2 points  (1 child)

And it figured out light novel naming conventions perfectly. I had to look up the title to make sure it didn't just duplicate it from something. The rest is even nearly believable as poorly machine-translated Japanese!

The main story is the main story of a romantic comedy based on a light novel shoujo manga called "I Can't Believe My Alien Visitor Is My Little Sister. It's also pretty good. However, the story is completely different from the series I was working on. And I'm quite surprised to know that this is what it was based on. And the story is really different from previous anime stories that I wrote. But at the same time, I really like the writing style and the style isn't a bad one either. It's a really funny kind of writing. So, the main focus of it was the real hero, Sakazu, who is a very nice guy who is also one of those famous people, who I was not aware of at first. I think it makes a very good story even if I don't tell it myself, and if I have it, it shows that he has the confidence to go to the future. It's a good thing. But the protagonist's character also isn't that likable. And I thought it was bad for the show and for the author(s. to not have an obvious character and show how...

[–]gwern[S] 0 points1 point  (0 children)

"I Can't Believe My Alien Visitor Is My Little Sister" is copied from the long prompt I gave it, although it does work with it well I admit and I wish I could see what the full model does with a big prompt: https://twitter.com/gwern/status/1098805275862188034

[–]Plint 0 points1 point  (3 children)

How actually original are its outputs generally? I notice that it's clearly spitting out almost exact duplicates of some input images fairly regularly.

[–]gwern[S] 5 points6 points  (2 children)

I can only really speak for the Asuka/Holo faces since I am familiar with them from doing a lot of manual hand-cleaning, but I would say original in general. It certainly does imitate specific faces in the dataset, but that's not necessarily a bad thing (it should be able to generate the original faces!) and it has generalized well beyond them, as the random sampling & smooth interpolations show.

The 'diversity' of the samples is also governed partly by the 'psi' hyperparameter, which does the 'truncation trick' (see the BigGAN & StyleGAN papers for details). At 0, diversity is nil and all faces are a single global average face (a brown-eyed, brown-haired schoolgirl, unsurprisingly); at ±0.5 you have a broad range of faces; and by ±1.2, you'll see tremendous diversity in faces/styles/consistency but also tremendous artifacting & distortion. Where you set your psi heavily influences how 'original' outputs look: at psi=0.5 you're playing it safe, consistent but boring, while at psi=1.2 they are tremendously original but extremely hit-or-miss. For TWDNE, I set psi=0.7, which strikes a good balance between craziness/artifacting and quality/diversity: still quite diverse, but avoiding most of the worst of the artifacting.
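The truncation trick itself is a one-liner: interpolate each mapped latent toward the global average latent before synthesis. A sketch (the tensors here are placeholders for what the trained mapping network would produce):

```python
import torch

def truncate(w: torch.Tensor, w_avg: torch.Tensor, psi: float) -> torch.Tensor:
    """psi=0 collapses to the average face; |psi|>1 extrapolates past the data."""
    return w_avg + psi * (w - w_avg)

w_avg = torch.zeros(512)         # running average of mapped latents (placeholder)
w = torch.randn(512)             # a freshly sampled, mapped latent
safe = truncate(w, w_avg, 0.5)   # consistent but boring
twdne = truncate(w, w_avg, 0.7)  # the TWDNE balance point
wild = truncate(w, w_avg, 1.2)   # diverse but heavily artifacted
```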

Personally, I prefer to look at psi=1.2 samples because they are so much more interesting, and I love seeing the creative & artistic samples my StyleGAN can come up with. The problem is, if I put psi=1.2 samples up on TWDNE, people would not realize that the samples are deliberately being generated that way; they would assume that all the artifacting & distortion is simply the best the StyleGAN model can do, and that it is crappier than it really is.

[–]Plint 0 points1 point  (1 child)

Thanks for the explanation! I hadn't realized the implications of its ability to do smooth interpolations, that makes sense. All this is really fascinating even though I'm not actually familiar with programming, much less machine learning.

I've found myself thinking about what other kinds of content would look like through this system, so maybe I ought to change that. For instance, would training on the entire body of work of a specific manga artist produce interesting results? I assume things like panel borders and text would introduce tons of bizarre artifacts though.

I have to agree that its more extreme outputs are... quite something. I'm sure you've noticed its tendency to put eyes in the top right corner of very distorted images. I wonder why?

[–]gwern[S] 0 points1 point  (0 children)

For instance, would training on the entire body of work of a specific manga artist produce interesting results? I assume things like panel borders and text would introduce tons of bizarre artifacts though.

It tries. I mean, there's probably not really any sensible interpolation between regular illustrations and manga layouts, but it does try. If you watch nsheppard's current all-Danbooru2018 StyleGAN interpolation video or my abandoned all-Danbooru2017 ProGAN interpolation videos, you'll see what it tries to do.

I'm sure you've noticed its tendency to put eyes in the top right corner of very distorted images. I wonder why?

At this point, I can only speculate. It could be a failure of global coherency (which would suggest adding self-attention layers, which are very helpful for enforcing global properties like 'only one or two eyes, of the same color'), or could be some sort of extrapolation distortion like it generating a very stretched out and very tilted face which happens to put eyes out of place.
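For reference, a compact sketch of the SAGAN-style self-attention layer in question, which lets every spatial position attend to every other and is what would enforce global constraints like matching eyes (a generic version, not code from any particular anime model):

```python
import torch
from torch import nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over the spatial positions of a feature map."""
    def __init__(self, c: int):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as an identity map

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)  # (b, hw, c//8) queries
        k = self.k(x).flatten(2)                  # (b, c//8, hw) keys
        v = self.v(x).flatten(2)                  # (b, c, hw)    values
        attn = torch.softmax(q @ k, dim=-1)       # (b, hw, hw): every pixel attends everywhere
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out               # residual, gated by gamma
```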

[–]erwgv3g34 0 points1 point  (3 children)

Any chance you can take a dataset of pony faces from Derpibooru and use it to generate pony waifus?

[–]gwern[S] 4 points5 points  (2 children)

Me, probably not. I don't have a mirror of Derpibooru, and my GPUs are busy. But there's no reason someone couldn't take my anime-face model & transfer-learn to pony faces, either using the full portrait images (downscaled to 512px JPEGs) or Nagadomi's face-cropping script (which would probably work on pony faces). Although if you want MLP faces, it might make more sense to crop faces out of the TV episodes and train on those as well, since that is by far the biggest & most consistent* corpus of pony faces there is (i.e. something like: grab 1 frame per second, face-crop it, then de-duplicate the entire corpus with something like findimagedupes -t 99%).

* One bad thing about cropping anime faces is that high-quality closeups look very different from the background 'meguca' face drawings, so extracting faces from anime episodes probably isn't as great an idea as it looks unless you have some quality filtering in place. But with MLP's ultra-simplified clean Flash style, it's basically all vector art and shouldn't matter.
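A sketch of that grab/crop/de-duplicate pipeline; lbpcascade_animeface is Nagadomi's real cascade file, but whether it fires on pony faces is exactly the open question, and the exact-hash check merely stands in for findimagedupes' 99%-similarity threshold:

```python
import pathlib, subprocess
import cv2
from PIL import Image
import imagehash

EPISODE = "s01e01.mkv"                       # hypothetical episode file
FRAMES, FACES = pathlib.Path("frames"), pathlib.Path("faces")
FRAMES.mkdir(exist_ok=True); FACES.mkdir(exist_ok=True)

# 1 frame per second, as suggested above.
subprocess.run(["ffmpeg", "-i", EPISODE, "-vf", "fps=1",
                str(FRAMES / "frame%05d.png")], check=True)

# Nagadomi's anime-face cascade; untested on ponies.
cascade = cv2.CascadeClassifier("lbpcascade_animeface.xml")
for f in sorted(FRAMES.glob("*.png")):
    img = cv2.imread(str(f))
    gray = cv2.equalizeHist(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    for i, (x, y, w, h) in enumerate(cascade.detectMultiScale(gray, 1.1, 5)):
        cv2.imwrite(str(FACES / f"{f.stem}_{i}.png"), img[y:y+h, x:x+w])

# Perceptual-hash dedupe (rough analogue of findimagedupes -t 99%).
seen = set()
for f in sorted(FACES.glob("*.png")):
    h = imagehash.phash(Image.open(f))
    if h in seen:
        f.unlink()   # near-identical crop from an adjacent frame
    else:
        seen.add(h)
```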

[–]sl236 0 points1 point  (1 child)

An interesting tidbit to be aware of here is that official MLP artists have very strict guidelines they have to adhere to that severely limit things like the range of angles from which a pony face is allowed to be depicted. ...I have no intuition for whether and how this would affect model training.

[–]gwern[S] 0 points1 point  (0 children)

Yeah, I'm not sure. Hm. Speculating even further, I wonder if the designs are so regular that actual 3D (2D?) models could be extracted from a corpus of screenshots? That's something NNs are also good at. Given such models, then data availability is solved and the restrictions are irrelevant since you can screenshot from any angle (or skip the GANs entirely and just work with the inferred models in a CGI rendering program like Blender).

[–]ElexsonWrite 0 points1 point  (2 children)

How hard would it be to make a male version?

[–]gwern[S] 0 points1 point  (0 children)

Not hard. It already does male faces occasionally because I didn't restrict the corpus to female faces. See https://twitter.com/roadrunning01 for some work on male faces.

[–]gwern[S] 0 points1 point  (0 children)

[–]Greenparrotlover 0 points1 point  (0 children)

In some ways, the anime girls generated by this AI look to have more personality, and the imperfections make the girls more relatable in my opinion: they have blemishes, their faces aren't always perfectly symmetrical, and the mouth shapes are more diverse. It's pretty good overall, and I like some of the funny and surreal features that are occasionally generated. I'm having so much fun using it. Though I wish it could generate characters with multi-color hair.