Anime Crop Datasets: Faces, Figures, & Hands

Description of 3 anime datasets for machine learning based on Danbooru: cropped anime faces, whole-single-character crops, and hand crops (with hand detection model).
NN, anime, dataset
by: Gwern Branwen, Arfafax, Shawn Presser, Anonymous, Danbooru Community 2020-05-10–2020-08-05 finished certainty: log importance: 4


Documentation of 3 anime datasets for machine learning based on Danbooru: 300k cropped anime faces (primarily used for StyleGAN/This Waifu Does Not Exist), 855k whole-single-character figure crops (extracted from Danbooru using AniSeg), and 58k hand crops (based on a dataset of 14k hand-annotated bounding boxes used to train a YOLOv3 hand detection model).

These datasets can be used for machine learning directly, or included as data augmentation: faces, figures, and hands are some of the most noticeable features of anime images, and cropping images down to just those 3 features can enhance modeling of them by eliminating distracting context, zooming in, and increasing their weight during training.

Danbooru2019 Portraits

Danbooru2019 Portraits is a dataset of n = 302,652 (16GB) 512px anime faces cropped from solo SFW Danbooru2019 images in a relatively broad ‘portrait’ style encompassing necklines/ears/hats/etc rather than tightly focused on the face, upscaled to 512px as necessary, with low-quality images deleted by manual review & discriminator ranking; it has been used for creating This Waifu Does Not Exist.

The Portraits dataset was constructed to train the StyleGAN models used for This Waifu Does Not Exist.

Faces → Portraits Motivation

The main issues I saw with the faces, based on TWDNE feedback, were:

  1. Sexually-Suggestive Faces: because I had not expected StyleGAN to work or to wind up making something like TWDNE, I had not taken the effort to crop faces solely from the SFW subset (since no GAN had proven to be good enough to pick up any embarrassing details and I was more concerned with maximizing the dataset size).

    Danbooru is divided into 3 ratings, “safe”/“questionable”/“explicit”, with “questionable” bordering on softcore. The explicitly-NSFW images make up only ~9% of Danbooru, but between the SFW-but-suggestive images and the explicit ones, and StyleGAN’s learning capabilities, this proved to be enough to make some of the faces quite naughty-looking. Naturally, everyone insisted on joking about this. This could be fixed simply by filtering in “safe”-only rather than merely filtering out “explicit”.

  2. Head Crops: Nagadomi’s face-cropper is a face cropper, not a head-cropper or a portrait-cropper.

    The face-cropper centers its crops on the center of a face (like the nose) and, given the original bounding box, will necessarily cut off all the additional details associated with anime heads such as the ‘ahoge’ or bunny ears or twin-tails, since those are not faces. Similarly, I had left Nagadomi’s face-cropper on the default settings instead of bothering to tweak it to produce more head-shot-like crops, since if GANs couldn’t master the faces, there was no point in making the problem even harder & worrying about details of the hair.

    This was not good for characters with distinctive hats or hair or animal ears (such as Holo’s wolf ears). This could be fixed by playing with scaling the bounding box around the face by different x/y multipliers to see what picks up the rest of the head. (Another approach would be to use AniSeg to detect the face & whole-character figure simultaneously, and crop the figure from its top to the bottom of the face.)

  3. Messy Backgrounds/Bodies: I suspected that the tightness of the crops also made it hard for StyleGAN to learn things at the edges, like backgrounds or shoulders, because they would always be partial if the face-cropper was doing its job.

    With bigger crops, there would be more variation and more opportunity to see whole shoulders or large unobstructed backgrounds, and this might lead to more convincing overall images.

  4. Holo/Asuka Overrepresentation: to my surprise, TWDNE viewers seemed quite annoyed by the overrepresentation of Holo/Asuka-like (but mostly Holo) samples.

    For the same reason as not filtering to SFW, I had thrown in 2 earlier datasets I had made of Holo & Asuka faces—I had made them at 512px and cleaned them fairly thoroughly, and they would increase the dataset size, so why not? Being overrepresented, and well-represented in Danbooru (a major part of why I had chosen them in the first place to make prototype datasets with), StyleGAN was of course more likely to generate samples looking like them than other popular anime characters.1 Why this annoyed people, I don’t understand, but it might as well be fixed.

  5. Persistent Global Artifacts: despite the generally excellent results, there are still occasional bizarre anomalous images which are scarcely faces at all, even with 𝜓 = 0.7; I suspect that this may be due to the small percentage of non-faces, cut-off faces, or just poorly/weirdly drawn faces, and that more stringent data cleaning would help polish the model.

Portraits Improvements

Issues #1–3 can be fixed by transfer-learning StyleGAN on a new dataset made of faces from the SFW subset, cropped with much larger margins to produce more ‘portrait’-style face crops. (There would still be many errors or suboptimal crops, but I am not sure there is any full solution short of training a face-localization CNN just for anime images.)

For this, I needed to edit lbpcascade_animeface’s crop.py and adjust the margins. Experimenting, I changed the cropping line to:

    for (x, y, w, h) in faces:
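        # Expand the detected face box: raise the top edge (0.25*y) to catch hair/hats/ears and
        # widen it horizontally (0.90*x to x + 1.25*w); the bottom edge stays at the chin (y + h).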
        cropped = image[int(y*0.25): y + h, int(x*0.90): x + int(w*1.25)]

These margins seemed to deliver acceptable results which generally show the entire head while leaving enough room for extra background or hats/ears (although there is still the occasional error, such as an image with multiple faces or a head still partially cropped):

100 real faces from the ‘portrait’ dataset (SFW Danbooru2018 cropped with expanded margins) in a 10×10 grid

After cropping all ~2.8m SFW Danbooru2018 full-resolution images (as demonstrated in the cropping section), I was left with ~700k faces. This was a large dataset, but the disadvantage was that many heads/faces overlapped, so after a few weeks of training, I had decent portraits marred by strange hydra-like heads jutting in from the side. So I redid the cropping process using the solo tag to eliminate images which might have multiple faces in them.
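
Filtering by the solo tag & SFW rating is a simple metadata query; below is a minimal sketch (not the actual pipeline), assuming the JSON Lines metadata dumps distributed alongside the Danbooru datasets, with the rating/tags field layout being an assumption:

    # Sketch: select SFW 'solo' image IDs from the Danbooru metadata dumps (JSON Lines,
    # one record per line). Field names are assumptions; adjust to the metadata version on hand.
    import glob
    import json

    def sfw_solo_ids(metadata_glob="metadata/*.json"):
        ids = []
        for path in glob.glob(metadata_glob):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    if record.get("rating") != "s":                  # keep "safe"-only, not just non-"explicit"
                        continue
                    tags = {t["name"] for t in record.get("tags", [])}
                    if "solo" in tags:                               # single-character images only
                        ids.append(record["id"])
        return ids

    print(len(sfw_solo_ids()), "SFW solo images")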

Issue #4 is solved by just not adding the Asuka/Holo datasets.

Finally, issue #5 is harder to deal with: pruning 200k+ images by hand is infeasible, there’s no easy way to improve the face-cropping script, and I don’t have the budget to Mechanical-Turk review all the faces as Karras et al 2018 did for FFHQ to remove their false positives (like statues).

One way I do have to improve it is to exploit the Discriminator of a pretrained face GAN. The anime face StyleGAN D would be ideal since it clearly works so well already, so I wrote a ranker.py script (see previous section) to use a StyleGAN checkpoint and rank specified images on disk, and then rebuilt the .tfrecords with troublesome images removed. (This process can be reiterated as the StyleGAN model improves and the D improves its ability to spot anomalies.) I engaged in 5 cycles of ranker.py cleaning over April 2019, deleting 14k images; it seemed to reduce some of the artifacting related to hands.
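
As an illustration of the idea (not the actual ranker.py), a sketch which loads a StyleGAN checkpoint, scores images with the Discriminator, and prints the lowest-ranked files for manual review; it assumes NVIDIA’s TensorFlow StyleGAN codebase (dnnlib, pickled (G, D, Gs) networks) is importable, and the exact D.run() calling convention is an assumption:

    # Sketch of discriminator ranking (not the actual ranker.py): score images with a pretrained
    # StyleGAN Discriminator and print the lowest-scoring files for manual review.
    # Assumes NVIDIA's TensorFlow StyleGAN repo is importable and the checkpoint unpickles to
    # (G, D, Gs); the D.run() calling convention is an assumption.
    import glob
    import pickle
    import numpy as np
    from PIL import Image
    import dnnlib.tflib as tflib

    def rank_images(checkpoint="network-snapshot.pkl", image_glob="portraits/*.jpg", n_worst=100):
        tflib.init_tf()
        with open(checkpoint, "rb") as f:
            _G, D, _Gs = pickle.load(f)
        scores = []
        for path in glob.glob(image_glob):
            img = np.asarray(Image.open(path).convert("RGB").resize((512, 512)), dtype=np.float32)
            img = img.transpose(2, 0, 1)[np.newaxis] / 127.5 - 1.0      # NCHW, dynamic range [-1, 1]
            labels = np.zeros([1, 0], dtype=np.float32)                 # unconditional model: empty labels
            score = float(np.ravel(D.run(img, labels))[0])              # D's realism score
            scores.append((score, path))
        scores.sort()                                                   # lowest = most anomalous per D
        for score, path in scores[:n_worst]:
            print(f"{score:.3f}\t{path}")                               # candidates for deletion

    if __name__ == "__main__":
        rank_images()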

Portraits Dataset

The final 512px portrait dataset (with portrait crops, improved filtering via solo, & discriminator ranking for cleaning) is available for download via rsync (16GB, n = 302,652):

rsync --verbose --recursive rsync://78.46.86.149:873/biggan/portraits/ ./portraits/

Portraits Citing

Please cite this dataset as:

  • Gwern Branwen, Anonymous, & The Danbooru Community; “Danbooru2019 Portraits: A Large-Scale Anime Head Illustration Dataset”, 2019-03-12. Web. Accessed [DATE] https://www.gwern.net/Crops#danbooru2019-portraits

    @misc{danbooru2019Portraits,
        author = {Gwern Branwen and Anonymous and Danbooru Community},
        title = {Danbooru2019 Portraits: A Large-Scale Anime Head Illustration Dataset},
        howpublished = {\url{https://www.gwern.net/Crops#danbooru2019-portraits}},
        url = {https://www.gwern.net/Crops#danbooru2019-portraits},
        type = {dataset},
        year = {2019},
        month = {March},
        timestamp = {2019-03-12},
        note = {Accessed: DATE} }

Danbooru2019 Figures

The Danbooru2019 Figures dataset is a large-scale anime character illustration dataset of n = 855,880 images (248GB; minimum width 512px) cropped from Danbooru2019 using the AniSeg anime character detection model. The images are cropped to focus on a single character’s entire visible body, extending ‘portrait’ crops to ‘figure’ crops. This is useful for tasks focusing on individual characters, such as character classification, or for generative tasks (a corpus for weak models like StyleGAN, or data augmentation for BigGAN).

40 random figure crops from Danbooru2019 (4×10 grid, resized to 256px)

I created this dataset to assist our BigGAN training by data augmentation of difficult object classes: by providing a large set of images cropped to just the character (as opposed to the usual random crops), BigGAN should better learn body structure and reuse that knowledge elsewhere; focus on just the hard parts. This is an ML trick which we have used for faces/portraits in BigGAN, and will use for hands as well. This could also be useful for StyleGAN, by greatly restricting the variation in images to single centered objects (StyleGAN falls apart when it needs to model multiple objects in a variety of positions). Other applications might include using it as a starting dataset for object localizers to crop out things like faces, where images with multiple instances would be ambiguous, too occluded (multiple faces overlapping), or too low-quality (eg backgrounds), so the whole Danbooru2019 dataset wouldn’t be as useful.

Figures Download

To down­load the cropped im­ages:

rsync --verbose --recursive rsync://78.46.86.149:873/biggan/danbooru2019-figures ./danbooru2019-figures/

Figures Construction

Details of getting AniSeg running & cropping. Danbooru2019 Figures was constructed by filtering images from Danbooru2019 by solo & SFW status (~1,538,723 images of ~65,783 characters illustrated by ~133,856 artists), and then cropping using Jerry Li’s AniSeg model (a TensorFlow Object Detection API-based Python model for anime character face detection & portrait segmentation), which Li constructed by annotating images from Danbooru2018.2

Before running AniSeg, I had to make 3 changes: AniSeg had two bugs (a SciPy dependency & the model-loading code), and the provided script for detecting faces/figures did not include any functionality for cropping images. The bugs have since been fixed & the detection code now supports cropping with the options --output_cropped_image/--only_output_cropped_single_object. At the time, I modified the script to do cropping without those options, and I ran the figure cropper (slowly) over Danbooru2019 like so:

python3 infer_from_image.py --inference_graph=./2019-04-29-jerryli27-aniseg-models-figurefacecrop/figuresegmentation.pb \
    --input_images='/media/gwern/Data2/danbooru2019/original-sfw-solo/*/*' \
    --output_path=/media/gwern/Data/danbooru2019-datasets/danbooru2019-figures
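
The cropping modification itself amounts to cutting each detected figure box out of the input image; a minimal sketch of that step, assuming the TensorFlow Object Detection API convention of normalized [ymin, xmin, ymax, xmax] boxes with per-box confidence scores (the actual script also handles batching & output naming):

    # Sketch: crop detected figures from an image, assuming TF Object Detection API-style
    # normalized boxes [ymin, xmin, ymax, xmax] plus per-box confidence scores.
    from PIL import Image

    def crop_figures(image_path, boxes, scores, out_prefix, threshold=0.5):
        img = Image.open(image_path)
        w, h = img.size
        for i, (box, score) in enumerate(zip(boxes, scores)):
            if score < threshold:                       # skip low-confidence detections
                continue
            ymin, xmin, ymax, xmax = box
            crop = img.crop((int(xmin * w), int(ymin * h), int(xmax * w), int(ymax * h)))
            crop.save(f"{out_prefix}-{i}.jpg", quality=95)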

Filter & upscale. After cropping out figures, I followed the image processing described in my StyleGAN faces writeup: I converted the images to JPG, deleted images <50kb, deleted images <256px in width, used waifu2x to 2× upscale images <512px in width to ≥512px in width, and deleted monochrome images (images with <255 unique colors). Note that unlike the portraits dataset, these images are not resized to 512×512px squares with black backgrounds as necessary. This allows random crops if the user wants, and they can be downscaled as necessary (eg mogrify -resize 512x512\> -extent 512x512\> -gravity center -background black). This gave a final dataset of n = 855,880 JPGs (248GB).
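
A minimal sketch of those filtering steps (file size, width, & monochrome checks) using PIL; the waifu2x upscaling of narrower images is left to the external tool, and the directory layout is an assumption:

    # Sketch of the post-crop filtering: delete small, low-resolution, or near-monochrome crops.
    # (waifu2x upscaling of 256-511px-wide images is done separately with the external tool.)
    import glob
    import os
    from PIL import Image

    def filter_crops(directory="danbooru2019-figures"):
        for path in glob.glob(os.path.join(directory, "**", "*.jpg"), recursive=True):
            if os.path.getsize(path) < 50 * 1024:                     # delete images <50kb
                os.remove(path)
                continue
            with Image.open(path) as img:
                width = img.width
                colors = img.convert("RGB").getcolors(maxcolors=256)  # None if >256 unique colors
            if width < 256:                                           # delete images <256px wide
                os.remove(path)
            elif colors is not None and len(colors) < 255:            # delete monochrome images
                os.remove(path)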

Figures Citing

Please cite this dataset as:

  • Gwern Branwen, Anonymous, & The Danbooru Community; “Danbooru2019 Figures: A Large-Scale Anime Character Illustration Dataset”, 2020-01-13. Web. Accessed [DATE] https://www.gwern.net/Crops#figures

    @misc{danbooru2019Figures,
        author = {Gwern Branwen and Anonymous and Danbooru Community},
        title = {Danbooru2019 Figures: A Large-Scale Anime Character Illustration Dataset},
        howpublished = {\url{https://www.gwern.net/Crops#figures}},
        url = {https://www.gwern.net/Crops#figures},
        type = {dataset},
        year = {2020},
        month = {May},
        timestamp = {2020-05-31},
        note = {Accessed: DATE} }

Hands

We create & release PALM: the PALM Anime Locator Model. PALM is a pretrained anime hand detector/localization neural network, accompanied by 3 anime hand datasets:

  1. A dataset of 5,382 anime-style Danbooru2019 images annotated with the locations of 14,394 hands.

    This labeled dataset is used to train a model to detect hands in anime.

  2. A second dataset of 96,534 hands cropped from the Danbooru2019 SFW dataset using the PALM YOLO model.

  3. A cleaned version of #2, consisting of 58,536 hand crops upscaled to ≥512px.

Hand detection can be used to clean images (eg remove face images with any hands in the way), to generate datasets of just hands (as a form of data augmentation for GANs), to generate reference datasets for artists, or for other purposes.

After faces & whole bodies, the next most glaring source of artifacts in GAN anime samples such as TWDNE’s is drawing hands. Hands are notorious among human artists for being difficult and easily breaking suspension of disbelief, and it’s worth noting that aside from the face, the hands are the biggest part of the cortical homunculus, suggesting the attention we pay to them; no wonder that so many illustrations carefully crop the subject to avoid hands, or tuck hands into dresses or sleeves, among the impressive variety of tricks artists use to avoid depicting hands. Common GAN failure: hands.

Trick: target errors using data augmentation. But even in face/portrait crops, hands appear frequently enough that StyleGAN will attempt to generate them, as hands occupy a relatively small part of the image at 512px while being highly varied & frequently occluded. BigGAN does somewhat better but still struggles with hands (eg in our samples), unsure if they are round blobs or how many fingers should be visible. One way to train a NN is to oversample hard data points by active learning: seek out the class of errors and add in enough data that it can and must learn to solve it.3 Faces work well in our BigGAN because they are so common in the data, and can be further overloaded using my anime face datasets; bodies work reasonably well, and better after Danbooru2019 Figures was created & added. By the same logic, if hands are a glaring class of errors which BigGAN struggles with and which particularly break suspension of disbelief, adding additional hand data would help fix this. The most straightforward way to obtain a large corpus of anime hands is to use a hand detector to crop out hands from the ~3m images in Danbooru2019.

Hand Model

There are no pretrained anime hand detectors, and it is unlikely that standard photographic human hand detectors would work on anime (they don’t work on anime faces, and anime hands are even more stylized and abstract).

Rolling my own. Arfafax had considerable success in hand-labeling images (using a custom web interface for drawing bounding boxes on images) for a YOLO-based furry facial landmark & face detector, which he used to select & align images for his This Fursona Does Not Exist / This Pony Does Not Exist. We decided to use his workflow to build a hand detector and crop hands from Danbooru2019. Aside from the data augmentation trick, an anime hand detector would allow filtering out data with hands, or generated samples with hands, and doubtless people can find other uses for it.

Hand Annotations

Custom Danbooru annotation website. Instead of using random Danbooru2019 samples, which might not have useful hands and would yield mostly ‘easy’ hands for training, we enriched the corpus by selecting the 14k images corresponding to hands rating:s: hands is a Danbooru tag used “when an image has a character’s hand(s) as the main focus or emphasizes the usage of hands.” All the samples had hands, in different locations, sizes, styles, and occlusions, and some samples were challenging to annotate:

Example of annotating hands in the website for 2 particularly challenging Danbooru2019 images

Biting the bullet. We used Shawn Presser’s annotation website May–June 2020, and in total, we annotated n = 14,394 hands in k = 5,382 images (JSON). (I did ~10k annotations, which took ~24h over 3–4 evenings.)

Random selection of 297 hand-annotated hands cropped from Danbooru2019 hands images (downsized to 128px)

YOLO Hand Model

Off-the-shelf YOLO model training. I trained a YOLOv3 model using the AlexeyAB darknet repo, following Arfafax’s notebook, with largely default settings. With n = 64 minibatches & 2k iterations on my 1080ti, it achieved a ‘total loss’ of 1.6; it didn’t look truly converged, so I retried with 6k iterations & n = 124 minibatches for ~8 GPU-hours, with a final loss of ~1.26. (I also attempted to train a YOLOv4 model with the same settings other than adjusting the subdivisions=16 setting, but it trained extremely slowly and had not approached YOLOv3’s performance after 16 GPU-hours with a loss of 3.6, and the YOLOv3 hand-cropping performance appeared satisfactory, so I didn’t experiment further to figure out what misconfiguration or other issue there was.)

Good enough. False positives are typically things like faces, flowers, feet, clouds or stars (particularly five-pointed ones), things with many parallel lines like floors or clothing, small animals, jewelry, and text captions or speech bubbles. The YOLO model appears to look for round objects with radial symmetry or rectangular objects with parallel lines (which makes sense). This model could surely be improved by training a more advanced model with more aggressive data augmentation & doing active learning on Danbooru2019 to finetune hard cases. (Rereading the YOLO docs, one easily remedied flaw is the absence of negative samples: hard-mining hands meant that no images were labeled with zero hands, ie. every image had at least 1 hand to detect, which might bias the YOLO model towards finding hands.)
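
(Adding negative samples to a darknet training set is simply a matter of including hand-free images with empty annotation files; a minimal sketch, with paths as assumptions:)

    # Sketch: add negative (hand-free) images to a darknet training set by giving each one
    # an empty YOLO annotation file and appending it to the training list. Paths are assumptions.
    import os
    import shutil

    def add_negatives(image_paths, image_dir="data/obj", train_list="data/train.txt"):
        with open(train_list, "a") as train:
            for src in image_paths:
                dst = os.path.join(image_dir, os.path.basename(src))
                shutil.copy(src, dst)
                label = os.path.splitext(dst)[0] + ".txt"
                open(label, "w").close()                # empty label file = "no hands in this image"
                train.write(dst + "\n")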

Cropping Hands

58k 512px hands; 96k total. I modified Arfa’s example script to crop the SFW4 Danbooru2019 dataset (n = 2,285,676 JPG/PNGs) with the YOLOv3 hand model at a threshold of 0.6 (which yields roughly 1 hand per 10–20 original images and a false positive rate of ~1 in 15); after some manual cleaning along the way, this yielded n = 96,534 cropped hands. (The metadata of all detected hand crops is available in features.csv.) To generate full-sized ≥512px hands useful for GAN training, I copied images ≥512px in width as-is, skipped images <128px in width, used waifu2x to upscale 2× (then downscale to 512px) images 256–511px in width, and upscaled 4× images 128–255px in width. This yielded n = 58,536 final hands. Images are lossily optimized. (Note that the output of the YOLOv3 model, filename/bounding-box/confidence for all files, is available in features.csv in the PALM repo for those who want to extract hands with different thresholds.)
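
For example, a minimal sketch of re-extracting hand crops from features.csv at a stricter threshold; the exact column names are assumptions and should be adjusted to the CSV’s actual header:

    # Sketch: re-extract hand crops from features.csv at a custom confidence threshold.
    # Column names are assumptions; the CSV records filename, bounding box, & YOLOv3 confidence.
    import csv
    import os
    from PIL import Image

    def extract_hands(csv_path="features.csv", out_dir="hands", threshold=0.8):
        os.makedirs(out_dir, exist_ok=True)
        with open(csv_path, newline="") as f:
            for i, row in enumerate(csv.DictReader(f)):
                if float(row["confidence"]) < threshold:        # keep only confident detections
                    continue
                with Image.open(row["filename"]) as img:
                    box = (int(float(row["left"])), int(float(row["top"])),
                           int(float(row["right"])), int(float(row["bottom"])))
                    img.crop(box).save(os.path.join(out_dir, f"{i:06d}.jpg"), quality=95)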

Random sample of the upscaled subset of Danbooru2019 hands

Hands Download

The PALM YOLOv3 model (Mega mirror; 235MB):

rsync --verbose rsync://78.46.86.149:873/biggan/palm/2020-06-08-gwern-palm-yolov3-handdetector126.weights ./

The original hands cropped out of Danbooru2019 (n = 96,534; 800MB):

rsync --recursive --verbose rsync://78.46.86.149:873/biggan/palm/original-hands/ ./original-hands/

The upscaled hand subset (n = 58,536; 1.5GB):

rsync --recursive --verbose rsync://78.46.86.149:873/biggan/palm/clean-hands/ ./clean-hands/

The training dataset of annotated images, YOLOv3 configuration files, etc (k = 5,382 / n = 14,394; 6GB):

rsync --verbose rsync://78.46.86.149:873/biggan/palm/2020-06-09-gwern-palm-yolov3-trainingdatasetlogs.tar ./

Hands Citing

Please cite this dataset as:

  • Gwern Branwen, Arfafax, Shawn Presser, Anonymous, & Danbooru community; “PALM: The PALM Anime Location Model And Dataset”, 2020-06-12. Web. Accessed [DATE] https://www.gwern.net/Crops#hands

    @misc{palm,
        author = {Gwern Branwen and Arfafax and Shawn Presser and Anonymous and Danbooru community},
        title = {PALM: The PALM Anime Location Model And Dataset},
        howpublished = {\url{https://www.gwern.net/Crops#hands}},
        url = {https://www.gwern.net/Crops#hands},
        type = {dataset},
        year = {2020},
        month = {June},
        timestamp = {2020-06-12},
        note = {Accessed: DATE} }

  1. Holo faces were far more common than Asuka faces. There were 12,611 Holo faces & 5,838 Asuka faces, so Holo was only ~2× more common, and Asuka is a more popular character in general on Danbooru, so I am a little puzzled why Holo showed up so much more than Asuka. One possibility is that Holo is inherently easier to model under the truncation trick—I noticed that the brown short-haired face at 𝜓 = 0 resembles Holo much more than Asuka, so perhaps when setting 𝜓, Asukas are disproportionately filtered out? Or faces closer to the origin (because of brown hair?) are simply more likely to be generated to begin with.↩︎

  2. I’ve mirrored the manually-segmented anime figure dataset & the face/figure segmentation models:

    rsync --verbose rsync://78.46.86.149:873/biggan/2019-04-29-jerryli27-aniseg-figuresegmentation-dataset.tar ./
    rsync --verbose rsync://78.46.86.149:873/biggan/2019-04-29-jerryli27-aniseg-models-figurefacecrop.tar.xz   ./
    ↩︎
  3. It’s been pointed out that our StyleGAN2-ext model seems to try to hide hands, despite being trained with PALM. It is possible that GANs particularly benefit from cropping data augmentation because the adversarial training encourages mode-dropping of particularly hard modes: the Generator may avoid generating hands because flawed hands are too easily detected by the Discriminator, and because so many real images don’t have hands in them, omitting hands incurs little Discriminator penalty.↩︎

  4. NSFW did not yield good re­sults.↩︎