all 79 comments

[–]gwern[S] 46 points47 points  (0 children)

(Happy Valentine's Day.)

[–]MrEldritch 29 points30 points  (9 children)

In contrast, the Danbooru dataset is larger than ImageNet as a whole and larger than the current largest multi-description dataset, MS COCO, with far richer metadata than the "subject verb object" sentence summary that is dominant in MS COCO or the birds dataset (sentences which could be adequately summarized in perhaps 5 tags). While the Danbooru community does focus heavily on female anime characters, they are placed in a wide variety of circumstances with numerous surrounding tagged objects or actions, and the sheer size implies that many more miscellaneous images will be included. It is unlikely that the performance ceiling will be reached anytime soon, and advanced techniques such as attention will likely be required to get anywhere near the ceiling. And Danbooru is constantly expanding and can be easily updated by anyone anywhere, allowing for regular releases of improved annotations.

I really hope this doesn't end up getting ignored as "not serious" or "not respectable" as a dataset. I agree with /u/gwern - I think this really could be a very useful dataset for more advanced image understanding techniques, and I frankly don't see how we could otherwise get a better dataset with similarly rich metadata for any feasible amount of effort.

[–]gwern[S] 18 points19 points  (6 children)

Yeah. The first time I ran into Danbooru my reaction was amazement: 'what an incredible amount of metadata all hand-contributed by humans! and there's even an API? how is no one using this for computer vision AI yet?!' I waited 3 years and still nothing comes close to the density of annotation (even if there are various datasets which have more images), so since no one else was going to do it, I did.

[–]MrEldritch 18 points19 points  (4 children)

In all seriousness, the next best dataset if this takes off might be e621. It is ... almost exclusively furry pornography, but the dataset is of the same order of magnitude in size (1.3m vs. 2.9m) and MUCH more diverse (in both content and visual style) than Danbooru. And the tagging is comparable in quality, if not better - although for the 'long tail' of more obscure tags, the problem persists that a tag's absence doesn't necessarily signify its negation.

An example task - one that's tricky even for human taggers - would be tagging character gender. There isn't enough of a bias to declare a "default" gender that images are assumed to contain (like Danbooru's focus on anime girls), and there is an extremely large segment of confusing, ambiguous cases - intersex characters, extremely feminine-looking male characters, transgender characters, etc. However, because of e621's 'tag what you see' policy, all tagging decisions are ultimately based only on what's available in the post itself, and not external context, so the correct answers are still decidable based only on pixels. (The quantity of intersex characters alone makes tagging gender a more challenging task, requiring relational reasoning - you cannot infer what gender characters an image contains only from the presence of obvious primary or secondary sexual characteristics; you also have to take into account whether they're attached to the same person.)

But you would never, ever get anyone to take it seriously.

[–]gwern[S] 11 points12 points  (0 children)

The diversity is a good point, beyond just 'more dakka!'. But I thought a little about the broader idea of grabbing images from multiple boorus and concluded that it's probably pointless overall. It's like that Google paper on 'revisiting the unreasonable effectiveness of data' and showing CNN classification gains continue up to n=300m - yeah, that's great, but even Google can only just barely train a CNN to convergence on n=300m much less tweak it or do research in that regime. 2.9m images is already more than most people can swallow. Adding more images to that will just slow everything down more as you need that many more minibatches/computation; but if you can improve the metadata, you should be able to train a better CNN faster with the same computing power (less noisy gradients, bigger steps), which is why I think that's the right direction for improving Danbooru2017, until such time as we all have TPU pods with 200+ TFLOPs.

But you would never, ever get anyone to take it seriously.

Yep... I feel sorry for the furries. Truly the bottom rung of the nerd totem pole.

[–][deleted] 4 points5 points  (0 children)

although for the 'long tail'

I see what you did there.

[–]Muffinmaster19 2 points3 points  (1 child)

I am glad to see someone else has thought of this.

I have been planning to scrape all of e621's images and tags, then resize the images to 256x256 and convert the tags to vectors (there is both discrete and continuous metadata, so maybe separate those).

Then give the dataset a name that seems legitimate at the acronym level, like ACAP - the Anthropomorphic Cartoon Animal Pornography dataset.

As you say, nobody would take it seriously but that doesn't change the fact that it is ~1.3 million images with thousands of attributes.

Imagine training a Conditional GAN on that to generate furries, it would be hilarious.
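The "tags to vectors" step for the discrete tags could be as simple as a multi-hot encoding against a fixed tag vocabulary. A minimal sketch (the tag names here are made up for illustration):

```python
def tags_to_vector(tags, vocabulary):
    """Multi-hot encode a list of tags against a fixed tag vocabulary.
    Tags outside the vocabulary are silently dropped."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for tag in tags:
        if tag in index:
            vector[index[tag]] = 1
    return vector

# Example with a toy vocabulary:
vocab = ["canine", "feline", "solo", "standing"]
print(tags_to_vector(["canine", "solo"], vocab))  # [1, 0, 1, 0]
```

The continuous metadata (scores, dimensions, dates) would get normalized separately and concatenated, as the comment suggests.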

[–]MrEldritch 1 point2 points  (0 children)

I think Flash animations/interactives, animated .GIFs, and WebM videos actually make up a fairly significant chunk of the site's images, along with images that would be too confusing to classify or illegible when downscaled: stuff like huge images with many characters (leading to a zillion tags for details that would probably be illegible at low-res), or comics with multiple panels, or images with an excessive aspect ratio. The total images left over after processing would probably be less than 1 million, although still of the same order of magnitude.

[–]EliezerYudkowsky 8 points9 points  (0 children)

I had exactly the same thought. Kudos for actually doing it!

[–]gogogoscott 1 point2 points  (0 children)

It is definitely something that shouldn't be overlooked.

[–]epicwisdom 0 points1 point  (0 children)

A significant financial investment into replicating (on a much smaller scale) Google's image crawler, perhaps.

[–]gwern[S] 7 points8 points  (3 children)

By popular demand, there is now a SFW downscaled 512x512px subset (241GB, 2.2m images) available as a torrent: https://gwern.net/doc/anime/danbooru2017-sfw512px-torrent.tar.xz This should address everyone's concerns about too much disk space, the NSFW content and legal/reputational risk, and the annoyance of pre-processing to downscale the big images.

[–]gwern[S] 2 points3 points  (0 children)

By further popular demand due to issues with torrents, now I've set up a rsync server (and if you can't get rsync working, I have to give up). Quick start:

rsync --recursive --times --verbose rsync://78.46.86.149:873/danbooru2017 ./danbooru2017/

This will download both the full & 512px versions; if you want only one of them, or just an arbitrary subset, list the files and grab what you need. E.g., if you only want the tag metadata, you can do:

rsync --verbose rsync://78.46.86.149:873/danbooru2017/metadata.json.tar.xz ./

[–]gwern[S] 0 points1 point  (0 children)

Another question: how useful would an AWS S3 'requester-pays' bucket be for people? It theoretically would make it much easier to run DL on the 512px SFW dataset, since you would get within-AWS transfer speeds/storage/attachable buckets.

Poll: https://twitter.com/gwern/status/966507701261012992

[–]fosa2 0 points1 point  (0 children)

Thanks for this! Which torrent file has the metadata table?

[–]zawerf 5 points6 points  (4 children)

Noob question: how do you deal with a dataset with such varying dimensions? There are images with a height of 30,000px in there. They seem to be comic strips, so you can't just resize them. Do you bother chopping those up, or just filter them out? Also, a good chunk of them have normal aspect ratios but unnecessarily high DPI (I guess these I can just rescale).

This seems like a lot of cleaning work that everyone using this dataset has to repeat.

Also if you don't want to accidentally view CP (which there's a lot of...), you need to filter for only "rating"=="s" for safe images.

Edit: Some more issues: images with highly negative scores are also included. Records flagged is_deleted, is_banned, is_flagged, or is_pending are (expectedly) missing their images, so they need to be filtered. There are also images in all kinds of formats, which need to be converted to RGB/RGBA first or you get a random number of color channels. I didn't even know 2-channel image formats exist (I am guessing grayscale + alpha?).
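A first-pass filter implementing these checks might look like the following sketch (the field names are assumed to match the ones listed above, and the metadata file is assumed to be newline-delimited JSON records):

```python
import json

# Flags whose presence means the image file won't be in the dump.
SKIP_FLAGS = ("is_deleted", "is_banned", "is_flagged", "is_pending")

def keep(record, min_score=0, sfw_only=True):
    """Return True if a metadata record points to a usable, safe image."""
    if any(record.get(flag) for flag in SKIP_FLAGS):
        return False  # image is missing from the dump
    if sfw_only and record.get("rating") != "s":
        return False  # only keep "s"-rated (safe) images
    if int(record.get("score", 0)) < min_score:
        return False  # drop highly-downvoted images
    return True

def filter_metadata(path):
    """Yield usable records from a newline-delimited JSON metadata file."""
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if keep(record):
                yield record
```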

Edit2: Looking at this dataset some more I am not sure I can ever associate my real identity with it. Maybe you can split out the NSFW part as a separate dataset so it's safe by default?

[–]MrEldritch 7 points8 points  (0 children)

Personally, I'd just filter out anything with an aspect ratio more extreme than 1:3 or so, and then pad everything to 1:1.
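That filter-and-pad rule can be expressed as pure geometry before touching any image library. A sketch:

```python
def pad_to_square(width, height, max_ratio=3.0):
    """Decide how to pad an image to a 1:1 aspect ratio.

    Returns (side, x_offset, y_offset) for pasting the image centered on a
    square canvas, or None if the aspect ratio is more extreme than
    max_ratio and the image should be dropped instead."""
    if max(width, height) / min(width, height) > max_ratio:
        return None
    side = max(width, height)
    return side, (side - width) // 2, (side - height) // 2

print(pad_to_square(100, 400))  # None: 1:4 is more extreme than 1:3
print(pad_to_square(100, 200))  # (200, 50, 0): center on a 200x200 canvas
```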

[–]gwern[S] 2 points3 points  (2 children)

For the full-scale dataset, the sanest workflow would be to iterate over directories for JPG/PNG/BMP files, look up the filename as ID to get the metadata, possibly check against a blacklist of tags (aside from rating, I think there are a number of tags which are probably better off being removed even if you don't remove the image, stuff like "seiyuu_connection" where it is unreasonable to expect a CNN to learn them), and then convert to 512x512px JPG to feed into your CNN. This automatically handles missing images (they aren't in the directory to be iterated over), weird file types like HTML or SWF or no extension at all, lets you filter on anything you want, your image library will handle the channels, etc. This sort of munging is inherent to any dataset; there's not much that can be done aside from packaging up a standard 'data loader' function for the various frameworks like PyTorch, but obviously they wouldn't accept such a patch until the dataset has been out there for a while.
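The lookup-and-filter part of that workflow, sketched out (the SFW check and the example blacklist tag come from this thread; the metadata is assumed to be a dict keyed by image ID):

```python
import os

# Tags that are unreasonable to expect a CNN to learn from pixels alone.
TAG_BLACKLIST = {"seiyuu_connection"}

ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")

def usable_tags(filename, metadata, sfw_only=True):
    """Look up a file's metadata record by ID (the filename stem) and return
    its tags minus the blacklist, or None if the file should be skipped."""
    image_id, ext = os.path.splitext(os.path.basename(filename))
    if ext.lower() not in ALLOWED_EXTENSIONS:
        return None  # skip HTML, SWF, extension-less oddities, etc.
    record = metadata.get(image_id)
    if record is None:
        return None  # no metadata record: deleted/banned/pending image
    if sfw_only and record.get("rating") != "s":
        return None
    return [t for t in record.get("tags", []) if t not in TAG_BLACKLIST]
```

The surviving files then get resized to 512x512px JPG by whatever image library you prefer.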

Maybe you can split out the NSFW part as a separate dataset so it's safe by default?

See my other comment: https://www.reddit.com/r/MachineLearning/comments/7xk4zh/p_danbooru2017_a_new_dataset_of_294m_anime_images/du9qqsf/ I assume a 512x512px SFW downscaled version (~300GB) would be adequate for you?

[–]zawerf 2 points3 points  (1 child)

By the way, sorry if it sounded like I was complaining. I was mostly listing basic stuff for the benefit of other noobs like me who are encountering their first non-MNIST/ImageNet dataset.

I assume a 512x512px SFW downscaled version (~300GB) would be adequate for you?

Kind of. Re-establishing a reputation as a SFW dataset is probably the most important part since the actual preprocessing I can do myself.

Controversy is inevitable once the outrage society knows what's in the dataset. And good luck telling the FBI that the gigabytes of loli/guro/etc on your computer is "for research". See the numerous cases listed in https://en.wikipedia.org/wiki/Legal_status_of_cartoon_pornography_depicting_minors#United_States

(though it will probably be the first time in history that the "for research" defense will actually hold up)

[–]gwern[S] 3 points4 points  (0 children)

That's really not a lot of cases, and the free speech grounds are quite shaky. Danbooru itself is located in the USA and always has been for the past 13 years.

[–]MrEldritch 4 points5 points  (4 children)

Oh man, I've been waiting eagerly for this to come out!

...although now that it's out, I'm realizing that I can't actually download it, since 1.9tb would more than fill all the storage media I own. I'm going to have to buy a new external hard drive if I want to play with this, I suppose.

[–]gwern[S] 12 points13 points  (3 children)

Hm. I didn't realize 1.9TB was going to be such a problem. So... out of curiosity, how interested would people be in something like me shipping a 3TB internal HDD with Danbooru2017 preloaded to people? (Looks like such drives only cost ~$60 on Newegg, so I could either donate it for students with some sort of track record or just charge $70 for HDD+S&H. This sort of thing seems to work well for Nvidia.)

[–]RDxzFCFF3qeTIdWNXIWO[🍰] 5 points6 points  (1 child)

I wonder if there's a market big enough for a service that ships large training data sets, for example this one or coco and imagenet.

I'd be interested, but shipping from US to EU usually doesn't work well and customs are a hassle. Also I'd have to trust random internet people to not run away with my money.

[–]Mandrathax 5 points6 points  (5 children)

Nice. You should cross-post to r/anime

[–]gwern[S] 9 points10 points  (4 children)

Man, screw /r/anime. I submitted a great article to them the other day, and their bot snidely removed it for having a live-action connection.

[–]Mandrathax 1 point2 points  (3 children)

I think you will have more success with this one^

[–]gwern[S] 24 points25 points  (2 children)

Well, alright. But I'm not crossposting because I want to make you happy or anything, understand, I just want to prove you wrong!

[–]Colopty 13 points14 points  (0 children)

Excellent tsundere act.

[–]mirh 0 points1 point  (0 children)

Well, truth be told you shouldn't have reused the same title of this sub.

I mean, even putting aside those who don't get what a "dataset" is in the first place, in that context... I dunno, even I would be a bit lost, at least for a few seconds?

[–]Ending_Credits 2 points3 points  (5 children)

You might want to run a face extraction routine on a few hundred thousand images, similar to https://github.com/jayleicn/animeGAN . It makes a good, harder alternative to CelebA, plus you have freely available identity information (which is not the case with CelebA, although you can ask nicely and they'll give it to you).

I actually made a dataset of 150,000 faces of ~500 characters, grouped by character but also with tags, made using the tools in the above repo. It's actually quite reasonably sized, so I can upload it somewhere if people are interested - I just need to find somewhere to host it. Example of the fun things you can do with it here: https://github.com/EndingCredits/Set-CGAN

[–]gwern[S] 2 points3 points  (0 children)

What I did when I was playing around with a face GAN was use https://github.com/nagadomi/lbpcascade_animeface on an earlier dump of Danbooru images. It worked reasonably well but not perfectly. I wouldn't want to distribute an auto-extracted face dataset without some way of semi-manually verifying it or using a better face extractor (perhaps trained on Danbooru2017! I'm sure there's some interesting semi-supervised thing you could do - you know from the tags what images probably have faces somewhere in them, if not bounding boxes of where...).

On a side note, I'd love to see people try out ProGAN on Danbooru2017. No more excuses about not having high resolution images!

[–]Skylion007 2 points3 points  (3 children)

I'm actually working on releasing a paper using this soon, and I'll publish the dataset. I plan to release the face dataset with u/gwern as well. So far I have about 1.1 million faces that I was able to extract. Stay posted. :)

[–]Ending_Credits 0 points1 point  (1 child)

Would be interested to hear your tips and tricks.

Personally I found https://www.microsoft.com/en-us/research/publication/stabilizing-training-of-generative-adversarial-networks-through-regularization/ works very nicely, although it still suffers from melty waifu syndrome.

Part of me wonders whether, to get truly good samples without geometric distortions, we might need more sophisticated generator architectures (e.g. multiple 'passes' which successively build up the image). It's pretty impressive what we get considering everything is generated end-to-end.

Thinking about it, a few more FC layers before the conv layers might also be good in this regard (I did see a paper suggesting training a predictor of z to avoid mode collapse when adding more of these layers).

[–]gwern[S] 0 points1 point  (0 children)

All the existing anime GANs thus far have been on a small enough scale, both compute and parameter-wise, that I'm not totally convinced we yet have a good handle on how much the current GAN/CNN architectures are badly suited. I mean, people were generating tons of crummy CelebA facial samples but we know from ProGAN that simply stacking layers continuously for 2 or 3 weeks on a top-end GPU can produce damn near photorealistic CelebA facial samples and eliminates all the nasty distortions and artifacts we see earlier in training.

Perhaps the 'watercolor effect' and other geometric distortions are like that too, artifacts of incomplete training or poor convergence. The early stages or sequence of progress doesn't have to make sense to us. (For example, the Zero paper notes that without the 'ladder' features encoded into AG1, Zero doesn't manage to learn to defeat ladders on its own until surprisingly far into training, despite how trivial we find ladders; doesn't stop it from quickly becoming superhuman though.)

[–][deleted] 0 points1 point  (0 children)

Looking forward to seeing your face dataset! I purchased Danbooru gold just so I could get enough filters to get my own high-quality face dataset, but I didn't get anywhere close to 1.1 million faces.

[–]visarga 6 points7 points  (7 children)

NSFW

[–]gwern[S] 23 points24 points  (5 children)

It's only 8.7% NSFW, which is another way of saying it's 91.3% SFW, really.* (And you can just filter out the "e" tagged images in your code if it's a problem. Or better yet, test out how well CNNs work for NSFW classification on illustrations/anime rather than photos; mind-melting horrors optional.)

* EDIT: OK I've looked a little closer at 'q' images & Danbooru official rules, and it's a lot more permissive than I thought, so maybe something like 80-85% ("s" + some of "q") is more genuinely SFW.

[–]asquared31415 1 point2 points  (4 children)

That’s still half of my primary hard drive filled with NSFW material.

[–]gwern[S] 4 points5 points  (3 children)

(Your primary drive is only 300GB...? Doesn't that interfere with all your other ML stuff?)

In any case, I think it's fair to ask people to store the full dataset: this is how torrents scale and stay alive - we all contribute bandwidth and drive space even if we have already finished watching the movie or whatever. If it gets split into multiple torrents and becomes fragmented, the torrents will die or slow down.

I've thought about providing a SFW-only 512x512px downscaled version torrent, for people with less disk space, but am not yet convinced the fragmentation is worthwhile.

[–]MrEldritch 6 points7 points  (1 child)

I think a downscaled version is basically essential. By making the minimum download size larger than most people's hard drives, you essentially make this dataset completely inaccessible to hobbyists and amateurs - who are precisely the people that would be most interested in an anime-based dataset from a popular booru.

Even the full-size version of ImageNet is rarely used because it's considered prohibitively large - both in file size, and file resolution. This full-version dataset is even more painful, on both counts, and is an enormous amount to download for a dataset most researchers and users will have to immediately spend considerable computing time downscaling into a size their models can actually use.

I appreciate that you really can't sustainably host the full dataset yourself, such that the only way it can be available to the community in a sustainable long-term fashion is if many other people download and seed it. But I fear that the enormous size of this dataset will not lead to people grudgingly sighing and choosing to download and peer the full size when they would have much preferred a smaller one; it will just lead to them not using your dataset.

[–]gwern[S] 9 points10 points  (0 children)

I think you may be right. There's 2 people here so far who sound like they would've downloaded it if it was smaller like 300GB, and I've gotten another 2 similar comments elsewhere, so it's a fair number of people by the usual rule of thumb of 1%. I've started a 512px rescale job on the server to see how big a converted one (with the "e"/"q" images deleted, since that's the other complaint, having NSFW mixed in) would be.

(Rescaling is actually not that hard; with 8 threads you can chew through Danbooru2017 in a few hours or a day at most. I think it's partially because the bottleneck is often the hard drive especially for writes, but if you're writing out 512px images, of course they're a lot smaller. At under an hour, it's converted ~80k images so far.)

[–]asquared31415 0 points1 point  (0 children)

My primary drive, with the OS and software is a 250GB SSD (I guess it would actually take about 2/3 now that I do the math better), but I have a 2TB HDD for large files and data, and I plan to upgrade my primary drive soon, it’s a couple years old now.

I was just making a comment on how you’re saying “it’s only 8.7%” (I agree that’s a rather small percentage) while many people would look at the raw amount, and not in proportion to the entire data set (150 GB is quite a bit if you don’t work with lots of data on a regular basis)

[–]skgoa 1 point2 points  (0 children)

Yeah, that is a big roadblock. It means I can't play with this dataset on my work computer. A SFW version would be super amazing, though.

[–]mikhael4440 2 points3 points  (0 children)

Nice-u

[–]poctakeover 3 points4 points  (0 children)

baka :/

[–]deeppomf 3 points4 points  (8 children)

Is it possible to only download NSFW images?

[–]gwern[S] 3 points4 points  (6 children)

If you only want NSFW images*, the torrent may not be a good idea. As I pointed out, "e" images make up <10% of the torrent. If you are dealing with such a small subset of images, you are probably better off using an API loop or a tool like DanbooruDownloader; you could then also easily pick up everything uploaded in January & February 2018 and include tags as filters.

* I'm not sure why you would want only NSFW images; surely even for fapping purposes you would want more than just "e" images?

[–]deeppomf 17 points18 points  (5 children)

Some images are restricted to gold accounts only, and I need as much data as possible for my hentai decensoring project: https://github.com/deeppomf/DeepMindBreak

[–]q914847518 5 points6 points  (0 children)

Great project. I am watching it. (๑•̀ㅂ•́)و✧

[–]gwern[S] 1 point2 points  (3 children)

I see. That does make sense. But couldn't you treat that as a subset of general inpainting and use all images with random sections cropped out during training?
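The random-crop variant would amount to generating synthetic training masks, roughly like this sketch (the size fractions are arbitrary choices, not anything from the thread):

```python
import random

def random_mask(width, height, min_frac=0.1, max_frac=0.3, rng=random):
    """Pick a random axis-aligned rectangle (x, y, w, h) to blank out of an
    image for inpainting training; each side covers between min_frac and
    max_frac of the corresponding image dimension."""
    w = rng.randint(int(width * min_frac), int(width * max_frac))
    h = rng.randint(int(height * min_frac), int(height * max_frac))
    x = rng.randint(0, width - w)
    y = rng.randint(0, height - h)
    return x, y, w, h
```

At train time, the network sees the image with that rectangle zeroed out and is scored on reconstructing the original pixels inside it.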

[–]MrEldritch 5 points6 points  (1 child)

Given that the subset of things that'd be covered by censor bars in an anime image that users would want to decensor is quite limited, I'm not sure that adding a large proportion of non-lewd artificial censors would actually improve quality. You don't want the inpainter thinking that perfectly SFW stuff is a likely candidate for what's under a censor bar.

[–]gwern[S] 4 points5 points  (0 children)

I was thinking in terms of regularization and global knowledge - even for deeppomf's rather specific use-case, I would expect a CNN to benefit from understanding the overall geometry of the scene, what characters and genders are present, the image style and colors, and what the non-genital parts of the censored-out box are (it's not an exact outline of a penis or whatever, after all, it'll cover SFW parts of clothing or bodies as well).

[–]deeppomf 1 point2 points  (0 children)

Resolution and content are issues. Many images are too large, and downscaling them loses too much detail. And since NSFW images make up a small proportion of all images, the learning the GAN does while inpainting SFW images probably interferes with its inpainting of NSFW ones. (I don't have proof of this, but that's my hunch. It's like trying to inpaint faces, except images of other body parts are included in the training data. The other body parts aren't improving face inpainting.)

To solve these problems, I use cropped images of uncensored penises and vaginas for training data.

[–]MrEldritch 0 points1 point  (0 children)

Not directly, but there is a potential solution. The dataset is broken up into ten smaller torrents that each contain 1/10th of the dataset. You could download the dataset 1/10th at a time, delete all the images not tagged as 'explicit', and repeat with the next tenth. That should allow you to end up with only the NSFW images, without filling up your hard drive.

[–]Silver_Sky 0 points1 point  (0 children)

What are the best torrent clients now? Bittorrent has been my main client for a while, but it seems to have major trouble with these large torrents. Thanks for any recs.

[–][deleted]  (4 children)

[deleted]

    [–]gwern[S] 0 points1 point  (3 children)

    Works fine for me and http://downforeveryoneorjustme.com/gwern.net (Being down would be weird as it's just a static site on S3 cached by Cloudflare, there's not really anything to be down.)

    [–][deleted]  (2 children)

    [deleted]

      [–]gwern[S] 0 points1 point  (1 child)

      I don't run Danbooru, Albert does. But look on the bright side, at least my website & torrent are still working?

      [–]girlyman1 0 points1 point  (0 children)

      Well, it's no big deal for me anyway lol, it's fine

      [–]fosa2 0 points1 point  (3 children)

      Where do we download the tags dataset? No way to get the tags database for just the 300m SFW images?

      [–]gwern[S] 0 points1 point  (0 children)

      The tags dataset is in the tarball which is included in the first torrent of both the full and the SFW subset, so if you download either, you should have it. If you don't want to download any images, you can sign up on BigQuery and dump it - it's a bit of a nuisance since you have to provide a CC and jump through a few hoops saving it to your Google account and then downloading it, but shouldn't take more than half an hour. In both cases, you are getting the full tag dataset, there being no point in providing just the SFW subset - if you look up by image ID, by definition you'll never hit the NSFW records, and if you're querying the JSON record by record, you can just throw in a rating==1 or whatever clause. (Although I think you can probably have BigQuery do a SQL filter before dumping.)

      [–]gwern[S] 0 points1 point  (1 child)

      I've set up a rsync server so you can download just the tag tarball now if you want: see https://www.reddit.com/r/MachineLearning/comments/7xk4zh/p_danbooru2017_a_new_dataset_of_294m_anime_images/dvvf8yo/

      [–]fosa2 0 points1 point  (0 children)

      Thank you very much! I was able to use the original 1.9TB link to grab the tags, but your effort with this anime dataset is very much appreciated!

      [–]inkplay_ 0 points1 point  (2 children)

      I am new at this - how do you download this? The torrent either doesn't work with most clients or has no seeds in the Windows version of Transmission.

      [–]gwern[S] 0 points1 point  (1 child)

      Use a client that works, I guess. If you can't figure it out, I can give you SSH access to my seedbox and you can rsync it down.

      [–]inkplay_ 0 points1 point  (0 children)

      Transmission works - the torrent loads, but there are no peers so the speed stays at zero. I'll try Transmission again; maybe it's my network.

      [–]Arias-go 0 points1 point  (1 child)

      An amazing and exciting dataset - that's exactly what I am looking for. By the way, I cannot download even part of the dataset with the SFW torrent files. No users are uploading...

      [–]gwern[S] 0 points1 point  (0 children)

      More client issues, I assume. As far as I can tell, the SFW torrents are working fine (I just logged in and I can see 5 of them in rtorrent seeding a collective 100kb/s up). As usual, if you can't figure out your torrent problem, I can give you a SSH login and you can just rsync/scp it down.

      [–]unguided_deepness -1 points0 points  (1 child)

      Jack off and do some machine learning at the same time, what a great idea!

      [–]poctakeover -1 points0 points  (0 children)

      weeb neural networks

      [–]unguided_deepness -5 points-4 points  (8 children)

      I propose that the model trained from this dataset be called "Incelnet"

      [–]MrEldritch 17 points18 points  (7 children)

      Don't be ridiculous. It'd be DeepWeeb, and the ArXiv submission title would be some kind of strained pun about the "deep web"

      [–]gwern[S] 14 points15 points  (4 children)

      I feel that given the open questions about style transfer, the first model should probably be called 'VeGGeta' and explain how the architecture allows Inception scores of over 9000.

      [–]MrEldritch 18 points19 points  (3 children)

      [–]EliezerYudkowsky 13 points14 points  (2 children)

      That is the greatest acronym I've seen in three years.

      [–]silverius 5 points6 points  (0 children)

      So what did you see on February 15th 2015?

      [–]MrEldritch 5 points6 points  (0 children)

      S-senpai noticed me...

      [–]quanticle 4 points5 points  (1 child)

      You're both wrong. It should be called "hentAI".

      EDIT: Or perhaps SenpAI

      [–]wakeshima 5 points6 points  (0 children)

      SenpAI trained on SFW images only, hentAI trained on the full dataset ( ͡° ͜ʖ ͡°)