all 35 comments

[–]artr0x 18 points19 points  (2 children)

Finally DL is used for something truly meaningful

[–]KichangKim[S] 2 points3 points  (1 child)

I hope you enjoy this :)

Although the system was trained on anime-style images, it can also tag some real photos to an extent (especially cosplay).

[–]gwern 2 points3 points  (0 children)

If you want good performance on photographs as well, one thing you could try is switching to ResNet-101 and then retraining the Tencent pretrained photo models: https://www.reddit.com/r/MachineLearning/comments/adv8jl/r_tencent_mlimages_a_largescale_multilabel_image/ The final trained model will probably work about as well on anime images, but work better on real photos.

[–]gwern 7 points8 points  (6 children)

Could you tell us some more? What sort of ResNet? What sort of accuracy/precision do you get? How many images were used? What have you used it for?

[–]KichangKim[S] 3 points4 points  (5 children)

Here is more detailed information.

  • What sort of ResNet?
  • What sort of accuracy/precision do you get?
    • In fact, I did not measure accuracy/precision on a test dataset; all of the data was used for training. The initial training loss was about 4000 (binary cross-entropy), and it eventually converged to about 68. Model selection and hyper-parameter tuning were done by eye, checking the results on my favorite images manually.
  • How many images were used?
    • The total dataset is 3,300,000 images, but after some filtering (dropping images with too few tags), about 1,900,000 images were actually used.
    • The total training sample count is 18,000,000. Input images are augmented by scaling (0.9 ~ 1.1), rotation (0 ~ 360) and translation (0.9 ~ 1.1 of width or height).
  • What have you used it for?
    • I trained the model with the CNTK Python library. Input images are transformed to 299x299x3 with PIL. The web interface is implemented with CNTK Python + waitress + click, but the Telegram bot is implemented with the CNTK C# library and uses OpenCV for image transforms. So the evaluation results differ slightly between the web interface and the Telegram bot because of the different image-transform routines.
    • All input images and tags were obtained from the site called Danbooru. (That is why I named my system DeepDanbooru.)
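For readers wondering why a summed binary cross-entropy would start around 4000: with roughly 6000 tags treated as independent sigmoid outputs, an untrained network contributes about ln 2 per tag. A pure-Python sketch; the tag count and loss code here are illustrative assumptions, not the author's actual CNTK implementation:

```python
import math

def summed_bce(predictions, labels):
    """Binary cross-entropy summed over every tag output."""
    eps = 1e-7
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total

# An untrained network emits ~0.5 from each sigmoid; with ~6000 tags the
# summed loss starts near 6000 * ln(2) ~= 4159, matching the reported ~4000.
n_tags = 6000
labels = [1.0 if i < 20 else 0.0 for i in range(n_tags)]  # ~20 true tags
print(round(summed_bce([0.5] * n_tags, labels)))  # -> 4159
```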

[–]gwern 2 points3 points  (4 children)

3.3m from Danbooru - so you're using Danbooru2018 for the dataset? Did you run into any particular problems with that?

The initial loss of training session is about 4000 (binary cross entropy), then finally it is optimized to about 68.

Interesting. I'm not quite sure how to interpret that, but uploading a few images to test it out, the tags it gives look reasonable. (A comparison with illustration2vec would be interesting but I suspect illustration2vec would be a lot worse.) It is not displaying 6000 tag estimates, so I assume you have a confidence cut off at ~0.60?

What have you used it for?

By 'it', I meant the trained neural network. I was wondering if you had some intended goal for it like fixing Danbooru tags or automatically organizing a personal collection of wallpapers or generating embeddings for image search etc.

[–]KichangKim[S] 1 point2 points  (3 children)

3.3m from Danbooru - so you're using Danbooru2018 for the dataset? Did you run into any particular problems with that?

I did not use the Danbooru2018 dataset. (Actually, I did not know about it before your comment.) I downloaded the images with my own toolset and the Danbooru API, so the dataset does not include images that require a gold account.

It is not displaying 6000 tag estimates, so I assume you have a confidence cut off at ~0.60?

You are correct. The system shows only tags with a score > 0.5.
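The cutoff described here is just a per-tag threshold on sigmoid scores; a minimal sketch, with invented tag names and scores rather than actual DeepDanbooru output:

```python
def filter_tags(scores, threshold=0.5):
    """Return tags with score above the threshold, highest score first."""
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return {tag: score for tag, score in ranked if score > threshold}

# Example scores (invented for illustration):
scores = {"1girl": 0.98, "long_hair": 0.87, "smile": 0.51, "hat": 0.12}
print(filter_tags(scores))  # -> {'1girl': 0.98, 'long_hair': 0.87, 'smile': 0.51}
```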

By 'it', I meant the trained neural network. I was wondering if you had some intended goal for it like fixing Danbooru tags or automatically organizing a personal collection of wallpapers or generating embeddings for image search etc.

This project is a proof of concept, so I do not have a specific goal at this time. But I think the system could be used for the various purposes you mention; I think an image recommendation system would also work well with it.

[–]gwern 2 points3 points  (2 children)

I did not use the Danbooru2018 dataset. (Actually, I did not know about it before your comment.) I downloaded the images with my own toolset and the Danbooru API.

Ah. That's too bad, it might've saved you a lot of work. Guess I need to advertise Danbooru2017/8 more heavily if people are still making their own mirrors...

So the dataset does not include images that require a gold account.

As I understand it, because I upgraded my bot account, it should've grabbed the various hidden images. (I don't know whether this really matters since I suspect that you are more limited by tag quality or compute than sheer number of images.)

This project is a proof of concept.

I see. Well, it's a good proof-of-concept. There are lots of additional projects you could do to build on it. Please tell us about your future work, either here or on /r/animeresearch .

[–]Skylion007 4 points5 points  (6 children)

You could probably get it to run entirely in the browser if you export it from CNTK to ONNX.js: https://github.com/Microsoft/onnxjs . Might be worth looking into since you could then use static hosting. :)

[–]gwern 1 point2 points  (4 children)

A ResNet-152 is >200MB. It'll take some serious model compression to get that down into something you want to serve to every visitor and run in-browser.
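As a back-of-the-envelope check (assuming ~60M parameters for ResNet-152, an approximate figure, and float32 weights):

```python
def model_size_mb(n_params, bytes_per_param=4):
    """Approximate serialized size of a model in megabytes."""
    return n_params * bytes_per_param / 1e6

# ResNet-152 has roughly 60M parameters; stored as float32 that is ~240 MB,
# consistent with the ">200MB" figure. Int8 quantization would cut it ~4x,
# still a heavy download per page load.
print(round(model_size_mb(60_000_000)))  # -> 240
```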

[–]Skylion007 0 points1 point  (3 children)

I've loaded 100MB DL models and classical vision models in the browser before. It's not too much of a concern. A Github Pages repo would give you at least a gig of hosting space for instance.

[–]gwern 0 points1 point  (2 children)

Even if the hoster has all the bandwidth in the world to burn (100MB per page load adds up fast, especially if the Danbooru community were to start using it seriously), it's not a great idea to send people to such a page - lots of people have bandwidth limits, shared bandwidths, older browsers, or other issues.

[–]KichangKim[S] 0 points1 point  (1 child)

Hosting the network behind an actual user interface is my main concern. Currently the DeepDanbooru model is > 500MB, and it is hosted on my notebook PC with CNTK (CPU-only version). Cloud services (AWS, Azure, and the like) are too expensive for a hobby project :)

[–]Skylion007 0 points1 point  (0 children)

That's why you could use GitHub Pages or the like as free hosting. There are tons of ways to get free hosting for static files, even rather large ones. That way you could also do all the image processing client-side and not have to worry about uploading anything to your server.

[–]KichangKim[S] 0 points1 point  (0 children)

onnxjs looks cool. I'll add it to my improvement plan.

[–]progfu 2 points3 points  (1 child)

May I ask why CNTK? I haven't really seen anyone use it, apart from random Microsoft projects on github.

[–]KichangKim[S] 2 points3 points  (0 children)

It is entirely my personal preference. Also, it supports C# natively, and I like C#. (But training is done in Python :) )

[–]KichangKim[S] 0 points1 point  (0 children)

A "Use cropping" option is available on the DeepDanbooru web interface. If it is enabled, the input image is automatically cropped into multiple parts during evaluation. You can get more tags, but they may be less precise (and it takes more processing time, ~x6).
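One plausible way to implement multi-crop evaluation like this is to score the full image plus overlapping square windows and union the predicted tags. The box layout below is a guess for illustration, not the site's actual routine:

```python
def crop_boxes(width, height):
    """Full-image box plus overlapping square crops along the longer axis.

    Hypothetical sketch of the "use cropping" idea: each box would be
    resized to the network input (e.g. 299x299) and evaluated separately,
    with the predicted tags unioned across boxes.
    """
    boxes = [(0, 0, width, height)]  # evaluate the whole image first
    side = min(width, height)        # square crop side length
    longer = max(width, height)
    steps = max(1, round(longer / side) * 2 - 1)  # ~50% overlap between crops
    for i in range(steps):
        offset = 0 if steps == 1 else i * (longer - side) // (steps - 1)
        if width >= height:
            boxes.append((offset, 0, offset + side, side))
        else:
            boxes.append((0, offset, side, offset + side))
    return boxes

# A wide 900x300 image yields 1 full-image pass + 5 crops = 6 evaluations,
# in line with the ~x6 processing time mentioned above.
print(len(crop_boxes(900, 300)))  # -> 6
```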

[–]Gray_Gryphon 0 points1 point  (1 child)

Out of curiosity, does your training set include copyright/artist tags? I've noticed it can occasionally tag characters (even if it's not that good at recognizing them), but haven't seen any copyright or artist tags appear for any of the pictures I've uploaded.

[–]KichangKim[S] 0 points1 point  (0 children)

It does not have copyright/artist tags. Only general and character tags are used.

[–]zyddnys 0 points1 point  (7 children)

Will you make the trained model available for download?

[–]KichangKim[S] 0 points1 point  (5 children)

Not yet. I'll release the model when I find a solution for large-file hosting.

[–]gwern 0 points1 point  (0 children)

I've found Mega.nz good for hosting NN models so far.

[–]MrEldritch 0 points1 point  (0 children)

I'm also interested in the trained model - I think it would be a good starting point for retraining on e621 for tag estimation there.

[–]MrEldritch 0 points1 point  (0 children)

I'm going to second the recommendation for MEGA.

[–]binkarus 0 points1 point  (1 child)

How big is the model? I can probably host it for you.

[–]KichangKim[S] 0 points1 point  (0 children)

Hi, I uploaded my pretrained model to Google Drive. You can download it now. See the original post.

[–]KichangKim[S] 0 points1 point  (0 children)

I have uploaded the model. Check the original post.


[–]halcy 0 points1 point  (0 children)

For people who want to play with this in TensorFlow, here's the v2 model converted to TensorFlow using MMdnn (hopefully properly; I had to patch up quite a few things along the way, but the model seems to work fine):

Model: https://drive.google.com/file/d/1gq7QFVzBxzw-jGiKB1siJzvxg5mGuUH6/view?usp=sharing

edit: https://drive.google.com/open?id=12uzze66YirvUAccZDfZNsEq34No4u7Fo and the tag list (same as in the original download). Example IPython notebook: https://drive.google.com/file/d/1dM5SinN3ppGY67R6IuTksMmGy2WrDiIf/view

[–]artpnp 0 points1 point  (2 children)

Hello.

I was very glad to find your Reddit post on Google.

I am the webmaster of an anime picture site. We have always hoped for an API that could do automatic tagging, but since some of the images involve NSFW content, no company is willing to train the auto-tagging API we want, not even Imagga.

I am very happy that you provide a download of the model. It's really helpful.

What I want to ask is: if we provide the dataset, could you help us train a similar auto-tagging model?

We can pay for this.

My email is [artpnp01@gmail.com](mailto:artpnp01@gmail.com)

If you are interested in continuing the discussion, please reply on Reddit or contact us by email.

Thank you.

[–]KichangKim[S] 0 points1 point  (1 child)

Hi. DeepDanbooru is my hobby project, and I don't have any plan to provide paid technical support. But I will open-source my training code on GitHub soon, so you can use it for your project and train your own network.

[–]artpnp 0 points1 point  (0 children)

OK, got it.

It will be a really good project that helps people train their own networks.

I will keep following you on Reddit.

Thank you.