Crops (Link Bibliography)

“Crops” links:

  1. “This Waifu Does Not Exist”, Gwern Branwen (2019-02-19):

    “This Waifu Does Not Exist” (TWDNE) is a static website which uses JS to display random anime faces generated by StyleGAN neural networks, along with GPT-3-generated anime plot summaries. Followups: TFDNE/TPDNE/TADNE.

    A screenshot of “This Waifu Does Not Exist” (TWDNE) showing a random StyleGAN-generated anime face and a random GPT-3 text sample conditioned on anime keywords/phrases.
  2. Faces

  3. TWDNE

  4. https://github.com/jerryli27/AniSeg/

  5. Faces#cropping

  6. Faces#discriminator-ranking

  7. “Practical aspects of StyleGAN2 training”, l4rz (2020-04-28):

    I have trained StyleGAN2 from scratch with a dataset of female portraits at 1024px resolution. The sample quality was further improved by tuning the parameters and augmenting the dataset with zoomed-in images, allowing the network to learn more details and to achieve metrics that are comparable to the results of the original work…I was curious how it would work on the human anatomy, so I decided to try to train SG2 with a dataset of head and shoulders portraits. To alleviate capacity issues mentioned in the SG2 paper I preferred to use portraits without clothes (a substantial contributing factor to dataset complexity); furthermore, the dataset was limited to just one gender in order to further reduce the dataset’s complexity.

    …I haven’t quite been able to achieve the quality of SG2 trained with the FFHQ dataset. After more than 30000 kimg, the samples are not yet as detailed as desired. For example, teeth look blurry and pupils are not perfectly round. Considering the size of my dataset as opposed to the FFHQ one, the cause is unlikely to be the lack of training data. Continuing the training does not appear to help, as is evident from the plateau in FIDs.

    Overall, my experience with SG2 is well in line with what others are observing. Limiting the dataset to a single domain leads to major quality improvements. SG2 is able to model textures and transitions quite well. At the same time it is struggling as the complexity of the object increases with, for instance, greater diversity in poses. It should be noted that SG2 is much more efficient for single domain tasks compared to other architectures, resulting in acceptable results much faster.

    Curated samples, Ψ = 0.70
  8. https://github.com/tensorflow/models/tree/master/research/object_detection

  9. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (2015-06-04):

    State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features—using the recently popular terminology of neural networks with ‘attention’ mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

  10. https://github.com/nagadomi/waifu2x

  11. “YOLOv3: An Incremental Improvement”, Joseph Redmon, Ali Farhadi (2018-04-08):

    We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320×320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 mAP@50 in 51 ms on a Titan X, compared to 57.5 mAP@50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at https://pjreddie.com/yolo/
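The “.5 IOU” metric mentioned in the abstract above counts a predicted box as correct when its intersection-over-union with a ground-truth box is at least 0.5. A minimal sketch of that overlap test, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples; the function names `iou` and `is_match` are illustrative and not taken from the YOLO code:

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True if the prediction overlaps the ground truth enough to count as a hit."""
    return iou(pred, truth) >= threshold
```

Full mAP additionally ranks detections by confidence and averages precision over recall levels and classes; this sketch covers only the per-box matching step.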

  12. BigGAN#danbooru2019e621-256px-biggan

  13. “This Anime Does Not Exist”, Nearcyan, Aydao, Shawn Presser, Gwern Branwen (Tensorfork) (2021-01-19):

    [Website demonstrating samples from a modified StyleGAN2 trained on Danbooru2019 using TPUs for ~5m iterations for ~2 months on a pod; this modified ‘StyleGAN2-ext’ removes various regularizations which make StyleGAN2 data-efficient on datasets like FFHQ but hobble its ability to model complicated images, and scales the model up >2×. This is surprisingly effective given StyleGAN’s previous inability to approach BigGAN’s Danbooru2019, and TADNE shows off the entertaining results.

    The interface reuses Said Achmiz’s These Waifus Do Not Exist grid UI.

    Writeup; see also: Colab notebook to search by CLIP embedding; TWDNE/TFDNE/TPDNE, TADNE face editing, CLIP-guided ponies]

    Screenshot of “This Anime Does Not Exist” infinite-scroll website.
  14. #danbooru2019-portraits

  15. #danbooru2019-figures

  16. https://github.com/AlexeyAB/darknet

  17. “This Fursona Does Not Exist”, Arfafax (2020-05-07):

    A showcase: high-quality StyleGAN2-generated furry (anthropomorphic animal) faces, trained on faces cropped from the e621 furry image booru. For higher quality, the creator heavily filtered faces and aligned them, and upscaled using waifu2x. For display, it reuses Obormot’s “These Waifus Do Not Exist” scrolling grid code to display an indefinite number of faces rather than one at a time. (TFDNE is also available on Artbreeder for interactive editing/crossbreeding, and a for Ganspace-based editing.)

    9 random TFDNE furry face samples in a grid

    Model download mirrors:

    • Google Drive

    • Mega

    • Rsync:

      rsync --verbose rsync://176.9.41.242:873/biggan/2020-05-06-arfafax-stylegan2-tfdne-e621-r-512-3194880.pkl.xz ./

    Previously, TWDNE; later: the My Little Pony-themed followup TPDNE, and TADNE.

  18. “This Pony Does Not Exist”, Arfafax (2020-07):

    “This Pony Does Not Exist” (TPDNE) is the followup to TFDNE, also by Arfafax. He scraped the Derpibooru My Little Pony: Friendship is Magic image booru, hand-annotated images and trained a pony face YOLOv3 cropper to create a pony face crop dataset, and trained the TFDNE StyleGAN 2 model to convergence on TensorFork pods, with an upgrade to 1024px resolution via transfer learning/model surgery. The interface reuses Said Achmiz’s These Waifus Do Not Exist grid UI.

    10 random pony samples from TPDNE; see also Derpibooru uploads from TPDNE.

    The S2 model snapshot is available for download and I have mirrored it (rsync rsync://176.9.41.242:873/biggan/2020-07-15-arfafax-stylegan2-thisponydoesnotexist-1024px-iter151552.pkl ./). See also: ⁠/​​​​

  19. https://danbooru.donmai.us/posts?tags=hands+rating%3As

  20. 2020-06-08-danbooru2019-palm-handannotations-export.jsonl

  21. https://colab.research.google.com/drive/1xLAsfefpG1Hom7JZYbvdROHugMBNBrmR

  22. https://mega.nz/file/LXoiCAoB#ut9B3kNHVDPKR_Ih1WIwak3gWJQTH_eW4VY6PwYAveA

  23. https://github.com/arfafax/MLP-Face-Dataset

  24. “E621 Face Dataset”, Arfafax (2020-02-18):

    Tool for getting the dataset of cropped faces from [furry booru] e621 (NSFW; WikiFur description). It was created by training a network on annotated facial features from about 1500 faces.

    The total dataset includes ~186k faces. Rather than provide the cropped images, this repo contains CSV files with the bounding boxes of the detected features from my trained network, and a script to download the images from e621 and crop them based on these CSVs.

    The CSVs also contain a subset of tags, which could potentially be used as labels to train a conditional GAN.

    Files:

    • get_faces.py: Script for downloading base e621 files and cropping them based on the coordinates in the CSVs.

    • faces_s.csv: CSV containing URLs, bounding boxes, and a subset of the tags for 90k cropped faces with rating = safe from e621.

    • features_s.csv: CSV containing the bounding boxes for 389k facial features with rating = safe from e621.

    • faces_q.csv: CSV containing URLs, bounding boxes, and a subset of the tags for 96k cropped faces with rating = questionable from e621.

    • features_q.csv: CSV containing the bounding boxes for 400k facial features with rating = questionable from e621.

    Preview grid
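The download-and-crop step described for the CSVs above might look roughly like the following sketch. It is not the repository's `get_faces.py`: the column names (`url`, `xmin`, `ymin`, `xmax`, `ymax`), the sample rows, and the `margin` parameter are all assumptions made for illustration, and the real CSV schema may differ. Only the bounding-box arithmetic is shown; downloading is left out.

```python
import csv
import io
from typing import Dict, Iterator, Tuple

# Hypothetical CSV layout; the real faces_s.csv columns may be named differently.
SAMPLE_CSV = """url,xmin,ymin,xmax,ymax
https://example.invalid/a.png,10,20,110,140
https://example.invalid/b.png,-5,0,50,60
"""


def crop_box(row: Dict[str, str], image_size: Tuple[int, int],
             margin: float = 0.0) -> Tuple[int, int, int, int]:
    """Turn one CSV row into a (left, upper, right, lower) box clamped to the image.

    `margin` pads the box by a fraction of its width/height on each side.
    """
    width, height = image_size
    xmin, ymin, xmax, ymax = (float(row[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
    pad_x = (xmax - xmin) * margin
    pad_y = (ymax - ymin) * margin
    left = max(0, int(round(xmin - pad_x)))
    upper = max(0, int(round(ymin - pad_y)))
    right = min(width, int(round(xmax + pad_x)))
    lower = min(height, int(round(ymax + pad_y)))
    return (left, upper, right, lower)


def iter_crop_boxes(csv_text: str, image_size: Tuple[int, int],
                    margin: float = 0.0) -> Iterator[Tuple[str, Tuple[int, int, int, int]]]:
    """Yield (url, box) pairs for every row of the CSV text."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield row["url"], crop_box(row, image_size, margin)
```

The returned tuple uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts, so each box could be passed to it directly once the image at the corresponding URL has been downloaded and its real size substituted for `image_size`.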