Image Synthesis from Yahoo's open_nsfw

Warning: This post contains abstract depictions of nudity and may be unsuitable for the workplace

Yahoo's recently open-sourced neural network, open_nsfw, is a fine-tuned Residual Network which scores images on a scale of 0 to 1 on their suitability for use in the workplace. In the documentation, Yahoo notes:

Defining NSFW material is subjective and the task of identifying these images is non-trivial. Moreover, what may be objectionable in one context can be suitable in another.

What makes an image NSFW, according to Yahoo? I explore this question with a clever new visualization technique by Nguyen et al. Like Google's Deep Dream, this visualization trick works by maximally activating certain neurons of the classifier. Unlike Deep Dream, we optimize these activations by performing descent on a parameterization of the manifold of natural images. This parameterization takes the form of a generative network, G, trained adversarially on an unrelated dataset of natural images.

The "space of natural images", according to this generative network, looks mostly like abstract art. Unsurprisingly, these random pictures, lacking any kind of semantics, have low scores on the classifier.


[Images: five random samples from the generative network, scoring D(x) = 0.003, 0.06, 0.07, 0.02 and 0.003.]

NSFW Images

Following Nguyen et al., we perform projected gradient descent over the input of the generative network, searching for the generated image x that maximally activates the classifier's score D(x). Not surprisingly, the results of the optimization are clearly pornographic.
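To make the optimization loop concrete, here is a minimal sketch of projected gradient ascent on log D(G(z)), which is the same as descent on its negative. Everything in it is an assumption for illustration: the toy generator `G`, the logistic classifier `D`, the box constraint on the code `z`, and the step size are stand-ins, not the networks or settings actually used in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a fixed random "generator" and a logistic "classifier".
W = rng.normal(size=(16, 4))   # generator weights: code of size 4 -> "image" of size 16
w = rng.normal(size=16)        # classifier weights

def G(z):
    """Toy generator: maps a latent code to an image-like vector."""
    return np.tanh(W @ z)

def D(x):
    """Toy classifier: a score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

def grad_log_D(z):
    """Gradient of log D(G(z)) with respect to z, via the chain rule."""
    x = G(z)
    dlogD_dx = (1.0 - D(x)) * w            # derivative of log sigmoid(w . x) w.r.t. x
    dx_dz = (1.0 - x ** 2)[:, None] * W    # Jacobian of tanh(W z), shape (16, 4)
    return dx_dz.T @ dlogD_dx

def project(z, lo=-1.0, hi=1.0):
    """Projection onto the box [lo, hi]^n (an assumed constraint set)."""
    return np.clip(z, lo, hi)

def maximize_score(z0, step=0.05, iters=300):
    """Projected gradient ascent, returning the best-scoring iterate seen."""
    z = project(z0)
    best_z, best_score = z, D(G(z))
    for _ in range(iters):
        z = project(z + step * grad_log_D(z))   # gradient step, then project back
        score = D(G(z))
        if score > best_score:
            best_z, best_score = z, score
    return best_z

z0 = rng.uniform(-1.0, 1.0, size=4)
z_star = maximize_score(z0)
print("score before:", D(G(z0)), "score after:", D(G(z_star)))
```

With real networks the gradient would come from backpropagation through both models rather than from a hand-written chain rule; the projection step is what keeps the code inside the region the generator was trained on.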


[Images: twenty optimized samples, every one scoring D(x) = 1.]

SFW Images

On the other end of the spectrum, optimizing for SFW images seems redundant, as you might expect SFW to be just the absence of NSFW content. If this were the case, one would expect most images to score close to 0. This is only kinda true. The random images shown earlier score between 0.003 and 0.07. This is small, but not 0. We will try to push this down further by descending on the score itself, in the exact same way as above.

Images which minimize this score all have a distinct pastoral quality - depictions of hills, streams and generally pleasant scenery. This is likely an artifact of the negative examples used in the training set.


[Images: twenty optimized samples. Scores D(x), row by row:]
1e-9, 4e-13, 3e-11, 3e-11, 3e-11
1e-9, 1e-10, 7e-10, 2e-10, 3e-12
8e-9, 5e-11, 1e-12, 2e-14, 1e-9
5e-11, 5e-11, 3e-9, 9e-12, 1e-9

Synthesizing Pareidolia

Let's take this even further by stripping a layer off this network. The final score, D(x), is in fact calculated from the relative strength of two independent neurons, an "SFW" neuron and an "NSFW" neuron. This explains the phenomenon above: the SFW neuron gets excited at the sight of rolling hills and running brooks, and the excitations of the NSFW neuron correlate with, well, pornography. The classifier takes in both these expert opinions and combines them democratically with a softmax to get the final score. Since most pornography does not take place with a Thomas Kinkade painting in the background, this is a fair heuristic for most real-world problems. But what happens if we try to excite both neurons simultaneously? This amounts to minimizing an objective that rewards both activations at once.

Surprisingly, in my experiments, the relative strength of the NSFW neuron still dominates. However, there is enough of a contribution from the SFW neuron to produce images of a very different flavor.
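The softmax combination described above depends only on the gap between the two activations, which is why exciting both neurons at once can still leave the score pinned near 1. A small sketch, where the function name `nsfw_score` and the ordering of the two neurons are my own assumptions rather than anything taken from open_nsfw:

```python
import numpy as np

def nsfw_score(sfw_activation, nsfw_activation):
    """Two-way softmax: the share of probability given to the NSFW neuron."""
    logits = np.array([sfw_activation, nsfw_activation], dtype=float)
    logits -= logits.max()                    # shift by the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs[1]

print(nsfw_score(0.0, 0.0))     # equal activations
print(nsfw_score(0.0, 10.0))    # NSFW neuron ahead by 10
print(nsfw_score(30.0, 40.0))   # both strongly excited, same gap of 10
```

The last two calls return the same value, since adding a constant to both activations leaves the softmax unchanged. An image can therefore excite the SFW neuron strongly and still receive a high score, as long as the NSFW neuron stays ahead.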


[Images: twenty optimized samples. Eighteen score D(x) = 1; the remaining two score 0.009 and 0.03.]

Spurred on by the success above, I explore the possibility of generating images whose activations span two different networks. Nguyen et al. have achieved great results on the MIT scene recognition model places-CNN. What happens when we maximize neurons of places-CNN and open_nsfw together?

For each scene category, places-CNN reports its belief that an image belongs to that category. The categories are drawn from a fixed set of labels, such as "marketplace" or "abbey". We perform descent on a linear combination of two objectives: this belief for a chosen category, and the open_nsfw score.

(This description isn't strictly correct, and needs one more tweak for this to work. For details of the optimization, I refer you to the code.)
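As a rough illustration of what a linear combination of two objectives can look like, here is a sketch that adds the negative log of the scene belief to a weighted negative log of the NSFW score. The use of logarithms, the `weight` parameter and the function name `combined_loss` are all assumptions of mine, and this plain version does not include the extra tweak mentioned above.

```python
import numpy as np

def combined_loss(place_probs, category, nsfw_prob, weight=1.0, eps=1e-12):
    """Weighted sum of two negative log-beliefs; lower means both are more satisfied."""
    place_term = -np.log(place_probs[category] + eps)   # scene classifier's belief
    nsfw_term = -np.log(nsfw_prob + eps)                # NSFW classifier's score
    return place_term + weight * nsfw_term

# Made-up beliefs over three scene categories, for illustration only.
place_probs = np.array([0.7, 0.2, 0.1])

print(combined_loss(place_probs, category=0, nsfw_prob=0.9))
print(combined_loss(place_probs, category=0, nsfw_prob=0.1))
```

Raising `weight` pushes the optimizer toward the NSFW objective at the expense of the scene, and setting it to zero recovers plain scene synthesis.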

This program produces the most remarkable results. The images generated range from the garishly explicit to the subtle. But the subtle images are the most fascinating because, to my surprise, they are only seemingly innocent. These are not adversarial examples per se. The NSFW elements are all present, just hidden in plain sight. Once you see the true nature of these images, something clicks and it becomes impossible to unsee. I've picked a few of my favorite results to show here.

Beach

SFW
D(x) = 0.007, 0.04, 0.008, 0.01, 0.008

NSFW
D(x) = 1, 1, 1, 1, 1

Canyon

SFW
D(x) = 0.008, 0.0001, 0.02, 0.008, 0.02

NSFW
D(x) = 1, 0.6, 1, 1, 1

Concert

SFW
D(x) = 0.002, 0.002, 0.002, 0.006, 0.0003

NSFW
D(x) = 0.9, 1, 1, 0.9, 1

Gallery

SFW
D(x) = 0.01, 0.04, 0.04, 0.05, 0.03

NSFW
D(x) = 1, 1, 1, 1, 1

Coral Reef

SFW
D(x) = 0.01, 0.003, 0.008, 0.01, 0.006

NSFW
D(x) = 0.9, 1, 1, 1, 1

Desert

SFW
D(x) = 0.06, 0.1, 0.5, 0.9, 0.7

NSFW
D(x) = 1, 1, 1, 1, 1

Museum

SFW
D(x) = 0.01, 0.01, 0.03, 0.03, 0.03

NSFW
D(x) = 1, 1, 1, 1, 1

Tower

SFW
D(x) = 0.1, 0.4, 0.0008, 0.004, 0.0007

NSFW
D(x) = 1, 1, 1, 1, 1

Volcano

SFW
D(x) = 0.04, 0.05, 0.03, 0.02, 0.004

NSFW
D(x) = 0.9, 0.5, 1, 1, 1

The generative capacity of convolutional neural nets is, quite simply, remarkable.

If you liked this project, say hi here. And you can view my badly commented code for the second part of this project here. You will need this library, and of course, open_nsfw to run it. I trust you'll figure the rest out.

If you really want to, you can follow me on twitter.

Follow @gabeeegoooh

This is my website