all 28 comments

[–][deleted]  (3 children)

[deleted]

    [–]gwern[S] 24 points  (2 children)

    I'm afraid I'm to blame for that pun.

    [–]wassname 1 point  (0 children)

    on the gripping hand

    And a "Mote in God's Eye" reference, you monster :p

    [–]Imnimo 7 points  (3 children)

    I wonder if you could tease out some of the differences between architectures by using randomly initialized, untrained networks to extract content and style. They wouldn't be as good as trained networks (although it'd be cool if they were!), but it'd let you test a wide variety of architectures without having to run really expensive training.

    [–]gwern[S] 8 points  (2 children)

    Since a random untrained CNN can do image completion somewhat credibly, it wouldn't surprise me, but it would be hard to draw conclusions from random-CNN performance about what the trained networks might be learning or how they generalize.

    [–]kkastner 5 points  (1 child)

    They also work extremely well for texture generation; see the discussion here of using style-transfer-type techniques.

    [–]allicisred 0 points  (0 children)

    Prefer telegram or discord?

    [–]darkconfidantislife 5 points  (4 children)

    I'd like to clarify my hypothesis (#1) somewhat: I meant that VGG is using its huge number of parameters to learn things about the images, but not things that directly relate to the classification task. For example, VGG can be pruned to about 10% of its capacity while still retaining the same classification performance.

    [–]gwern[S] 6 points  (3 children)

    For example, VGG can be pruned to about 10% of its capacity while still retaining the same classification performance.

    So can most CNNs, though. 10% is nothing special in the model compression/distillation papers I've read, all of which are from >=2014. Does this really single out VGG and potentially explain its especially good performance in style transfer? Are there other tasks where VGG is strikingly good compared to resnets? (I mentioned object localization but people disagreed whether there was any problem or performance gap on Twitter, while so far the original contention 'VGG is the best style transfer net' appears unchallenged.)

    [–]darkconfidantislife 5 points  (2 children)

    So can most CNNs, though. 10% is nothing special in the model compression/distillation papers I've read, all of which are from >=2014.

    Actually, VGG is an outlier: it's consistently the only one that can be pruned that deeply. GoogLeNets and others hover around 30-50%. Even AlexNet falls far short of how much can be pruned from VGG.
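    The pruning being discussed can be made concrete with simple magnitude pruning (zero out the smallest-magnitude weights). A minimal numpy sketch, purely illustrative and not the exact procedure used in the compression papers above:

    ```python
    import numpy as np

    def magnitude_prune(weights, keep_fraction):
        """Zero out all but the largest-magnitude fraction of the weights."""
        flat = np.abs(weights).ravel()
        k = max(1, int(flat.size * keep_fraction))
        threshold = np.sort(flat)[-k]  # k-th largest magnitude
        mask = np.abs(weights) >= threshold
        return weights * mask

    rng = np.random.default_rng(0)
    w = rng.normal(size=(100, 100))
    w_pruned = magnitude_prune(w, 0.10)  # keep ~10% of weights, as claimed for VGG
    ```

    In practice the papers prune, then fine-tune to recover accuracy; this sketch only shows the masking step.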

    [–]jcannell 1 point  (0 children)

    The FC layers can be pruned much more aggressively than the conv layers, so that makes sense.

    [–]wassname 0 points  (0 children)

    In other words: Perhaps training a network with so many extra parameters let it develop some more complicated but meaningful internal structure, rather than having it forced on it by the architecture?

    Seems like distillation would be a good test of this, as suggested, because that would probably discard this structure.

    [–]SkiddyX 6 points  (3 children)

    I feel like the trend of using feature pyramids while using ResNet for object detection might be for a similar reason. I would be interested to see StyleNet results using one.

    [–]wassname 2 points  (2 children)

    Yeah, it is strange that many detection models won't converge without pretraining on an ImageNet classification task.

    I've tried some variations, but they haven't clarified the issue for me. I tried pretraining on an MSCOCO detection task and then applying the model to a different detection task. Surprisingly, ImageNet/classification pretraining worked better than MSCOCO/detection pretraining, even though it was being applied to a detection task.

    [–]SkiddyX 1 point  (1 child)

    I have experienced the same thing; might it be related to the different loss functions used?

    [–]wassname 0 points  (0 children)

    Good to know it's not just me.

    Could be, I guess. Classification has a softmax loss, while detection often has dual losses: softmax plus MSE (or similar).

    Machine learning often performs better when you frame a problem as classification rather than regression. Since detection uses regression (alongside a classification head), it might be a more complicated task which generalizes less.
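    To make the "dual losses" point concrete, here is a toy sketch of a detection-style loss combining softmax cross-entropy for the class with an MSE term for the box coordinates. The function names and the equal weighting are illustrative assumptions, not any particular detector's implementation:

    ```python
    import numpy as np

    def softmax(logits):
        z = logits - logits.max()  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def detection_loss(cls_logits, true_class, box_pred, box_true, reg_weight=1.0):
        """Toy detection loss: cross-entropy for the class + MSE for the box."""
        probs = softmax(cls_logits)
        ce = -np.log(probs[true_class])            # classification term
        mse = np.mean((box_pred - box_true) ** 2)  # regression term
        return ce + reg_weight * mse

    loss = detection_loss(
        cls_logits=np.array([2.0, 0.5, -1.0]),
        true_class=0,
        box_pred=np.array([0.5, 0.5, 0.2, 0.2]),
        box_true=np.array([0.4, 0.6, 0.2, 0.2]),
    )
    ```

    A pure classifier only ever optimizes the first term, which may be part of why its features transfer differently.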

    [–]ProGamerGov 3 points  (2 children)

    Although this VGG-specificity appears to be folklore among practitioners, this is not something I have seen noticed in neural style transfer papers; indeed, the review Jing et al 2017 explicitly says that other models work fine, but their reference is to Johnson's list of models where almost every single model is (still) VGG-based and the ones which are not come with warnings (NIN-Imagenet: "May need heavy tweaking to achieve reasonable results"; Illustration2vec: "Best used with anime content...Be warned that it can sometimes be difficult to avoid the burn marks that the model sometimes creates"; PASCAL VOC FCN-32s: "Uses more resources than VGG-19, but can produce better results depending on your style and/or content image." etc).

    Not all of Neural-Style's default settings are ideal for each model. My research on how the Adam optimizer affects style transfer shows that the default Adam parameters Neural-Style uses are unstable and cause something similar to the "burn marks" which I wrote about in your source.

    I've found, when attempting to train my own VGG networks (for style transfer, though trained via classification), that the best parameters to use change as the network is changed.

    Add/remove FC layers from retrained VGG and resnet models. Does that lead to large gains/losses in quality?

    As far as I know, Neural-Style does not use FC layers. /u/crowsonkb 's style_transfer even had the FC Layers completely removed from the equation. Though the FC Layers can also be used to exert more control over the style transfer process.

    When using modified Neural-Style scripts that support label files, one can see that the network predictions are not always what the content image contains.

    [–]gwern[S] 2 points  (1 child)

    As far as I know, Neural-Style does not use FC layers. /u/crowsonkb 's style_transfer even had the FC Layers completely removed from the equation. Though the FC Layers can also be used to exert more control over the style transfer process.

    No, the idea there is that the FC layers, while not being used in generating features in style transfer, may still have affected the training of the rest of the VGG model and made the convolution layers lower down (which are generating the actual features) learn something somewhat different qualitatively than other models like resnets which typically have just one final FC layer on top of all the convolution+BNs, thereby yielding different features in the convolution layers. Something like - handwaving furiously - the VGG convolutions focus on learning textures and shapes while the VGG's FC layers do all the semantic thinking & understanding putting the pieces together, only the former of which is what we want.

    Speculative, yes, but training end-to-end means such global dynamics are possible, I have seen adding FCs change image generation in playing with anime GANs (adding 1-3 FC layers to the upscaling seemed to help with global coherency like ensuring that eye colors get matched), and the FC layers do contain a huge number of parameters so they might be doing something. Given all the contradictory opinions on what's going on, it's not a terrible idea.

    If so, then the logical test of the hypothesis is to see if a VGG model trained without FC layers is still as good for style transfer, and if other resnet models trained with additional FC layers get better for style transfer.
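    Since the argument is about what the conv layers (rather than the FC layers) encode, it may help to recall that Gatys-style transfer measures style via Gram matrices of conv activations, so the FC layers never enter the style computation directly. A minimal numpy sketch, with shapes assumed for illustration:

    ```python
    import numpy as np

    def gram_matrix(features):
        """Style representation of one conv layer: the correlations
        between its channels, shape (channels, channels)."""
        c, h, w = features.shape  # (channels, height, width)
        f = features.reshape(c, h * w)
        return f @ f.T / (h * w)  # normalize by spatial size

    acts = np.random.default_rng(1).normal(size=(64, 32, 32))
    g = gram_matrix(acts)  # style losses compare these matrices between images
    ```

    The FC-layer hypothesis is then about how end-to-end training shapes the activations that feed this computation, not about the FC weights themselves.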

    [–]ProGamerGov 2 points  (0 children)

    I've done some fine-tuning with the FC8 layer on a few different VGG-16 models, but I haven't had access to the resources required for fully retraining the VGG models (AWS doesn't make it very cheap). I haven't experimented with removing or adding more FC layers, however, so I am not sure if my fine-tuning experience will be of use to you?

    This experimentation was a while ago, and I was still learning a lot of the basics, but I was trying to only train the FC 8 layer directly, while letting things propagate into the rest of the model from there (at least I think that's what happens when the rest of the layers have their learning rate set to 0?). During this experimentation, I definitely noticed the finer details like texture and color being affected in the style transfer outputs I made for each set of 100-1000 training iterations.

    Edit:

    These are the results of one of my older successful experiments, where the quality of the output did not appear to be degraded in any way from the original model.

    And this was the result from one of my less successful training attempts.


    If so, then the logical test of the hypothesis is to see if a VGG model trained without FC layers is still as good for style transfer, and if other resnet models trained with additional FC layers get better for style transfer.

    I would say that you are correct in your hypothesis that the FC layers are probably an important part of training a model to perform well in style transfer, but it would be nice to see that proven experimentally.

    Maybe some sort of DeepDream-like setup used on each layer channel individually could help shed some light on how changes to the FC layers affect the lower layers? I'm not that mathematically minded, even though I'd like to be, so there is probably a way we could do an experiment like this without analyzing output images. This is the area I have been looking at for improving style transfer models for the past while.

    Another thing that might be related is that certain content and style layers in Neural-Style produce significantly more artifacts than others do. I'm not sure if this is related to the reason for VGG's success, but I figured I should mention it because giving emphasis to sets of style layer channels seems to create a similar result. I tried to explore this to prevent my tiles from drifting apart, but it didn't completely solve the issue, though it almost seemed like some of these artifacts would grow into larger features. The NIN and ResNet models which I have tested in Neural-Style also had artifacts which could be somewhat controlled with different layer combinations. But these ResNet and NIN artifacts differed from the VGG artifacts in that they did not appear as randomly, and they were composed of patterns that made up larger grids instead of the more random DeepDream-like artifacts caused by the VGG models.

    [–]stochastic_gradient 4 points  (2 children)

    Here's another hypothesis: Batch norm is to blame. VGG does not use it, while the other networks do.

    The test for this would be to train a resnet on ImageNet using some of the recent self-normalizing tricks [1,2], and see if the learned features work better for style transfer.

    1: https://arxiv.org/abs/1706.02515

    2: https://arxiv.org/abs/1709.04054
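    For reference, the self-normalizing trick in [1] is the SELU activation: a scaled ELU whose constants are chosen so that activations are pushed toward zero mean and unit variance without batch norm. A quick sketch (constants from the paper):

    ```python
    import numpy as np

    # SELU constants derived in Klambauer et al. 2017.
    ALPHA = 1.6732632423543772
    SCALE = 1.0507009873554805

    def selu(x):
        """Scaled exponential linear unit."""
        return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1))

    x = np.random.default_rng(0).standard_normal(100_000)
    y = selu(x)
    # y stays close to zero mean / unit variance, the self-normalizing fixed point
    ```

    A ResNet trained with SELU in place of batch norm would be one way to run the proposed test.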

    [–]Deep_Fried_Learning 1 point  (0 children)

    This might be tangentially related... In the paper Speed/accuracy trade-offs for modern convolutional object detectors they say:

    With the exception of VGG, we also do not perform “layer normalization” (as suggested in [26]) as we found it not to be necessary for the other feature extractors.

    That paper might also provide some insights into this topic, since it performed a lot of experiments directly comparing VGGs, ResNets and MobileNets.

    [–]shortscience_dot_org 0 points  (0 children)

    I am a bot! You linked to a paper that has a summary on ShortScience.org!

    Self-Normalizing Neural Networks

    Summary by Léo Paillier

    Objective: Design Feed-Forward Neural Network (fully connected) that can be trained even with very deep architectures.

    • Dataset: [MNIST](yann.lecun.com/exdb/mnist/), [CIFAR10](), [Tox21]() and [UCI tasks]().

    • Code: [here]()

    Inner-workings:

    They introduce a new activation function, the Scaled Exponential Linear Unit (SELU), which has the nice property of making neuron activations converge to a fixed point with zero mean and unit variance.

    They also demonstrate that upper and lowe... [view more]

    [–]wassname 0 points  (0 children)

    Test 1: retrain much smaller VGGs

    Quick tests are good since they are more likely to be performed. So for a quick test you could just compare VGG16 and VGG19 and see how the style transfer differs.

    [–]toastjam 0 points  (3 children)

    Stupid question, but how is a residual different from loss?

    [–]gwern[S] 3 points  (2 children)

    Not sure what you mean.

    [–]toastjam 0 points  (1 child)

    To put it another way, a loss function is how an NN's prediction differs from what it's being trained to predict. A residual is what, in that context?

    [–]sritee 6 points  (0 children)

    I think it means residual connections; check out ResNet.
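    To spell the answer out: a residual here isn't a loss term at all. In a ResNet block the layers learn only a correction F(x) that gets added back onto the input via a skip connection. A toy numpy sketch with a made-up single-layer F:

    ```python
    import numpy as np

    def residual_block(x, weight):
        """y = F(x) + x: the block learns the residual F(x), the change
        on top of the identity, rather than the full mapping."""
        fx = np.maximum(0.0, x @ weight)  # toy F: one ReLU layer
        return fx + x                     # the skip connection

    x = np.ones((1, 4))
    w = np.zeros((4, 4))
    y = residual_block(x, w)  # with F == 0 the block is exactly the identity
    ```

    The loss, by contrast, is computed once at the network's output; residual connections live inside the architecture.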