all 4 comments

[–]ogrisel 2 points (2 children)

Nice hack :)

Do you have any intuition as to why the network would try to "fool the instance normalization layer" in the first place? Is it related to the adversarial training, or is this an artifact of instance normalization that would appear in any deconv architecture (e.g. VAE, GLOW, Unet...) with instance normalization layers?

[–]stpidhorskyi[S] 2 points (1 child)

Thanks!

I doubt it has anything to do with adversarial learning. It is definitely tied to instance normalization, but I'm not sure whether this artifact appears in other architectures; it might be specific to the style-based architecture.

My initial hypothesis, from a while ago, is that for some unknown reason the network wants some channels to be non-zero-mean, mostly negative (the spikes I observed are positive, so after normalization the rest of the map goes negative). Why? I don't know. However, instance normalization will always force each channel to zero mean, no matter what.
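To make that last point concrete, here is a minimal NumPy sketch of per-channel instance normalization (my own illustration, not the actual StyleGAN code): whatever offset a channel carries before normalization, it comes out exactly zero-mean.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of one instance to zero mean, unit variance.
    x has shape (C, H, W); statistics are computed per channel."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(0)
# One channel centered at +3, one at -3: a deliberately non-zero-mean pair.
x = np.stack([rng.normal(3.0, 1.0, (8, 8)),
              rng.normal(-3.0, 1.0, (8, 8))])
y = instance_norm(x)

print(x.mean(axis=(1, 2)))  # roughly [3, -3]
print(y.mean(axis=(1, 2)))  # both ~0: the offsets are wiped out
```

So if the network "wants" a channel to sit at a negative level, it can't get that through a plain instance normalization layer without some trick.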

This is what StyleGAN2 paper says:

We pinpoint the problem to the AdaIN operation that normalizes the mean and variance of each feature map separately, thereby potentially destroying any information found in the magnitudes of the features relative to each other. We hypothesize that the droplet artifact is a result of the generator intentionally sneaking signal strength information past instance normalization: by creating a strong, localized spike that dominates the statistics, the generator can effectively scale the signal as it likes elsewhere. Our hypothesis is supported by the finding that when the normalization step is removed from the generator, as detailed below, the droplet artifacts disappear completely.

So, they say the reason is that instance normalization destroys the relative magnitude between channels, and creating a spike is a way around that. Seems very plausible.
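Here is a toy sketch of that mechanism (my own NumPy illustration, not code from the paper): without a spike, normalization forces the map to unit variance, so its overall scale is lost; with one strong localized spike dominating the statistics, the rest of the map keeps its scale relative to the spike.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # Normalize one (flattened) feature map to zero mean, unit variance.
    return (x - x.mean()) / (x.std() + eps)

rng = np.random.default_rng(0)
pattern = rng.normal(0.0, 1.0, 4096)  # stand-in for a 64x64 feature map

for scale in (1.0, 3.0):
    x = scale * pattern
    spiked = x.copy()
    spiked[0] += 1000.0  # one strong, localized spike

    # Without the spike, the output std is ~1 regardless of scale.
    plain_std = instance_norm(x).std()
    # With the spike dominating mean/std, the rest of the map still
    # grows with `scale` after normalization: the scale information
    # sneaks past the layer.
    rest_std = instance_norm(spiked)[1:].std()
    print(scale, round(plain_std, 3), round(rest_std, 4))
```

Running this, the no-spike output always has std ~1, while the post-normalization std of everything outside the spike roughly triples when the input is scaled by 3.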
Something that I find interesting:

  • The spikes occur only at the output of the 64x64 block. If this were a generic instance normalization problem, I would expect spikes at other blocks too, not just the 64x64 one.
  • All spikes that I've observed are positive.
  • Everything from 4x4 to 64x64 uses separate scaling and convolution operations, while everything from 64x64 to 1024x1024 uses a fused operation (see https://github.com/NVlabs/stylegan/blob/master/training/networks_stylegan.py#L178). The spikes occur precisely at that transition, but it might be a coincidence.

[–]SaveUser 1 point (0 children)

This is fascinating. Thanks for the explanation and great work in the repo!

[–]ink404 0 points (0 children)

Was wondering if anyone here knows of a method to use a trained StyleGAN model for transferring its style to an unseen image?