
[–]Smith4242 23 points24 points  (3 children)

Really cool! Would love to see what the GAN produces if it's fed album covers from specific genres!

[–]BostonRich 9 points10 points  (2 children)

Or two genres that are very different, hip hop and country maybe.

[–][deleted] 12 points13 points  (0 children)

Old town road

[–]alexchuck 0 points1 point  (0 children)

Post Malone

[–]crazyyfish 20 points21 points  (0 children)

It's interesting that it even learns some kind of text and its arrangement.

[–]CrazyAsparagus 32 points33 points  (1 child)

A lot of letters are backwards, is that because you trained on augmented data with image flipping or something?

[–]veqtor 13 points14 points  (0 children)

Probably, I did the same thing without flipping and got some pretty reasonable pieces of text

[–][deleted] 12 points13 points  (0 children)

Did you mirror images as part of data augmentation? A lot of the text has a definite backwards vibe to it :)

[–]exilhesse 10 points11 points  (1 child)

It kind of stresses me out that there is text that I am unable to read. Is this what dyslexia feels like?

[–]pataoAoC 5 points6 points  (0 children)

Same... But I suspect this is what English etc looks like when you come from a non-Latin script language (Hindi, Arabic, Chinese etc)

[–]CambrianKid 7 points8 points  (0 children)

Have you uploaded the model anywhere? I'd love to play around with this.

[–]veqtor 8 points9 points  (6 children)

4 days on 1x 2080ti, I've implemented fp16-mixed precision training so it's more like 2x though

https://i.imgur.com/OGcaUKB.jpg

[–]gwern 1 point2 points  (3 children)

How did you implement FP16?

[–]veqtor 2 points3 points  (2 children)

Used some ProGAN code that had been disabled. I also had to perform correct casting in the mapping layers, and of course, whenever you're doing some kind of normalization you probably want to do it in fp32, with a proper epsilon to avoid divide-by-zero. The optimizer supports adaptive loss scaling but I don't remember how I activated it. I'll clean up my code a bit and upload it to GitHub.
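(Not the actual code from the repo, but a minimal NumPy sketch of the pattern described above — the function name and shapes are made up. The idea is to keep activations in fp16 while upcasting to fp32 for the normalization itself:)

```python
import numpy as np

def pixel_norm_fp32(x, epsilon=1e-8):
    # Upcast to fp32: summing squares of fp16 values overflows easily
    # (fp16 max is ~65504), and the epsilon keeps the divide stable.
    x32 = x.astype(np.float32)
    norm = np.sqrt(np.mean(x32 * x32, axis=-1, keepdims=True) + epsilon)
    # Cast back so the rest of the network stays in half precision.
    return (x32 / norm).astype(x.dtype)

acts = np.random.randn(4, 512).astype(np.float16)
out = pixel_norm_fp32(acts)
print(out.dtype, out.shape)  # float16 (4, 512)
```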

[–]toomanyLUNAs 0 points1 point  (1 child)

Still planning to upload your fp16 version? It could be very useful, even if it's messy.

I doubt people will mind, although I do get not wanting to put up code which you know has problems.

[–]veqtor 1 point2 points  (0 children)

Here it is, also contains some stuff I've been experimenting with, but that's disabled and needs to be enabled in networks_stylegan.py

https://github.com/veqtor/stylegan

[–]shoeblade[S] 0 points1 point  (1 child)

Nice, what's the output resolution?

[–]veqtor 0 points1 point  (0 children)

512x512. I grabbed covers from Spotify that were mostly at 640x640.

[–]iluvcoder 7 points8 points  (7 children)

Hi u/shoeblade, looks great! How long (in days) did it take to train?

[–]shoeblade[S] 11 points12 points  (5 children)

~5-6 days on 8x V100

[–]TopsyMitoTurvy 29 points30 points  (4 children)

Cries in AWS credits.

[–]veqtor 15 points16 points  (3 children)

Check my images:
https://i.imgur.com/OGcaUKB.jpg

4 days on 2080ti, with fp16 (mixed precision)

I'm trying to get the most out of a 2080ti and then maybe release this code; hopefully you can get the same results in 2 weeks

[–]scrdest 2 points3 points  (0 children)

Third row, penultimate column is clearly a new version of Potato Jesus.

[–]TopsyMitoTurvy 0 points1 point  (1 child)

Any update on this?

[–]veqtor 0 points1 point  (0 children)

Here it is, also contains some stuff I've been experimenting with, but that's disabled and needs to be enabled in networks_stylegan.py

https://github.com/veqtor/stylegan

[–]JayWalkerC 3 points4 points  (0 children)

I'm also curious what hardware was used.

[–]Veedrac 6 points7 points  (0 children)

The text looks amazing.

[–]ZigguratOfUr 5 points6 points  (0 children)

Looks better aesthetically than any other stylegan I've seen!

[–]lewis841214 2 points3 points  (0 children)

Is it ok to share the code and the trained data?

[–]NicolasGuacamole 10 points11 points  (7 children)

Can you show the closest images in the training set compared to your cherry-picked samples? It would be interesting to see.

[–]shoeblade[S] 1 point2 points  (6 children)

Not sure what you mean exactly, but here's a matrix of random samples from training:

https://imgur.com/UWgaG2D

[–]SimpleProject 24 points25 points  (5 children)

They're asking for the nearest neighbor in the training set for the generated images you posted.

[–]marvpaul 11 points12 points  (4 children)

Just to make sure the GAN has not overfitted I guess :)

[–]NicolasGuacamole 3 points4 points  (3 children)

Pretty much

[–]NotAlphaGo 0 points1 point  (2 children)

Wasn't there an argument somewhere that doing an interpolation would show overfitting?

[–]NicolasGuacamole 1 point2 points  (1 child)

Wouldn’t surprise me. Overfitting is already a hard concept to quantify. In these GANs it’s especially difficult, if nothing else because distances in pixel space might not be perfectly revealing w.r.t. the ‘real’ point that was overfit to.

Anyway I just want to see out of interest, especially if the face image is close to an existing pattern.

[–]NotAlphaGo 1 point2 points  (0 children)

Agreed. Quantifying overfitting is hard in GANs. Even if the GAN did memorise the training set and acted as a smart interpolation function between its samples, wouldn't that still give it some merit? I.e., an overfit GAN doesn't have to be a useless GAN?

[–]nurijanian 1 point2 points  (0 children)

someone should post this in r/oddlyterrifying

[–]gnu-user 0 points1 point  (1 child)

This is very cool! Do you have a project on Github or any code available to demonstrate how this was done? Are there any projects / code in particular for those interested in GANs that you recommend?

[–]veqtor 0 points1 point  (0 children)

Here is the repository:
https://github.com/NVlabs/stylegan

This guide is really good:
https://gwern.net/Faces

[–]tough-dance 0 points1 point  (1 child)

Maybe you can't go into specifics, but what differentiates StyleGAN? Is it just a DCGAN applied to something "styled", or is there some sort of relevant "style structure" (much like a WaveGAN is just a 1-dimensional DCGAN applied to a waveform)?

[–]julvo 4 points5 points  (0 children)

StyleGAN uses a different generator: a fully connected mapping network transforms the random code into parameters for adaptive instance normalisation layers in the synthesis network. Here's the paper: https://arxiv.org/abs/1812.04948

[–]Jonno_FTW 0 points1 point  (0 children)

Can you post the training dataset and/or code or model?

[–]fimari 0 points1 point  (1 child)

I think before doing this we should extract as much meaning as possible, by generating for example a well-formed SVG out of the cover, where text is text with a font linked to it, and a gradient is defined as a gradient.

The next big step - high quality vectorisation.

[–]veqtor 0 points1 point  (0 children)

Actually, it could perhaps work to add a Spatial Transformer Network with OCR to the discriminator, along with some kind of text-to-2D-attention in the generator, to handle the text properly

[–]b_n 0 points1 point  (2 children)

Does it work so well because all the images have the same aspect ratio? I have a database I would like to try this on, but the images are all of varying resolutions and aspect ratios.

[–]gohu_cd 1 point2 points  (0 children)

You can add black margins to your images so that they all fit in a square box while keeping the right aspect ratio. Worked for me :)
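(A hypothetical helper showing that padding trick in NumPy, assuming channels-last uint8 images; the function name is made up:)

```python
import numpy as np

def pad_to_square(img, fill=0):
    # Center the image on a square canvas of `fill`-valued (black)
    # pixels, preserving the aspect ratio instead of stretching.
    h, w = img.shape[:2]
    side = max(h, w)
    canvas = np.full((side, side) + img.shape[2:], fill, dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

wide = np.full((360, 640, 3), 255, dtype=np.uint8)  # a 640x360 image
square = pad_to_square(wide)
print(square.shape)  # (640, 640, 3)
```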

[–]veqtor 0 points1 point  (0 children)

Yes, also, there's probably some common styles of album covers and variations in the dataset.

[–]XmintMusic 0 points1 point  (0 children)

Is the code or training data available somewhere? I'd love to play with these covers.

[–]Beaster123 0 points1 point  (0 children)

The examples look like 90s album covers.

[–]SaveUser 0 points1 point  (0 children)

Do you have your event logfiles / tensorboard graphs?

I was training a similar StyleGAN but ended up with diverging loss for G and D, and a small degree of mode collapse, so I'd be curious to see the stats on yours

[–]mysterEFrank 0 points1 point  (0 children)

It learns mop top haircuts

[–]Kilerpoyo 0 points1 point  (0 children)

Hi, did you use the Nvidia code?