all 26 comments

[–]clustersss 15 points16 points  (1 child)

Are you making the code public at all? This is super cool, great job!

[–]AtreveteTeTe[S] 12 points13 points  (0 children)

Thanks! I should have noted in the post that I'm working with Nvidia's official StyleGAN repo.

[–]iholierthanthou 8 points9 points  (1 child)

Wow, I love this post, not only because it's about GANs but also because you're making a music video for one of my favorite artists in the deep house scene :) I've briefly worked on GANs before and know that they're notoriously slow to train and tune. Can I ask how much time it took you to do this? Also, if you have a blog post about this project I'd love to give it a read; sounds like an interesting project!! Cheers

[–]AtreveteTeTe[S] 1 point2 points  (0 children)

Thank you! Yes, she's great. I love her track "Waves." This (and the other two videos) probably took about a week? But that's ignoring a lot more time I put in to learn how to create this imagery with StyleGAN. Beyond the GAN work, there was building the audio-reactivity side, which took time and exploration, and then also rendered very slowly, so getting feedback took a long time.

As far as how long it took, I've seen it put this way before by Xander Steenbrugge, who did something similar, but (I assume) all in code and without After Effects:

Thinking out the idea, writing & experimenting with all the code and training the models: months. Curating the different visual elements for one particular model/video: hours. Rendering the final result: minutes.

In this case, while the first render from the GAN takes minutes, it's actually hours to render the audio reactivity.
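For a sense of what "audio reactivity" can look like in code, here's a stripped-down illustration (not my actual After Effects pipeline; the file name, frame rate, keypoint count, and scaling constants are all placeholders). One common approach is to use the onset-strength envelope of the track to speed up and slow down a walk between random StyleGAN latents:

    import numpy as np
    import librosa

    FPS = 30      # target video frame rate
    Z_DIM = 512   # StyleGAN latent size

    # Load the audio and compute one onset-strength value per video frame.
    y, sr = librosa.load("track.wav", sr=None)   # "track.wav" is a placeholder
    hop = int(sr / FPS)
    env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop)
    env = env / (env.max() + 1e-8)               # normalize to [0, 1]

    # Random latent keypoints to wander between.
    rng = np.random.RandomState(0)
    keypoints = rng.randn(16, Z_DIM)

    # Louder / more percussive frames advance faster along the latent path.
    position = np.cumsum(0.02 + 0.2 * env)       # monotonically increasing "time"
    latents = []
    for t in position:
        i = int(t) % len(keypoints)
        j = (i + 1) % len(keypoints)
        frac = t - int(t)
        latents.append((1 - frac) * keypoints[i] + frac * keypoints[j])
    latents = np.stack(latents)                  # (n_frames, 512)

You'd then feed that array through the generator, one row per video frame.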

[–]veranceftw 3 points4 points  (5 children)

Can you link the /u/C0D32 and /u/gwern references?

[–]AtreveteTeTe[S] 3 points4 points  (4 children)

Of course - C0D32 is right here on /r/MachineLearning. Gwern's post is on his site here.

[–]veranceftw 2 points3 points  (2 children)

Thank you, and congratulations on your amazing work. I'd love to see more of these!

[–]AtreveteTeTe[S] 2 points3 points  (0 children)

You bet and thank you too! I'll reply here when the next videos are released.

[–]AtreveteTeTe[S] 1 point2 points  (0 children)

They released another video - here you go

[–]Trigaten 7 points8 points  (3 children)

Hi, I'm a high school student looking to generate video with GANs for my school dance showcase. The inputs would be 1) the song itself, essentially the beat, and 2) the dancers' movements, which will be tracked in real time by chips they're wearing on their clothing. I was wondering if you have any ideas about how I might use this model, or a similar one, to do that. I don't have much GAN experience. Thanks.

Also is this feasible in real time?

[–]AtreveteTeTe[S] 9 points10 points  (2 children)

Awesome. That sounds like a really interesting idea! I guess the first question is whether you really need a GAN to create the visuals. There are a lot of VJ tools out there that can sync visuals to a beat and take other inputs (like the clothing sensors, MIDI controllers, etc.). I'd perhaps start by defining what you'd like to see: find references you like and work back from there. Are there particles that generate on the beats? A clip you're playing? Etc.

With respect to real time, I'm actively working on generating StyleGAN visuals in real time in Python, but I'll admit the frame rate is a little low at 1024×1024. It's pretty good at 512×512, actually.

Another way to handle this is to pre-render the videos from the GAN and then retime them to the song.
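Either way, the frame generation itself looks roughly like this with Nvidia's official (TensorFlow) StyleGAN repo; it's essentially pretrained_example.py run in a loop. Treat the pkl path, step size, and frame count below as placeholders:

    import pickle
    import time
    import numpy as np
    import dnnlib.tflib as tflib   # from the official StyleGAN repo

    tflib.init_tf()
    with open("network.pkl", "rb") as f:    # your trained model
        _G, _D, Gs = pickle.load(f)

    fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    z = np.random.randn(1, Gs.input_shape[1])          # (1, 512) latent

    start = time.time()
    n_frames = 100
    for _ in range(n_frames):
        z += 0.05 * np.random.randn(*z.shape)          # small random walk through latent space
        frames = Gs.run(z, None, truncation_psi=0.7,
                        randomize_noise=False, output_transform=fmt)
        # frames[0] is an (H, W, 3) uint8 image -- display it or write it to disk here
    print(f"{n_frames / (time.time() - start):.1f} fps")

Timing the loop like this is how I'm gauging whether 512 or 1024 output is usable in real time.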

I'm not an expert in real-time visualization, but you might have a look at TouchDesigner too. It's built for this kind of thing and is the software we're using to run our Dali Lives experience at the Salvador Dali Museum. Check out this features video. And of course, the work that Refik Anadol is doing is insane. I think his latest piece at Kraftwerk specifically drives the output of a GAN by sensing where people are in the room.

Good luck though and I think it's great you're working on this in high school!

[–]Trigaten 0 points1 point  (1 child)

I'm trying to include the project as part of a machine learning course that I made for school. Do you have any good tutorials that you used to make this? Or did you just read the docs? I found GAN tutorials to be mostly single-image related. Thanks again.

[–]AtreveteTeTe[S] 1 point2 points  (0 children)

Ah, cool. The main thing I was following along with was this post on Gwern's website. It's a bit of a doozy and there's a good bit to understand, but that's what I followed. Check it out. I also had the chance to attend the GANocracy conference at MIT this year, which was really my gateway into all this.

[–]sigmoidp 1 point2 points  (0 children)

Anjunadeep! Best coding music out there!

[–]reamles 1 point2 points  (2 children)

I recently visited Kraftwerk (a few times, actually) to see Latent Being, and also attended his talk, which was truly inspiring for me, given that I've been following him for a few years. Max Cooper also had a very cool set a few months ago in Berlin, and it was quite clear he had videos rendered with GANs. Seeing more and more works that pursue a completely different direction in ML is quite exciting.

I really want to start dedicating more time to it, but what stops me is that all of these GANs take ages to converge, and I don't see any feasible way to start integrating them with the other tools used in visual/generative art (Max/Jitter, vvvv). It's easier for Refik, since he's backed by NVIDIA and they can give him the latest hardware. So the question I have for you is: how much time did it take you to train the network? Could you do it on conventional hardware, without spending thousands on a V100 or similar?

I'd also be interested in collaborating with someone, and perhaps making a library for Max or similar tools, so that you can do generative work more easily, or interact with the models in a more predictable manner.

[–]AtreveteTeTe[S] 1 point2 points  (1 child)

This is a really insightful reply. The training times/costs from scratch ARE still prohibitive. It's not terrible time-wise using Google Cloud with eight V100s, but that's more than $16 an hour, and you're looking at hundreds to low thousands of dollars to train a model. I think Amazon charges something like $22 an hour for the same hardware. It's a lot cheaper than buying an $80,000 computer, but it's still not cheap.

However, transfer learning from an existing model is MUCH faster. It gets into something you can do overnight on a local machine with two GPUs, for example. So here, I started with an existing model and probably trained it for a day or two, and then did further transfer learning from there to modify it for other tracks. This was done on two 1080 Tis.
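For reference, with the official TF StyleGAN repo the transfer-learning part mostly comes down to the resume_* options that training/training_loop.py already exposes, pointed at an existing pkl instead of training from scratch. The exact names and values vary a bit between StyleGAN versions, so treat these as placeholders:

    # in training/training_loop.py (StyleGAN v1), roughly:
    resume_run_id   = "path/to/pretrained-model.pkl"  # run ID or network pkl to resume from
    resume_snapshot = None                            # snapshot index; None = autodetect
    resume_kimg     = 10000.0                         # pretend this much training has happened,
                                                      # so progressive growing starts at the
                                                      # final resolution instead of 4x4

From there you point the dataset config at your new images and let it run.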

I'm currently exploring doing realtime output from these models, which feels like it has a ton of potential. It's already pretty quick to generate frames.

I haven't gotten to see Refik's work in person, nor Max Cooper's, but follow them both. They're incredible. Thanks for sharing!

[–]reamles 0 points1 point  (0 children)

Thank you for the breakdown and a comprehensive answer!

I will definitely look into transfer learning; seems like low-hanging fruit at this point :)

I started with Max a while ago, specifically to do real-time AV. The addition of some GAN power would be a very nice feature; so many possibilities!

[–]WhiteIsTheNewBlue 0 points1 point  (0 children)

Awesome! Could you explain why you chose this specific method?

[–]delicious_truffles 0 points1 point  (0 children)

Cool! What do you think are next directions with generative models and music visualization? There's so much potential for video jockeying / vj'ing.

[–]j_lyf 0 points1 point  (0 children)

saved for later.

[–]mankaden 0 points1 point  (0 children)

Mesmerizing

[–]floodvalve 0 points1 point  (0 children)

Karaoke companies would like to know your location

[–]ink404 0 points1 point  (1 child)

Did you just crop the output to get the aspect ratio for the video?