
[–]HybridSystem

Looks like it's just generating essentially random (and randomly named) CSS rules and classes. It's impressive that it learns the syntax, but I don't really see how this is anywhere near being useful. CSS has to be written in conjunction with HTML and possibly JS, and an algorithm that generates "average CSS rules" from a corpus of websites would still be far worse than using generic pre-made CSS templates. Or am I missing something here?

[–]Noncomment

Read the section above this. Gwern is planning to train a neural network to produce CSS from a given HTML document, then have it try different variations and test them to see which works best (to improve upon A/B testing).

This was just a first step: testing whether RNNs can learn to produce valid CSS at all.
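(A minimal sketch of that generate-and-test loop, assuming a hypothetical `model.generate` sampling API and a placeholder scoring function standing in for the real A/B-test metric:)

```python
import random

def generate_css_variants(model, html, n=10):
    # `model.generate` is a hypothetical sampling interface for the
    # trained HTML->CSS network; temperature controls how much variation
    # each sampled stylesheet shows.
    return [model.generate(html, temperature=random.uniform(0.5, 1.0))
            for _ in range(n)]

def score_variant(html, css):
    # Placeholder: in the actual proposal this would be a live metric
    # such as click-through or conversion rate from A/B testing.
    return random.random()

def best_stylesheet(model, html, n=10):
    # Keep whichever sampled stylesheet scores highest.
    variants = generate_css_variants(model, html, n)
    return max(variants, key=lambda css: score_variant(html, css))
```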

[–]alexcasalboni[S]

Indeed. It would also be nice to integrate commit histories with incremental improvements and see how the system would "learn" to increase code quality. Maybe that's too far in the future?

[–]w0nk0

OP, I noticed you used a very large dropout of 0.8 - any specific reason for that?

I'm currently toying around a lot with LSTM-based text generation (as a first step toward using LSTMs for something else), and I haven't seen anyone use more than 0.5. I haven't run enough trials to have a good assessment myself, but I generally find that varying dropout has less effect than I expect.

Also, any insights you have gained about the number of layers? My preliminary results indicate that the main relevant parameter is the number of node connections, mostly independent of the number of layers they're spread across.
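(For reference, a minimal sketch, assuming the Keras Sequential API, of the kind of character-level LSTM this dropout question applies to; `maxlen`, the vocabulary size, and the layer widths below are placeholder choices, not tuned values:)

```python
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

def build_char_lstm(vocab_size, maxlen, n_units=512, dropout=0.5):
    # Two stacked LSTM layers with dropout after each; the final softmax
    # predicts the next character in the sequence.
    model = Sequential()
    model.add(LSTM(n_units, return_sequences=True,
                   input_shape=(maxlen, vocab_size)))
    model.add(Dropout(dropout))  # the hyperparameter in question
    model.add(LSTM(n_units))
    model.add(Dropout(dropout))
    model.add(Dense(vocab_size))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
    return model

# Sweep dropout to compare validation loss across otherwise identical runs.
models = {p: build_char_lstm(96, 40, dropout=p) for p in (0.2, 0.5, 0.8)}
```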

[–]gwern

No particular reason. I didn't have the time and/or money to try anything like a random search over the hyperparameters.

I have since managed to get my laptop GPU working, so I could go back and try to optimize results, but I want to move on to something more useful like generating CSS from HTML. (I've so far collected 109M of target CSS and HTML pairs, but I'm still puzzling over the encoder-decoder RNN code out there for Theano & Torch... They may be great frameworks, but they are neither succinct nor simple for the beginner.)
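(One way such HTML/CSS pairs can be gathered, not necessarily the pipeline used here, is to fetch a page and download each stylesheet it links. A rough sketch using requests and BeautifulSoup, with rate limiting and error handling omitted:)

```python
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

def fetch_pair(url):
    # Download the page, then every stylesheet its <link> tags reference,
    # returning an (html, css) training pair.
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, 'html.parser')
    css_chunks = []
    for link in soup.find_all('link', rel='stylesheet'):
        href = link.get('href')
        if href:
            css_chunks.append(requests.get(urljoin(url, href), timeout=10).text)
    return html, '\n'.join(css_chunks)
```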

> Also, any insights you have gained about the number of layers? My preliminary results indicate that the main relevant parameter is the number of node connections, mostly independent of the number of layers they're spread across.

I got a similar impression: increasing the number of layers didn't seem to help final quality, but it did slow down training a great deal and made the model larger. In the future I'll probably stick with <4 layers and instead increase the neuron count as much as possible. This surprised me, since I assumed that the more computation (layers) the network could do per time-step/character, the better it would do; but I guess CSS is predictable enough that the RNN can spread its computations across multiple time-steps? (I wonder if anyone has looked into that empirically, perhaps by taking multiple sequence datasets with clearly different entropies and seeing whether more layers help on the harder sequences.)
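(A quick back-of-the-envelope way to compare configurations by connection count rather than depth, using the standard LSTM parameter count; the specific widths below are just illustrative:)

```python
def lstm_params(n_in, n_units):
    # 4 gates, each with input weights, recurrent weights, and a bias.
    return 4 * (n_units * (n_in + n_units) + n_units)

def stack_params(n_in, layer_sizes):
    # Total parameters of a stacked LSTM, feeding each layer's output
    # into the next.
    total, prev = 0, n_in
    for n in layer_sizes:
        total += lstm_params(prev, n)
        prev = n
    return total

vocab = 96
print(stack_params(vocab, [256] * 6))   # deep and narrow: ~3.0M params
print(stack_params(vocab, [483, 483]))  # shallow and wide: ~3.0M params
```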

[–]w0nk0

Have you looked at any of the higher-level libraries? Keras, Lasagne, and Chainer all have pretty good RNN functionality. I've used Keras quite a bit but ran into weird issues and some constraints that made me look elsewhere, and I haven't settled on what I'll keep using yet. Chainer is more limited but seemed more reliable and more geared toward RNN usage.

[–]gwern

I've been looking at all the implementations I can find, and right now the closest to what I want is the Theano implementation of an encoder-decoder with attention from NVIDIA's neural machine translation series (http://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-3/): https://github.com/kyunghyuncho/dl4mt-material/tree/master/session2

> Currently, this code includes three subdirectories: session0, session1 and session2. session0 contains the implementation of the recurrent neural network language model using gated recurrent units, and session1 the implementation of the simple neural machine translation model. In session2, you can find the implementation of the attention-based neural machine translation model we discussed today. I am planning to make a couple more sessions, so stay tuned!

(I think attention is important to make the 'translation' scale to whole HTML/CSS pairs.)
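(For intuition, a minimal numpy sketch of the Bahdanau-style attention step that series describes: score each encoder state against the current decoder state, softmax the scores, and mix the encoder states into a context vector. All matrices here are random placeholders standing in for learned weights:)

```python
import numpy as np

def attention_context(decoder_state, encoder_states, W_a, U_a, v):
    # encoder_states: (T, d_enc); decoder_state: (d_dec,)
    # Additive scoring: e_j = v . tanh(W_a s + U_a h_j)
    scores = np.tanh(encoder_states @ U_a.T + decoder_state @ W_a.T) @ v  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over source positions
    return weights @ encoder_states   # (d_enc,) context vector

rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 12, 8, 8, 16
ctx = attention_context(rng.normal(size=d_dec),
                        rng.normal(size=(T, d_enc)),
                        rng.normal(size=(d_att, d_dec)),
                        rng.normal(size=(d_att, d_enc)),
                        rng.normal(size=d_att))
```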

[–]asenski

So easy, a caveman could do it. (tm)