Hacker News
Training a Neural Net to Generate CSS (gwern.net)
176 points by Moshe_Silnorin on Aug 5, 2015 | hide | past | favorite | 31 comments



"1 parse error on *zoom: 2 !important;"

That might be a parse error according to the validator, but it's actually a pretty commonly used hack for old versions of Internet Explorer - real browsers drop such an invalid declaration, while IE (up to 7, I think; an underscore prefix worked the same way up to 6) parsed it anyway and ignored the asterisk. What's more, this quirk is commonly used with, guess what, the "zoom" property, which is one of the common ways to trigger "hasLayout" mode in the IE engine (although you'd most likely use it with the value "1" instead of "2").

So this NN simply wrote a compatibility hack for IE6 :)
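For reference, the hack in question looks something like this (the `.fix` selector is just illustrative):

```css
/* The asterisk prefix is invalid CSS, so standards-compliant browsers
   drop the whole declaration, while old IE parses it anyway and
   ignores the '*'. zoom: 1 is the usual hasLayout trigger. */
.fix {
  *zoom: 1;
}
```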


It's reassuring to think that there'll soon be a growing population of smart systems and artificial intelligences that hate Internet Explorer too.


If you like his work, consider donating to gwern on patreon. https://www.patreon.com/gwern?ty=h

Note: I am obviously not gwern.


If you find this combination of natural computation and the arts (or styling, in this case) interesting, you might want to take a look at papers published in these tracks:

* Evolutionary Computation for Music, Art, and Creativity - http://cilab.cs.ccu.edu.tw/ci-tf/ECMAC2015.html

* EvoMUSART - http://www.evostar.org/2016/cfp_evomusart.php

* International Symposium on Computational Aesthetics in Graphics, Visualization, and Imaging - http://expressive.richardt.name/2015/CAe/Home

* Generative Art Conference - http://www.generativeart.com/

I've seen a few papers working on CSS in the past.



Could you point me at any of these papers?


Someone correct me if I'm wrong here, but these char-rnn based outputs are great to look at and experiment with - but they don't seem to be of any practical use. All they're capable of showing is that an RNN can "remember" things. The biggest question is: how do you make any use of them?

Is it possible to nudge the RNN in a particular direction - so that it produces something we want? Perhaps there's an answer to this in Alex Graves et al.'s paper on handwriting generation. A more thorough explanation or exploration in this direction would perhaps help.

Anyone really working on generating CSS (conditioned!)?


No experience actually trying, but I'd suspect the first step would be training a system on combinations of CSS and HTML; all the really interesting behavior comes from the interaction between the two.

Then I imagine you could do interesting stuff by e.g. constraining the HTML and seeing what kind of CSS was spit out.


Can it center vertically?


Probably, if trained with data containing display: flex
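For reference, the flexbox pattern such training data would need to contain (the class name is illustrative):

```css
/* Center the children of a flex container on both axes. */
.container {
  display: flex;
  align-items: center;     /* vertical centering */
  justify-content: center; /* horizontal centering */
}
```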


It could be a nice experiment to have a basic HTML structure and a picture, and then attempt to generate the style until the rendered HTML matches the image.

Genetic CSS?


Yes, a generative model of HTML+CSS is definitely a direction I'm nodding towards. (I discussed it briefly in the previous section about reinforcement learning in general.) I'm still hazy on the full architecture: you need an RNN to generate the CSS, you probably want to feed in a particular HTML page to target the CSS onto, the users' browsers generate the reward signal, and you can create images of the HTML+CSS combo as rendered in a web browser and bring in a convolutional network somehow... There doesn't seem to be anything quite like this in the literature that I've come across.

(Actually, there's a surprising dearth of reinforcement learning in general. Very few blog posts or demos or introductory materials. It makes it hard to understand what is new about DQN or how the whole system works on a concrete coding level.)


Reinforcement learning seems like overkill if you're just trying to match a single target. RL agents want to maximize reward accumulated over time, while you'd be happy to find a single good state. You could frame this as stochastic optimization (MCMC or simulated annealing) over the CSS string, with some combination of RNN and convnet serving as proposal.
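A toy sketch of that framing - simulated annealing over a raw string, with a character-mismatch count standing in for the rendered-image comparison (everything here is illustrative, not from the article):

```python
import math
import random

def anneal(target, alphabet, steps=100000, seed=0):
    """Simulated annealing over a raw string: the cost counts characters
    that differ from the target (a stand-in for comparing renders)."""
    rng = random.Random(seed)
    state = [rng.choice(alphabet) for _ in target]

    def cost(s):
        return sum(a != b for a, b in zip(s, target))

    c = cost(state)
    for step in range(steps):
        if c == 0:
            break
        temp = max(0.01, 1.0 - step / steps)  # linear cooling schedule
        i = rng.randrange(len(state))
        old, state[i] = state[i], rng.choice(alphabet)
        new_c = cost(state)
        # Always accept improvements; accept worse moves with
        # Boltzmann probability, which shrinks as the system cools.
        if new_c <= c or rng.random() < math.exp((c - new_c) / temp):
            c = new_c
        else:
            state[i] = old
    return "".join(state)
```

A real version would swap the character-mismatch cost for a pixel comparison of rendered pages, and the uniform mutation for an RNN/convnet proposal.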

The best paper award at CVPR this year (http://www.cv-foundation.org/openaccess/content_cvpr_2015/ht...) has an architecture somewhat like this: instead of a web browser they're using a 3D rendering engine, and searching for pose parameters that cause their rendered image to match an observed image. This is just Bayesian inference using MCMC, where they train a deep net to function as a data-driven proposal distribution.

If your goal were to actually get a working system, you'd probably want to do inference directly on a parameter vector encoding all the relevant quantities - heights, widths, font sizes, colors, etc. of the various boxes, etc. - that you could programmatically ground out into a CSS file. Trying to do inference over the raw text is making things artificially hard since you have to put so much work into even just getting correct syntax. Though maybe that's part of the fun. :-)
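A minimal sketch of that grounding step, assuming a flat dict keyed by (selector, property) pairs (a hypothetical encoding, not anything from the article):

```python
def params_to_css(params):
    """Ground a flat parameter dict into a CSS string.
    Keys are (selector, property) pairs; values are either
    (number, unit) tuples or plain strings."""
    rules = {}
    for (selector, prop), value in params.items():
        if isinstance(value, tuple):        # e.g. (120, "px")
            value = "%g%s" % value
        rules.setdefault(selector, []).append("  %s: %s;" % (prop, value))
    return "\n\n".join(
        sel + " {\n" + "\n".join(decls) + "\n}"
        for sel, decls in rules.items()
    )
```

The optimizer then only ever sees a numeric vector, and syntax errors become impossible by construction.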


I think the optimal way to do this would be to use an RNN that can walk up and down the DOM. It starts at the HTML tag, chooses which path to take, visiting each node, then iterates through every style attribute and predicts what value it should have.

This would create a simple way of generating CSS styles for a document, without dealing with the complicated issues of RNNs having limited memory and producing correct syntax.

Then you can use these predictions as a prior probability over what the CSS styles should be, and use some kind of Bayesian optimization to find the optimal settings in the fewest experiments.


The "reward signal", aka objective function, seems to be the challenging part here. In the parent post's suggestion, all you'd end up with is a neural network that could /maybe/ reproduce a picture (assuming CSS is expressive enough and the mapping is actually learnable). It'd be more interesting to have some "quality" measure that actually meant something to evaluate outputs against.


Of course you need a fitness function; why not a pixel-by-pixel comparison of the rendered output?
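A pixel-by-pixel fitness might look like this, with images as nested lists of RGB tuples (a real system would use numpy arrays or similar):

```python
def pixel_fitness(rendered, target):
    """Sum of absolute per-channel differences between two images,
    each a list of rows of (r, g, b) tuples. 0 means a pixel-perfect
    match; larger values mean a worse reproduction."""
    return sum(
        abs(c1 - c2)
        for row_r, row_t in zip(rendered, target)
        for px_r, px_t in zip(row_r, row_t)
        for c1, c2 in zip(px_r, px_t)
    )
```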


If the original image were created in a vector-based program -- in other words, something where the x, y, width, and height parameters are known and stored -- you could load the generated HTML+CSS in a known reference browser and enumerate the x, y, width, and height of the matching elements in the DOM. If your fitness function is a golf score, then something like:

  sum( abs(X_expected - X_actual) +
       abs(Width_expected - Width_actual) +
       abs(Y_expected - Y_actual) +
       abs(Height_expected - Height_actual) )
mapped over all elements ought to do the trick. When you hit 0, it's a perfect reproduction.
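In Python, with each element's geometry as an (x, y, width, height) tuple (the element names and dict layout are hypothetical):

```python
def layout_score(expected, actual):
    """Golf-score fitness over matched DOM elements: the sum of
    absolute differences of each element's x, y, width, and height.
    0 means the layout is reproduced exactly."""
    total = 0
    for name, (x, y, w, h) in expected.items():
        ax, ay, aw, ah = actual[name]
        total += abs(x - ax) + abs(y - ay) + abs(w - aw) + abs(h - ah)
    return total
```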


Yeah, but you need to render in a headless browser. That might take 0.1 seconds per webpage, which is extremely slow in the context of reinforcement learning.

I thought about trying MCMC over a beam search through the RNN output, but ran out of time and patience.


My point was that this is boring. A fitness function that evaluated some notion of how "pretty" a page is would be cooler than being able to regenerate a screenshot's CSS (in a likely very complex form).


I don't think that would get you a convex cost function. Having a correct example of the CSS should work better.



That would be really interesting to see. I bet there would be convoluted CSS hacks we never even dreamed were possible, and they wouldn't be practical at all. :)


Or maybe it would just generate a <table> for layout with a bunch of 1px spacer cells/gifs.


It's generating CSS, so it can't add (non-pseudo-)elements.


Letting my imagination run wild, I'd add a speech recognition module, generating CSS from a spoken description of layout etc.

After some busy work along that line, we would arrive at a generation of PLs where you don't write source code but have it generated from interpreted speech input.

I'm not so sure I (as a programmer) would really want that, but it would seem a rather obvious line of development.


The ability to generate sense out of thin air must be a nightmare for anyone trying to fight spam. Will this ever make it into our inboxes?


Well, the spam bots will learn to generate cool stories; we may actually enjoy spam.

Has anyone trained an RNN on YCombinator comments?


Interesting idea.

If there were any bots reading comments, they'd already know your idea ;)


Oh, I was looking forward to getting into ML and NNs, but given my crappy i3 CPU/integrated GPU, it seems I won't be.


You can do ML, but you won't match the performance of state-of-the-art deep learning networks. Training on a CPU isn't impossible: Gwern said it was a 20x slowdown; others said it was only as bad as 10x. For smaller models it should be feasible.


Training a NN to be subjective? Cool story bro, keep telling it.



