Geocities Forever – neural network generated geocities pages (geocitiesforever.com)
143 points by laacz on May 6, 2016 | 40 comments



The generated English (and a number of other languages) doesn't make a lick of sense (certainly not comparable to decent NN language models), and there's HTML and JS visible all over the place. But they certainly look a lot like typical geocities pages. How do you generate HTML and JS with NNs? I wish there were an explanation somewhere. Maybe that's the reason that there's so much invalid HTML.


It could just be char-rnn output inside a very simple HTML wrapper. I found that a char-rnn can produce a lot of syntactically valid CSS with lots of URLs, so CSS+some HTML+image URLs is entirely possible: http://gwern.net/AB%20testing#training-a-neural-net-to-g...

The template here looks like it would go

  <script src="jquery.js"></script>
  <script src="sha1.js"></script>
  <script src="links.js"></script>

  <script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-4209567-5', 'auto');
  ga('send', 'pageview');
  </script>

  <html>

...[until generation limit, ~1600 characters; note the absence of any valid closing tags]

The low quality of the text might reflect the demands of simultaneously modeling text+HTML+CSS in a too-small RNN or insufficient training (training a high quality char-rnn/torch-rnn to convergence on the entire Geocities corpus, or even just a few hundred megabytes, could take weeks).


Yes, it turns out to be char-rnn/torch-rnn: 'RNN was 512-node, 3-layer' https://twitter.com/aanand/status/728738421758414849

He wasn't training on his own computer, so aside from the network being much too small (for modeling at least 3 languages of text+HTML+CSS, I would start with 1k neurons and go up from there), it probably wasn't trained to convergence either. The 50MB corpus is maybe a bit on the small side, but probably not the root cause of the low text quality.
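
For anyone curious what that means concretely, here's a rough sketch of a 512-unit, 3-layer character-level model and its sampling loop. This is my own illustrative PyTorch reconstruction, not the author's torch-rnn code, and all the names are made up:

  # Illustrative sketch of a "512-node, 3-layer" char-level language model.
  # Not the author's torch-rnn setup; purely a reconstruction for intuition.
  import torch
  import torch.nn as nn

  class CharRNN(nn.Module):
      def __init__(self, vocab_size, hidden_size=512, num_layers=3):
          super().__init__()
          self.embed = nn.Embedding(vocab_size, hidden_size)
          self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
          self.head = nn.Linear(hidden_size, vocab_size)

      def forward(self, x, state=None):
          h, state = self.lstm(self.embed(x), state)
          return self.head(h), state

  def sample(model, stoi, itos, prime="<html>", length=1600, temperature=0.8):
      # Condition on a priming string, then emit one character at a time,
      # feeding each sampled character back in -- same idea as char-rnn's sample.lua.
      model.eval()
      chars = list(prime)
      with torch.no_grad():
          idx = torch.tensor([[stoi[c] for c in prime]])
          logits, state = model(idx)
          for _ in range(length):
              probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
              nxt = torch.multinomial(probs, 1)
              chars.append(itos[nxt.item()])
              logits, state = model(nxt, state)
      return "".join(chars)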


Agreed about wishing there were an explanation.

It's not clear to me that NNs are the best way to get interesting Franken-pages. Maybe a Markov chain instead?


The best way is to have the NN generate the HTML / CSS / JS code and then use a Markov chain to create the content to populate the page with.
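
Something like this, say; a minimal word-level Markov chain sketch (the order-2 default and the function names are just illustrative):

  # Toy word-level Markov chain for generating page copy; illustrative only.
  import random
  from collections import defaultdict

  def build_chain(text, order=2):
      words = text.split()
      chain = defaultdict(list)
      for i in range(len(words) - order):
          chain[tuple(words[i:i + order])].append(words[i + order])
      return chain

  def generate(chain, order=2, length=50):
      key = random.choice(list(chain.keys()))
      out = list(key)
      for _ in range(length):
          followers = chain.get(tuple(out[-order:]))
          if not followers:
              break
          out.append(random.choice(followers))
      return " ".join(out)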

It also has to do with the quality of the input. The texts on Geocities pages aren't exactly pinnacles of clear and legible language. Same with the HTML.


Markov chains can't keep track of nested stuff though, I thought?

I suppose that NNs have some sort of limit on how many levels of nesting they can handle, but I thought they tended to learn nesting fairly well, unlike Markov chains?

I could be wrong about this.
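
To make that intuition concrete, here's a toy sketch of what I mean (my own assumption, not anything from the article): an order-k chain only conditions on the last k characters, so once the opening tags scroll out of that window it can't know how many closes are still owed.

  # A character-level, order-4 Markov chain trained on perfectly nested markup
  # still emits unbalanced tags, because it only ever sees the last 4 chars.
  import random
  from collections import defaultdict

  corpus = ("<div>x</div> " + "<div><div>x</div></div> ") * 100
  order = 4
  chain = defaultdict(list)
  for i in range(len(corpus) - order):
      chain[corpus[i:i + order]].append(corpus[i + order])

  state = corpus[:order]
  out = state
  for _ in range(300):
      nxt = random.choice(chain[state])
      out += nxt
      state = state[1:] + nxt

  print(out.count("<div>"), out.count("</div>"))  # counts usually don't match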


It looks a lot like the stuff I produced a while back with an rnn. I just fed the rnn raw web pages for a giggle. It does surprisingly well, often managing to close braces correctly. It really goes to show how good the browsers are at dealing with junk html. I was thinking about trying to jointly train the html and an image of the webpage, in the vague hope that you might be able to go from image to webpage. One day. Probably quite soon...


GeoCities was the spark that ignited my interest in programming. I can remember hacking together awful HTML, CSS and JavaScript for a Sonic the Hedgehog fan page as early as middle school. Now the "home page" has been replaced by Facebook and Twitter. The culture of the Internet has changed and future generations are spared the need to even think about tables or color codes (unless they actively pursue web development).

And sure, there's undeniable value in convergence. But part of me will always miss those magical days of animated GIFs sliding across my CRT.



I will confirm that's why I made it.

Doesn't look like web sites are going to be replaced by a neural network anytime soon. Not sure if I prefer the Orwellianism of social networks to a Borg cube at this point.


That's my case too. I think somebody should study how fan pages made by 90s kids were the motivation for a generation of programmers.


I carefully followed this page to make a Dragonball fan page using MS FrontPage. I also added "cool" touches like Comet Cursors, MIDI music, and MP3 files with the extensions changed so Xoom wouldn't reject them.

http://labrocca.com/htmlementary/


I know some developers who write code like this:

index var menubar = 'solid //speed of NS message master pathwave seconds // Don't edit below! var message="./www.oocities.com/Wraith.js"); //for yellow you are browsers var message="Merry You are moving your source code, it you should you have decide to one of the counter." var yvAlt = 10; var xpos=mysteps - var ypos=new Array(); function snowNS() { if (document.all) { // show position var a = slide.obj - expires.length; msg += (de_coselect(dy); if (remarks < red.focus); i <= cool ; //if layer is the area b = obj.visibility = eval(window) } // end hiding craft // set the link is the window // showdown down for idiot swd obj = document.getElementById(e); if((obj.top=top[i]+spring) setTimeout("move( tmLn.left+":"+minutes") + musionhamessages; } // End Created by GeoCities Home Page Generator Yahoo! GeoCities - father_gabriel's Home Page function Help(daLink) { var helpWnd=window.open(daLink,"help","width=500,height=500,scrollbars=yes,dependent=yes"); } // COLOR HEAD1 HEAD2 LINE1 BODY



In case it's not clear (it wasn't to me), refreshing the splash screen and clicking the big `Enter` will show you a different generated page.


As will clicking any of the links on one of the generated pages.


I got an entire website inside a <marquee> tag; Chrome showed it moving across the screen. Didn't know such a thing was possible, I am impressed.

Also, looking at the actual page source, it makes no sense whatsoever, so this is probably saying more about the fault tolerance of modern rendering engines than about the NN used to generate these pages :)


This has to be the most interesting way to fuzz-test a browser's rendering engine.


Modern browsers are supposed to have finally standardised on how to deal with garbage HTML like this, but it'd be interesting to see if that's actually the case.
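
One cheap way to poke at that: html5lib implements the HTML5 parsing (and error-recovery) algorithm the browsers standardised on, so you can see what a spec-compliant parser makes of junk markup without opening a browser. Illustrative snippet, nothing to do with the project itself:

  # Feed garbage markup through a parser that follows the HTML5 error-recovery rules.
  import html5lib
  import xml.etree.ElementTree as ET

  junk = "<b><marquee><p>hello</b> <i>world"
  tree = html5lib.parse(junk, namespaceHTMLElements=False)
  print(ET.tostring(tree, method="html").decode())
  # Prints a repaired <html><head></head><body>...</body></html> document.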


I do see one missed opportunity here, and that was to frame this project not as an NN-based Geocities page generator, but as a mystery.

Imagine instead a much more cryptic domain name and no landing page - just a random page each time (but keeping the shareable links). No credits or attributions, inviting speculation. It could have been the next Horse_ebooks!


Just in case you didn't know about it, or need a detox: http://www.oocities.org/


I think the takeaway here is that it generates even remotely valid HTML in addition to remotely coherent content.

I'm not sure why they include those irrelevant promotional quotes on the landing page. Also, zero details on the properties of the network they used.


Sometimes I think it's better to leave out the details and just let the experience be; not everything needs a how-it's-made blog post, and the lack of one here feels refreshing in a way.


The quotes are for fun I reckon. They have that homemade quality. Like something you'd see on the back of a book self-published by yer Mammy.


The brokenness was not the dominant feature of Geocities; it was more the individualism in using a limited set of design resources.

This project should be tuned more in that direction.

Is the source available?


Well, it took me about 5 minutes of clicking through oocities to find this: http://www.oocities.org/tokyo/4633/xindex.html


This is high art. Congrats.


I like to imagine randomly clicking "Enter" and having today's Drudge Report pop up


It's about time that the Library of Babel had a Web presence.


This literally makes my day. They should bring back Geocities


Is there an archive of TimeCube we can train this thing on?


The first page I saw reminded me an awful lot of the time cube page.


Yeah, why would you even need to seek out Time Cube-like input? I got a page with a large bold, italic, underlined link that said "CLICK HERE TO BE GOD 4".


Aren't all of these just available at the Wayback Machine? Was really hoping this person found a way to display original pages correctly with their images intact.


These are randomly generated based on things from geocities.


Where does the neural network come in?


The HTML contents of Geocities pages are the input to the NN, which then creates the page you get when you hit ENTER.


Reminds me of the days of sites like hell.com


I feel like I just hacked too much time. o.O


aol -> geocities -> friendster -> myspace -> facebook -> _next_





