On the significance of Gwern’s poem generator

Vast waves of researchers must crash
on cliffs of ignorance
to grind them down into a stash
of knowledge fine as sands.

I first saw automatically generated poems in 1998. They were randomly arranged picks from a corpus of clichéd goth lines, clearly terrible and presented as a parody of terrible goth poetry. I had run into them because I had just started to study IT, and this was one of the first examples of CGI programming I happened to find. I played with it for longer than I otherwise might have, because I was writing terrible goth poems.

Anyway, I did know about the Turing test (where a machine passes if it cannot be reliably distinguished from a human) and I knew machines were getting better at seeming like humans, because my dad was having way too much fun coding a chatbot and pointing out how it was better than ELIZA. Those terrible poems seemed like the same kind of thing, where machines would inevitably get better at seeming human. And poems are clearly easier to fake than conversation. So, it has been twenty years. Did the machines get better at poetry? Oh yes. Oh hell yes.

The white-petalled white fox
Opens himself to coolness
In the late evening.
But when the last child started
The white fox to his feet flew,
And the old fox was master
Of all the magic heathen.
Till when the faint huntsman
Had snuffed the fragrant water
Over his plump ears and skin,
In the old way he knew not
Till morn had almost shone;
And then the fox came slowly
And left the place unguessed;
The white fox was not master,
Although he had been master,
Although he had been servant
And now he could be master
Of all the magic powers
That keep the place enchanted
In the wide earth and water.

Impressive, isn’t it? Great choice of individual words, a consistent style and theme, and good use of juxtaposition; the narrative, while too inconsistent for prose, is good enough for what passes for poetry these days. Here, have another sample:

I had a vision
Of an old and stubborn old man,
His hair was pale, and thin,
His face was all forlorn,
And the moon was full in the air,
And a spirit passed over his brow,
And its face was all for ever.
And he spoke:
‘Have we ever a dream?
Have we ever a vision
Of the ghost’s ghost?’
The Master gave the word:
‘By the breath I know
The meaning of Death:
Can it be ‘hush?
Have we ever a dream?’
The spirit said:
‘By the breath I know,
The meaning of Death,
You will see a ghost
Stand by the door
And enter.’

If you’re a poet, this should chill you to the marrow of your bones. Because you’re a poet and you’re hypersensitive to beautifully chosen words. And because this is clearly better than some of the human poetry you’ve read, and possibly better than some of the poetry you’ve written. Definitely better than some of mine, anyway.

I did pick my two favorites from a large selection of samples, most of which failed to arrest me in the same way. But still, something categorical has just happened to poetry. If you had read those lines a year ago, you would have been so certain they were by a human that you wouldn’t even have thought to doubt it. That certainty is never coming back.

These two poems are from GPT-2, a neural network language model developed by OpenAI and fine-tuned by the inimitable Gwern, a clear contender for a lifetime achievement Ig Nobel for his amazing work analyzing darknet drug markets, catnip and the genetics of My Little Pony: Friendship is Magic, to name just a few. His article on the project is fascinating, even with my very rudimentary understanding of machine learning methods. Gwern makes clear this is nowhere near the limit of what is possible even today: it is only what a single (very capable) guy can do in a little of his spare time using publicly available tools and no budget. AI has been progressing faster than I suspect most poets know, and so many resources are being poured into it that most experts expect AI capabilities to continue to grow rapidly.

We build machine intelligence
and we have never been
this close to where our purpose ends:
the Victory Machine.

I have been following progress in machine learning and AI a bit, so I’m less surprised. I expect that machines will continue to learn to do more human tasks, and since the various kinds of stringing words together are just another set of tasks, the only question remaining is how difficult these writing tasks are relative to image comparison, speech recognition or driving a car. I guess writing a paper that gets accepted by Nature is harder than writing a novel that sells ten thousand copies under a human-seeming pseudonym, which is itself harder than the classic Turing test. And writing free verse poetry that sells the couple dozen to a few hundred copies that pass for success in poetry today? Or gets accepted into a poetry magazine? That’s easier than any of those.

The point of all this, of course, is that highly structured poetry of the type I write continues to be much harder to get out of a machine than free verse. Automating what I do should be harder than the Turing test, I think, although probably not harder than the convincing novel. The main reason is that current machine learning methods need a lot of training data, and there simply isn’t much Common metre poetry to train on. A haiku generator would be fairly easy, because there are a lot of haiku. Training on the Mahābhārata might also easily yield something that produces Mahābhārata-like material that could fool many readers. But as far as I understand it, current machine learning methods also aren’t good at producing output that satisfies a large number of strict and orthogonal rules. No doubt an evaluation function could be written that discards everything without exactly correct line lengths, near rhymes, meter and correct grammar. But if I understand correctly, a too-strict evaluation rejects nearly every attempt, which leaves little to learn from and makes it very hard to learn a good model of a solution. So that kind of thing remains the prerogative of humans, for now.
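To make the idea of such an evaluation function a little more concrete, here is a minimal Python sketch of a hard filter for Common metre quatrains. It is a toy built on loud assumptions, not how Gwern or anyone else actually filters generated poems: it guesses syllables by counting vowel groups in the spelling, treats two words as rhyming if their endings match or, as a crude stand-in for near rhyme, if their final vowels match, checks only the 8-6-8-6 syllable pattern and an ABCB rhyme, and ignores meter (stress) and grammar entirely. All function names are hypothetical.

```python
import re

# Hypothetical sketch of a hard "reject anything that breaks the form" filter
# for common-metre quatrains. Everything here is an illustrative assumption:
# syllables and rhymes are guessed from spelling alone, and meter (stress)
# and grammar are not checked at all.

VOWEL_GROUPS = re.compile(r"[aeiouy]+")
COMMON_METRE = (8, 6, 8, 6)  # syllables per line


def letters(word: str) -> str:
    """Lowercase a word and drop everything that is not a letter a-z."""
    return re.sub(r"[^a-z]", "", word.lower())


def syllables(word: str) -> int:
    """Guess syllables by counting vowel groups, minus a few silent endings."""
    w = letters(word)
    count = len(VOWEL_GROUPS.findall(w))
    if count > 1:
        if w.endswith("e") and not w.endswith("le"):
            count -= 1  # silent final e: "fine", "time"
        elif w.endswith("es") and w[-3] not in "sxzcg":
            count -= 1  # "waves", "rules", but not "horses"
        elif w.endswith("ed") and w[-3] not in "td":
            count -= 1  # "undetermined", but not "wanted"
    return max(count, 1)


def line_syllables(line: str) -> int:
    """Sum the guessed syllables of every token that contains a letter."""
    return sum(syllables(tok) for tok in line.split() if letters(tok))


def rhyme_tail(word: str) -> tuple[str, str]:
    """Return (final vowel group, ending from that group on), skipping a silent e."""
    w = letters(word)
    if w.endswith("e") and len(VOWEL_GROUPS.findall(w)) > 1:
        w = w[:-1]  # "ignorance" -> "ignoranc"
    groups = list(VOWEL_GROUPS.finditer(w))
    if not groups:
        return "", w
    last = groups[-1]
    return last.group(), w[last.start():]


def rhymes(a: str, b: str) -> bool:
    """Exact rhyme if the endings match; crude near rhyme if the final vowels match."""
    vowel_a, tail_a = rhyme_tail(a)
    vowel_b, tail_b = rhyme_tail(b)
    return tail_a == tail_b or (vowel_a != "" and vowel_a == vowel_b)


def passes_common_metre(stanza: str) -> bool:
    """Accept a quatrain only if it scans 8-6-8-6 and lines 2 and 4 (near-)rhyme."""
    lines = [ln for ln in stanza.strip().splitlines() if ln.strip()]
    if len(lines) != len(COMMON_METRE):
        return False
    if any(line_syllables(ln) != n for ln, n in zip(lines, COMMON_METRE)):
        return False
    return rhymes(lines[1].split()[-1], lines[3].split()[-1])


if __name__ == "__main__":
    opening = """Vast waves of researchers must crash
on cliffs of ignorance
to grind them down into a stash
of knowledge fine as sands."""
    print(passes_common_metre(opening))  # True
```

This toy accepts the quatrain at the top of this post and the closing one about "the higher order rules of rhyme", but it rejects the "To show you’re not a bot" quatrain, which also scans 8-6-8-6, because the spelling heuristic counts "idiosyncrasy" as five syllables and "poetry" as two. Its near-rhyme rule, meanwhile, is far too lenient: it would happily pair "crash" with "sands". Hand-written rules like these end up either too loose or too strict, and getting them both strict and right is the hard part.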

To show you’re not a bot, just use
idiosyncrasy.
For instance, you could simply choose
to write in poetry.

The higher order rules of rhyme
and meter need a strength
unique to humans, for a time
of undetermined length.