What is this?

A page full of articles generated by a computer trying to mimic the SCP Foundation wiki. Just in case it isn't clear: all of the articles are generated by a computer and not a human. If you decide to use them anywhere, please include a similar disclaimer.

You might have heard of OpenAI's GPT-2 model, which is basically a magic box that has been trained on about 40GB of internet data to generate text. People have done all kinds of stuff with it, and I decided to make it write SCP articles. Interestingly, the raw model is quite good at this: the SCP wiki must have been in the training data, and the model is robust enough that it can reproduce the style of SCP articles. However, I used this fork, which adds finetuning, and finetuned it on SCP articles for better accuracy.

Why do articles sometimes stop in the middle of a sentence?

There is a fixed window size in the model and I didn't manage to get it to produce arbitrarily long outputs, so it often hits the limit of ~1024 tokens (a token is a word or a piece of a word) before generating the whole article. If you have any tips on how to modify the GPT-2 code to produce longer outputs, I would love to know.
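One common workaround for a fixed window is to generate in chunks and feed only the tail of the text so far back in as context. Below is a minimal sketch of that idea, not the code used for this site. `generate_fn` is a hypothetical wrapper around whatever sampler you use, and the `window` and `chunk` defaults are assumptions you would tune.

```python
from typing import Callable, List


def generate_long(
    generate_fn: Callable[[List[int], int], List[int]],
    prompt: List[int],
    total_new_tokens: int,
    window: int = 1024,
    chunk: int = 256,
) -> List[int]:
    """Generate more tokens than fit in one context window.

    generate_fn(context, n) is a hypothetical callback: it should return
    up to n new token ids that continue `context`.
    """
    if not 0 < chunk < window:
        raise ValueError("chunk must be positive and smaller than window")
    output = list(prompt)
    remaining = total_new_tokens
    while remaining > 0:
        n = min(chunk, remaining)
        # Keep only the most recent tokens so context + new tokens fit the window.
        context = output[-(window - n):]
        new_tokens = generate_fn(context, n)
        if not new_tokens:
            break  # the sampler returned nothing, so stop early
        output.extend(new_tokens)
        remaining -= len(new_tokens)
    # Return only the newly generated part, without the prompt.
    return output[len(prompt):]
```

The trade-off is that once the article is longer than the window, the model no longer sees the beginning (item number, object class and so on), so later parts can drift away from the start.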

Are the SCP numbers relevant in any way?

The article with number X almost certainly has no connection to the article with the same number on the SCP wiki. However, as the article generation progressed I kept fixing bugs and the finetuning progressed, so articles with later numbers should be slightly better. The Bonus SCPs page is completely separate and should probably contain the best SCPs, as it's generated by the latest version of the model.

Some of the articles are complete gibberish!

Yeah, that's how the cookie crumbles when it comes to machine learning. I didn't filter the item database itself at all. The tales and other pages are manually filtered because otherwise there would be empty pages and many copies of the Log of Unexplained Locations.

Why is it not possible to vote on the articles?

Hosting static webpages is easier, and voting would require a dynamic webpage.

Why not host it directly on wikidot.com like the SCP wiki?

Yeah, that seems like a good idea. I should have thought of that before I wrote all of this static page generation.

Formatting is kinda broken or less pretty than on the actual wiki!

Another thing I wish I had done differently from the start. When transforming the website to a text format I removed most of the formatting data. I should have put in some markers for stuff like collapsible sections of an article or interviews. As of now the only way to add the formatting to the articles would be manually and I'm not gonna do that to 1000+ articles.
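For anyone repeating this, here is a rough sketch of how markers could be kept during the HTML-to-text step. It swaps in Python's standard `html.parser` instead of the text conversion actually used here, and the `collapsible-block` class name and the `[COLLAPSIBLE]` marker strings are assumptions; check the real page markup before relying on them.

```python
from html.parser import HTMLParser


class MarkerExtractor(HTMLParser):
    """Convert HTML to text, wrapping collapsible blocks in marker lines."""

    # Assumed class name for collapsible sections; verify against real pages.
    COLLAPSIBLE_CLASS = "collapsible-block"

    def __init__(self):
        super().__init__()
        self.parts = []
        # One entry per open <div>: True if that div is a collapsible block.
        self.div_stack = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        is_collapsible = self.COLLAPSIBLE_CLASS in classes
        self.div_stack.append(is_collapsible)
        if is_collapsible:
            self.parts.append("\n[COLLAPSIBLE]\n")

    def handle_endtag(self, tag):
        # Emit the closing marker only when the matching div was collapsible.
        if tag == "div" and self.div_stack and self.div_stack.pop():
            self.parts.append("\n[/COLLAPSIBLE]\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_marked_text(html):
    parser = MarkerExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

With markers like these in the training text, the model has a chance to learn where such sections start and end, and the static page generator could turn the markers back into real formatting.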

Technical details

I used the bigger 345M model, trained it for several days (about 10 000 epochs) and got to ~2.5 average loss. The dataset was created by webscraping the SCP wiki, running it through elinks to convert it to text, and then using sed to fix details that got mixed up in the conversion. In the end it was 110MB of raw text data. I removed most of the "useless" files, like lists of articles, but didn't quite catch everything. There were several copies of the Log of Unexplained Locations, so once in a while the model just generates that article word for word. Other than that I didn't notice any overfitting.
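Duplicated pages like that could be dropped before training with a simple hash-based pass. This is a minimal sketch, not part of the original pipeline; the function names are made up for illustration, and it only catches copies that are identical after whitespace and case normalization.

```python
import hashlib
import re


def normalize(text):
    """Collapse whitespace and lowercase so trivial differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(documents):
    """Return the documents in order, keeping only the first copy of each."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # an identical page was already kept
        seen.add(digest)
        unique.append(doc)
    return unique
```

Copies that differ by more than whitespace or case (for example an extra footer) would slip through; catching those would need some kind of similarity measure instead of an exact hash.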

Some highlights

SCP-595 - "SCP-595 is a Machete-brand, fully-automatic, .45-caliber Glock 19 that is engraved with the phrase “Christ there,” as well as the phrase “Can’t he suffer for his sins?” "

Ordinary toaster - "A toaster, also known as a "regular" toaster, is a non-anomalous item commonly found in or near a home. Toasters are no more or less dangerous than other toasters, but they do not produce the same works as a normal toaster. The difference is that toasters are designed specifically as to be used for automated, repetitive or homicidal purposes."

Jude's Last Ride - "There are six evils in the world. The Karcists, the Monsters, the Mad Men and the S Theologians. "

The Old Laboratory - "He smiled. "You're right. It's pretty clear that this building is some kind of resummoning facility. Snake Oil is a big brand." "

The Vampire Who - " "Hey Doc. Why do they have to be so fucking ugly?" I asked, poking around my swanky multicolored jumpsuit. "Well, we give them great lighting, shade, and gear and all that. They'll still probably look good, though." "

SCP-1599 - Gothic cathedral from the future along with an exploration log (?) - "Dog: Sputrino. Sputrino is bullshit. Sputrino is ugly. Sputrino never lasts. Sputrino is a Crooked Stitcher. Sputrino can't be going anywhere. It's bullshit. Sputrino is just a cliche term for Crooked Stitcher. Sputrino is garbage. Can't you see how retarded this is taking me? Like, how can I even comprehend how bullshit this is? Sputrino is crap. Sputrino is not a… thing."

SCP-099 - A magnetic hexagon that dissolves everything in its field of view and teleports so it's visible to all observers.

SCP-960 - "Happy Employment Times!!!"

SCP-594 - Emerald lump that emits "purple noise", floats in people's faces and eats throats.

SCP-038 - "SCP-038 is a 20.7cm (tall) aftershock pistol defined in ancient Greek mythology as a tool used by the Prometheus' Warrior and a regular security weapon for the Greek god Odysseus. "

SCP-288 - A camper van of unknown anomalous properties. A D-class has a nice one-sided conversation with the van (about his four boats, among other things).

page revision: 1, last edited: 2019-05-14 12:54:22.852705
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License