Show HN: This Word Does Not Exist (thisworddoesnotexist.com)
956 points by turtlesoup on May 13, 2020 | 342 comments



Wow. GPT2 is so, so, so much better than Markov chains. I'm reading these definitions, and the fact that the last few words of the sentence match the first few words subject-wise is pretty amazing. Just some random ones:

> denoting or relating to a word (e.g., al-Qadri), the first letter of which is preceded or followed by another letter

> a synthetic compound used in perfumery and cosmetic surgery to improve the appearance of skin tone and irritation

> a type of cookie made with dough, jelly, butter, or chocolate, often filled with extra flour

Pretty impressive. I've never seen fake text so real. (I mean none of these seem to quite make 100% logical sense, but if you were just skimming the sentence nothing would stand out as a red flag.)


I always like to point people to /r/SubSimulatorGPT2 [1] as a good example of what GPT2 is able to accomplish.

It's a subreddit filled entirely with bots; each user is trained on a specific subreddit's comments matching its username (so politicsGPT2Bot is trained on comments from the politics subreddit).

Go click through a few comment sections and see how mind-bendingly real some comment chains seem. They reply quoting other comments, they generate links entirely on their own (they almost always go to a 404 page, but they look real and are in a format that makes me think they're real every time I hover over one), they have full conversations back and forth, they make jokes, they argue "opinions" (often across multiple comments back and forth, keeping the context of which "side" each comment is on), and they vary from single-word comments to multi-paragraph comments.

Take a look at this thread [2] specifically. The headline is made up, the link it goes to is made up, but the comments look insanely real at first glance. Some of them even seem to be quoting the contents of the article (which, again, doesn't exist) in its comments!

If you threw something like 50% "real humans" in the mix, I genuinely don't think I'd be able to pick out the bots on my own.

[1] https://www.reddit.com/r/SubSimulatorGPT2/

[2] https://www.reddit.com/r/SubSimulatorGPT2/comments/fzwso5/nr...



This one is also fantastic, the post text itself is so goddamn convincing that I actually searched on youtube for the name of the show to see if it existed (it doesn't).

https://old.reddit.com/r/SubSimulatorGPT2/comments/gj2z4f/ia...


I particularly like the "edit" in light of the "mod comment".

The "edit": UPDATE: I just wanted to say that I'm a huge fan of all of you. The subreddit is extremely well run, and the amount of support and encouragement that I've been getting from the community is incredible.

The "mod comment": "Hi Iamacreamt! Your post has been removed because this topic isn't suitable for /r/IAmA."


These comments within that post were also really impressive. It still seems off, but the fact that it carries the topic between replies in a thread is so crazy to me:

> How do you feel about the internet arguments subreddit, r/iamverysmart? > As a young professional, where do you prefer to have internet arguments with your friends? I've been thinking about leaving YouTube for Reddit, but I don't currently have much free time, and I want to have some quality back and forths with my friends.

---

> I love that subreddit. I love how open minded it is. The only reason why I didn't like it was because it's so popular. I'm glad that I'm not the only person who has this problem, because I never had any reason to like it.

---

> Thanks for the reply :) I had a bit of an argument with one of my friends over reddit, and I was wondering if you could give me insight on the way you handled it?


What stands out to me is that the comments are treating "arguments" as a passion, as if it were "apples" or "old cars".

"I had many arguments growing up"

"where do you prefer to have internet arguments with your friends"

It interpolates "arguments"-related stuff into the posts as with any other topic. It does it well, but seems to have missed that it's not a conversational topic like most others. We don't talk about arguments like we talk about basketball. But here's a peek into what it would look like if we did!

(I also enjoyed the bot signing both a question and the immediate reply with ~Alex)


Treating arguments like sport is basically politics in a nutshell for some these days


That seems like an idiosyncratic but plausible thing to talk about as if it's a conversational topic like most others?

Although most people probably don't prefer to have internet arguments (especially with their friends who sign both a question and the immediate reply with ~Alex).


I am enjoying them being meta as hell on this thread though:

https://www.reddit.com/r/SubSimulatorGPT2/comments/caaq82/we...


> I think the US has turned from a police state into the police state we see today. They're just using more tools to keep us safe in the eyes of the government. One major tool that I can think of is the TSA. The TSA is a tool to keep us safe, not to keep us safe. I believe the government and TSA have become a one party system. They use the TSA as a way to keep us safe, and then use the TSA as a weapon against us if we're too annoying. A lot of people do not understand the government or TSA. It's very easy to do what I mentioned above.

-- https://old.reddit.com/r/SubSimulatorGPT2/comments/gj7ony/cm...

I guess sometimes it errors out :D


I subscribe to it so it’s mixed into my front page and every now and then, I read a post and get a good way into the comments before I realize what sub it is.


This one has a bunch of bots talking to each other with eerie perfection: https://www.reddit.com/r/SubSimulatorGPT2/comments/giw40p/wh...



Wow, this is so much better than the original /r/subredditsimulator (which was Markov Chains).

That's a very fun thread you linked - it's very believable!


One fun part - we used the inline metadata trick to train a single GPT-2-1.5b to do all the different subreddits. It allows mutual transfer learning and saves an enormous amount of space & complexity compared to training separate models, and it's easy to add in any new subreddits one might want (just define a new keyword prefix and train some more). Not sure that trick is meaningful for Markov chains at all!


What is the inline metadata trick?


It's an old trick in generative models, I've been using it since 2015: https://gwern.net/RNN-metadata When you have categorical or other metadata, instead of trying to find some way to hardwire it into the NN by having a special one-hot vector or something, you simply inline it into the dataset itself, as a text prefix, and then let the model figure it out. If it's at all good, like a char-RNN, it'll learn what the metadata is and how to use it. So you get a very easy generic approach to encoding any metadata, which lets you extend it indefinitely without retraining from scratch (reusing models not trained with it in the first place, like OA's GPT-2-1.5b), while still controlling generation. Particularly with GPT-2, you see this used for (among others) Grover and CTRL, in addition to my own poetry/music/SubSim models.
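
To make the idea concrete, here is a minimal sketch of what inlining metadata can look like on the data side. The `<|sub:...|>` tag format and the function names are made up for illustration; the comment above only says the metadata goes in as a text prefix, so treat the exact format as an assumption.

```python
# Hypothetical tag format: the thread only describes "a text prefix".
def tag_example(subreddit: str, text: str) -> str:
    """Inline the category into the training text itself."""
    return f"<|sub:{subreddit}|> {text}"


def build_corpus(examples: list[tuple[str, str]]) -> str:
    """Join tagged examples into one plain-text training file."""
    return "\n".join(tag_example(sub, text) for sub, text in examples)


def prompt_for(subreddit: str) -> str:
    """At sampling time, the same prefix steers generation toward a category."""
    return f"<|sub:{subreddit}|> "


corpus = build_corpus([
    ("politics", "The new bill passed the Senate today."),
    ("askscience", "Why is the sky blue?"),
])
print(corpus)
print(repr(prompt_for("politics")))
```

Adding a new category under this scheme is just a matter of picking a new prefix string and training on more tagged text; nothing about the model architecture changes.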


I almost got miffed about this one complete jerk. Then I remembered how it was generated and laughed through the whole thing - it is so uncanny. Genuine Reddit emotions were had.



Hah, I love this one: https://www.reddit.com/r/SubSimulatorGPT2/comments/fzwso5/nr...

"You're arguing against yourself!"


Wow. Now I really wish they had released a pretrained GPT2 model.


There is a whole news agency that is built upon GPT-2. There is also a social media influencer bot that uses GPT-2, responds to comments, and is mostly coherent.


I am deeply disappointed that /u/SubSimulatorGPT2GPT2Bot does not exist.


A recurring trope on Hacker News: whenever a text generator project (either RNN or GPT-2 based) is posted, there is inevitably a comment saying "this is indistinguishable from a Markov chain."

In this case, it's impossible to do so.


If someone could make a billion word dictionary of these words, you could get excellent sentence compression rates out of standard English.


GPT-2 itself works off of excellent sentence compression (byte-pair encoding for the input and output)


so for example, a byte-pair input could be "potato salad in rum" and the output might be "potadum"?
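
Not quite: byte-pair encoding is lossless. It does not blend words into a new one; it splits text into sub-word pieces by repeatedly merging the most frequent adjacent pair of symbols. Below is a toy sketch on characters with a four-word training list. GPT-2's real tokenizer works on bytes with a much larger learned merge table, so this only illustrates the mechanism, and the function names are my own.

```python
from collections import Counter


def apply_merge(seq: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace each adjacent occurrence of `pair` with one joined symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Greedily learn the most frequent adjacent symbol pairs."""
    seqs = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        seqs = [apply_merge(seq, best) for seq in seqs]
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by replaying the learned merges in order."""
    seq = list(word)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


merges = learn_merges(["potato", "potato", "pot", "tomato"], num_merges=3)
tokens = encode("potato", merges)
print(merges)
print(tokens, "->", "".join(tokens))
```

Joining the tokens always gives back the original word, so "potato salad in rum" would come out as a shorter list of pieces that still spells "potato salad in rum".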


Great stuff. Found a real word:

  refactoring

  the systematic and systematic reworking of a piece of text to reduce unnecessary redundancies
Interesting that the made-up definition is pretty much the real one.

https://www.thisworddoesnotexist.com/w/refactoring/eyJ3IjogI...


Got a few comments about that one and a few others! I just updated the existing word blacklist and re-deployed.


"fibroblastosis" -- appears to feature in some medical journals

"bryosphere" -- something to do with moss

"biosprint" -- a brand of yeast.

Many of these 'non-words' have already been taken....


Oh, that second one is cute. I read up on bryophytes (moss and friends) for an exceedingly brief stint back in my undergrad days. Pronunciation similarity to bio made for many "bryo" puns.


I got Undercrowded

  undercrowded
  un·der·crowded
  (of a place) not full of people or vehicles
  "the area was undercrowded with traffic"


Oh, I'm surprised the "blacklist" isn't just the standard English dictionary. I'm sure I'm just being naive though. Why not just blacklist any word that already exists in English?


There are some subtleties (e.g. hyphens, derived forms, bigrams, etc.) but the biggest problem is that most English dictionaries don't have entries for every scientific word / piece of internet slang. I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(
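
For a rough idea of what building a blacklist from a text dump involves, here is a small sketch. It is not the project's code: the regex, the function name, and the two sample sentences are assumptions for illustration, and a real Wikipedia dump would need markup stripping first.

```python
import re

# Lowercase alphabetic tokens, allowing internal hyphens and apostrophes.
TOKEN_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def build_blacklist(texts: list[str]) -> set[str]:
    """Collect every token seen in the corpus as a 'known word'."""
    blacklist: set[str] = set()
    for text in texts:
        blacklist.update(TOKEN_RE.findall(text.lower()))
    return blacklist


blacklist = build_blacklist([
    "Bryophytes are non-vascular plants.",
    "Intermodulation occurs in amplifiers.",
])
print(sorted(blacklist))
print("intermodulate" in blacklist)
```

The last line shows the derived-form problem: a corpus can contain "intermodulation" without ever containing "intermodulate", so an exact-match lookup lets the latter through.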


> I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(

That sounds like an impressive project in itself :)


Words not on Wikipedia, found on other sources, listed by frequency (perhaps with a date-weighting of the source document to reduce rating of older sources), would be an interesting way to find holes in Wikipedia's coverage.


Someone should make a Wikipedia page of that list. Oh, wait.


I like how you had information, made a sarcastic comment about it, but didn't share the actual information ... just in case your comment might prove helpful ...


Are you saying the URL of that Wikipedia page is “actual information” that patrickthebold failed to share?

I think that page doesn’t exist. patrickthebold wasn’t sarcastically mocking people who were too lazy to look up that page. He was just making the point that as soon as a hypothetical list like that was uploaded to Wikipedia, it should be deleted, since those words would then be words found on Wikipedia.


That may be because there isn't a single definitive list of all words in the English language

https://www.lexico.com/explore/how-many-words-are-there-in-t...


If you're using a blacklist, is it really machine learning? Or are you using it to re-train?


The blacklist is probably there to avoid cases where it randomly generates a real word, like the two cases above, so the blacklist filter is probably applied after the ML stuff.
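
A generate-then-filter loop of that kind can be sketched as follows. This is a guess at the shape, not the site's implementation: `generate` stands in for the model, and the candidate words are taken from this thread purely as sample data.

```python
from typing import Callable, Optional


def sample_new_word(
    generate: Callable[[], str],
    blacklist: set[str],
    max_tries: int = 10,
) -> Optional[str]:
    """Keep sampling until a candidate is not a known word, or give up."""
    for _ in range(max_tries):
        candidate = generate()
        if candidate.lower() not in blacklist:
            return candidate
    return None


# Stand-in for the model: yields fixed candidates in order.
candidates = iter(["refactoring", "Deflategate", "strillation"])
known_words = {"refactoring", "deflategate"}

word = sample_new_word(lambda: next(candidates), known_words)
print(word)
```

The model itself is untouched; the filter only decides which of its samples get shown.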


Yup, the line for the blacklist lookup is here: https://github.com/turtlesoupy/this-word-does-not-exist/blob...


Data scientist here. It's common to define boundaries for a machine learning algorithm by hand. Think of telling a chess AI that it can't move pieces off the board.


Unspooled and Hardstyle popped up for me. Perhaps you should do a google search for generated words before displaying them to prevent existing words from being shown.


Or maybe the ML algorithm behind is just using an existing dictionary and performs no operation.


You flatter me by thinking so; it is a bot! The source code is open here: https://github.com/turtlesoupy/this-word-does-not-exist


It gave me "Intermodulate" a minute ago. https://en.wikipedia.org/wiki/Intermodulation


That one's a little more fuzzy; intermodulate doesn't occur very much in discourse (e.g. not in the wiki article at all) even though it would naturally be related


https://www.thisworddoesnotexist.com/w/bordellum/eyJ3IjogImJ...

Bordellum. It's a word, I think?

https://www.thisworddoesnotexist.com/w/disaproval/eyJ3IjogIm...

Disaproval. So close to a real word it looks like a misspelling.

Anyway, this is really interesting.


For the latter, press the "Write Your Own" button and it'll do exactly that

I don't fix a random seed, so in general you can get different definitions (only sometimes, though, since I cache data)


Thanks!


I got "cyberpolice" which seems to be a real word.

I also got "deflategate" which i love - it must be an upcoming scandal! :-)


Deflategate was a National Football League (NFL) controversy involving the allegation that New England Patriots quarterback Tom Brady ordered the deliberate deflation of footballs used in the Patriots' victory against the Indianapolis Colts in the 2014 American Football Conference (AFC) Championship Game.

https://en.wikipedia.org/wiki/Deflategate


> it must be an upcoming scandal!

It was, in 2015.

https://en.wikipedia.org/wiki/Deflategate


Multiple people managed to find it? How likely is it to generate the same non-word more than once? Is there a limited set?


I also got «invention» up as a result.


I just saw "gofundme".


I got "pataphysical".


Very impressive. It was reminiscent of the Polish sci-fi writer, Stanislaw Lem who invented more words than Shakespeare. That made for some tricky translations. Here's a list of new words in just one of the books alone:

https://en.wikisource.org/wiki/User:Hal_Bregg_II/Neologisms_...

This 1961 book predicted what Lem calls "opton", an electronic book reader with only one page between the covers (Kindle Opton special for Lem's 100 year anniversary next year would be nice!)

I wonder if someone has created sci-fi short stories with that data set yet.


I'm not sure that Shakespeare really invented those words. The Oxford English Dictionary tended to go for prominent writers in its first-use citations.


Here’s the quote of the 1961 e-reader prediction: https://i.imgur.com/e1x76Nz.jpg


"Stanislaw Lem who invented more words than Shakespeare."

SL's inventiveness is beyond reproach but he lived in a time (died 2006) rather later than Mr (I have 50 spelings of my name) Shakesper. Lem spoke several languages and a cursory glance at your link seems to show a lot of drinks!

Shake a spear did invent a huge number of English words. Some of them were due to the rather random speling existent in the Elizabethan England of the 16th C. The rest were the result of a creative mind that needed to deploy ideas and concepts in ways that were not available at the time. The clever thing is that he created many words that seem so obvious in meaning - and are so obviously "English". He literally understood English to its core and was able to manipulate it effectively. Good skills that man.


It is doubtful how many words were actually invented by Shakespeare, as opposed to a Shakespeare text just being the first recorded use of that word.


And we owe the word "robot" to Karel Čapek (AFAIK). "Rabota" - means work in Bulgarian, and I guess similar in many other slavic languages.


FYI, the robot and rabota are pretty close due to ‘robota’ meaning work in old Czech.


this is exactly how it was invented.

Also - "robot" was not invented by Karel Čapek but by his brother Josef. Karel was looking for a good word to describe mechanical workers for his theatre play called RUR.


Didn't know that - Thanks!


It would be neat to do this with different dictionary training sets such as a legal dictionary or biology glossary. I wonder if it could generate useful creative ideas in the mind of a professional.

Or, if feeling contentious, other wordspaces https://gender.wikia.org/wiki/Category:Gender_Identities


I have some starter code for urban dictionary here if you want to give it a go: https://github.com/turtlesoupy/this-word-does-not-exist/blob...

The early results were that it works, but noisier datasets are tough. The urban dictionary corpus also has a ton of racist definitions


Lem is such a witty writer and his short stories are so thoroughly enjoyable. He's very clever!

His work isn't like Terry Pratchett but I find that there is a commonality in their cleverness. Still can't get over Pratchett's 'ideon' (a fundamental particle that when it strikes the brain, creates an idea; hilarious!) or that weird story of Lem's where an abandoned traveler gene-chemical engineers a sapient civilization to build himself a ship so he can fly back.


Podcasters and Youtubers have been pronouncing squarespace incorrectly all this time!

squarespace

squares·pace

the ability to play in a board game within a system by controlling the left and right edges of the board or a grid surrounded by squares

https://www.thisworddoesnotexist.com/w/squarespace/eyJ3IjogI...


I was getting a lot of brand names, including Sporcle and Truvia.


Complete List (so far) of this X Does Not Exist sites:

• This Person Does Not Exist https://thispersondoesnotexist.com/

• These Lyrics Do Not Exist https://theselyricsdonotexist.com/

• This Cat Does Not Exist https://thiscatdoesnotexist.com/

• This Rental Does Not Exist https://thisrentaldoesnotexist.com/

• This Waifu Does Not Exist https://www.thiswaifudoesnotexist.net/

• This Resume Does Not Exist https://thisresumedoesnotexist.com/

• This Artwork Does Not Exist https://thisartworkdoesnotexist.com/


> Complete List (so far) of this X Does Not Exist sites

"citation needed" on that completeness claim, or rather "this list is incomplete, you can help by expanding it"

Also: https://thisfursonadoesnotexist.com/


I'm looking for a meta "This list does not exist" generator.



Disappointed that this was not a machine-learning-generated list of "This X Does Not Exist" generators.


Those cats are nightmare fuel. Too many tails, paws and everything is fur. Not to mention pose and body proportions being in uncanny valley territory even when it gets the number of appendages right...

https://i.imgur.com/saUivfe.png

shudder


When it's wrong it's really wrong.



This Exist Does Not Exist.... coming soon!


Is there a page that will AI generate “thisthingdoesnotexist” pages?

Asking for a friend


Straight from a science fiction novel:

    eicoscience
    eico·science

    the branch of physics that deals with the behavior, physics, and the properties of living organisms
    "I started thinking about the world thinking of me and I never really went deep into the eicoscience"
https://www.thisworddoesnotexist.com/w/eicoscience/eyJ3IjogI...


    neurotheistic

    relating to the theories of early visual culture,   
    particularly those which emerged out of attempts 
    by the Chinese to achieve a cosmopolitan consciousness 
    through the use of classical ideas
    "the neurotheistic ideas of Mao Zedong"
I like the word, but the definition does not seem to do much with the obvious etymology of the word. One could, e.g., imagine using the word for people who have developed a religious attachment to deep learning networks.


I really like this definition. If a human author took it and developed it, it would be a pretty original angle.


  strillation
  stril·la·tion
  the action or process of boiling (a subject, typically an 
  animal) at room temperature
I have to say this is pretty hilarious


Depends on the liquid you're using!! You can definitely boil a squirrel in liquid nitrogen!


Liquid nitrogen boils below room temperature though. I once strillated a squirrel on top of Mount Everest because the pressure was so low...


While this is technically true, it would more commonly be described as freezing.


No one said anything about pressure :)


Strikes me as very Douglas Adams in its absurdity


A word that probably should exist


I was honestly surprised this one didn't already exist, since it's such a natural extension of the roots:

reductory

re·duc·tory

relating to or denoting products of decay

"winding and burning reductory tissue"

https://www.thisworddoesnotexist.com/w/reductory/eyJ3IjogInJ...

Edit: fixed mono spacing.


I've seen "reductorio" being used in Spanish but the "Real Academia de la Lengua" says it doesn't exist as a proper word.

And yes, it means exactly that.


Might be time to coin some new phrases.

  nonmagical
    non·mag·i·cal

    unquestionably true or valid
    "his nonmagical beliefs and ideas"
Well that's a perfectly cromulent word.


My favorite so far:

  décorbé

    a thing that smells or tastes terrible


  noun.
  tartou
  tar·tou
  
  a Chinese pastry
  "each piece of beef tendon tartou serves four and a half hours ago"

Time-traveling pastries, wonderful.


Every time I go to my friend's house for tartou he apologizes and tells me they ran out hours ago. :(


It's like that reverse wine that gives you the hangover first.


That’s called exercise


This site is really dolky†, I'm loving it.

https://www.thisworddoesnotexist.com/w/dolky/eyJ3IjogImRvbGt...


Delightfully meta. I'm really loving some of these words, and might seriously consider using some of them.


  nailblast (n.)
  1. a wound on the external surface of the nail
     "her throat was swelling with nailblasts"
Holy crap, is she okay?!?


Wow, even with definitions! That looks like a better version of a game I made years ago where you have to pick out the real word from four options. The three "non-words" are generated by Markov chains:

https://www.michaelfogleman.com/wug/
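
For anyone curious what the Markov-chain approach to non-words looks like, here is a small character-level sketch. It is not the code behind the game above; the order-2 model, the padding symbols, and the five training words are all my own choices for illustration.

```python
import random
from collections import defaultdict

START, END = "^", "$"


def train(words: list[str], order: int = 2) -> dict[str, list[str]]:
    """Map each `order`-character context to the characters that followed it."""
    table: dict[str, list[str]] = defaultdict(list)
    for word in words:
        padded = START * order + word + END
        for i in range(len(padded) - order):
            table[padded[i:i + order]].append(padded[i + order])
    return table


def generate(table, rng: random.Random, order: int = 2, max_len: int = 12) -> str:
    """Walk the chain from the start context until END or max_len."""
    state, out = START * order, []
    while len(out) < max_len:
        nxt = rng.choice(table[state])
        if nxt == END:
            break
        out.append(nxt)
        state = state[1:] + nxt
    return "".join(out)


table = train(["argument", "ornament", "moment", "garment", "torment"])
rng = random.Random(0)
print([generate(table, rng) for _ in range(5)])
```

Each step only looks at the last two characters, which is why this kind of generator can produce plausible letter sequences but has nothing to say about definitions. It can also reproduce a training word verbatim, so a game built on it would still need a real-word filter.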


This game is great; I've been trying to think through ways of using these NLP models in a competitive game but the mechanics aren't obvious. It would be awesome to do something like an AI rap battle


Your website is amazing and full of very interesting projects, looks like a lot of fun.


This is perfect to name startups / products. Very well done.


Yes, I am here to launch my new cure-all "metasodium"; it's like sodium, but meta.

  metasodium
  meta·sodium
  any of a group of silica compounds thought to act by the interaction of sodium with sodium, the latter of which has many physiological roles and is essential in the modulation of many physiological processes
  "a polymeric metasodium oxide from which such compounds in the cell cycle are derived"


This sounds like the method Star Trek: Voyager writers used to come up with their technobabble


Came here to comment this.


I'll give it a go! Seems pretty easy to replace the blacklist test with a WHOIS lookup.


Don't bother, WHOIS is pretty much useless 9/10 times nowadays.


    handjob
    hand·job
    a job offered to a woman as a job or offer offered by a man
    "you have won a handjob as a waitress"
https://www.thisworddoesnotexist.com/w/handjob/eyJ3IjogImhhb...

Uhh.........


That's really unfortunate; results are straight out of the algorithm but it's reflective of a bias in the training data set.



Looks like it doesn't understand geography or human habitats very well, but entertaining nonetheless!

  cubot
  a member of an American Indian people living in 
  shallow water off central Alaska and western Australia
https://www.thisworddoesnotexist.com/w/cubot/eyJ3IjogImN1Ym9...


This gave me a good laugh

    noun.
    swem

    the process of forming a strip of mucus before injecting intravenously
    "semen started to emerge immediately after swemting"

https://www.thisworddoesnotexist.com/w/swem/eyJ3IjogInN3ZW0i...


Bug report: I hit “Microtissue”, a word that does exist, and its definition is incorrect ;-)

(https://www.thisworddoesnotexist.com/w/microtissue/eyJ3IjogI...)


Oh no, it must have missed my blacklist. It's supposed to give a little "this word probably exists" when that happens. It's surprisingly hard to determine if a word already exists: https://www.thisworddoesnotexist.com/w/tissue/eyJ3IjogInRpc3...


I also got "adverb. quaterly", "noun. geotagging", "adjective. mispriced", "noun. wholy" (google says it's an obsolete spelling of wholly).

I didn't see any note saying "this word probably exists".


Is there any sort of logic to discourage it from using a word in its own definition as it does there?


Not currently, but I should probably reject samples that do! Filed a task here: https://github.com/turtlesoupy/this-word-does-not-exist/issu...
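
A rejection check along those lines could be as small as the sketch below. This is a hypothetical version, not what the linked task ended up implementing; the function name and the prefix-match rule are assumptions.

```python
import re


def uses_own_word(word: str, definition: str) -> bool:
    """True if the definition contains the headword or a form starting with it."""
    pattern = rf"\b{re.escape(word.lower())}\w*"
    return re.search(pattern, definition.lower()) is not None


print(uses_own_word("tissue", "a piece of tissue used for cleaning"))
print(uses_own_word("strillation", "the action or process of boiling at room temperature"))
```

The prefix match is deliberately crude: it catches inflected forms such as "swemting" for "swem", but it would also flag unrelated words that merely start with a short headword, so a real filter would probably want a minimum length or a stemmer.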



Is it comparing to a list of common bigrams? It seems to come up with those a lot.


Same with accommodability[1], except that (impressively) it _did_ get the definition more or less correct.

> accommodability

> ac·com·mod·abil·ity

> The quality of being likely to be useful, effective, or useful

---

[1] https://www.thisworddoesnotexist.com/w/accommodability/eyJ3I...


Same deal with "infocenter": https://www.thisworddoesnotexist.com/w/infocenter/eyJ3IjogIm...

Granted, I don't know if it's technically a dictionary word, but I've definitely seen it in technical contexts.


Wouldn't look out of place in Germany with their ubiquitous info points, service points and other cobbled together denglish word creations




Yeah I got comicbook, nonreferential, barleywine, geophile, which could all be compound words, maybe hyphenated today but perhaps not tomorrow.


Just pushed a change to update the blacklist (collapsing hyphens / spaces to single words); it'll never be perfect but hopefully slightly better!
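
Collapsing hyphens and spaces can be sketched like this. The function names and sample entries are illustrative assumptions, not the project's actual code.

```python
import re


def normalize(term: str) -> str:
    """Lowercase and drop hyphens/whitespace so spelling variants share a key."""
    return re.sub(r"[-\s]+", "", term.lower())


def build_keys(entries: list[str]) -> set[str]:
    """Normalize every blacklist entry, including multi-word ones."""
    return {normalize(entry) for entry in entries}


keys = build_keys(["shower cap", "gloss-coat", "Red Bull", "barley wine"])

for candidate in ["showercap", "glosscoat", "redbull", "strillation"]:
    print(candidate, normalize(candidate) in keys)
```

This only helps if the two-word or hyphenated form is in the source list to begin with, which is why it improves coverage without ever making it complete.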


This doesn't always produce a new word. I got "metamucil", a brand of fiber supplement.

https://www.thisworddoesnotexist.com/w/metamucil/eyJ3IjogIm1...

edit: also got "redbull" https://www.thisworddoesnotexist.com/w/redbull/eyJ3IjogInJlZ...


I got "crowdsourced", of which the definition is solidly in the uncanny valley: https://www.thisworddoesnotexist.com/w/crowdsourced/eyJ3Ijog...

It almost gets it right...


Interesting! I got a different result for "crowdsourced". https://www.thisworddoesnotexist.com/w/crowdsourced/eyJ3Ijog...


Good call, when you press "generate your own" it tries to detect these and displays a little warning but my blacklist is not complete. My model is bolted on top of GPT-2 which used the web as a training set which is probably why some of these are popping up.


I got polyfill. Interesting how it stumbles on these things. The definitions are the more interesting parts.

https://www.thisworddoesnotexist.com/w/polyfill/eyJ3IjogInBv...


I got whitepaper


polystain


zooland


This is pretty cool but I am getting some strange results sometimes:

https://imgur.com/a/rv0wHN1


That's not ideal and probably picking up some of the original training set from GPT-2 (this model is bolted on top of it)! How about DUOLINGOLOGY instead https://bit.ly/3fPGP8q

I'm using a blacklist to reject "real" words but it's surprisingly hard to build for rare words. I'm up to ~600K items after parsing Wikipedia tokens and it still doesn't capture everything.


  airpods
  air·pods
  a large pair of wings and wings of a bird or other flying 
  animal, typically used as a guide for a figure skater and 
  paraglider
  "a pair of airpods"


That's understandable! Another one I got was "whitepaper".

In any case, it's all pretty impressive! Definitely something one can use for creative writing or fantasy world building.


It seems to make compound words just like humans do

gigafactory

nonplayable

waterboy

pepperjack

unreimbursed

interop

nonalloyed

backdoored


I've just realised this tool is great for inventing new species or spells for my DnD campaign. Things like "Lollyfish" "Bannabeat" and "Sanaf" sound like awesome little plants or creatures to decorate my world with!


Hah, good idea. I’m using thispersondoesnotexist to generate NPC appearances ad hoc in my campaign. I believe there is potential for a generator that understands fantasy races.


That's a cool idea! I know nothing about DnD; do people usually write the campaigns down and/or compose with a tool?


Or fantasy black market drugs.


  patentless
  patent·less

  not having or requiring a license of a particular kind and without permission
  "patentless wireless communications"
  a word that does not exist; it was invented, defined and used by a machine learning algorithm.
Well, not exactly [0]!

[0] https://en.wiktionary.org/wiki/patentless


Ha!

    neuterization
    neu·ter·i·za·tion
    the denial of a person's sexual identity and gender 
    identity to someone else
    "she had undergone neuterization of her facial hair"


Love this. Added it to my site that shows a bunch of other variants: https://thisxdoesnotexist.com.


Amazing! I found https://www.thiswaifudoesnotexist.net/ there a while back


    manpool 
    a large pool, typically as part of a carnival or carnival 
    *"a few old dead men may be playing in the manpool"*
So close, yet still so far.


I'm not understanding the praise this is getting. The words I've seen are very clearly wrong and do not match how English words are made. Some examples:

> méxis: an obsessive or revelatory pursuit

No comment....

> heelbark: a red braid fastened to a man's hat so as to prevent heeling

Unless you put your hat on your shoes, you're on the wrong end of the body.

> transgate: raise the value of (something, especially money) by expanding its capacity to become transactions or funds.

What's that even mean?

>noress: a unit of electric charge equal to one nanosecond

Where's the Coulombs? Who is J̶o̶h̶n̶ ̶G̶a̶l̶t̶ Noress?

Additionally, I'm seeing words that either exist or are natural permutations/misspellings. Example:

> monucleotides, but mononucleotides are a real thing.

Additionally, the example sentences are just as crazy. Maybe I'm having bad luck. There are some good hits, but the majority of them appear pretty tashy (this is a crazy difficult problem!)


How are English words made?


Typically they have a root to them. There are words that don't and are made up, like yeet (which I'll consider a word because of its usage and common knowledge), but other words like "microscope" are derived from Latin or something else. The example here is from microscopium. There's a lineage and things modify more slowly (slang typically moves faster but also rarely stays in the lexicon long term). Many words are portmanteaus or compounds, like heelbark (heel + bark). How words are composed is called Morphology[0]. I mentioned morphemes in another comment. Let's look at transgate. We have trans+gate. Trans is a loan word from Latin meaning "across," "beyond," "through," "changing thoroughly," "transverse". We know what a gate is, but it can also be like a block (gated) or in a circuit (which is like a door). Here the model is taking the morpheme "trans" and using it as if it is "transaction". But in "transaction" the word makes sense because it is through an action (the word started from the meaning to do business and because this often means exchanging money, that's how we now think of it).

So "transgate" also sounds weird because it has opposing ideas. "through" + "block". But we need to look at morphemes to see why. At least (IIRC) it made this word a verb.

[0] https://en.wikipedia.org/wiki/Morphology_(linguistics)


I'm not a linguist, but typically words evolve as memes and/or follow etymological patterns made up of root words. It's very rare that they're plausible sounding gibberish attached to plausible arbitrary meanings. This generator seems like it's in the "uncanny valley".... They're all somewhat plausible imitations of words, but the fact that they're not natural can be felt.


Totally agree with all of that. It's probably a matter of outlook; I expect and enjoy some weird uncanny valleying from a humorous ML generator.


I’ll add it’s also getting syllables wrong:

- incineratory is split into in•cin•e•ra•tory, which gets the suffix wrong (it should be tor•y)

- re•tro•genic instead of re•tro•gen•ic


I think people here are missing what you're saying because it is subtle. Which is correct. That "tory" is different from "tor•y". "y" should be the suffix. Just like how "fix•ed" would be different from "fi•xed". "y" is the suffix like "itch" vs "itchy".

What this means, building off of the evidence I gave, is that this model is not learning the morphemes[0] (the smallest units of meaning). This exact characteristic is part of why these words sound weird. It is the same problem as the one brought up by tasogare.

[0] https://en.wikipedia.org/wiki/Morpheme


You are on to something there. For the syllables I'm actually using a rule-based model from Python's "PyHyphen" library: https://pypi.org/project/PyHyphen/

I am not totally happy with the results but have not had a chance to train my own


https://en.wikipedia.org/wiki/The_Meaning_of_Liff The book is a "dictionary of things that there aren't any words for yet", written by Douglas Adams and John Lloyd.


I have it on my desk!



    stirrupie
    stir·rupie

    a stupid or depraved person; an immoral or destructive person
    "every single one of these artists was a stirrupie"
Gold. I bet Neal Stephenson would have loved to have this when he was writing Anathem.


>chlamydrama

>a narrative, typically one that takes place on a screen or on social media

https://www.thisworddoesnotexist.com/w/chlamydrama/eyJ3IjogI...

I'm impressed, a witty wordplay.


This word does exist (though not with the given definition):

backtick back·tick

    an act of greeting or greeting someone or something with a quick facial tick
    "we were greeted with a backtick"




Second word I got was "showercap" which is borderline: https://www.thisworddoesnotexist.com/w/showercap/eyJ3IjogInN...


I think this needs to do a google search for each word before assuming it doesn't exist. I got "glosscoat", which is a type of paint/coating (typically hyphenated, but non-hyphenated examples exist).

Wow, the links are just base64'ing the whole text because of course there's no way to trivially reproduce it...

https://www.thisworddoesnotexist.com/w/glosscoat/eyJ3IjogImd...


Yep! It's a sampling procedure; I could fix a random seed or put the results somewhere. "somewhere" ended up being the browser URL here (with a signature to prevent tampering)
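For the curious, here is a minimal sketch of how a signed, URL-embedded result could work. It is not the site's actual scheme: the key, the `"d"` field, the dot separator, and the use of HMAC-SHA256 are all assumptions made for illustration. The only detail taken from the thread is that the links begin with `eyJ3IjogI`, which is the base64 of `{"w": "`, so the payload looks like JSON with a `w` key.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"not-the-real-key"  # hypothetical server-side secret


def encode_payload(word: str, definition: str) -> str:
    """Pack a generated entry into a URL-safe token with an HMAC signature."""
    # "w" mirrors the key visible in the thread's links; "d" is made up here.
    body = json.dumps({"w": word, "d": definition}).encode("utf-8")
    sig = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    # token = base64(body) + "." + base64(signature)
    return (
        base64.urlsafe_b64encode(body).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(sig).decode("ascii")
    )


def decode_payload(token: str) -> dict:
    """Return the entry if the signature matches, otherwise raise ValueError."""
    try:
        body_b64, sig_b64 = token.split(".")
        body = base64.urlsafe_b64decode(body_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError as exc:  # also covers binascii.Error
        raise ValueError("malformed token") from exc
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(sig, expected):
        raise ValueError("signature mismatch")
    return json.loads(body)
```

Because the server recomputes the signature from the body, anyone can read the definition in the URL, but editing it invalidates the link. The trade-off versus a fixed random seed is longer URLs in exchange for no server-side storage.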


Ok, this one made me laugh. Well done.

  facetook
  face·took
  an instance of talking with no expectation of truth or accuracy
  "a complete facetook of her statements"


This is awesome.

> scumdoggery

> scum·dog·gery

> seizure of property as a practice or violation of the law, especially in financial matters

> "he didn't care for the scumdoggery he got for being a police officer"


My second word seems... quite questionable:

nonhumanoid

1. not having life

"a nonhumanoid woman"

https://www.thisworddoesnotexist.com/w/nonhumanoid/eyJ3IjogI...


ahegao

ahe·gao

a formal gesture representing a happy ending or blessing

"her head was brought to a halt by the ahegao of the chair"

lol. entertaining


This one is actually a real word, AFAIK (and the definition isn't too far off, haha). Not sure if that's intended.


I know, I added it to the site.


Ahegao is a word, Japanese though. Not Safe For Work if you look it up.


I know, I added it to that site to see what it would generate, as it only generates English words and tries to attribute unknown words to something most similar


I love this:

    nonfatalism
    non·fa·tal·ism

        the belief that one's parents don't have sex for the same reason as oneself
        "a nonfatalism argument"
Or:

    brainsack
    brain·sack

        a sharp crack or slit in the surface of sea or sky
        "a slushy brainsack close to shore"
Brainsack is a sharp crack on the surface of the sky. Amazing.


Seems like it failed?

headbutter. One who strikes other people with one's head. You are becoming known as a headbutter, so unless you want the league to suspend you, I suggest that you stop playing dirty! Headbutter - Idioms by The Free Dictionary https://idioms.thefreedictionary.com/headbutter


noun. backpressure back·pres·sure the pressure exerted up against a fluid, caused by the flow of air or water through it, exerting great physical pressure on the body "a low backpressure"

https://www.thisworddoesnotexist.com/w/backpressure/eyJ3Ijog...


    quora
    a set of questions
    "a rhetorical quora"

!


Amazing work! I just created Find Your Next Startups Name using this tool.

https://find-your-next-startups-name.now.sh/

https://github.com/mrpeker/find-your-next-startups-name


Yosho! What a great tool!

yosho expressing a strong enthusiasm or confidence "yosho! It's going to be one of the best days of your life"


That reminds me of the Japanese word よし (yo shi) (which sounds like "Yosh"). Its meaning is very nearly identical: an expression of excitement or enthusiasm, equivalent to saying "all right!" or "okay!" in English. Was the model trained with the Japanese word and its definition?


Delightful; especially to create words for their sound rather than their meaning, which the machine declares for whatever reasons it has at the time. It interested me that I was sometimes disappointed with the supplied definition, and sometimes strangely pleased, even though I'd no meaning in mind when I made up words. This is sublime.


This is amazing:

quintessay

1. a piece of writing (usually one of short or noibid) expressing or expressing a person's view, especially the concept of something abstract or self-subsistent. "the main quintessay of feminist theory."

2. a word that does not exist; it was invented, defined and used by a machine learning algorithm.

Link / Generate another word / Write your own


see also: pseudothesis


I got baitbox which is well...

https://en.wiktionary.org/wiki/bait_box

Still pretty cool. I'd totally believe some of these definitions if I didn't see the site.

Also this one which I've seen used quite a bit. overfamiliarize https://www.thisworddoesnotexist.com/w/overfamiliarize/eyJ3I...


> tiger-necked poodle

> a very small, sweet, pouched porcupine; a cricket

> "soak in a splash of chocolate red liqueur to ensure that the fluffy bits eventually turn into tiger-necked poodle poodle"

https://www.thisworddoesnotexist.com/w/tiger-necked+poodle/e...


The first two I generated were pretty good with the first being a very "true" sounding word - however I then got Spongen[1] - an aromatic berry of a variety with a bright red, yellow, or greenish taste "spongen, light white berries" which seems like a pretty big adjective fail.

1. https://www.thisworddoesnotexist.com/w/spongen/eyJ3IjogInNwb...


Except Transclusion was defined by Ted Nelson in Computer Lib / Dream Machines. :)

https://www.thisworddoesnotexist.com/w/transclusion/eyJ3Ijog...

See here: https://en.wikipedia.org/wiki/Transclusion


> minthority

> a fraction of fifty or more (the whole number one), especially within a government or the ranks of a corporate organization

This does exist.

> pneumostomy

> a surgical operation in which a tubular vein is drawn into the lung "he takes a long pneumostomy at the hospital"

This A.I. could be predicting future words.


Do you have a suggestion as to how to filter such a word?

It has 64 results on Google

2 results on DDG/Bing.

And no matches on any online dictionary that I could find (Merriam-Webster, Urban Dictionary, ...)


it's quite good with jargon that sounds [to the layman] plausibly medical

cyphroglodystrophy cyphroglodys·tro·phy a form of muscular dystrophy of muscle, caused by compression of an amyloid cytochrome "children with cyphroglodystrophy have unusually low blood pressure"


>durantino

>du·rantino

> a dairy-based flavor made by supplementing milk with milk of the same consistency as dairy acid

> "durantino preserves and protects against all age-related diseases through the combined action of dairy and vitamin D supplement"

Great. I think I want to buy some durantino


Ah. I misread this as "this world does not exist" and was briefly disappointed.

Briefly.


Seems like a perfectly cromulent project to me.


How about embiggening that to incromulentness: https://bit.ly/2zFhVaD



Mycogeny: the formation of a mycoplasma within a cell "mycogeny was detected in liver urine and its recovery in lymph nodes remained unclear"

It’s not too far off Mycogen: As Asimov explains in Prelude to Foundation,[21] their name is formed from the Greek stems myco- (meaning 'yeast' or other types of fungi) and -gen (meaning 'maker' or 'producer').

https://en.m.wikipedia.org/wiki/Galactic_Empire_(Isaac_Asimo...


I made a page some years ago with 13000+ nonexistent words. You get to choose your own meanings.

From memory, it picks each letter with the same probability of following the previous two letters as actual English words have. The more previous letters included in calculating probabilities, the more like actual words you get. My list is on the wild side. Not novel, but was fun to do. Good for writing Jabberwocky-type poetry.

http://www.adamponting.com/pseudo-word-list/
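The letter-by-letter approach described above is a character-level Markov chain. A minimal sketch follows, assuming a two-letter context; the function names, the `^` and `$` padding markers, and the `max_len` cap are illustrative choices, not the code behind the linked page.

```python
import random
from collections import defaultdict


def build_model(words, order=2):
    """Map each `order`-letter context to the letters that followed it.

    Followers are kept in a list with repeats, so picking uniformly from
    the list reproduces the frequencies seen in the training words.
    """
    model = defaultdict(list)
    for word in words:
        # "^" marks the start of a word, "$" marks the end.
        padded = "^" * order + word.lower() + "$"
        for i in range(order, len(padded)):
            model[padded[i - order:i]].append(padded[i])
    return model


def generate(model, order=2, max_len=12, rng=random):
    """Sample one pseudo-word; needs a model built from a non-empty word list."""
    context = "^" * order
    letters = []
    while len(letters) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":  # the model chose to end the word here
            break
        letters.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(letters)


# Tiny demo corpus; a real run would load a full dictionary word list.
model = build_model(["cat", "car", "cart"])
print(generate(model))
```

Raising `order` makes the output look more like real words, as the comment says, at the cost of reproducing training words more often.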


noun [usually as modifier] deflategate de·flate·gate a situation in which one side is unable to extricate itself from a dangerous dilemma, especially one involving civil disobedience or military attack "a nuclear deflategate could doom the North Korea situation" a word that does not exist; it was invented, defined and used by a machine learning algorithm.

This word was used for the Patriots scandal a few years ago.

https://en.m.wikipedia.org/wiki/Deflategate


Some of these are squarely in the uncanny valley! https://www.thisworddoesnotexist.com/w/indpendent/eyJ3IjogIm...


Funny that in Dutch probably (at least) some of these words do exist, I mean stuff like "week broodje" (mushy little bread) and "weekbroodje" (bread of the week) have a very different meaning based on the use of a space or not. In fact, we have a nice website that focusses on the incorrect use of spaces: SOS (Signalering Onjuist Spatiegebruik) [0], notice that spatiegebruik is 1 word ("Space-use")!

[0] https://www.spatiegebruik.nl/


Tuberculant

https://www.thisworddoesnotexist.com/w/tuberculant/eyJ3IjogI...

That's brilliant, and I'm going to use it.


Oops. It just served me up “undefined” as a word that doesn’t exist. (Not an error message - it literally gave me “undefined” complete with definition after generating a few good ones)

Interesting nonetheless.


One of the words I got was "unsubsidized" which appears in pretty much every online dictionary I've looked at, although it is red squiggle underlined in the comment box.


Yeah my first word was "carseat" [1] with the definition "the seat of a car". While I guess "carseat" all one word isn't technically a word, I would say it's perfectly cromulent outside of an English class.

[1] https://www.thisworddoesnotexist.com/w/carseat/eyJ3IjogImNhc...


All that's missing is a link to register a domain with the generated word.


This reminds me so much of balderdash which is oddly made up of real words but seem just as likely as these. Would be great to have this model try to play balderdash, though!


Some of them are pretty clever and could have easily become slang given enough time.

My first result was "upboard" which describes all the wood on a sailboat that is above the water.


For slang, I'm training another model using Urban Dictionary as the data source. I hope to release it someday but there is a lot of work to be done to clean the data. The articles are huge, user-generated and full of racism.


Pronounced "ubberd"


  prefetude
  prefe·tude
  a woman's strong sexual desire, especially that of a young man
  "he was a perv and his prefetudes were obvious"
Ha


Pretty meta:

  verb [with object]
  refour
  re·four
  reinvigorate or allow (someone) to use new forms of expression
  "refouring popular culture"


This is brilliant, could barely be funnier if written by a human:

> adjective.

> nondegenective

> non·de·genec·tive

> (of a computer virus) preventing development of an infection in which the infection is found in the host, without warning or compromise to the computer

> "this virus is nondegenective against bovine cholera"

It's both impressive in its relevance and fluency of language, despite being nonsense, and hilarious. Had me in stitches at 'bovine cholera'.


> carnarium

> carnar·ium

> the central part of the human body, covering the anus and genitals, between the head and the chin

> "the ribs and ribs of the carnarium"

Absolutely amazing.


Self-aware Machine Learning

    procreationist
    1. relating to or advocating the theory that sex is the only biological sex, or as opposed to that other sex identified with reproduction "a procreationist approach to reproductive science"

    2. a word that does not exist; it was invented, defined and used by a machine learning algorithm.


My favorite one so far:

mantula: a small parasitic stinging insect that feeds on ants, flies, and other small insects, native to leafy lawns and shrubs. "mantulas are widely grown as food"

You could do some great auto-worldbuilding in a dwarf fortress type game with this. Maybe constraining the input data to "bio" and "historical" definitions.


Nice hostname generator :) Reminds me of burgundy:

http://burgundy.io/


I got a word that does exist [1], with a definition that doesn't:

nonadjective

non·ad·jec·tive

not based on a principle or theory

"the philosophical and political alternatives to the totalitarianism of nonadjective politics"

[1] https://en.wiktionary.org/wiki/nonadjective


> nonreal - of the highest quality

> "an idealized, nonreal world of bliss"

Well... Plato sure wouldn't like your NN.


I guess it's only to be expected that from time to time, this would generate an actual word, even if perhaps the definition doesn't match a word's actual definition.

The first time I loaded the page, it came up with "polypyrrole", an existing word.


  tootwork
  toot·work
  coarse hair left over after a period of neglect by the hair follicles
  "he let his hair go, and she gave him a big tootwork"
This is quite amusing to play with; beyond that, however, I fail to see the purpose


No purpose!


This is brilliant. Authors could use this to create new words in their universe. Entrepreneurs could use this to get a unique and short dot com and product name. Someone could use this to create a "which word isn't a real word" quiz.



AKA: future software project / programming language / startup name generator.


Came across this, it just slammed two words together

instrument/adjective in·stru·men·t/ad·jec·tive

    an instrument related to a keyboard in that it is played rather than written
    "an instrument/adjective keyboard"


My recent favourite fictional word is Elonzi.

https://twitter.com/TESLAcharts/status/1256038204895047683



I got indigogo, which amusingly is the (slightly changed) name of a real service.


I got toxoplasm, also very near but definition is just hilarious:

toxoplasm tox·o·plasm

    (in humans) a microorganism of the alimentary canal, which forms a protective passage across nerve vessels adjacent to the colon, for example in the trachea
    "there is evidence that hemostasis greatly increases the risk of malaria by toxoplasm"


Toxoplasm exists; it basically turns rats into suicidal lemmings around cats. Also, it makes men more suicidal and women more... libidinous.


The disease is called toxoplasmosis, caused by Toxoplasma gondii. I've heard about it, which is why I pasted this definition. This AI has a blacklist of valid words; toxoplasma is probably on the list, toxoplasm is not. I don't know that much about Latin, but toxoplasm looks similar to toxoplasma while being something else (sounds like a tissue to me). Toxoplasm without the "a" at the end probably does not exist, but I may be very wrong, so take that with a grain of salt.

> "protective passage across nerve vessels adjacent to the colon, for example in the trachea"

colon in trachea and toxoplasm(a) forming a protective passage is the funny part for me.
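A blacklist filter of the kind guessed at above could look roughly like this. It is a sketch only: the site's real filter is not described in this thread, and the helper names and the normalization rule (lowercase, strip spaces and hyphens) are assumptions. It also shows why near-misses like "toxoplasm" slip through an exact-match list.

```python
import re


def normalize(word: str) -> str:
    """Lowercase and drop spaces and hyphens so 'gloss-coat' matches 'glosscoat'."""
    return re.sub(r"[\s\-]+", "", word.lower())


def build_blacklist(known_words):
    """Build a set of normalized real words to reject."""
    return {normalize(w) for w in known_words}


def is_novel(candidate: str, blacklist) -> bool:
    """True if the candidate does not match any known word after normalization."""
    return normalize(candidate) not in blacklist


# Hypothetical word list; a real one would come from a dictionary dump.
blacklist = build_blacklist(["toxoplasma", "gloss-coat", "car seat"])
print(is_novel("toxoplasm", blacklist))  # near-miss of a real word still passes
print(is_novel("glosscoat", blacklist))  # rejected despite the missing hyphen
```

Catching the near-misses would need something fuzzier than set membership, such as an edit-distance threshold, which in turn risks rejecting good invented words.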


I got “jibberish”, defined as:

  a tense tense [sic] of a verb in de-escalation, usually after some verb has been removed or added
  
  "she gave a nervous jibberish if I didn't get the last word"


Opened the site and got this, is this just the AI or is this due to trolling?

cracula (crac·ula)

crocodiles attached to the genitals, typically with the anus protruding from much to their outermost point

"eating those cracula have no effects on the penis"


Just AI, it's getting... creative.


The next step is writing the Jabberwocky equivalent completely with fake words.


leptosphere

lep·to·sphere

the space beyond the earth's crust which is about 2.8 billion miles (4.8 km) across and includes the oceans and Mars, Uranus, Neptune, and Jupiter and beyond

"the space between the earth's crust and the leptosphere"


tamarino

a Mexican dish of dried Mexican anchovies topped with grilled onions, anchovies, and tomatoes

"a tamarino with sherry cream sauce"

Personally I like to serve my tamarino with a side of anchovies.


First word it gave me was “pragma” which actually is a word.


This is delightfully clever.

I'm the developer behind http://prosecraft.io so I always enjoy learning about new linguistic toys :)

Thank you!


suckline[1]: an insincere or arrogant remark

couldn't help but think of Trump :)

[1] https://www.thisworddoesnotexist.com/w/suckline/eyJ3IjogInN1...


Maybe I'm just unlucky, but I found the first three words very unimpressive and unconvincing.

>quaivarary

> a quantity or quantity that is different from a specified quantity or quantity in some sense

