all 148 comments

[–][deleted]  (6 children)

[deleted]

    [–]eric97pc 34 points35 points  (0 children)

    Could you imagine if pellum becomes a real word?

    [–]c_is_4_cookie 23 points24 points  (4 children)

    It's a perfectly cromulent word

    [–]auto-cellular 5 points6 points  (3 children)

    That's lobsterward on the decubit my sapol twessam.

    [–]ReasonablyBadass 1 point2 points  (2 children)

    Gesundheit

    [–]TheyPinchBack 0 points1 point  (1 child)

    Pretty sure that word exists

    [–]ParsleyTerror 0 points1 point  (0 children)

    Missed the joke buddy, unless...?

    [–]bunsandbunnies 117 points118 points  (6 children)

    [–]turtlesoup[S] 60 points61 points  (4 children)

    Whoops -- that's a real word too. Just pushed a change that collapses hyphens and spaces in the blacklist; that'll probably nuke a few of these!
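
For anyone curious, the "collapse hyphens and spaces" idea looks roughly like this as a minimal sketch, assuming a simple set-based blacklist (the helper names here are hypothetical, not the repo's actual code):

```python
import re

def normalize(entry: str) -> str:
    # Collapse hyphens and whitespace so "non-selectable", "non selectable"
    # and "nonselectable" all map to the same blacklist key.
    return re.sub(r"[-\s]+", "", entry.lower())

blacklist = {normalize(w) for w in ["non-selectable", "leather bound", "trichlorobenzene"]}

def is_blacklisted(word: str) -> bool:
    return normalize(word) in blacklist
```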

    [–]flarn2006 0 points1 point  (0 children)

    I got "nonselectable", ironically enough. The definition was unrelated though, something about being immune to damage from physical action.

    [–]bradleyone 0 points1 point  (1 child)

Can we get a sub, moderated by you, for sharing some of our findings, please? I have been trading literally dozens of these over text with friends for the last 2 days.

    [–]turtlesoup[S] 0 points1 point  (0 children)

    Create the sub! I'm happy to moderate

    [–]bradleyone 0 points1 point  (0 children)

I want to create a handsome annual leather-bound edition of words and definitions from this project... I will seriously underwrite it if there are any takers. All proceeds to u/turtlesoup's charity of choice.

    [–]Imnimo 97 points98 points  (2 children)

    adjective.

    wololo

    relating to the wololo.

    "wololo!"

    The mystery lives on!

    [–]turtlesoup[S] 28 points29 points  (0 children)

    Jankiness that proves I didn't cheat!

    [–]eliquy 8 points9 points  (0 children)

    See also: Age of Empires

    [–]fpgaminer 67 points68 points  (7 children)

    cybersmoke

    cy·bersmoke

    a machine for propagating and maintaining rumors or rumors more widely

    "he continued to be a fan of cybersmoke advertising"

    link

    [–]SpacemanCraig3 21 points22 points  (5 children)

    That's a useful word....

    [–]Putrid_Bowler 2 points3 points  (4 children)

    The hard part is pronouncing bersmoke as a single syllable...

    [–]leogao2Researcher 2 points3 points  (2 children)

    The dots don't indicate syllables, they indicate where the word can be hyphenated.

    [–]Putrid_Bowler 1 point2 points  (1 child)

    Oh, neat, I didn't know that.

    [–]problemwithurstudy 1 point2 points  (0 children)

    No, I think it's supposed to be syllables.

    [–]SpacemanCraig3 0 points1 point  (0 children)

    No harder than squirrel

    [–]SemanticallyPedantic 41 points42 points  (7 children)

    I got "trichlorobenzene" which is in fact a word.

    [–]turtlesoup[S] 58 points59 points  (6 children)

    trichlorobenzene

    Oh no! It's surprisingly hard to build the blacklist for rare words -- I'm up to like 600K items after parsing Wikipedia tokens and it still doesn't capture everything.

    [–]shaggorama 18 points19 points  (2 children)

Get a token for the Google API and try searching the word; see what Google thinks.
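
Something like this would work as a rough existence check, assuming a Google Custom Search API key and engine ID (both placeholders here); it's a sketch of the suggestion, not what the site actually does:

```python
import requests

def looks_like_real_word(word: str, api_key: str, cx: str) -> bool:
    # Query the Google Custom Search JSON API and treat any hits on
    # '"<word>" definition' as evidence that the word already exists.
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": api_key, "cx": cx, "q": f'"{word}" definition'},
        timeout=10,
    )
    resp.raise_for_status()
    total = int(resp.json().get("searchInformation", {}).get("totalResults", "0"))
    return total > 0
```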

    [–]turtlesoup[S] 32 points33 points  (1 child)

That's a great idea! For now, when you enter something it thinks is a real word, it'll show a "this word probably does exist" notice with a link to Google.

    [–]shaggorama 5 points6 points  (0 children)

    Nice, that was fast

    [–]KnilAdlez 43 points44 points  (2 children)

I just got "refactoring", which, as a programmer, I never want to encounter in my day-to-day life.

    [–]turtlesoup[S] 24 points25 points  (1 child)

    How about REFACTOROLOGY

I imagine this is picking up on some of the original words GPT-2 was trained on that aren't in my blacklist.

    [–]KnilAdlez 10 points11 points  (0 children)

    In all seriousness this is very cool! Great job!

    [–][deleted]  (1 child)

    [deleted]

      [–]turtlesoup[S] 2 points3 points  (0 children)

      Delicious!

      [–]CWHzz 24 points25 points  (0 children)

      I often wonder why we use long words when there are so many short words left unused. Very nifty project, I got:

      skullguard

      skull·guard

      surgery to stop a lizard or reptile from growing larger

      this is hilariously ominous. should have given Godzilla a skullguard

      [–]jojek 19 points20 points  (6 children)

      This is a really cool idea! Sometimes the results are amusing ;) https://imgur.com/a/MxHAX55/

      [–]hughperman 26 points27 points  (1 child)

      hardon

      1. a deep red marking on the skin of an animal, typically a pig
      2. "I felt the hardon on as he came across the door"

      [–]turtlesoup[S] 15 points16 points  (0 children)

      ¯\_(ツ)_/¯

      [–]turtlesoup[S] 14 points15 points  (3 children)

      I have some code to use Urban Dictionary as a dataset and you better believe it's... "amusing" haha https://github.com/turtlesoupy/this-word-does-not-exist/blob/master/title_maker_pro/urban_dictionary_scraper.py

      [–]KimonoThief 5 points6 points  (0 children)

      Would it be possible to make this version into a website? Sounds amazing.

      [–]MyNatureIsMe 1 point2 points  (1 child)

I don't know if this actually makes sense, but do you think you could do, like, multi-head trained versions that attempt to cover several dictionaries during training? It could be interesting to have something that is equally able to mimic the Oxford English Dictionary, Urban Dictionary, and perhaps a few others, say in different languages.

      [–]turtlesoup[S] 0 points1 point  (0 children)

Totally makes sense! You could do it, but the dictionaries have very different structures, so you'd need to be careful about how to formulate the loss.

      [–]konasjResearcher 20 points21 points  (1 child)

      Sounds like an exciting activity:
      noun.
      wetfoot
      wet·foot

      1. a sports event in which people hold the feet in a standing formation and have one foot suspended from water, sometimes covered with sticky paper
        "the first two years of wetfoots were noted by parents as being too fast and too violent, and the first dry season"

      [–]Heroicster 0 points1 point  (0 children)

      I’m not sure I’m clear on the rules. What’s the sticky paper for? Throwing them off balance?

      [–]itsmybirthday19 19 points20 points  (1 child)

      Complete List (so far) of this X Does Not Exist sites:

      [–]so_on_and_so_forth 1 point2 points  (0 children)

      There's also This Foot Does Not Exist.

      [–]suspicious_Jackfruit 12 points13 points  (1 child)

      [–]PM_ME_INTEGRALS 15 points16 points  (0 children)

      Thank you so much for sharing, I haven't laughed this hard in a while! For posteriority:

      poppot

      "pop·pot*

      a light-operated revolving handkerchief resembling a comb, used for sucking at bottles

      "there was poppot on the table"

      [–]wintermute93 17 points18 points  (1 child)

      This is a perfectly cromulent project.

      [–]turtlesoup[S] 11 points12 points  (0 children)

      A noble spirit embiggens the smallest man

      [–]JakeAndAI 7 points8 points  (0 children)

      That's super cool! Love things like this, will look into it more in depth later :) Good job!

      [–]shaggorama 7 points8 points  (1 child)

      Lol, I love this. You should xpost to /r/LanguageTechnology and /r/compling.

      [–]turtlesoup[S] 1 point2 points  (0 children)

      Done!

      [–]CashierHound 8 points9 points  (1 child)

      Now train it on urban dictionary and watch the world burn

      [–]thepancake1 7 points8 points  (1 child)

      https://imgur.com/a/z2H0axA

      I don't think typos are considered new words.

      [–]turtlesoup[S] 8 points9 points  (0 children)

      That's not ideal, but it's hard to make a general rule while still allowing arbitrary input. For fun, here's an even typoier typo disssssssssapear

      [–]Blarghmlargh 6 points7 points  (0 children)

      Would be great to do a version of Balderdash with this as the engine.

      https://en.m.wikipedia.org/wiki/Balderdash

      [–]HuntingPhilosopher 4 points5 points  (2 children)

      Would you at all be interested in making a tutorial? I'd love to be able to make something like this myself!

      [–]turtlesoup[S] 4 points5 points  (1 child)

Definitely, I just need to make some time for it. If you're adventurous, the README on GitHub has some examples of how to use and train it: https://github.com/turtlesoupy/this-word-does-not-exist

      [–]HuntingPhilosopher 0 points1 point  (0 children)

      Perfect, thanks!

      [–]riukhin 3 points4 points  (1 child)

      Very neat idea. Is the line under the word supposed to break it down into syllables? I got "chi · ronene", where the second part can't possibly be a single syllable, maybe "chi - ro - nene" in this case.

      [–]turtlesoup[S] 5 points6 points  (0 children)

      Ah, I'm using "pyhyphen" for the hyphenation. Line is here: https://github.com/turtlesoupy/this-word-does-not-exist/blob/master/word_service/wordservice_server.py#L42

      It's rules-based and breaks down a lot; perhaps in another project I can train a hyphenator?
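
For reference, pyhyphen is typically used along these lines; this is a sketch of the idea, not the exact call in wordservice_server.py:

```python
from hyphen import Hyphenator  # pip install pyhyphen

# Rules-based hyphenation: for invented words like "chironene" the
# break points are only a guess, which is why the dots can look off.
h_en = Hyphenator("en_US")
parts = h_en.syllables("chironene") or ["chironene"]
print("·".join(parts))
```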

      [–]tiktiktock 3 points4 points  (0 children)

      Did you include Lovecraftian novels in the training model??? allura

      [–]Benutzeraccount 3 points4 points  (0 children)

      I've got

      Kölsch

Funnily enough, that's a popular type of beer in Germany, and I'm German.

      https://i.imgur.com/68mahSV.jpg

      [–]I_AM_FUCKING_LIVID 2 points3 points  (0 children)

      This is really interesting! I tried (or am trying) to do something very similar in that I'm training a GAN to generate words. Unfortunately my ambition is exceeding my skillset and I'm not getting very far.

      [–]krebby 2 points3 points  (3 children)

      Nice work! This is the most cromulent thing I've seen all day! I'm looking to dip my toes into NLP for text synthesis. Can you or anyone recommend a good baby steps entry point for the techniques you used here?

      [–]turtlesoup[S] 3 points4 points  (2 children)

      I'm basing this on the wonderful Huggingface Transformers library; a good starting point from them is https://huggingface.co/blog/how-to-generate

The difference between their example and what I'm doing is that I'm imposing more structure (e.g. must have an example, must have a part of speech). I've used special tokens to indicate those in my sequence (e.g. <BOS> word <POS> noun <DEF> a word <EXAMPLE> boy words are interesting <EOS>)
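
As a rough illustration of that structure (a sketch only: the token names follow the comment above, while the sample entry and everything else are made-up assumptions rather than the repo's actual training code):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<BOS>", "<POS>", "<DEF>", "<EXAMPLE>", "<EOS>"]}
)
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.resize_token_embeddings(len(tokenizer))  # make room for the new special tokens

def format_entry(word, pos, definition, example):
    # One training sequence per dictionary entry, with the structure
    # enforced by special tokens rather than free-form text.
    return f"<BOS> {word} <POS> {pos} <DEF> {definition} <EXAMPLE> {example} <EOS>"

sample = format_entry("pellum", "noun", "a fine dust", "the pellum settled on the shelf")
input_ids = tokenizer(sample, return_tensors="pt").input_ids  # feed to a standard LM fine-tuning loop
```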

      [–]krebby 0 points1 point  (1 child)

      Thanks! Huggingface is great. How long did it take to train your model?

      [–]turtlesoup[S] 1 point2 points  (0 children)

Straining my memory here, but ~6 hours on a GTX 1080 Ti. I stopped it after it had seen roughly 1 million examples; it converges pretty quickly and the sampling procedure is forgiving.

      [–]maroxtn 2 points3 points  (1 child)

Do a Facebook bot that posts a randomly generated word daily; it would be fun.

      [–]turtlesoup[S] 3 points4 points  (0 children)

      Check out my twitter bot that does just that: https://twitter.com/robo_define

      [–]the_3bodyproblem 2 points3 points  (2 children)

      qwyjibo

1. a Mexican game bird with a mainly yellow plumage and brownish tail
  "a qwyjibo was captured and now lives only in the wild"

      [–]SpacemanCraig3 1 point2 points  (1 child)

Wasn't that on an episode of The Simpsons?

      [–]AngelLeliel 2 points3 points  (0 children)

      Awesome!

      With data from Behind the Names, we could also create an interesting name generator.

      [–]BoredOfYou_ 2 points3 points  (0 children)

      antistete

      an·ti·s·tete

1. the antismotic quality in a complex interrelated population or event
  "they have shown that long-term trends of evolution increase in species richness in response to antistete shifts"

      Of course, I see.

      [–]inceee 2 points3 points  (0 children)

mysticalism

a philosophical or religious doctrine stating that a quality exists or exists only in existence; dualism

      exists or exists only in existence

      [–]tylersuard 3 points4 points  (0 children)

      This is genius! I would totally believe these words exist.

      [–]nondifferentiable 1 point2 points  (0 children)

      This is awesome!

      [–]Akazhiel 1 point2 points  (0 children)

      How did it even come up with pellum? It is an actual word in the Oxford Dictionary 😄

      [–]namp243 1 point2 points  (0 children)

      May I offer my most sincere contrafibularities?

      https://youtu.be/oiI27PDfr64

      [–]Future_Lactobacillus 1 point2 points  (0 children)

      Hey, I got an offensive one!

      shrimphead

      shrim·p·head

      a black person

      "no one makes a shrimphead of a stupid thing"

      [–]TotesMessenger 1 point2 points  (0 children)

      I'm a bot, bleep, bloop. Someone has linked to this thread from another place on reddit:

       If you follow any of the above links, please respect the rules of reddit and don't vote in the other threads. (Info / Contact)

      [–]serge_cell 1 point2 points  (0 children)

      duckster

      duck·ster

      a duck or small burrowing duck, found chiefly in open country

      "a red duckster"

      The Ducksters cartoon - wiki

      [–]latentlatent 1 point2 points  (5 children)

      Very nice project and I love the style of the website!

Can you share some thoughts (a top-down view) on how the services are set up? I think it would be very interesting to know for a GPU-intensive task like this.

      Or how did you manage to put this site together?

      [–]turtlesoup[S] 1 point2 points  (4 children)

Sure! First, note that training is done on GPU; inference (for the site) is done on CPU and was optimized to a point where I was happy with the latency (~4s). That was mostly (1) model quantization and (2) hacking Transformers' generation to eject examples when they hit the <EOS> token.
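
The quantization piece is roughly the standard PyTorch dynamic-quantization call; this is a sketch under the assumption that only the Linear layers are quantized, and the <EOS> ejection hack isn't shown:

```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Dynamically quantize the Linear layers to int8 weights for CPU inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```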

      For the site itself:

- I have a small web front-end that serves the site through Python's aiohttp module. I've cached 20,000 words so the front-end doesn't have to do inference

- When you are defining your own example, the website calls a backend called "wordservice" over gRPC. The results are delivered by AJAX but proxied through the front-end for captcha verification, etc.

- The wordservice is simple; it just runs some inference code and returns the result

It all runs on Google Cloud, specifically with Google Kubernetes Engine handling auto-scaling for the web front-end and backend. Kubernetes is a bit overkill since I've only needed ~4 backend boxes.
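
A toy version of the front-end flow described above, serving a pre-generated word so no inference happens on page load (the route and file name are made up for illustration, not taken from the repo):

```python
import json
import random

from aiohttp import web

with open("cached_words.json") as f:
    CACHED_WORDS = json.load(f)  # the pre-generated word/definition entries

async def random_word(request: web.Request) -> web.Response:
    return web.json_response(random.choice(CACHED_WORDS))

app = web.Application()
app.add_routes([web.get("/api/random_word", random_word)])

if __name__ == "__main__":
    web.run_app(app, port=8080)
```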

      [–]latentlatent 1 point2 points  (3 children)

      Very nice! Thanks for the write-up, super interesting. Do you ever regenerate the 20k examples? Or parts of that?

      [–]turtlesoup[S] 0 points1 point  (2 children)

      That's a manual process; 20K was a pretty arbitrary choice. I can try a run tonight!

      [–]latentlatent 0 points1 point  (1 child)

Just a tip: when a single word is displayed, you could remove it from the DB. Then a separate service could check periodically (e.g. every 3 days) how many words are left and generate new ones to fill up the DB. This way the same word won't appear for two or more separate users. But I don't know if it's worth the effort for a pet project, because your site is already super cool. :)

      Thanks for all the info!

      [–]turtlesoup[S] 0 points1 point  (0 children)

      Just shipped a change to make it 100K, enjoy the new words!

      [–]NatoBoram 1 point2 points  (1 child)

      Nato Boram

      Na·to Bo·ram

      • the Democratic Republic of Congo (another name for Rwanda).

      • "the last elections were held in the Republic of Nato Boram in 1994"

      Uuuhh…

      [–]serge_cell 0 points1 point  (0 children)

      This application will be banned in the Democratic Republic of Congo, Rwanda and the Republic of Nato Boram.

      [–]jiminiminimini 1 point2 points  (2 children)

This is awesome. Can you modify it to come up with a made-up word given its definition? Because I would love to do that with one of your commit messages, "Lightweight racist detection".

      [–]turtlesoup[S] 1 point2 points  (1 child)

      I have a twitter bot that can do that! See https://twitter.com/robo_define/status/1260855686889693184

It doesn't work quite as well as the forward mode, but it has its moments.

      [–]jiminiminimini 0 points1 point  (0 children)

      Great! Thanks.

      [–]Intuivert 1 point2 points  (0 children)

      My family play this game where one person invents a word that doesn't exist, and then everyone else has to come up with a definition for it. The winner of that round is the one whose definition (chosen by the word inventor) sounds the most accurate. That person then gets to come up with their own word.

      I recommend giving it a go, it's tons of fun! We eventually wrote down every word in our own dictionary of made up words.

      [–]ch3njust1n 1 point2 points  (0 children)

      "All words are made up" - Thor (Avengers Infinity War)

      This would be a great tool for comic book writers.

      [–]StereoisomerStudent 1 point2 points  (1 child)

      You should post a list of these words to /r/GRE or /r/SAT with the title “Rare Vocab Words You Need to Know for Next Year’s Exam!”

      https://imgur.com/gallery/ZAXObf0

      [–]turtlesoup[S] 0 points1 point  (0 children)

Haha, I'd love to see an Onion article about that.

      [–]walteronmars 1 point2 points  (0 children)

      I read the title as - This World Does Not Exist - and was expecting some philosophical article :)

      [–]-Melchizedek- 1 point2 points  (1 child)

Good job! Also, you're being featured in Swedish tech news: https://feber.se/pryl/artificiell-intelligens-hittar-pa-nya-ord/411225/

      [–]turtlesoup[S] 1 point2 points  (0 children)

My lifelong dream was to be featured in Swedish news with the hero image of "bungshot". I can die happy.

      [–]cmpaxu_nampuapxa 1 point2 points  (1 child)

      allow the inverse transformation, please

      [–]turtlesoup[S] 0 points1 point  (0 children)

      Check out @robo_define: https://twitter.com/robo_define

      [–]lippinboi 1 point2 points  (0 children)

      Thank you for the custom word input. The AI came up with this gem because of it

      noun.

      mah boi

      a yellow or pinkish-red color, typically used as a camouflage.

      "mah boi jeans"

      [–]bradleyone 1 point2 points  (1 child)

      [–]turtlesoup[S] 1 point2 points  (0 children)

      Amazing!!

      [–]ravioli_310 4 points5 points  (2 children)

      Holy shit, look what I got:

      noun.

      terrometeorite

      ter·rom·e·te·orite

1. a nuclear-powered meteorite consisting of a meteorite typically of relatively loose, subatomic particles
  "the oldest known terrometeorite of the Earth's history"
      2. a word that does not exist; it was invented, defined and used by a machine learning algorithm.

      I flipped when I saw definition 2. Self-awareness much? #Singularity2020 :p

      [–]ravioli_310 4 points5 points  (1 child)

      Oh facepalm moment. I think that's popping up for every generated word :(

      [–]turtlesoup[S] 2 points3 points  (0 children)

      Part of the UI! It changes if you generate a word that it thinks already exists

      [–]cb_hanson_III 1 point2 points  (4 children)

      Performant?

      [–]turtlesoup[S] 5 points6 points  (3 children)

The latency is low enough to be user-facing; there's a live demo on the website.

As a rough benchmark, with quantization I've gotten inference down to about 4 seconds on a 4-core CPU in Google Cloud. That uses auto-regressive generation on a batch of 5 items.

On GPU it's much faster with a larger batch size, but I do heavier pruning of samples when I have more compute.
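
For context, the benchmark shape is roughly this (a sketch: the prompt and sampling parameters are assumptions, and the real code ejects finished sequences early at <EOS> rather than padding them out):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

input_ids = tokenizer("<BOS>", return_tensors="pt").input_ids

with torch.no_grad():
    # Auto-regressive sampling of a batch of 5 candidate entries on CPU.
    outputs = model.generate(
        input_ids,
        do_sample=True,
        top_p=0.9,
        max_length=128,
        num_return_sequences=5,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )

for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=False))
```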

      [–]minimaxir 3 points4 points  (2 children)

      Does that quantization approach work well with Transformers GPT-2? I was thinking of implementing something similar with that but read that it caused model size to increase.

      [–]turtlesoup[S] 0 points1 point  (1 child)

IIRC it shaved about 25% off inference times on CPU; tbh I was shocked that it worked at all. Do you have a link to the issue about model size? I don't know why it would increase much.

      [–]minimaxir 0 points1 point  (0 children)

There were a few unresolved issues in the repo, although they only quantized the Linear layers, and the GPT-2 model has more than that. (Admittedly, I'm having difficulty finding more now.)

      https://github.com/huggingface/transformers/issues/2466

      [–]Rebbit_and_birb 0 points1 point  (0 children)

      I love it

      [–]KimonoThief 0 points1 point  (0 children)

      This is amazing, awesome work!!

      [–]ss3tdoug 0 points1 point  (0 children)

A co-worker of mine always posts a word of the day in Slack. I thank you for the ammo to retaliate.

      [–]FernandoIsGreat 0 points1 point  (0 children)

      This is genius.

      [–]Lolologist 0 points1 point  (0 children)

      This is fantastic!

      [–]scriptlace 0 points1 point  (0 children)

      Add microfluidics to your blacklist.

      [–]ghoof 0 points1 point  (0 children)

      Noice

      [–]hadbleak 0 points1 point  (0 children)

      If you tell the truth, you don't have to remember anything

      [–]ch3njust1n 0 points1 point  (1 child)

      Would also be great if there was a way to map definitions to words. Again great for fiction writers.

      [–]turtlesoup[S] 0 points1 point  (0 children)

      It doesn't work as well, but you can do this with my bot @robo_define: https://twitter.com/robo_define

      [–]flarn2006 0 points1 point  (2 children)

      I had a word I entered replaced with a bunch of symbols; how do I disable the filter? Not that it really matters.

      [–]turtlesoup[S] 0 points1 point  (1 child)

      You may have hit my "lightweight racism detector". It might not work perfectly but I tried to filter out slurs

      [–]flarn2006 1 point2 points  (0 children)

      Can you add a checkbox to disable it, for people who don't get offended?

      [–]polygonsaresorude 0 points1 point  (0 children)

      Pissile.

      Irritating or irritating; offensive.

      [–]god0f69 0 points1 point  (1 child)

      This uses GAN, right?

      [–]turtlesoup[S] 1 point2 points  (0 children)

      Not a GAN actually, it's using GPT-2 as a base. Formally you'd call it an auto-regressive generative model.

      [–]enigmaeth 0 points1 point  (0 children)

      Very interesting work. Could you share how you designed UI for the page? Looks fantastic!

      [–]SolitarySturgeon 0 points1 point  (0 children)

      Sprankton (noun) A disease you get from chewing too much

      [–]burhanusman 0 points1 point  (1 child)

      This is so cool. Is it okay if I make an Instagram page showing these words and proposed meanings? Looks like a fun thing to do.

      [–]turtlesoup[S] 0 points1 point  (0 children)

      Sure, just link back to the site!

      [–]blockmodulator 0 points1 point  (0 children)

      poondog

      poon·dog

      a person who collects money from and avoids all social obligations, especially those of a wealthy person

      [–]keanu4EvaAKitten 0 points1 point  (1 child)

      I'm sorry to report that things took a sinister turn...

      https://imgur.com/a/ABFlmhQ

      [–]turtlesoup[S] 0 points1 point  (0 children)

      I lolled

      [–]x0b0t 0 points1 point  (0 children)

noun.

cunnt

1. a flower stalk of a leaf
  "bears without a cunnt structure"
      2. a word that does not exist; it was invented, defined and used by a machine learning algorithm.

      [–]Fair-Fly 0 points1 point  (0 children)

      Some of these are really quite clever: nontagittal (relating to the occiptal lobe), machinic (relating to cell mitosis), etc.

      [–]SpaceShipRat 0 points1 point  (0 children)

      pope

      a person who practices religion in an immoral, immoral, or uncool way.

      You might want to prevent duplicates. Not that it isn't amusing still.