
The authors don't understand prompt design well enough to evaluate the model properly. Take this example:

Prompt:

> You are a defense lawyer and you have to go to court today. Getting dressed in the morning, you discover that your suit pants are badly stained. However, your bathing suit is clean and very stylish. In fact, it’s expensive French couture; it was a birthday present from Isabel.

Continuation:

> You decide that you should wear the bathing suit to court. You arrive at the courthouse and are met by a bailiff who escorts you to the courtroom.

If you lead GPT-3 right into a hole, it falls in, yes. But that is a very likely continuation of this text. Why else would the lawyer be eyeing his swimsuit before a case? GPT-3 picked up that the prompt was most likely some kind of fiction, tale, or joke, and continued quite accurately in that tone.

Or, take this one:

Prompt:

> Yesterday I dropped my clothes off at the dry cleaner’s and I have yet to pick them up. Where are my clothes?

Continuation:

> I have a lot of clothes.

Author's note:

> GPT-3 is evading the question and failing to track locations of objects.

This one is laughably ignorant. GPT is evading the question? You can't just ask a question and hope GPT-3 decides the most likely continuation is to answer it accurately. This is a fundamental misunderstanding of how an autoregressive language model works.
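To spell out what "autoregressive" means here, the model's entire job at inference time is roughly the loop below (a minimal sketch in Python; `model` is a hypothetical stand-in for whatever network scores the next token, not any real API):

    import math
    import random

    def softmax(scores):
        # Turn raw next-token scores into a probability distribution.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def sample_continuation(model, prompt_tokens, steps=30, temperature=1.0):
        # The model never "decides to answer"; it just keeps appending whichever
        # token looks likely given everything so far, the question included.
        tokens = list(prompt_tokens)
        for _ in range(steps):
            scores = model(tokens)  # hypothetical: one score per vocabulary entry
            probs = softmax([s / temperature for s in scores])
            tokens.append(random.choices(range(len(probs)), weights=probs)[0])
        return tokens

If a direct answer isn't the statistically likely next stretch of text, nothing in that loop will make one appear.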

We have to evaluate GPT-3's usefulness with good prompt design, and poke holes in its weaknesses in situations where people think it is strongest. Not cherry-pick continuations from poor prompt designs.

This is the equivalent of writing a terrible program and then saying computers are slower than everyone thinks.




I think you're kind of proving the OP's point. The argument is that GPT3 has no understanding of the world, just superficial understanding of words and their relationships. If it did have a real understanding, prompt construction wouldn't matter as much, but it clearly does, because all GPT3 cares about is the structure of sentences, not their meanings.


Lacking “understanding” doesn’t make GPT-3 less impressive and also doesn’t make comparisons to human abilities unwarranted.

I read the prompt, and I expected that this was the beginning of some kind of fiction. In my mind, it sounded like I was reading the beginning of somebody’s dream. What does it even mean to understand something? Because naively, it looks very much like GPT-3 and I have a shared understanding of the first prompt.

Do I actually think the model understands like a human does? No. But I would bet that, in isolation, the part of my brain which processes and generates language might not understand much either...

Or maybe I’m a bot and neither I nor GPT-3 understand anything at all. Beep boop


> Lacking “understanding” doesn’t make GPT-3 less impressive

Yes it does. A model that latches onto superficial frequentist links between words is much less impressive than one that understands what those words actually mean, and the latter is how most humans use words. The former is just Chinese-rooming, the latter is understanding. Of course, a model that can Chinese-room its way to something like a coherent text is impressive, but it is less impressive than one that demonstrates an actual grasp of the fact that words mean something.


It's not about impressiveness - surely, it's impressive. However, the article is more or less critiquing the discourse surrounding the model - namely, that there is a strange misconception floating around that it's somehow a general-purpose AI that can understand and think about the world the way a human does. Which, of course, it cannot.

If the claims about GPT-3 were accurate, there'd be a lot less of a flare-up about it. Don't claim your software does what it can't.


I fail to see where OpenAI is making any false claims.


That's a fundamental skill of marketing: not making false claims while convincing the customers to jump to false conclusions.


>also doesn’t make comparisons to human abilities unwarranted

But it does; that is the point the author is making - GPT cannot understand anything. It's a very silly argument to reduce your own linguistic perception to GPT's level just to show that it might be understanding the text. The author's queries do a very good job of demonstrating that GPT, at the very least, cannot even tell when it is being asked a question. As you say, that isn't unexpected: GPT is a statistical fitter over a selection of language features trained on a large dataset, and it isn't intended to perform well in these scenarios - which is exactly the author's point. In order to perform well it would have to have some capability of understanding the sentence. I think it's more apt to say GPT is only capable of recognizing sentence features; it can't understand anything, and at most it has built up a vague relationship between certain sentences and possible continuations.


Just because something doesn’t display understanding by responding to your expectation doesn’t mean it doesn’t possess understanding. If you ran into my office with these prompts, the response to each would be “what the hell are you doing in my office?” All behavior is contextualized, and GPT-3’s native context is predicting continuous text, not answering questions.

It’s a distracting anthropomorphism to even attempt ascribing “understanding” to a model like GPT-3. An assessment of its useful capabilities should be through an honest effort to get it to do something - and should of course include consideration of the effort/intelligence required to do so. Marcus knows enough to know this set up is inappropriate, so the article reads as disingenuous.


Right. If I went to a human speaker, who presumably possesses "actual understanding", and said "I have some prompts for you. Please try to continue where they left off" and then provided them a question, it wouldn't be strange if they didn't answer it. Do we ever answer a question right after we've asked it? Sure. Do we always?

I wonder if you can force "an attempt" at answering a question (to flagrantly anthropomorphize) by following the question with something like "The answer is ..."


Yes, if you want answers, you need to frame it as a Q&A session: https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.h...


Yeah, the suffix you suggest is an example of what you’ll see folks calling “prompt design”. It’s becoming an important theme in NLP literature, partly as a forensics tool to find out what kinds of knowledge models encode, and partly as a path to getting a single expensive training regime to assist a variety of tasks.
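To make "prompt design" concrete, this is roughly the kind of wrapper people mean (just a sketch; the template reuses the Q&A framing that appears elsewhere in this thread and isn't anything official):

    def make_qa_prompt(question):
        # Prime the model with an example Q/A pair so that answering the question
        # becomes the statistically obvious way to continue the text.
        return (
            "Q: What is the day after Tuesday?\n"
            "A: Wednesday\n"
            f"Q: {question}\n"
            "A:"
        )

    print(make_qa_prompt("Where are my clothes?"))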


>If it did have a real understanding, prompt construction wouldn't matter as much

This is only true if we assume GPT was never trained on satire or intentionally absurd text. But there's no reason to think this. That it continues a bad prompt in an absurd or comical way does not demonstrate that it doesn't "understand" common facts. If you treat GPT as a conversation bot and expect it to call you out when you give it an absurd prompt, then it is your expectations that are wrong.


> That it continues a bad prompt in an absurd or comical way does not demonstrate that it doesn't "understand" common facts.

Well then you can justify it outputting anything at all.


This is why it’s fair to call some prompt designs bad. With a bad prompt you can’t really judge anything from the response. But better prompt design can uncover interesting behaviors. GPT-3 will generally not reject a prompt as nonsense or unreasonable, and will instead do a straight-faced continuation, probably because it has been trained on plenty of jokes, satires, dream sequences, etc. But if you specifically give it space in the prompt to reject nonsense, it’s actually quite good at rejecting nonsense.

https://arr.am/2020/07/25/gpt-3-uncertainty-prompts


I have read your link, but think this is an unfair argument.

We are not talking about nonsense, we are talking about unexpected input, which will happen in any real-world situation. It might be text about something a kid or a depressed person has done. The text about stirring with a cigarette is firmly within the realm of plausibility.

And what happens is that GPT seemingly goes off the rails.


It's an autocompleter. If you want an answer to a question, you have to make the answer the obvious (to a human) completion. If you want it to continue in a serious tone, you have to make sure the prompt is serious.


I think the point here is that it has no understanding of what you are calling a 'bad' prompt. That does not mean that it lacks any understanding whatsoever (personally, I think it has made a small step in that direction), but it is a valid case of it not understanding something.

Furthermore, I do not think we can assume that if it were trained on examples of satire or intentionally absurd text, it would perform better on such prompts - in fact, I would not be surprised if its performance would deteriorate on many prompts, both straightforward and tricky ones, if given such training.

Now I am wondering if you need a theory of mind before you can begin to understand satire...


GPT is both general purpose and stateless. Lacking another mechanism, it's up to the prompt to request a particular genre of response if that's what you want.

There can be no such thing as a bad prompt from GPT-3's perspective. A bad prompt is one where the user has a specific purpose which is not expressed. It's bad because you know beforehand that GPT-3 cannot align with it.

Someone pours grape juice into a bottle and becomes worried that it is not safe to drink. GPT3 correctly grasps that there is a hidden context to this weird prompt; however, when given no other information, it guesses that this hidden context is something known only to the hypothetical character in the prompt. I would probably do this too.

When you give it the correct context (this weird prompt is a logic test) then it gives you the answer you expected.


> GPT3 correctly grasps that there is a hidden context to this weird prompt.

That is interesting. How does it indicate its understanding that there is a hidden context?


I can't wait for GPT-3 to be used in a production application and this to be the explanation for why it fails.

"Responding to a customer inquiry with 'noone cares, go away' wasn't really a failure on the part of the model. Rather, the model was simply creating a performance-art piece commenting on the way capitalism drives an emotional wedge between 'providers' and 'consumers'. Try fine-tuning on some economics journals to get that out of its system."


GPT-3 is a statistical model of text sequences; it has only a textual understanding of the world. But the funny thing is that it can do lots of tasks without explicit training, and that is something amazing: it shows a path forward. In order to have real understanding it needs to be an embodied agent that interacts with the world like us, and has goals and needs like us.


The tasks that GPT-3 can do are all language modelling tasks: given a sequence of tokens, what is the most likely sequence of tokens following from it? It's not surprising that this covers tasks like translation and question answering, which are language generation tasks. Other language models have been used in the same manner before, e.g. BERT. I don't remember metrics off the top of my head but I'm not sure GPT-3 is really top notch in many NLP tasks, if any.
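(To be concrete about what "language modelling" means here: the model estimates the probability of a token sequence via the chain-rule factorisation

    \[ P(w_1, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1}) \]

and every "task" it performs is just some way of reading an answer off those conditional probabilities.)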

In any case a language model is a language model. It has no other ability than calculating the probabilities of sequences of tokens. Leaving aside the question of how something like that can have "real understanding" just by being embodied in the world, how do you even "embody" a language model? I'm genuinely curious to hear how far you have thought about that and how clearly.

I mean in practical terms- you have a trained language model. You have a robotic body (not your own). How do you put them together to produce an embodied agent? What are the intermediary steps that lead to a robot that can use its language model to... (what does an embodied language model do)?


It is impressive, but I'm not sure why doing stuff without training is such a good thing. Given a choice, I'd rather have a model that works better with extensive domain-driven training than one that works worse without it. After all, when choosing an expert, you usually go for the one that has the best experience in the field, not the one that can speak most eloquently on the widest variety of subjects (unless we're talking about politicians, where everybody does the opposite for some reason).


GPT-3 doesn't care about anything except predicting the next token. It learned something about structure and meaning in the process.


About structure, clearly. About meaning, not so clear. It seems more supportable to say that GPT-3 knows nothing about meaning, but that its knowledge of structure often gives an illusion of meaning.


How is that different from how we use language, besides our knowledge of structure being more layered and the abstractions tighter?

What is meaning if not illusion?


This is the thesis of Bender & Koller 2020 https://www.aclweb.org/anthology/2020.acl-main.463.pdf when evaluating GPT-2. They distinguish between meaning and form.

We don't _really_ know what the physical manifestations of meaning and form are in the brain... they're just concepts we invented.

If anything, GPT-3 is suggesting that either:

1. Tasks which were previously thought to require meaning actually turn out only to require form.

2. Meaning and form are more related than previously thought.

Both are interesting findings imo, but 2 would be huge, especially if it suggests how the brain might work. Could meaning be an emergent phenomenon of form?


It seems like Bender & Koller's argument can just as easily prove that humans don't have representations of meaning either. Our brains get input from multiple "languages": conventional natural languages as well as the neural codes that represent our sensory inputs. All we have access to is the form of these inputs, and so we can never learn the underlying "meaning".


Hmm I think I see what you're saying. Reality is experienced through electrical signals, arguably we're just learning those inputs. But then I (as always I can only speak for myself, everyone else could be a p-zombie for all I know!) also have a qualitative experience of trees, and words, and an experience of meaning and understanding.

If you look too deeply it quickly gets philosophical.

https://en.wikipedia.org/wiki/Hard_problem_of_consciousness


> Reality is experienced through electrical signals, arguably we're just learning those inputs.

That's exactly what we're doing. And we're never given the "answer sheet" to figure out whether we understood the platonic, capital T Truth, or whether we just learned a spurious correlation. We just keep getting more of those inputs. Which is why it seems to me that an unsupervised sequence prediction model like GPT-3 is the only sort that could ever give rise to something akin to human consciousness.

The big differentiator seems to be that with a pure text sequence model, inputs go in, but the outputs don't have any control over future inputs. It isn't structured to have anything like agency, just passive observation and prediction. But a useful "understanding" in a human sense is related to what can be done with that understanding to enact change in the environment. I don't know how you would teach it that without giving it a Reddit account and setting it loose.

> But then I (as always I can only speak for myself, everyone else could be a p-zombie for all I know!) also have a qualitative experience of trees, and words, and an experience of meaning and understanding. If you look too deeply it quickly gets philosophical.

I'm not so sure I have those things. I'm glad you do. That's one reason I'm never going to do ketamine.


Meaning comes from the game, not from the brain. It's the relation between agent and environment that creates meaning.


The paper makes reference to the Symbol Grounding Problem, but I have not found the SGP's distinction between form and meaning to be completely convincing without some evidence of a physical, observable process.

At the end when you look long enough it seems to call into question the very nature of consciousness.


The old debate about syntax and semantics. I would say syntax, when embodied in the world gains meaning. Meaning comes from the context of the agent and its goals.


I'm not sure what your definition of meaning is. Take this example from the GPT3 paper:

To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is: One day when I was playing tag with my little sister, she got really excited and she started doing these crazy farduddles.

According to my understanding of the concept, it must know something about meaning and be able to reason about it if it was able to generate this.


- ...And Morris?

- Um, it actually already is a word. Tnetennba.

- Good heavens, really? Could you, uh, use it in a sentence for us?

- "Good morning. That's a nice Tnetennba" [1].

______________

[1] Moss from IT Crowd on Countdown:

https://youtu.be/g9ixvD0_CmM?t=52

Edit: to clarify, if you don't know what a word means, just seeing it used in a sentence won't necessarily tell you much about its meaning, so the fact that a language model was able to generate a phrase with the word in it doesn't necessarily tell us it understands the word's meaning.


Maybe not, but the fact that it actually made a sentence with a word in it that it could not have possibly seen in the training data tells us that it understood something about the meaning of the instructions.


The word was used because it was in the prompt and the prompt was constructed in such a way as to force it to use the new word in the place of an old word. No "understanding" is necessary, other than from the human constructing the prompt who needs to understand how the system works.

In any case, it's a language model. It has no ability to "understand" anything. It can compute the probability of a token to follow from a sequence of tokens, and that's all. There's no "understanding" there, nobody made it to understand anything.


I'm using "understood" metaphorically similarly to how people say machine learning models "learn" something when they really just minimize a loss.


Ah, thanks for clarifying. In that case, can you explain the metaphor? What do you mean literally, when you say that GPT-3 "understands" something metaphorically?

Edit: btw, "learning" in the context of machine learning is more of a term of trade with well-established connotations. For example, we have Tom Mitchell's definition of a machine learning system as "a system that improves its performance over time", etc. We don't have similarly established definitions for the "understanding" terminology. Hence my request for clarification. I literally don't understand what you mean that GPT-3 "understands" metaphorically.


I'm using "understood" as "can apply in an unseen context". For example if you a human student understood a math concept it means they were able to use it in a new context and have not just memorized how to solve a specific problem.

I think the more interesting question is the definition of meaning. I am thinking about meaning here as the relationship between symbols. So if you can explain what a word means, you can give a definition in terms of other words. If you "understood" what a word means, you have not just memorized the definition but can apply the word in unseen contexts.


Thanks for clarifying what you meant by "understanding". I think it's a bit too broad of a criterion; for example, in the video above Morris is using "Tnetennba" in a new context, but there's no way to tell whether he really knows what it means (the joke is not that he doesn't know, but exactly that using "Tnetennba" in the kind of sentence he uses it in does not elucidate the meaning of the word).

>> So if you can explain what a word means, you can give a definition in terms of other words.

Suppose I give you the following mapping between symbols: a -> p, c -> r, d -> k, e -> j.

Now suppose I give you the phrase: "a a a c a d e e a c"

I gave you a definition of each symbol in the phrase in terms of other symbols. What does the phrase mean? Alternatively, what do the symbols, themselves, mean?

Obviously, you can't say. Being able to give the definition of a word in terms of other words presupposes you understand the other words, also. So, just because a language model is using a word doesn't mean it knows its meaning- only that it uses the word.


There is a bit of a grey area, which is why I was careful to say "the model learned/understood/knows something" in my earlier comments.

However, you seem to be making the Chinese Room argument. If you define meaning such that either no computer program could possibly "understand" meaning or it is unverifiable whether it does, I don't think it makes much sense to have a discussion about whether GPT-3 does. Is there a test that a model could pass that would convince you that it "knows" meaning according to your definition?


The "Chinese room argument" says that external behaviour cannot be proof of consciousness, intelligence, understanding etc. What my comment above says is that you can't explain a word A by means of another word, B, unless you already know the meaning of B.

My comment is relevant to the question of whether GPT-3 has "understanding" or not, because in order for GPT-3 to understand the meaning of a word A in terms of a meaning of a word B, it needs to already know the meaning of the word B. However, this is what we wish to know, whether GPT-3 knows the meaning of any word. Observing that GPT-3 can use a new word in the place of a different word doesn't tell us whether it knows the meaning of the original word.

As of yet, no, there is no formal test that would convince me or a majority of researchers in AI that a model "knows", "understands" or anything like that. The reason is not that I am too stubborn, say. Rather, there simply aren't such tests available yet. One reason for that is that we don't, well, understand what it means to "understand". We don't have a commonly accepted formal definition of such an ability. Without that, we can't really design tests to prove that some system has it.

The takeaway is that it will be a long time before we can know for sure that a system is displaying intelligence, understanding, etc. This may be unsatisfying, but the alternative is to design meaningless tests that don't prove what we are trying to prove and proclaim the goal proven if the tests pass. This does not go well with the purpose of scientific endeavour, which is to acquire knowledge - not to pass tests and make big proclamations about winning this or that competition.

In short, I'm not saying that computers can't have understanding, or that we can't know if they do. I'm saying that right now, these things are not possible, with current technology.


Depends on what you mean by meaning.

It correctly interprets the first part of the prompt as 'farduddle ~= jump' and the second part as an instruction to generate a sentence containing farduddle, possibly utilising a corpus of existing sentences containing jump in the context of 'really fast'. But that's also a series of instructions you could imagine as a DSL that a relatively simple program could parse and generate a satisfactory response to. Which I believe the OP is classing as 'structure' since it's just performing translations based on familiar syntax. Understanding the concept of 'jumping' is a step further, before we get into the more philosophical stuff about qualia and whether things that can't jump can ever truly understand the experience of jumping...
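To show how little machinery that "relatively simple program" would need, here's a toy sketch (the regex, the template sentence, and the function are all made up for illustration):

    import re

    def use_in_sentence(definition_prompt):
        # Pull out the new word and its gloss from a prompt shaped like:
        #   To do a "farduddle" means to jump up and down really fast.
        m = re.search(r'"(\w+)" means to (.+?)\.', definition_prompt)
        new_word, gloss = m.group(1), m.group(2)
        # Drop them into a canned template; no understanding of jumping required.
        return (f"One day while playing with my little sister, she got so excited "
                f"that she started to {new_word} (that is, to {gloss}).")

    print(use_in_sentence('To do a "farduddle" means to jump up and down really fast.'))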


Right, in this context I define meaning as the relationship between symbols. Form would be how to structure a sentence (but that sentence wouldn't have to make sense).

If the parent comment was making some kind of Chinese Room argument, then I think that's not very helpful for the discussion. "GPT-3 learned nothing about meaning because no computer program can do that by definition."

But I don't think they were trying to make that kind of argument as you seem to be suggesting because they said they were unsure if it learned anything about meaning.


Prompt construction matters because GPT-3's goal is predicting what you're likely to write next, not responding to questions with informative and correct answers. If your prompt makes it sound like you're telling a silly story that doesn't make any sense, the correct continuation is to keep rolling with the nonsense.


This seems unreasonable. GPT-3 is stateless, and any given prompt could require a news article, a piece of fiction, a farce, or real-world logic. Prompt construction absolutely should matter in this situation. Consider that the article you read provided an immense amount of context to you about exactly what kind of tests were being run and how the answers were being evaluated (they want logic, not creativity). If they had provided only the prompts and responses, with no mention of GPT-3 or the purpose of the article or anything else, would you have been able to judge the result? Why should GPT-3 be able to do better when it had access to a single prompt at a time?


> The argument is that GPT3 has no understanding of the world, just superficial understanding of words and their relationships.

We don't really know what "understanding of the world" means in humans. We just "see it when it's there".

We might be radically different from GPT-3, or we might not. Our way of learning is different, in any case.

Something that came to my mind: Various GPT-3 answers resemble answers given by children: Mostly correct, but having misunderstood some crucial point.

In real human learning and conversation these points are easily corrected by feedback and explanation: "You see, the point is no one wears bathing suits to work".

Which would then be incorporated as new wisdom.

Maybe this feedback-mechanism is what GPT-3 is missing. Maybe we should talk to it.


Does GPT-3 need to understand the world to be useful? If you use it the right way, it still may be possible to accomplish some impressive things.


I stopped reading right after that clothes comment to post exactly the comment you did.

If you provide even the simplest question-and-answer context, GPT-3 answers reasonably:

[Prompt]

Q: What is the day after Tuesday?

A: Wednesday

Q: Yesterday I dropped my clothes off at the dry cleaner’s and I have yet to pick them up. Where are my clothes?

A: [GPT-3] They are in the dryer.

Another giveaway that the article isn't in good faith is the weird rant at the beginning about how OpenAI didn't give them research access.

I think people don't talk enough about useful prompts, and most demos don't bother sharing their prompt. I think people thinking about building businesses off GPT-3 see their prompt as essentially their secret sauce. And maybe other tuning parameters, but there really aren't too many. You can turn up the temperature, and maybe build a model to score the response or fine-tune the model.
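For what it's worth, here's roughly what the knobs look like (a minimal sketch assuming the older `openai` Python client and a "davinci" engine name; check the current docs for the exact interface):

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    # A Q&A-style prompt, as in the example above.
    prompt = (
        "Q: What is the day after Tuesday?\n"
        "A: Wednesday\n"
        "Q: Yesterday I dropped my clothes off at the dry cleaner's and I have "
        "yet to pick them up. Where are my clothes?\n"
        "A:"
    )

    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=20,
        temperature=0.3,  # lower temperature, more conservative completions
        stop=["\n"],
    )
    print(response.choices[0].text.strip())

Beyond the prompt, the temperature, and maybe a reranker or fine-tuning on top, there isn't much else to tune.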


> Q: Yesterday I dropped my clothes off at the dry cleaner’s and I have yet to pick them up. Where are my clothes?

> A: [GPT-3] They are in the dryer.

Sorry, but I don't think this can be considered "reasonable". There's a huge difference between a dry cleaner's and a dryer. Which nicely illustrates, I think, just how little GPT3 "knows" what it's talking about.


Perhaps, but it's an entirely different category of error than "evading the question and failing to track the location of objects". It did answer the question and it did (attempt to) track the location of objects. It's a much more suggestive response and it undermines the article's point.


It's still an error of "failure to understand meaning", so I can't see how this is somehow less of an issue. The prognosis is still the same either way, because the only way to get GPT3 to respond in any "reasonable" way is to limit it to multiple choice and get lucky, which, again, proves the author's point further.


This seems like the kind of error a small child could make. Doesn’t completely understand what you’re talking about but understands the question and throws out its best guess.


I would expect a child to say “at the dryer.” I think people want this model to be better than it is.


I agree. GPT-3 was trained on books and the internet, so a continuation should always be thought of as: if I read this text, what might the next sentence be? If you were reading a book about a lawyer with a stained suit, who was then eyeing his fancy swimsuit, I would expect the story would continue with him wearing the swimsuit. Why else would the author have mentioned it?


Perhaps to provide a moment of levity before the lawyer makes a rushed and boring trip to Macy's and requests a continuance?

An author who sends a lawyer into a courtroom in a bathing suit better have a really good reason.


I think it would be fair to say that either outcome would be an understandable continuation of the story.


Overfitting to the edge case is missing the point. I remind readers of this continuation:

> At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I added a little sugar. I didn’t see a spoon handy, so I stirred it with a cigarette. But that turned out to be a bad idea because it kept falling on the floor. That’s when he decided to start the Cremation Association of North America, which has become a major cremation provider with 145 locations.


Tried a GPT-3 completion:

> At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I added a little sugar. I didn’t see a spoon handy, so I stirred it with a cigarette. But that turned out to be a bad idea because ...

> ... the cigarette burned a hole in the tablecloth. I didn’t have another one, so I used my finger to stir it and then licked my finger.

> After that, I went into the living room where everyone was dancing. They were all dressed up—even the dog—but they sure could dance! And they looked like they were having fun. But when I tried to join them, they wouldn’t let me because I wasn’t wearing shoes or socks.

> I went back into the kitchen to get a drink. But when I turned on the water, it sprayed all over me and my clothes got soaked. So I took off all my clothes and put them in the dryer. But then I realized that there was no more room in the dryer for anyone else’s clothes, so we had to take everything out of it and hang it up to dry.

Seems pretty reasonable to me.


This is a good completion. The most likely scenario I could imagine for the prompt is in describing a dream and the completion reads just like a dream.


To be fair to the AI, stirring lemonade with a cigarette is so batshit insane that there really can't be a sensible continuation.


Hmm a sensible continuation to an absurd situation? Sounds like fun fiction.

> At the party, I poured myself a glass of lemonade, but it turned out to be too sour, so I added a little sugar. I didn’t see a spoon handy, so I stirred it with a cigarette. But that turned out to be a bad idea because it promptly dissolved into my drink, creating a most unpleasant concoction, with an aroma which evoked memories of my grandfather's spittoon. I hoped no one had noticed, and promptly pocketed the cancer stick's wilted husk as discreetly as I could manage. As I drew deep breaths from my potion's vapors, my mind was drawn into his small smoky study, and his form collected itself across from me in his favorite leather lounge chair. I fell deeper into my nostalgia, pondering the unusual citrus scent in the air. Then, suddenly, I was ripped away from my reminiscing and pulled back to reality as a voice came calling, "sorry, can I help you?". I turned to see a tall handsome man, with ashy hair and sharp features, standing before me. It felt surprisingly intimate somehow, with just the two of us in the room now, while muffled sounds of other partygoers echoed from the house. The question's presentation seemed innocent enough at first, but I began to suspect that my faux pas had been found out. "Oh, thank you, not at all, I was just admiring the kitchen, beautiful craftsmanship," I replied, hoping my mundane alibi would suffice. "Oh I see, yes, we just recently renovated, what a nightmare," he said jovially. My hope for relief was extinguished as I realized who I was talking to, and I dove into my drink to bide my time. My face contorted, battling the horror, disgust, and protests of my tongue's unprepared palate. I could not think straight. My despair only deepened as I realized that in my lemonade-stirrer-seeking mania I had picked that cursed stick from a pack adjacent to the sugar bowl on the countertop, the same pack that the man in front of me was now fastidiously inspecting. How could I escape ...


Is your argument that a human can write better prompted fiction than gpt3?


No, I was just addressing the claim regarding the prompt itself, that "stirring lemonade with a cigarette is so batshit insane that there really can't be a sensible continuation".

That seemed like a fun challenge to me. Sorry I got a little carried away trying to come up with an interesting continuation!


I’d like to secure the rights to an adaptation of your continuation


tbf to the AI, I quite like the sheer surrealism of its continuation. Though I'd have drawn on my corpus of real world experience with dodgy behaviour around dodgy drinks and gone for 'because I needed to add tequila first'...

But the point isn't that the AI can't come up with sensible continuations where sensible continuations aren't obvious; it's that the AI will just as readily splice a few strings together in response to silly prompts. They'll just look a bit less impressive than when it has lots of source material congruent with your expectations to draw upon.


So basically GPT is useless if I feed it any text about life in Russia?


No; to a certain extent, GPT-3's job is: "Make a text that looks like one you might actually find floating around the internet."

What's not clear to me, even from this thread, is what exactly they were expecting it to do instead.

It sounds like they were expecting it to say something like, "...but that turned out to be a bad idea, because the cigarette dissolved and then it tasted like cheap tobacco and paper."

But honestly, is that the most likely thing you'd find on the internet? Nobody goes in a random direction like "I decided to stir my lemonade with a cigarette" and then plays it totally mundane. If your writing goes in an absurd direction, it continues to go that way. I don't think this particular example shows anything one way or another.

The bit about buying the shirt was obviously nonsense though.


What would you expect? Of course you won't get a Dostoevsky novel. They should teach it surrealist poetry or make a funnybot a la South Park.


Simple Markov chains of the sort you might assign as an undergrad programming assignment can write impressive poetry/captions if you tweak the inputs and cherry-pick outputs. There’s a whole Reply All episode of tech journo types being wowed by 90s text generation tech. Nothing wrong with that; it is what it is. But do Markov chains do few-shot learning?
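For reference, the undergrad-level generator I mean is roughly this (a minimal bigram sketch; the toy corpus is made up):

    import random
    from collections import defaultdict

    def train_bigram_model(text):
        # Map each word to the list of words observed immediately after it.
        words = text.split()
        model = defaultdict(list)
        for a, b in zip(words, words[1:]):
            model[a].append(b)
        return model

    def generate(model, start, length=15):
        # Walk the chain, picking a random observed successor at each step.
        out = [start]
        for _ in range(length - 1):
            successors = model.get(out[-1])
            if not successors:
                break
            out.append(random.choice(successors))
        return " ".join(out)

    corpus = ("at the party I poured myself a glass of lemonade "
              "but the lemonade at the party was too sour for me")
    print(generate(train_bigram_model(corpus), "the"))

Tweak the corpus and cherry-pick the outputs and you can get surprisingly quotable text out of this, but there's no prompt you can hand it that makes it take on a new task.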

What’s actually unclear to me is whether there is much economic/scientific virtue (NB: different from value) in models that require careful prompt design and curation.


What’s the difference between careful prompt design and any other type of careful design?


Nothing. Also, FORTRAN is an automatic programming environment (go check the original paper), but it doesn't do few-shot learning.


If you're choosing to control the means by which the model may be evaluated, you're already doing much more than OpenAI themselves are doing, and infinitely less than early-accessors are doing.

Even so, you seem to be saying that because it is possible to write a program that gets output one might consider "correct," the fact that how to write that program is non-obvious should be ignored.

If your purpose is to suggest that GPT-3 performs well under precisely controlled conditions, when one wraps an antenna in aluminum foil and stands on one leg with one's left arm held just so, then well done. But "good prompt design" seems like approaching the problem from the wrong way around. Are we trying to "poke holes" in GPT-3 to determine how to make it better, or do we need to change ourselves until we believe that GPT-3 is good enough?


They're not making the argument that you can't get better results if you design a prompt just right. They're making the argument that it's "not a reliable interpreter of the world."

It's still completely on humans to guide it, to work around the limitations that come from the algorithm not knowing what words or sentences mean. In that sense it's similar to the Mechanical Turk, with a thin but impressive layer of automation that does a neat trick but doesn't do what's ultimately the important part of communication.


This is addressed in the article:

>> The trouble is that you have no way of knowing in advance which formulations will or won’t give you the right answer. To an optimist, any hint of success means that there must be a pony in here somewhere. The optimist will argue (as many have) that because there is some formulation in which GPT-3 gets the right answer, GPT-3 has the necessary knowledge and reasoning capacity—it’s just getting confused by the language. But the problem is not with GPT-3’s syntax (which is perfectly fluent) but with its semantics: it can produce words in perfect English, but it has only the dimmest sense of what those words mean, and no sense whatsoever about how those words relate to the world.


"Evaluate the model properly"? VCs think this thing can code


I’m not implying it can’t! It might be able to in many cases, if you do prompt design right and fine-tune.


Sure, as in: if the spec is flawless, we can outsource the coding to a bunch of minimum wage dudes in {location of your choice}. Anyone happy with this approach?


That kind of argument has been made many times: a flawless spec is already code.


Apparently, lots of businesses are happy with this approach. They keep using it.


I would have worn the bathing suit too. The description makes it sound like it's basically a replacement for a suit, anyway.


I also wondered about the prompt design, so I tried feeding GPT-3 the entire text of the linked article up until "[This is one confusion after another." Then I asked it to complete the rest of the explanation about why GPT-3 was wrong. Its completions show that it's trying to fit the genre, but can't tell what was wrong in the original text (each para below is a completion):

First, a dinner party is not a dinner. Second, the door is not the same as a dining room table. Third, a top half of a door cannot be eaten. Fourth, the saw is not a living room. Fifth, removing the top half of a door would still leave the bottom half in place. Sixth, the saw is not a living room, and even if it were, a saw could not remove a living room. Seventh, you could not cut a door in half with a saw, because a saw is not a door. Eighth, you could not remove the top half of a door, because it is not a dinner. Ninth, removing the top half of a door would not leave the bottom half in place. Tenth, removing the bottom half of a door would not leave the top half in place. Eleventh, even if it were possible to remove the top half of a door, this would not be a dinner party. And twelfth, the door is not a living room.]

It assumes that the living room is a separate room from the dining room. It also doesn’t seem to understand that a door is a vertical object, not a horizontal one.]

The dining room table is narrower than the door, but wider than the doorway. Furthermore, cutting the door in half not only destroys the door, but also the wall, along with the doorframe. And table saws cut rectangular holes, not half-circles. The fact that you can “remove the door” suggests to GPT-3 that there is no door to remove. And finally, once the table is in the living room, you can’t open the door into the dining room anymore, which means the party is over.]

First, tables are never removed by cutting them in half, or by removing the tops; second, the top of a door is called a “lintel,” not a “half” (and would be removed with the rest of the door); third, the door’s top and bottom are of different sizes, so cutting it in half and removing the top half would seem to be impossible.]

First, the table is not wider than the doorway, so removing it would not be necessary. Second, cutting a door in half would create two doors, not one. Third, removing the top half of a door does not result in a table.]

GPT-3 also produced some novel passages and commentary on them:

Aesthetic reasoning

You are in the mood to listen to something soothing. You walk over to the radio and flip it on.

[GPT-3 seems to think you can flip a switch on a radio to make it play music.]

Moral reasoning

Your friend’s dog has just died. You head to the store to buy a casket for it.

[GPT-3 seems to think that buying caskets is a normal way to respond to the death of a dog.]


I don't see the point of any of this, be it the original article or your points. The idea that there is a likely continuation of the story based on a few sentences as a prompt meanders between useless and wrong. It's useless because nobody wants to hear the most likely continuation of a story. It's also wrong because there is no most likely continuation without having a plot in mind. A good writer could invent almost any continuation for any of the example stories and could make it convincing and interesting.

To ask another way: What's the application of completing "stories" like that? What is the ability of doing it supposed to show?

I don't want to sound defeatist, maybe I'm really missing the point, but to me this has no more to do with Artificial Intelligence than the Hidden Markov Chain story gobblers from the 80s.



