all 27 comments

[–]MaxTalanov 5 points (3 children)

I don't think it's apocryphal. Everything about it is plausible. Multilayer perceptrons were available at the time. This is exactly the kind of project that DARPA would have been working on.

The failure mode described is also very plausible and too specific to be made up. It would be surprising if something like this didn't happen.

[–]gwern[S] 6 points (2 children)

> The failure mode described is also very plausible and too specific to be made up.

By that logic, the wolf/dog detector story really happened, and the 3 Wise Kings really did have a dozen different names, ruled multiple kingdoms each, and died in multiple ways - "it's too detailed to be made up!" Yeah, it's detailed... just like all the other urban legends and myths. And of course it's at least somewhat plausible, otherwise no one would tell it. (By the way, which version, exactly, do you find plausible? brightness, photographic film, length of shadows, type of camera, field vs forest, what? And was it 100 or 200 total photographs, exactly, and in what ratio were they split? If the specific details indicate the story is true, shouldn't they at least agree?)

[–]MaxTalanov 3 points (1 child)

We have plenty of evidence that DOD was working on this exact problem (tank detection from small images) in the 1980s, with neural networks. They published some of it in the 1990s and 2000s.

The failure described is exactly how you would expect the first iteration of the project to go down. They didn't report on it (it was a failure) and kept iterating, with better evaluation procedures. Meanwhile the anecdote leaked out quietly, via lectures by professors involved in the project.

[–]gwern[S] 6 points (0 children)

That doesn't address my point: specific detail is not a sign of authenticity when it emerges spontaneously, in mutually-contradictory versions, without any paper trail or findable source, but is a very well documented and classic sign of a myth/urban legend/religion and a strong indicator that the story is false. (In real histories, incidents become simpler and less detailed in later versions as information gets lost or summarized.) You are ducking my question about which 'specific detail' you find so compelling and such proof of truth, and you're raising even more problems: even discounting Fredkin putting it in the early 1960s entirely, this story is already in circulation by at least 1992 with Dreyfus, who himself describes it as being done long before ("one of connectionism's first applications...in the early days..."), so 1990s/2000s is way too late - not that any of those publications describe a bona fide tank incident in the first place.

> The failure described is exactly how you would expect the first iteration of the project to go down.

No, I wouldn't. I would expect it to simply fail to beat the competing methods, which is how neural net failures usually happened: people trained them for weeks on their weak mainframes and they never converged or did well, despite a frustrated researcher tweaking hyperparameters, watching the loss print out, and eventually swearing off perceptrons as useless and deeper nets as untrainable. 'Discovering one weird trick' is not how it usually went.

[–]fldwiooiu 4 points (9 children)

maybe it didn't but the thrust of the article is off - anyone who's ever trained a neural network on a small dataset knows how easy it is to learn an irrelevant feature like a biased mean value. the story is incredibly plausible.

[–]gwern[S] 5 points (8 children)

If it's so common, then it should be very easy to start telling real instances that actually happened, of such systems getting into the wild, instead of ones that never did...

I also point out that modern CNN training practices do a lot to mitigate this: CNNs have built-in biases towards meaningful features, we use large datasets in the first place, and when we can't, we use data augmentation and transfer learning to effectively train on much, much larger datasets, etc.
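
For concreteness, a minimal sketch of that kind of augmentation pipeline (standard torchvision transforms; nothing here is taken from the article or the original project, and the exact parameter values are arbitrary):

    # Hypothetical torchvision-style augmentation: randomize framing, orientation,
    # and lighting so that a global cue like "sunny vs. cloudy" cannot be the
    # feature that separates the two classes.
    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),   # vary framing/scale
        transforms.RandomHorizontalFlip(),                      # vary orientation
        transforms.ColorJitter(brightness=0.4, contrast=0.4),   # vary lighting
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],        # ImageNet statistics
                             std=[0.229, 0.224, 0.225]),
    ])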

[–]fldwiooiu 2 points (3 children)

how do you know it never did? did you expect to find a 1970s DOD publication of an (embarrassing) failed trial? anecdotes from greybeards who plausibly could have heard the story are the best you can hope for...

[–]gwern[S] 5 points (0 children)

I do, yes, especially when we have an eyewitness account from Fredkin, who apparently started it; we have the researcher whose work is closest to it describing it, decades later, as successful and explaining why he didn't follow up on it; and so on. It fits every last characteristic of an urban legend, I have explained how it came into existence and why it evolved the way it did, I have tracked down just about every source mentioning it, the people who claim it happened do not have reliable sources for their knowledge and the stories contradict each other on every point, and so on. What more do you want? How could it look any more fake? Would I have to interview every researcher who ever worked on NNs to prove it to your satisfaction? Do I have to do a FOIA request on Kanal's Army contract or something?

[–]phobrain 0 points (0 children)

I was in one of the tanks, and I'm sure they never saw us. Glad to help.

[–]fldwiooiu 0 points (3 children)

transfer learning only really works on stuff that resembles ImageNet. not applicable to many truly interesting applications...

the real lesson from the story is DO PROPER VALIDATION. but it's real easy to fool yourself.
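
To make "proper validation" concrete in the tank setting, here is a toy sketch (stand-in data and a standard scikit-learn splitter; none of this is from the thread): split by photo session rather than by individual image, so a per-session confound like lighting can't leak across the train/test boundary.

    # Toy sketch: group-wise splitting so that a per-session confound (brightness,
    # film stock, weather) cannot appear on both sides of the train/test split.
    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    rng = np.random.default_rng(0)
    images = rng.random((200, 64, 64))            # 200 fake 64x64 photos
    labels = rng.integers(0, 2, size=200)         # 2 classes
    session_ids = np.repeat(np.arange(10), 20)    # 10 photo sessions, 20 photos each

    splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
    train_idx, test_idx = next(splitter.split(images, labels, groups=session_ids))

    # No session appears in both sets, so a model that memorized a per-session cue
    # should collapse on the held-out sessions instead of looking falsely good.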

[–]gwern[S] 3 points (2 children)

> transfer learning only really works on stuff that resembles ImageNet.

That's not true. Consider "Universal representations: The missing link between faces, text, planktons, and cat breeds", Bilen & Vedaldi 2017. You can get transfer between widely separated domains.
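
Not the Bilen & Vedaldi method itself, but as a sketch of the weaker, everyday form of the claim: reuse an ImageNet-pretrained backbone on a visually dissimilar target domain, retraining only a new head (all names are standard torchvision/PyTorch; the 10-class target is just an assumed example such as digits).

    # Generic sketch (not the universal-representations method): freeze an
    # ImageNet-pretrained ResNet and train only a new classification head.
    import torch
    import torch.nn as nn
    from torchvision import models

    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in backbone.parameters():
        p.requires_grad = False                           # keep the pretrained features
    backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new 10-way classifier

    # Only the new head receives gradients; greyscale inputs would need to be
    # replicated to 3 channels and resized to 224x224 before being fed in.
    optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)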

[–]shortscience_dot_org 1 point (1 child)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

http://www.shortscience.org/paper?bibtexKey=journals/corr/1701.07275

Summary Preview:

This paper is about transfer learning for computer vision tasks.

Contributions

  • Before this paper, people focused on similar datasets (e.g. ImageNet-like images) or even the same dataset but a different task (classification -> segmentation). In this paper, they look at extremely different datasets (ImageNet-like vs text) but only one task (classification). They show that all layers can be shared (including the last classification layer) between datasets such as MNIST and CIFAR-10.

  • Normalizing ...

[–]kjearns 3 points (0 children)

This is not a good bot. Please stop posting.

[–]SemaphoreBingo 1 point (3 children)

The author writes: "Tanks must not have been a major focus of early NN research in general - for example, Schmidhuber’s history does not mention tanks at all."

From a quick check of Schmidhuber I don't really see any applications mentioned at all, and no mentions whatsoever of 'military' or 'defense', which makes me think that this is not much of a point.

[–]gwern[S] 2 points (2 children)

I also didn't find many hits otherwise, which was why I checked Schmidhuber in the first place; I was surprised at how many papers had been published on neural nets for stirring tanks (as in, chemical vats and the like) and how relatively few on military tanks, and wondered whether I had missed some high-profile or important ones. Schmidhuber does mention applications. (I jumped to a random page in the middle, pg. 30, and just there he discusses backgammon, Atari, and other games; applications, surely.) And even if he doesn't highlight it in the body, keywords like 'tank' can show up in the extensive bibliography, in the titles of papers or journals.

[–]SemaphoreBingo 0 points (1 child)

Rosenblatt tells Kanal that the results ended up being classified, so of course it wouldn't have been published.

Here's a potential timeline. These facts are known:

1. Kanal et al. present work at a 1963 talk.
1a. Fredkin points out the possibility of cloudy/sunny days.
2. Not long after, the work becomes classified.
3. In a 1991 interview, Kanal does not mention problems generalizing.

Now here are some potential explanations:

1b. Kanal improves the model / retrains on additional data.
1c. Now the results generalize, leading to (2).
3a. 27 years after the fact, Kanal has simply forgotten the early problems with tank classification.

[–]gwern[S] 0 points (0 children)

> Rosenblatt tells Kanal that the results ended up being classified, so of course it wouldn't have been published.

Not all the results were classified, obviously, as I provide fulltext of Kanal & Randall 1964, and Kanal feels free to discuss it and provide citations in the 1990s. Only the research done later in the program was classified, after they had already published a few papers and perhaps gone to conferences or given talks, as the publication venues indicate they did. It would be nice if I could put Kanal or Randall and Ed Fredkin at the same conference or same university sometime 1960-1964; I think that would clinch the case for the true origin of the story being Fredkin's possibility transformed into an actuality by a game of telephone. To claim that the failure was perhaps found only after the classification would then require quite a series of coincidences: Fredkin has to independently stumble on the flaw, the flaw has to evade detection for years while they develop it & give talks on it, it has to be exposed at just the wrong time to be classified, Kanal has to seriously lie to his readers about it being a success decades later when he has little reason to, the tank story has to leak out from under classification rather than from Fredkin's question, and Fredkin has to not know about the failure under classification in order to conclude his question was the origin of the story. As opposed to just: Kanal is being honest, Fredkin's question got turned into a story people can't resist telling in a perfectly ordinary case of the sort of thing you see on Snopes all the time, and the classification was unlucky. Much more parsimonious.

[–]HamSession 1 point (3 children)

Even if the story is apocryphal, it presents a good lesson for students: examine your models using the scientific method. Assume your output is not remarkable, and vary the input to ensure the effect remains.
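
In that spirit, a hypothetical sketch of one such check (the `model` and `test_images` names are placeholders for whatever classifier and data you actually have, with inputs assumed to be in [0, 1]): darken the inputs and see whether the predictions survive.

    # Hypothetical sanity check: scale down global brightness and measure how many
    # predictions flip. A high flip rate suggests the model keys on lighting
    # (sunny vs. cloudy) rather than on the object of interest.
    import torch

    def brightness_flip_rate(model, test_images, scale=0.5):
        """Fraction of predictions that change when brightness is halved."""
        model.eval()
        with torch.no_grad():
            original = model(test_images).argmax(dim=1)
            darkened = model((test_images * scale).clamp(0, 1)).argmax(dim=1)
        return (original != darkened).float().mean().item()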

[–]jmmcd 1 point (0 children)

Did you read the section on this exact issue in the article?

Also u/MarxSoul55

[–]MarxSoul55 0 points (0 children)

My reaction is similar to yours.

It matters not whether the story is true. It's more about the lesson it's trying to teach: we cannot expect a model to perform well on a different problem domain than the one it was trained on.

In addition, the daytime-nighttime thing, alone, is a good thing to bring up IMO. I'm not sure how many pictorial datasets allocate a meaningful amount of data to the nighttime domain, but I doubt it'd be a lot, since humans are mostly active during the day. For example, I'm looking at Kaggle's "cats v dogs" dataset and almost no pictures are taken in low-light conditions (a quick way to audit this is sketched below).

I think nighttime pictures are important if ML becomes advanced enough to create an AGI (and honestly I'm not sure what else besides ML could potentially create AGI, but that's another discussion). It frustrates me to think about a robotic worker that suddenly becomes inept after the clock strikes sundown.
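
On that dataset point, a quick and entirely hypothetical audit (the folder path and the "dim" threshold are invented for illustration): compute mean luminance per image and report what fraction counts as low-light.

    # Hypothetical low-light audit: mean greyscale luminance per image, then the
    # fraction of images falling below an arbitrary "dim" threshold.
    from pathlib import Path
    import numpy as np
    from PIL import Image

    def dim_fraction(folder, threshold=60):
        means = []
        for path in Path(folder).glob("*.jpg"):
            lum = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
            means.append(lum.mean())              # 0 = black, 255 = white
        return float(np.mean(np.array(means) < threshold)) if means else 0.0

    print(dim_fraction("train/"))                 # e.g. the cats-vs-dogs training set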

[–]cdrwolfe 0 points (0 children)

Being new to this, the dog v wolf anecdote made me curious. I have read a bit about data augmentation of training sets to provide more samples to train against, I assume to aid generalisation etc. Regarding trying to learn the difference between a dog and a wolf, do researchers look to overcome such a situation by extracting the feature they wish to identify, i.e. cut out the wolf and replace the dog in a grassy field with it etc., or simply have the wolf isolated on a completely white background?

Or is that just a step/hassle too far 🙂
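
To make the idea concrete, a hypothetical background-swap augmentation is sketched below; note it presupposes a foreground mask for the animal, and producing those masks is usually the expensive step (much of the "hassle").

    # Hypothetical background-swap augmentation: composite a masked foreground
    # (the wolf) onto a new background (grassy field, plain white, etc.).
    import numpy as np

    def swap_background(image, mask, background):
        """image, background: (H, W, 3) floats in [0, 1]; mask: (H, W), 1 = foreground."""
        mask3 = mask[..., None].astype(np.float32)
        return image * mask3 + background * (1.0 - mask3)

    # e.g. isolate the wolf on a plain white background:
    # augmented = swap_background(wolf_img, wolf_mask, np.ones_like(wolf_img))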

[–]TheFML 1 point (1 child)

http://iphome.hhi.de/samek/pdf/BinICML16.pdf page 3, and more generally http://www.heatmapping.org/ is a good starting point.

that tank anecdote seems like an incredibly narrow topic to focus on. don't mean to be rude, but I assume you'd probably be better off using your time to do research on more important problems.

[–]gwern[S] 3 points (0 children)

> that tank anecdote seems like an incredibly narrow topic to focus on. don't mean to be rude, but I assume you'd probably be better off using your time to do research on more important problems.

It is narrow, agreed, but I did most of the work way back in 2011; I considered the story as debunked as necessary until I saw it being given pride of place in a New York Times story a few days ago. That was the last straw. (And as you can see here, people still want to believe in it.)

In any case, I've been meaning to compile examples of reward hacking and this provides a logical home for them.

[–]UmamiSalami 0 points (1 child)

The idea of distinguishing friendly from enemy tanks with an NN on the battlefield is simply ridiculous. It is possible for a human to learn how to identify different types of military vehicles by looking at them, but it's harder than differentiating a dog from a muffin, much harder when you take distance and battlefield conditions into account, and still harder when you take into account all the different angles at which you can spot a vehicle. I'm not an image recognition guy, but I would be very surprised if we could do this today with proper practices. Only if people had wildly over-optimistic expectations for NNs would they even attempt such a project back in the day.

I'm pretty sure that most identification of enemy forces comes from their location and behavior, not counting the number of road wheels or checking the shape of the turret.

[–]gwern[S] 1 point (0 children)

If you're referring to Kanal & Randall, the project had much more reasonable expectations. The purpose, as discussed in their background/preparation paper, was simply to check through aerial reconnaissance to identify potential regions of interest for the human analysts. They note that forecasts ~1962 expected footage to increase vastly in future conflicts while human analyst time would remain limited (both of which happened). It would certainly be unreasonable to expect a perceptron or other statistical algorithm to identify tank type, affiliation, etc. for use with automatic targeting (as in some versions of the tank story), but simple screening for 'possibly a little tank-like box' seems possible. You can always vary the threshold, too, remember - it's worthwhile if you can just scrap 10 or 20% of the footage as definitely irrelevant to the intelligence analysts and let them spend their time on more important areas of the photos. (This is why they have those long sections discussing decision-theoretic perspectives: to allow varying the threshold for particular applications depending on the costs and availability of human analysts.)
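
As a sketch of that screening use (invented toy scores and labels, nothing from Kanal & Randall): pick the score threshold that discards a fixed fraction of photos and check how many genuine positives that would cost.

    # Toy sketch of decision-theoretic screening: discard the lowest-scoring 20%
    # of photos and report how many true positives would be lost at that threshold.
    import numpy as np

    def screening_threshold(scores, has_target, discard_fraction=0.2):
        threshold = np.quantile(scores, discard_fraction)
        discarded = scores < threshold
        missed = (discarded & has_target).sum() / max(has_target.sum(), 1)
        return threshold, missed

    rng = np.random.default_rng(0)
    scores = rng.random(1000)                               # fake classifier scores
    has_target = rng.random(1000) < 0.1 * (1 + 4 * scores)  # targets correlate with score
    thr, miss = screening_threshold(scores, has_target)
    print(f"discard below {thr:.2f}; fraction of real targets lost: {miss:.2%}")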

[–]dpineo 0 points (0 children)

Having worked on many DARPA projects over the last decade, I'd wager that this has in fact happened many times.