Here's what I did. I thought up 30 pairs of variables that would be easy to measure and that might relate in diverse ways. Some variables were physical (the distance vs. apparent brightness of nearby stars), some biological (the length vs. weight of sticks found in my back yard), and some psychological or social (the S&P 500 index closing value vs. number of days past). Some I would expect to show no relationship (the number of pages in a library book vs. how high up it is shelved in the library), some I would expect to show a roughly linear relationship (distance of McDonald's franchises from my house vs. MapQuest estimated driving time), and some I expected to show a curved or complex relationship (forecasted temperature vs. time of day, size in KB of a JPG photo of my office vs. the angle at which the photo was taken). See here for the full list of variables. I took 11 measurements of each variable pair. Then I analyzed the resulting data.
Now, if the world is massively complex, then it should be difficult to predict a third datapoint from any two other data points. Suppose that two measurements of some continuous variable yield values of 27 and 53. What should I expect the third measured value to be? Why not 1,457,002? Or 3.22 x 10^-17? There are just as many functions (that is, infinitely many) containing 27, 53, and 1,457,002 as there are containing 27, 53, and some more pedestrian-seeming value like 44. On at least some ways of thinking about massive complexity, we ought to be no more surprised to discover that third value to be over a million than to discover that third value to be around 40. Call the thesis that a wildly distant third value is no less likely than a nearby third value the Wild Complexity Thesis.
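To make the "just as many functions" point concrete, here is a minimal sketch that builds a polynomial through 27, 53, and any third value you like. It assumes the three measurements sit at positions x = 1, 2, 3 and uses Lagrange interpolation. Both choices, and the name `quadratic_through`, are mine for illustration and are not part of the test itself.

```python
from fractions import Fraction


def quadratic_through(points):
    """Return f(x) for the polynomial of degree <= 2 passing through
    three (x, y) points with distinct x values (Lagrange form)."""
    def f(x):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if j != i:
                    # Basis factor: equals 1 at x = xi and 0 at x = xj.
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total
    return f


# Assumed positions 1, 2, 3; the third values are the ones mentioned in the text.
for third in (44, 1457002, Fraction("3.22e-17")):
    f = quadratic_through([(1, 27), (2, 53), (3, third)])
    assert (f(1), f(2), f(3)) == (27, 53, third)
```

Whatever third value is supplied, some curve fits all three points exactly, so curve-fitting alone cannot tell us which third value to expect.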
I can use my data to test the Wild Complexity Thesis, on the assumption that the variables I have chosen are at least roughly representative of the kinds of variables we encounter in the world, in day-to-day human lives as experienced in a technologically advanced Earthly society. (I don't generalize to the experiences of aliens or to aspects of the world that are not salient to experience, such as Planck-scale phenomena.) The denial of Wild Complexity might seem obvious to you. But that is an empirical claim, and it deserves empirical test. As far as I know, no philosopher has formally conducted this test.
To conduct the test, I used each pair of consecutive observations of the dependent variable to predict the next observation in the series (the 1st and 2nd observations predicting the value of the 3rd, the 2nd and 3rd predicting the value of the 4th, etc.), yielding 9 predictions per variable pair, or 270 predictions for the 30 variable pairs. I counted an observation "wild" if its absolute value was more than 10 times the maximum of the absolute values of the two previous observations or if its absolute value was below 1/10 of the minimum of the absolute values of the two previous observations. Separately, I also looked for flipped signs (either two negative values followed by a positive or two positive values followed by a negative), though most of the variables only admitted positive values. This measure of wildness yielded three wild observations out of 270 (1%) plus another three flipped-sign cases (total 2%). (A few variables were capped, either top or bottom, in a way that would make an above-10x or below-1/10th observation analytically unlikely, but excluding such variables wouldn't affect the result much.)
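The counting rule can be sketched in a few lines. This is my reconstruction from the description above, not the spreadsheet actually used. The function names are hypothetical, the comparisons are assumed to be strict, and the example series is invented.

```python
def is_wild(prev1, prev2, value, factor=10.0):
    """True if |value| is more than `factor` times the larger of the two
    previous absolute values, or less than 1/factor of the smaller."""
    hi = max(abs(prev1), abs(prev2))
    lo = min(abs(prev1), abs(prev2))
    return abs(value) > factor * hi or abs(value) < lo / factor


def is_sign_flip(prev1, prev2, value):
    """True for two negatives followed by a positive, or two positives
    followed by a negative."""
    return (prev1 < 0 and prev2 < 0 and value > 0) or (
        prev1 > 0 and prev2 > 0 and value < 0
    )


def count_wild(series, factor=10.0):
    """Slide over consecutive triples; return (wild, flips, predictions)."""
    wild = flips = 0
    for a, b, c in zip(series, series[1:], series[2:]):
        wild += is_wild(a, b, c, factor)
        flips += is_sign_flip(a, b, c)
    return wild, flips, max(len(series) - 2, 0)


# Invented series: only the jump to 1,457,002 counts as wild at factor 10.
print(count_wild([27, 53, 44, 1457002]))  # (1, 0, 2)
```

A series of 11 measurements gives 9 triples, which is where 270 predictions for 30 variable pairs comes from. Passing `factor=2.0` gives a looser criterion.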
So it looks like the Wild Complexity Thesis might be in trouble. Now admittedly a caveat is in order: If the world is wild enough, then I probably shouldn't trust my memory of having conducted this test (since maybe my mind with all its apparent memories just popped into existence out of a disordered past), or maybe I shouldn't trust the representativeness of this sample (I got 2% wild this time, but maybe in the next test I'll get 50% wild). However, if we are doubtful about the results for either of those reasons, it might be difficult to escape collapse into radical skepticism. If we set aside radically skeptical worries, we might still wonder how wild the world is. These results give us a preliminary estimate. To the extent the variables are representative, the answer seems to be: not too wild -- though with some surprises, such as the $20,000 listed value of the uncirculated 1922 Lincoln wheat penny. (No, I didn't know about that before seeking the data.)
If we use a Wildness criterion of two (two times the max or 1/2 the min), then there are 33 wild instances in 270 observations, or about 12%, overlapping in one case with the three flipped-sign cases, for 13% total. I wouldn't take this number too seriously, since it will presumably vary considerably depending on the variables chosen for analysis -- but still it's smaller than it might have been, and maybe okay as a first approximation to the extent the variables of interest resemble those on my list.
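For anyone checking the arithmetic, the percentages follow directly from the reported tallies. A minimal sketch, using only the counts stated in the post (the helper name `pct` is mine):

```python
TOTAL = 270  # 30 variable pairs x 9 predictions each


def pct(count, total=TOTAL):
    """Percentage of predictions, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Criterion of 10: three wild cases plus three separate flipped-sign cases.
print(pct(3), pct(3 + 3))        # 1.1 2.2

# Criterion of 2: 33 wild cases, one of which is also a flipped-sign case,
# so the combined count is 33 + 3 - 1 = 35.
print(pct(33), pct(33 + 3 - 1))  # 12.2 13.0
```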
I had meant to do some curve fitting in this post, too -- comparing linear and quadratic predictions with more complex predictions -- but since this is already a good-sized post, we'll curve fit another day.
I admit, this is a ham-handed approach. It uses crude methods, it doesn't really establish anything we didn't already know, and I'm sure it won't touch the views of those philosophers who deny that the world is simple (who probably aren't committed to the Wild Complexity Thesis). I highlight these concessions by calling the project "stupid epistemology". If we jump too quickly to clever, though, sometimes we miss the necessary groundwork of stupid.
Note: This post was substantially revised Feb. 6.
Still interesting anyway. Thoughts:
1. how do we know that the variables are not a subset of possible variables, selected for their general interest to humans and so the simplicity simply due to the hidden confounder 'interesting enough to humans to collect'?
2. could this be an artifact of representation? Something like http://en.wikipedia.org/wiki/Benford%27s_law (although that seems to have meaningful explanations)
3. the general observation reminds me of Cohen's paper http://ist-socrates.berkeley.edu/~maccoun/PP279_Cohen1.pdf 'The Earth Is Round (p < .05)' where he notes on pg 4 that 'the nil hypothesis' is always false and quotes a saying from psychology that "Everything is related to everything else".
Posted by: gwern | 02/05/2013 at 08:59 PM
Gwern: Thanks for the thoughtful comment!
On 1: I agree that's possible! It was my intent to acknowledge that possibility by nodding to the fact that I'm not generalizing to aliens and looking at the kinds of variables we experience in day-to-day life.
On 2: Benford's law is a cool regularity. It's also a pattern that holds across a wide range of variables -- so it's another way into the type of project above. There are representational issues in the data, e.g., some data that it seemed natural to put in logarithmic form, which impacted the Wildness tests (harder to get 10x or 1/10 but easier to flip positive to negative). Here's one place where I tried to just be "stupid" about things.
On 3: I didn't know Cohen's paper. (Thanks for the link!) Paul Meehl also said some things about everything being related, as I recall. But relatedness and complexity seem to me to be somewhat separable. Relations can be simple or complex, and arguably nothing is simpler than a zero correlation....
Posted by: Eric Schwitzgebel | 02/06/2013 at 01:15 AM
> Relations can be simple or complex, and arguably nothing is simpler than a zero correlation....
That gets you into 'whose definition of simplicity'; if you look at it from an entropy/information theory/computational complexity point of view, 2 variables which are uncorrelated are as complex as it is possible to be, since knowing one variable does not predict the other in the slightest and the 2 variables together have no shorter encoding than the variables separately.
Also consider the opposite case: if 0 correlation is the simplest possible relationship, then surely 1 correlation is the most complex possible relationship. Yet you can get a correlation of 1 with any variable x just by looking at the correlation of x and... x.
Posted by: gwern | 02/06/2013 at 07:49 PM
Indeed, simplicity is a complex issue! I agree that encoding each value of each variable is information intensive if they are uncorrelated. But encoding *their relationship* is arguably informationally cheap! I'd be inclined to think that both zero and one correlations are simple relationships, compared, say, to the relationship y = (sin(a + bx)+c)^(f+gx^2) -- which is still pretty simple compared to the entire universe of possible functions. (I might get into this a bit in my follow-up post.)
Posted by: Eric Schwitzgebel | 02/06/2013 at 10:34 PM