I don't want to get much into the discussion of the statistics, which are probably valid. Rather, I want to talk about the assumptions required for these statistics to work. The major assumption in the CLT is that the random variables in one's sample are independent and identically distributed. In the sorts of scenarios Surowiecki (apparently) describes (like guessing the number of jelly beans in a jar), these assumptions more or less hold until the point when the average is taken. I'll try to make my point clearer with a couple of scenarios:
In both scenarios, we'll place a jar of, say, dollar coins in front of a crowd, and the jar gets awarded to the person who guesses closest to the number of coins in the jar. In the first scenario, everyone makes a private guess about this number; in the second, we'll follow a "Price-is-Right" model and ask participants in order (never asking anyone more than once) what they think the number is, while allowing later participants to eavesdrop on earlier guesses. As a further assumption, let's say that everyone in the crowd understands and can use the central limit theorem. There are two questions that immediately arise:
- What is the best strategy for an individual to choose?
- In which of the two scenarios does the average guess of the crowd come closest to the actual number of coins in the jar?
In scenario 1, the answer to the first question is obvious: the respondent should make the guess that matches his or her internal idea about the number of coins in the jar. The responses will be independent, and will probably have pretty similar distributions centered around the actual contents of the jar, so the CLT ought to be able to characterize the answer to the second question quite well.
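To make scenario 1 concrete, here is a minimal simulation sketch. Everything in it is an assumption of mine for illustration: the jar holds 500 coins, the crowd has 400 people, and each private guess is a normal draw centered on the truth with a spread of 100. The function names are hypothetical too.

```python
import random
import statistics

def private_guesses(true_count, crowd_size, spread, rng):
    """Scenario 1: every guess is an independent draw centered on the truth."""
    return [rng.gauss(true_count, spread) for _ in range(crowd_size)]

def standard_error(guesses):
    """Estimated std. dev. of the crowd average, valid under the i.i.d. assumption."""
    return statistics.stdev(guesses) / len(guesses) ** 0.5

# Illustrative numbers only (assumed, not from any real experiment).
rng = random.Random(0)
guesses = private_guesses(true_count=500, crowd_size=400, spread=100, rng=rng)
crowd_mean = statistics.fmean(guesses)
se = standard_error(guesses)
print(f"crowd mean = {crowd_mean:.1f}, standard error = {se:.1f}")
```

With independent guesses, the standard error shrinks like one over the square root of the crowd size, so the crowd average should land within a few standard errors of the true count.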
In scenario 2, the answer is much less obvious. Each person's answer will depend both on what that person believes the true answer to be and on the circumstances in which he or she answers the question. If I know I'm the last to answer, but my internal idea about the count is much larger than anything the others have answered so far, the best guess to make is *not* what I believe the number is. It is the guess most likely to end up closer to the true value than all the others, which is not necessarily the guess I expect to be closest to the truth: an epsilon greater than the largest value picked by previous guessers. The upshot is that the best strategy is no longer independent of the actions of other participants, and the CLT cannot be applied.
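The last guesser's incentive can be sketched numerically. This is only an illustration under assumptions I'm making up: the previous guesses are 300, 350 and 400, my own belief about the true count is roughly normal around 600 with a spread of 50, and "epsilon" is one coin. The helper `win_probability` is hypothetical.

```python
import random

def win_probability(candidate, previous, belief_samples):
    """Fraction of belief samples for which `candidate` beats every previous guess."""
    wins = sum(
        all(abs(candidate - truth) < abs(p - truth) for p in previous)
        for truth in belief_samples
    )
    return wins / len(belief_samples)

# Assumed setup: earlier guesses are all well below what I believe.
rng = random.Random(2)
previous = [300, 350, 400]
belief_samples = [rng.gauss(600, 50) for _ in range(10_000)]

honest = 600                 # the guess that matches my belief
shaded = max(previous) + 1   # an epsilon above the largest earlier guess

p_honest = win_probability(honest, previous, belief_samples)
p_shaded = win_probability(shaded, previous, belief_samples)
print(f"honest guess wins {p_honest:.4f}, shaded guess wins {p_shaded:.4f}")
```

The shaded guess wins whenever the true count is above roughly 400, while the honest guess only wins when it is above 500, so the shaded guess can never do worse here. My reported number now depends on what everyone else said, not just on what I believe.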
This is likely to be true of any game with eavesdropping. Early guesses may be relatively independent, but later participants who know this will also know that they can make a better guess by taking the mean of those early, independent guesses. The obvious danger is that you may have sampled, without knowing it, the guess of someone else playing the exact same strategy. Now you've increased the number of samples and decreased the variance of the sample without learning anything new about the parameter you're trying to guess. After several iterations of this strategy, the sample variance is smaller than the population variance, and the estimated variance of the sampling distribution of the mean rapidly approaches 0. Statistically, it looks as though there is a very narrow confidence interval for the population parameter, but it's an illusion: no information is added to the system when later samples are describable as functions of earlier samples. The result roughly approximates mobbing behavior, like we see in the market.
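Here is a rough sketch of that collapse, under assumptions chosen purely for illustration: 10 early participants guess independently (normal noise around a true count of 500, spread 100), and each of 990 later participants simply reports the mean of everything heard so far. Real eavesdroppers would be messier; the function names are hypothetical.

```python
import random
import statistics

def eavesdropping_guesses(true_count, n_independent, n_copycats, spread, rng):
    """Early guesses are private draws; each later guess is the running mean so far."""
    guesses = [rng.gauss(true_count, spread) for _ in range(n_independent)]
    for _ in range(n_copycats):
        guesses.append(statistics.fmean(guesses))
    return guesses

def naive_standard_error(guesses):
    """Standard error computed as if every guess were an independent sample."""
    return statistics.stdev(guesses) / len(guesses) ** 0.5

rng = random.Random(1)
guesses = eavesdropping_guesses(500, n_independent=10, n_copycats=990, spread=100, rng=rng)
early = guesses[:10]

print(f"mean of 10 early guesses: {statistics.fmean(early):.2f}")
print(f"mean of all 1000 guesses: {statistics.fmean(guesses):.2f}")
print(f"naive SE, early only:     {naive_standard_error(early):.2f}")
print(f"naive SE, whole crowd:    {naive_standard_error(guesses):.2f}")
```

The crowd mean is the same as the mean of the 10 independent guesses, because appending the mean of a list never moves its mean. The naive standard error, though, drops to about a hundredth of its early value, which is exactly the false precision described above.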
Later I'll try to argue why I think this makes culture a generally losing proposition across species...
1 comment:
this is also known as "mooooo"... "baaaaa"