That goes for the median and mode, too.
Here's one reason why and how it applies to the way SoundCheks will measure the truthfulness of statements and the people who make them.
SoundCheks will use structured annotations of public statements to, among other things, score their truthfulness. The scores range from 0% to 100%. Zero percent means a rater is positive the statement is false. One hundred percent means the rater is positive the statement is true. Actually, we add a very tiny fudge factor if the rating is precisely 0% or 100% because at SoundCheks, we don't believe that anything is completely certain to be true or false.
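As a rough sketch of the fudge factor described above: the function below nudges a rating of exactly 0 or 1 just inside the open interval. The name `nudge` and the value of `EPS` are assumptions made for illustration; the actual size of the fudge factor isn't given here.

```python
EPS = 1e-6  # hypothetical fudge factor; the real value is not stated in the post


def nudge(rating: float, eps: float = EPS) -> float:
    """Move a rating of exactly 0 or 1 slightly inside the open interval (0, 1)."""
    if not 0.0 <= rating <= 1.0:
        raise ValueError("rating must be between 0 and 1")
    if rating == 0.0:
        return eps
    if rating == 1.0:
        return 1.0 - eps
    return rating  # ratings strictly between 0 and 1 are left alone
```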
Anyway, say you've got a statement with 100 ratings. We could just take the mean of those scores as a measure of the truthfulness of the statement, and maybe even get fancy and weight the scores by how much we trust each of the raters. Let's say the mean comes out to be 60%. Maybe we'd take the mode or the median instead if the distribution of ratings is skewed to the right or left. Great, let's call it a day!
Not so fast. None of the mean, median, or mode tells us the whole story about the distribution of ratings, especially with this kind of data. We don't know how the distribution is shaped, for one. Is it a bell curve? Is it a lopsided sort of bell? Is it a funky U-shape, with two peaks at 0% and 100%? Another thing those three measures of central tendency don't tell us is how much uncertainty there is about the mean. Is there a wide distribution of opinions about the truthfulness of the statement? Or do most people seem to agree? If there's a lot of dispersion in the ratings, is it just because people don't seem to be collectively sure about the truthfulness of the statement's premises, or does it look like there might be two groups with opposing views on the matter?
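Here is a minimal sketch of those summary measures using only Python's standard library. The ratings and trust weights are made up for illustration, and `weighted_mean` is a hypothetical helper, not SoundCheks code.

```python
from statistics import mean, median


def weighted_mean(ratings, weights):
    """Average the ratings, counting each one in proportion to its weight."""
    if len(ratings) != len(weights):
        raise ValueError("ratings and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r * w for r, w in zip(ratings, weights)) / total


# Made-up ratings on the 0-to-1 scale and made-up trust weights for four raters.
ratings = [0.2, 0.6, 0.7, 0.9]
trust = [1.0, 2.0, 2.0, 1.0]

print("mean:", mean(ratings))
print("median:", median(ratings))
print("trust-weighted mean:", weighted_mean(ratings, trust))
```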
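To make that concrete, here is a small sketch with two made-up sets of ratings that share the same mean but look nothing alike. The data and the `summarize` helper are invented for illustration.

```python
from statistics import mean, pstdev

# Two invented rating sets on the 0-to-1 scale.
consensus = [0.45, 0.5, 0.5, 0.55]  # raters roughly agree
polarized = [0.0, 0.0, 1.0, 1.0]    # raters split into two camps


def summarize(ratings):
    """Return (mean, population standard deviation) for a list of ratings."""
    return mean(ratings), pstdev(ratings)


for name, data in [("consensus", consensus), ("polarized", polarized)]:
    m, s = summarize(data)
    print(f"{name}: mean={m:.3f}, spread={s:.3f}")
```

Both sets average 50%, but the spread is more than ten times larger in the polarized one, and the mean alone can't tell them apart.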
To illustrate, let's look at a toy model for the distribution of a truthfulness score. In fact, it's very close to the sampling model we'll use to describe the SoundCheks data. It turns out that a natural way to describe the distribution of a probability rating between zero and one (which we've turned into a percent so far) is with something called the beta distribution. It's not the only way to model the distribution of a probability, but it's by far the most popular and intuitive.
The shape, mean, mode, median, variance, etc., of a beta distribution are controlled by just two parameters, α > 0 and β > 0. Focusing on just one measure of central tendency, the mean of the beta distribution is μ = α/(α + β). This makes sense if you think about the parameters as measuring the concentration of truth (α) vs. falsehood (β) in the statement. But there's also a sort of third, implicit parameter in the beta distribution, which is α + β. We'll call this implicit parameter γ. To demonstrate clearly how important it is to pay attention to the shape of a distribution, let's look at what happens when we have a constant mean μ = 0.5, but varying γ.
Let's start with a beta distribution where α = β = 1 (thus γ = 2), which looks like this:
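A short sketch of that relationship: given μ and γ, we can recover α = μγ and β = (1 − μ)γ, and the standard variance formula αβ / ((α + β)²(α + β + 1)) shows how γ controls the spread. The function names are made up for this example.

```python
def beta_params(mu: float, gamma: float) -> tuple[float, float]:
    """Convert a mean mu and concentration gamma = alpha + beta into (alpha, beta)."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must be strictly between 0 and 1")
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    return mu * gamma, (1.0 - mu) * gamma


def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# Same mean of 0.5, very different spreads as gamma changes.
for gamma in (0.002, 2.0, 20.0, 200_000.0):
    a, b = beta_params(0.5, gamma)
    print(f"gamma={gamma}: alpha={a}, beta={b}, variance={beta_variance(a, b):.6g}")
```

With the mean held at 0.5, the variance simplifies to 0.25 / (γ + 1), so it shrinks toward zero as γ grows and approaches its maximum of 0.25 as γ approaches zero.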
Here's a case where our mean estimate of the statement's truthfulness is 50%, but we're very uncertain about that estimate. The statement is equally likely to be 99.99999999999% true...or 0.000000001% true. This is a distribution we might start with to characterize our prior beliefs before we obtain any information whatsoever about the statement or the person who made it, or the truthfulness of statements in general across the population of people who happen to make statements.
Now let's look at a totally different case. Let's see what happens when α = β = 100,000, which looks like this:
Now we're pretty certain that the statement is 50% likely to be true because we've got a lot of evidence on both sides of the debate. Specifically, we're 95% certain that the truthfulness of the statement is between about 49.7% and about 50.2%. That interval, btw, was estimated from a simulation, which probably doesn't even have a precision to the tenth of a percent anyway (I don't know, I didn't check).
Next, let's look at a scenario where there is a fair amount of uncertainty about how truthful the statement is, as is the case when α = β = 10:
Now our 95% credible interval (which was taken from the highest posterior density interval for you dorks out there) is between about 29% and about 71%. Here, we've got the same bell shape as we did before, except the distribution is much wider around 0.5.
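For anyone who wants to reproduce intervals like these, here is one way to estimate them by simulation with the standard library's `random.betavariate`. This is a sketch, not the code used for the numbers above. It computes an equal-tailed interval rather than a highest-density one, though the two coincide for symmetric, bell-shaped cases like α = β = 10. The function name, the number of draws, and the seed are arbitrary choices.

```python
import random


def simulated_interval(alpha, beta, level=0.95, draws=50_000, seed=0):
    """Estimate an equal-tailed interval for Beta(alpha, beta) from random draws."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


for a in (10, 100_000):
    lo, hi = simulated_interval(a, a)
    print(f"alpha=beta={a}: 95% interval is roughly {lo:.1%} to {hi:.1%}")
```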
So far so good. But here's where it gets weird. Let's look at the case when α = β = 0.001:
This is the distribution that would arise if most of your raters gave either very low or very high truthfulness scores to the statement, and maybe some gave scores in between. In this case, not only are we quite uncertain about the mean truthfulness of 50%, but it is the least likely single value of truthfulness according to the distribution above. How will SoundCheks interpret this scenario? Well, we would still consider the mean truthfulness to be an estimate of the true probability that a statement is true under our current knowledge. But we would also be suspicious that there is tremendous controversy over the statement, perhaps arising from a mixture of extremely liberal and extremely conservative members of the rating body. It is distributions like this that would further motivate our plans to have raters answer social and political attitude surveys.
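To check the claim that 50% is the least likely value in the U-shaped case, here is a sketch that evaluates the beta density directly from its formula using `math.lgamma`. The helper name `beta_pdf` is made up for this example.

```python
import math


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of Beta(alpha, beta) at x, for x strictly between 0 and 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must be strictly between 0 and 1")
    # Work in logs so very small or very large parameters don't overflow.
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_kernel = (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)
    return math.exp(log_norm + log_kernel)


# Compare the density at the center with the density near the edges.
for a in (0.001, 1.0, 10.0):
    center = beta_pdf(0.5, a, a)
    edge = beta_pdf(0.01, a, a)
    print(f"alpha=beta={a}: density at 0.5 = {center:.4g}, at 0.01 = {edge:.4g}")
```

For α = β = 0.001 the density near the edges is far higher than at the center, for α = β = 1 it is flat, and for α = β = 10 the center dominates.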
The point is, the shape of the distribution matters just as much as the average.