Ste, care to back me up?
ATTN: All Statisticians

You are all wrong. And probably stupid.

At what point did you forget how to perform simple arithmetic? No, really, I'm interested.

The following is a definition for the position of the "Median" among n sorted datapoints that I've seen online:
(n+1) / 2
This is accurate, but wrong. The correct definition is:
((n - 1) / 2) + 1
I'm aware that these two are equal, but the latter is the better way of doing it. The former is merely a simplification of this, which removes the point of why you have to piss about with it in the first place.

Let's use an example. I have 600 students, with different exam grades. I want to know the median. Sorted by grade, my datapoints are numbered from 1 to 600, and I want to find the halfway point. I can't simply divide by two, because my start point is 1 rather than 0. So I initially subtract one to standardise it, giving me positions from 0-599. This can be divided by two, to give me the standardised median position, 299.5. I must add 1 at the end in order to bring the data back from the standardised version to the practical version. Ergo, the median is at position 300.5, halfway between the 300th and 301st grades.
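For anyone who'd rather see the arithmetic than read it, here is a minimal sketch of the standardise, divide, de-standardise steps. The function name `median_position` is my own for illustration, and it assumes the datapoints are already sorted and numbered from 1 to n.

```python
def median_position(n):
    """Return the 1-based position of the median among n sorted datapoints."""
    standardised_last = n - 1                # positions now run 0 .. n-1
    standardised_median = standardised_last / 2
    return standardised_median + 1           # back to positions 1 .. n


if __name__ == "__main__":
    n = 600
    print(median_position(n))                # 300.5
    print(median_position(n) == (n + 1) / 2) # True: same value as the simplified form
```

A fractional position such as 300.5 means halfway between the 300th and 301st values.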

That's the case using both my definition of the median, and the one that I've seen around. But then it gets tricky, with Quartiles. I've seen the following definitions for the Lower Quartile:
(n+1) / 4

(n / 4) + 0.5
And both of these are fucking WRONG. The first is based on the original definition of the median, and some bright spark (read: fucking statistician) has decided to modify it by changing the 2 to a 4. Fucking moron. The second is again based on a simplified version of the median rule. The correct formula to find out the Lower Quartile is:
((n - 1) / 4) + 1
Again, we standardise it, divide by four, then de-standardise it.

In our dataset of 1-600, we then have that the lower quartile is 150.75. Given that we start from 1, there are 149.75 datapoints below this, and 449.25 datapoints above it. Oh look. A 1:3 ratio.
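The 1:3 claim is easy to check. This is a rough sketch, assuming positions numbered 1 to n and measuring "below" and "above" as distances along that scale; the function name `lower_quartile_position` is made up for the example.

```python
def lower_quartile_position(n):
    """1-based position of the lower quartile: standardise, divide by 4, de-standardise."""
    return (n - 1) / 4 + 1


if __name__ == "__main__":
    n = 600
    q1 = lower_quartile_position(n)
    below = q1 - 1    # distance from the first position (1) up to q1
    above = n - q1    # distance from q1 up to the last position (n)
    print(q1, below, above)   # 150.75 149.75 449.25
    print(above / below)      # 3.0
```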

The two examples of wrong formulae obtained from "university" websites give values of 150.25 and 150.5. Now, while both of these have 25% of the datapoints below them discretely, if looked at continuously they do not.
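To put numbers on the continuous comparison, here is a small sketch that runs all three formulae on n = 600 and works out how each one splits the range from 1 to n. The helper names `lower_quartile_candidates` and `split_ratio` are hypothetical, purely for this illustration.

```python
def lower_quartile_candidates(n):
    """Return the lower quartile position given by each of the three formulae."""
    return {
        "(n + 1) / 4": (n + 1) / 4,
        "(n / 4) + 0.5": n / 4 + 0.5,
        "((n - 1) / 4) + 1": (n - 1) / 4 + 1,
    }


def split_ratio(position, n):
    """Ratio of the distance above a position to the distance below it, on the scale 1 .. n."""
    return (n - position) / (position - 1)


if __name__ == "__main__":
    n = 600
    for formula, position in lower_quartile_candidates(n).items():
        ratio = split_ratio(position, n)
        print(f"{formula:>18}: position {position}, above/below = {ratio:.4f}")
```

Only a position that splits the range exactly 1:3 gives a ratio of 3.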

When statistics groups data, this is a lossy form of compression. Information is lost, and so we interpolate and approximate to try to regain some of that information. Since we are using a continuous approximation, it would be an error to assume that 150.25 = 150.5 = 150.75, as we could if we were treating the data discretely. Although the difference is trivial, especially since we are only approximating, it is still fucking stupid for them to give a formula which is inherently wrong. Instead, my formula is correct, and as far as I can see, is not found on university websites. I plan to write letters to said universities, pointing this out.

But first, any maths grads want to back me up and point out that I'm right? :o)


Heh. Actually, Computer Science is sucky and boring, and I doubt I'd ever post about it. Star Wars on the other hand, is way, way cooler than statistics. ;P


lol, you're such a pedant. like me ;)

You are all wrong. And probably stupid.

Is this like saying "100% of statisticians are wrong, and any given statistician has some P(stupid) > 0"?
