Unnatural Intelligence - also known as Artificial Intelligence: November 2018

Sample variance estimation formula: Why is (n-1) to be used instead of n as the denominator?

Let Y₁, Y₂, ..., Y_n be the sample data points for the random variable Y. Let μ_Y and σ_Y denote the mean and standard deviation of Y respectively. We assume that the data points are independent of each other and that each has the same distribution as Y. This assumption usually holds true in the case of sampling with replacement. Hence, E(Y_iY_j) = E(Y_i)E(Y_j) = μ_Y² for any i, j ∈ {1, ..., n} such that i ≠ j.

The estimator for the mean of Y is simply the average value of the data points, i.e.

Mean_est(Y) or Ỹ_est = (1/n) Σ_{i=1}^{n} Y_i    ①

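which seems intuitive. By the same reasoning, it would seem that the best estimate for the variance of Y would be the average squared deviation of the data points from the estimated mean:

Var_est(Y) = (1/n) Σ_{i=1}^{n} (Y_i - Ỹ_est)²    ②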

Let us compute the mean (expected) value of this variance estimate, since the estimate is itself a random variable that depends on the values of the data points Y_i in the sample set. Let A denote the variance estimator that uses n as the denominator. Let us denote by S_n the numerator in the formula for Ỹ_est, i.e.

S_n = Σ_{i=1}^{n} Y_i, so that Ỹ_est = S_n / n    ③

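Now, by definition, σ_Y² = E(Y²) - μ_Y², so E(Y_i²) = σ_Y² + μ_Y² for every i. Expanding the square in A and using Σ Y_i = S_n = n Ỹ_est gives

A = (1/n) Σ_{i=1}^{n} (Y_i - Ỹ_est)² = (1/n) Σ_{i=1}^{n} Y_i² - S_n²/n²

The product S_n² contains n squared terms Y_i² and n(n - 1) cross terms Y_iY_j with i ≠ j. Since E(Y_iY_j) = E(Y_i)E(Y_j) = μ_Y² for i ≠ j,

E(S_n²) = n(σ_Y² + μ_Y²) + n(n - 1)μ_Y² = nσ_Y² + n²μ_Y²    ④

Taking the expectation of A:

E(A) = (σ_Y² + μ_Y²) - (nσ_Y² + n²μ_Y²)/n² = σ_Y² - σ_Y²/n = ((n - 1)/n) σ_Y²    ⑤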
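The mean value of the n-denominator estimator can also be checked exactly for a small case. The sketch below is only an illustration and not part of the proof: it assumes a hypothetical population (a fair six-sided die), enumerates every equally likely sample of size n drawn with replacement, and compares the exact average of A with ((n - 1)/n) σ_Y². The function names are made up for this example.

```python
from fractions import Fraction
from itertools import product


def biased_variance(sample):
    """Estimator A: squared deviations from the sample mean, divided by n."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((Fraction(y) - mean) ** 2 for y in sample) / n


def expected_biased_variance(values, n):
    """Exact E(A) over all equally likely samples of size n (with replacement)."""
    samples = list(product(values, repeat=n))
    return sum(biased_variance(s) for s in samples) / len(samples)


# Hypothetical population for illustration: the faces of a fair die.
values = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(values), len(values))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in values) / len(values)  # true variance

for n in (2, 3, 4):
    exact_mean_of_A = expected_biased_variance(values, n)
    predicted = Fraction(n - 1, n) * sigma2
    print(n, exact_mean_of_A, predicted)
```

Because the arithmetic uses exact fractions, the two printed values in each row agree exactly rather than approximately.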

Thus, the estimated variance has a mean value that is less than the actual variance, i.e. A is a biased estimator. From equation ⑤, E(A) = ((n - 1)/n) σ_Y², so it is clear that if the estimator A is multiplied by n/(n - 1) (which is a linear transformation of A), then the resulting estimator A_mod will have a mean value that equals the actual variance. Thus, the unbiased estimator of variance is:

A_mod = (n/(n - 1)) A = (1/(n - 1)) Σ_{i=1}^{n} (Y_i - Ỹ_est)²
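The effect of the correction can be seen in a small simulation. The following is a minimal sketch under assumed settings (normally distributed data with true variance 4, samples of size 5, a fixed seed); none of these choices come from the derivation itself. It uses `statistics.pvariance`, which divides by n, and `statistics.variance`, which divides by n - 1, from the Python standard library.

```python
import random
import statistics


def average_estimates(n, trials, seed=0):
    """Average both variance estimates over many independent samples of size n."""
    rng = random.Random(seed)
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        # Assumed distribution for illustration: mean 0, standard deviation 2.
        sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
        total_biased += statistics.pvariance(sample)   # denominator n
        total_unbiased += statistics.variance(sample)  # denominator n - 1
    return total_biased / trials, total_unbiased / trials


biased, unbiased = average_estimates(n=5, trials=20000)
print(f"n denominator:       {biased:.3f}  (theory: {4.0 * 4 / 5:.3f})")
print(f"(n - 1) denominator: {unbiased:.3f}  (theory: {4.0:.3f})")
```

With these settings the first average should settle near 3.2 and the second near 4, up to simulation noise. The gap shrinks as n grows, since (n - 1)/n approaches 1.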


Sunday, November 11, 2018

Basic statistics - Some notes and proofs
