Friday, 4 May 2007

st.statistics - unbiased estimate of the variance of a weighted mean

First some notation. Each example is drawn from some unknown distribution $Y$ with $E[Y]=\mu$ and $\textrm{Var}[Y]=\sigma^2$. Suppose the weighted mean consists of $n$ independent draws $X_i\sim Y$, and $(w_i)_{i=1}^n$ is in the standard simplex. Finally define the r.v. $X=\sum_i w_i X_i$. Note that $E[X]=\sum_i w_i E[X_i]=\mu$ and $\textrm{Var}[X]=\sum_i w_i^2\,\textrm{Var}[X_i]=\sigma^2\sum_i w_i^2$.



Generalizing the standard definition of sample mean, take
$$\hat\mu((x_i)_{i=1}^n) := \sum_i w_i x_i.$$
Note that $E[\hat\mu((x_i)_{i=1}^n)]=\sum_i w_i E[x_i]=\mu=E[X]$, so $\hat\mu$ is an unbiased estimator.
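As a quick check of the notation so far, here is a minimal sketch that enumerates every possible sample exactly instead of simulating. The toy distribution (uniform on {0, 1, 3}), the weights, and the helper name `weighted_mean` are my own assumptions for illustration; none of them come from the question.

```python
from fractions import Fraction
from itertools import product


def weighted_mean(xs, ws):
    """Weighted sample mean sum_i w_i * x_i, with ws on the standard simplex."""
    return sum(w * x for w, x in zip(ws, xs))


# Hypothetical toy setup: Y uniform on {0, 1, 3}, n = 3 draws, arbitrary weights.
support = [Fraction(0), Fraction(1), Fraction(3)]
ws = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

mu = sum(support) / len(support)
sigma2 = sum((y - mu) ** 2 for y in support) / len(support)

# All 27 equally likely samples (x_1, x_2, x_3), so expectations are plain averages.
samples = list(product(support, repeat=len(ws)))
means = [weighted_mean(s, ws) for s in samples]

mean_of_X = sum(means) / len(means)
var_of_X = sum((m - mean_of_X) ** 2 for m in means) / len(means)

print(mean_of_X == mu)                               # E[X] = mu
print(var_of_X == sigma2 * sum(w * w for w in ws))   # Var[X] = sigma^2 * sum_i w_i^2
```

Using `Fraction` keeps the arithmetic exact, so the two identities can be compared with `==` rather than with a tolerance.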



For the variance, generalize the usual (biased) sample variance as
$$\hat\sigma_b^2((x_i)_{i=1}^n) := \sum_i w_i \left(x_i-\hat\mu((x_i)_{i=1}^n)\right)^2,$$
where the subscript foreshadows that this will need a correction to be unbiased. Anyway, since $\sum_j w_j = 1$,
$$E[\hat\sigma_b^2]=\sum_i w_i E[(x_i-\hat\mu)^2]=\sum_i w_i E\left[\left(\sum_j w_j(x_i-x_j)\right)^2\right].$$
The term inside the expectation can be written as
$$\sum_{j,k} w_j(x_i-x_j)\,w_k(x_i-x_k)=\sum_j w_j^2(x_i-x_j)^2+\sum_{j\neq k} w_j w_k (x_i-x_j)(x_i-x_k).$$
Passing in the expectation, the first term (when $j\neq i$; the case $j=i$ yields 0) is
$$E[(x_i-x_j)^2]=2E[x_i^2]-2\mu^2=2\sigma^2,$$
whereas the second (when $j\neq i$ and $k\neq i$; either equality yields 0) is
$$E[x_i^2-x_i x_j-x_i x_k+x_j x_k]=E[x_i^2]-\mu^2=\sigma^2.$$
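Combining everything,
$$\sum_i w_i\left(2\sigma^2\sum_{j\neq i} w_j^2+\sigma^2\sum_{j\neq k;\; j,k\neq i} w_j w_k\right)=\sigma^2\left(1-\sum_j w_j^2\right).$$
To see the last equality, note that $\sum_{j\neq k;\; j,k\neq i} w_j w_k=(1-w_i)^2-\sum_{j\neq i} w_j^2$, so the bracket equals $\sigma^2\left(\sum_j w_j^2+1-2w_i\right)$, and summing against $w_i$ gives the right-hand side.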
Therefore $E[\hat\sigma_b^2]-\sigma^2=-\sigma^2\sum_j w_j^2$, i.e. this is a biased estimator. To make this an unbiased estimator of $\textrm{Var}[Y]$, divide by the factor $1-\sum_j w_j^2$ derived above:
$$\hat\sigma_u^2((x_i)_{i=1}^n):=\frac{\hat\sigma_b^2((x_i)_{i=1}^n)}{1-\sum_j w_j^2}=\frac{\sum_i w_i(x_i-\hat\mu)^2}{1-\sum_j w_j^2}.$$
This matches the definition you gave (and, as a sanity check, taking $w_i=1/n$ recovers the usual unbiased estimate).
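The bias computation and the equal-weights sanity check can both be verified exactly on a small example. This is only a sketch under assumptions of my own: the function names `weighted_var_biased` and `weighted_var_unbiased`, the toy distribution (uniform on {0, 1, 3}), and the particular weights are illustrative, not taken from the documents in question.

```python
from fractions import Fraction
from itertools import product
import statistics


def weighted_var_biased(xs, ws):
    """sum_i w_i * (x_i - mu_hat)^2, where mu_hat is the weighted mean."""
    mu_hat = sum(w * x for w, x in zip(ws, xs))
    return sum(w * (x - mu_hat) ** 2 for w, x in zip(ws, xs))


def weighted_var_unbiased(xs, ws):
    """Biased estimate divided by the correction factor 1 - sum_j w_j^2."""
    return weighted_var_biased(xs, ws) / (1 - sum(w * w for w in ws))


# Sanity check: equal weights w_i = 1/n recover the usual n-1 sample variance.
data = [Fraction(2), Fraction(5), Fraction(5), Fraction(9)]
equal_ws = [Fraction(1, len(data))] * len(data)
print(weighted_var_unbiased(data, equal_ws) == statistics.variance(data))

# Exact expectation over a hypothetical toy distribution: Y uniform on {0, 1, 3}.
support = [Fraction(0), Fraction(1), Fraction(3)]
ws = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
mu = sum(support) / len(support)
sigma2 = sum((y - mu) ** 2 for y in support) / len(support)
sum_w2 = sum(w * w for w in ws)

# Every sample is equally likely, so the expectation is a plain average.
samples = list(product(support, repeat=len(ws)))
mean_biased = sum(weighted_var_biased(s, ws) for s in samples) / len(samples)
mean_unbiased = sum(weighted_var_unbiased(s, ws) for s in samples) / len(samples)

print(mean_biased == sigma2 * (1 - sum_w2))  # E[biased] = sigma^2 (1 - sum_j w_j^2)
print(mean_unbiased == sigma2)               # corrected estimator is unbiased
```

Note that the correction factor vanishes when all the weight sits on a single draw, so the corrected estimator is only defined when at least two weights are nonzero.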



Now, if one instead were to seek an unbiased estimator of the variance of $X=\sum_i w_i X_i$, the formula would instead be $\hat\sigma_b^2((x_i)_{i=1}^n)\left(\sum_j w_j^2\right)/\left(1-\sum_j w_j^2\right)$.
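For completeness, here is the same kind of exact check for the estimator of the variance of $X$. Again this is just a sketch: the name `var_of_weighted_mean_estimate` and the toy numbers are assumptions of mine for illustration.

```python
from fractions import Fraction
from itertools import product


def var_of_weighted_mean_estimate(xs, ws):
    """Estimate Var[X] for X = sum_i w_i X_i as biased * S / (1 - S), S = sum_j w_j^2."""
    sum_w2 = sum(w * w for w in ws)
    mu_hat = sum(w * x for w, x in zip(ws, xs))
    biased = sum(w * (x - mu_hat) ** 2 for w, x in zip(ws, xs))
    return biased * sum_w2 / (1 - sum_w2)


# Equal weights: this reduces to s^2 / n, the usual estimate for the sample mean.
data = [Fraction(2), Fraction(5), Fraction(5), Fraction(9)]
equal_ws = [Fraction(1, len(data))] * len(data)
print(var_of_weighted_mean_estimate(data, equal_ws))  # 33/16, i.e. (33/4) / 4

# Exact expectation over a hypothetical toy distribution: Y uniform on {0, 1, 3}.
support = [Fraction(0), Fraction(1), Fraction(3)]
ws = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
mu = sum(support) / len(support)
sigma2 = sum((y - mu) ** 2 for y in support) / len(support)

samples = list(product(support, repeat=len(ws)))
expected = sum(var_of_weighted_mean_estimate(s, ws) for s in samples) / len(samples)

# The expectation matches Var[X] = sigma^2 * sum_j w_j^2 exactly.
print(expected == sigma2 * sum(w * w for w in ws))
```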



It seems very odd to me that the documents you refer to construct estimators for the variance of $Y$ and not of $X$; I don't see the justification for such an estimator. Also, it is not clear how to extend it to samples that don't have length $n$, whereas for the estimator for $X$ you simply have some number $m$ of $n$-samples, and averaging everything above makes things work out. Also, I didn't check, but I suspect that the weighted estimator for $Y$ has higher variance than the usual one; if so, why use this weighted estimator at all? Building an estimator for $X$ would seem to have been the intent.
