Friday, 4 May 2007

st.statistics - unbiased estimate of the variance of a weighted mean

First, some notation. Each example is drawn from some unknown distribution $Y$ with $E[Y]=\mu$ and $\textrm{Var}[Y]=\sigma^2$. Suppose the weighted mean consists of $n$ independent draws $X_i\sim Y$, and $w_1^n$ lies in the standard simplex ($w_i\ge 0$ and $\sum_i w_i=1$). Finally, define the r.v. $X=\sum_i w_iX_i$. Note that $E[X]=\sum_i w_iE[X_i]=\mu$ and, by independence, $\textrm{Var}[X]=\sum_i w_i^2\textrm{Var}[X_i]=\sigma^2\sum_i w_i^2$.
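The two identities for $X$ can be checked exactly on a small example. The sketch below is a minimal Python illustration under an assumed distribution: $Y$ is taken (hypothetically) to be uniform on $\{0,1,5\}$, the weights are an arbitrary point of the simplex, and `exact_moments_of_X` is a hypothetical helper that enumerates every outcome of the $n$ draws with exact `Fraction` arithmetic, so no sampling noise is involved.

```python
from fractions import Fraction
from itertools import product

# Assumed (hypothetical) distribution for Y: uniform over three support points.
SUPPORT = [Fraction(0), Fraction(1), Fraction(5)]
P = Fraction(1, len(SUPPORT))  # probability of each support point


def moments_of_Y():
    """Exact mean and variance of Y."""
    mu = sum(P * y for y in SUPPORT)
    sigma2 = sum(P * (y - mu) ** 2 for y in SUPPORT)
    return mu, sigma2


def exact_moments_of_X(w):
    """Exact E[X] and Var[X] for X = sum_i w_i X_i with X_i i.i.d. ~ Y.

    Every n-tuple of draws is enumerated; each one has probability P**n.
    """
    n = len(w)
    prob = P ** n
    ex = Fraction(0)
    ex2 = Fraction(0)
    for xs in product(SUPPORT, repeat=n):
        x = sum(wi * xi for wi, xi in zip(w, xs))
        ex += prob * x
        ex2 += prob * x ** 2
    return ex, ex2 - ex ** 2


w = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # a point in the simplex
mu, sigma2 = moments_of_Y()
ex, var_x = exact_moments_of_X(w)
print(ex == mu, var_x == sigma2 * sum(wi ** 2 for wi in w))  # True True
```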



Generalizing the standard definition of the sample mean, take
$$\hat\mu(x_1^n):=\sum_i w_ix_i.$$
Note that $E[\hat\mu(x_1^n)]=\sum_i w_iE[x_i]=\mu=E[X]$, so $\hat\mu$ is an unbiased estimator.
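As a minimal sketch of the unbiasedness claim, the Python below computes $E[\hat\mu]$ exactly by enumeration. The distribution of $Y$ (a skewed three-point law), the weights, and the helper names `weighted_mean` and `expectation` are illustrative assumptions, not part of the original argument.

```python
from fractions import Fraction
from itertools import product

# Hypothetical skewed distribution for Y, given as {value: probability}.
Y = {Fraction(-1): Fraction(1, 2), Fraction(2): Fraction(1, 3), Fraction(7): Fraction(1, 6)}


def weighted_mean(w, xs):
    """hat{mu}(x_1^n) = sum_i w_i x_i, for w in the standard simplex."""
    return sum(wi * xi for wi, xi in zip(w, xs))


def expectation(f, n):
    """Exact E[f(x_1, ..., x_n)] for n i.i.d. draws from Y, by full enumeration."""
    total = Fraction(0)
    for xs in product(Y, repeat=n):
        prob = Fraction(1)
        for x in xs:
            prob *= Y[x]  # independent draws: probabilities multiply
        total += prob * f(xs)
    return total


mu = sum(p * y for y, p in Y.items())
w = [Fraction(3, 5), Fraction(1, 5), Fraction(1, 10), Fraction(1, 10)]
print(mu, expectation(lambda xs: weighted_mean(w, xs), len(w)))  # 4/3 4/3
```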



For the variance, generalize the sample variance as
$$\hat\sigma_b^2(x_1^n):=\sum_i w_i\left(x_i-\hat\mu(x_1^n)\right)^2,$$
where the subscript $b$ foreshadows that this will need a correction to be unbiased. Since $\sum_j w_j=1$, we can write $x_i-\hat\mu=\sum_j w_j(x_i-x_j)$, so
$$E[\hat\sigma_b^2]=\sum_i w_iE[(x_i-\hat\mu)^2]=\sum_i w_iE\left[\left(\sum_j w_j(x_i-x_j)\right)^2\right].$$
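The last equality rewrites $x_i-\hat\mu$ as $\sum_j w_j(x_i-x_j)$, which holds only because the weights sum to one. A small Python check (the sample values and both helper names are hypothetical) makes that dependence visible: with weights that do not sum to one, the two expressions disagree.

```python
from fractions import Fraction


def deviation_direct(w, xs, i):
    """x_i - hat{mu}, with hat{mu} = sum_j w_j x_j."""
    return xs[i] - sum(wj * xj for wj, xj in zip(w, xs))


def deviation_pairwise(w, xs, i):
    """sum_j w_j (x_i - x_j)."""
    return sum(wj * (xs[i] - xj) for wj, xj in zip(w, xs))


xs = [Fraction(3), Fraction(-2), Fraction(10)]
w_simplex = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # sums to 1
w_unnormalized = [Fraction(1), Fraction(1), Fraction(1)]      # sums to 3

same_on_simplex = all(
    deviation_direct(w_simplex, xs, i) == deviation_pairwise(w_simplex, xs, i)
    for i in range(len(xs))
)
same_unnormalized = all(
    deviation_direct(w_unnormalized, xs, i) == deviation_pairwise(w_unnormalized, xs, i)
    for i in range(len(xs))
)
print(same_on_simplex, same_unnormalized)  # True False
```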
The term in the expectation can be written as
$$\sum_{j,k}w_j(x_i-x_j)\,w_k(x_i-x_k)=\sum_j w_j^2(x_i-x_j)^2+\sum_{j\neq k}w_jw_k(x_i-x_j)(x_i-x_k).$$
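The expansion is the square of a sum split into its diagonal ($j=k$) and off-diagonal ($j\neq k$) parts. As a hedged illustration, the Python below checks it exactly for every $i$ on one arbitrary, hypothetical choice of weights and sample values.

```python
from fractions import Fraction


def square_of_sum(w, xs, i):
    """(sum_j w_j (x_i - x_j))^2, computed directly."""
    return sum(wj * (xs[i] - xj) for wj, xj in zip(w, xs)) ** 2


def diagonal_plus_off_diagonal(w, xs, i):
    """sum_j w_j^2 (x_i - x_j)^2 + sum_{j != k} w_j w_k (x_i - x_j)(x_i - x_k)."""
    n = len(w)
    diag = sum(w[j] ** 2 * (xs[i] - xs[j]) ** 2 for j in range(n))
    off = sum(
        w[j] * w[k] * (xs[i] - xs[j]) * (xs[i] - xs[k])
        for j in range(n)
        for k in range(n)
        if j != k
    )
    return diag + off


w = [Fraction(2, 5), Fraction(1, 5), Fraction(1, 5), Fraction(1, 5)]
xs = [Fraction(1), Fraction(4), Fraction(-3), Fraction(7, 2)]
print(all(square_of_sum(w, xs, i) == diagonal_plus_off_diagonal(w, xs, i) for i in range(len(w))))  # True
```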
Taking expectations, each summand of the first term with $j\neq i$ (the $j=i$ summand is $0$) is
$$E[(x_i-x_j)^2]=2E[x_i^2]-2\mu^2=2\sigma^2,$$
whereas each summand of the second term with $j\neq i$ and $k\neq i$ (otherwise it is $0$) is, by independence of the distinct draws,
$$E[x_i^2-x_ix_j-x_ix_k+x_jx_k]=E[x_i^2]-\mu^2=\sigma^2.$$
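Both expectations can be confirmed exactly for three independent draws $x_i,x_j,x_k$ with distinct indices. The Python sketch below assumes, purely for illustration, a hypothetical three-point distribution for $Y$ and enumerates all outcomes with `Fraction` arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distribution for Y: {value: probability}.
Y = {Fraction(0): Fraction(1, 4), Fraction(1): Fraction(1, 4), Fraction(4): Fraction(1, 2)}


def expectation3(f):
    """Exact E[f(a, b, c)] for three independent draws a, b, c from Y."""
    return sum(Y[a] * Y[b] * Y[c] * f(a, b, c) for a, b, c in product(Y, repeat=3))


mu = sum(p * y for y, p in Y.items())
sigma2 = sum(p * (y - mu) ** 2 for y, p in Y.items())

# First term: E[(x_i - x_j)^2] for i != j (x_k is unused here).
first = expectation3(lambda xi, xj, xk: (xi - xj) ** 2)
# Second term: E[(x_i - x_j)(x_i - x_k)] for pairwise distinct i, j, k.
second = expectation3(lambda xi, xj, xk: xi ** 2 - xi * xj - xi * xk + xj * xk)
print(first == 2 * sigma2, second == sigma2)  # True True
```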
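Combining everything,
$$E[\hat\sigma_b^2]=\sum_i w_i\left(2\sigma^2\sum_{j\neq i}w_j^2+\sigma^2\sum_{j\neq k;\ j,k\neq i}w_jw_k\right)=\sigma^2\left(1-\sum_j w_j^2\right),$$
where the last step uses $\sum_{j\neq k;\ j,k\neq i}w_jw_k=(1-w_i)^2-\sum_{j\neq i}w_j^2$ together with $\sum_i w_i=1$.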
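The final simplification is pure algebra in the weights once $\sigma^2$ is factored out, so it can be checked exactly. The sketch below draws random rational points of the simplex (the seed, the sizes, and the helper names are arbitrary assumptions) and confirms that the coefficient equals $1-\sum_j w_j^2$.

```python
import random
from fractions import Fraction


def combined_coefficient(w):
    """sum_i w_i (2 sum_{j != i} w_j^2 + sum_{j != k; j, k != i} w_j w_k)."""
    n = len(w)
    total = Fraction(0)
    for i in range(n):
        diag = 2 * sum(w[j] ** 2 for j in range(n) if j != i)
        off = sum(
            w[j] * w[k]
            for j in range(n)
            for k in range(n)
            if j != k and j != i and k != i
        )
        total += w[i] * (diag + off)
    return total


def random_simplex_point(n, rng):
    """Hypothetical helper: positive integer weights, normalized to sum to 1."""
    raw = [Fraction(rng.randint(1, 20)) for _ in range(n)]
    s = sum(raw)
    return [r / s for r in raw]


rng = random.Random(0)
checks = []
for n in range(1, 8):
    w = random_simplex_point(n, rng)
    checks.append(combined_coefficient(w) == 1 - sum(wi ** 2 for wi in w))
print(all(checks))  # True
```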
Therefore $E[\hat\sigma_b^2]-\sigma^2=-\sigma^2\sum_j w_j^2$, i.e. this is a biased estimator. To make it an unbiased estimator of $\textrm{Var}[Y]=\sigma^2$, divide by the factor $1-\sum_j w_j^2$ derived above:
$$\hat\sigma_u^2(x_1^n):=\frac{\hat\sigma_b^2(x_1^n)}{1-\sum_j w_j^2}=\frac{\sum_i w_i(x_i-\hat\mu)^2}{1-\sum_j w_j^2}.$$
This matches the definition you gave, and it passes a sanity check: setting $w_i=1/n$ recovers the usual unbiased estimate $\frac{1}{n-1}\sum_i(x_i-\hat\mu)^2$.
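A direct Python transcription of $\hat\sigma_b^2$ and $\hat\sigma_u^2$, as a minimal sketch. It checks exactly, by enumerating every outcome of a hypothetical three-point $Y$, that $E[\hat\sigma_b^2]=\sigma^2(1-\sum_j w_j^2)$ and $E[\hat\sigma_u^2]=\sigma^2$, and then repeats the equal-weights sanity check against Python's `statistics.variance`, which uses the $n-1$ denominator. The distribution, weights, sample values, and function names are illustrative assumptions. Note that the correction divides by zero when a single weight equals one, i.e. when there is effectively only one observation.

```python
import math
import statistics
from fractions import Fraction
from itertools import product

# Hypothetical distribution for Y: {value: probability}.
Y = {Fraction(-2): Fraction(1, 3), Fraction(1): Fraction(1, 2), Fraction(6): Fraction(1, 6)}


def biased_weighted_variance(w, xs):
    """hat{sigma}_b^2 = sum_i w_i (x_i - hat{mu})^2."""
    m = sum(wi * xi for wi, xi in zip(w, xs))
    return sum(wi * (xi - m) ** 2 for wi, xi in zip(w, xs))


def unbiased_weighted_variance(w, xs):
    """hat{sigma}_u^2 = hat{sigma}_b^2 / (1 - sum_j w_j^2); undefined if some w_j == 1."""
    return biased_weighted_variance(w, xs) / (1 - sum(wj ** 2 for wj in w))


def expectation(f, n):
    """Exact E[f(x_1, ..., x_n)] over n i.i.d. draws from Y."""
    total = Fraction(0)
    for xs in product(Y, repeat=n):
        prob = Fraction(1)
        for x in xs:
            prob *= Y[x]  # independent draws: probabilities multiply
        total += prob * f(xs)
    return total


# Exact check of the bias and of the correction.
mu = sum(p * y for y, p in Y.items())
sigma2 = sum(p * (y - mu) ** 2 for y, p in Y.items())
w = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
s2 = sum(wj ** 2 for wj in w)
e_b = expectation(lambda xs: biased_weighted_variance(w, xs), len(w))
e_u = expectation(lambda xs: unbiased_weighted_variance(w, xs), len(w))
print(e_b == sigma2 * (1 - s2), e_u == sigma2)  # True True

# Sanity check: equal weights w_i = 1/n give the usual (n - 1)-denominator estimate.
xs = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
equal = [1 / len(xs)] * len(xs)
print(math.isclose(unbiased_weighted_variance(equal, xs), statistics.variance(xs)))  # True
```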



Now, if one instead sought an unbiased estimator of $\textrm{Var}[X]=\sigma^2\sum_j w_j^2$, where $X=\sum_i w_iX_i$, the formula would be $\hat\sigma_b^2(x_1^n)\left(\sum_j w_j^2\right)/\left(1-\sum_j w_j^2\right)$.
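A minimal Python sketch of this variant, again checked exactly by enumeration: the quantity $\hat\sigma_b^2\sum_j w_j^2/(1-\sum_j w_j^2)$ should have expectation $\textrm{Var}[X]=\sigma^2\sum_j w_j^2$. The two-point distribution, the weights, and the name `var_x_estimate` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-point distribution for Y: {value: probability}.
Y = {Fraction(0): Fraction(2, 3), Fraction(3): Fraction(1, 3)}


def var_x_estimate(w, xs):
    """hat{sigma}_b^2 * sum_j w_j^2 / (1 - sum_j w_j^2): estimates Var[sum_i w_i X_i]."""
    s2 = sum(wj ** 2 for wj in w)
    m = sum(wi * xi for wi, xi in zip(w, xs))
    biased = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, xs))
    return biased * s2 / (1 - s2)


def expectation(f, n):
    """Exact E[f(x_1, ..., x_n)] over n i.i.d. draws from Y."""
    total = Fraction(0)
    for xs in product(Y, repeat=n):
        prob = Fraction(1)
        for x in xs:
            prob *= Y[x]
        total += prob * f(xs)
    return total


mu = sum(p * y for y, p in Y.items())
sigma2 = sum(p * (y - mu) ** 2 for y, p in Y.items())
w = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 6), Fraction(1, 6)]
var_x = sigma2 * sum(wj ** 2 for wj in w)
print(expectation(lambda xs: var_x_estimate(w, xs), len(w)) == var_x)  # True
```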



It is very odd to me that the documents you refer to construct estimators of $\textrm{Var}[Y]$ rather than $\textrm{Var}[X]$; I don't see the justification for such an estimator. It is also not clear how to extend it to samples whose length differs from $n$, whereas for the estimator of $\textrm{Var}[X]$ you simply have some number $m$ of $n$-samples, and averaging the per-sample estimates keeps the result unbiased. Finally, I didn't check this, but I suspect that the weighted estimator for $\textrm{Var}[Y]$ has higher variance than the usual one; if so, why use this weighted estimator at all? Building an estimator for $\textrm{Var}[X]$ would seem to have been the intent.
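Pooling over $m$ independent $n$-samples can be sketched as a small Monte Carlo experiment. Everything in it is an assumption made for illustration: $Y$ is taken to be Gaussian with mean 1 and standard deviation 2, the weights are fixed, and the seed and $m$ are arbitrary. By linearity of expectation the pooled average has the same expectation as each per-sample estimate, and with many samples it should land close to $\sigma^2\sum_j w_j^2$.

```python
import random


def var_x_estimate(w, xs):
    """Per-sample unbiased estimate of Var[sum_i w_i X_i]."""
    s2 = sum(wj ** 2 for wj in w)
    m = sum(wi * xi for wi, xi in zip(w, xs))
    biased = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, xs))
    return biased * s2 / (1 - s2)


def pooled_var_x_estimate(w, samples):
    """Average the per-sample estimates over m independent n-samples."""
    return sum(var_x_estimate(w, xs) for xs in samples) / len(samples)


rng = random.Random(12345)
MU, SIGMA = 1.0, 2.0        # assumed Gaussian Y
w = [0.4, 0.3, 0.2, 0.1]    # weights in the simplex (n = 4)
m = 20000                   # number of independent n-samples
samples = [[rng.gauss(MU, SIGMA) for _ in w] for _ in range(m)]

estimate = pooled_var_x_estimate(w, samples)
true_var_x = SIGMA ** 2 * sum(wj ** 2 for wj in w)  # 1.2 up to float rounding
print(estimate, true_var_x)
```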
