First some notation. Each example $x_i$ is drawn from some unknown distribution $D$ with $\mathbb{E}[x_i] = \mu$ and $\operatorname{Var}[x_i] = \sigma^2$. Suppose the weighted mean consists of $n$ independent draws $x_1, \dots, x_n$, and the weight vector $w = (w_1, \dots, w_n)$ is in the standard simplex, so $w_i \ge 0$ and $\sum_i w_i = 1$. Finally define the r.v. $z_i = x_i - \mu$. Note that $\mathbb{E}[z_i] = 0$ and $\mathbb{E}[z_i^2] = \sigma^2$.
Generalizing the standard definition of the sample mean, take
$$\hat{\mu} = \sum_{i=1}^n w_i x_i.$$
Note that $\mathbb{E}[\hat{\mu}] = \sum_i w_i\, \mathbb{E}[x_i] = \mu \sum_i w_i = \mu$, so $\hat{\mu}$ is an unbiased estimator of $\mu$.
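If it helps to see this concretely, here is a minimal NumPy sketch (my own, not from any document under discussion; the normal choice of $D$ and the particular weight vector are arbitrary) that checks the unbiasedness of $\hat{\mu}$ by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5
w = rng.random(n)
w /= w.sum()          # a point in the standard simplex: w_i >= 0, sum_i w_i = 1

mu, sigma = 2.0, 3.0  # true mean and standard deviation of D (normal here, purely for concreteness)

# Monte Carlo check that E[hat{mu}] = mu for hat{mu} = sum_i w_i x_i.
trials = 200_000
estimates = np.array([np.dot(w, rng.normal(mu, sigma, n)) for _ in range(trials)])
print(estimates.mean())  # should be close to mu = 2.0
```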
For the sample variance, generalize the usual definition as
$$\hat{\sigma}_b^2 = \sum_{i=1}^n w_i (x_i - \hat{\mu})^2,$$
where the subscript $b$ foreshadows that this will need a correction to be unbiased. Anyway,
$$\mathbb{E}[\hat{\sigma}_b^2] = \sum_i w_i\, \mathbb{E}\big[(x_i - \hat{\mu})^2\big].$$
The term in the expectation can be written as
$$(x_i - \hat{\mu})^2 = \Big((x_i - \mu) - (\hat{\mu} - \mu)\Big)^2 = \Big(z_i - \sum_j w_j z_j\Big)^2 = z_i^2 - 2 z_i \sum_j w_j z_j + \Big(\sum_j w_j z_j\Big)^2,$$
where the leading term has expectation $\mathbb{E}[z_i^2] = \sigma^2$. Passing the expectation into the remaining terms, the first (keeping only $j = i$, since independence makes the summands with $j \neq i$ yield 0) is
$$-2\,\mathbb{E}\Big[z_i \sum_j w_j z_j\Big] = -2 \sum_j w_j\, \mathbb{E}[z_i z_j] = -2 w_i \sigma^2,$$
whereas the second (keeping only $j = k$, since the summands with $j \neq k$ likewise yield 0) is
$$\mathbb{E}\Big[\Big(\sum_j w_j z_j\Big)^2\Big] = \sum_{j,k} w_j w_k\, \mathbb{E}[z_j z_k] = \sigma^2 \sum_j w_j^2.$$
Combining everything,
$$\mathbb{E}\big[(x_i - \hat{\mu})^2\big] = \sigma^2\Big(1 - 2 w_i + \sum_j w_j^2\Big).$$
Therefore
$$\mathbb{E}[\hat{\sigma}_b^2] = \sum_i w_i\, \sigma^2\Big(1 - 2 w_i + \sum_j w_j^2\Big) = \sigma^2\Big(1 - \sum_i w_i^2\Big) \neq \sigma^2,$$
i.e. this is a biased estimator. To make it an unbiased estimator of $\sigma^2$, divide by the excess factor $1 - \sum_i w_i^2$ derived above:
$$\hat{\sigma}^2 = \frac{\sum_i w_i (x_i - \hat{\mu})^2}{1 - \sum_i w_i^2}.$$
This matches the definition you gave (and as a sanity check, $w_i = 1/n$ gives $1 - \sum_i w_i^2 = \frac{n-1}{n}$, recovering the usual unbiased estimate $\frac{1}{n-1} \sum_i (x_i - \hat{\mu})^2$).
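As a quick empirical check of both the bias factor and the correction (again just a sketch of mine, with a normal $D$ and an arbitrary weight vector), one can compare the Monte Carlo averages of $\hat{\sigma}_b^2$ and $\hat{\sigma}^2$ against $(1 - \sum_i w_i^2)\sigma^2$ and $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(1)

n, mu, sigma = 5, 2.0, 3.0
w = rng.random(n)
w /= w.sum()                       # weights on the standard simplex

trials = 200_000
biased, unbiased = [], []
for _ in range(trials):
    x = rng.normal(mu, sigma, n)   # D is normal here purely for concreteness
    m = np.dot(w, x)               # hat{mu}
    s2b = np.dot(w, (x - m) ** 2)  # hat{sigma}^2_b
    biased.append(s2b)
    unbiased.append(s2b / (1.0 - np.dot(w, w)))

print(np.mean(biased), (1.0 - np.dot(w, w)) * sigma**2)  # these two should agree
print(np.mean(unbiased), sigma**2)                       # and so should these
```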
Now, if one instead were to seek an unbiased estimator of the variance of the weighted mean itself, $\operatorname{Var}[\hat{\mu}] = \sigma^2 \sum_i w_i^2$, the formula would instead be
$$\widehat{\operatorname{Var}}[\hat{\mu}] = \frac{\sum_i w_i^2}{1 - \sum_i w_i^2} \sum_i w_i (x_i - \hat{\mu})^2.$$
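This too can be sanity-checked by simulation (same assumptions as the sketches above): the empirical variance of $\hat{\mu}$ across trials, the closed form $\sigma^2 \sum_i w_i^2$, and the average of the estimator should all roughly agree:

```python
import numpy as np

rng = np.random.default_rng(2)

n, mu, sigma = 5, 2.0, 3.0
w = rng.random(n)
w /= w.sum()

trials = 200_000
means, var_hats = [], []
for _ in range(trials):
    x = rng.normal(mu, sigma, n)
    m = np.dot(w, x)
    s2 = np.dot(w, (x - m) ** 2) / (1.0 - np.dot(w, w))  # unbiased estimate of sigma^2
    means.append(m)
    var_hats.append(np.dot(w, w) * s2)                   # estimate of Var[hat{mu}]

print(np.var(means), sigma**2 * np.dot(w, w))  # empirical Var[hat{mu}] vs. sigma^2 * sum_i w_i^2
print(np.mean(var_hats))                       # the estimator's average should match both
```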
It is very odd to me that the documents you refer to are building estimators of $\sigma^2$ and not of $\operatorname{Var}[\hat{\mu}]$; I don't see the justification for such an estimator. It is also not clear how to extend it to samples that don't have length $n$, whereas for an estimator of $\operatorname{Var}[\hat{\mu}]$ you simply have some number of $n$-samples, and averaging everything above makes things work out. Also, I didn't check, but my suspicion is that the weighted estimator for $\sigma^2$ has higher variance than the usual unweighted one; if so, why use this weighted estimator at all? Building an estimator for $\operatorname{Var}[\hat{\mu}]$ would seem to have been the intent.
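That suspicion is easy to test numerically; here is a sketch (mine, under the same arbitrary assumptions as the snippets above) comparing the spread of the weighted estimator of $\sigma^2$ against the usual $\frac{1}{n-1}$ estimator on the same i.i.d. draws:

```python
import numpy as np

rng = np.random.default_rng(3)

n, mu, sigma = 5, 2.0, 3.0
w = rng.random(n)
w /= w.sum()

trials = 200_000
weighted, usual = [], []
for _ in range(trials):
    x = rng.normal(mu, sigma, n)
    m_w = np.dot(w, x)
    weighted.append(np.dot(w, (x - m_w) ** 2) / (1.0 - np.dot(w, w)))  # weighted estimator of sigma^2
    usual.append(np.var(x, ddof=1))                                    # usual 1/(n-1) estimator

# Both should be (approximately) unbiased; the question is which one fluctuates more.
print(np.mean(weighted), np.mean(usual), sigma**2)
print(np.var(weighted), np.var(usual))
```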