First some notation. Each example is drawn from some unknown distribution $Y$ with $E[Y] = mu$ and $textrm{Var}[Y] = sigma^2$. Suppose the weighted mean consists of $n$ independent draws $X_isim Y$, and ${w_i}_1^n$ is in the standard simplex. Finally define the r.v. $X = sum_i w_i X_i$. Note that $E[X] = sum_i w_i E[X_i] = mu$ and $textrm{Var}[X] = sum_i w_i^2 textrm{Var}
[X_i] = sigma^2sum_i w_i^2$.
Generalizing the standard definition of sample mean, take
$$
hat mu({x_i}_1^n) := sum_i w_i x_i.
$$
Note that $E[hat mu({x_i}_1^n)] = sum_i w_i E[x_i] = mu = E[X]$, so $hat mu$ is an unbiased estimator.
For the sample variance, generalize the sample variance as
$$
hat sigma^2_b({x_i}_1^n) := sum_i w_i (x_i - hat mu({x_i}_1^n))^2,
$$
where the subscript foreshadows this will need a correction to be unbiased. Anyway,
$$
E[hat sigma^2_b] = sum_i w_i E[(x_i - hat mu)^2] = sum_i w_i Eleft[left(sum_j w_j (x_i - x_j)right)^2right].
$$
The term in the expectation can be written as
$$
sum_{j,k} w_j(x_i - x_j)w_k(x_i - x_k) = sum_jw_j^2(x_i - x_j)^2 + sum_{jneq k} w_j w_k(x_i - x_j)(x_i - x_k).
$$
Passing in the expectation, the first term (when $x_ineq x_j$, which would yield 0) is
$$
E[(x_i-x_j)^2] = 2E[x_i^2] - 2mu^2 = 2sigma^2,
$$
whereas the second (when $x_i neq x_j$ and $x_i neq x_k$, which would yield 0) is
$$
E[x_i^2 - x_ix_j - x_ix_k + x_jx_k] = E[x_i^2] - mu^2 = sigma^2.
$$
Combining everything,
$$
sum_i w_i left(2sigma^2sum_{jneq i}w_j^2 + sigma^2sum_{jneq kneq i} w_j w_kright)
= sigma^2( 1 - sum_j w_j^2).
$$
Therefore $E[hat sigma_b^2] - sigma^2 = -sigma^22sum_j w_j^2$, i.e. this is a biased estimator. To make this an unbiased estimator of $Y$, divide by the excess term derived above:
$$
hat sigma_u^2({x_i}_1^n)
:= frac {hat sigma_b^2({x_i}_1^n)}{1- sum_j w_j^2}
= frac {sum_i w_i(x_i - hat mu)^2}{1- sum_j w_j^2 }
$$
This matches the definition you gave (and a sanity check $w_i = 1/N$, recovering the normal unbiased estimate).
Now, if one instead were to seek an unbiased estimator of $X=sum_i X_i$, the formula would instead be $hat sigma_b^2({x_i}_1^n)(sum_j w_j^2) / ( 1 - sum_j w_j^2)$.
It is very odd for me that the documents you refer to are making estimators of $Y$ and not $X$; I don't see the justification of such an estimator. Also it is not clearly how to extend it to samples that don't have length $n$, whereas for the estimator of $X$, you simply have some number $m$ of $n$-samples, and averaging everything above makes things work out. Also, I didn't check, but it's my suspicion that the weighted estimator for $Y$ has higher variance than the usual one; as such, why use this weighted estimator at all? Building an estimator for $X$ would seem to have been the intent..
No comments:
Post a Comment