Tuesday, 27 November 2007

pr.probability - How many trial picks expectedly sufficient to cover a sample space?

The expected number of picks needed equals the sum of the probabilities that at least $t$ picks are needed, which means that $t-1$ subsets left at least one value uncovered. We can use inclusion-exclusion to get the probability that at least one value is uncovered.



The probability that a particular set of $k$ values is uncovered after $t-1$ subsets are chosen is



$$Bigg(frac{n-k choose r}{n choose r}Bigg)^{t-1}$$



So, by inclusion-exclusion, the probability that at least one value is uncovered is



$$ sum_{k=1}^n {n choose k}(-1)^{k-1}Bigg(frac{n-k choose r}{n choose r}Bigg) ^{t-1} $$



And then the expected number of subsets needed to cover everything is



$$ sum_{t=1}^infty sum_{k=1}^n {n choose k}(-1)^{k-1} Bigg(frac{n-k choose r}{n choose r}Bigg)^{t-1} $$



Change the order of summation and use $s=t-1$:



$$ sum_{k=1}^n {n choose k}(-1)^{k-1} sum_{s=0}^infty Bigg( frac{n-k choose r}{n choose r}Bigg)^s$$



The inner sum is a geometric series.



$$ sum_{k=1}^n {n choose k} (-1)^{k-1}frac{n choose r}{{n choose r}-{n-k choose r}}$$



$$ {n choose r} sum_{k=1}^n (-1)^{k-1}frac{n choose k}{{n choose r}-{n-k choose r}}$$



I'm sure that should simplify further, but at least now it's a simple sum. I've checked that this agrees with the coupon collection problem for $r=1$.



Interestingly, Mathematica "simplifies" this sum for particular values of $r$, although what it returns even for the next case is too complicated to repeat, involving EulerGamma, the gamma function at half-integer values, and PolyGamma[0,1+n].

No comments:

Post a Comment