Saturday, 4 August 2007

st.statistics - Kernel width in Kernel density estimation

Hi,
I am doing some kernel density estimation with a weighted point set (i.e., each sample has a weight which is not necessarily one), in N dimensions.
Moreover, these samples only live in a metric space (i.e., we can define a distance between them) but nothing else: for example, we cannot compute the mean of the sample points, nor their standard deviation. The kernel only depends on this distance and on the weight of each sample:
f(x) = 1./(sum_weights) * sum(weight_i/h * Kernel(distance(x,x_i)/h))
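
For concreteness, here is a minimal Python sketch of that estimator. The Gaussian profile and the user-supplied `distance` callable are assumptions on my part, since the kernel shape is not fixed above:

```python
import numpy as np

def gaussian_profile(u):
    # Assumed kernel profile; any decreasing function of u = distance/h would do.
    return np.exp(-0.5 * u ** 2)

def weighted_kde(x, samples, weights, h, distance):
    """Evaluate f(x) = 1/sum_j(w_j) * sum_i( w_i/h * K(d(x, x_i)/h) ).

    samples  : training points x_i (arbitrary objects, only a metric is assumed)
    weights  : per-sample weights w_i, not necessarily one
    h        : global bandwidth
    distance : callable d(a, b) defining the metric
    Note: the 1/h factor follows the formula above; a properly normalized
    density in N dimensions would use 1/h**N instead.
    """
    w = np.asarray(weights, dtype=float)
    d = np.array([distance(x, xi) for xi in samples])
    return np.sum(w / h * gaussian_profile(d / h)) / w.sum()
```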



In this context, I am trying to find a robust estimate of the kernel bandwidth 'h', possibly spatially varying, and preferably one which gives an exact reconstruction on the training dataset x_i. If necessary, we can assume that the underlying function is relatively smooth.
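
As a rough illustration of what "spatially varying" could mean here, the sample-point form of the estimator simply replaces the single h by a per-sample h_i. This is only a sketch of that generalization, with the per-sample bandwidths left as an input to be chosen somehow:

```python
import numpy as np

def weighted_kde_variable(x, samples, weights, bandwidths, distance):
    # Sample-point (adaptive) variant: each training point x_i carries its own h_i.
    w = np.asarray(weights, dtype=float)
    h = np.asarray(bandwidths, dtype=float)
    d = np.array([distance(x, xi) for xi in samples])
    return np.sum(w / h * np.exp(-0.5 * (d / h) ** 2)) / w.sum()
```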



I tried using the distance to the first or second nearest neighbor, but it gives quite bad results. I tried leave-one-out optimization, but I have difficulties finding a good measure to optimize in this N-dimensional context, so it yields very bad estimates, especially at the training samples themselves. I cannot use the rule-of-thumb estimate based on the normal assumption (e.g., Silverman's rule) since I cannot compute a standard deviation. I found references using covariance matrices to get anisotropic kernels, but again, that would not hold in this space...
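
For reference, a leave-one-out bandwidth search can be written using only the pairwise distance matrix; here is a minimal sketch with a grid search. The Gaussian kernel and the weighted log-likelihood objective are assumptions on my part, not necessarily the measure that failed above, and this sketch does not by itself fix those problems:

```python
import numpy as np

def loo_log_likelihood(h, D, weights):
    """Weighted leave-one-out log-likelihood for a global bandwidth h.

    D       : (n, n) matrix of pairwise distances d(x_i, x_j)
    weights : per-sample weights w_i
    """
    w = np.asarray(weights, dtype=float)
    K = np.exp(-0.5 * (D / h) ** 2) / h        # kernel values, as in the formula above
    np.fill_diagonal(K, 0.0)                   # leave each sample out of its own estimate
    f_loo = K @ w / (w.sum() - w)              # leave-one-out estimate at each x_i
    return np.sum(w * np.log(f_loo + 1e-300))  # weighted log-likelihood score

def select_bandwidth(D, weights, candidates):
    # Grid search over candidate bandwidths; returns the best-scoring h.
    scores = [loo_log_likelihood(h, D, weights) for h in candidates]
    return candidates[int(np.argmax(scores))]
```

Candidate bandwidths can be taken, for instance, as a geometric grid between the smallest and largest nonzero pairwise distances.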



Does someone have an idea or a reference?



Thank you very much in advance!
