Sunday 3 September 2006

mg.metric geometry - Are Bregman divergences quasi-convex?

The answer is: yes, the Bregman divergence is always quasi-convex (as a function of the point at which the linear approximation is taken)! I'll show this by first proving a stronger, monotone characterization, from which quasi-convexity follows. Please bear with me as I first make a few definitions.



Let a convex set $S \subseteq \mathbb{R}$ and a function $f:S\to \mathbb{R}$ be given. To avoid assuming the existence of derivatives, let $f'(v)$ refer to any subgradient of $f$ at $v$, and say $f$ is convex if for any $x,v \in S$, $f(x) \geq f(v) + f'(v)(x-v)$. (This is an equivalent formulation of convexity, and when $f$ is differentiable, it gives the 'first-order' definition of convexity.) Note critically that for $u,v\in S$ with $u\leq v$, it follows that
$f'(u) \leq f'(v)$; to see this, add the two subgradient inequalities $f(v) \geq f(u) + f'(u)(v-u)$ and $f(u) \geq f(v) + f'(v)(u-v)$, which gives $(f'(v) - f'(u))(v-u) \geq 0$. (This is sort of like the mean value theorem, though not exactly, since subgradients are technically sets; I think everything I've said so far may appear in the thesis
of Shai Shalev-Shwartz.) Define
$$b_x(v) = f(x) - f(v) - f'(v)(x-v)$$
to be the Bregman divergence of $f$ at the point $x$, taking the linear approximation at $v$. By the definition of convexity, it follows that $b_x(v) \geq 0$ for all $x,v\in S$.
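
For intuition, here is the most familiar special case. Take the squared loss $f(v) = v^2$, which is differentiable with $f'(v) = 2v$, so that
$$
b_x(v) = x^2 - v^2 - 2v(x-v) = x^2 - 2vx + v^2 = (x-v)^2,
$$
the squared distance; this happens to be convex (and hence quasi-convex) in $v$, whereas for a general convex $f$ only quasi-convexity survives, which is the point of this post.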



Fact: $b_x(\cdot)$ is (weakly) decreasing up to $x$, exactly zero at $x$, and (weakly) increasing after $x$.



Proof. $b_x(x) = f(x)-f(x) - f'(x)(0) = 0$. Now consider $u\leq v \leq x$; we'd like to show $b_x(u) \geq b_x(v)$. To start, write
$$
b_x(u)-b_x(v) = f(v) + f'(v)(x-v) - f(u) - f'(u)(x-u).
$$
Now, using $f(v) \geq f(u) + f'(u)(v-u)$ yields
$$
b_x(u)-b_x(v) \geq f'(v)(x-v) + f'(u)(v-x) = (f'(v) - f'(u))(x-v),
$$
and $b_x(u)-b_x(v)\geq 0$ follows since $f'(v) \geq f'(u)$ and $x-v\geq 0$. To
show the last case, that $x\leq v\leq u$ gives $b_x(u) \geq b_x(v)$, the proof is
analogous. QED.
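
If you'd like to sanity check the Fact numerically, here is a minimal sketch in plain Python. It uses the differentiable convex function $f(v) = e^v$, so the unique subgradient at $v$ is just the derivative $e^v$; the choice of $f$, the base point $x$, and the grid are arbitrary and only serve as an illustration.

    import math

    def bregman(f, fprime, x, v):
        # b_x(v) = f(x) - f(v) - f'(v)(x - v)
        return f(x) - f(v) - fprime(v) * (x - v)

    f, fprime = math.exp, math.exp   # convex and differentiable, so f' is the subgradient
    x = 0.5
    vs = [x + 0.1 * i for i in range(-20, 21)]
    bs = [bregman(f, fprime, x, v) for v in vs]

    # decreasing up to x, exactly zero at x, increasing after x
    assert all(b1 >= b2 for v1, b1, v2, b2 in zip(vs, bs, vs[1:], bs[1:]) if v2 <= x)
    assert all(b2 >= b1 for v1, b1, v2, b2 in zip(vs, bs, vs[1:], bs[1:]) if v1 >= x)
    assert abs(bregman(f, fprime, x, x)) < 1e-12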



Some remarks:



  • To see that this means $b_x(\cdot)$ is quasi-convex, take any $y\leq z$ and any
    $\lambda \in [0,1]$. Then the point $w:=\lambda y + (1-\lambda)z$ lies on the segment
    $[y,z]$, and by the Fact, $b_x(\cdot)$ is nondecreasing as one moves from $w$ toward at
    least one of the endpoints (toward $z$ if $w \geq x$, toward $y$ if $w \leq x$); hence
    $b_x(w) \leq \max\{b_x(y), b_x(z)\}$, which is exactly quasi-convexity.
  • This also gives a strong idea of how convexity breaks down for $b_x(\cdot)$. In
    particular, let $f(v) = \max\{0, |v|-1\}$ (a 1-insensitive loss for regression). Then the
    function $b_0(\cdot)$ is 0 on $(-1,1)$ and 1 everywhere else except at $\{-1,+1\}$, so it
    is certainly not convex. (Those two points are special since the subdifferential there is
    a whole interval, so $b_0$ takes a set of values at them; but if you took a differentiable
    analog of this loss, something like a Huber loss, you'd get basically the same effect,
    with $b_0(\cdot)$ an ordinary continuous, non-convex function.) A small numerical check
    appears right after this list.
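
Here is the corresponding computation for the 1-insensitive example, again as a small Python sketch. Note the code has to commit to one particular subgradient at $v = \pm 1$ (I pick $0$, one valid element of the subdifferential there); that choice is exactly what the parenthetical remark in the bullet above is about.

    def f(v):
        # the 1-insensitive loss from the bullet above
        return max(0.0, abs(v) - 1.0)

    def fprime(v):
        # one valid subgradient selection (the choice only matters at v = -1 and v = +1)
        if abs(v) <= 1.0:
            return 0.0
        return 1.0 if v > 0 else -1.0

    def b0(v):
        # b_0(v) = f(0) - f(v) - f'(v)(0 - v)
        return f(0.0) - f(v) - fprime(v) * (0.0 - v)

    print([b0(v) for v in (-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0)])
    # [1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]: zero inside (-1,1), one outside,
    # so b_0 is not convex, but its sublevel sets are intervals, i.e. it is quasi-convex.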
