
The log-sum inequality

In this section we introduce an inequality that will allow us to deduce the concavity (or convexity) of many useful functions.


\begin{theorem}
(Log-sum inequality) For non-negative numbers $a_1,a_2,\ldots,a...
...1}^n b_i}
\end{displaymath}with equality iff $a_i/b_i$\ is constant.
\end{theorem}
Note that the hypotheses of this theorem are weaker than those of Jensen's inequality: neither the $a_i$\ nor the $b_i$\ need sum to 1.


\begin{proof}
The function $f(t) = t\log t$\ is strictly convex. (Why?). By
Je...
...a_i}{\sum_j b_j} \log \sum_i \frac{a_i}{\sum_j b_j}
\end{displaymath}\end{proof}
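As a numerical sanity check (a sketch, not part of the original argument), the log-sum inequality in its usual form $\sum_i a_i \log\frac{a_i}{b_i} \geq \left(\sum_i a_i\right)\log\frac{\sum_i a_i}{\sum_i b_i}$ can be tested on random non-negative vectors that do not sum to 1. The helper name `log_sum_gap` is hypothetical; the code assumes $b_i > 0$, uses natural logarithms, and applies the convention $0 \log 0 = 0$.

```python
import math
import random

def log_sum_gap(a, b):
    """Hypothetical helper: LHS minus RHS of the log-sum inequality (nats).

    Assumes a_i >= 0 and b_i > 0; terms with a_i = 0 contribute 0 (0 log 0 = 0).
    The inequality says the returned gap is >= 0.
    """
    lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    A, B = sum(a), sum(b)
    rhs = A * math.log(A / B) if A > 0 else 0.0
    return lhs - rhs

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 6)
    a = [random.uniform(0, 5) for _ in range(n)]    # need not sum to 1
    b = [random.uniform(0.1, 5) for _ in range(n)]  # strictly positive
    assert log_sum_gap(a, b) >= -1e-12              # allow float round-off

# Equality case: the ratio a_i / b_i is the same constant (3) for every i.
b = [0.5, 2.0, 4.0]
a = [3 * bi for bi in b]
print(abs(log_sum_gap(a, b)) < 1e-12)  # True
```

The tolerance `1e-12` only absorbs floating-point round-off; the inequality itself is exact.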

Using this inequality, we can prove a convexity statement about the relative entropy function.
\begin{theorem}
If $(p_1,q_1)$\ and $(p_2,q_2)$\ are pairs of probability mass
...
...da \leq 1$. That is, $D(p\Vert q)$\ is convex in the
pair $(p,q)$.
\end{theorem}

\begin{proof}
Recall that
\begin{displaymath}D(p\Vert q) = \sum_x p(x) \log \fr...
...a D(p_1 \Vert q_1) + (1-\lambda) D(p_2 \Vert q_2).
\end{multline}\par\end{proof}
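The joint convexity of $D(p\Vert q)$\ can be spot-checked numerically: for random strictly positive pmfs $p_1, q_1, p_2, q_2$\ and random $\lambda \in [0,1]$, the divergence of the mixtures should never exceed the mixture of the divergences. This is a minimal sketch assuming natural logarithms; `kl`, `random_pmf`, and `mix` are hypothetical helper names.

```python
import math
import random

def kl(p, q):
    """Relative entropy D(p||q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def random_pmf(n, rng):
    """Hypothetical helper: a strictly positive pmf on n points."""
    w = [rng.uniform(0.01, 1.0) for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

def mix(lam, u, v):
    """Pointwise mixture lam*u + (1 - lam)*v."""
    return [lam * ui + (1 - lam) * vi for ui, vi in zip(u, v)]

rng = random.Random(1)
for _ in range(1000):
    n = rng.randint(2, 5)
    p1, q1, p2, q2 = (random_pmf(n, rng) for _ in range(4))
    lam = rng.random()
    lhs = kl(mix(lam, p1, p2), mix(lam, q1, q2))
    rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
    assert lhs <= rhs + 1e-12  # joint convexity, up to round-off
print("joint convexity held in all trials")
```

A random test like this cannot prove convexity, but a single failing trial would indicate an error in the statement or in the code.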

\begin{theorem}
$H(p)$\ is a concave function of $p$.
\end{theorem}

\begin{proof}
\begin{displaymath}H(p) = \log \vert\Xc\vert - D(p\Vert u)
\end{di...
...he uniform distribution. Since $D(p\Vert q)$\ is convex in the pair $(p,q)$, it is in particular convex in $p$\ for the fixed choice $q = u$, so $H$\ must
be concave.
\end{proof}
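The identity $H(p) = \log\vert\Xc\vert - D(p\Vert u)$\ used above is easy to confirm on a concrete distribution. The sketch below uses natural logarithms (so entropies are in nats) and the dyadic pmf $(1/2, 1/4, 1/8, 1/8)$, whose entropy is $1.75$\ bits; the helper names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats, with the convention 0 log 0 = 0."""
    return -sum(px * math.log(px) for px in p if px > 0)

def kl(p, q):
    """Relative entropy D(p||q) in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.125, 0.125]
u = [1 / len(p)] * len(p)           # uniform distribution on the same alphabet
lhs = entropy(p)                    # H(p)
rhs = math.log(len(p)) - kl(p, u)   # log|X| - D(p||u)
print(lhs, rhs)                     # agree up to rounding: 1.75 bits = 1.75 ln 2 nats
```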

\begin{proof}
Here is another, more direct proof. Let $X_1 \sim p_1$\ and $X_2 \...
...mbda)p_2) \geq \lambda H(p_1) + (1-\lambda)
H(p_2).
\end{displaymath}\end{proof}
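The inequality can also be illustrated numerically, assuming the usual selector-variable construction: let $\theta = 1$\ with probability $\lambda$\ and $\theta = 2$\ otherwise, and let $Z = X_\theta$. Then $Z \sim \lambda p_1 + (1-\lambda)p_2$, $H(Z\vert\theta) = \lambda H(p_1) + (1-\lambda)H(p_2)$, and conditioning reduces entropy. The specific $\lambda$, $p_1$, and $p_2$\ below are arbitrary illustrative choices.

```python
import math

def entropy(p):
    """Shannon entropy in nats, with the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

lam = 0.3
p1 = [0.7, 0.2, 0.1]
p2 = [0.1, 0.1, 0.8]

# Joint pmf of (theta, Z): theta = 1 w.p. lam, 2 w.p. 1 - lam; Z | theta = i ~ p_i.
joint = {(1, z): lam * p1[z] for z in range(3)}
joint.update({(2, z): (1 - lam) * p2[z] for z in range(3)})

# The marginal of Z is the mixture lam*p1 + (1 - lam)*p2.
pz = [joint[(1, z)] + joint[(2, z)] for z in range(3)]

h_z = entropy(pz)                                              # H(Z)
h_z_given_theta = lam * entropy(p1) + (1 - lam) * entropy(p2)  # H(Z | theta)
# Same conditional entropy via the chain rule: H(Z | theta) = H(theta, Z) - H(theta).
h_chain = entropy(joint.values()) - entropy([lam, 1 - lam])
print(h_z >= h_z_given_theta, abs(h_chain - h_z_given_theta) < 1e-12)  # True True
```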

The following theorem is important and will be used several times throughout the quarter.
\begin{theorem}
Let $(X,Y) \sim p(x,y) = p(x)p(y\vert x)$. The mutual informati...
...rt x)$\ and a
convex function of $p(y\vert x)$\ for fixed $p(x)$.
\end{theorem}

\begin{proof}
Recall that
\begin{displaymath}p(y) = \sum_x p(x,y) = \sum_x p(x)...
...formation must
be a convex function of the conditional distribution.
\end{proof}
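Both halves of the theorem can be spot-checked numerically. The sketch below computes $I(X;Y) = \sum_{x,y} p(x)p(y\vert x)\log\frac{p(y\vert x)}{p(y)}$\ in nats from an input pmf and a channel matrix `W[x][y]` $= p(y\vert x)$, then tests concavity in $p(x)$\ for a fixed channel and convexity in $p(y\vert x)$\ for a fixed input. The function names and random test sizes are hypothetical choices for illustration.

```python
import math
import random

def mutual_information(px, W):
    """I(X;Y) in nats for input pmf px and channel W[x][y] = p(y|x)."""
    nx, ny = len(px), len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    return sum(
        px[x] * W[x][y] * math.log(W[x][y] / py[y])
        for x in range(nx) for y in range(ny)
        if px[x] > 0 and W[x][y] > 0  # then py[y] > 0, so no division by zero
    )

def random_pmf(n, rng):
    """Hypothetical helper: a strictly positive pmf on n points."""
    w = [rng.uniform(0.01, 1.0) for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

rng = random.Random(2)
for _ in range(500):
    nx, ny = rng.randint(2, 4), rng.randint(2, 4)
    lam = rng.random()
    # Concavity in p(x) for a fixed channel W.
    W = [random_pmf(ny, rng) for _ in range(nx)]
    p, q = random_pmf(nx, rng), random_pmf(nx, rng)
    mixed = [lam * a + (1 - lam) * b for a, b in zip(p, q)]
    assert mutual_information(mixed, W) >= (
        lam * mutual_information(p, W) + (1 - lam) * mutual_information(q, W) - 1e-12)
    # Convexity in p(y|x) for a fixed input pmf px.
    px = random_pmf(nx, rng)
    W1 = [random_pmf(ny, rng) for _ in range(nx)]
    W2 = [random_pmf(ny, rng) for _ in range(nx)]
    Wmix = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(W1, W2)]
    assert mutual_information(px, Wmix) <= (
        lam * mutual_information(px, W1) + (1 - lam) * mutual_information(px, W2) + 1e-12)
print("concave in p(x), convex in p(y|x) in all trials")
```

Concavity in $p(x)$\ is what makes maximizing $I(X;Y)$\ over input distributions a well-behaved problem; the random trials here only illustrate the theorem and do not replace the proof.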



Todd Moon
2000-02-18