So, after dropping a MathJax command into my template, we're all set to go.
I wanted to start with a few theorems on metric compactness (strictly speaking, compactness of metric spaces), but that can get complicated very fast. So let me begin with two inequalities:
- The Cauchy inequality
- Bessel's inequality
The Cauchy inequality is more commonly known as the Cauchy-Schwarz or Cauchy-Bunyakovsky-Schwarz inequality, but this is, in my opinion, downright absurd. We might as well call it the Hilbert-Schmidt or the von Neumann inequality. Neither Bunyakovsky nor Schwarz formulated the result in its current form. Both proved it for integrals, which was indeed a step forward, though why Walter Rudin calls it the Schwarz inequality, despite Schwarz having published his result some three decades after Bunyakovsky, is completely beyond me. So Cauchy inequality it is.
The main importance of the Cauchy inequality is that it proves an inner product space to be a normed space. Suppose we have a real inner product space $(X, \langle\cdot,\cdot\rangle)$, then define $\|x\| := \sqrt{\langle x, x\rangle}$ for all $x\in X$. To show that this is a norm, we prove the Cauchy inequality.
CAUCHY INEQUALITY. Let $(X, \langle\cdot,\cdot\rangle)$ be a real inner product space, and let $x, y\in X$ have unit norm (i.e., $\|x\|=\|y\|=1$). Then
$$\langle x, y\rangle \leq 1.$$
Proof. $0 \leq \|x-y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x, y\rangle = 2\left( 1 - \langle x, y\rangle \right)$.
You will rarely see proofs this short in your lifetime.
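The proof is also easy to check numerically. Below is a quick sanity check in plain Python (the helper functions `inner`, `norm`, and `normalize` are my own, not from any library), drawing random unit vectors in $\mathbb R^5$ and verifying both the inequality and the identity $\|x-y\|^2 = 2(1-\langle x, y\rangle)$ from the proof:

```python
import math
import random

def inner(x, y):
    # Euclidean inner product on R^n.
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

def normalize(x):
    n = norm(x)
    return [a / n for a in x]

random.seed(0)
for _ in range(1000):
    x = normalize([random.gauss(0, 1) for _ in range(5)])
    y = normalize([random.gauss(0, 1) for _ in range(5)])
    # For unit vectors, <x, y> <= 1 (small tolerance for rounding).
    assert inner(x, y) <= 1 + 1e-12
    # The identity from the proof: ||x - y||^2 = 2 (1 - <x, y>).
    lhs = norm([a - b for a, b in zip(x, y)]) ** 2
    assert abs(lhs - 2 * (1 - inner(x, y))) < 1e-9
```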
Now, most will know the Cauchy inequality in a different form, where the vectors $x$ and $y$ are allowed to have any nonzero norm they please. But inner products are bilinear, so
$$\left\| \frac{x}{\|x\|} \right\| = \sqrt{ \left\langle \frac{x}{\sqrt{\langle x, x\rangle}}, \frac{x}{\sqrt{\langle x, x\rangle}} \right\rangle } = \sqrt{ \frac{\langle x, x\rangle}{\langle x, x\rangle} } = 1.$$
Applying the Cauchy inequality formulated above to these normalized vectors, we get
$$\left\langle \frac{x}{\|x\|}, \frac{y}{\|y\|} \right\rangle \leq 1 \Rightarrow \langle x, y\rangle \leq \|x\|\|y\|,$$
for nonzero $x,y\in X$ (the inequality being trivial when $x=0$ or $y=0$). Incidentally, this proves that
$$\|x+y\|^2 =\|x\|^2 + 2\langle x, y\rangle + \|y\|^2 \leq \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = \left(\|x\|+\|y\|\right)^2,$$
so that $\|\cdot\|$ is indeed a norm.
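Both the general Cauchy inequality and the resulting triangle inequality can be checked numerically. A small sketch in plain Python (the helpers are mine, not from any library):

```python
import math
import random

def inner(x, y):
    # Euclidean inner product on R^n.
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

random.seed(1)
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(5)]
    y = [random.gauss(0, 1) for _ in range(5)]
    # General Cauchy inequality: <x, y> <= ||x|| ||y||.
    assert inner(x, y) <= norm(x) * norm(y) + 1e-12
    # Triangle inequality for the induced norm.
    assert norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + 1e-12
```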
Now let $X$ be a
Hilbert space (a complete inner product space in the norm topology), and let $Y\subset X$ be a closed linear subspace of $X$. Then for $x\in X$, there is a unique $y\in Y$ such that $\|x-y\|$ is minimized. Indeed, let $(y_n)_n$ be a sequence in $Y$ such that $d(x, y_n) \rightarrow d(x, Y)$, $d$ being the metric associated with $\|\cdot\|$. Then
$$\|y_m - y_n\|^2 = 2\|y_m-x\|^2 + 2\|y_n-x\|^2 - 4\| \tfrac{1}{2}(y_m+y_n) - x \|^2 \rightarrow 0$$
(as $m\wedge n\rightarrow\infty$): convexity* puts $\tfrac{1}{2}(y_m+y_n)$ in $Y$, so $\|\tfrac{1}{2}(y_m+y_n)-x\|\geq d(x,Y)$. Hence $(y_n)_n$ is Cauchy and converges to some $y\in Y$ by the completeness of $X$ and the closedness of $Y$.
We call this distance-minimizing element the
orthogonal projection of $x$ onto $Y$, because it is the unique $y\in Y$ such that $\langle x-y, z\rangle = 0$ for all $z\in Y$. To see this, assume $\|z\|=1$ w.l.o.g. and notice that the function $t\mapsto \|x - y - tz\|^2 = \|x-y\|^2 - 2t\langle x-y, z\rangle + t^2$ attains its minimum at $t = \langle x-y, z\rangle$; since $y$ already minimizes the distance, this minimum must lie at $t=0$, i.e., $\langle x-y, z\rangle = 0$. Furthermore, suppose there were another $y'\in Y : \langle x-y',z\rangle=0\ \forall z\in Y$, then $\|x-y\|^2 = \|x-y'\|^2 + 2\langle x-y', y'-y\rangle + \|y'-y\|^2 = \|x-y'\|^2 + \|y'-y\|^2$, contradicting the minimality of $y$ unless $y'=y$.
The operator $P$ projecting an element $x\in X$ onto $Y\subset X$ has the following properties:
- $P$ is linear: $P(\alpha x_1+x_2) = \alpha Px_1+Px_2$;
- $P$ is a contraction: $\|Px\|\leq\|x\|$.
Linearity follows directly from the defining property $\langle Px, z\rangle = \langle x, z\rangle\ \forall z\in Y$, whereas the contractive property follows from Pythagoras: $\|Px\|^2 + \|x-Px\|^2 = \|x\|^2$.
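Both properties can be verified on a concrete example. In the sketch below (plain Python; the helper functions and the particular 2-dimensional subspace of $\mathbb R^4$ are my own illustrative choices), $Px$ is computed by summing $\langle x, x_n\rangle x_n$ over an orthonormal basis of $Y$, anticipating the formula derived just below:

```python
import math
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

def project(x, basis):
    # Orthogonal projection onto the span of an orthonormal list `basis`.
    p = [0.0] * len(x)
    for e in basis:
        c = inner(x, e)
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

# An orthonormal basis of a 2-dimensional subspace Y of R^4.
basis = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]]

random.seed(2)
for _ in range(100):
    x = [random.gauss(0, 1) for _ in range(4)]
    px = project(x, basis)
    # The residual x - Px is orthogonal to Y.
    for e in basis:
        assert abs(inner([a - b for a, b in zip(x, px)], e)) < 1e-9
    # P is a contraction: ||Px|| <= ||x||.
    assert norm(px) <= norm(x) + 1e-12
    # P is linear: P(2 x1 + x2) = 2 P x1 + P x2.
    x2 = [random.gauss(0, 1) for _ in range(4)]
    lhs = project([2.0 * a + b for a, b in zip(x, x2)], basis)
    rhs = [2.0 * a + b for a, b in zip(px, project(x2, basis))]
    assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
```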
This is all very fine in theory, but what does such a projection look like in practice? Well, suppose our subspace $Y$ is spanned by an orthonormal sequence $(x_n)_n$. (If $(x_n)_n$ is not orthonormal, we can make it so by the Gram-Schmidt process.) Then
$$Px = \sum_{n=1}^\infty\langle x, x_n\rangle x_n.$$
If the sequence were finite, the formula would be a simple consequence of the orthonormality of $(x_n)_n$. For the infinite case, we need to prove convergence of the series. Well, for any finite orthonormal sequence $(x_n)_{n\leq N}$, Pythagoras gives $\left\|\sum_{n\leq N}\langle x, x_n\rangle x_n\right\|^2 = \sum_{n\leq N}\langle x, x_n\rangle^2 \leq \|x\|^2$. Letting $N$ go to infinity, the partial sums $\sum_{n\leq N}\langle x, x_n\rangle^2$ form an increasing sequence bounded from above, thereby converging to a limit. (Beware: the series for $Px$ need not converge absolutely, so the basic fact that absolutely convergent series converge in a Banach space does not apply here.) Instead, the same Pythagorean identity shows that $\left\|\sum_{n=M}^{N}\langle x, x_n\rangle x_n\right\|^2 = \sum_{n=M}^{N}\langle x, x_n\rangle^2 \rightarrow 0$ as $M\rightarrow\infty$, so the partial sums of the series for $Px$ form a Cauchy sequence, which converges by the completeness of $X$. So convergence is established. Incidentally, we have proved Bessel's inequality:
BESSEL'S INEQUALITY. Let $(x_n)_n$ be an orthonormal sequence in a Hilbert space $X$. Then, for any $x\in X$,
$$\|x\|^2 \geq \sum_{n=1}^\infty\langle x, x_n\rangle^2.$$
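Bessel's inequality, together with the Gram-Schmidt process mentioned above, is easy to see in action. A small Python sketch (the `gram_schmidt` helper is my own, and assumes its input vectors are linearly independent): orthonormalize a few random vectors and check that $\sum_n\langle x, x_n\rangle^2$ never exceeds $\|x\|^2$.

```python
import math
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

def gram_schmidt(vectors):
    # Orthonormalize a list of linearly independent vectors:
    # subtract off projections onto the basis so far, then normalize.
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        n = norm(w)
        basis.append([wi / n for wi in w])
    return basis

random.seed(3)
dim = 8
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
basis = gram_schmidt(vectors)

# The result is orthonormal: <x_i, x_j> = 1 if i == j, else 0.
for i, e in enumerate(basis):
    for j, f in enumerate(basis):
        expected = 1.0 if i == j else 0.0
        assert abs(inner(e, f) - expected) < 1e-9

# Bessel: sum_n <x, x_n>^2 <= ||x||^2 for any x.
for _ in range(100):
    x = [random.gauss(0, 1) for _ in range(dim)]
    assert sum(inner(x, e) ** 2 for e in basis) <= norm(x) ** 2 + 1e-9
```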
Now comes the punch line: Cauchy's inequality is nothing but a special case of Bessel's! For unit vectors $x$ and $y$, just take the one-term orthonormal sequence $x_1 = y$ (in other words, drop all the $x_n$ with $n\geq2$), and there you go.
And now for a final homework question: Is it even necessary to prove the Cauchy inequality? Do we use it somewhere on the way to proving Bessel? Or could we have skipped the appetizer and gone straight to the main course, serving Cauchy as a sweet dessert instead? What do you think?
*Is convexity strictly necessary? Well, consider the Hilbert space $\ell^2$ of square-summable sequences $x=(x_n)_n\in\mathbb R^\mathbb N$, and let $C$ be the closure of $\bigcup_{n=1}^\infty \{x : x_n\geq1+1/n\}$. Then $d(0, C) = 1$, but $d(0, x)>1$ for all $x\in C$, so no distance-minimizing element exists.
Sources used:
- Bobrowski, A. Functional analysis for probability and stochastic processes. 3.1.1-13 and 4.2.3-5.
- Lang, S. Real and functional analysis. V.1.