Tuesday, March 15, 2016

Compactness in metric spaces

Let me start this post by quoting Adam Bobrowski:

I believe saying that the notion of compactness is one of the most important ones in topology and the whole of mathematics is not an exaggeration.

Agreed. Now let me add a quotation of my own:

I believe saying that metric spaces are the most important spaces in topology is not an exaggeration.

There are, of course, non-metrizable topological spaces, and they have their uses, but let's face it: those uses are rare and limited. I, for one, don't have a clue what they may be. Pointwise convergence of functions? Who gives an $\epsilon$?


So let's combine the two and investigate "metric compactness," by which I mean compactness of metric spaces. Our menu for today will be centered around two main dishes:
  • The Heine-Borel theorem
  • The Bolzano-Weierstraß theorem
Both will be generalized into their astronaut-food equivalents, and some related concepts will be discussed.


Let's start with the related concepts. There are many different kinds of compactness, and we will meet several of them in due course.

Two inequalities

So, after dropping a MathJax command into my template, we're all set to go.

I wanted to start with a few theorems on metric compactness (strictly speaking, compactness of metric spaces), but that can get complicated very fast. So let me begin with two inequalities:
  • The Cauchy inequality
  • Bessel's inequality
The Cauchy inequality is more commonly known as the Cauchy-Schwarz or Cauchy-Bunyakovsky-Schwarz inequality, but this is, in my opinion, downright absurd. We might as well call it the Hilbert-Schmidt or the von Neumann inequality. Neither Bunyakovsky nor Schwarz formulated the result in its current form. Both proved it for integrals, which was indeed a step forward, though why Walter Rudin calls it the Schwarz inequality, despite Schwarz having published his result some three decades after Bunyakovsky, is completely beyond me. So Cauchy inequality it is.

The main importance of the Cauchy inequality is that it shows every inner product space to be a normed space. Given an inner product space $(X, \langle\cdot,\cdot\rangle)$, define $\|x\| := \sqrt{\langle x, x\rangle}$ for all $x\in X$. To show that this is a norm, we prove the Cauchy inequality.

CAUCHY INEQUALITY. Let $(X, \langle\cdot,\cdot\rangle)$ be a real inner product space, and let $x, y\in X$ have unit norm (i.e., $\|x\|=\|y\|=1$). Then
$$\langle x, y\rangle \leq 1.$$
Proof. $0 \leq \|x-y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x, y\rangle = 2\left( 1 - \langle x, y\rangle  \right)$.

You will rarely see proofs this short in your lifetime.

Now, most will know the Cauchy inequality in a different form, where the vectors $x$ and $y$ are allowed to have any norm they please (any nonzero norm, that is; when $x$ or $y$ is zero, the inequality is trivial). But inner products are bilinear, so $$\left\| \frac{x}{\|x\|} \right\| = \sqrt{ \left\langle \frac{x}{\sqrt{\langle x, x\rangle}}, \frac{x}{\sqrt{\langle x, x\rangle}} \right\rangle } = \sqrt{ \frac{\langle x, x\rangle}{\langle x, x\rangle} } = 1.$$ Similarly, one may generalize the Cauchy inequality formulated above to $$\left\langle \frac{x}{\|x\|}, \frac{y}{\|y\|} \right\rangle \leq 1 \Rightarrow \langle x, y\rangle \leq \|x\|\|y\|,$$ for $x,y\in X$ of arbitrary norm. Incidentally, this proves that $$\|x+y\|^2 =\|x\|^2 + 2\langle x, y\rangle + \|y\|^2 \leq \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = \left(\|x\|+\|y\|\right)^2,$$ so that $\|\cdot\|$ is indeed a norm.
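As a quick sanity check, here is a small Python sketch. The ambient space is simply $\mathbb R^5$ with the dot product standing in for $\langle\cdot,\cdot\rangle$, and `inner` and `norm` are my own helper names:

```python
import random

def inner(x, y):
    """Euclidean inner product on R^n (our stand-in for <., .>)."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """||x|| = sqrt(<x, x>)."""
    return inner(x, x) ** 0.5

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    # Cauchy inequality: <x, y> <= ||x|| ||y||.
    assert inner(x, y) <= norm(x) * norm(y) + 1e-12
```

A numerical check is no proof, of course; it merely makes the inequality tangible.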

Now let $X$ be a Hilbert space (a complete inner product space in the norm topology), and let $Y\subset X$ be a closed linear subspace of $X$. Then for $x\in X$, there is a unique $y\in Y$ such that $\|x-y\|$ is minimized. Indeed, let $(y_n)_n$ be a sequence in $Y$ such that $d(x, y_n) \rightarrow d(x, Y)$, $d$ being the metric associated with $\|\cdot\|$. Then $$\|y_m - y_n\|^2 = 2\|y_m-x\|^2 + 2\|y_n-x\|^2 - 4\| \tfrac{1}{2}(y_m+y_n) - x \|^2 \rightarrow 0$$ (as $m\wedge n\rightarrow\infty$), since $Y$ is convex*, so that $\tfrac{1}{2}(y_m+y_n)\in Y$ and hence $\|\tfrac{1}{2}(y_m+y_n)-x\|\geq d(x, Y)$. Hence $y_n$ converges to some $y\in Y$ by the completeness of $X$ and the closedness of $Y$. (Uniqueness follows from the same identity, applied to two minimizers instead of $y_m$ and $y_n$.)
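The displayed identity is nothing but the parallelogram law applied to $y_m - x$ and $y_n - x$; here is a small numerical check in plain Python (helper names are mine, and the space is just $\mathbb R^4$ with the dot product):

```python
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return inner(x, x) ** 0.5

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def mid(x, y):
    """Midpoint (x + y) / 2."""
    return [(a + b) / 2 for a, b in zip(x, y)]

random.seed(1)
for _ in range(100):
    x  = [random.uniform(-1, 1) for _ in range(4)]
    ym = [random.uniform(-1, 1) for _ in range(4)]
    yn = [random.uniform(-1, 1) for _ in range(4)]
    # ||ym - yn||^2 = 2||ym - x||^2 + 2||yn - x||^2 - 4||(ym + yn)/2 - x||^2
    lhs = norm(sub(ym, yn)) ** 2
    rhs = (2 * norm(sub(ym, x)) ** 2 + 2 * norm(sub(yn, x)) ** 2
           - 4 * norm(sub(mid(ym, yn), x)) ** 2)
    assert abs(lhs - rhs) < 1e-9
```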

We call this distance-minimizing element the orthogonal projection of $x$ onto $Y$, because it is the unique $y\in Y$ such that $\langle x-y, z\rangle = 0$ for all $z\in Y$. To see this, assume $\|z\|=1$ w.l.o.g. and notice that the function $t\mapsto \|x - y - tz\|^2 = \|x-y\|^2 - 2t\langle x-y, z\rangle + t^2$ attains its minimum at $t = \langle x-y, z\rangle$; since $y+tz\in Y$ for all $t$ and $y$ minimizes the distance to $x$, this minimum must be attained at $t=0$, i.e., $\langle x-y, z\rangle = 0$. Furthermore, suppose there is another $y'\in Y$ with $\langle x-y', z\rangle = 0$ for all $z\in Y$. Then $\|x-y\|^2 = \|x-y'\|^2 + 2\langle x-y', y'-y\rangle + \|y'-y\|^2 = \|x-y'\|^2 + \|y'-y\|^2$, contradicting the minimality of $y$ unless $y'=y$.

The operator $P$ projecting an element $x\in X$ onto $Y\subset X$ has the following properties:
  • $P$ is linear: $P(\alpha x_1+x_2) = \alpha Px_1+Px_2$;
  • $P$ is a contraction: $\|Px\|\leq\|x\|$.
Linearity follows directly from the defining property $\langle Px, z\rangle = \langle x, z\rangle$ for all $z\in Y$, whereas the contractive property follows from Pythagoras (recall that $x - Px$ is orthogonal to $Px\in Y$): $\|Px\|^2 + \|x-Px\|^2 = \|x\|^2$.
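In finite dimensions, both properties are easy to watch in action. The sketch below is an illustration only: the space is $\mathbb R^4$ with the dot product, $Y$ is a two-dimensional subspace, and `gram_schmidt` and `project` are my own helper names (the expansion used by `project` is justified in the next paragraph):

```python
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return inner(x, x) ** 0.5

def scale(c, x):
    return [c * a for a in x]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            v = sub(v, scale(inner(v, b), b))
        basis.append(scale(1 / norm(v), v))
    return basis

def project(x, basis):
    """Px = sum_n <x, x_n> x_n for an orthonormal basis of Y."""
    p = [0.0] * len(x)
    for b in basis:
        p = add(p, scale(inner(x, b), b))
    return p

random.seed(2)
basis = gram_schmidt([[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]])
x1 = [random.uniform(-1, 1) for _ in range(4)]
x2 = [random.uniform(-1, 1) for _ in range(4)]

# Linearity: P(2 x1 + x2) = 2 P x1 + P x2.
lhs = project(add(scale(2.0, x1), x2), basis)
rhs = add(scale(2.0, project(x1, basis)), project(x2, basis))
assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))

# Contraction, via Pythagoras: x - Px is orthogonal to Y.
for b in basis:
    assert abs(inner(sub(x1, project(x1, basis)), b)) < 1e-9
assert norm(project(x1, basis)) <= norm(x1) + 1e-12
```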

This is all very fine in theory, but what does such a projection look like in practice? Well, suppose our subspace $Y$ is the closed linear span of an orthonormal sequence $(x_n)_n$. (If the spanning sequence is not orthonormal, we can make it so by the Gram-Schmidt process.) Then $$Px = \sum_{n=1}^\infty\langle x, x_n\rangle x_n.$$ If the sequence were finite, this would be a simple consequence of the orthonormality of $(x_n)_n$. For the infinite case, we need to prove convergence of the series. Well, for any finite orthonormal sequence $(x_n)_n$, $\|Px\|^2 = \sum_n\langle x, x_n\rangle^2 \leq \|x\|^2$. Letting the number of terms go to infinity, the partial sums $\sum_{n=1}^N\langle x, x_n\rangle^2$ form an increasing sequence bounded from above, thereby converging to a limit. Consequently, the tails of the vector series vanish: by orthonormality, $$\Big\| \sum_{n=m}^k\langle x, x_n\rangle x_n \Big\|^2 = \sum_{n=m}^k\langle x, x_n\rangle^2 \rightarrow 0 \quad (m\rightarrow\infty),$$ so the partial sums of the series form a Cauchy sequence, which converges by the completeness of $X$. So convergence is established. Incidentally, we have proved Bessel's inequality:

BESSEL'S INEQUALITY. Let $(x_n)_n$ be an orthonormal sequence in a Hilbert space $X$. Then, for any $x\in X$, $$\|x\|^2 \geq \sum_{n=1}^\infty\langle x, x_n\rangle^2.$$ Now comes the punchline: Cauchy's inequality is nothing but a special case of Bessel's! Just take an orthonormal sequence whose first element is $y/\|y\|$, keep only the first term of the sum, and there you go.
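To make the reduction concrete, here is a quick numerical sketch (dot product on $\mathbb R^6$, helper names mine): with the one-term orthonormal sequence $y/\|y\|$, Bessel's inequality collapses to the squared Cauchy inequality.

```python
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return inner(x, x) ** 0.5

random.seed(3)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(6)]
    y = [random.uniform(-1, 1) for _ in range(6)]
    # Bessel with the one-term orthonormal sequence y/||y||:
    #   ||x||^2 >= <x, y/||y||>^2,  i.e.  <x, y>^2 <= ||x||^2 ||y||^2.
    y_hat = [a / norm(y) for a in y]
    assert inner(x, y_hat) ** 2 <= norm(x) ** 2 + 1e-12
```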

And now for a final homework question: Is it even necessary to prove the Cauchy inequality? Do we use it somewhere on the way to proving Bessel? Or could we have skipped the appetizer and gone straight to the main course, serving Cauchy as a sweet dessert instead? What do you think?


*Is convexity strictly necessary? Well, consider the Hilbert space $\ell^2$ of square-summable real sequences $x=(x_n)_n$, and let $C$ be the closure of $\bigcup_{n=1}^\infty \{x : x_n\geq1+1/n\}$. Then $d(0, C) = 1$, but $d(0, x)>1$ for all $x\in C$.
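To see why $d(0, C) = 1$ here, note that the sequences $(1+1/n)e_n$, with $e_n$ the standard unit vectors (notation mine), all lie in $C$; a few lines of Python make the point:

```python
# Norms of the witnesses (1 + 1/n) e_n, each of which lies in the
# n-th set of the union defining C.
norms = [1 + 1 / n for n in range(1, 10001)]
assert all(v > 1 for v in norms)  # every witness has norm > 1 ...
assert min(norms) < 1 + 1e-3      # ... yet the infimum of the norms is 1.
```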

Sources used:
  • Bobrowski, A. Functional analysis for probability and stochastic processes. 3.1.1-13 and 4.2.3-5.
  • Lang, S. Real and functional analysis. V.1.