RKHS


Reproducing Kernel Hilbert Space

Hilbert spaces are generalizations of the usual finite-dimensional Euclidean spaces. Also, they have certain favorable convergence properties, yielding (unique) linear projections of their elements onto closed linear subspaces, or, more generally, unique nonlinear projections onto closed convex sets.

Definition

A normed space \((V, \| \cdot \|_{V} )\) is complete if any Cauchy sequence \(\{ v_n \}\) of its elements is convergent. If the norm \(\| \cdot \|_{V}\) is induced by an inner product and if it is complete, then we say that V is a Hilbert space.

A reproducing kernel Hilbert space (RKHS) is a family of functions on some set \(\mathcal{X}\) that forms a Hilbert space, with an associated kernel.

To start with, let us define what we mean by a kernel. We will stick to Euclidean feature spaces \(\mathcal{X}\), although everything works out if \(\mathcal{X}\) is an arbitrary separable metric space.

Definition

Let \(\mathcal{X}\) be a closed subset of \(\mathbb{R}^d\). A real-valued function \(K: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}\) is called a Mercer kernel provided the following conditions are met:

  1. It is symmetric, i.e., \(K(x; x^{\prime}) = K(x^{\prime}; x)\) for any \(x, x^{\prime} \in \mathcal{X}\).

  2. It is continuous, i.e., if \(\{ x_n \}\) is a sequence of points in \(\mathcal{X}\) converging to a point \(x\), then

\[ \lim_{n \rightarrow \infty} K(x_n, x^{\prime}) = K(x,x^{\prime}), \ \forall x^{\prime} \in \mathcal{X}. \]

  1. It is positive semidefinite, i.e., for all \(\alpha_1, \ldots, \alpha_{n} \in \mathbb{R}\) and all \(x_1, \ldots, x_n \in \mathcal{X}\),

\[ \sum_{i,j \in [n]} \alpha_i \alpha_j K(x_i, x_j) \geq 0. \]

Suppose we have a fixed kernel \(K\) on our feature space \(\mathcal{X}\) (which we assume to be a closed subset of \(\mathbb{R}^d\)). Let \(\mathcal{L}_K( \mathcal{X})\) be the linear span of the set \(\{ K(x^{\prime}, \cdot): x^{\prime} \in \mathcal{X} \}\), i.e., the set of all functions \(f: \mathcal{X} \rightarrow \mathbb{R}\) of the form

\[ f(x) = \sum_{j=1}^{N} c_j K(x_j, x) \]

for all possible choices of \(N \in \mathbb{N}\), \(c_1, \ldots, c_N \in \mathbb{R}\), and \(x_1, \ldots, x_N \in \mathcal{X}\).

Fact

\(\mathcal{L}_K( \mathcal{X})\) is a vector space.

For any Mercer kernel \(K\), we can complete \(\mathcal{L}_K( \mathcal{X})\) into a Hilbert space of functions that can potentially represent any continuous function from \(\mathcal{X}\) into \(\mathbb{R}\), provided \(K\) is chosen appropriately.

Theorem

Let \(\mathcal{X}\) be a closed subset of \(\mathbb{R}^d\), and let \(K: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}\) be a Mercer kernel. Then there exists a unique Hilbert space \((\mathcal{H}_K, \langle \cdot, \cdot \rangle_{K}\)) of real-valued functions on \(\mathcal{X}\) with the following properties:

  1. For all \(x \in \mathcal{X}\), the function \(K_x(\cdot) := K(x, \cdot)\) is an element of \(\mathcal{H}_K\), and \(\langle K_x, K_{x^{\prime}} \rangle_{K} = K(x,x^{\prime})\) for all \(x, x^{\prime} \in \mathcal{X}\).

  2. The linear space \(\mathcal{L}_K(\mathcal{X})\) is dense in \(\mathcal{H}_K\), i.e., for any \(f \in \mathcal{H}_K\) and any \(\epsilon > 0\) there exist some \(N \in \mathbb{N}\), \(c_1, \ldots, c_N \in \mathbb{R}\), and \(x_1, \ldots, x_N \in \mathcal{X}\), such that

\[ \| f - \sum_{j \in [N]}c_j K_{x_j} \|_{K} < \epsilon. \]

  1. For all \(f \in \mathcal{H}_K\) and all \(x \in \mathcal{X}\),

\[ f(x) = \langle K_x , f \rangle_{K}. \]

Moreover, the functions in \(\mathcal{H}_K\) are continuous. The Hilbert space \(\mathcal{H}_K\) is called the Reproducing Kernel Hilbert Space (RKHS) associated with \(K\).

Proposition

Suppose \(K(x, y)\) is a Mercer kernel on \(\mathcal{X} \times \mathcal{X}\), where \(\mathcal{X}\) is a closed subset of \(\mathbb{R}^d\) (or more generally, \(\mathcal{X}\) could be any complete separable metric space). Then there is a sequence of continuous functions \((\psi_i)\) on \(\mathcal{X}\) such that

\[ K(x,y) = \sum_{i=1}^{\infty} \psi_i(x) \psi_i(y) \]

and

\[ c \in \ell^2 \ \mbox{and} \ \sum_{i=1}^{\infty}c_i \psi_i = 0 \Rightarrow c=0 \]

and \(\psi_1, \psi_2,\ldots\) forms a complete orthonormal basis for the RKHS \(\mathcal{H}_K\).

References

  1. Bruce Hajek and Maxim Raginsky. Lecture notes for ECE 543, Statistical Learning Theory.