RKHS


Reproducing Kernel Hilbert Spaces (RKHS)

Overview

A Reproducing Kernel Hilbert Space (RKHS) is a Hilbert space of functions where every evaluation at a point can be expressed as an inner product. This simple yet powerful property connects geometry, functional analysis, and computation.

RKHSs form the mathematical foundation for many modern techniques, including kernel methods in machine learning, Gaussian processes, and certain aspects of signal processing and statistics.

Roughly speaking, an RKHS is a space of functions \(f : \mathcal{X} \to \mathbb{R}\) (or \(\mathbb{C}\)) such that evaluating \(f\) at a point \(x\) behaves linearly and continuously with respect to the geometry of the space.

Motivation and Intuition

Imagine you have a space of functions—say, smooth curves or square-integrable functions. In many practical problems, we want to measure similarity between such functions, and sometimes also between points in the input space \(\mathcal{X}\).

An RKHS provides a unified framework for doing both:

  • It is a Hilbert space of functions with a defined inner product \(\langle \cdot, \cdot \rangle_{\mathcal{H}}\).

  • It has a kernel function \(k(x, y)\) that measures similarity between points in \(\mathcal{X}\).

  • The kernel acts as a bridge between data points and functions, ensuring that for every \(x\), there exists a special function \(k(\cdot, x)\) representing evaluation at \(x\).

Formal Definition

Let \(\mathcal{X}\) be a nonempty set. A Hilbert space \(\mathcal{H}\) of functions \(f : \mathcal{X} \to \mathbb{R}\) is called a Reproducing Kernel Hilbert Space if there exists a function \(\) k : mathcal{X} times mathcal{X} to mathbb{R} \(\) such that:

1. For each fixed \(x \in \mathcal{X}\), the function \(k(\cdot, x)\) belongs to \(\mathcal{H}\). 2. (Reproducing property) For every \(f \in \mathcal{H}\) and \(x \in \mathcal{X}\), \(\) f(x) = langle f, k(cdot, x) rangle_{mathcal{H}}. \(\)

The function \(k\) is called the reproducing kernel of \(\mathcal{H}\).

Key Properties

  • Symmetry: \(k(x, y) = k(y, x)\) for all \(x, y \in \mathcal{X}\).

  • Positive definiteness: For any finite set \(\{x_1, \dots, x_n\} \subset \mathcal{X}\) and any real numbers \(c_1, \dots, c_n\), \(\) sum{i,j=1}^{n} ci cj k(xi, x_j) ge 0. \(\)

  • Uniqueness: Every RKHS has a unique reproducing kernel.

  • Existence: Conversely, every positive-definite kernel corresponds to a unique RKHS.

Examples of Kernels

1. Linear kernel: \(k(x, y) = \langle x, y \rangle\) The corresponding RKHS is the space of linear functions.

2. Polynomial kernel: \(k(x, y) = (\langle x, y \rangle + c)^d\) The RKHS consists of all polynomials up to degree \(d\).

3. Gaussian (RBF) kernel: \(\) k(x, y) = exp!left(-frac{|x - y|^2}{2sigma^2}right) \(\) This gives rise to a very smooth, infinite-dimensional RKHS used widely in machine learning and Gaussian processes.

4. Dirac kernel: \(k(x, y) = \delta_{xy}\) The RKHS consists of all functions on \(\mathcal{X}\) with finite support.

5. Exponential and Laplacian kernels: Examples include \(k(x, y) = \exp(-\|x - y\| / \sigma)\), producing less smooth function spaces.

Constructing an RKHS from a Kernel

Given a positive-definite kernel \(k\), we can construct the corresponding RKHS \(\mathcal{H}_k\) as follows:

1. Consider the set of all finite linear combinations \(\) mathcal{H}0 = left{ sum{i=1}^{n} alphai k(cdot, xi) : alphai in mathbb{R}, xi in mathcal{X}, n in mathbb{N} right}. \(\) 2. Define the inner product on \(\mathcal{H}_0\) by \(\) leftlangle sumi alphai k(cdot, xi), sumj betaj k(cdot, yj) rightrangle = sum{i,j} alphai betaj k(xi, y_j). \(\) 3. Complete this space under the induced norm to obtain the Hilbert space \(\mathcal{H}_k\).

This procedure ensures the reproducing property holds automatically.

The Reproducing Property in Depth

The “reproduction” in RKHS refers to how evaluation at a point can be realized geometrically as an inner product: \(\) f(x) = langle f, k(cdot, x) rangle_{mathcal{H}}. \(\) Intuitively, \(k(\cdot, x)\) acts like a probe function that extracts the value of \(f\) at \(x\).

This property guarantees that function evaluation is a continuous linear functional, meaning the act of “plugging in” \(x\) does not destroy smoothness or boundedness within the space.

Geometric Viewpoint

Each function \(f\) in the RKHS can be expressed as a limit of finite kernel expansions: \(\) f = sum{i=1}^{infty} alphai k(cdot, x_i), \(\) analogous to how vectors can be expressed in a basis.

The kernel defines an implicit feature map: \(\) Phi: mathcal{X} to mathcal{H}, quad Phi(x) = k(cdot, x), \(\) so that \(\) k(x, y) = langle Phi(x), Phi(y) rangle_{mathcal{H}}. \(\)

This interpretation explains the kernel trick: we can compute inner products in a high- or infinite-dimensional feature space without ever constructing \(\Phi\) explicitly.

Connection to Machine Learning

In kernel methods (e.g., Support Vector Machines, Kernel Ridge Regression, Gaussian Processes), one typically never works directly with functions in \(\mathcal{H}\), but instead uses the kernel matrix \(\) K{ij} = k(xi, x_j), \(\) which represents pairwise similarities between data points.

The representer theorem states that in many optimization problems involving an RKHS norm, the optimal solution \(f^*\) lies in the span of the kernel functions centered at the data: \(\) f^*(x) = sum{i=1}^{n} alphai k(x, x_i). \(\)

This result is fundamental: it transforms potentially infinite-dimensional problems into finite-dimensional linear algebra.

Gaussian Processes and RKHS

Gaussian Processes (GPs) use kernels to define distributions over functions. Given a kernel \(k\), a GP assumes that any finite set of function values has a joint Gaussian distribution with covariance matrix determined by \(k\).

The RKHS associated with the kernel \(k\) captures the “mean-square smoothness” of functions sampled from the GP prior. In fact, the RKHS norm provides a measure of how “complex” a function is relative to the GP — functions with small RKHS norm are more likely under the prior.

Examples and Exercises

1. Show that the function \(k(x, y) = 1 + xy\) defines an RKHS on \(\mathbb{R}\) consisting of all affine functions \(f(x) = a + bx\). 2. Verify that the Gaussian kernel is positive definite. 3. Compute the RKHS norm for a linear function in the linear kernel space. 4. Using the representer theorem, derive the form of the solution in kernel ridge regression.

Indefinite Kernels and Reproducing Kernel Kreĭn Spaces (RKKS)

Not all similarity functions are positive definite. In some applications (e.g., certain signal processing, graph kernels, or indefinite similarity measures), one encounters indefinite kernels — functions for which \(\) sum{i,j} ci cj k(xi, x_j) \(\) can be negative for some coefficients \(c_i\).

In such cases, the appropriate generalization of an RKHS is a Reproducing Kernel Kreĭn Space (RKKS), where the underlying space is a Kreĭn space — a vector space with an inner product that is not necessarily positive definite.

An RKKS can often be represented as a difference of two RKHSs: \(\) mathcal{K} = mathcal{H}+ ominus mathcal{H}-, \(\) where \(k = k_+ - k_-\) and both \(k_+\), \(k_-\) are positive-definite kernels.

While RKKSs retain a form of the reproducing property, many geometric intuitions (like distances) must be reinterpreted carefully. Nonetheless, RKKS theory provides valuable tools for working with indefinite similarities and extensions of kernel methods.

Summary

| Concept | Description | | —  —  — -| —  —  —  — –| | RKHS | Hilbert space of functions with reproducing kernel property | | Kernel \(k(x,y)\) | Symmetric, positive-definite similarity function | | Evaluation | \(f(x) = \langle f, k(\cdot, x) \rangle\) | | Construction | Linear combinations of kernels + completion | | Machine learning link | Kernel trick and representer theorem | | Indefinite generalization | Reproducing Kernel Kreĭn Space (RKKS) |

Further Reading

  • Aronszajn, N. (1950). Theory of Reproducing Kernels. Transactions of the American Mathematical Society, 68(3), 337–404.

  • Berlinet, A. & Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer.

  • Steinwart, I. & Christmann, A. (2008). Support Vector Machines. Springer.

  • Schölkopf, B. & Smola, A. (2002). Learning with Kernels. MIT Press.

Closing Remarks

The RKHS framework elegantly merges geometry, analysis, and computation. Its language enables us to describe smoothness, regularization, and similarity in a unified way. Understanding RKHS deeply provides not only theoretical insight but also practical power across modern applied mathematics, data science, and machine learning.