Kernel Formulas

Mathematical definitions of all kernel functions.

Base Kernel: f

RBF (Radial Basis Function) kernel:

\[k_f(x, x') = \sigma^2 \exp\left(-\frac{(x-x')^2}{2\ell^2}\right)\]

Parameters:

\(\sigma^2\) : variance parameter
\(\ell\) : length scale parameter
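As a minimal sketch, the base kernel can be evaluated in plain Python as follows. The function name `k_f` and the keyword names `sigma2` and `ell` are illustrative choices, not names taken from the library.

```python
import math

def k_f(x, x_prime, sigma2=1.0, ell=1.0):
    """RBF kernel: sigma^2 * exp(-(x - x')^2 / (2 * ell^2))."""
    d = x - x_prime
    return sigma2 * math.exp(-d * d / (2.0 * ell * ell))

# At zero separation the kernel equals the variance parameter sigma2.
print(k_f(0.3, 0.3, sigma2=2.0))
# The kernel depends only on x - x', so it is symmetric in its arguments.
print(k_f(0.0, 1.0) == k_f(1.0, 0.0))
```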

Cross-Covariance: f and g

Covariance between the function \(f(x)\) and its integral \(g(y) = \int_0^y f(s)\,ds\), obtained by integrating \(k_f(x, s)\) over \(s \in [0, y]\):

\[k_{fg}(x, y) = \sigma^2 \ell \sqrt{\frac{\pi}{2}} \left[\text{erf}\left(\frac{x}{\sqrt{2}\ell}\right) - \text{erf}\left(\frac{x-y}{\sqrt{2}\ell}\right)\right]\]

where \(\text{erf}\) is the error function.
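One way to sanity-check this cross-covariance is to compare a closed form against numerical quadrature. The sketch below assumes \(g(y) = \int_0^y f(s)\,ds\) (lower limit 0), takes the first argument as the point where \(f\) is evaluated and the second as the upper integration limit, and uses hypothetical function names.

```python
import math

def k_f(x, s, sigma2=1.0, ell=1.0):
    return sigma2 * math.exp(-(x - s) ** 2 / (2.0 * ell ** 2))

def k_fg(x, y, sigma2=1.0, ell=1.0):
    """Cov(f(x), g(y)), assuming g(y) = integral of f over [0, y]."""
    c = math.sqrt(2.0) * ell
    return sigma2 * ell * math.sqrt(math.pi / 2.0) * (
        math.erf(x / c) - math.erf((x - y) / c))

def k_fg_quadrature(x, y, sigma2=1.0, ell=1.0, n=2000):
    """Composite Simpson approximation of the integral of k_f(x, s), s in [0, y]."""
    h = y / n  # n must be even for Simpson's rule
    total = k_f(x, 0.0, sigma2, ell) + k_f(x, y, sigma2, ell)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * k_f(x, i * h, sigma2, ell)
    return total * h / 3.0

# Closed form and quadrature should agree to many digits.
print(k_fg(0.3, 1.2, ell=0.7), k_fg_quadrature(0.3, 1.2, ell=0.7))
```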

Integrated Kernel: g

Auto-covariance of the integral:

\[k_{gg}(y, y') = \sigma^2 \ell \sqrt{\frac{\pi}{2}} \left[y \cdot \text{erf}\left(\frac{y'}{\sqrt{2}\ell}\right) + y' \cdot \text{erf}\left(\frac{y}{\sqrt{2}\ell}\right) - \ell\sqrt{2} E(y, y')\right]\]

where \(E(y, y')\) is an auxiliary function involving the error function.
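Since \(E(y, y')\) is not spelled out here, the sketch below does not try to reproduce it. Instead it uses a closed form obtained by integrating the RBF kernel twice, under the assumption that \(g(y) = \int_0^y f(s)\,ds\), and checks it against a brute-force double integral. The helper names are hypothetical, and the arrangement of terms may differ from the library's.

```python
import math

def _phi2(d, ell):
    """A second antiderivative of exp(-d^2 / (2 ell^2)); even in d."""
    return (ell * math.sqrt(math.pi / 2.0) * d * math.erf(d / (math.sqrt(2.0) * ell))
            + ell ** 2 * math.exp(-d * d / (2.0 * ell ** 2)))

def k_gg(y, yp, sigma2=1.0, ell=1.0):
    """Cov(g(y), g(y')) for g(y) = integral of f over [0, y] (assumed)."""
    return sigma2 * (_phi2(y, ell) + _phi2(yp, ell) - _phi2(y - yp, ell) - ell ** 2)

def k_gg_quadrature(y, yp, sigma2=1.0, ell=1.0, n=200):
    """Double midpoint-rule approximation of the defining double integral."""
    hs, ht = y / n, yp / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * hs
        for j in range(n):
            t = (j + 0.5) * ht
            total += math.exp(-(s - t) ** 2 / (2.0 * ell ** 2))
    return sigma2 * total * hs * ht

print(k_gg(1.0, 1.5, ell=0.7), k_gg_quadrature(1.0, 1.5, ell=0.7))
```

For small \(y = y'\) this closed form reduces to approximately \(\sigma^2 y^2\), as expected when \(f\) is nearly constant over the integration range.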

Derivative Kernel: h

Auto-covariance of the derivative:

\[k_{hh}(x, x') = \frac{\sigma^2}{\ell^2} \left(1 - \frac{(x-x')^2}{\ell^2}\right) \exp\left(-\frac{(x-x')^2}{2\ell^2}\right)\]

Note the \(\frac{1}{\ell^2}\) prefactor: at \(x = x'\) the derivative variance is \(\sigma^2/\ell^2\), so it grows as \(O(1/\ell^2)\) when \(\ell\) shrinks.
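The derivative kernel is the mixed second derivative \(\partial^2 k_f / \partial x\,\partial x'\), which can be checked with a finite difference. This is a rough numerical sketch with hypothetical function names; the step size `eps` is an arbitrary choice.

```python
import math

def k_f(x, xp, sigma2=1.0, ell=1.0):
    return sigma2 * math.exp(-(x - xp) ** 2 / (2.0 * ell ** 2))

def k_hh(x, xp, sigma2=1.0, ell=1.0):
    t = x - xp
    return (sigma2 / ell ** 2 * (1.0 - t * t / ell ** 2)
            * math.exp(-t * t / (2.0 * ell ** 2)))

def k_hh_fd(x, xp, sigma2=1.0, ell=1.0, eps=1e-3):
    """Central mixed difference approximating d^2 k_f / (dx dx')."""
    def k(a, b):
        return k_f(a, b, sigma2, ell)
    return (k(x + eps, xp + eps) - k(x + eps, xp - eps)
            - k(x - eps, xp + eps) + k(x - eps, xp - eps)) / (4.0 * eps ** 2)

print(k_hh(0.4, -0.2, ell=0.5), k_hh_fd(0.4, -0.2, ell=0.5))
# On the diagonal the value is sigma2 / ell^2.
print(k_hh(1.0, 1.0, sigma2=2.0, ell=0.5))
```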

Cross-Covariance: f and h

Covariance between the function at \(x\) and its derivative at \(x'\), i.e. \(k_{fh}(x, x') = \partial k_f / \partial x'\):

\[k_{fh}(x, x') = \frac{\sigma^2}{\ell^2} (x - x') \exp\left(-\frac{(x-x')^2}{2\ell^2}\right)\]

Properties:

Anti-symmetric: \(k_{hf}(x, x') = -k_{fh}(x, x')\)
Zero when \(x = x'\) (the function is uncorrelated with its own derivative at the same point)
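The sign of this cross-covariance depends on which argument carries the derivative, so it helps to make that explicit in code. The sketch below assumes that the second subscript corresponds to the second argument (the same convention used for \(K_{fu}\) and \(K_{hu}\)); the function names are hypothetical.

```python
import math

def k_f(x, xp, sigma2=1.0, ell=1.0):
    return sigma2 * math.exp(-(x - xp) ** 2 / (2.0 * ell ** 2))

def k_fh(x, xp, sigma2=1.0, ell=1.0):
    """Cov(f(x), h(x')): derivative of k_f with respect to the second argument."""
    return (x - xp) / ell ** 2 * k_f(x, xp, sigma2, ell)

def k_hf(x, xp, sigma2=1.0, ell=1.0):
    """Cov(h(x), f(x')): derivative of k_f with respect to the first argument."""
    return -k_fh(x, xp, sigma2, ell)

def d_second_arg(x, xp, sigma2=1.0, ell=1.0, eps=1e-5):
    """Central difference of k_f in its second argument."""
    return (k_f(x, xp + eps, sigma2, ell) - k_f(x, xp - eps, sigma2, ell)) / (2.0 * eps)

print(k_fh(0.9, 0.2, ell=0.6), d_second_arg(0.9, 0.2, ell=0.6))
print(k_fh(0.3, 0.3))  # zero at coincident points
```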

Cross-Covariance: g and h

Covariance between the integral \(g(y)\) and the derivative \(h(x)\):

\[k_{gh}(y, x) = \sigma^2 \left[\exp\left(-\frac{x^2}{2\ell^2}\right) - \exp\left(-\frac{(y-x)^2}{2\ell^2}\right)\right]\]
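Because the integrand is an exact derivative, this formula contains no error function. A quick numerical check, assuming \(g(y) = \int_0^y f(s)\,ds\) and integrating \(\partial k_f(s, x)/\partial x\) over \(s \in [0, y]\), might look like this (function names are hypothetical):

```python
import math

def dk_dx2(s, x, sigma2=1.0, ell=1.0):
    """Derivative of the RBF kernel k_f(s, x) with respect to its second argument."""
    return sigma2 * (s - x) / ell ** 2 * math.exp(-(s - x) ** 2 / (2.0 * ell ** 2))

def k_gh(y, x, sigma2=1.0, ell=1.0):
    return sigma2 * (math.exp(-x ** 2 / (2.0 * ell ** 2))
                     - math.exp(-(y - x) ** 2 / (2.0 * ell ** 2)))

def k_gh_quadrature(y, x, sigma2=1.0, ell=1.0, n=2000):
    """Composite Simpson approximation of the integral of dk_dx2(s, x), s in [0, y]."""
    h = y / n  # n must be even for Simpson's rule
    total = dk_dx2(0.0, x, sigma2, ell) + dk_dx2(y, x, sigma2, ell)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * dk_dx2(i * h, x, sigma2, ell)
    return total * h / 3.0

print(k_gh(1.3, 0.4, ell=0.8), k_gh_quadrature(1.3, 0.4, ell=0.8))
```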

Second Derivative Kernel: u

Auto-covariance of the second derivative \(u(x) = f''(x)\):

\[K_{uu}(x, x') = \frac{\partial^4 k}{\partial x^2 \partial x'^2}\]

RBF (diagonal variance \(= 3\sigma^2/\ell^4\)):

\[K_{uu}(x, x') = \frac{\sigma^2}{\ell^4}\left(3 - \frac{6t^2}{\ell^2} + \frac{t^4}{\ell^4}\right) \exp\!\left(-\frac{t^2}{2\ell^2}\right), \quad t = x - x'\]
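For a stationary kernel \(k(t)\) with \(t = x - x'\), the mixed derivative \(\partial^4 k / \partial x^2 \partial x'^2\) equals the ordinary fourth derivative \(k''''(t)\), so the RBF expression can be checked with a five-point finite difference. The sketch below uses hypothetical names and an arbitrary step size, and only aims for a few digits of agreement.

```python
import math

def k_rbf(t, sigma2=1.0, ell=1.0):
    return sigma2 * math.exp(-t * t / (2.0 * ell ** 2))

def k_uu_rbf(x, xp, sigma2=1.0, ell=1.0):
    r2 = (x - xp) ** 2 / ell ** 2
    return sigma2 / ell ** 4 * (3.0 - 6.0 * r2 + r2 * r2) * math.exp(-r2 / 2.0)

def fourth_difference(k, t, eps=1e-2):
    """Five-point central difference approximating the fourth derivative k''''(t)."""
    return (k(t - 2 * eps) - 4 * k(t - eps) + 6 * k(t)
            - 4 * k(t + eps) + k(t + 2 * eps)) / eps ** 4

print(k_uu_rbf(0.5, 0.0), fourth_difference(k_rbf, 0.5))
# Diagonal variance 3 * sigma2 / ell^4.
print(k_uu_rbf(0.0, 0.0, sigma2=2.0, ell=0.5))
```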

Matérn 5/2 (diagonal variance \(= 25\sigma^2/\ell^4\)):

\[K_{uu}^{\mathrm{M52}}(x,x') = \frac{\partial^4 k_{\mathrm{M52}}}{\partial x^2 \partial x'^2}\]

Periodic (diagonal variance \(= 16\pi^4\sigma^2(\ell^2+3)/(\ell^4 p^4)\)):

\[K_{uu}^{\mathrm{per}}(x,x') = \frac{\partial^4 k_{\mathrm{per}}}{\partial x^2 \partial x'^2}, \quad k_{\mathrm{per}}(t) = \sigma^2 \exp\!\left(-\frac{2\sin^2(\pi t/p)}{\ell^2}\right)\]

Locally-periodic (product kernel, diagonal variance \(= \sigma^2(16\pi^4\ell^4(\ell_p^2+3) + 24\pi^2\ell^2\ell_p^2 p^2 + 3\ell_p^4 p^4)/(\ell^4\ell_p^4 p^4)\)):

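The locally-periodic kernel is the product \(k_{\mathrm{lp}} = k_{\mathrm{SE}} \cdot k_{\mathrm{per}}\), where \(k_{\mathrm{SE}}\) is the RBF (squared-exponential) kernel; its second-derivative auto-covariance follows from applying the product (Leibniz) rule to the fourth-order mixed derivative.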
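The quoted diagonal variances can be spot-checked numerically, since each one is the fourth derivative of the stationary kernel at \(t = 0\). The sketch below assumes the standard Matérn 5/2 form (which is not written out in this section) and a single \(\sigma^2\) factor in the locally-periodic product; all function names are hypothetical.

```python
import math

def k_m52(t, sigma2=1.0, ell=1.0):
    # Standard Matern 5/2 form (assumed; not written out in this section).
    a = math.sqrt(5.0) * abs(t) / ell
    return sigma2 * (1.0 + a + a * a / 3.0) * math.exp(-a)

def k_se(t, sigma2=1.0, ell=1.0):
    return sigma2 * math.exp(-t * t / (2.0 * ell ** 2))

def k_per(t, sigma2=1.0, ell=1.0, p=1.0):
    return sigma2 * math.exp(-2.0 * math.sin(math.pi * t / p) ** 2 / ell ** 2)

def k_lp(t, sigma2=1.0, ell=1.0, ell_p=1.0, p=1.0):
    # Single sigma2 factor (assumed), matching the diagonal variance quoted above.
    return sigma2 * k_se(t, 1.0, ell) * k_per(t, 1.0, ell_p, p)

def fourth_derivative_at_zero(k, eps=1e-3):
    """Five-point central difference for k''''(0)."""
    return (k(-2 * eps) - 4 * k(-eps) + 6 * k(0.0) - 4 * k(eps) + k(2 * eps)) / eps ** 4

ell, ell_p, p = 1.5, 0.8, 2.0
var_m52 = 25.0 / ell ** 4
var_per = 16.0 * math.pi ** 4 * (ell_p ** 2 + 3.0) / (ell_p ** 4 * p ** 4)
var_lp = (16.0 * math.pi ** 4 * ell ** 4 * (ell_p ** 2 + 3.0)
          + 24.0 * math.pi ** 2 * ell ** 2 * ell_p ** 2 * p ** 2
          + 3.0 * ell_p ** 4 * p ** 4) / (ell ** 4 * ell_p ** 4 * p ** 4)

print(var_m52, fourth_derivative_at_zero(lambda t: k_m52(t, ell=ell)))
print(var_per, fourth_derivative_at_zero(lambda t: k_per(t, ell=ell_p, p=p)))
print(var_lp, fourth_derivative_at_zero(lambda t: k_lp(t, ell=ell, ell_p=ell_p, p=p)))
```

The finite-difference values should match the formulas to within a fraction of a percent; the Matérn 5/2 check is the least accurate because that kernel has a \(|t|^5\) term at the origin.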

Cross-covariances involving u

\[K_{fu}(x, x') = \frac{\partial^2 k}{\partial x'^2}\]

\[K_{hu}(x, x') = \frac{\partial^3 k}{\partial x \, \partial x'^2} \quad \text{(antisymmetric)}\]

\(K_{gu}\) is defined only for RBF and Matérn 5/2.

Note

Matérn 3/2 is only once mean-square differentiable, so \(u = f''\) is not well-defined and is not supported.

Joint Covariance Matrix

For sampling \((f, g, h, u)\), the full covariance matrix is:

\[ \begin{bmatrix} K_{ff} & K_{fg} & K_{fh} & K_{fu} \\ K_{gf} & K_{gg} & K_{gh} & K_{gu} \\ K_{hf} & K_{hg} & K_{hh} & K_{hu} \\ K_{uf} & K_{ug} & K_{uh} & K_{uu} \end{bmatrix} \]

where:

\(K_{ff}\), \(K_{gg}\), \(K_{hh}\), \(K_{uu}\) are auto-covariance matrices
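\(K_{fg} = K_{gf}^T\), \(K_{fh} = K_{hf}^T\), \(K_{gh} = K_{hg}^T\) are cross-covariance matrices (the joint matrix is symmetric, so every off-diagonal block is the transpose of its mirror block)
\(K_{fu} = K_{uf}^T\), \(K_{hu} = K_{uh}^T\), \(K_{gu} = K_{ug}^T\) (when defined)
For the antisymmetric pairs, \(k_{hf}(x, x') = -k_{fh}(x, x')\) and \(k_{uh}(x, x') = -k_{hu}(x, x')\), so on a shared input grid \(K_{hf} = -K_{fh}\) and \(K_{uh} = -K_{hu}\) elementwise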
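The block structure can be illustrated on the \((f, h)\) sub-matrix alone. The sketch below assembles it for the RBF kernel on a shared grid and checks that the result is symmetric even though the cross-kernel itself is antisymmetric. It assumes the derivative in \(k_{fh}\) acts on the second argument, and the helper names are hypothetical.

```python
import math

def k_ff(x, xp, sigma2=1.0, ell=1.0):
    return sigma2 * math.exp(-(x - xp) ** 2 / (2.0 * ell ** 2))

def k_fh(x, xp, sigma2=1.0, ell=1.0):
    # Assumed convention: the derivative acts on the second argument.
    return (x - xp) / ell ** 2 * k_ff(x, xp, sigma2, ell)

def k_hf(x, xp, sigma2=1.0, ell=1.0):
    return -k_fh(x, xp, sigma2, ell)

def k_hh(x, xp, sigma2=1.0, ell=1.0):
    t2 = (x - xp) ** 2 / ell ** 2
    return (1.0 - t2) / ell ** 2 * k_ff(x, xp, sigma2, ell)

def block(kernel, xs):
    """Matrix of kernel values with rows and columns indexed by xs."""
    return [[kernel(a, b) for b in xs] for a in xs]

def joint_fh(xs):
    """Stack [[K_ff, K_fh], [K_hf, K_hh]] into one nested-list matrix."""
    top = [rf + rh for rf, rh in zip(block(k_ff, xs), block(k_fh, xs))]
    bottom = [rf + rh for rf, rh in zip(block(k_hf, xs), block(k_hh, xs))]
    return top + bottom

xs = [0.0, 0.4, 1.1]
K = joint_fh(xs)
n = len(K)
is_symmetric = all(abs(K[i][j] - K[j][i]) < 1e-12 for i in range(n) for j in range(n))
print(is_symmetric)
```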

Numerical Considerations

Derivative Variance Scaling

Derivative variance scales as \(\frac{\sigma^2}{\ell^2}\) and second derivative variance as \(\frac{\sigma^2}{\ell^4}\):

Small \(\ell\) → High derivative variance (grows faster for \(u\))
Can cause numerical instability
Mitigation: Use \(\ell \geq 0.2\) for \(h\) sampling; \(\ell \geq 0.3\) for \(u\) sampling
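To get a feel for how quickly these variances grow, the RBF diagonal values \(\sigma^2/\ell^2\) and \(3\sigma^2/\ell^4\) can be tabulated for a few length scales. This is only an illustration; the function name is hypothetical.

```python
def derivative_variances(ell, sigma2=1.0):
    """Diagonal variances of h = f' and u = f'' for the RBF kernel."""
    return sigma2 / ell ** 2, 3.0 * sigma2 / ell ** 4

for ell in (1.0, 0.5, 0.3, 0.2, 0.1):
    var_h, var_u = derivative_variances(ell)
    print(f"ell={ell:<4} var_h={var_h:10.1f} var_u={var_u:12.1f}")
```

Halving \(\ell\) multiplies the first-derivative variance by 4 and the second-derivative variance by 16, which is why the suggested lower bound on \(\ell\) is stricter for \(u\).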

Jitter Addition

For numerical stability, a small jitter term is added to the diagonal of each block:

Function/integral blocks: \(10^{-8}\)
First derivative block: \(10^{-6}\) (higher due to larger variance)
Second derivative block: a larger jitter, scaled in proportion to \(1/\ell^4\)
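The effect of jitter is easiest to see on an exactly singular matrix. The sketch below builds the joint \((f, h)\) covariance for a duplicated input point (\(\sigma^2 = 1\), \(\ell = 1\)), shows that a plain Cholesky factorization fails, and then succeeds after adding the per-block jitter values listed above. The helper names and the pure-Python Cholesky routine are illustrative assumptions, not the library's implementation; the second-derivative block is omitted because its exact jitter value is not given here.

```python
import math

JITTER_F = 1e-8  # function/integral blocks (value listed above)
JITTER_H = 1e-6  # first-derivative block (value listed above)

def cholesky(A):
    """Lower-triangular L with L L^T = A; raises if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = A[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def add_block_jitter(K, n_f, jitter_f=JITTER_F, jitter_h=JITTER_H):
    """Copy of K with jitter on the diagonal: first n_f rows are f, the rest h."""
    out = [row[:] for row in K]
    for i in range(len(K)):
        out[i][i] += jitter_f if i < n_f else jitter_h
    return out

# Joint (f, h) covariance for the duplicated input x = [0, 0]: exactly singular.
K = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]

try:
    cholesky(K)
    plain_ok = True
except ValueError:
    plain_ok = False

L = cholesky(add_block_jitter(K, n_f=2))
print(plain_ok, L[1][1] > 0.0)
```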

References

Solak et al. (2003) - Derivative observations in Gaussian process models of dynamic systems
Rasmussen & Williams (2006) - Gaussian Processes for Machine Learning, Chapter 9
GSL Manual - https://www.gnu.org/software/gsl/doc/html/

Next Steps

API Reference - Implementation details
Troubleshooting - Numerical issues