Kernel Formulas¶
Mathematical definitions of all kernel functions.
Base Kernel: f¶
RBF (Radial Basis Function) kernel:

\[
k_{ff}(x, x') = \sigma^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)
\]
Parameters:
- \(\sigma^2\) : variance parameter
- \(\ell\) : length scale parameter
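As a minimal sketch, the base kernel can be evaluated as follows, assuming the standard squared-exponential form \(\sigma^2 \exp(-(x - x')^2 / (2\ell^2))\). The function name `rbf_kernel` and its argument names are illustrative and are not taken from the library's API.

```python
import math


def rbf_kernel(x, x_prime, sigma2=1.0, ell=1.0):
    """RBF covariance sigma^2 * exp(-(x - x')^2 / (2 * ell^2))."""
    r = x - x_prime
    return sigma2 * math.exp(-r * r / (2.0 * ell * ell))


# At zero lag the kernel returns the variance sigma^2.
print(rbf_kernel(0.3, 0.3, sigma2=2.0, ell=0.5))
# One length scale apart, the covariance has dropped by a factor exp(-1/2).
print(rbf_kernel(0.0, 0.5, sigma2=2.0, ell=0.5))
```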
Cross-Covariance: f and g¶
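Covariance between function and its integral. Writing the integral as \(g(x) = \int_a^{x} f(t)\,dt\) for a fixed lower limit \(a\):

\[
k_{fg}(x, x') = \int_a^{x'} k_{ff}(x, t)\,dt = \sigma^2 \ell \sqrt{\frac{\pi}{2}} \left[ \text{erf}\left(\frac{x' - x}{\sqrt{2}\,\ell}\right) - \text{erf}\left(\frac{a - x}{\sqrt{2}\,\ell}\right) \right]
\]

where \(k_{ff}\) is the base RBF kernel and \(\text{erf}\) is the error function.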
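The erf-based closed form can be checked numerically. The sketch below assumes the integral is \(g(x) = \int_a^{x} f(t)\,dt\) with a hypothetical lower limit `a` (default 0), which is an assumption rather than something this page specifies. All function names are illustrative.

```python
import math


def rbf_kernel(x, x_prime, sigma2=1.0, ell=1.0):
    r = x - x_prime
    return sigma2 * math.exp(-r * r / (2.0 * ell * ell))


def k_fg(x, x_prime, sigma2=1.0, ell=1.0, a=0.0):
    """Cov(f(x), g(x')) where g(x') integrates f from a to x' (a is assumed)."""
    s = math.sqrt(2.0) * ell
    scale = sigma2 * ell * math.sqrt(math.pi / 2.0)
    return scale * (math.erf((x_prime - x) / s) - math.erf((a - x) / s))


def k_fg_quadrature(x, x_prime, sigma2=1.0, ell=1.0, a=0.0, n=2000):
    """Midpoint-rule approximation of the same integral, used only as a check."""
    width = (x_prime - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * width
        total += rbf_kernel(x, t, sigma2, ell)
    return total * width


closed = k_fg(0.7, 1.3, sigma2=2.0, ell=0.5)
approx = k_fg_quadrature(0.7, 1.3, sigma2=2.0, ell=0.5)
print(closed, approx)
```

The two values should agree to within the quadrature error, which is a quick sanity check before trusting a hand-derived cross-covariance.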
Integrated Kernel: g¶
Auto-covariance of the integral:
where \(E(y, y')\) is an auxiliary function involving the error function.
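One way to obtain a closed form for the integrated kernel is through the antiderivative of the error function, \(\Phi(z) = z\,\text{erf}(z) + e^{-z^2}/\sqrt{\pi}\). The sketch below assumes \(g(x) = \int_0^{x} f(t)\,dt\) (lower limit 0 is an assumption) and an RBF base kernel. The helper `erf_antiderivative` is hypothetical and is not necessarily the same as the auxiliary function \(E(y, y')\) above.

```python
import math


def erf_antiderivative(z):
    """Phi(z) = z * erf(z) + exp(-z^2) / sqrt(pi), an antiderivative of erf."""
    return z * math.erf(z) + math.exp(-z * z) / math.sqrt(math.pi)


def k_gg(x, x_prime, sigma2=1.0, ell=1.0):
    """Cov(g(x), g(x')) for g(x) = integral of f over [0, x] (assumed limit)."""
    s = math.sqrt(2.0) * ell
    phi = erf_antiderivative
    bracket = phi(x / s) + phi(x_prime / s) - phi((x - x_prime) / s) - phi(0.0)
    return sigma2 * ell * ell * math.sqrt(math.pi) * bracket


def k_gg_quadrature(x, x_prime, sigma2=1.0, ell=1.0, n=200):
    """2-D midpoint rule over [0, x] x [0, x'], used only as a check."""
    hs, ht = x / n, x_prime / n
    total = 0.0
    for i in range(n):
        s_i = (i + 0.5) * hs
        for j in range(n):
            t_j = (j + 0.5) * ht
            total += sigma2 * math.exp(-((s_i - t_j) ** 2) / (2.0 * ell * ell))
    return total * hs * ht


print(k_gg(0.8, 1.2, sigma2=1.5, ell=0.6))
print(k_gg_quadrature(0.8, 1.2, sigma2=1.5, ell=0.6))
```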
Derivative Kernel: h¶
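Auto-covariance of the derivative \(h(x) = f'(x)\):

\[
k_{hh}(x, x') = \frac{\partial^2 k_{ff}(x, x')}{\partial x\,\partial x'} = \frac{\sigma^2}{\ell^2} \left(1 - \frac{(x - x')^2}{\ell^2}\right) \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)
\]

Note the \(\frac{1}{\ell^2}\) scaling: the derivative variance is \(k_{hh}(x, x) = \sigma^2/\ell^2\), which grows as \(O(1/\ell^2)\) when \(\ell\) shrinks.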
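The derivative kernel is the mixed second partial derivative of the base kernel, which can be verified with a finite difference. This is a sketch for the RBF case only, and the function names are illustrative.

```python
import math


def rbf_kernel(x, x_prime, sigma2=1.0, ell=1.0):
    r = x - x_prime
    return sigma2 * math.exp(-r * r / (2.0 * ell * ell))


def k_hh(x, x_prime, sigma2=1.0, ell=1.0):
    """Closed-form Cov(f'(x), f'(x')) for the RBF kernel."""
    r = x - x_prime
    return (sigma2 / ell ** 2) * (1.0 - r * r / ell ** 2) * math.exp(-r * r / (2.0 * ell ** 2))


def k_hh_finite_difference(x, x_prime, sigma2=1.0, ell=1.0, step=1e-4):
    """Central difference for d^2 k / (dx dx'), used only as a check."""
    def k(u, v):
        return rbf_kernel(u, v, sigma2, ell)

    return (
        k(x + step, x_prime + step)
        - k(x + step, x_prime - step)
        - k(x - step, x_prime + step)
        + k(x - step, x_prime - step)
    ) / (4.0 * step * step)


# The diagonal value equals sigma^2 / ell^2.
print(k_hh(0.3, 0.3, sigma2=2.0, ell=0.5))
print(k_hh(0.4, 0.9, 2.0, 0.5), k_hh_finite_difference(0.4, 0.9, 2.0, 0.5))
```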
Cross-Covariance: f and h¶
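Covariance between function and its derivative, with \(k_{fh}(x, x') = \text{Cov}(f(x), h(x'))\):

\[
k_{fh}(x, x') = \frac{\partial k_{ff}(x, x')}{\partial x'} = \sigma^2\, \frac{x - x'}{\ell^2} \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)
\]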
Properties:
- Anti-symmetric: \(k_{hf}(x, x') = -k_{fh}(x, x')\)
- Zero when \(x = x'\) (a function value is uncorrelated with its own derivative at the same point)
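Both properties can be checked directly. The sketch below assumes the convention \(k_{fh}(x, x') = \text{Cov}(f(x), h(x')) = \partial k_{ff} / \partial x'\) for the RBF kernel and obtains \(k_{hf}\) by swapping arguments. The function names are illustrative.

```python
import math


def rbf_kernel(x, x_prime, sigma2=1.0, ell=1.0):
    r = x - x_prime
    return sigma2 * math.exp(-r * r / (2.0 * ell * ell))


def k_fh(x, x_prime, sigma2=1.0, ell=1.0):
    """Cov(f(x), h(x')): derivative of the RBF kernel in its second argument."""
    r = x - x_prime
    return sigma2 * r / ell ** 2 * math.exp(-r * r / (2.0 * ell ** 2))


def k_hf(x, x_prime, sigma2=1.0, ell=1.0):
    """Cov(h(x), f(x')) = Cov(f(x'), h(x)), so the arguments are swapped."""
    return k_fh(x_prime, x, sigma2, ell)


def k_fh_finite_difference(x, x_prime, sigma2=1.0, ell=1.0, step=1e-4):
    """Central difference in x', used only as a check of the closed form."""
    up = rbf_kernel(x, x_prime + step, sigma2, ell)
    down = rbf_kernel(x, x_prime - step, sigma2, ell)
    return (up - down) / (2.0 * step)


print(k_fh(0.5, 0.5))                       # zero at coincident points
print(k_hf(0.2, 0.9), -k_fh(0.2, 0.9))      # anti-symmetry
```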
Cross-Covariance: g and h¶
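Covariance between integral and derivative. With \(g(x) = \int_a^{x} f(t)\,dt\) for a fixed lower limit \(a\) and \(h = f'\), integrating \(\partial k_{ff}(t, x') / \partial x'\) over \(t \in [a, x]\) gives:

\[
k_{gh}(x, x') = k_{ff}(a, x') - k_{ff}(x, x') = \sigma^2 \left[ \exp\left(-\frac{(a - x')^2}{2\ell^2}\right) - \exp\left(-\frac{(x - x')^2}{2\ell^2}\right) \right]
\]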
Second Derivative Kernel: u¶
Auto-covariance of the second derivative \(u(x) = f''(x)\):
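RBF (diagonal variance \(= 3\sigma^2/\ell^4\)):

\[
k_{uu}(x, x') = \frac{\sigma^2}{\ell^4} \left(3 - \frac{6 r^2}{\ell^2} + \frac{r^4}{\ell^4}\right) \exp\left(-\frac{r^2}{2\ell^2}\right), \quad r = x - x'
\]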
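Matérn 5/2 (diagonal variance \(= 25\sigma^2/\ell^4\)):

\[
k_{uu}(x, x') = \frac{25\sigma^2}{3\ell^4} \left(3 - \frac{5\sqrt{5}\, r}{\ell} + \frac{5 r^2}{\ell^2}\right) \exp\left(-\frac{\sqrt{5}\, r}{\ell}\right), \quad r = |x - x'|
\]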
Periodic (diagonal variance \(= 16\pi^4\sigma^2(\ell^2+3)/(\ell^4 p^4)\)):
Locally-periodic (product kernel, diagonal variance \(= \sigma^2(16\pi^4\ell^4(\ell_p^2+3) + 24\pi^2\ell^2\ell_p^2 p^2 + 3\ell_p^4 p^4)/(\ell^4\ell_p^4 p^4)\)):
The locally-periodic kernel is \(k_{\mathrm{lp}} = k_{\mathrm{SE}} \cdot k_{\mathrm{per}}\); its second-derivative auto-covariance follows from the product rule applied twice.
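The diagonal variances quoted above equal the fourth derivative of each stationary kernel at zero lag, so they can be cross-checked with a finite difference. The sketch below assumes the standard parameterizations of the four kernels (RBF, Matérn 5/2, exp-sine-squared periodic, and their product). Function names and parameter values are illustrative.

```python
import math


def k_rbf(r, sigma2, ell):
    return sigma2 * math.exp(-r * r / (2.0 * ell * ell))


def k_matern52(r, sigma2, ell):
    t = math.sqrt(5.0) * abs(r) / ell
    return sigma2 * (1.0 + t + t * t / 3.0) * math.exp(-t)


def k_periodic(r, sigma2, ell, p):
    return sigma2 * math.exp(-2.0 * math.sin(math.pi * r / p) ** 2 / ell ** 2)


def k_locally_periodic(r, sigma2, ell, ell_p, p):
    return k_rbf(r, sigma2, ell) * k_periodic(r, 1.0, ell_p, p)


def fourth_derivative_at_zero(k, step=2e-3):
    """Five-point central difference for k''''(0), which equals Var[u(x)]."""
    return (k(2 * step) - 4 * k(step) + 6 * k(0.0) - 4 * k(-step) + k(-2 * step)) / step ** 4


def diagonal_variances(sigma2=1.5, ell=1.0, ell_p=1.2, p=2.0):
    """Return {name: (closed_form, finite_difference)} for Var[u(x)]."""
    pi2, pi4 = math.pi ** 2, math.pi ** 4
    closed = {
        "rbf": 3.0 * sigma2 / ell ** 4,
        "matern52": 25.0 * sigma2 / ell ** 4,
        "periodic": 16.0 * pi4 * sigma2 * (ell ** 2 + 3.0) / (ell ** 4 * p ** 4),
        "locally_periodic": sigma2
        * (16.0 * pi4 * ell ** 4 * (ell_p ** 2 + 3.0)
           + 24.0 * pi2 * ell ** 2 * ell_p ** 2 * p ** 2
           + 3.0 * ell_p ** 4 * p ** 4)
        / (ell ** 4 * ell_p ** 4 * p ** 4),
    }
    kernels = {
        "rbf": lambda r: k_rbf(r, sigma2, ell),
        "matern52": lambda r: k_matern52(r, sigma2, ell),
        "periodic": lambda r: k_periodic(r, sigma2, ell, p),
        "locally_periodic": lambda r: k_locally_periodic(r, sigma2, ell, ell_p, p),
    }
    return {name: (closed[name], fourth_derivative_at_zero(kernels[name])) for name in closed}


for name, (closed, numeric) in diagonal_variances().items():
    print(f"{name:17s} closed={closed:.4f} finite-difference={numeric:.4f}")
```

The Matérn 5/2 check is the least accurate of the four because that kernel has a \(|r|^5\) term, so the finite difference converges only linearly in the step size.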
Cross-covariances involving u¶
\(K_{gu}\) is defined only for RBF and Matérn 5/2.
Note
Matérn 3/2 is only once mean-square differentiable, so \(u = f''\) is not well-defined and is not supported.
Joint Covariance Matrix¶
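For sampling \((f, g, h, u)\), the full covariance matrix is:

\[
K = \begin{bmatrix}
K_{ff} & K_{fg} & K_{fh} & K_{fu} \\
K_{gf} & K_{gg} & K_{gh} & K_{gu} \\
K_{hf} & K_{hg} & K_{hh} & K_{hu} \\
K_{uf} & K_{ug} & K_{uh} & K_{uu}
\end{bmatrix}
\]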
where:
- \(K_{ff}\), \(K_{gg}\), \(K_{hh}\), \(K_{uu}\) are auto-covariance matrices
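- \(K_{fg} = K_{gf}^T\), \(K_{fh} = K_{hf}^T\), \(K_{gh} = K_{hg}^T\) are cross-covariance matrices; the joint matrix must be symmetric, so every off-diagonal block is the transpose of its mirror block
- \(K_{fu} = K_{uf}^T\), \(K_{hu} = K_{uh}^T\), \(K_{gu} = K_{ug}^T\) (when defined)
- On a shared input grid, the anti-symmetric kernels additionally give \(K_{hf} = -K_{fh}\) and \(K_{uh} = -K_{hu}\)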
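The block structure can be illustrated on the \((f, h)\) sub-system alone. The sketch below assumes RBF kernels and a shared input grid, builds the mirror block as a transpose, and confirms that the assembled matrix is symmetric. It uses plain nested lists to stay dependency-free, and all names are illustrative rather than the library's API.

```python
import math


def k_ff(x, y, sigma2=1.0, ell=1.0):
    r = x - y
    return sigma2 * math.exp(-r * r / (2.0 * ell ** 2))


def k_fh(x, y, sigma2=1.0, ell=1.0):
    r = x - y
    return sigma2 * r / ell ** 2 * math.exp(-r * r / (2.0 * ell ** 2))


def k_hh(x, y, sigma2=1.0, ell=1.0):
    r = x - y
    return sigma2 / ell ** 2 * (1.0 - r * r / ell ** 2) * math.exp(-r * r / (2.0 * ell ** 2))


def gram(kernel, xs, ys):
    """Matrix with entries kernel(xs[i], ys[j])."""
    return [[kernel(x, y) for y in ys] for x in xs]


def transpose(m):
    return [list(col) for col in zip(*m)]


def joint_fh_covariance(xs):
    """Assemble [[K_ff, K_fh], [K_hf, K_hh]] with K_hf = K_fh^T."""
    K_ff, K_fh, K_hh = gram(k_ff, xs, xs), gram(k_fh, xs, xs), gram(k_hh, xs, xs)
    K_hf = transpose(K_fh)
    top = [row_f + row_h for row_f, row_h in zip(K_ff, K_fh)]
    bottom = [row_f + row_h for row_f, row_h in zip(K_hf, K_hh)]
    return top + bottom


K = joint_fh_covariance([0.0, 0.4, 1.1])
n = len(K)
print(all(K[i][j] == K[j][i] for i in range(n) for j in range(n)))
```

The same pattern extends to the \(g\) and \(u\) blocks: only the upper-triangular blocks need to be computed, and the lower ones follow by transposition.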
Numerical Considerations¶
Derivative Variance Scaling¶
Derivative variance scales as \(\frac{\sigma^2}{\ell^2}\) and second derivative variance as \(\frac{\sigma^2}{\ell^4}\):
- Small \(\ell\) → High derivative variance (grows faster for \(u\))
- Can cause numerical instability
- Mitigation: Use \(\ell \geq 0.2\) for \(h\) sampling; \(\ell \geq 0.3\) for \(u\) sampling
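To get a feel for how quickly these variances grow, the following sketch tabulates the RBF values \(\sigma^2/\ell^2\) and \(3\sigma^2/\ell^4\) for a few length scales. The function name is illustrative.

```python
def rbf_derivative_variances(sigma2, ell):
    """Return (Var[h(x)], Var[u(x)]) for the RBF kernel."""
    return sigma2 / ell ** 2, 3.0 * sigma2 / ell ** 4


for ell in (1.0, 0.3, 0.2, 0.1):
    var_h, var_u = rbf_derivative_variances(1.0, ell)
    print(f"ell={ell:<4} Var[h]={var_h:10.1f} Var[u]={var_u:12.1f}")
```

Halving \(\ell\) multiplies the first-derivative variance by 4 and the second-derivative variance by 16, which is why the recommended lower bound on \(\ell\) is stricter for \(u\) than for \(h\).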
Jitter Addition¶
For numerical stability, a small jitter is added to the diagonal of each block:
- Function/integral blocks: \(10^{-8}\)
- First derivative block: \(10^{-6}\) (higher due to larger variance)
- Second derivative block: larger jitter, scaled in proportion to \(1/\ell^4\) to match the variance scaling
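A minimal sketch of block-wise jitter is shown below, assuming the joint matrix is stored as a nested list with blocks laid out contiguously along the diagonal. The helper `add_block_jitter` is hypothetical. The example covers only the function, integral, and first derivative blocks, since the constant used for the second derivative block is not given here.

```python
def add_block_jitter(K, block_sizes, jitters):
    """Return a copy of K with jitters[b] added to the diagonal of block b."""
    if sum(block_sizes) != len(K) or len(block_sizes) != len(jitters):
        raise ValueError("block sizes and jitters must match the matrix size")
    out = [list(row) for row in K]  # copy so the input is left untouched
    idx = 0
    for size, jitter in zip(block_sizes, jitters):
        for _ in range(size):
            out[idx][idx] += jitter
            idx += 1
    return out


n = 2  # number of input points per block (illustrative)
K = [[0.0] * (3 * n) for _ in range(3 * n)]
# Block order (f, g, h) with the jitter values listed above.
K_stable = add_block_jitter(K, [n, n, n], [1e-8, 1e-8, 1e-6])
print([K_stable[i][i] for i in range(3 * n)])
```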
References¶
- Solak et al. (2003) - Derivative observations in Gaussian process models of dynamic systems
- Rasmussen & Williams (2006) - Gaussian Processes for Machine Learning, Chapter 9
- GSL Manual - https://www.gnu.org/software/gsl/doc/html/
Next Steps¶
- API Reference - Implementation details
- Troubleshooting - Numerical issues