Gaussian Process module (`gp`)
Gaussian Process utilities for primordial feature analysis.
This module provides core GP functionality for modeling smooth variations in the primordial power spectrum and testing for localized features.
Key capabilities:
- Covariance matrix computation with proper kernel design
- Log-marginal likelihood landscape visualization
- Hyperparameter optimization and validation
- Bin resolution constraints for finite data
Physical interpretation:
- Length scale \(\ell\): Characteristic scale of smooth variations in log(k) space
- Signal variance \(\sigma^2\): Amplitude of deviations from power-law
- Noise variance \(\sigma_n^2\): Uncorrelated measurement/cosmic variance uncertainty
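A minimal end-to-end sketch tying these together (assuming `chain`, `delta_mean`, and `log_k` come from your own MCMC analysis; the bin settings are illustrative):
>>> noise = estimate_noise_level_from_chain(chain, nbins=20)
>>> landscape = compute_lml_landscape(
...     delta_mean, log_k,
...     noise_level=noise,
...     nbins=20, k_start=0.001, k_end=0.23,
... )
>>> if landscape['bayes_factor'] > 10:
...     print(f"Strong evidence! ell = {landscape['optimal_length_scale']:.3f}")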
build_kernel(config)
Build sklearn kernel from configuration.
This is the single source of truth for kernel construction, ensuring consistency across the codebase.
| PARAMETER | DESCRIPTION |
|---|---|
| config | Kernel configuration specifying kernel type, \(\sigma\), \(\ell\), and parameters. |

| RETURNS | DESCRIPTION |
|---|---|
| Kernel | sklearn Kernel object (without noise component). |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If kernel type is unknown or params are invalid. |
Examples:
>>> config = KernelConfig(KernelType.RBF, sigma=0.1, length_scale=0.5)
>>> kernel = build_kernel(config)
>>> K = kernel(log_k) # Evaluate covariance matrix
Source code in src/primefeat/gp.py
build_noise_covariance(n, noise_level=None, noise_cov=None)
Build noise covariance matrix.
Supports two modes:
- Diagonal noise (simple): \(\sigma_n^2 I\)
- Full posterior covariance (recommended for MCMC): \(\Sigma_{\mathrm{post}}\)
| PARAMETER | DESCRIPTION |
|---|---|
| n | Number of data points. |
| noise_level | Diagonal noise standard deviation \(\sigma_n\) (used if noise_cov=None). |
| noise_cov | Full N×N posterior covariance matrix \(\Sigma_{\mathrm{post}}\). |

| RETURNS | DESCRIPTION |
|---|---|
| ndarray | Noise covariance matrix of shape (n, n). |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If neither noise_level nor noise_cov is provided, or on shape mismatch. |
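A minimal usage sketch of both modes (assuming `delta_samples` is an MCMC sample array of shape (n_samples, 20)):
>>> Sigma_diag = build_noise_covariance(20, noise_level=0.01)  # 0.01**2 * I
>>> Sigma_full = build_noise_covariance(20, noise_cov=np.cov(delta_samples.T))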
Source code in src/primefeat/gp.py
compute_kernel_matrix(log_k, kernel_config)
Compute kernel covariance matrix (signal only, no noise).
Useful for visualizing kernel structure and comparing kernels.
| PARAMETER | DESCRIPTION |
|---|---|
| log_k | \(\log(k)\) values, shape (n,) or (n, 1). |
| kernel_config | Kernel configuration specifying type and hyperparameters. |

| RETURNS | DESCRIPTION |
|---|---|
| ndarray | Signal covariance matrix \(K_{\mathrm{signal}}\) of shape (n, n). |
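For example (a minimal sketch mirroring the build_kernel example above):
>>> config = KernelConfig(KernelType.RBF, sigma=0.1, length_scale=0.5)
>>> K_signal = compute_kernel_matrix(np.linspace(-7, -1.5, 20), config)  # (20, 20) matrix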
Source code in src/primefeat/gp.py
compare_kernels(log_k, configs)
Compute kernel matrices for multiple configurations.
Useful for comparing how different kernels represent correlation structure.
| PARAMETER | DESCRIPTION |
|---|---|
| log_k | \(\log(k)\) values, shape (n,) or (n, 1). |
| configs | Dictionary mapping names to KernelConfig objects. |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, ndarray] | Dictionary mapping names to kernel matrices. |
Examples:
>>> log_k = np.linspace(-7, -1.5, 20)
>>> configs = {
... 'RBF': KernelConfig(KernelType.RBF, 0.1, 0.5),
... 'RQ_low_alpha': KernelConfig(KernelType.RATIONAL_QUADRATIC, 0.1, 0.5, {'alpha': 0.5}),
... 'RQ_high_alpha': KernelConfig(KernelType.RATIONAL_QUADRATIC, 0.1, 0.5, {'alpha': 10.0}),
... }
>>> matrices = compare_kernels(log_k, configs)
Source code in src/primefeat/gp.py
compute_bin_resolution(nbins, k_start, k_end)
Compute resolution limits imposed by finite binning.
With finite bins over a finite \(k\)-range, we cannot resolve arbitrarily small correlation lengths. This function computes the minimum resolvable length scale and warns if requested parameters are below this limit.
| PARAMETER | DESCRIPTION |
|---|---|
| nbins | Number of bins. |
| k_start | Start of \(k\)-range in \(\mathrm{Mpc}^{-1}\). |
| k_end | End of \(k\)-range in \(\mathrm{Mpc}^{-1}\). |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Any] | Dictionary with: delta_log_k (bin spacing in \(\log(k)\) space), log_k_range (total range in \(\log(k)\) space), min_resolvable_length (minimum length scale we can constrain, ~2 bins), max_sensible_length (maximum useful length scale, ~half range). |
Examples:
>>> res = compute_bin_resolution(20, 0.001, 0.23)
>>> print(f"Min length scale: {res['min_resolvable_length']:.3f}")
>>> print(f"Max length scale: {res['max_sensible_length']:.3f}")
Source code in src/primefeat/gp.py
validate_hyperparameters(sigma, length_scale, nbins, k_start, k_end, warn=True)
Validate GP hyperparameters against bin resolution limits.
| PARAMETER | DESCRIPTION |
|---|---|
| sigma | Signal standard deviation \(\sigma\). |
| length_scale | RBF kernel length scale \(\ell\) in \(\log(k)\) space. |
| nbins | Number of bins. |
| k_start | Start of \(k\)-range in \(\mathrm{Mpc}^{-1}\). |
| k_end | End of \(k\)-range in \(\mathrm{Mpc}^{-1}\). |
| warn | Whether to print warnings. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if parameters are within reasonable bounds, False otherwise. |
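Example call (illustrative values, matching the bin settings used elsewhere in this module):
>>> ok = validate_hyperparameters(sigma=0.1, length_scale=0.5,
...                               nbins=20, k_start=0.001, k_end=0.23)
>>> # ok is False if sigma or length_scale fall outside the resolution limits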
Source code in src/primefeat/gp.py
build_gp_covariance(log_k, length_scale=None, sigma=None, noise_level=0.01, noise_cov=None, return_cholesky=False, kernel_config=None, backend=None)
Build GP covariance matrix for given kernel configuration.
Supports multiple kernel types through KernelConfig:
- RBF (Squared Exponential): default, infinitely smooth
- Rational Quadratic: multi-scale structure
- Periodic: exactly repeating patterns
- Locally Periodic: periodic with varying amplitude
| PARAMETER | DESCRIPTION |
|---|---|
| log_k | \(\log(k)\) values, shape (n, 1) or (n,). |
| noise_level | Diagonal noise standard deviation \(\sigma_n\) (used if noise_cov=None). |
| noise_cov | Full N×N posterior covariance matrix \(\Sigma_{\mathrm{post}}\) (RECOMMENDED for MCMC bins). If provided, this accounts for bin-bin correlations. Extract from MCMC: np.cov(delta_samples.T) |
| return_cholesky | If True, return Cholesky factor instead of K. |
| kernel_config | KernelConfig object specifying kernel type and parameters (NEW). |
| length_scale | (Backward compatibility) RBF kernel length scale \(\ell\) in \(\log(k)\) space. |
| sigma | (Backward compatibility) Signal standard deviation \(\sigma\) (GP amplitude). |
| backend | Backend to use ('numpy', 'jax', or None for default). If None, uses the current implementation (numpy). |

| RETURNS | DESCRIPTION |
|---|---|
| ndarray | Covariance matrix K of shape (n, n), or lower Cholesky factor L if return_cholesky=True. |
Examples:
Rational Quadratic kernel:
>>> config = KernelConfig(
... KernelType.RATIONAL_QUADRATIC,
... sigma=0.1,
... length_scale=0.5,
... params={'alpha': 2.0}
... )
>>> K = build_gp_covariance(log_k, kernel_config=config, noise_level=0.01)
Periodic kernel:
>>> config = KernelConfig(
... KernelType.PERIODIC,
... sigma=0.1,
... length_scale=0.3,
... params={'period': 1.5}
... )
>>> K = build_gp_covariance(log_k, kernel_config=config, noise_level=0.01)
Backward compatible RBF (old API):
>>> K = build_gp_covariance(log_k, length_scale=0.5, sigma=0.1, noise_level=0.01)
Full posterior covariance (MCMC bins):
>>> delta_samples = np.array([chain[f'delta_{i}'] for i in range(1, 21)]).T
>>> posterior_cov = np.cov(delta_samples.T)
>>> K = build_gp_covariance(log_k, kernel_config=config, noise_cov=posterior_cov)
Use JAX backend:
>>> K = build_gp_covariance(log_k, kernel_config=config, backend='jax')
Source code in src/primefeat/gp.py
compute_log_marginal_likelihood(delta_values, log_k, length_scale=None, sigma=None, noise_level=0.01, noise_cov=None, kernel_config=None, backend=None)
Compute log-marginal likelihood for given GP hyperparameters.
\(\log p(\delta \mid \theta) = -\tfrac{1}{2}\,\delta^T K^{-1} \delta - \tfrac{1}{2} \log|K| - \tfrac{n}{2} \log(2\pi)\), where \(K = K_{\mathrm{signal}}(\theta) + \Sigma_{\mathrm{noise}}\).
This is the probability of observing the data \(\delta\) under the GP model with hyperparameters \(\theta\). Higher values indicate better fit.
Components:
- Data fit term: \(-\frac{1}{2} \delta^T K^{-1} \delta\) (reward fitting the data)
- Complexity penalty: \(-\frac{1}{2} \log|K|\) (penalize overly flexible models)
- Normalization: \(-\frac{n}{2} \log(2\pi)\)
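A sketch of how these terms are evaluated numerically via a Cholesky factorization (illustrative, not necessarily the library's internal code; assumes K and delta as above):
>>> L = np.linalg.cholesky(K)                                # K = L @ L.T
>>> alpha = np.linalg.solve(L.T, np.linalg.solve(L, delta))  # K^{-1} delta
>>> data_fit = -0.5 * delta @ alpha                          # -(1/2) delta^T K^{-1} delta
>>> complexity = -np.sum(np.log(np.diag(L)))                 # -(1/2) log|K| = -sum(log diag(L))
>>> norm = -0.5 * len(delta) * np.log(2 * np.pi)
>>> lml = data_fit + complexity + norm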
| PARAMETER | DESCRIPTION |
|---|---|
| delta_values | Observed \(\delta(k)\) values, shape (n,). |
| log_k | \(\log(k)\) values, shape (n, 1) or (n,). |
| noise_level | Diagonal noise standard deviation \(\sigma_n\) (used if noise_cov=None). |
| noise_cov | Full N×N posterior covariance matrix \(\Sigma_{\mathrm{post}}\) (RECOMMENDED for MCMC bins). |
| kernel_config | KernelConfig object specifying kernel (NEW - preferred). |
| length_scale | (Backward compatibility) RBF kernel length scale \(\ell\). |
| sigma | (Backward compatibility) Signal standard deviation \(\sigma\). |
| backend | Backend to use ('numpy', 'jax', or None for default). |

| RETURNS | DESCRIPTION |
|---|---|
| float | Log-marginal likelihood value. |
Examples:
With kernel configuration (recommended):
>>> config = KernelConfig(
... KernelType.PERIODIC,
... sigma=0.1,
... length_scale=0.3,
... params={'period': 1.5}
... )
>>> lml = compute_log_marginal_likelihood(delta, log_k, kernel_config=config)
Backward compatible RBF (old API):
>>> lml = compute_log_marginal_likelihood(delta, log_k, length_scale=0.5, sigma=0.1, noise_level=0.01)
With JAX backend:
>>> lml = compute_log_marginal_likelihood(delta, log_k, kernel_config=config, backend='jax')
Source code in src/primefeat/gp.py
estimate_sigma_range_from_data(delta_values, noise_level, lower_bound_factor=0.1, upper_bound_factor=3.0, min_sigma=0.0001)
Estimate appropriate sigma range from empirical data characteristics.
Strategy:
- Lower bound: \(\max(\mathrm{noise\_level} \times \mathrm{lower\_bound\_factor}, \mathrm{min\_sigma})\)
  - Should be above noise floor to ensure signal is detectable
  - Allows exploring "weak signal" regime
- Upper bound: \(\mathrm{empirical\_std} \times \mathrm{upper\_bound\_factor}\)
  - Covers "strong signal" regime where deviations are large
  - Factor of 3 ensures we explore well beyond typical variations
This ensures the search space adapts to data scale while maintaining physical interpretability (\(\sigma \ll \mathrm{noise}\) is undetectable, \(\sigma \gg \sigma_\mathrm{data}\) implies implausibly large deviations).
| PARAMETER | DESCRIPTION |
|---|---|
| delta_values | Observed \(\delta(k)\) values, shape (n,). |
| noise_level | Fixed noise standard deviation \(\sigma_n\). |
| lower_bound_factor | Multiplier for noise level to set lower bound. |
| upper_bound_factor | Multiplier for empirical std to set upper bound. |
| min_sigma | Absolute minimum \(\sigma\) to consider (prevents degeneracy). |

| RETURNS | DESCRIPTION |
|---|---|
| Tuple[float, float] | Tuple of (sigma_min, sigma_max) representing the recommended \(\sigma\) range. |
Examples:
>>> delta = np.random.randn(20) * 0.05 + 0.02 # Small signal
>>> sigma_range = estimate_sigma_range_from_data(delta, noise_level=0.01)
>>> print(f"Auto sigma range: [{sigma_range[0]:.4f}, {sigma_range[1]:.4f}]")
Source code in src/primefeat/gp.py
estimate_noise_level_from_chain(chain, nbins=20, param_pattern='delta_{i}')
Estimate noise level directly from MCMC chain posterior uncertainties.
This is the PREFERRED method when you have access to the full MCMC chain, as it uses the actual posterior standard deviations rather than empirical variance of the posterior mean.
| PARAMETER | DESCRIPTION |
|---|---|
| chain | MCMC chain object (dict-like with \(\delta_i\) parameters). |
| nbins | Number of bins. |
| param_pattern | Parameter name pattern (use {i} for bin index). |

| RETURNS | DESCRIPTION |
|---|---|
| float | Mean posterior standard deviation \(\sigma_n\) across bins. |
Examples:
>>> # With full chain
>>> noise = estimate_noise_level_from_chain(chain, nbins=20)
>>> landscape = compute_lml_landscape(delta_mean, log_k, noise_level=noise, ...)
Source code in src/primefeat/gp.py
estimate_noise_level_from_data(delta_values, fraction_of_std=0.5, min_noise=0.001)
Estimate appropriate noise level from data characteristics.
For a single realization (posterior mean), we estimate the noise level as a fraction of the empirical standard deviation. This represents the typical uncertainty/scatter in the data.
This is a FALLBACK method. If you have access to the full MCMC chain, prefer using estimate_noise_level_from_chain() instead.
| PARAMETER | DESCRIPTION |
|---|---|
| delta_values | Observed \(\delta(k)\) values, shape (n,). |
| fraction_of_std | What fraction of the std to use (default: 0.5 = half the variation). |
| min_noise | Minimum noise level to avoid numerical issues. |

| RETURNS | DESCRIPTION |
|---|---|
| float | Estimated noise standard deviation \(\sigma_n\). |
Examples:
>>> delta = np.random.randn(20) * 0.05
>>> noise = estimate_noise_level_from_data(delta)
>>> print(f"Estimated noise: {noise:.4f}")
Note
If you have access to the full MCMC chain, prefer using:
>>> from primefeat.gp import estimate_noise_level_from_chain
>>> noise_level = estimate_noise_level_from_chain(chain, nbins=20)
This gives a more accurate estimate of bin-wise uncertainties.
Source code in src/primefeat/gp.py
estimate_length_scale_from_autocorrelation(delta_values, log_k, min_correlation=0.1, fallback_quantile=0.3)
Estimate correlation length scale from empirical autocorrelation structure.
This function is simpler than significance.estimate_correlation_length() because we're working with a single realization (delta_values) rather than posterior samples.
Strategy:
- Compute pairwise distances in \(\log(k)\) space
- Compute \(\delta\)-\(\delta\) correlations via \((\delta_i - \mathrm{mean})(\delta_j - \mathrm{mean}) / \mathrm{var}\)
- Fit exponential decay: \(\mathrm{corr}(d) \sim \exp(-d / \ell)\)
- Return characteristic length scale \(\ell\)
If insufficient correlation structure is present, falls back to a quantile of the log_k range (e.g., 30% of range).
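A rough sketch of this strategy (a simplified illustration, not the function's exact implementation; assumes 1-D log_k and delta arrays):
>>> d = np.abs(log_k[:, None] - log_k[None, :])    # pairwise distances in log(k)
>>> z = (delta - delta.mean()) / delta.std()
>>> corr = np.outer(z, z)                          # (delta_i - mean)(delta_j - mean) / var
>>> mask = (d > 0) & (corr > 0.1)                  # keep correlated, distinct pairs
>>> slope = np.polyfit(d[mask], np.log(corr[mask]), 1)[0]
>>> ell = -1.0 / slope                             # fit corr(d) ~ exp(-d / ell)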
| PARAMETER | DESCRIPTION |
|---|---|
| delta_values | Observed \(\delta(k)\) values, shape (n,). |
| log_k | \(\log(k)\) values, shape (n, 1) or (n,). |
| min_correlation | Minimum correlation threshold for fitting. |
| fallback_quantile | Fraction of log_k range to use if fit fails. |

| RETURNS | DESCRIPTION |
|---|---|
| float | Estimated correlation length \(\ell\) in \(\log(k)\) units. |
Examples:
>>> log_k = np.linspace(-7, -1.5, 20)
>>> delta = np.sin(log_k) * 0.1 + np.random.randn(20) * 0.02
>>> ell = estimate_length_scale_from_autocorrelation(delta, log_k)
>>> print(f"Estimated length scale: {ell:.3f}")
Source code in src/primefeat/gp.py
validate_hyperparameter_ranges(sigma_range, length_scale_range, noise_level, empirical_std, resolution_info=None, warn=True)
Validate user-provided hyperparameter ranges for reasonableness.
This function warns users if their specified ranges are likely to produce poor results due to:
- \(\sigma\) too close to noise level (undetectable signal)
- \(\sigma \gg\) empirical variation (implausibly large deviations)
- Length scale below resolution limit (aliasing)
- Length scale \(>\) half the domain (poorly constrained)
| PARAMETER | DESCRIPTION |
|---|---|
| sigma_range | (min, max) for signal standard deviation \(\sigma\). |
| length_scale_range | (min, max) for length scale \(\ell\). |
| noise_level | Fixed noise standard deviation \(\sigma_n\). |
| empirical_std | Standard deviation of \(\delta(k)\) values. |
| resolution_info | Output from compute_bin_resolution() (optional). |
| warn | Whether to print warnings. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if ranges pass all checks, False otherwise. |
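Example call (illustrative values; delta is the observed \(\delta(k)\) array):
>>> res = compute_bin_resolution(20, 0.001, 0.23)
>>> ok = validate_hyperparameter_ranges(
...     sigma_range=(0.001, 0.2),
...     length_scale_range=(0.1, 1.5),
...     noise_level=0.01,
...     empirical_std=float(np.std(delta)),
...     resolution_info=res,
... )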
Source code in src/primefeat/gp.py
compute_lml_landscape(delta_values, log_k, sigma_range=None, length_scale_range=None, n_sigma=50, n_length=50, noise_level=None, posterior_cov=None, nbins=None, k_start=None, k_end=None, auto_sigma_factor=3.0, auto_length_fallback=0.3, validate_ranges=True, kernel_type=KernelType.RBF, kernel_params=None)
Compute log-marginal likelihood landscape in \((\sigma, \ell)\) hyperparameter space.
This visualizes how well different GP models explain the data, allowing us to:
- Test whether signal variance \(\sigma\) is significantly non-zero (evidence for features)
- Infer characteristic length scale \(\ell\) of features (sharp vs smooth)
- Assess parameter uncertainty and degeneracies (ridge structures)
- Compute Bayes factors for model comparison (signal vs noise)
Supports multiple kernel types:
- RBF (default): Infinitely smooth, single length scale
- RATIONAL_QUADRATIC: Multi-scale structure (set kernel_params={'alpha': value})
- PERIODIC: Exactly repeating patterns (set kernel_params={'period': value})
- LOCALLY_PERIODIC: Periodic with decay (set kernel_params={'period': p, 'length_scale_rbf': l})
Automatic hyperparameter range selection:
- If sigma_range=None, estimates appropriate bounds from the empirical data variance
- If length_scale_range=None, combines bin resolution limits with empirical autocorrelation
- Validates user-provided ranges and warns about unreasonable choices
- Transparently reports which ranges were selected and why
Mathematical Framework:
The log-marginal likelihood is
\(\log p(\delta \mid \theta) = -\tfrac{1}{2}\,\delta^T K^{-1} \delta - \tfrac{1}{2} \log|K| - \tfrac{n}{2} \log(2\pi)\),
where \(K(\theta) = K_{\mathrm{signal}}(\sigma, \ell, \mathrm{kernel\_params}) + \Sigma_{\mathrm{noise}}\).
Interpretation of Landscape Features:
- Peak at \(\sigma \approx 0\): Data consistent with noise (null hypothesis)
- Peak at \(\sigma > 0\): Evidence for signal beyond noise
- Small \(\ell\) at maximum: Sharp, localized features (e.g., resonances)
- Large \(\ell\) at maximum: Smooth, broad features (e.g., running)
- Narrow peak: Well-constrained hyperparameters
- Ridge structure: \(\sigma\)-\(\ell\) degeneracy (multiple models fit equally well)
Bayes Factor Interpretation:
\(\mathrm{BF} = \exp(\mathrm{LML}_{\mathrm{signal}} - \mathrm{LML}_{\mathrm{noise}})\) compares signal vs noise models:
- BF > 10: Strong evidence for features
- BF > 3: Moderate evidence
- BF < 3: Weak/no evidence
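Equivalently, from the dictionary returned by this function (see Returns below):
>>> bf = np.exp(landscape['max_lml'] - landscape['null_lml'])  # same as landscape['bayes_factor']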
Bin Resolution Constraints:
With nbins bins over finite \(k\)-range, we cannot resolve arbitrarily small \(\ell\).
- Minimum resolvable: \(\ell_{\mathrm{min}} \approx 2\,\Delta\log k \approx 2 \times \mathrm{log\_k\_range} / \mathrm{nbins}\)
- Maximum sensible: \(\ell_{\mathrm{max}} \approx \mathrm{log\_k\_range} / 2\)
Automatic Range Selection Strategy:
Sigma range (if None):
- Lower: \(\max(\mathrm{noise\_level} \times 0.1, 10^{-4})\) - slightly above noise floor
- Upper: \(\mathrm{empirical\_std} \times \mathrm{auto\_sigma\_factor}\) - covers strong signal regime
- Adapts to data scale while maintaining physical interpretability
Length scale range (if None):
- Lower: \(0.8 \times \ell_{\mathrm{min}}\) (from bin resolution)
- Upper: \(1.2 \times \ell_{\mathrm{max}}\) (from bin resolution)
- Optionally refined by empirical autocorrelation if available
| PARAMETER | DESCRIPTION |
|---|---|
| delta_values | Observed \(\delta(k)\) values (single sample or posterior mean), shape (n,). |
| log_k | \(\log(k)\) values, shape (n, 1) or (n,). |
| sigma_range | (min, max) for signal std \(\sigma\) (None = auto-determine from data). |
| length_scale_range | (min, max) for length scale \(\ell\) (None = auto from resolution). |
| n_sigma | Number of \(\sigma\) grid points. |
| n_length | Number of length scale \(\ell\) grid points. |
| noise_level | Fixed diagonal noise standard deviation \(\sigma_n\) (used if posterior_cov=None). DEPRECATED: Use posterior_cov instead for proper statistics. |
| posterior_cov | Full posterior covariance matrix \(\Sigma_{\mathrm{post}}\) (RECOMMENDED, nbins × nbins). If provided, uses full covariance instead of diagonal noise. Extract from MCMC: np.cov(delta_samples.T) |
| nbins | Number of bins for automatic length scale range (recommended). |
| k_start | Start of \(k\)-range in \(\mathrm{Mpc}^{-1}\) for automatic length scale range. |
| k_end | End of \(k\)-range in \(\mathrm{Mpc}^{-1}\) for automatic length scale range. |
| auto_sigma_factor | Multiplier for empirical_std when auto-determining sigma_max. |
| auto_length_fallback | Fraction of log_k range for length scale estimation fallback. |
| validate_ranges | If True, validate user-provided ranges and warn if unreasonable. |
| kernel_type | Type of kernel to use (default: KernelType.RBF). |
| kernel_params | Fixed kernel-specific parameters (not explored in landscape). RQ: {'alpha': float} (mixture parameter); Periodic: {'period': float} (oscillation period); LocallyPeriodic: {'period': float, 'length_scale_rbf': float}. |

| RETURNS | DESCRIPTION |
|---|---|
| Dict | Dictionary containing: sigma_grid (1D array of \(\sigma\) values); length_scale_grid (1D array of \(\ell\) values); lml_grid (2D array of log-marginal likelihoods, shape (n_sigma, n_length)); optimal_sigma (ML estimate of \(\sigma\)); optimal_length_scale (ML estimate of \(\ell\)); max_lml (maximum log-marginal likelihood); null_lml (LML at \(\sigma \approx 0\), the null hypothesis); bayes_factor (exp(max_lml - null_lml)); resolution_info (bin resolution diagnostics); auto_selected_ranges (dict with 'sigma_range', 'length_scale_range', 'method'). |
Examples:
Automatic range selection:
>>> from primefeat.compute import get_bin_centers
>>> bin_centers = get_bin_centers(0.001, 0.23, 20)
>>> log_k = np.log(bin_centers).reshape(-1, 1)
>>> delta_mean = np.array([chain[f'delta_{i}'].mean() for i in range(1, 21)])
>>> # Let function auto-determine ranges
>>> landscape = compute_lml_landscape(
... delta_mean, log_k,
... nbins=20, k_start=0.001, k_end=0.23
... )
>>> print(f"Auto-selected sigma: {landscape['auto_selected_ranges']['sigma_range']}")
>>> if landscape['bayes_factor'] > 10:
... print(f"Strong evidence! ell = {landscape['optimal_length_scale']:.3f}")
Manual ranges with validation:
>>> # Provide your own ranges - function will validate them
>>> landscape = compute_lml_landscape(
... delta_mean, log_k,
... sigma_range=(0.001, 0.2),
... length_scale_range=(0.1, 1.5),
... nbins=20, k_start=0.001, k_end=0.23
... )
Source code in src/primefeat/gp.py
compare_kernel_likelihoods(delta_values, log_k, kernel_configs, noise_level=None, noise_cov=None)
Compare log-marginal likelihoods for different kernel configurations.
This function computes the LML for each provided kernel configuration, allowing direct comparison of how well different kernels explain the data.
| PARAMETER | DESCRIPTION |
|---|---|
| delta_values | Observed \(\delta(k)\) values, shape (n,). |
| log_k | \(\log(k)\) values, shape (n,) or (n, 1). |
| kernel_configs | Dictionary mapping names to KernelConfig objects. |
| noise_level | Diagonal noise standard deviation \(\sigma_n\) (if noise_cov not provided). |
| noise_cov | Full posterior covariance matrix \(\Sigma_{\mathrm{post}}\) (recommended). |

| RETURNS | DESCRIPTION |
|---|---|
| Dict[str, Dict] | Dictionary with: 'results' (dict mapping kernel name to {lml, config}); 'best_kernel' (name of kernel with highest LML); 'best_lml' (highest log-marginal likelihood); 'bayes_factors' (dict of Bayes factors relative to the worst model). |
Examples:
>>> configs = {
... 'RBF': KernelConfig(KernelType.RBF, sigma=0.1, length_scale=0.5),
... 'RQ': KernelConfig(
... KernelType.RATIONAL_QUADRATIC,
... sigma=0.1,
... length_scale=0.5,
... params={'alpha': 2.0}
... ),
... 'Periodic': KernelConfig(
... KernelType.PERIODIC,
... sigma=0.1,
... length_scale=0.3,
... params={'period': 1.5}
... ),
... }
>>> comparison = compare_kernel_likelihoods(delta, log_k, configs, noise_level=0.01)
>>> print(f"Best kernel: {comparison['best_kernel']}")
>>> print(f"Bayes factors: {comparison['bayes_factors']}")
Source code in src/primefeat/gp.py