Performance

Benchmarks and optimization tips.

Benchmarks

Typical performance on a modern CPU:

Grid Size   Samples   Time (s)   Samples/sec
50          10        0.002      5000
100         10        0.008      1250
200         10        0.030      333
500         10        0.190      53
1000        10        0.760      13

Benchmarked on Apple M1 Pro
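
To collect comparable numbers on your own machine, a small timing helper is enough. This is a minimal sketch using only the standard library; `time_call` is a hypothetical helper, not part of the package, and taking the best of several repeats is one common convention rather than the method used for the table above.

```python
import time

def time_call(fn, repeats=5):
    """Return the best wall-clock time (seconds) over several calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Stand-in workload; replace with the call you want to measure
elapsed = time_call(lambda: sum(range(100_000)))
print(f"{elapsed:.6f} s")
```

With the library imported, the workload would be something like `lambda: sample_prior(spec, n_samples=10)`, and samples/sec is `n_samples / elapsed`.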

Complexity

Prior Sampling

Time complexity: \(O(n^3)\) for covariance decomposition, where \(n\) is total grid size.

  • Cholesky decomposition of the covariance matrix dominates
  • Sampling is \(O(n^2 \cdot k)\) where \(k\) is the number of samples (one triangular matrix-vector product per sample)

For joint sampling (f, g, h):

  • Total grid size: \(n = n_f + n_g + n_h\)
  • Decomposition: \(O(n^3)\)
  • Per sample: \(O(n^2)\)
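
The split between a one-time decomposition and a per-sample cost can be seen in a plain NumPy sketch. This is an illustration only, not the library's C implementation: the squared-exponential kernel, the `jitter` term, and the names `rbf_kernel` and `sample_prior_sketch` are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(xa, xb, ell=1.0):
    """Squared-exponential covariance between two 1-D grids (assumed kernel)."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def sample_prior_sketch(x, n_samples, ell=1.0, jitter=1e-6, seed=0):
    """Draw zero-mean GP prior samples on grid x via a Cholesky factor."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Jitter on the diagonal keeps the matrix numerically positive definite
    K = rbf_kernel(x, x, ell) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)               # factorized once, reused for all draws
    Z = rng.standard_normal((n, n_samples))  # one column of white noise per sample
    return (L @ Z).T                         # shape (n_samples, n)

samples = sample_prior_sketch(np.linspace(0, 5, 50), n_samples=10)
print(samples.shape)  # (10, 50)
```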

Posterior Sampling

Additional operations:

  • Conditioning: \(O(m^3)\) where \(m\) is number of observations
  • Cross-covariance: \(O(m \cdot n)\)
  • Predictive mean: \(O(m \cdot n)\)

When \(m \ll n\), these extra steps are cheap next to the \(O(n^3)\) decomposition, so posterior sampling costs roughly the same as prior sampling.
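
The conditioning steps listed above can be sketched in NumPy. This is a minimal illustration under assumptions, not the library's code: it uses a squared-exponential kernel, a small `noise` term on the observations, and the hypothetical names `rbf_kernel` and `posterior_mean_cov`.

```python
import numpy as np

def rbf_kernel(xa, xb, ell=1.0):
    """Squared-exponential covariance between two 1-D grids (assumed kernel)."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def posterior_mean_cov(x_train, y_train, x_grid, ell=1.0, noise=1e-6):
    """Condition a zero-mean GP on m observations and predict on n grid points."""
    m = len(x_train)
    K_mm = rbf_kernel(x_train, x_train, ell) + noise * np.eye(m)
    K_mn = rbf_kernel(x_train, x_grid, ell)   # cross-covariance, m x n entries
    K_nn = rbf_kernel(x_grid, x_grid, ell)
    L = np.linalg.cholesky(K_mm)              # conditioning: cubic in m
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_mn.T @ alpha                     # predictive mean: m * n work
    V = np.linalg.solve(L, K_mn)
    cov = K_nn - V.T @ V                      # posterior covariance on the grid
    return mean, cov

x_train = np.linspace(0, 5, 5)
mean, cov = posterior_mean_cov(x_train, np.sin(x_train), np.linspace(0, 5, 50))
print(mean.shape, cov.shape)  # (50,) (50, 50)
```

Drawing posterior samples then proceeds as for the prior, with `mean` added to each draw and `cov` in place of the prior covariance.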

Optimization Tips

1. Use Coarser Grids

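The integral is smoother than the function itself, so it needs fewer grid points:

import numpy as np

# Fine grid for the function, coarse grid for its integral
spec = SamplingSpec(
    x_f=np.linspace(0, 5, 200),  # Fine
    x_g=np.linspace(0, 5, 50),   # Coarse
)

Compared with using 200 points for both grids, the total grid size drops from 400 to 250, which cuts the \(O(n^3)\) decomposition cost by about 75% while preserving accuracy.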
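
The savings figure can be checked with a short calculation. The sketch below assumes the baseline uses 200 points for both grids (the example does not state the baseline explicitly) and that the cubic decomposition term dominates; `relative_cubic_cost` is a hypothetical helper.

```python
def relative_cubic_cost(n_before, n_after):
    """Ratio of decomposition cost after vs. before, assuming cost ~ n^3."""
    return (n_after / n_before) ** 3

# Baseline: 200 + 200 points; coarser integral grid: 200 + 50 points
ratio = relative_cubic_cost(200 + 200, 200 + 50)
print(f"{ratio:.3f}")  # 0.244, i.e. roughly a 75% reduction
```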

2. Sample Only What You Need

Skip unused quantities:

# Only need derivative
spec = SamplingSpec(x_h=x)  # Don't request f or g

A smaller covariance matrix means a faster decomposition.

3. Batch Sampling

Generate multiple samples in one call:

# Efficient: one decomposition shared by all 100 samples
result = sample_prior(spec, n_samples=100)

# Inefficient: repeats the decomposition on every call
samples = [sample_prior(spec, n_samples=1) for _ in range(100)]

Batching amortizes the decomposition cost over all samples.
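
The effect of amortization can be reproduced outside the library with a NumPy stand-in. This sketch assumes a squared-exponential covariance and uses hypothetical helpers (`rbf_cov`, `draw_batched`, `draw_one_at_a_time`); absolute timings will vary by machine, so none are shown.

```python
import time
import numpy as np

def rbf_cov(n, ell=1.0, jitter=1e-6):
    """Squared-exponential covariance on a grid over [0, 5] (assumed kernel)."""
    x = np.linspace(0.0, 5.0, n)
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2) + jitter * np.eye(n)

def draw_batched(K, k, rng):
    """Factorize once, then transform k noise vectors together."""
    L = np.linalg.cholesky(K)
    return (L @ rng.standard_normal((K.shape[0], k))).T

def draw_one_at_a_time(K, k, rng):
    """Refactorize for every sample, mimicking k separate calls."""
    rows = []
    for _ in range(k):
        L = np.linalg.cholesky(K)
        rows.append(L @ rng.standard_normal(K.shape[0]))
    return np.stack(rows)

K = rbf_cov(300)
rng = np.random.default_rng(0)
for fn in (draw_batched, draw_one_at_a_time):
    start = time.perf_counter()
    out = fn(K, 20, rng)
    print(fn.__name__, out.shape, f"{time.perf_counter() - start:.4f} s")
```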

4. Larger Length Scales

Small length scales require finer grids for accuracy, so prefer the largest length scale your problem allows:

import numpy as np

# Small ell needs a dense grid
ell_small = 0.1
x_dense = np.linspace(0, 5, 500)

# Large ell works with a coarse grid
ell_large = 2.0
x_coarse = np.linspace(0, 5, 100)
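
One way to compare a grid against a length scale is to count grid points per length scale. This is a heuristic diagnostic, not something the library provides, and `points_per_length_scale` is a hypothetical helper; how many points are enough depends on the kernel and the accuracy you need.

```python
import numpy as np

def points_per_length_scale(x, ell):
    """Number of grid spacings that fit in one length scale (uniform grid)."""
    spacing = (x[-1] - x[0]) / (len(x) - 1)
    return ell / spacing

# The two settings from the example above
print(points_per_length_scale(np.linspace(0, 5, 500), 0.1))  # about 10
print(points_per_length_scale(np.linspace(0, 5, 100), 2.0))  # about 40
```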

5. Sparse Observations

For posterior sampling, use fewer observations:

import numpy as np

# Instead of 100 observations
x_train_dense = np.linspace(0, 5, 100)

# Use 20 well-placed observations
x_train_sparse = np.linspace(0, 5, 20)

Because conditioning is \(O(m^3)\), going from \(m = 100\) to \(m = 20\) cuts that cost by a factor of \(100^3 / 20^3 = 125\).

Memory Usage

Memory scales as \(O(n^2)\) for the covariance matrix.

Approximate memory for a dense matrix of 8-byte (double-precision) entries:

Grid Size   Matrix Size   Memory (MB)
100         100×100       0.08
500         500×500       2
1000        1000×1000     8
5000        5000×5000     200
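
The figures in the table follow from a one-line estimate. The sketch assumes dense storage with 8-byte entries and 1 MB = 10^6 bytes, which matches the numbers above; `covariance_memory_mb` is a hypothetical helper, and real usage may be higher if the factorization or temporaries are stored separately.

```python
def covariance_memory_mb(n, bytes_per_entry=8):
    """Memory for one dense n x n matrix, in MB (1 MB = 1e6 bytes)."""
    return n * n * bytes_per_entry / 1e6

for n in (100, 500, 1000, 5000):
    print(n, covariance_memory_mb(n))
```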

For large grids (\(n > 1000\)), consider:

  • Using sparse/low-rank approximations
  • Splitting into smaller regions
  • Increasing ell to reduce required resolution

Parallelization

The C implementation uses GSL's BLAS routines, which may run multithreaded if GSL is linked against a multithreaded BLAS (e.g., OpenBLAS, MKL).

Check which BLAS backend NumPy is linked against (the C extension may link a different library):

import numpy as np
np.show_config()

Profiling

Profile your code to identify bottlenecks:

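import cProfile
import pstats

# Using Profile as a context manager requires Python 3.8+
with cProfile.Profile() as pr:
    result = sample_prior(spec, n_samples=100)

# Show the 10 entries with the largest cumulative time
stats = pstats.Stats(pr)
stats.sort_stats('cumtime')
stats.print_stats(10)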

Next Steps