Performance
Benchmarks and optimization tips.
Benchmarks
Typical performance on a modern CPU:
| Grid Size | Samples | Time (s) | Samples/sec |
|---|---|---|---|
| 50 | 10 | 0.002 | 5000 |
| 100 | 10 | 0.008 | 1250 |
| 200 | 10 | 0.030 | 333 |
| 500 | 10 | 0.190 | 53 |
| 1000 | 10 | 0.760 | 13 |
Benchmarked on an Apple M1 Pro.
Complexity
Prior Sampling
Time complexity: \(O(n^3)\) for the covariance decomposition, where \(n\) is the total number of grid points.
- The Cholesky decomposition of the covariance matrix dominates.
- Drawing samples from the Cholesky factor is \(O(n^2 \cdot k)\), where \(k\) is the number of samples (each sample is a triangular matrix-vector product).
For joint sampling of (f, g, h):
- Total grid size: \(n = n_f + n_g + n_h\)
- Decomposition: \(O(n^3)\)
- Per sample: \(O(n^2)\)
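To show where these costs come from, here is a minimal NumPy sketch of Cholesky-based prior sampling for a single quantity. It assumes a squared-exponential (RBF) kernel and a small diagonal jitter; `rbf_kernel` and `sample_prior_sketch` are hypothetical stand-ins, not this library's C implementation. For joint sampling, the same steps would apply to the stacked \(n \times n\) joint covariance.

```python
import numpy as np


def rbf_kernel(x1, x2, ell=1.0, sigma=1.0):
    """Squared-exponential covariance (an assumption for this sketch)."""
    d = x1[:, None] - x2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)


def sample_prior_sketch(x, n_samples, ell=1.0, jitter=1e-6, seed=0):
    """Draw zero-mean GP prior samples on grid x; returns an (n, n_samples) array."""
    n = x.size
    K = rbf_kernel(x, x, ell) + jitter * np.eye(n)  # n x n covariance plus jitter
    L = np.linalg.cholesky(K)                       # O(n^3): factorize once
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, n_samples))         # independent standard normals
    return L @ z                                    # each column: L times one normal vector


x = np.linspace(0, 5, 200)
samples = sample_prior_sketch(x, n_samples=10)
print(samples.shape)  # (200, 10)
```

The factorization is computed once per call, while the final product costs one matrix-vector multiplication with `L` per sample.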
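Posterior Sampling
Posterior sampling adds the following operations:
- Conditioning: \(O(m^3)\), where \(m\) is the number of observations
- Cross-covariance: \(O(m \cdot n)\)
- Predictive mean: \(O(m \cdot n)\)
When \(m \ll n\), these extra costs are small compared with the \(O(n^3)\) decomposition, so posterior sampling costs about the same as prior sampling.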
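As a concrete illustration of these extra terms, here is a minimal NumPy sketch of conditioning a GP on \(m\) noisy observations. It assumes a zero-mean GP, a squared-exponential kernel, and Gaussian observation noise; `posterior_mean_sketch`, the `noise` value, and the `sin` data are hypothetical. It covers only the posterior mean; drawing posterior samples would also need the posterior covariance.

```python
import numpy as np


def rbf_kernel(x1, x2, ell=1.0):
    """Squared-exponential covariance (assumed for this sketch)."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)


def posterior_mean_sketch(x_train, y_train, x_grid, ell=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP on x_grid given noisy observations."""
    m = x_train.size
    K_mm = rbf_kernel(x_train, x_train, ell) + noise * np.eye(m)
    L = np.linalg.cholesky(K_mm)                    # conditioning: O(m^3)
    # alpha = K_mm^{-1} y, using the Cholesky factor (two solves)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_nm = rbf_kernel(x_grid, x_train, ell)         # cross-covariance: O(m * n)
    return K_nm @ alpha                             # predictive mean: O(m * n)


x_train = np.linspace(0, 5, 20)     # m = 20 observations
y_train = np.sin(x_train)           # hypothetical data
x_grid = np.linspace(0, 5, 500)     # n = 500 grid points
mean = posterior_mean_sketch(x_train, y_train, x_grid)
```

Only the \(m \times m\) matrix is factorized here; the grid enters through the \(n \times m\) cross-covariance.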
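Optimization Tips
1. Use Coarser Grids
The integral is smoother than the function itself, so its grid can use fewer points:
```python
import numpy as np

spec = SamplingSpec(
    x_f=np.linspace(0, 5, 200),  # Fine grid for the function samples
    x_g=np.linspace(0, 5, 50),   # Coarse grid for the smooth integral
)
```
Compared with using 200 points for both grids, this reduces the computational cost by about 75% while preserving accuracy.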
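The 75% figure follows from the \(O(n^3)\) decomposition cost, assuming the baseline uses 200 points for both grids (so the total \(n\) drops from 400 to 250) and ignoring constants and lower-order terms:

```latex
\frac{(200 + 50)^3}{(200 + 200)^3} = \left(\frac{250}{400}\right)^3 = 0.625^3 \approx 0.244
```

The decomposition therefore costs about 24% of the baseline, a reduction of roughly 75%.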
2. Sample Only What You Need
Skip quantities you do not need. Each quantity you include adds its grid points to the joint covariance matrix, and a smaller covariance matrix means a faster decomposition.
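A quick way to estimate the saving is to compare the cubic decomposition cost with and without a quantity. The sketch below assumes, purely for illustration, that f, g and h each use a 200-point grid; `relative_cost` is a hypothetical helper that ignores constants and lower-order terms.

```python
def relative_cost(n_reduced, n_full):
    """Ratio of O(n^3) decomposition costs, ignoring constants and lower-order terms."""
    return (n_reduced / n_full) ** 3


n_f = n_g = n_h = 200        # hypothetical grid sizes
n_full = n_f + n_g + n_h     # joint sampling of f, g, h: n = 600
n_without_h = n_f + n_g      # h skipped: n = 400

ratio = relative_cost(n_without_h, n_full)
print(f"{ratio:.3f}")  # 0.296
```

Under these assumptions, dropping h leaves about 30% of the original decomposition cost.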
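3. Batch Sampling
Generate multiple samples in one call:
```python
# Efficient: one call, so the covariance is decomposed once for all 100 samples
result = sample_prior(spec, n_samples=100)

# Inefficient: 100 calls, each repeating the decomposition
samples = [sample_prior(spec, n_samples=1) for _ in range(100)]
```
Batching amortizes the \(O(n^3)\) decomposition over all samples instead of repeating it on every call.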
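The difference comes from how many times the covariance is factorized. The NumPy sketch below is a hypothetical stand-in for a sampling call, assuming a Cholesky factorization and an RBF covariance; `factorize`, `sample_batch`, and the `factorizations` counter are illustrative names, not this library's API. It counts factorizations for both patterns.

```python
import numpy as np

factorizations = 0  # counts Cholesky factorizations performed


def factorize(K):
    """Cholesky-factorize K and record that a factorization happened."""
    global factorizations
    factorizations += 1
    return np.linalg.cholesky(K)


def sample_batch(K, n_samples, rng):
    """Hypothetical stand-in for one sampling call: factorize once, then draw n_samples."""
    L = factorize(K)  # O(n^3), paid once per call
    return L @ rng.standard_normal((K.shape[0], n_samples))


# Assumed RBF covariance (ell = 1) on a 200-point grid, with a small jitter
x = np.linspace(0, 5, 200)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 1e-6 * np.eye(x.size)
rng = np.random.default_rng(0)

batched = sample_batch(K, 100, rng)  # one call for 100 samples
batched_count = factorizations

factorizations = 0
looped = [sample_batch(K, 1, rng) for _ in range(100)]  # 100 calls, 1 sample each
looped_count = factorizations

print(batched_count, looped_count)  # 1 100
```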
4. Larger Length Scales
Small length scales require finer grids for accuracy, while larger length scales work with coarser grids:
```python
import numpy as np

# Small ell needs a dense grid
ell_small = 0.1
x_dense = np.linspace(0, 5, 500)

# Large ell works with a coarse grid
ell_large = 2.0
x_coarse = np.linspace(0, 5, 100)
```
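One way to see why a large `ell` tolerates a coarse grid: with a smooth kernel, the covariance matrix on a fine grid has only a few significant eigenvalues, so extra points add little information. The sketch below assumes a squared-exponential kernel; `effective_rank` and the `1e-10` relative threshold are illustrative choices, not part of this library.

```python
import numpy as np


def effective_rank(x, ell, rel_tol=1e-10):
    """Count eigenvalues of an RBF covariance above rel_tol times the largest one."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)
    eigvals = np.linalg.eigvalsh(K)  # symmetric matrix, eigenvalues in ascending order
    return int(np.sum(eigvals > rel_tol * eigvals[-1]))


x_dense = np.linspace(0, 5, 500)
rank_small = effective_rank(x_dense, ell=0.1)
rank_large = effective_rank(x_dense, ell=2.0)
print(rank_small, rank_large)
```

Under this assumed kernel, `rank_large` comes out far smaller than `rank_small`, which is consistent with using 100 points for `ell = 2.0` but 500 for `ell = 0.1`.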
5. Sparse Observations
For posterior sampling, use fewer observations where the data allow:
```python
import numpy as np

# Instead of 100 observations
x_train_dense = np.linspace(0, 5, 100)

# Use 20 well-placed observations
x_train_sparse = np.linspace(0, 5, 20)
```
This reduces the conditioning cost from \(O(100^3)\) to \(O(20^3)\), a factor of \((100/20)^3 = 125\).
Memory Usage
Memory scales as \(O(n^2)\) for the covariance matrix.
Approximate memory for one dense \(n \times n\) matrix of 8-byte (double-precision) entries:
| Grid Size | Matrix Size | Memory (MB) |
|---|---|---|
| 100 | 100×100 | 0.08 |
| 500 | 500×500 | 2 |
| 1000 | 1000×1000 | 8 |
| 5000 | 5000×5000 | 200 |
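The table values follow from \(n^2\) entries of 8 bytes each, in decimal megabytes; for joint sampling, \(n\) is the total \(n_f + n_g + n_h\). A minimal helper to estimate the footprint for other grid sizes is sketched below. `covariance_memory_mb` is a hypothetical name, and actual peak usage can be higher if an implementation keeps additional copies (for example, both the matrix and its Cholesky factor).

```python
def covariance_memory_mb(n, bytes_per_entry=8):
    """Memory for one dense n x n matrix, in decimal megabytes (1 MB = 10**6 bytes)."""
    return n * n * bytes_per_entry / 1e6


for n in (100, 500, 1000, 5000):
    print(n, covariance_memory_mb(n))
```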
For large grids (\(n > 1000\)), consider:
- Using sparse/low-rank approximations
- Splitting into smaller regions
- Increasing `ell` to reduce the required resolution
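As one example of a low-rank approximation, a Nyström approximation replaces the full \(n \times n\) factorization with one on \(r\) inducing points, costing roughly \(O(n r^2 + r^3)\) time and \(O(n r)\) memory. This is a swapped-in technique for illustration, not something this library is documented to provide; the RBF kernel, `nystrom_factor`, the 30 inducing points, and the jitter are all assumptions.

```python
import numpy as np


def rbf(x1, x2, ell):
    """Squared-exponential covariance (assumed)."""
    return np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / ell) ** 2)


def nystrom_factor(x, x_inducing, ell, jitter=1e-8):
    """Return F (n x r) such that K is approximated by F @ F.T (Nystrom)."""
    K_rr = rbf(x_inducing, x_inducing, ell) + jitter * np.eye(x_inducing.size)
    L_rr = np.linalg.cholesky(K_rr)        # O(r^3) instead of O(n^3)
    K_nr = rbf(x, x_inducing, ell)         # n x r cross-covariance: O(n * r) memory
    # F = K_nr @ L_rr^{-T}, computed with a solve instead of an explicit inverse
    return np.linalg.solve(L_rr, K_nr.T).T  # O(n * r^2)


x = np.linspace(0, 5, 2000)
x_inducing = np.linspace(0, 5, 30)
F = nystrom_factor(x, x_inducing, ell=1.0)

rng = np.random.default_rng(0)
samples = F @ rng.standard_normal((F.shape[1], 5))  # 5 approximate prior samples
```

The approximation degrades when the inducing points are sparse relative to `ell`, which is another reason larger length scales are cheaper.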
Parallelization
The C implementation calls BLAS routines through GSL. These routines may run in parallel automatically if GSL is linked against a multithreaded BLAS (e.g., OpenBLAS, MKL).
Check which BLAS backend your build is linked against.
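Profiling
Profile your code to identify bottlenecks:
```python
import cProfile
import pstats

# Profile one batched call (spec and sample_prior as above;
# cProfile.Profile as a context manager requires Python 3.8+)
with cProfile.Profile() as pr:
    result = sample_prior(spec, n_samples=100)

# Show the 10 entries with the highest cumulative time
stats = pstats.Stats(pr)
stats.sort_stats('cumtime')
stats.print_stats(10)
```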
Next Steps
- Troubleshooting - Numerical issues
- Testing - Validation and tests