Metropolis-Hastings Algorithm

Metropolis-Hastings is a Markov Chain Monte Carlo (MCMC) algorithm for sampling from posterior distributions. It enables uncertainty quantification by generating samples that characterise the probability distribution over parameters.

Algorithm Overview

The algorithm constructs a Markov chain whose stationary distribution is the target posterior. By proposing random moves and accepting or rejecting them based on the ratio of posterior densities at the proposed and current states, it explores the parameter space in proportion to posterior probability.

Key Properties

Property         Value
Type             MCMC sampler
Parallelisable   Yes (across chains)
Output           Posterior samples
Best for         Uncertainty quantification

Mathematical Foundation

Given a target distribution \(\pi(\theta)\) (the posterior), the algorithm:

  1. Proposes a new state from proposal distribution \(q(\theta' | \theta)\)
  2. Computes acceptance probability:
\[ \alpha = \min\left(1, \frac{\pi(\theta') q(\theta | \theta')}{\pi(\theta) q(\theta' | \theta)}\right) \]

For the symmetric random walk proposal used here (\(q(\theta' | \theta) = q(\theta | \theta')\)):

\[ \alpha = \min\left(1, \frac{\pi(\theta')}{\pi(\theta)}\right) = \min\left(1, \exp(\log\pi(\theta') - \log\pi(\theta))\right) \]
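
For example, if \(\log\pi(\theta') = -4.2\) and \(\log\pi(\theta_t) = -3.5\), the proposal is worse by 0.7 in log density, so \(\alpha = \min(1, e^{-0.7}) \approx 0.50\): the chain still moves to the worse point about half the time, which is what lets it explore the full distribution rather than simply climb to the mode.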

Algorithm Steps

  1. Initialise: Start at initial point \(\theta_0\)
  2. Propose: Generate candidate \(\theta' = \theta_t + \sigma\,\epsilon\), where \(\epsilon \sim \mathcal{N}(0, I)\)
  3. Evaluate: Compute the log target density \(\log\pi(\theta')\); with the required negative log-likelihood cost metric this is the log-likelihood \(\log\mathcal{L}(\theta')\), i.e. a uniform prior is implied
  4. Accept/Reject:
    • Draw \(u \sim \text{Uniform}(0, 1)\)
    • If \(u < \exp(\log\pi(\theta') - \log\pi(\theta_t))\): accept \(\theta_{t+1} = \theta'\)
    • Else: reject, keeping \(\theta_{t+1} = \theta_t\)
  5. Repeat: For specified number of iterations
  6. Return: Chain of samples \(\{\theta_0, \theta_1, \ldots, \theta_T\}\)
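
These steps translate directly into a short reference implementation. The following is an illustrative NumPy sketch of the same symmetric random-walk scheme, not diffid's internals; log_target stands for any function returning \(\log\pi(\theta)\):

import numpy as np

def metropolis_hastings(log_target, theta0, iterations=1000,
                        step_size=0.1, seed=None):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    log_p = log_target(theta)
    chain = [theta.copy()]
    accepted = 0
    for _ in range(iterations):
        proposal = theta + step_size * rng.standard_normal(theta.shape)  # step 2
        log_p_new = log_target(proposal)                                 # step 3
        # Step 4: comparing log(u) with the log ratio avoids overflow in exp()
        if np.log(rng.uniform()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
            accepted += 1
        chain.append(theta.copy())   # a rejected move repeats the current state
    return np.array(chain), accepted / iterations

Sampling a standard 2-D Gaussian, for instance, metropolis_hastings(lambda t: -0.5 * np.sum(t**2), [0.0, 0.0], step_size=1.0, seed=42) returns the chain together with its acceptance rate.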

Parameters

Parameter    Default   Description
iterations   1000      MCMC iterations per chain
num_chains   1         Number of independent chains
step_size    0.1       Proposal distribution width \(\sigma\)
seed         None      Random seed for reproducibility

Tuning Guidance

Step size controls exploration efficiency:

  • Target acceptance rate: 20-50% (optimal ≈23% for high-dimensional problems)
  • Too large: most proposals are rejected, the acceptance rate drops, and the chain gets stuck
  • Too small: nearly every proposal is accepted, but the chain takes tiny steps and mixes slowly
  • Tune by monitoring the acceptance ratio, as in the sketch below
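
A simple tuning recipe runs short pilot chains and adjusts \(\sigma\) multiplicatively toward the target acceptance rate. This is a sketch built on the metropolis_hastings function from the Algorithm Steps section above, not a diffid feature:

import numpy as np

def tune_step_size(log_target, theta0, step_size=0.1,
                   target=0.3, rounds=10, pilot_iters=200):
    for _ in range(rounds):
        chain, rate = metropolis_hastings(log_target, theta0,
                                          iterations=pilot_iters,
                                          step_size=step_size)
        # Acceptance above target means steps are too timid: enlarge sigma.
        # Acceptance below target means steps are too bold: shrink it.
        step_size *= np.exp(rate - target)
        theta0 = chain[-1]   # resume the next pilot from the last state
    return step_size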

Number of chains aids convergence diagnostics:

  • Multiple chains from different starting points detect convergence issues
  • Enables Gelman-Rubin \(\hat{R}\) diagnostic
  • More chains give more robust inference at higher computational cost

Iterations should be sufficient for:

  • A burn-in period (initial samples that are discarded)
  • An adequate effective sample size for reliable estimates

Convergence Diagnostics

After sampling, assess convergence:

  • Trace plots: Visual inspection for stationarity
  • Acceptance rate: Should be 20-50%
  • Autocorrelation: Lower is better; high autocorrelation means inefficient sampling
  • \(\hat{R}\) statistic: Should be < 1.1 for all parameters (requires multiple chains)
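
The \(\hat{R}\) statistic compares between-chain and within-chain variance; values near 1 indicate the chains agree. Below is a minimal sketch of the classic (non-split) Gelman-Rubin computation, written against a plain NumPy array rather than any diffid API:

import numpy as np

def gelman_rubin(chains):
    # chains: array of shape (n_chains, n_iters, n_params), n_chains >= 2
    m, n, _ = chains.shape
    chain_means = chains.mean(axis=1)
    grand_mean = chain_means.mean(axis=0)
    B = n / (m - 1) * ((chain_means - grand_mean) ** 2).sum(axis=0)  # between-chain
    W = chains.var(axis=1, ddof=1).mean(axis=0)                      # within-chain
    var_hat = (n - 1) / n * W + B / n   # pooled posterior variance estimate
    return np.sqrt(var_hat / W)         # R-hat per parameter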

When to Use

Strengths:

  • Full posterior characterisation
  • Uncertainty quantification (credible intervals)
  • Parameter correlation analysis
  • Simple and robust

Limitations:

  • Many function evaluations required
  • Can be slow for high-dimensional problems
  • Requires careful tuning for efficiency
  • Does not compute model evidence (use nested sampling)

Cost Metric Requirement

Metropolis-Hastings requires a negative log-likelihood cost metric:

problem = (
    diffid.ScalarProblemBuilder()
    .with_objective(model_fn)
    .with_cost_metric(diffid.CostMetric.GaussianNLL)  # Required
    .build()
)
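
For context, a Gaussian negative log-likelihood over \(N\) independent observations \(y_i\), with model prediction \(f(x_i; \theta)\) and noise scale \(\sigma_n\), takes the standard form below; the exact convention used by diffid.CostMetric.GaussianNLL (e.g. how \(\sigma_n\) is supplied) is an assumption here:

\[ -\log\mathcal{L}(\theta) = \frac{1}{2}\sum_{i=1}^{N}\frac{\bigl(y_i - f(x_i; \theta)\bigr)^2}{\sigma_n^2} + \frac{N}{2}\log\bigl(2\pi\sigma_n^2\bigr) \]

For fixed \(\sigma_n\) the additive constant cancels in the acceptance ratio, so only the weighted residual sum of squares determines which proposals are accepted.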

Example

import diffid

sampler = (
    diffid.MetropolisHastings()
    .with_iterations(5000)
    .with_num_chains(4)
    .with_step_size(0.05)
    .with_seed(42)
)

result = sampler.run(problem, initial_guess=[1.0, 2.0])

# Access samples
samples = result.samples  # Shape: (n_chains * iterations, n_params)
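
Typical post-processing discards a burn-in segment from each chain before summarising. The sketch below assumes the samples array is ordered chain by chain (an assumption about the layout) and is NumPy-compatible; the burn-in fraction is an illustrative choice:

import numpy as np

# Assumes chain-major ordering: 4 chains of 5000 iterations, as configured above
chains = np.asarray(samples).reshape(4, 5000, -1)
posterior = chains[:, 1250:, :]                 # drop the first 25% of each chain
flat = posterior.reshape(-1, posterior.shape[-1])

means = flat.mean(axis=0)                       # posterior mean per parameter
ci = np.percentile(flat, [2.5, 97.5], axis=0)   # 95% credible intervals

The same (n_chains, n_iters, n_params) array can be passed to a Gelman-Rubin computation such as the sketch in the Convergence Diagnostics section.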

References

  1. Metropolis, N. et al. (1953). "Equation of State Calculations by Fast Computing Machines". Journal of Chemical Physics, 21(6), 1087-1092.
  2. Hastings, W.K. (1970). "Monte Carlo Sampling Methods Using Markov Chains and Their Applications". Biometrika, 57(1), 97-109.
  3. Gelman, A. et al. (2013). Bayesian Data Analysis. 3rd ed. Chapman & Hall/CRC.

See Also