# CMA-ES Algorithm
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a stochastic, derivative-free algorithm for global optimisation. It maintains a multivariate Gaussian distribution that adapts its covariance structure to the objective landscape.
## Algorithm Overview
CMA-ES samples a population of candidate solutions from a multivariate normal distribution, ranks them by fitness, and updates the distribution parameters to bias future samples towards better regions.
### Key Properties
| Property | Value |
|---|---|
| Type | Global, gradient-free |
| Parallelisable | Yes (population evaluation) |
| Function evaluations | \(\lambda\) per generation |
| Typical problem size | 10-100+ parameters |
## Algorithm
The algorithm samples offspring from:
\[ x_k \sim m + \sigma \cdot \mathcal{N}(0, C) \]
Where:
- \(m\) is the distribution mean (current best estimate)
- \(\sigma\) is the global step size
- \(C\) is the covariance matrix
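As a concrete illustration, the sampling equation can be written in a few lines of NumPy by factoring \(C\) through its eigendecomposition. This is a minimal sketch; the helper name `sample_offspring` is an assumption for illustration, not part of this library's API.

```python
import numpy as np

def sample_offspring(m, sigma, C, lam, rng):
    """Draw `lam` offspring from N(m, sigma^2 * C)."""
    eigvals, B = np.linalg.eigh(C)            # C is symmetric positive definite
    A = B @ np.diag(np.sqrt(eigvals))         # A @ A.T == C
    z = rng.standard_normal((lam, len(m)))    # z_k ~ N(0, I)
    return m + sigma * z @ A.T                # x_k = m + sigma * A @ z_k

rng = np.random.default_rng(0)
X = sample_offspring(np.zeros(3), 0.5, np.eye(3), lam=7, rng=rng)
```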
### Algorithm Steps
1. Sample: generate \(\lambda\) offspring from \(\mathcal{N}(m, \sigma^2 C)\)
2. Evaluate: compute fitness for all offspring (parallelisable)
3. Select: rank offspring by fitness; keep the best \(\mu\) as parents
4. Update mean: \(m \leftarrow \sum_{i=1}^{\mu} w_i x_{i:\lambda}\)
5. Update evolution paths:
    - Conjugate path: \(p_\sigma \leftarrow (1-c_\sigma) p_\sigma + \sqrt{c_\sigma(2-c_\sigma)\mu_\text{eff}} \cdot C^{-1/2}(m - m_\text{old})/\sigma\)
    - Covariance path: \(p_c \leftarrow (1-c_c) p_c + \sqrt{c_c(2-c_c)\mu_\text{eff}} \cdot (m - m_\text{old})/\sigma\)
6. Update covariance: rank-one plus rank-\(\mu\) update of \(C\)
7. Update step size: rescale \(\sigma\) according to how \(\|p_\sigma\|\) compares with its expected length under random selection
8. Repeat: until a convergence criterion is met
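To make steps 1-4 concrete, here is a minimal NumPy sketch of a single generation for a minimisation problem. The log-rank recombination weights and the name `one_generation` are illustrative assumptions, and the path, covariance, and step-size updates (steps 5-7) are deliberately elided.

```python
import numpy as np

def one_generation(f, m, sigma, C, lam, rng):
    """Steps 1-4 of one CMA-ES generation (minimisation).
    Steps 5-7 (path, covariance, step-size updates) are elided."""
    n = len(m)
    mu = lam // 2
    # Log-rank recombination weights, normalised to sum to 1 (an assumed choice)
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    # Step 1: sample lam offspring from N(m, sigma^2 C)
    eigvals, B = np.linalg.eigh(C)
    A = B @ np.diag(np.sqrt(eigvals))
    X = m + sigma * rng.standard_normal((lam, n)) @ A.T
    # Steps 2-3: evaluate, rank, and keep the best mu as parents
    order = np.argsort([f(x) for x in X])
    parents = X[order[:mu]]
    # Step 4: weighted recombination m <- sum_i w_i x_{i:lambda}
    return w @ parents

rng = np.random.default_rng(0)
sphere = lambda x: float(np.dot(x, x))
m = one_generation(sphere, np.ones(4), 0.5, np.eye(4), lam=8, rng=rng)
```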
### Strategy Parameters
The algorithm automatically computes these from dimension \(n\):
| Parameter | Formula | Purpose |
|---|---|---|
| \(\lambda\) | \(\max(4, \lfloor 4 + 3\ln(n) \rfloor)\) | Population size |
| \(\mu\) | \(\lfloor \lambda / 2 \rfloor\) | Parent count |
| \(\mu_\text{eff}\) | \(1 / \sum w_i^2\) | Effective parent count |
| \(c_\sigma\) | \((\mu_\text{eff} + 2) / (n + \mu_\text{eff} + 5)\) | Step size learning rate |
| \(c_c\) | \((4 + \mu_\text{eff}/n) / (n + 4 + 2\mu_\text{eff}/n)\) | Covariance path learning rate |
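These formulas map directly onto code. A minimal sketch, assuming log-rank recombination weights \(w_i\) (the table states only \(\mu_\text{eff} = 1/\sum w_i^2\), so the weight choice here is an assumption):

```python
import numpy as np

def default_strategy_params(n):
    """Compute the default strategy parameters from the table above."""
    lam = max(4, int(np.floor(4 + 3 * np.log(n))))     # population size
    mu = lam // 2                                      # parent count
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                       # assumed log-rank weights
    mu_eff = 1.0 / np.sum(w ** 2)                      # effective parent count
    c_sigma = (mu_eff + 2) / (n + mu_eff + 5)          # step size learning rate
    c_c = (4 + mu_eff / n) / (n + 4 + 2 * mu_eff / n)  # covariance path rate
    return lam, mu, w, mu_eff, c_sigma, c_c
```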
## Parameters
| Parameter | Default | Description |
|---|---|---|
| `max_iter` | 1000 | Maximum generations |
| `threshold` | 1e-6 | Objective convergence tolerance |
| `step_size` | 0.5 | Initial search radius \(\sigma\) |
| `population_size` | Auto | Offspring per generation \(\lambda\) |
| `seed` | None | Random seed for reproducibility |
| `patience` | None | Wall-clock timeout in seconds |
## Convergence Criteria
The algorithm terminates when any condition is met:
- Iteration limit: `generation >= max_iter`
- Function tolerance: best objective value falls below `threshold`
- Patience: elapsed wall-clock time exceeds `patience` seconds
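A sketch of the combined termination test, using the defaults from the Parameters table (the function name and signature are illustrative, not this library's API):

```python
import time

def should_stop(generation, best_value, start_time,
                max_iter=1000, threshold=1e-6, patience=None):
    """Return True once any termination criterion above is met."""
    if generation >= max_iter:          # iteration limit
        return True
    if best_value < threshold:          # function tolerance
        return True
    if patience is not None and time.monotonic() - start_time >= patience:
        return True                     # patience: wall-clock budget exhausted
    return False
```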
## Tuning Guidance
Step size (\(\sigma\)) is the most important parameter:
- Set to ~⅓ of the expected distance to the optimum
- Too large: slow convergence, overshooting
- Too small: premature convergence, stuck in local optima
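For instance, given an initial guess and a rough idea of where the optimum lies (all numbers below are hypothetical):

```python
import numpy as np

x0 = np.array([0.5, 0.5, 0.5])           # initial guess (hypothetical)
x_expected = np.array([0.9, 0.1, 0.4])   # rough guess at the optimum (hypothetical)

# sigma ~ one third of the expected distance to the optimum
sigma0 = np.linalg.norm(x_expected - x0) / 3
```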
Population size affects exploration vs exploitation:
- Default formula works well for most problems
- Increase for highly multi-modal landscapes
- Decrease for faster convergence on simpler problems
- Match to available parallel compute resources
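One common pattern for the last point, sketched here with a hypothetical worker count, is to take the default \(\lambda\) and round it up so every parallel worker is kept busy:

```python
import numpy as np

n, n_workers = 3, 16   # problem dimension and available workers (hypothetical)

# Default population size, rounded up to fill all parallel workers
lam = max(4, int(np.floor(4 + 3 * np.log(n))), n_workers)
```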
## When to Use
Strengths:
- Global search capability
- Scales well to high dimensions (10-100+ parameters)
- Self-adapting covariance learns problem structure
- Parallelisable population evaluation
- Robust to local minima
Limitations:
- More evaluations than gradient methods
- Stochastic results (use seed for reproducibility)
- Memory scales as \(O(n^2)\) for covariance matrix
- Overkill for simple, low-dimensional problems
## Example
```python
import diffid

# `problem` and `n_params` are assumed to be defined elsewhere
optimiser = (
    diffid.CMAES()
    .with_max_iter(500)
    .with_step_size(0.3)
    .with_population_size(20)
    .with_seed(42)
)
result = optimiser.run(problem, initial_guess=[0.5] * n_params)
```
## Implementation Notes
- Full covariance matrix (not diagonal approximation)
- Lazy eigendecomposition for efficient sampling
- Bounds enforced via clamping after each sample
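The bound handling amounts to a per-coordinate clamp on each sample. A minimal NumPy illustration with hypothetical bounds:

```python
import numpy as np

lower, upper = np.zeros(3), np.ones(3)   # hypothetical box bounds
x = np.array([1.2, -0.1, 0.5])           # freshly sampled offspring
x_feasible = np.clip(x, lower, upper)    # -> [1.0, 0.0, 0.5]
```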
## References
- Hansen, N. (2016). "The CMA Evolution Strategy: A Tutorial". arXiv:1604.00772.
- Hansen, N. and Ostermeier, A. (2001). "Completely Derandomized Self-Adaptation in Evolution Strategies". Evolutionary Computation, 9(2), 159-195.
- Hansen, N. et al. (2003). "Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES)". Evolutionary Computation, 11(1), 1-18.