Advanced Cost Functions¶
Learning Objectives:
- Understand built-in cost metrics (SSE, RMSE, GaussianNLL)
- Implement custom cost functions
- Apply weighted fitting for heteroscedastic data
- Use regularisation to prevent over-fitting
- Combine multiple objectives
Prerequisites: Optimisation Basics, ODE Fitting, basic statistics
Runtime: ~15 minutes
Introduction¶
The cost function (or loss function) quantifies how well model predictions match observations. Diffid provides built-in metrics, but real-world problems often require:
- Weighted fitting when measurement errors vary
- Custom metrics for domain-specific requirements
- Regularisation to prevent over-fitting
- Multi-objective combinations
This tutorial demonstrates advanced cost function techniques.
# Imports and random seed for reproducibility
from functools import partial
import diffid
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
Built-in Cost Metrics¶
Diffid provides three standard metrics:
1. Sum of Squared Errors (SSE)¶
$$ \text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
- Default metric
- Penalises large errors heavily
- Assumes constant variance
2. Root Mean Squared Error (RMSE)¶
$$ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} $$
- Normalised by sample size
- Same units as data
- Better for comparing across datasets
3. Gaussian Negative Log-Likelihood (GaussianNLL)¶
$$ \text{NLL} = \frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
- Statistically principled
- Enables Bayesian inference
- Can estimate noise parameter $\sigma$
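To make the scale differences concrete, here is a minimal numpy sketch of the three formulas above (illustrative only; Diffid's built-in implementations may differ in detail):
# Minimal numpy sketch of the three metrics (not Diffid's internals)
def sse(observed, predicted):
    return np.sum((observed - predicted) ** 2)

def rmse(observed, predicted):
    return np.sqrt(np.mean((observed - predicted) ** 2))

def gaussian_nll(observed, predicted, sigma=1.0):
    # Gaussian negative log-likelihood with fixed noise scale sigma
    n = len(observed)
    return n / 2 * np.log(2 * np.pi * sigma**2) + sse(observed, predicted) / (2 * sigma**2)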
# Generate simple test data
x_data = np.linspace(0, 10, 50)
y_true = 2.5 * x_data + 1.0
y_observed = y_true + np.random.normal(0, 2.0, len(x_data))
plt.figure(figsize=(10, 6))
plt.plot(x_data, y_observed, "o", label="Observed", alpha=0.6, markersize=6)
plt.plot(x_data, y_true, "--", label="True", linewidth=2)
plt.xlabel("x", fontsize=12)
plt.ylabel("y", fontsize=12)
plt.title("Linear Model Test Data", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Compare built-in metrics
def linear_model(params):
"""Simple linear model: y = slope * x + intercept"""
slope, intercept = params
return slope * x_data + intercept
# Define problem
builder = (
diffid.VectorBuilder()
.with_objective(linear_model)
.with_data(y_observed)
.with_parameter("slope", 1.0)
.with_parameter("intercept", 0.0)
)
# Test each metric
metrics = {
"SSE": diffid.SSE(),
"RMSE": diffid.RMSE(),
"GaussianNLL": diffid.GaussianNLL(),
}
results = {}
for name, metric in metrics.items():
result = builder.with_cost(metric).build().optimise()
results[name] = result
# Display comparison
print("\n" + "=" * 70)
print("COST METRIC COMPARISON")
print("=" * 70)
print("True parameters: [2.5, 1.0]")
print("-" * 70)
print(f"{'Metric':<15} {'Slope':<12} {'Intercept':<12} {'Cost Value'}")
print("-" * 70)
for name, result in results.items():
print(f"{name:<15} {result.x[0]:<12.4f} {result.x[1]:<12.4f} {result.value:<12.3e}")
print("\n💡 Note: All metrics produce similar parameter estimates!")
print(" The cost values differ in scale, but optima are equivalent.")
======================================================================
COST METRIC COMPARISON
======================================================================
True parameters: [2.5, 1.0]
----------------------------------------------------------------------
Metric          Slope        Intercept    Cost Value
----------------------------------------------------------------------
SSE             2.3840       1.1288       1.650e+02
RMSE            2.3840       1.1288       1.668e+02
GaussianNLL     2.3840       1.1288       2.953e+02

💡 Note: All metrics produce similar parameter estimates!
   The cost values differ in scale, but optima are equivalent.
Weighted Fitting: Heteroscedastic Data¶
Real data often has non-uniform measurement errors (heteroscedasticity). Points with larger errors should contribute less to the cost function.
Weighted Least Squares¶
$$\text{WSSE} = \sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2$$
where $w_i = 1/\sigma_i^2$ (inverse variance weighting).
# Generate heteroscedastic data (error increases with x)
x_hetero = np.linspace(0, 10, 50)
y_true_hetero = 2.0 * x_hetero + 3.0
# Error increases linearly with x
error_std = 0.5 + 0.3 * x_hetero
y_hetero = y_true_hetero + np.random.normal(0, error_std)
# Visualize heteroscedastic data
fig, ax = plt.subplots(figsize=(10, 6))
# Error bars showing measurement uncertainty
ax.errorbar(
x_hetero,
y_hetero,
yerr=error_std,
fmt="o",
alpha=0.6,
label="Observed (with error bars)",
capsize=3,
markersize=6,
)
ax.plot(x_hetero, y_true_hetero, "r--", linewidth=2, label="True")
ax.set_xlabel("x", fontsize=12)
ax.set_ylabel("y", fontsize=12)
ax.set_title(
"Heteroscedastic Data (Error Increases with x)", fontsize=14, fontweight="bold"
)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print(f"Error range: [{error_std.min():.2f}, {error_std.max():.2f}]")
print("Notice: Uncertainty increases from left to right!")
Error range: [0.50, 3.50]
Notice: Uncertainty increases from left to right!
# Custom weighted SSE cost function
class WeightedSSE:
"""Weighted sum of squared errors."""
def __init__(self, weights):
"""
Parameters
----------
weights : array_like
Weight for each data point (typically 1/sigma^2)
"""
self.weights = np.asarray(weights)
def __call__(self, predicted, observed):
"""Compute weighted SSE."""
residuals = observed - predicted
return np.sum(self.weights * residuals**2)
# Weights: inverse variance (1/sigma^2)
weights = 1.0 / error_std**2
def linear_model_hetero(params):
slope, intercept = params
return slope * x_hetero + intercept
# Construct custom cost
weighted_sse = WeightedSSE(weights)
# Fit WITHOUT weights (standard SSE)
result_unweighted = (
diffid.VectorBuilder()
.with_objective(linear_model_hetero)
.with_data(y_hetero)
.with_parameter("slope", 1.0)
.with_parameter("intercept", 0.0)
.with_cost(diffid.SSE()) # Standard SSE
.build()
.optimise()
)
# Fit with weights
result_weighted = (
diffid.ScalarBuilder()
.with_objective(lambda x: weighted_sse(linear_model_hetero(x), y_hetero))
.with_parameter("slope", 1.0)
.with_parameter("intercept", 0.0)
.build()
.optimise()
)
print("\n" + "=" * 70)
print("WEIGHTED vs UNWEIGHTED FITTING")
print("=" * 70)
print("True parameters: [2.0, 3.0]")
print(f"Unweighted fit: {result_unweighted.x}")
print(f"Weighted fit: {result_weighted.x}")
print("-" * 70)
print(f"Unweighted error: {np.linalg.norm(result_unweighted.x - [2.0, 3.0]):.4f}")
print(f"Weighted error: {np.linalg.norm(result_weighted.x - [2.0, 3.0]):.4f}")
print("\n💡 Weighted fit is more accurate!")
print(" It down-weights noisy (high-x) points appropriately.")
======================================================================
WEIGHTED vs UNWEIGHTED FITTING
======================================================================
True parameters: [2.0, 3.0]
Unweighted fit: [1.93530391 3.27981493]
Weighted fit: [1.97635486 3.10714606]
----------------------------------------------------------------------
Unweighted error: 0.2872
Weighted error:   0.1097

💡 Weighted fit is more accurate!
   It down-weights noisy (high-x) points appropriately.
# Visualize comparison
x_plot = np.linspace(0, 10, 200)
y_true_plot = 2.0 * x_plot + 3.0
y_unweighted = result_unweighted.x[0] * x_plot + result_unweighted.x[1]
y_weighted = result_weighted.x[0] * x_plot + result_weighted.x[1]
fig, ax = plt.subplots(figsize=(12, 7))
# Data with error bars
ax.errorbar(
x_hetero,
y_hetero,
yerr=error_std,
fmt="o",
alpha=0.5,
label="Data (with uncertainties)",
capsize=3,
markersize=6,
color="gray",
)
# Fits
ax.plot(x_plot, y_true_plot, "k--", linewidth=3, label="True", alpha=0.8)
ax.plot(x_plot, y_unweighted, linewidth=2.5, label="Unweighted fit", color="#ff7f0e")
ax.plot(x_plot, y_weighted, linewidth=2.5, label="Weighted fit", color="#2ca02c")
ax.set_xlabel("x", fontsize=13)
ax.set_ylabel("y", fontsize=13)
ax.set_title("Weighted vs Unweighted Fitting", fontsize=15, fontweight="bold")
ax.legend(fontsize=12, loc="upper left")
ax.grid(True, alpha=0.3)
# Highlight the difference in high-error region
ax.axvspan(7, 10, alpha=0.1, color="red")  # labelled via the text annotation below
ax.text(
8.5,
5,
"High uncertainty\nregion",
ha="center",
fontsize=10,
bbox=dict(boxstyle="round", facecolor="white", alpha=0.8),
)
plt.tight_layout()
plt.show()
Regularisation: Preventing Over-fitting¶
When fitting complex models with many parameters, regularisation prevents over-fitting by penalising large parameter values.
L2 Regularisation (Ridge)¶
$$\text{Cost} = \text{Data Term} + \lambda \sum_{i=1}^{p} \theta_i^2$$
L1 Regularisation (Lasso)¶
$$\text{Cost} = \text{Data Term} + \lambda \sum_{i=1}^{p} |\theta_i|$$
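Both penalty terms are one-liners in numpy; a small illustrative sketch (the θ and λ values here are arbitrary):
# Illustrative penalty terms for an arbitrary parameter vector
theta = np.array([2.0, -3.0, 0.5])
lam = 0.1
l2_penalty = lam * np.sum(theta**2)       # Ridge: shrinks all coefficients smoothly
l1_penalty = lam * np.sum(np.abs(theta))  # Lasso: promotes sparsity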
# Generate data for polynomial fitting
np.random.seed(123)
x_poly = np.linspace(-1, 1, 20)
y_true_poly = 2 * x_poly - 3 * x_poly**2 # True: quadratic
y_poly = y_true_poly + np.random.normal(0, 0.3, len(x_poly))
plt.figure(figsize=(10, 6))
plt.plot(x_poly, y_poly, "o", label="Observed", alpha=0.7, markersize=8)
plt.plot(x_poly, y_true_poly, "r--", linewidth=2, label="True (quadratic)")
plt.xlabel("x", fontsize=12)
plt.ylabel("y", fontsize=12)
plt.title("Polynomial Fitting Test Data", fontsize=14, fontweight="bold")
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Fit high-degree polynomial (degree 8) - prone to over-fitting
degree = 8
def polynomial_model(params):
"""Polynomial model: sum of coefficients * x^i"""
y_pred = np.zeros_like(x_poly)
for i, coef in enumerate(params):
y_pred += coef * x_poly**i
return y_pred
# Custom cost with L2 regularisation
class RegularisedSSE:
"""SSE with L2 regularisation (Ridge)."""
def __init__(self, lambda_reg=0.0):
self.lambda_reg = lambda_reg
def __call__(self, predicted, observed, params=None):
# Data term
sse = np.sum((observed - predicted) ** 2)
# Regularisation term (note: params must be passed separately)
if params is not None and self.lambda_reg > 0:
reg_term = self.lambda_reg * np.sum(params**2)
return sse + reg_term
return sse
# Note: Diffid's built-in costs don't expose the parameter vector to the
# cost function, so regularisation has to be folded into a custom scalar
# objective instead (a sketch follows the over-fitting demo below)
# Unregularised fit
initial_params = np.zeros(degree + 1)
initial_params[0] = np.mean(y_poly)
builder_poly = diffid.VectorBuilder().with_objective(polynomial_model).with_data(y_poly)
for i in range(degree + 1):
builder_poly = builder_poly.with_parameter(f"c{i}", initial_params[i])
result_unreg = (
builder_poly.with_cost(diffid.SSE())
.with_optimiser(diffid.NelderMead().with_max_iter(2000))
.build()
.optimise()
)
print("\n" + "=" * 70)
print("POLYNOMIAL FITTING (Degree 8)")
print("=" * 70)
print("Unregularised coefficients:")
print(f" {result_unreg.x}")
print(f" Max |coef|: {np.max(np.abs(result_unreg.x)):.2f}")
print(f" SSE: {result_unreg.value:.4f}")
======================================================================
POLYNOMIAL FITTING (Degree 8)
======================================================================
Unregularised coefficients:
  [ 1.76916563e-03  1.96793262e+00 -4.26606389e+00  1.70512990e+00
    3.98781374e+00 -2.59931773e+00 -3.98125430e-01  1.08590028e+00
   -2.43719499e+00]
  Max |coef|: 4.27
  SSE: 1.8069
Visualise Overfitting¶
To visualise over-fitting, the fitted polynomial is also plotted over an extrapolation region beyond the training data. This reveals whether the model has captured the underlying dynamics or merely the noise.
# Visualize fits
x_dense = np.linspace(-1, 1, 200)
def eval_poly(x, coeffs):
y = np.zeros_like(x)
for i, c in enumerate(coeffs):
y += c * x**i
return y
y_true_dense = 2 * x_dense - 3 * x_dense**2
y_unreg_dense = eval_poly(x_dense, result_unreg.x)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
# Left: Training data fit
ax1.plot(
x_poly, y_poly, "o", label="Training data", alpha=0.7, markersize=8, color="gray"
)
ax1.plot(
x_dense, y_true_dense, "k--", linewidth=2.5, label="True (quadratic)", alpha=0.8
)
ax1.plot(x_dense, y_unreg_dense, linewidth=2.5, label="Degree-8 fit", color="#d62728")
ax1.set_xlabel("x", fontsize=12)
ax1.set_ylabel("y", fontsize=12)
ax1.set_title("Training Data: Over-fitting", fontsize=14, fontweight="bold")
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)
ax1.set_ylim(-4, 3)
# Right: Extrapolation (shows over-fitting clearly)
x_extrap = np.linspace(-1.5, 1.5, 200)
y_true_extrap = 2 * x_extrap - 3 * x_extrap**2
y_unreg_extrap = eval_poly(x_extrap, result_unreg.x)
ax2.plot(
x_poly, y_poly, "o", label="Training data", alpha=0.7, markersize=8, color="gray"
)
ax2.plot(x_extrap, y_true_extrap, "k--", linewidth=2.5, label="True", alpha=0.8)
ax2.plot(x_extrap, y_unreg_extrap, linewidth=2.5, label="Degree-8 fit", color="#d62728")
ax2.axvline(x=-1, color="red", linestyle=":", alpha=0.5)
ax2.axvline(x=1, color="red", linestyle=":", alpha=0.5)
ax2.set_xlabel("x", fontsize=12)
ax2.set_ylabel("y", fontsize=12)
ax2.set_title("Extrapolation: Catastrophic Failure", fontsize=14, fontweight="bold")
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)
ax2.set_ylim(-10, 5)
ax2.text(-1.25, -8, "Extrapolation\nregion", fontsize=10, ha="center")
ax2.text(1.25, -8, "Extrapolation\nregion", fontsize=10, ha="center")
plt.tight_layout()
plt.show()
print("\n⚠️ Over-fitting Alert!")
print(" The high-degree polynomial fits training data well but")
print(" extrapolates poorly. Regularisation would help!")
⚠️ Over-fitting Alert!
   The high-degree polynomial fits training data well but
   extrapolates poorly. Regularisation would help!
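The RegularisedSSE class defined earlier can be put to work with the same ScalarBuilder pattern used for the weighted fit, since Diffid's cost slot doesn't expose parameters. A sketch, with λ = 1.0 as an arbitrary, untuned choice:
# Regularised fit: fold the L2 penalty into a single scalar objective
reg_cost = RegularisedSSE(lambda_reg=1.0)

builder_reg = diffid.ScalarBuilder().with_objective(
    lambda p: reg_cost(polynomial_model(p), y_poly, params=np.asarray(p))
)
for i in range(degree + 1):
    builder_reg = builder_reg.with_parameter(f"c{i}", initial_params[i])

result_reg = (
    builder_reg.with_optimiser(diffid.NelderMead().with_max_iter(2000))
    .build()
    .optimise()
)
print(f"Regularised max |coef|: {np.max(np.abs(result_reg.x)):.2f}")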
Multi-Objective Cost Functions¶
Sometimes we want to balance multiple objectives simultaneously:
$$\text{Cost} = w_1 \cdot \text{Fit Quality} + w_2 \cdot \text{Smoothness} + w_3 \cdot \text{Physical Constraints}$$
# Example: Fit with smoothness penalty
class MultiObjectiveCost:
"""Combined data fit + smoothness penalty."""
def __init__(self, weight_fit=1.0, weight_smooth=0.1):
self.weight_fit = weight_fit
self.weight_smooth = weight_smooth
def __call__(self, predicted, observed):
# Data fit term (SSE)
fit_cost = np.sum((observed - predicted) ** 2)
# Smoothness term (penalise large second derivatives)
second_deriv = np.diff(predicted, n=2)
smooth_cost = np.sum(second_deriv**2)
return self.weight_fit * fit_cost + self.weight_smooth * smooth_cost
# Fit with different smoothness weights
weights_smooth = [0.0, 0.1, 1.0, 10.0]
results_multi = {}
# Cost wrapper
def wrapper(x, y_poly, cost_func):
return cost_func(polynomial_model(x), y_poly)
for w_smooth in weights_smooth:
    # Construct the combined cost for this smoothness weight
    multi_cost = MultiObjectiveCost(weight_fit=1.0, weight_smooth=w_smooth)
    builder_multi = diffid.ScalarBuilder().with_objective(
        partial(wrapper, y_poly=y_poly, cost_func=multi_cost)
    )
    for i in range(degree + 1):
        builder_multi = builder_multi.with_parameter(f"c{i}", initial_params[i])
    result = (
        builder_multi.with_optimiser(diffid.NelderMead().with_max_iter(2000))
        .build()
        .optimise()
    )
    results_multi[w_smooth] = result
    print(
        f"Smoothness weight={w_smooth:5.1f}: "
        f"SSE={np.sum((eval_poly(x_poly, result.x) - y_poly) ** 2):.3f}"
    )
Smoothness weight=  0.0: SSE=1.807
Smoothness weight=  0.1: SSE=1.816
Smoothness weight=  1.0: SSE=1.976
Smoothness weight= 10.0: SSE=2.267
# Visualize effect of smoothness weight
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()
for idx, (w_smooth, result) in enumerate(results_multi.items()):
ax = axes[idx]
y_fit = eval_poly(x_dense, result.x)
ax.plot(x_poly, y_poly, "o", label="Data", alpha=0.6, markersize=7, color="gray")
ax.plot(x_dense, y_true_dense, "k--", linewidth=2, label="True", alpha=0.7)
ax.plot(x_dense, y_fit, linewidth=2.5, label=f"Fit (λ={w_smooth})", color="#2ca02c")
ax.set_xlabel("x", fontsize=11)
ax.set_ylabel("y", fontsize=11)
ax.set_title(f"Smoothness Weight λ = {w_smooth}", fontsize=13, fontweight="bold")
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_ylim(-4, 3)
plt.tight_layout()
plt.show()
print("\n💡 Smoothness Penalty Effect:")
print(" λ=0.0: No penalty → over-fitting")
print(" λ=0.1: Slight smoothing")
print(" λ=1.0: Balanced fit")
print(" λ=10.0: Too smooth → under-fitting")
💡 Smoothness Penalty Effect:
   λ=0.0: No penalty → over-fitting
   λ=0.1: Slight smoothing
   λ=1.0: Balanced fit
   λ=10.0: Too smooth → under-fitting
Physical Constraints as Cost Penalties¶
In scientific applications, we often have physical constraints:
- Parameters must be positive (e.g., rate constants)
- Conservation laws (e.g., mass balance)
- Monotonicity requirements
These can be enforced via penalty terms:
class ConstrainedCost:
"""Cost with soft constraints via penalties."""
def __init__(self, penalty_weight=1000.0):
self.penalty_weight = penalty_weight
def __call__(self, predicted, observed, params=None):
# Data fit term
sse = np.sum((observed - predicted) ** 2)
# Example constraint: parameters should sum to a target value
if params is not None:
# Soft constraint: sum(params) ≈ 0
constraint_violation = (np.sum(params) - 0.0) ** 2
# Positivity constraint: penalise negative parameters
negative_penalty = np.sum(np.minimum(0, params) ** 2)
penalty = self.penalty_weight * (constraint_violation + negative_penalty)
return sse + penalty
return sse
print("Example constraint penalties:")
print(" • Sum constraint: forces Σθᵢ ≈ target")
print(" • Positivity: penalises θᵢ < 0")
print(" • Bounds: penalises θᵢ outside [a, b]")
print(" • Monotonicity: penalises θᵢ₊₁ < θᵢ")
print("\n💡 Tip: Start with large penalty weights, then tune.")
Example constraint penalties:
  • Sum constraint: forces Σθᵢ ≈ target
  • Positivity: penalises θᵢ < 0
  • Bounds: penalises θᵢ outside [a, b]
  • Monotonicity: penalises θᵢ₊₁ < θᵢ

💡 Tip: Start with large penalty weights, then tune.
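For reference, the bounds and monotonicity penalties listed above follow the same soft-penalty recipe as ConstrainedCost; a minimal sketch (the helper names are illustrative, not Diffid API):
def bounds_penalty(params, lo, hi):
    # Penalise parameters outside [lo, hi]; zero inside the bounds
    below = np.minimum(0.0, params - lo)
    above = np.maximum(0.0, params - hi)
    return np.sum(below**2) + np.sum(above**2)

def monotonicity_penalty(params):
    # Penalise decreasing steps (theta_{i+1} < theta_i)
    steps = np.diff(params)
    return np.sum(np.minimum(0.0, steps) ** 2)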
Best Practices¶
Choosing a Cost Function¶
- Default (SSE): Good starting point for most problems
- RMSE: When comparing across datasets with different sizes
- GaussianNLL: For Bayesian inference and uncertainty quantification
- Weighted: When measurement errors vary (heteroscedastic data)
- Custom: For domain-specific requirements
Regularisation Guidelines¶
- When to use: High-dimensional problems, limited data, polynomial fitting
- L2 (Ridge): Shrinks all coefficients smoothly
- L1 (Lasso): Promotes sparsity (some coefficients → 0)
- Tuning λ: Cross-validation or information criteria (AIC, BIC)
Multi-Objective Balancing¶
- Start simple: Fit data first, add penalties incrementally
- Scale matters: Normalise objectives to similar magnitudes
- Weight tuning: Use logarithmic search (0.001, 0.01, 0.1, 1, 10, ...)
- Validation: Check if constraints are actually satisfied
Summary: Cost Function Design Pattern¶
class CustomCost:
    def __init__(self, w_fit=1.0, w_reg=0.0, w_con=0.0):
        # Store configuration (term weights)
        self.w_fit, self.w_reg, self.w_con = w_fit, w_reg, w_con

    def __call__(self, predicted, observed, params=None):
        # 1. Data fidelity term
        fit_cost = np.sum((observed - predicted) ** 2)
        # 2. Regularisation term (optional)
        reg_cost = np.sum(params**2) if params is not None else 0.0
        # 3. Physical constraints (optional, e.g. positivity)
        con_cost = np.sum(np.minimum(0, params) ** 2) if params is not None else 0.0
        # 4. Combine with weights
        return self.w_fit * fit_cost + self.w_reg * reg_cost + self.w_con * con_cost
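As a quick sanity check, the skeleton can be called directly with toy arrays (values are arbitrary):
# Illustrative call: data term plus small L2 and positivity penalties
cost = CustomCost(w_fit=1.0, w_reg=0.01, w_con=100.0)
value = cost(
    predicted=np.array([1.0, 2.0, 3.0]),
    observed=np.array([1.1, 1.9, 3.2]),
    params=np.array([0.5, -0.2]),
)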
Key Takeaways¶
- Built-in metrics (SSE, RMSE, GaussianNLL) cover most use cases
- Weighted fitting essential for heteroscedastic data (varying errors)
- Regularisation prevents over-fitting in high-dimensional problems
- Multi-objective costs balance competing goals (fit vs smoothness)
- Constraint penalties enforce physical requirements
- Custom costs are easy to implement via the __call__ interface
- Scale and normalisation critical for multi-term objectives
Next Steps¶
- Guide: Cost Metrics - Detailed metric selection
- Parameter Uncertainty - Bayesian inference with GaussianNLL
- API Reference: Cost Metrics - Complete API
Exercises¶
1. Implement L1 Regularisation: Create a LassoSSE class and compare with L2
2. Cross-Validation: Implement k-fold cross-validation to tune regularisation strength
3. Robust Fitting: Implement Huber loss (robust to outliers): $$L_\delta(r) = \begin{cases} \frac{1}{2}r^2 & \text{for } |r| \leq \delta \\ \delta(|r| - \frac{1}{2}\delta) & \text{otherwise} \end{cases}$$
4. Time-Series Smoothness: Add a penalty on parameter changes over time for dynamic fitting
5. Information Criteria: Compute AIC and BIC for model selection:
- AIC = $2k - 2\ln(\mathcal{L})$
- BIC = $\ln(n)k - 2\ln(\mathcal{L})$