Solver Algorithms
Levenberg-Marquardt, DogLeg, BFGS, L-BFGS, Sparse CG, and the automatic fallback chain.
Solver Algorithms
The solver implements five distinct algorithms, each optimized for different problem structures. An automatic fallback chain tries alternatives when the primary algorithm fails to converge.
Levenberg-Marquardt (LM)
The workhorse algorithm, suitable for all problem sizes. LM interpolates between gradient descent and Gauss-Newton:
Update rule:
(JᵀJ + λ · diag(JᵀJ)) · Δx = −Jᵀr
Key features:
- Armijo backtracking line search with α = 1.0, β = 0.5, c = 1e-4
- Marquardt scaling — uses
diag(JᵀJ)instead of identity, giving curvature-aware damping - Adaptive damping — λ increases on rejected steps, decreases on accepted steps
- Convergence — terminates when
‖r‖ < tolor‖Δx‖ < xtolor max iterations reached
LM is robust but can be slow for long constraint chains due to the dense normal equations solve at each iteration.
DogLeg / Powell
A trust-region method that interpolates between the steepest-descent step and the Gauss-Newton step:
Δx = α · Δx_SD + (1 − α) · Δx_GN, ‖Δx‖ ≤ Δ (trust radius)
Key features:
- 22.7× faster than LM for chain-type problems (2.41ms vs 54.8ms on chain-20)
- Trust radius adapts based on the ratio of actual to predicted reduction
- When the GN step lies inside the trust region, it's taken directly (superlinear convergence)
- When outside, the step follows the "dog-leg" path to the trust-region boundary
DogLeg excels on well-conditioned problems where the Gauss-Newton approximation is good.
BFGS
A quasi-Newton method that builds an approximation to the inverse Hessian H ≈ (JᵀJ)⁻¹:
- Strong Wolfe line search with zoom phase for step-size acceptance
- Hessian approximation updated via the BFGS formula:
H_{k+1} = (I − ρ·s·yᵀ)H(I − ρ·y·sᵀ) + ρ·s·sᵀ - Full n×n inverse Hessian stored — suitable for small-to-medium problems (n < 100)
- Superlinear convergence near the solution
L-BFGS
Limited-memory BFGS stores only the last m = 10 vector pairs (s, y) instead of the full Hessian:
- Two-loop recursion computes
H·gin O(mn) time - Memory usage: O(mn) instead of O(n²)
- Auto-selected for n ≥ 100 parameters
- Nearly identical convergence to full BFGS in practice
Sparse Conjugate Gradient (CG)
For large, sparse problems the solver uses CG on the normal equations with a CSC Jacobian:
- Compute
A = JᵀJvia SpGEMM (sparse matrix multiplication) - Apply Jacobi preconditioner
M = diag(A) - Run Conjugate Gradient to solve
A·Δx = −Jᵀr
This avoids forming the dense normal equations and scales to problems with thousands of parameters.
Fallback Chain
When using the Auto algorithm selection, the solver tries algorithms in sequence:
Primary: LM ├─ fails → DogLeg │ ├─ fails → BFGS │ │ └─ fails → return SolverError │ └─ succeeds → return solution └─ succeeds → return solution
A solve is considered "failed" if it reaches max iterations without convergence or if the residual increases on every step (stagnation).
Algorithm Comparison
| Algorithm | Chain-20 Time | Relative | Memory | Best For |
|---|---|---|---|---|
| DogLeg | 2.41 ms | 1.0× | O(n²) | Well-conditioned, chain-like |
| BFGS | 3.30 ms | 1.4× | O(n²) | Small smooth problems |
| L-BFGS | 3.15 ms | 1.3× | O(mn) | Medium problems, n ≥ 100 |
| Sparse CG | 5.20 ms | 2.2× | O(nnz) | Large sparse problems |
| LM | 54.8 ms | 22.7× | O(n²) | Difficult/ill-conditioned |
LM's poor showing on chain problems is due to its conservative step-size strategy; however, it remains the most robust algorithm for ill-conditioned or near-singular problems where others diverge.