Solver Algorithms

Levenberg-Marquardt, DogLeg, BFGS, L-BFGS, Sparse CG, and the automatic fallback chain.

Solver Algorithms

The solver implements five distinct algorithms, each optimized for different problem structures. An automatic fallback chain tries alternatives when the primary algorithm fails to converge.

Levenberg-Marquardt (LM)

The workhorse algorithm, suitable for all problem sizes. LM interpolates between gradient descent and Gauss-Newton:

Update rule:

(JᵀJ + λ · diag(JᵀJ)) · Δx = −Jᵀr

Key features:

Armijo backtracking line search with α = 1.0, β = 0.5, c = 1e-4
Marquardt scaling — uses diag(JᵀJ) instead of identity, giving curvature-aware damping
Adaptive damping — λ increases on rejected steps, decreases on accepted steps
Convergence — terminates when ‖r‖ < tol or ‖Δx‖ < xtol or max iterations reached

LM is robust but can be slow for long constraint chains due to the dense normal equations solve at each iteration.

DogLeg / Powell

A trust-region method that interpolates between the steepest-descent step and the Gauss-Newton step:

Δx = α · Δx_SD + (1 − α) · Δx_GN,   ‖Δx‖ ≤ Δ (trust radius)

Key features:

22.7× faster than LM for chain-type problems (2.41ms vs 54.8ms on chain-20)
Trust radius adapts based on the ratio of actual to predicted reduction
When the GN step lies inside the trust region, it's taken directly (superlinear convergence)
When outside, the step follows the "dog-leg" path to the trust-region boundary

DogLeg excels on well-conditioned problems where the Gauss-Newton approximation is good.

BFGS

A quasi-Newton method that builds an approximation to the inverse Hessian H ≈ (JᵀJ)⁻¹:

Strong Wolfe line search with zoom phase for step-size acceptance
Hessian approximation updated via the BFGS formula: H_{k+1} = (I − ρ·s·yᵀ)H(I − ρ·y·sᵀ) + ρ·s·sᵀ
Full n×n inverse Hessian stored — suitable for small-to-medium problems (n < 100)
Superlinear convergence near the solution

L-BFGS

Limited-memory BFGS stores only the last m = 10 vector pairs (s, y) instead of the full Hessian:

Two-loop recursion computes H·g in O(mn) time
Memory usage: O(mn) instead of O(n²)
Auto-selected for n ≥ 100 parameters
Nearly identical convergence to full BFGS in practice

Sparse Conjugate Gradient (CG)

For large, sparse problems the solver uses CG on the normal equations with a CSC Jacobian:

Compute A = JᵀJ via SpGEMM (sparse matrix multiplication)
Apply Jacobi preconditioner M = diag(A)
Run Conjugate Gradient to solve A·Δx = −Jᵀr

This avoids forming the dense normal equations and scales to problems with thousands of parameters.

Fallback Chain

When using the Auto algorithm selection, the solver tries algorithms in sequence:

Primary: LM
  ├─ fails → DogLeg
  │          ├─ fails → BFGS
  │          │          └─ fails → return SolverError
  │          └─ succeeds → return solution
  └─ succeeds → return solution

A solve is considered "failed" if it reaches max iterations without convergence or if the residual increases on every step (stagnation).

Algorithm Comparison

Algorithm	Chain-20 Time	Relative	Memory	Best For
DogLeg	2.41 ms	1.0×	O(n²)	Well-conditioned, chain-like
BFGS	3.30 ms	1.4×	O(n²)	Small smooth problems
L-BFGS	3.15 ms	1.3×	O(mn)	Medium problems, n ≥ 100
Sparse CG	5.20 ms	2.2×	O(nnz)	Large sparse problems
LM	54.8 ms	22.7×	O(n²)	Difficult/ill-conditioned

LM's poor showing on chain problems is due to its conservative step-size strategy; however, it remains the most robust algorithm for ill-conditioned or near-singular problems where others diverge.