"""Trust Region Reflective algorithm for least-squares optimization. 

 

The algorithm is based on ideas from the paper [STIR]_. The main idea is to
account for the presence of the bounds by an appropriate scaling of the
variables (or, equivalently, by changing the shape of the trust region).
Let's introduce a vector v:

 

           | ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf
    v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf
           | 1,            otherwise

 

where g is the gradient of the cost function and lb, ub are the bounds. Each
component of v is the distance to the bound toward which the anti-gradient
points (when that distance is finite). Define a scaling matrix
D = diag(v**0.5). The first-order optimality conditions can then be stated as

 

D^2 g(x) = 0. 

 

This means that components of the gradient should be zero for strictly
interior variables, and components must point inside the feasible region for
variables on a bound.
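
For illustration, here is a minimal NumPy sketch of this scaling vector (the
implementation uses ``CL_scaling_vector`` from ``.common``, which also returns
the derivative information dv needed below):

    import numpy as np

    def scaling_vector(x, g, lb, ub):
        # Distance to the bound toward which the anti-gradient points,
        # or 1 if that distance is infinite.
        v = np.ones_like(x)
        mask = (g < 0) & np.isfinite(ub)
        v[mask] = ub[mask] - x[mask]
        mask = (g > 0) & np.isfinite(lb)
        v[mask] = x[mask] - lb[mask]
        return v

    # First-order optimality measure used for termination: norm(v * g, inf).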

 

Now consider this system of equations as a new optimization problem. If the
point x is strictly interior (not on a bound), then the left-hand side is
differentiable and the Newton step for it satisfies

 

(D^2 H + diag(g) Jv) p = -D^2 g 

 

where H is the Hessian matrix (or its J^T J approximation in least squares),
Jv is the Jacobian matrix of v with components -1, 1 or 0, such that all
elements of the matrix C = diag(g) Jv are non-negative. Introduce the change
of variables x = D x_h (_h would be "hat" in LaTeX). In the new variables the
Newton step satisfies

 

B_h p_h = -g_h, 

 

where B_h = D H D + C and g_h = D g. In least squares D H D is approximated by
J_h^T J_h, where J_h = J D. Note that J_h and g_h are the proper Jacobian and
gradient with respect to the "hat" variables. To guarantee global convergence
we formulate a trust-region problem based on the Newton step in the new
variables:

 

0.5 * p_h^T B_h p_h + g_h^T p_h -> min, ||p_h|| <= Delta

 

In the original space B = H + D^{-1} C D^{-1}, and the equivalent trust-region 

problem is 

 

0.5 * p^T B p + g^T p -> min, ||D^{-1} p|| <= Delta 

 

Here the meaning of the matrix D becomes clearer: it alters the shape of the
trust region, so that large steps towards the bounds are not allowed. In the
implementation the trust-region problem is solved in "hat" space, but handling
of the bounds is done in the original space (see below and read the code).
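
To make the correspondence with the code concrete, here is a condensed sketch
of the change of variables (ignoring, for brevity, the additional ``x_scale``
scaling that the implementation folds into d):

    d = v**0.5          # D = diag(d)
    g_h = d * g         # gradient in "hat" variables
    J_h = J * d         # J_h = J D, columns of J scaled by d
    diag_h = g * dv     # diagonal of C = diag(g) Jv, dv[i] in {-1, 0, 1}
    # Quadratic model in "hat" space:
    #     0.5 * p_h^T (J_h^T J_h + diag(diag_h)) p_h + g_h^T p_h
    # A step maps back to the original space as p = d * p_h.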

 

The introduction of the matrix D does not make it possible to ignore the
bounds: the algorithm must keep its iterates strictly feasible (to satisfy the
differentiability requirement mentioned above). The parameter theta controls
how far a step backs off from the boundary (see the code for details).

 

The algorithm uses another important trick. If the trust-region solution does
not fit within the bounds, then a search direction reflected from the first
encountered bound is considered. For motivation and analysis refer to the
[STIR]_ paper (and to other papers by the same authors). In practice it does
not need much justification: the algorithm simply chooses the best step among
three candidates: a constrained trust-region step, a reflected step, and a
constrained Cauchy step (a minimizer along -g_h in "hat" space, or -D^2 g in
the original space), as sketched below.
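
Sketched (mirroring ``select_step`` below), the reflected direction is obtained
by flipping the components of the trust-region step that hit a bound first:

    stride, hits = step_size_to_bound(x, p, lb, ub)  # from .common
    r_h = np.copy(p_h)
    r_h[hits.astype(bool)] *= -1   # reflect off the first encountered bound
    r = d * r_h                    # reflected direction in the original space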

 

Another feature is that the trust-region radius control strategy is modified
to account for the appearance of the diagonal matrix C (called diag_h in the
code).

 

Note that all of the described peculiarities disappear for problems without
bounds (the algorithm then becomes a standard trust-region algorithm, very
similar to the ones implemented in MINPACK).

 

The implementation supports two methods of solving the trust-region problem.
The first, called 'exact', applies an SVD to the Jacobian and then solves the
problem very accurately using the algorithm described in [JJMore]_. It is not
applicable to large problems. The second, called 'lsmr', uses the 2-D subspace
approach (sometimes called "indefinite dogleg"), where the problem is solved
in a subspace spanned by the gradient and the approximate Gauss-Newton step
found by ``scipy.sparse.linalg.lsmr``. The 2-D trust-region problem is
reformulated as a 4th-order algebraic equation and solved very accurately with
``numpy.roots``. The subspace approach makes it possible to solve very large
problems (up to a couple of million residuals on a regular PC), provided the
Jacobian matrix is sufficiently sparse; a condensed sketch of the subspace
construction is given below.
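
A condensed sketch of the subspace construction used by the 'lsmr' path (the
small 2-D problem itself is handled by ``solve_trust_region_2d`` from
``.common``, which sets up the quartic and calls ``numpy.roots``; in the
bounded case an extra diag_h term is added to B_S):

    gn_h = lsmr(J_h, f)[0]        # approximate Gauss-Newton direction
    S, _ = qr(np.vstack((g_h, gn_h)).T, mode='economic')  # orthonormal n x 2 basis
    JS = J_h.dot(S)
    B_S = JS.T.dot(JS)            # 2 x 2 projected Gauss-Newton Hessian
    g_S = S.T.dot(g_h)            # projected gradient
    p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
    p_h = S.dot(p_S)              # step in "hat" variables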

 

References 

---------- 

.. [STIR] Branch, M.A., T.F. Coleman, and Y. Li, "A Subspace, Interior, 

and Conjugate Gradient Method for Large-Scale Bound-Constrained 

Minimization Problems," SIAM Journal on Scientific Computing, 

Vol. 21, Number 1, pp 1-23, 1999. 

.. [JJMore] More, J. J., "The Levenberg-Marquardt Algorithm: Implementation
    and Theory," Numerical Analysis, ed. G. A. Watson, Lecture
    Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.

""" 

from __future__ import division, print_function, absolute_import 

 

import numpy as np 

from numpy.linalg import norm 

from scipy.linalg import svd, qr 

from scipy.sparse.linalg import LinearOperator, lsmr 

from scipy.optimize import OptimizeResult 

from scipy._lib.six import string_types 

 

from .common import ( 

step_size_to_bound, find_active_constraints, in_bounds, 

make_strictly_feasible, intersect_trust_region, solve_lsq_trust_region, 

solve_trust_region_2d, minimize_quadratic_1d, build_quadratic_1d, 

evaluate_quadratic, right_multiplied_operator, regularized_lsq_operator, 

CL_scaling_vector, compute_grad, compute_jac_scale, check_termination, 

update_tr_radius, scale_for_robust_loss_function, print_header_nonlinear, 

print_iteration_nonlinear) 

 

 

def trf(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale, 

loss_function, tr_solver, tr_options, verbose): 

    # For efficiency it makes sense to run a simplified version of the
    # algorithm when no bounds are imposed. We decided to write two
    # separate functions. It violates the DRY principle, but keeps the
    # individual functions as readable as possible.

if np.all(lb == -np.inf) and np.all(ub == np.inf): 

return trf_no_bounds( 

fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, x_scale, 

loss_function, tr_solver, tr_options, verbose) 

else: 

return trf_bounds( 

fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale, 

loss_function, tr_solver, tr_options, verbose) 
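
# For reference, this solver is normally reached through the public API; a
# minimal usage sketch (not part of this module):
#
#     from scipy.optimize import least_squares
#     res = least_squares(fun, x0, bounds=(lb, ub), method='trf')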

 

 

def select_step(x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta): 

"""Select the best step according to Trust Region Reflective algorithm.""" 

if in_bounds(x + p, lb, ub): 

p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h) 

return p, p_h, -p_value 

 

p_stride, hits = step_size_to_bound(x, p, lb, ub) 

 

# Compute the reflected direction. 

r_h = np.copy(p_h) 

r_h[hits.astype(bool)] *= -1 

r = d * r_h 

 

# Restrict trust-region step, such that it hits the bound. 

p *= p_stride 

p_h *= p_stride 

x_on_bound = x + p 

 

    # The reflected direction will first cross either the feasible-region
    # boundary or the trust-region boundary.

_, to_tr = intersect_trust_region(p_h, r_h, Delta) 

to_bound, _ = step_size_to_bound(x_on_bound, r, lb, ub) 

 

    # Find lower and upper bounds on the step size along the reflected
    # direction, considering the strict feasibility requirement. There is no
    # single correct way to do that; the chosen approach seems to work best
    # on test problems.

r_stride = min(to_bound, to_tr) 

if r_stride > 0: 

r_stride_l = (1 - theta) * p_stride / r_stride 

if r_stride == to_bound: 

r_stride_u = theta * to_bound 

else: 

r_stride_u = to_tr 

else: 

r_stride_l = 0 

r_stride_u = -1 

 

# Check if reflection step is available. 

if r_stride_l <= r_stride_u: 

a, b, c = build_quadratic_1d(J_h, g_h, r_h, s0=p_h, diag=diag_h) 

r_stride, r_value = minimize_quadratic_1d( 

a, b, r_stride_l, r_stride_u, c=c) 

r_h *= r_stride 

r_h += p_h 

r = r_h * d 

else: 

r_value = np.inf 

 

# Now correct p_h to make it strictly interior. 

p *= theta 

p_h *= theta 

p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h) 

 

ag_h = -g_h 

ag = d * ag_h 

 

to_tr = Delta / norm(ag_h) 

to_bound, _ = step_size_to_bound(x, ag, lb, ub) 

if to_bound < to_tr: 

ag_stride = theta * to_bound 

else: 

ag_stride = to_tr 

 

a, b = build_quadratic_1d(J_h, g_h, ag_h, diag=diag_h) 

ag_stride, ag_value = minimize_quadratic_1d(a, b, 0, ag_stride) 

ag_h *= ag_stride 

ag *= ag_stride 

 

if p_value < r_value and p_value < ag_value: 

return p, p_h, -p_value 

elif r_value < p_value and r_value < ag_value: 

return r, r_h, -r_value 

else: 

return ag, ag_h, -ag_value 

 

 

def trf_bounds(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, 

x_scale, loss_function, tr_solver, tr_options, verbose): 

x = x0.copy() 

 

f = f0 

f_true = f.copy() 

nfev = 1 

 

J = J0 

njev = 1 

m, n = J.shape 

 

if loss_function is not None: 

rho = loss_function(f) 

cost = 0.5 * np.sum(rho[0]) 

J, f = scale_for_robust_loss_function(J, f, rho) 

else: 

cost = 0.5 * np.dot(f, f) 

 

g = compute_grad(J, f) 

 

jac_scale = isinstance(x_scale, string_types) and x_scale == 'jac' 

if jac_scale: 

scale, scale_inv = compute_jac_scale(J) 

else: 

scale, scale_inv = x_scale, 1 / x_scale 

 

v, dv = CL_scaling_vector(x, g, lb, ub) 

v[dv != 0] *= scale_inv[dv != 0] 

Delta = norm(x0 * scale_inv / v**0.5) 

if Delta == 0: 

Delta = 1.0 

 

g_norm = norm(g * v, ord=np.inf) 

 

f_augmented = np.zeros((m + n)) 

if tr_solver == 'exact': 

J_augmented = np.empty((m + n, n)) 

elif tr_solver == 'lsmr': 

reg_term = 0.0 

regularize = tr_options.pop('regularize', True) 

 

if max_nfev is None: 

max_nfev = x0.size * 100 

 

alpha = 0.0 # "Levenberg-Marquardt" parameter 

 

termination_status = None 

iteration = 0 

step_norm = None 

actual_reduction = None 

 

if verbose == 2: 

print_header_nonlinear() 

 

while True: 

v, dv = CL_scaling_vector(x, g, lb, ub) 

 

g_norm = norm(g * v, ord=np.inf) 

if g_norm < gtol: 

termination_status = 1 

 

if verbose == 2: 

print_iteration_nonlinear(iteration, nfev, cost, actual_reduction, 

step_norm, g_norm) 

 

if termination_status is not None or nfev == max_nfev: 

break 

 

        # Now compute variables in "hat" space. Here we also account for
        # scaling introduced by the `x_scale` parameter. This part is a bit
        # tricky: you have to write down the formulas and see how the
        # trust-region problem is formulated when the two types of scaling
        # are applied. The idea is that we first apply `x_scale` and then
        # apply the Coleman-Li approach in the new variables.

 

        # v is recomputed in the variables after applying `x_scale`; note that
        # components which were identically 1 are not affected.

v[dv != 0] *= scale_inv[dv != 0] 

 

# Here we apply two types of scaling. 

d = v**0.5 * scale 

 

# C = diag(g * scale) Jv 

diag_h = g * dv * scale 

 

        # After all of this is done, we continue normally.

 

# "hat" gradient. 

g_h = d * g 

 

f_augmented[:m] = f 

if tr_solver == 'exact': 

J_augmented[:m] = J * d 

J_h = J_augmented[:m] # Memory view. 

J_augmented[m:] = np.diag(diag_h**0.5) 

U, s, V = svd(J_augmented, full_matrices=False) 

V = V.T 

uf = U.T.dot(f_augmented) 

elif tr_solver == 'lsmr': 

J_h = right_multiplied_operator(J, d) 

 

if regularize: 

a, b = build_quadratic_1d(J_h, g_h, -g_h, diag=diag_h) 

to_tr = Delta / norm(g_h) 

ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1] 

reg_term = -ag_value / Delta**2 
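                # reg_term is non-negative: it is the predicted decrease along
                # the constrained steepest-descent direction divided by
                # Delta**2, used below as extra diagonal damping for lsmr
                # (a regularization heuristic).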

 

lsmr_op = regularized_lsq_operator(J_h, (diag_h + reg_term)**0.5) 

gn_h = lsmr(lsmr_op, f_augmented, **tr_options)[0] 

S = np.vstack((g_h, gn_h)).T 

S, _ = qr(S, mode='economic') 

JS = J_h.dot(S) # LinearOperator does dot too. 

B_S = np.dot(JS.T, JS) + np.dot(S.T * diag_h, S) 

g_S = S.T.dot(g_h) 

 

        # theta controls the step-back ratio from the bounds.

theta = max(0.995, 1 - g_norm) 

 

actual_reduction = -1 

while actual_reduction <= 0 and nfev < max_nfev: 

if tr_solver == 'exact': 

p_h, alpha, n_iter = solve_lsq_trust_region( 

n, m, uf, s, V, Delta, initial_alpha=alpha) 

elif tr_solver == 'lsmr': 

p_S, _ = solve_trust_region_2d(B_S, g_S, Delta) 

p_h = S.dot(p_S) 

 

p = d * p_h # Trust-region solution in the original space. 

step, step_h, predicted_reduction = select_step( 

x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta) 

 

x_new = make_strictly_feasible(x + step, lb, ub, rstep=0) 

f_new = fun(x_new) 

nfev += 1 

 

step_h_norm = norm(step_h) 

 

if not np.all(np.isfinite(f_new)): 

Delta = 0.25 * step_h_norm 

continue 

 

# Usual trust-region step quality estimation. 

if loss_function is not None: 

cost_new = loss_function(f_new, cost_only=True) 

else: 

cost_new = 0.5 * np.dot(f_new, f_new) 

actual_reduction = cost - cost_new 

# Correction term is specific to the algorithm, 

# vanishes in unbounded case. 

correction = 0.5 * np.dot(step_h * diag_h, step_h) 

 

Delta_new, ratio = update_tr_radius( 

Delta, actual_reduction - correction, predicted_reduction, 

step_h_norm, step_h_norm > 0.95 * Delta 

) 

alpha *= Delta / Delta_new 

Delta = Delta_new 

 

step_norm = norm(step) 

termination_status = check_termination( 

actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol) 

 

if termination_status is not None: 

break 

 

if actual_reduction > 0: 

x = x_new 

 

f = f_new 

f_true = f.copy() 

 

cost = cost_new 

 

J = jac(x, f) 

njev += 1 

 

if loss_function is not None: 

rho = loss_function(f) 

J, f = scale_for_robust_loss_function(J, f, rho) 

 

g = compute_grad(J, f) 

 

if jac_scale: 

scale, scale_inv = compute_jac_scale(J, scale_inv) 

else: 

step_norm = 0 

actual_reduction = 0 

 

iteration += 1 

 

if termination_status is None: 

termination_status = 0 

 

active_mask = find_active_constraints(x, lb, ub, rtol=xtol) 

return OptimizeResult( 

x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm, 

active_mask=active_mask, nfev=nfev, njev=njev, 

status=termination_status) 

 

 

def trf_no_bounds(fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, 

x_scale, loss_function, tr_solver, tr_options, verbose): 

x = x0.copy() 

 

f = f0 

f_true = f.copy() 

nfev = 1 

 

J = J0 

njev = 1 

m, n = J.shape 

 

if loss_function is not None: 

rho = loss_function(f) 

cost = 0.5 * np.sum(rho[0]) 

J, f = scale_for_robust_loss_function(J, f, rho) 

else: 

cost = 0.5 * np.dot(f, f) 

 

g = compute_grad(J, f) 

 

jac_scale = isinstance(x_scale, string_types) and x_scale == 'jac' 

if jac_scale: 

scale, scale_inv = compute_jac_scale(J) 

else: 

scale, scale_inv = x_scale, 1 / x_scale 

 

Delta = norm(x0 * scale_inv) 

if Delta == 0: 

Delta = 1.0 

 

if tr_solver == 'lsmr': 

reg_term = 0 

damp = tr_options.pop('damp', 0.0) 

regularize = tr_options.pop('regularize', True) 

 

if max_nfev is None: 

max_nfev = x0.size * 100 

 

alpha = 0.0 # "Levenberg-Marquardt" parameter 

 

termination_status = None 

iteration = 0 

step_norm = None 

actual_reduction = None 

 

if verbose == 2: 

print_header_nonlinear() 

 

while True: 

g_norm = norm(g, ord=np.inf) 

if g_norm < gtol: 

termination_status = 1 

 

if verbose == 2: 

print_iteration_nonlinear(iteration, nfev, cost, actual_reduction, 

step_norm, g_norm) 

 

if termination_status is not None or nfev == max_nfev: 

break 

 

d = scale 

g_h = d * g 

 

if tr_solver == 'exact': 

J_h = J * d 

U, s, V = svd(J_h, full_matrices=False) 

V = V.T 

uf = U.T.dot(f) 

elif tr_solver == 'lsmr': 

J_h = right_multiplied_operator(J, d) 

 

if regularize: 

a, b = build_quadratic_1d(J_h, g_h, -g_h) 

to_tr = Delta / norm(g_h) 

ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1] 

reg_term = -ag_value / Delta**2 
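                # reg_term (non-negative) supplements the user-supplied `damp`
                # as extra Tikhonov damping for lsmr below, combined as
                # (damp**2 + reg_term)**0.5.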

 

damp_full = (damp**2 + reg_term)**0.5 

gn_h = lsmr(J_h, f, damp=damp_full, **tr_options)[0] 

S = np.vstack((g_h, gn_h)).T 

S, _ = qr(S, mode='economic') 

JS = J_h.dot(S) 

B_S = np.dot(JS.T, JS) 

g_S = S.T.dot(g_h) 

 

actual_reduction = -1 

while actual_reduction <= 0 and nfev < max_nfev: 

if tr_solver == 'exact': 

step_h, alpha, n_iter = solve_lsq_trust_region( 

n, m, uf, s, V, Delta, initial_alpha=alpha) 

elif tr_solver == 'lsmr': 

p_S, _ = solve_trust_region_2d(B_S, g_S, Delta) 

step_h = S.dot(p_S) 

 

predicted_reduction = -evaluate_quadratic(J_h, g_h, step_h) 

step = d * step_h 

x_new = x + step 

f_new = fun(x_new) 

nfev += 1 

 

step_h_norm = norm(step_h) 

 

if not np.all(np.isfinite(f_new)): 

Delta = 0.25 * step_h_norm 

continue 

 

# Usual trust-region step quality estimation. 

if loss_function is not None: 

cost_new = loss_function(f_new, cost_only=True) 

else: 

cost_new = 0.5 * np.dot(f_new, f_new) 

actual_reduction = cost - cost_new 

 

Delta_new, ratio = update_tr_radius( 

Delta, actual_reduction, predicted_reduction, 

step_h_norm, step_h_norm > 0.95 * Delta) 

alpha *= Delta / Delta_new 

Delta = Delta_new 

 

step_norm = norm(step) 

termination_status = check_termination( 

actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol) 

 

if termination_status is not None: 

break 

 

if actual_reduction > 0: 

x = x_new 

 

f = f_new 

f_true = f.copy() 

 

cost = cost_new 

 

J = jac(x, f) 

njev += 1 

 

if loss_function is not None: 

rho = loss_function(f) 

J, f = scale_for_robust_loss_function(J, f, rho) 

 

g = compute_grad(J, f) 

 

if jac_scale: 

scale, scale_inv = compute_jac_scale(J, scale_inv) 

else: 

step_norm = 0 

actual_reduction = 0 

 

iteration += 1 

 

if termination_status is None: 

termination_status = 0 

 

active_mask = np.zeros_like(x) 

return OptimizeResult( 

x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm, 

active_mask=active_mask, nfev=nfev, njev=njev, 

status=termination_status)