A Survey of Generalized Gauss-Newton and
Sequential Convex Programming Methods
Moritz Diehl
Systems Control and Optimization Laboratory
Department of Microsystems Engineering and Department of Mathematics
University of Freiburg, Germany
based on joint work with
Florian Messerer (Freiburg)
and Joris Gillis (Leuven)
19th French-German-Swiss Conference on Optimization, Nice,
September 18, 2019
Nonlinear optimization with convex substructure
\[
\begin{aligned}
\underset{w \in \mathbb{R}^{n_w}}{\text{minimize}} \quad & \phi_0(F_0(w)) \\
\text{subject to} \quad & F_i(w) \in \Omega_i, \quad i = 1, \dots, m, \\
& G(w) = 0
\end{aligned}
\]
Assumptions:
- twice continuously differentiable functions $G : \mathbb{R}^{n_w} \to \mathbb{R}^{n_g}$ and $F_i : \mathbb{R}^{n_w} \to \mathbb{R}^{n_{F_i}}$ for $i = 0, 1, \dots, m$
- outer function $\phi_0 : \mathbb{R}^{n_{F_0}} \to \mathbb{R}$ convex
- sets $\Omega_i \subseteq \mathbb{R}^{n_{F_i}}$ convex for $i = 1, \dots, m$ (possibly $\Omega_i = \{ z \mid \phi_i(z) \le 0 \}$ with smooth convex $\phi_i$)

Idea: exploit the convex substructure via iterative convex approximations.
Why is this class of problems and algorithms interesting?
some nonlinear programming (NLP) problems have nonsmooth
convex constraints which cannot be treated by standard NLP solvers
there exist many reliable and efficient convex optimization solvers
Some application areas:
nonlinear matrix inequalities for reduced order controller design
[Fares, Noll, Apkarian 2002; Tran-Dinh et al. 2012]
ellipsoidal terminal regions in nonlinear model predictive control
[Chen and Allgöwer 1998; Verschueren 2016]
robustified inequalities in nonlinear optimization [Nagy and Braatz
2003; D., Bock, Kostina 2006]
tube-following optimal control problems [Van Duijkeren, 2019]
non-smooth composite minimization [Lewis and Wright 2016]
deep neural network training with convex loss functions
[Schraudolph 2002; Martens 2016]
First the bad news
Iterative convex approximation methods such as sequential convex
programming (SCP) have only linear convergence in general.
The rate of convergence cannot be improved to superlinear by any
bounded semi-definite Hessian approximation [D., Jarre, Vogelbusch
2006]
Simple trust-region example problem with dominant nonconvexities in the objective:
\[
\underset{w \in \mathbb{R}^2}{\text{minimize}} \quad w_1^2 - (1 - w_2)^2 \qquad \text{subject to} \quad \|w\|_2 \le 1
\]
But: many real-world problems have dominant convexities, and SCP often shows fast linear convergence in practice. How fast?
Overview
- Smooth unconstrained problems
  - Sequential Convex Programming (SCP)
  - Generalized Gauss-Newton (GGN)
  - Local convergence analysis
  - Desirable divergence
  - Illustrative example in CasADi
- Constrained problems
  - Sequential Convex Programming (SCP)
  - Sequential Linear Programming (SLP)
  - Constrained Generalized Gauss-Newton (CGGN)
  - Sequential Convex Quadratic Programming (SCQP)
- Recent Progress in Dynamic Optimization and Applications
Simplest case: smooth unconstrained problems
Unconstrained minimization of a "convex over nonlinear" function:
\[
\underset{w \in \mathbb{R}^{n}}{\text{minimize}} \quad \underbrace{\phi(F(w))}_{=: f(w)}
\]
Assumptions:
- Inner function $F : \mathbb{R}^{n} \to \mathbb{R}^{N}$ of class $C^3$
- Outer function $\phi : \mathbb{R}^{N} \to \mathbb{R}$ of class $C^3$ and convex

Remark: requiring $F, \phi, f \in C^3$ avoids technical details that would obfuscate the main results.
Notation and Preliminaries
- Matrix inequality $A \preceq B$ for $A, B \in \mathbb{S}^n$ (symmetric matrices)
- Gradient $\nabla f(w) \in \mathbb{R}^n$ and Hessian $\nabla^2 f(w) \in \mathbb{S}^n$ for $f : \mathbb{R}^n \to \mathbb{R}$
- Jacobian $J(w) := \frac{\partial F}{\partial w}(w) \in \mathbb{R}^{N \times n}$ for $F : \mathbb{R}^n \to \mathbb{R}^N$
- Linearization (first-order Taylor series) at $\bar w \in \mathbb{R}^n$:
\[
F_{\mathrm{lin}}(w; \bar w) := F(\bar w) + J(\bar w)\,(w - \bar w)
\]
- Big-$O(\cdot)$ notation: e.g. $F_{\mathrm{lin}}(w; \bar w) = F(w) + O(\|w - \bar w\|^2)$
- Note: the gradient of $f(w) = \phi(F(w))$ is $\nabla f(w) = J(w)^\top \nabla\phi(F(w))$
Method 1: Sequential Convex Programming (SCP)
Starting at $w^0 \in \mathbb{R}^n$ we generate a sequence $\dots, w^k, w^{k+1}, \dots$ by obtaining $w^{k+1}$ as the solution of the convex subproblem at $\bar w = w^k$:
\[
\underset{w \in \mathbb{R}^{n}}{\text{minimize}} \quad \underbrace{\phi\big(F_{\mathrm{lin}}(w; \bar w)\big)}_{=: f_{\mathrm{SCP}}(w; \bar w)} \tag{1}
\]
(requires possibly expensive calls to a convex optimization solver)

Observation: $f_{\mathrm{SCP}}(w; \bar w) = f(w) + O(\|w - \bar w\|^2)$

Corollary: $\nabla f(\bar w) = 0 \;\Leftrightarrow\; \bar w$ is a fixed point of SCP (if the subproblems (1) have unique solutions)
Tutorial Example
Regard $F(w) = \begin{pmatrix} \sin(w) \\ \exp(w) - 2 \end{pmatrix}$ and $\phi(z) = z_1^4 + z_2^2$.

Linearization:
\[
F_{\mathrm{lin}}(w; \bar w) = \begin{pmatrix} \sin(\bar w) + \cos(\bar w)(w - \bar w) \\ \exp(\bar w) - 2 + \exp(\bar w)(w - \bar w) \end{pmatrix}
\]
SCP objective:
\[
f_{\mathrm{SCP}}(w; \bar w) = \big(\sin(\bar w) + \cos(\bar w)(w - \bar w)\big)^4 + \big(\exp(\bar w) - 2 + \exp(\bar w)(w - \bar w)\big)^2
\]
Fig.: $f(w) = \phi(F(w))$ (solid) and $f_{\mathrm{SCP}}(w; \bar w)$ (dashed) for $\bar w = 0$.
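A minimal NumPy sketch of this tutorial example (all function names are illustrative, not part of any library): it evaluates $f(w) = \phi(F(w))$ and the SCP objective $f_{\mathrm{SCP}}(w; \bar w)$ obtained by linearizing the inner function.

```python
import numpy as np

def F(w):
    """Inner function F(w) = (sin(w), exp(w) - 2)."""
    return np.array([np.sin(w), np.exp(w) - 2.0])

def J(w):
    """Jacobian of F, coded by hand: J(w) = (cos(w), exp(w))."""
    return np.array([np.cos(w), np.exp(w)])

def phi(z):
    """Convex outer function phi(z) = z_1^4 + z_2^2."""
    return z[0]**4 + z[1]**2

def f(w):
    """Composite objective f(w) = phi(F(w))."""
    return phi(F(w))

def F_lin(w, w_bar):
    """First-order Taylor expansion of F at w_bar."""
    return F(w_bar) + J(w_bar) * (w - w_bar)

def f_scp(w, w_bar):
    """SCP objective: outer convexity kept, inner function linearized."""
    return phi(F_lin(w, w_bar))

# Evaluate around the linearization point w_bar = 0 used in the figure;
# at w = w_bar the two objectives coincide.
w_bar = 0.0
for w in [-0.5, 0.0, 0.5]:
    print(w, f(w), f_scp(w, w_bar))
```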
SCP for Least Squares Problems = Gauss-Newton
With quadratic $\phi(z) = \tfrac{1}{2}\|z\|_2^2 = \tfrac{1}{2} z^\top z$, the SCP subproblems become
\[
\underset{w \in \mathbb{R}^{n}}{\text{minimize}} \quad \tfrac{1}{2}\big\| F(w^k) + J(w^k)(w - w^k) \big\|_2^2 \tag{2}
\]
If $\mathrm{rank}(J) = n$ this is uniquely solvable, giving
\[
w^{k+1} = w^k - \Big(\underbrace{J(w^k)^\top J(w^k)}_{=: B_{\mathrm{GN}}(w^k)}\Big)^{-1} \underbrace{J(w^k)^\top F(w^k)}_{= \nabla f(w^k)}
\]
SCP applied to least squares = Newton method with the "Gauss-Newton Hessian" $B_{\mathrm{GN}}(w) \approx \nabla^2 f(w)$.
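As a sketch of the resulting iteration (assuming the tutorial inner function above; this is illustrative code, not code from the talk), one Gauss-Newton step can be computed with a plain least-squares solve instead of forming $J^\top J$ explicitly.

```python
import numpy as np

# Tutorial inner function and its Jacobian (scalar w, N = 2 residuals).
def F(w):
    return np.array([np.sin(w), np.exp(w) - 2.0])

def J(w):
    return np.array([[np.cos(w)], [np.exp(w)]])   # 2 x 1 Jacobian

def gauss_newton_step(w_k):
    """One Gauss-Newton step: argmin_w 0.5*||F(w_k) + J(w_k)(w - w_k)||_2^2,
    computed via a linear least-squares solve."""
    dw, *_ = np.linalg.lstsq(J(w_k), -F(w_k), rcond=None)
    return w_k + dw.item()

w = 1.0
for k in range(15):
    w = gauss_newton_step(w)
print(w, F(w))   # w approaches the least-squares minimizer (roughly w = 0.55)
```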
Method 2: Generalized Gauss-Newton cf. [Schraudolph 2002]
For general convex $\phi(\cdot)$ we have, for $f(w) = \phi(F(w))$,
\[
\nabla^2 f(w) \;=\; \underbrace{J(w)^\top \nabla^2\phi(F(w))\, J(w)}_{=: B_{\mathrm{GGN}}(w) \;\text{("GGN Hessian")}} \;+\; \underbrace{\sum_{j=1}^{N} \nabla^2 F_j(w)\, \frac{\partial \phi}{\partial z_j}(F(w))}_{=: E_{\mathrm{GGN}}(w) \;\text{("error matrix")}}
\]
The Generalized Gauss-Newton (GGN) method iterates according to
\[
w^{k+1} = w^k - B_{\mathrm{GGN}}(w^k)^{-1} \nabla f(w^k)
\]
Note: GGN solves convex quadratic subproblems
\[
\min_{w \in \mathbb{R}^n} \; \underbrace{f(w^k) + \nabla f(w^k)^\top (w - w^k) + \tfrac{1}{2}(w - w^k)^\top B_{\mathrm{GGN}}(w^k)(w - w^k)}_{=: f_{\mathrm{GGN}}(w; w^k)}
\]
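A hedged NumPy sketch of the GGN iteration for the tutorial example, with the outer derivatives $\nabla\phi$ and $\nabla^2\phi$ coded by hand; all names and the starting point are illustrative.

```python
import numpy as np

# Tutorial example: F(w) = (sin w, exp(w) - 2), phi(z) = z_1^4 + z_2^2.
F = lambda w: np.array([np.sin(w), np.exp(w) - 2.0])
Jac = lambda w: np.array([[np.cos(w)], [np.exp(w)]])        # 2 x 1
grad_phi = lambda z: np.array([4 * z[0]**3, 2 * z[1]])
hess_phi = lambda z: np.diag([12 * z[0]**2, 2.0])

def ggn_step(w_k):
    """One Generalized Gauss-Newton step:
    w_{k+1} = w_k - B_GGN(w_k)^{-1} grad f(w_k), with
    B_GGN = J^T hess_phi(F) J   and   grad f = J^T grad_phi(F)."""
    z = F(w_k)
    Jk = Jac(w_k)
    B = Jk.T @ hess_phi(z) @ Jk            # 1 x 1 GGN Hessian
    g = Jk.T @ grad_phi(z)                 # gradient of f at w_k
    return w_k - np.linalg.solve(B, g).item()

w = 1.0
for k in range(20):
    w = ggn_step(w)
print(w)   # settles (linearly) near the minimizer of f, roughly w = 0.6
```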
Tutorial Example: GGN and SCP
Regard again $F(w) = \begin{pmatrix} \sin(w) \\ \exp(w) - 2 \end{pmatrix}$ and $\phi(z) = z_1^4 + z_2^2$.

Jacobian:
\[
J(\bar w) = \begin{pmatrix} \cos(\bar w) \\ \exp(\bar w) \end{pmatrix}
\]
GGN objective at $\bar w = 0$:
\[
f_{\mathrm{GGN}}(w; \bar w) = \big(\exp(\bar w) - 2 + \exp(\bar w)(w - \bar w)\big)^2
\]
(at $\bar w = 0$ the quartic residual contributes neither to the gradient nor to $B_{\mathrm{GGN}}$, since $\sin(0) = 0$)

Figure shows $f$ (solid), $f_{\mathrm{SCP}}$ (dashed) and $f_{\mathrm{GGN}}$ (dash-dotted).
Differences and Similarities of SCP and GGN
- both SCP and GGN generalize the Gauss-Newton method (and become equal to it when applied to nonlinear least squares)
- both iteratively solve convex optimization problems
- GGN only solves a positive definite linear system per iteration
- SCP solves a full convex problem per iteration (5-30x slower)
- SCP is often less sensitive to initialization
- both are NOT second-order methods
- both converge linearly with the same rate (details follow)
Local Convergence Analysis for SCP and GGN
Recall:
\[
f_{\mathrm{SCP}}(w; \bar w) = \phi\big(F_{\mathrm{lin}}(w; \bar w)\big)
\]
\[
f_{\mathrm{GGN}}(w; \bar w) = f(\bar w) + \nabla f(\bar w)^\top (w - \bar w) + \tfrac{1}{2}(w - \bar w)^\top B_{\mathrm{GGN}}(\bar w)(w - \bar w)
\]
\[
\nabla_w f_{\mathrm{SCP}}(\bar w; \bar w) = \nabla_w f_{\mathrm{GGN}}(\bar w; \bar w) = \nabla f(\bar w)
\]

Lemma 1 (Equality of SCP and GGN up to Second Order)
\[
f_{\mathrm{SCP}}(w; \bar w) = f_{\mathrm{GGN}}(w; \bar w) + O(\|w - \bar w\|^3)
\]
Proof: $\nabla^2_w f_{\mathrm{SCP}}(\bar w; \bar w) = B_{\mathrm{GGN}}(\bar w) = \nabla^2_w f_{\mathrm{GGN}}(\bar w; \bar w)$.

Note: in general $B_{\mathrm{GGN}}(\bar w) \neq \nabla^2 f(\bar w)$.
Local Convergence of SCP and GGN
Main Theorem 1 [Diehl and Messerer, CDC 2019]
Regard $w^*$ with $\nabla f(w^*) = 0$ and $B_{\mathrm{GGN}}(w^*) \succ 0$. Then
- $w^*$ is a fixed point for both the SCP and GGN iterations,
- both methods are well defined in a neighborhood of $w^*$,
- their linear contraction rates are equal and given by the LMI problem
\[
\min\big\{\alpha \in \mathbb{R} \;\big|\; -\alpha B_{\mathrm{GGN}}(w^*) \preceq E_{\mathrm{GGN}}(w^*) \preceq \alpha B_{\mathrm{GGN}}(w^*)\big\}
\]

Corollary: a necessary condition for local convergence of both methods is
\[
B_{\mathrm{GGN}}(w^*) \succeq \tfrac{1}{2}\nabla^2 f(w^*) \succeq 0
\]
*Proof of corollary: set $\alpha = 1$ and use $E_{\mathrm{GGN}}(w^*) = \nabla^2_w f(w^*) - B_{\mathrm{GGN}}(w^*)$.
*Proof of Main Theorem
Define solution operators $w^{\mathrm{sol}}_{\mathrm{SCP}}(\bar w)$ and $w^{\mathrm{sol}}_{\mathrm{GGN}}(\bar w)$ at the linearization point $\bar w$ and apply the implicit function theorem to
\[
\nabla_w f_i\big(w^{\mathrm{sol}}_i(\bar w); \bar w\big) = 0 \qquad \text{for } i = \mathrm{SCP}, \mathrm{GGN}.
\]
This is well defined for $\bar w$ in a neighborhood of $w^*$ because $\nabla^2_w f_{\mathrm{SCP}}(w^*; w^*) = \nabla^2_w f_{\mathrm{GGN}}(w^*; w^*) = B_{\mathrm{GGN}}(w^*) \succ 0$, and the derivatives are given by
\[
\frac{\mathrm{d} w^{\mathrm{sol}}_i}{\mathrm{d} \bar w}(w^*) \;=\; -\Big(\underbrace{\nabla^2_w f_i(w^*; w^*)}_{= B_{\mathrm{GGN}}(w^*) =: B_*}\Big)^{-1} \underbrace{\nabla_{\bar w} \nabla_w f_i(w^*; w^*)}_{= E_{\mathrm{GGN}}(w^*) =: E_*}
\]
We used that all second derivatives are equal due to Lemma 1, and that
\[
\nabla_{\bar w} \nabla_w f_{\mathrm{GGN}}(w^*; w^*) = \nabla_{\bar w}\big(\nabla f(\bar w) + B_{\mathrm{GGN}}(\bar w)(w - \bar w)\big)\big|_{w = \bar w = w^*} = \nabla^2 f(w^*) - B_{\mathrm{GGN}}(w^*) = E_{\mathrm{GGN}}(w^*).
\]
*Proof of Main Theorem (continued)
The spectral radius $\rho(-B_*^{-1} E_*) = \rho(B_*^{-1} E_*)$ equals the linear contraction rate of the SCP and GGN algorithms. The matrix $B_*^{-1} E_*$ can be transformed into the similar, but symmetric matrix $B_*^{-1/2} E_* B_*^{-1/2}$ with the same spectral radius. Now,
\[
\rho\big(B_*^{-1/2} E_* B_*^{-1/2}\big) \le \alpha
\;\Leftrightarrow\; -\alpha I \preceq B_*^{-1/2} E_* B_*^{-1/2} \preceq \alpha I
\;\Leftrightarrow\; -\alpha B_* \preceq E_* \preceq \alpha B_*.
\]
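As a numerical illustration of this characterization (a sketch with made-up matrices, not code from the paper), the contraction rate can be evaluated as the spectral radius of $B_*^{-1/2} E_* B_*^{-1/2}$ for given symmetric $B_* \succ 0$ and symmetric $E_*$.

```python
import numpy as np

def contraction_rate(B, E):
    """Smallest alpha with -alpha*B <= E <= alpha*B (Loewner order),
    computed as the spectral radius of B^{-1/2} E B^{-1/2}.
    Assumes B symmetric positive definite and E symmetric."""
    # Symmetric inverse square root of B via its eigendecomposition.
    lam, V = np.linalg.eigh(B)
    B_inv_sqrt = V @ np.diag(1.0 / np.sqrt(lam)) @ V.T
    M = B_inv_sqrt @ E @ B_inv_sqrt
    return np.max(np.abs(np.linalg.eigvalsh(M)))

# Toy data: B plays the role of B_GGN(w*), E that of E_GGN(w*) = hess f(w*) - B_GGN(w*).
B = np.array([[4.0, 1.0], [1.0, 3.0]])
E = np.array([[0.5, 0.2], [0.2, -0.4]])
print(contraction_rate(B, E))   # a value below 1 indicates local linear convergence
```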
Desirable divergence and mirror problem, cf. [Bock 1987]
SCP and GGN do not converge to every local minimum. This can help to
avoid ”bad” local minima, as discussed next.
Regard the maximum-likelihood estimation problem $\min_w \phi(M(w) - y)$ with nonlinear model $M : \mathbb{R}^n \to \mathbb{R}^N$ and measurements $y \in \mathbb{R}^N$.

Assume the penalty $\phi$ is symmetric, $\phi(z) = \phi(-z)$, as is the case for symmetric error distributions. At a solution $w^*$, we can generate "mirror measurements" $y_{\mathrm{mr}} := 2 M(w^*) - y$, obtained by reflecting the residuals. From a statistical point of view, $y_{\mathrm{mr}}$ should be as likely as $y$.
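A small sketch of how such mirror measurements could be generated; the model, data and noise level below are made up for illustration.

```python
import numpy as np

def mirror_measurements(M, w_star, y):
    """Reflect the residuals at the model output: y_mr = 2*M(w_star) - y.
    For a symmetric penalty phi(z) = phi(-z), the mirrored data are
    statistically as plausible as the original data y."""
    return 2.0 * M(w_star) - y

# Illustration with the scalar model m(x, w) = sin(w*x) used later in the talk.
x = np.linspace(0.0, 8.0, 30)
M = lambda w: np.sin(w * x)
rng = np.random.default_rng(0)
w_star = 1.0
y = M(w_star) + 0.1 * rng.standard_normal(x.size)
y_mr = mirror_measurements(M, w_star, y)
print(np.allclose(M(w_star) - y_mr, y - M(w_star)))  # residuals are reflected: True
```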
SCP Divergence Minimum unstable under mirroring
Theorem 2 [Diehl and Messerer 2019], generalizing [Bock 1987]
Regard a local minimizer $w^*$ of $f(w) = \phi(M(w) - y)$ with $\nabla^2 f(w^*) \succ 0$. If the necessary SCP-GGN convergence condition $B_{\mathrm{GGN}}(w^*) \succeq \tfrac{1}{2}\nabla^2 f(w^*)$ does not hold, then $w^*$ is a stationary point of the mirror problem but not a local minimizer.

*Sketch of proof: use $M(w^*) - y_{\mathrm{mr}} = -(M(w^*) - y)$ to show that
\[
\nabla f_{\mathrm{mr}}(w^*) = J(w^*)^\top \nabla\phi\big(y - M(w^*)\big) = 0
\]
and
\[
\nabla^2 f_{\mathrm{mr}}(w^*) = B_{\mathrm{GGN}}(w^*) - E_{\mathrm{GGN}}(w^*) = 2 B_{\mathrm{GGN}}(w^*) - \nabla^2 f(w^*) \not\succeq 0
\]
Illustrative example (experiments conducted by Florian Messerer)
[Figure: input-output measurements $(x_i, y_i)$ and the true model output $m(x, w_{\mathrm{true}})$ over the input $x$.]

Regard a scalar model with unknown parameter $w$,
\[
m(x, w) := \sin(wx),
\]
with input-output measurements $(x_i, y_i)$ for $i = 1, \dots, N$ (left plot). Define $M_i(w) := m(x_i, w)$ and $F(w) := M(w) - y$. As outer convexity we use a Huber-like penalty (right plot)
\[
\phi(z) := \frac{1}{N} \sum_{i=1}^{N} \sqrt{\delta^2 + z_i^2}. \tag{3}
\]
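A minimal NumPy sketch of this setup; the value of $\delta$, the synthetic data and the outliers below are assumptions of the sketch, not the values used for the figures.

```python
import numpy as np

delta = 0.1   # assumed penalty parameter; the slide does not fix delta

def model(x, w):
    """Scalar model m(x, w) = sin(w*x)."""
    return np.sin(w * x)

def residual(w, x, y):
    """F(w) = M(w) - y, stacking all N measurements."""
    return model(x, w) - y

def huber_like(z, delta=delta):
    """Smooth Huber-like penalty phi(z) = (1/N) * sum_i sqrt(delta^2 + z_i^2)."""
    return np.mean(np.sqrt(delta**2 + z**2))

# Synthetic data with w_true = 1 and a few outliers (illustrative only).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 8.0, 50)
y = model(x, 1.0) + 0.05 * rng.standard_normal(x.size)
y[::10] += 1.0                      # outliers, handled gracefully by the penalty
print(huber_like(residual(1.0, x, y)))
```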
Implementation of SCP and GGN
- obtain all needed derivatives from CasADi via its MATLAB interface
- use a linear solve ("backslash") for GGN
- formulate the SCP subproblem as a second-order cone program (SOCP), as sketched below:
\[
\min_{w \in \mathbb{R},\ s \in \mathbb{R}^N} \; \sum_{i=1}^{N} s_i
\quad \text{s.t.} \quad \sqrt{\delta^2 + F_{\mathrm{lin},i}(w; w^k)^2} \le s_i \quad \text{for } i = 1, \dots, N
\]
- use Gurobi via CasADi as the SOCP solver
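Below is a rough, self-contained sketch of one such SCP subproblem. It uses cvxpy with a hand-coded Jacobian as a generic stand-in for the Gurobi-via-CasADi setup described above; the data, $\delta$ and all names are illustrative assumptions of the sketch.

```python
import numpy as np
import cvxpy as cp

delta = 0.1
rng = np.random.default_rng(1)
x = np.linspace(0.0, 8.0, 50)
y = np.sin(1.0 * x) + 0.05 * rng.standard_normal(x.size)

def M(w):
    return np.sin(w * x)

def Jac(w):
    return (x * np.cos(w * x)).reshape(-1, 1)   # N x 1 Jacobian of M

def scp_subproblem(w_k):
    """One SCP subproblem as an SOCP (as on the slide):
    min sum_i s_i   s.t.   sqrt(delta^2 + F_lin_i(w; w_k)^2) <= s_i."""
    w = cp.Variable(1)
    s = cp.Variable(x.size)
    F_lin = (M(w_k) - y) + Jac(w_k) @ (w - w_k)     # linearized residual
    constraints = [cp.norm(cp.hstack([delta, F_lin[i]])) <= s[i]
                   for i in range(x.size)]
    cp.Problem(cp.Minimize(cp.sum(s)), constraints).solve()
    return float(w.value[0])

w = 1.3
for k in range(8):
    w = scp_subproblem(w)
    print(k, w)
```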
CasADi for Optimization Modelling
A software framework for nonlinear optimization and optimal control: "Write an efficient optimal control solver in a few lines"
- Implements automatic differentiation (AD) on sparse matrix-valued computational graphs
- Front-ends to Python, MATLAB and Octave
- Supports C code generation
- Back-ends to SUNDIALS, CPLEX, Gurobi, qpOASES, IPOPT, KNITRO, SNOPT, SuperSCS, OSQP, ...
- Developed by Joel Andersson and Joris Gillis
- http://casadi.org
- Nov 18-20: hands-on CasADi course on optimal control: http://hasselt2019.casadi.org
- Jupyter CasADi demo for this talk: http://fgs.casadi.org
Visualization of SCP and GGN Subproblems
[Figure: cost over the model parameter $w$, showing $f(w)$, $f_{\mathrm{SCP}}(w; \bar w)$ and $f_{\mathrm{GGN}}(w; \bar w)$ for $\bar w = 1.15$.]
Objective and Local Contraction Rate
$f(w)$ and the local rate
\[
\check\alpha(w) = \frac{\big|\nabla^2 f(w) - B_{\mathrm{GGN}}(w)\big|}{\big|B_{\mathrm{GGN}}(w)\big|}
\]

[Figure, panels (a) and (b): $f(w)$ and $\check\alpha(w)$ over the model parameter $w$.]

Fast contraction rate at the global minimum $w^* \approx 1$, otherwise slower.
Empirical Contraction Rates in Agreement with Theorem 1
Empirical contraction rate
\[
\kappa_k = \frac{|w^{k+1} - w^k|}{|w^k - w^{k-1}|}
\]

[Figure: $\kappa_k$ over the iterations $k$ for SCP and GGN, compared with $\check\alpha(w^*)$. Panel (a): $w^0 = 1.3$, $w^* \approx 1$; panel (b): $w^0 = 2.9$, $w^* \approx 2.88$.]
Iteration count: SCP more predictable than GGN
[Figure: number of iterations until convergence over the initial guess $w^0$, for SCP and GGN.]
Overview
- Smooth unconstrained problems
  - Sequential Convex Programming (SCP)
  - Generalized Gauss-Newton (GGN)
  - Local convergence analysis
  - Desirable divergence
  - Illustrative example in CasADi
- Constrained problems
  - Sequential Convex Programming (SCP)
  - Sequential Linear Programming (SLP)
  - Constrained Generalized Gauss-Newton (CGGN)
  - Sequential Convex Quadratic Programming (SCQP)
- Recent Progress in Dynamic Optimization and Applications
Sequential Convex Programming (SCP)
\[
\begin{aligned}
\underset{w \in \mathbb{R}^{n_w}}{\text{minimize}} \quad & \phi_0\big(F_0^{\mathrm{lin}}(w; \bar w)\big) \\
\text{subject to} \quad & F_i^{\mathrm{lin}}(w; \bar w) \in \Omega_i, \quad i = 1, \dots, m, \\
& G^{\mathrm{lin}}(w; \bar w) = 0
\end{aligned}
\]
- obtain $w^{k+1}$ as the solution of the convex problem at $\bar w = w^k$
- only the primal variables $\bar w$ form the "optimizer state"; no multipliers are needed
- SCP is affine invariant
A General Smooth NLP Formulation
From now on assume more smoothness and regard a smooth NLP
\[
\begin{aligned}
\underset{w \in \mathbb{R}^{n_w}}{\text{minimize}} \quad & \underbrace{\phi_0(F_0(w))}_{=: f_0(w)} \\
\text{subject to} \quad & \underbrace{\phi_i(F_i(w))}_{=: f_i(w)} \le 0, \quad i = 1, \dots, m, \\
& G(w) = 0
\end{aligned}
\]
with convex $C^2$ functions $\phi_0, \phi_1, \dots, \phi_m$.
Sequential Linear Programming (SLP)
If the functions $\phi_0, \dots, \phi_m$ are linear, SCP just solves linear programs (LPs):
\[
\begin{aligned}
\underset{w \in \mathbb{R}^{n_w}}{\text{minimize}} \quad & f_0^{\mathrm{lin}}(w; \bar w) \\
\text{subject to} \quad & f_i^{\mathrm{lin}}(w; \bar w) \le 0, \quad i = 1, \dots, m, \\
& G^{\mathrm{lin}}(w; \bar w) = 0
\end{aligned}
\]
- might be called Sequential Linear Programming (SLP) ("Method of Approximation Programming" by Griffith & Stewart, 1961)
- equivalent to standard SQP with zero Hessian
- SLP is only attracted to NLP solutions in vertices of the feasible set
- works very well for L1 estimation [Bock, Kostina, Schlöder 2007]
- converges quadratically once the correct active set is discovered
Picture from Griffith and Stewart’s original 1961 paper
[Figure: scanned excerpt from Griffith and Stewart (1961), showing "FIG. 4. Third LP problem" (starting point, LP solution and true optimum) together with the paper's gasoline-blending example, which balances catalytic reformer severity (S) against tetraethyl lead (TEL) level subject to octane-number specifications.]
Constrained Generalized Gauss-Newton [Bock 1987]
Use $B_{\mathrm{GGN}}(w) := J_0(w)^\top \nabla^2\phi_0(F_0(w))\, J_0(w)$ and solve the convex quadratic program (QP)
\[
\begin{aligned}
\underset{w \in \mathbb{R}^{n_w}}{\text{minimize}} \quad & f_0^{\mathrm{lin}}(w; \bar w) + \tfrac{1}{2}(w - \bar w)^\top B_{\mathrm{GGN}}(\bar w)(w - \bar w) \\
\text{subject to} \quad & f_i^{\mathrm{lin}}(w; \bar w) \le 0, \quad i = 1, \dots, m, \\
& G^{\mathrm{lin}}(w; \bar w) = 0
\end{aligned}
\]
- like SCP, the method is multiplier free
- also affine invariant

Remark: for least-squares objectives, this method is due to [Bock 1987]. In mathematical optimization papers, Bock's method is called "the Generalized Gauss-Newton (GGN) method" because it generalizes the GN method to constrained problems. To avoid a notation clash with computer science, we prefer to call Bock's method "the Constrained Gauss-Newton (CGN) method".
Sequential Convex Quadratic Programming (SCQP) [Verschueren et al. 2016]
\[
B_{\mathrm{SCQP}}(w, \mu) := J_0(w)^\top \nabla^2\phi_0(F_0(w))\, J_0(w) + \sum_{i=1}^{m} \mu_i\, J_i(w)^\top \nabla^2\phi_i(F_i(w))\, J_i(w)
\]
\[
\begin{aligned}
\underset{w \in \mathbb{R}^{n_w}}{\text{minimize}} \quad & f_0^{\mathrm{lin}}(w; \bar w) + \tfrac{1}{2}(w - \bar w)^\top B_{\mathrm{SCQP}}(\bar w, \bar\mu)(w - \bar w) \\
\text{subject to} \quad & f_i^{\mathrm{lin}}(w; \bar w) \le 0, \quad i = 1, \dots, m, \quad \mid\ \mu^+, \\
& G^{\mathrm{lin}}(w; \bar w) = 0
\end{aligned}
\]
- again, only a QP needs to be solved in each iteration
- again, affine invariant
- the "optimizer state" contains both $\bar w$ and the inequality multipliers $\bar\mu$
- $B_{\mathrm{SCQP}}(w, \mu) \succeq B_{\mathrm{GGN}}(w)$, i.e. more likely to converge than GGN
- SCQP has the same contraction rate as SCP, characterized by two LMIs on the reduced Hessian approximation [Messerer & D., 2019]
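A small sketch of how $B_{\mathrm{SCQP}}$ could be assembled from user-supplied callables; all names and the toy data below are illustrative, not taken from acados or any other code base.

```python
import numpy as np

def scqp_hessian(w, mu, Fs, Js, hess_phis):
    """SCQP Hessian approximation (illustrative sketch):
    B_SCQP(w, mu) = J_0(w)^T hess_phi_0(F_0(w)) J_0(w)
                    + sum_{i>=1} mu_i * J_i(w)^T hess_phi_i(F_i(w)) J_i(w).
    Fs, Js, hess_phis are lists of callables for F_i, its Jacobian, and the
    Hessian of the convex outer function phi_i; mu are inequality multipliers
    with mu[0] unused (index 0 is the objective term)."""
    B = Js[0](w).T @ hess_phis[0](Fs[0](w)) @ Js[0](w)
    for i in range(1, len(Fs)):
        B = B + mu[i] * (Js[i](w).T @ hess_phis[i](Fs[i](w)) @ Js[i](w))
    return B

# Tiny usage example with n_w = 2 and one inequality (all data made up).
F0 = lambda w: w                       # F_0(w) = w
J0 = lambda w: np.eye(2)
H0 = lambda z: 2.0 * np.eye(2)         # phi_0(z) = ||z||^2
F1 = lambda w: np.array([w[0] * w[1]])
J1 = lambda w: np.array([[w[1], w[0]]])
H1 = lambda z: np.array([[2.0]])       # phi_1(z) = z^2 (scalar)
w, mu = np.array([1.0, 2.0]), np.array([0.0, 0.5])
print(scqp_hessian(w, mu, [F0, F1], [J0, J1], [H0, H1]))
```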
Overview
- Smooth unconstrained problems
  - Sequential Convex Programming (SCP)
  - Generalized Gauss-Newton (GGN)
  - Local convergence analysis
  - Desirable divergence
  - Illustrative example in CasADi
- Constrained problems
  - Sequential Convex Programming (SCP)
  - Sequential Linear Programming (SLP)
  - Constrained Generalized Gauss-Newton (CGGN)
  - Sequential Convex Quadratic Programming (SCQP)
- Recent Progress in Dynamic Optimization and Applications
Dynamic Optimization in a Nutshell
\[
\begin{aligned}
\underset{x,\, u}{\text{minimize}} \quad & \sum_{i=0}^{N-1} \varphi_i\big(F_i(x_i, u_i)\big) + \varphi_N\big(F_N(x_N)\big) \\
\text{subject to} \quad & x_{i+1} = S_i(x_i, u_i), \quad i = 0, \dots, N-1, \\
& H_i(x_i, u_i) \in \Omega_i, \quad i = 0, \dots, N-1, \\
& H_N(x_N) \in \Omega_N
\end{aligned}
\]
- variables $w = (x, u)$ with $x = (x_0, \dots, x_N)$ and $u = (u_0, \dots, u_{N-1})$
- convexities in $\varphi_i$ (e.g. quadratic) and $\Omega_i$ (e.g. polyhedral)
- nonlinearities in the dynamic system $S_i$ and the constraint functions $F_i, H_i$
- often: $S_i$ is the result of time integration (direct multiple shooting)
Recent Algorithmic Progress
Inexact Newton with Iterated Sensitivities (INIS) - Rien Quirynen
Generalized Nonlinear Static Feedback Integration - Jonathan Frey
Zero Order Moving Horizon Estimation - Katrin Baumgärtner
Advanced Step Real-Time Iteration - Armin Nurkanovic
Convex Inner Approximation (CIAO) for mobile robots - Tobias Schöls
Recent Progress on Software and Applications
BLASFEO: BLAS for Embedded Optimization - Gianluca Frison
acados: real-time SCQP type algorithms for nonlinear model predictive
control (NMPC) - Robin Verschueren, Dimitris Kouzoupis, Gianluca
Frison, et al.
(both under permissive open-source BSD license)
Recent real-world NMPC applications:
NMPC of small quadcopters in Freiburg - Barbara Barros
NMPC of human sized quadcopters in California - Andrea Zanelli
4 kHz NMPC of reluctance synchronous motor in Munich - Andrea Zanelli
100 Hz NMPC of Freiburg race cars - Daniel Kloeser
10 Hz CIAO-NMPC of mobile robots at Bosch - Tobias Schöls
Some Real-Time SCP Applications
Conclusions
- Discussed a family of iterative convex approximation methods.
- Sequential Convex Programming (SCP) is conceptually the simplest ("linearize the nonlinearities, keep all convex structures") and most widely applicable, e.g. to nonlinear cone programming.
- Generalized Gauss-Newton (GGN) has cheaper iterations, as only QPs are solved. Unconstrained GGN and constrained SCQP show identical local convergence rates to SCP for smooth problems.
- The local contraction rate for both SCP and GGN equals the smallest $\alpha$ satisfying
\[
-\alpha B_{\mathrm{GGN}}(w^*) \preceq \nabla^2 f(w^*) - B_{\mathrm{GGN}}(w^*) \preceq \alpha B_{\mathrm{GGN}}(w^*)
\]
- Divergence might in fact be a desirable property in estimation, as it avoids being attracted to statistically unstable local minima.
Some References 1
J. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl. CasADi: a
software framework for nonlinear optimization and optimal control.
Mathematical Programming Computation, 2018.
H.G. Bock, E. Kostina, and J.P. Schlöder. Numerical methods for
parameter estimation in nonlinear differential algebraic equations. GAMM
Mitteilungen, 30/2:376-408, 2007.
J. Martens. Second-Order Optimization for Neural Networks. PhD thesis,
University of Toronto, 2016.
N. Schraudolph. Fast curvature matrix-vector products for second-order
gradient descent. Neural Computation, 14(7):1723–1738, 2002.
Q. Tran-Dinh. Sequential Convex Programming and Decomposition
Approaches for Nonlinear Optimization. PhD thesis, KU Leuven, 2012.
R. Verschueren, N. van Duijkeren, R. Quirynen, and M. Diehl. Exploiting
convexity in direct optimal control: a sequential convex quadratic
programming method. In Proceedings of the IEEE Conference on Decision
and Control (CDC), 2016.
Some References 2
M. Diehl and F. Messerer. Local Convergence of Generalized
Gauss-Newton and Sequential Convex Programming, CDC, 2019
(accepted)
N. Van Duijkeren. Online Motion Control in Virtual Corridors for Fast
Robotic Systems, PhD thesis, KU Leuven, 2019.
M. Diehl, F. Jarre, C. Vogelbusch. Loss of superlinear convergence for an
SQP-type method with conic constraints, SIAM Journal on Optimization,
2006.
Q. Tran-Dinh, S. Gumussoy, W. Michiels, M. Diehl. Combining
convex-concave decompositions and linearization approaches for solving
BMIs, with application to static output feedback, IEEE Transactions on
Automatic Control, 2012.
R.E. Griffith and R.A. Stewart. A Nonlinear Programming Technique for
the Optimization of Continuous Processing Systems, Management Sci., 7:
379-392, 1961.