Hessian Riemannian Gradient Flows in Convex Programming
Hessian Riemannian Gradient Flows in Convex Programming
Felipe Alvarez, Jérôme Bolte, Olivier Brahic
INTERNATIONAL CONFERENCE ON MODELING AND OPTIMIZATION, MODOPT 2004
Universidad de La Frontera, Temuco, Chile, January 19-22, 2004
Hessian Riemannian Flows. p. 1/25
Outline
1. Motivation: scaling the Euclidean gradient.
2. Riemannian gradient flows on convex sets.
3. Hessian metrics, existence, convergence and examples.
1. Motivation: gradient method

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a smooth function, $x_0 \in \mathbb{R}^n$:
$$x_{k+1} = x_k + \lambda_k d_k,$$
where $\lambda_k > 0$ is a stepsize and $d_k = -\nabla f(x_k)$ is the steepest descent direction.

Continuous gradient method: $\frac{dx}{dt} = -\nabla f(x)$, $t > 0$.

Continuous flow $\leftrightarrow$ discrete method.
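The correspondence between the continuous flow and the discrete method can be sketched numerically; the quadratic test function below is an illustrative assumption, not from the talk.

```python
import numpy as np

# Sketch: one explicit Euler step of the continuous gradient flow
# dx/dt = -grad f(x) is exactly the discrete update x_{k+1} = x_k + lambda_k d_k
# with d_k = -grad f(x_k). The test function f(x) = ||x||^2 / 2 is illustrative.
def gradient_descent(grad_f, x0, step=0.1, iters=200):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad_f(x)  # Euler step of the flow = steepest descent step
    return x

x_star = gradient_descent(lambda x: x, [1.0, -2.0])  # grad f(x) = x, minimizer 0
```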
1. Scaling and Newton's method

Newton's correction: $x_{k+1} = x_k - \lambda_k \nabla^2 f(x_k)^{-1} \nabla f(x_k)$.

Continuous Newton's method: $\frac{dx}{dt} = -\nabla^2 f(x)^{-1} \nabla f(x)$, so that
$$\frac{d}{dt} \nabla f(x) = -\nabla f(x), \qquad \nabla f(x(t)) = e^{-t} \nabla f(x_0).$$
Scale-invariant rate of convergence, on a straight line!
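The exponential decay of the gradient along the continuous Newton flow can be checked numerically; the quadratic objective below is an illustrative assumption.

```python
import numpy as np

# Sketch (the quadratic f is illustrative): integrate the continuous Newton flow
# dx/dt = -Hess f(x)^{-1} grad f(x) with Euler steps and check that
# grad f(x(t)) ~ exp(-t) * grad f(x0), the scale-invariant decay on the slide.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # SPD matrix: f(x) = 0.5 * x^T A x
grad = lambda x: A @ x
x = np.array([1.0, 1.0])
g0 = grad(x)
dt = 1e-3
for _ in range(1000):                     # integrate up to time t = 1
    x = x - dt * np.linalg.solve(A, grad(x))
ratio = grad(x) / g0                      # each component should be ~ exp(-1)
```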
1. Scaling and constraints

Problem: $\min\{f(x) \mid x \ge 0\}$.

ODE approach: $\frac{dx_i}{dt} = -x_i \frac{\partial f}{\partial x_i}(x)$, $i = 1, \ldots, n$.

Properties:
$\frac{d}{dt} f(x) = -\sum_{i=1}^n x_i \left(\frac{\partial f}{\partial x_i}(x)\right)^2 \le 0$ $\Rightarrow$ descent method on $\mathbb{R}^n_+$.
$x_i(0) > 0$ $\Rightarrow$ $\forall t > 0$, $x_i(t) > 0$ $\Rightarrow$ interior point trajectory.

The equation may be written as $\frac{d}{dt} \log(x_i) = -\frac{\partial f}{\partial x_i}(x)$.
Scaling $\leftrightarrow$ logarithmic barrier to force $x(t) > 0$!
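The interior-point property of the scaled flow can be illustrated with a small simulation; the objective and data below are assumptions for illustration.

```python
import numpy as np

# Sketch (objective and data are illustrative): the scaled flow
# dx_i/dt = -x_i * (df/dx_i) is a descent method that keeps x(t) in the open
# positive orthant, even when the unconstrained minimizer has negative entries.
c = np.array([2.0, -1.0])                 # f(x) = 0.5 * ||x - c||^2, with c_2 < 0
grad = lambda x: x - c
x = np.array([1.0, 1.0])
dt = 1e-3
for _ in range(20000):                    # integrate up to time t = 20
    x = x - dt * x * grad(x)              # multiplicative update preserves x > 0
# x_1 -> 2 (interior critical point); x_2 decays toward the boundary but stays > 0
```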
1. Hessian-type scaling

The equation $\frac{d}{dt} \log(x_i) = -\frac{\partial f}{\partial x_i}(x)$ reads
$$\frac{d}{dt} \nabla h(x) = -\nabla f(x), \quad \text{where } h(x) = \sum_{i=1}^n x_i \log(x_i) - x_i.$$

Thus
$$\frac{dx_i}{dt} = -x_i \frac{\partial f}{\partial x_i}(x) \iff \frac{dx}{dt} = -\nabla^2 h(x)^{-1} \nabla f(x).$$

Remark the analogy with Newton's method.
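The identity between the componentwise scaled field and the Newton-like field driven by the entropy Hessian can be verified numerically; the point $x$ and the stand-in gradient $g$ are arbitrary test data.

```python
import numpy as np

# Sketch: for h(x) = sum_i x_i log x_i - x_i one has Hess h(x) = diag(1/x_i), so
# -Hess h(x)^{-1} grad f(x) is exactly the componentwise scaled field
# -x_i * df/dx_i. x and g are arbitrary illustrative data.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=4)
g = rng.normal(size=4)                    # plays the role of grad f(x)
hess_h = np.diag(1.0 / x)
newton_like = -np.linalg.solve(hess_h, g) # -Hess h(x)^{-1} grad f(x)
scaled = -x * g                           # -x_i * df/dx_i
```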
2. Riemannian gradient flows

Problem: $\min_{x \in C} f(x)$, with $C = \{x \in \mathbb{R}^n \mid x \in \overline{Q},\ Ax = b\}$.
$Q \subseteq \mathbb{R}^n$ nonempty, open and convex. $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ with $m \le n$.

Strategy:
Introduce a Riemannian metric $H(x) \in S^n_{++}$ on $Q$:
$$(u,v)_x = \langle H(x)u, v \rangle = \sum_{i,j=1}^n H_{ij}(x) u_i v_j, \quad x \in Q.$$
Consider the gradient flow $\frac{dx}{dt}(t) = -\nabla_H f(x(t))$, $t > 0$.
2. Riemannian gradient

Let $(M, (\cdot,\cdot))$ be a Riemannian manifold. $T_x M$: tangent space to $M$ at $x \in M$.

The gradient $\operatorname{grad} f$ of $f \in C^1(M; \mathbb{R})$ is uniquely determined by:
Tangency: for all $x \in M$, $\operatorname{grad} f(x) \in T_x M$.
Duality: for all $x \in M$, $v \in T_x M$, $df(x)v = (\operatorname{grad} f(x), v)_x$, where $df(x) : T_x M \to \mathbb{R}$ is the differential of $f$.
2. Riemannian gradient in our case

$M = Q \cap \{x \in \mathbb{R}^n \mid Ax = b\}$ with $Q$ an open set; then $T_x M \simeq \operatorname{Ker} A$.
$(\cdot,\cdot)_x = \langle H(x)\,\cdot, \cdot \rangle$, with a barrier/penalty effect near $\partial Q$.

$$\nabla_H f = H^{-1}[I - A^T (A H^{-1} A^T)^{-1} A H^{-1}] \nabla f,$$
where $\nabla_H$ stands for $\operatorname{grad}$ to stress the dependence on $H$.

Projection $\Rightarrow$ vector field in the tangent space.
Scaling $\Rightarrow$ interior point method: $x(t) \in Q$.
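The formula above can be checked numerically: the Riemannian gradient always lies in $\operatorname{Ker} A$, so the flow stays on the affine set. The metric $H$, matrix $A$ and stand-in gradient $g$ below are random illustrative data.

```python
import numpy as np

# Sketch: compute grad_H f = H^{-1} [I - A^T (A H^{-1} A^T)^{-1} A H^{-1}] grad f
# and check it lies in Ker A, so dx/dt = -grad_H f preserves Ax = b.
rng = np.random.default_rng(1)
n, m = 5, 2
A = rng.normal(size=(m, n))
H = np.diag(rng.uniform(0.5, 2.0, size=n))   # an SPD metric at the current point
g = rng.normal(size=n)                        # plays the role of grad f(x)
Hinv = np.linalg.inv(H)
proj = np.eye(n) - A.T @ np.linalg.solve(A @ Hinv @ A.T, A @ Hinv)
grad_H = Hinv @ proj @ g
```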
2. Example

$C = \Delta_{n-1} := \{x \in \mathbb{R}^n \mid x \ge 0,\ \sum_{i=1}^n x_i = 1\}$.
$Q = \mathbb{R}^n_{++}$, $A = (1, \ldots, 1) \in \mathbb{R}^{1 \times n}$ and $b = 1$.
$M = \{x \in \mathbb{R}^n \mid x > 0,\ \sum_{i=1}^n x_i = 1\}$. $T_x M = \{v \in \mathbb{R}^n \mid \sum_{i=1}^n v_i = 0\}$.

Take $H(x) = \operatorname{diag}(1/x_1, \ldots, 1/x_n)$; then $(u,v)_x = \sum_{i=1}^n \frac{u_i v_i}{x_i}$ $\Rightarrow$ Shahshahani metric.

$$\frac{dx_i}{dt} = -x_i \frac{\partial f}{\partial x_i} + x_i \sum_{j=1}^n x_j \frac{\partial f}{\partial x_j} \quad \Rightarrow \quad \text{Lotka-Volterra type eq.}$$
Karmarkar 90, Faybusovich 91, ...
2. Barrier effect: Legendre functions

We focus on the case $H(x) = \nabla^2 h(x)$, $x \in Q$, with $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ closed, convex and proper.

$(H_0)$: $\operatorname{int} \operatorname{dom} h = Q$; $h$ is of Legendre type; $h|_Q \in C^2(Q; \mathbb{R})$ and $\nabla^2 h(x) > 0$; $\nabla^2 h$ is locally Lipschitz.

Legendre type: $h$ is strictly convex and $C^1$ on $\operatorname{int} \operatorname{dom} h$, and for $\operatorname{int} \operatorname{dom} h \ni x_j \to x \in \partial(\operatorname{int} \operatorname{dom} h)$, $\|\nabla h(x_j)\| \to +\infty$.
2. Example of Legendre function

$h(x) = \sum_{i \in I} \theta(g_i(x))$, where $I = \{1, \ldots, p\}$, $g_i \in C^3(\mathbb{R}^n)$ concave,
$Q = \{x \in \mathbb{R}^n \mid g_i(x) > 0,\ i \in I\}$,
$\forall x \in Q$, $\operatorname{span}\{\nabla g_i(x) \mid i \in I\} = \mathbb{R}^n$, and
(i) $(0, \infty) \subseteq \operatorname{dom} \theta \subseteq [0, \infty)$, $\theta \in C^3(0, \infty)$.
(ii) $\lim_{s \to 0^+} \theta'(s) = -\infty$ and $\forall s > 0$, $\theta''(s) > 0$.
(iii) Either $\theta$ is nonincreasing or $\forall i \in I$, $g_i$ is affine.
3. Questions

Given $x_0 \in M = Q \cap \{x \in \mathbb{R}^n \mid Ax = b\}$:
$$\frac{dx}{dt}(t) = -\nabla_H f(x(t)), \quad t > 0, \qquad x(0) = x_0.$$

Well-posedness: global existence for all $t > 0$.
Asymptotic behavior: convergence to an equilibrium as $t \to \infty$, rate of convergence, ...

Main difficulty: singular behavior near $\partial Q$. Classical results do not apply.
3. Well-posedness: global existence

Thm. 2. The trajectory $x(t)$ is defined for all $t \ge 0$ under any of the following conditions:
$(C_1)$ $\{x \in C \mid f(x) \le f(x_0)\}$ is bounded.
$(C_2)$ (i) $\operatorname{dom} h = \overline{Q}$; (ii) $\forall y \in Q$, $\forall \gamma \in \mathbb{R}$, $\{x \in C \mid D_h(y,x) \le \gamma\}$ is bounded; (iii) $\operatorname{Argmin}_C f \ne \emptyset$ and $f$ quasiconvex.
$(C_3)$ $\exists K \ge 0$, $L \in \mathbb{R}$ such that $\forall x \in Q$, $\|H(x)^{-1}\| \le K\|x\| + L$.
3. Why Hessian metrics?

Suppose $f$ is convex, $A = 0$ and $b = 0$.

Euclidean case: $y \in \operatorname{Argmin}_C f$ $\Rightarrow$ $\forall x \in C$, $\langle \nabla f(x), x - y \rangle \ge 0$.

Set $\varphi_y(x) = \frac{1}{2}\|x - y\|^2$. Along $\dot{x} = -\nabla f(x)$:
$$\frac{d}{dt} \varphi_y(x) = \langle \nabla \varphi_y(x), \dot{x} \rangle = -\langle x - y, \nabla f(x) \rangle \le 0.$$
$\Rightarrow$ $\varphi_y(x)$ is a Lyapunov function for the gradient flow.
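The Lyapunov property can be observed on a discretized flow; the convex test function and minimizer below are illustrative assumptions.

```python
import numpy as np

# Sketch (f is an illustrative strongly convex function): along the discretized
# gradient flow, phi_y(x) = 0.5 * ||x - y||^2 with y the minimizer is
# nonincreasing, i.e. it acts as a Lyapunov function.
grad = lambda x: np.exp(x) + x           # f(x) = sum_i exp(x_i) + 0.5 * x_i^2
y = np.full(3, -0.5671432904097838)      # componentwise root of exp(s) + s = 0
x = np.array([1.0, -2.0, 0.5])
dt = 1e-2
dists = []
for _ in range(1000):
    dists.append(0.5 * np.sum((x - y) ** 2))
    x = x - dt * grad(x)
monotone = all(b <= a + 1e-12 for a, b in zip(dists, dists[1:]))
```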
3. Characterization of Hessian metrics

Riemannian case: $y \in \operatorname{Argmin}_{\overline{Q}} f$ $\Rightarrow$ $\forall x \in Q$, $(\nabla_H f(x), x - y)_x \ge 0$?

Thm. 1. $H \in C^1(Q; S^n_{++})$ satisfies
$$\forall y \in Q,\ \exists \varphi_y \in C^1(Q; \mathbb{R}),\ \nabla_H \varphi_y(x) = x - y$$
$\iff$ $\exists h \in C^3(Q)$ such that $H = \nabla^2 h$ on $Q$, and then
$$\varphi_y(x) = D_h(y, x) = h(y) - h(x) - \langle \nabla h(x), y - x \rangle$$
= Bregman pseudo-distance induced by $h$.
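For the entropy metric this identity can be checked by hand; the random points below are illustrative.

```python
import numpy as np

# Sketch: with h(x) = sum x_i log x_i - x_i (so H = Hess h = diag(1/x_i)),
# grad_x D_h(y, x) = -Hess h(x) @ (y - x) = -(y - x)/x, and applying the inverse
# metric H(x)^{-1} = diag(x_i) gives exactly x - y (case A = 0 of Thm. 1).
rng = np.random.default_rng(2)
x = rng.uniform(0.5, 2.0, size=4)
y = rng.uniform(0.5, 2.0, size=4)
grad_phi = -(y - x) / x                  # Euclidean gradient in x of D_h(y, x)
grad_H_phi = x * grad_phi                # H(x)^{-1} grad_phi with H = diag(1/x_i)
```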
3. Implicit proximal iteration

$$x_{k+1} \in \operatorname{Argmin}\left\{ f(x) + \frac{1}{\lambda_k} D_h(x, x_k) \,\middle|\, Ax = b \right\},$$
$$-\frac{1}{\lambda_k}[\nabla h(x_{k+1}) - \nabla h(x_k)] \in \partial f(x_{k+1}) + \operatorname{Im} A^T, \quad Ax_{k+1} = b.$$
Bregman 67, Censor-Zenios 92, Teboulle 92, Eckstein 93, Kiwiel 97, ...

But
$$\frac{dx}{dt} = -\nabla_H f(x) \iff -\frac{d}{dt} \nabla h(x) \in \nabla f(x) + \operatorname{Im} A^T,\ Ax(t) = b,\ t \ge 0.$$
This link was already noticed by Iusem-Svaiter-Da Cruz Neto 99, together with convergence results for a linear objective function.
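For a linear objective on the simplex with the entropy $h$, the implicit proximal step has a closed form (the entropic mirror update); the cost vector, stepsize and starting point below are illustrative assumptions.

```python
import numpy as np

# Sketch: for f(x) = <c, x> on the simplex (A = (1,...,1), b = 1) and entropy h,
# the implicit Bregman proximal step solves in closed form:
# x_{k+1} proportional to x_k * exp(-lambda_k * c).
c = np.array([3.0, 1.0, 2.0])
x = np.full(3, 1/3)
lam = 0.5
for _ in range(100):
    w = x * np.exp(-lam * c)             # unnormalized implicit prox solution
    x = w / w.sum()                      # normalization enforces Ax = b
# iterates stay in the simplex and concentrate on argmin_i c_i
```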
3. Convergence: Bregman functions

A Legendre function $h$ with $\operatorname{dom} h = \overline{Q}$ is of Bregman type if:
(i) $\forall y \in \overline{Q}$, $\forall \gamma \in \mathbb{R}$, $\{x \in Q \mid D_h(y,x) \le \gamma\}$ is bounded.
(ii) $\forall y \in \overline{Q}$, $\forall y_j \to y$ with $y_j \in Q$, $D_h(y, y_j) \to 0$.

Thm. 3. Suppose $(H_0)$ with $h$ of Bregman type, $f$ quasiconvex and $\operatorname{Argmin}_C f \ne \emptyset$. Then there exists $x_\infty \in C$ such that $x(t) \to x_\infty$ as $t \to +\infty$, with
$$-\nabla f(x_\infty) \in N_{\overline{Q}}(x_\infty) + (\operatorname{Ker} A)^\perp,$$
where $N_{\overline{Q}}(x_\infty)$ is the normal cone to $\overline{Q}$ at $x_\infty$.
3. Examples on $\Delta_{n-1}$

Boltzmann-Shannon entropy: $h(x) = \sum_{i=1}^n x_i \log(x_i) - x_i$.
Shahshahani metric: $H(x) = \nabla^2 h(x) = \operatorname{diag}(1/x_1, \ldots, 1/x_n)$.
Kullback-Leibler divergence: $D_h(y,x) = \sum_{i=1}^n y_i \log(y_i/x_i) + x_i - y_i$.

[Figure: Lotka-Volterra type flow on the simplex, trajectory from $x(0) = (1/4, 1/4, 1/2)$, origin $O = (0,0,0)$, with $h(x) = x \log x$.]
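That the Bregman pseudo-distance of the entropy is the Kullback-Leibler divergence can be verified directly; the probability vectors below are illustrative.

```python
import numpy as np

# Sketch: the Bregman pseudo-distance of h(x) = sum x_i log x_i - x_i coincides
# with the Kullback-Leibler divergence sum y_i log(y_i/x_i) + x_i - y_i.
h = lambda x: np.sum(x * np.log(x) - x)

def bregman(y, x):
    return h(y) - h(x) - np.log(x) @ (y - x)   # grad h(x) = log x

def kl(y, x):
    return np.sum(y * np.log(y / x) + x - y)

y = np.array([0.2, 0.3, 0.5])
x = np.array([0.4, 0.4, 0.2])
```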
3. Other examples

$h(x) = -2\sum_{i=1}^n \sqrt{x_i}$:
$H(x) = \nabla^2 h(x) = \frac{1}{2}\operatorname{diag}(1/x_1^{3/2}, \ldots, 1/x_n^{3/2})$.
$D_h(y,x) = \sum_{i=1}^n (\sqrt{y_i} - \sqrt{x_i})^2 / \sqrt{x_i}$.
Flow: $\frac{dx_i}{dt} = -2 x_i^{3/2} \left( \frac{\partial f}{\partial x_i} - \frac{\sum_{j=1}^n x_j^{3/2} \frac{\partial f}{\partial x_j}}{\sum_{k=1}^n x_k^{3/2}} \right)$.

$h(x) = -\sum_{i=1}^n \log(x_i)$ ($h(0) = +\infty$, so that $h$ is not Bregman):
$H(x) = \nabla^2 h(x) = \operatorname{diag}(1/x_1^2, \ldots, 1/x_n^2)$.
$D_h(y,x) = \sum_{i=1}^n \log(x_i/y_i) + (y_i - x_i)/x_i$.
Flow: $\frac{dx_i}{dt} = -x_i^2 \left( \frac{\partial f}{\partial x_i} - \frac{\sum_{j=1}^n x_j^2 \frac{\partial f}{\partial x_j}}{\sum_{k=1}^n x_k^2} \right)$.
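Both flows share the same tangency structure, which can be checked numerically; the point and stand-in gradient below are random illustrative data.

```python
import numpy as np

# Sketch: both flows above have the form
# dx_i/dt = -const * x_i^p (g_i - sum_j x_j^p g_j / sum_k x_k^p), p = 3/2 or 2,
# with g = grad f; their components always sum to zero, so the vector field is
# tangent to the simplex.
def simplex_flow(x, g, p):
    w = x ** p
    return -w * (g - (w @ g) / w.sum())

rng = np.random.default_rng(3)
x = rng.uniform(0.1, 1.0, size=5)
g = rng.normal(size=5)                    # plays the role of grad f(x)
v32 = simplex_flow(x, g, 1.5)
v2 = simplex_flow(x, g, 2.0)
```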
3. Further developments

Rate of convergence.
Dual trajectory and its convergence.
Geodesic-type characterization of trajectories.
Connections with completely integrable Hamiltonian systems.

Reference: A.-Bolte-Brahic, to appear in SIAM J. on Control Optim.

Continuous version of similar results for proximal iterations: Iusem-Monteiro 00.
Generalizations of results on the log-metric in linear programming: Bayer-Lagarias 89.
4. Duality when $Q = \mathbb{R}^n_{++}$

(P) $\min\{f(x) \mid x \ge 0,\ Ax = b\}$

Assume: $f$ is convex and $S(P) \ne \emptyset$; $\exists x_0 \in \mathbb{R}^n$, $x_0 > 0$, $Ax_0 = b$.

Dual problem: (D) $\min\{p(\lambda) \mid \lambda \ge 0\}$, where $p(\lambda) = \sup\{\langle \lambda, x \rangle - f(x) \mid Ax = b\}$.

Then $S(D) = \{\lambda \in \mathbb{R}^n \mid \lambda \ge 0,\ \lambda \in \nabla f(x^*) + \operatorname{Im} A^T,\ \langle \lambda, x^* \rangle = 0\}$, where $x^*$ is any solution of (P).
4. Dual trajectory

Integrating the differential inclusion
$$-\frac{d}{dt} \nabla h(x(t)) \in \nabla f(x(t)) + \operatorname{Im} A^T, \quad t \ge 0,$$
we obtain $\lambda(t) \in c(t) + \operatorname{Im} A^T$, where
$$c(t) = \frac{1}{t} \int_0^t \nabla f(x(\tau))\, d\tau \quad \text{and} \quad \lambda(t) = \frac{1}{t}[\nabla h(x_0) - \nabla h(x(t))].$$

If $h(x) = \sum_{i=1}^n \theta(x_i)$, then $\lambda_i(t) = \frac{1}{t}[\theta'(x_{0,i}) - \theta'(x_i(t))]$.
4. Dual penalty scheme

We have: $\lambda(t) = \frac{1}{t}[\nabla h(x_0) - \nabla h(x(t))]$.
But $h$ is Legendre $\Rightarrow$ $(\nabla h)^{-1} = \nabla h^*$, with $h^*(\lambda) = \sum_{i=1}^n \theta^*(\lambda_i)$ being the Fenchel conjugate of $h$. Hence
$$x(t) = \nabla h^*(\nabla h(x_0) - t\lambda(t)).$$

Take $\bar{x}$ with $A\bar{x} = b$. Since $Ax(t) = b$, we have $\bar{x} - \nabla h^*(\nabla h(x_0) - t\lambda(t)) \in \operatorname{Ker} A$. Then $\lambda(t)$ solves
$$\min_\lambda \left\{ \langle \bar{x}, \lambda \rangle + \frac{1}{t} \sum_{i=1}^n \theta^*(\theta'(x_{0,i}) - t\lambda_i) \,\middle|\, \lambda \in c(t) + \operatorname{Im} A^T \right\}.$$
4. Dual trajectory: convergence

Example: $\theta(s) = s\log(s) - s$ $\Rightarrow$ $\theta^*(s^*) = \exp(s^*)$, $s^* \in \mathbb{R}$. Then $\lambda(t)$ solves
$$\min_\lambda \left\{ \langle \bar{x}, \lambda \rangle + \frac{1}{t} \sum_{i=1}^n x_{0,i} \exp(-t\lambda_i) \,\middle|\, \lambda \in c(t) + \operatorname{Im} A^T \right\}.$$

Convergence:
If $f(x) = \langle c, x \rangle$: $c(t) = \frac{1}{t}\int_0^t \nabla f(x(\tau))\, d\tau = c$ for all $t > 0$ $\Rightarrow$ by Cominetti-San Martin 96, Auslender et al. 97, Cominetti 00, ..., convergence to the $\theta^*$-center of $S(D)$.
Otherwise: $x(t)$ bounded $\Rightarrow$ $\nabla f(x(t)) \to \nabla f(x_\infty)$ for $x_\infty \in S(P)$ $\Rightarrow$ $c(t) \to \nabla f(x_\infty)$ $\Rightarrow$ convergence by Iusem-Monteiro 00.
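The conjugate formula $\theta^*(s^*) = \exp(s^*)$ used above can be confirmed numerically; the grid bounds are an assumption that covers the maximizer $s = \exp(s^*)$ for the tested values of $s^*$.

```python
import numpy as np

# Sketch: check theta*(s*) = exp(s*) for theta(s) = s log s - s by maximizing
# s* s - theta(s) over a fine grid (the supremum is attained at s = exp(s*)).
theta = lambda s: s * np.log(s) - s
s = np.linspace(1e-6, 20.0, 2_000_000)
conj = [np.max(s_star * s - theta(s)) for s_star in (-1.0, 0.0, 1.0, 2.0)]
```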
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationAsymptotic Convergence of the Steepest Descent Method for the Exponential Penalty in Linear Programming
Journal of Convex Analysis Volume 2 (1995), No.1/2, 145 152 Asymptotic Convergence of the Steepest Descent Method for the Exponential Penalty in Linear Programming R. Cominetti 1 Universidad de Chile,
More informationLecture Notes on Iterative Optimization Algorithms
Charles L. Byrne Department of Mathematical Sciences University of Massachusetts Lowell December 8, 2014 Lecture Notes on Iterative Optimization Algorithms Contents Preface vii 1 Overview and Examples
More informationOn the Convergence of the Entropy-Exponential Penalty Trajectories and Generalized Proximal Point Methods in Semidefinite Optimization
On the Convergence of the Entropy-Exponential Penalty Trajectories and Generalized Proximal Point Methods in Semidefinite Optimization Ferreira, O. P. Oliveira, P. R. Silva, R. C. M. June 02, 2007 Abstract
More information5. Duality. Lagrangian
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More informationHW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.
HW1 solutions Exercise 1 (Some sets of probability distributions.) Let x be a real-valued random variable with Prob(x = a i ) = p i, i = 1,..., n, where a 1 < a 2 < < a n. Of course p R n lies in the standard
More informationSOR- and Jacobi-type Iterative Methods for Solving l 1 -l 2 Problems by Way of Fenchel Duality 1
SOR- and Jacobi-type Iterative Methods for Solving l 1 -l 2 Problems by Way of Fenchel Duality 1 Masao Fukushima 2 July 17 2010; revised February 4 2011 Abstract We present an SOR-type algorithm and a
More informationEE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1
EE 546, Univ of Washington, Spring 2012 6. Proximal mapping introduction review of conjugate functions proximal mapping Proximal mapping 6 1 Proximal mapping the proximal mapping (prox-operator) of a convex
More informationLecture 5. The Dual Cone and Dual Problem
IE 8534 1 Lecture 5. The Dual Cone and Dual Problem IE 8534 2 For a convex cone K, its dual cone is defined as K = {y x, y 0, x K}. The inner-product can be replaced by x T y if the coordinates of the
More informationMath 273a: Optimization Convex Conjugacy
Math 273a: Optimization Convex Conjugacy Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Convex conjugate (the Legendre transform) Let f be a closed proper
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationOn the Convergence and O(1/N) Complexity of a Class of Nonlinear Proximal Point Algorithms for Monotonic Variational Inequalities
STATISTICS,OPTIMIZATION AND INFORMATION COMPUTING Stat., Optim. Inf. Comput., Vol. 2, June 204, pp 05 3. Published online in International Academic Press (www.iapress.org) On the Convergence and O(/N)
More informationRegularity of solutions to Hamilton-Jacobi equations for Tonelli Hamiltonians
Regularity of solutions to Hamilton-Jacobi equations for Tonelli Hamiltonians Université Nice Sophia Antipolis & Institut Universitaire de France Nonlinear Analysis and Optimization Royal Society, London,
More informationLEARNING IN CONCAVE GAMES
LEARNING IN CONCAVE GAMES P. Mertikopoulos French National Center for Scientific Research (CNRS) Laboratoire d Informatique de Grenoble GSBE ETBC seminar Maastricht, October 22, 2015 Motivation and Preliminaries
More informationLecture 1: Introduction. Outline. B9824 Foundations of Optimization. Fall Administrative matters. 2. Introduction. 3. Existence of optima
B9824 Foundations of Optimization Lecture 1: Introduction Fall 2009 Copyright 2009 Ciamac Moallemi Outline 1. Administrative matters 2. Introduction 3. Existence of optima 4. Local theory of unconstrained
More informationDually Flat Geometries in the State Space of Statistical Models
1/ 12 Dually Flat Geometries in the State Space of Statistical Models Jan Naudts Universiteit Antwerpen ECEA, November 2016 J. Naudts, Dually Flat Geometries in the State Space of Statistical Models. In
More informationDUALIZATION OF SUBGRADIENT CONDITIONS FOR OPTIMALITY
DUALIZATION OF SUBGRADIENT CONDITIONS FOR OPTIMALITY R. T. Rockafellar* Abstract. A basic relationship is derived between generalized subgradients of a given function, possibly nonsmooth and nonconvex,
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications
ELE539A: Optimization of Communication Systems Lecture 15: Semidefinite Programming, Detection and Estimation Applications Professor M. Chiang Electrical Engineering Department, Princeton University March
More informationLocal strong convexity and local Lipschitz continuity of the gradient of convex functions
Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate
More informationFast proximal gradient methods
L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient
More informationQuasi-invariant Measures on Path Space. Denis Bell University of North Florida
Quasi-invariant Measures on Path Space Denis Bell University of North Florida Transformation of measure under the flow of a vector field Let E be a vector space (or a manifold), equipped with a finite
More informationKey words. saddle-point dynamics, asymptotic convergence, convex-concave functions, proximal calculus, center manifold theory, nonsmooth dynamics
SADDLE-POINT DYNAMICS: CONDITIONS FOR ASYMPTOTIC STABILITY OF SADDLE POINTS ASHISH CHERUKURI, BAHMAN GHARESIFARD, AND JORGE CORTÉS Abstract. This paper considers continuously differentiable functions of
More informationNonsmooth optimization: conditioning, convergence, and semi-algebraic models
Nonsmooth optimization: conditioning, convergence, and semi-algebraic models Adrian Lewis ORIE Cornell International Congress of Mathematicians Seoul, August 2014 1/16 Outline I Optimization and inverse
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationIntroduction to Information Geometry
Introduction to Information Geometry based on the book Methods of Information Geometry written by Shun-Ichi Amari and Hiroshi Nagaoka Yunshu Liu 2012-02-17 Outline 1 Introduction to differential geometry
More informationNon-smooth Non-convex Bregman Minimization: Unification and new Algorithms
Non-smooth Non-convex Bregman Minimization: Unification and new Algorithms Peter Ochs, Jalal Fadili, and Thomas Brox Saarland University, Saarbrücken, Germany Normandie Univ, ENSICAEN, CNRS, GREYC, France
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationPrimal-Dual Symmetric Interior-Point Methods from SDP to Hyperbolic Cone Programming and Beyond
Primal-Dual Symmetric Interior-Point Methods from SDP to Hyperbolic Cone Programming and Beyond Tor Myklebust Levent Tunçel September 26, 2014 Convex Optimization in Conic Form (P) inf c, x A(x) = b, x
More information