Interpolation-Based Trust-Region Methods for DFO


Luis Nunes Vicente, University of Coimbra (joint work with A. Bandeira, A. R. Conn, S. Gratton, and K. Scheinberg). July 27, 2010, ICCOPT, Santiago.

Why Derivative-Free Optimization?

Some of the reasons to apply derivative-free optimization are the following:
- Nowadays, computer hardware and mathematical algorithms allow increasingly large simulations.
- Functions are noisy (one cannot trust derivatives or approximate them by finite differences).
- Binary codes (source code not available) and random simulations make automatic differentiation impossible to apply.
- Legacy codes (written in the past and not maintained by the original authors).
- Lack of sophistication of the user (users need improvement but want to use something simple).

Limitations of Derivative-Free Optimization

In DFO, convergence/stopping is typically slow (per function evaluation).

Recent advances in direct-search methods

For a recent talk on Direct Search (8th EUROPT, 2010) see:
- Ana Luisa Custodio, Talk WA04, 10:30.
- Ismael Vaz, Talk WB04, 13:30.

The book!

A. R. Conn, K. Scheinberg, and L. N. Vicente, Introduction to Derivative-Free Optimization, MPS-SIAM Series on Optimization, SIAM, Philadelphia, 2009.

Model-based trust-region methods

One typically minimizes a model m in a trust region B_p(x; Δ):

Trust-region subproblem:  min_{y ∈ B_p(x; Δ)} m(y)

In derivative-based optimization, one could use:
- 1st order Taylor: m(y) = f(x) + ∇f(x)^T (y − x)
- A quadratic model with Hessian approximation H: m(y) = f(x) + ∇f(x)^T (y − x) + ½ (y − x)^T H (y − x)
- 2nd order Taylor: m(y) = f(x) + ∇f(x)^T (y − x) + ½ (y − x)^T ∇²f(x) (y − x)

Fully linear models

Given a point x and a trust-region radius Δ, a model m(y) around x is called fully linear if
- It is C¹ with Lipschitz continuous gradient.
- The following error bounds hold:
    ‖∇f(y) − ∇m(y)‖ ≤ κ_eg Δ,   for all y ∈ B(x; Δ),
    |f(y) − m(y)| ≤ κ_ef Δ²,   for all y ∈ B(x; Δ).
- For a class of fully linear models, the (unknown) constants κ_ef, κ_eg > 0 must be independent of x and Δ.

Fully linear models can be quadratic (or even nonlinear).
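As a quick numerical illustration (a sketch with a made-up smooth function, not taken from the talk), the 1st-order Taylor model exhibits exactly this fully linear behavior: the function error shrinks like Δ² and the gradient error like Δ.

```python
import numpy as np

# Sketch: check that the 1st-order Taylor model of a smooth f behaves like a
# fully linear model, i.e. |f(y) - m(y)| = O(Delta^2) and
# ||grad f(y) - grad m(y)|| = O(Delta) on B(x; Delta).
f = lambda y: np.exp(y[0]) + np.sin(y[1])
grad = lambda y: np.array([np.exp(y[0]), np.cos(y[1])])

x = np.array([0.3, -0.2])
for Delta in [1.0, 0.1, 0.01]:
    y = x + Delta * np.array([0.6, -0.8])      # a point on the boundary of B(x; Delta)
    m_y = f(x) + grad(x) @ (y - x)             # 1st-order Taylor model at y
    # The model gradient is constant and equal to grad f(x).
    print(Delta, abs(f(y) - m_y), np.linalg.norm(grad(y) - grad(x)))
```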

Fully quadratic models

Given a point x and a trust-region radius Δ, a model m(y) around x is called fully quadratic if
- It is C² with Lipschitz continuous Hessian.
- The following error bounds hold:
    ‖∇²f(y) − ∇²m(y)‖ ≤ κ_eh Δ,   for all y ∈ B(x; Δ),
    ‖∇f(y) − ∇m(y)‖ ≤ κ_eg Δ²,   for all y ∈ B(x; Δ),
    |f(y) − m(y)| ≤ κ_ef Δ³,   for all y ∈ B(x; Δ).
- For a class of fully quadratic models, the (unknown) constants κ_ef, κ_eg, κ_eh > 0 must be independent of x and Δ.

Fully quadratic models are only necessary for global convergence to 2nd order stationary points.

Polynomial interpolation models

Given a sample set Y = {y^0, y^1, ..., y^p}, a polynomial basis φ, and a polynomial model m(y) = α^T φ(y), the interpolating conditions form the linear system

  M(φ, Y) α = f(Y),

where

  M(φ, Y) = [ φ_0(y^0)  φ_1(y^0)  ...  φ_p(y^0) ]
            [ φ_0(y^1)  φ_1(y^1)  ...  φ_p(y^1) ]
            [    ...       ...    ...     ...   ]
            [ φ_0(y^p)  φ_1(y^p)  ...  φ_p(y^p) ]

and f(Y) = ( f(y^0), f(y^1), ..., f(y^p) )^T.

Natural/canonical basis

The natural/canonical basis appears in a Taylor expansion and is given by:

  φ̄ = {1, y_1, ..., y_n, ½y_1², ..., ½y_n², y_1 y_2, ..., y_{n−1} y_n}.

Under appropriate smoothness, the second order Taylor model (for n = 2), centered at 0, is:

  f(0)[1] + ∂f/∂x_1(0)[y_1] + ∂f/∂x_2(0)[y_2] + ∂²f/∂x_1²(0)[y_1²/2] + ∂²f/∂x_1∂x_2(0)[y_1 y_2] + ∂²f/∂x_2²(0)[y_2²/2].
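As a concrete illustration of the interpolation system M(φ̄, Y)α = f(Y), here is a minimal Python sketch for n = 2 using the natural basis; the sample points and test function are made up for the example.

```python
import numpy as np

def natural_basis_2d(y):
    """Natural/canonical basis for quadratics in n = 2 variables:
    {1, y1, y2, y1^2/2, y2^2/2, y1*y2}."""
    y1, y2 = y
    return np.array([1.0, y1, y2, 0.5 * y1**2, 0.5 * y2**2, y1 * y2])

def interpolation_model(Y, fY):
    """Solve M(phi_bar, Y) alpha = f(Y) for the coefficients of a determined
    quadratic interpolation model (needs (n+1)(n+2)/2 = 6 points for n = 2)."""
    M = np.array([natural_basis_2d(y) for y in Y])
    alpha = np.linalg.solve(M, fY)
    return lambda y: alpha @ natural_basis_2d(y)

# Example: interpolate f(y) = y1^2 + y1*y2 + 3*y2 on 6 well-spread sample points.
f = lambda y: y[0]**2 + y[0] * y[1] + 3 * y[1]
Y = [(0, 0), (1, 0), (0, 1), (1, 1), (-1, 0), (0, -1)]
m = interpolation_model(Y, np.array([f(y) for y in Y]))
print(m((0.5, 0.5)), f((0.5, 0.5)))   # the quadratic model reproduces f exactly
```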

Well poisedness (Λ-poisedness)

Λ is a poisedness constant related to the geometry of Y.
- The original definition of Λ-poisedness says that the maximum absolute value of the Lagrange polynomials in B(x; Δ) is bounded by Λ.
- An equivalent definition of Λ-poisedness (in the square case |Y| = |α|) is ‖M(φ̄, Y_scaled)^{-1}‖ ≤ Λ, with Y_scaled obtained from Y such that Y_scaled ⊂ B(0; 1).
- Non-square cases are defined analogously (IDFO).
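A rough numerical illustration of the equivalent definition (a sketch with illustrative function names, using a plain linear basis rather than the scaled natural basis φ̄): shift and scale Y into B(0; 1), form the interpolation matrix, and look at the norm of its inverse. A nearly degenerate sample set gives a huge value.

```python
import numpy as np

def linear_basis_2d(y):
    # Linear polynomial basis {1, y1, y2} in two variables.
    return np.array([1.0, y[0], y[1]])

def poisedness_measure(Y, basis, center, radius):
    """Lambda-like measure via the equivalent definition: scale Y into
    B(0; 1), build M(basis, Y_scaled), and return ||M^{-1}||_2.
    Small values indicate a well-poised set; huge values a badly poised one."""
    Ys = [(np.asarray(y, float) - center) / radius for y in Y]
    M = np.array([basis(y) for y in Ys])
    return np.linalg.norm(np.linalg.inv(M), 2)

center, radius = np.zeros(2), 1.0
good = [(0, 0), (1, 0), (0, 1)]            # well spread in B(0; 1)
bad = [(0, 0), (1, 0), (0.999, 0.01)]      # nearly collinear
print(poisedness_measure(good, linear_basis_2d, center, radius))   # small
print(poisedness_measure(bad, linear_basis_2d, center, radius))    # huge
```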

Figure slides (examples of sample-set geometry): a badly poised set; a not so badly poised set (Λ = 440); another badly poised set; an ideal set (Λ = 1).

Quadratic interpolation models

The system M(φ, Y)α = f(Y) can be:
- Overdetermined, when |Y| > |α|. See the talk on Direct Search!
- Determined, when |Y| = |α|. For M(φ, Y) to be square one needs N = (n + 2)(n + 1)/2 evaluations of f (often too expensive). Leads to fully quadratic models when Y is well poised (the constants κ in the error bounds will depend on Λ).
- Underdetermined, when |Y| < |α|. Minimum Frobenius norm models (Powell, IDFO book). Other approaches?...

Underdetermined quadratic models

Let m be an underdetermined quadratic model (with Hessian H) built with fewer than N = O(n²) points.

Theorem (IDFO book): If Y is Λ_L-poised for linear interpolation or regression, then

  ‖∇f(y) − ∇m(y)‖ ≤ Λ_L [C_f + ‖H‖] Δ,   for all y ∈ B(x; Δ).

One should build models by minimizing the norm of H.

Minimum Frobenius norm models

Using φ̄ and separating the quadratic terms, write m(y) = α_L^T φ̄_L(y) + α_Q^T φ̄_Q(y).

Then, build models by minimizing the entries of the Hessian ("Frobenius norm"):

  min ½‖α_Q‖_2²  s.t.  M(φ̄, Y)α = f(Y).

The solution of this convex QP problem requires a linear solve with the matrix

  [ M_Q M_Q^T   M_L ]
  [   M_L^T      0  ]

where M(φ̄, Y) = [ M_L  M_Q ].
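A minimal sketch of the minimum Frobenius norm model for n = 2, solving the convex QP through the linear system above; the basis split and the sample points are illustrative choices, not from the talk.

```python
import numpy as np

def mfn_model_2d(Y, fY):
    """Minimum Frobenius norm quadratic model in n = 2 (a sketch).
    Basis split: phi_L = {1, y1, y2}, phi_Q = {y1^2/2, y2^2/2, y1*y2}.
    Solves  min 0.5*||alpha_Q||^2  s.t.  M_L alpha_L + M_Q alpha_Q = f(Y)
    via the system  [[M_Q M_Q^T, M_L], [M_L^T, 0]] [lam; alpha_L] = [f(Y); 0],
    and then alpha_Q = M_Q^T lam."""
    Y = np.asarray(Y, float)
    ML = np.column_stack([np.ones(len(Y)), Y[:, 0], Y[:, 1]])
    MQ = np.column_stack([0.5 * Y[:, 0]**2, 0.5 * Y[:, 1]**2, Y[:, 0] * Y[:, 1]])
    p, nL = ML.shape
    K = np.block([[MQ @ MQ.T, ML], [ML.T, np.zeros((nL, nL))]])
    rhs = np.concatenate([fY, np.zeros(nL)])
    sol = np.linalg.solve(K, rhs)
    lam, alpha_L = sol[:p], sol[p:]
    alpha_Q = MQ.T @ lam
    return alpha_L, alpha_Q

# 4 interpolation points (< 6 needed for determined quadratic interpolation in 2D).
f = lambda y: 1 + 2 * y[0] - y[1] + y[0]**2
Y = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
alpha_L, alpha_Q = mfn_model_2d(Y, np.array([f(y) for y in Y]))
print(alpha_L, alpha_Q)   # the model interpolates f at the 4 points
```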

Minimum Frobenius norm models

Theorem (IDFO book): If Y is Λ_F-poised in the minimum Frobenius norm sense, then ‖H‖ ≤ C_f Λ_F, where H is, again, the Hessian of the model.

Putting the two theorems together yields:

  ‖∇f(y) − ∇m(y)‖ ≤ Λ_L [C_f + C_f Λ_F] Δ,   for all y ∈ B(x; Δ).

MFN models are fully linear.

Sparsity on the Hessian

In many problems, pairs of variables have no correlation, leading to zero second order partial derivatives in f. Thus, the Hessian ∇²m(x = 0) of the model (i.e., the vector α_Q in the basis φ̄) should be sparse.

Our question

Is it possible to build fully quadratic models by quadratic underdetermined interpolation (i.e., using fewer than N = O(n²) points) in the SPARSE case?

Compressed sensing (sparse recovery)

Objective: find a sparse α subject to a highly underdetermined linear system Mα = f.

  min ‖α‖_0 = |supp(α)|  s.t.  Mα = f   is NP-hard.

  min ‖α‖_1  s.t.  Mα = f   often recovers sparse solutions.
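A minimal sketch of the l1-minimization (basis pursuit) problem above, using the standard LP reformulation and scipy's linprog; the dimensions and random data are made up for the example, and recovery of the sparse vector holds only with high probability.

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(M, f):
    """Basis pursuit sketch: solve min ||alpha||_1 s.t. M alpha = f by the
    standard LP reformulation alpha = u - v with u, v >= 0."""
    p, N = M.shape
    c = np.ones(2 * N)                       # sum(u) + sum(v) = ||alpha||_1
    A_eq = np.hstack([M, -M])                # M u - M v = f
    res = linprog(c, A_eq=A_eq, b_eq=f, bounds=[(0, None)] * (2 * N))
    u, v = res.x[:N], res.x[N:]
    return u - v

# Recover a 2-sparse vector from p = 10 random measurements in dimension N = 20.
rng = np.random.default_rng(0)
N, p = 20, 10
alpha_true = np.zeros(N); alpha_true[[3, 11]] = [1.5, -2.0]
M = rng.standard_normal((p, N)) / np.sqrt(p)
alpha = l1_minimize(M, M @ alpha_true)
print(np.round(alpha, 3))   # should be close to alpha_true (with high probability)
```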

Restricted isometry property

Definition (RIP): The RIP constant of order s of M (p × N) is the smallest δ_s such that

  (1 − δ_s)‖α‖_2² ≤ ‖Mα‖_2² ≤ (1 + δ_s)‖α‖_2²

for all s-sparse α (‖α‖_0 ≤ s).

Theorem (Candès, Tao, 2005, 2006): If ᾱ is s-sparse and 2δ_{2s} + δ_s < 1, then it can be recovered by l1-minimization:

  min ‖α‖_1  s.t.  Mα = Mᾱ,

i.e., the optimal solution α* of this problem is unique and given by α* = ᾱ.

Random matrices

It is hard to find deterministic matrices that satisfy the RIP for large s. Using Random Matrix Theory it is possible to prove the RIP for p = O(s log N):
- Matrices with Gaussian entries.
- Matrices with Bernoulli entries.
- Uniformly chosen subsets of the discrete Fourier transform.

Bounded orthonormal expansions (Rauhut)

Question: How to find a basis φ and a sample set Y such that M(φ, Y) satisfies the RIP?
- Choose orthonormal bases.
- Avoid localized functions (‖φ_i‖_∞ should be uniformly bounded).
- Select Y randomly.

Sparse orthonormal bounded expansion recovery

Theorem (Rauhut, 2010): If
- φ is orthonormal in a probability measure µ and ‖φ_i‖_∞ ≤ K,
- each point of Y is drawn independently according to µ,
- p / log p ≥ c_1 K² s (log s)² log N,

then, with high probability, for every s-sparse vector ᾱ: given noisy samples f = M(φ, Y)ᾱ + ε with ‖ε‖_2 ≤ η, let α* be the solution of

  min ‖α‖_1  s.t.  ‖M(φ, Y)α − f‖_2 ≤ η.

Then,

  ‖ᾱ − α*‖_2 ≤ (C/√p) η.

What basis do we need for sparse Hessian recovery?

Remember the second order Taylor model (for n = 2):

  f(0)[1] + ∂f/∂x_1(0)[y_1] + ∂f/∂x_2(0)[y_2] + ∂²f/∂x_1²(0)[y_1²/2] + ∂²f/∂x_1∂x_2(0)[y_1 y_2] + ∂²f/∂x_2²(0)[y_2²/2].

So, we want something like the natural/canonical basis:

  φ̄ = {1, y_1, ..., y_n, ½y_1², ..., ½y_n², y_1 y_2, ..., y_{n−1} y_n}.

An orthonormal basis for quadratics (appropriate for sparse Hessian recovery)

Proposition (Bandeira, Scheinberg, and Vicente, 2010): The following basis ψ for quadratics is orthonormal (w.r.t. the uniform measure on B_∞(0; Δ)) and satisfies ‖ψ_ι‖_∞ ≤ 3:

  ψ_0(u) = 1,
  ψ_{1,i}(u) = (√3/Δ) u_i,
  ψ_{2,ij}(u) = (3/Δ²) u_i u_j   (i ≠ j),
  ψ_{2,i}(u) = (3√5/(2Δ²)) u_i² − √5/2.

ψ is very similar to the canonical basis, and preserves the sparsity of the Hessian (at 0).
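A quick Monte Carlo sanity check (a sketch for n = 2, assuming the normalization constants written above): under the uniform measure on the box B_∞(0; Δ), the empirical Gram matrix of ψ should be close to the identity.

```python
import numpy as np

# Sketch: empirically check orthonormality of the basis psi above for n = 2,
# assuming the normalization constants given in the proposition.
rng = np.random.default_rng(1)
Delta, n_samples = 0.7, 400_000
U = rng.uniform(-Delta, Delta, size=(n_samples, 2))   # uniform on B_inf(0; Delta)

def psi(u):
    u1, u2 = u[:, 0], u[:, 1]
    return np.column_stack([
        np.ones(len(u)),                                             # psi_0
        np.sqrt(3) / Delta * u1,                                     # psi_{1,1}
        np.sqrt(3) / Delta * u2,                                     # psi_{1,2}
        3.0 / Delta**2 * u1 * u2,                                    # psi_{2,12}
        3 * np.sqrt(5) / (2 * Delta**2) * u1**2 - np.sqrt(5) / 2,    # psi_{2,1}
        3 * np.sqrt(5) / (2 * Delta**2) * u2**2 - np.sqrt(5) / 2,    # psi_{2,2}
    ])

P = psi(U)
G = P.T @ P / n_samples      # Gram matrix: should be close to the identity
print(np.round(G, 2))
```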

Hessian sparse recovery

Let us look again at

  min ‖α‖_1  s.t.  ‖M(ψ, Y)α − f‖_2 ≤ η,   where f = M(ψ, Y)ᾱ + ε.

- So, the noisy data is f = f(Y).
- What we are trying to recover is the 2nd order Taylor model ᾱ^T ψ(y).
- Thus, in ‖ε‖_2 ≤ η, one has η = O(Δ³).

Hessian sparse recovery

Theorem (Bandeira, Scheinberg, and Vicente, 2010): If
- the Hessian of f at 0 is s-sparse,
- Y is a random sample set chosen w.r.t. the uniform measure on B_∞(0; Δ),
- p / log p ≥ 9 c_1 (s + n + 1) log²(s + n + 1) log N, with N = O(n²),

then, with high probability, the quadratic q = Σ_ι α*_ι ψ_ι obtained by solving the noisy l1-minimization problem is a fully quadratic model for f (with error constants not depending on Δ).

An answer to our question

For instance, when the number of non-zeros of the Hessian is s = O(n), we are able to construct fully quadratic models with O(n log⁴ n) points. Also, we recover both the function and its sparsity structure.

Open question

Generalize the result above when minimizing only the l1-norm of the Hessian part (α_Q) rather than of the whole α. Numerical simulations have shown that such an approach is (slightly) advantageous.

Remarks

However, the Theorem only provides motivation because, in a practical approach, we:
- Solve min ‖α_Q‖_1 s.t. M(φ̄_L, Y)α_L + M(φ̄_Q, Y)α_Q = f(Y).
- Deal with small n (from the DFO setting), and the bound we obtain is asymptotic.
- Use deterministic sampling.

Interpolation-based trust-region methods

Trust-region methods for DFO typically:
- Attempt to form quadratic models (by interpolation/regression and using polynomials or radial basis functions)

    m_k(x_k + s) = f(x_k) + g_k^T s + ½ s^T H_k s

  based on (well poised) sample sets. Well poisedness ensures fully linear or fully quadratic models.
- Calculate a step s_k by approximately solving the trust-region subproblem

    min_{s ∈ B_2(x_k; Δ_k)} m_k(x_k + s).

- Set x_{k+1} to x_k + s_k (success) or to x_k (unsuccess) and update Δ_k depending on the value of

    ρ_k = [f(x_k) − f(x_k + s_k)] / [m_k(x_k) − m_k(x_k + s_k)].

- Attempt to accept steps based on simple decrease, i.e., if ρ_k > 0 ⟺ f(x_k + s_k) < f(x_k).
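To make the ratio test and simple-decrease acceptance concrete, here is a minimal Python sketch of one iteration. It is not the full IDFO algorithm (no criticality step, no poisedness management, and only a crude random subproblem "solver"); the function names and the constants eta1, gamma_dec, gamma_inc are illustrative choices.

```python
import numpy as np

def tr_iteration(f, model, x, Delta, eta1=0.1, gamma_dec=0.5, gamma_inc=2.0):
    """One simplified trust-region iteration: ratio test rho_k plus
    simple-decrease acceptance. `model(s)` returns m_k(x_k + s); the
    subproblem is solved crudely by comparing random steps of length Delta."""
    rng = np.random.default_rng(0)
    cand = rng.standard_normal((50, x.size))
    cand *= Delta / np.linalg.norm(cand, axis=1, keepdims=True)
    s = min(cand, key=model)                              # crude subproblem solver
    pred = model(np.zeros_like(x)) - model(s)             # predicted reduction
    rho = (f(x) - f(x + s)) / max(pred, 1e-16)            # ratio rho_k
    if f(x + s) < f(x):                                   # simple decrease: accept
        x = x + s
    Delta = gamma_inc * Delta if rho >= eta1 else gamma_dec * Delta
    return x, Delta

# Example on f(x) = ||x||^2, with the exact quadratic model built at each x_k.
f = lambda x: float(x @ x)
x, Delta = np.array([1.0, -2.0]), 1.0
for _ in range(5):
    model = lambda s, xk=x.copy(): f(xk) + 2 * xk @ s + float(s @ s)   # m_k(x_k + s)
    x, Delta = tr_iteration(f, model, x, Delta)
print(x, Delta)
```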

Interpolation-based trust-region methods (IDFO)

- Reduce Δ_k only if ρ_k is small and the model is FL/FQ (unsuccessful iterations).
- Accept new iterates based on simple decrease (ρ_k > 0) as long as the model is FL/FQ (acceptable iterations).
- Allow for model-improving iterations (when ρ_k is not large enough and the model is not certifiably FL/FQ). Do not reduce Δ_k.

- Incorporate a criticality step (1st or 2nd order) when the stationarity of the model is small: an internal cycle of reductions of Δ_k until the model is well poised in B(x_k; ‖g_k‖).

Behavior of the trust-region radius

Due to the criticality step, one has for successful iterations:

  f(x_k) − f(x_{k+1}) ≥ O(‖g_k‖ min{‖g_k‖, Δ_k}) ≥ O(Δ_k²).

Thus:

Theorem (Conn, Scheinberg, and Vicente, 2009): The trust-region radius converges to zero: lim_{k→+∞} Δ_k = 0.

Similar to direct-search methods, where lim inf_{k→+∞} α_k = 0.

Analysis of TR methods (1st order)

Using fully linear models:

Theorem (Conn, Scheinberg, and Vicente, 2009): If ∇f is Lipschitz continuous and f is bounded below on L(x_0), then lim_{k→+∞} ‖∇f(x_k)‖ = 0.

Valid also for simple decrease (acceptable iterations).

Analysis of TR methods (2nd order)

Using fully quadratic models:

Theorem (Conn, Scheinberg, and Vicente, 2009): If ∇²f is Lipschitz continuous and f is bounded below on L(x_0), then lim_{k→+∞} max{ ‖∇f(x_k)‖, −λ_min[∇²f(x_k)] } = 0.

Valid also for simple decrease (acceptable iterations). Going from lim inf to lim requires changing the update of Δ_k.

Sample set management

Recently, Fasano, Morales, and Nocedal (2009) suggested a one-point exchange:
- In successful iterations: Y_{k+1} = Y_k ∪ {x_k + s_k} \ {y_out}, where y_out = argmax_{y ∈ Y_k} ‖y − x_k‖_2.
- In the unsuccessful case: Y_{k+1} = Y_k ∪ {x_k + s_k} \ {y_out} if ‖y_out − x_k‖ ≥ ‖s_k‖.
- Do not perform model-improving iterations.

They observed sample sets that were not badly poised!
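A minimal sketch of this one-point exchange (the function name and the `successful` flag are illustrative; in the actual method the update is coupled with the trust-region iteration):

```python
import numpy as np

def one_point_exchange(Y, x_k, s_k, successful):
    """Sketch of the one-point sample-set exchange in the spirit of
    Fasano, Morales, and Nocedal (2009): swap the point of Y farthest
    from x_k for the trial point x_k + s_k."""
    Y = [np.asarray(y, float) for y in Y]
    xk, sk = np.asarray(x_k, float), np.asarray(s_k, float)
    i_out = max(range(len(Y)), key=lambda i: np.linalg.norm(Y[i] - xk))
    if successful or np.linalg.norm(Y[i_out] - xk) >= np.linalg.norm(sk):
        Y[i_out] = xk + sk          # replace y_out by the new trial point
    return Y

Y = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 3.0])]
print(one_point_exchange(Y, x_k=[0.0, 0.0], s_k=[0.2, 0.2], successful=True))
```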

Self-correcting geometry

Later, Scheinberg and Toint (2009) proposed:
- A self-correcting geometry approach based on one-point exchanges, globally convergent to first-order stationary points (lim inf).
- In the unsuccessful case, y_out is chosen based not only on ‖y − x_k‖_2, but also on the values of the Lagrange polynomials at x_k + s_k.
- They showed that, if Δ_k is small compared to ‖g_k‖, then the step either improves the function or the geometry/poisedness of the model.
- In their approach, model-improving iterations are not needed.
- They showed, however, that the criticality step is indeed necessary.

Figure slides (Scheinberg and Toint): behavior without the criticality step.

A practical interpolation-based trust-region method

Model building:
- If |Y_k| = N = O(n²), use determined quadratic interpolation.
- Otherwise use l1 (p = 1) or Frobenius (p = 2) minimum norm quadratic interpolation:

    min (1/p)‖α_Q‖_p^p  s.t.  M(φ̄_L, Y_scaled)α_L + M(φ̄_Q, Y_scaled)α_Q = f(Y_scaled).

Sample set update (one starts with |Y_0| = O(n)):
- If |Y_k| < N = O(n²), set Y_{k+1} = Y_k ∪ {x_k + s_k}.
- Otherwise as in Fasano et al., but with y_out = argmax ‖y − x_{k+1}‖_2.

Criticality step: if Δ_k is very small, discard points far away from the trust region.
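A minimal sketch of the l1 (p = 1) minimum norm quadratic interpolation subproblem above, written as an LP; the basis split, sample points, and test function are illustrative choices for n = 2.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_norm_quadratic(ML, MQ, fY):
    """Sketch:  min ||alpha_Q||_1  s.t.  ML alpha_L + MQ alpha_Q = f(Y),
    solved as an LP with alpha_Q = u - v, u, v >= 0, and alpha_L free."""
    p, nL = ML.shape
    nQ = MQ.shape[1]
    c = np.concatenate([np.zeros(nL), np.ones(2 * nQ)])   # cost only on |alpha_Q|
    A_eq = np.hstack([ML, MQ, -MQ])
    bounds = [(None, None)] * nL + [(0, None)] * (2 * nQ)
    res = linprog(c, A_eq=A_eq, b_eq=fY, bounds=bounds)
    alpha_L = res.x[:nL]
    alpha_Q = res.x[nL:nL + nQ] - res.x[nL + nQ:]
    return alpha_L, alpha_Q

# Natural basis split for n = 2: phi_L = (1, y1, y2), phi_Q = (y1^2/2, y2^2/2, y1*y2).
f = lambda y: 1 - y[0] + 2 * y[1] + 0.5 * y[0]**2          # sparse quadratic part
Y = np.array([(0, 0), (1, 0), (0, 1), (1, 1), (-1, 1)], float)
ML = np.column_stack([np.ones(len(Y)), Y[:, 0], Y[:, 1]])
MQ = np.column_stack([0.5 * Y[:, 0]**2, 0.5 * Y[:, 1]**2, Y[:, 0] * Y[:, 1]])
alpha_L, alpha_Q = l1_min_norm_quadratic(ML, MQ, np.array([f(y) for y in Y]))
print(np.round(alpha_L, 3), np.round(alpha_Q, 3))   # recovers the sparse alpha_Q
```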

Performance profiles

Figure slides: performance profiles (for accuracies of 10^{-4} and 10^{-6} in function values) comparing DFO-TR (l1 and Frobenius) and NEWUOA (Powell) on a test set from CUTEr (Fasano et al.).

Concluding remarks

- Optimization is a fundamental tool in Compressed Sensing. However, this work shows that CS can also be applied to Optimization.
- In a sparse scenario, we were able to construct fully quadratic models with samples of size O(n log⁴ n) instead of the classical O(n²).
- We proposed a practical DFO method (using l1-minimization) that was able to outperform state-of-the-art methods in several numerical tests (in the already tough DFO scenario where n is small).

Open questions

- Improve the efficiency of the model l1-minimization, by properly warmstarting it (currently we solve it as an LP using lipsol by Y. Zhang).
- Study the convergence properties of possibly stochastic interpolation-based trust-region methods.
- Investigate l1-minimization techniques in statistical models (like Kriging for interpolating sparse data sets), but applied to Optimization.
- Develop a globally convergent model-based trust-region method for non-smooth functions.

References

- A. Bandeira, K. Scheinberg, and L. N. Vicente, Computation of sparse low degree interpolating polynomials and their application to derivative-free optimization, in preparation.
- A. R. Conn, K. Scheinberg, and L. N. Vicente, Global convergence of general derivative-free trust-region algorithms to first and second order critical points, SIAM J. Optim., 20 (2009).
- G. Fasano, J. L. Morales, and J. Nocedal, On the geometry phase in model-based algorithms for derivative-free optimization, Optim. Methods Softw., 24 (2009).
- K. Scheinberg and Ph. L. Toint, Self-correcting geometry in model-based algorithms for derivative-free unconstrained optimization, 2009.

Other recent work

- S. Gratton, Ph. L. Toint, and A. Tröltzsch, An active-set trust-region method for derivative-free nonlinear bound-constrained optimization, Talk WA02, 11:30.
- S. Gratton and L. N. Vicente, A surrogate management framework using rigorous trust-region steps, 2010.
- M. J. D. Powell, The BOBYQA algorithm for bound constrained optimization without derivatives, 2009.
- S. M. Wild and C. Shoemaker, Global convergence of radial basis function trust region derivative-free algorithms, 2009.

Optimization 2011 (July 24-27, Portugal)


E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno February 6, / 25 (BFG. Limited memory BFGS (L-BFGS) Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden Fletcher Goldfarb

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

BOOSTERS: A Derivative-Free Algorithm Based on Radial Basis Functions

BOOSTERS: A Derivative-Free Algorithm Based on Radial Basis Functions BOOSTERS: A Derivative-Free Algorithm Based on Radial Basis Functions R. OEUVRAY Rodrigue.Oeuvay@gmail.com M. BIERLAIRE Michel.Bierlaire@epfl.ch December 21, 2005 Abstract Derivative-free optimization

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence

More information

Minimizing the Difference of L 1 and L 2 Norms with Applications

Minimizing the Difference of L 1 and L 2 Norms with Applications 1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:

More information

Uniqueness Conditions for A Class of l 0 -Minimization Problems

Uniqueness Conditions for A Class of l 0 -Minimization Problems Uniqueness Conditions for A Class of l 0 -Minimization Problems Chunlei Xu and Yun-Bin Zhao October, 03, Revised January 04 Abstract. We consider a class of l 0 -minimization problems, which is to search

More information

CoSaMP. Iterative signal recovery from incomplete and inaccurate samples. Joel A. Tropp

CoSaMP. Iterative signal recovery from incomplete and inaccurate samples. Joel A. Tropp CoSaMP Iterative signal recovery from incomplete and inaccurate samples Joel A. Tropp Applied & Computational Mathematics California Institute of Technology jtropp@acm.caltech.edu Joint with D. Needell

More information

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional

More information

Model-Based Compressive Sensing for Signal Ensembles. Marco F. Duarte Volkan Cevher Richard G. Baraniuk

Model-Based Compressive Sensing for Signal Ensembles. Marco F. Duarte Volkan Cevher Richard G. Baraniuk Model-Based Compressive Sensing for Signal Ensembles Marco F. Duarte Volkan Cevher Richard G. Baraniuk Concise Signal Structure Sparse signal: only K out of N coordinates nonzero model: union of K-dimensional

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics 10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Tutorial: Sparse Signal Recovery

Tutorial: Sparse Signal Recovery Tutorial: Sparse Signal Recovery Anna C. Gilbert Department of Mathematics University of Michigan (Sparse) Signal recovery problem signal or population length N k important Φ x = y measurements or tests:

More information

Constructing Explicit RIP Matrices and the Square-Root Bottleneck

Constructing Explicit RIP Matrices and the Square-Root Bottleneck Constructing Explicit RIP Matrices and the Square-Root Bottleneck Ryan Cinoman July 18, 2018 Ryan Cinoman Constructing Explicit RIP Matrices July 18, 2018 1 / 36 Outline 1 Introduction 2 Restricted Isometry

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

Zero-Order Methods for the Optimization of Noisy Functions. Jorge Nocedal

Zero-Order Methods for the Optimization of Noisy Functions. Jorge Nocedal Zero-Order Methods for the Optimization of Noisy Functions Jorge Nocedal Northwestern University Simons Institute, October 2017 1 Collaborators Albert Berahas Northwestern University Richard Byrd University

More information

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John

More information

On Lagrange multipliers of trust region subproblems

On Lagrange multipliers of trust region subproblems On Lagrange multipliers of trust region subproblems Ladislav Lukšan, Ctirad Matonoha, Jan Vlček Institute of Computer Science AS CR, Prague Applied Linear Algebra April 28-30, 2008 Novi Sad, Serbia Outline

More information

REPORTS IN INFORMATICS

REPORTS IN INFORMATICS REPORTS IN INFORMATICS ISSN 0333-3590 A class of Methods Combining L-BFGS and Truncated Newton Lennart Frimannslund Trond Steihaug REPORT NO 319 April 2006 Department of Informatics UNIVERSITY OF BERGEN

More information

Stochastic geometry and random matrix theory in CS

Stochastic geometry and random matrix theory in CS Stochastic geometry and random matrix theory in CS IPAM: numerical methods for continuous optimization University of Edinburgh Joint with Bah, Blanchard, Cartis, and Donoho Encoder Decoder pair - Encoder/Decoder

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell

On fast trust region methods for quadratic models with linear constraints. M.J.D. Powell DAMTP 2014/NA02 On fast trust region methods for quadratic models with linear constraints M.J.D. Powell Abstract: Quadratic models Q k (x), x R n, of the objective function F (x), x R n, are used by many

More information

Algorithms for constrained local optimization

Algorithms for constrained local optimization Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained

More information

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods

AM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality

More information

Accelerated Block-Coordinate Relaxation for Regularized Optimization

Accelerated Block-Coordinate Relaxation for Regularized Optimization Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth

More information

PDE-Constrained and Nonsmooth Optimization

PDE-Constrained and Nonsmooth Optimization Frank E. Curtis October 1, 2009 Outline PDE-Constrained Optimization Introduction Newton s method Inexactness Results Summary and future work Nonsmooth Optimization Sequential quadratic programming (SQP)

More information

Three Generalizations of Compressed Sensing

Three Generalizations of Compressed Sensing Thomas Blumensath School of Mathematics The University of Southampton June, 2010 home prev next page Compressed Sensing and beyond y = Φx + e x R N or x C N x K is K-sparse and x x K 2 is small y R M or

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Arizona State University March

More information

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad

More information