Interpolation-Based Trust-Region Methods for DFO
Luis Nunes Vicente, University of Coimbra (joint work with A. Bandeira, A. R. Conn, S. Gratton, and K. Scheinberg). July 27, 2010, ICCOPT, Santiago.
Why Derivative-Free Optimization?
Some of the reasons to apply derivative-free optimization are the following:
- Nowadays computer hardware and mathematical algorithms allow increasingly large simulations.
- Functions are noisy (one cannot trust derivatives or approximate them by finite differences).
- Binary codes (source code not available) and random simulations make automatic differentiation impossible to apply.
- Legacy codes (written in the past and not maintained by the original authors).
- Lack of sophistication of the user (users need improvement but want to use something simple).
Limitations of Derivative-Free Optimization
In DFO, convergence/stopping is typically slow (per function evaluation).
Recent advances in direct-search methods
For a recent talk on Direct Search (8th EUROPT, 2010) see:
- Ana Luisa Custodio, Talk WA04, 10:30.
- Ismael Vaz, Talk WB04, 13:30.
The book!
A. R. Conn, K. Scheinberg, and L. N. Vicente, Introduction to Derivative-Free Optimization, MPS-SIAM Series on Optimization, SIAM, Philadelphia, 2009.
Model-based trust-region methods
One typically minimizes a model m in a trust region B_p(x; Δ):
Trust-region subproblem:  min m(y)  s.t.  y ∈ B_p(x; Δ).
In derivative-based optimization, one could use:
- 1st-order Taylor:  m(y) = f(x) + ∇f(x)ᵀ(y − x).
- Quasi-Newton (with H ≈ ∇²f(x)):  m(y) = f(x) + ∇f(x)ᵀ(y − x) + (1/2)(y − x)ᵀH(y − x).
- 2nd-order Taylor:  m(y) = f(x) + ∇f(x)ᵀ(y − x) + (1/2)(y − x)ᵀ∇²f(x)(y − x).
Fully linear models
Given a point x and a trust-region radius Δ, a model m(y) around x is called fully linear if:
- It is C¹ with Lipschitz continuous gradient.
- The following error bounds hold:
  ‖∇f(y) − ∇m(y)‖ ≤ κ_eg Δ,   ∀y ∈ B(x; Δ),
  |f(y) − m(y)| ≤ κ_ef Δ²,   ∀y ∈ B(x; Δ).
For a class of fully linear models, the (unknown) constants κ_ef, κ_eg > 0 must be independent of x and Δ.
Fully linear models can be quadratic (or even nonlinear).
Fully quadratic models
Given a point x and a trust-region radius Δ, a model m(y) around x is called fully quadratic if:
- It is C² with Lipschitz continuous Hessian.
- The following error bounds hold:
  ‖∇²f(y) − ∇²m(y)‖ ≤ κ_eh Δ,   ∀y ∈ B(x; Δ),
  ‖∇f(y) − ∇m(y)‖ ≤ κ_eg Δ²,   ∀y ∈ B(x; Δ),
  |f(y) − m(y)| ≤ κ_ef Δ³,   ∀y ∈ B(x; Δ).
For a class of fully quadratic models, the (unknown) constants κ_ef, κ_eg, κ_eh > 0 must be independent of x and Δ.
Fully quadratic models are only necessary for global convergence to 2nd-order stationary points.
Polynomial interpolation models
Given a sample set Y = {y⁰, y¹, ..., y^p}, a polynomial basis φ, and a polynomial model m(y) = αᵀφ(y), the interpolating conditions form the linear system
M(φ, Y) α = f(Y),
where
M(φ, Y) = [ φ₀(y⁰) φ₁(y⁰) ... φ_p(y⁰) ; φ₀(y¹) φ₁(y¹) ... φ_p(y¹) ; ... ; φ₀(y^p) φ₁(y^p) ... φ_p(y^p) ],
f(Y) = ( f(y⁰), f(y¹), ..., f(y^p) )ᵀ.
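As a concrete illustration of the interpolation system (not from the slides: the sample points and test function below are made up), here is a minimal sketch that builds M(φ, Y) for the quadratic natural basis in n = 2 and solves the determined system for the model coefficients.

```python
import numpy as np

def phi_quad_2d(y):
    """Natural quadratic basis in n = 2: {1, y1, y2, y1^2/2, y1*y2, y2^2/2}."""
    y1, y2 = y
    return np.array([1.0, y1, y2, 0.5 * y1**2, y1 * y2, 0.5 * y2**2])

def interpolation_model(Y, fY):
    """Solve M(phi, Y) alpha = f(Y) for the model coefficients alpha."""
    M = np.array([phi_quad_2d(y) for y in Y])   # rows: phi evaluated at each y^i
    return np.linalg.solve(M, fY)

# A poised set of N = (n + 1)(n + 2)/2 = 6 points (principal lattice of degree 2).
Y = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (0, 2)]
f = lambda y: 1 + 2 * y[0] + 3 * y[1] + y[0]**2 + y[0] * y[1] + 0.5 * y[1]**2
alpha = interpolation_model(Y, np.array([f(y) for y in Y]))
# In this basis the exact coefficients of f are (1, 2, 3, 2, 1, 1).
```

Since f itself is quadratic and Y is poised, the solve reproduces its coefficients exactly.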
Natural/canonical basis
The natural/canonical basis appears in a Taylor expansion and is given by:
φ̄ = {1, y₁, ..., y_n, (1/2)y₁², ..., (1/2)y_n², y₁y₂, ..., y_{n−1}y_n}.
Under appropriate smoothness, the second-order Taylor model, centered at 0 (here for n = 2), is:
f(0)[1] + ∂f/∂x₁(0)[y₁] + ∂f/∂x₂(0)[y₂] + ∂²f/∂x₁²(0)[y₁²/2] + ∂²f/∂x₁∂x₂(0)[y₁y₂] + ∂²f/∂x₂²(0)[y₂²/2].
Well poisedness (Λ-poisedness)
Λ is a poisedness constant related to the geometry of Y.
The original definition of Λ-poisedness says that the maximum absolute value of the Lagrange polynomials in B(x; Δ) is bounded by Λ.
An equivalent definition of Λ-poisedness (in the determined case |Y| = |α|) is
‖M(φ̄, Y_scaled)⁻¹‖ ≤ Λ,
with Y_scaled obtained from Y such that Y_scaled ⊂ B(0; 1).
Non-square cases are defined analogously (IDFO).
Examples (figures omitted): a badly poised set; a not so badly poised set (Λ = 440); another badly poised set; an ideal set (Λ = 1).
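Poisedness can be estimated numerically from the matrix characterization ‖M(φ̄, Y_scaled)⁻¹‖ ≤ Λ. A sketch for linear interpolation (basis {1, y₁, y₂}); the two example sets below are made up for illustration and already scaled into the unit ball.

```python
import numpy as np

def poisedness_estimate(Y):
    """Estimate Lambda as ||M(phi, Y)^{-1}||_2 for the linear basis {1, y1, y2}.

    Y is assumed to already lie in the unit ball B(0; 1)."""
    M = np.array([[1.0, y[0], y[1]] for y in Y])
    return np.linalg.norm(np.linalg.inv(M), 2)

good = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]     # well-spread points
bad = [(0.0, 0.0), (1.0, 0.0), (0.99, 0.01)]    # nearly collinear points
lam_good = poisedness_estimate(good)
lam_bad = poisedness_estimate(bad)
# lam_bad is orders of magnitude larger than lam_good.
```

The nearly collinear set produces an almost singular M, hence a huge Λ, matching the badly poised pictures.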
Quadratic interpolation models
The system M(φ, Y)α = f(Y) can be:
- Overdetermined, when |Y| > |α|. See talk on Direct Search!
- Determined, when |Y| = |α|. For M(φ, Y) to be square one needs N = (n + 2)(n + 1)/2 evaluations of f (often too expensive). Leads to fully quadratic models when Y is well poised (the constants κ in the error bounds will depend on Λ).
- Underdetermined, when |Y| < |α|. Minimum Frobenius norm models (Powell, IDFO book). Other approaches?...
Underdetermined quadratic models
Let m be an underdetermined quadratic model (with Hessian H) built with fewer than N = O(n²) points.
Theorem (IDFO book). If Y is Λ_L-poised for linear interpolation or regression, then
‖∇f(y) − ∇m(y)‖ ≤ Λ_L [C_f + ‖H‖] Δ,   ∀y ∈ B(x; Δ).
One should build models by minimizing the norm of H.
Minimum Frobenius norm models
Using φ̄ and separating the quadratic terms, write m(y) = α_Lᵀ φ̄_L(y) + α_Qᵀ φ̄_Q(y).
Then, build models by minimizing the entries of the Hessian ("Frobenius norm"):
min (1/2)‖α_Q‖₂²  s.t.  M(φ̄, Y)α = f(Y).
The solution of this convex QP problem requires a linear solve with
[ M_Q M_Qᵀ   M_L ]
[ M_Lᵀ        0  ]
where M(φ̄, Y) = [ M_L  M_Q ].
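A minimal sketch of that solve (n = 2, with made-up sample points and test function): form M_L and M_Q, solve the KKT system above for the multipliers λ and α_L, and recover α_Q = M_Qᵀλ from the stationarity condition of the QP.

```python
import numpy as np

def mfn_model(Y, fY):
    """Minimum Frobenius norm model: min (1/2)||alpha_Q||^2 s.t. interpolation.

    phi_L = {1, y1, y2}, phi_Q = {y1^2/2, y1*y2, y2^2/2}."""
    ML = np.array([[1.0, y[0], y[1]] for y in Y])
    MQ = np.array([[0.5 * y[0]**2, y[0] * y[1], 0.5 * y[1]**2] for y in Y])
    p1, qL = ML.shape
    # KKT system: [M_Q M_Q^T  M_L; M_L^T  0] [lam; alpha_L] = [f(Y); 0].
    K = np.block([[MQ @ MQ.T, ML], [ML.T, np.zeros((qL, qL))]])
    sol = np.linalg.solve(K, np.concatenate([fY, np.zeros(qL)]))
    lam, alpha_L = sol[:p1], sol[p1:]
    alpha_Q = MQ.T @ lam          # stationarity: alpha_Q = M_Q^T lam
    return alpha_L, alpha_Q, ML, MQ

Y = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # 4 points < 6 coefficients
fY = np.array([y[0] * y[1] + y[0] for y in Y])
alpha_L, alpha_Q, ML, MQ = mfn_model(Y, fY)
residual = ML @ alpha_L + MQ @ alpha_Q - fY            # interpolation conditions
```

Even though the system is underdetermined, the model interpolates f on all of Y while keeping the Hessian entries small.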
Minimum Frobenius norm models
Theorem (IDFO book). If Y is Λ_F-poised in the minimum Frobenius norm sense, then
‖H‖ ≤ C_f Λ_F,
where H is, again, the Hessian of the model.
Putting the two theorems together yields:
‖∇f(y) − ∇m(y)‖ ≤ Λ_L [C_f + C_f Λ_F] Δ,   ∀y ∈ B(x; Δ).
MFN models are fully linear.
Sparsity in the Hessian
In many problems, pairs of variables have no correlation, leading to zero second-order partial derivatives in f. Thus, the Hessian ∇²m(0) of the model (i.e., the vector α_Q in the basis φ̄) should be sparse.
Our question
Is it possible to build fully quadratic models by quadratic underdetermined interpolation (i.e., using fewer than N = O(n²) points) in the SPARSE case?
Compressed sensing / sparse recovery
Objective: find a sparse α subject to a highly underdetermined linear system Mα = f.
- min ‖α‖₀ = |supp(α)|  s.t.  Mα = f   is NP-hard.
- min ‖α‖₁  s.t.  Mα = f   often recovers sparse solutions.
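The ℓ₁ problem is a linear program: writing α = u − v with u, v ≥ 0 turns min ‖α‖₁ s.t. Mα = f into min Σ(uᵢ + vᵢ) subject to M(u − v) = f. A sketch using SciPy's linprog; the dimensions, seed, and sparse ground truth below are made up for the example.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(M, b):
    """min ||alpha||_1 s.t. M alpha = b, via the standard LP reformulation."""
    p, N = M.shape
    c = np.ones(2 * N)                    # objective: sum(u) + sum(v)
    A_eq = np.hstack([M, -M])             # M u - M v = b
    res = linprog(c, A_eq=A_eq, b_eq=b)   # default bounds give u, v >= 0
    assert res.success
    return res.x[:N] - res.x[N:]          # alpha = u - v

rng = np.random.default_rng(0)
N, p = 20, 10
abar = np.zeros(N)
abar[3], abar[11] = 1.5, -2.0             # 2-sparse ground truth
M = rng.standard_normal((p, N))           # Gaussian measurements (see next slides)
alpha = l1_min(M, M @ abar)
# alpha is feasible, and its l1 norm is no larger than that of abar.
```

With Gaussian M and s small relative to p, the ℓ₁ minimizer typically coincides with the sparse ᾱ, which is exactly the phenomenon the RIP theory below explains.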
Restricted isometry property
Definition (RIP). The RIP constant of order s of M (p × N) is the smallest δ_s such that
(1 − δ_s)‖α‖₂² ≤ ‖Mα‖₂² ≤ (1 + δ_s)‖α‖₂²
for all s-sparse α (‖α‖₀ ≤ s).
Theorem (Candès and Tao, 2005, 2006). If ᾱ is s-sparse and 2δ_{2s} + δ_s < 1, then it can be recovered by ℓ₁-minimization:
min ‖α‖₁  s.t.  Mα = Mᾱ,
i.e., the optimal solution α* of this problem is unique and given by α* = ᾱ.
Random matrices
It is hard to find deterministic matrices that satisfy the RIP for large s. Using random matrix theory it is possible to prove the RIP for p = O(s log N):
- Matrices with Gaussian entries.
- Matrices with Bernoulli entries.
- Uniformly chosen subsets of the discrete Fourier transform.
Bounded orthonormal expansions (Rauhut)
Question: how to find a basis φ and a sample set Y such that M(φ, Y) satisfies the RIP?
- Choose orthonormal bases.
- Avoid localized functions (‖φ_i‖_{L∞} should be uniformly bounded).
- Select Y randomly.
Sparse orthonormal bounded expansion recovery
Theorem (Rauhut, 2010). If
- φ is orthonormal in a probability measure µ and ‖φ_i‖_{L∞} ≤ K,
- each point of Y is drawn independently according to µ,
- p / log p ≥ c₁ K² s (log s)² log N,
then, with high probability, the following holds for every s-sparse vector ᾱ: given noisy samples f = M(φ, Y)ᾱ + ε with ‖ε‖₂ ≤ η, let α* be the solution of
min ‖α‖₁  s.t.  ‖M(φ, Y)α − f‖₂ ≤ η.
Then ‖ᾱ − α*‖₂ ≤ (C/√p) η.
What basis do we need for sparse Hessian recovery?
Remember the second-order Taylor model (for n = 2):
f(0)[1] + ∂f/∂x₁(0)[y₁] + ∂f/∂x₂(0)[y₂] + ∂²f/∂x₁²(0)[y₁²/2] + ∂²f/∂x₁∂x₂(0)[y₁y₂] + ∂²f/∂x₂²(0)[y₂²/2].
So, we want something like the natural/canonical basis:
φ̄ = {1, y₁, ..., y_n, (1/2)y₁², ..., (1/2)y_n², y₁y₂, ..., y_{n−1}y_n}.
An orthonormal basis for quadratics (appropriate for sparse Hessian recovery)
Proposition (Bandeira, Scheinberg, and Vicente, 2010). The following basis ψ for quadratics is orthonormal (w.r.t. the uniform measure on B∞(0; Δ)) and satisfies ‖ψ_ι‖_{L∞} ≤ 3:
ψ₀(u) = 1,
ψ_{1,i}(u) = (√3/Δ) u_i,
ψ_{2,ij}(u) = (3/Δ²) u_i u_j   (i < j),
ψ_{2,i}(u) = (3√5/(2Δ²)) u_i² − √5/2.
ψ is very similar to the canonical basis, and preserves the sparsity of the Hessian (at 0).
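A quick Monte Carlo sanity check of this basis in n = 2. Caveat: the explicit normalization constants used below (√3/Δ, 3/Δ², (√5/2)(3u_i²/Δ² − 1)) are a reconstruction from a garbled transcription, so treat them as assumptions; the check is that the empirical Gram matrix under the uniform measure on B∞(0; Δ) is close to the identity.

```python
import numpy as np

n, Delta = 2, 0.5
rng = np.random.default_rng(0)
U = rng.uniform(-Delta, Delta, size=(400_000, n))   # uniform on B_inf(0; Delta)
T = U / Delta                                       # rescale to [-1, 1]^n

cols = [np.ones(len(U))]                            # psi_0
cols += [np.sqrt(3) * T[:, i] for i in range(n)]    # psi_{1,i}
cols += [3 * T[:, i] * T[:, j]                      # psi_{2,ij}, i < j
         for i in range(n) for j in range(i + 1, n)]
cols += [(np.sqrt(5) / 2) * (3 * T[:, i]**2 - 1)    # psi_{2,i}
         for i in range(n)]
Psi = np.column_stack(cols)

gram = Psi.T @ Psi / len(U)   # Monte Carlo estimate of E[psi_a psi_b]
# gram is approximately the 6 x 6 identity matrix.
```

The diagonal entries estimate the squared L² norms (all 1) and the off-diagonal entries the pairwise inner products (all 0), consistent with orthonormality.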
Hessian sparse recovery
Let us look again at
min ‖α‖₁  s.t.  ‖M(ψ, Y)α − f‖₂ ≤ η,   where f = M(ψ, Y)ᾱ + ε.
- So, the noisy data is f = f(Y).
- What we are trying to recover is the 2nd-order Taylor model ᾱᵀψ(y).
- Thus, in ‖ε‖₂ ≤ η, one has η = O(Δ³).
Hessian sparse recovery
Theorem (Bandeira, Scheinberg, and Vicente, 2010). If
- the Hessian of f at 0 is s-sparse,
- Y is a random sample set chosen w.r.t. the uniform measure on B∞(0; Δ),
- p / log p ≥ 9 c₁ (s + n + 1) log²(s + n + 1) log O(n²),
then, with high probability, the quadratic q = Σ_ι α*_ι ψ_ι obtained by solving the noisy ℓ₁-minimization problem is a fully quadratic model for f (with error constants not depending on Δ).
An answer to our question
For instance, when the number of non-zeros of the Hessian is s = O(n), we are able to construct fully quadratic models with O(n log⁴ n) points. Also, we recover both the function and its sparsity structure.
Open question
Generalize the result above when minimizing only the ℓ₁-norm of the Hessian part (α_Q) rather than of the whole α. Numerical simulations have shown that such an approach is (slightly) advantageous.
Remarks
However, the theorem only provides motivation because, in a practical approach, we:
- Solve min ‖α_Q‖₁  s.t.  M(φ̄_L, Y)α_L + M(φ̄_Q, Y)α_Q = f(Y).
- Deal with small n (from the DFO setting), while the bound we obtain is asymptotic.
- Use deterministic sampling.
Interpolation-based trust-region methods
Trust-region methods for DFO typically:
- Attempt to form quadratic models (by interpolation/regression, using polynomials or radial basis functions)
  m_k(x_k + s) = f(x_k) + g_kᵀ s + (1/2) sᵀ H_k s
  based on (well poised) sample sets. Well poisedness ensures fully linear or fully quadratic models.
- Calculate a step s_k by approximately solving the trust-region subproblem
  min m_k(x_k + s)  s.t.  s ∈ B₂(x_k; Δ_k).
- Set x_{k+1} to x_k + s_k (success) or to x_k (unsuccess) and update Δ_k depending on the value of
  ρ_k = [f(x_k) − f(x_k + s_k)] / [m_k(x_k) − m_k(x_k + s_k)].
- Attempt to accept steps based on simple decrease, i.e., accept if ρ_k > 0 ⟺ f(x_k + s_k) < f(x_k).
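The ρ-test and radius update can be sketched as follows. For illustration only, the model here is the exact quadratic of f(x) = ‖x‖² and the step is the model minimizer in the ball; in DFO the model would instead come from interpolation, with the FL/FQ safeguards, model-improving iterations, and criticality step discussed next.

```python
import numpy as np

def tr_sketch(x0, delta0=1.0, eta=0.1, max_iter=50):
    """Bare-bones trust-region loop: step, rho test, accept/reject, radius update."""
    f = lambda z: z @ z
    x = np.asarray(x0, dtype=float)
    delta = delta0
    for _ in range(max_iter):
        if np.linalg.norm(x) < 1e-12:
            break                         # model gradient ~ 0: stop
        # Minimizer of the (here exact) model within B_2(x; delta): move toward 0.
        s = -x if np.linalg.norm(x) <= delta else -delta * x / np.linalg.norm(x)
        pred = f(x) - f(x + s)            # predicted decrease (= actual, model exact)
        rho = (f(x) - f(x + s)) / pred    # rho_k as on the slide
        if rho > eta:
            x, delta = x + s, 2 * delta   # success: accept the step, expand
        else:
            delta *= 0.5                  # unsuccess: reject, shrink
    return x

x_star = tr_sketch([2.0, 1.0])
```

With an exact model ρ_k = 1 on every iteration, so the loop accepts each step and drives x to the minimizer.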
Interpolation-based trust-region methods (IDFO)
- Reduce Δ_k only if ρ_k is small and the model is FL/FQ: unsuccessful iterations.
- Accept new iterates based on simple decrease (ρ_k > 0) as long as the model is FL/FQ: acceptable iterations.
- Allow for model-improving iterations (when ρ_k is not large enough and the model is not certifiably FL/FQ). Do not reduce Δ_k.
- Incorporate a criticality step (1st or 2nd order) when the stationarity of the model is small: an internal cycle of reductions of Δ_k until the model is well poised in B(x_k; ‖g_k‖).
Behavior of the trust-region radius
Due to the criticality step, one has for successful iterations:
f(x_k) − f(x_{k+1}) ≥ O(‖g_k‖ min{‖g_k‖, Δ_k}) ≥ O(Δ_k²).
Thus:
Theorem (Conn, Scheinberg, and Vicente, 2009). The trust-region radius converges to zero: lim_{k→+∞} Δ_k = 0.
Similar to direct-search methods, where lim inf_{k→+∞} α_k = 0.
Analysis of TR methods (1st order)
Using fully linear models:
Theorem (Conn, Scheinberg, and Vicente, 2009). If ∇f is Lipschitz continuous and f is bounded below on L(x₀), then
lim_{k→+∞} ‖∇f(x_k)‖ = 0.
Valid also for simple decrease (acceptable iterations).
Analysis of TR methods (2nd order)
Using fully quadratic models:
Theorem (Conn, Scheinberg, and Vicente, 2009). If ∇²f is Lipschitz continuous and f is bounded below on L(x₀), then
lim_{k→+∞} max{ ‖∇f(x_k)‖, −λ_min[∇²f(x_k)] } = 0.
Valid also for simple decrease (acceptable iterations).
Going from lim inf to lim requires changing the update of Δ_k.
Sample set management
Recently, Fasano, Morales, and Nocedal (2009) suggested a one-point exchange:
- In successful iterations:
  Y_{k+1} = Y_k ∪ {x_k + s_k} \ {y_out},   where y_out = argmax_{y ∈ Y_k} ‖y − x_k‖₂.
- In the unsuccessful case:
  Y_{k+1} = Y_k ∪ {x_k + s_k} \ {y_out}   if ‖y_out − x_k‖ ≥ ‖s_k‖.
- Do not perform model-improving iterations.
- They observed sample sets that were not badly poised!
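A sketch of this exchange rule; the function name and data layout are illustrative, and the unsuccessful-case condition (exchange only if the farthest point is at least ‖s_k‖ from x_k) is read off the slide above.

```python
import numpy as np

def one_point_exchange(Y, x_k, s_k, success):
    """Fasano-Morales-Nocedal style update: swap the farthest point for x_k + s_k."""
    x_k, s_k = np.asarray(x_k, dtype=float), np.asarray(s_k, dtype=float)
    Y = [np.asarray(y, dtype=float) for y in Y]
    dists = [np.linalg.norm(y - x_k) for y in Y]
    i_out = int(np.argmax(dists))          # y_out = argmax ||y - x_k||_2
    if success or dists[i_out] >= np.linalg.norm(s_k):
        Y = Y[:i_out] + Y[i_out + 1:] + [x_k + s_k]
    return Y

Y0 = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
Y1 = one_point_exchange(Y0, [0.0, 0.0], [0.1, 0.0], success=True)
# Y1 keeps |Y| = 3 and contains the trial point [0.1, 0.0].
```

On an unsuccessful step with every sample point closer to x_k than ‖s_k‖, the set is left unchanged, so nearby geometry is not destroyed by short rejected steps.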
Self-correcting geometry
Later, Scheinberg and Toint (2009) proposed:
- A self-correcting geometry approach based on one-point exchanges, globally convergent to first-order stationary points (lim inf).
- In the unsuccessful case, y_out is selected based not only on ‖y − x_k‖₂ but also on the values of the Lagrange polynomials at x_k + s_k.
- They showed that, if Δ_k is small compared to ‖g_k‖, then the step either improves the function or the geometry/poisedness of the model.
- In their approach, model-improving iterations are not needed.
- They showed, however, that the criticality step is indeed necessary.
129 Without criticality step... (Scheinberg and Toint) 43/54
130 Without criticality step... (Scheinberg and Toint) 44/54
131 Without criticality step... (Scheinberg and Toint) 45/54
132 Without criticality step... (Scheinberg and Toint) 46/54
133 A practical interpolation-based trust-region method 47/54
Model building:
If |Y_k| = N = O(n^2), use determined quadratic interpolation.
Otherwise use l1 (p = 1) or Frobenius (p = 2) minimum-norm quadratic interpolation:
min (1/p) ||α_Q||_p^p  s.t.  M(φ_L, Y_scaled) α_L + M(φ_Q, Y_scaled) α_Q = f(Y_scaled).
Sample set update (one starts with |Y_0| = O(n)):
If |Y_k| < N = O(n^2), set Y_{k+1} = Y_k ∪ {x_k + s_k}.
Otherwise as in Fasano et al., but with y_out = argmax_{y ∈ Y_k} ||y − x_{k+1}||_2.
Criticality step: if Δ_k is very small, discard points far away from the trust region.
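The p = 2 (minimum Frobenius norm) subproblem above is an equality-constrained least-norm problem and can be sketched through its KKT system. A minimal sketch, assuming the interpolation matrices `ML` and `MQ` (mirroring M(φ_L, ·) and M(φ_Q, ·) from the slide) are given:

```python
import numpy as np

def min_frobenius_model(ML, MQ, f):
    """Sketch of minimum-norm quadratic interpolation (the p = 2 case):
        min (1/2)||alpha_Q||_2^2   s.t.   ML @ alpha_L + MQ @ alpha_Q = f.
    The KKT conditions give alpha_Q = MQ.T @ lam, leaving a linear
    system in the multipliers lam and the free linear part alpha_L."""
    m, nL = ML.shape
    K = np.block([[MQ @ MQ.T, ML],
                  [ML.T, np.zeros((nL, nL))]])
    rhs = np.concatenate([f, np.zeros(nL)])
    sol = np.linalg.solve(K, rhs)
    lam, alpha_L = sol[:m], sol[m:]
    return alpha_L, MQ.T @ lam

# 1D check: interpolating f(y) = y^2 at Y = {0, 1, 2} with basis
# phi_L = (1, y), phi_Q = (y^2/2) should recover alpha_Q = 2, alpha_L = 0.
Yv = np.array([0.0, 1.0, 2.0])
ML = np.column_stack([np.ones(3), Yv])
MQ = (Yv ** 2 / 2).reshape(-1, 1)
aL, aQ = min_frobenius_model(ML, MQ, Yv ** 2)
```

Minimizing only the quadratic block while leaving α_L free is what distinguishes this from a plain least-norm solve over all coefficients.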
140 Performance profiles (accuracy of 10 4 in function values) 48/54 Figure: Performance profiles comparing DFO-TR (l 1 and Frobenius) and NEWUOA (Powell) in a test set from CUTEr (Fasano et al.).
141 Performance profiles (accuracy of 10 6 in function values) 49/54 Figure: Performance profiles comparing DFO-TR (l 1 and Frobenius) and NEWUOA (Powell) in a test set from CUTEr (Fasano et al.).
142 Concluding remarks 50/54
Optimization is a fundamental tool in Compressed Sensing. However, this work shows that CS can also be applied to Optimization.
In a sparse scenario, we were able to construct fully quadratic models with samples of size O(n log^4 n) instead of the classical O(n^2).
We proposed a practical DFO method (using l1-minimization) that was able to outperform state-of-the-art methods in several numerical tests (in the already tough DFO scenario where n is small).
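The l1-minimization used to build these sparse models is a linear program. A minimal sketch with SciPy's `linprog` standing in for lipsol (an assumption; the talk solves the same LP with lipsol), using the standard split α_Q = u − v with u, v ≥ 0:

```python
import numpy as np
from scipy.optimize import linprog

def l1_quadratic_model(ML, MQ, f):
    """Sketch of l1 minimum-norm quadratic interpolation (p = 1 case):
        min ||alpha_Q||_1   s.t.   ML @ alpha_L + MQ @ alpha_Q = f,
    posed as an LP via alpha_Q = u - v with u, v >= 0."""
    m, nL = ML.shape
    nQ = MQ.shape[1]
    c = np.concatenate([np.zeros(nL), np.ones(2 * nQ)])     # min sum(u) + sum(v)
    A_eq = np.hstack([ML, MQ, -MQ])                         # ML aL + MQ (u - v) = f
    bounds = [(None, None)] * nL + [(0, None)] * (2 * nQ)   # alpha_L is free
    res = linprog(c, A_eq=A_eq, b_eq=f, bounds=bounds)
    alpha_L = res.x[:nL]
    alpha_Q = res.x[nL:nL + nQ] - res.x[nL + nQ:]
    return alpha_L, alpha_Q

# 2D example: f(x) = x1^2 sampled at 5 points; the quadratic block has a
# single nonzero coefficient, which the l1 solution recovers.
Y = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 0]], dtype=float)
ML = np.column_stack([np.ones(len(Y)), Y])                          # basis 1, x1, x2
MQ = np.column_stack([Y[:, 0]**2 / 2, Y[:, 0]*Y[:, 1], Y[:, 1]**2 / 2])
f = Y[:, 0] ** 2
aL, aQ = l1_quadratic_model(ML, MQ, f)
```

With fewer interpolation points than quadratic basis functions, the l1 objective promotes exactly the Hessian sparsity the concluding remarks exploit.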
145 Open questions 51/54
Improve the efficiency of the model l1-minimization by properly warmstarting it (currently we solve it as an LP using lipsol by Y. Zhang).
Study the convergence properties of possibly stochastic interpolation-based trust-region methods.
Investigate l1-minimization techniques in statistical models (like Kriging for interpolating sparse data sets), but applied to Optimization.
Develop a globally convergent model-based trust-region method for non-smooth functions.
149 References 52/54
A. Bandeira, K. Scheinberg, and L. N. Vicente, Computation of sparse low degree interpolating polynomials and their application to derivative-free optimization, in preparation.
A. R. Conn, K. Scheinberg, and L. N. Vicente, Global convergence of general derivative-free trust-region algorithms to first and second order critical points, SIAM J. Optim., 20 (2009).
G. Fasano, J. L. Morales, and J. Nocedal, On the geometry phase in model-based algorithms for derivative-free optimization, Optim. Methods Softw., 24 (2009).
K. Scheinberg and Ph. L. Toint, Self-correcting geometry in model-based algorithms for derivative-free unconstrained optimization, 2009.
150 Other recent work 53/54
S. Gratton, Ph. L. Toint, and A. Tröltzsch, An active-set trust-region method for derivative-free nonlinear bound-constrained optimization, Talk WA02 11:30.
S. Gratton and L. N. Vicente, A surrogate management framework using rigorous trust-region steps, 2010.
M. J. D. Powell, The BOBYQA algorithm for bound constrained optimization without derivatives, 2009.
S. M. Wild and C. Shoemaker, Global convergence of radial basis function trust region derivative-free algorithms, 2009.
154 Optimization 2011 (July 24 27, Portugal) 54/54
More informationThree Generalizations of Compressed Sensing
Thomas Blumensath School of Mathematics The University of Southampton June, 2010 home prev next page Compressed Sensing and beyond y = Φx + e x R N or x C N x K is K-sparse and x x K 2 is small y R M or
More informationAlgorithms for Constrained Optimization
1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic
More informationA globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications
A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More informationLarge-Scale L1-Related Minimization in Compressive Sensing and Beyond
Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Arizona State University March
More informationCompressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery
Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad
More information