Applications with $\ell_1$-Norm Objective Terms


1 Applications with $\ell_1$-Norm Objective Terms. Stephen Wright, University of Wisconsin-Madison. Huatulco, January 2007.

2 1. Formulation. 2. Least-Squares with $\ell_1$ (Applications, Algorithms, Results). 3. Logistic Regression (Application, Algorithms, Results). Based on joint work with Weiliang Shi, Grace Wahba, Rob Nowak, Mario Figueiredo.

3 Formulation. We describe two classes of applications for problems of the form
$$\min_x \; f(x) + \lambda \|x\|_1,$$
where $x \in \mathbb{R}^n$; $f$ is convex, smooth, possibly nonlinear; and $\lambda > 0$ is a regularization parameter. A special case of particular interest:
$$\min_x \; \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1.$$
Here $n$ may be very large (hence, storage and computational limitations); the $\ell_1$ norm may apply to only a subvector of $x$; and we may wish to solve for a number of $\lambda$ values. Use well-known optimization techniques, tailored to the structure and characteristics of the applications.

4 Related Formulations. Several related formulations are also of interest in applications:
$$\min_x \; \|x\|_1 \quad \text{subject to } \|Ax - y\|_2 \le t,$$
for parameter $t \ge 0$;
$$\min_x \; \|x\|_1 \quad \text{subject to } Ax = y;$$
and
$$\min_x \; \|Ax - y\|_2^2 \quad \text{subject to } \|x\|_1 \le t,$$
for $t > 0$.

5 Least Squares with $\ell_1$ Regularization: Applications. LASSO; Wavelet-based Signal Reconstruction; Compressed Sensing.

6 LASSO. The LASSO technique of Tibshirani (1996) works with the formulation
$$\min_x \; \|Ax - y\|_2^2 \quad \text{subject to } \|x\|_1 \le t,$$
for some parameter $t \ge 0$. When $t = 0$, the solution is $x = 0$. When $t \ge \|x_{LS}\|_1$, where $x_{LS}$ is the (unconstrained) least-squares solution, we have $x = x_{LS}$. The motive is variable selection: seek sparse approximate solutions of $Ax = y$, for which $x$ has relatively few nonzeros. (In general, smaller $t$ implies fewer nonzeros.) Once these variables are identified, solve a reduced least-squares problem in which only these variables are allowed to be nonzero.

7 Often we want to find the path of solutions for $t \in [0, \|x_{LS}\|_1]$, or at least the solutions for a sample of $t$ values along this path.

8 Wavelet-based Signal Reconstruction. The problem has the form $Ax \approx y$, where $A = RW$: $x$ is the vector of coefficients for the unknown image or signal; $W$ is a wavelet basis (multiplication by $W$ performs a wavelet transform); $R$ is the observation operator (e.g., convolution of the signal/image with a blur operator, or a tomographic projection); $y$ is the vector of observations, possibly containing errors/noise. Dimensions are large, and the matrix representation of $W$ is dense in general. It is impractical to store or factor it, or to multiply it by $R$. However, multiplications by $R$, $R^T$, $W$, $W^T$ can be performed economically. Motivation: we want to reconstruct a signal from the transmitted encoding $y$, given prior knowledge that $x$ is sparse.

9 Specific Problems in this Class. $W$ represents an orthogonal wavelet basis of dimension $n$: $W$ is $n \times n$; multiplication by $W$ or $W^T$ costs $O(n)$, using the fast wavelet transform. $W$ represents a redundant, translation-invariant wavelet system of dimension $n$: $W$ is $n \times n(\log_2 n + 1)$; multiplication by $W$ or $W^T$ costs $O(n \log n)$, again using the FWT. $R$ can be a $k \times n$ random sampling matrix (consisting of zeros and a few ones, or a random mix of $\pm 1$): Compressed Sensing. Linear code: $W = I$ and the columns of $R$ are codewords.

10 Compressed Sensing. Recent theory shows that, if $x$ is known to be sparse, then it can be reconstructed from $y \approx Ax$, where $A$ is $k \times n$ random with $k \ll n$, under certain conditions on $A$. A Representative Result (Candès, Romberg, Tao, 2005). Given $A$, define $\delta_S$ to be the smallest quantity for which
$$(1 - \delta_S)\|c\|_2^2 \le \|A_T c\|_2^2 \le (1 + \delta_S)\|c\|_2^2 \quad \text{for all } c,$$
where $A_T$ is a column submatrix of $A$ defined by $T \subset \{1, 2, \dots, n\}$ with $|T| \le S$. (This ensures that $A_T$ is close to orthonormal.) If $\delta_{3S} + 3\delta_{4S} < 2$, then for any signal $\bar{x}$ with at most $S$ nonzeros and any vector $y$ such that $\|y - A\bar{x}\|_2 \le \epsilon$, the solution $x^*$ of
$$\min_x \; \|x\|_1 \quad \text{subject to } \|Ax - y\|_2 \le \epsilon$$
satisfies $\|x^* - \bar{x}\|_2 \le C_S \epsilon$, where $C_S$ is a constant depending only on $\delta_{4S}$.

11 Algorithms: Least Squares with $\ell_1$.
$$\min_x \; \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1.$$
Formulate as a bound-constrained least-squares problem by splitting $x = u - v$ with $(u, v) \ge 0$, and writing
$$\min_{u \ge 0,\, v \ge 0} \; \tfrac{1}{2}\|A(u - v) - y\|_2^2 + \lambda \mathbf{1}^T u + \lambda \mathbf{1}^T v.$$
For signal processing applications, we've had good success with gradient-projection algorithms. We'll describe these first, then discuss alternatives and indicate why they are less suitable.
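As a concrete check of this splitting, here is a minimal NumPy sketch (the names `F_split`, `split`, `A`, `y`, `lam` are illustrative, not from the talk) showing that the split objective with $u = \max(x, 0)$, $v = \max(-x, 0)$ matches the original $\ell_1$-regularized objective.

```python
import numpy as np

def F_split(u, v, A, y, lam):
    """Objective of the split formulation:
    0.5*||A(u-v) - y||^2 + lam*1'u + lam*1'v, with u, v >= 0."""
    r = A @ (u - v) - y                 # residual A(u-v) - y
    return 0.5 * r @ r + lam * (u.sum() + v.sum())

def split(x):
    """Split x into nonnegative parts u, v with x = u - v."""
    return np.maximum(x, 0.0), np.maximum(-x, 0.0)

# Tiny example: F_split at (u, v) = split(x) equals the l1-regularized objective at x.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
y = rng.standard_normal(5)
x = rng.standard_normal(8)
u, v = split(x)
lhs = F_split(u, v, A, y, lam=0.1)
rhs = 0.5 * np.sum((A @ x - y) ** 2) + 0.1 * np.sum(np.abs(x))
assert np.isclose(lhs, rhs)
```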

12 Basic Gradient Projection. Writing the objective as $F(u, v)$, the problem is $\min_{u \ge 0,\, v \ge 0} F(u, v)$. We have
$$\nabla_{u,v} F(u, v) = \begin{bmatrix} A^T A(u - v) - A^T y + \lambda \mathbf{1} \\ -A^T A(u - v) + A^T y + \lambda \mathbf{1} \end{bmatrix}.$$
Main costs: evaluation of $F(u, v)$ requires one multiplication by $A$; evaluation of $\nabla_{u,v} F(u, v)$ requires one additional multiplication by $A^T$. Choose the search direction $(\delta_u, \delta_v) = (-\nabla_u F, -\nabla_v F)$; look along the path $(u + \alpha \delta_u, v + \alpha \delta_v)_+$, $\alpha > 0$, where $(\cdot)_+$ denotes projection onto the nonnegative orthant.

13 First try $\alpha = \alpha_0$, the unconstrained minimizer of $F$ along this direction, which is given by
$$\alpha_0 = \frac{\|(\delta_u, \delta_v)\|_2^2}{\|A(\delta_u - \delta_v)\|_2^2}.$$
(Cost: one multiplication by $A$.) Armijo backtracking: choose the first $\alpha$ in the sequence $\alpha_0, \beta\alpha_0, \beta^2\alpha_0, \dots$ satisfying the sufficient decrease condition
$$F\big((u + \alpha\delta_u, v + \alpha\delta_v)_+\big) \le F(u, v) - 0.001\,(\alpha/\alpha_0)\, \nabla F(u, v)^T \big[ (u, v) - (u + \alpha_0\delta_u, v + \alpha_0\delta_v)_+ \big].$$
Set $(u, v) \leftarrow (u + \alpha\delta_u, v + \alpha\delta_v)_+$.
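A minimal sketch of one basic gradient-projection step with the Armijo backtracking rule just described; the function and variable names (`gp_step`, `A`, `y`, `lam`, `beta`) are illustrative assumptions, not code from the talk.

```python
import numpy as np

def gp_step(u, v, A, y, lam, beta=0.5, mu=1e-3):
    """One basic gradient-projection step for
    min_{u,v>=0} F(u,v) = 0.5*||A(u-v)-y||^2 + lam*1'(u+v)."""
    def F(u, v):
        r = A @ (u - v) - y
        return 0.5 * r @ r + lam * (u.sum() + v.sum())

    res = A.T @ (A @ (u - v) - y)       # A'(A(u-v) - y): one mult by A, one by A'
    gu, gv = res + lam, -res + lam      # gradient blocks
    du, dv = -gu, -gv                   # steepest-descent direction

    # alpha_0: unconstrained minimizer of F along (du, dv); one more mult by A
    Ad = A @ (du - dv)
    alpha0 = (du @ du + dv @ dv) / (Ad @ Ad)

    # Reference decrease: grad' [(u,v) - projected full step]
    u0p = np.maximum(u + alpha0 * du, 0.0)
    v0p = np.maximum(v + alpha0 * dv, 0.0)
    decrease_ref = gu @ (u - u0p) + gv @ (v - v0p)

    # Armijo backtracking along the projected path: alpha0, beta*alpha0, ...
    alpha = alpha0
    for _ in range(50):
        up = np.maximum(u + alpha * du, 0.0)
        vp = np.maximum(v + alpha * dv, 0.0)
        if F(up, vp) <= F(u, v) - mu * (alpha / alpha0) * decrease_ref:
            break
        alpha *= beta
    return up, vp
```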

14 Termination. It's often not practical to iterate until the optimal active set is identified. However, since our main interest is in approximately identifying the correct active set, we use a criterion based on this. Specifically, terminate when the relative change to
$$I_k \stackrel{\mathrm{def}}{=} \{ i : u_i^k > 0 \text{ or } v_i^k > 0 \}$$
falls below tolA. (We use tolA = 0.02.) We tried GPCG (Moré and Toraldo, 1991), which alternates GP steps with CG exploration of a fixed working set, but it doesn't work well because the restriction of the objective to the current working set usually has a singular Hessian.
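One reasonable reading of this stopping test, as a sketch (the helper name and the exact definition of "relative change" are assumptions; the talk specifies only the set $I_k$ and tolA = 0.02):

```python
import numpy as np

def support_changed_fraction(u_prev, v_prev, u, v):
    """Relative change in I_k = {i : u_i > 0 or v_i > 0} between iterations."""
    I_prev = (u_prev > 0) | (v_prev > 0)
    I_curr = (u > 0) | (v > 0)
    changed = np.count_nonzero(I_prev != I_curr)
    return changed / max(np.count_nonzero(I_curr), 1)

# Terminate when support_changed_fraction(...) < 0.02  (tolA in the talk).
```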

15 Debiasing. After finding an approximate solution, we follow with a debiasing step, in which the zero elements of $x = u - v$ are discarded and we perform an unconstrained minimization of $\|Ax - y\|_2^2$ over the nonzero elements, using conjugate gradient. Terminate this phase after decreasing the gradient $A^T(Ax - y)$ by a prescribed factor.
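A sketch of the debiasing phase: restrict to the nonzero support of $x = u - v$ and run conjugate gradient on the normal equations for that column submatrix. The CG loop, the names, and the tolerance `tol_factor` are illustrative; the talk specifies only that the gradient is reduced by some factor.

```python
import numpy as np

def debias(x, A, y, tol_factor=1e-4, max_iter=200):
    """Minimize ||A_S z - y||^2 over the support S of x by CG on the
    normal equations A_S' A_S z = A_S' y, keeping off-support entries at zero."""
    S = x != 0
    AS = A[:, S]
    z = x[S].copy()
    r = AS.T @ (y - AS @ z)            # residual of the normal equations = -gradient/2
    p = r.copy()
    g0 = np.linalg.norm(r)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol_factor * g0:
            break
        ASp = AS @ p
        alpha = (r @ r) / (ASp @ ASp)
        z += alpha * p
        r_new = r - alpha * (AS.T @ ASp)
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    x_debiased = np.zeros_like(x)
    x_debiased[S] = z
    return x_debiased
```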

16 Barzilai-Borwein Variants. Consider a modification of GP based on the method of Barzilai & Borwein (1988) for unconstrained nonlinear minimization, modified by Dai & Fletcher (2005) for box-constrained QP. The approach is non-monotone. Motivation in terms of $\min_x f(x)$: let $s$ and $y$ be the changes in $x$ and $\nabla f$ over the last step,
$$s = x_k - x_{k-1}, \qquad y = \nabla f(x_k) - \nabla f(x_{k-1});$$
choose $\alpha$ so that $\alpha s \approx y$ in the least-squares sense (so that $\nabla^2 f \approx \alpha I$ in some sense) and set
$$x_{k+1} = x_k - \frac{1}{\alpha} \nabla f(x_k).$$
We can compute $\alpha$ trivially and obtain
$$x_{k+1} = x_k - \frac{s^T s}{s^T y} \nabla f(x_k).$$
We obtain a variant by choosing $\alpha$ so that $s \approx \alpha y$ and setting $x_{k+1} = x_k - \alpha \nabla f(x_k)$.
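The two Barzilai-Borwein step lengths in NumPy form; a sketch with illustrative names.

```python
import numpy as np

def bb_steplengths(x_prev, x_curr, g_prev, g_curr):
    """Barzilai-Borwein step lengths from successive iterates and gradients,
    so that x_next = x_curr - step * g_curr."""
    s = x_curr - x_prev                 # change in x
    y = g_curr - g_prev                 # change in gradient
    step_bb1 = (s @ s) / (s @ y)        # from alpha*s ~ y in the least-squares sense
    step_bb2 = (s @ y) / (y @ y)        # variant: from s ~ alpha*y
    return step_bb1, step_bb2
```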

17 Modifications for QP. See Dai & Fletcher, modified for the least-squares objective and with an additional line search. Compute the step
$$(\delta_u, \delta_v) = (u - \alpha \nabla_u F,\; v - \alpha \nabla_v F)_+ - (u, v);$$
perform an exact line search to minimize $F$ along the line segment from $(u, v)$ to $(u, v) + (\delta_u, \delta_v)$; set
$$\alpha = \frac{\|(\delta_u, \delta_v)\|_2^2}{\|A(\delta_u - \delta_v)\|_2^2}$$
for the next iteration. Cheap! A total of two multiplications by $A$ or $A^T$ at each iteration.
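A sketch of one iteration of the BB gradient-projection variant just described, with the exact line search over the segment (closed-form, since $F$ is quadratic along it) and the step-length update; this follows the slide's recipe, and the names are illustrative rather than taken from any released code.

```python
import numpy as np

def gpbb_step(u, v, alpha, A, y, lam):
    """One BB gradient-projection iteration for the split l1 least-squares problem."""
    res = A.T @ (A @ (u - v) - y)
    gu, gv = res + lam, -res + lam

    # Projected step from the current BB step length alpha
    du = np.maximum(u - alpha * gu, 0.0) - u
    dv = np.maximum(v - alpha * gv, 0.0) - v

    # Exact line search along the segment (u,v) + gamma*(du,dv), gamma in [0,1]:
    # F is quadratic in gamma, so the minimizer has a closed form.
    Ad = A @ (du - dv)
    num = -(gu @ du + gv @ dv)          # negative directional derivative at gamma = 0
    den = Ad @ Ad
    gamma = 1.0 if den == 0 else min(max(num / den, 0.0), 1.0)
    u_new = u + gamma * du
    v_new = v + gamma * dv

    # BB step length for the next iteration (as on the slide)
    alpha_new = (du @ du + dv @ dv) / den if den > 0 else alpha
    return u_new, v_new, alpha_new
```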

18 In some applications, theory suggests an appropriate value of the regularization parameter $\lambda$. But we often need to search around this value and solve for a range of $\lambda$ values. Since gradient-projection approaches benefit from a good starting point, we can simply use the approximate solution for one $\lambda$ as the starting point for a nearby $\lambda$. Solve for a preassigned sequence of $\lambda$ values, in increasing order. (The set of nonzero components of $x$ shrinks as $\lambda$ increases.)
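A sketch of the warm-started sweep over $\lambda$ values; `solve_gp` stands for any of the gradient-projection solvers sketched earlier, and its interface is an assumption.

```python
import numpy as np

def solve_over_lambdas(A, y, lambdas, solve_gp):
    """Solve the l1-regularized least-squares problem for an increasing
    sequence of lambda values, warm-starting each solve from the previous one."""
    n = A.shape[1]
    u = np.zeros(n)
    v = np.zeros(n)
    solutions = {}
    for lam in sorted(lambdas):                    # increasing order: support shrinks
        u, v = solve_gp(A, y, lam, u0=u, v0=v)     # warm start from previous (u, v)
        solutions[lam] = u - v
    return solutions
```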

19 Alternative Approach: Active Set / Pivoting. Solve the equivalent problem with objective $\|Ax - y\|_2^2$ and constraint $\|x\|_1 \le t$ by starting with $t = 0$ (solution $x = 0$) and proceeding to $t = \|x_{LS}\|_1$ (solution $x = x_{LS}$). Determine the breakpoints: the values of $\lambda$ at which a component of $x$ changes from zero to nonzero or vice versa. Use pivoting operations to update $x$ at these values for the new active set. See Osborne et al. (2000), Efron et al. (2003). Need to be able to factor $A$ and submatrices of $A$, hence unsuitable for problems where $A$ is not known explicitly.

20 Alternative Approach: Interior-Point. Could apply a primal-dual method to the bound-constrained formulation, solving the linear system at each IP iteration by CG or some variant. Each inner iteration requires multiplications by $A$ and $A^T$, but not explicit knowledge of $A$. Application of this approach to our problem is one of the approaches described in the basis pursuit paper of Chen, Donoho, Saunders (1998). See also Saunders (2002) and his PDCO / SolveBP code. Not very good at solving for multiple $\lambda$ values, due to the usual difficulty of warm-starting interior-point methods (though this is probably easier here because of the simplicity of the constraints).

21 Alternative Approach: Bound Optimization. Solve an approximate QP in which the Hessian $A^T A$ is replaced by a diagonal approximation $D$ for which $A^T A \preceq D$:
$$\min_x \; \tfrac{1}{2}(x - x_k)^T D (x - x_k) + (x - x_k)^T A^T (A x_k - y) + \lambda \|x\|_1,$$
where $x_k$ is the previous iterate. This can be solved in closed form to get the new iterate $x_{k+1}$. (If $x_k = x^*$ is optimal, the solution is $x = x^*$.) For the applications of interest, the price paid by ignoring off-diagonal information is too high, and convergence is slow. (Due to Figueiredo, Nowak, and others.)
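For a diagonal $D$ the surrogate problem above separates by component, and its closed-form minimizer is a soft-thresholding step. A sketch with a scalar diagonal $D = d I$ for simplicity (names illustrative):

```python
import numpy as np

def bound_opt_step(x_k, A, y, lam, d):
    """Closed-form minimizer of the diagonal-surrogate QP
    0.5*(x-x_k)' D (x-x_k) + (x-x_k)' A'(A x_k - y) + lam*||x||_1
    with D = d*I (d chosen so that A'A <= d*I, e.g. d >= ||A||_2^2)."""
    g = A.T @ (A @ x_k - y)              # gradient of the smooth term at x_k
    z = x_k - g / d                      # unregularized minimizer of the surrogate
    return np.sign(z) * np.maximum(np.abs(z) - lam / d, 0.0)   # soft threshold
```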

22 Alternative Approach: Second-Order Cone. Applies to the formulation
$$\min_x \; \|x\|_1 \quad \text{subject to } \|Ax - y\|_2 \le t,$$
for parameter $t \ge 0$. In the l1-magic code (Candès, Romberg), this is recast as a second-order cone program and solved by a primal log-barrier / Newton / CG approach, using the usual barrier term
$$-\mu \log\left(t^2 - \|Ax - y\|_2^2\right)$$
for the constraint. Again, not good at solving for multiple $t$ values.

23 Results: Compressed Sensing. A small, simple, explicit problem to evaluate different algorithms:
$$\min_x \; \tfrac{1}{2}\|y - Rx\|_2^2 + \lambda \|x\|_1,$$
where $R$ is dense, with elements chosen independently from $N(0, 1)$, after which the rows are normalized. Choose $x_i = 0$ with probability 0.99 and $x_i \sim \text{Uniform}[-1, 1]$ with probability 0.01. Choose $y = Rx + e$, where $e_i \sim N(0, 0.005)$. Solve for just a single value of $\lambda$. Compare several algorithms: Gradient Projection (Basic and Barzilai-Borwein); l1-magic (SOCP formulation); SparseLab (basis pursuit / interior-point); a bound optimization algorithm.
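A sketch that generates a test problem of this type; the dimensions are taken from the example on the next slide, and the slide's noise $N(0, 0.005)$ is taken here as the standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 512, 4096                          # dimensions quoted in the example that follows

# Dense random measurement matrix with N(0,1) entries and normalized rows
R = rng.standard_normal((k, n))
R /= np.linalg.norm(R, axis=1, keepdims=True)

# Sparse signal: zero with prob. 0.99, Uniform[-1,1] with prob. 0.01
mask = rng.random(n) < 0.01
x_true = np.where(mask, rng.uniform(-1.0, 1.0, size=n), 0.0)

# Noisy observations y = R x + e, with e_i ~ N(0, 0.005) as on the slide
y = R @ x_true + rng.normal(0.0, 0.005, size=k)
```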

24 GP reconstructs the signal well (compared to the least-squares solution). (Note some attenuation due to the $\|x\|_1$ term.) [Figure: original signal, GP reconstruction, and least-squares pseudo-solution; details: n = 4096, k = 512, sigma = 0.005.]

25 The Barzilai-Borwein version is faster than basic GP. [Figure: objective function vs. CPU time for GPD-Basic and GPD-BB.]

26 As problem size increases, BB beats the competition. [Figure: CPU time (seconds) vs. problem size n for GPD-BB, SparseLab, l1-magic, and BOA.]

27 Debiasing removes the attenuation due to the $\|x\|_1$ term. [Figure: original signal, reconstruction, and debiased reconstruction.]

28 Application: Logistic Regression. Have $n$ subjects with attribute vectors $x(i)$, $i = 1, 2, \dots, n$, and labels $(y_1(i), y_2(i), \dots, y_K(i))$, where $K$ = number of classes; $y_r(i) = 1$ if subject $i$ is in class $r$ and $y_r(i) = 0$ otherwise. Express the probability that some $x = x(i)$ is in class $k$ by means of functions $p_k(x)$ and $f_k(x)$, related by
$$p_k(x) = \frac{\exp f_k(x)}{\sum_{j=1}^K \exp f_j(x)}.$$
Express $f_k$ in terms of basis functions,
$$f_k(x) = \sum_{l=0}^N c_{kl} B_l(x), \quad k = 1, 2, \dots, K,$$
with coefficients $c_{kl}$, $k = 1, 2, \dots, K$ and $l = 0, 1, 2, \dots, N$, to be determined from the optimization.

29 Typical dimensions: $K \in [2, 10]$; $n$ possibly large; $N$ possibly exponential in the number of features (the dimension of each $x(i)$). Seek solutions for which only a small fraction of the $c_{kl}$ are nonzero; these identify the most significant basis functions $B_l$. Log-likelihood function: given $x(1), x(2), \dots, x(n)$ and labels $y(1), y(2), \dots, y(n)$, find the optimal functions $f_k$, $k = 1, 2, \dots, K$, by minimizing
$$L(c) = -\frac{1}{n} \sum_{i=1}^n \sum_{k=1}^K y_k(i) \log p_k(x(i)).$$
(Recall: the $p_k$ depend on the $f_k$, which depend on the $c_{kl}$.) Introduce the LASSO regularization term
$$J(c) = \sum_{k=1}^K \sum_{l=1}^N |c_{kl}|.$$
(NB: no penalty for $c_{k0}$, which corresponds to the basis function $B_0(x) \equiv 1$.) Minimize the function $T_\lambda(c) = L(c) + \lambda J(c)$.

30 Two-Category Variant. When $K = 2$ the formulation simplifies. Can WLOG set $f_1(x) \equiv 0$, i.e. $c_{1,l} = 0$, $l = 0, 1, \dots, N$ (since if we apply the same shift to $c_{1,l}$ and $c_{2,l}$, the functions $p_1(x)$ and $p_2(x)$ do not change). Define $c_l \stackrel{\mathrm{def}}{=} c_{2,l}$, $l = 0, 1, 2, \dots, N$, and obtain
$$L(c) = \frac{1}{n} \sum_{i=1}^n \left[ -y_2(i) \sum_{l=0}^N c_l B_l(x(i)) + \log\left( 1 + \exp\left( \sum_{l=0}^N c_l B_l(x(i)) \right) \right) \right]$$
and
$$J(c) = \sum_{l=1}^N |c_l|.$$
Minimize $T_\lambda(c) = L(c) + \lambda J(c)$ for a number of different $\lambda$ values; choose the most suitable $\lambda$ in an outer loop: Generalized Approximate Cross-Validation (GACV).
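A sketch of the two-category penalized objective $T_\lambda(c) = L(c) + \lambda J(c)$ in NumPy. Here `B` is an assumed $n \times (N+1)$ matrix with `B[i, l]` $= B_l(x(i))$ and column 0 holding the unpenalized constant basis function.

```python
import numpy as np

def T_lambda(c, B, y2, lam):
    """Penalized negative log-likelihood for the two-category model:
    L(c) + lam * sum_{l>=1} |c_l|, with f(x(i)) = sum_l c_l B_l(x(i))."""
    f = B @ c                                      # f(x(i)) for all subjects
    log1pexp = np.logaddexp(0.0, f)                # log(1 + exp(f)), computed stably
    L = np.mean(-y2 * f + log1pexp)
    J = np.sum(np.abs(c[1:]))                      # no penalty on the constant term c_0
    return L + lam * J
```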

31 PatternSearch: Beaver Dam Data. $n = 876$ = number of persons in the study. Each $x(i)$ is a zero-one vector of 7 features/risk factors:
Risk factor (0 / 1):
sex: female / male
income: > $30,000 / ≤ $30,000
juvenile myopia: myopic after age 21 / myopic before age 21
cataract: severity 1, 2, 3 / severity 4, 5
smoking: ≤ 30 pack-years / > 30 pack-years
aspirin: no / yes
vitamins: no / yes
Find combinations of factors, as well as individual factors, that predict progression of myopia. Define $N = 2^7$ and basis functions
$$B_{i_1, i_2, \dots, i_7}(x) = \prod_{j : i_j = 1} x_j.$$
This function is 1 if $x_j = 1$ for all $j$ with $i_j = 1$, and zero otherwise.
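A sketch of how the $2^7$ pattern basis functions could be tabulated for zero-one feature vectors; `X` is an assumed $n \times 7$ matrix of the risk-factor indicators, and the column ordering is arbitrary.

```python
import numpy as np
from itertools import product

def pattern_basis(X):
    """Columns are all 2^d products of subsets of the 0/1 features in X.
    The column for the empty subset is the constant basis function B_0 = 1."""
    n, d = X.shape
    cols = []
    for pattern in product([0, 1], repeat=d):        # all (i_1, ..., i_d) in {0,1}^d
        idx = [j for j in range(d) if pattern[j] == 1]
        if idx:
            cols.append(np.prod(X[:, idx], axis=1))  # 1 iff x_j = 1 for all selected j
        else:
            cols.append(np.ones(n))                  # constant function
    return np.column_stack(cols)                     # shape (n, 2**d)
```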

32 PatternSearch: Rheumatoid Arthritis and SNPs. We want to predict the likelihood that an individual is susceptible to rheumatoid arthritis based on genetic variations, plus some environmental factors. SNP: a variation in a single nucleotide in the genome sequence, e.g. AAGGC changes to ATGGC. This version has two alleles, A and T. The less frequent nucleotides are called minor alleles. It's observed that rheumatoid arthritis is associated with SNPs on chromosome 6. Include 9787 nucleotides, mostly on chromosome 6, in the feature vector $x$, where the relevant component of $x$ is coded as 0, 1, 2 according to whether it contains the most common nucleotide or one of the minor alleles. $x$ also contains a coding of a variation of DR type at the HLA locus of chromosome 6, and variables for gender (female = 1), smoking (yes = 1), and age (older than 55 = 1). Total of 9792 components of $x$. Only two categories ($K = 2$).

33 We might like to examine all possible interactions, but this would involve solving a problem with a huge number of unknowns. Instead, do multiple rounds of prescreening. 795 of the 9792 individual variables survived the first round. (Actually, 880 variables survive, as for some variables more than one level was of interest.) After screening for interactions between pairs of these 795 variables, we obtained 1679 interactions of possible interest. Then solve a max-likelihood problem with 2559 = 880 + 1679 variables.

34 Algorithm. Use variable splitting again ($c = c^+ - c^-$), and write the problem as
$$\min_{c^+ \ge 0,\, c^- \ge 0} \; T_\lambda(c^+ - c^-) = L(c^+ - c^-) + \lambda \mathbf{1}^T c^+ + \lambda \mathbf{1}^T c^-.$$
(Ignore the unconstrained component of $c$ for simplicity.) Recall that $L(c)$ has the form
$$L(c) = \frac{1}{n} \sum_{i=1}^n \left[ -y_2(i) \sum_{l=0}^N c_l B_l(x(i)) + \log\big( 1 + F(x(i); c) \big) \right], \quad \text{where } F(x; c) = \exp\left( \sum_{l=0}^N c_l B_l(x) \right).$$
It's relatively expensive to evaluate $F(x(i); c)$ for $i = 1, 2, \dots, n$. However, once these quantities are known, the gradient $\nabla L(c)$ is cheap and the Hessian is uncomplicated (though dense).
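A sketch showing why the gradient is cheap once the $F(x(i); c)$ values are in hand: the class-2 probabilities $p_i = F_i / (1 + F_i)$ are formed once and reused (same illustrative names `B`, `y2` as in the earlier sketch).

```python
import numpy as np

def grad_L(c, B, y2):
    """Gradient of L(c) = (1/n) sum_i [ -y2_i f_i + log(1 + F_i) ], F_i = exp(f_i),
    reusing the exponentials once they are computed."""
    n = B.shape[0]
    f = B @ c
    p = 1.0 / (1.0 + np.exp(-f))        # p_i = F_i / (1 + F_i), class-2 probability
    return B.T @ (p - y2) / n           # dL/dc = (1/n) B'(p - y2)
    # The Hessian is (1/n) B' diag(p*(1-p)) B: dense but simple, reusing the same p.
```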

35 Two-Metric Gradient Projection. Use gradient projection again, with two-metric scaling in the free components (Bertsekas, 1982). At each iterate $x^k$ for $\min_{x \ge 0} f(x)$: Calculate $\nabla f(x^k)$ and use it to form an estimate of the free variable set $I_k$; exclude $i$ from $I_k$ if $x_i^k$ is close to zero and $\partial f / \partial x_i > 0$. Calculate the partial Hessian corresponding to $I_k$:
$$H_{I_k} = \left[ \frac{\partial^2 f(x^k)}{\partial x_i \partial x_j} \right]_{i \in I_k,\, j \in I_k}.$$
Form the search direction $p^k$ by
$$p_i^k = -\tau_k \frac{\partial f}{\partial x_i}, \quad i \notin I_k,$$
for some scale factor $\tau_k$, and
$$p_{I_k}^k = -(H_{I_k} + \epsilon_k I)^{-1} \nabla_{I_k} f(x^k).$$
Do an Armijo backtracking line search along $(x^k + \alpha p^k)_+$, $\alpha = 1, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \dots$
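A sketch of one two-metric gradient-projection iteration as outlined above, for $\min_{x \ge 0} f(x)$. The callables `grad` and `hess_block` (returning the partial Hessian on the free set), the thresholds, and the specific sufficient-decrease test are illustrative assumptions.

```python
import numpy as np

def two_metric_gp_step(x, f, grad, hess_block, tau=1.0, eps=1e-6,
                       zero_tol=1e-8, mu=1e-3):
    """One two-metric gradient-projection step for min_{x >= 0} f(x)."""
    g = grad(x)

    # Estimate the free set I_k: drop i if x_i is (near) zero and the
    # gradient pushes it against the bound.
    free = ~((x <= zero_tol) & (g > 0))
    I = np.where(free)[0]

    # Search direction: scaled steepest descent on the bound components,
    # damped Newton on the free components.
    p = -tau * g
    if I.size > 0:
        H = hess_block(x, I)                            # partial Hessian on I_k
        p[I] = -np.linalg.solve(H + eps * np.eye(I.size), g[I])

    # Armijo backtracking along the projected arc, alpha = 1, 1/2, 1/4, ...
    f0 = f(x)
    alpha = 1.0
    for _ in range(30):
        x_new = np.maximum(x + alpha * p, 0.0)
        if f(x_new) <= f0 + mu * g @ (x_new - x):
            return x_new
        alpha *= 0.5
    return x                                            # no acceptable step found
```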

36 Why 2MGP? For interesting $\lambda$ values, very few components of $c$ are nonzero at the solution, so we have to compute only a small submatrix of the Hessian (and it's cheap). Use of the reduced Hessian (with damping) accelerates the method greatly over plain gradient projection. Strategies that need more of the Hessian are not practical because it's dense and very large. Strategies that use CG on the reduced Hessian are unnecessary because it's small and easily calculated. It's faster than Matlab's fmincon!

37 Results: Myopia Progression. Choose $\lambda$ by GACV. For the selected $\lambda$, 13 of the 128 possible combined risk factors are selected. These are subjected to further analysis ("Step 2") and 5 factors survive. Coefficients for $f_2$ (pattern: coefficient): constant; cataract: 2.42; smoking, no vitamins: 1.18; male, low income, juv. myopia, no aspirin: 1.84; male, low income, cataract, no aspirin: 1.08.

38 Thanks! Thanks to the Organizers! Thanks for listening! Thanks to Dramamine!
