Applications with $\ell_1$-Norm Objective Terms
1 Applications with $\ell_1$-Norm Objective Terms

Stephen Wright, University of Wisconsin-Madison
Huatulco, January 2007
2 Outline

1. Formulation
2. Least-Squares with $\ell_1$: Applications, Algorithms, Results
3. Logistic Regression: Application, Algorithms, Results

Based on joint work with Weiliang Shi, Grace Wahba, Rob Nowak, Mario Figueiredo.
3 Formulation

We describe two classes of applications for problems of the form

$$\min_x \; f(x) + \lambda \|x\|_1,$$

where $x \in \mathbb{R}^n$; $f$ is convex, smooth, and possibly nonlinear; and $\lambda > 0$ is a regularization parameter.

A special case of particular interest:

$$\min_x \; \tfrac{1}{2} \|Ax - y\|_2^2 + \lambda \|x\|_1.$$

- $n$ may be very large (hence, storage and computational limitations);
- the $\ell_1$ norm may apply to only a subvector of $x$;
- we may wish to solve for a number of $\lambda$ values.

Use well-known optimization techniques, tailored to the structure and characteristics of the applications.
4 Related Formulations

Several related formulations are also of interest in applications:

$$\min_x \; \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le t,$$

for parameter $t \ge 0$;

$$\min_x \; \|x\|_1 \quad \text{subject to} \quad Ax = y;$$

and

$$\min_x \; \|Ax - y\|_2^2 \quad \text{subject to} \quad \|x\|_1 \le t,$$

for $t > 0$.
5 Least Squares with $\ell_1$ Regularization: Applications

- LASSO
- Wavelet-based Signal Reconstruction
- Compressed Sensing
6 LASSO

The LASSO technique of Tibshirani (1996) works with the formulation

$$\min_x \; \|Ax - y\|_2^2 \quad \text{subject to} \quad \|x\|_1 \le t,$$

for some parameter $t \ge 0$.

- When $t = 0$, the solution is $x = 0$.
- When $t \ge \|x_{LS}\|_1$, where $x_{LS}$ is the (unconstrained) least-squares solution, we have $x = x_{LS}$.

The motive is variable selection: seek sparse approximate solutions of $Ax = y$, for which $x$ has relatively few nonzeros. (In general, smaller $t$ implies fewer nonzeros.) Once these variables are identified, solve a reduced least-squares problem in which only these variables are allowed to be nonzero.
7

Often we want to find the path of solutions for $t \in [0, \|x_{LS}\|_1]$, or at least the solutions for a sample of $t$ values along this path.
8 Wavelet-based Signal Reconstruction

Problem has the form $Ax \approx y$, where $A = RW$:

- $x$ is the vector of coefficients for the unknown image or signal;
- $W$ is a wavelet basis (multiplication by $W$ performs a wavelet transform);
- $R$ is the observation operator (e.g. convolution of the signal/image with a blur operator, or a tomographic projection);
- $y$ is the vector of observations, possibly containing errors/noise.

Dimensions are large, and the matrix representation of $W$ is dense in general. It is impractical to store or factor it, or to multiply it by $R$. However, multiplications by $R$, $R^T$, $W$, $W^T$ can be performed economically.

Motivation: want to reconstruct a signal from the transmitted encoding $y$, given prior knowledge that $x$ is sparse.
9 Specific Problems in this Class

- $W$ represents an orthogonal wavelet basis of dimension $n$: $W$ is $n \times n$; multiplication by $W$ or $W^T$ costs $O(n)$, using the fast wavelet transform (FWT).
- $W$ represents a redundant, translation-invariant wavelet system of dimension $n$: $W$ is $n \times n(\log_2 n + 1)$; multiplication by $W$ or $W^T$ costs $O(n \log n)$, again using the FWT.
- $R$ can be a $k \times n$ random sampling matrix (consisting of zeros and a few ones, or a random mix of $\pm 1$): Compressed Sensing.
- Linear code: $W = I$ and the columns of $R$ are codewords.
10 Compressed Sensing

Recent theory shows that, if $x$ is known to be sparse, then it can be reconstructed from $y \approx Ax$, where $A$ is $k \times n$ random with $k \ll n$, under certain conditions on $A$.

A Representative Result. (Candès, Romberg, Tao, 2005) Given $A$, define $\delta_S$ to be the smallest quantity for which

$$(1 - \delta_S) \|c\|_2^2 \le \|A_T c\|_2^2 \le (1 + \delta_S) \|c\|_2^2 \quad \text{for all } c,$$

where $A_T$ is a column submatrix of $A$ defined by $T \subset \{1, 2, \dots, n\}$ with $|T| \le S$. (This ensures that $A_T$ is close to orthonormal.)

If $\delta_{3S} + 3\delta_{4S} < 2$, then for any signal $\bar{x}$ with at most $S$ nonzeros and any vector $y$ such that $\|y - A\bar{x}\|_2 \le \epsilon$, the solution $x^*$ of

$$\min_x \; \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le \epsilon$$

satisfies

$$\|x^* - \bar{x}\|_2 \le C_S \, \epsilon,$$

where $C_S$ is a constant depending only on $\delta_{4S}$.
11 Algorithms: Least Squares with $\ell_1$

$$\min_x \; \tfrac{1}{2} \|Ax - y\|_2^2 + \lambda \|x\|_1.$$

Formulate as bound-constrained least squares by splitting $x = u - v$, $(u, v) \ge 0$, and writing

$$\min_{u \ge 0, \, v \ge 0} \; \tfrac{1}{2} \|A(u - v) - y\|_2^2 + \lambda \mathbf{1}^T u + \lambda \mathbf{1}^T v.$$

For signal processing applications, we've had good success with gradient-projection algorithms. We'll describe these first, then discuss alternatives and indicate why they are less suitable.
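The splitting above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small made-up matrix; the function names `l1_objective`, `split`, and `split_objective` are illustrative and do not come from the slides.

```python
import numpy as np

def l1_objective(A, y, x, lam):
    """0.5 * ||Ax - y||_2^2 + lam * ||x||_1."""
    r = A @ x - y
    return 0.5 * (r @ r) + lam * np.abs(x).sum()

def split(x):
    """Positive and negative parts, so that x = u - v with u, v >= 0."""
    return np.maximum(x, 0.0), np.maximum(-x, 0.0)

def split_objective(A, y, u, v, lam):
    """Bound-constrained objective F(u, v) from the slide."""
    r = A @ (u - v) - y
    return 0.5 * (r @ r) + lam * (u.sum() + v.sum())

# Small illustrative data (not from the talk).
A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]])
y = np.array([1.0, -1.0])
x = np.array([0.5, 0.0, -2.0])
lam = 0.3
u, v = split(x)
```

When `u` and `v` are the positive and negative parts of `x`, the two objectives agree because $\mathbf{1}^T u + \mathbf{1}^T v = \|x\|_1$.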
12 Basic Gradient Projection

Writing the objective as $F(u, v)$, the problem is

$$\min_{u \ge 0, \, v \ge 0} \; F(u, v).$$

We have

$$\nabla_{u,v} F(u, v) = \begin{bmatrix} A^T A(u - v) - A^T y + \lambda \mathbf{1} \\ -A^T A(u - v) + A^T y + \lambda \mathbf{1} \end{bmatrix}.$$

Main costs:

- Evaluation of $F(u, v)$: one multiplication by $A$;
- Evaluation of $\nabla_{u,v} F(u, v)$: one additional multiplication by $A^T$.

Choose the search direction $(\delta_u, \delta_v) = -(\nabla_u F, \nabla_v F)$.

Look along the path $(u + \alpha \delta_u, v + \alpha \delta_v)_+$, $\alpha > 0$, where $(\cdot)_+$ denotes projection onto the nonnegative orthant.
13

First try $\alpha = \alpha_0$, the unconstrained minimizer of $F$ along this direction, which is given by

$$\alpha_0 = \frac{\|(\delta_u, \delta_v)\|_2^2}{\|A(\delta_u - \delta_v)\|_2^2}.$$

(Cost: one multiplication by $A$.)

Armijo backtracking: choose the first $\alpha$ in the sequence $\alpha_0, \beta\alpha_0, \beta^2\alpha_0, \dots$ satisfying a sufficient decrease condition:

$$F\big((u + \alpha\delta_u, v + \alpha\delta_v)_+\big) \le F(u, v) - .001 \, (\alpha/\alpha_0) \, \nabla F(u, v)^T \big[(u, v) - (u + \alpha_0\delta_u, v + \alpha_0\delta_v)_+\big].$$

Set $(u, v) \leftarrow (u + \alpha\delta_u, v + \alpha\delta_v)_+$.
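The basic gradient-projection iteration with Armijo backtracking can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the stacked variable `z = (u, v)`, the backtracking factor `beta = 0.5`, the cap on backtracks, and the random test data are all assumptions; only the constant .001 comes from the slide.

```python
import numpy as np

def F(A, y, z, lam):
    """Objective for z = (u, v) >= 0."""
    n = A.shape[1]
    r = A @ (z[:n] - z[n:]) - y
    return 0.5 * (r @ r) + lam * z.sum()

def grad_F(A, y, z, lam):
    """Gradient with respect to the stacked variable (u, v)."""
    n = A.shape[1]
    g = A.T @ (A @ (z[:n] - z[n:]) - y)
    return np.concatenate([g + lam, -g + lam])

def gp_step(A, y, z, lam, beta=0.5, mu=1e-3, max_backtracks=30):
    """One projected steepest-descent step with Armijo backtracking."""
    n = A.shape[1]
    g = grad_F(A, y, z, lam)
    d = -g
    Ad = A @ (d[:n] - d[n:])
    denom = Ad @ Ad
    alpha0 = (d @ d) / denom if denom > 0 else 1.0   # unconstrained minimizer
    z0 = np.maximum(z + alpha0 * d, 0.0)
    slope = g @ (z - z0)                              # nonnegative for a projection
    alpha = alpha0
    for _ in range(max_backtracks):
        z_new = np.maximum(z + alpha * d, 0.0)
        if F(A, y, z_new, lam) <= F(A, y, z, lam) - mu * (alpha / alpha0) * slope:
            return z_new
        alpha *= beta
    return z  # no acceptable step found; keep the current point

# Illustrative problem (made-up data).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17]] = [1.0, -0.5]
y = A @ x_true
lam = 0.1
z = np.zeros(100)
values = [F(A, y, z, lam)]
for _ in range(200):
    z = gp_step(A, y, z, lam)
    values.append(F(A, y, z, lam))
x_hat = z[:50] - z[50:]
```

Each accepted step decreases the objective, so `values` is a non-increasing sequence and `z` stays in the nonnegative orthant.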
14 Termination

It's often not practical to iterate until the optimal active set is identified. However, since our main interest is in approximately identifying the correct active set, we use a criterion based on this. Specifically, terminate when the relative change to

$$I_k \stackrel{\text{def}}{=} \{ i \mid u_i^k > 0 \text{ or } v_i^k > 0 \}$$

falls below tola. (We use tola = .02.)

Tried GPCG (Moré and Toraldo, 1991), which alternates GP steps with CG exploration of a fixed working set, but it doesn't work well, as the restriction of the objective to the current working set usually has a singular Hessian.
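A possible form of this stopping test is sketched below. The slide does not spell out how "relative change" is measured; this sketch assumes it is the number of indices whose membership in $I_k$ changed, divided by the size of the current set. The function names are hypothetical.

```python
import numpy as np

def active_set(u, v):
    """Boolean mask of I_k = {i : u_i > 0 or v_i > 0}."""
    return (u > 0) | (v > 0)

def relative_change(I_prev, I_curr):
    """Assumed measure: changed memberships relative to current set size."""
    changed = np.count_nonzero(I_prev != I_curr)
    return changed / max(np.count_nonzero(I_curr), 1)

tola = 0.02  # value quoted on the slide
I_prev = active_set(np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 2.0, 0.0, 0.0]))
I_curr = active_set(np.array([1.0, 0.0, 0.5, 0.0]), np.array([0.0, 2.0, 0.0, 0.0]))
change = relative_change(I_prev, I_curr)
stop = change < tola
```

In this small example one index enters a set of size three, so the iteration would continue.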
15 Debiasing

After finding an approximate solution, we follow with a debiasing step, in which the zero elements of $x = u - v$ are discarded and we perform an unconstrained minimization of $\|Ax - y\|_2^2$ over the nonzero elements, using conjugate gradient.

Terminate this phase after decreasing the norm of the gradient $A^T(Ax - y)$ by a fixed factor.
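A minimal sketch of such a debiasing phase, using conjugate gradient on the normal equations restricted to the support, might look like this. The reduction factor is a hypothetical parameter (`reduction`), since the transcript does not preserve the value used in the talk; the data are made up.

```python
import numpy as np

def debias(A, y, x, reduction=1e-2, max_iter=500):
    """Least-squares refit on the support of x by conjugate gradient.

    Stops when the gradient norm has dropped by the factor `reduction`
    (an assumed default, not a value from the slides).
    """
    support = np.flatnonzero(x)
    x_out = np.zeros(len(x))
    if support.size == 0:
        return x_out
    As = A[:, support]
    xs = x[support].astype(float)
    g = As.T @ (As @ xs - y)      # gradient of 0.5 * ||As xs - y||^2
    p = -g
    g0 = np.linalg.norm(g)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= reduction * g0:
            break
        Ap = As @ p
        step = (g @ g) / (Ap @ Ap)
        xs = xs + step * p
        g_new = g + step * (As.T @ Ap)
        p = -g_new + ((g_new @ g_new) / (g @ g)) * p
        g = g_new
    x_out[support] = xs
    return x_out

# Illustrative data: an attenuated estimate with the correct support.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [1.0, -2.0]
y = A @ x_true
x_shrunk = 0.8 * x_true
x_debiased = debias(A, y, x_shrunk, reduction=1e-8)
```

With noise-free data and the correct support, the refit recovers the unattenuated coefficients while the zero entries stay zero.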
16 Barzilai-Borwein Variants

Consider a modification of GP based on the method of Barzilai & Borwein (1988) for unconstrained nonlinear minimization, modified by Dai & Fletcher (2005) for box-constrained QP. The approach is non-monotone.

Motivation in terms of $\min_x f(x)$: let $s$ and $y$ be the changes in $x$ and $\nabla f$ over the last step:

$$s = x_k - x_{k-1}, \qquad y = \nabla f(x_k) - \nabla f(x_{k-1}).$$

Choose $\alpha$ so that $\alpha s \approx y$ in the least-squares sense (so that $\nabla^2 f \approx \alpha I$ in some sense) and set

$$x_{k+1} = x_k - \frac{1}{\alpha} \nabla f(x_k).$$

Can compute $\alpha$ trivially and obtain

$$x_{k+1} = x_k - \frac{s^T s}{s^T y} \nabla f(x_k).$$

Obtain a variant by choosing $\alpha$ so that $s \approx \alpha y$ and setting $x_{k+1} = x_k - \alpha \nabla f(x_k)$.
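The first Barzilai-Borwein step length can be illustrated on a small strictly convex quadratic. This is a sketch under stated assumptions: the function name `bb_minimize`, the initial step, the tolerance, and the test quadratic are illustrative choices, not taken from the talk.

```python
import numpy as np

def bb_minimize(grad, x0, step0=1.0, tol=1e-10, max_iter=100):
    """Gradient iteration with the step s's / s'y from the previous pair."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    step = step0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - step * g
        g_new = grad(x_new)
        s, yv = x_new - x, g_new - g      # changes in x and in the gradient
        sy = s @ yv
        if sy > 0:
            step = (s @ s) / sy           # 1/alpha, with alpha = s'y / s's
        x, g = x_new, g_new
    return x

# f(x) = 0.5 x'Qx - b'x with gradient Qx - b; minimizer solves Qx = b.
Q = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_min = bb_minimize(lambda x: Q @ x - b, np.zeros(2))
```

No line search is used, so the objective need not decrease at every step, which is the non-monotone behaviour mentioned on the slide.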
17 Modifications for QP

See Dai & Fletcher, modified for the least-squares objective and with an additional line search.

- Compute the step $(\delta_u, \delta_v) = (u - \alpha \nabla_u F, \, v - \alpha \nabla_v F)_+ - (u, v)$;
- Perform an exact line search to minimize $F$ along the line segment from $(u, v)$ to $(u, v) + \lambda(\delta_u, \delta_v)$;
- Set

$$\alpha = \frac{\|(\delta_u, \delta_v)\|_2^2}{\|A(\delta_u - \delta_v)\|_2^2}$$

for the next iteration.

Cheap! A total of two multiplications by $A$ or $A^T$ at each iteration.
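The iteration above might be sketched as follows, assuming the line-search parameter ranges over $[0, 1]$ and the residual is updated in place so that each iteration uses one product with $A^T$ and one with $A$. The function name `gp_bb`, the initial `alpha = 1`, and the test data are assumptions for illustration.

```python
import numpy as np

def gp_bb(A, y, lam, iters=200):
    """Barzilai-Borwein gradient projection on the split problem (sketch)."""
    n = A.shape[1]
    z = np.zeros(2 * n)                  # z stacks (u, v)
    r = -np.asarray(y, dtype=float)      # residual A(u - v) - y at z = 0
    alpha = 1.0
    for _ in range(iters):
        g0 = A.T @ r                     # one product with A^T
        g = np.concatenate([g0 + lam, -g0 + lam])
        d = np.maximum(z - alpha * g, 0.0) - z
        if not d.any():
            break                        # projected step is zero: stationary
        Ad = A @ (d[:n] - d[n:])         # one product with A
        curv = Ad @ Ad
        # Exact minimizer of the quadratic F(z + t d) over t in [0, 1].
        t = 1.0 if curv == 0.0 else min(1.0, max(0.0, -(g @ d) / curv))
        z = z + t * d
        r = r + t * Ad
        alpha = (d @ d) / curv if curv > 0.0 else 1.0
    return z[:n] - z[n:]

# With A = I the minimizer is the soft-thresholding of y at level lam.
y = np.array([3.0, -0.5, 1.5, 0.0])
x_hat = gp_bb(np.eye(4), y, 1.0)
```

The identity-matrix case is used only because its solution is known in closed form; it is not one of the test problems from the talk.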
18

In some applications, theory suggests an appropriate value of the regularization parameter $\lambda$. But we often need to search around this value and solve for a range of $\lambda$ values.

Since gradient projection approaches benefit from a good starting point, we can simply use the approximate solution for one $\lambda$ as the starting point for a nearby $\lambda$.

Solve for a preassigned sequence of $\lambda$ values, in increasing order. (The set of nonzero $x$ components shrinks as $\lambda$ increases.)
19 Alternative Approach: Active Set/Pivoting

Solve the equivalent problem with objective $\|Ax - y\|_2^2$ and constraint $\|x\|_1 \le t$ by starting with $t = 0$ (solution $x = 0$) and proceeding to $t = \|x_{LS}\|_1$ (solution $x = x_{LS}$).

Determine breakpoints: values of $\lambda$ at which a component of $x$ changes from zero to nonzero or vice versa. Use pivoting operations to update $x$ at these values for the new active set.

See Osborne et al (2000), Efron et al (2003).

Need to be able to factor $A$ and submatrices of $A$, hence unsuitable for problems where $A$ is not known explicitly.
20 Alternative Approach: Interior-Point

Could apply a primal-dual method to the bound-constrained formulation, solving the linear system at each IP iteration by CG or some variant. Each inner iteration requires multiplications by $A$ and $A^T$, but not explicit knowledge of $A$.

Application of this approach to our problem is one of the approaches described in the basis pursuit paper of Chen, Donoho, Saunders (1998). See also Saunders (2002) and his PDCO / SolveBP code.

Not very good at solving for multiple $\lambda$ values, due to the usual difficulty of warm-starting interior-point methods (though probably easier here because of the simplicity of the constraints).
21 Alternative Approach: Bound Optimization

Solve an approximate QP in which the Hessian $A^T A$ is replaced by a diagonal approximation $D$, for which $A^T A \preceq D$:

$$\min_x \; \tfrac{1}{2} (x - x_k)^T D (x - x_k) + (x - x_k)^T A^T (A x_k - y) + \lambda \|x\|_1,$$

where $x_k$ is the previous iterate. Can solve in closed form to get the new iterate $x_{k+1}$. (If $x_k = x^*$ is optimal, the solution is $x = x^*$.)

For applications of interest, the price paid by ignoring off-diagonal information is too high, and convergence is slow.

(Due to Figueiredo, Nowak and others.)
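For a scalar multiple of the identity, $D = dI$, the closed-form solution of this subproblem is a componentwise soft-thresholding. The sketch below assumes that special case, with $d$ set to the largest eigenvalue of $A^T A$ so that $A^T A \preceq D$ holds; the function names and test data are illustrative.

```python
import numpy as np

def soft(w, t):
    """Soft-thresholding: sign(w) * max(|w| - t, 0)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def bound_opt_step(A, y, x, lam, d):
    """Closed-form minimizer of the diagonal-Hessian subproblem, D = d * I."""
    g = A.T @ (A @ x - y)
    return soft(x - g / d, lam / d)

def objective(A, y, x, lam):
    r = A @ x - y
    return 0.5 * (r @ r) + lam * np.abs(x).sum()

# Illustrative data (made up).
rng = np.random.default_rng(1)
A = rng.standard_normal((15, 8))
y = rng.standard_normal(15)
lam = 0.5
d = np.linalg.norm(A, 2) ** 2      # largest eigenvalue of A'A
x = np.zeros(8)
history = [objective(A, y, x, lam)]
for _ in range(100):
    x = bound_opt_step(A, y, x, lam, d)
    history.append(objective(A, y, x, lam))
```

Because the subproblem majorizes the true objective, `history` is non-increasing, but each step uses only a crude curvature estimate, which is consistent with the slow convergence noted on the slide.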
22 Alternative Approach: Second-Order Cone

Applies to the formulation

$$\min_x \; \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le t,$$

for parameter $t \ge 0$.

In the l1-magic code (Candès, Romberg), this is recast as a second-order cone program and solved by a primal log-barrier / Newton / CG approach, using the usual barrier term for the constraint:

$$-\mu \log\big(t^2 - \|Ax - y\|_2^2\big).$$

Again not good at solving for multiple $t$ values.
23 Results: Compressed Sensing

Small, simple, explicit problem to evaluate different algorithms:

$$\min_x \; \tfrac{1}{2} \|y - Rx\|_2^2 + \lambda \|x\|_1.$$

- $R$ is dense, with elements chosen independently from $N(0, 1)$; the rows are then normalized.
- Choose $x_i = 0$ with probability .99, $x_i \sim \text{Uniform}[-1, 1]$ with probability .01.
- Choose $y = Rx + e$, where $e_i \sim N(0, .005)$.

Solve for just a single value of $\tau$ (the regularization parameter).

Compare several algorithms:

- Gradient Projection: Basic and Barzilai-Borwein
- l1-magic: SOCP formulation
- SparseLab: basis pursuit / interior-point
- A bound optimization algorithm
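A test problem of this kind can be generated along the following lines. This is a sketch with several assumptions: the default sizes `n = 4096` and `k = 512` are borrowed from the caption of the reconstruction figure, `.005` is treated as the standard deviation of the noise (the figure caption calls it sigma), and the function name and seed are hypothetical.

```python
import numpy as np

def make_problem(n=4096, k=512, density=0.01, sigma=0.005, seed=0):
    """Random compressed-sensing test problem (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((k, n))
    R /= np.linalg.norm(R, axis=1, keepdims=True)      # normalize the rows
    nonzero = rng.random(n) < density                  # each entry nonzero w.p. .01
    x = np.where(nonzero, rng.uniform(-1.0, 1.0, n), 0.0)
    y = R @ x + sigma * rng.standard_normal(k)
    return R, x, y

R, x, y = make_problem(n=400, k=50)
```

The smaller sizes in the call are chosen only to keep the example quick to run.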
24

GP reconstructs the signal well (compared to the least-squares solution). (Note some attenuation due to the $\|x\|_1$ term.)

[Figure: three panels showing the original signal, the GP reconstruction (n = 4096, k = 512, sigma = 0.005), and the least-squares pseudo-solution.]
25

Barzilai-Borwein version is faster than basic GP:

[Figure: objective function versus CPU time for GPD-Basic and GPD-BB.]
26

As problem size increases, BB beats the competition:

[Figure: CPU time (seconds) versus n for GPD-BB, SparseLab, l1-magic, and BOA.]
27

Debiasing removes the attenuation due to the $\|x\|_1$ term.

[Figure: three panels of signal plots illustrating the effect of debiasing.]
28 Application: Logistic Regression

Have $n$ subjects with attribute vectors $x(i)$, $i = 1, 2, \dots, n$, and labels $(y_1(i), y_2(i), \dots, y_K(i))$, where $K$ = number of classes. $y_r(i) = 1$ if subject $i$ is in class $r$ and $y_r(i) = 0$ otherwise.

Express the probability that some $x = x(i)$ is in class $k$ by means of functions $p_k(x)$ and $f_k(x)$, related by

$$p_k(x) = \frac{\exp f_k(x)}{\sum_{j=1}^K \exp f_j(x)}.$$

Express $f_k$ in terms of basis functions:

$$f_k(x) = \sum_{l=0}^N c_{kl} B_l(x), \quad k = 1, 2, \dots, K,$$

with coefficients $c_{kl}$, $k = 1, 2, \dots, K$ and $l = 0, 1, 2, \dots, N$, to be determined from the optimization.
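The map from coefficients to class probabilities can be sketched as below. The array layout (a `K` by `N+1` coefficient matrix `C` and a vector `Bx` of basis values at one point) and the numbers are assumptions made for illustration.

```python
import numpy as np

def class_probabilities(C, Bx):
    """p_k(x) for coefficients C[k, l] = c_kl and basis values Bx[l] = B_l(x)."""
    f = C @ Bx                 # f_k(x) = sum_l c_kl B_l(x)
    f = f - f.max()            # a common shift leaves the p_k unchanged
    e = np.exp(f)
    return e / e.sum()

C = np.array([[0.0, 1.0], [0.5, -1.0], [0.0, 0.0]])   # K = 3 classes, N = 1
Bx = np.array([1.0, 2.0])                             # B_0(x) = 1, B_1(x) = 2
p = class_probabilities(C, Bx)
```

Subtracting the maximum before exponentiating is a standard numerical safeguard and relies on the same shift invariance that the two-category slide uses to set $f_1 \equiv 0$.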
29

Typical dimensions: $K \in [2, 10]$, $n$ possibly large, $N$ possibly exponential in the number of features (the dimension of each $x(i)$).

Seek solutions for which only a small fraction of the $c_{kl}$ are nonzero: these identify the most significant basis functions $B_l$.

Log-likelihood function: given $x(1), x(2), \dots, x(n)$ and labels $y(1), y(2), \dots, y(n)$, find the optimal functions $f_k$, $k = 1, 2, \dots, K$, by minimizing

$$L(c) = -\frac{1}{n} \sum_{i=1}^n \sum_{k=1}^K y_k(i) \log p_k(x(i)).$$

(Recall: the $p_k$ depend on the $f_k$, which depend on the $c_{kl}$.)

Introduce the LASSO regularization term

$$J(c) = \sum_{k=1}^K \sum_{l=1}^N |c_{kl}|.$$

(NB: no penalty for $c_{k0}$, which corresponds to the basis function $B_0(x) \equiv 1$.)

Minimize the function $T_\lambda(c) = L(c) + \lambda J(c)$.
30 Two-Category Variant

When $K = 2$ the formulation simplifies. Can WLOG set $f_1(x) \equiv 0$, i.e. $c_{1,l} = 0$, $l = 0, 1, \dots, N$ (since if we apply the same shift to $c_{1,l}$ and $c_{2,l}$, the functions $p_1(x)$ and $p_2(x)$ do not change).

Define $c_l \stackrel{\text{def}}{=} c_{2,l}$, $l = 0, 1, 2, \dots, N$, and obtain

$$L(c) = \frac{1}{n} \sum_{i=1}^n \left[ -y_2(i) \sum_{l=0}^N c_l B_l(x(i)) + \log\left( 1 + \exp\left( \sum_{l=0}^N c_l B_l(x(i)) \right) \right) \right]$$

and

$$J(c) = \sum_{l=1}^N |c_l|.$$

Minimize $T_\lambda(c) = L(c) + \lambda J(c)$ for a number of different $\lambda$ values; choose the most suitable $\lambda$ in an outer loop: Generalized Approximate Cross-Validation (GACV).
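The two-category objective and the gradient of its smooth part can be written compactly. In this sketch the basis values are assumed to be stored in a matrix `B` with `B[i, l]` equal to $B_l(x(i))$ and a first column of ones; the function names and the small data set are illustrative.

```python
import numpy as np

def logistic_objective(c, B, y2, lam):
    """T_lambda(c) = L(c) + lam * J(c); the constant term c_0 is not penalized."""
    f = B @ c
    L = np.mean(-y2 * f + np.logaddexp(0.0, f))   # log(1 + exp(f)), computed stably
    return L + lam * np.abs(c[1:]).sum()

def grad_L(c, B, y2):
    """Gradient of the smooth part L(c)."""
    f = B @ c
    p2 = 1.0 / (1.0 + np.exp(-f))                 # p_2(x(i))
    return B.T @ (p2 - y2) / len(y2)

# Made-up data: 6 subjects, constant column plus 3 zero-one basis functions.
rng = np.random.default_rng(0)
B = np.hstack([np.ones((6, 1)), rng.integers(0, 2, (6, 3)).astype(float)])
y2 = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.0])
c = np.array([0.1, -0.2, 0.3, 0.0])
value = logistic_objective(c, B, y2, lam=0.05)
gradient = grad_L(c, B, y2)
```

At `c = 0` every subject has probability one half, so the loss equals $\log 2$, which gives a quick sanity check.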
31 PatternSearch: Beaver Dam Data

$n = 876$ = number of persons in study. Each $x(i)$ is a zero-one vector of 7 features/risk factors:

Risk factor            0                      1
sex                    female                 male
income                 > \$30,000             $\le$ \$30,000
juvenile myopia        myopic after age 21    myopic before age 21
cataract severity      1,2,3                  4,5
smoking packs years    $\le$ 30               > 30
aspirin                no                     yes
vitamins               no                     yes

Find combinations of factors, as well as individual factors, that predict progression of myopia.

Define $N = 2^7$ and basis functions

$$B_{i_1, i_2, \dots, i_7}(x) = \prod_{j : i_j = 1} x_j.$$

This function is 1 if $x_j = 1$ for all $j$ with $i_j = 1$, and zero otherwise.
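The pattern basis functions can be enumerated directly for a zero-one feature vector. This is a small illustrative sketch; the function name `pattern_basis` and the example vector are not from the study.

```python
from itertools import product

def pattern_basis(x):
    """Map each pattern (i_1, ..., i_d) to B(x) = prod over {j : i_j = 1} of x_j."""
    return {
        pattern: int(all(xj == 1 for xj, ij in zip(x, pattern) if ij == 1))
        for pattern in product((0, 1), repeat=len(x))
    }

# A hypothetical subject with risk factors 1, 3 and 7 present.
x = (1, 0, 1, 0, 0, 0, 1)
basis = pattern_basis(x)
```

The all-zeros pattern is an empty product and therefore equals 1 for every subject, which matches the constant basis function $B_0(x) \equiv 1$.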
32 PatternSearch: Rheumatoid Arthritis and SNPs

Want to predict the likelihood that an individual is susceptible to rheumatoid arthritis based on genetic variations, plus some environmental factors.

SNP: variation in a single nucleotide in the genome sequence, e.g. AAGGC changes to ATGGC. This version has two alleles, A and T. The less frequent nucleotides are called minor alleles.

It is observed that rheumatoid arthritis is associated with SNPs on chromosome 6.

- Include 9787 nucleotides, mostly on chromosome 6, in the feature vector $x$, where the relevant component of $x$ is coded as 0, 1, 2 according to whether it contains the most common nucleotide or one of the minor alleles.
- $x$ also contains a coding of a variation of DR type at the HLA locus of chromosome 6.
- $x$ also contains variables for gender (female = 1), smoking (yes = 1), and age (older than 55 = 1).

Total of 9792 $x$ components. Only two categories ($K = 2$).
33

Might like to examine all possible interactions, but this would involve solving a problem with an impractically large number of unknowns. Instead, do multiple rounds of prescreening.

795 of the 9792 individual variables survived the first round. (Actually, 880 variables survive, as for some variables more than one level was of interest.)

After screening for interactions between pairs of these 795 variables, obtained 1679 interactions of possible interest.

Then solve a max-likelihood problem with 2559 = 880 + 1679 variables.
34 Algorithm

Use variable splitting again ($c = c^+ - c^-$), and write the problem as

$$\min_{c^+ \ge 0, \, c^- \ge 0} \; T_\lambda(c^+ - c^-) = L(c^+ - c^-) + \lambda \mathbf{1}^T c^+ + \lambda \mathbf{1}^T c^-.$$

(Ignore the unconstrained component of $c$ for simplicity.)

Recall that $L(c)$ has the form

$$L(c) = \frac{1}{n} \sum_{i=1}^n \left[ -y_2(i) \sum_{l=0}^N c_l B_l(x(i)) + \log\big(1 + F(x(i); c)\big) \right],$$

where

$$F(x; c) = \exp\left( \sum_{l=0}^N c_l B_l(x) \right).$$

It's relatively expensive to evaluate $F(x(i); c)$ for $i = 1, 2, \dots, n$. However, once these quantities are known, the gradient $\nabla L(c)$ is cheap and the Hessian is uncomplicated (though dense).
35 Two-Metric Gradient Projection

Use gradient projection again, with two-metric scaling in the free components (Bertsekas, 1982). At each iterate $x^k$ for $\min_{x \ge 0} f(x)$:

- Calculate $\nabla f(x^k)$ and use it to form an estimate of the free variable set $I_k$. Exclude $i$ from $I_k$ if $x_i^k$ is close to zero and $\partial f / \partial x_i > 0$.
- Calculate the partial Hessian corresponding to $I_k$:

$$H_{I_k} = \left[ \frac{\partial^2 f(x^k)}{\partial x_i \, \partial x_j} \right]_{i \in I_k, \, j \in I_k}.$$

- Form the search direction $p^k$ by

$$p_i^k = -\tau_k \frac{\partial f}{\partial x_i}, \quad i \notin I_k,$$

for some scale factor $\tau_k$, and

$$p_{I_k}^k = -(H_{I_k} + \epsilon_k I)^{-1} \nabla_{I_k} f(x^k).$$

- Do an Armijo backtracking line search along $(x^k + \alpha p^k)_+$, $\alpha = 1, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \dots$.
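One iteration of this scheme might be sketched as follows. Several details are assumptions rather than statements from the slides: the threshold `tol_zero` for "close to zero", the fixed values of `tau` and `eps`, and the specific sufficient-decrease test used in the backtracking loop. The quadratic test problem is made up.

```python
import numpy as np

def two_metric_step(f, grad, hess, x, tau=1.0, eps=1e-6, tol_zero=1e-8,
                    sigma=1e-4, max_backtracks=30):
    """One two-metric gradient projection step for min f(x) subject to x >= 0."""
    g = grad(x)
    free = ~((x <= tol_zero) & (g > 0))          # estimate of the free set I_k
    p = -tau * g                                 # scaled gradient off the free set
    idx = np.flatnonzero(free)
    if idx.size:
        H = hess(x)[np.ix_(idx, idx)]            # partial Hessian on I_k
        p[idx] = -np.linalg.solve(H + eps * np.eye(idx.size), g[idx])
    fx = f(x)
    alpha = 1.0
    for _ in range(max_backtracks):
        x_new = np.maximum(x + alpha * p, 0.0)
        # Assumed sufficient-decrease test on the projected step.
        if f(x_new) <= fx - sigma * (g @ (x - x_new)):
            return x_new
        alpha *= 0.5
    return x

# Illustrative quadratic whose constrained minimizer is (0.5, 0).
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
hess = lambda x: Q
x = np.array([1.0, 1.0])
for _ in range(10):
    x = two_metric_step(f, grad, hess, x)
```

Only the submatrix of the Hessian indexed by the free set is factored, which is the source of the savings when few components are nonzero.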
36 Why 2MGP?

- For interesting $\lambda$ values, there are very few nonzero components of $c$ at the solution, so we have to compute only a small submatrix of the Hessian (and it's cheap).
- Use of the reduced Hessian (with damping) accelerates the method greatly over plain gradient projection.
- Strategies that need more of the Hessian are not practical because it's dense and very large.
- Strategies that use CG on the reduced Hessian are unnecessary because it's small and easily calculated.
- It's faster than Matlab's fmincon!
37 Results: Myopia Progression

Choose $\lambda$ by GACV. For the selected $\lambda$, 13 of the 128 possible combined risk factors are selected. These are subjected to further analysis ("Step 2") and 5 factors survive.

Coefficients for $f_2$:

pattern                                       coefficient
constant
cataract                                      2.42
smoking, no vitamins                          1.18
male, low income, juv. myopia, no aspirin     1.84
male, low income, cataract, no aspirin        1.08
38 Thanks!

- Thanks to the Organizers!
- Thanks for listening!
- Thanks to Dramamine!
More information10-725/ Optimization Midterm Exam
10-725/36-725 Optimization Midterm Exam November 6, 2012 NAME: ANDREW ID: Instructions: This exam is 1hr 20mins long Except for a single two-sided sheet of notes, no other material or discussion is permitted
More informationPrimal-Dual First-Order Methods for a Class of Cone Programming
Primal-Dual First-Order Methods for a Class of Cone Programming Zhaosong Lu March 9, 2011 Abstract In this paper we study primal-dual first-order methods for a class of cone programming problems. In particular,
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationReduced-Hessian Methods for Constrained Optimization
Reduced-Hessian Methods for Constrained Optimization Philip E. Gill University of California, San Diego Joint work with: Michael Ferry & Elizabeth Wong 11th US & Mexico Workshop on Optimization and its
More informationAn Introduction to Sparse Approximation
An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan Basic image/signal/data compression: transform coding Approximate signals sparsely Compress images,
More informationCS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares
CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search
More informationConvex Optimization. Newton s method. ENSAE: Optimisation 1/44
Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)
More informationMaking Flippy Floppy
Making Flippy Floppy James V. Burke UW Mathematics jvburke@uw.edu Aleksandr Y. Aravkin IBM, T.J.Watson Research sasha.aravkin@gmail.com Michael P. Friedlander UBC Computer Science mpf@cs.ubc.ca Current
More informationComplexity analysis of second-order algorithms based on line search for smooth nonconvex optimization
Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Clément Royer - University of Wisconsin-Madison Joint work with Stephen J. Wright MOPTA, Bethlehem,
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationProjection methods to solve SDP
Projection methods to solve SDP Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Oberwolfach Seminar, May 2010 p.1/32 Overview Augmented Primal-Dual Method
More informationLecture 25: November 27
10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationConstrained Optimization Theory
Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August
More information8 Numerical methods for unconstrained problems
8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields
More informationIterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming
Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationA Coordinate Gradient Descent Method for l 1 -regularized Convex Minimization
A Coordinate Gradient Descent Method for l 1 -regularized Convex Minimization Sangwoon Yun, and Kim-Chuan Toh January 30, 2009 Abstract In applications such as signal processing and statistics, many problems
More informationQuick Introduction to Nonnegative Matrix Factorization
Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices
More informationDepartment of Computer Science, University of British Columbia Technical Report TR , January 2008
Department of Computer Science, University of British Columbia Technical Report TR-2008-01, January 2008 PROBING THE PARETO FRONTIER FOR BASIS PURSUIT SOLUTIONS EWOUT VAN DEN BERG AND MICHAEL P. FRIEDLANDER
More informationOn the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,
Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationIPAM Summer School Optimization methods for machine learning. Jorge Nocedal
IPAM Summer School 2012 Tutorial on Optimization methods for machine learning Jorge Nocedal Northwestern University Overview 1. We discuss some characteristics of optimization problems arising in deep
More informationOptimization for Compressed Sensing
Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationA Study on Trust Region Update Rules in Newton Methods for Large-scale Linear Classification
JMLR: Workshop and Conference Proceedings 1 16 A Study on Trust Region Update Rules in Newton Methods for Large-scale Linear Classification Chih-Yang Hsia r04922021@ntu.edu.tw Dept. of Computer Science,
More informationA DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS
Working Paper 01 09 Departamento de Estadística y Econometría Statistics and Econometrics Series 06 Universidad Carlos III de Madrid January 2001 Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 91 624
More informationOptimization and Root Finding. Kurt Hornik
Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding
More informationMS&E 318 (CME 338) Large-Scale Numerical Optimization
Stanford University, Management Science & Engineering (and ICME MS&E 38 (CME 338 Large-Scale Numerical Optimization Course description Instructor: Michael Saunders Spring 28 Notes : Review The course teaches
More informationLinear algebra issues in Interior Point methods for bound-constrained least-squares problems
Linear algebra issues in Interior Point methods for bound-constrained least-squares problems Stefania Bellavia Dipartimento di Energetica S. Stecco Università degli Studi di Firenze Joint work with Jacek
More informationCompressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles
Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional
More informationTRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS
TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS Martin Kleinsteuber and Simon Hawe Department of Electrical Engineering and Information Technology, Technische Universität München, München, Arcistraße
More informationA discretized Newton flow for time varying linear inverse problems
A discretized Newton flow for time varying linear inverse problems Martin Kleinsteuber and Simon Hawe Department of Electrical Engineering and Information Technology, Technische Universität München Arcisstrasse
More informationPart 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)
Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective
More informationOptimization in Machine Learning
Optimization in Machine Learning Stephen Wright University of Wisconsin-Madison Singapore, 14 Dec 2012 Stephen Wright (UW-Madison) Optimization Singapore, 14 Dec 2012 1 / 48 1 Learning from Data: Applications
More informationAlgorithms for constrained local optimization
Algorithms for constrained local optimization Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Algorithms for constrained local optimization p. Feasible direction methods Algorithms for constrained
More informationGradient Descent. Ryan Tibshirani Convex Optimization /36-725
Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like
More informationReconstruction of Block-Sparse Signals by Using an l 2/p -Regularized Least-Squares Algorithm
Reconstruction of Block-Sparse Signals by Using an l 2/p -Regularized Least-Squares Algorithm Jeevan K. Pant, Wu-Sheng Lu, and Andreas Antoniou University of Victoria May 21, 2012 Compressive Sensing 1/23
More informationShort Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning
Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants
More informationPrimal-Dual Interior-Point Methods. Ryan Tibshirani Convex Optimization /36-725
Primal-Dual Interior-Point Methods Ryan Tibshirani Convex Optimization 10-725/36-725 Given the problem Last time: barrier method min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h i, i = 1,...
More informationLecture Notes 9: Constrained Optimization
Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form
More informationAn Homotopy Algorithm for the Lasso with Online Observations
An Homotopy Algorithm for the Lasso with Online Observations Pierre J. Garrigues Department of EECS Redwood Center for Theoretical Neuroscience University of California Berkeley, CA 94720 garrigue@eecs.berkeley.edu
More informationA Fixed-Point Continuation Method for l 1 -Regularized Minimization with Applications to Compressed Sensing
CAAM Technical Report TR07-07 A Fixed-Point Continuation Method for l 1 -Regularized Minimization with Applications to Compressed Sensing Elaine T. Hale, Wotao Yin, and Yin Zhang Department of Computational
More information25 : Graphical induced structured input/output models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large
More informationInterior-Point Methods for Linear Optimization
Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function
More informationSparse and Regularized Optimization
Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity
More informationAM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α
More informationSubgradient Method. Ryan Tibshirani Convex Optimization
Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial
More informationHomotopy methods based on l 0 norm for the compressed sensing problem
Homotopy methods based on l 0 norm for the compressed sensing problem Wenxing Zhu, Zhengshan Dong Center for Discrete Mathematics and Theoretical Computer Science, Fuzhou University, Fuzhou 350108, China
More informationCOMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION
COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationPart 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)
Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where
More information