Derivative Free Optimization and Average Curvature Information
Trond Steihaug, Lennart Frimannslund
Eighth US-Mexico Workshop on Optimization and its Applications, January 2007

Compass search in R². The current point is shown in black.

Outline
Part 1: Generating set search; reusing old function values.
Part 2 (work in progress): exploiting separability; numerical results; summary.

Search east, compute the function value and check for decrease. The new point is accepted (grey).

GSS and unconstrained optimization
Consider the unconstrained optimization problem
  min f(x), x ∈ R^n,
where only function values are available. One applicable class of methods is generating set search (GSS). These methods were widely studied in the 1950s and 1960s, to a large extent abandoned once gradient-based methods became tractable, and interest revived in the 1990s with the development of convergence theory. We illustrate the work with the method called compass search.

Search north, compute the function value and check for decrease. The new point is not accepted (white).
Search south, compute the function value and check for decrease. The new point is accepted.
Search north; don't step.
Search west, compute the function value and check for decrease. The new point is not accepted.
Search south, step to the new point.
Start a new sweep through the directions in the same order: search east, step to the new point.
If no reduction can be found, decrease the step sizes.
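To make the sweep just described concrete, here is a minimal compass-search sketch in Python with NumPy; the function and helper names are illustrative, not the authors' code. It polls the directions ±e_i in a fixed order, accepts simple decrease, and halves the step when a whole sweep fails:

```python
import numpy as np

def compass_search(f, x, delta=1.0, tol=1e-6, max_sweeps=10_000):
    """Minimal compass search: poll the directions +/- e_i in a fixed order,
    accept simple decrease, and halve the step when a whole sweep fails."""
    n = len(x)
    directions = np.vstack([np.eye(n), -np.eye(n)])  # the 2n compass directions
    fx = f(x)
    for _ in range(max_sweeps):
        improved = False
        for d in directions:                  # one sweep through the directions
            trial = x + delta * d
            f_trial = f(trial)
            if f_trial < fx:                  # decrease found: step to the new point
                x, fx, improved = trial, f_trial, True
        if not improved:                      # no reduction found: decrease the step size
            delta *= 0.5
            if delta < tol:
                break
    return x, fx

# Example: minimize a simple quadratic in R^2 starting from the origin.
x_star, f_star = compass_search(lambda x: (x[0] - 1.0)**2 + 10.0 * (x[1] + 2.0)**2,
                                np.array([0.0, 0.0]))
```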
Points in a rectangle spanned by e_i and e_j can be used in the finite difference formula
  [f(x + δ_i e_i + δ_j e_j) − f(x + δ_i e_i) − f(x + δ_j e_j) + f(x)] / (δ_i δ_j),
and thus provide an approximate Hessian element, or average curvature information.

Suppose we are searching in R² along an orthogonal basis q_1, q_2. The same kind of differences along the search directions fill in a 2×2 matrix C_Q entry by entry,
  (C_Q)_ij = [f(x + δ_i q_i + δ_j q_j) − f(x + δ_i q_i) − f(x + δ_j q_j) + f(x)] / (δ_i δ_j),
with the diagonal entries given by the same formula with i = j. C_Q now contains average curvature information with respect to the search directions Q = [q_1 q_2]. We call C_Q a curvature information matrix.
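The average curvature element is just this mixed second difference evaluated along two directions; a small illustrative sketch (the helper name `curvature_element` is assumed, not from the source):

```python
import numpy as np

def curvature_element(f, x, qi, qj, di, dj):
    """Average curvature of f at x along the directions qi and qj:
    [f(x + di*qi + dj*qj) - f(x + di*qi) - f(x + dj*qj) + f(x)] / (di*dj)."""
    return (f(x + di * qi + dj * qj) - f(x + di * qi)
            - f(x + dj * qj) + f(x)) / (di * dj)

# For a quadratic f(x) = 0.5 x^T H x the element equals qi^T H qj exactly.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
f = lambda x: 0.5 * x @ H @ x
q1, q2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(curvature_element(f, np.zeros(2), q1, q2, 0.1, 0.1))  # ~1.0 = H[0, 1]
```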
Recall that
  (C_Q)_ij = [f(x + δ_i q_i + δ_j q_j) − f(x + δ_i q_i) − f(x + δ_j q_j) + f(x)] / (δ_i δ_j).
If the function is sufficiently smooth then
  (C_Q)_ij = q_i^T ∇²f(x^k) q_j,
where the vector x^k ∈ R^n satisfies x^k = x + τ_i q_i + τ_j q_j with |τ_i| ≤ δ_i, |τ_j| ≤ δ_j. If the function f is quadratic with Hessian matrix C, then (C_Q)_ij = q_i^T C q_j for all i, j, or C_Q = Q^T C Q. In the general case we can always construct the matrix C = Q C_Q Q^T, where Q is the matrix with the search direction vectors as its columns.

Using the Hessian approximation
The eigenvectors of the matrix C turn out to be useful search directions. [Figures: compass search in a narrow valley versus our method in a narrow valley.]

Theorem. If f is in C², and ‖∇²f(x) − ∇²f(y)‖ ≤ L ‖x − y‖, then
  ‖∇²f(x̄) − C‖ ≤ n L δ,
where x̄ is in the neighborhood of the points x^k, k = 1, 2, ..., r, and δ is O(max_{i,j} ‖x^i − x^j‖). If f is quadratic, then L = 0 and we recover the exact Hessian. Note that r = n(n+1)/2 (!).

Proof. Let δ be the diameter of N, the smallest ball containing x^k, k = 1, ..., r. Consider the matrix C_Q − Q^T ∇²f(x̄) Q, where x̄ ∈ N. Element (i, j) can be written q_i^T (∇²f(x^k) − ∇²f(x̄)) q_j, so that |q_i^T (∇²f(x^k) − ∇²f(x̄)) q_j| ≤ L δ. Since ‖A‖ ≤ n max_{i,j} |a_ij|, and multiplication by orthogonal matrices does not alter norms, the result follows.

Extensions
The method is able to rotate its orthogonal search directions based on average curvature information. It can rotate once every O(n) while-loop iterations, since it has to compute O(n²) average curvature elements. Now to work in progress: so far we have not made use of any knowledge of the representation of the function, such as partial separability.

Separability for differentiable functions
If a function can be written as a sum of element functions,
  f = Σ_{i=1}^m f_i,  f_i : R^{n_i} → R,
where each element function depends on relatively few (n_i) of the n components of x, then f is partially separable. For instance
  f(x_1, x_2, x_3) = f_1(x_1, x_2) + f_2(x_2, x_3)
is partially separable, and has a tridiagonal Hessian (if it exists).
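A hedged sketch of the basis-rotation step described above: assemble C_Q from mixed differences along the current directions, map it back to C = Q C_Q Q^T, and take the eigenvectors of C as the next orthogonal basis. The helper names are hypothetical, and the element-by-element loop uses fresh evaluations rather than reusing old function values as the actual method does:

```python
import numpy as np

def curvature_matrix(f, x, Q, deltas):
    """Fill (C_Q)[i, j] with the mixed second difference of f along
    columns q_i, q_j of Q, using step lengths deltas[i], deltas[j]."""
    n = Q.shape[1]
    CQ = np.empty((n, n))
    fx = f(x)
    for i in range(n):
        for j in range(n):
            qi, qj, di, dj = Q[:, i], Q[:, j], deltas[i], deltas[j]
            CQ[i, j] = (f(x + di * qi + dj * qj) - f(x + di * qi)
                        - f(x + dj * qj) + fx) / (di * dj)
    return CQ

def rotated_directions(f, x, Q, deltas):
    """Map C_Q back to C = Q C_Q Q^T and return the eigenvectors of C,
    which serve as the new orthogonal search directions."""
    CQ = curvature_matrix(f, x, Q, deltas)
    C = Q @ CQ @ Q.T
    _, eigvecs = np.linalg.eigh(0.5 * (C + C.T))  # symmetrize, then eigendecompose
    return eigvecs

# On a quadratic with Hessian H, C equals H and the new directions are its eigenvectors.
H = np.array([[10.0, 3.0], [3.0, 2.0]])
f = lambda x: 0.5 * x @ H @ x
Q_new = rotated_directions(f, np.zeros(2), np.eye(2), np.array([0.1, 0.1]))
```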
Noisy separable functions
Suppose the element functions f_i(x) are expensive to compute accurately, but can be computed inexactly at a much lower cost, say
  f̃_i = f_i + ɛ_i,  ɛ_i : R^{n_i} → R.
If the function f is partially separable, the computed function f̃ is partially separable but may not be differentiable. Returning to the tridiagonal case:
  f̃(x_1, x_2, x_3) = f_1(x_1, x_2) + ɛ_1(x_1, x_2) + f_2(x_2, x_3) + ɛ_2(x_2, x_3).

Alternative approach: covariation graph
Define a covariation graph G(V, E) with n nodes and no edge between i and j if and only if, for δ_i, δ_j > 0 and for all x,
  [f(x + δ_i e_i + δ_j e_j) − f(x + δ_i e_i) − f(x + δ_j e_j) + f(x)] / (δ_i δ_j) = 0.

Observation. If the covariation graph is not complete, then the function f is partially separable. For (i, j) ∉ E choose δ_i = x_i, δ_j = x_j and observe that f(x) can be written as a sum of three element functions (with n_1 = n − 2, n_2 = n_3 = n − 1).

Covariation graph (2)
We define a covariation graph G(V, E) with n nodes and an edge between i and j if and only if x_i and x_j appear in the same element function, and an edge (a loop) from each i to itself. Let E_i be the set of element functions for which variable x_i appears in the domain. The intersection graph of E_i, i = 1, ..., n, is the covariation graph, i.e. there is an edge (i, j) ∈ E iff E_i ∩ E_j ≠ ∅. This graph has an adjacency matrix, A_G. If F^T = (f_1, ..., f_m) then f = F^T e (e the vector of all ones), and if F is in C¹ then A_G and F′(x)^T F′(x) have the same structure. In the example above, A_G has a tridiagonal structure.

Application to the method
Thus, in the context of our optimization method we can impose the sparsity structure of the adjacency matrix of the covariation graph onto C. Convert the matrix equation C = Q C_Q Q^T into the equivalent formulation
  (Q^T ⊗ Q^T) vec(C) = vec(C_Q).
Here ⊗ denotes the Kronecker product and vec(·) stacks the columns of a matrix in a vector.

Covariation graph (3)
Observation. If (i, j) ∉ E, then for all x and δ_i, δ_j > 0,
  [f(x + δ_i e_i + δ_j e_j) − f(x + δ_i e_i) − f(x + δ_j e_j) + f(x)] / (δ_i δ_j) = 0.
In other words: if f is in C², then the adjacency graph of the Hessian matrix is isomorphic to the covariation graph G(V, E) (or a subgraph of it), so A_G and the Hessian have the same sparsity structure. The elements in C_Q are defined for both differentiable and non-differentiable functions, and many of these differences will be identically zero because of the (partial) separability of the function, regardless of differentiability.

Application to the method (2)
Since we know that many of the elements of C are required to be zero, and by in addition requiring C to be symmetric, we get a reduced equation system
  (Q^T ⊗ Q^T) P_c vec(C) = vec(C_Q),    (1)
where vec(C) contains the r nonzero elements of, say, the lower triangle of C. The coefficient matrix (Q^T ⊗ Q^T) P_c is n(n+1)/2 × r. In order to avoid computing all n(n+1)/2 elements of vec(C_Q), this equation system should be reduced to an r × r system
  A vec(C) = c_γ
by selecting rows from (1).
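The Kronecker/vec identity underlying (1) can be checked numerically; a small NumPy sketch, with vec taken column-wise as defined above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
C = rng.standard_normal((n, n)); C = 0.5 * (C + C.T)   # symmetric C
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))        # orthogonal Q

vec = lambda M: M.flatten(order='F')                    # stack the columns
CQ = Q.T @ C @ Q                                        # so that C = Q C_Q Q^T
lhs = np.kron(Q.T, Q.T) @ vec(C)                        # (Q^T kron Q^T) vec(C)
print(np.allclose(lhs, vec(CQ)))                        # True: the formulations agree
```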
At this point we encounter the following subproblem. Given the overdetermined equation system
  (Q^T ⊗ Q^T) P_c vec(C) = vec(C_Q),
how do we pick r rows from the n(n+1)/2 × r matrix (Q^T ⊗ Q^T) P_c such that the resulting (square) matrix A is invertible, well-conditioned and easy to compute?

We use an initial ordering of the pairs (i, j) in which the rows are chosen based on the magnitude of the components of the search directions q_r, q_s ∈ R^n. This heuristic ordering almost always produces an invertible, and in many cases well-conditioned, matrix.

We construct A by building up the QR factorization of A^T one candidate column at a time, A^T_unfinished = QR: choose a candidate column, update the QR factorization, and keep the column if it is linearly independent of the columns already accepted; if it is linearly dependent, reject the column and down-date the QR factorization. The cost of such a rejection is a down-dating of Q as well as some housekeeping.
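A simplified sketch of this row-selection idea (illustrative only; the authors update and down-date a single QR factorization, whereas this sketch refactorizes each candidate set for clarity):

```python
import numpy as np

def select_rows(B, r, order, tol=1e-10):
    """Pick r rows of the n(n+1)/2-by-r coefficient matrix B, following the
    given candidate order, so that the resulting r-by-r matrix A is invertible.
    A production version would update and down-date one QR factorization of A^T
    instead of refactorizing from scratch."""
    chosen = []
    for idx in order:
        trial = chosen + [idx]
        # Keep the candidate row only if the selected rows remain linearly independent.
        R = np.linalg.qr(B[trial, :].T, mode='r')
        if np.min(np.abs(np.diag(R))) > tol:
            chosen = trial
        if len(chosen) == r:
            break
    return np.array(chosen)
```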
A^T_final = Q_r R_r. This procedure always gives an invertible matrix. The initial ordering (based on the heuristic of looking at q_r^T ⊗ q_s^T, which mimics the coordinate vectors) seldom needs to reject a candidate column of A^T.

Pseudocode
 1: Given f, x, search directions, step lengths and structure
 2: While not converged
 3:   Choose the order of the C_Q elements to be computed in c_γ
 4:   For each search direction q_i
 5:     If f(x + δ_i q_i) + ρ(δ_i) < f(x)
 6:       x ← x + δ_i q_i
 7:     End if
 8:     Compute c_γ element if applicable
 9:   End for
10:   If r elements in c_γ have been computed
11:     Solve for C and update search directions
12:   End if
13:   Update step lengths δ_i
14: End while

Numerical results
[Table: function evaluations needed to reduce the function value below a given tolerance, from the recommended starting point, for the test problems DECONVU, discrete boundary value, extended Rosenbrock, extended Powell singular and TRIDIA, comparing the sparse approach, the regular method and compass search; the numerical entries are not legible in this transcription.] The sparse approach rarely performs worse than compass search.

What do we lose when r < n(n+1)/2?
Theorem. If f ∈ C², and ‖∇²f(x) − ∇²f(y)‖ ≤ L ‖x − y‖, then for all x̄ in the neighborhood of the points x^k, k = 1, 2, ..., r,
  ‖∇²f(x̄) − C‖ ≤ n L δ ‖A⁻¹‖,
where δ is O(max_{i,j} ‖x^i − x^j‖). If f is quadratic, then L = 0 and we recover the exact Hessian also in the case when r < n(n+1)/2.

Conclusions and observations
Conclusions:
- Periodically rotating the search directions can significantly reduce the number of function evaluations needed to reach the optimal solution.
- Eigenvectors of matrices with curvature information make good search directions.
- Separability can be exploited also for noisy functions.
- While the full method can rotate once every O(n) while-loop iterations, the new method can, for curvature matrices with O(n) elements, rotate every O(1) iterations. This is useful for functions with a topography that warrants frequent basis rotation. However, numerical testing indicates that rotations should not be done too often (i.e. not every iteration).
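Under stated assumptions (forcing function ρ(δ) = δ², one extra curvature element computed per sweep rather than reused from old values, rotation once all n(n+1)/2 elements are available), a rough Python rendering of the pseudocode above could look like this; it is a sketch, not the authors' implementation:

```python
import numpy as np
from itertools import combinations_with_replacement

def gss_with_rotation(f, x, delta=1.0, tol=1e-6, max_sweeps=5000):
    """Rough rendering of the pseudocode: compass-type sweeps with the
    sufficient-decrease test f(x + d*q) + rho(d) < f(x); one curvature
    element is computed per sweep, and once all n(n+1)/2 elements are
    known the basis is rotated to the eigenvectors of C = Q C_Q Q^T."""
    n = len(x)
    Q = np.eye(n)
    deltas = np.full(n, float(delta))
    rho = lambda d: d * d                         # forcing function (assumed)
    pairs = list(combinations_with_replacement(range(n), 2))
    CQ, filled, next_pair = np.zeros((n, n)), 0, 0
    fx = f(x)
    for _ in range(max_sweeps):
        improved = False
        for i in range(n):
            for sign in (1.0, -1.0):
                trial = x + sign * deltas[i] * Q[:, i]
                ft = f(trial)
                if ft + rho(deltas[i]) < fx:      # sufficient decrease: accept
                    x, fx, improved = trial, ft, True
        i, j = pairs[next_pair]                   # compute one curvature element
        di, dj = deltas[i], deltas[j]
        CQ[i, j] = CQ[j, i] = (f(x + di * Q[:, i] + dj * Q[:, j])
                               - f(x + di * Q[:, i]) - f(x + dj * Q[:, j]) + fx) / (di * dj)
        filled += 1
        next_pair = (next_pair + 1) % len(pairs)
        if filled == len(pairs):                  # all elements known: rotate the basis
            C = Q @ CQ @ Q.T
            _, Q = np.linalg.eigh(0.5 * (C + C.T))
            CQ[:], filled = 0.0, 0
        if not improved:                          # unsuccessful sweep: shrink the steps
            deltas *= 0.5
            if deltas.max() < tol:
                break
    return x, fx
```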
Future work
Improvements:
- Develop adaptive rules for rotating the search directions.
- Develop schemes for the situation when the noise can be controlled.
- Integrate existing work for reducing the number of unsuccessful function evaluations.
- Reduce the number of function evaluations spent on computing the elements of the curvature information matrix.
- Try to solve the matrix row selection subproblem.
- Hybrid method based on trust regions and GSS.

More details
[1] Lennart Frimannslund and Trond Steihaug. A generating set search method using curvature information. To appear in Computational Optimization and Applications, 2007.
[2] Lennart Frimannslund and Trond Steihaug. A new generating set search algorithm for separable functions. Submitted to the SIAM Journal on Optimization.