1 Kernel methods & optimization
|
|
- Bernice Richardson
- 6 years ago
- Views:
Transcription
1 Machine Learning Class Notes Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions is the Gaussian kernel, ( ) y 2 ep 2σ 2 For the Gaussian kernel, k(, ) = 1 for any vector, and k(, y) 0 if is very different from y. Thus, a kernel function can be interpreted as a similarity function. However, not just any similarity function is a valid kernel. In particular, recall that (by definition) k(, y) is a valid kernel if and only if φ : X R d s.t. k(, y) = φ( ) φ( y). One consequence of this is that kernel functions must be symmetric, since φ( ) φ( y) = φ( y) φ( ). Today s lecture will eplore these requirements of kernel functions in more depth, culmunating with Mercer s theorem. Together, these requirements provide a mathematical foundation for kernel methods, ensuring both that there is a sensible feature vector representation for every data point and that the support vector machine (SVM) objective has a unique global optimum and is easy to optimize. 1.1 Background from linear algebra A matri M R d R d is said to be positive semi-definite if z R d, z T Mz 0. For eample, suppose M = I. Then, z T Iz = z i z j I ij = z 2, which is always 0. Thus, the identify matri is positive semi-definite. Net we review several concepts from linear algebra, and then use these to give an alternative definition of positive semi-definite (PSD) matrices. Suppose we find a vector v and a value λ such that M v = λ v. We call v an eigenvector of the matri M, and λ an eigenvalue. A matri M can be shown to be PSD if and only if M has all non-negative eigenvalues. We will now show one of the directions ( ). To see this, first write M = V ΛV T, where Λ is a matri with the eigenvalues along the diagonal (zero off diagonal) and V i=1 1
2 is the matri of eigenvectors: 1 M = V Net, we split Λ in two, λ M = V 0 λ λ λd 0 λ λ d V T λ λ λd V T = UUT. Letting v = z T U, since vv T = v v 0 we have that (z T U)(U T z) = z T Mz 0, showing that M is positive semi-definite (we used the fact that the eigenvalues were non-negative when taking their square root). 1.2 Mercer s Theorem For a training set S = { i } and a function k( u, v), the kernel matri (also called the Gram matri) K S is the matri of dimension S S where (K S ) ij = k( i, j ). Theorem 1 (Mercer s theorem). k( u, v) is a valid kernel if and only if the corresponding kernel matri is PSD for all training sets S = { i }. Proof. ( ) Since k( u, v) is a valid kernel, it has a corresponding feature map φ such that k( u, v) = φ( u) φ( v). Thus, the kernel matri K s has entries (K S ) ij = φ( i ) φ( j ). Let V be the matri [ φ( 1 )... φ( n ) ], where we treat φ( i ) as a column vector. Then, we have K S = V T V. However, this shows that K S must be positive semi-definite, because for any vector z R S, (z T V T )(V z) 0. ( ) Let S be the set of all possible data points (we will assume that it is finite). Since the corresponding kernel matri K S is positive semi-definite, it has non-negative eigenvalues and can be factored as K S = UU T. Let φ( i ) = u i, where u i is the i th row of U. This gives the feature mapping for i such that k( i, j ) = u i u j. Mercer s theorem guarantees for us that the kernel matri is positive semidefinite. As we show in the net section, this will guarantee that the SVM dual objective is concave, which means that it is easy to optimize. 1 This proof assumes that M is symmetric, which kernel matrices are. 2
3 Not conve: Conve: Set specified by linear inequalities: X = { R 2 : A b} Figure 1: Illustration of a non-conve and two conve sets in R Conveity A set X R d is a conve set if for any, y X and 0 α 1, α + (1 α) y X Informally, if for any two points, y that are in the set every point on the line connecting and y is also included in the set, then the set is conve. See Figure 1 for eamples of non-conve and conve sets. A function f : X R is conve for a conve set X if, y X and 0 α 1, f(α + (1 α) y) αf( ) + (1 α)f( y) (1) Informally, a function is conve if the line between any two points on the curve always upper bounds the function. We call a function strictly conve if the inequality in Eq. 1 is a strict inequality. See See Figure 2 for eamples of nonconve and conve functions. A function f() is concave is f() is conve. Importantly, it can be shown that strictly conve functions always have a unique minima. For a function f() defined over the real line, one can show that f() is d conve if and only if 2 d f 0. Just as before, strict conveity occurs when 2 the inequality is strict. For eample, consider f() = 2. The first derivative of f() is given by d d always strictly greater than 0, we have proven that f() = 2 is strictly conve. As a second eample, consider f() = log(). The first derivative is d d f = 1, f = 2 and its second derivative by d2 d 2 f = 2. Since this is and its second derivative is given by d2 d 2 f = 1 2. Since this is negative for all > 0, we have proven that log() is a concave function over R +. Not conve: Conve: f() f() = 2 f() Figure 2: Illustration of a non-conve and two conve functions over X = R. 3
4 This matters because optimization for conve functions is easy. In particular, one can show that nearly any reasonable optimization method, such as gradient descent (where one starts at arbitrary point, moves a little bit in the direction opposite to the gradient, and then repeats), is guaranteed to reach a global optimum of the function. Note that whereas the minimization of conve functions is easy, likewise, the maimization of concave functions is easy. Finally, to generalize this second definition of conve functions to higher dimensions (i.e., X = R d ), we introduce the notion of the Hessian matri of a function f, 2 f( ) = 2 1. d 1 1 d which is the matri of dimension d d with entries ( 2 f) ij equal to the partial derivative of the function with respect to i and then with respect to j, denoted i j. Note that since the order of the partial derivatives does not matter, i.e. i j = 2 f j i, the Hessian matri is symmetric. We are finally ready for our second definition of conve functions in higher dimension. A function f : X R is conve for a conve set X R d if and only if its Hessian matri 2 f( ) is positive semi-definite for all X.. 2 d 1.4 The dual SVM objective is concave Recall the dual of the support vector machine (SVM) objective, f( α) = α i 1 2 i=1 The first partial derivative is given by α i α j y i y j k( i, j ) (2) f = 1 α i y i y s k( i, s ) α s k( s, s ) α s i s The second partial derivative is given by α t α s = y t y s k( t, s ) Denote the Hessian matri as 2 f. To show that the dual SVM objective is concave, we must show that α T 2 f α = n n α iα 2 f j α i α j 0 for all α, which can be equivalently written as (α i y i )(α j y j )k( i, j ) 0. (3) 4
5 Coordinate descent: f(, y) = 10 f(, y) =8 y Figure 3: Illustration of coordinate descent on the function f(, y). Shown here are the level sets of the function. The numbers indicate the first point, the second point, etc., until the optimal solution is found. However, since k( u, v) is a valid kernel, Mercer s theorem guarantees for us that K S, the kernel matri for the n data points, is positive semi-definite. As a result, we have that z T K S z = z i z j k( i, j ) 0 (4) for all vectors z R n. Given any α, let z be given by z i = α i y i for i = 1,..., n. Using this, Eq. 4 implies Eq. 3. There are many approaches for minimizing f( α). One of the simplest such methods is called the sequential minimal optimization (SMO) algorithm, and is based on the concept of block coordinate descent. Coordinate descent is illustrated in Fig. 3 for a function defined on R 2. An arbitrary starting point is chosen. Then, in each step, one coordinate (or, in general, a set of coordinates, called a block) is chosen and the function is minimized as much as possible with respect to that coordinate (keeping all other variables fied to their current values). The larger the blocks, the faster the convergence to the optimum solution. The blocks are typically chosen to be as large as possible such that minimizing the function with respect to these coordinates can be performed in closed form. For the dual SVM, because of the constraint i y iα i = 0, the smallest block size that can be chosen is 2. The algorithm proceeds by choosing in each iteration α i and α j, then minimizing the function as much as possible with respect to these two variables. 5
UNCONSTRAINED OPTIMIZATION PAUL SCHRIMPF OCTOBER 24, 2013
PAUL SCHRIMPF OCTOBER 24, 213 UNIVERSITY OF BRITISH COLUMBIA ECONOMICS 26 Today s lecture is about unconstrained optimization. If you re following along in the syllabus, you ll notice that we ve skipped
More informationIntro to Nonlinear Optimization
Intro to Nonlinear Optimization We now rela the proportionality and additivity assumptions of LP What are the challenges of nonlinear programs NLP s? Objectives and constraints can use any function: ma
More informationOptimization Tutorial 1. Basic Gradient Descent
E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationLecture Notes: Geometric Considerations in Unconstrained Optimization
Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections
More informationGradient Descent. Dr. Xiaowei Huang
Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,
More informationLecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.
Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods
More informationSolving Linear Systems
Solving Linear Systems Iterative Solutions Methods Philippe B. Laval KSU Fall 207 Philippe B. Laval (KSU) Linear Systems Fall 207 / 2 Introduction We continue looking how to solve linear systems of the
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationLecture 8 : Eigenvalues and Eigenvectors
CPS290: Algorithmic Foundations of Data Science February 24, 2017 Lecture 8 : Eigenvalues and Eigenvectors Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Hermitian Matrices It is simpler to begin with
More informationKernels and the Kernel Trick. Machine Learning Fall 2017
Kernels and the Kernel Trick Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem Support vectors, duals and kernels
More informationDS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.
DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1
More informationDual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Dual Methods Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 1 Last time: proimal Newton method Consider the problem min g() + h() where g, h are conve, g is twice differentiable, and h is simple.
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear
More informationCS269: Machine Learning Theory Lecture 16: SVMs and Kernels November 17, 2010
CS269: Machine Learning Theory Lecture 6: SVMs and Kernels November 7, 200 Lecturer: Jennifer Wortman Vaughan Scribe: Jason Au, Ling Fang, Kwanho Lee Today, we will continue on the topic of support vector
More information10701 Recitation 5 Duality and SVM. Ahmed Hefny
10701 Recitation 5 Duality and SVM Ahmed Hefny Outline Langrangian and Duality The Lagrangian Duality Eamples Support Vector Machines Primal Formulation Dual Formulation Soft Margin and Hinge Loss Lagrangian
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationLecture Unconstrained optimization. In this lecture we will study the unconstrained problem. minimize f(x), (2.1)
Lecture 2 In this lecture we will study the unconstrained problem minimize f(x), (2.1) where x R n. Optimality conditions aim to identify properties that potential minimizers need to satisfy in relation
More informationConvex Optimization Overview (cnt d)
Conve Optimization Overview (cnt d) Chuong B. Do November 29, 2009 During last week s section, we began our study of conve optimization, the study of mathematical optimization problems of the form, minimize
More informationConnection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis
Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal
More informationKernel Methods. Machine Learning A W VO
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More information0.1. Linear transformations
Suggestions for midterm review #3 The repetitoria are usually not complete; I am merely bringing up the points that many people didn t now on the recitations Linear transformations The following mostly
More informationOutline. Basic Concepts in Optimization Part I. Illustration of a (Strict) Local Minimum, x. Local Optima. Neighborhood.
Outline Basic Concepts in Optimization Part I Local and Global Optima Benoît Chachuat McMaster University Department of Chemical Engineering ChE G: Optimization in Chemical Engineering
More informationConvexity II: Optimization Basics
Conveity II: Optimization Basics Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 See supplements for reviews of basic multivariate calculus basic linear algebra Last time: conve sets and functions
More informationMachine Learning: The Perceptron. Lecture 06
Machine Learning: he Perceptron Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu 1 McCulloch-Pitts Neuron Function 0 1 w 0 activation / output function 1 w 1 w w
More informationChapter 8 Gradient Methods
Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point
More informationFunctions of Several Variables
Functions of Several Variables The Unconstrained Minimization Problem where In n dimensions the unconstrained problem is stated as f() x variables. minimize f()x x, is a scalar objective function of vector
More informationMachine Learning. Lecture 6: Support Vector Machine. Feng Li.
Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)
More informationCSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming
CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming
More informationLecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan
Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always
More informationIntroduction to Machine Learning Spring 2018 Note Duality. 1.1 Primal and Dual Problem
CS 189 Introduction to Machine Learning Spring 2018 Note 22 1 Duality As we have seen in our discussion of kernels, ridge regression can be viewed in two ways: (1) an optimization problem over the weights
More informationL20: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs
L0: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU
More informationEC5555 Economics Masters Refresher Course in Mathematics September 2013
EC5555 Economics Masters Reresher Course in Mathematics September 3 Lecture 5 Unconstraine Optimization an Quaratic Forms Francesco Feri We consier the unconstraine optimization or the case o unctions
More informationLU Factorization. A m x n matrix A admits an LU factorization if it can be written in the form of A = LU
LU Factorization A m n matri A admits an LU factorization if it can be written in the form of Where, A = LU L : is a m m lower triangular matri with s on the diagonal. The matri L is invertible and is
More informationLecture 03 Positive Semidefinite (PSD) and Positive Definite (PD) Matrices and their Properties
Applied Optimization for Wireless, Machine Learning, Big Data Prof. Aditya K. Jagannatham Department of Electrical Engineering Indian Institute of Technology, Kanpur Lecture 03 Positive Semidefinite (PSD)
More informationLecture 14: Optimality Conditions for Conic Problems
EE 227A: Conve Optimization and Applications March 6, 2012 Lecture 14: Optimality Conditions for Conic Problems Lecturer: Laurent El Ghaoui Reading assignment: 5.5 of BV. 14.1 Optimality for Conic Problems
More informationLinear Equations in Linear Algebra
1 Linear Equations in Linear Algebra 1.4 THE MATRIX EQUATION A = b MATRIX EQUATION A = b m n Definition: If A is an matri, with columns a 1, n, a n, and if is in, then the product of A and, denoted by
More informationLecture 4: September 12
10-725/36-725: Conve Optimization Fall 2016 Lecture 4: September 12 Lecturer: Ryan Tibshirani Scribes: Jay Hennig, Yifeng Tao, Sriram Vasudevan Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More information1 Overview. 2 A Characterization of Convex Functions. 2.1 First-order Taylor approximation. AM 221: Advanced Optimization Spring 2016
AM 221: Advanced Optimization Spring 2016 Prof. Yaron Singer Lecture 8 February 22nd 1 Overview In the previous lecture we saw characterizations of optimality in linear optimization, and we reviewed the
More informationSupport Vector Machines and Kernel Methods
2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University
More informationLecture 12 : Graph Laplacians and Cheeger s Inequality
CPS290: Algorithmic Foundations of Data Science March 7, 2017 Lecture 12 : Graph Laplacians and Cheeger s Inequality Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Graph Laplacian Maybe the most beautiful
More informationAdvanced Engineering Mathematics Prof. Pratima Panigrahi Department of Mathematics Indian Institute of Technology, Kharagpur
Advanced Engineering Mathematics Prof. Pratima Panigrahi Department of Mathematics Indian Institute of Technology, Kharagpur Lecture No. #07 Jordan Canonical Form Cayley Hamilton Theorem (Refer Slide Time:
More informationMATH 5720: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 2018
MATH 57: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 18 1 Global and Local Optima Let a function f : S R be defined on a set S R n Definition 1 (minimizers and maximizers) (i) x S
More information1 Convexity, Convex Relaxations, and Global Optimization
1 Conveity, Conve Relaations, and Global Optimization Algorithms 1 1.1 Minima Consider the nonlinear optimization problem in a Euclidean space: where the feasible region. min f(), R n Definition (global
More informationIntroduction to gradient descent
6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our
More informationChapter 3 Transformations
Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases
More informationComputational Optimization. Constrained Optimization Part 2
Computational Optimization Constrained Optimization Part Optimality Conditions Unconstrained Case X* is global min Conve f X* is local min SOSC f ( *) = SONC Easiest Problem Linear equality constraints
More informationGeometric Modeling Summer Semester 2010 Mathematical Tools (1)
Geometric Modeling Summer Semester 2010 Mathematical Tools (1) Recap: Linear Algebra Today... Topics: Mathematical Background Linear algebra Analysis & differential geometry Numerical techniques Geometric
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course otes for EE7C (Spring 018): Conve Optimization and Approimation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Ma Simchowitz Email: msimchow+ee7c@berkeley.edu October
More informationLinear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights
Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University
More informationSupport Vector Machines
Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal
More informationCS-E4830 Kernel Methods in Machine Learning
CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This
More informationNext topics: Solving systems of linear equations
Next topics: Solving systems of linear equations 1 Gaussian elimination (today) 2 Gaussian elimination with partial pivoting (Week 9) 3 The method of LU-decomposition (Week 10) 4 Iterative techniques:
More informationMath 3191 Applied Linear Algebra
Math 9 Applied Linear Algebra Lecture 9: Diagonalization Stephen Billups University of Colorado at Denver Math 9Applied Linear Algebra p./9 Section. Diagonalization The goal here is to develop a useful
More informationMath (P)refresher Lecture 8: Unconstrained Optimization
Math (P)refresher Lecture 8: Unconstrained Optimization September 2006 Today s Topics : Quadratic Forms Definiteness of Quadratic Forms Maxima and Minima in R n First Order Conditions Second Order Conditions
More informationSVMs, Duality and the Kernel Trick
SVMs, Duality and the Kernel Trick Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 26 th, 2007 2005-2007 Carlos Guestrin 1 SVMs reminder 2005-2007 Carlos Guestrin 2 Today
More informationKernel Methods. Barnabás Póczos
Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels
More informationLecture 7: Positive Semidefinite Matrices
Lecture 7: Positive Semidefinite Matrices Rajat Mittal IIT Kanpur The main aim of this lecture note is to prepare your background for semidefinite programming. We have already seen some linear algebra.
More information10-725/36-725: Convex Optimization Spring Lecture 21: April 6
10-725/36-725: Conve Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 21: April 6 Scribes: Chiqun Zhang, Hanqi Cheng, Waleed Ammar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationLECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION
15-382 COLLECTIVE INTELLIGENCE - S19 LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION TEACHER: GIANNI A. DI CARO WHAT IF WE HAVE ONE SINGLE AGENT PSO leverages the presence of a swarm: the outcome
More informationStatistical Geometry Processing Winter Semester 2011/2012
Statistical Geometry Processing Winter Semester 2011/2012 Linear Algebra, Function Spaces & Inverse Problems Vector and Function Spaces 3 Vectors vectors are arrows in space classically: 2 or 3 dim. Euclidian
More informationReview of matrices. Let m, n IN. A rectangle of numbers written like A =
Review of matrices Let m, n IN. A rectangle of numbers written like a 11 a 12... a 1n a 21 a 22... a 2n A =...... a m1 a m2... a mn where each a ij IR is called a matrix with m rows and n columns or an
More informationB553 Lecture 5: Matrix Algebra Review
B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations
More informationCopositive Plus Matrices
Copositive Plus Matrices Willemieke van Vliet Master Thesis in Applied Mathematics October 2011 Copositive Plus Matrices Summary In this report we discuss the set of copositive plus matrices and their
More informationSTATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY
STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY UNIVERSITY OF MARYLAND: ECON 600 1. Some Eamples 1 A general problem that arises countless times in economics takes the form: (Verbally):
More informationOn factor width and symmetric H -matrices
Linear Algebra and its Applications 405 (2005) 239 248 www.elsevier.com/locate/laa On factor width and symmetric H -matrices Erik G. Boman a,,1, Doron Chen b, Ojas Parekh c, Sivan Toledo b,2 a Department
More informationLinear algebra and differential equations (Math 54): Lecture 13
Linear algebra and differential equations (Math 54): Lecture 13 Vivek Shende March 3, 016 Hello and welcome to class! Last time We discussed change of basis. This time We will introduce the notions of
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationLecture 18 Classical Iterative Methods
Lecture 18 Classical Iterative Methods MIT 18.335J / 6.337J Introduction to Numerical Methods Per-Olof Persson November 14, 2006 1 Iterative Methods for Linear Systems Direct methods for solving Ax = b,
More informationNumerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationMACHINE LEARNING ADVANCED MACHINE LEARNING
MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 22 MACHINE LEARNING Discrete Probabilities Consider two variables and y taking discrete
More informationMTH50 Spring 07 HW Assignment 7 {From [FIS0]}: Sec 44 #4a h 6; Sec 5 #ad ac 4ae 4 7 The due date for this assignment is 04/05/7 Sec 44 #4a h Evaluate the erminant of the following matrices by any legitimate
More informationSolving Linear Systems
Solving Linear Systems Iterative Solutions Methods Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) Linear Systems Fall 2015 1 / 12 Introduction We continue looking how to solve linear systems of
More informationarxiv: v1 [math.na] 5 May 2011
ITERATIVE METHODS FOR COMPUTING EIGENVALUES AND EIGENVECTORS MAYSUM PANJU arxiv:1105.1185v1 [math.na] 5 May 2011 Abstract. We examine some numerical iterative methods for computing the eigenvalues and
More informationLECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form
LECTURE # - EURAL COPUTATIO, Feb 4, 4 Linear Regression Assumes a functional form f (, θ) = θ θ θ K θ (Eq) where = (,, ) are the attributes and θ = (θ, θ, θ ) are the function parameters Eample: f (, θ)
More informationBeyond Newton s method Thomas P. Minka
Beyond Newton s method Thomas P. Minka 2000 (revised 7/21/2017) Abstract Newton s method for optimization is equivalent to iteratively maimizing a local quadratic approimation to the objective function.
More informationIntroduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research
Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,
More informationDeep Learning. Authors: I. Goodfellow, Y. Bengio, A. Courville. Chapter 4: Numerical Computation. Lecture slides edited by C. Yim. C.
Chapter 4: Numerical Computation Deep Learning Authors: I. Goodfellow, Y. Bengio, A. Courville Lecture slides edited by 1 Chapter 4: Numerical Computation 4.1 Overflow and Underflow 4.2 Poor Conditioning
More informationA = Chapter 6. Linear Programming: The Simplex Method. + 21x 3 x x 2. C = 16x 1. + x x x 1. + x 3. 16,x 2.
Chapter 6 Linear rogramming: The Simple Method Section The Dual roblem: Minimization with roblem Constraints of the Form Learning Objectives for Section 6. Dual roblem: Minimization with roblem Constraints
More informationLecture 4: January 26
10-725/36-725: Conve Optimization Spring 2015 Lecturer: Javier Pena Lecture 4: January 26 Scribes: Vipul Singh, Shinjini Kundu, Chia-Yin Tsai Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationDuality Uses and Correspondences. Ryan Tibshirani Convex Optimization
Duality Uses and Correspondences Ryan Tibshirani Conve Optimization 10-725 Recall that for the problem Last time: KKT conditions subject to f() h i () 0, i = 1,... m l j () = 0, j = 1,... r the KKT conditions
More informationBackground Mathematics (2/2) 1. David Barber
Background Mathematics (2/2) 1 David Barber University College London Modified by Samson Cheung (sccheung@ieee.org) 1 These slides accompany the book Bayesian Reasoning and Machine Learning. The book and
More informationSpectral Theorem for Self-adjoint Linear Operators
Notes for the undergraduate lecture by David Adams. (These are the notes I would write if I was teaching a course on this topic. I have included more material than I will cover in the 45 minute lecture;
More information, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are
Quadratic forms We consider the quadratic function f : R 2 R defined by f(x) = 2 xt Ax b T x with x = (x, x 2 ) T, () where A R 2 2 is symmetric and b R 2. We will see that, depending on the eigenvalues
More informationConvex Functions and Optimization
Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized
More information3.2 Iterative Solution Methods for Solving Linear
22 CHAPTER 3. NUMERICAL LINEAR ALGEBRA 3.2 Iterative Solution Methods for Solving Linear Systems 3.2.1 Introduction We continue looking how to solve linear systems of the form Ax = b where A = (a ij is
More informationPerformance Surfaces and Optimum Points
CSC 302 1.5 Neural Networks Performance Surfaces and Optimum Points 1 Entrance Performance learning is another important class of learning law. Network parameters are adjusted to optimize the performance
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationLecture 9: Large Margin Classifiers. Linear Support Vector Machines
Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation
More informationLecture 16: October 22
0-725/36-725: Conve Optimization Fall 208 Lecturer: Ryan Tibshirani Lecture 6: October 22 Scribes: Nic Dalmasso, Alan Mishler, Benja LeRoy Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationMACHINE LEARNING ADVANCED MACHINE LEARNING
MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 2 2 MACHINE LEARNING Overview Definition pdf Definition joint, condition, marginal,
More informationGradient Descent. Mohammad Emtiyaz Khan EPFL Sep 22, 2015
Gradient Descent Mohammad Emtiyaz Khan EPFL Sep 22, 201 Mohammad Emtiyaz Khan 2014 1 Learning/estimation/fitting Given a cost function L(β), we wish to find β that minimizes the cost: min β L(β), subject
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationLecture Notes on Support Vector Machine
Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is
More informationLecture 9: SVD, Low Rank Approximation
CSE 521: Design and Analysis of Algorithms I Spring 2016 Lecture 9: SVD, Low Rank Approimation Lecturer: Shayan Oveis Gharan April 25th Scribe: Koosha Khalvati Disclaimer: hese notes have not been subjected
More informationLecture: Local Spectral Methods (1 of 4)
Stat260/CS294: Spectral Graph Methods Lecture 18-03/31/2015 Lecture: Local Spectral Methods (1 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide
More informationRecall the convention that, for us, all vectors are column vectors.
Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists
More information