1 Kernel methods & optimization

Size: px
Start display at page:

Download "1 Kernel methods & optimization"

Transcription

1 Machine Learning Class Notes Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions is the Gaussian kernel, ( ) y 2 ep 2σ 2 For the Gaussian kernel, k(, ) = 1 for any vector, and k(, y) 0 if is very different from y. Thus, a kernel function can be interpreted as a similarity function. However, not just any similarity function is a valid kernel. In particular, recall that (by definition) k(, y) is a valid kernel if and only if φ : X R d s.t. k(, y) = φ( ) φ( y). One consequence of this is that kernel functions must be symmetric, since φ( ) φ( y) = φ( y) φ( ). Today s lecture will eplore these requirements of kernel functions in more depth, culmunating with Mercer s theorem. Together, these requirements provide a mathematical foundation for kernel methods, ensuring both that there is a sensible feature vector representation for every data point and that the support vector machine (SVM) objective has a unique global optimum and is easy to optimize. 1.1 Background from linear algebra A matri M R d R d is said to be positive semi-definite if z R d, z T Mz 0. For eample, suppose M = I. Then, z T Iz = z i z j I ij = z 2, which is always 0. Thus, the identify matri is positive semi-definite. Net we review several concepts from linear algebra, and then use these to give an alternative definition of positive semi-definite (PSD) matrices. Suppose we find a vector v and a value λ such that M v = λ v. We call v an eigenvector of the matri M, and λ an eigenvalue. A matri M can be shown to be PSD if and only if M has all non-negative eigenvalues. We will now show one of the directions ( ). To see this, first write M = V ΛV T, where Λ is a matri with the eigenvalues along the diagonal (zero off diagonal) and V i=1 1

2 is the matri of eigenvectors: 1 M = V Net, we split Λ in two, λ M = V 0 λ λ λd 0 λ λ d V T λ λ λd V T = UUT. Letting v = z T U, since vv T = v v 0 we have that (z T U)(U T z) = z T Mz 0, showing that M is positive semi-definite (we used the fact that the eigenvalues were non-negative when taking their square root). 1.2 Mercer s Theorem For a training set S = { i } and a function k( u, v), the kernel matri (also called the Gram matri) K S is the matri of dimension S S where (K S ) ij = k( i, j ). Theorem 1 (Mercer s theorem). k( u, v) is a valid kernel if and only if the corresponding kernel matri is PSD for all training sets S = { i }. Proof. ( ) Since k( u, v) is a valid kernel, it has a corresponding feature map φ such that k( u, v) = φ( u) φ( v). Thus, the kernel matri K s has entries (K S ) ij = φ( i ) φ( j ). Let V be the matri [ φ( 1 )... φ( n ) ], where we treat φ( i ) as a column vector. Then, we have K S = V T V. However, this shows that K S must be positive semi-definite, because for any vector z R S, (z T V T )(V z) 0. ( ) Let S be the set of all possible data points (we will assume that it is finite). Since the corresponding kernel matri K S is positive semi-definite, it has non-negative eigenvalues and can be factored as K S = UU T. Let φ( i ) = u i, where u i is the i th row of U. This gives the feature mapping for i such that k( i, j ) = u i u j. Mercer s theorem guarantees for us that the kernel matri is positive semidefinite. As we show in the net section, this will guarantee that the SVM dual objective is concave, which means that it is easy to optimize. 1 This proof assumes that M is symmetric, which kernel matrices are. 2

3 Not conve: Conve: Set specified by linear inequalities: X = { R 2 : A b} Figure 1: Illustration of a non-conve and two conve sets in R Conveity A set X R d is a conve set if for any, y X and 0 α 1, α + (1 α) y X Informally, if for any two points, y that are in the set every point on the line connecting and y is also included in the set, then the set is conve. See Figure 1 for eamples of non-conve and conve sets. A function f : X R is conve for a conve set X if, y X and 0 α 1, f(α + (1 α) y) αf( ) + (1 α)f( y) (1) Informally, a function is conve if the line between any two points on the curve always upper bounds the function. We call a function strictly conve if the inequality in Eq. 1 is a strict inequality. See See Figure 2 for eamples of nonconve and conve functions. A function f() is concave is f() is conve. Importantly, it can be shown that strictly conve functions always have a unique minima. For a function f() defined over the real line, one can show that f() is d conve if and only if 2 d f 0. Just as before, strict conveity occurs when 2 the inequality is strict. For eample, consider f() = 2. The first derivative of f() is given by d d always strictly greater than 0, we have proven that f() = 2 is strictly conve. As a second eample, consider f() = log(). The first derivative is d d f = 1, f = 2 and its second derivative by d2 d 2 f = 2. Since this is and its second derivative is given by d2 d 2 f = 1 2. Since this is negative for all > 0, we have proven that log() is a concave function over R +. Not conve: Conve: f() f() = 2 f() Figure 2: Illustration of a non-conve and two conve functions over X = R. 3

4 This matters because optimization for conve functions is easy. In particular, one can show that nearly any reasonable optimization method, such as gradient descent (where one starts at arbitrary point, moves a little bit in the direction opposite to the gradient, and then repeats), is guaranteed to reach a global optimum of the function. Note that whereas the minimization of conve functions is easy, likewise, the maimization of concave functions is easy. Finally, to generalize this second definition of conve functions to higher dimensions (i.e., X = R d ), we introduce the notion of the Hessian matri of a function f, 2 f( ) = 2 1. d 1 1 d which is the matri of dimension d d with entries ( 2 f) ij equal to the partial derivative of the function with respect to i and then with respect to j, denoted i j. Note that since the order of the partial derivatives does not matter, i.e. i j = 2 f j i, the Hessian matri is symmetric. We are finally ready for our second definition of conve functions in higher dimension. A function f : X R is conve for a conve set X R d if and only if its Hessian matri 2 f( ) is positive semi-definite for all X.. 2 d 1.4 The dual SVM objective is concave Recall the dual of the support vector machine (SVM) objective, f( α) = α i 1 2 i=1 The first partial derivative is given by α i α j y i y j k( i, j ) (2) f = 1 α i y i y s k( i, s ) α s k( s, s ) α s i s The second partial derivative is given by α t α s = y t y s k( t, s ) Denote the Hessian matri as 2 f. To show that the dual SVM objective is concave, we must show that α T 2 f α = n n α iα 2 f j α i α j 0 for all α, which can be equivalently written as (α i y i )(α j y j )k( i, j ) 0. (3) 4

5 Coordinate descent: f(, y) = 10 f(, y) =8 y Figure 3: Illustration of coordinate descent on the function f(, y). Shown here are the level sets of the function. The numbers indicate the first point, the second point, etc., until the optimal solution is found. However, since k( u, v) is a valid kernel, Mercer s theorem guarantees for us that K S, the kernel matri for the n data points, is positive semi-definite. As a result, we have that z T K S z = z i z j k( i, j ) 0 (4) for all vectors z R n. Given any α, let z be given by z i = α i y i for i = 1,..., n. Using this, Eq. 4 implies Eq. 3. There are many approaches for minimizing f( α). One of the simplest such methods is called the sequential minimal optimization (SMO) algorithm, and is based on the concept of block coordinate descent. Coordinate descent is illustrated in Fig. 3 for a function defined on R 2. An arbitrary starting point is chosen. Then, in each step, one coordinate (or, in general, a set of coordinates, called a block) is chosen and the function is minimized as much as possible with respect to that coordinate (keeping all other variables fied to their current values). The larger the blocks, the faster the convergence to the optimum solution. The blocks are typically chosen to be as large as possible such that minimizing the function with respect to these coordinates can be performed in closed form. For the dual SVM, because of the constraint i y iα i = 0, the smallest block size that can be chosen is 2. The algorithm proceeds by choosing in each iteration α i and α j, then minimizing the function as much as possible with respect to these two variables. 5

UNCONSTRAINED OPTIMIZATION PAUL SCHRIMPF OCTOBER 24, 2013

UNCONSTRAINED OPTIMIZATION PAUL SCHRIMPF OCTOBER 24, 2013 PAUL SCHRIMPF OCTOBER 24, 213 UNIVERSITY OF BRITISH COLUMBIA ECONOMICS 26 Today s lecture is about unconstrained optimization. If you re following along in the syllabus, you ll notice that we ve skipped

More information

Intro to Nonlinear Optimization

Intro to Nonlinear Optimization Intro to Nonlinear Optimization We now rela the proportionality and additivity assumptions of LP What are the challenges of nonlinear programs NLP s? Objectives and constraints can use any function: ma

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Lecture Notes: Geometric Considerations in Unconstrained Optimization

Lecture Notes: Geometric Considerations in Unconstrained Optimization Lecture Notes: Geometric Considerations in Unconstrained Optimization James T. Allison February 15, 2006 The primary objectives of this lecture on unconstrained optimization are to: Establish connections

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods.

Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Lecture 5: Linear models for classification. Logistic regression. Gradient Descent. Second-order methods. Linear models for classification Logistic regression Gradient descent and second-order methods

More information

Solving Linear Systems

Solving Linear Systems Solving Linear Systems Iterative Solutions Methods Philippe B. Laval KSU Fall 207 Philippe B. Laval (KSU) Linear Systems Fall 207 / 2 Introduction We continue looking how to solve linear systems of the

More information

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017

The Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017 The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the

More information

Lecture 8 : Eigenvalues and Eigenvectors

Lecture 8 : Eigenvalues and Eigenvectors CPS290: Algorithmic Foundations of Data Science February 24, 2017 Lecture 8 : Eigenvalues and Eigenvectors Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Hermitian Matrices It is simpler to begin with

More information

Kernels and the Kernel Trick. Machine Learning Fall 2017

Kernels and the Kernel Trick. Machine Learning Fall 2017 Kernels and the Kernel Trick Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem Support vectors, duals and kernels

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725

Dual Methods. Lecturer: Ryan Tibshirani Convex Optimization /36-725 Dual Methods Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 1 Last time: proimal Newton method Consider the problem min g() + h() where g, h are conve, g is twice differentiable, and h is simple.

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear

More information

CS269: Machine Learning Theory Lecture 16: SVMs and Kernels November 17, 2010

CS269: Machine Learning Theory Lecture 16: SVMs and Kernels November 17, 2010 CS269: Machine Learning Theory Lecture 6: SVMs and Kernels November 7, 200 Lecturer: Jennifer Wortman Vaughan Scribe: Jason Au, Ling Fang, Kwanho Lee Today, we will continue on the topic of support vector

More information

10701 Recitation 5 Duality and SVM. Ahmed Hefny

10701 Recitation 5 Duality and SVM. Ahmed Hefny 10701 Recitation 5 Duality and SVM Ahmed Hefny Outline Langrangian and Duality The Lagrangian Duality Eamples Support Vector Machines Primal Formulation Dual Formulation Soft Margin and Hinge Loss Lagrangian

More information

Machine Learning. Support Vector Machines. Manfred Huber

Machine Learning. Support Vector Machines. Manfred Huber Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data

More information

Lecture Unconstrained optimization. In this lecture we will study the unconstrained problem. minimize f(x), (2.1)

Lecture Unconstrained optimization. In this lecture we will study the unconstrained problem. minimize f(x), (2.1) Lecture 2 In this lecture we will study the unconstrained problem minimize f(x), (2.1) where x R n. Optimality conditions aim to identify properties that potential minimizers need to satisfy in relation

More information

Convex Optimization Overview (cnt d)

Convex Optimization Overview (cnt d) Conve Optimization Overview (cnt d) Chuong B. Do November 29, 2009 During last week s section, we began our study of conve optimization, the study of mathematical optimization problems of the form, minimize

More information

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

0.1. Linear transformations

0.1. Linear transformations Suggestions for midterm review #3 The repetitoria are usually not complete; I am merely bringing up the points that many people didn t now on the recitations Linear transformations The following mostly

More information

Outline. Basic Concepts in Optimization Part I. Illustration of a (Strict) Local Minimum, x. Local Optima. Neighborhood.

Outline. Basic Concepts in Optimization Part I. Illustration of a (Strict) Local Minimum, x. Local Optima. Neighborhood. Outline Basic Concepts in Optimization Part I Local and Global Optima Benoît Chachuat McMaster University Department of Chemical Engineering ChE G: Optimization in Chemical Engineering

More information

Convexity II: Optimization Basics

Convexity II: Optimization Basics Conveity II: Optimization Basics Lecturer: Ryan Tibshirani Conve Optimization 10-725/36-725 See supplements for reviews of basic multivariate calculus basic linear algebra Last time: conve sets and functions

More information

Machine Learning: The Perceptron. Lecture 06

Machine Learning: The Perceptron. Lecture 06 Machine Learning: he Perceptron Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu 1 McCulloch-Pitts Neuron Function 0 1 w 0 activation / output function 1 w 1 w w

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

Functions of Several Variables

Functions of Several Variables Functions of Several Variables The Unconstrained Minimization Problem where In n dimensions the unconstrained problem is stated as f() x variables. minimize f()x x, is a scalar objective function of vector

More information

Machine Learning. Lecture 6: Support Vector Machine. Feng Li.

Machine Learning. Lecture 6: Support Vector Machine. Feng Li. Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)

More information

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming

CSC Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming CSC2411 - Linear Programming and Combinatorial Optimization Lecture 10: Semidefinite Programming Notes taken by Mike Jamieson March 28, 2005 Summary: In this lecture, we introduce semidefinite programming

More information

Lecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan

Lecture 3: Latent Variables Models and Learning with the EM Algorithm. Sam Roweis. Tuesday July25, 2006 Machine Learning Summer School, Taiwan Lecture 3: Latent Variables Models and Learning with the EM Algorithm Sam Roweis Tuesday July25, 2006 Machine Learning Summer School, Taiwan Latent Variable Models What to do when a variable z is always

More information

Introduction to Machine Learning Spring 2018 Note Duality. 1.1 Primal and Dual Problem

Introduction to Machine Learning Spring 2018 Note Duality. 1.1 Primal and Dual Problem CS 189 Introduction to Machine Learning Spring 2018 Note 22 1 Duality As we have seen in our discussion of kernels, ridge regression can be viewed in two ways: (1) an optimization problem over the weights

More information

L20: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs

L20: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs L0: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU

More information

EC5555 Economics Masters Refresher Course in Mathematics September 2013

EC5555 Economics Masters Refresher Course in Mathematics September 2013 EC5555 Economics Masters Reresher Course in Mathematics September 3 Lecture 5 Unconstraine Optimization an Quaratic Forms Francesco Feri We consier the unconstraine optimization or the case o unctions

More information

LU Factorization. A m x n matrix A admits an LU factorization if it can be written in the form of A = LU

LU Factorization. A m x n matrix A admits an LU factorization if it can be written in the form of A = LU LU Factorization A m n matri A admits an LU factorization if it can be written in the form of Where, A = LU L : is a m m lower triangular matri with s on the diagonal. The matri L is invertible and is

More information

Lecture 03 Positive Semidefinite (PSD) and Positive Definite (PD) Matrices and their Properties

Lecture 03 Positive Semidefinite (PSD) and Positive Definite (PD) Matrices and their Properties Applied Optimization for Wireless, Machine Learning, Big Data Prof. Aditya K. Jagannatham Department of Electrical Engineering Indian Institute of Technology, Kanpur Lecture 03 Positive Semidefinite (PSD)

More information

Lecture 14: Optimality Conditions for Conic Problems

Lecture 14: Optimality Conditions for Conic Problems EE 227A: Conve Optimization and Applications March 6, 2012 Lecture 14: Optimality Conditions for Conic Problems Lecturer: Laurent El Ghaoui Reading assignment: 5.5 of BV. 14.1 Optimality for Conic Problems

More information

Linear Equations in Linear Algebra

Linear Equations in Linear Algebra 1 Linear Equations in Linear Algebra 1.4 THE MATRIX EQUATION A = b MATRIX EQUATION A = b m n Definition: If A is an matri, with columns a 1, n, a n, and if is in, then the product of A and, denoted by

More information

Lecture 4: September 12

Lecture 4: September 12 10-725/36-725: Conve Optimization Fall 2016 Lecture 4: September 12 Lecturer: Ryan Tibshirani Scribes: Jay Hennig, Yifeng Tao, Sriram Vasudevan Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

1 Overview. 2 A Characterization of Convex Functions. 2.1 First-order Taylor approximation. AM 221: Advanced Optimization Spring 2016

1 Overview. 2 A Characterization of Convex Functions. 2.1 First-order Taylor approximation. AM 221: Advanced Optimization Spring 2016 AM 221: Advanced Optimization Spring 2016 Prof. Yaron Singer Lecture 8 February 22nd 1 Overview In the previous lecture we saw characterizations of optimality in linear optimization, and we reviewed the

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

More information

Lecture 12 : Graph Laplacians and Cheeger s Inequality

Lecture 12 : Graph Laplacians and Cheeger s Inequality CPS290: Algorithmic Foundations of Data Science March 7, 2017 Lecture 12 : Graph Laplacians and Cheeger s Inequality Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Graph Laplacian Maybe the most beautiful

More information

Advanced Engineering Mathematics Prof. Pratima Panigrahi Department of Mathematics Indian Institute of Technology, Kharagpur

Advanced Engineering Mathematics Prof. Pratima Panigrahi Department of Mathematics Indian Institute of Technology, Kharagpur Advanced Engineering Mathematics Prof. Pratima Panigrahi Department of Mathematics Indian Institute of Technology, Kharagpur Lecture No. #07 Jordan Canonical Form Cayley Hamilton Theorem (Refer Slide Time:

More information

MATH 5720: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 2018

MATH 5720: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 2018 MATH 57: Unconstrained Optimization Hung Phan, UMass Lowell September 13, 18 1 Global and Local Optima Let a function f : S R be defined on a set S R n Definition 1 (minimizers and maximizers) (i) x S

More information

1 Convexity, Convex Relaxations, and Global Optimization

1 Convexity, Convex Relaxations, and Global Optimization 1 Conveity, Conve Relaations, and Global Optimization Algorithms 1 1.1 Minima Consider the nonlinear optimization problem in a Euclidean space: where the feasible region. min f(), R n Definition (global

More information

Introduction to gradient descent

Introduction to gradient descent 6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

Computational Optimization. Constrained Optimization Part 2

Computational Optimization. Constrained Optimization Part 2 Computational Optimization Constrained Optimization Part Optimality Conditions Unconstrained Case X* is global min Conve f X* is local min SOSC f ( *) = SONC Easiest Problem Linear equality constraints

More information

Geometric Modeling Summer Semester 2010 Mathematical Tools (1)

Geometric Modeling Summer Semester 2010 Mathematical Tools (1) Geometric Modeling Summer Semester 2010 Mathematical Tools (1) Recap: Linear Algebra Today... Topics: Mathematical Background Linear algebra Analysis & differential geometry Numerical techniques Geometric

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course otes for EE7C (Spring 018): Conve Optimization and Approimation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Ma Simchowitz Email: msimchow+ee7c@berkeley.edu October

More information

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Support vector machines (SVMs) are one of the central concepts in all of machine learning. They are simply a combination of two ideas: linear classification via maximum (or optimal

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

Next topics: Solving systems of linear equations

Next topics: Solving systems of linear equations Next topics: Solving systems of linear equations 1 Gaussian elimination (today) 2 Gaussian elimination with partial pivoting (Week 9) 3 The method of LU-decomposition (Week 10) 4 Iterative techniques:

More information

Math 3191 Applied Linear Algebra

Math 3191 Applied Linear Algebra Math 9 Applied Linear Algebra Lecture 9: Diagonalization Stephen Billups University of Colorado at Denver Math 9Applied Linear Algebra p./9 Section. Diagonalization The goal here is to develop a useful

More information

Math (P)refresher Lecture 8: Unconstrained Optimization

Math (P)refresher Lecture 8: Unconstrained Optimization Math (P)refresher Lecture 8: Unconstrained Optimization September 2006 Today s Topics : Quadratic Forms Definiteness of Quadratic Forms Maxima and Minima in R n First Order Conditions Second Order Conditions

More information

SVMs, Duality and the Kernel Trick

SVMs, Duality and the Kernel Trick SVMs, Duality and the Kernel Trick Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 26 th, 2007 2005-2007 Carlos Guestrin 1 SVMs reminder 2005-2007 Carlos Guestrin 2 Today

More information

Kernel Methods. Barnabás Póczos

Kernel Methods. Barnabás Póczos Kernel Methods Barnabás Póczos Outline Quick Introduction Feature space Perceptron in the feature space Kernels Mercer s theorem Finite domain Arbitrary domain Kernel families Constructing new kernels

More information

Lecture 7: Positive Semidefinite Matrices

Lecture 7: Positive Semidefinite Matrices Lecture 7: Positive Semidefinite Matrices Rajat Mittal IIT Kanpur The main aim of this lecture note is to prepare your background for semidefinite programming. We have already seen some linear algebra.

More information

10-725/36-725: Convex Optimization Spring Lecture 21: April 6

10-725/36-725: Convex Optimization Spring Lecture 21: April 6 10-725/36-725: Conve Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 21: April 6 Scribes: Chiqun Zhang, Hanqi Cheng, Waleed Ammar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION

LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION 15-382 COLLECTIVE INTELLIGENCE - S19 LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION TEACHER: GIANNI A. DI CARO WHAT IF WE HAVE ONE SINGLE AGENT PSO leverages the presence of a swarm: the outcome

More information

Statistical Geometry Processing Winter Semester 2011/2012

Statistical Geometry Processing Winter Semester 2011/2012 Statistical Geometry Processing Winter Semester 2011/2012 Linear Algebra, Function Spaces & Inverse Problems Vector and Function Spaces 3 Vectors vectors are arrows in space classically: 2 or 3 dim. Euclidian

More information

Review of matrices. Let m, n IN. A rectangle of numbers written like A =

Review of matrices. Let m, n IN. A rectangle of numbers written like A = Review of matrices Let m, n IN. A rectangle of numbers written like a 11 a 12... a 1n a 21 a 22... a 2n A =...... a m1 a m2... a mn where each a ij IR is called a matrix with m rows and n columns or an

More information

B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

More information

Copositive Plus Matrices

Copositive Plus Matrices Copositive Plus Matrices Willemieke van Vliet Master Thesis in Applied Mathematics October 2011 Copositive Plus Matrices Summary In this report we discuss the set of copositive plus matrices and their

More information

STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY

STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY UNIVERSITY OF MARYLAND: ECON 600 1. Some Eamples 1 A general problem that arises countless times in economics takes the form: (Verbally):

More information

On factor width and symmetric H -matrices

On factor width and symmetric H -matrices Linear Algebra and its Applications 405 (2005) 239 248 www.elsevier.com/locate/laa On factor width and symmetric H -matrices Erik G. Boman a,,1, Doron Chen b, Ojas Parekh c, Sivan Toledo b,2 a Department

More information

Linear algebra and differential equations (Math 54): Lecture 13

Linear algebra and differential equations (Math 54): Lecture 13 Linear algebra and differential equations (Math 54): Lecture 13 Vivek Shende March 3, 016 Hello and welcome to class! Last time We discussed change of basis. This time We will introduce the notions of

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

CS 6375 Machine Learning

CS 6375 Machine Learning CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.

More information

Lecture 18 Classical Iterative Methods

Lecture 18 Classical Iterative Methods Lecture 18 Classical Iterative Methods MIT 18.335J / 6.337J Introduction to Numerical Methods Per-Olof Persson November 14, 2006 1 Iterative Methods for Linear Systems Direct methods for solving Ax = b,

More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

MACHINE LEARNING ADVANCED MACHINE LEARNING

MACHINE LEARNING ADVANCED MACHINE LEARNING MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 22 MACHINE LEARNING Discrete Probabilities Consider two variables and y taking discrete

More information

MTH50 Spring 07 HW Assignment 7 {From [FIS0]}: Sec 44 #4a h 6; Sec 5 #ad ac 4ae 4 7 The due date for this assignment is 04/05/7 Sec 44 #4a h Evaluate the erminant of the following matrices by any legitimate

More information

Solving Linear Systems

Solving Linear Systems Solving Linear Systems Iterative Solutions Methods Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) Linear Systems Fall 2015 1 / 12 Introduction We continue looking how to solve linear systems of

More information

arxiv: v1 [math.na] 5 May 2011

arxiv: v1 [math.na] 5 May 2011 ITERATIVE METHODS FOR COMPUTING EIGENVALUES AND EIGENVECTORS MAYSUM PANJU arxiv:1105.1185v1 [math.na] 5 May 2011 Abstract. We examine some numerical iterative methods for computing the eigenvalues and

More information

LECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form

LECTURE # - NEURAL COMPUTATION, Feb 04, Linear Regression. x 1 θ 1 output... θ M x M. Assumes a functional form LECTURE # - EURAL COPUTATIO, Feb 4, 4 Linear Regression Assumes a functional form f (, θ) = θ θ θ K θ (Eq) where = (,, ) are the attributes and θ = (θ, θ, θ ) are the function parameters Eample: f (, θ)

More information

Beyond Newton s method Thomas P. Minka

Beyond Newton s method Thomas P. Minka Beyond Newton s method Thomas P. Minka 2000 (revised 7/21/2017) Abstract Newton s method for optimization is equivalent to iteratively maimizing a local quadratic approimation to the objective function.

More information

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research Introduction to Machine Learning Lecture 7 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Convex Optimization Differentiation Definition: let f : X R N R be a differentiable function,

More information

Deep Learning. Authors: I. Goodfellow, Y. Bengio, A. Courville. Chapter 4: Numerical Computation. Lecture slides edited by C. Yim. C.

Deep Learning. Authors: I. Goodfellow, Y. Bengio, A. Courville. Chapter 4: Numerical Computation. Lecture slides edited by C. Yim. C. Chapter 4: Numerical Computation Deep Learning Authors: I. Goodfellow, Y. Bengio, A. Courville Lecture slides edited by 1 Chapter 4: Numerical Computation 4.1 Overflow and Underflow 4.2 Poor Conditioning

More information

A = Chapter 6. Linear Programming: The Simplex Method. + 21x 3 x x 2. C = 16x 1. + x x x 1. + x 3. 16,x 2.

A = Chapter 6. Linear Programming: The Simplex Method. + 21x 3 x x 2. C = 16x 1. + x x x 1. + x 3. 16,x 2. Chapter 6 Linear rogramming: The Simple Method Section The Dual roblem: Minimization with roblem Constraints of the Form Learning Objectives for Section 6. Dual roblem: Minimization with roblem Constraints

More information

Lecture 4: January 26

Lecture 4: January 26 10-725/36-725: Conve Optimization Spring 2015 Lecturer: Javier Pena Lecture 4: January 26 Scribes: Vipul Singh, Shinjini Kundu, Chia-Yin Tsai Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization

Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization Duality Uses and Correspondences Ryan Tibshirani Conve Optimization 10-725 Recall that for the problem Last time: KKT conditions subject to f() h i () 0, i = 1,... m l j () = 0, j = 1,... r the KKT conditions

More information

Background Mathematics (2/2) 1. David Barber

Background Mathematics (2/2) 1. David Barber Background Mathematics (2/2) 1 David Barber University College London Modified by Samson Cheung (sccheung@ieee.org) 1 These slides accompany the book Bayesian Reasoning and Machine Learning. The book and

More information

Spectral Theorem for Self-adjoint Linear Operators

Spectral Theorem for Self-adjoint Linear Operators Notes for the undergraduate lecture by David Adams. (These are the notes I would write if I was teaching a course on this topic. I have included more material than I will cover in the 45 minute lecture;

More information

, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are

, b = 0. (2) 1 2 The eigenvectors of A corresponding to the eigenvalues λ 1 = 1, λ 2 = 3 are Quadratic forms We consider the quadratic function f : R 2 R defined by f(x) = 2 xt Ax b T x with x = (x, x 2 ) T, () where A R 2 2 is symmetric and b R 2. We will see that, depending on the eigenvalues

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

3.2 Iterative Solution Methods for Solving Linear

3.2 Iterative Solution Methods for Solving Linear 22 CHAPTER 3. NUMERICAL LINEAR ALGEBRA 3.2 Iterative Solution Methods for Solving Linear Systems 3.2.1 Introduction We continue looking how to solve linear systems of the form Ax = b where A = (a ij is

More information

Performance Surfaces and Optimum Points

Performance Surfaces and Optimum Points CSC 302 1.5 Neural Networks Performance Surfaces and Optimum Points 1 Entrance Performance learning is another important class of learning law. Network parameters are adjusted to optimize the performance

More information

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Linear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging

More information

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation

More information

Lecture 16: October 22

Lecture 16: October 22 0-725/36-725: Conve Optimization Fall 208 Lecturer: Ryan Tibshirani Lecture 6: October 22 Scribes: Nic Dalmasso, Alan Mishler, Benja LeRoy Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

MACHINE LEARNING ADVANCED MACHINE LEARNING

MACHINE LEARNING ADVANCED MACHINE LEARNING MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 2 2 MACHINE LEARNING Overview Definition pdf Definition joint, condition, marginal,

More information

Gradient Descent. Mohammad Emtiyaz Khan EPFL Sep 22, 2015

Gradient Descent. Mohammad Emtiyaz Khan EPFL Sep 22, 2015 Gradient Descent Mohammad Emtiyaz Khan EPFL Sep 22, 201 Mohammad Emtiyaz Khan 2014 1 Learning/estimation/fitting Given a cost function L(β), we wish to find β that minimizes the cost: min β L(β), subject

More information

Homework 4. Convex Optimization /36-725

Homework 4. Convex Optimization /36-725 Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Lecture Notes on Support Vector Machine

Lecture Notes on Support Vector Machine Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is

More information

Lecture 9: SVD, Low Rank Approximation

Lecture 9: SVD, Low Rank Approximation CSE 521: Design and Analysis of Algorithms I Spring 2016 Lecture 9: SVD, Low Rank Approimation Lecturer: Shayan Oveis Gharan April 25th Scribe: Koosha Khalvati Disclaimer: hese notes have not been subjected

More information

Lecture: Local Spectral Methods (1 of 4)

Lecture: Local Spectral Methods (1 of 4) Stat260/CS294: Spectral Graph Methods Lecture 18-03/31/2015 Lecture: Local Spectral Methods (1 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

Recall the convention that, for us, all vectors are column vectors.

Recall the convention that, for us, all vectors are column vectors. Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists

More information