Matrix Support Functional and its Applications


1 Matrix Support Functional and its Applications
James V. Burke, Mathematics, University of Washington
Joint work with Yuan Gao (UW) and Tim Hoheisel (McGill)
CORS 2016, Banff, June 1, 2016

2-3 Connections
What do the following topics have in common?
- Quadratic Optimization Problems with Equality Constraints
- The Matrix Fractional Function and its Generalization
- Ky Fan (p,k) Norms
- K-means Clustering
- Best Affine Unbiased Estimator
- Supervised Representation Learning
- Multi-task Learning
- Variational Gram Functions

Answer: They can all be represented using a matrix support function that is smooth on the interior of its domain.

4-6 A Matrix Support Functional (B-Hoheisel (2015))
Given A ∈ R^{p×n} and B ∈ R^{p×m}, set
  D(A, B) := { (Y, -(1/2) Y Y^T) ∈ R^{n×m} × S^n : Y ∈ R^{n×m}, A Y = B }.
We consider the support functional for D(A, B):
  σ((X, V) | D(A, B)) = sup_{AY=B} ⟨(X, V), (Y, -(1/2) Y Y^T)⟩
                      = -inf_{AY=B} { (1/2) tr(Y^T V Y) - ⟨X, Y⟩ }.

7-10 Support Functions
  σ_S(x) := σ(x | S) := sup_{y∈S} ⟨x, y⟩
  cl conv S = ∩_x { y : ⟨x, y⟩ ≤ σ(x | S) }
  σ_S = σ_{cl S} = σ_{conv S} = σ_{cl conv S}
When S is a closed convex set, then ∂σ_S(x) = argmax_{y∈S} ⟨x, y⟩.
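
A tiny numerical illustration (a NumPy sketch of my own, with a randomly generated finite set S ⊂ R^2 and a fixed direction x; none of these names come from the slides): the support function of S is the maximum inner product over S, the maximizers form the subdifferential for the closed convex hull, and convex combinations of points of S never exceed σ_S(x).

import numpy as np

rng = np.random.default_rng(10)
S = rng.standard_normal((20, 2))        # a finite set of points in R^2 (rows)
x = np.array([1.0, -0.5])

vals = S @ x
sigma = vals.max()                       # sigma_S(x) = max_{y in S} <x, y>
argmax = S[np.isclose(vals, sigma)]      # maximizers: the subdifferential of sigma_S at x

# sigma_S = sigma_{cl conv S}: any convex combination of points of S gives a value <= sigma_S(x)
lam = rng.random(20); lam /= lam.sum()
print(sigma, argmax, (lam @ S) @ x <= sigma + 1e-12)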

11 Epigraph
  epi f := { (x, µ) : f(x) ≤ µ },    f*(y) = σ((y, -1) | epi f).

12-14 A Representation for σ((X, V) | D(A, B))
Let A ∈ R^{p×n} and B ∈ R^{p×m} be such that rge B ⊆ rge A. Then
  σ((X, V) | D(A, B)) = (1/2) tr( [X; B]^T M(V)^† [X; B] )   if rge [X; B] ⊆ rge M(V) and u^T V u ≥ 0 for all u ∈ ker A,
                      = +∞                                    otherwise,
where
  M(V) := [ V  A^T ;  A  0 ]
and [X; B] denotes X stacked on top of B.
In particular,
  dom σ(· | D(A, B)) = dom ∂σ(· | D(A, B))
                     = { (X, V) ∈ R^{n×m} × S^n : rge [X; B] ⊆ rge M(V), u^T V u ≥ 0 for all u ∈ ker A },
with
  int(dom σ(· | D(A, B))) = { (X, V) ∈ R^{n×m} × S^n : u^T V u > 0 for all u ∈ ker A \ {0} }.
The inverse M(V)^{-1} exists when u^T V u > 0 for all u ∈ ker A \ {0} and A is surjective.
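
The closed form can be checked numerically. Below is a minimal NumPy sketch (illustrative only; A, B, V, X are randomly generated so that rge B ⊆ rge A and V ≻ 0, hence V is positive definite on ker A and M(V) is invertible). It compares (1/2) tr([X; B]^T M(V)^† [X; B]) with a direct evaluation of the supremum obtained by parameterizing the affine constraint set.

import numpy as np

rng = np.random.default_rng(0)
p, n, m = 2, 5, 3
A = rng.standard_normal((p, n))
B = A @ rng.standard_normal((n, m))          # guarantees rge B ⊆ rge A
V = rng.standard_normal((n, n)); V = V @ V.T + np.eye(n)   # V ≻ 0, so V ≻ 0 on ker A
X = rng.standard_normal((n, m))

# closed form: 0.5 * tr([X; B]^T M(V)^† [X; B])
M = np.block([[V, A.T], [A, np.zeros((p, p))]])
XB = np.vstack([X, B])
closed = 0.5 * np.trace(XB.T @ np.linalg.pinv(M) @ XB)

# direct evaluation of sup_{AY=B} <X, Y> - 0.5 tr(Y^T V Y):
# parameterize Y = Y0 + N Z with A Y0 = B and columns of N spanning ker A,
# then maximize the (strictly concave) quadratic in Z
Y0 = np.linalg.lstsq(A, B, rcond=None)[0]
_, s, Vt = np.linalg.svd(A)
N = Vt[p:].T                                  # orthonormal basis of ker A (A has full row rank here)
H = N.T @ V @ N
Z = np.linalg.solve(H, N.T @ (X - V @ Y0))    # stationarity in Z
Y = Y0 + N @ Z
direct = np.sum(X * Y) - 0.5 * np.trace(Y.T @ V @ Y)

print(closed, direct)                         # the two values agree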

15-19 Relationship to Equality Constrained QPs
Consider an equality constrained QP:
  ν(x, V) := inf_{u∈R^n} { (1/2) u^T V u - x^T u : A u = b }.
The Lagrangian is L(u, λ) = (1/2) u^T V u - x^T u + λ^T (A u - b), so the optimality conditions are
  V u + A^T λ - x = 0,    A u = b,
which is equivalent to the linear system
  M(V) [u; λ] = [x; b],    M(V) = [ V  A^T ;  A  0 ].
Hence
  ν(x, V) = -σ((x, V) | D(A, b)).
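
A short NumPy sanity check (illustrative; random A, b, V, x with V ≻ 0 and A of full row rank, so M(V) is invertible): solve the KKT system and verify that the optimal value of the QP equals -σ((x, V) | D(A, b)) = -(1/2) [x; b]^T M(V)^{-1} [x; b].

import numpy as np

rng = np.random.default_rng(1)
p, n = 2, 6
A = rng.standard_normal((p, n))               # full row rank (generic), so A is surjective
b = A @ rng.standard_normal(n)
V = rng.standard_normal((n, n)); V = V @ V.T + np.eye(n)
x = rng.standard_normal(n)

# KKT system M(V) [u; lam] = [x; b]
M = np.block([[V, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(M, np.concatenate([x, b]))
u, lam = sol[:n], sol[n:]

nu = 0.5 * u @ V @ u - x @ u                  # optimal value of the QP
sigma = 0.5 * np.concatenate([x, b]) @ np.linalg.solve(M, np.concatenate([x, b]))
print(np.allclose(nu, -sigma), np.allclose(A @ u, b))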

20-21 Maximum Likelihood Estimation
  L(µ, Σ; Y) := (2π)^{-mN/2} (det Σ)^{-N/2} ∏_{i=1}^N exp( -(1/2) (y_i - µ)^T Σ^{-1} (y_i - µ) )
Up to a constant, the negative log-likelihood is
  -ln L(µ, Σ; Y) = (N/2) ln det Σ + (1/2) tr( (Y - M)^T Σ^{-1} (Y - M) )
                 = σ(((Y - M), Σ) | D(0, 0)) + (N/2) ln det Σ,
where Y = [y_1, ..., y_N] and M = [µ, ..., µ].

The Matrix Fractional Function: Take A = 0 and B = 0, and set
  γ(X, V) := σ((X, V) | D(0, 0)) = (1/2) tr(X^T V^† X)   if rge X ⊆ rge V and V ⪰ 0,
                                 = +∞                     otherwise.
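
A small NumPy check of the trace identity behind this rewriting (illustrative; the data are sampled from a Gaussian with a randomly generated positive definite Σ): the sum of the per-sample quadratic forms equals tr((Y - M)^T Σ^{-1} (Y - M)), i.e., twice the matrix fractional term.

import numpy as np

rng = np.random.default_rng(9)
m, N = 3, 50
mu = rng.standard_normal(m)
S = rng.standard_normal((m, m)); Sigma = S @ S.T + np.eye(m)   # a positive definite covariance
Y = rng.multivariate_normal(mu, Sigma, size=N).T               # columns y_1, ..., y_N
M = np.tile(mu[:, None], (1, N))

# per-sample quadratic term vs. its trace form (the argument of the matrix fractional function)
Si = np.linalg.inv(Sigma)
per_sample = sum((Y[:, i] - mu) @ Si @ (Y[:, i] - mu) for i in range(N))
gamma_val = 0.5 * np.trace((Y - M).T @ np.linalg.pinv(Sigma) @ (Y - M))  # = sigma(((Y-M), Sigma) | D(0,0))
print(np.isclose(0.5 * per_sample, gamma_val))

# negative log-likelihood up to the constant (mN/2) log(2*pi)
nll = 0.5 * N * np.log(np.linalg.det(Sigma)) + gamma_val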

22-25 conv D(A, B)
Recall ∂σ_C(x) = { y ∈ cl conv C : ⟨x, y⟩ = σ_C(x) }.
For ∂σ((X, V) | D(A, B)) we therefore need cl conv(D(A, B)).
Set
  S^n_+(ker A) := { W ∈ S^n : u^T W u ≥ 0 for all u ∈ ker A }.
Then S^n_+(ker A) is a closed convex cone whose polar is given by
  S^n_+(ker A)° = { W ∈ S^n : W = P W P ⪯ 0 },
where P is the orthogonal projection onto ker A.
For D(A, B) := { (Y, -(1/2) Y Y^T) ∈ R^{n×m} × S^n : A Y = B },
  cl conv D(A, B) = Ω(A, B) := { (Y, W) ∈ R^{n×m} × S^n : A Y = B and (1/2) Y Y^T + W ∈ S^n_+(ker A)° }.
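
The description of cl conv D(A, B) can be probed numerically. A minimal NumPy sketch (illustrative; random A, B with rge B ⊆ rge A, and a random convex combination of points of D(A, B)): the resulting pair (Y, W) satisfies A Y = B, and (1/2) Y Y^T + W lies in the polar cone, i.e., it is supported on ker A and negative semidefinite.

import numpy as np

rng = np.random.default_rng(2)
p, n, m, k = 2, 5, 3, 4
A = rng.standard_normal((p, n))
B = A @ rng.standard_normal((n, m))

# random points of D(A, B): pairs (Y_i, -0.5 Y_i Y_i^T) with A Y_i = B
Y0 = np.linalg.lstsq(A, B, rcond=None)[0]
_, _, Vt = np.linalg.svd(A)
N = Vt[p:].T                                   # orthonormal basis of ker A
Ys = [Y0 + N @ rng.standard_normal((n - p, m)) for _ in range(k)]
lam = rng.random(k); lam /= lam.sum()

Y = sum(l * Yi for l, Yi in zip(lam, Ys))
W = sum(-0.5 * l * (Yi @ Yi.T) for l, Yi in zip(lam, Ys))

# membership test for Omega(A, B): A Y = B and S = 0.5 Y Y^T + W in the polar cone,
# i.e. S = P S P and S negative semidefinite, with P the projection onto ker A
S = 0.5 * Y @ Y.T + W
P = N @ N.T
print(np.allclose(A @ Y, B),
      np.allclose(S, P @ S @ P),
      np.all(np.linalg.eigvalsh(S) <= 1e-10))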

26 Applications
- Quadratic Optimization Problems with Equality Constraints
- The Matrix Fractional Function and its Generalization
- Ky Fan (p,k) Norms
- K-means Clustering
- Best Affine Unbiased Estimator
- Supervised Representation Learning
- Multi-task Learning
- Variational Gram Functions

27-28 Motivating Examples
Recall the matrix fractional function
  γ(X, V) := σ((X, V) | D(0, 0)) = (1/2) tr(X^T V^† X)   if rge X ⊆ rge V and V ⪰ 0,
                                 = +∞                     otherwise.
We have the following two representations of the nuclear norm:
  ‖X‖_* = min_V γ(X, V) + (1/2) tr V,
  (1/2) ‖X‖_*² = min_V γ(X, V) + δ(V | tr V ≤ 1).
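
Both representations can be verified numerically; here is a sketch of the first one (NumPy, illustrative, with a randomly generated X; the minimizing V = (X X^T)^{1/2} used below is the standard choice for this identity).

import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 4
X = rng.standard_normal((n, m))

def gamma(X, V):
    # matrix fractional function: 0.5 tr(X^T V^† X)  (assumes rge X ⊆ rge V and V ⪰ 0)
    return 0.5 * np.trace(X.T @ np.linalg.pinv(V) @ X)

# first representation: the optimal V is (X X^T)^{1/2}
w, U = np.linalg.eigh(X @ X.T)
Vstar = (U * np.sqrt(np.clip(w, 0, None))) @ U.T
nuc = np.linalg.norm(X, 'nuc')
print(np.isclose(gamma(X, Vstar) + 0.5 * np.trace(Vstar), nuc))

# any other feasible V can only give a larger value
V = rng.standard_normal((n, n)); V = V @ V.T + np.eye(n)
print(gamma(X, V) + 0.5 * np.trace(V) >= nuc - 1e-8)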

29-30 Infimal Projections
For a closed proper convex function h, define the infimal projection
  ϕ(X) := inf_V σ((X, V) | D(A, B)) + h(V).
Theorem. If dom h ∩ S^n_{++}(ker A) ≠ ∅ (where S^n_{++}(ker A) := { V ∈ S^n : u^T V u > 0 for all u ∈ ker A \ {0} }), then
  ϕ*(Y) = inf { h*(-W) : (Y, W) ∈ cl conv D(A, B) }.

31-32 Infimal Projections with Indicators
When h is the indicator of a closed convex set V ⊆ S^n, write
  ϕ_V(X) := inf_{V∈V} σ((X, V) | D(A, B)).
Then
  ϕ*_V(Y) = (1/2) σ( Y Y^T | { V ∈ V : u^T V u ≥ 0 for all u ∈ ker A } ) + δ(Y | A Y = B)
           = (1/2) σ( Y Y^T | V ∩ S^n_+(ker A) ) + δ(Y | A Y = B).
Note that when B = 0, both ϕ_V and ϕ*_V are positively homogeneous of degree 2. When A = 0 and B = 0, ϕ*_V is called a variational Gram function in Jalali-Xiao-Fazel (2016).
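
A quick consistency check in the unconstrained case (NumPy, illustrative; A = 0, B = 0 and V = { V ⪰ 0 : tr V ≤ 1 }, with Y randomly generated): by the nuclear-norm representation above, ϕ_V = (1/2)‖·‖_*², so its conjugate should be (1/2)‖·‖²_op, and the formula ϕ*_V(Y) = (1/2) σ(Y Y^T | V) indeed evaluates to (1/2) λ_max(Y Y^T) = (1/2) ‖Y‖²_op.

import numpy as np

rng = np.random.default_rng(8)
n, m = 5, 3
Y = rng.standard_normal((n, m))

# phi_V^*(Y) = 0.5 * sigma(Y Y^T | {V ⪰ 0, tr V <= 1}) = 0.5 * lambda_max(Y Y^T)
conj_via_support = 0.5 * np.linalg.eigvalsh(Y @ Y.T).max()
half_opnorm_sq = 0.5 * np.linalg.norm(Y, 2) ** 2     # 0.5 * (largest singular value of Y)^2
print(np.isclose(conj_via_support, half_opnorm_sq))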

33-35 Ky Fan (p,k) Norms
For p ≥ 1 and 1 ≤ k ≤ min{m, n}, the Ky Fan (p,k)-norm of a matrix X ∈ R^{n×m} is
  ‖X‖_{p,k} = ( Σ_{i=1}^k σ_i^p )^{1/p},
where σ_1 ≥ σ_2 ≥ ... are the singular values of X sorted in nonincreasing order.
The Ky Fan (p, min{m,n})-norm is the Schatten-p norm; the Ky Fan (1, k)-norm is the standard Ky Fan k-norm.
Corollary.
  (1/2) ‖X‖²_{2p/(p+1), min{m,n}} = inf_{‖V‖_{p, min{m,n}} ≤ 1} γ(X, V).

36 Ky Fan (p,k) Norms
Proof of the Corollary. Recall γ(X, V) = σ((X, V) | D(0, 0)) and take A = 0, B = 0 and V = { V : ‖V‖_{p, min{m,n}} ≤ 1 } in ϕ_V(X) := inf_{V∈V} σ((X, V) | D(A, B)). Then, by the conjugate formula for infimal projections with indicators,
  ϕ*_V(X) = σ( (1/2) X X^T | { V ⪰ 0 : ‖V‖_{p, min{m,n}} ≤ 1 } )
          = (1/2) ‖X X^T‖_{p/(p-1), min{m,n}}
          = (1/2) ‖X‖²_{2p/(p-1), min{m,n}},
and conjugating once more gives ϕ_V(X) = inf_{‖V‖_{p, min{m,n}} ≤ 1} γ(X, V) = (1/2) ‖X‖²_{2p/(p+1), min{m,n}}.

37 Ky Fan (p,k) Norms
As a special case, when p = 1:
Corollary. (1/2) ‖X‖_*² = min_{tr V ≤ 1} γ(X, V).
Lemma. Let V be the set of rank-k orthogonal projection matrices, V = { U U^T : U ∈ R^{n×k}, U^T U = I_k }. Then
  (1/2) ‖X‖²_{2,k} = σ( (1/2) X X^T | V ).
Proof. A consequence of the following fact [Fillmore-Williams 1971]:
  conv { U U^T : U ∈ R^{n×k}, U^T U = I_k } = { V ∈ S^n : 0 ⪯ V ⪯ I, tr V = k }.
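
A small NumPy sketch tying these statements together (illustrative; X is randomly generated, and the function name ky_fan_pk_norm is mine): it computes the (p,k)-norm from the singular values, checks the Schatten and standard Ky Fan special cases, and verifies the Lemma by evaluating the support function at the projection onto the top-k left singular vectors.

import numpy as np

def ky_fan_pk_norm(X, p, k):
    # ||X||_{p,k}: l_p norm of the k largest singular values of X
    s = np.linalg.svd(X, compute_uv=False)      # sorted in nonincreasing order
    return (s[:k] ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 4))
s = np.linalg.svd(X, compute_uv=False)

# special cases from the slides
print(np.isclose(ky_fan_pk_norm(X, 2, 4), np.linalg.norm(X, 'fro')))   # Schatten-2 = Frobenius
print(np.isclose(ky_fan_pk_norm(X, 1, 4), s.sum()))                    # Schatten-1 = nuclear
print(np.isclose(ky_fan_pk_norm(X, 1, 2), s[:2].sum()))                # standard Ky Fan 2-norm

# Lemma: 0.5 ||X||_{2,k}^2 = max over rank-k projections P of 0.5 <X X^T, P>;
# the maximum is attained at the projection onto the top-k left singular vectors
k = 2
U, _, _ = np.linalg.svd(X)
P = U[:, :k] @ U[:, :k].T
print(np.isclose(0.5 * ky_fan_pk_norm(X, 2, k) ** 2,
                 0.5 * np.trace(X @ X.T @ P)))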

38 K-means Clustering [Zha-He-Ding-Gu-Simon 2001]
Consider X ∈ R^{n×m}. The k-means objective is
  K(X) := min_{C,E} (1/2) ‖X - E C‖²_F,
where C ∈ R^{k×m} collects the k centers and E is an n×k assignment matrix each of whose rows is one of e_1^T, ..., e_k^T (the k cluster assignments).
For fixed E, the optimal C is C = (E^T E)^{-1} E^T X. Defining P_E := E (E^T E)^{-1} E^T, which is an orthogonal projection,
  K(X) = min_E (1/2) ‖(I - P_E) X‖²_F = (1/2) min_E tr( (I - P_E) X X^T ) = (1/2) ‖X‖²_F - σ( (1/2) X X^T | P_k ) ≥ (1/2) ( ‖X‖²_F - ‖X‖²_{2,k} ),
where P_k = { P_E : E an assignment matrix }.
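
A numerical check of the algebra above (NumPy, illustrative; the data and the cluster assignment are arbitrary): for a fixed assignment E with cluster-mean centers, (1/2)‖X - EC‖²_F equals (1/2) tr((I - P_E) X X^T), and the k-means value is bounded below by (1/2)(‖X‖²_F - ‖X‖²_{2,k}).

import numpy as np

rng = np.random.default_rng(5)
n, m, k = 12, 6, 3
X = rng.standard_normal((n, m))
labels = np.arange(n) % k                     # an arbitrary assignment with no empty cluster

E = np.zeros((n, k)); E[np.arange(n), labels] = 1.0
C = np.linalg.solve(E.T @ E, E.T @ X)         # optimal centers for this assignment (cluster means)
PE = E @ np.linalg.solve(E.T @ E, E.T)        # orthogonal projection onto rge E

obj = 0.5 * np.linalg.norm(X - E @ C, 'fro') ** 2
trace_form = 0.5 * np.trace((np.eye(n) - PE) @ X @ X.T)
print(np.isclose(obj, trace_form))

# spectral lower bound: K(X) >= 0.5 (||X||_F^2 - ||X||_{2,k}^2)
s = np.linalg.svd(X, compute_uv=False)
lower = 0.5 * (np.linalg.norm(X, 'fro') ** 2 - (s[:k] ** 2).sum())
print(obj >= lower - 1e-9)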

39-40 Best Affine Unbiased Estimator
For a linear regression model y = A^T β + ε with ε ~ N(0, σ² V), and a given matrix B, an affine unbiased estimator of B^T β is an estimator of the form θ̂ = X^T y + c satisfying E θ̂ = B^T β.
Best: Var(θ̂*) ⪯ Var(θ̂) for every affine unbiased estimator θ̂.
If a solution X* to
  v(A, B, V) := min_{X : A X = B} (1/2) tr(X^T V X)
exists and is unique, then θ̂* = (X*)^T y. Moreover,
  v(A, B, V) = -σ((0, V) | D(A, B)),
and the optimal solution X* satisfies
  M(V) [X*; W] = [0; B]
for some multiplier W.
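
A short NumPy sketch (illustrative; A, B, V are randomly generated with V ≻ 0 and A of full row rank): solving M(V)[X; W] = [0; B] recovers weights X that satisfy the unbiasedness condition A X = B and coincide with the familiar generalized least-squares form V^{-1} A^T (A V^{-1} A^T)^{-1} B.

import numpy as np

rng = np.random.default_rng(6)
p, n, q = 3, 8, 2                 # p parameters, n observations, q estimands
A = rng.standard_normal((p, n))   # model: y = A^T beta + eps, eps ~ N(0, sigma^2 V)
B = rng.standard_normal((p, q))   # we estimate B^T beta
V = rng.standard_normal((n, n)); V = V @ V.T + np.eye(n)

# BLUE weights: solve M(V) [X; W] = [0; B]
M = np.block([[V, A.T], [A, np.zeros((p, p))]])
rhs = np.vstack([np.zeros((n, q)), B])
X = np.linalg.solve(M, rhs)[:n]

# sanity checks: unbiasedness (A X = B) and agreement with the
# generalized least-squares form X = V^{-1} A^T (A V^{-1} A^T)^{-1} B
Vi = np.linalg.inv(V)
X_gls = Vi @ A.T @ np.linalg.solve(A @ Vi @ A.T, B)
print(np.allclose(A @ X, B), np.allclose(X, X_gls))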

41 Supervised Representation Learning
Consider a binary classification problem where we are given training data (x_1, y_1), ..., (x_n, y_n) ∈ R^m × {-1, 1} and test data x_{n+1}, ..., x_{n+t} ∈ R^m.
Representation learning aims to learn a feature mapping Φ : R^m → H that maps the data points to a feature space in which the two classes are well separated.
Kernel methods: instead of specifying the map Φ explicitly, kernel methods map the data points into a reproducing kernel Hilbert space H, so that the kernel matrix K ∈ S^{n+t}_+, with K_ij = ⟨Φ(x_i), Φ(x_j)⟩, implicitly determines the mapping Φ.

42 Supervised Representation Learning
Let K ⊆ S^{n+t}_+ be a set of candidate kernels. The best K ∈ K can be selected by maximizing its alignment with the kernel determined by the training labels:
  ϕ*_V(y) = max_{K∈K} (1/2) ⟨K_{1:n,1:n}, y y^T⟩,
where V = K, A = [ 0_{n×n}  0 ;  0  I_{t×t} ], and B = 0 ∈ R^{(n+t)×1}.

43 Multi-task Learning
In multi-task learning, T sets of labelled training data (x_{t1}, y_{t1}), ..., (x_{tn}, y_{tn}) ∈ R^m × R are given, representing T learning tasks.
Assumption: a linear feature map h_i(x) = ⟨u_i, x⟩, i = 1, ..., m, where U = (u_1, ..., u_m) is an m×m orthogonal matrix, and the predictor for each task is f_t(x) := ⟨a_t, h(x)⟩.
The multi-task learning problem is then
  min_{A,U} Σ_{t=1}^T Σ_{i=1}^n L_t( y_{ti}, ⟨a_t, U^T x_{ti}⟩ ) + µ ‖A‖²_{2,1},
where A = (a_1, ..., a_T), ‖A‖²_{2,1} is the square of the sum of the 2-norms of the rows of A, and L_t is the loss function for task t.
Setting W = U A, this nonconvex problem is equivalent to the following convex problem [Argyriou-Evgeniou-Pontil 2006]:
  min_{W,D} Σ_{t=1}^T Σ_{i=1}^n L_t( y_{ti}, ⟨w_t, x_{ti}⟩ ) + 2µ γ(W, D)   s.t.  tr D ≤ 1,
which in turn is equivalent to
  min_W Σ_{t=1}^T Σ_{i=1}^n L_t( y_{ti}, ⟨w_t, x_{ti}⟩ ) + µ ‖W‖_*².
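
The last equivalence rests on the identity (1/2)‖W‖_*² = min { γ(W, D) : tr D ≤ 1 } from the earlier nuclear-norm slide; a small NumPy check (illustrative; W randomly generated, and the minimizing D = (W W^T)^{1/2}/‖W‖_* used below is an assumption of the sketch, not taken from the slides):

import numpy as np

rng = np.random.default_rng(7)
m, T = 6, 4
W = rng.standard_normal((m, T))

def gamma(X, V):
    # matrix fractional function: 0.5 tr(X^T V^† X)  (assumes rge X ⊆ rge V and V ⪰ 0)
    return 0.5 * np.trace(X.T @ np.linalg.pinv(V) @ X)

# claimed minimizer of min { 2*gamma(W, D) : tr D <= 1, D ⪰ 0 }: D = (W W^T)^{1/2} / ||W||_*
w, U = np.linalg.eigh(W @ W.T)
sqrtWWt = (U * np.sqrt(np.clip(w, 0, None))) @ U.T
nuc = np.linalg.norm(W, 'nuc')
D = sqrtWWt / nuc                              # feasible: tr D = 1

print(np.isclose(2 * gamma(W, D), nuc ** 2))   # matches ||W||_*^2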

44 Thank you!

45 References I
J. V. Burke and T. Hoheisel: Matrix Support Functionals for Inverse Problems, Regularization, and Learning. SIAM Journal on Optimization, 25(2), 2015.

