Matrix Support Functional and its Applications
Slide 1: Matrix Support Functional and its Applications

James V. Burke, Mathematics, University of Washington. Joint work with Yuan Gao (UW) and Tim Hoheisel (McGill). CORS, Banff, June 1, 2016.
Slides 2-3: Connections

What do the following topics have in common?

- Quadratic optimization problems with equality constraints
- The matrix fractional function and its generalizations
- Ky Fan (p,k) norms
- K-means clustering
- Best affine unbiased estimation
- Supervised representation learning
- Multi-task learning
- Variational Gram functions

Answer: they can all be represented using a matrix support functional that is smooth on the interior of its domain.
Slides 4-6: A Matrix Support Functional (Burke-Hoheisel (2015))

Given $A \in \mathbb{R}^{p\times n}$ and $B \in \mathbb{R}^{p\times m}$, set
$$\mathcal{D}(A,B) := \Big\{ \big(Y, -\tfrac{1}{2}YY^T\big) \in \mathbb{R}^{n\times m}\times\mathcal{S}^n \;\Big|\; Y \in \mathbb{R}^{n\times m}:\ AY = B \Big\}.$$
We consider the support functional of $\mathcal{D}(A,B)$:
$$\sigma\big((X,V)\mid\mathcal{D}(A,B)\big) = \sup_{AY=B}\ \Big\langle (X,V),\ \big(Y,-\tfrac12 YY^T\big)\Big\rangle = -\inf_{AY=B}\ \Big\{\tfrac12\operatorname{tr}\big(Y^T V Y\big) - \langle X, Y\rangle\Big\}.$$
Slides 7-10: Support Functions

$$\sigma_S(x) := \sigma(x\mid S) := \sup_{y\in S}\,\langle x, y\rangle$$
$$\overline{\operatorname{conv}}\,S = \bigcap_{x}\,\{\, y \mid \langle x, y\rangle \le \sigma(x\mid S)\,\}$$
$$\sigma_S = \sigma_{\operatorname{cl} S} = \sigma_{\operatorname{conv} S} = \sigma_{\overline{\operatorname{conv}}\, S}$$
When $S$ is a closed convex set,
$$\partial\sigma_S(x) = \operatorname*{arg\,max}_{y\in S}\,\langle x, y\rangle.$$
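As a quick illustration of these identities (added here, not on the slides): for a finite point set $S$, the identity $\sigma_S = \sigma_{\operatorname{conv} S}$ means the support function of the polytope $\operatorname{conv} S$ can be evaluated by a maximum over the points themselves. A minimal numpy sketch, with a hypothetical helper `support`:

```python
import numpy as np

def support(S, x):
    """Support function sigma_S(x) = max_{y in S} <x, y> for a finite set S.

    Since sigma_S = sigma_{conv S}, a maximum over the finitely many
    points of S also gives the support function of their convex hull.
    """
    return max(np.dot(x, y) for y in S)

# Three points in the plane; their convex hull is a triangle.
S = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
x = np.array([2.0, 1.0])
print(support(S, x))  # 2.0, attained at the vertex (1, 0)
```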
Slide 11: Epigraph

$$\operatorname{epi} f := \{(x,\mu) \mid f(x) \le \mu\}, \qquad f^*(y) := \sigma\big((y,-1)\mid \operatorname{epi} f\big).$$
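A one-line sanity check of this identity (my addition, not on the slides): for $f(x) = \tfrac12 x^2$,
$$f^*(y) = \sigma\big((y,-1)\mid\operatorname{epi} f\big) = \sup_{\mu \ge x^2/2}\,(yx - \mu) = \sup_x\Big(yx - \tfrac12 x^2\Big) = \tfrac12 y^2,$$
recovering the familiar self-conjugacy of the squared norm.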
Slides 12-14: A Representation for $\sigma((X,V)\mid\mathcal{D}(A,B))$

Let $A\in\mathbb{R}^{p\times n}$ and $B\in\mathbb{R}^{p\times m}$ be such that $\operatorname{rge} B \subset \operatorname{rge} A$. Then
$$\sigma\big((X,V)\mid\mathcal{D}(A,B)\big) = \begin{cases}\ \tfrac12\operatorname{tr}\!\left(\begin{pmatrix}X\\ B\end{pmatrix}^{T} M(V)^{\dagger}\begin{pmatrix}X\\ B\end{pmatrix}\right) & \text{if } \operatorname{rge}\begin{pmatrix}X\\ B\end{pmatrix} \subset \operatorname{rge} M(V)\ \text{and}\ V \succeq_{\ker A} 0,\\[4pt] \ +\infty & \text{else,}\end{cases}$$
where
$$M(V) := \begin{pmatrix} V & A^T \\ A & 0\end{pmatrix}$$
and $V \succeq_{\ker A} 0$ means $u^T V u \ge 0$ for all $u\in\ker A$.

In particular,
$$\operatorname{dom}\sigma(\,\cdot\mid\mathcal{D}(A,B)) = \operatorname{dom}\partial\sigma(\,\cdot\mid\mathcal{D}(A,B)) = \Big\{(X,V)\in\mathbb{R}^{n\times m}\times\mathcal{S}^n \ \Big|\ \operatorname{rge}\begin{pmatrix}X\\ B\end{pmatrix}\subset\operatorname{rge} M(V),\ V\succeq_{\ker A} 0\Big\},$$
with
$$\operatorname{int}\big(\operatorname{dom}\sigma(\,\cdot\mid\mathcal{D}(A,B))\big) = \{(X,V)\in\mathbb{R}^{n\times m}\times\mathcal{S}^n \mid V \succ_{\ker A} 0\}.$$
The inverse $M(V)^{-1}$ exists when $V\succ_{\ker A} 0$ and $A$ is surjective.
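A minimal numerical sketch of this representation (my own, not from the slides; $V$ is chosen positive definite so that $V \succ_{\ker A} 0$ and the range condition hold automatically, making $M(V)$ invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 4, 2, 2
A = rng.standard_normal((p, n))            # surjective with probability 1
B = A @ rng.standard_normal((n, m))        # guarantees rge B within rge A
X = rng.standard_normal((n, m))
C = rng.standard_normal((n, n))
V = C @ C.T + np.eye(n)                    # V positive definite

# Block KKT matrix M(V) = [[V, A^T], [A, 0]].
M = np.block([[V, A.T], [A, np.zeros((p, p))]])
RHS = np.vstack([X, B])

# Closed-form value: (1/2) tr([X; B]^T M(V)^{-1} [X; B]).
val_formula = 0.5 * np.trace(RHS.T @ np.linalg.solve(M, RHS))

# Direct value: sigma = <X, Y*> - (1/2) tr(Y*^T V Y*), where (Y*, Lam)
# solves the KKT system M(V) [Y; Lam] = [X; B].
Z = np.linalg.solve(M, RHS)
Y = Z[:n]
val_direct = np.sum(X * Y) - 0.5 * np.trace(Y.T @ V @ Y)

print(np.isclose(val_formula, val_direct))  # True
```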
Slides 15-19: Relationship to Equality-Constrained QPs

Consider an equality-constrained QP:
$$\nu(x,V) := \inf_{u\in\mathbb{R}^n}\Big\{\tfrac12 u^T V u - x^T u \ \Big|\ Au = b\Big\}.$$
The Lagrangian is $L(u,\lambda) = \tfrac12 u^T V u - x^T u + \lambda^T(Au - b)$, so the optimality conditions are
$$Vu + A^T\lambda - x = 0, \qquad Au = b.$$
Equivalently,
$$M(V)\begin{pmatrix}u\\ \lambda\end{pmatrix} = \begin{pmatrix}V & A^T\\ A & 0\end{pmatrix}\begin{pmatrix}u\\ \lambda\end{pmatrix} = \begin{pmatrix}x\\ b\end{pmatrix}.$$
Hence
$$\nu(x,V) = -\sigma\big((x,V)\mid\mathcal{D}(A,b)\big).$$
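A short sketch of this reduction (my own example data): solve the QP by the block KKT system and verify feasibility and stationarity of the reduced gradient on the feasible manifold $u = u_0 + Nz$, where the columns of $N$ span $\ker A$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 2
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
x = rng.standard_normal(n)
C = rng.standard_normal((n, n))
V = C @ C.T + np.eye(n)   # positive definite, so the QP has a unique solution

# Solve the KKT system M(V) [u; lam] = [x; b].
M = np.block([[V, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(M, np.concatenate([x, b]))
u = sol[:n]

# Check: u is feasible, and the gradient of the objective is orthogonal
# to ker A (the reduced gradient vanishes).
assert np.allclose(A @ u, b)
_, _, Vt = np.linalg.svd(A)
N = Vt[p:].T                              # orthonormal basis of ker A
grad = V @ u - x                          # gradient of the objective at u
assert np.allclose(N.T @ grad, 0, atol=1e-10)

nu = 0.5 * u @ V @ u - x @ u
print("nu(x, V) =", nu)
```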
Slides 20-21: Maximum Likelihood Estimation

$$L(\mu,\Sigma;Y) := (2\pi)^{-mN/2}\,(\det\Sigma)^{-N/2}\prod_{i=1}^{N}\exp\big(-\tfrac12(y_i-\mu)^T\Sigma^{-1}(y_i-\mu)\big)$$
Up to a constant, the negative log-likelihood is
$$-\ln L(\mu,\Sigma;Y) = \tfrac12\ln\det\Sigma + \tfrac12\operatorname{tr}\big((Y-M)^T\Sigma^{-1}(Y-M)\big) = \sigma\big(((Y-M),\Sigma)\mid\mathcal{D}(0,0)\big) - \tfrac12\big(-\ln\det\Sigma\big).$$

The matrix fractional function: take $A = 0$ and $B = 0$, and set
$$\gamma(X,V) := \sigma\big((X,V)\mid\mathcal{D}(0,0)\big) = \begin{cases}\ \tfrac12\operatorname{tr}\big(X^T V^{\dagger} X\big) & \text{if } \operatorname{rge} X \subset \operatorname{rge} V,\ V\succeq 0,\\ \ +\infty & \text{else.}\end{cases}$$
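A sketch implementation of $\gamma$ (my own; `gamma` and its tolerances are illustrative, not from the talk), using the Moore-Penrose pseudoinverse and numerical range checks:

```python
import numpy as np

def gamma(X, V, tol=1e-10):
    """Matrix fractional function gamma(X, V) = (1/2) tr(X^T V^+ X).

    Returns +inf unless V is symmetric positive semidefinite and
    rge X lies within rge V (both checked numerically). V^+ is the
    Moore-Penrose pseudoinverse.
    """
    if not np.allclose(V, V.T, atol=tol) or np.linalg.eigvalsh(V).min() < -tol:
        return np.inf                                  # V not symmetric PSD
    Vp = np.linalg.pinv(V)
    if not np.allclose(V @ (Vp @ X), X, atol=1e-8):    # rge X within rge V?
        return np.inf
    return 0.5 * np.trace(X.T @ Vp @ X)

X = np.array([[1.0], [2.0]])
V = np.diag([2.0, 1.0])
print(gamma(X, V))  # 0.5 * (1/2 + 4) = 2.25
```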
Slides 22-25: $\operatorname{conv}\mathcal{D}(A,B)$

Recall
$$\partial\sigma_C(x) = \{\, y \in \overline{\operatorname{conv}}\,C \mid \langle x, y\rangle = \sigma_C(x)\,\}.$$
So for $\partial\sigma((X,V)\mid\mathcal{D}(A,B))$ we need $\overline{\operatorname{conv}}(\mathcal{D}(A,B))$. Set
$$\mathcal{S}^n_+(\ker A) := \{\, W\in\mathcal{S}^n \mid u^T W u \ge 0\ \ \forall\, u\in\ker A\,\} = \{\, W \mid W \succeq_{\ker A} 0\,\}.$$
Then $\mathcal{S}^n_+(\ker A)$ is a closed convex cone whose polar is given by
$$\mathcal{S}^n_+(\ker A)^{\circ} = \{\, W\in\mathcal{S}^n \mid W = PWP \preceq 0\,\},$$
where $P$ is the orthogonal projection onto $\ker A$. For
$$\mathcal{D}(A,B) := \big\{\big(Y,-\tfrac12 YY^T\big)\in\mathbb{R}^{n\times m}\times\mathcal{S}^n \mid Y\in\mathbb{R}^{n\times m}:\ AY=B\big\},$$
we obtain
$$\overline{\operatorname{conv}}\,\mathcal{D}(A,B) = \Omega(A,B) := \big\{(Y,W)\in\mathbb{R}^{n\times m}\times\mathcal{S}^n \mid AY = B \ \text{and}\ \tfrac12 YY^T + W \in \mathcal{S}^n_+(\ker A)^{\circ}\big\}.$$
Slide 26: Applications

- Quadratic optimization problems with equality constraints
- The matrix fractional function and its generalizations
- Ky Fan (p,k) norms
- K-means clustering
- Best affine unbiased estimation
- Supervised representation learning
- Multi-task learning
- Variational Gram functions
Slides 27-28: Motivating Examples

Recall the matrix fractional function
$$\gamma(X,V) := \sigma\big((X,V)\mid\mathcal{D}(0,0)\big) = \begin{cases}\ \tfrac12\operatorname{tr}\big(X^T V^{\dagger} X\big) & \text{if } \operatorname{rge} X\subset\operatorname{rge} V,\ V\succeq 0,\\ \ +\infty & \text{else.}\end{cases}$$
We have the following two representations of the nuclear norm:
$$\|X\|_* = \min_V\ \gamma(X,V) + \tfrac12\operatorname{tr} V, \qquad \tfrac12\|X\|_*^2 = \min_V\ \gamma(X,V) + \delta\big(V\mid\operatorname{tr}(V)\le 1\big).$$
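A quick numerical check of the first representation (my own sketch): for $X$ of full column rank the minimizer is $V = (XX^T)^{1/2}$, at which $\gamma(X,V) + \tfrac12\operatorname{tr}V$ equals the sum of the singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
nuclear = s.sum()

# Candidate minimizer V = (X X^T)^{1/2}, built from the SVD of X.
V = (U * s) @ U.T
gamma_val = 0.5 * np.trace(X.T @ np.linalg.pinv(V) @ X)
print(np.isclose(gamma_val + 0.5 * np.trace(V), nuclear))  # True
```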
Slides 29-30: Infimal Projections

For a closed proper convex function $h$, define the infimal projection
$$\varphi(X) := \inf_V\ \sigma\big((X,V)\mid\mathcal{D}(A,B)\big) + h(V).$$

Theorem. If $\operatorname{dom} h \cap \mathcal{S}^n_{++}(\ker A) \ne \emptyset$, then
$$\varphi^*(Y) = \inf\big\{\, h^*(-W) \mid (Y,W)\in\overline{\operatorname{conv}}\,\mathcal{D}(A,B)\,\big\}.$$
Slides 31-32: Infimal Projections with Indicators

When $h$ is the indicator of a closed convex set $\mathcal{V}\subset\mathcal{S}^n$, this becomes
$$\varphi_{\mathcal{V}}(X) := \inf_{V\in\mathcal{V}}\ \sigma\big((X,V)\mid\mathcal{D}(A,B)\big),$$
$$\varphi_{\mathcal{V}}^*(Y) = \tfrac12\,\sigma\big(YY^T \mid \{V\in\mathcal{V}\mid V\succeq_{\ker A} 0\}\big) + \delta\big(Y\mid AY=B\big) = \tfrac12\,\sigma\big(YY^T\mid \mathcal{V}\cap\mathcal{S}^n_+(\ker A)\big) + \delta\big(Y\mid AY=B\big).$$
Note that when $B = 0$, both $\varphi_{\mathcal{V}}$ and $\varphi_{\mathcal{V}}^*$ are positively homogeneous of degree 2. When $A = 0$ and $B = 0$, $\varphi_{\mathcal{V}}^*$ is called a variational Gram function in Jalali-Xiao-Fazel (2016?).
Slides 33-34: The Ky Fan (p,k) Norm

For $p\ge 1$ and $1\le k\le\min\{m,n\}$, the Ky Fan $(p,k)$-norm of a matrix $X\in\mathbb{R}^{n\times m}$ is given by
$$\|X\|_{p,k} = \Big(\sum_{i=1}^{k}\sigma_i^p\Big)^{1/p},$$
where $\sigma_1 \ge \sigma_2 \ge \cdots$ are the singular values of $X$ sorted in nonincreasing order.

The Ky Fan $(p,\min\{m,n\})$-norm is the Schatten-$p$ norm; the Ky Fan $(1,k)$-norm is the standard Ky Fan $k$-norm.
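A direct implementation from the definition (my own sketch; the helper name `ky_fan` is illustrative):

```python
import numpy as np

def ky_fan(X, p, k):
    """Ky Fan (p,k)-norm: the l_p norm of the k largest singular values."""
    s = np.linalg.svd(X, compute_uv=False)   # returned in decreasing order
    return np.sum(s[:k] ** p) ** (1.0 / p)

X = np.diag([3.0, 2.0, 1.0])
print(ky_fan(X, 1, 3))   # 6.0 -- Schatten-1 (nuclear) norm
print(ky_fan(X, 2, 3))   # sqrt(14) -- Frobenius (Schatten-2) norm
print(ky_fan(X, 1, 2))   # 5.0 -- standard Ky Fan 2-norm
```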
Slides 35-37: The Ky Fan (p,k) Norm (continued)

Corollary.
$$\tfrac12\|X\|^2_{\frac{2p}{p+1},\,\min\{m,n\}} = \inf_{\|V\|_{p,\min\{m,n\}}\le 1}\ \gamma(X,V).$$
Proof. Apply the infimal-projection result with $A = 0$, $B = 0$, $\gamma(X,V) = \sigma((X,V)\mid\mathcal{D}(0,0))$, and $\mathcal{V} = \{V \mid \|V\|_{p,\min\{m,n\}}\le 1\}$:
$$\Big(\inf_{\|V\|_{p,\min\{m,n\}}\le 1}\gamma(X,V)\Big)^{*} = \sigma\Big(\tfrac12 XX^T\ \Big|\ \{V\succeq 0 \mid \|V\|_{p,\min\{m,n\}}\le 1\}\Big) = \tfrac12\big\|XX^T\big\|_{\frac{p}{p-1},\,\min\{m,n\}} = \tfrac12\|X\|^2_{\frac{2p}{p-1},\,\min\{m,n\}}.$$
Conjugating once more gives the result, since the Schatten $\tfrac{2p}{p-1}$- and $\tfrac{2p}{p+1}$-norms are dual to one another.

As a special case, when $p = 1$:

Corollary.
$$\tfrac12\|X\|_*^2 = \min_{\operatorname{tr} V\le 1}\ \gamma(X,V).$$

Lemma. Let $\mathcal{V}$ be the set of rank-$k$ orthogonal projection matrices, $\mathcal{V} = \{UU^T \mid U\in\mathbb{R}^{n\times k},\ U^TU = I_k\}$. Then
$$\tfrac12\|X\|^2_{2,k} = \sigma\big(\tfrac12 XX^T\mid\mathcal{V}\big).$$
Proof. A consequence of the following fact [Fillmore-Williams 1971]:
$$\operatorname{conv}\{UU^T \mid U\in\mathbb{R}^{n\times k},\ U^TU = I_k\} = \{V\in\mathcal{S}^n \mid I \succeq V \succeq 0,\ \operatorname{tr} V = k\}.$$
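A small numerical check of the lemma (my own sketch): over rank-$k$ orthogonal projections, $\sup_U \tfrac12\operatorname{tr}(U^T XX^T U)$ is half the sum of the $k$ largest eigenvalues of $XX^T$, i.e. $\tfrac12\|X\|_{2,k}^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 4))
k = 2

# Left: sup over rank-k projections, attained at the top-k eigenvectors of XX^T.
w, U = np.linalg.eigh(X @ X.T)             # eigenvalues in ascending order
lhs = 0.5 * w[-k:].sum()

# Right: half the squared Ky Fan (2,k)-norm of X.
s = np.linalg.svd(X, compute_uv=False)
rhs = 0.5 * np.sum(s[:k] ** 2)

print(np.isclose(lhs, rhs))  # True
```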
Slide 38: K-means Clustering [Zha-He-Ding-Gu-Simon 2001]

For data $X\in\mathbb{R}^{n\times m}$, the k-means objective is
$$K(X) := \min_{C,E}\ \tfrac12\|X - EC\|_2^2,$$
where $C\in\mathbb{R}^{k\times m}$ holds the $k$ centers, and $E$ is an $n\times k$ matrix each of whose rows is one of $e_1^T,\dots,e_k^T$, encoding the $k$ cluster assignments. The optimal $C$ is given by $C = (E^TE)^{-1}E^TX$. Define $P_E := E(E^TE)^{-1}E^T$; then $P_E$ is an orthogonal projection, and
$$K(X) = \min_E\ \tfrac12\|(I-P_E)X\|_2^2 = \tfrac12\min_E\ \operatorname{tr}\big((I-P_E)XX^T\big) = \tfrac12\|X\|_2^2 - \sigma\big(\tfrac12 XX^T \mid \mathcal{P}_k\big) \ \ge\ \tfrac12\big(\|X\|_2^2 - \|X\|_{2,k}^2\big),$$
where $\mathcal{P}_k$ is the set of projections $P_E$ arising from assignment matrices; the final inequality follows by relaxing $\mathcal{P}_k$ to all rank-$k$ orthogonal projections and applying the preceding lemma.
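A sketch comparing the spectral lower bound with an actual k-means objective (my own toy data and a few Lloyd iterations, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 60, 2, 3
X = rng.standard_normal((n, m)) + rng.choice([-4.0, 0.0, 4.0], size=(n, 1))

# Spectral lower bound (1/2)(||X||_2^2 - ||X||_{2,k}^2) from the relaxation.
s = np.linalg.svd(X, compute_uv=False)
bound = 0.5 * (np.sum(s**2) - np.sum(s[:k] ** 2))

# K-means objective after a few Lloyd iterations from a random start
# (empty clusters are reseeded at a random data point).
labels = rng.integers(k, size=n)
for _ in range(20):
    C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                  else X[rng.integers(n)] for j in range(k)])
    labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
obj = 0.5 * np.sum((X - C[labels]) ** 2)

print(bound <= obj + 1e-9, round(bound, 3), round(obj, 3))  # bound holds
```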
Slides 39-40: Best Affine Unbiased Estimator

For a linear regression model $y = A^T\beta + \epsilon$ with $\epsilon\sim\mathcal{N}(0,\sigma^2 V)$, and a given matrix $B$, an affine unbiased estimator of $B^T\beta$ is an estimator of the form $\hat\theta = X^Ty + c$ satisfying $\mathbb{E}\hat\theta = B^T\beta$. Best means $\operatorname{Var}(\hat\theta^*) \preceq \operatorname{Var}(\hat\theta)$ for every affine unbiased estimator $\hat\theta$.

If a solution to
$$\nu(A,B,V) := \min_{X:\,AX=B}\ \tfrac12\operatorname{tr}\big(X^TVX\big)$$
exists and is unique, then $\hat\theta^* = (X^*)^Ty$. Moreover,
$$\nu(A,B,V) = -\sigma_{\mathcal{D}(A,B)}(0,V),$$
and the optimal solution $X^*$ satisfies
$$M(V)\begin{pmatrix}X^*\\ W\end{pmatrix} = \begin{pmatrix}0\\ B\end{pmatrix}$$
for some multiplier $W$.
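A numerical sketch (my own; $V$ positive definite and $A$ full row rank so both forms below are valid): solve the block system for $X^*$ and compare with the classical generalized least squares expression.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 6, 3, 2              # X is n x q, A is p x n, B is p x q
A = rng.standard_normal((p, n))
B = rng.standard_normal((p, q))
C = rng.standard_normal((n, n))
V = C @ C.T + np.eye(n)        # positive definite noise covariance

# Solve M(V) [X; W] = [0; B] for the BLUE weights X*.
M = np.block([[V, A.T], [A, np.zeros((p, p))]])
RHS = np.vstack([np.zeros((n, q)), B])
Xstar = np.linalg.solve(M, RHS)[:n]

# Compare with the generalized least squares form
# X* = V^{-1} A^T (A V^{-1} A^T)^{-1} B.
Vinv_At = np.linalg.solve(V, A.T)
X_gls = Vinv_At @ np.linalg.solve(A @ Vinv_At, B)
print(np.allclose(Xstar, X_gls))  # True
```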
Slides 41-42: Supervised Representation Learning

Consider a binary classification problem where we are given training data $(x_1,y_1),\dots,(x_n,y_n)\in\mathbb{R}^m\times\{-1,1\}$ and test data $x_{n+1},\dots,x_{n+t}\in\mathbb{R}^m$. Representation learning aims to learn a feature mapping $\Phi:\mathbb{R}^m\to\mathcal{H}$ that maps the data points to a feature space in which the two classes are well separated.

Kernel methods: instead of specifying the map $\Phi$ explicitly, kernel methods map the data points into a reproducing kernel Hilbert space $\mathcal{H}$, so that the kernel matrix $K\in\mathcal{S}^{n+t}_+$, with $K_{ij} = \langle\Phi(x_i),\Phi(x_j)\rangle$, implicitly determines the mapping $\Phi$.

Let $\mathcal{K}\subset\mathcal{S}^{n+t}_+$ be a set of candidate kernels. The best $K\in\mathcal{K}$ can be selected by maximizing its alignment with the kernel specified by the training labels:
$$\varphi_{\mathcal{V}}^*(y) = \max_{K\in\mathcal{K}}\ \tfrac12\big\langle K_{1:n,1:n},\ yy^T\big\rangle,$$
where $\mathcal{V} = \mathcal{K}\subseteq\mathcal{S}^{n+t}_+$, $A = \begin{pmatrix}0_{n\times n} & 0\\ 0 & I_{t\times t}\end{pmatrix}$, and $B = 0_{(n+t)\times 1}$.
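A toy sketch of this selection rule (entirely my own: the data, the Gaussian candidate kernels, and the widths are made-up assumptions) that picks the candidate maximizing $\tfrac12\langle K_{1:n,1:n}, yy^T\rangle$ from a finite set:

```python
import numpy as np

rng = np.random.default_rng(6)
n, t = 20, 5
Xtr = rng.standard_normal((n, 2))
y = np.sign(Xtr[:, 0])                     # labels from the first coordinate
Xall = np.vstack([Xtr, rng.standard_normal((t, 2))])

def rbf_kernel(Z, width):
    """Gaussian kernel matrix on all n + t points."""
    d2 = ((Z[:, None, :] - Z[None]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width**2))

# Finite candidate set; score each kernel by its alignment with yy^T
# restricted to the training block.
candidates = {w: rbf_kernel(Xall, w) for w in (0.1, 0.5, 1.0, 2.0)}
scores = {w: 0.5 * (y @ K[:n, :n] @ y) for w, K in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```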
Slide 43: Multi-task Learning

In multi-task learning, $T$ sets of labelled training data $(x_{t1},y_{t1}),\dots,(x_{tn},y_{tn})\in\mathbb{R}^m\times\mathbb{R}$ are given, representing $T$ learning tasks.

Assumption: a linear feature map $h_i(x) = \langle u_i, x\rangle$, $i = 1,\dots,m$, where $U = (u_1,\dots,u_m)$ is an $m\times m$ orthogonal matrix, and the predictor for each task is $f_t(x) := \langle a_t, h(x)\rangle$. The multi-task learning problem is then
$$\min_{A,U}\ \sum_{t=1}^{T}\sum_{i=1}^{n} L_t\big(y_{ti},\ \langle a_t, U^Tx_{ti}\rangle\big) + \mu\|A\|_{2,1}^2,$$
where $A = (a_1,\dots,a_T)$, $\|A\|_{2,1}^2$ is the square of the sum of the 2-norms of the rows of $A$, and $L_t$ is a loss function for each task. Setting $W = UA$, this nonconvex problem is equivalent to the following convex problem [Argyriou-Evgeniou-Pontil 2006]:
$$\min_{W,D}\ \sum_{t=1}^{T}\sum_{i=1}^{n} L_t\big(y_{ti},\ \langle w_t, x_{ti}\rangle\big) + 2\mu\,\gamma(W,D) \quad\text{s.t.}\quad \operatorname{tr} D \le 1,$$
which in turn is equivalent to
$$\min_{W}\ \sum_{t=1}^{T}\sum_{i=1}^{n} L_t\big(y_{ti},\ \langle w_t, x_{ti}\rangle\big) + \mu\|W\|_*^2.$$
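The equivalence of the last two problems can be checked numerically (my own sketch, using the $p = 1$ corollary above): for fixed $W$, the inner minimum of $\gamma(W,D)$ over $\{\operatorname{tr} D\le 1,\ D\succeq 0\}$ is attained at $D = (WW^T)^{1/2}/\|W\|_*$, where $2\mu\,\gamma(W,D)$ evaluates to $\mu\|W\|_*^2$.

```python
import numpy as np

rng = np.random.default_rng(7)
W = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
nuclear = s.sum()

# Closed-form minimizer of gamma(W, D) over {tr D <= 1, D >= 0}.
D = (U * s) @ U.T / nuclear                 # (W W^T)^{1/2} / ||W||_*
gamma_WD = 0.5 * np.trace(W.T @ np.linalg.pinv(D) @ W)

print(np.isclose(2 * gamma_WD, nuclear**2))  # 2*gamma(W,D) = ||W||_*^2
```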
Slide 44: Thank you!
Slide 45: References

J. V. Burke and T. Hoheisel. Matrix Support Functionals for Inverse Problems, Regularization, and Learning. SIAM Journal on Optimization, 25(2), 2015.