Maximum Margin Matrix Factorization

1 Maximum Margin Matrix Factorization. Nati Srebro, Toyota Technological Institute, Chicago. Joint work with Noga Alon (Tel-Aviv), Yonatan Amit (Hebrew U), Alexandre d'Aspremont (Princeton), Michael Fink (Hebrew U), Tommi Jaakkola (MIT), Ben Marlin (Toronto), Jason Rennie (MIT), Adi Shraibman (Hebrew U), Shimon Ullman (Weizmann)

2 Collaborative Prediction. Based on a partially observed users × movies matrix, predict the unobserved entries: will user i like movie j? [figure: ratings matrix with a few observed entries and question marks elsewhere]

3 Linear Factor Model. The preferences of a specific user are a weighted combination of a few factors over movies: e.g., $+1 \cdot v_1$ (comic value) $+2 \cdot v_2$ (dramatic value) $-1 \cdot v_3$ (violence), where the $v_k$ are factor vectors over movies and the weights $u_k$ are characteristics of the user. [figure: three movie factor rows combined with user-specific weights]

4 Linear Factor Model. A single user's preference row over movies is the user's weight vector $u$ times the factor matrix $V$. [figure: $u \cdot V$]

5 Linear Factor Model. Stacking all users: $Y \approx U V$, with one row of $U$ per user and one row of $V$ per factor over movies. [figure: users × movies matrix $Y$ as the product $U V$]

6 Matrix Factorization. Unconstrained low-rank approximation: $Y \approx U V = X$, rank $k$.
- Additive Gaussian noise: minimize $\|Y - UV\|_{Fro}$
- General additive noise
- General conditional models: multiplicative noise, Exponential-PCA [Collins+01], multinomial (pLSA [Hofmann01]), etc.
- General loss functions: hinge loss, loss functions appropriate for ratings, etc. [Gordon03]
For unconstrained $U, V$ and a fully observed $Y$ with Frobenius loss, use the SVD; the other variants are non-convex, with no explicit solution.
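
For the fully observed Frobenius case, the SVD solution is easy to state concretely. A minimal NumPy sketch; the random data and the rank $k = 2$ are illustrative:

```python
# Best rank-k approximation of a fully observed matrix Y (Eckart-Young):
# truncate the SVD to the top k singular values.
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 40))

k = 2
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
X = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.norm(Y - X, "fro"))  # minimal over all rank-k matrices
```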

7 Matrix Factorization. $Y \approx U V = X$, rank $k$, with structural constraints:
- Non-negativity [LeeSeung99]
- Stochasticity (convexity) [LeeSeung97] [Hofmann01]
- Sparsity, with clustering as an extreme case (when the rows of $U$ are sparse)
The overall number of factors is still constrained, and these are non-convex optimization problems.

8 Outline.
- Maximum Margin Matrix Factorization: low max-norm and low trace-norm matrices; unbounded number of factors; convex!
- Learning MMMF: semidefinite programming
- Generalization error bounds: learning with low-rank (finite factor) matrices; MMMF (low max-norm / trace-norm)
- Representational power of low-rank, max-norm and trace-norm matrices

9 Collaborative Prediction with Matrix Factorization. Fit a factorizable (low-rank) matrix $X = UV$ to the observed entries: minimize $\sum_{ij \in S} loss(X_{ij}; Y_{ij})$, where $X_{ij}$ is the prediction and $Y_{ij}$ the observation. Then use $X$ to predict the unobserved entries. [Sarwar+00] [Azar+01] [Hoffman04] [Marlin+04]

10 Collaborative Prediction with Matrix Factorization. When $U$ is fixed, each column becomes a separate linear classification problem: the rows of $U$ are feature vectors and the columns of $V$ are linear classifiers. Fitting $U$ and $V$ jointly means learning features that work well across all the classification problems.
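
To make the alternating view concrete, here is a minimal alternating-least-squares sketch for a squared-loss variant of this objective (not the hinge-loss MMMF problem): with $U$ fixed, each column of $V$ solves an independent ridge-regularized linear problem, and symmetrically for $U$. The rank k, the regularization lam, and the data are illustrative:

```python
# Alternating least squares on observed entries: with U fixed, each column
# of V solves an independent ridge-regularized least-squares problem, and
# symmetrically for U (squared loss, for illustration).
import numpy as np

rng = np.random.default_rng(1)
n, m, k, lam = 30, 40, 3, 0.1
Y = rng.standard_normal((n, m))
mask = rng.random((n, m)) < 0.3              # which entries are observed

U = rng.standard_normal((n, k))
V = rng.standard_normal((m, k))              # prediction matrix is U @ V.T

for _ in range(20):
    for j in range(m):                       # refit column j of V
        r = mask[:, j]
        A = U[r].T @ U[r] + lam * np.eye(k)
        V[j] = np.linalg.solve(A, U[r].T @ Y[r, j])
    for i in range(n):                       # refit row i of U
        c = mask[i]
        A = V[c].T @ V[c] + lam * np.eye(k)
        U[i] = np.linalg.solve(A, V[c].T @ Y[i, c])
```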

11 Geometric Interpretation: Co-embedding Points and Separating Hyperplanes. [figure: rows of $U$ (users) embedded as points, columns of $V$ (movies) as separating hyperplanes]

12 Geometric Interpretation: Co-embedding Points and Separating Hyperplanes. [figure: the same co-embedding, continued]

13 Max-Margin Matrix Factorization: bound the norms of $U, V$ instead of their dimensionality. Bound the norms uniformly: $(\max_i \|U_i\|^2)(\max_j \|V_j\|^2) \le 1$. For observed $Y_{ij} = \pm 1$, require $Y_{ij} X_{ij} \ge$ Margin, where $X_{ij} = \langle U_i, V_j \rangle$. [figure: low-norm rows of $U$ (users) as points, columns of $V$ (movies) as separating hyperplanes]

14 Max-Margin Matrix Factorization: bound the norms of $U, V$ instead of their dimensionality. For observed $Y_{ij} = \pm 1$, require $Y_{ij} X_{ij} \ge$ Margin, $X_{ij} = \langle U_i, V_j \rangle$.
- Bound the norms uniformly: $(\max_i \|U_i\|^2)(\max_j \|V_j\|^2) \le R^2$. This is the max-norm: $\|X\|_{max} = \min_{X=UV} (\max_i \|U_i\|)(\max_j \|V_j\|)$.
- Bound the norms on average: $(\sum_i \|U_i\|^2)(\sum_j \|V_j\|^2) \le R^2$. This is the trace-norm: $\|X\|_{tr} = \min_{X=UV} \|U\|_{Fro} \|V\|_{Fro}$.
When $U$ is fixed, each column of $V$ is an SVM.

15 Max-Margin Matrix Factorization: bound the norms of $U, V$ instead of their dimensionality. Unlike $rank(X) \le k$, the max-norm and trace-norm constraints are convex! [figure: the convex combination $\alpha U_1 V_1 + (1-\alpha) U_2 V_2$ of two bounded-norm factorizations, itself realizable with bounded norms via rescaled factors]

16 Max-Margin Matrix Factorization: Convex Combination of Factors. Write $X = \sum_i s_i u_i v_i$, where a diagonal $S$ of weights $s_i \ge 0$, $\sum_i s_i \le 1$, scales unit-norm columns $u_i$ of $U$ against unit-norm rows $v_i$ of $V$; each outer product $u_i v_i$ of norm-1 vectors is a rank-1, norm-1 matrix. Bounding the norms on average, $(\sum_i \|U_i\|^2)(\sum_j \|V_j\|^2) \le 1$, corresponds to $\sum_i s_i \le 1$, and $\|X\|_{tr} = \sum$ (singular values of $X$).
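
A quick numerical check of the variational characterization above, using the standard fact that the balanced factorization built from the SVD attains it. A NumPy sketch with an arbitrary random matrix:

```python
# Check: the trace norm (sum of singular values) equals ||U||_Fro * ||V||_Fro
# for the balanced factorization X = (Us sqrt(S)) (sqrt(S) Vt).
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((15, 20))

Us, s, Vt = np.linalg.svd(X, full_matrices=False)
U = Us * np.sqrt(s)                # scale columns by sqrt(singular values)
V = np.sqrt(s)[:, None] * Vt       # scale rows by sqrt(singular values)

print(np.allclose(X, U @ V))                                         # True
print(s.sum(), np.linalg.norm(U, "fro") * np.linalg.norm(V, "fro"))  # equal
```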

17 Finding Max-Margin Matrix Factorizations. minimize $(\sum_i \|U_i\|^2)(\sum_j \|V_j\|^2)$ subject to $Y_{ij} X_{ij} \ge 1$, $X = UV$. Since $\|X\|_{tr} = \sum$ (singular values of $X$) $= \min_{A,B} \frac{1}{2}(tr(A) + tr(B))$ such that $\begin{pmatrix} A & X \\ X' & B \end{pmatrix}$ is p.s.d. [Fazel Hindi Boyd 2001], this is the semidefinite program: minimize $\frac{1}{2}(tr(A) + tr(B))$ subject to $Y_{ij} X_{ij} \ge 1$ and $\begin{pmatrix} A & X \\ X' & B \end{pmatrix} \succeq 0$.
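
For small problems, this SDP can be written almost verbatim in a modeling language. A minimal CVXPY sketch under the assumption that a generic SDP solver suffices; the sizes and random data are illustrative (the hard-margin problem is always feasible here, since $X$ is otherwise unconstrained):

```python
# Hard-margin MMMF as an SDP, via the [Fazel Hindi Boyd 2001] trace-norm
# characterization: minimize (tr(A)+tr(B))/2 with [[A, X], [X', B]] PSD
# and Y_ij * X_ij >= 1 on observed entries.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
n, m = 8, 10
Y = np.sign(rng.standard_normal((n, m)))        # +/-1 labels
idx = np.argwhere(rng.random((n, m)) < 0.5)     # observed (i, j) pairs

M = cp.Variable((n + m, n + m), PSD=True)       # M = [[A, X], [X', B]]
X = M[:n, n:]
obj = 0.5 * (cp.trace(M[:n, :n]) + cp.trace(M[n:, n:]))

constraints = [Y[i, j] * X[i, j] >= 1 for i, j in idx]
prob = cp.Problem(cp.Minimize(obj), constraints)
prob.solve()
print(prob.value)   # trace norm of the minimum-norm margin-1 interpolant
```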

18 Finding Max-Margin Matrix Factorizations. Primal ($X$ dense; $Y$ contributes one sparse constraint per observation): minimize $\frac{1}{2}(tr(A) + tr(B))$ subject to $Y_{ij} X_{ij} \ge 1$ and $\begin{pmatrix} A & X \\ X' & B \end{pmatrix} \succeq 0$. Dual ($Q$ sparse), with one dual variable $Q_{ij}$ per observed $(i,j)$: maximize $\sum Q_{ij}$ subject to $0 \le Q_{ij}$ and $\|Q \otimes Y\|_2 \le 1$, where $Q \otimes Y$ is the sparse elementwise product (zero for unobserved entries).

19 Finding Max-Margin Matrix Factorizations, soft margin. Primal: minimize $\frac{1}{2}(tr(A) + tr(B)) + c \sum \xi_{ij}$, i.e. $\|X\|_{tr} + c \cdot err(X)$, subject to $Y_{ij} X_{ij} \ge 1 - \xi_{ij}$ and $\begin{pmatrix} A & X \\ X' & B \end{pmatrix} \succeq 0$. Dual, with one variable $Q_{ij}$ per observed $(i,j)$: maximize $\sum Q_{ij}$ subject to $0 \le Q_{ij} \le c$ and $\|Q \otimes Y\|_2 \le 1$ (again with the sparse elementwise product, zero for unobserved entries).
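
The soft-margin problem also has an equivalent direct form, $\min \|X\|_{tr} + c \sum_{ij \in S} (1 - Y_{ij} X_{ij})_+$, which CVXPY can express with its nuclear-norm atom (internally this is the same SDP). A sketch; c and the data are illustrative:

```python
# Soft-margin MMMF in its direct form: trace (nuclear) norm plus c times
# the hinge loss over observed entries.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
n, m, c = 8, 10, 1.0
Y = np.sign(rng.standard_normal((n, m)))
idx = np.argwhere(rng.random((n, m)) < 0.5)     # observed (i, j) pairs

X = cp.Variable((n, m))
hinge = sum(cp.pos(1 - Y[i, j] * X[i, j]) for i, j in idx)

prob = cp.Problem(cp.Minimize(cp.normNuc(X) + c * hinge))
prob.solve()
print(prob.value)
```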

20 Finding Max-Margin Matrix Factorizations. Same primal and dual as above; at the optimum the two share singular vectors, $X^* = U D V'$ and $Q^* \otimes Y = U S V'$, so $X^*$ can be recovered from the dual solution by solving a small LP.

21 Fast Optimization of the Dual. Primal: minimize $\|X\|_{tr} + c \sum (1 - Y_{ij} X_{ij})_+$, trading off $\|X\|_{tr}$ against $err(X)$; rescaling $\tilde{X} = X/v$ gives: maximize $v - b \sum (v - Y_{ij} \tilde{X}_{ij})_+$ subject to $\|\tilde{X}\|_{tr} \le 1$. Dual: maximize $\sum Q_{ij}$ subject to $\|Q \otimes Y\|_2 \le 1$, $0 \le Q_{ij} \le c$; rescaled: minimize $\|\tilde{Q} \otimes Y\|_2$ subject to $\sum \tilde{Q}_{ij} = 1$, $0 \le \tilde{Q}_{ij} \le b$, a saddle problem $\min_{\tilde{Q}} \max_u \|(\tilde{Q} \otimes Y) u\|$. Use Nesterov's smoothing for saddle problems; the subgradient (and the gradient of the smoothed problem) is given by an SVD. [figure: the attainable $(err(X), \|X\|_{tr})$ pairs; the front of extreme values is the regularization path]
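
As a simpler alternative to the smoothed dual, the primal can also be attacked directly with a proximal subgradient method: the prox of the trace norm is singular-value soft-thresholding, so each step costs one SVD, echoing the point that (sub)gradients here come from SVDs. A minimal sketch, not the dual method above; step size, c and data are illustrative:

```python
# Proximal subgradient method for  min ||X||_tr + c * sum hinge(Y_ij X_ij)
# over observed entries: subgradient step on the hinge term, then
# soft-threshold the singular values (prox of the trace norm).
import numpy as np

def svt(Z, tau):
    """Singular-value soft-thresholding: prox of tau * ||.||_tr."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(5)
n, m, c, step = 20, 25, 1.0, 0.1
Y = np.sign(rng.standard_normal((n, m)))
mask = rng.random((n, m)) < 0.4                     # observed entries

X = np.zeros((n, m))
for _ in range(200):
    G = np.where(mask & (Y * X < 1), -c * Y, 0.0)   # hinge subgradient
    X = svt(X - step * G, step)                     # prox step on trace norm
```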

22 Finding Max-Margin Matrix Factorizations: Runtime and Prediction Performance. Runtime of the SDP solver (3.02 GHz Xeon) on problems of increasing size, up to …×562 (the matrix sizes are garbled in this transcription):
- 1% observed: 2 sec / 2 sec / 19 sec / 3:27 min / 34:35 min / 5:44:07 / 57:23:09
- 10% observed: 3 sec / 18 sec / 2:34 min / 3:37 min / 41:15 min / 6:40:06 / 67:35:34
- 50% observed: 10 sec / 35 sec / 2:41 min / 19:11 min / 1:35:28 / 19:09:49 / 62:12:21
Prediction performance (normalized mean absolute error), comparing MMMF against the top methods from [Marlin04]'s comprehensive comparison, URP (aspect/LDA-like) and "Attitude" (softmax-based), on EachMovie (74,424 users, 1,… movies, …M observations, ratings 1–6) and MovieLens (6,040 users, 3,952 movies, 1M observations, ratings 1–5); the slide also notes runs with over 1.5 million and 5 million observations. [figure: NMAE bars; exact values lost in transcription]

23 MMMF Optimization: Other Options. minimize $\|X\|_{tr} + loss(X)$.
- Gradient descent on a smoothed primal: $\|X\|_{tr} = \sum_i \lambda_i(X)$ (sum of singular values); smoothing the singular-value sum gives $\partial \|X\|_{tr\text{-}smooth} / \partial X = U \, (\partial \|S\|_{smooth} / \partial S) \, V'$ for the SVD $X = U S V'$.
- Gradient descent on the unconstrained factorized objective $\|U\|_{Fro}^2 + \|V\|_{Fro}^2 + loss(UV)$: non-convex, but if the dimensionality of $U, V$ is large enough, the global optimum is still guaranteed [Burer Monteiro 2004].
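
A sketch of the second option, in the spirit of [Burer Monteiro 2004]: plain gradient descent on the factored objective, here with a smooth squared hinge so gradients are well defined everywhere. The factor dimension k, step size and c are illustrative assumptions:

```python
# Gradient descent on the non-convex factored objective
#   0.5*(||U||_F^2 + ||V||_F^2) + c * 0.5 * sum max(0, 1 - Y_ij (U V')_ij)^2
# over observed entries.
import numpy as np

rng = np.random.default_rng(6)
n, m, k, c, step = 20, 25, 10, 1.0, 0.05
Y = np.sign(rng.standard_normal((n, m)))
mask = rng.random((n, m)) < 0.4

U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((m, k))

for _ in range(500):
    X = U @ V.T
    viol = np.where(mask, np.maximum(1 - Y * X, 0.0), 0.0)
    G = -c * Y * viol                               # d(squared hinge)/dX
    U, V = U - step * (U + G @ V), V - step * (V + G.T @ U)
```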

24 Outline.
- Maximum Margin Matrix Factorization: low max-norm and low trace-norm matrices; unbounded number of factors; convex!
- Learning MMMF: semidefinite programming
- Generalization error bounds: learning with low-rank (finite factor) matrices; MMMF (low max-norm / trace-norm)
- Representational power of low-rank, max-norm and trace-norm matrices

25 Generalization Error Bounds. Generalization error: $D(X;Y) = |\{ij : X_{ij} Y_{ij} \le 0\}| / nm$. Empirical error: $D_S(X;Y) = |\{ij \in S : X_{ij} Y_{ij} \le 0\}| / |S|$. Prior work assumes a low-rank structure (eigengap): asymptotic behavior [Azar+01]; sample complexity and query strategy [Drineas+02]. Here, assumption-free bounds: for any unknown $Y$ (the "source distribution") and a random training set $S$, $\Pr_S(\forall X \in \mathcal{X}: D(X;Y) < D_S(X;Y) + \epsilon) > 1 - \delta$, where $X$ is the hypothesis.

26 Generalization Error Bounds: Low Trace-Norm, Max-Norm or Rank. With generalization error $D(X;Y) = |\{ij : X_{ij} Y_{ij} \le 0\}| / nm$ and empirical margin error $D_S(X;Y) = |\{ij \in S : X_{ij} Y_{ij} < 1\}| / |S|$, for any $Y$: $\Pr_S(\forall X \in \mathcal{X}: D(X;Y) < D_S(X;Y) + \epsilon) > 1 - \delta$, for each of the classes:
- $\mathcal{X} = \{X \in R^{n \times m} : rank(X) \le k\}$, with $dc(A) = \min rank(X)$ s.t. $A_{ij} X_{ij} > 0$: $\epsilon = \sqrt{\big(k(n+m)\log\frac{8en}{k} + \log\frac{1}{\delta}\big) / (2|S|)}$.
- $\mathcal{X} = \{X \in R^{n \times m} : \|X\|_{max}^2 \le R^2\}$, with $mc^2(A) = \min \|X\|_{max}^2$ s.t. $A_{ij} X_{ij} \ge 1$: $\epsilon = \sqrt{\big(12 R^2 (n+m) + \log\frac{1}{\delta}\big) / |S|}$.
- $\mathcal{X} = \{X \in R^{n \times m} : \|X\|_{tr}^2 / nm \le R^2\}$, with $ac^2(A) = \min \|X\|_{tr}^2 / nm$ s.t. $A_{ij} X_{ij} \ge 1$: $\epsilon = K R \sqrt{(n+m)\log n / |S|} + \sqrt{\log\frac{1}{\delta} / |S|}$.

27 Matrices Realizable by Low-Rank, Low-Trace-Norm and Low-Max-Norm. Rank and max-norm are uniform measures, while the trace-norm is an on-average measure: a matrix may be realizable with low trace-norm yet require high ($O(n^{2/3})$) rank and max-norm. Anything realizable with low max-norm is realizable with low rank, up to an extra log factor: $dc(A) \le 10 \, mc^2(A) \log(3nm)$. Conversely, some matrices realizable with low rank require arbitrarily high max-norm and trace-norm [Sherstov 07].

28 Maximum Margin Matrix Factorization as a Convex Combination of Classifiers. $\{UV : (\sum_i \|U_i\|^2)(\sum_i \|V_i\|^2) \le 1\} = \text{conv}(\{uv : u \in R^n, v \in R^m, \|u\| = \|v\| = 1\})$, the convex hull of the rank-one, unit-norm matrices. Similarly, $\text{conv}(\{uv : u \in \{\pm 1\}^n, v \in \{\pm 1\}^m\}) \subseteq \{UV : (\max_i \|U_i\|^2)(\max_j \|V_j\|^2) \le 1\} \subseteq 2 \, \text{conv}(\{uv : u \in \{\pm 1\}^n, v \in \{\pm 1\}^m\})$: the max-norm ball is sandwiched between convex hulls of the rank-one sign matrices.

29 Transfer Learning in Multi-Task and Multi-Class Settings. minimize $\frac{1}{2}\|X\|_{tr} + loss(X$ predicting $Y)$ becomes, with explicit features, minimize $\frac{1}{2}\|W\|_{tr} + loss(W\Phi$ predicting $Y)$, where $\Phi$ holds the instance features and $Y$ the per-task labels over the instances. [figure: (tasks × features) $W$ times (features × instances) $\Phi$ matching (tasks × instances) $Y$] [Argyriou et al 2007] [Abernethy et al] [Amit et al 2007]

30 Transfer Learning in Multi-Task and Multi-Class Settings. Factoring $W = UV$, the product $V\Phi$ acts as a learned feature space shared across the tasks: minimize $\frac{1}{2}\|W\|_{tr} + loss(W\Phi$ predicting $Y)$. [figure: $U(V\Phi) \approx Y$, with $V\Phi$ the learned feature space] [Argyriou et al 2007] [Abernethy et al] [Amit et al 2007]
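
A sketch of trace-norm regularized multi-task learning in this spirit: squared loss over all tasks plus $\lambda \|W\|_{tr}$, solved by proximal gradient with the same singular-value soft-thresholding as earlier. The shapes, step size and synthetic data are illustrative assumptions:

```python
# Multi-task learning with trace-norm regularization on the weight matrix W
# (tasks x features):  min ||W Phi - Y||_F^2 + lam * ||W||_tr.
import numpy as np

def svt(Z, tau):
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(7)
T, d, N, lam, step = 6, 30, 200, 1.0, 1e-3
Phi = rng.standard_normal((d, N))                       # features x instances
W_true = rng.standard_normal((T, 2)) @ rng.standard_normal((2, d))
Y = W_true @ Phi + 0.1 * rng.standard_normal((T, N))    # tasks x instances

W = np.zeros((T, d))
for _ in range(300):
    G = 2 * (W @ Phi - Y) @ Phi.T            # gradient of the squared loss
    W = svt(W - step * G, step * lam)        # prox step on lam * ||W||_tr

print(np.round(np.linalg.svd(W, compute_uv=False), 2))  # low rank: shared structure
```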

31 Mammal Recognition Experiment. 72 classes of mammals; training: 1000 images, testing: 1000 images. Compared: Frobenius-norm SVM vs. trace-norm SVM.

32 Mammal Recognition Experiment. [figure: accuracy change from using the trace norm instead of the Frobenius norm, as a function of the number of training instances]

33 Mammal Recognition Experiment. [figure: example results, trace norm vs. Frobenius norm]

34 Max-Margin Matrix Factorization. Nati Srebro, TTI-Chicago.
- Infinite number of factors: the norm replaces dimensionality
- Convex optimization problem (SDP)
- The low-norm factorization is the true goal, not a surrogate for rank: an infinite factor model with a connection to SVMs; generalization error bounds; theoretically not a good approximation to rank, yet an appropriate model in practice, with good empirical results
- Not only collaborative filtering: multi-task learning, multi-class learning, semi-supervised learning following e.g. [Ando Zhang 05,07], bottleneck-type methods [Globerson Tishby 04]
