Maximum Margin Matrix Factorization
1 Maximum Margin Matrix Factorization
Nati Srebro, Toyota Technological Institute, Chicago
Joint work with Noga Alon (Tel-Aviv), Yonatan Amit (Hebrew U), Alex d'Aspremont (Princeton), Michael Fink (Hebrew U), Tommi Jaakkola (MIT), Ben Marlin (Toronto), Jason Rennie (MIT), Adi Schraibman (Hebrew U), Shimon Ullman (Weizmann)
2 Collaborative Prediction
Based on a partially observed matrix (users × movies, with a few observed ratings and many unobserved "?" entries): predict the unobserved entries. Will user i like movie j?
3 Linear Factor Model
[figure: the preferences of a specific user over movies are a weighted combination of factor vectors v_1, v_2, v_3 (e.g. comic value, dramatic value, violence), with user-specific weights such as u_1 = +1, u_2 = +2, u_3 = -1 giving the characteristics of the user]
4 Linear Factor Model
[figure: a single user's preferences over movies written as a row vector u times the factor matrix V]
5 Linear Factor Model
[figure: the full preference matrix Y (users × movies) approximated as U V, user factors U times movie factors V]
6 Matrix Factorization
Unconstrained: Low Rank Approximation. Y ≈ U V = X, rank k.
- Additive Gaussian noise: minimize ||Y - UV||_Fro
- General additive noise
- General conditional models: multiplicative noise, Exponential-PCA [Collins+01], multinomial (pLSA [Hofmann01]), etc.
- General loss functions: hinge loss, loss functions appropriate for ratings, etc. [Gordon03]
With unconstrained U, V and fully observed Y, use the SVD; otherwise the problem is non-convex, with no explicit solution.
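For the fully observed, squared-loss case, the SVD solution can be sketched as follows (a minimal NumPy sketch; the matrix and the rank k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 15))  # fully observed matrix (illustrative)
k = 3                              # target rank

# Truncated SVD gives the best rank-k approximation of Y in Frobenius norm.
U0, s, Vt = np.linalg.svd(Y, full_matrices=False)
U = U0[:, :k] * s[:k]              # absorb the top singular values into U
V = Vt[:k, :]
X = U @ V                          # rank-k approximation X = U V

err = np.linalg.norm(Y - X, "fro")
```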
7 Matrix Factorization
Y ≈ U V = X, rank k. Constrained variants:
- Non-negativity [LeeSeung99]
- Stochasticity (convexity) [LeeSeung97] [Hofmann01]
- Sparsity; clustering as an extreme case (when the rows of U are sparse)
The overall number of factors is still constrained, and these are non-convex optimization problems.
8 Outline
- Maximum Margin Matrix Factorization: low max-norm and low trace-norm matrices. Unbounded number of factors. Convex!
- Learning MMMF: semidefinite programming
- Generalization error bounds: learning with low-rank (finite factor) matrices vs. MMMF (low max-norm / trace-norm)
- Representational power of low-rank, max-norm and trace-norm matrices
9 Collaborative Prediction with Matrix Factorization
Fit a factorizable (low-rank) matrix X = U V to the observed entries:
minimize Σ loss(X_ij; Y_ij) over observed (i,j), where X_ij is the prediction and Y_ij the observation.
Then use the matrix X to predict the unobserved entries. [Sarwar+00] [Azar+01] [Hoffman04] [Marlin+04]
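Fitting X = U V to only the observed entries has no closed form; one common approach is gradient descent on the observed-entry loss. A minimal sketch with squared loss (the mask, sizes, step size, and iteration count are illustrative; the slides also consider general losses such as hinge):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 30, 25, 2
Y = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))  # ground truth
mask = rng.random((n, m)) < 0.5    # which entries are observed

U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((k, m))
init_err = np.abs((U @ V - Y)[mask]).mean()

# Gradient descent on the sum of squared errors over observed entries only.
lr = 0.01
for _ in range(2000):
    R = mask * (U @ V - Y)         # residual, zeroed on unobserved entries
    U, V = U - lr * (R @ V.T), V - lr * (U.T @ R)

X = U @ V
train_err = np.abs((X - Y)[mask]).mean()
```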
10 Collaborative Prediction with Matrix Factorization
When U is fixed, predicting each column of Y is a separate linear classification problem: the rows of U are feature vectors and the columns of V are linear classifiers. Fitting U and V jointly means learning features that work well across all the classification problems.
11 Geometric Interpretation: Co-embedding Points and Separating Hyperplanes
[figure: the rows of U (users) embedded as points, the columns of V (movies) as separating hyperplanes]
13 Max-Margin Matrix Factorization: bound the norms of U, V instead of their dimensionality
Bound the norms uniformly: (max_i ||U_i||^2)(max_j ||V_j||^2) ≤ 1.
For observed Y_ij ∈ {±1}: require Y_ij X_ij ≥ Margin, where X_ij = <U_i, V_j>.
14 Max-Margin Matrix Factorization: bound the norms of U, V instead of their dimensionality
- Bound the norms uniformly: (max_i ||U_i||^2)(max_j ||V_j||^2) ≤ R^2, giving the max-norm
  ||X||_max = min over X=UV of (max_i ||U_i||)(max_j ||V_j||)
- Bound the norms on average: (Σ_i ||U_i||^2)(Σ_j ||V_j||^2) ≤ R^2, giving the trace-norm
  ||X||_tr = min over X=UV of ||U||_Fro ||V||_Fro
When U is fixed, each column of V is an SVM. For observed Y_ij ∈ {±1}: Y_ij X_ij ≥ Margin, X_ij = <U_i, V_j>.
15 Max-Margin Matrix Factorization: bound the norms of U, V instead of their dimensionality
- Uniformly: (max_i ||U_i||^2)(max_j ||V_j||^2) ≤ R^2; ||X||_max = min over X=UV of (max_i ||U_i||)(max_j ||V_j||)
- On average: (Σ_i ||U_i||^2)(Σ_j ||V_j||^2) ≤ R^2; ||X||_tr = min over X=UV of ||U||_Fro ||V||_Fro
[figure: combining two factorizations into αU_1 V_1 + (1-α)U_2 V_2 by scaling the factors αV_1, αV_2]
Unlike rank(X) ≤ k, these are convex constraints!
16 Max-Margin Matrix Factorization: Convex Combination of Factors
Bound the norms on average: (Σ_i ||U_i||^2)(Σ_j ||V_j||^2) ≤ 1.
||X||_tr = Σ (singular values of X), so X = Σ_i s_i u_i v_i' with Σ s_i ≤ 1, where each u_i v_i' is an outer product of norm-1 vectors: a rank-1, norm-1 matrix.
[figure: diagonal S between U with unit-norm columns and V with unit-norm rows]
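The equivalence between the factorization view and the singular-value view can be checked numerically (a small NumPy sketch; the matrix is illustrative): the balanced factorization read off the SVD attains ||U||_Fro ||V||_Fro = Σ singular values, while unbalancing any factorization only increases the product.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 6))

# Trace norm = sum of singular values.
trace_norm = np.linalg.svd(X, compute_uv=False).sum()

# The balanced SVD factorization attains ||U||_Fro * ||V||_Fro = ||X||_tr.
U0, s, Vt = np.linalg.svd(X, full_matrices=False)
U = U0 * np.sqrt(s)
V = np.sqrt(s)[:, None] * Vt
assert np.allclose(U @ V, X)
assert np.isclose(np.linalg.norm(U, "fro") * np.linalg.norm(V, "fro"), trace_norm)

# An unbalanced factorization of the same X gives a strictly larger product.
U2, V2 = U.copy(), V.copy()
U2[:, 0] *= 2.0
V2[0, :] /= 2.0
prod2 = np.linalg.norm(U2, "fro") * np.linalg.norm(V2, "fro")
assert np.allclose(U2 @ V2, X)
```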
17 Finding Max-Margin Matrix Factorizations
minimize (Σ_i ||U_i||^2)(Σ_j ||V_j||^2) s.t. Y_ij X_ij ≥ 1, X = UV.
Using ||X||_tr = Σ (singular values of X) = min over A, B of ½(tr(A) + tr(B)) s.t. [A X; X' B] p.s.d. [Fazel Hindi Boyd 2001], this becomes the SDP:
minimize ½(tr(A) + tr(B)) s.t. Y_ij X_ij ≥ 1, [A X; X' B] p.s.d.
18 Finding Max-Margin Matrix Factorizations
X: dense primal; Y: sparse observations (constraints); Q: sparse dual.
Primal: minimize ½(tr(A) + tr(B)) s.t. Y_ij X_ij ≥ 1, [A X; X' B] p.s.d.
Dual: one variable Q_ij for each observed (i,j); maximize Σ Q_ij s.t. 0 ≤ Q_ij and ||Q ⊗ Y||_2 ≤ 1, where Q ⊗ Y is the sparse elementwise product (zero for unobserved entries).
19 Finding Max-Margin Matrix Factorizations (soft margin)
Primal (= minimize ||X||_tr + c err(X)): minimize ½(tr(A) + tr(B)) + c Σ ξ_ij s.t. Y_ij X_ij ≥ 1 - ξ_ij, [A X; X' B] p.s.d.
Dual: one variable Q_ij for each observed (i,j); maximize Σ Q_ij s.t. 0 ≤ Q_ij ≤ c and ||Q ⊗ Y||_2 ≤ 1 (sparse elementwise product, zero for unobserved entries).
20 Finding Max-Margin Matrix Factorizations
Primal: minimize ½(tr(A) + tr(B)) + c Σ ξ_ij s.t. Y_ij X_ij ≥ 1 - ξ_ij, [A X; X' B] p.s.d.
Dual: maximize Σ Q_ij s.t. 0 ≤ Q_ij ≤ c, ||Q ⊗ Y||_2 ≤ 1.
The dual optimum Q* = U S V' shares its singular vectors with the primal optimum X* = U D V', so X* can be recovered by solving a small LP.
21 Fast Optimization of the Dual
Primal: minimize ||X||_tr + c Σ (1 - Y_ij X_ij)_+, with err(X) normalized by the number of observations; tracing out the front of attainable (err(X), ||X||_tr) values gives the regularization path.
Dual: maximize Σ Q_ij s.t. 0 ≤ Q_ij ≤ c, ||Q ⊗ Y||_2 ≤ 1; equivalently, minimize ||Q~ ⊗ Y||_2 over normalized Q~ with Σ Q~_ij ≥ 1, 0 ≤ Q~_ij ≤ b, a saddle problem min max u'(Q~ ⊗ Y)u.
Use Nesterov's smoothing for saddle problems. The subgradient (and the gradient of the smoothed problem) is given by an SVD.
[plot: front of extreme (err(X), ||X||_tr) values, aka the regularization path]
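The primal objective ||X||_tr + c Σ (1 - Y_ij X_ij)_+ can also be attacked directly with a subgradient method, using the SVD-based subgradient mentioned on the slide: U V' from the thin SVD of X is a subgradient of the trace norm. A toy sketch (plain diminishing-step subgradient steps, not the smoothed dual solver described here; all sizes and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 15, 12
Y = np.sign(rng.standard_normal((n, m)))   # +-1 labels
mask = rng.random((n, m)) < 0.4            # observed set S
c, lr = 1.0, 0.05

X = np.zeros((n, m))
for t in range(200):
    # Subgradient of ||X||_tr: U V' from the thin SVD of X.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    g_tr = U @ Vt
    # Subgradient of c * sum of hinge losses over observed entries.
    active = mask & (Y * X < 1)            # margin-violating entries
    g_hinge = -c * Y * active
    X -= lr / np.sqrt(t + 1) * (g_tr + g_hinge)

obj = np.linalg.svd(X, compute_uv=False).sum() \
      + c * np.maximum(0.0, 1.0 - Y * X)[mask].sum()
```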
22 Finding Max-Margin Matrix Factorizations: Runtime and Prediction Performance
Runtime (3.02 GHz Xeon), for matrices of increasing size from 1000×1000 upward:
- 1% observed: 2 sec / 2 sec / 19 sec / 3:27 min / 34:35 min / 5:44:07 / 57:23:09
- 10% observed: 3 sec / 18 sec / 2:34 min / 3:37 min / 41:15 min / 6:40:06 / 67:35:34
- 50% observed: 10 sec / 35 sec / 2:41 min / 19:11 min / 1:35:28 / 19:09:49 / 62:12:21
Prediction performance (normalized mean absolute error) against the top methods from the comprehensive comparison of [Marlin04]: URP (Aspect/LDA-like) and "Attitude" (softmax-based) vs. MMMF.
Datasets: EachMovie (74,424 users, ratings 1-6, over 1.5 million observations) and MovieLens (6,040 × 3,952, 1M observations, ratings 1-5).
23 MMMF Optimization: Other Options
minimize ||X||_tr + loss(X).
- Gradient descent on a smoothed primal: with ||X||_tr = Σ_i λ_i(X), smooth the singular-value norm; the gradient of the smoothed spectral norm at X = U S V' is U (∂smoothed norm/∂S) V'.
- Gradient descent on the unconstrained factorization: ||U||_Fro^2 + ||V||_Fro^2 + loss(UV). Non-convex, but if the dimensionality of U, V is large enough, a global optimum is still guaranteed [Burer Monteiro 2004].
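The second option, gradient descent on the factorized objective, can be sketched directly (subgradient steps on the hinge loss; the number of factors k, step size, and constant c are illustrative choices in the spirit of [Burer Monteiro 2004], not the exact procedure from the talk):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, k = 20, 15, 5                  # k = number of factors (illustrative)
Y = np.sign(rng.standard_normal((n, m)))
mask = rng.random((n, m)) < 0.5
c, lr = 1.0, 0.05

U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((k, m))
for _ in range(300):
    X = U @ V
    active = mask & (Y * X < 1)      # entries violating the margin
    G = -c * Y * active              # hinge subgradient w.r.t. X
    # The 1/2(||U||_F^2 + ||V||_F^2) regularizer contributes U and V terms.
    gU = U + G @ V.T
    gV = V + U.T @ G
    U -= lr * gU
    V -= lr * gV

hinge = np.maximum(0.0, 1.0 - Y * (U @ V))[mask].mean()
```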
24 Outline
- Maximum Margin Matrix Factorization: low max-norm and low trace-norm matrices. Unbounded number of factors. Convex!
- Learning MMMF: semidefinite programming
- Generalization error bounds: learning with low-rank (finite factor) matrices vs. MMMF (low max-norm / trace-norm)
- Representational power of low-rank, max-norm and trace-norm matrices
25 Generalization Error Bounds
D(X;Y) = |{ij : X_ij Y_ij ≤ 0}| / nm (generalization error)
D_S(X;Y) = |{ij ∈ S : X_ij Y_ij ≤ 0}| / |S| (empirical error)
For any Y (source distribution unknown, assumption-free), with S a random training set and hypothesis X:
Pr_S( for all X: D(X;Y) < D_S(X;Y) + ε ) > 1 - δ.
Previous work assumed a low-rank structure (eigengap): asymptotic behavior [Azar+01]; sample complexity and query strategy [Drineas+02].
26 Generalization Error Bounds: Low Trace-Norm, Max-Norm or Rank
D(X;Y) = |{ij : X_ij Y_ij ≤ 0}| / nm (generalization error); D_S(X;Y) = |{ij ∈ S : X_ij Y_ij < 1}| / |S| (empirical margin error).
For any Y: Pr_S( for all X in the class: D(X;Y) < D_S(X;Y) + ε ) > 1 - δ, where:
- Rank: class {X ∈ R^{n×m} : rank(X) ≤ k}; dc(A) = min rank(X) s.t. A_ij X_ij > 0;
  ε = sqrt( ( k(n+m) log(8en/k) + log(1/δ) ) / |S| )
- Max-norm: class {X ∈ R^{n×m} : ||X||_max^2 ≤ R^2}; mc^2(A) = min ||X||_max^2 s.t. A_ij X_ij ≥ 1;
  ε = sqrt( ( 12 R^2 (n+m) + log(1/δ) ) / |S| )
- Trace-norm: class {X ∈ R^{n×m} : ||X||_tr^2 / nm ≤ R^2}; ac^2(A) = min ||X||_tr^2 / nm s.t. A_ij X_ij ≥ 1;
  ε = sqrt( ( K R^2 (n+m) log n + log(1/δ) ) / |S| )
27 Matrices Realizable by Low-Rank, Low-Trace-Norm and Low-Max-Norm
Rank and max-norm are uniform measures, while the trace-norm is an on-average measure: a matrix may be realizable with low trace-norm yet require high (O(n^{2/3})) rank and max-norm.
Everything realizable with low max-norm is realizable with low rank (up to an extra log factor): dc(A) ≤ 10 mc^2(A) log(3nm).
Some matrices realizable with low rank require arbitrarily high max-norm and trace-norm [Sherstov 07].
28 Maximum Margin Matrix Factorization as a Convex Combination of Classifiers
{ UV' : (Σ_i ||U_i||^2)(Σ_j ||V_j||^2) ≤ 1 } = convex-hull( { uv' : u ∈ R^n, v ∈ R^m, ||u|| = ||v|| = 1 } ), the rank-one, unit-norm matrices.
conv( { uv' : u ∈ {±1}^n, v ∈ {±1}^m } ) ⊆ { UV' : (max_i ||U_i||^2)(max_j ||V_j||^2) ≤ 1 } ⊆ 2 conv( { uv' : u ∈ {±1}^n, v ∈ {±1}^m } ), the rank-one sign matrices.
29 Transfer Learning in Multi-Task and Multi-Class Settings
minimize ½||X||_tr + loss(X predicting Y); with explicit features Φ: minimize ½||W||_tr + loss(WΦ predicting Y).
[figure: W (tasks × features), Φ (features × instances), Y (tasks × instances)]
[Argyriou et al 2007] [Abernethy et al] [Amit et al 2007]
30 Transfer Learning in Multi-Task and Multi-Class Settings
Factoring W = U V gives a learned feature space VΦ shared across the tasks.
[figure: factors U and V between the features Φ and targets Y]
[Argyriou et al 2007] [Abernethy et al] [Amit et al 2007]
31 Mammal Recognition Experiment
72 classes of mammals. Training: 1000 images; testing: 1000 images.
Compare: Frobenius-norm SVM vs. trace-norm SVM.
32 Mammal Recognition Experiment
[plot: accuracy change using the trace norm vs. the Frobenius norm, as a function of the number of training instances]
33 Mammal Recognition Experiment
[figure: example predictions, trace norm vs. Frobenius norm]
34 Max-Margin Matrix Factorization (Nati Srebro, TTI-Chicago)
- Infinite number of factors: a norm replaces dimensionality
- Convex optimization problem (SDP)
- The low-norm factorization is the true goal, not a surrogate for rank: an infinite factor model with a connection to SVMs; generalization error bounds; theoretically not a good approximation to rank, but an appropriate model in practice with good empirical results
- Not only collaborative filtering! Multi-task learning, multi-class learning, semi-supervised learning following e.g. [Rie Zhang 05,07], bottleneck-type methods [Globerson Tishby 04]
Rank minimization via the γ 2 norm Troy Lee Columbia University Adi Shraibman Weizmann Institute Rank Minimization Problem Consider the following problem min X rank(x) A i, X b i for i = 1,..., k Arises
More informationAn Extended Frank-Wolfe Method, with Application to Low-Rank Matrix Completion
An Extended Frank-Wolfe Method, with Application to Low-Rank Matrix Completion Robert M. Freund, MIT joint with Paul Grigas (UC Berkeley) and Rahul Mazumder (MIT) CDC, December 2016 1 Outline of Topics
More informationRecovering any low-rank matrix, provably
Recovering any low-rank matrix, provably Rachel Ward University of Texas at Austin October, 2014 Joint work with Yudong Chen (U.C. Berkeley), Srinadh Bhojanapalli and Sujay Sanghavi (U.T. Austin) Matrix
More informationEE 381V: Large Scale Learning Spring Lecture 16 March 7
EE 381V: Large Scale Learning Spring 2013 Lecture 16 March 7 Lecturer: Caramanis & Sanghavi Scribe: Tianyang Bai 16.1 Topics Covered In this lecture, we introduced one method of matrix completion via SVD-based
More informationAbout this class. Maximizing the Margin. Maximum margin classifiers. Picture of large and small margin hyperplanes
About this class Maximum margin classifiers SVMs: geometric derivation of the primal problem Statement of the dual problem The kernel trick SVMs as the solution to a regularization problem Maximizing the
More informationwhat can deep learning learn from linear regression? Benjamin Recht University of California, Berkeley
what can deep learning learn from linear regression? Benjamin Recht University of California, Berkeley Collaborators Joint work with Samy Bengio, Moritz Hardt, Michael Jordan, Jason Lee, Max Simchowitz,
More informationKernelized Perceptron Support Vector Machines
Kernelized Perceptron Support Vector Machines Emily Fox University of Washington February 13, 2017 What is the perceptron optimizing? 1 The perceptron algorithm [Rosenblatt 58, 62] Classification setting:
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationLecture Notes 10: Matrix Factorization
Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To
More informationLarge Scale Semi-supervised Linear SVMs. University of Chicago
Large Scale Semi-supervised Linear SVMs Vikas Sindhwani and Sathiya Keerthi University of Chicago SIGIR 2006 Semi-supervised Learning (SSL) Motivation Setting Categorize x-billion documents into commercial/non-commercial.
More informationComputational and Statistical Tradeoffs via Convex Relaxation
Computational and Statistical Tradeoffs via Convex Relaxation Venkat Chandrasekaran Caltech Joint work with Michael Jordan Time-constrained Inference o Require decision after a fixed (usually small) amount
More informationStochastic Subgradient Method
Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)
More informationRank, Trace-Norm and Max-Norm
Rank, Trace-Norm and Max-Norm Nathan Srebro 1 and Adi Shraibman 2 1 University of Toronto Department of Computer Science, Toronto ON, Canada 2 Hebrew University Institute of Computer Science, Jerusalem,
More information