Rank, Trace-Norm & Max-Norm


1 Rank, Trace-Norm & Max-Norm as measures of matrix complexity. Nati Srebro (University of Toronto), Adi Shraibman (Hebrew University).

2 Matrix Learning. [Figure: partially observed users × movies rating matrix Y, with "?" for unobserved entries, and a model matrix X fit to it.] Reconstructing a latent signal: gene expression (biological processes). Capturing structure in a corpus: documents, images, etc. (topics, etc.). Prediction: collaborative filtering of movie ratings. Fit the (partially) observed Y with X from a hypothesis class of matrices: Low Rank: { X : rank(X) ≤ k }; Low Trace-Norm: { X : ‖X‖_Σ ≤ B }; Low Max-Norm: { X : ‖X‖_max ≤ B }. Low rank corresponds to a low-dimensional factorization; low trace-norm and low max-norm correspond to low-norm factorizations (MMMF [NIPS 04]). In this talk: the three hypothesis classes (measures of matrix complexity); generalization error bounds (for predicting unobserved entries); relationships between the three measures / hypothesis classes.
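As a concrete illustration of fitting the observed entries of Y with a low-dimensional factorization, here is a minimal numpy sketch using squared loss and alternating least squares (this is not the talk's MMMF method, which uses a margin/hinge loss; the helper name and toy matrix are mine):

```python
# Minimal sketch: alternating least squares for X = U V^T on the observed
# entries of a partially observed matrix Y (squared loss, small ridge term).
import numpy as np

def fit_low_rank(Y, observed, k=2, sweeps=50, reg=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.standard_normal((n, k))
    V = rng.standard_normal((m, k))
    for _ in range(sweeps):
        for i in range(n):                      # re-fit row i of U against its observed entries
            w = observed[i]
            A = V[w].T @ V[w] + reg * np.eye(k)
            U[i] = np.linalg.solve(A, V[w].T @ Y[i, w])
        for j in range(m):                      # re-fit row j of V symmetrically
            w = observed[:, j]
            A = U[w].T @ U[w] + reg * np.eye(k)
            V[j] = np.linalg.solve(A, U[w].T @ Y[w, j])
    return U @ V.T

# Toy users x movies matrix; NaN marks unobserved entries.
Y = np.array([[5., np.nan, 3.], [np.nan, 5., np.nan], [5., 2., np.nan]])
observed = ~np.isnan(Y)
print(np.round(fit_low_rank(np.nan_to_num(Y), observed, k=2), 2))
```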

3 Learning with Low Rank Matrices. X (rank ≤ k) = UV with U ∈ ℝ^{n×k} and V ∈ ℝ^{k×m}, so Low Rank: { X : rank(X) ≤ k } = { UV : U ∈ ℝ^{n×k}, V ∈ ℝ^{k×m} }. For binary target matrices, only the signs matter: A = sign(X). Dimensional complexity of a binary matrix A: dc(A) = min{ rank(X) : sign(X) = A }.

4 Learning with Low Rank Matrices: Geometric Interpretation. dc(A) = min{ rank(X) : sign(X) = A }. [Diagram: rows of U are data points; columns of V are hypotheses (linear separators).] What is the minimum dimension into which a concept class can be embedded as linear separators?

5 Max-Margin Matrix Factorization: bound the norms of U and V instead of their dimensionality. Low-norm factorization X = UV. Bound the norms uniformly: (max_i ‖U_i‖²)(max_j ‖V_j‖²) ≤ R², where U_i are the rows of U and V_j the columns of V. For each Y_ij ∈ ±1 require Y_ij X_ij ≥ Margin, with X_ij = ⟨U_i, V_j⟩.

6 Max-Margin Matrix Factorization: bound the norms of U and V instead of their dimensionality. Low-norm factorization X = UV (U has n rows, V has m columns). Bound the norms uniformly: (max_i ‖U_i‖²)(max_j ‖V_j‖²) ≤ R², giving ‖X‖_max = min_{X=UV} (max_i ‖U_i‖)(max_j ‖V_j‖). Bound the norms on average: (Σ_i ‖U_i‖²)(Σ_j ‖V_j‖²) ≤ nmR², giving ‖X‖_Σ = min_{X=UV} ‖U‖_Fro ‖V‖_Fro. Optimizing V given U: each column is an SVM. For each Y_ij ∈ ±1 require Y_ij X_ij ≥ 1, with X_ij = ⟨U_i, V_j⟩.
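The two norms can be read off directly from factorizations. A small numpy check of the inequalities implicit in these definitions (helper names are mine; the trace norm is computed exactly via singular values, while a given factorization only supplies upper bounds):

```python
# ||X||_Sigma = sum of singular values; any factorization X = U V^T gives
#   ||X||_Sigma <= ||U||_Fro * ||V||_Fro                (average row-norm bound)
#   ||X||_max   <= (max_i ||U_i||) * (max_j ||V_j||)    (uniform row-norm bound)
import numpy as np

def trace_norm(X):
    return np.linalg.svd(X, compute_uv=False).sum()

def factorization_bounds(U, V):
    tr_bound = np.linalg.norm(U) * np.linalg.norm(V)          # Frobenius norms
    max_bound = np.linalg.norm(U, axis=1).max() * np.linalg.norm(V, axis=1).max()
    return tr_bound, max_bound

rng = np.random.default_rng(0)
U, V = rng.standard_normal((5, 2)), rng.standard_normal((4, 2))
X = U @ V.T
print(trace_norm(X), factorization_bounds(U, V))
```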

7 Max-Margin Matrix Factorization: bound the norms of U and V instead of their dimensionality. Bound the norms uniformly: (max_i ‖U_i‖²)(max_j ‖V_j‖²) ≤ R²; ‖X‖_max = min_{X=UV} (max_i ‖U_i‖)(max_j ‖V_j‖); max-margin complexity mc(A) = min{ ‖X‖_max : A_ij X_ij ≥ 1 }. Bound the norms on average: (Σ_i ‖U_i‖²)(Σ_j ‖V_j‖²) ≤ nmR²; ‖X‖_Σ = min_{X=UV} ‖U‖_Fro ‖V‖_Fro; average-margin complexity ac(A) = min{ ‖X‖_Σ/√(nm) : A_ij X_ij ≥ 1 }. For each Y_ij ∈ ±1 require Y_ij X_ij ≥ 1, with X_ij = ⟨U_i, V_j⟩.

8 Three Measures of Matrix Complexity. Used for fitting observed data matrices; for matrices representing concept classes, they measure what embedding the class as linear classifiers requires. Bound the norms uniformly: (max_i ‖U_i‖²)(max_j ‖V_j‖²) ≤ R²; ‖X‖_max = min_{X=UV} (max_i ‖U_i‖)(max_j ‖V_j‖); mc(A) = min{ ‖X‖_max : A_ij X_ij ≥ 1 } is the 1/margin required for embedding as linear classifiers. Bound the norms on average: (Σ_i ‖U_i‖²)(Σ_j ‖V_j‖²) ≤ nmR²; ‖X‖_Σ = min_{X=UV} ‖U‖_Fro ‖V‖_Fro; ac(A) = min{ ‖X‖_Σ/√(nm) : A_ij X_ij ≥ 1 }. These norm-based measures are convex (semi-definite programming) [NIPS 04]. Bound the dimensionality of U, V: dc(A) = min{ rank(X) : A_ij X_ij > 0 } is the dimension required for embedding as linear classifiers; not convex.
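A hedged sketch of the convex trace-norm variant as an off-the-shelf optimization problem (this assumes the cvxpy package and a toy problem of my own; the talk's SDP formulation of MMMF is equivalent in spirit, but this is not the authors' code):

```python
# Trace-norm-regularized matrix completion with a hinge (margin) loss on the
# observed entries S of a +/-1 target Y -- one convex formulation of MMMF-style
# fitting.  Requires cvxpy (solved here with its default conic solver).
import cvxpy as cp
import numpy as np

Y = np.array([[ 1., -1.,  1.],
              [-1.,  1.,  1.],
              [ 1.,  1., -1.]])
S = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])          # 1 = observed entry
C = 1.0                               # trade-off between norm and margin violations

X = cp.Variable(Y.shape)
hinge = cp.pos(1 - cp.multiply(Y, X))                         # penalize Y_ij X_ij < 1
objective = cp.Minimize(cp.normNuc(X) + C * cp.sum(cp.multiply(S, hinge)))
cp.Problem(objective).solve()
print(np.round(X.value, 2))
```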

9 Outline. Three measures of matrix complexity; generalization error bounds; relationships between the three measures.

10 Generalization Error Bounds. D(X;Y) = |{ ij : X_ij Y_ij ≤ 0 }| / nm (generalization error). Prior work assumes a low-rank structure (eigengap): asymptotic behavior [Azar+01]; sample complexity and query strategy [Drineas+02]. [Diagram: hypothesis X, target Y, observed set S.]

11 Generalization Error Bounds. D(X;Y) = |{ ij : X_ij Y_ij ≤ 0 }| / nm (generalization error); D_S(X;Y) = |{ ij ∈ S : X_ij Y_ij ≤ 0 }| / |S| (empirical error). Goal: Pr_S( ∀X: D(X;Y) < D_S(X;Y) + ε ) > 1-δ, where X is the hypothesis, Y is the unknown, assumption-free target (source distribution), and S is the random training set of observed entries.
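A literal toy implementation of the two error measures (S as a boolean mask over the entries; function names are mine):

```python
import numpy as np

def D(X, Y):
    """Generalization error: fraction of all entries with X_ij * Y_ij <= 0."""
    return np.mean(X * Y <= 0)

def D_S(X, Y, S):
    """Empirical error: fraction of observed entries (boolean mask S) with X_ij * Y_ij <= 0.
    The following slides use the margin version, (X * Y)[S] < 1, instead."""
    return np.mean((X * Y)[S] <= 0)
```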

12 Generalization Error Bounds: Learning with Low Rank Matrices. D(X;Y) = |{ ij : X_ij Y_ij ≤ 0 }| / nm (generalization error); D_S(X;Y) = |{ ij ∈ S : X_ij Y_ij < 1 }| / |S| (empirical error); goal: Pr_S( ∀X: D(X;Y) < D_S(X;Y) + ε ) > 1-δ for the class { X : rank(X) ≤ k }. Only signs matter, so this is equivalent to using the class { A ∈ ±1^{n×m} : A = sign(X), rank(X) ≤ k } = { A ∈ ±1^{n×m} : dc(A) ≤ k }. This is a finite class of size |{ A ∈ ±1^{n×m} : dc(A) ≤ k }| ≤ (8em/k)^{k(n+m)}, and a union bound yields a generalization bound with ε = √[ ( k(n+m) log(8em/k) + log(1/δ) ) / (2|S|) ] [S, Alon, Jaakkola 04]. [Ben-David, Eiron, Simon 01]: most concept classes have high dc(A), i.e. cannot be embedded as low-dimensional linear classifiers.

13 Generalization Error Bounds: Learning with Low Trace-Norm Matrices. (Error definitions and the PAC statement as before, now for the class { X : ‖X‖²_Σ ≤ nmR² }.) ‖X‖_Σ = min_{X=UV} ‖U‖_Fro ‖V‖_Fro = Σ (singular values of X): writing X = Σ_i s_i u_i v_i′ with Σ_i s_i = ‖X‖_Σ expresses X as a combination of outer products of norm-1 vectors, i.e. rank-1, norm-1 matrices. Hence { X : ‖X‖²_Σ ≤ nmR² } = √(nm) R · convex-hull( { uv′ : u ∈ ℝⁿ, v ∈ ℝᵐ, ‖u‖ = ‖v‖ = 1 } ). [Diagram: X = U S V with unit-norm columns of U and unit-norm rows of V.]

14 Generalization Error Bounds: Learning with Low Trace-Norm Matrices. (Definitions as before; { X : ‖X‖²_Σ ≤ nmR² } = √(nm) R · convex-hull( { uv′ : u ∈ ℝⁿ, v ∈ ℝᵐ, ‖u‖ = ‖v‖ = 1 } ).) Plan: the Rademacher complexity of { uv′ : u ∈ ℝⁿ, v ∈ ℝᵐ, ‖u‖ = ‖v‖ = 1 } gives the Rademacher complexity of { X : ‖X‖²_Σ ≤ nmR² } (convex hull and scaling), which gives generalization error bounds for { X : ‖X‖²_Σ ≤ nmR² }. The relevant quantity is E_S E_{random signs σ}[ sup_{X=uv′} Σ_{s∈S} σ_s X(s) ] / |S|.

15 Generalization Error Bounds: Learning with Low Trace-Norm Matrices. (Definitions as before.) E_S E_{random signs σ}[ sup_{X=uv′} Σ_{ij∈S} σ_ij X_ij ] / |S| = E[ ‖σ_S‖₂ ] / |S|, the expected spectral norm of the random sign matrix supported on S; bounding it with the [Seginer00] bound on the singular values of a random matrix gives ε = K R √[ ( (n+m) log n + log(1/δ) ) / |S| ].
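The key identity behind this slide is that the supremum over unit-norm rank-one matrices is a spectral norm of a random sign matrix. A Monte-Carlo sketch (toy sizes; names are mine):

```python
# Empirical Rademacher-complexity term for the class {u v^T : ||u|| = ||v|| = 1}:
#   sup over the class of sum_{ij in S} sigma_ij u_i v_j = ||sigma restricted to S||_2
# (the spectral norm), which is what the Seginer bound controls.  Scaling the
# class by sqrt(nm) R then gives the trace-norm ball's complexity.
import numpy as np

def rademacher_rank_one(S_mask, trials=200, seed=0):
    rng = np.random.default_rng(seed)
    S = S_mask.sum()
    total = 0.0
    for _ in range(trials):
        sigma = rng.choice([-1.0, 1.0], size=S_mask.shape) * S_mask
        total += np.linalg.norm(sigma, 2)      # largest singular value
    return total / (trials * S)

n = m = 60
S_mask = (np.random.default_rng(1).random((n, m)) < 0.3).astype(float)
print(rademacher_rank_one(S_mask))
```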

16 Generalization Error Bounds: Learning with Low Max-Norm Matrices. (Definitions as before.) The class is { X : ‖X‖_max ≤ R } = R · convex-hull( ??? ): we have conv( { uv′ : u ∈ ±1ⁿ, v ∈ ±1ᵐ } ) ⊆ { X : ‖X‖_max ≤ 1 } ⊆ 1.79 · conv( { uv′ : u ∈ ±1ⁿ, v ∈ ±1ᵐ } ) by Grothendieck's inequality. This gives ε = √[ ( 12 R²(n+m) + log(1/δ) ) / |S| ].

17 Generalization Error Bounds: Low Trace-Norm, Max-Norm or Rank. D(X;Y) = |{ ij : X_ij Y_ij ≤ 0 }| / nm (generalization error); D_S(X;Y) = |{ ij ∈ S : X_ij Y_ij < 1 }| / |S| (empirical error); Pr_S( ∀X: D(X;Y) < D_S(X;Y) + ε ) > 1-δ. For { X ∈ ℝ^{n×m} : rank(X) ≤ k }, with dc(A) = min{ rank(X) : A_ij X_ij > 0 }: ε = √[ ( k(n+m) log(8en/k) + log(1/δ) ) / (2|S|) ]. For { X ∈ ℝ^{n×m} : ‖X‖²_Σ/nm ≤ R² }, with ac²(A) = min{ ‖X‖²_Σ/nm : A_ij X_ij ≥ 1 }: ε = K R √[ ( (n+m) log n + log(1/δ) ) / |S| ]. For { X ∈ ℝ^{n×m} : ‖X‖²_max ≤ R² }, with mc²(A) = min{ ‖X‖²_max : A_ij X_ij ≥ 1 }: ε = √[ ( 12 R²(n+m) + log(1/δ) ) / |S| ].
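To get a feel for how the three bounds compare, a small script plugging toy numbers into the formulas as reconstructed above (the constant K is unspecified on the slide and set to 1 here purely for illustration; function names are mine):

```python
import numpy as np

def eps_rank(n, m, k, S, delta):
    return np.sqrt((k * (n + m) * np.log(8 * np.e * n / k) + np.log(1 / delta)) / (2 * S))

def eps_trace(n, m, R, S, delta, K=1.0):       # K: unspecified universal constant
    return K * R * np.sqrt(((n + m) * np.log(n) + np.log(1 / delta)) / S)

def eps_max(n, m, R, S, delta):
    return np.sqrt((12 * R**2 * (n + m) + np.log(1 / delta)) / S)

n = m = 1000; S = 200_000; delta = 0.01
print(eps_rank(n, m, k=10, S=S, delta=delta),
      eps_trace(n, m, R=2.0, S=S, delta=delta),
      eps_max(n, m, R=2.0, S=S, delta=delta))
```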

18 Relationship between average margin complexity, max margin complexity and dimensional complexity. ac²(A) (average margin) ≤ mc²(A) (maximum margin) ≤ dc(A) ≤ 10 mc²(A) log(3nm). The last inequality comes from randomly projecting a high-dimensional large-margin arrangement to obtain a low-dimensional arrangement [Arriaga, Vempala 99]. Reverse inequalities? Gaps?

19 Between the Rank and the Max-Norm. Low margin complexity ⇒ low dimensional complexity: dc(A) ≤ 10 mc²(A) log(3nm). Counting: ½(k-1)n log(m) ≤ log |{ A ∈ ±1^{n×m} : dc(A) ≤ k }| ≤ k(n+m) log(8en/k), and R² n log(n/R) ≤ log |{ A ∈ ±1^{n×m} : mc(A) ≤ R }| ≤ 10 R²(n+m) log(3nm) log(n/R). Low dimensional complexity ⇒ low margin complexity? No: there exist A with ( dc(A) log(n) )^p ≤ mc²(A). Examples: A_ij = Parity(bits(i) & bits(j)); mc²(T_n) = Θ(log n) while dc(T_n) = 2 [Ben-David, Eiron, Simon 01]; mc²(H_n) = n while n^0.5 ≤ dc(H_n) < n^0.8 [Forster et al 02, 03]; Kronecker exponent of triangular matrices.
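The T_n example cited here is easy to verify numerically: the "greater-than" sign matrix has dimensional complexity 2 even though it has full rank as a ±1 matrix. A small sketch (the witness X_ij = i - j + ½ is one standard choice, not necessarily the one used in the reference):

```python
import numpy as np

n = 8
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
T = np.where(i >= j, 1, -1)          # triangular +/-1 matrix T_n
X = i - j + 0.5                      # X = i*1' + 1*(0.5 - j)': sum of two rank-one matrices

assert np.linalg.matrix_rank(X) == 2          # so dc(T_n) <= 2
assert np.all(np.sign(X) == T)                # and X is sign-consistent with T_n
print("rank of T_n itself:", np.linalg.matrix_rank(T))   # full rank n
```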

20 Between the Rank and the Max-Norm. Low margin complexity ⇒ low dimensional complexity: dc(A) ≤ 10 mc²(A) log(3nm); ½(k-1)n log(m) ≤ log |{ A ∈ ±1^{n×m} : dc(A) ≤ k }| ≤ k(n+m) log(8en/k); R² n log(n/R) ≤ log |{ A ∈ ±1^{n×m} : mc(A) ≤ R }| ≤ 10 R²(n+m) log(3nm) log(n/R). Low dimensional complexity ⇒ low margin complexity: no (previous slide). Low max-norm ⇒ a similar matrix with low rank: rank(X′) ≤ (9/ε²) ‖X‖²_max log(3nm) for some X′ with ‖X′ - X‖_∞ < ε; in particular mc(A) ≤ M implies dc(A) ≤ k = 10 M² log(3nm). Open: does low dc(A) imply low mc²(A′) for some A′ ≈ A? Does low rank(X) imply low ‖X′‖²_max for some X′ with ‖X′ - X‖_∞ ≤ 1?
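The low-max-norm ⇒ low-rank step can be sketched with a Gaussian random projection of the two factors (toy sizes and names of mine; the slide's constants are not reproduced, only the qualitative behaviour):

```python
# Given X = U V^T with bounded row norms (so bounded ||X||_max), projecting U
# and V to d dimensions gives X' of rank <= d with entrywise error that shrinks
# like sqrt(log(nm)/d), matching rank ~ (||X||_max / eps)^2 * log(nm).
import numpy as np

rng = np.random.default_rng(0)
n, m, D, d = 60, 60, 1000, 200
U = rng.standard_normal((n, D)) / np.sqrt(D)      # row norms ~ 1
V = rng.standard_normal((m, D)) / np.sqrt(D)
X = U @ V.T

P = rng.standard_normal((D, d)) / np.sqrt(d)      # random projection; E[(UP)(VP)^T] = X
X_prime = (U @ P) @ (V @ P).T

max_norm_bound = np.linalg.norm(U, axis=1).max() * np.linalg.norm(V, axis=1).max()
print("rank(X') <=", d,
      " ||X||_max <=", round(max_norm_bound, 2),
      " ||X' - X||_inf =", round(np.abs(X_prime - X).max(), 3))
```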

21 Between the Trace-Norm and Max-Norm. ac²(A) ≤ mc²(A). [Diagram: an n×n sign matrix containing an arbitrary ("anything") r×r block.] For such a matrix, ac²(A) < 2r³/n² + 2.

22 Between the Trace-Norm and Max-Norm. ac²(A) ≤ mc²(A); for an n×n sign matrix with an arbitrary r×r block, ac²(A) < 2r³/n² + 2. For r = n^{2/3} we can get ac²(A) < 4 while mc²(A) > n^{2/3} and dc²(A) > n^{1/3}.

23 Random Sampling Assumption for the Generalization Error Bounds. Here D(X;Y) = Pr_ij( X_ij Y_ij ≤ 0 ) and D_S(X;Y) = |{ ij ∈ S : X_ij Y_ij < 1 }| / |S|, with Pr_S( ∀X: D(X;Y) < D_S(X;Y) + ε ) > 1-δ; Y is unknown and assumption-free, S is random. The trace-norm class { X ∈ ℝ^{n×m} : ‖X‖²_Σ/nm ≤ R² }, with ε = K R √[ ( (n+m) log n + log(1/δ) ) / |S| ], requires uniform sampling of the entries. The max-norm class { X ∈ ℝ^{n×m} : ‖X‖²_max ≤ R² }, with ε = √[ ( 12 R²(n+m) + log(1/δ) ) / |S| ], and the rank class { X ∈ ℝ^{n×m} : rank(X) ≤ k }, with ε = √[ ( k(n+m) log(8en/k) + log(1/δ) ) / (2|S|) ], apply to any distribution over index pairs ij (over which entries are observed).

24 Between the Trace-Norm and Max-Norm. ac²(A) ≤ mc²(A); for an n×n sign matrix with an arbitrary r×r block, ac²(A) < 2r³/n² + 2. For r = n^{2/3} we can get ac²(A) < 4 while mc²(A) > n^{2/3} and dc²(A) > n^{1/3}. This is the largest gap possible: mc²(A) ≤ 9 ( ac²(A) · n² )^{1/3}. Using similar techniques: (R^{4/3} - 2) n^{4/3} ≤ log |{ A ∈ ±1^{n×m} : ac(A) ≤ R }| ≤ 7 R^{2/3} n^{5/3} log(3nm) log(n/R²).

25 Rank, Max-Norm and Trace-Norm as complexity measures for fitting a data matrix. rank(X) = min{ dim(U,V) : X = UV } and dc(A) = min{ rank(X) : A_ij X_ij > 0 }; ‖X‖_max = min_{X=UV} (max_i ‖U_i‖)(max_j ‖V_j‖) and mc(A) = min{ ‖X‖_max : A_ij X_ij ≥ 1 }; ‖X‖_Σ = min_{X=UV} ‖U‖_Fro ‖V‖_Fro and ac(A) = min{ ‖X‖_Σ/√(nm) : A_ij X_ij ≥ 1 }. (dc(A) and mc(A) were previously studied as embeddability of a concept class.) The generalization error bounds scale with k = rank(X), R² = ‖X‖²_max, or R² = ‖X‖²_Σ/nm. Relationships: dc(A) ≤ 10 mc²(A) log(3nm), but there exist A with ( dc(A) log(n) )^p < ac²(A) ≤ mc²(A); ac²(A) ≤ mc²(A) ≤ 9 ( ac²(A) · n² )^{1/3}, and this is tight. Open issues: does low dc(A) imply low mc²(A′) for some A′ ≈ A? Does low rank(X) imply low ‖X′‖²_max for some X′ with ‖X′ - X‖_∞ ≤ 1? Is dc(A) = O(mc²(A))? Also: tighten the estimate of |{ A ∈ ±1^{n×m} : mc(A) ≤ R }| (median mc(A)?) and tighten the polynomial gap in the bounds on log |{ A ∈ ±1^{n×m} : ac(A) ≤ R }|.
