Rank, Trace-Norm & Max-Norm
1 Rank, Trace-Norm & Max-Norm as measures of matrix complexity. Nati Srebro (University of Toronto), Adi Shraibman (Hebrew University)
2 Matrix Learning. Fit a (partially) observed target matrix Y (e.g. a users × movies ratings matrix with most entries unknown) with a matrix X from a hypothesis class, as a model of the data. Uses: reconstructing a latent signal, e.g. gene expression driven by biological processes; capturing structure in a corpus of documents, images, etc. (topics, etc.); prediction, i.e. collaborative filtering of movie ratings. Hypothesis classes: Low Rank { X : rank(X) ≤ k } (a low-dimensional factorization); Low Trace-Norm { X : ‖X‖_Σ ≤ B } and Low Max-Norm { X : ‖X‖_max ≤ B } (low-norm factorizations, MMMF [NIPS 04]). In this talk: the three hypothesis classes (measures of matrix complexity); generalization error bounds for predicting unobserved entries; relationships between the three measures / hypothesis classes.
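As a concrete illustration of this setup, here is a minimal sketch of fitting a partially observed matrix with a low-dimensional factorization X = UV′. All data, sizes, the squared-error loss, and the step size are hypothetical illustration (MMMF itself uses a hinge loss and norm regularization):

```python
import numpy as np

# Minimal sketch: fit a partially observed matrix Y with X = U V' of rank k.
# Data, sizes, step size, and the squared-error loss are all illustrative.
rng = np.random.default_rng(0)
n, m, k = 30, 40, 3
Y = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))  # latent low-rank signal
S = rng.random((n, m)) < 0.3                                   # mask of observed entries

U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((m, k))
for _ in range(10000):
    R = S * (U @ V.T - Y)       # residual, restricted to observed entries
    U -= 0.01 * (R @ V)         # gradient step on U
    V -= 0.01 * (R.T @ U)       # gradient step on V

X = U @ V.T
print(f"mean |X-Y| on observed:   {np.abs(X - Y)[S].mean():.3f}")
print(f"mean |X-Y| on unobserved: {np.abs(X - Y)[~S].mean():.3f}")
```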
3 Learning with Low Rank Matrices. X = UV′ of rank k: { X : rank(X) ≤ k } = { UV′ : U ∈ ℝ^{n×k}, V ∈ ℝ^{m×k} }. For binary target matrices only the signs matter, A = sign(X). Dimensional complexity of a binary matrix A: dc(A) = min { rank(X) : sign(X) = A }.
4 Learning with Low Rank Matrices: Geometric Interpretation. In X = UV′, the rows of U are data points and the columns of V′ are hypotheses (linear separators). So dc(A) = min { rank(X) : sign(X) = A } asks: what is the minimum dimension into which a concept class can be embedded as linear separators?
5 Max-Margin Matrix Factorization: bound the norms of U and V instead of their dimensionality (low-norm U, V). Bound the norms uniformly: (max_i ‖U_i‖²)(max_j ‖V_j‖²) ≤ R². For each observed Y_ij ∈ ±1, require a margin on X_ij = ⟨U_i, V_j⟩: Y_ij X_ij ≥ margin.
6 Max-Margin Matrix Factorization (cont.). Bound the norms uniformly: (max_i ‖U_i‖²)(max_j ‖V_j‖²) ≤ R², giving the max-norm ‖X‖_max = min_{X=UV′} (max_i ‖U_i‖)(max_j ‖V_j‖). Or bound the norms on average: (Σ_i ‖U_i‖²)(Σ_j ‖V_j‖²) ≤ nm R², giving the trace-norm ‖X‖_Σ = min_{X=UV′} ‖U‖_Fro ‖V‖_Fro. For each observed Y_ij ∈ ±1, require Y_ij X_ij ≥ 1 with X_ij = ⟨U_i, V_j⟩. Optimizing V for a fixed U, each column of V is an independent SVM.
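The last remark can be made concrete: with U held fixed and hinge loss on the observed entries, each column V_j of the Frobenius-regularized (trace-norm flavor) problem solves an independent linear SVM whose training examples are the rows U_i observed in column j, labeled by Y_ij. A small sketch with scikit-learn as the SVM solver (an assumed dependency; data, sizes, and C are hypothetical):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Sketch: with U fixed, each column V_j of the factorization is an SVM.
# Training examples for column j: the rows U_i with (i, j) observed,
# labeled by Y_ij. Data, sizes, and C are illustrative.
rng = np.random.default_rng(1)
n, m, k = 60, 8, 4
U = rng.standard_normal((n, k))
Y = np.sign(rng.standard_normal((n, m)))      # ±1 target matrix
S = rng.random((n, m)) < 0.6                  # observed entries

V = np.zeros((m, k))
for j in range(m):
    rows = np.where(S[:, j])[0]
    svm = LinearSVC(loss="hinge", fit_intercept=False, C=1.0, max_iter=10000)
    svm.fit(U[rows], Y[rows, j])              # hinge loss on Y_ij <U_i, V_j>
    V[j] = svm.coef_.ravel()

X = U @ V.T
print(f"sign agreement on observed entries: {(np.sign(X) == Y)[S].mean():.2f}")
```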
7 Max-Margin Matrix Factorization (cont.). The corresponding complexity measures of a sign matrix A: margin complexity mc(A) = min { ‖X‖_max : A_ij X_ij ≥ 1 }, and average margin complexity ac(A) = min { ‖X‖_Σ / √(nm) : A_ij X_ij ≥ 1 }.
8 Three Measures of Matrix Complexity. Used for fitting observed data matrices: bound the dimensionality of U, V (rank; not convex), bound the norms uniformly (max-norm; convex, a semi-definite program [NIPS 04]), or bound the norms on average (trace-norm; convex). For matrices representing concept classes: dc(A) = min { rank(X) : A_ij X_ij > 0 } is the dimension required for embedding as linear classifiers; mc(A) and ac(A) are the inverse margin (1/margin) required for embedding as linear classifiers.
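The convexity claim can be illustrated directly. Below is a sketch (cvxpy as the modeling layer and its bundled conic solver are assumed dependencies; the example matrix is hypothetical) computing mc(A) through the standard PSD characterization of the max-norm: ‖X‖_max ≤ t iff some PSD matrix W = [[P, X], [X′, Q]] has all diagonal entries ≤ t.

```python
import cvxpy as cp
import numpy as np

# Sketch: margin complexity mc(A) = min ||X||_max s.t. A_ij X_ij >= 1,
# as a semi-definite program. Uses the characterization
#   ||X||_max <= t  iff  exists PSD W = [[P, X], [X', Q]] with diag(W) <= t.
n = 6
A = np.sign(np.triu(np.ones((n, n))) - 0.5)      # upper-triangular ±1 matrix T_n

W = cp.Variable((2 * n, 2 * n), symmetric=True)  # W = [[P, X], [X', Q]]
X = W[:n, n:]                                    # the off-diagonal block is X
t = cp.Variable()
prob = cp.Problem(cp.Minimize(t),
                  [W >> 0,                       # PSD, i.e. W = [U;V][U;V]'
                   cp.diag(W) <= t,              # ||U_i||^2 <= t, ||V_j||^2 <= t
                   cp.multiply(A, X) >= 1])      # margin: A_ij X_ij >= 1
prob.solve()
print(f"mc(A) = {prob.value:.3f}")
```

Swapping the objective for cp.trace(W) / 2, i.e. (tr(P) + tr(Q))/2, would instead minimize the trace-norm of X, the other convex class on this slide.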
9 Outline. Three measures of matrix complexity; generalization error bounds; relationships between the three measures.
10 Generalization Error Bounds. Generalization error: D(X;Y) = |{ ij : X_ij Y_ij ≤ 0 }| / nm. Prior work assumes a low-rank structure (eigengap) in Y: asymptotic behavior [Azar+01]; sample complexity and query strategy [Drineas+02].
11 Generalization Error Bounds (cont.). Generalization error D(X;Y) = |{ ij : X_ij Y_ij ≤ 0 }| / nm; empirical error D_S(X;Y) = |{ ij ∈ S : X_ij Y_ij ≤ 0 }| / |S|. Here Y is the source matrix (unknown, assumption-free), S is a random training set of observed entries, and X is the hypothesis. Goal: Pr_S( ∀X: D(X;Y) < D_S(X;Y) + ε ) > 1 − δ.
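In code, the two error measures are just entrywise disagreement rates; a tiny sketch (X, Y, and the sample S are all hypothetical illustration):

```python
import numpy as np

# Sketch of the two error measures: D(X;Y) counts disagreements over all
# nm entries, D_S(X;Y) only over the random training set S.
rng = np.random.default_rng(5)
n, m = 50, 60
Y = np.sign(rng.standard_normal((n, m)))                     # target signs
X = Y * rng.choice([1.0, -1.0], size=(n, m), p=[0.8, 0.2])   # 20% sign errors
S = rng.random((n, m)) < 0.1                                 # observed entries

D = (X * Y <= 0).mean()              # generalization error over all entries
D_S = ((X * Y)[S] <= 0).mean()       # empirical error over the sample S
print(f"D = {D:.3f}, D_S = {D_S:.3f}")
```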
12 Generalization Error Bounds: Learning with Low Rank Matrices. Empirical margin error D_S(X;Y) = |{ ij ∈ S : X_ij Y_ij < 1 }| / |S|; hypothesis class 𝒳 = { X : rank(X) ≤ k }. Only signs matter, so this is equivalent to using the class { A ∈ ±1^{n×m} : A = sign(X), rank(X) ≤ k } = { A ∈ ±1^{n×m} : dc(A) ≤ k }, a finite class of size |{ A ∈ ±1^{n×m} : dc(A) ≤ k }| ≤ (8em/k)^{k(n+m)}. A union bound then yields a generalization bound with ε = √( ( k(n+m) log(8em/k) + log(1/δ) ) / (2|S|) ) [S, Alon, Jaakkola 04]. [Ben-David, Eiron, Simon 01]: most concept classes have high dc(A), i.e. cannot be embedded as low-dimensional linear classifiers.
13 Generalization Error Bounds: Learning with Low Trace-Norm Matrices. 𝒳 = { X : ‖X‖²_Σ ≤ nm R² }. The trace-norm ‖X‖_Σ = min_{X=UV′} ‖U‖_Fro ‖V‖_Fro = Σ (singular values of X): writing X = Σ_i s_i u_i v_i′ with unit vectors u_i, v_i gives Σ_i s_i = ‖X‖_Σ, and each outer product u v′ of norm-1 vectors is a rank-1, norm-1 matrix. Hence 𝒳 = √(nm) R · convex-hull( { u v′ : u ∈ ℝⁿ, v ∈ ℝᵐ, ‖u‖ = ‖v‖ = 1 } ).
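A quick numeric check of the two equivalent definitions (random matrix, hypothetical sizes): the sum of singular values matches ‖U‖_Fro ‖V‖_Fro for the balanced factorization read off the SVD.

```python
import numpy as np

# Check: ||X||_Sigma = sum of singular values, attained by the balanced
# factorization U = U_s sqrt(S), V = V_s sqrt(S), so that X = U V'.
rng = np.random.default_rng(2)
X = rng.standard_normal((5, 7))

Us, s, Vt = np.linalg.svd(X, full_matrices=False)
U = Us * np.sqrt(s)              # scale the left singular vectors by sqrt(s_i)
V = Vt.T * np.sqrt(s)            # and the right ones likewise
assert np.allclose(U @ V.T, X)
print(s.sum(), np.linalg.norm(U) * np.linalg.norm(V))   # both equal ||X||_Sigma
```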
14 Generalization Error Bounds: Learning with Low Trace-Norm Matrices (cont.). Plan: Rademacher complexity of { u v′ : u ∈ ℝⁿ, v ∈ ℝᵐ, ‖u‖ = ‖v‖ = 1 } ⇒ Rademacher complexity of { X : ‖X‖²_Σ ≤ nm R² } (taking the convex hull does not change it) ⇒ generalization error bounds for { X : ‖X‖²_Σ ≤ nm R² }. The Rademacher complexity is E_S E_σ [ sup_{X=uv′} Σ_{s ∈ S} σ_s X_s ] / |S|, over random signs σ.
15 Generalization Error Bounds: Learning with Low Trace-Norm Matrices (cont.). E_S E_σ [ sup_{X=uv′} Σ_{ij ∈ S} σ_ij X_ij ] / |S| = E[ ‖σ‖₂ ] / |S|, the expected spectral norm of the random sign matrix supported on S. Bounding this with [Seginer 00]'s bound on the singular values of a random matrix yields ε = K R √( ( (n+m) log n + log(1/δ) ) / |S| ).
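The reduction can be seen numerically: for the rank-1 unit-norm class, sup_{u,v} Σ_{ij∈S} σ_ij u_i v_j = uᵀMv maximized over unit vectors, i.e. the spectral norm of the matrix M carrying the signs σ on S and zeros elsewhere. A Monte-Carlo sketch (fully observed S for simplicity; sizes hypothetical) of the √n + √m scaling that the Seginer-type bound captures:

```python
import numpy as np

# Monte-Carlo sketch: the expected spectral norm of a random ±1 matrix
# scales like sqrt(n) + sqrt(m), the quantity the trace-norm bound rests on.
rng = np.random.default_rng(3)
n, m, trials = 100, 150, 20

norms = [np.linalg.norm(rng.choice([-1.0, 1.0], size=(n, m)), 2)  # spectral norm
         for _ in range(trials)]
print(f"E||sigma||_2 ~ {np.mean(norms):.1f}   sqrt(n)+sqrt(m) = {np.sqrt(n) + np.sqrt(m):.1f}")
```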
16 Generalization Error Bounds: Learning with Low Max-Norm Matrices. 𝒳 = { X : ‖X‖_max ≤ R } = R · convex-hull( ??? ). By Grothendieck's inequality: conv( { u v′ : u ∈ ±1ⁿ, v ∈ ±1ᵐ } ) ⊆ { X : ‖X‖_max ≤ 1 } ⊆ 1.79 · conv( { u v′ : u ∈ ±1ⁿ, v ∈ ±1ᵐ } ). This yields ε = √( ( 12 R² (n+m) + log(1/δ) ) / |S| ).
17 Generalization Error Bounds: Low Rank, Trace-Norm or Max-Norm. With D(X;Y) and D_S(X;Y) as above, Pr_S( ∀X: D(X;Y) < D_S(X;Y) + ε ) > 1 − δ for:
𝒳 = { X ∈ ℝ^{n×m} : rank(X) ≤ k }, dc(A) = min { rank(X) : A_ij X_ij > 0 }, ε = √( ( k(n+m) log(8em/k) + log(1/δ) ) / (2|S|) );
𝒳 = { X ∈ ℝ^{n×m} : ‖X‖²_Σ/nm ≤ R² }, ac²(A) = min { ‖X‖²_Σ/nm : A_ij X_ij ≥ 1 }, ε = K R √( ( (n+m) log n + log(1/δ) ) / |S| );
𝒳 = { X ∈ ℝ^{n×m} : ‖X‖²_max ≤ R² }, mc²(A) = min { ‖X‖²_max : A_ij X_ij ≥ 1 }, ε = √( ( 12 R²(n+m) + log(1/δ) ) / |S| ).
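For a feel of the three ε's as reconstructed above, one can plug in illustrative numbers (all parameter values are hypothetical; K is the unspecified universal constant from the trace-norm bound, set to 1 here purely to compare scalings):

```python
import numpy as np

# Evaluate the three slide bounds at illustrative (hypothetical) values.
n = m = 1000
S = 200_000                       # |S|, number of observed entries
delta, k, R, K = 0.01, 5, 2.0, 1.0

eps_rank = np.sqrt((k * (n + m) * np.log(8 * np.e * m / k) + np.log(1 / delta)) / (2 * S))
eps_trace = K * R * np.sqrt(((n + m) * np.log(n) + np.log(1 / delta)) / S)
eps_max = np.sqrt((12 * R**2 * (n + m) + np.log(1 / delta)) / S)
print(f"rank: {eps_rank:.3f}   trace-norm: {eps_trace:.3f}   max-norm: {eps_max:.3f}")
```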
18 Relationship between Average Margin Complexity, Max Margin Complexity and Dimensional Complexity. ac²(A) ≤ mc²(A), and dc(A) ≤ 10 mc²(A) log(3nm): randomly project a high-dimensional large-margin arrangement to obtain a low-dimensional arrangement [Arriaga, Vempala 99] (see the sketch below). Reverse inequalities? Gaps?
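A sketch of that random-projection step (sizes and the margin threshold are hypothetical illustration): project both factors of X = UV′ through a shared Gaussian map G, giving a matrix of rank at most d whose entries, and hence large-margin signs, are approximately preserved.

```python
import numpy as np

# Sketch of [Arriaga, Vempala 99]-style projection: X = U V' with k latent
# dimensions is replaced by (U G)(V G)' of rank <= d. Inner products are
# preserved up to ~1/sqrt(d), so entries with a real margin keep their sign.
rng = np.random.default_rng(4)
n, m, k, d = 200, 200, 50, 40
U = rng.standard_normal((n, k)) / np.sqrt(k)   # rows of roughly unit norm
V = rng.standard_normal((m, k)) / np.sqrt(k)
X = U @ V.T

G = rng.standard_normal((k, d)) / np.sqrt(d)   # Gaussian projection
Xp = (U @ G) @ (V @ G).T                       # rank <= d

agree = (np.sign(Xp) == np.sign(X))
margin = np.abs(X) > 0.3                       # entries with a clear margin
print(f"sign agreement overall: {agree.mean():.3f}, "
      f"on large-margin entries: {agree[margin].mean():.3f}")
```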
19 Between the Rank and the Max-Norm. Low margin complexity ⇒ low dimensional complexity: dc(A) ≤ 10 mc²(A) log(3nm). Counting bounds: ½ (k−1) n log m ≤ log |{ A ∈ ±1^{n×m} : dc(A) ≤ k }| ≤ k(n+m) log(8em/k), and R² n log(n/R) ≤ log |{ A ∈ ±1^{n×m} : mc(A) ≤ R }| ≤ 10 R² (n+m) log(3nm) log(n/R). Low dimensional complexity ⇏ low margin complexity: there exist A with dc(A) ≤ log n but much larger mc²(A). Examples: for triangular matrices, mc²(T_n) = Θ(log n) while dc(T_n) = 2 [Ben-David, Eiron, Simon 01] (via the Kronecker exponent of triangular matrices); for A_ij = Parity(bits(i) & bits(j)), i.e. the Sylvester-Hadamard matrix H_n, mc²(H_n) = n while n^0.5 ≤ dc(H_n) < n^0.8 [Forster et al 02, 03].
20 Between the Rank and the Max-Norm (cont.). Low max-norm ⇒ a similar matrix with low rank: for any ε > 0 there is X′ with ‖X′ − X‖_∞ < ε and rank(X′) ≤ (9/ε²) ‖X‖²_max log(3nm); in particular mc(A) ≤ M gives dc(A) ≤ k = 10 M² log(3nm). Open: does low dc(A) imply low mc²(A′) for some A′ ≈ A? Does low rank(X) imply low ‖X′‖²_max for some X′ with ‖X′ − X‖_∞ ≤ 1?
21 Between the Trace-Norm and Max-Norm: ac²(A) vs mc²(A). [Diagram: an n×n sign matrix containing an arbitrary ("anything") r×n block.] For such matrices, ac²(A) < 2r³/n² + 2.
22 Between the Trace-Norm and Max-Norm (cont.). Since ac²(A) < 2r³/n² + 2, taking r = n^{2/3} we can get ac²(A) < 4 while mc²(A) > n^{2/3} and dc(A) > n^{1/3}.
23 Random Sampling Assumption for the Generalization Error Bounds. Now D(X;Y) = Pr_ij( X_ij Y_ij ≤ 0 ) under a source distribution over index pairs, from which the random training set S is drawn; as before, Pr_S( ∀X: D(X;Y) < D_S(X;Y) + ε ) > 1 − δ with D_S(X;Y) = |{ ij ∈ S : X_ij Y_ij < 1 }| / |S|, and Y is unknown and assumption-free. The trace-norm bound, ε = K R √( ( (n+m) log n + log(1/δ) ) / |S| ) for 𝒳 = { X ∈ ℝ^{n×m} : ‖X‖²_Σ/nm ≤ R² }, requires uniform sampling of entries. The max-norm bound, ε = √( ( 12 R²(n+m) + log(1/δ) ) / |S| ) for 𝒳 = { X : ‖X‖²_max ≤ R² }, and the rank bound, ε = √( ( k(n+m) log(8em/k) + log(1/δ) ) / (2|S|) ) for 𝒳 = { X : rank(X) ≤ k }, apply to any distribution over index pairs ij (i.e. over which entries are observed).
24 Between the Trace-Norm and Max-Norm (cont.). The r = n^{2/3} construction, with ac²(A) < 4 but mc²(A) > n^{2/3}, exhibits the largest gap possible: mc²(A) ≤ 9 ( ac²(A) · nm )^{1/3}. Using similar techniques: (R^{4/3} − 2) n^{4/3} ≤ log |{ A ∈ ±1^{n×m} : ac(A) ≤ R }| ≤ 7 R^{2/3} n^{5/3} log(3nm) log(n/R²).
25 Rank, Max-Norm and Trace-Norm as complexity measures for fitting a data matrix.
rank(X) = min dim(U,V) s.t. X = UV′; dc(A) = min { rank(X) : A_ij X_ij > 0 }.
‖X‖_max = min_{X=UV′} (max_i ‖U_i‖)(max_j ‖V_j‖); mc(A) = min { ‖X‖_max : A_ij X_ij ≥ 1 }.
‖X‖_Σ = min_{X=UV′} ‖U‖_Fro ‖V‖_Fro; ac(A) = min { ‖X‖_Σ/√(nm) : A_ij X_ij ≥ 1 }.
(dc(A) and mc(A) were previously studied as embeddability of a concept class.)
Generalization error bounds scale with k = rank(X), R² = ‖X‖²_max, or R² = ‖X‖²_Σ/nm.
Relationships: dc(A) ≤ 10 mc²(A) log(3nm), but there exist A with dc(A) ≤ log n and much larger ac²(A) ≤ mc²(A); ac²(A) ≤ mc²(A) ≤ 9( ac²(A) · nm )^{1/3}, and this is tight.
Open issues: does low dc(A) imply low mc²(A′) for some A′ ≈ A? Does low rank(X) imply low ‖X′‖²_max for some X′ with bounded ‖X′ − X‖_∞ ≤ 1? Is dc(A) = O(mc²(A))? Also: tighten the estimate of |{ A ∈ ±1^{n×m} : mc(A) ≤ R }|; what is the median mc(A)? Tighten the polynomial gap in the bounds on log |{ A ∈ ±1^{n×m} : ac(A) ≤ R }|.