Efficient Spectral Methods for Learning Mixture Models


1 Efficient Spectral Methods for Learning Mixture Models. Qingqing Huang, February 2016, Laboratory for Information & Decision Systems. Based on joint work with Munther Dahleh, Rong Ge, Sham Kakade, and Greg Valiant.

2 2 Learning. Data generation: $\theta \to X(\theta)$. Learning: infer the underlying rule $\theta$ (estimation, approximation, property testing, optimization of $f(\theta)$). Challenge: exploit our prior on the structure of the underlying $\theta$ to design fast algorithms that use as little data $X$ as possible to achieve the target accuracy in learning $\theta$. The two costs are computational complexity and sample complexity, measured in terms of $\dim(\theta)$.

3 3 Learning Mixture Models. Samples: hidden variable $H \in \{1,\dots,K\}$, observed variables $X = (X_1, X_2, \dots, X_M) \in \mathcal{X}$. The marginal distribution of the observables is a superposition of simple distributions: $\Pr(X) = \sum_{k=1}^{K} \Pr(H = k)\,\Pr(X \mid H = k)$. The model parameters are $\theta$ = (number of mixture components, mixing weights, conditional probabilities). Given $N$ i.i.d. samples of the observable variables, estimate the model parameters $\hat\theta \approx \theta$.
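As a concrete illustration of this generative process, here is a minimal sketch (not from the talk) of sampling from a mixture model with a discrete observation space; the sizes, names, and distributions below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixture: K components, M observed variables, each over d symbols.
K, M, d, N = 3, 2, 5, 10_000
weights = rng.dirichlet(np.ones(K))            # Pr(H = k), the mixing weights
cond = rng.dirichlet(np.ones(d), size=(K, M))  # cond[k, m] = Pr(X_m = . | H = k)

def sample(n):
    """Draw n i.i.d. samples of the observed variables X = (X_1, ..., X_M)."""
    h = rng.choice(K, size=n, p=weights)       # hidden component per sample
    return np.stack([[rng.choice(d, p=cond[hi, m]) for m in range(M)] for hi in h])

X = sample(N)

# The marginal of each X_m is a superposition of the K conditional distributions.
empirical = np.bincount(X[:, 0], minlength=d) / N
print(empirical)
print(weights @ cond[:, 0])   # the mixture marginal; close to the empirical estimate
```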

4 4 Examples of Mixture Models. Gaussian mixtures (GMMs): cluster data points in space. Topic models (bag of words): topics generate the words in each document.

5 5 Examples of Mixture Models. Hidden Markov models (HMMs): the current state generates the past, current, and future observations. Super-resolution: point sources generate a superposition of complex sinusoids.

6 6 Learning Mixture Models. Samples: hidden variable $H \in \{1,\dots,K\}$, observed variables $X = (X_1, X_2, \dots, X_M) \in \mathcal{X}$. Given $N$ i.i.d. samples of the observable variables, estimate the model parameters $\hat\theta$. What do we know about sample complexity and computational complexity? There exist exponential lower bounds for sample complexity (worst-case analysis), and maximum likelihood estimation is non-convex optimization (EM is a heuristic).

7 7 Challenges in Learning Mixture Models. There exist exponential lower bounds for sample complexity (worst-case analysis): can we learn non-worst-case instances with provably efficient algorithms? Maximum likelihood estimation is non-convex optimization (EM is a heuristic): can we achieve statistical efficiency with tractable computation?

8 8 Contribution / Outline of the talk. There exist exponential lower bounds for sample complexity (worst-case analysis). Can we learn non-worst-case instances with provably efficient algorithms? ✓ Spectral algorithms for learning GMMs, HMMs, and super-resolution. ✓ Go beyond worst cases with randomness in the analysis and in the algorithm. Maximum likelihood estimation is non-convex optimization (EM is a heuristic). Can we achieve statistical efficiency with tractable computation? ✓ Denoising a low-rank probability matrix with linear sample complexity (minimax optimal).

9 9 Paradigm of Spectral Algorithms for Learning. Mixture models: a mixture of conditional distributions. With infinitely many samples ($N = \infty$) from $X(\theta)$: compute statistics, e.g. moments or marginal probabilities; these form a mixture of rank-one matrices/tensors, $M = A \otimes B \otimes C$; a spectral decomposition separates the mixture components. These are easy inverse problems.

10 Paradigm of Spectral Algorithms for Learning. With finitely many samples ($N$ finite) from $X(\theta)$: compute statistics to obtain $\hat M = \hat A \otimes \hat B \otimes \hat C$, an estimate of $M = A \otimes B \otimes C$, then solve the easy inverse problem by spectral decomposition. ✓ PCA, spectral clustering, and subspace system identification all fit into this paradigm. ✓ Problem-specific algorithm design (what statistics $M$ to use?) and analysis (is the spectral decomposition stable?).

11 11 PART 1 Provably efficient spectral algorithms for learning mixture models 1.1 Learn GMMs: Go beyond worst cases by smoothed analysis 1.2 Learn HMMs: Go beyond worst cases by generic analysis 1.3 Super-resolution: Go beyond worst cases by randomized algorithm

12 Learn GMMs: Setup. Cluster data points in $n$-dimensional space: $x \in \mathbb{R}^n$ is drawn from a mixture of $k$ multivariate Gaussians. Model parameters: weights $w_i$, means $\mu^{(i)}$, covariance matrices $\Sigma^{(i)}$; $x \sim \mathcal{N}(\mu^{(i)}, \Sigma^{(i)})$ with probability $w_i$, $i \in [k]$.
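A similarly minimal sketch (again illustrative, not code from the paper) of the GMM generative model in this setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Mixture of k multivariate Gaussians in n dimensions (the paper's regime has n = Omega(k^2)).
k, n, N = 3, 10, 5_000
w = rng.dirichlet(np.ones(k))                             # mixing weights w_i
mu = rng.normal(size=(k, n))                              # means mu^(i)
A = rng.normal(size=(k, n, n))
Sigma = A @ A.transpose(0, 2, 1) / n + np.eye(n)[None]    # PSD covariances Sigma^(i)

comp = rng.choice(k, size=N, p=w)                         # latent component of each point
x = np.stack([rng.multivariate_normal(mu[i], Sigma[i]) for i in comp])
print(x.shape)    # (N, n): data points in n-dimensional space, to be clustered / learned
```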

13 Learn GMMs: Prior Works. General case: moment matching methods [Moitra & Valiant] [Belkin & Sinha], with runtime and sample complexity polynomial in $n$ but exponential in $k$. With restrictive assumptions on the model parameters: if the mean vectors are well separated, pairwise clustering [Dasgupta] [Vempala & Wang] runs in $\mathrm{Poly}(n, k)$; if the mean vectors of spherical Gaussians are linearly independent, moment tensor decomposition [Hsu & Kakade] runs in $\mathrm{Poly}(n, k)$.

14 1.1 Learn GMMs: Worst case lower bound. Can we learn every GMM instance to target accuracy in polynomial runtime and using polynomially many samples? No: there is an exponential dependence on $k$ in the worst case [Moitra & Valiant]. Can we learn most GMM instances with a polynomial algorithm, without restrictive assumptions on the model parameters? Yes: worst cases are not everywhere, and we can handle the non-worst-case instances.

15 1.1 Learn GMMs: Smoothed Analysis Framework. Escaping the worst cases: for an arbitrary instance $\theta$, nature perturbs the parameters with a small amount ($\rho$) of noise to give $\tilde\theta$; we observe data generated by $\tilde\theta$, and design and analyze the algorithm for $\tilde\theta$. Hope: with high probability over nature's perturbation, any arbitrary instance escapes from the degenerate cases and becomes well conditioned. ✓ Bridges worst-case and average-case algorithm analysis [Spielman & Teng]. ✓ A stronger notion than generic analysis. Our goal: given samples from the perturbed GMM, learn the perturbed parameters with negligible failure probability over nature's perturbation.

16 1.1 Learn GMMs: Main Results. Our algorithm learns the GMM parameters up to accuracy $\epsilon$ with fully polynomial time and sample complexity $\mathrm{Poly}(n, k, 1/\epsilon)$. Assumption: the data live in high enough dimension, $n = \Omega(k^2)$. Under smoothed analysis the algorithm works with negligible failure probability. Learning Mixture of Gaussians in High Dimensions, R. Ge, Q. Huang, S. Kakade (STOC 2015).

17 1.1 Learn GMMs: Algorithmic Ideas. Method of moments: match the 4th- and 6th-order moments $M_4 = \mathbb{E}[x^{\otimes 4}]$ and $M_6 = \mathbb{E}[x^{\otimes 6}]$. Key challenge: the moment tensors are not of low rank, but they have special structure. The desired low-rank tensors are $X_4 = \sum_{i=1}^{k} \Sigma^{(i)} \otimes \Sigma^{(i)}$ and $X_6 = \sum_{i=1}^{k} \Sigma^{(i)} \otimes \Sigma^{(i)} \otimes \Sigma^{(i)}$, and the moment tensors are structured linear projections of them: $M_4 = \mathcal{F}_4(X_4)$, $M_6 = \mathcal{F}_6(X_6)$. A delicate algorithm inverts the structured linear projections.

18 Learn GMMs: Algorithmic Ideas. Method of moments: match the 4th- and 6th-order moments $M_4 = \mathbb{E}[x^{\otimes 4}]$ and $M_6 = \mathbb{E}[x^{\otimes 6}]$. Key challenge: the moment tensors are not of low rank, but they have special structure. Why do high dimension $n$ and smoothed analysis help us learn? We get many moment-matching constraints from low-order moments alone: the number of free parameters, $O(kn^2)$, is far smaller than the number of 6th-order moments, $\Theta(n^6)$. And the randomness in nature's perturbation makes the relevant matrices/tensors well conditioned; for instance, for a Gaussian matrix $X \in \mathbb{R}^{n \times m}$, with high probability the smallest singular value satisfies $\sigma_m(X) \gtrsim \sqrt{n}$.
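To see in code why the moment tensors are structured rather than low rank, the following small check (my own illustration, not the paper's projection $\mathcal{F}_4$) verifies Wick's/Isserlis' theorem for a single zero-mean Gaussian: the 4th-order moment tensor is a symmetrized image of $\Sigma \otimes \Sigma$, which is exactly the kind of structured linear projection of a low-rank object referred to above.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
n = 4
A = rng.normal(size=(n, n))
Sigma = A @ A.T                       # covariance of a single zero-mean Gaussian

# Isserlis/Wick: E[x_i x_j x_k x_l] = S_ij S_kl + S_ik S_jl + S_il S_jk.
M4_struct = np.zeros((n,) * 4)
for i, j, k, l in product(range(n), repeat=4):
    M4_struct[i, j, k, l] = (Sigma[i, j] * Sigma[k, l]
                             + Sigma[i, k] * Sigma[j, l]
                             + Sigma[i, l] * Sigma[j, k])

# The empirical 4th-order moment converges to the same structured tensor.
x = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
M4_emp = np.einsum('ti,tj,tk,tl->ijkl', x, x, x, x) / x.shape[0]
print(np.max(np.abs(M4_emp - M4_struct)))     # small compared with the entries of M4_struct
```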

19 19 PART 1 Provably efficient spectral algorithms for learning mixture models 1.1 Learn GMMs: Go beyond worst cases by smoothed analysis 1.2 Learn HMMs: Go beyond worst cases by generic analysis 1.3 Super-resolution: Go beyond worst cases by randomized algorithm

20 1.2 Learn HMMs: Setup. Hidden states $h_{t-n}, \dots, h_{t-1}, h_t, h_{t+1}, \dots, h_{t+n} \in [k]$, observations $x_{t-n}, \dots, x_{t+n} \in [d]$, window size $N = 2n+1$. Transition probability matrix $Q \in \mathbb{R}^{k \times k}$, observation probability matrix $O \in \mathbb{R}^{d \times k}$. Given length-$N$ windows of observations, how do we recover $\theta = (Q, O)$? Our focus: how large does the window size $N$ need to be?

21 1.2 Learn HMMs: Hardness Results. Hidden states in $[k]$, observations in $[d]$, window size $N = 2n+1$. HMMs are not efficiently PAC learnable under the noisy parity assumption [Abe & Warmuth] [Kearns]: one can construct an instance, by reduction from learning parity with noise, whose required window size is $N = \Omega(k)$ and whose algorithm complexity is $\Omega(d^k)$. Our result: excluding a measure-zero set in the parameter space of $\theta = (Q, O)$, i.e. for almost all HMM instances, the required window size is only $N = O(\log_d k)$, and a spectral algorithm achieves sample complexity and runtime that are both $\mathrm{poly}(d, k)$. Minimal Realization Problems for Hidden Markov Models, Q. Huang, R. Ge, S. Kakade, M. Dahleh (IEEE Transactions on Signal Processing, 2016).

22 1.2 Learn HMMs: Algorithmic idea. Form the third-order tensor $M = \Pr[(x_{-n},\dots,x_{-1}),\, x_0,\, (x_1,\dots,x_n)] \in \mathbb{R}^{d^n \times d^n \times d}$. 1. $M$ is a low-rank tensor of rank $k$: $M = A \otimes B \otimes C$, where $A = \Pr[x_1, x_2, \dots, x_n \mid h_0]$, $B = \Pr[x_{-1}, x_{-2}, \dots, x_{-n} \mid h_0]$, and $C = \Pr[x_0, h_0]$. 2. Extract $Q, O$ from the tensor factors $A, B$: writing $\odot$ for the column-wise Khatri-Rao product, $A = (O \odot (O \odot \cdots (O \odot (OQ))Q \cdots)Q)Q$ with $n$ copies of $Q$, $B$ has the same form with the reverse-time transition matrix $\tilde Q$ in place of $Q$, and $C = O\,\mathrm{diag}(\pi)$. Key challenge: how large does the window size need to be so that the tensor decomposition is unique? Our careful generic analysis: if $N = \Omega(\log_d k)$, the worst cases all lie in a measure-zero set.
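The nested factorization of the factor $A$ can be sanity-checked numerically. The sketch below uses illustrative dimensions and reads the nested product as a column-wise Khatri-Rao product (an assumption on my part): it builds $A$ by the recursion $A_1 = OQ$, $A_{m+1} = (O \odot A_m)Q$ and compares it against a brute-force computation of $\Pr(x_1,\dots,x_n \mid h_0)$.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
k, d, n = 2, 3, 3                          # hidden states, observation symbols, half-window

# Column-stochastic parameters: Q[i, j] = Pr(h_t = i | h_{t-1} = j), O[x, i] = Pr(x_t = x | h_t = i).
Q = rng.dirichlet(np.ones(k), size=k).T
O = rng.dirichlet(np.ones(d), size=k).T

def khatri_rao(U, V):
    """Column-wise Kronecker product: (a x k) and (b x k) -> (a*b x k)."""
    return np.einsum('ik,jk->ijk', U, V).reshape(U.shape[0] * V.shape[0], -1)

# Nested factorization: A_1 = O Q, A_{m+1} = (O kr A_m) Q, with n copies of Q in total.
A = O @ Q
for _ in range(n - 1):
    A = khatri_rao(O, A) @ Q

# Brute force: A[(x_1,...,x_n), h_0] = Pr(x_1,...,x_n | h_0), by a forward pass over the chain.
A_brute = np.zeros((d ** n, k))
for idx, xs in enumerate(product(range(d), repeat=n)):
    for h0 in range(k):
        alpha = np.eye(k)[h0]
        for x in xs:
            alpha = O[x] * (Q @ alpha)     # propagate the state, then weight by the emission prob
        A_brute[idx, h0] = alpha.sum()

print(np.max(np.abs(A - A_brute)))          # ~1e-16: the first tensor factor has this structure
```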

23 23 PART 1 Provably efficient spectral algorithms for learning mixture models 1.1 Learn GMMs: Go beyond worst cases by smoothed analysis 1.2 Learn HMMs: Go beyond worst cases by generic analysis 1.3 Super-resolution: Go beyond worst cases by randomized algorithm

24 1.3 Super-Resolution: Setup. Point sources convolved with a band-limited point-spread function give the observed signal, and we take Fourier measurements of that band-limited observation. How can we recover the point sources from coarse measurements of the signal, using ✓ a small number of Fourier measurements and ✓ a low cutoff frequency?

25 Super-Resolution: Problem Formulation. ✓ Recover the point sources (a mixture of $k$ vectors in $d$-dimensional space), $x(t) = \sum_{j=1}^{k} w_j\, \delta_{\mu^{(j)}}(t)$, assuming minimum separation $\Delta = \min_{j \ne j'} \|\mu^{(j)} - \mu^{(j')}\|_2$. ✓ Use band-limited and noisy Fourier measurements $\tilde f(s) = \sum_{j=1}^{k} w_j\, e^{i\langle \mu^{(j)}, s\rangle} + z(s)$ for $\|s\|_1 \le$ cutoff frequency, with bounded noise $|z(s)| \le \epsilon_z$ for all $s$. ✓ Achieve target accuracy $\|\hat\mu^{(j)} - \mu^{(j)}\|_2 \le \epsilon$ for all $j \in [k]$.
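A minimal sketch of this measurement model (sources, weights, frequencies, and noise level are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
k, d = 4, 2                                    # number of point sources, ambient dimension
mu = rng.uniform(-1, 1, size=(k, d))           # source locations mu^(j)
w = rng.dirichlet(np.ones(k))                  # positive weights w_j

def measure(S, noise=1e-3):
    """Noisy Fourier measurements f~(s) = sum_j w_j exp(i <mu^(j), s>) + z(s), one per row s of S."""
    phases = S @ mu.T                          # (m, k) matrix of inner products <mu^(j), s^(m)>
    z = noise * (rng.uniform(-1, 1, size=len(S)) + 1j * rng.uniform(-1, 1, size=len(S)))
    return np.exp(1j * phases) @ w + z

S = rng.uniform(-5, 5, size=(10, d))           # a handful of measurement frequencies
print(measure(S))
```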

26 Super-Resolution: Prior Works. Measurements $\tilde f(s) = \sum_{j=1}^{k} w_j e^{i\langle \mu^{(j)}, s\rangle} + z(s)$, minimum separation $\Delta = \min_{j\neq j'} \|\mu^{(j)} - \mu^{(j')}\|_2$. One-dimensional $\mu^{(j)}$: take uniform measurements on the grid $s \in \{-N, \dots, -1, 0, 1, \dots, N\}$; an SDP algorithm works with cutoff frequency $N = \Omega(1/\Delta)$ [Candes, Fernandez-Granda]; a hardness result shows the cutoff must satisfy $N > C/\Delta$ [Moitra]; one can use $O(k \log k)$ random measurements out of the $2N$ grid measurements [Tang, Bhaskar, Shah, Recht]. $d$-dimensional $\mu^{(j)}$: a multi-dimensional grid $s \in \{-N, \dots, N\}^d$, with algorithm complexity $O(\mathrm{poly}(k, \tfrac{1}{\Delta})^d)$.

27 1.3 Super-Resolution: Main Result. Our algorithm achieves stable recovery using $O((k+d)^2)$ Fourier measurements, with the cutoff frequency of the measurements bounded by $O(1/\Delta)$ and algorithm runtime $O((k+d)^3)$; the algorithm works with negligible failure probability. Comparing cutoff frequency, number of measurements, and runtime: the SDP approach needs cutoff $C_d \cdot \tfrac{1}{\Delta}$, $(\tfrac{1}{\Delta})^d$ measurements, and $\mathrm{poly}((\tfrac{1}{\Delta})^d, k)$ runtime, whereas ours needs cutoff $O(\tfrac{\log(kd)}{\Delta})$, $O((k\log k + d)^2)$ measurements, and $O((k\log k + d)^2)$ runtime; a matrix-pencil (MP) baseline is also compared. Super-Resolution off the Grid, Q. Huang, S. Kakade (NIPS 2015).

28 1.3 Super-Resolution: Algorithmic Idea. Measurements $\tilde f(s) = \sum_{j=1}^{k} w_j e^{i\langle \mu^{(j)}, s\rangle} + z(s)$. ✓ Tensor decomposition with measurements at random frequencies: choose random samples $s$ such that the measurement tensor $F$ admits a particular low-rank decomposition, $F = V_S \otimes V_{S'} \otimes (V_{S''} D_w)$, a rank-$k$ 3-way tensor, where $V_S$ is the Vandermonde-like matrix with complex nodes, $[V_S]_{m,j} = e^{i\langle \mu^{(j)}, s^{(m)}\rangle}$, with rows indexed by the sampled frequencies $s^{(1)}, \dots, s^{(m)}$. ✓ Skip the intermediate step of recovering the signal on the $N^d$-point grid. ✓ Prony's method (matrix pencil / MUSIC / ...) is just choosing the measurements on the hyper-grid $s$ deterministically.
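One way to see the rank-$k$ structure being exploited: if measurements are taken at sums of random base frequencies, $s = s^{(a)} + s'^{(b)} + s''^{(c)}$ (an illustrative choice of structured randomness; the paper's sampling scheme differs in its details), the resulting 3-way tensor of measurements factors exactly as $V_S \otimes V_{S'} \otimes (V_{S''} D_w)$:

```python
import numpy as np

rng = np.random.default_rng(5)
k, d = 3, 2
mu = rng.uniform(-1, 1, size=(k, d))           # point-source locations
w = rng.dirichlet(np.ones(k))                  # weights

def f(s):
    """Noiseless Fourier measurement of the point-source mixture at frequency s."""
    return np.sum(w * np.exp(1j * (s @ mu.T)))

# Measure at sums s_a + s_b + s_c of three small sets of random base frequencies.
m = 5
S1, S2, S3 = (rng.uniform(-2, 2, size=(m, d)) for _ in range(3))
F = np.array([[[f(sa + sb + sc) for sc in S3] for sb in S2] for sa in S1])

# Vandermonde-like factors with complex nodes: [V_S]_{a,j} = exp(i <mu^(j), s^(a)>).
V1, V2, V3 = (np.exp(1j * S @ mu.T) for S in (S1, S2, S3))
F_factored = np.einsum('aj,bj,cj->abc', V1, V2, V3 * w)   # rank-k 3-way tensor
print(np.max(np.abs(F - F_factored)))                     # ~1e-15: exact factorization
```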

29 Super-Resolution: Algorithmic Idea. Tensor decomposition with measurements at random frequencies. Why does this not contradict the hardness result? ✓ If we design a fixed grid of frequencies $s$ at which to measure $\tilde f(s)$, there always exist model instances for which that particular grid fails. ✓ If instead the measurement locations $s$ are random (with the structure needed for tensor decomposition), then for any model instance the algorithm works with high probability. This is how $O((k+d)^2)$ measurements suffice, versus $O(\mathrm{poly}(k, \tfrac{1}{\Delta})^d)$ for grid-based methods.

30 PART 2. Can we achieve optimal sample complexity in a tractable way? Estimate low-rank probability matrices with linear sample complexity. This problem is at the core of many spectral algorithms. We capitalize on insights from community detection to solve it.

31 2. Low rank matrix: Setup. Probability matrix $B \in \mathbb{R}_{+}^{M \times M}$, a distribution over $M^2$ outcomes; $B$ is of rank 2: $B = pp^{\top} + qq^{\top}$. We observe $N$ i.i.d. samples, summarized as frequency counts over the $M^2$ outcomes; entrywise, the count matrix is $\tilde B = \mathrm{Poisson}(NB)$. Goal: find a rank-2 $\hat B$ such that $\|B - \hat B\|_1 \le \epsilon$. Sample complexity $N$: upper bound (algorithm) and lower bound.

32 2. Low rank matrix: Connection to mixture models. Topic model: $\Pr(\mathrm{word}_1, \mathrm{word}_2 \mid \mathrm{topic} = T_1) = pp^{\top}$, $\Pr(\mathrm{word}_1, \mathrm{word}_2 \mid \mathrm{topic} = T_2) = qq^{\top}$; $B$ is the joint distribution over word pairs. HMM: $\Pr(\mathrm{output}_1, \mathrm{output}_2 \mid \mathrm{state} = S_i) = O_i (OQ_i)^{\top}$; $B$ is the distribution of consecutive outputs. From $N$ data samples we form empirical counts, find a low-rank $\hat B$ close to $B$, and then extract parameter estimates.

33 2. Low rank matrix: Sub-Optimal Attempt. Probability matrix $B \in \mathbb{R}_{+}^{M \times M}$ of rank 2: $B = pp^{\top} + qq^{\top}$; $N$ i.i.d. samples give counts $\tilde B = \mathrm{Poisson}(NB)$. MLE is non-convex optimization, so let's try something spectral.

34 34 2. Low rank matrix: Sub-Optimal Attempt. Probability matrix $B \in \mathbb{R}_{+}^{M \times M}$ of rank 2: $B = pp^{\top} + qq^{\top}$; $N$ i.i.d. samples give counts $\tilde B = \mathrm{Poisson}(NB)$. Since $\tfrac{1}{N}\tilde B = \tfrac{1}{N}\mathrm{Poisson}(NB) \to B$ as $N \to \infty$, set $\hat B$ to be the rank-2 truncated SVD of $\tfrac{1}{N}\tilde B$. To achieve accuracy $\|B - \hat B\|_1 \le \epsilon$ this needs $N = \Omega(M^2 \log M)$. Not sample efficient; hopefully $N = O(M)$ suffices. Data are small in practice: word distributions in language have fat tails, and the more sample documents $N$ we collect, the larger the vocabulary size $M$ becomes.
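A sketch of this baseline under the Poisson sampling model (sizes illustrative). With $N$ far below $M^2 \log M$, the $\ell_1$ error of the truncated SVD stays large, which is the point being made here.

```python
import numpy as np

rng = np.random.default_rng(6)
M, N = 200, 2_000                                  # vocabulary size, number of sample counts

p = rng.dirichlet(np.ones(M))                      # word distribution of topic 1
q = rng.dirichlet(np.ones(M))                      # word distribution of topic 2
B = 0.5 * np.outer(p, p) + 0.5 * np.outer(q, q)    # rank-2 probability matrix over word pairs

counts = rng.poisson(N * B)                        # Poissonized sample counts ~ Poisson(NB)

# Baseline estimator: rank-2 truncated SVD of counts / N.
U, s, Vt = np.linalg.svd(counts / N)
B_hat = (U[:, :2] * s[:2]) @ Vt[:2]

print(np.abs(B_hat - B).sum())                     # l1 error; small only once N = Omega(M^2 log M)
```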

35 2. Low rank matrix: Main Result. Our upper bound algorithm: a rank-2 estimate $\hat B$ with accuracy $\|\hat B - B\|_1 \le \epsilon$, using $N = O(M/\epsilon^2)$ sample counts and runtime $O(M^3)$. This leads to improved spectral algorithms for learning. We also prove a (strong) $\Omega(M)$ lower bound: for any $\epsilon > 0$, one needs a sequence of $\Omega(M)$ observations just to test whether the sequence is i.i.d. uniform over $[M]$ or generated by a 2-state HMM; testing the property is no easier than estimating. Recovering Structured Probability Matrices, Q. Huang, S. Kakade, W. Kong, G. Valiant (submitted to STOC 2016).

36 2. Low rank matrix: Algorithmic Idea. We capitalize on the idea of community detection in a sparse random network, which is a special case of our problem formulation: $M$ nodes, 2 communities; the expected connection matrix is $B = pp^{\top} + qq^{\top}$; the observed adjacency matrix is $\tilde B = \mathrm{Bernoulli}(NB)$; the network is generated from $B$, and we estimate $B$ from $\tilde B$.

37 37 2. Low rank matrix: Algorithmic Idea. We capitalize on the idea of community detection in a sparse random network, which is a special case of our problem formulation: $M$ nodes, 2 communities; expected connection matrix $B = pp^{\top} + qq^{\top}$; adjacency matrix $\tilde B = \mathrm{Bernoulli}(NB)$. Regularized truncated SVD [Le, Levina, Vershynin]: remove the heavy rows/columns from $\tilde B$, then run a rank-2 SVD on the remaining graph.
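A sketch of the regularization step as described here; the degree threshold and truncation rule below are my illustrative choices, not necessarily the exact procedure of [Le, Levina, Vershynin].

```python
import numpy as np

def regularized_tsvd(counts, rank=2, c=2.0):
    """Zero out rows/columns that are much heavier than average, then take a truncated SVD."""
    A = counts.astype(float).copy()
    deg_row, deg_col = A.sum(axis=1), A.sum(axis=0)
    thresh = c * A.sum() / A.shape[0]              # c times the average degree (illustrative)
    A[deg_row > thresh, :] = 0.0
    A[:, deg_col > thresh] = 0.0
    U, s, Vt = np.linalg.svd(A)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Usage on Poissonized counts of a rank-2 probability matrix (as in the previous sketch).
rng = np.random.default_rng(7)
M, N = 200, 1_000
p, q = rng.dirichlet(np.ones(M)), rng.dirichlet(np.ones(M))
B = 0.5 * np.outer(p, p) + 0.5 * np.outer(q, q)
B_hat = regularized_tsvd(rng.poisson(N * B)) / N
```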

38 38 2. Low rank matrix: Algorithmic Idea. We capitalize on the idea of community detection in a sparse random network, which is a special case of our problem formulation: $M$ nodes, 2 communities; expected connection matrix $B = pp^{\top} + qq^{\top}$; adjacency matrix $\tilde B = \mathrm{Bernoulli}(NB)$. Regularized truncated SVD [Le, Levina, Vershynin]: remove the heavy rows/columns from $\tilde B$, then run a rank-2 SVD on the remaining graph. In our setting, the $M \times M$ probability matrix is $B = pp^{\top} + qq^{\top}$ and the sample counts are $\tilde B = \mathrm{Poisson}(NB)$. Key challenge: in our general setup we do not have homogeneous marginal probabilities.

39 2. Low rank matrix: Algorithmic Idea 1, Binning. We group words according to their empirical marginal probabilities, divide the matrix into blocks, and then apply regularized truncated SVD to each block. Phase 1: 1. Estimate the non-uniform marginal probabilities. 2. Bin the $M$ words according to their estimated marginals. 3. Run regularized truncated SVD on each bin-by-bin block of the count matrix to estimate the corresponding block of $B$. Key challenges: ✓ binning is not exact, so we need to deal with spillover; ✓ we need to piece together the estimates over bins.

40 40 2. Low rank matrix: Algorithmic Idea 2, Refinement. The coarse estimate from Phase 1 gives some global information; we use it to do a local refinement for each row/column. Phase 2: 1. Phase 1 gives a coarse estimate for many words. 2. Refine the estimate for each word using linear regression. 3. This achieves sample complexity $N = O(M/\epsilon^2)$.

41 41 Conclusion Spectral methods are powerful tools for learning mixture models. We can go beyond worst case analysis by exploiting the randomness in the analysis / algorithm. To make spectral algorithms more practical, one needs careful algorithm implementation to improve sample complexity.

42 42 Future works. Data generation $\theta \to X(\theta)$, then learning from $X(\theta)$. Robustness: agnostic learning, generalization error analysis. Dynamics: extend the analysis techniques and algorithmic ideas to the learning of random processes, with streaming data and iterative algorithms. Get hands dirty with real data.

43 43 References.
Learning Mixture of Gaussians in High Dimensions. R. Ge, Q. Huang, S. Kakade (STOC 2015).
Super-Resolution off the Grid. Q. Huang, S. Kakade (NIPS 2015).
Minimal Realization Problems for Hidden Markov Models. Q. Huang, R. Ge, S. Kakade, M. Dahleh (IEEE Transactions on Signal Processing, 2016).
Recovering Structured Probability Matrices. Q. Huang, S. Kakade, W. Kong, G. Valiant (submitted to STOC 2016).

44 Tensor Decomposition. A tensor is a multi-way array (as in MATLAB); a 2-way tensor is a matrix. A 3-way tensor: $M_{j_1, j_2, j_3}$ with $j_1 \in [n_A]$, $j_2 \in [n_B]$, $j_3 \in [n_C]$.

45 Tensor Decomposition. Sum of rank-one tensors: $[a \otimes b \otimes c]_{j_1, j_2, j_3} = a_{j_1} b_{j_2} c_{j_3}$, and $M = \sum_{i=1}^{k} A_{[:,i]} \otimes B_{[:,i]} \otimes C_{[:,i]} = A \otimes B \otimes C$. Tensor rank: the minimum number of summands in a rank decomposition.

46 Tensor Decomposition. $M = \sum_{i=1}^{k} A_{[:,i]} \otimes B_{[:,i]} \otimes C_{[:,i]} = A \otimes B \otimes C$. Sufficient condition for a unique tensor decomposition: if $A$ and $B$ have full column rank $k$ and the columns of $C$ are pairwise linearly independent (Kruskal rank $\ge 2$), then we can decompose $M$ to uniquely find the factors $A, B, C$ in polynomial time, and the stability depends polynomially on the condition numbers of $A, B, C$ (the algorithm boils down to matrix SVD).
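The "boils down to matrix SVD" remark can be illustrated with a Jennrich-style simultaneous diagonalization (a minimal sketch assuming generic factors, not the exact algorithm used in the papers above): contract the tensor along the third mode with two random vectors and read the columns of $A$ off an eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(8)
nA, nB, nC, k = 6, 5, 4, 3
A, B, C = rng.normal(size=(nA, k)), rng.normal(size=(nB, k)), rng.normal(size=(nC, k))
M = np.einsum('ai,bi,ci->abc', A, B, C)        # M = sum_i A[:,i] (x) B[:,i] (x) C[:,i], rank k

# Contract the third mode with two random vectors.
x, y = rng.normal(size=nC), rng.normal(size=nC)
Mx = np.einsum('abc,c->ab', M, x)              # = A diag(C^T x) B^T
My = np.einsum('abc,c->ab', M, y)              # = A diag(C^T y) B^T

# Eigenvectors of Mx My^+ recover the columns of A (up to scaling and permutation),
# since Mx pinv(My) = A diag((C^T x)/(C^T y)) pinv(A) when A and B have full column rank.
eigvals, eigvecs = np.linalg.eig(Mx @ np.linalg.pinv(My))
est = eigvecs[:, np.argsort(-np.abs(eigvals))[:k]].real

# Each true column of A should be (nearly) parallel to some estimated column.
cos = np.abs((A / np.linalg.norm(A, axis=0)).T @ (est / np.linalg.norm(est, axis=0)))
print(np.round(cos.max(axis=1), 3))            # all entries close to 1
```

The eigenvalue ratios $(C^{\top}x)_i / (C^{\top}y)_i$ are distinct for generic random $x, y$, which is what makes the eigenvectors identifiable and is the role of the condition on $C$ above.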
