Course 495: Advanced Statistical Machine Learning/Pattern Recognition

1 Course 495: Advanced Statistical Machine Learning/Pattern Recognition. Goal (Lecture): To present Probabilistic Principal Component Analysis (PPCA) using both Maximum Likelihood (ML) and Expectation Maximization (EM). Goal (Tutorials): To provide students with the necessary mathematical tools for a deep understanding of PPCA.

2 Materials. Pattern Recognition & Machine Learning by C. Bishop, Chapter 12. PPCA: Tipping, Michael E., and Christopher M. Bishop. "Probabilistic principal component analysis." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61.3 (1999). Mixtures of PPCA: Tipping, Michael E., and Christopher M. Bishop. "Mixtures of probabilistic principal component analyzers." Neural Computation 11.2 (1999).

3 An overview. PCA: Maximize the global variance. LDA: Minimize the within-class variance while maximizing the variance of the class means. LPP: Minimize the local variance. ICA: Maximize independence by maximizing non-Gaussianity. All of these are deterministic!

4 Advantages of PPCA over PCA. An EM algorithm for PCA that is computationally efficient in situations where only a few leading eigenvectors are required, and that avoids having to evaluate the data covariance matrix as an intermediate step. The combination of a probabilistic model with EM allows us to deal with missing values in the dataset.

5 Advantages of PPCA over PCA. Mixtures of probabilistic PCA models can be formulated in a principled way and trained using the EM algorithm. (Figure: mixture components with loadings W_1, W_5, W_9.)

6 Advantages of PPCA over PCA. Probabilistic PCA forms the basis for a Bayesian treatment of PCA in which the dimensionality of the principal subspace can be found automatically from the data. Priors on W: Automatic Relevance Determination (ARD) finds the relevant components,
p(W \mid a) = \prod_{i=1}^{d} \left( \frac{a_i}{2\pi} \right)^{F/2} \exp\left\{ -\frac{1}{2} a_i w_i^T w_i \right\}
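As a quick illustration (not part of the original slides), a minimal Python/NumPy sketch of evaluating the log of this ARD prior for a hypothetical loading matrix W and precision vector a; the function and variable names are placeholders, and the normalizing exponent F/2 follows the formula as reconstructed above.

```python
import numpy as np

# Hedged sketch: log of the ARD prior p(W | a) for a hypothetical
# F x d loading matrix W and precision vector a of length d.
# Columns w_i with large precision a_i are pushed towards zero (pruned).
def log_ard_prior(W, a):
    F, d = W.shape
    col_sq = np.sum(W ** 2, axis=0)          # w_i^T w_i for each column i
    return np.sum(0.5 * F * np.log(a / (2.0 * np.pi)) - 0.5 * a * col_sq)
```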

7 Advantages of PPCA over PCA. Probabilistic PCA can be used to model class-conditional densities and hence be applied to classification problems.

8 Advantages of PPCA over PCA. The probabilistic PCA model can be run generatively to provide samples from the distribution.

9 Probabilistic Principal Component Analysis. General concept: the observations x_1, x_2, ..., x_N are generated from latent variables y_1, y_2, ..., y_N and share a common linear structure. We want to find the parameters of this structure.

10 Probabilistic Principal Component Analysis. Graphical model: each observation x_i depends on its latent variable y_i through the parameters W, μ and σ², with a plate over the N data points. (Figure: the latent prior p(y), the conditional p(x | y) and the marginal p(x) are illustrated as Gaussian densities.)

11 ML-PPCA
p(x_1, \dots, x_N, y_1, \dots, y_N \mid \theta) = \prod_{i=1}^{N} p(x_i \mid y_i, W, \mu, \sigma)\, p(y_i)
p(x_i \mid y_i, W, \mu, \sigma) = \mathcal{N}(x_i \mid W y_i + \mu, \sigma^2 I)
p(y_i) = \mathcal{N}(y_i \mid 0, I)
p(x_1, \dots, x_N \mid \theta) = \int p(x_1, \dots, x_N, y_1, \dots, y_N \mid \theta)\, dy_1 \cdots dy_N = \prod_{i=1}^{N} \int p(x_i \mid y_i, W, \mu, \sigma)\, p(y_i)\, dy_i
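Since the generative model is fully specified by these two Gaussians, here is a minimal sketch (Python/NumPy, with made-up dimensions and parameters, not from the slides) of drawing samples x_i by first sampling y_i ~ N(0, I) and then x_i | y_i ~ N(W y_i + μ, σ² I).

```python
import numpy as np

# Minimal sketch of the PPCA generative model with assumed dimensions:
# F-dimensional observations, d-dimensional latents, arbitrary parameters.
rng = np.random.default_rng(0)
N, F, d, sigma2 = 100, 5, 2, 0.1
W = rng.standard_normal((F, d))
mu = rng.standard_normal(F)

Y = rng.standard_normal((N, d))                                    # y_i ~ N(0, I)
X = Y @ W.T + mu + np.sqrt(sigma2) * rng.standard_normal((N, F))   # x_i | y_i

# Empirically, the sample covariance of X approaches D = W W^T + sigma^2 I.
```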

12 ML-PPCA
p(x_1, \dots, x_N \mid \theta) = \prod_{i=1}^{N} \int p(x_i \mid y_i, W, \mu, \sigma)\, p(y_i)\, dy_i
p(x_i \mid y_i, W, \mu, \sigma)\, p(y_i) = \frac{1}{(2\pi)^{F/2} \sigma^{F}} \, \frac{1}{(2\pi)^{d/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x_i - \mu - W y_i)^T (x_i - \mu - W y_i) - \frac{1}{2} y_i^T y_i \right\}

13 ML-PPCA
Complete the square (multiplying through by \sigma^2 and writing \tilde{x}_i = x_i - \mu):
(x_i - \mu - W y_i)^T (x_i - \mu - W y_i) + \sigma^2 y_i^T y_i
= (\tilde{x}_i - W y_i)^T (\tilde{x}_i - W y_i) + \sigma^2 y_i^T y_i
= \tilde{x}_i^T \tilde{x}_i - 2 \tilde{x}_i^T W y_i + y_i^T W^T W y_i + \sigma^2 y_i^T y_i
= \tilde{x}_i^T \tilde{x}_i - 2 \tilde{x}_i^T W y_i + y_i^T (\sigma^2 I + W^T W) y_i, \quad M := \sigma^2 I + W^T W

14 ML-PPCA
= \tilde{x}_i^T \tilde{x}_i - 2 \tilde{x}_i^T W y_i + y_i^T M y_i
= \tilde{x}_i^T \tilde{x}_i - 2 (M^{-1} W^T \tilde{x}_i)^T M y_i + y_i^T M y_i + (M^{-1} W^T \tilde{x}_i)^T M (M^{-1} W^T \tilde{x}_i) - (M^{-1} W^T \tilde{x}_i)^T M M^{-1} W^T \tilde{x}_i
= \tilde{x}_i^T (I - W M^{-1} W^T) \tilde{x}_i + (y_i - M^{-1} W^T \tilde{x}_i)^T M (y_i - M^{-1} W^T \tilde{x}_i)
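A quick numerical sanity check of this completion of the square, sketched in NumPy with arbitrary dimensions (all names below are hypothetical, not from the lecture):

```python
import numpy as np

# Verify numerically that the quadratic form before and after completing
# the square agree (x plays the role of x_tilde = x - mu).
rng = np.random.default_rng(1)
F, d, s2 = 4, 2, 0.5
W = rng.standard_normal((F, d))
x = rng.standard_normal(F)
y = rng.standard_normal(d)

M = s2 * np.eye(d) + W.T @ W
lhs = (x - W @ y) @ (x - W @ y) + s2 * (y @ y)
m = np.linalg.solve(M, W.T @ x)                       # M^{-1} W^T x
rhs = x @ (np.eye(F) - W @ np.linalg.solve(M, W.T)) @ x + (y - m) @ M @ (y - m)
print(np.allclose(lhs, rhs))                          # True
```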

15 ML-PPCA
\int p(x_i \mid y_i, W, \mu, \sigma)\, p(y_i)\, dy_i \propto e^{-\frac{1}{2\sigma^2} \tilde{x}_i^T (I - W M^{-1} W^T) \tilde{x}_i} \int e^{-\frac{1}{2\sigma^2} (y_i - M^{-1} W^T \tilde{x}_i)^T M (y_i - M^{-1} W^T \tilde{x}_i)}\, dy_i
= \mathcal{N}\!\left( x_i \mid \mu, \left[ \sigma^{-2} I - \sigma^{-2} W M^{-1} W^T \right]^{-1} \right)

16 ML-PPCA
Let's have a look at: \sigma^{-2} I - \sigma^{-2} W M^{-1} W^T = \sigma^{-2} I - \sigma^{-2} W (\sigma^2 I + W^T W)^{-1} W^T
Woodbury formula: (A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}
Can you prove that if D = W W^T + \sigma^2 I, then D^{-1} = \sigma^{-2} I - \sigma^{-2} W (\sigma^2 I + W^T W)^{-1} W^T?
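As a sanity check of the Woodbury-based identity asked for above, a small NumPy sketch with hypothetical dimensions:

```python
import numpy as np

# Check D^{-1} = sigma^{-2} I - sigma^{-2} W (sigma^2 I + W^T W)^{-1} W^T
rng = np.random.default_rng(2)
F, d, sigma2 = 6, 3, 0.3
W = rng.standard_normal((F, d))

D = W @ W.T + sigma2 * np.eye(F)
M = sigma2 * np.eye(d) + W.T @ W
D_inv = np.eye(F) / sigma2 - (W @ np.linalg.solve(M, W.T)) / sigma2
print(np.allclose(D_inv, np.linalg.inv(D)))           # True
```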

17 ML-PPCA
p(x_i \mid W, \mu, \sigma^2) = \int p(x_i \mid y_i, W, \mu, \sigma)\, p(y_i)\, dy_i = \mathcal{N}(x_i \mid \mu, D), \quad D = W W^T + \sigma^2 I
Using also the above, we can easily show that the posterior is
p(y_i \mid x_i, W, \mu, \sigma^2) = \frac{p(y_i)\, p(x_i \mid y_i, W, \mu, \sigma^2)}{p(x_i \mid W, \mu, \sigma^2)} = \mathcal{N}\!\left( y_i \mid M^{-1} W^T (x_i - \mu), \sigma^2 M^{-1} \right)

18 ML-PPCA
p(x_1, \dots, x_N \mid \theta) = \prod_{i=1}^{N} \mathcal{N}(x_i \mid \mu, D)
\theta = \arg\max_{\theta} \ln p(x_1, \dots, x_N \mid \theta) = \arg\max_{\theta} \sum_{i=1}^{N} \ln \mathcal{N}(x_i \mid \mu, D)
= \arg\max_{\theta} \left[ -\frac{NF}{2} \ln 2\pi - \frac{N}{2} \ln |D| - \frac{1}{2} \sum_{i=1}^{N} (x_i - \mu)^T D^{-1} (x_i - \mu) \right]

19 ML-PPCA
L(W, \sigma^2, \mu) = -\frac{NF}{2} \ln 2\pi - \frac{N}{2} \ln |D| - \frac{N}{2} \mathrm{tr}[D^{-1} S_t], \quad \text{where } S_t = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T
\frac{dL}{d\mu} = 0 \;\Rightarrow\; \mu = \frac{1}{N} \sum_{i=1}^{N} x_i
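For concreteness, a hedged sketch of evaluating L(W, σ², μ) from a data matrix X (N × F) using the trace form above; function and variable names are my own, not the lecture's.

```python
import numpy as np

def ppca_log_likelihood(X, W, mu, sigma2):
    """Evaluate L = -NF/2 ln 2pi - N/2 ln|D| - N/2 tr[D^{-1} S_t]."""
    N, F = X.shape
    D = W @ W.T + sigma2 * np.eye(F)
    Xc = X - mu
    St = (Xc.T @ Xc) / N                               # sample covariance S_t
    _, logdet = np.linalg.slogdet(D)
    return -0.5 * N * (F * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(D, St)))
```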

20 ML-PPCA
\frac{dL}{dW} = N \left( D^{-1} S_t D^{-1} W - D^{-1} W \right) = 0 \;\Rightarrow\; S_t D^{-1} W = W
Three different solutions:
(1) W = 0, which is the minimum of the likelihood.
(2) D = S_t, i.e. W W^T + \sigma^2 I = S_t. Assume the eigendecomposition S_t = U \Lambda U^T (U a square matrix of eigenvectors), and let R be a rotation matrix (R^T R = I); then W = U (\Lambda - \sigma^2 I)^{1/2} R.

21 ML-PPCA
(3) D \neq S_t and W \neq 0, with d < q = \mathrm{rank}(S_t).
Assume the SVD W = U L V^T, with U = [u_1 \dots u_d] an F \times d matrix, U^T U = I, V^T V = V V^T = I, and L = \mathrm{diag}(l_1, \dots, l_d).
Then S_t D^{-1} U L V^T = U L V^T. Let's study D^{-1} U.

22 ML-PPCA
D^{-1} = (W W^T + \sigma^2 I)^{-1} \overset{W = U L V^T}{=} (U L^2 U^T + \sigma^2 I)^{-1}
Assume a set of bases U_{F-d} such that U_{F-d}^T U = 0 and U_{F-d}^T U_{F-d} = I. Then
D^{-1} = \left( [U \; U_{F-d}] \begin{bmatrix} L^2 & 0 \\ 0 & 0 \end{bmatrix} [U \; U_{F-d}]^T + \sigma^2 [U \; U_{F-d}] [U \; U_{F-d}]^T \right)^{-1}
= \left( [U \; U_{F-d}] \begin{bmatrix} L^2 + \sigma^2 I & 0 \\ 0 & \sigma^2 I \end{bmatrix} [U \; U_{F-d}]^T \right)^{-1}
= [U \; U_{F-d}] \begin{bmatrix} (L^2 + \sigma^2 I)^{-1} & 0 \\ 0 & \sigma^{-2} I \end{bmatrix} [U \; U_{F-d}]^T

23 ML-PPCA
D^{-1} U = [U \; U_{F-d}] \begin{bmatrix} (L^2 + \sigma^2 I)^{-1} & 0 \\ 0 & \sigma^{-2} I \end{bmatrix} [U \; U_{F-d}]^T U
= [U \; U_{F-d}] \begin{bmatrix} (L^2 + \sigma^2 I)^{-1} & 0 \\ 0 & \sigma^{-2} I \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix}
= [U \; U_{F-d}] \begin{bmatrix} (L^2 + \sigma^2 I)^{-1} \\ 0 \end{bmatrix} = U (L^2 + \sigma^2 I)^{-1}

24 ML-PPCA
S_t D^{-1} U L V^T = U L V^T \;\Rightarrow\; S_t U (L^2 + \sigma^2 I)^{-1} = U \;\Rightarrow\; S_t U = U (L^2 + \sigma^2 I)
It means that S_t u_i = (l_i^2 + \sigma^2) u_i, so with S_t = U \Lambda U^T the u_i are eigenvectors of S_t with \lambda_i = l_i^2 + \sigma^2, i.e. l_i = \sqrt{\lambda_i - \sigma^2}.
Unfortunately V cannot be determined, hence there is a rotation ambiguity.

25 ML-PPCA
Hence the optimum is given by (keeping d eigenvectors):
W_d = U_d (\Lambda_d - \sigma^2 I)^{1/2} V^T
Having computed W, we need to compute the optimum \sigma^2:
L(W, \sigma^2, \mu) = -\frac{NF}{2} \ln 2\pi - \frac{N}{2} \ln |D| - \frac{N}{2} \mathrm{tr}[D^{-1} S_t]
= -\frac{NF}{2} \ln 2\pi - \frac{N}{2} \ln |W W^T + \sigma^2 I| - \frac{N}{2} \mathrm{tr}[(W W^T + \sigma^2 I)^{-1} S_t]

26 ML-PPCA
W_d W_d^T + \sigma^2 I = [U_d \; U_{F-d}] \begin{bmatrix} \Lambda_d - \sigma^2 I & 0 \\ 0 & 0 \end{bmatrix} [U_d \; U_{F-d}]^T + \sigma^2 [U_d \; U_{F-d}] [U_d \; U_{F-d}]^T
= [U_d \; U_{F-d}] \begin{bmatrix} \Lambda_d & 0 \\ 0 & \sigma^2 I \end{bmatrix} [U_d \; U_{F-d}]^T
Hence \ln |W_d W_d^T + \sigma^2 I| = (F - d) \ln \sigma^2 + \sum_{i=1}^{d} \ln \lambda_i

27 ML-PPCA
D^{-1} S_t = [U_d \; U_{F-d}] \begin{bmatrix} \Lambda_d^{-1} & 0 \\ 0 & \sigma^{-2} I \end{bmatrix} [U_d \; U_{F-d}]^T \, [U_d \; U_{F-d}] \begin{bmatrix} \Lambda_d & 0 \\ 0 & \Lambda_{F-d} \end{bmatrix} [U_d \; U_{F-d}]^T
where \Lambda_{F-d} = \mathrm{diag}(\lambda_{d+1}, \dots, \lambda_q, 0, \dots, 0), so that
\mathrm{tr}[D^{-1} S_t] = d + \frac{1}{\sigma^2} \sum_{i=d+1}^{q} \lambda_i

28 ML-PPCA
L(\sigma^2) = -\frac{N}{2} \Big[ F \ln 2\pi + \sum_{j=1}^{d} \ln \lambda_j + \frac{1}{\sigma^2} \sum_{j=d+1}^{q} \lambda_j + (F - d) \ln \sigma^2 + d \Big]
\frac{\partial L}{\partial \sigma} = 0 \;\Rightarrow\; \frac{2}{\sigma^3} \sum_{j=d+1}^{q} \lambda_j - \frac{2(F - d)}{\sigma} = 0 \;\Rightarrow\; \sigma^2 = \frac{1}{F - d} \sum_{j=d+1}^{q} \lambda_j
If I put the solution back:
L(\sigma^2) = -\frac{N}{2} \Big[ \sum_{j=1}^{d} \ln \lambda_j + (F - d) \ln\Big( \frac{1}{F - d} \sum_{j=d+1}^{q} \lambda_j \Big) + F \ln 2\pi + F \Big]

29 ML-PPCA
L(\sigma^2) = -\frac{N}{2} \Big\{ \sum_{j=1}^{d} \ln \lambda_j + \sum_{j=d+1}^{q} \ln \lambda_j - \sum_{j=d+1}^{q} \ln \lambda_j + (F - d) \ln\Big( \frac{1}{F - d} \sum_{j=d+1}^{q} \lambda_j \Big) + F \ln 2\pi + F \Big\}
(the first two sums together give \ln |S_t| over its nonzero eigenvalues and are constant).
Maximizing L is therefore equivalent to minimizing
\ln\Big( \frac{1}{F - d} \sum_{j=d+1}^{q} \lambda_j \Big) - \frac{1}{F - d} \sum_{j=d+1}^{q} \ln \lambda_j

30 ML-PPCA
Jensen's inequality: \ln\Big( \frac{1}{n} \sum_{i=1}^{n} r_i \Big) \geq \frac{1}{n} \sum_{i=1}^{n} \ln r_i, hence
\ln\Big( \frac{1}{F - d} \sum_{j=d+1}^{q} \lambda_j \Big) - \frac{1}{F - d} \sum_{j=d+1}^{q} \ln \lambda_j \geq 0
Hence the function is minimized when the discarded eigenvectors are the ones that correspond to the q - d smallest eigenvalues.

31 ML-PPCA
Summarizing:
\sigma^2 = \frac{1}{F - d} \sum_{j=d+1}^{q} \lambda_j, \quad W_d = U_d (\Lambda_d - \sigma^2 I)^{1/2} V^T, \quad \mu = \frac{1}{N} \sum_{i=1}^{N} x_i
We no longer have a projection, but a posterior mean
E_{p(y_i \mid x_i)}\{y_i\} = M^{-1} W^T (x_i - \mu)
and a reconstruction
\tilde{x}_i = W \, E_{p(y_i \mid x_i)}\{y_i\} + \mu
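Putting the summary together, a minimal NumPy sketch of the closed-form ML estimates (taking V = I because of the rotation ambiguity; the function and variable names are illustrative, not the lecture's code):

```python
import numpy as np

def ppca_ml(X, d):
    """Closed-form ML-PPCA: mu, W_d = U_d (Lambda_d - sigma^2 I)^{1/2},
    sigma^2 = average of the discarded eigenvalues, plus E{y_i} per sample."""
    N, F = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    St = (Xc.T @ Xc) / N
    lam, U = np.linalg.eigh(St)                        # ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]                     # sort descending
    sigma2 = lam[d:].mean()                            # average of discarded eigenvalues
    Wd = U[:, :d] * np.sqrt(lam[:d] - sigma2)          # V = I (rotation ambiguity)
    M = sigma2 * np.eye(d) + Wd.T @ Wd
    Ey = np.linalg.solve(M, Wd.T @ Xc.T).T             # posterior means E{y_i}
    Xrec = Ey @ Wd.T + mu                              # reconstructions
    return mu, Wd, sigma2, Ey, Xrec
```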

32 ML-PPCA
\lim_{\sigma^2 \to 0} W_d = U_d \Lambda_d^{1/2}, \quad \lim_{\sigma^2 \to 0} M = W_d^T W_d
Hence \lim_{\sigma^2 \to 0} E_{p(y_i \mid x_i)}\{y_i\} = (W_d^T W_d)^{-1} W_d^T (x_i - \mu) = \Lambda_d^{-1/2} U_d^T (x_i - \mu),
which gives whitened PCA.
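A quick numeric illustration (a sketch on a synthetic covariance, not from the slides) that the posterior-mean map M^{-1} W_d^T approaches the whitening transform Λ_d^{-1/2} U_d^T as σ² → 0:

```python
import numpy as np

# Illustrate the sigma^2 -> 0 limit on a random synthetic covariance.
rng = np.random.default_rng(3)
F, d = 6, 3
A = rng.standard_normal((F, F))
St = A @ A.T                                           # a full-rank stand-in for S_t
lam, U = np.linalg.eigh(St)
lam, U = lam[::-1], U[:, ::-1]

sigma2 = 1e-10                                         # nearly zero noise
Wd = U[:, :d] * np.sqrt(lam[:d] - sigma2)
M = sigma2 * np.eye(d) + Wd.T @ Wd
proj = np.linalg.solve(M, Wd.T)                        # M^{-1} W_d^T
whitened = (U[:, :d] / np.sqrt(lam[:d])).T             # Lambda_d^{-1/2} U_d^T
print(np.allclose(proj, whitened, atol=1e-6))          # True (up to tolerance)
```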

33 EM-PPCA
First step: formulate the joint likelihood
p(X, Y \mid \theta) = p(X \mid Y, \theta)\, p(Y) = \prod_{i=1}^{N} p(x_i \mid y_i, \theta)\, p(y_i)
\ln p(X, Y \mid \theta) = \sum_{i=1}^{N} \left[ \ln p(x_i \mid y_i, \theta) + \ln p(y_i) \right]
\ln p(x_i \mid y_i, \theta) = \ln \left[ \frac{1}{(2\pi)^{F/2} (\sigma^2)^{F/2}} e^{-\frac{1}{2\sigma^2} (x_i - W y_i - \mu)^T (x_i - W y_i - \mu)} \right] = -\frac{F}{2} \ln 2\pi\sigma^2 - \frac{1}{2\sigma^2} (x_i - W y_i - \mu)^T (x_i - W y_i - \mu)
\ln p(y_i) = -\frac{1}{2} y_i^T y_i - \frac{d}{2} \ln 2\pi

34 EM-PPCA
\ln p(X, Y \mid \theta) = \sum_{i=1}^{N} \left\{ -\frac{F}{2} \ln 2\pi\sigma^2 - \frac{1}{2\sigma^2} (x_i - W y_i - \mu)^T (x_i - W y_i - \mu) - \frac{1}{2} y_i^T y_i - \frac{d}{2} \ln 2\pi \right\}
We now need to optimize this with respect to y_i, W, \mu, \sigma^2. We cannot do so directly, since the y_i are unobserved; hence we take the expectation with respect to the posterior p(y_i \mid x_i).

35 EM-PPCA
But first we need to expand a bit:
\ln p(X, Y \mid \theta) = -\frac{NF}{2} \ln 2\pi\sigma^2 - \frac{Nd}{2} \ln 2\pi - \sum_{i=1}^{N} \left\{ \frac{1}{2\sigma^2} \left[ (x_i - \mu)^T (x_i - \mu) - 2\, y_i^T W^T (x_i - \mu) + y_i^T W^T W y_i \right] + \frac{1}{2} y_i^T y_i \right\}

36 EM-PPCA
E_{p(Y \mid X)}\{\ln p(X, Y \mid \theta)\} = -\frac{NF}{2} \ln 2\pi\sigma^2 - \frac{Nd}{2} \ln 2\pi - \sum_{i=1}^{N} \left\{ \frac{1}{2\sigma^2} \left[ (x_i - \mu)^T (x_i - \mu) - 2\, E\{y_i\}^T W^T (x_i - \mu) + \mathrm{tr}\!\left( E\{y_i y_i^T\} W^T W \right) \right] + \frac{1}{2} \mathrm{tr}\!\left( E\{y_i y_i^T\} \right) \right\}

37 EM-PPCA
E\{y_i\} = \int y_i\, p(y_i \mid x_i, \theta)\, dy_i = \text{mean of } \mathcal{N}\!\left( y_i \mid M^{-1} W^T (x_i - \mu), \sigma^2 M^{-1} \right) = M^{-1} W^T (x_i - \mu)
E\{(y_i - E\{y_i\})(y_i - E\{y_i\})^T\} = E\{y_i y_i^T\} - E\{E\{y_i\} y_i^T\} - E\{y_i E\{y_i\}^T\} + E\{y_i\} E\{y_i\}^T = E\{y_i y_i^T\} - E\{y_i\} E\{y_i\}^T = \sigma^2 M^{-1}
Hence E\{y_i y_i^T\} = \sigma^2 M^{-1} + E\{y_i\} E\{y_i\}^T

38 EM-PPCA
Expectation step: given W, \mu, \sigma^2, compute
E\{y_i\} = M^{-1} W^T (x_i - \mu)
E\{y_i y_i^T\} = \sigma^2 M^{-1} + E\{y_i\} E\{y_i\}^T
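A minimal sketch of this E-step in NumPy (X is N × F with rows x_i; the names are illustrative, not the lecture's code):

```python
import numpy as np

def e_step(X, W, mu, sigma2):
    """E-step: posterior moments E{y_i} and E{y_i y_i^T} for every sample."""
    N, _ = X.shape
    d = W.shape[1]
    M = sigma2 * np.eye(d) + W.T @ W
    Ey = np.linalg.solve(M, W.T @ (X - mu).T).T        # N x d, rows are E{y_i}
    cov = sigma2 * np.linalg.inv(M)                    # shared posterior covariance
    Eyy = cov[None, :, :] + Ey[:, :, None] * Ey[:, None, :]   # N x d x d
    return Ey, Eyy
```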

39 EM-PPCA
Maximization step:
\frac{\partial}{\partial \mu} E_{p(Y \mid X)}\{\ln p(X, Y \mid \theta)\} = 0 \;\Rightarrow\; \mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\frac{\partial}{\partial W} E_{p(Y \mid X)}\{\ln p(X, Y \mid \theta)\} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \left[ \frac{1}{\sigma^2} (x_i - \mu) E\{y_i\}^T - \frac{1}{\sigma^2} W E\{y_i y_i^T\} \right] = 0
\;\Rightarrow\; W = \left[ \sum_{i=1}^{N} (x_i - \mu) E\{y_i\}^T \right] \left[ \sum_{i=1}^{N} E\{y_i y_i^T\} \right]^{-1}

40 EM-PPCA
\frac{\partial}{\partial \sigma^2} E_{p(Y \mid X)}\{\ln p(X, Y \mid \theta)\} = 0 \;\Rightarrow\;
\sigma^2 = \frac{1}{NF} \sum_{i=1}^{N} \left\{ \| x_i - \mu \|^2 - 2\, E\{y_i\}^T W^T (x_i - \mu) + \mathrm{tr}\!\left( E\{y_i y_i^T\} W^T W \right) \right\}
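And a matching sketch of the M-step updates for W and σ², consuming the E-step moments from the previous sketch (again with illustrative names; a hedged implementation, not the lecture's code). Alternating e_step and m_step, with μ set once to the data mean, converges to the ML solution up to a rotation of W.

```python
import numpy as np

def m_step(X, mu, Ey, Eyy):
    """M-step: update W and sigma^2 from the posterior moments of the E-step."""
    N, F = X.shape
    Xc = X - mu
    A = Xc.T @ Ey                                      # sum_i (x_i - mu) E{y_i}^T
    B = Eyy.sum(axis=0)                                # sum_i E{y_i y_i^T}
    W_new = np.linalg.solve(B.T, A.T).T                # A B^{-1}
    s2 = (np.sum(Xc ** 2)
          - 2.0 * np.sum(Ey * (Xc @ W_new))
          + np.trace(B @ (W_new.T @ W_new))) / (N * F)
    return W_new, s2
```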

41 Why EM-PPCA? A complexity of O(NFd), which can be significantly smaller than O(NF^2) (the cost of computing the covariance matrix). The combination of a probabilistic model with EM allows us to deal with missing values in the dataset.

42 Why EM-PPCA? The combination of a probabilistic model with EM allows us to deal with missing values in the dataset. (Figure: mixture component densities p(x_i | y_i, W_1, μ_1, σ), ..., p(x_i | y_i, W_9, μ_9, σ).)

43 Summary. We saw how to perform parameter estimation and inference using ML and EM in a graphical model with parameters W, μ and σ², latent variables y_i and observations x_i, i = 1, ..., N.

44 What will we see next? EM in the following graphical models. Dynamic data: a chain of latent variables y_1, y_2, ..., y_T generating observations x_1, x_2, ..., x_T. Spatial data: a grid of latent variables y_{u,v} generating observations x_{u,v}.
