Course 495: Advanced Statistical Machine Learning/Pattern Recognition
1 Course 495: Advanced Statistical Machine Learning/Pattern Recognition Goal (Lecture): To present Probabilistic Principal Component Analysis (PPCA) using both Maximum Likelihood (ML) and Expectation Maximization (EM). Goal (Tutorials): To provide the students with the necessary mathematical tools for a deep understanding of PPCA.
2 Materials Pattern Recognition & Machine Learning by C. Bishop, Chapter 12. PPCA: Tipping, Michael E., and Christopher M. Bishop. "Probabilistic principal component analysis." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61.3 (1999). Mixtures of PPCA: Tipping, Michael E., and Christopher M. Bishop. "Mixtures of probabilistic principal component analyzers." Neural Computation 11.2 (1999).
3 An overview PCA: Maximize the global variance. LDA: Minimize the within-class variance while maximizing the variance of the class means. LPP: Minimize the local variance. ICA: Maximize independence by maximizing non-Gaussianity. All are deterministic!!
4 Advantages of PPCA over PCA An EM algorithm for PCA that is computationally efficient in situations where only a few leading eigenvectors are required, and that avoids having to evaluate the data covariance matrix as an intermediate step. A combination of a probabilistic model and EM allows us to deal with missing values in the dataset.
5 Advantages of PPCA over PCA Mixtures of probabilistic PCA models can be formulated in a principled way and trained using the EM algorithm.
6 Advantages of PPCA over PCA Probabilistic PCA forms the basis for a Bayesian treatment of PCA in which the dimensionality of the principal subspace can be found automatically from the data. Priors on W: Automatic Relevance Determination (find the relevant components)
p(W | a) = ∏_{i=1}^{d} (a_i / 2π)^{F/2} exp{ -½ a_i w_i^T w_i }
7 Advantages of PPCA over PCA Probabilistic PCA can be used to model class-conditional densities and hence be applied to classification problems.
8 Advantages of PPCA over PCA The probabilistic PCA model can be run generatively to provide samples from the distribution.
9 Probabilistic Principal Component Analysis General Concept: latent variables y_1, y_2, ..., y_N generate observations x_1, x_2, ..., x_N, which share a common linear structure. We want to find the parameters of this structure.
10 Probabilistic Principal Component Analysis Graphical Model: (Figure: plate diagram with latent y_i and observed x_i, i = 1, ..., N, governed by the parameters W, μ and σ²; illustration of the prior p(y), the conditional p(x|y) and the marginal p(x).)
11 ML-PPCA
p(x_1, ..., x_N, y_1, ..., y_N | θ) = ∏_{i=1}^{N} p(x_i | y_i, W, μ, σ) p(y_i)
p(x_i | y_i, W, μ, σ) = N(x_i | W y_i + μ, σ² I)
p(y_i) = N(y_i | 0, I)
p(x_1, ..., x_N | θ) = ∫ p(x_1, ..., x_N, y_1, ..., y_N | θ) dy_1 ... dy_N = ∫ ∏_{i=1}^{N} p(x_i | y_i, W, μ, σ) p(y_i) dy_1 ... dy_N
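As a hedged illustration (not part of the original slides), the generative model above can be simulated directly; all sizes and parameter values below are arbitrary choices.

```python
# Sketch: sampling from the PPCA generative model
#   p(y_i) = N(y_i | 0, I),  p(x_i | y_i) = N(x_i | W y_i + mu, sigma^2 I).
# F, d, N, W, mu, sigma2 are illustrative values, not from the lecture.
import numpy as np

rng = np.random.default_rng(0)
F, d, N = 10, 3, 1000                 # data dim, latent dim, sample size
W = rng.normal(size=(F, d))           # loading matrix
mu = rng.normal(size=F)               # mean
sigma2 = 0.1                          # isotropic noise variance

Y = rng.normal(size=(N, d))                                     # y_i ~ N(0, I)
X = Y @ W.T + mu + np.sqrt(sigma2) * rng.normal(size=(N, F))    # x_i | y_i
```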
12 ML-PPCA
p(x_1, ..., x_N | θ) = ∏_{i=1}^{N} ∫ p(x_i | y_i, W, μ, σ) p(y_i) dy_i
p(x_i | y_i, W, μ, σ) p(y_i) = 1/((2πσ²)^{F/2}) exp{ -1/(2σ²) (x_i - μ - W y_i)^T (x_i - μ - W y_i) } · 1/((2π)^{d/2}) exp{ -½ y_i^T y_i }
13 ML-PPCA Complete the square (writing x̃_i = x_i - μ):
(x_i - μ - W y_i)^T (x_i - μ - W y_i) + σ² y_i^T y_i
= (x̃_i - W y_i)^T (x̃_i - W y_i) + σ² y_i^T y_i
= x̃_i^T x̃_i - 2 x̃_i^T W y_i + y_i^T W^T W y_i + σ² y_i^T y_i
= x̃_i^T x̃_i - 2 x̃_i^T W y_i + y_i^T (σ² I + W^T W) y_i,   with M = σ² I + W^T W
14 ML-PPCA
= x̃_i^T x̃_i - 2 (M^{-1} W^T x̃_i)^T M y_i + y_i^T M y_i + (M^{-1} W^T x̃_i)^T M (M^{-1} W^T x̃_i) - (M^{-1} W^T x̃_i)^T M M^{-1} W^T x̃_i
= x̃_i^T (I - W M^{-1} W^T) x̃_i + (y_i - M^{-1} W^T x̃_i)^T M (y_i - M^{-1} W^T x̃_i)
15 ML-PPCA
∫ p(x_i | y_i, W, μ, σ) p(y_i) dy_i ∝ e^{ -1/(2σ²) x̃_i^T (I - W M^{-1} W^T) x̃_i } ∫ e^{ -1/(2σ²) (y_i - M^{-1} W^T x̃_i)^T M (y_i - M^{-1} W^T x̃_i) } dy_i
which yields p(x_i | W, μ, σ²) = N(x_i | μ, (σ^{-2} I - σ^{-2} W M^{-1} W^T)^{-1})
16 ML-PPCA Let's have a look at:
σ^{-2} I - σ^{-2} W M^{-1} W^T = σ^{-2} I - σ^{-2} W (σ² I + W^T W)^{-1} W^T
Woodbury formula: (A + UCV)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}
Can you prove that if D = W W^T + σ² I, then D^{-1} = σ^{-2} I - σ^{-2} W (σ² I + W^T W)^{-1} W^T ?
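A quick numerical check of this identity (a sketch of my own, with arbitrary test values for W and σ²) can be done as follows.

```python
# Verify D^{-1} = sigma^{-2} I - sigma^{-2} W (sigma^2 I + W^T W)^{-1} W^T
# for D = W W^T + sigma^2 I; all values below are arbitrary test values.
import numpy as np

rng = np.random.default_rng(1)
F, d, sigma2 = 10, 3, 0.1
W = rng.normal(size=(F, d))
M = sigma2 * np.eye(d) + W.T @ W
D = W @ W.T + sigma2 * np.eye(F)
D_inv = np.eye(F) / sigma2 - (W @ np.linalg.solve(M, W.T)) / sigma2
print(np.allclose(np.linalg.inv(D), D_inv))   # expected: True
```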
17 ML-PPCA
p(x_i | W, μ, σ²) = ∫ p(x_i | y_i, W, μ, σ) p(y_i) dy_i = N(x_i | μ, D),   D = W W^T + σ² I
Using the above we can easily show that the posterior is:
p(y_i | x_i, W, μ, σ²) = p(y_i) p(x_i | y_i, W, μ, σ²) / p(x_i | W, μ, σ²) = N(y_i | M^{-1} W^T (x_i - μ), σ² M^{-1})
18 ML-PPCA
p(x_1, ..., x_N | θ) = ∏_{i=1}^{N} N(x_i | μ, D)
θ = argmax_θ ln p(x_1, ..., x_N | θ) = argmax_θ ∑_{i=1}^{N} ln N(x_i | μ, D)
ln p(x_1, ..., x_N | θ) = -NF/2 ln 2π - N/2 ln|D| - ½ ∑_{i=1}^{N} (x_i - μ)^T D^{-1} (x_i - μ)
19 ML-PPCA
L(W, σ², μ) = -NF/2 ln 2π - N/2 ln|D| - N/2 tr[D^{-1} S_t],   where S_t = 1/N ∑_{i=1}^{N} (x_i - μ)(x_i - μ)^T
dL/dμ = 0  ⟹  μ = 1/N ∑_{i=1}^{N} x_i
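For reference, the objective L(W, σ², μ) above can be evaluated numerically; the following is a small sketch (my own code, not from the slides), assuming X is an N×F data matrix.

```python
# L(W, sigma^2, mu) = -NF/2 ln 2pi - N/2 ln|D| - N/2 tr[D^{-1} S_t]
import numpy as np

def ppca_log_likelihood(X, W, mu, sigma2):
    N, F = X.shape
    Xc = X - mu
    St = (Xc.T @ Xc) / N                          # sample covariance S_t
    D = W @ W.T + sigma2 * np.eye(F)              # model covariance
    _, logdetD = np.linalg.slogdet(D)
    trace_term = np.trace(np.linalg.solve(D, St))
    return -0.5 * N * (F * np.log(2 * np.pi) + logdetD + trace_term)
```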
20 ML-PPCA
dL/dW = N (D^{-1} S_t D^{-1} W - D^{-1} W) = 0  ⟹  S_t D^{-1} W = W
Three different solutions:
(1) W = 0, which is the minimum of the likelihood.
(2) D = S_t, i.e. W W^T + σ² I = S_t. Assume the eigendecomposition S_t = U Λ U^T, with U a square matrix of eigenvectors and R a rotation matrix (R^T R = I). Then W = U (Λ - σ² I)^{1/2} R.
21 ML-PPCA
(3) D ≠ S_t and W ≠ 0, with d < q = rank(S_t).
Assume the SVD of W = U L V^T, where U = [u_1 ... u_d] is an F×d matrix with U^T U = I, V^T V = V V^T = I, and L = diag(l_1, ..., l_d).
Then S_t D^{-1} U L V^T = U L V^T. Let's study D^{-1} U.
22 ML-PPCA
D^{-1} = (W W^T + σ² I)^{-1} = (U L² U^T + σ² I)^{-1}
Assume a set of bases U_{F-d} such that U_{F-d}^T U = 0 and U_{F-d}^T U_{F-d} = I. Then
D = [U U_{F-d}] diag(L² + σ² I, σ² I) [U U_{F-d}]^T
D^{-1} = [U U_{F-d}] diag((L² + σ² I)^{-1}, σ^{-2} I) [U U_{F-d}]^T
23 ML-PPCA
D^{-1} U = [U U_{F-d}] diag((L² + σ² I)^{-1}, σ^{-2} I) [U U_{F-d}]^T U
= [U U_{F-d}] diag((L² + σ² I)^{-1}, σ^{-2} I) [I 0]^T
= U (L² + σ² I)^{-1}
24 ML-PPCA
S_t D^{-1} U L V^T = U L V^T  ⟹  S_t U (L² + σ² I)^{-1} = U  ⟹  S_t U = U (L² + σ² I)
It means that S_t u_i = (l_i² + σ²) u_i, i.e. with S_t = U Λ U^T the u_i are eigenvectors of S_t, with λ_i = l_i² + σ² and l_i = (λ_i - σ²)^{1/2}.
Unfortunately V cannot be determined, hence there is a rotation ambiguity.
25 ML-PPCA Hence the optimum is given by (keeping d eigenvectors)
W_d = U_d (Λ_d - σ² I)^{1/2} V^T
Having computed W we need to compute the optimum σ²:
L(W, σ², μ) = -NF/2 ln 2π - N/2 ln|D| - N/2 tr[D^{-1} S_t]
= -NF/2 ln 2π - N/2 ln|W W^T + σ² I| - N/2 tr[(W W^T + σ² I)^{-1} S_t]
26 ML-PPCA Hence
W_d W_d^T + σ² I = U_d (Λ_d - σ² I) U_d^T + σ² (U_d U_d^T + U_{F-d} U_{F-d}^T) = [U_d U_{F-d}] diag(Λ_d, σ² I) [U_d U_{F-d}]^T
ln|W_d W_d^T + σ² I| = (F - d) ln σ² + ∑_{i=1}^{d} ln λ_i
27 ML-PPCA With D^{-1} = [U_d U_{F-d}] diag(Λ_d^{-1}, σ^{-2} I) [U_d U_{F-d}]^T and S_t = [U_d U_{F-d}] diag(Λ_d, Λ_{F-d}) [U_d U_{F-d}]^T, where Λ_{F-d} contains the remaining eigenvalues λ_{d+1}, ..., λ_q and zeros,
D^{-1} S_t = [U_d U_{F-d}] diag(I, σ^{-2} Λ_{F-d}) [U_d U_{F-d}]^T
tr[D^{-1} S_t] = d + (1/σ²) ∑_{i=d+1}^{q} λ_i
28 ML-PPCA
L(σ²) = -N/2 { F ln 2π + ∑_{j=1}^{d} ln λ_j + (F - d) ln σ² + (1/σ²) ∑_{j=d+1}^{q} λ_j + d }
∂L/∂σ = -N/2 { 2(F - d)/σ - (2/σ³) ∑_{j=d+1}^{q} λ_j } = 0  ⟹  σ² = 1/(F - d) ∑_{j=d+1}^{q} λ_j
If I put the solution back:
L(σ²) = -N/2 { ∑_{j=1}^{d} ln λ_j + (F - d) ln( 1/(F - d) ∑_{j=d+1}^{q} λ_j ) + F ln 2π + F }
29 ML-PPCA
L(σ²) = -N/2 { ∑_{j=1}^{d} ln λ_j + ∑_{j=d+1}^{q} ln λ_j - ∑_{j=d+1}^{q} ln λ_j + (F - d) ln( 1/(F - d) ∑_{j=d+1}^{q} λ_j ) + F ln 2π + F }
(the first two sums together give ln|S_t|)
max L  ⟺  max -N/2 (F - d) { 1/(F - d) ln|S_t| - 1/(F - d) ∑_{j=d+1}^{q} ln λ_j + ln( 1/(F - d) ∑_{j=d+1}^{q} λ_j ) } + const
⟺  min { ln( 1/(F - d) ∑_{j=d+1}^{q} λ_j ) - 1/(F - d) ∑_{j=d+1}^{q} ln λ_j }
30 ML-PPCA Jensen's inequality: ln( 1/n ∑_{i=1}^{n} r_i ) ≥ 1/n ∑_{i=1}^{n} ln r_i, hence
ln( 1/(F - d) ∑_{j=d+1}^{q} λ_j ) - 1/(F - d) ∑_{j=d+1}^{q} ln λ_j ≥ 0
Hence, the function is minimized when the discarded eigenvectors are the ones that correspond to the q - d smallest eigenvalues.
31 ML-PPCA Summarize:
σ² = 1/(F - d) ∑_{j=d+1}^{q} λ_j
W_d = U_d (Λ_d - σ² I)^{1/2} V^T
μ = 1/N ∑_{i=1}^{N} x_i
We no longer have a projection but an expectation E_{p(y_i | x_i)}{y_i} = M^{-1} W^T (x_i - μ), and a reconstruction x̂_i = W E_{p(y_i | x_i)}{y_i} + μ.
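Putting the closed-form solution together, a minimal sketch (my own code, assuming the arbitrary rotation V = I; all variable names are mine) is the following.

```python
# Closed-form ML-PPCA:
#   mu      = sample mean
#   sigma^2 = average of the F - d discarded eigenvalues of S_t
#   W_d     = U_d (Lambda_d - sigma^2 I)^{1/2}   (taking V = I)
import numpy as np

def ppca_ml(X, d):
    N, F = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    St = (Xc.T @ Xc) / N
    lam, U = np.linalg.eigh(St)                   # ascending order
    lam, U = lam[::-1], U[:, ::-1]                # sort descending
    sigma2 = lam[d:].mean()                       # discarded eigenvalues
    W = U[:, :d] * np.sqrt(lam[:d] - sigma2)      # U_d (Lambda_d - sigma^2 I)^{1/2}
    return W, mu, sigma2
```

With the fitted parameters, E_{p(y_i | x_i)}{y_i} = M^{-1} W^T (x_i - μ) gives the latent representation of each sample, exactly as on the slide.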
32 ML-PPCA In the limit σ² → 0:
lim_{σ²→0} W_d = U_d Λ_d^{1/2}
lim_{σ²→0} M = W_d^T W_d
Hence lim_{σ²→0} E_{p(y_i | x_i)}{y_i} = (W_d^T W_d)^{-1} W_d^T (x_i - μ) = Λ_d^{-1/2} U_d^T (x_i - μ)
which gives the whitened PCA projection.
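This limit can also be checked numerically; the sketch below (my own, using a toy covariance and taking μ = 0) compares the posterior mean for a very small σ² with the whitened-PCA projection.

```python
# As sigma^2 -> 0, M^{-1} W^T (x - mu) -> Lambda_d^{-1/2} U_d^T (x - mu)
import numpy as np

rng = np.random.default_rng(2)
F, d = 8, 3
A = rng.normal(size=(F, F))
St = A @ A.T                                     # a toy covariance matrix
lam, U = np.linalg.eigh(St)
lam, U = lam[::-1], U[:, ::-1]                   # descending eigenvalues
x = rng.normal(size=F)                           # a test point (mu = 0 here)
sigma2 = 1e-10                                   # "sigma^2 -> 0"
W = U[:, :d] * np.sqrt(lam[:d] - sigma2)         # W_d = U_d (Lambda_d - sigma^2 I)^{1/2}
M = sigma2 * np.eye(d) + W.T @ W
post_mean = np.linalg.solve(M, W.T @ x)          # posterior mean E{y | x}
whitened = (U[:, :d].T @ x) / np.sqrt(lam[:d])   # whitened PCA projection
print(np.allclose(post_mean, whitened))          # expected: True
```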
33 EM-PPCA First step: formulate the joint likelihood
p(X, Y | θ) = p(X | Y, θ) p(Y) = ∏_{i=1}^{N} p(x_i | y_i, θ) p(y_i)
ln p(X, Y | θ) = ∑_{i=1}^{N} [ ln p(x_i | y_i, θ) + ln p(y_i) ]
ln p(x_i | y_i, θ) = ln [ (2πσ²)^{-F/2} e^{ -1/(2σ²) (x_i - W y_i - μ)^T (x_i - W y_i - μ) } ] = -F/2 ln 2πσ² - 1/(2σ²) (x_i - W y_i - μ)^T (x_i - W y_i - μ)
ln p(y_i) = -½ y_i^T y_i - d/2 ln 2π
34 EM-PPCA
ln p(X, Y | θ) = ∑_{i=1}^{N} { -F/2 ln 2πσ² - 1/(2σ²) (x_i - W y_i - μ)^T (x_i - W y_i - μ) - ½ y_i^T y_i - d/2 ln 2π }
We now need to optimize this with respect to y_i, W, μ, σ². We cannot do so directly, since the y_i are unobserved; hence we take expectations over the posterior p(Y | X).
35 EM-PPCA But first we need to expand a bit:
ln p(X, Y | θ) = -NF/2 ln 2πσ² - Nd/2 ln 2π - ∑_{i=1}^{N} { 1/(2σ²) [ (x_i - μ)^T (x_i - μ) - 2 (x_i - μ)^T W y_i + y_i^T W^T W y_i ] + ½ y_i^T y_i }
36 EM-PPCA
E_{P(Y|X)}{ ln P(X, Y | θ) } = -NF/2 ln 2πσ² - Nd/2 ln 2π - ∑_{i=1}^{N} { 1/(2σ²) [ (x_i - μ)^T (x_i - μ) - 2 (x_i - μ)^T W E{y_i} + tr( E{y_i y_i^T} W^T W ) ] + ½ tr( E{y_i y_i^T} ) }
37 EM-PPCA
E{y_i} = ∫ y_i p(y_i | x_i, θ) dy_i = mean of N(y_i | M^{-1} W^T (x_i - μ), σ² M^{-1}) = M^{-1} W^T (x_i - μ)
E{ (y_i - E{y_i})(y_i - E{y_i})^T } = E{y_i y_i^T} - E{E{y_i} y_i^T} - E{y_i E{y_i}^T} + E{y_i} E{y_i}^T = E{y_i y_i^T} - E{y_i} E{y_i}^T = σ² M^{-1}
hence E{y_i y_i^T} = σ² M^{-1} + E{y_i} E{y_i}^T
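As a hedged side check (my own Monte Carlo sketch, with arbitrary toy values), the second-moment identity above can be verified by sampling from the posterior given on slide 17.

```python
# Monte Carlo check of E{y y^T} = sigma^2 M^{-1} + E{y} E{y}^T for the
# posterior p(y | x) = N(M^{-1} W^T (x - mu), sigma^2 M^{-1}); toy values.
import numpy as np

rng = np.random.default_rng(3)
F, d, sigma2 = 6, 2, 0.5
W = rng.normal(size=(F, d))
mu = rng.normal(size=F)
x = rng.normal(size=F)
M = sigma2 * np.eye(d) + W.T @ W
mean = np.linalg.solve(M, W.T @ (x - mu))        # E{y | x}
cov = sigma2 * np.linalg.inv(M)                  # posterior covariance
Y = rng.multivariate_normal(mean, cov, size=200000)
second_moment = Y.T @ Y / len(Y)                 # Monte Carlo E{y y^T}
print(np.allclose(second_moment, cov + np.outer(mean, mean), atol=1e-2))
```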
38 EM-PPCA Expectation Step: Given W, μ, σ²:
E{y_i} = M^{-1} W^T (x_i - μ)
E{y_i y_i^T} = σ² M^{-1} + E{y_i} E{y_i}^T
39 EM-PPCA Maximization step
∂E/∂μ = 0  ⟹  μ = 1/N ∑_{i=1}^{N} x_i
∂E/∂W = 0:  ∑_{i=1}^{N} [ (1/σ²) (x_i - μ) E{y_i}^T - (1/σ²) W E{y_i y_i^T} ] = 0
⟹  W = [ ∑_{i=1}^{N} (x_i - μ) E{y_i}^T ] [ ∑_{i=1}^{N} E{y_i y_i^T} ]^{-1}
40 EM-PPCA
∂E/∂σ² = 0  ⟹  σ² = 1/(NF) ∑_{i=1}^{N} { ||x_i - μ||² - 2 E{y_i}^T W^T (x_i - μ) + tr( E{y_i y_i^T} W^T W ) }
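Combining the E-step and M-step above, a compact EM-PPCA sketch (my own implementation, using the same notation M = σ²I + W^TW; initialization and iteration count are arbitrary choices) looks as follows.

```python
# EM for PPCA on an N x F data matrix X, latent dimension d.
import numpy as np

def ppca_em(X, d, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, F = X.shape
    mu = X.mean(axis=0)                        # closed-form mu
    Xc = X - mu
    W = rng.normal(size=(F, d))                # random initialization
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        M = sigma2 * np.eye(d) + W.T @ W
        Minv = np.linalg.inv(M)
        Ey = Xc @ W @ Minv                     # rows are E{y_i}
        Eyy = N * sigma2 * Minv + Ey.T @ Ey    # sum_i E{y_i y_i^T}
        # M-step: update W and sigma^2
        W = (Xc.T @ Ey) @ np.linalg.inv(Eyy)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ey * (Xc @ W))
                  + np.trace(Eyy @ W.T @ W)) / (N * F)
    return W, mu, sigma2
```

Note that no F×F covariance matrix is ever formed; every step works with F×d and d×d quantities, which is the source of the O(NFd) complexity argument on the next slide.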
41 Why EM-PPCA? A complexity of O(NFd), which can be significantly smaller than the O(NF²) required to compute the data covariance matrix. A combination of a probabilistic model and EM allows us to deal with missing values in the dataset.
42 Why EM-PPCA? A combination of a probabilistic model and EM allows us to deal with missing values in the dataset. (Figure: component densities p(x_i | y_i, W_1, μ_1, σ), ..., p(x_i | y_i, W_9, μ_9, σ).)
43 Summary We saw how to perform parameter estimation and inference using ML and EM in the following graphical model: (Figure: plate diagram with latent y_i, observed x_i, i = 1, ..., N, and parameters W, μ, σ².)
44 What will we see next? EM in the following graphical models:
Dynamic data: a chain of latent variables y_1, y_2, ..., y_T generating observations x_1, x_2, ..., x_T.
Spatial data: a grid of latent variables y_{uv}, y_{u,v+1}, y_{u+1,v}, y_{u+1,v+1} generating observations x_{uv}, x_{u,v+1}, x_{u+1,v}, x_{u+1,v+1}.