Machine Learning for Signal Processing Linear Gaussian Models


Machine Learning for Signal Processing: Linear Gaussian Models. Class 21. 12 Nov 2013. Instructor: Bhiksha Raj. 11-755/18-797

Administrivia. HW3 is up. Projects: please send us an update.

Recap: MAP estimators. MAP (Maximum A Posteriori): find a "best guess" for $y$ (statistically), given known $x$: $\hat{y} = \arg\max_Y P(Y \mid x)$.

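Recap: MAP estimation. $x$ and $y$ are jointly Gaussian. Stack them as $z = \begin{bmatrix} x \\ y \end{bmatrix}$, with $E[z] = \mu_z = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}$ and $\mathrm{Var}(z) = C_{zz} = \begin{bmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{bmatrix}$, where $C_{xy} = E[(x-\mu_x)(y-\mu_y)^\top]$. Then $P(z) = N(\mu_z, C_{zz}) = \frac{1}{\sqrt{(2\pi)^{d}\,|C_{zz}|}} \exp\left(-0.5\,(z-\mu_z)^\top C_{zz}^{-1}(z-\mu_z)\right)$, where $d$ is the dimensionality of $z$. $z$ is Gaussian.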

MAP estimation: Gaussian PDF. [Figure: the joint Gaussian density plotted over the X and Y axes.]

MAP estimation: the Gaussian at a particular value of X. [Figure: a slice through the joint density at $x = x_0$.]

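Conditional probability of $y \mid x$. $P(y \mid x) = N\left(\mu_y + C_{yx}C_{xx}^{-1}(x-\mu_x),\; C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}\right)$, so $E[y \mid x] = \mu_y + C_{yx}C_{xx}^{-1}(x-\mu_x)$ and $\mathrm{Var}(y \mid x) = C_{yy} - C_{yx}C_{xx}^{-1}C_{xy}$. The conditional probability of $y$ given $x$ is also Gaussian. The slice in the figure is Gaussian. The mean of this Gaussian is a function of $x$. The variance of $y$ reduces if $x$ is known: uncertainty is reduced.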
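
The conditioning formulas above can be sketched in a few lines of NumPy. The function name `condition_gaussian`, its arguments, and the 2-D example numbers are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def condition_gaussian(mu, C, x, dx):
    """Mean and covariance of y | x when z = [x; y] ~ N(mu, C).

    dx is the number of leading dimensions of z that belong to x.
    (Hypothetical helper written for this illustration.)
    """
    mu_x, mu_y = mu[:dx], mu[dx:]
    C_xx, C_xy = C[:dx, :dx], C[:dx, dx:]
    C_yy = C[dx:, dx:]
    # C_yx C_xx^{-1}, computed with a linear solve (C_xx is symmetric).
    gain = np.linalg.solve(C_xx, C_xy).T
    mean = mu_y + gain @ (x - mu_x)
    cov = C_yy - gain @ C_xy
    return mean, cov

# Made-up 2-D example: scalar x and scalar y.
mu = np.array([1.0, 2.0])
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])
mean, cov = condition_gaussian(mu, C, x=np.array([3.0]), dx=1)
```

In this example the conditional variance comes out smaller than the prior variance of $y$ (`C[1, 1]`), which is the "uncertainty is reduced" statement in numbers.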

MAP estimation: the Gaussian at a particular value of X. [Figure: the slice at $x = x_0$, with the most likely value marked.]

MAP estimation of a Gaussian RV. $\hat{y} = \arg\max_y P(y \mid x) = E[y \mid x]$. [Figure: the slice at $x = x_0$.]

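It's also a minimum-mean-squared error estimate. Minimize the error: $Err = E[\|y - \hat{y}\|^2 \mid x] = E[(y-\hat{y})^\top (y-\hat{y}) \mid x] = E[y^\top y + \hat{y}^\top\hat{y} - 2\hat{y}^\top y \mid x] = E[y^\top y \mid x] + \hat{y}^\top\hat{y} - 2\hat{y}^\top E[y \mid x]$. Differentiating and equating to 0: $d\,Err = 2\hat{y}^\top d\hat{y} - 2E[y \mid x]^\top d\hat{y} = 0$, which gives $\hat{y} = E[y \mid x]$. The MMSE estimate is the mean of the distribution.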

For the Gaussian: MAP = MMSE. The most likely value is also the MEAN value. This would be true of any symmetric unimodal distribution.

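MMSE estimates for mixture distributions. Let $P(y \mid x)$ be a mixture density: $P(y \mid x) = \sum_k P(k)\,P(y \mid k, x)$. The MMSE estimate of $y$ is given by $E[y \mid x] = \int y \sum_k P(k)\,P(y \mid k, x)\,dy = \sum_k P(k) \int y\,P(y \mid k, x)\,dy = \sum_k P(k)\,E[y \mid k, x]$. Just a weighted combination of the MMSE estimates from the component distributions.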

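MMSE estimates from a Gaussian mixture. Let $P(x, y)$ be a Gaussian mixture: with $z = [x; y]$, $P(z) = \sum_k P(k)\,N(z; \mu_k, C_k)$. Then $P(y \mid x)$ is also a Gaussian mixture: $P(y \mid x) = \frac{P(x, y)}{P(x)} = \sum_k \frac{P(k, x, y)}{P(x)} = \sum_k P(k \mid x)\,P(y \mid x, k)$.

MMSE estimates from a Gaussian mixture. $P(y \mid x) = \sum_k P(k \mid x)\,P(y \mid x, k)$ is a Gaussian mixture. Each component is jointly Gaussian, $P(x, y \mid k) = N\left(\begin{bmatrix} \mu_{k,x} \\ \mu_{k,y} \end{bmatrix}, \begin{bmatrix} C_{k,xx} & C_{k,xy} \\ C_{k,yx} & C_{k,yy} \end{bmatrix}\right)$, so $P(y \mid x, k) = N\left(\mu_{k,y} + C_{k,yx}C_{k,xx}^{-1}(x - \mu_{k,x}),\; C_{k,yy} - C_{k,yx}C_{k,xx}^{-1}C_{k,xy}\right)$ and $P(y \mid x) = \sum_k P(k \mid x)\,N\left(\mu_{k,y} + C_{k,yx}C_{k,xx}^{-1}(x - \mu_{k,x}),\; C_{k,yy} - C_{k,yx}C_{k,xx}^{-1}C_{k,xy}\right)$.

MMSE estimates from a Gaussian mixture. $P(y \mid x)$ is a mixture Gaussian density, so $E[y \mid x]$ is also a mixture: $E[y \mid x] = \sum_k P(k \mid x)\,E[y \mid k, x] = \sum_k P(k \mid x)\left(\mu_{k,y} + C_{k,yx}C_{k,xx}^{-1}(x - \mu_{k,x})\right)$.

MMSE estimates from a Gaussian mixture. A weighted combination of MMSE estimates obtained from individual Gaussians! The weight $P(k \mid x)$ is easily computed too: $P(k \mid x) = \frac{P(k)\,N(x; \mu_{k,x}, C_{k,xx})}{\sum_{k'} P(k')\,N(x; \mu_{k',x}, C_{k',xx})}$.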
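
As a rough illustration of the weighted combination described above, the sketch below computes $E[y \mid x]$ for a joint Gaussian mixture. The helper names (`gaussian_pdf`, `gmm_mmse`) and the two-component toy parameters are assumptions made for this example, not the lecture's code.

```python
import numpy as np

def gaussian_pdf(x, mu, C):
    """Density of N(mu, C) evaluated at the vector x."""
    d = x.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(C, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm

def gmm_mmse(x, priors, mus, covs, dx):
    """MMSE estimate E[y | x] under a joint GMM on z = [x; y]."""
    weights, means = [], []
    for p, mu, C in zip(priors, mus, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        C_xx, C_xy = C[:dx, :dx], C[:dx, dx:]
        # Unnormalized posterior weight: P(k) N(x; mu_kx, C_kxx).
        weights.append(p * gaussian_pdf(x, mu_x, C_xx))
        # Per-component conditional mean E[y | x, k].
        means.append(mu_y + np.linalg.solve(C_xx, C_xy).T @ (x - mu_x))
    weights = np.array(weights) / np.sum(weights)
    return weights @ np.array(means), weights

# Toy mixture: two symmetric components, scalar x and scalar y.
priors = [0.5, 0.5]
mus = [np.array([-2.0, -2.0]), np.array([2.0, 2.0])]
covs = [np.array([[1.0, 0.5], [0.5, 1.0]])] * 2

y_mid, w_mid = gmm_mmse(np.array([0.0]), priors, mus, covs, dx=1)
y_right, w_right = gmm_mmse(np.array([2.0]), priors, mus, covs, dx=1)
```

At `x = 0` the two components are equally likely and their individual estimates cancel; at `x = 2` the second component dominates the weights.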

MMSE estimates from a Gaussian mixture. A mixture of estimates from individual Gaussians.

Voice Morphing. Align training recordings from both speakers (cepstral vector sequence). Learn a GMM on joint vectors. Given speech from one speaker, find the MMSE estimate of the other. Synthesize from cepstra.

MMSE with GMM: Voice Transformation. Festvox GMM transformation suite (Toda). [Table of audio examples: source speakers awb, bdl, jmk, slt against target speakers awb, bdl, jmk, slt.]

MAP / ML / MMSE. General statistical estimators. All are used to predict a variable, based on other parameters related to it. Most common assumption: data are Gaussian, all RVs are Gaussian. Other probability densities may also be used. For Gaussians, relationships are linear, as we saw.

Gaussians and more Gaussians: Linear Gaussian Models. But first, a recap.

A Brief Recap. $D \approx BC$. Principal component analysis: find the K bases that best explain the given data. Find $B$ and $C$ such that the difference between $D$ and $BC$ is minimum, while constraining that the columns of $B$ are orthonormal.

Remember Eigenfaces. Approximate every face $f$ as $f = w_{f,1}V_1 + w_{f,2}V_2 + w_{f,3}V_3 + \dots + w_{f,k}V_k$. Estimate $V$ to minimize the squared error. The error is unexplained by $V_1 \dots V_k$. The error is orthogonal to the Eigenfaces.

Karhunen Loeve vs. PCA. Eigenvectors of the Correlation matrix: principal directions of the tightest ellipse centered on the origin; directions that retain maximum energy. Eigenvectors of the Covariance matrix: principal directions of the tightest ellipse centered on the data; directions that retain maximum variance.
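
The difference between the two decompositions shows up clearly on data whose mean is far from the origin. The following sketch uses made-up 2-D data (an assumption for illustration) and compares the leading eigenvector of the correlation matrix with that of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data: wide along the first axis, mean far from the origin.
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5]) + np.array([10.0, 10.0])

R = X.T @ X / len(X)          # correlation matrix (second moment about the origin)
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / len(X)        # covariance matrix (second moment about the mean)

# np.linalg.eigh returns eigenvalues in ascending order,
# so the last column is the leading eigenvector.
klt_dir = np.linalg.eigh(R)[1][:, -1]
pca_dir = np.linalg.eigh(S)[1][:, -1]

diagonal = np.array([1.0, 1.0]) / np.sqrt(2.0)
```

Here the leading correlation eigenvector points roughly toward the data mean (the diagonal direction), while the leading covariance eigenvector follows the direction of largest spread around the mean (the first axis).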

Karhunen Loeve vs. PCA. If the data are naturally centered at the origin, KLT == PCA. The following slides refer to PCA! Assume data centered at the origin for simplicity. Not essential, as we will see.


Eigen Representation. $x_1 = w_{11}V_1 + e_1$. Illustration assuming 3D space. K-dimensional representation. Error is orthogonal to the representation. Weight and error are specific to the data instance.

Representation. Error is at 90° to the eigenface. $x_2 = w_{12}V_1 + e_2$. Illustration assuming 3D space. K-dimensional representation. Error is orthogonal to the representation. Weight and error are specific to the data instance.

Representation. All data with the same representation $wV_1$ lie on a plane orthogonal to $wV_1$. K-dimensional representation. Error is orthogonal to the representation.

With 2 bases. Error is at 90° to the eigenfaces. $x_1 = w_{11}V_1 + w_{21}V_2 + e_1$. Illustration assuming 3D space. K-dimensional representation. Error is orthogonal to the representation. Weight and error are specific to the data instance.

With 2 bases. Error is at 90° to the eigenfaces. $x_2 = w_{12}V_1 + w_{22}V_2 + e_2$. Illustration assuming 3D space. K-dimensional representation. Error is orthogonal to the representation. Weight and error are specific to the data instance.

In Vector Form. Error is at 90° to the eigenfaces. $x_i = w_{1i}V_1 + w_{2i}V_2 + e_i$, or, stacking the bases as columns of $V$ and the weights into $w$, $x = Vw + e$. K-dimensional representation: $x$ is a D dimensional vector, $V$ is a D × K matrix, $w$ is a K dimensional vector, $e$ is a D dimensional vector. Error is orthogonal to the representation. Weight and error are specific to the data instance.

Learning PCA. For the given data: find the K-dimensional subspace such that it captures most of the variance in the data. Variance in the remaining subspace is minimal.

Constraints. Error is at 90° to the eigenface. $V^\top V = I$: eigenvectors are orthogonal to each other. For every vector, the error is orthogonal to the eigenvectors: $e^\top V = 0$. Over the collection of data: Average $ww^\top$ = Diagonal: eigen representations are uncorrelated. $e^\top e$ = minimum: error variance is minimum. Mean of error is 0.
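
These constraints can be checked numerically on a PCA fit. The sketch below assumes randomly generated, centered data with vectors stored as columns; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K, N = 5, 2, 1000
# Hypothetical correlated data; columns of X are the data vectors.
X = rng.normal(size=(D, D)) @ rng.normal(size=(D, N))
X = X - X.mean(axis=1, keepdims=True)      # center the data

eigvals, eigvecs = np.linalg.eigh(X @ X.T / N)
V = eigvecs[:, ::-1][:, :K]                # top-K eigenvectors, D x K
W = V.T @ X                                # weights w_i as columns, K x N
Err = X - V @ W                            # errors e_i as columns, D x N

orthonormal = np.allclose(V.T @ V, np.eye(K))          # V^T V = I
error_orthogonal = np.allclose(V.T @ Err, 0.0)         # e^T V = 0 for every e
weight_cov = W @ W.T / N
weights_uncorrelated = np.allclose(weight_cov, np.diag(np.diag(weight_cov)))
error_zero_mean = np.allclose(Err.mean(axis=1), 0.0)
```

All four flags should come out `True`: the properties follow from taking $V$ as eigenvectors of the data covariance.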

A Statistical Formulation of PCA. Error is at 90° to the eigenface. $x = Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$. $x$ is a random variable generated according to a linear relation. $w$ is drawn from a K-dimensional Gaussian with diagonal covariance. $e$ is drawn from a 0-mean, (D-K)-rank, D-dimensional Gaussian. Estimate $V$ (and $B$) given examples of $x$.

Linear Gaussian Models!! $x = Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$. $x$ is a random variable generated according to a linear relation. $w$ is drawn from a Gaussian. $e$ is drawn from a 0-mean Gaussian. Estimate $V$ given examples of $x$. In the process also estimate $B$ and $E$.

Linear Gaussian Models. $x = \mu + Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$. Observations are linear functions of two uncorrelated Gaussian random variables: a "weight" variable $w$ and an "error" variable $e$. The error is not correlated to the weight: $E[e\,w^\top] = 0$. Learning LGMs: estimate the parameters of the model given instances of $x$. This is the problem of learning the distribution of a Gaussian RV.

LGMs: Probability Density. $x = \mu + Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$. The mean of $x$: $E[x] = \mu + V E[w] + E[e] = \mu$. The covariance of $x$: $E[(x - E[x])(x - E[x])^\top] = VBV^\top + E$.

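The probability of $x$. $x = \mu + Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$, so $x \sim N(\mu, VBV^\top + E)$ and $P(x) = \frac{1}{\sqrt{(2\pi)^D\,|VBV^\top + E|}} \exp\left(-0.5\,(x-\mu)^\top (VBV^\top + E)^{-1}(x-\mu)\right)$. $x$ is a linear function of Gaussians: $x$ is also Gaussian. Its mean and variance are as given.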
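
A quick way to see that the mean and covariance come out as stated is to sample from the model and compare moments. In this sketch the dimensions, $\mu$, $V$, $B$ and $E$ are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
D, K, N = 4, 2, 100_000
mu = np.array([1.0, -1.0, 0.5, 2.0])
V = np.array([[1.0, 0.0],
              [0.5, -1.0],
              [0.0, 2.0],
              [1.0, 1.0]])
B = np.diag([2.0, 0.5])               # covariance of the weight w
E = np.diag([0.1, 0.2, 0.3, 0.4])     # covariance of the error e

w = rng.multivariate_normal(np.zeros(K), B, size=N)   # N x K, rows are w_i
e = rng.multivariate_normal(np.zeros(D), E, size=N)   # N x D, rows are e_i
x = mu + w @ V.T + e                                  # N x D, rows are x_i

model_cov = V @ B @ V.T + E
sample_cov = np.cov(x, rowvar=False)
max_cov_gap = np.abs(sample_cov - model_cov).max()
max_mean_gap = np.abs(x.mean(axis=0) - mu).max()
```

With this many samples both gaps should be small relative to the entries of `model_cov`, up to sampling noise.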

Estimating the variables of the model. $x = \mu + Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$, $x \sim N(\mu, VBV^\top + E)$. Estimating the variables of the LGM is equivalent to estimating $P(x)$. The variables are $\mu$, $V$, $B$ and $E$.

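Estimating the model. $x = \mu + Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$, $x \sim N(\mu, VBV^\top + E)$. The model is indeterminate: $Vw = VCC^{-1}w = (VC)(C^{-1}w)$ for any invertible $C$. We need extra constraints to make the solution unique. Usual constraint: $B = I$. The variance of $w$ is an identity matrix.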
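
The indeterminacy can be verified directly: rescaling $V$ by an invertible $C$ and the weight covariance by $C^{-1}$ leaves the covariance of $x$ unchanged. The sketch below uses arbitrary matrices (assumptions for illustration) and also shows that the Cholesky factor of $B$ is one choice of $C$ that yields $B = I$.

```python
import numpy as np

rng = np.random.default_rng(3)
V = rng.normal(size=(4, 2))
B = np.diag([2.0, 0.5])
C = np.array([[2.0, 1.0],
              [0.0, 1.0]])             # any invertible K x K matrix

C_inv = np.linalg.inv(C)
V2 = V @ C                             # new bases V C
B2 = C_inv @ B @ C_inv.T               # covariance of the new weight C^{-1} w
same_cov = np.allclose(V @ B @ V.T, V2 @ B2 @ V2.T)

# Choosing C with C C^T = B whitens the weights, so the new B is the identity.
C_white = np.linalg.cholesky(B)
V_white = V @ C_white
white_cov_ok = np.allclose(V @ B @ V.T, V_white @ V_white.T)
```

Even with $B = I$ fixed, $V$ is still only determined up to a rotation of its columns, since $VQ(VQ)^\top = VV^\top$ for any orthogonal $Q$.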

Estimating the variables of the model. $x = \mu + Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$, $x \sim N(\mu, VV^\top + E)$. Estimating the variables of the LGM is equivalent to estimating $P(x)$. The variables are $\mu$, $V$ and $E$.

The Maximum Likelihood Estimate. $x \sim N(\mu, VV^\top + E)$. Given training set $x_1, x_2, \dots, x_N$, find $\mu$, $V$, $E$. The ML estimate of $\mu$ does not depend on the covariance of the Gaussian: $\mu = \frac{1}{N}\sum_i x_i$.

Centered Data. We can safely assume "centered" data: $\mu = 0$. If the data are not centered, "center" it: estimate the mean of the data (which is the maximum likelihood estimate) and subtract it from the data.

Simplified Model. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$, $x \sim N(0, VV^\top + E)$. Estimating the variables of the LGM is equivalent to estimating $P(x)$. The variables are $V$ and $E$.

Estimating the model. $x = Vw + e$, $x \sim N(0, VV^\top + E)$. Given a collection of $x_i$ terms $x_1, x_2, \dots, x_N$, estimate $V$ and $E$. $w$ is unknown for each $x$. But if we assume we know $w$ for each $x$, then what do we get:

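Estimating the Parameters. $x_i = Vw_i + e$, $P(e) = N(0, E)$, so $P(x \mid w) = N(Vw, E) = \frac{1}{\sqrt{(2\pi)^D\,|E|}} \exp\left(-0.5\,(x - Vw)^\top E^{-1}(x - Vw)\right)$. We will use a maximum-likelihood estimate. The log-likelihood of $x_1 \dots x_N$ knowing their $w_i$s, up to an additive constant: $\log P(x_1 \dots x_N \mid w_1 \dots w_N) = 0.5\,N \log|E^{-1}| - 0.5 \sum_i (x_i - Vw_i)^\top E^{-1}(x_i - Vw_i)$.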

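Maximizing the log-likelihood. $LL = 0.5\,N \log|E^{-1}| - 0.5 \sum_i (x_i - Vw_i)^\top E^{-1}(x_i - Vw_i)$. Differentiating w.r.t. $V$ and setting to 0: $\sum_i 2E^{-1}(x_i - Vw_i)w_i^\top = 0$, which gives $V = \left(\sum_i x_i w_i^\top\right)\left(\sum_i w_i w_i^\top\right)^{-1}$. Differentiating w.r.t. $E^{-1}$ and setting to 0: $E = \frac{1}{N}\sum_i x_i x_i^\top - \frac{1}{N} V \sum_i w_i x_i^\top$.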
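
If the weights were observed, the two closed-form estimates above would be a few matrix products. The sketch below simulates data from a made-up model (the true `V_true` and `E_true` are assumptions for illustration) and applies the formulas with known $w_i$.

```python
import numpy as np

rng = np.random.default_rng(4)
D, K, N = 4, 2, 50_000
V_true = np.array([[1.0, 0.0],
                   [0.5, -1.0],
                   [0.0, 2.0],
                   [1.0, 1.0]])
E_true = np.diag([0.1, 0.2, 0.3, 0.4])

W = rng.normal(size=(K, N))                       # columns are the known w_i
noise = rng.multivariate_normal(np.zeros(D), E_true, size=N).T
X = V_true @ W + noise                            # columns are x_i

# V = (sum_i x_i w_i^T) (sum_i w_i w_i^T)^{-1}
V_hat = (X @ W.T) @ np.linalg.inv(W @ W.T)
# E = (1/N) sum_i x_i x_i^T - (1/N) V sum_i w_i x_i^T
E_hat = (X @ X.T - V_hat @ (W @ X.T)) / N

# The same E written as the covariance of the residuals x_i - V w_i.
residual = X - V_hat @ W
E_from_residual = residual @ residual.T / N
```

The estimate of $E$ coincides with the residual covariance because, at the optimal $V$, the cross terms cancel.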

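Estimating LGMs: if we know $w$. $x_i = Vw_i + e$, $P(e) = N(0, E)$. $V = \left(\sum_i x_i w_i^\top\right)\left(\sum_i w_i w_i^\top\right)^{-1}$ and $E = \frac{1}{N}\sum_i x_i x_i^\top - \frac{1}{N} V \sum_i w_i x_i^\top$. But in reality we don't know the $w$ for each $x$. So how to deal with this? EM.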

Recall EM. [Figure: dice example. Each observed number is an instance from the blue dice or from the red dice, with the dice unknown; the observations are divided between a collection of blue numbers and a collection of red numbers.] We figured out how to compute parameters if we knew the missing information. Then we "fragmented" the observations according to the posterior probability $P(z \mid x)$ and counted as usual. In effect we took the expectation with respect to the a posteriori probability of the missing data: $P(z \mid x)$.

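EM for LGMs. $x_i = Vw_i + e$, $P(e) = N(0, E)$. With known $w$: $V = \left(\sum_i x_i w_i^\top\right)\left(\sum_i w_i w_i^\top\right)^{-1}$ and $E = \frac{1}{N}\sum_i x_i x_i^\top - \frac{1}{N} V \sum_i w_i x_i^\top$. Replace unseen data terms with expectations taken w.r.t. $P(w \mid x_i)$: $V = \left(\sum_i x_i E[w \mid x_i]^\top\right)\left(\sum_i E[ww^\top \mid x_i]\right)^{-1}$ and $E = \frac{1}{N}\sum_i x_i x_i^\top - \frac{1}{N} V \sum_i E[w \mid x_i]\,x_i^\top$.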

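Expected Value of $w$ given $x$. $x = Vw + e$, $P(e) = N(0, E)$, $P(w) = N(0, I)$, $P(x) = N(0, VV^\top + E)$. $x$ and $w$ are jointly Gaussian! $x$ is Gaussian, $w$ is Gaussian, and they are linearly related. $z = \begin{bmatrix} x \\ w \end{bmatrix}$, $P(z) = N(\mu_z, C_{zz})$.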

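Expected Value of $w$ given $x$. $x = Vw + e$, $P(x) = N(0, VV^\top + E)$, $P(w) = N(0, I)$. With $z = \begin{bmatrix} x \\ w \end{bmatrix}$ and $P(z) = N(\mu_z, C_{zz})$: $\mu_z = \begin{bmatrix} \mu_x \\ \mu_w \end{bmatrix} = 0$, $C_{xx} = VV^\top + E$, $C_{ww} = I$, $C_{xw} = E[(x - \mu_x)(w - \mu_w)^\top] = V$, so $C_{zz} = \begin{bmatrix} C_{xx} & C_{xw} \\ C_{wx} & C_{ww} \end{bmatrix} = \begin{bmatrix} VV^\top + E & V \\ V^\top & I \end{bmatrix}$. $x$ and $w$ are jointly Gaussian!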

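The conditional expectation of $w$ given $x$. With $P(z) = N(\mu_z, C_{zz})$, $\mu_z = 0$ and $C_{zz} = \begin{bmatrix} VV^\top + E & V \\ V^\top & I \end{bmatrix}$, $P(w \mid x)$ is a Gaussian: $P(w \mid x) = N\left(V^\top(VV^\top + E)^{-1}x,\; I - V^\top(VV^\top + E)^{-1}V\right)$. Hence $E[w \mid x_i] = V^\top(VV^\top + E)^{-1}x_i$, $\mathrm{Var}(w \mid x) = I - V^\top(VV^\top + E)^{-1}V$, and $E[ww^\top \mid x_i] = \mathrm{Var}(w \mid x_i) + E[w \mid x_i]E[w \mid x_i]^\top = I - V^\top(VV^\top + E)^{-1}V + E[w \mid x_i]E[w \mid x_i]^\top$.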
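
The posterior of $w$ can be computed with a single linear solve. The sketch below (function name and example values are assumptions) returns $E[w \mid x_i]$ for each data vector and the shared $\mathrm{Var}(w \mid x)$.

```python
import numpy as np

def posterior_w(X, V, E):
    """E[w | x_i] for each column x_i of X, plus the shared Var(w | x).

    Hypothetical helper for the model x = V w + e, w ~ N(0, I), e ~ N(0, E).
    """
    K = V.shape[1]
    # G = V^T (V V^T + E)^{-1}; the solve avoids an explicit inverse and the
    # transpose is valid because V V^T + E is symmetric.
    G = np.linalg.solve(V @ V.T + E, V).T
    mean = G @ X
    var = np.eye(K) - G @ V
    return mean, var

V = np.array([[1.0, 0.0],
              [0.5, -1.0],
              [0.0, 2.0],
              [1.0, 1.0]])
E = np.diag([0.1, 0.2, 0.3, 0.4])
X = np.array([[1.0], [0.0], [-1.0], [2.0]])   # a single data vector as a column

mean, var = posterior_w(X, V, E)
```

The same posterior can be written in the equivalent form $\mathrm{Var}(w \mid x) = (I + V^\top E^{-1}V)^{-1}$ and $E[w \mid x] = (I + V^\top E^{-1}V)^{-1}V^\top E^{-1}x$, which only needs a K × K inverse when $E$ is cheap to invert.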

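LGM: The complete EM algorithm. Initialize $V$ and $E$. E step: $E[w \mid x_i] = V^\top(VV^\top + E)^{-1}x_i$ and $E[ww^\top \mid x_i] = I - V^\top(VV^\top + E)^{-1}V + E[w \mid x_i]E[w \mid x_i]^\top$. M step: $V = \left(\sum_i x_i E[w \mid x_i]^\top\right)\left(\sum_i E[ww^\top \mid x_i]\right)^{-1}$ and $E = \frac{1}{N}\sum_i x_i x_i^\top - \frac{1}{N} V \sum_i E[w \mid x_i]\,x_i^\top$.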
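
The E and M steps above can be put together in a short loop. This is a minimal sketch, assuming centered data stored as columns, a random initial $V$, and $E$ initialized to the identity; the function names, iteration count, and synthetic data are illustrative choices rather than the lecture's implementation.

```python
import numpy as np

def log_likelihood(S, N, V, E):
    """Log-likelihood of N centered points with second moment S under
    N(0, V V^T + E), dropping the additive constant."""
    C = V @ V.T + E
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * N * (logdet + np.trace(np.linalg.solve(C, S)))

def lgm_em(X, K, n_iter=50, seed=0):
    """EM for x = V w + e with w ~ N(0, I), e ~ N(0, E); X is D x N, centered."""
    D, N = X.shape
    rng = np.random.default_rng(seed)
    V = rng.normal(size=(D, K))            # assumed random initialization
    E = np.eye(D)
    S = X @ X.T / N                        # (1/N) sum_i x_i x_i^T
    history = [log_likelihood(S, N, V, E)]
    for _ in range(n_iter):
        # E step: posterior moments of w for every x_i.
        G = np.linalg.solve(V @ V.T + E, V).T           # V^T (V V^T + E)^{-1}
        Ew = G @ X                                      # columns are E[w | x_i]
        sum_Eww = N * (np.eye(K) - G @ V) + Ew @ Ew.T   # sum_i E[w w^T | x_i]
        # M step: closed-form updates with expectations in place of w.
        V = (X @ Ew.T) @ np.linalg.inv(sum_Eww)
        E = S - V @ (Ew @ X.T) / N
        history.append(log_likelihood(S, N, V, E))
    return V, E, np.array(history)

# Synthetic centered data from a made-up 4-D model with 2 latent dimensions.
rng = np.random.default_rng(5)
V_true = np.array([[1.0, 0.0], [0.5, -1.0], [0.0, 2.0], [1.0, 1.0]])
E_true = np.diag([0.1, 0.2, 0.3, 0.4])
X = V_true @ rng.normal(size=(2, 2000))
X = X + rng.multivariate_normal(np.zeros(4), E_true, size=2000).T
X = X - X.mean(axis=1, keepdims=True)

V_hat, E_hat, history = lgm_em(X, K=2)
```

The recorded log-likelihood should never decrease from one iteration to the next. Because $E$ is left as a full, unconstrained covariance here, the fit can only drive $VV^\top + E$ toward the sample covariance of the data; it does not recover `V_true` itself.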

So what have we achieved? Employed a complicated EM algorithm to learn a Gaussian PDF for a variable $x$. What have we gained??? Next class: PCA (Sensible PCA, EM algorithms for PCA) and Factor Analysis (FA for feature extraction).

LGMs: Application 1. Learning principal components. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$. Find directions that capture most of the variation in the data. Error is orthogonal to these variations.

LGMs: Application 2. Learning with insufficient data. [Figure: full covariance.] The full covariance matrix of a Gaussian has $D^2$ terms. A full covariance captures the relationships between variables. Problem: it needs a lot of data to estimate robustly.

To be continued. Other applications: next class.