Finite Mixture Models and Expectation Maximization. Most slides are from: Dr. Mario Figueiredo, Dr. Anil Jain and Dr. Rong Jin
|
|
- Ambrose Andrews
- 5 years ago
- Views:
Transcription
1 Fnte Mxture Models and Expectaton Maxmzaton Most sldes are from: Dr. Maro Fgueredo, Dr. Anl Jan and Dr. Rong Jn
2 Recall: The Supervsed Learnng Problem Gven a set of n samples X {(x, y )},,,n Chapter 3 of DHS Assume examples n each class come from a parameterzed Gaussan densty Estmate the parameters (mean, varance) of the Gaussan densty for each class, and use them for classfcaton Estmaton uses Maxmum Lkelhood approach.
3 Revew of Maxmum Lkelhood Gven n..d. examples from a densty p(x;θ), wth known form p and unknown parameter θ. Goal: estmate θ, denoted by θˆ, such that the observed data s most lkely to be from the dstrbuton wth that θ. Steps nvolved: Wrte the lkelhood of the observed data. Maxmze the lkelhood wth respect to the parameter. 3
4 Example: D Gaussan Dstrbuton Maxmum Lkelhood Estmaton of Mean of a Gaussan Dstrbuton 0.4 True Densty MLE from 0 Samples MLE from 00 samples x
5 Example: D Gaussan Dstrbuton 5 D Parameter Estmaton for Gaussan Dstrbuton 4 3 Blue: True densty Red : Estmated from 50 examples. y x
6 Multmodal Class Dstrbutons A sngle Gaussan may not accurately model the classes. Fnd subclasses n handwrtten onlne characters (,000 characters wrtten by 00 wrters) Performance mproves by modelng subclasses Connell and Jan, Wrter Adaptaton for Onlne Handwrtng Recognton, IEEE PAMI, Mar 00
7 Multmodal Classes Handwrtten f vs y classfcaton task. 7 A sngle Gaussan dstrbuton may not model the classes accurately.
8 An extreme example of multmodal classes 0 Lmtatons of Unmodal class modellng y Red vs. Blue classfcaton. The classes are well separated. However, ncorrect model assumptons result n hgh classfcaton error x The red class s a mxture of two Gaussan dstrbutons There s no class label nformaton, when modelng the densty of just the red class.
9 Fnte mxtures k random sources, probablty densty functons f (x),,,k f (x) f (x) Choose at random X random varable f (x) f k (x) 9
10 Fnte mxtures Example: 3 speces (Irs) 0
11 Fnte mxtures f (x) f (x) X Choose at random, Prob.(source ) α random varable f (x) f k (x) Condtonal: Jont: f (x source ) f (x) f (x and source ) f (x) α Uncondtonal: f (x and source ) f(x) all sources k α f (x)
12 Fnte mxtures f (x) k α f(x) Component denstes Mxng probabltes: α 0 and k α Parameterzed components (e.g., Gaussan): f (x) f (x θ ) Θ f (x Θ) k α f (x θ) { θ,θ,...,θk, α, α,..., αk}
13 Gaussan mxtures f (x θ ) Gaussan Arbtrary covarances: f (x θ ) N(x µ, C ) Θ { µ, µ,..., µ k, C, C,..., Ck, α, α,... αk } Common covarance: f (x θ ) N(x µ, C) 3 Θ { µ, µ,..., µ k, C, α, α,... αk }
14 Mxture fttng / estmaton () () (n) Data: n ndependent observatons, x {x, x,..., x } Goals: estmate the parameter set Θ, maybe classfy the observatons Example: - How many speces? Mean of each speces? - Whch ponts belong to each speces? Classfed data (classes unknown) Observed data 4
15 Gaussan mxtures (d), an example µ σ 3 µ 4 σ µ 7 3 σ α 0.6 α 0.3 α 0. 3
16 6 Gaussan mxtures, an R example k µ 0 0 C µ C 4 4 µ C (500 ponts)
17 Uses of mxtures n pattern recognton Unsupervsed learnng (model-based clusterng): - each component models one cluster - clusterng mxture fttng Observatons: - unclassfed ponts Goals: - fnd the classes, - classfy the ponts 7
18 Uses of mxtures n pattern recognton Mxtures can approxmate arbtrary denstes Good to represent class condtonal denstes n supervsed learnng Example: - two strongly non-gaussan classes. - Use mxtures to model each class-condtonal densty. 8
19 Uses of mxtures n pattern recognton Fnd subclasses (lexemes) Eg. onlne characters Performance mproves by modelng subclasses,000 characters wrtten by 00 wrters Connell and Jan, Wrter Adaptaton for Onlne Handwrtng Recognton, IEEE PAMI, 00
20 Fttng mxtures n ndependent observatons x {x (), x (),..., x (n) } Maxmum (log)lkelhood (ML) estmate of Θ: Θ ˆ arg max Θ L(x, Θ) L(x, Θ) log n j f (x ( j) Θ) n k log j ( j) α f ( x θ ) 0 mxture ML estmate has no closed-form soluton
21 Gaussan mxtures: A pecular type of ML Θ { µ, µ,..., µ k, C, C,..., Ck, α, α,... αk } Maxmum (log)lkelhood (ML) estmate of Θ: Θ ˆ arg max L(x, Θ) Θ Subject to: C α 0 postve and defnte k α Problem: the lkelhood functon s unbounded as det( C ) 0 There s no global maxmum. Unusual goal: a good local maxmum
22 A Pecular type of ML problem Example: a -component Gaussan mxture f (x µ,µ,σ, α) α πσ e (x µ σ ) + α π e (x µ ) Some data ponts: { x, x,..., x n} µ x L(x, Θ) log α πσ + α π e (x µ ) + n j log(...), as σ 0
23 Fttng mxtures: a mssng data problem ML estmate has no closed-form soluton Standard alternatve: expectaton-maxmzaton (EM) algorthm Mssng data problem: Observed data: x {x (), x (),..., x (n) () () (n) Mssng data: z { z, z,..., z } Mssng labels ( colors ) } z ( j) [ ] ( j) ( j) z, z,..., z ( kj), [ ] T 3 at poston x ( j) generated by component
24 Fttng mxtures: a mssng data problem Observed data: x {x z { z (), x, z (),..., x,..., z (n) () () (n) Mssng data: } } z ( j) [ ] ( j) ( j) z,...,z k Complete log-lkelhood functon: L n k Θ ( j) c ( x, z, ) z log ) j ( ( j) α f ( x θ ) k- zeros, one log f (x ( j),z ( j) Θ) In the presence of both x and z, Θ would be easy to estmate, but z s mssng. 4
25 The EM algorthm ˆ ˆ ˆ ˆ ) ( 0) () (t) (t+ Iteratve procedure: Θ, Θ,..., Θ, Θ,... Under mld condtons: Θˆ (t) local maxmum of L( x, Θ) t The E-step: compute the expected value of ( x, z, Θ) L c E[L c ( x, z, Θ) x, Θˆ (t) ] Q( Θ, Θˆ (t) ) The M-step: update parameter estmates 5 ˆ (t+ ) arg max Q(, ˆ (t Θ Θ Θ ) ) Θ
26 The EM algorthm: the Gaussan case The E-step: Q( Θ, Θˆ Because ( x, z, Θ) E[z (j) L c Θ ˆ (t) x, ] Pr{z (j) (t) ) E Bnary varable Z [L c ( x, z, Θ) x, Θˆ L ( x,e[ z x, Θˆ c (t) s lnear n z Bayes law x (j), Θˆ (t) } k αˆ n αˆ f(x n (t) ], Θ) (j) f(x (j) ] θˆ (t) θˆ ) (t) n ) w (j,t) 6 (j,t) w Estmate, at teraton t, of the probablty that x ( j) was produced by component Soft probablstc assgnment
27 The EM algorthm: the Gaussan case Result of the E-step: (j,t) w Estmate, at teraton t, of the probablty that x ( j) was produced by component The M-step: αˆ (t + ) n n j w ( j,t ) 7 ˆ n (t+ ) j µ n w j ( j,t) w x ( j,t) ( j) Ĉ (t+ ) n ( j,t) ( j) w (x j n µ ˆ j (t+ ) w ( j,t) ) (x ( j) µ ˆ (t+ ) ) T
28 Dffcultes wth EM It s a local (greedy) algorthm (lkelhood never dcreases) Intalzaton dependent 74 teratons 70 teratons 8
29 Automatcally decdng the number of components Add a penalty term to the objectve functon, whch ncreases wth the number of clusters Start wth a large number of clusters Modfy the M-step to nclude a kller crteron whch removes components satsfyng certan crteron Fnally, choose the number of components, resultng n wth the largest objectve functon value (lkelhood - penalty). 9
30 Example Same as n [Ueda and Nakano, 998]. 30
31 Example k 4 n 00 k max µ µ 0 µ 4 µ 0 4 α C I m 4 3
32 Example Same as n [Ueda, Nakano, Ghahraman and Hnton, 000]. 3
33 Example An example wth overlappng components 33
34 34 The rs (4-dm.) data-set: 3 components correctly dentfed
35 Another supervsed learnng example Problem: learn to classfy textures, from 9 Gabor features. - Four classes: 35 -Ft Gaussan mxtures to 800 randomly located feature vectors from each class/texture. -Test on the remanng data. Mxture-based Lnear dscrmnant Quadratc dsrmnant Error rate
36 Resultng decson regons -d projecton of the texture data and the obtaned mxtures 36
37 Propertes of EM EM s extremely popular because of the followng propertes: Easy to mplement Guarantees the lkelhood ncreases monotoncally (why?) Guarantees the convergence of the soluton to a statonary pont.e., local maxma (why?). Lmtatons of EM resultng soluton depends hghly on the ntalzaton Could be slow n several cases compared to drect optmzaton methods (e.g., Iteratve scalng) 37
38 EM as lower bound optmzaton Start wth ntal guess θ 0 0, θ l( θ, θ ) θ 0 0, θ
39 EM as lower bound optmzaton Touch Pont l( θ, θ ) l( θ, θ ) l( θ, θ ) + Q( θ, θ ) 0 0 Start wth ntal guess { θ, θ } 0 Come up wth a lower bounded l( θ, θ ) l( θ, θ ) + Q( θ, θ ) 0 0 Q( θ, θ ) s a concave functon Touch pont: Q( θ θ, θ θ ) { θ 0, θ 0 }
40 EM as lower bound optmzaton Start wth ntal guess { θ, θ } 0 l( θ, θ ) l( θ, θ ) l( θ, θ ) + Q( θ, θ ) 0 0 Come up wth a lower bounded l( θ, θ ) l( θ, θ ) + Q( θ, θ ) 0 0 Q( θ, θ ) s a concave functon Touch pont: Q( θ θ, θ θ ) { θ 0, θ 0 }{ θ, θ } Search the optmal soluton that maxmzes Q( θ, θ )
41 EM as lower bound optmzaton Start wth ntal guess { θ, θ } 0 l( θ, θ ) l( θ, θ ) + Q( θ, θ ) l( θ, θ ) Come up wth a lower bounded l( θ, θ ) l( θ, θ ) + Q( θ, θ ) 0 0 Q( θ, θ ) s a concave functon Touch pont: Q( θ θ, θ θ ) { θ 0, θ 0 }{ θ, θ } { θ, θ } Search the optmal soluton that maxmzes Q( θ, θ ) Repeat the procedure
42 EM as lower bound optmzaton Optmal Pont Start wth ntal guess { θ, θ } 0 l( θ, θ ) { θ 0, θ 0 }{ θ, θ } { θ θ },,... Come up wth a lower bounded l( θ, θ ) l( θ, θ ) + Q( θ, θ ) 0 0 Q( θ, θ ) s a concave functon Touch pont: Q( θ θ, θ θ ) Search the optmal soluton that maxmzes Q( θ, θ ) Repeat the procedure Converge to the local optmal
43 Summary Expectaton-Maxmzaton algorthm E step: Compute expected complete data lkelhood Mstep: Maxmze the lkelhood to fnd parameters Can be used wth any model wth hdden (latent) varables Hdden varables can be natural to the model or can be artfcally ntroduced. Makes the parameter estmaton smpler, and effcent EM algorthm can be explaned from many perspectves Bound optmzaton Proxmal pont optmzaton, etc Several generalzatons/specalzatons exst Easy to mplement, and s wdely used! 43
MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)
1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons
More informationEM and Structure Learning
EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder
More information2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification
E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton
More informationSpace of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics
/7/7 CSE 73: Artfcal Intellgence Bayesan - Learnng Deter Fox Sldes adapted from Dan Weld, Jack Breese, Dan Klen, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer What s Beng Learned? Space
More informationClustering & Unsupervised Learning
Clusterng & Unsupervsed Learnng Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 2012 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationCourse 395: Machine Learning - Lectures
Course 395: Machne Learnng - Lectures Lecture 1-2: Concept Learnng (M. Pantc Lecture 3-4: Decson Trees & CC Intro (M. Pantc Lecture 5-6: Artfcal Neural Networks (S.Zaferou Lecture 7-8: Instance ased Learnng
More informationxp(x µ) = 0 p(x = 0 µ) + 1 p(x = 1 µ) = µ
CSE 455/555 Sprng 2013 Homework 7: Parametrc Technques Jason J. Corso Computer Scence and Engneerng SUY at Buffalo jcorso@buffalo.edu Solutons by Yngbo Zhou Ths assgnment does not need to be submtted and
More informationClustering & (Ken Kreutz-Delgado) UCSD
Clusterng & Unsupervsed Learnng Nuno Vasconcelos (Ken Kreutz-Delgado) UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y ), fnd an approxmatng
More informationLogistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI
Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton
More informationMaximum Likelihood Estimation
Maxmum Lkelhood Estmaton INFO-2301: Quanttatve Reasonng 2 Mchael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quanttatve Reasonng 2 Paul and Boyd-Graber Maxmum Lkelhood Estmaton 1 of 9 Why MLE?
More informationSDMML HT MSc Problem Sheet 4
SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models
More informationClustering gene expression data & the EM algorithm
CG, Fall 2011-12 Clusterng gene expresson data & the EM algorthm CG 08 Ron Shamr 1 How Gene Expresson Data Looks Entres of the Raw Data matrx: Rato values Absolute values Row = gene s expresson pattern
More informationLecture Notes on Linear Regression
Lecture Notes on Lnear Regresson Feng L fl@sdueducn Shandong Unversty, Chna Lnear Regresson Problem In regresson problem, we am at predct a contnuous target value gven an nput feature vector We assume
More informationOverview. Hidden Markov Models and Gaussian Mixture Models. Acoustic Modelling. Fundamental Equation of Statistical Speech Recognition
Overvew Hdden Marov Models and Gaussan Mxture Models Steve Renals and Peter Bell Automatc Speech Recognton ASR Lectures &5 8/3 January 3 HMMs and GMMs Key models and algorthms for HMM acoustc models Gaussans
More informationENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition
EG 880/988 - Specal opcs n Computer Engneerng: Pattern Recognton Memoral Unversty of ewfoundland Pattern Recognton Lecture 7 May 3, 006 http://wwwengrmunca/~charlesr Offce Hours: uesdays hursdays 8:30-9:30
More informationMaximum Likelihood Estimation (MLE)
Maxmum Lkelhood Estmaton (MLE) Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 01 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationCIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M
CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute
More informationP R. Lecture 4. Theory and Applications of Pattern Recognition. Dept. of Electrical and Computer Engineering /
Theory and Applcatons of Pattern Recognton 003, Rob Polkar, Rowan Unversty, Glassboro, NJ Lecture 4 Bayes Classfcaton Rule Dept. of Electrcal and Computer Engneerng 0909.40.0 / 0909.504.04 Theory & Applcatons
More informationCS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements
CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before
More informationLecture 12: Classification
Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna
More informationThe big picture. Outline
The bg pcture Vncent Claveau IRISA - CNRS, sldes from E. Kjak INSA Rennes Notatons classes: C = {ω = 1,.., C} tranng set S of sze m, composed of m ponts (x, ω ) per class ω representaton space: R d (=
More informationProbability Density Function Estimation by different Methods
EEE 739Q SPRIG 00 COURSE ASSIGMET REPORT Probablty Densty Functon Estmaton by dfferent Methods Vas Chandraant Rayar Abstract The am of the assgnment was to estmate the probablty densty functon (PDF of
More informationOn an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1
On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool
More informationStatistical pattern recognition
Statstcal pattern recognton Bayes theorem Problem: decdng f a patent has a partcular condton based on a partcular test However, the test s mperfect Someone wth the condton may go undetected (false negatve
More informationAn Experiment/Some Intuition (Fall 2006): Lecture 18 The EM Algorithm heads coin 1 tails coin 2 Overview Maximum Likelihood Estimation
An Experment/Some Intuton I have three cons n my pocket, 6.864 (Fall 2006): Lecture 18 The EM Algorthm Con 0 has probablty λ of heads; Con 1 has probablty p 1 of heads; Con 2 has probablty p 2 of heads
More information9.913 Pattern Recognition for Vision. Class IV Part I Bayesian Decision Theory Yuri Ivanov
9.93 Class IV Part I Bayesan Decson Theory Yur Ivanov TOC Roadmap to Machne Learnng Bayesan Decson Makng Mnmum Error Rate Decsons Mnmum Rsk Decsons Mnmax Crteron Operatng Characterstcs Notaton x - scalar
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationParametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010
Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton
More informationMultilayer neural networks
Lecture Multlayer neural networks Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Mdterm exam Mdterm Monday, March 2, 205 In-class (75 mnutes) closed book materal covered by February 25, 205 Multlayer
More informationAdvanced Statistical Methods: Beyond Linear Regression
Advanced Statstcal Methods: Beyond Lnear Regresson John R. Stevens Utah State Unversty Notes 2. Statstcal Methods I Mathematcs Educators Workshop 28 March 2009 1 http://www.stat.usu.edu/~rstevens/pcm 2
More informationLarge-Margin HMM Estimation for Speech Recognition
Large-Margn HMM Estmaton for Speech Recognton Prof. Hu Jang Department of Computer Scence and Engneerng York Unversty, Toronto, Ont. M3J 1P3, CANADA Emal: hj@cs.yorku.ca Ths s a jont work wth Chao-Jun
More informationStatistical learning
Statstcal learnng Model the data generaton process Learn the model parameters Crteron to optmze: Lkelhood of the dataset (maxmzaton) Maxmum Lkelhood (ML) Estmaton: Dataset X Statstcal model p(x;θ) (θ parameters)
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Mamum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models for
More informationSemi-Supervised Learning
Sem-Supervsed Learnng Consder the problem of Prepostonal Phrase Attachment. Buy car wth money ; buy car wth wheel There are several ways to generate features. Gven the lmted representaton, we can assume
More informationMLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012
MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:
More informationPredictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore
Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.
More informationHidden Markov Models
Hdden Markov Models Namrata Vaswan, Iowa State Unversty Aprl 24, 204 Hdden Markov Model Defntons and Examples Defntons:. A hdden Markov model (HMM) refers to a set of hdden states X 0, X,..., X t,...,
More informationComposite Hypotheses testing
Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter
More informationThe Basic Idea of EM
The Basc Idea of EM Janxn Wu LAMDA Group Natonal Key Lab for Novel Software Technology Nanjng Unversty, Chna wujx2001@gmal.com June 7, 2017 Contents 1 Introducton 1 2 GMM: A workng example 2 2.1 Gaussan
More informationWhy Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one)
Why Bayesan? 3. Bayes and Normal Models Alex M. Martnez alex@ece.osu.edu Handouts Handoutsfor forece ECE874 874Sp Sp007 If all our research (n PR was to dsappear and you could only save one theory, whch
More informationThe Gaussian classifier. Nuno Vasconcelos ECE Department, UCSD
he Gaussan classfer Nuno Vasconcelos ECE Department, UCSD Bayesan decson theory recall that we have state of the world X observatons g decson functon L[g,y] loss of predctng y wth g Bayes decson rule s
More informationLearning undirected Models. Instructor: Su-In Lee University of Washington, Seattle. Mean Field Approximation
Readngs: K&F 0.3, 0.4, 0.6, 0.7 Learnng undrected Models Lecture 8 June, 0 CSE 55, Statstcal Methods, Sprng 0 Instructor: Su-In Lee Unversty of Washngton, Seattle Mean Feld Approxmaton Is the energy functonal
More informationC4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )
C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z
More informationSupport Vector Machines. Vibhav Gogate The University of Texas at dallas
Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest
More informationClustering with Gaussian Mixtures
Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your
More informationThe EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X
The EM Algorthm (Dempster, Lard, Rubn 1977 The mssng data or ncomplete data settng: An Observed Data Lkelhood (ODL that s a mxture or ntegral of Complete Data Lkelhoods (CDL. (1a ODL(;Y = [Y;] = [Y,][
More informationHomework Assignment 3 Due in class, Thursday October 15
Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.
More informationAutomatic Object Trajectory- Based Motion Recognition Using Gaussian Mixture Models
Automatc Object Trajectory- Based Moton Recognton Usng Gaussan Mxture Models Fasal I. Bashr, Ashfaq A. Khokhar, Dan Schonfeld Electrcal and Computer Engneerng, Unversty of Illnos at Chcago. Chcago, IL,
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 89 Fall 206 Introducton to Machne Learnng Fnal Do not open the exam before you are nstructed to do so The exam s closed book, closed notes except your one-page cheat sheet Usage of electronc devces
More informationMixture o f of Gaussian Gaussian clustering Nov
Mture of Gaussan clusterng Nov 11 2009 Soft vs hard lusterng Kmeans performs Hard clusterng: Data pont s determnstcally assgned to one and only one cluster But n realty clusters may overlap Soft-clusterng:
More informationA Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach
A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland
More informationChapter Newton s Method
Chapter 9. Newton s Method After readng ths chapter, you should be able to:. Understand how Newton s method s dfferent from the Golden Secton Search method. Understand how Newton s method works 3. Solve
More informationMACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression
11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING
More informationThe Expectation-Maximization Algorithm
The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.
More informationGenerative classification models
CS 675 Intro to Machne Learnng Lecture Generatve classfcaton models Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Data: D { d, d,.., dn} d, Classfcaton represents a dscrete class value Goal: learn
More informationMulti-layer neural networks
Lecture 0 Mult-layer neural networks Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Lnear regresson w Lnear unts f () Logstc regresson T T = w = p( y =, w) = g( w ) w z f () = p ( y = ) w d w d Gradent
More informationClassification learning II
Lecture 8 Classfcaton learnng II Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Logstc regresson model Defnes a lnear decson boundar Dscrmnant functons: g g g g here g z / e z f, g g - s a logstc functon
More informationLecture Nov
Lecture 18 Nov 07 2008 Revew Clusterng Groupng smlar obects nto clusters Herarchcal clusterng Agglomeratve approach (HAC: teratvely merge smlar clusters Dfferent lnkage algorthms for computng dstances
More information2 STATISTICALLY OPTIMAL TRAINING DATA 2.1 A CRITERION OF OPTIMALITY We revew the crteron of statstcally optmal tranng data (Fukumzu et al., 1994). We
Advances n Neural Informaton Processng Systems 8 Actve Learnng n Multlayer Perceptrons Kenj Fukumzu Informaton and Communcaton R&D Center, Rcoh Co., Ltd. 3-2-3, Shn-yokohama, Yokohama, 222 Japan E-mal:
More informationEvaluation of classifiers MLPs
Lecture Evaluaton of classfers MLPs Mlos Hausrecht mlos@cs.ptt.edu 539 Sennott Square Evaluaton For any data set e use to test the model e can buld a confuson matrx: Counts of examples th: class label
More informationThe conjugate prior to a Bernoulli is. A) Bernoulli B) Gaussian C) Beta D) none of the above
The conjugate pror to a Bernoull s A) Bernoull B) Gaussan C) Beta D) none of the above The conjugate pror to a Gaussan s A) Bernoull B) Gaussan C) Beta D) none of the above MAP estmates A) argmax θ p(θ
More informationSTATS 306B: Unsupervised Learning Spring Lecture 10 April 30
STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear
More informationMotion Perception Under Uncertainty. Hongjing Lu Department of Psychology University of Hong Kong
Moton Percepton Under Uncertanty Hongjng Lu Department of Psychology Unversty of Hong Kong Outlne Uncertanty n moton stmulus Correspondence problem Qualtatve fttng usng deal observer models Based on sgnal
More informationGaussian Mixture Models
Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous
More information9 : Learning Partially Observed GM : EM Algorithm
10-708: Probablstc Graphcal Models 10-708, Sprng 2012 9 : Learnng Partally Observed GM : EM Algorthm Lecturer: Erc P. Xng Scrbes: Mrnmaya Sachan, Phan Gadde, Vswanathan Srpradha 1 Introducton So far n
More informationAdvances in Longitudinal Methods in the Social and Behavioral Sciences. Finite Mixtures of Nonlinear Mixed-Effects Models.
Advances n Longtudnal Methods n the Socal and Behavoral Scences Fnte Mxtures of Nonlnear Mxed-Effects Models Jeff Harrng Department of Measurement, Statstcs and Evaluaton The Center for Integrated Latent
More informationHidden Markov Models & The Multivariate Gaussian (10/26/04)
CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models
More informationSupport Vector Machines
Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far So far Supervsed machne learnng Lnear models Non-lnear models Unsupervsed machne learnng Generc scaffoldng So far
More informationPrimer on High-Order Moment Estimators
Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc
More informationMachine learning: Density estimation
CS 70 Foundatons of AI Lecture 3 Machne learnng: ensty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square ata: ensty estmaton {.. n} x a vector of attrbute values Objectve: estmate the model of
More informationDiscriminative classifier: Logistic Regression. CS534-Machine Learning
Dscrmnatve classfer: Logstc Regresson CS534-Machne Learnng 2 Logstc Regresson Gven tranng set D stc regresson learns the condtonal dstrbuton We ll assume onl to classes and a parametrc form for here s
More informationSupport Vector Machines
Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far Supervsed machne learnng Lnear models Least squares regresson Fsher s dscrmnant, Perceptron, Logstc model Non-lnear
More informationGrenoble, France Grenoble University, F Grenoble Cedex, France
MODIFIED K-MEA CLUSTERIG METHOD OF HMM STATES FOR IITIALIZATIO OF BAUM-WELCH TRAIIG ALGORITHM Paulne Larue 1, Perre Jallon 1, Bertrand Rvet 2 1 CEA LETI - MIATEC Campus Grenoble, France emal: perre.jallon@cea.fr
More information8 : Learning in Fully Observed Markov Networks. 1 Why We Need to Learn Undirected Graphical Models. 2 Structural Learning for Completely Observed MRF
10-708: Probablstc Graphcal Models 10-708, Sprng 2014 8 : Learnng n Fully Observed Markov Networks Lecturer: Erc P. Xng Scrbes: Meng Song, L Zhou 1 Why We Need to Learn Undrected Graphcal Models In the
More informationHidden Markov Models
CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte
More informationIntroduction to Hidden Markov Models
Introducton to Hdden Markov Models Alperen Degrmenc Ths document contans dervatons and algorthms for mplementng Hdden Markov Models. The content presented here s a collecton of my notes and personal nsghts
More information8/25/17. Data Modeling. Data Modeling. Data Modeling. Patrice Koehl Department of Biological Sciences National University of Singapore
8/5/17 Data Modelng Patrce Koehl Department of Bologcal Scences atonal Unversty of Sngapore http://www.cs.ucdavs.edu/~koehl/teachng/bl59 koehl@cs.ucdavs.edu Data Modelng Ø Data Modelng: least squares Ø
More informationLecture 10 Support Vector Machines II
Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed
More informationEnsemble Methods: Boosting
Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement
More informationOutline. Bayesian Networks: Maximum Likelihood Estimation and Tree Structure Learning. Our Model and Data. Outline
Outlne Bayesan Networks: Maxmum Lkelhood Estmaton and Tree Structure Learnng Huzhen Yu janey.yu@cs.helsnk.f Dept. Computer Scence, Unv. of Helsnk Probablstc Models, Sprng, 200 Notces: I corrected a number
More informationMIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU
Group M D L M Chapter Bayesan Decson heory Xn-Shun Xu @ SDU School of Computer Scence and echnology, Shandong Unversty Bayesan Decson heory Bayesan decson theory s a statstcal approach to data mnng/pattern
More informationOther NN Models. Reinforcement learning (RL) Probabilistic neural networks
Other NN Models Renforcement learnng (RL) Probablstc neural networks Support vector machne (SVM) Renforcement learnng g( (RL) Basc deas: Supervsed dlearnng: (delta rule, BP) Samples (x, f(x)) to learn
More informationCSC321 Tutorial 9: Review of Boltzmann machines and simulated annealing
CSC321 Tutoral 9: Revew of Boltzmann machnes and smulated annealng (Sldes based on Lecture 16-18 and selected readngs) Yue L Emal: yuel@cs.toronto.edu Wed 11-12 March 19 Fr 10-11 March 21 Outlne Boltzmann
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationProbabilistic Classification: Bayes Classifiers. Lecture 6:
Probablstc Classfcaton: Bayes Classfers Lecture : Classfcaton Models Sam Rowes January, Generatve model: p(x, y) = p(y)p(x y). p(y) are called class prors. p(x y) are called class condtonal feature dstrbutons.
More informationNewton s Method for One - Dimensional Optimization - Theory
Numercal Methods Newton s Method for One - Dmensonal Optmzaton - Theory For more detals on ths topc Go to Clck on Keyword Clck on Newton s Method for One- Dmensonal Optmzaton You are free to Share to copy,
More informationExpectation Maximization Mixture Models HMMs
-755 Machne Learnng for Sgnal Processng Mture Models HMMs Class 9. 2 Sep 200 Learnng Dstrbutons for Data Problem: Gven a collecton of eamples from some data, estmate ts dstrbuton Basc deas of Mamum Lelhood
More informationj) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1
Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons
More informationINF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationIV. Performance Optimization
IV. Performance Optmzaton A. Steepest descent algorthm defnton how to set up bounds on learnng rate mnmzaton n a lne (varyng learnng rate) momentum learnng examples B. Newton s method defnton Gauss-Newton
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationRobust mixture modeling using multivariate skew t distributions
Robust mxture modelng usng multvarate skew t dstrbutons Tsung-I Ln Department of Appled Mathematcs and Insttute of Statstcs Natonal Chung Hsng Unversty, Tawan August, 1 T.I. Ln (NCHU Natonal Chung Hsng
More informationUsing T.O.M to Estimate Parameter of distributions that have not Single Exponential Family
IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran
More informationLecture 10 Support Vector Machines. Oct
Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron
More informationNeural networks. Nuno Vasconcelos ECE Department, UCSD
Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X
More information1 Motivation and Introduction
Instructor: Dr. Volkan Cevher EXPECTATION PROPAGATION September 30, 2008 Rce Unversty STAT 63 / ELEC 633: Graphcal Models Scrbes: Ahmad Beram Andrew Waters Matthew Nokleby Index terms: Approxmate nference,
More informationLinear Approximation with Regularization and Moving Least Squares
Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...
More informationOPTIMAL COMBINATION OF FOURTH ORDER STATISTICS FOR NON-CIRCULAR SOURCE SEPARATION. Christophe De Luigi and Eric Moreau
OPTIMAL COMBINATION OF FOURTH ORDER STATISTICS FOR NON-CIRCULAR SOURCE SEPARATION Chrstophe De Lug and Erc Moreau Unversty of Toulon LSEET UMR CNRS 607 av. G. Pompdou BP56 F-8362 La Valette du Var Cedex
More information