Soft k-means Clustering. Comp 135 Machine Learning, Computer Science, Tufts University. Mixture Models. Mixture of Normals in 1D

Comp 135 Machine Learning, Computer Science, Tufts University, Fall 2017, Roni Khardon. The EM Algorithm. Mixture Models. Semi-Supervised Learning.

Soft k-means Clustering: Pick k cluster centers. Repeat: associate examples with centers, $p_{i,j} \propto$ similarity between center $j$ and example $x_i$; re-calculate the means as weighted averages of the examples in each cluster. Until convergence.

Mixture Models: Motivated by soft k-means, we develop a generative model for clustering. Assume there are k clusters. Clusters are not required to have the same number of points, and not required to have the same shape.

Mixture of Normals in 1D: Repeat for $i = 1, \ldots, N$: Pick cluster id $z_i$ from a discrete distribution with parameters $p_1, p_2, \ldots, p_k$. Note: $z_i \in \{1, 2, \ldots, k\}$. Pick the example $x_i$ from a normal distribution with parameters $\mu_{z_i}, \sigma_{z_i}$. Example: when $z_i = 3$, use $\mu_3$ and $\sigma_3$. Given a dataset generated by this process, the clustering task is to identify the parameters $\{p_j, \mu_j, \sigma_j\}$, $j = 1, \ldots, k$.

Maximum likelihood estimation: To simplify the analysis in class we assume that $\forall j$, $p_j = 1/k$ and $\forall j$, $\sigma_j = 1$ are known, and that the $x_i$ are 1-dimensional. First analyze assuming the $z_i$ are known. Convenient notation: represent the number $z_i$ as a unit vector (bit sequence). Example for k = 4: $z_i = 1 \Rightarrow 1000$, $z_i = 2 \Rightarrow 0100$, $z_i = 3 \Rightarrow 0010$, $z_i = 4 \Rightarrow 0001$. Notation: $z_{i,j}$ is the j-th bit within $z_i$, so $z_i = 2 \Rightarrow 0100 \Rightarrow z_{i,2} = 1$, $z_{i,3} = 0$.
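The generative process above is short enough to write down directly. A minimal numpy sketch (the function name `sample_mixture` and its signature are my own, not from the slides):

```python
# Sketch of the "Mixture of Normals in 1D" generative process.
import numpy as np

def sample_mixture(N, p, mu, sigma, seed=None):
    """Pick z_i ~ Discrete(p), then x_i ~ Normal(mu[z_i], sigma[z_i])."""
    rng = np.random.default_rng(seed)
    k = len(p)
    z = rng.choice(k, size=N, p=p)      # cluster ids (0-based here, 1-based on the slides)
    x = rng.normal(mu[z], sigma[z])     # one draw per example from its cluster's normal
    return x, z

# Example: k = 3 clusters of unequal sizes, as the model allows.
x, z = sample_mixture(500, p=[0.5, 0.3, 0.2],
                      mu=np.array([-4.0, 0.0, 3.0]),
                      sigma=np.array([1.0, 1.0, 1.0]), seed=0)
```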

Maximum likelihood estimation: First analyze assuming the $z_i$ are known. The Complete Data includes all the $(x_i, z_i)$: Data $= (x_1, z_1), (x_2, z_2), \ldots, (x_N, z_N)$. The likelihood is

$$L = \prod_i p(z_i)\, p(x_i \mid z_i, \mu_{z_i}) = \prod_i (1/k) \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x_i - \mu_{z_i})^2} = \prod_i (1/k) \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}\sum_j z_{i,j}(x_i - \mu_j)^2}$$

Notation trick: exactly one term remains from the sum! Taking logs,

$$\log L = \text{const} - \frac{1}{2} \sum_i \sum_j z_{i,j}(x_i - \mu_j)^2$$

and setting $\frac{\partial \log L}{\partial \mu_j} = \ldots = 0$ gives

$$\mu_j = \frac{\sum_i z_{i,j}\, x_i}{\sum_i z_{i,j}}$$

This is not surprising. Why? Because $\mu_j$ is simply the average of the examples assigned to cluster $j$.

The Observed Data includes only the $x_i$: Data $= x_1, x_2, \ldots, x_N$, so we cannot use the previous estimate. What is the likelihood in this case? Maximum likelihood prescribes that we should optimize

$$p(\text{observed}) = p(x_1, \ldots, x_N) = \sum_{z_1} \sum_{z_2} \cdots \sum_{z_N} p(x_1, \ldots, x_N, z_1, \ldots, z_N)$$

The equation for the likelihood needs to sum out (marginalize) over the $z_i$. There is no simple closed form.

The EM Algorithm: a general algorithm for maximizing likelihood when we have hidden random variables. The algorithm has a simple form when applied to mixture models. We will constrain ourselves to that simple form, and mention the general scheme of the EM algorithm briefly.
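The complete-data solution is just a per-cluster average, which a few lines of numpy make concrete. A minimal sketch, assuming the one-hot labels are stored as a matrix Z (a representation choice of mine):

```python
# Complete-data maximum likelihood for the means: with one-hot Z (N x k),
# mu_j = sum_i z_ij * x_i / sum_i z_ij is the average of cluster j's examples.
import numpy as np

def mle_means(x, Z):
    return (Z.T @ x) / Z.sum(axis=0)

# Example with k = 2: the first two points form cluster 0, the rest cluster 1.
x = np.array([1.0, 2.0, 9.0, 11.0])
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
print(mle_means(x, Z))   # [ 1.5  10. ] -- the per-cluster averages
```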

The EM Algorithm: EM is an iterative algorithm. Initialize probability model $p'$. Repeat: use $p'$ to calculate an improved model $p''$; set $p' = p''$. Until no further improvement.

EM Algorithm for Mixture Models: [E] Calculate, using $p'$,

$$f_{i,j} = E_{p(z \mid X, \{\mu'_\ell\})}[z_{i,j}] = p(z_i = j \mid \{\mu'_\ell\}, \text{Data})$$

[M] Estimate the $p''$ parameters using the maximum likelihood solution of the complete data, replacing the unknown $z_{i,j}$ by $f_{i,j}$.

EM for Mixtures in 1D: [E] Calculate for all $i, j$:

$$f_{i,j} = \frac{p(z_i = j \text{ and } x_i)}{p(x_i)} = \frac{p(z_i = j \text{ and } x_i)}{\sum_\ell p(z_i = \ell \text{ and } x_i)} = \frac{(1/k) \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x_i - \mu'_j)^2}}{\sum_\ell (1/k) \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x_i - \mu'_\ell)^2}}$$

The first part holds for any mixture model. [M] Estimate parameters using maximum likelihood, replacing the unknown $z_{i,j}$ by $f_{i,j}$: the complete-data solution $\mu_j = \frac{\sum_i z_{i,j} x_i}{\sum_i z_{i,j}}$ becomes

$$\mu''_j = \frac{\sum_i f_{i,j}\, x_i}{\sum_i f_{i,j}}$$

Then assign, for all $j$: $\mu'_j = \mu''_j$.

General form of EM: Define an auxiliary function $Q(p', p'')$ relative to observed variables O and hidden variables H:

$$Q(p', p'') = E_{p'(H \mid O)}[\log p''(H, O)]$$
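Putting the E and M steps together gives the whole algorithm for this simplified setting. A minimal sketch under the slides' simplifications ($p_j = 1/k$, $\sigma_j = 1$), so that only the means are updated; `em_1d` is my own name:

```python
# Minimal EM for a 1D mixture of k unit-variance normals with equal priors.
import numpy as np

def em_1d(x, mu, n_iters=100):
    """x: (N,) data; mu: (k,) initial means. Returns updated means and responsibilities."""
    for _ in range(n_iters):
        # [E] f[i,j] proportional to exp(-(x_i - mu_j)^2 / 2); the 1/k and
        # 1/sqrt(2*pi) factors cancel in the normalization.
        logw = -0.5 * (x[:, None] - mu[None, :]) ** 2
        logw -= logw.max(axis=1, keepdims=True)      # for numerical stability
        f = np.exp(logw)
        f /= f.sum(axis=1, keepdims=True)
        # [M] weighted-average means, with f[i,j] replacing the unknown z[i,j]
        mu = (f.T @ x) / f.sum(axis=0)
    return mu, f

# Usage: recover two cluster means from synthetic data.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(2, 1, 300)])
mu, f = em_1d(x, mu=np.array([-1.0, 1.0]))
print(mu)   # approximately [-3, 2]
```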

The EM Algorithm (general form): EM is an iterative algorithm. Initialize probability model $p'$. Repeat: pick $p''$ so as to maximize $Q(p', p'')$; set $p' = p''$. Until no further improvement.

EM Algorithm for Mixture Models: [E] Calculate, using $p'$, $f_{i,j} = E_{p(z \mid X, \{\mu'_\ell\})}[z_{i,j}] = p(z_i = j \mid \{\mu'_\ell\}, \text{Data})$. [M] Estimate the $p''$ parameters using maximum likelihood, replacing the unknown $z_{i,j}$ by $f_{i,j}$. Using the same methodology on any mixture model (not just Gaussian) yields the same template.

Semi-Supervised Naïve Bayes Model: Naïve Bayes is a probabilistic model with strong simplifying assumptions. Illustrating application: text categorization, where we have data for (document_i, label_i). What if we have many documents but labels for only a few of them? Can the unlabeled documents help? Before exploring this question we will develop the EM algorithm for this model where the labels are not known.

Recall: Naïve Bayes Model. Each class induces a distribution over features. Features are conditionally independent given the class. In these slides I use the model with binary features.
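The exact formulas for this binary-feature model appear on the next slide; as a preview, the joint $p(z_i = j \text{ and } x_i)$ is typically computed in log space to avoid underflow on long feature vectors. A hedged numpy sketch (names are mine):

```python
# Naive Bayes joint probability for binary features, in log space:
# log p(z_i = j and x_i) = log p_j + sum_l [x_il log q_jl + (1 - x_il) log(1 - q_jl)].
import numpy as np

def nb_log_joint(X, p, q):
    """X: (N, L) binary features; p: (k,) priors; q: (k, L) feature probabilities,
    assumed strictly inside (0, 1). Returns an (N, k) matrix of log joints."""
    return np.log(p) + X @ np.log(q).T + (1 - X) @ np.log(1 - q).T
```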

Recall: Naïve Bayes Model. With $p(z_i = j) = p_j$ and $p(x_{i,\ell} = 1 \mid \text{class } j) = q_{j,\ell}$:

$$p(x_i \mid \text{class } j) = \prod_\ell (q_{j,\ell})^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})}$$

$$p(z_i = j \text{ and } x_i) = p_j \prod_\ell (q_{j,\ell})^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})}$$

$$p(z_i \text{ and } x_i) = \prod_j \left[ p_j \prod_\ell (q_{j,\ell})^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})} \right]^{z_{i,j}}$$

Recall: Maximum Likelihood.

$$p_j = p(z = j) = \frac{\text{number of examples with class } j}{\text{number of examples}}, \qquad q_{j,\ell} = p(x_{i,\ell} = 1 \mid z_i = j) = \frac{\text{number of } x_i \text{ with class } j \text{ and } x_{i,\ell} = 1}{\text{number of examples with class } j}$$

Naïve Bayes as Mixture Model: Repeat for $i = 1, \ldots, N$: Pick cluster id $z_i$ from a discrete distribution with parameters $p_1, p_2, \ldots, p_k$. Pick the example $x_i$ from the Naïve Bayes distribution with parameters $q_{z_i}$.

Complete Data Likelihood:

$$L = \prod_i \prod_j \left[ p_j \prod_\ell (q_{j,\ell})^{x_{i,\ell}} (1 - q_{j,\ell})^{(1 - x_{i,\ell})} \right]^{z_{i,j}}$$

Log Likelihood:

$$\log L = \sum_i \sum_j z_{i,j} \left( \log p_j + \sum_\ell x_{i,\ell} \log q_{j,\ell} + (1 - x_{i,\ell}) \log(1 - q_{j,\ell}) \right)$$

EM Algorithm: Maximum likelihood for the complete data [we already solved this a few lectures ago]:

$$p_j = \frac{\sum_i z_{i,j}}{N}, \qquad q_{j,\ell} = \frac{\sum_i z_{i,j}\, x_{i,\ell}}{\sum_i z_{i,j}}$$

E Step: Calculating $f_{i,j}$:

$$f_{i,j} = E_{p'(Z \mid X)}[z_{i,j}] = \frac{p'(z_i = j \text{ and } x_i)}{\sum_c p'(z_i = c \text{ and } x_i)} = \frac{p'_j \prod_\ell (q'_{j,\ell})^{x_{i,\ell}} (1 - q'_{j,\ell})^{(1 - x_{i,\ell})}}{\sum_c p'_c \prod_\ell (q'_{c,\ell})^{x_{i,\ell}} (1 - q'_{c,\ell})^{(1 - x_{i,\ell})}}$$
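The complete-data estimates are simple counts, one matrix product away once the labels are stored one-hot. A minimal sketch under that representation (my choice); in practice one would smooth the counts (e.g., Laplace smoothing) to avoid zero probabilities:

```python
# Complete-data maximum likelihood for Naive Bayes:
# p_j = sum_i z_ij / N and q_jl = sum_i z_ij x_il / sum_i z_ij.
import numpy as np

def nb_mle(X, Z):
    """X: (N, L) binary features; Z: (N, k) one-hot class labels."""
    counts = Z.sum(axis=0)             # number of examples in each class
    p = counts / X.shape[0]            # class priors p_j
    q = (Z.T @ X) / counts[:, None]    # per-class feature frequencies q_{j,l}
    return p, q
```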

EM Algorithm for Naïve Bayes. Calculate:

$$f_{i,j} = \frac{p'_j \prod_\ell (q'_{j,\ell})^{x_{i,\ell}} (1 - q'_{j,\ell})^{(1 - x_{i,\ell})}}{\sum_c p'_c \prod_\ell (q'_{c,\ell})^{x_{i,\ell}} (1 - q'_{c,\ell})^{(1 - x_{i,\ell})}}$$

Calculate:

$$p''_j = \frac{\sum_i f_{i,j}}{N}, \qquad q''_{j,\ell} = \frac{\sum_i f_{i,j}\, x_{i,\ell}}{\sum_i f_{i,j}}$$

Assign: $p' \leftarrow p''$ and $q' \leftarrow q''$.

Semi-Supervised Naïve Bayes Model: Naïve Bayes for text categorization. What if we have many documents but labels for only a few of them? Can the unlabeled documents help? Use EM: for examples where $z_i$ is known, use $f_{i,j} = z_{i,j}$ instead of estimating it. Nothing else changes in the algorithm! (A sketch appears after the summary below.)

[Figure: 20 newsgroups data, two accuracy plots. Left: accuracy vs. number of labeled documents (20 to 5000), comparing training with 10000 unlabeled documents against no unlabeled documents. Right: accuracy vs. number of unlabeled documents (0 to 13000), for 3000, 600, 300, 140, and 40 labeled documents. From Nigam et al., MLJ 1999.]

Summary: EM is a general algorithmic framework for inference with hidden random variables. It takes a simple form for mixture models, alternating between estimating fractional memberships and using these in maximum likelihood calculations. The general derivation through the $Q(p', p'')$ function is applicable in more complex models. The mixture model easily generalizes to capture semi-supervised learning.
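To recap the semi-supervised variant concretely: clamping $f_{i,j} = z_{i,j}$ for the labeled rows is a one-line change inside the EM loop. A hedged end-to-end sketch (names such as `semi_supervised_em`, `labeled_idx`, and `Z_lab` are mine, and the initialization is one arbitrary choice among many):

```python
# Semi-supervised EM for Naive Bayes with binary features.
import numpy as np

def semi_supervised_em(X, k, labeled_idx, Z_lab, n_iters=50, eps=1e-6):
    """X: (N, L) binary features; labeled_idx: row indices with known labels;
    Z_lab: one-hot labels for those rows, shape (len(labeled_idx), k)."""
    N, L = X.shape
    rng = np.random.default_rng(0)
    p = np.full(k, 1.0 / k)                      # initialize p'
    q = rng.uniform(0.25, 0.75, size=(k, L))     # initialize q'
    for _ in range(n_iters):
        # [E] responsibilities from the current model p', q' (log space)
        log_joint = np.log(p) + X @ np.log(q).T + (1 - X) @ np.log(1 - q).T
        f = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
        f /= f.sum(axis=1, keepdims=True)
        f[labeled_idx] = Z_lab                   # clamp known labels: f_ij = z_ij
        # [M] maximum likelihood with f replacing the unknown z
        counts = f.sum(axis=0)
        p = counts / N
        q = np.clip((f.T @ X) / counts[:, None], eps, 1 - eps)
    return p, q, f
```

With `labeled_idx` empty, this reduces to the fully unsupervised EM for Naïve Bayes above; the clip on q plays the role of smoothing, keeping the log terms finite.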