Learning of Graphical Models Parameter Estimation and Structure Learning


Learning of Graphical Models: Parameter Estimation and Structure Learning
Kenji Fukumizu, The Institute of Statistical Mathematics
Computational Methodology in Statistical Inference II

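Work with Graphical Models
Determining structure: the structure may be given by modeling (e.g. mixture model, HMM, etc.) or obtained by structure learning (Part IV).
Parameter estimation: the parameter may be given by some knowledge, or estimated with data, such as by MLE or Bayesian estimation (Part IV).
Inference: computation of posterior and marginal probabilities (already seen in Part III).
Example: a → b, where the conditional probability table $p(b \mid a)$ is the parameter:
b \ a    1     2     3
1        0.2   0.3   0.4
2        0.8   0.7   0.6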

Parameter Estimation

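Statistical Estimation
Estimation from data. Statistical model with a parameter: $p(X \mid \theta)$, where $\theta$ is the parameter.
I.i.d. data: $D = (X_1, X_2, \dots, X_N)$, $X_n \sim p(X \mid \theta)$.
Maximum likelihood estimation:
$\hat{\theta}_{ML} = \arg\max_\theta L(\theta)$, or equivalently $\hat{\theta}_{ML} = \arg\max_\theta \ell(\theta)$,
where $L(\theta) = \prod_{n=1}^N p(X_n \mid \theta)$ is the likelihood function and $\ell(\theta) = \log L(\theta) = \sum_{n=1}^N \log p(X_n \mid \theta)$ is the log likelihood function.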

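Statistical Estimation
Bayesian estimation: the distribution of the parameter is estimated. The prior probability $p(\theta)$ is updated to the posterior probability $p(\theta \mid D)$.
Bayes' rule gives
$p(\theta \mid D) = \dfrac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, d\theta}$.
Maximum a posteriori (MAP) estimation:
$\hat{\theta}_{MAP} = \arg\max_\theta p(\theta \mid D)$.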

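Contingency Table
ML estimation for discrete variables. Model: a → b, with $X_a \in \{1, \dots, M\}$ and $X_b \in \{1, \dots, L\}$.
Data: $D = \{(X_a^{(n)}, X_b^{(n)})\}_{n=1}^N$, an i.i.d. sample.
$N_{ij}$: number of counts of $(X_a, X_b) = (i, j)$, arranged as a contingency table with one column per value of a and one row per value of b. (The example counts in the slide's table are not recoverable from the extraction.)
Estimation of probabilities (ML estimator): $\hat{p}(X_a = i, X_b = j) = N_{ij} / N$, where $N = \sum_{i,j} N_{ij}$.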
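
The relative-frequency estimator can be illustrated with a few lines of Python. This is a minimal sketch, not taken from the slides: the function name `ml_joint_estimate` and the toy sample are hypothetical, and the sample is not the table from the lecture.

```python
from collections import Counter

def ml_joint_estimate(samples):
    """Relative-frequency ML estimate of p(X_a = i, X_b = j) from i.i.d. pairs."""
    counts = Counter(samples)      # N_ij: number of times the pair (i, j) occurs
    n_total = len(samples)         # N: total number of observations
    return {cell: n / n_total for cell, n in counts.items()}

# Hypothetical toy sample of (a, b) pairs, chosen only for illustration.
toy_sample = [(1, 1), (1, 2), (2, 2), (2, 2), (3, 1), (3, 2), (3, 2), (1, 1)]
p_hat = ml_joint_estimate(toy_sample)
```

Cells that never occur in the sample are simply absent from the dictionary, which corresponds to an estimated probability of zero. That is one motivation for the Bayesian treatment with a prior.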

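Bayesian Estimation: Discrete Case
Bayesian estimation for discrete variables.
Model: a → b, with $p(X_a = i, X_b = j \mid \theta) = \theta_{ij}$.
Prior: $p(\theta)$, a probability on the simplex $\Delta_{ML-1}$, where $\Delta_{K-1} = \{\theta \in \mathbb{R}^K \mid \theta_k \ge 0,\ \sum_k \theta_k = 1\}$.
Likelihood: $p(D \mid \theta) = \prod_{i,j} \theta_{ij}^{N_{ij}}$ (multinomial).
Bayesian estimation:
$p(\theta \mid D) = \dfrac{p(D \mid \theta)\, p(\theta)}{\int_\Delta p(D \mid \theta)\, p(\theta)\, d\theta}$.
This integral is difficult to compute in general.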

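Dirichlet Distribution
Density function of the K-dimensional Dirichlet distribution:
$\mathrm{Dir}(\theta \mid \alpha) = \dfrac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K \theta_k^{\alpha_k - 1}$ on $\Delta_{K-1} = \{\theta \in \mathbb{R}^K \mid \theta_k \ge 0,\ \sum_k \theta_k = 1\}$,
where $\alpha = (\alpha_1, \dots, \alpha_K)$ is the parameter, with $\alpha_k > 0$.
$\Gamma(\alpha)$ is the Gamma function: $\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t}\, dt$.
$\Gamma(\alpha + 1) = \alpha\, \Gamma(\alpha)$ for $\alpha > 0$, and $\Gamma(n + 1) = n!$ for a positive integer $n$.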

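Dirichlet Distribution
[Figure: density plots of the Dirichlet distribution on the simplex for four settings of $\alpha$; the panel labels are only partly recoverable from the extraction.]
Expectation: $E[\theta_k] = \alpha_k / \sum_j \alpha_j$. The mean point is proportional to the vector $\alpha$.
The stationary point of the density (differential = 0) is $\theta_k = (\alpha_k - 1) / (\sum_j \alpha_j - K)$. It may be either a maximum (all $\alpha_k > 1$) or a minimum (all $\alpha_k < 1$), and it coincides with the mean point only when all $\alpha_k$ are equal.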

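Dirichlet Prior
The Dirichlet distribution works as a prior to the multinomial distribution. The posterior is also Dirichlet -- conjugate prior. Writing the cells as $k = 1, \dots, K$ with counts $N_k$:
$p(\theta \mid D) = \dfrac{\prod_k \theta_k^{N_k}\, \mathrm{Dir}(\theta \mid \alpha)}{\int_\Delta \prod_k \theta_k^{N_k}\, \mathrm{Dir}(\theta \mid \alpha)\, d\theta} = \mathrm{Dir}(\theta \mid \tilde{\alpha})$, with $\tilde{\alpha}_k = \alpha_k + N_k$. (*)
$\alpha$ works as a prior count.
MAP estimator: $\hat{\theta}_k^{MAP} = (\tilde{\alpha}_k - 1) / (\sum_j \tilde{\alpha}_j - K)$, valid when all $\tilde{\alpha}_k > 1$.
Proof of (*): $\prod_k \theta_k^{N_k}\, \mathrm{Dir}(\theta \mid \alpha) \propto \prod_k \theta_k^{N_k + \alpha_k - 1} = \prod_k \theta_k^{\tilde{\alpha}_k - 1}$. By the normalization, the right hand side must be $\mathrm{Dir}(\theta \mid \tilde{\alpha})$.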
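
The conjugate update amounts to adding observed counts to prior counts. The sketch below assumes the standard Dirichlet mean and mode formulas; the function names and the example numbers are hypothetical and do not come from the slides.

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: the posterior parameter is alpha_k + N_k."""
    return [a + n for a, n in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of Dir(alpha): proportional to alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]

def dirichlet_mode(alpha):
    """Mode (MAP point) of Dir(alpha); the formula needs every alpha_k > 1."""
    if any(a <= 1 for a in alpha):
        raise ValueError("mode formula needs all alpha_k > 1")
    denom = sum(alpha) - len(alpha)
    return [(a - 1) / denom for a in alpha]

prior = [2.0, 2.0, 2.0]   # hypothetical prior counts alpha
counts = [5, 1, 0]        # hypothetical observed counts N_k
post = dirichlet_posterior(prior, counts)
theta_map = dirichlet_mode(post)
theta_mean = dirichlet_mean(post)
```

With this prior the third cell gets a nonzero MAP estimate even though it was never observed, which shows the role of $\alpha$ as a prior count.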

EM Algorithm for Models with Hidden Variables

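ML Estimation with Hidden Variable
Statistical model with hidden variables. Suppose we can assume hidden (unobservable) variables in addition to observable variables:
$p(X, Z \mid \theta)$, where $X$ is the observable variable, $Z$ is the hidden variable, and $\theta$ is the parameter.
We have data only for the observable variables: $D = (X_1, X_2, \dots, X_N)$. The ML estimation must be done with $D$:
$\max_\theta \sum_{n=1}^N \log p(X_n \mid \theta) = \max_\theta \sum_{n=1}^N \log \sum_{Z_n} p(X_n, Z_n \mid \theta)$.
But this maximization is often difficult because of the nonlinearity w.r.t. $\theta$.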

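ML Estimation with Hidden Variable
Example: Gaussian mixture model
$p(x \mid \theta) = \sum_{a=1}^K \pi_a\, \phi(x; \mu_a, \Sigma_a)$.
With hidden variable: $Z$ takes values in $\{1, \dots, K\}$ (component), and $p(x, Z = a \mid \theta) = \pi_a\, \phi(x; \mu_a, \Sigma_a)$.
Marginal of $X$: $p(x \mid \theta) = \sum_{a=1}^K p(x, Z = a \mid \theta)$.
ML estimation:
$\max_\theta \sum_{n=1}^N \log p(X_n \mid \theta) = \max_\theta \sum_{n=1}^N \log \sum_{a=1}^K \pi_a\, \phi(X_n; \mu_a, \Sigma_a)$.
$\pi$ and $(\mu, \Sigma)$ are coupled, so this is difficult to solve analytically.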

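Estimation with Complete Data
Complete data: suppose $Z_1, \dots, Z_N$ are known. $D_c = \{(X_1, Z_1), \dots, (X_N, Z_N)\}$ is the complete data.
ML estimation with $D_c$ is often easier than estimation with $D$:
$\max_\theta \ell_c(\theta; D_c)$, where $\ell_c(\theta; D_c) = \sum_{n=1}^N \log p(X_n, Z_n \mid \theta)$ is the complete log likelihood.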

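Estimation with Complete Data
Example: mixture of Gaussians. Redefine the hidden variable as a K-dimensional binary vector:
$Z = (Z^1, \dots, Z^K)$ takes values in $\{(1, 0, \dots, 0), (0, 1, 0, \dots, 0), \dots, (0, \dots, 0, 1)\}$ (class indicator).
Note: $p(x, Z \mid \theta) = \prod_{a=1}^K \{\pi_a\, \phi(x; \mu_a, \Sigma_a)\}^{Z^a}$.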

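Estimation with Complete Data
ML estimation with complete data:
$\ell_c(\theta; D_c) = \sum_{n=1}^N \log \prod_{a=1}^K \{\pi_a\, \phi(X_n; \mu_a, \Sigma_a)\}^{Z_n^a} = \sum_{n=1}^N \sum_{a=1}^K Z_n^a \{\log \pi_a + \log \phi(X_n; \mu_a, \Sigma_a)\}$.
$\pi$ and $(\mu, \Sigma)$ are decoupled, so they can be maximized separately:
$\max_\pi \sum_n \sum_a Z_n^a \log \pi_a$ subject to $\sum_a \pi_a = 1$, and $\max_{\mu, \Sigma} \sum_n \sum_a Z_n^a \log \phi(X_n; \mu_a, \Sigma_a)$.
Maximization is easy. But the complete data is not available in practice!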

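Expected Complete Log Likelihood
Use the expected complete log likelihood instead of the complete log likelihood.
Complete log likelihood: $\ell_c(\theta; D_c) = \sum_{n=1}^N \log p(X_n, Z_n \mid \theta)$.
Expected complete log likelihood: suppose we have a current guess $\theta^{(t)}$. Use the expectation w.r.t. $p(Z_n \mid X_n, \theta^{(t)})$:
$\langle \ell_c(\theta; D) \rangle_{\theta^{(t)}} = \sum_{n=1}^N \sum_{Z_n} p(Z_n \mid X_n, \theta^{(t)}) \log p(X_n, Z_n \mid \theta)$.
Maximize $\langle \ell_c(\theta; D) \rangle_{\theta^{(t)}}$ instead of $\ell_c(\theta; D_c)$.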

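EM Algorithm
Initialization: initialize $\theta^{(0)}$ by some method; set $t = 0$.
Repeat the following steps until the stopping criterion is satisfied.
E-step: compute the expected complete log likelihood $\langle \ell_c(\theta; D) \rangle_{\theta^{(t)}} = \sum_n \sum_{Z_n} p(Z_n \mid X_n, \theta^{(t)}) \log p(X_n, Z_n \mid \theta)$.
M-step: maximize it, $\theta^{(t+1)} = \arg\max_\theta \langle \ell_c(\theta; D) \rangle_{\theta^{(t)}}$.
The computational difficulty of the M-step depends on the model.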

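EM Algorithm for Gaussian Mixture
Complete log likelihood:
$\ell_c(\theta; D_c) = \sum_{n=1}^N \sum_{a=1}^K Z_n^a \{\log \pi_a + \log \phi(X_n; \mu_a, \Sigma_a)\}$.
Expected complete log likelihood:
$\langle \ell_c(\theta; D) \rangle_{\theta^{(t)}} = \sum_{n=1}^N \sum_{a=1}^K \tau_n^{a(t)} \{\log \pi_a + \log \phi(X_n; \mu_a, \Sigma_a)\}$.
E-step:
$\tau_n^{a(t)} = E[Z_n^a \mid X_n, \theta^{(t)}] = \dfrac{\pi_a^{(t)}\, \phi(X_n; \mu_a^{(t)}, \Sigma_a^{(t)})}{\sum_{b=1}^K \pi_b^{(t)}\, \phi(X_n; \mu_b^{(t)}, \Sigma_b^{(t)})}$,
the ratio of contribution of $X_n$ to the a-th component.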

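EM Algorithm for Gaussian Mixture
M-step:
$\pi_a^{(t+1)} = \dfrac{1}{N} \sum_{n=1}^N \tau_n^{a(t)}$,
$\mu_a^{(t+1)} = \dfrac{\sum_n \tau_n^{a(t)} X_n}{\sum_n \tau_n^{a(t)}}$ (weighted mean),
$\Sigma_a^{(t+1)} = \dfrac{\sum_n \tau_n^{a(t)} (X_n - \mu_a^{(t+1)})(X_n - \mu_a^{(t+1)})^T}{\sum_n \tau_n^{a(t)}}$ (weighted covariance matrix).
Proof omitted (exercise).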
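
The E-step and M-step for a Gaussian mixture can be sketched in plain Python. This is an illustrative sketch restricted to one-dimensional data, so covariance matrices reduce to scalar variances. The function names, the synthetic data, the initial values, and the small variance floor are assumptions added here, not part of the lecture.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, pi, mu, var, n_iter=50):
    """EM for a 1-D Gaussian mixture; returns parameters and the log-likelihood trace."""
    pi, mu, var = list(pi), list(mu), list(var)   # work on copies of the inputs
    n, k = len(data), len(pi)
    trace = []
    for _ in range(n_iter):
        # E-step: responsibilities tau[n][a] under the current parameters.
        tau, loglik = [], 0.0
        for x in data:
            w = [pi[a] * normal_pdf(x, mu[a], var[a]) for a in range(k)]
            s = sum(w)
            loglik += math.log(s)
            tau.append([wa / s for wa in w])
        trace.append(loglik)   # log likelihood at the parameters used in this E-step
        # M-step: mixing weights, weighted means, weighted variances.
        for a in range(k):
            n_a = sum(t[a] for t in tau)
            pi[a] = n_a / n
            mu[a] = sum(t[a] * x for t, x in zip(tau, data)) / n_a
            v = sum(t[a] * (x - mu[a]) ** 2 for t, x in zip(tau, data)) / n_a
            var[a] = max(v, 1e-6)   # variance floor: an added safeguard, not part of plain EM
    return pi, mu, var, trace

# Synthetic data from two well-separated Gaussians (hypothetical example).
random.seed(0)
data = [random.gauss(-2.0, 1.0) for _ in range(100)] + \
       [random.gauss(3.0, 1.0) for _ in range(100)]
pi, mu, var, trace = em_gmm_1d(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

The returned `trace` holds the log likelihood of the observed data at each iteration, which makes it easy to inspect convergence for a given initialization.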

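EM Algorithm for Gaussian Mixture
Meaning of $\tau$: $Z_n^a$ is the unobserved 0/1 class indicator, and $\tau_n^a = E[Z_n^a \mid X_n, \theta^{(t)}]$ is its expectation, i.e. a soft assignment of $X_n$ to component a.
[Table: an example comparing hard 0/1 assignments $Z$ with soft assignments $\tau$ for three components, with a SUM row of column totals; the numeric entries are not recoverable from the extraction.]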

Properties of EM Algorithm
EM converges quickly for many problems.
Monotonic increase of the likelihood is guaranteed (discussed later).
EM may be trapped by local optima. The solution depends strongly on the initial state.
The EM algorithm can be applied to any model with hidden variables: missing values, etc.

Demonstration
Web site for Gaussian mixture demo: http://www.neurosci.aist.go.jp/~akaho/mixtureem.html

Theoretical Justification of EM

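Theoretical Justification of EM
EM as likelihood maximization. The goal is to maximize the incomplete log likelihood $\ell(\theta) = \sum_n \log p(X_n \mid \theta)$, not the expected complete log likelihood.
$q(Z \mid X)$: an arbitrary p.d.f. of $Z$, which may depend on $X$.
Define an auxiliary function $L(q, \theta)$ by
$L(q, \theta) = \sum_{n=1}^N \sum_{Z_n} q(Z_n \mid X_n) \log \dfrac{p(X_n, Z_n \mid \theta)}{q(Z_n \mid X_n)}$.
Theorem 1.
E-step: $q^{(t+1)} = \arg\max_q L(q, \theta^{(t)})$, and compute $\langle \ell_c(\theta; D) \rangle_{\theta^{(t)}}$.
M-step: $\theta^{(t+1)} = \arg\max_\theta L(q^{(t+1)}, \theta)$.
EM is thus an alternating optimization w.r.t. $q$ and $\theta$.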

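Theoretical Justification of EM
Proposition 1 ($L(q, \theta)$ and the likelihood of $\theta$). For any $q$ and $\theta$, the log likelihood of $\theta$ is decomposed as
$\ell(\theta) = L(q, \theta) + \sum_n KL\big(q(Z_n \mid X_n)\, \|\, p(Z_n \mid X_n, \theta)\big)$.
In particular, $\ell(\theta) \ge L(q, \theta)$ for all $q$ and $\theta$, and the equality holds if and only if $q(Z_n \mid X_n) = p(Z_n \mid X_n, \theta)$.
Proof.
$\ell(\theta) = \sum_n \log p(X_n \mid \theta) = \sum_n \sum_{Z_n} q(Z_n \mid X_n) \log \dfrac{p(X_n, Z_n \mid \theta)}{p(Z_n \mid X_n, \theta)}$
$= \sum_n \sum_{Z_n} q(Z_n \mid X_n) \log \dfrac{p(X_n, Z_n \mid \theta)}{q(Z_n \mid X_n)} + \sum_n \sum_{Z_n} q(Z_n \mid X_n) \log \dfrac{q(Z_n \mid X_n)}{p(Z_n \mid X_n, \theta)}$
$= L(q, \theta) + \sum_n KL\big(q(Z_n \mid X_n)\, \|\, p(Z_n \mid X_n, \theta)\big)$.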
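
The decomposition of the log likelihood into the auxiliary function plus a KL divergence can be checked numerically for a single observation with a discrete hidden variable. In this sketch the function name `decomposition` and the numbers in `joint` and `q` are hypothetical values chosen for illustration.

```python
import math

def decomposition(joint, q):
    """For one observation x: joint[z] = p(x, z | theta), q[z] = q(z | x)."""
    evidence = sum(joint)
    log_lik = math.log(evidence)                 # log p(x | theta)
    post = [j / evidence for j in joint]         # posterior p(z | x, theta)
    # Auxiliary function L(q, theta) = sum_z q(z) log(p(x, z) / q(z)).
    lower = sum(qz * math.log(j / qz) for qz, j in zip(q, joint))
    # KL divergence from q to the posterior.
    kl = sum(qz * math.log(qz / pz) for qz, pz in zip(q, post))
    return log_lik, lower, kl

joint = [0.10, 0.05, 0.25]   # hypothetical p(x, z | theta) for z = 1, 2, 3
q = [0.2, 0.3, 0.5]          # an arbitrary distribution over z
log_lik, lower, kl = decomposition(joint, q)
```

Replacing `q` by the normalized `joint` (the posterior) drives `kl` to zero and makes `lower` equal to `log_lik`, which is exactly the choice made in the E-step.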

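Theoretical Justification of EM
Proposition 2 ($L(q, \theta)$ and the expected complete log likelihood).
$L(q, \theta) = \langle \ell_c(\theta; D) \rangle_q - \sum_n \sum_{Z_n} q(Z_n \mid X_n) \log q(Z_n \mid X_n)$,
where $\langle \ell_c(\theta; D) \rangle_q = \sum_n \sum_{Z_n} q(Z_n \mid X_n) \log p(X_n, Z_n \mid \theta)$ and the second term does not depend on $\theta$.
Proof.
$L(q, \theta) = \sum_n \sum_{Z_n} q(Z_n \mid X_n) \log \dfrac{p(X_n, Z_n \mid \theta)}{q(Z_n \mid X_n)} = \sum_n \sum_{Z_n} q(Z_n \mid X_n) \log p(X_n, Z_n \mid \theta) - \sum_n \sum_{Z_n} q(Z_n \mid X_n) \log q(Z_n \mid X_n)$.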

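Theoretical Justification of EM
Proof of Theorem 1.
E-step: from Proposition 1, $\ell(\theta^{(t)}) = L(q, \theta^{(t)}) + \sum_n KL\big(q(Z_n \mid X_n)\, \|\, p(Z_n \mid X_n, \theta^{(t)})\big)$. The left hand side is independent of $q$, so maximizing $L(q, \theta^{(t)})$ over $q$ is equivalent to minimizing the KL term, which gives $q^{(t+1)}(Z_n \mid X_n) = p(Z_n \mid X_n, \theta^{(t)})$.
M-step: from Proposition 2, $L(q^{(t+1)}, \theta) = \langle \ell_c(\theta; D) \rangle_{\theta^{(t)}} + \text{const.}$ w.r.t. $\theta$, so the M-step is $\max_\theta L(q^{(t+1)}, \theta)$.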

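Theoretical Justification of EM
Monotonic increase of the likelihood by EM.
Theorem 2. $\ell(\theta^{(t+1)}) \ge \ell(\theta^{(t)})$ for all $t$.
Proof.
$\ell(\theta^{(t)}) = L(q^{(t+1)}, \theta^{(t)})$ (E-step, Prop. 1)
$\le L(q^{(t+1)}, \theta^{(t+1)})$ (M-step)
$\le \ell(\theta^{(t+1)})$ (Prop. 1).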

Remarks on EM Algorithm
EM never decreases the likelihood of the observable variables, but there are no theoretical guarantees of global maximization. In general, it can converge only to a local maximum. There is a sufficient condition for convergence by Wu (1983).
Practically, EM converges very quickly.
For the Gaussian mixture model: if the mean and variance are its parameters, the likelihood function can take an arbitrarily large value, so there is no global maximum of the likelihood. EM often finds a reasonable local optimum with a good choice of initialization. The results depend much on the initialization.
Further readings: The EM Algorithm and Extensions (McLachlan & Krishnan, 1997); Finite Mixture Models (McLachlan & Peel, 2000).

EM Algorithm for Hidden Markov Model

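Maximum Likelihood for HMM
Parametric model of a Gaussian HMM with hidden states $X_0, \dots, X_T \in \{1, \dots, K\}$ and observations $Y_0, \dots, Y_T$:
$p(X, Y \mid \theta) = \pi_{X_0} \prod_{t=0}^{T-1} A_{X_t X_{t+1}} \prod_{t=0}^{T} \phi(Y_t; \mu_{X_t}, \Sigma_{X_t})$,
where $\phi(y; \mu, \Sigma)$ is the Gaussian density with mean $\mu$ and covariance $\Sigma$.
Parameter: $\theta = (\pi, A, \mu_1, \dots, \mu_K, \Sigma_1, \dots, \Sigma_K)$, with $A$ the transition matrix and $\pi$ the initial probability.
The likelihood is $L(\theta) = \sum_X p(X, Y \mid \theta)$, and $\max_\theta \log L(\theta)$ is difficult.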

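EM for HMM
Complete log likelihood ($m$: dimension of $Y_t$, $\delta$: Kronecker delta):
$\ell_c(\theta) = \log p(X, Y \mid \theta) = \log \pi_{X_0} + \sum_{t=0}^{T-1} \log A_{X_t X_{t+1}} + \sum_{t=0}^{T} \log \phi(Y_t; \mu_{X_t}, \Sigma_{X_t})$
$= \sum_i \delta_{X_0, i} \log \pi_i + \sum_{t=0}^{T-1} \sum_{i,j} \delta_{X_t, i}\, \delta_{X_{t+1}, j} \log A_{ij} + \sum_{t=0}^{T} \sum_i \delta_{X_t, i} \Big\{ -\tfrac{m}{2} \log 2\pi - \tfrac{1}{2} \log \det \Sigma_i - \tfrac{1}{2} (Y_t - \mu_i)^T \Sigma_i^{-1} (Y_t - \mu_i) \Big\}$.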

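EM for HMM
Expected complete log likelihood. Suppose we already have an estimate $\theta^{(n)}$ ($n$: index for iteration). Taking the expectation of the complete log likelihood w.r.t. $p(X \mid Y, \theta^{(n)})$ replaces each indicator by its posterior probability:
$\langle \ell_c(\theta) \rangle_{\theta^{(n)}} = \sum_i \gamma_0(i) \log \pi_i + \sum_{t=0}^{T-1} \sum_{i,j} \xi_t(i, j) \log A_{ij} + \sum_{t=0}^{T} \sum_i \gamma_t(i) \log \phi(Y_t; \mu_i, \Sigma_i)$,
where $\gamma_t(i) = E[\delta_{X_t, i} \mid Y, \theta^{(n)}] = p(X_t = i \mid Y, \theta^{(n)})$ and $\xi_t(i, j) = E[\delta_{X_t, i}\, \delta_{X_{t+1}, j} \mid Y, \theta^{(n)}] = p(X_t = i, X_{t+1} = j \mid Y, \theta^{(n)})$.
It requires $\xi$ and $\gamma$, which can be computed by the forward-backward algorithm.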

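EM for HMM
Baum-Welch algorithm.
E-step: run forward-backward to compute $\xi_t(i, j)$ and $\gamma_t(i)$, which give the expected complete log likelihood
$\sum_i \gamma_0(i) \log \pi_i + \sum_{t} \sum_{i,j} \xi_t(i, j) \log A_{ij} + \sum_{t} \sum_i \gamma_t(i) \Big\{ -\tfrac{m}{2} \log 2\pi - \tfrac{1}{2} \log \det \Sigma_i - \tfrac{1}{2} (Y_t - \mu_i)^T \Sigma_i^{-1} (Y_t - \mu_i) \Big\}$.
M-step:
$\pi_i = \gamma_0(i)$,
$A_{ij} = \dfrac{\sum_t \xi_t(i, j)}{\sum_t \sum_k \xi_t(i, k)}$,
$\mu_i = \dfrac{\sum_t \gamma_t(i)\, Y_t}{\sum_t \gamma_t(i)}$,
$\Sigma_i = \dfrac{\sum_t \gamma_t(i) (Y_t - \mu_i)(Y_t - \mu_i)^T}{\sum_t \gamma_t(i)}$
(c.f. EM for Gaussian mixture).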
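
One Baum-Welch iteration can be sketched in plain Python for scalar observations. This is a sketch under stated assumptions rather than the lecture's own implementation: it uses the standard scaled forward-backward recursions, one-dimensional Gaussian emissions, a small variance floor, and hypothetical function names and toy data.

```python
import math

def normal_pdf(y, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def forward_backward(obs, pi, A, mu, var):
    """Scaled forward-backward; returns gamma[t][i], xi[t][i][j], log likelihood."""
    T, K = len(obs), len(pi)
    b = [[normal_pdf(y, mu[i], var[i]) for i in range(K)] for y in obs]
    alpha, scale = [], []
    for t in range(T):                      # forward pass with per-step normalization
        if t == 0:
            a = [pi[i] * b[0][i] for i in range(K)]
        else:
            a = [b[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in range(K))
                 for j in range(K)]
        c = sum(a)
        scale.append(c)
        alpha.append([v / c for v in a])
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):          # backward pass using the same scale factors
        beta[t] = [sum(A[i][j] * b[t + 1][j] * beta[t + 1][j] for j in range(K))
                   / scale[t + 1] for i in range(K)]
    gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * b[t + 1][j] * beta[t + 1][j] / scale[t + 1]
            for j in range(K)] for i in range(K)] for t in range(T - 1)]
    return gamma, xi, sum(math.log(c) for c in scale)

def m_step(obs, gamma, xi):
    """Closed-form M-step updates for pi, A, mu, var."""
    T, K = len(obs), len(gamma[0])
    pi = gamma[0][:]
    A = [[sum(xi[t][i][j] for t in range(T - 1)) /
          sum(gamma[t][i] for t in range(T - 1)) for j in range(K)]
         for i in range(K)]
    mu, var = [], []
    for i in range(K):
        g = sum(gamma[t][i] for t in range(T))
        m = sum(gamma[t][i] * obs[t] for t in range(T)) / g
        v = sum(gamma[t][i] * (obs[t] - m) ** 2 for t in range(T)) / g
        mu.append(m)
        var.append(max(v, 1e-6))            # variance floor: an added safeguard
    return pi, A, mu, var

# Hypothetical toy sequence and initial parameters.
obs = [0.1, -0.2, 0.3, 5.1, 4.8, 5.3, 0.0, 4.9]
params = ([0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]], [0.0, 4.0], [1.0, 1.0])
gamma, xi, ll0 = forward_backward(obs, *params)
new_params = m_step(obs, gamma, xi)
_, _, ll1 = forward_backward(obs, *new_params)
```

The per-step normalization constants keep the recursions numerically stable on long sequences, and their logarithms sum to the log likelihood, so `ll0` and `ll1` can be compared to watch the likelihood across iterations.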

Summary: Parameter Learning
Discrete variables without hidden variables: maximum likelihood estimation is easy by frequencies. Bayesian estimation is often done with a Dirichlet prior.
Discrete variables with hidden variables: maximum likelihood estimation can be done with the EM algorithm. The Bayesian approach has computational difficulty and calls for variational methods and so on.