Learning Graphical Models


School of Computer Science
Probabilistic Graphical Models, 10-708, Lecture 7, Oct 8 2007
Statistical learning with basic graphical models
Eric Xing
Reading: J-Chap. 5, 6; F-Chap. 8

[Slide figure: a signaling-pathway Bayesian network with nodes Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H, numbered 1-8.]

Learning Graphical Models

The goal: given a set of independent samples (assignments of the random variables), find the best (the most likely?) Bayesian network — both the DAG and the CPDs.

Structural learning: from records such as (B, E, A, R, C) = (T, F, F, T, F), (T, F, T, T, F), ..., (F, T, T, T, F), learn the graph over E, B, R, A, C.

Parameter learning: learn the CPDs, e.g., the table P(A | E, B) over the four parent configurations (e, b), (e, ¬b), (¬e, b), (¬e, ¬b); the entries on the slide are only partially legible (0.9, ..., 0.8, 0.9, ..., 0.99).

Learning Graphical Models

Scenarios:
- completely observed GMs: directed, undirected
- partially or unobserved GMs: directed, undirected (an open research topic)

Estimation principles:
- Maximum likelihood estimation (MLE)
- Bayesian estimation
- Maximal conditional likelihood
- Maximal "margin"

We use "learning" as a name for the process of estimating the parameters, and in some cases the topology of the network, from data.

Score-based approach

Data → possible structures (e.g., different DAGs over E, B, R, A) → learn parameters for each → score each (structure, parameters) pair — by maximum likelihood, a Bayesian score, conditional likelihood, or margin — and pick the best.

ML Parameter Estimation for completely observed GMs of given structure

The data: D = {(z_1, x_1), (z_2, x_2), (z_3, x_3), ..., (z_N, x_N)}.

Parameter Learning

Assume G is known and fixed — from expert design, or from an intermediate outcome of iterative structure learning.

Goal: estimate θ from a dataset of N independent, identically distributed (iid) training cases D = {x_1, ..., x_N}. In general, each training case x_n = (x_{n,1}, ..., x_{n,M}) is a vector of M values, one per node. The model can be completely observable, i.e., every element in x_n is known (no missing values, no hidden variables), or partially observable, i.e., there exists some i such that x_{n,i} is not observed.

In this lecture we consider learning parameters for a single node.

Frequentist vs. Bayesian estimates.

Bayesian Parameter Estimation

Bayesians treat the unknown parameters as a random variable whose distribution can be inferred using Bayes rule:

  p(θ | D) = p(D | θ) p(θ) / p(D),   where   p(D) = ∫ p(D | θ) p(θ) dθ.

This crucial equation can be written in words:

  posterior = likelihood × prior / marginal likelihood.

For iid data, the likelihood is p(D | θ) = ∏_n p(x_n | θ).

The prior p(θ) encodes our prior knowledge about the domain; because of it, Bayesian estimation has been criticized for being "subjective". Empirical Bayes: fit the prior from "training" data.

Frequentist Parameter Estimation

Two people with different priors p(θ) will end up with different estimates p(θ | D). Frequentists dislike this subjectivity. Frequentists think of the parameter as a fixed, unknown constant, not a random variable. Hence they have to come up with different "objective" estimators (ways of computing θ̂ from data), instead of using Bayes rule. These estimators have different properties, such as being unbiased, minimum variance, etc. A very popular estimator is the maximum likelihood estimator, which is simple and has good statistical properties.

Discussion

θ̂ or p(θ | D)? That is the problem! (Bayesians know it...)

Maximum Likelihood Estimation

The log-likelihood is monotonically related to the likelihood:

  ℓ(θ; D) = log p(D | θ) = log ∏_n p(x_n | θ) = ∑_n log p(x_n | θ).

The idea underlying maximum likelihood estimation (MLE): pick the setting of parameters most likely to have generated the data we saw:

  θ_ML = argmax_θ ℓ(θ; D).

Problems of MLE:

Overfitting: "some of the relationships that appear statistically significant are actually just noise. It occurs when the complexity of the statistical model is too great for the amount of data that you have." Often the MLE overfits the training data, so it is common to maximize a regularized log-likelihood instead — the log-likelihood ℓ(θ; D) plus a penalty on model complexity.

Insufficient training data can lead to spurious estimates (e.g., certain possible values are not observed due to data sparsity), so it is common to smooth the estimated parameters.

Example: Bernoulli model

Data: we observed N iid coin tosses: D = {1, 0, 1, ..., 0}.

Representation: binary r.v. x_n ∈ {0, 1}.

Model: P(x = 1) = θ, P(x = 0) = 1 − θ, i.e. P(x) = θ^x (1 − θ)^(1−x).

The likelihood of a single observation x_n: P(x_n | θ) = θ^{x_n} (1 − θ)^{1−x_n}.

The likelihood of the dataset D = {x_1, ..., x_N}:

  P(D | θ) = ∏_n θ^{x_n} (1 − θ)^{1−x_n} = θ^{#heads} (1 − θ)^{#tails}.

MLE

Objective function:

  ℓ(θ; D) = log P(D | θ) = log [θ^{n_h} (1 − θ)^{n_t}] = n_h log θ + n_t log(1 − θ).

We need to maximize this w.r.t. θ. Taking the derivative w.r.t. θ and setting it to zero:

  n_h/θ − n_t/(1 − θ) = 0   ⇒   θ_MLE = n_h / (n_h + n_t) = n_h / N,

or equivalently θ_MLE = (1/N) ∑_n x_n. Frequency as sample mean.

Sufficient statistics: the counts n_h and n_t, where n_h = ∑_n x_n, are sufficient statistics of the data D.
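The Bernoulli MLE above reduces to counting. A minimal Python sketch (the function name `bernoulli_mle` and the sample data are illustrative, not from the lecture):

```python
def bernoulli_mle(xs):
    """MLE of P(x = 1) for iid Bernoulli observations xs in {0, 1}:
    theta_MLE = n_h / N, i.e. frequency as sample mean."""
    return sum(xs) / len(xs)

data = [1, 0, 1, 1, 0, 1, 0, 1]   # n_h = 5, n_t = 3, N = 8
theta_mle = bernoulli_mle(data)   # 5/8 = 0.625
```

Note that the counts (n_h, n_t) are all the function ever touches — a direct reflection of their role as sufficient statistics.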

Being a pragmatic frequentist

Maximum a posteriori (MAP) estimation:

  θ_MAP = argmax_θ p(θ | D) = argmax_θ [ℓ(θ; D) + log p(θ)].

Smoothing with pseudo-counts: recall that for the binomial distribution we have θ_head^MLE = n_head / (n_head + n_tail). What if we tossed too few times, so that we saw zero heads? Then θ_head^MLE = 0, and we will predict that the probability of seeing a head next is zero!!!

The rescue — smooth with pseudo-counts:

  θ_head = (n_head + n') / (n_head + n_tail + n' + n''),

where n' (and n'') is known as a pseudo- (imaginary) count. But are we still objective?

Bayesian estimation for Bernoulli

Beta distribution:

  P(θ; α, β) = [Γ(α + β) / (Γ(α) Γ(β))] θ^{α−1} (1 − θ)^{β−1} = B(α, β) θ^{α−1} (1 − θ)^{β−1}.

(For a positive integer x, Γ(x) = (x − 1)!.)

Posterior distribution of θ:

  P(θ | x_1, ..., x_N) = p(x_1, ..., x_N | θ) p(θ) / p(x_1, ..., x_N)
                       ∝ θ^{n_h} (1 − θ)^{n_t} × θ^{α−1} (1 − θ)^{β−1}
                       = θ^{n_h+α−1} (1 − θ)^{n_t+β−1}.

Notice the isomorphism of the posterior to the prior: such a prior is called a conjugate prior. α and β are hyperparameters (parameters of the prior) and correspond to the number of "virtual" heads/tails (pseudo-counts).

Bayesian estimation for Bernoulli (cont'd)

Posterior distribution of θ:

  P(θ | x_1, ..., x_N) ∝ θ^{n_h+α−1} (1 − θ)^{n_t+β−1}.

Maximum a posteriori (MAP) estimation: θ_MAP = argmax_θ log P(θ | x_1, ..., x_N).

Posterior mean estimation:

  θ_Bayes = ∫ θ p(θ | D) dθ = C ∫ θ × θ^{n_h+α−1} (1 − θ)^{n_t+β−1} dθ = (n_h + α) / (N + α + β).

Beta parameters can be understood as pseudo-counts. Prior strength: A = α + β. A can be interpreted as the size of an imaginary data set from which we obtain the pseudo-counts.

Effect of Prior Strength

Suppose we have a uniform prior (α = β = A/2), and we observe n⃗ = (n_h = 2, n_t = 8).

Weak prior, A = 2. Posterior prediction: p(x = h | α = 1, β = 1, n⃗) = (1 + 2) / (2 + 10) = 0.25.

Strong prior, A = 20. Posterior prediction: p(x = h | α = 10, β = 10, n⃗) = (10 + 2) / (20 + 10) = 0.40.

However, if we have enough data, it washes away the prior. E.g., with n⃗ = (n_h = 200, n_t = 800), the estimates under the weak and strong priors are 201/1002 and 210/1020 respectively, both of which are close to 0.2.
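The posterior-mean formula and the prior-strength example above can be checked numerically. A small sketch (the helper name is hypothetical) that reproduces the weak-prior / strong-prior numbers:

```python
def beta_posterior_mean(n_h, n_t, alpha, beta):
    """Posterior mean E[theta | D] = (n_h + alpha) / (N + alpha + beta)
    under a Beta(alpha, beta) prior, with N = n_h + n_t observations."""
    return (n_h + alpha) / (n_h + n_t + alpha + beta)

weak = beta_posterior_mean(2, 8, 1, 1)          # (1 + 2) / (2 + 10)  = 0.25
strong = beta_posterior_mean(2, 8, 10, 10)      # (10 + 2) / (20 + 10) = 0.40
washed = beta_posterior_mean(200, 800, 10, 10)  # 210/1020, close to 0.2
```

With A = α + β large relative to N the prior dominates; with N large the estimate converges to the MLE n_h / N regardless of the prior.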

How should estimators be used?

MAP is not Bayesian (even though it uses a prior), since it is a point estimate.

Consider predicting the future. A sensible way is to combine predictions based on all possible values of θ, weighted by their posterior probability; this is what a Bayesian will do:

  p(x_new | D) = ∫ p(x_new | θ) p(θ | D) dθ.

A frequentist will typically use a plug-in estimator such as ML/MAP:

  p(x_new | θ_ML)   or   p(x_new | θ_MAP).

The Bayesian estimate will collapse to MAP for a concentrated posterior.

Frequentist vs. Bayesian

This is a theological war.

Advantages of the Bayesian approach: Mathematically elegant. Works well when the amount of data is much less than the number of parameters (e.g., one-shot learning). Easy to do incremental (sequential) learning. Can be used for model selection (max likelihood will always pick the most complex model).

Advantages of the frequentist approach: Mathematically/computationally simpler. "Objective": unbiased, invariant to reparameterization.

As N → ∞, the two approaches become the same: p(θ | D) → δ(θ − θ_ML).
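For the Beta-Bernoulli case the predictive integral has a closed form, so the contrast between the full posterior predictive and a MAP plug-in can be made concrete. A sketch (function names are illustrative; the MAP mode formula assumes α, β > 1):

```python
def posterior_predictive(n_h, n_t, alpha, beta):
    """Bayesian prediction p(x_new = 1 | D): integrating theta against the
    Beta posterior gives (n_h + alpha) / (N + alpha + beta) in closed form."""
    return (n_h + alpha) / (n_h + n_t + alpha + beta)

def map_plugin(n_h, n_t, alpha, beta):
    """Plug-in prediction p(x_new = 1 | theta_MAP), using the mode of the
    Beta posterior: (n_h + alpha - 1) / (N + alpha + beta - 2)."""
    return (n_h + alpha - 1) / (n_h + n_t + alpha + beta - 2)

few_bayes = posterior_predictive(1, 0, 2, 2)   # 3/5 = 0.6
few_map = map_plugin(1, 0, 2, 2)               # 2/3
big_bayes = posterior_predictive(600, 400, 2, 2)
big_map = map_plugin(600, 400, 2, 2)           # nearly equal: concentrated posterior
```

With one observation the two predictions differ noticeably; with a thousand, the posterior is concentrated and the Bayesian estimate collapses to MAP, as the slide says.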

Simplest GMs: the building blocks

Density estimation: parametric and nonparametric methods.
Regression: linear, conditional mixture, nonparametric.
Classification: generative and discriminative approaches.

[Slide figure: the corresponding small graphical models — X with parameters μ, σ; X → Y; Q → X.]

Plates

A plate is a macro that allows subgraphs to be replicated. For iid (exchangeable) data, the likelihood is p(D | θ) = ∏_n p(x_n | θ). We can represent this as a Bayes net with N nodes.

The rules of plates are simple: repeat every structure in a box a number of times given by the integer in the corner of the box (e.g., N), updating the plate index variable (e.g., n) as you go. Duplicate every arrow going into the plate and every arrow leaving the plate, by connecting the arrows to each copy of the structure.

Discrete Distributions

Bernoulli distribution: Ber(p). P(x = 1) = p, P(x = 0) = 1 − p, i.e. P(x) = p^x (1 − p)^(1−x).

Multinomial distribution: Mult(1, θ). Multinomial (indicator) variable x = [x_1, x_2, x_3, x_4, x_5, x_6], with x_j ∈ {0, 1} and ∑_j x_j = 1, where x_j = 1 w.p. θ_j — e.g., x indexes the face of a die, j ∈ {1, ..., 6}:

  p(x | θ) = ∏_j θ_j^{x_j}.

Discrete Distributions (cont'd)

Multinomial distribution: Mult(N, θ). Count variable n = [n_1, ..., n_K], with ∑_j n_j = N:

  p(n | θ) = [N! / (n_1! n_2! ... n_K!)] θ_1^{n_1} θ_2^{n_2} ... θ_K^{n_K}.

Example: multinomial model

Data: we observed N iid die rolls (K-sided): D = {5, 1, ..., 3}.

Representation: unit basis vectors x_n ∈ {[1,0,...,0], [0,1,...,0], ..., [0,...,0,1]}, with x_{n,j} ∈ {0,1} and ∑_j x_{n,j} = 1 (x_{n,j} = 1 iff the n-th roll shows the j-th face).

Model: p(x_n | θ) = ∏_j θ_j^{x_{n,j}}.

The likelihood of the dataset D = {x_1, ..., x_N}:

  P(D | θ) = ∏_n ∏_j θ_j^{x_{n,j}} = ∏_j θ_j^{∑_n x_{n,j}} = ∏_j θ_j^{n_j},   where n_j = ∑_n x_{n,j}.

MLE: constrained optimization with Lagrange multipliers

Objective function: ℓ(θ; D) = log P(D | θ) = ∑_j n_j log θ_j.

We need to maximize this subject to the constraint ∑_j θ_j = 1. Constrained cost function with a Lagrange multiplier:

  ℓ̃ = ∑_j n_j log θ_j + λ (1 − ∑_j θ_j).

Taking derivatives w.r.t. θ_j: n_j/θ_j − λ = 0 ⇒ θ_j = n_j/λ; the constraint then gives λ = ∑_j n_j = N, so

  θ_{j,MLE} = n_j / N,   or equivalently   θ_{j,MLE} = (1/N) ∑_n x_{n,j}.

Frequency as sample mean. Sufficient statistics: the counts n_j are sufficient statistics of the data D.
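The multinomial MLE θ_j = n_j / N is again just counting. A short sketch (the function name and example rolls are illustrative):

```python
from collections import Counter

def multinomial_mle(rolls, k):
    """MLE for a k-sided die: theta_j = n_j / N, where n_j is the number of
    times face j was rolled (faces are labeled 1..k)."""
    counts = Counter(rolls)
    n = len(rolls)
    return [counts[face] / n for face in range(1, k + 1)]

theta = multinomial_mle([5, 1, 6, 3, 5, 5, 2, 5], k=6)
# face 5 appears 4 times in 8 rolls, so theta[4] == 0.5
```

The estimates automatically satisfy the simplex constraint ∑_j θ_j = 1, since the counts sum to N — which is exactly what the Lagrange multiplier λ = N enforces in the derivation.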

Bayesian estimation for the multinomial

Dirichlet distribution:

  P(θ) = [Γ(∑_j α_j) / ∏_j Γ(α_j)] ∏_j θ_j^{α_j − 1} = C(α) ∏_j θ_j^{α_j − 1}.

Posterior distribution of θ:

  P(θ | x_1, ..., x_N) ∝ ∏_j θ_j^{n_j} × ∏_j θ_j^{α_j − 1} = ∏_j θ_j^{n_j + α_j − 1}.

Notice the isomorphism of the posterior to the prior: a conjugate prior again.

Posterior mean estimation:

  θ̄_j = ∫ θ_j p(θ | D) dθ = (n_j + α_j) / (N + |α|),   where |α| = ∑_j α_j.

Dirichlet parameters can be understood as pseudo-counts.

More on the Dirichlet prior

Where does the normalization constant C(α) come from? Integration by parts. Γ(x) is the gamma function:

  Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt;   for integers, Γ(k) = (k − 1)!.

Marginal likelihood:

  P(x_1, ..., x_N | α) = [Γ(|α|) / Γ(|α| + N)] ∏_j [Γ(α_j + n_j) / Γ(α_j)].

Posterior in closed form:

  P(θ | x_1, ..., x_N, α) = Dir(θ : α + n⃗).

Posterior predictive rate:

  p(x_{N+1} = j | x_1, ..., x_N, α) = (α_j + n_j) / (|α| + N).

Sequential Bayesian updating

Start with a Dirichlet prior P(θ | α) = Dir(θ : α).
Observe N' samples with sufficient statistics n⃗'. The posterior becomes P(θ | α, n⃗') = Dir(θ : α + n⃗').
Observe another N'' samples with sufficient statistics n⃗''. The posterior becomes P(θ | α, n⃗', n⃗'') = Dir(θ : α + n⃗' + n⃗'').
So sequentially absorbing data, in any order, is equivalent to a batch update.

Effect of Prior Strength

Let N = ∑_j n_j be the number of observed samples, and let A = ∑_j α_j be the number of "pseudo-observations" — the strength of the prior. Let θ'_j = α_j / A denote the prior means. The posterior mean is a convex combination of the prior mean and the MLE:

  E[θ_j | n⃗] = (α_j + n_j) / (A + N) = [A/(A+N)] θ'_j + [N/(A+N)] (n_j/N) = λ θ'_j + (1 − λ) θ_{j,MLE},

where λ = A / (A + N).
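The sequential-vs-batch equivalence is a one-line computation on the hyperparameters. A sketch (names illustrative):

```python
def dirichlet_update(alpha, counts):
    """Absorb a count vector n into a Dirichlet(alpha) prior:
    the posterior is Dirichlet(alpha + n), elementwise."""
    return [a + n for a, n in zip(alpha, counts)]

prior = [1, 1, 1]  # uniform Dirichlet over a 3-sided die

# Two sequential batches ...
seq = dirichlet_update(dirichlet_update(prior, [3, 0, 1]), [2, 4, 0])
# ... give the same posterior as one batch with the pooled counts.
batch = dirichlet_update(prior, [5, 4, 1])
assert seq == batch == [6, 5, 2]
```

Because the update is just vector addition, the order in which the batches arrive cannot matter — which is the slide's point.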

Hierarchical Bayesian Models

θ are the parameters for the likelihood p(x | θ); α are the parameters for the prior p(θ | α). We can have hyper-hyper-parameters, etc. We stop when the choice of hyper-parameters makes no difference to the marginal likelihood; typically we make the hyper-parameters constants.

Where do we get the prior?
- Intelligent guesses.
- Empirical Bayes (Type-II maximum likelihood): computing point estimates of α: α_MLE = argmax_α p(D | α).

Limitation of the Dirichlet prior

[Slide figure: Dirichlet density plots.]

The Logistic Normal prior

θ ~ LN(μ, Σ): draw γ ~ N(μ, Σ) and set

  θ_j = e^{γ_j} / ∑_k e^{γ_k},   i.e.   γ_j = log θ_j + C(γ),

where C(γ) = log ∑_k e^{γ_k} is the log partition function (normalization constant).

Pro: covariance structure. Con: non-conjugate (we will discuss how to solve this later).

Logistic Normal Densities

[Slide figure: example logistic-normal densities on the simplex.]

Example: univariate Gaussian

Data: we observed N iid real samples, e.g. D = {−0.1, ..., −5.2, ..., 3}.

Model: P(x) = (2πσ²)^{−1/2} exp{−(x − μ)² / 2σ²}.

Log-likelihood:

  ℓ(θ; D) = log P(D | θ) = −(N/2) log(2πσ²) − (1/2σ²) ∑_n (x_n − μ)².

MLE: take derivatives and set to zero:

  ∂ℓ/∂μ = (1/σ²) ∑_n (x_n − μ) = 0   ⇒   μ_MLE = (1/N) ∑_n x_n,
  ∂ℓ/∂σ² = −N/(2σ²) + (1/2σ⁴) ∑_n (x_n − μ)² = 0   ⇒   σ²_MLE = (1/N) ∑_n (x_n − μ_ML)².

MLE for a multivariate Gaussian

It can be shown that the MLE for μ and Σ is

  μ_MLE = (1/N) ∑_n x_n,
  Σ_MLE = (1/N) ∑_n (x_n − μ_ML)(x_n − μ_ML)^T = (1/N) S,

where the scatter matrix is

  S = ∑_n (x_n − μ_ML)(x_n − μ_ML)^T = ∑_n x_n x_n^T − N μ_ML μ_ML^T.

The sufficient statistics are ∑_n x_n and ∑_n x_n x_n^T. Note that X^T X = ∑_n x_n x_n^T may not be full rank (e.g., if N < D), in which case Σ_ML is not invertible.
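The univariate Gaussian MLE formulas translate directly into code. A sketch (the function name and data values are illustrative):

```python
def gaussian_mle(xs):
    """MLE for a univariate Gaussian: sample mean, and the biased (1/N,
    not 1/(N-1)) sample variance around that mean."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

mu_hat, var_hat = gaussian_mle([-0.1, -5.2, 3.0, 2.3])  # mu_hat is ~0.0 here
```

The 1/N variance estimator follows from the derivative condition above; the familiar unbiased estimator divides by N − 1 instead, but that is not the MLE.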

Bayesian parameter estimation for a Gaussian

There are various reasons to pursue a Bayesian approach:
- We would like to update our estimates sequentially over time.
- We may have prior knowledge about the expected magnitude of the parameters.
- The MLE for Σ may not be full rank if we don't have enough data.

We will restrict our attention to conjugate priors, and consider various cases in order of increasing complexity: known σ, unknown μ; known μ, unknown σ; unknown μ and σ.

Bayesian estimation: unknown μ, known σ

Normal prior: P(μ) = (2πτ²)^{−1/2} exp{−(μ − μ₀)² / 2τ²}.

Joint probability:

  P(x, μ) = (2πσ²)^{−N/2} exp{−(1/2σ²) ∑_n (x_n − μ)²} × (2πτ²)^{−1/2} exp{−(μ − μ₀)² / 2τ²}.

Posterior:

  P(μ | x) = (2πσ̃²)^{−1/2} exp{−(μ − μ̃)² / 2σ̃²},

where

  μ̃ = [(N/σ²) x̄ + (1/τ²) μ₀] / [N/σ² + 1/τ²],   σ̃² = (N/σ² + 1/τ²)^{−1},

and x̄ = (1/N) ∑_n x_n is the sample mean.

Bayesian estimation: unknown μ, known σ (cont'd)

Writing σ₀² for the prior variance (τ² above):

  μ̃ = [(N/σ²) / (N/σ² + 1/σ₀²)] x̄ + [(1/σ₀²) / (N/σ² + 1/σ₀²)] μ₀.

The posterior mean is a convex combination of the prior and the MLE, with weights proportional to the relative noise levels. The precision of the posterior, 1/σ̃², is the precision of the prior, 1/σ₀², plus one contribution of data precision, 1/σ², for each observed data point:

  1/σ̃² = 1/σ₀² + N/σ².

[Slide figures: sequentially updating the mean (unknown μ, known σ); effect of a single data point.]

Uninformative (vague/flat) prior: σ₀² → ∞, so μ̃ → x̄.

Other scenarios

Known μ, unknown λ = 1/σ²: the conjugate prior for λ is a Gamma with shape a and rate (inverse scale) b; the conjugate prior for σ² is Inverse-Gamma.
Unknown μ and unknown σ²: the conjugate prior is Normal-Inverse-Gamma.
Semi-conjugate priors.
Multivariate case: the conjugate prior is Normal-Inverse-Wishart.
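The "precisions add, means combine convexly" update can be sketched in a few lines (function name and the example numbers are illustrative):

```python
def gaussian_mean_posterior(xs, sigma2, mu0, tau2):
    """Posterior N(mu_tilde, sigma_tilde^2) over the mean mu, given iid data xs
    with known noise variance sigma2 and a N(mu0, tau2) prior on mu:
    the posterior precision is the prior precision plus N data precisions,
    and the posterior mean is a precision-weighted average."""
    n = len(xs)
    xbar = sum(xs) / n
    post_precision = n / sigma2 + 1.0 / tau2          # 1 / sigma_tilde^2
    mu_tilde = ((n / sigma2) * xbar + mu0 / tau2) / post_precision
    return mu_tilde, 1.0 / post_precision

mu_t, var_t = gaussian_mean_posterior([1.0, 3.0], sigma2=1.0, mu0=0.0, tau2=1.0)
# posterior precision = 2/1 + 1 = 3; posterior mean = (2 * 2.0 + 0) / 3 = 4/3
```

Sending tau2 → ∞ makes the prior term vanish, recovering the flat-prior limit μ̃ → x̄ from the slide.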

Summary

Learning scenarios: data, objective function, frequentist and Bayesian estimation.

Learning single-node GMs (density estimation): typical discrete distributions, typical continuous distributions, conjugate priors.