(Fall 2006) Lecture 18: The EM Algorithm
Overview

- The EM algorithm in general form
- The EM algorithm for hidden Markov models (brute force)
- The EM algorithm for hidden Markov models (dynamic programming)

An Experiment / Some Intuition

I have three coins in my pocket:

- Coin 0 has probability λ of heads;
- Coin 1 has probability p_1 of heads;
- Coin 2 has probability p_2 of heads.

For each trial I do the following: first I toss Coin 0. If Coin 0 turns up heads, I toss Coin 1 three times; if Coin 0 turns up tails, I toss Coin 2 three times. I don't tell you whether Coin 0 came up heads or tails, or whether Coin 1 or Coin 2 was tossed three times, but I do tell you how many heads/tails are seen at each trial. You see the following sequence:

    ⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩

What would you estimate as the values for λ, p_1 and p_2?
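To make the generative story concrete, here is a minimal simulation sketch in Python; the parameter values are invented for illustration, and only the standard library is used:

```python
import random

def sample_trial(lam, p1, p2):
    # Toss Coin 0: heads selects Coin 1, tails selects Coin 2.
    y = "H" if random.random() < lam else "T"
    p = p1 if y == "H" else p2
    # Toss the selected coin three times; only x is shown to the observer.
    x = "".join("H" if random.random() < p else "T" for _ in range(3))
    return x, y

random.seed(0)
trials = [sample_trial(lam=0.4, p1=0.9, p2=0.1)[0] for _ in range(5)]
print(trials)  # five observed triples; the Coin 0 outcomes stay hidden
```

Estimating λ, p_1 and p_2 from such triples, without ever seeing the hidden coin, is exactly the problem the EM algorithm addresses below.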
Maximum Likelihood Estimation

- We have data points x_1, x_2, ..., x_n drawn from some (finite or countable) set X.
- We have a parameter vector Θ and a parameter space Ω.
- We have a distribution P(x | Θ) for any Θ ∈ Ω, such that Σ_{x ∈ X} P(x | Θ) = 1 and P(x | Θ) ≥ 0 for all x.
- We assume that our data points x_1, x_2, ..., x_n are drawn at random (independently, identically distributed) from a distribution P(x | Θ*) for some Θ* ∈ Ω.

Log-Likelihood

The likelihood is

    Likelihood(Θ) = P(x_1, x_2, ..., x_n | Θ) = ∏_{i=1}^{n} P(x_i | Θ)

(the product follows from the i.i.d. assumption). The log-likelihood is

    L(Θ) = log Likelihood(Θ) = Σ_{i=1}^{n} log P(x_i | Θ)

A First Example: Coin Tossing

X = {H, T}. Our data points x_1, x_2, ..., x_n are a sequence of heads and tails, e.g. HHTTHHHTHH. The parameter vector Θ is a single parameter, i.e., the probability of the coin coming up heads. The parameter space is Ω = [0, 1]. The distribution P(x | Θ) is defined as

    P(x | Θ) = Θ        if x = H
    P(x | Θ) = 1 − Θ    if x = T

Maximum likelihood estimation: given a sample x_1, x_2, ..., x_n, choose

    Θ_ML = argmax_{Θ ∈ Ω} L(Θ) = argmax_{Θ ∈ Ω} Σ_{i=1}^{n} log P(x_i | Θ)

For example, take the coin example: say x_1 ... x_n has Count(H) heads and (n − Count(H)) tails. Then

    L(Θ) = log ( Θ^{Count(H)} (1 − Θ)^{n − Count(H)} )
         = Count(H) log Θ + (n − Count(H)) log (1 − Θ)

Setting the derivative to zero gives

    Θ_ML = Count(H) / n
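A quick numeric check of the closed-form estimate against a grid search over Θ (a sketch using only the standard library):

```python
import math

x = "HHTTHHHTHH"            # the example sequence from the notes
n, heads = len(x), x.count("H")

def log_lik(theta):
    return heads * math.log(theta) + (n - heads) * math.log(1 - theta)

closed_form = heads / n                      # Theta_ML = Count(H) / n
grid = [i / 1000 for i in range(1, 1000)]    # avoid log(0) at the endpoints
print(closed_form, max(grid, key=log_lik))   # 0.7 0.7
```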
A Second Example: Probabilistic Context-Free Grammars

- X is the set of all parse trees generated by the underlying context-free grammar. Our sample is n trees T_1 ... T_n such that each T_i ∈ X.
- R is the set of rules in the context-free grammar, and N is the set of non-terminals in the grammar.
- Θ_r for r ∈ R is the parameter for rule r. Let R(α) ⊆ R be the rules of the form α → β for some β.
- The parameter space Ω is the set of Θ ∈ [0, 1]^{|R|} such that for all α ∈ N, Σ_{r ∈ R(α)} Θ_r = 1.

We have

    P(T | Θ) = ∏_{r ∈ R} Θ_r^{Count(T, r)}

where Count(T, r) is the number of times rule r is seen in the tree T. Hence

    log P(T | Θ) = Σ_{r ∈ R} Count(T, r) log Θ_r

Multinomial Distributions

- X is a finite set, e.g., X = {dog, cat, the, saw}.
- Our sample x_1, x_2, ..., x_n is drawn from X, e.g., ⟨x_1, x_2, x_3⟩ = ⟨dog, the, saw⟩.
- The parameter Θ is a vector in R^m where m = |X|, e.g., Θ_1 = P(dog), Θ_2 = P(cat), Θ_3 = P(the), Θ_4 = P(saw).
- The parameter space is Ω = {Θ : Σ_{i=1}^{m} Θ_i = 1 and Θ_i ≥ 0 for all i}.

If our sample is ⟨x_1, x_2, x_3⟩ = ⟨dog, the, saw⟩, then

    L(Θ) = log P(x_1 = dog) + log P(x_2 = the) + log P(x_3 = saw) = log Θ_1 + log Θ_3 + log Θ_4

Maximum Likelihood Estimation for PCFGs

We have log P(T | Θ) = Σ_{r ∈ R} Count(T, r) log Θ_r, where Count(T, r) is the number of times rule r is seen in the tree T, so

    L(Θ) = Σ_{i=1}^{n} log P(T_i | Θ) = Σ_{i=1}^{n} Σ_{r ∈ R} Count(T_i, r) log Θ_r

Solving Θ_ML = argmax_{Θ ∈ Ω} L(Θ) gives

    Θ_r = Σ_i Count(T_i, r) / Σ_i Σ_{s ∈ R(α)} Count(T_i, s)

where r is of the form α → β for some β.
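As a sketch of this estimator, with an invented two-tree toy treebank in which each tree is reduced to its rule occurrences:

```python
from collections import defaultdict

# Invented toy "treebank": each tree is represented only by its rule
# occurrences, written as (lhs, rhs) pairs.
trees = [
    [("S", "NP VP"), ("NP", "dog"), ("VP", "saw NP"), ("NP", "cat")],
    [("S", "NP VP"), ("NP", "cat"), ("VP", "barked")],
]

rule_count = defaultdict(int)   # Count(r), summed over trees
lhs_count = defaultdict(int)    # total count of all rules with this lhs
for tree in trees:
    for lhs, rhs in tree:
        rule_count[(lhs, rhs)] += 1
        lhs_count[lhs] += 1

# Theta_r = Count(r) / total count of rules with the same left-hand side
theta = {r: c / lhs_count[r[0]] for r, c in rule_count.items()}
print(theta[("NP", "dog")])  # 1/3: one of the three NP expansions
```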
Models with Hidden Variables

Now say we have two sets X and Y, and a joint distribution P(x, y | Θ).

- If we had fully observed data, (x_i, y_i) pairs, then L(Θ) = Σ_i log P(x_i, y_i | Θ).
- If we have partially observed data, x_i examples only, then

    L(Θ) = Σ_i log P(x_i | Θ) = Σ_i log Σ_{y ∈ Y} P(x_i, y | Θ)

The EM (Expectation-Maximization) algorithm is a method for finding

    Θ_ML = argmax_Θ Σ_i log Σ_{y ∈ Y} P(x_i, y | Θ)

For example, in the three-coins example:

- Y = {H, T}
- X = {HHH, TTT, HTT, THH, HHT, TTH, HTH, THT}
- Θ = {λ, p_1, p_2}

and

    P(x, y | Θ) = P(y | Θ) P(x | y, Θ)

where

    P(y | Θ) = λ if y = H;  1 − λ if y = T

and

    P(x | y, Θ) = p_1^h (1 − p_1)^t if y = H;  p_2^h (1 − p_2)^t if y = T

where h = number of heads in x and t = number of tails in x.

Various probabilities can be calculated; for example:

    P(x = THT, y = H | Θ) = λ p_1 (1 − p_1)²
    P(x = THT, y = T | Θ) = (1 − λ) p_2 (1 − p_2)²
    P(x = THT | Θ) = P(x = THT, y = H | Θ) + P(x = THT, y = T | Θ)
                   = λ p_1 (1 − p_1)² + (1 − λ) p_2 (1 − p_2)²
    P(y = H | x = THT, Θ) = P(x = THT, y = H | Θ) / P(x = THT | Θ)
                          = λ p_1 (1 − p_1)² / ( λ p_1 (1 − p_1)² + (1 − λ) p_2 (1 − p_2)² )
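These quantities are easy to compute directly; a sketch, evaluated at the parameter values λ = 0.3, p_1 = 0.3, p_2 = 0.6 that are used in the worked example below:

```python
def joint(x, y, lam, p1, p2):
    # P(x, y | Theta) = P(y | Theta) * P(x | y, Theta)
    h, t = x.count("H"), x.count("T")
    prior = lam if y == "H" else 1 - lam
    p = p1 if y == "H" else p2
    return prior * p**h * (1 - p) ** t

def posterior_h(x, lam, p1, p2):
    # P(y = H | x, Theta), by the ratio above
    jh = joint(x, "H", lam, p1, p2)
    return jh / (jh + joint(x, "T", lam, p1, p2))

print(posterior_h("THT", lam=0.3, p1=0.3, p2=0.6))  # ≈ 0.396
```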
Fully observed data might look like:

    (⟨HHH⟩, H), (⟨TTT⟩, T), (⟨HHH⟩, H), (⟨TTT⟩, T), (⟨HHH⟩, H)

In this case the maximum likelihood estimates are simply the empirical frequencies:

    λ = 3/5,  p_1 = 9/9 = 1,  p_2 = 0/6 = 0

Partially observed data might look like:

    ⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩

How do we find the maximum likelihood parameters?
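For the fully observed case the estimates really are just frequencies, as a short sketch confirms:

```python
data = [("HHH", "H"), ("TTT", "T"), ("HHH", "H"), ("TTT", "T"), ("HHH", "H")]

# lambda: fraction of trials on which Coin 0 came up heads (y = H)
lam = sum(1 for _, y in data if y == "H") / len(data)

def heads_fraction(coin):
    # Pool all tosses of the given coin and take the fraction of heads.
    tosses = "".join(x for x, y in data if y == coin)
    return tosses.count("H") / len(tosses)

p1, p2 = heads_fraction("H"), heads_fraction("T")
print(lam, p1, p2)  # 0.6 1.0 0.0
```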
If the current parameters are λ, p_1, p_2, the hidden variable for each example can be filled in with its posterior probability:

    P(y = H | x = ⟨HHH⟩) = P(⟨HHH⟩, H) / ( P(⟨HHH⟩, H) + P(⟨HHH⟩, T) )
                         = λ p_1³ / ( λ p_1³ + (1 − λ) p_2³ )

    P(y = H | x = ⟨TTT⟩) = P(⟨TTT⟩, H) / ( P(⟨TTT⟩, H) + P(⟨TTT⟩, T) )
                         = λ (1 − p_1)³ / ( λ (1 − p_1)³ + (1 − λ) (1 − p_2)³ )

If λ = 0.3, p_1 = 0.3, p_2 = 0.6:

    P(y = H | x = ⟨HHH⟩) = 0.3 × 0.3³ / (0.3 × 0.3³ + 0.7 × 0.6³) ≈ 0.0508
    P(y = H | x = ⟨TTT⟩) = 0.3 × 0.7³ / (0.3 × 0.7³ + 0.7 × 0.4³) ≈ 0.6967

After filling in the hidden variables, each example contributes two weighted pairs, so the partially observed data might look like:

    (⟨HHH⟩, H)  with weight P(y = H | ⟨HHH⟩)
    (⟨HHH⟩, T)  with weight P(y = T | ⟨HHH⟩)
    (⟨TTT⟩, H)  with weight P(y = H | ⟨TTT⟩)
    (⟨TTT⟩, T)  with weight P(y = T | ⟨TTT⟩)
    ... (one such pair of rows for each of the five trials)

The new estimates are then maximum likelihood estimates on this weighted data:

    λ = (1/5) Σ_{i=1}^{5} P(y = H | x_i)
    p_1 = Σ_i P(y = H | x_i) h_i / Σ_i P(y = H | x_i) (h_i + t_i)
    p_2 = Σ_i P(y = T | x_i) h_i / Σ_i P(y = T | x_i) (h_i + t_i)

where h_i and t_i are the numbers of heads and tails in x_i.
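One full EM update for this model, combining the E-step (posterior weights) and the M-step (weighted maximum likelihood), as a sketch:

```python
def em_step(data, lam, p1, p2):
    # One EM iteration for the three-coins model.
    sum_w = h1 = n1 = h2 = n2 = 0.0
    for x in data:
        h, t = x.count("H"), x.count("T")
        jh = lam * p1**h * (1 - p1) ** t        # P(x, y=H | Theta)
        jt = (1 - lam) * p2**h * (1 - p2) ** t  # P(x, y=T | Theta)
        w = jh / (jh + jt)                      # E-step: P(y=H | x, Theta)
        sum_w += w
        h1 += w * h
        n1 += w * (h + t)
        h2 += (1 - w) * h
        n2 += (1 - w) * (h + t)
    # M-step: maximum likelihood estimates on the weighted, filled-in data.
    return sum_w / len(data), h1 / n1, h2 / n2

data = ["HHH", "TTT", "HHH", "TTT", "HHH"]
print(em_step(data, 0.3, 0.3, 0.6))  # ≈ (0.309, 0.099, 0.824)
```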
Summary of the algorithm on this example:

1. Begin with parameters λ = 0.3, p_1 = 0.3, p_2 = 0.6.
2. Fill in the hidden variables, using the posteriors P(y = H | x = ⟨HHH⟩) and P(y = H | x = ⟨TTT⟩).
3. Re-estimate the parameters λ, p_1, p_2 from the weighted data, and repeat.

The notes give several runs of the algorithm, each as a table of λ, p_1, p_2 and the posterior probabilities P(y_i = H | x_i) at each iteration. (The numeric entries of these tables are summarized by their captions, reproduced below.)

- The coin example for ⟨x_1, ..., x_5⟩ = ⟨HHH, TTT, HHH, TTT, HHH⟩: λ is now 0.4, indicating that the coin-tosser has probability 0.4 of selecting the tail-based coin.

- The coin example for y = ⟨HHH, TTT, HHH, TTT⟩: the solution that EM reaches is intuitively correct. The coin-tosser has two coins, one which always shows up heads, the other which always shows tails, and is picking between them with equal probability (λ = 0.5). The posterior probabilities show that we are certain that coin 1 (the tail-based coin) generated y_2 and y_4, whereas coin 2 generated y_1 and y_3.

- The coin example for y = ⟨HHT, TTT, HHH, TTT⟩: EM selects a tails-only coin, and a coin which is heavily heads-based. It is certain that y_1 and y_3 were generated by coin 2, as they contain heads; y_2 and y_4 could have been generated by either coin, but coin 1 is far more likely.
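Iterating the update reproduces the behaviour in the first caption; a driver loop reusing the em_step sketch above:

```python
data = ["HHH", "TTT", "HHH", "TTT", "HHH"]
params = (0.3, 0.3, 0.6)
for _ in range(25):
    params = em_step(data, *params)
print(params)  # ≈ (0.4, 0.0, 1.0): a tails-only coin picked 40% of the time
```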
- The coin example for y = ⟨HHH, TTT, HHH, TTT⟩, with p_1 and p_2 initialised to the same value: EM is stuck at a saddle point.

- The same example, with p_1 and p_2 initialised a small amount away from the saddle point p_1 = p_2: the algorithm moves away from the saddle point and eventually reaches the global maximum.

The EM Algorithm

- Θ^t is the parameter vector at the t-th iteration.
- Choose Θ^0 (at random, or using various heuristics).
- The iterative procedure is defined as Θ^t = argmax_Θ Q(Θ, Θ^{t−1}), where

    Q(Θ, Θ^{t−1}) = Σ_i Σ_{y ∈ Y} P(y | x_i, Θ^{t−1}) log P(x_i, y | Θ)
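The Q function for the three-coins model can be written down directly, and the closed-form update from the em_step sketch can be probed against it numerically (a sketch; random probing, not a proof):

```python
import math
import random

def q_value(new, old, data):
    # Q(Theta, Theta^{t-1}) = sum_i sum_y P(y | x_i, old) log P(x_i, y | new)
    lam_n, p1_n, p2_n = new
    lam_o, p1_o, p2_o = old
    total = 0.0
    for x in data:
        h, t = x.count("H"), x.count("T")
        jh = lam_o * p1_o**h * (1 - p1_o) ** t
        jt = (1 - lam_o) * p2_o**h * (1 - p2_o) ** t
        w = jh / (jh + jt)  # P(y = H | x, old)
        total += w * math.log(lam_n * p1_n**h * (1 - p1_n) ** t)
        total += (1 - w) * math.log((1 - lam_n) * p2_n**h * (1 - p2_n) ** t)
    return total

data = ["HHH", "TTT", "HHH", "TTT", "HHH"]
old = (0.3, 0.3, 0.6)
best = q_value(em_step(data, *old), old, data)  # closed-form M-step
probes = [tuple(random.uniform(0.01, 0.99) for _ in range(3))
          for _ in range(1000)]
assert all(q_value(p, old, data) <= best for p in probes)
```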
Key points:

- Intuition: fill in the hidden variables y according to the posterior P(y | x, Θ).
- EM is guaranteed to converge to a local maximum, or saddle point, of the likelihood function.
- In general, if argmax_Θ Σ_i log P(x_i, y_i | Θ) has a simple (analytic) solution, then argmax_Θ Σ_i Σ_y P(y | x_i, Θ) log P(x_i, y | Θ) also has a simple (analytic) solution.

The Structure of Hidden Markov Models

- We have N states, 1 ... N. Without loss of generality, take N to be the final or stop state.
- We have an alphabet K. For example, K = {a, b}.
- Parameter π_i for i = 1 ... N is the probability of starting in state i.
- Parameter a_{i,j} for i = 1 ... (N − 1) and j = 1 ... N is the probability of state j following state i.
- Parameter b_i(o) for i = 1 ... (N − 1) and o ∈ K is the probability of state i emitting symbol o.

An Example

Take N = 3 states, {1, 2, 3}, with final state 3, and alphabet K = {the, dog}. The distribution over the initial state is π_1 = 1.0, π_2 = 0, π_3 = 0. The parameters a_{i,j} (for i = 1, 2 and j = 1, 2, 3) and b_i(o) (for i = 1, 2 and o ∈ {the, dog}) are given as two small tables of probabilities (their numeric entries are not preserved in this transcription).
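Since the numeric tables for a_{i,j} and b_i(o) are not preserved, the following sketch uses invented values of the right shape (π is as stated above):

```python
# States 1 and 2 emit; state 3 is the stop state. The values in a and b
# are hypothetical: the original example tables were not preserved.
pi = {1: 1.0, 2: 0.0, 3: 0.0}
a = {1: {1: 0.2, 2: 0.7, 3: 0.1},   # a[i][j] = P(next state j | state i)
     2: {1: 0.4, 2: 0.3, 3: 0.3}}
b = {1: {"the": 0.9, "dog": 0.1},   # b[i][o] = P(emit o | state i)
     2: {"the": 0.2, "dog": 0.8}}
# Each transition and emission row must sum to one.
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in a.values())
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in b.values())
```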
A Generative Process

- Pick the start state s_1 to be state i, for i = 1 ... N, with probability π_i.
- Set t = 1, and repeat while the current state s_t is not the stop state N:
  - Emit a symbol o_t ∈ K with probability b_{s_t}(o_t);
  - Pick the next state s_{t+1} = j with probability a_{s_t, j}, and set t = t + 1.

Probabilities Over Sequences

- An output sequence is a sequence of observations o_1 ... o_T where each o_t ∈ K, e.g. "the dog the dog dog the".
- A state sequence is a sequence of states s_1 ... s_T where each s_t ∈ {1 ... N}, e.g. "1 2 1 2 2 1".
- The HMM defines a probability for each state/output sequence pair. For example, the/1 dog/2 the/1 dog/2 the/2 dog/1 has probability

    π_1 b_1(the) a_{1,2} b_2(dog) a_{2,1} b_1(the) a_{1,2} b_2(dog) a_{2,2} b_2(the) a_{2,1} b_1(dog) a_{1,3}

Formally:

    P(s_1 ... s_T, o_1 ... o_T) = π_{s_1} ( ∏_{t=2}^{T} a_{s_{t−1}, s_t} ) ( ∏_{t=1}^{T} b_{s_t}(o_t) ) a_{s_T, N}

A Hidden Variable Problem

We have an HMM with N = 3 and K = {e, f, g, h}. We see a small set of output sequences in training data; the sequence boundaries are lost in this transcription, leaving the symbol stream

    e e f f g h h g

How would you choose the parameter values for π_i, a_{i,j}, and b_i(o)?

Another Hidden Variable Problem

Likewise, with the training symbol stream

    e g h e h f h g f g g e h

how would you choose the parameter values for π_i, a_{i,j}, and b_i(o)?
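A sketch of the joint probability computation and of the generative process, reusing the pi, a, b dictionaries from the previous block (so this fragment is not standalone):

```python
import random

def sequence_pair_prob(states, outputs):
    # pi[s_1] * prod_t b[s_t](o_t) * prod_t a[s_t][s_{t+1}], ending in the stop state
    p = pi[states[0]]
    for t in range(len(states)):
        p *= b[states[t]][outputs[t]]
        nxt = states[t + 1] if t + 1 < len(states) else 3  # 3 = stop state
        p *= a[states[t]][nxt]
    return p

# the/1 dog/2 the/1 dog/2 the/2 dog/1, as in the example above:
print(sequence_pair_prob([1, 2, 1, 2, 2, 1],
                         ["the", "dog", "the", "dog", "the", "dog"]))

def generate():
    # Sample one (state sequence, output sequence) pair from the HMM.
    s = random.choices(list(pi), weights=pi.values())[0]
    states, outputs = [], []
    while s != 3:
        states.append(s)
        outputs.append(random.choices(list(b[s]), weights=b[s].values())[0])
        s = random.choices(list(a[s]), weights=a[s].values())[0]
    return states, outputs
```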