Probabilistic Graphical Models
- Preston Hardy
- 5 years ago
Transcription
1 School of Computer Science
Probabilistic Graphical Models
Approximate Inference: Markov Chain Monte Carlo
Eric Xing (CMU), Lecture 17, March 19, 2014
2 Recap of Monte Carlo
Monte Carlo methods are algorithms that:
- Generate samples from a given probability distribution $p(x)$
- Estimate expectations of functions $E_p[f(x)]$ under a distribution $p(x)$
Why is this useful?
- We can use samples of $p(x)$ to approximate $p(x)$ itself. This allows us to do graphical model inference when we can't compute $p(x)$.
- Expectations $E_p[f(x)]$ reveal interesting properties of $p(x)$, e.g. its means and variances.
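As a minimal sketch of the second point, the snippet below estimates two expectations by averaging over samples. The choice of a standard normal for $p(x)$ and the helper names `mc_expectation` and `draw_standard_normal` are illustrative assumptions, not part of the lecture.

```python
import random

def mc_expectation(f, sampler, n, rng):
    """Estimate E_p[f(x)] by averaging f over n independent draws from p."""
    return sum(f(sampler(rng)) for _ in range(n)) / n

def draw_standard_normal(rng):
    """Illustrative p(x): a standard normal, for which E[x] = 0 and E[x^2] = 1."""
    return rng.gauss(0.0, 1.0)

rng = random.Random(0)
mean_estimate = mc_expectation(lambda x: x, draw_standard_normal, 100_000, rng)
second_moment_estimate = mc_expectation(lambda x: x * x, draw_standard_normal, 100_000, rng)
print(mean_estimate, second_moment_estimate)
```

The estimates should land close to 0 and 1 respectively; the error shrinks roughly as one over the square root of the number of samples.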
3 Limitations of Monte Carlo
- Direct sampling: it is hard to get rare events in high-dimensional spaces, and it is infeasible for MRFs unless we know the normalizer $Z$.
- Rejection sampling, importance sampling: these do not work well if the proposal $Q(x)$ is very different from $P(x)$. Yet constructing a $Q(x)$ similar to $P(x)$ can be difficult. Making a good proposal usually requires knowledge of the analytic form of $P(x)$, but if we had that, we wouldn't even need to sample!
- Intuition: instead of a fixed proposal $Q(x)$, what if we could use an adaptive proposal?
4 Markov Chain Monte Carlo
- MCMC algorithms feature adaptive proposals. Instead of $Q(x')$, they use $Q(x' | x)$, where $x'$ is the new state being sampled and $x$ is the previous sample.
- As $x$ changes, $Q(x' | x)$ can also change (as a function of $x'$).
[Figure: importance sampling with a bad proposal $Q(x)$, next to MCMC with an adaptive proposal $Q(x' | x)$ that moves with the samples $x^1, x^2, x^3$.]
5 Metropolis-Hastings
Let's see how MCMC works in practice. Later, we'll look at the theoretical aspects.
Metropolis-Hastings algorithm:
- Draws a sample $x'$ from $Q(x' | x)$, where $x$ is the previous sample.
- The new sample $x'$ is accepted or rejected with some probability $A(x' | x)$. This acceptance probability is
  $A(x' | x) = \min\left(1, \frac{P(x')\,Q(x | x')}{P(x)\,Q(x' | x)}\right)$
- $A(x' | x)$ is like a ratio of importance sampling weights: $P(x')/Q(x' | x)$ is the importance weight for $x'$, and $P(x)/Q(x | x')$ is the importance weight for $x$. We divide the importance weight for $x'$ by that of $x$.
- Notice that we only need to compute $P(x')/P(x)$ rather than $P(x')$ or $P(x)$ separately, so the normalizer cancels.
- $A(x' | x)$ ensures that, after sufficiently many draws, our samples will come from the true distribution $P(x)$. We shall learn why later in this lecture.
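6 The MH Algorithm
1. Initialize the starting state $x^{(0)}$, set $t = 0$.
2. Burn-in: while samples have "not converged":
   - $x = x^{(t)}$
   - $t = t + 1$
   - sample $x^* \sim Q(x^* | x)$ // draw from proposal
   - sample $u \sim \mathrm{Uniform}(0, 1)$ // draw acceptance threshold
   - if $u < A(x^* | x) = \min\left(1, \frac{P(x^*)\,Q(x | x^*)}{P(x)\,Q(x^* | x)}\right)$: $x^{(t)} = x^*$ // transition
   - else: $x^{(t)} = x$ // stay in current state
3. Take samples from $P(x)$: reset $t = 0$; for $t = 1:N$, $x^{(t+1)} \leftarrow$ Draw sample($x^{(t)}$), where the function Draw sample($x^{(t)}$) is the loop body of step 2.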
7-14 The MH Algorithm (worked example)
$A(x' | x) = \min\left(1, \frac{P(x')\,Q(x | x')}{P(x)\,Q(x' | x)}\right)$
Example: let $Q(x' | x)$ be a Gaussian centered on $x$. We're trying to sample from a bimodal distribution $P(x)$.
- Initialize $x^{(0)}$.
- Draw, accept $x^1$.
- Draw, accept $x^2$.
- Draw $x'$ but reject; set $x^3 = x^2$. We reject because $P(x')/Q(x' | x^2) < 1$ and $P(x^2)/Q(x^2 | x') > 1$, hence $A(x' | x^2)$ is close to zero!
- Draw, accept $x^4$.
- Draw, accept $x^5$.
The adaptive proposal $Q(x' | x)$ allows us to sample both modes of $P(x)$!
[Figure: the bimodal $P(x)$ with the proposal $Q(x' | x^{(t)})$ re-centered on the current sample at each step.]
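The example can be sketched in code as follows. This is a minimal random-walk sampler under stated assumptions: the bimodal target (two unit-variance bumps at -2 and +2), the step size, the fixed burn-in length and the names `target` and `metropolis_hastings` are all illustrative choices rather than anything specified on the slides.

```python
import math
import random

def target(x):
    """Unnormalized bimodal P(x): two unit-variance bumps at -2 and +2 (assumed)."""
    return math.exp(-0.5 * (x + 2.0) ** 2) + math.exp(-0.5 * (x - 2.0) ** 2)

def metropolis_hastings(n_samples, burn_in, step, x0, rng):
    """Random-walk MH with proposal Q(x'|x) = Normal(x, step^2)."""
    x = x0
    samples = []
    n_accepted = 0
    for t in range(burn_in + n_samples):
        x_new = rng.gauss(x, step)   # draw from proposal
        u = rng.random()             # draw acceptance threshold
        # The Gaussian proposal is symmetric, so Q(x|x') / Q(x'|x) = 1 and
        # the acceptance probability reduces to min(1, P(x') / P(x)).
        a = min(1.0, target(x_new) / target(x))
        if u < a:
            x = x_new                # transition
            if t >= burn_in:
                n_accepted += 1
        if t >= burn_in:
            samples.append(x)        # a rejection repeats the current state
    return samples, n_accepted / n_samples

samples, acceptance_rate = metropolis_hastings(20_000, 1_000, 2.0, 0.0, random.Random(1))
fraction_right_mode = sum(s > 0 for s in samples) / len(samples)
print(acceptance_rate, fraction_right_mode)
```

Because only the ratio `target(x_new) / target(x)` is used, the target never has to be normalized. With this step size the chain should spend roughly half its time in each mode.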
15 Theoretical aspects of MCMC
- The MH algorithm has a "burn-in" period. Why do we throw away samples from burn-in?
- Why are the MH samples guaranteed to be from $P(x)$? The proposal $Q(x' | x)$ keeps changing with the value of $x$; how do we know the samples will eventually come from $P(x)$?
- What is the connection between Markov Chains and MCMC?
16 Markov Chains
- A Markov Chain is a sequence of random variables $x_1, x_2, \ldots, x_n$ with the Markov Property:
  $P(x_n = x \mid x_1, \ldots, x_{n-1}) = P(x_n = x \mid x_{n-1})$
- $P(x_n = x \mid x_{n-1})$ is known as the transition kernel. The next state depends only on the preceding state (recall HMMs!).
- Note: the r.v.s $x_i$ can be vectors. We define $x^{(t)}$ to be the $t$-th sample of all variables in a graphical model, so $x^{(t)}$ represents the entire state of the graphical model at time $t$.
- We study homogeneous Markov Chains, in which the transition kernel $P(x^{(t)} = x' \mid x^{(t-1)} = x)$ is fixed with time. To emphasize this, we will call the kernel $T(x' | x)$, where $x$ is the previous state and $x'$ is the next state.
17 MC Concepts
To understand MCs, we need to define a few concepts:
- Probability distributions over states: $\pi^{(t)}(x)$ is a distribution over the state $x$ of the system at time $t$. When dealing with MCs, we don't think of the system as being in one state, but as having a distribution over states. For graphical models, remember that $x$ represents all variables.
- Transitions: recall that states transition from $x^{(t)}$ to $x^{(t+1)}$ according to the transition kernel $T(x' | x)$. We can also transition entire distributions:
  $\pi^{(t+1)}(x') = \sum_x \pi^{(t)}(x)\,T(x' | x)$
  At time $t$, state $x$ has probability mass $\pi^{(t)}(x)$. The transition probability redistributes this mass to other states $x'$.
- Stationary distributions: $\pi(x)$ is stationary if it does not change under the transition kernel:
  $\pi(x') = \sum_x \pi(x)\,T(x' | x)$ for all $x'$
18 MC Concepts
Stationary distributions are of great importance in MCMC. To understand them, we need to define some notions:
- Irreducible: an MC is irreducible if you can get from any state $x$ to any other state $x'$ with probability > 0 in a finite number of steps, i.e. there are no unreachable parts of the state space.
- Aperiodic: an MC is aperiodic if you can return to any state $x$ at any time. Periodic MCs have states that need at least 2 time steps to return to (cycles).
- Ergodic (or regular): an MC is ergodic if it is irreducible and aperiodic.
Ergodicity is important: it implies you can reach the stationary distribution $\pi_{st}(x)$ no matter the initial distribution $\pi^{(0)}(x)$. All good MCMC algorithms must satisfy ergodicity, so that you can't initialize in a way that will never converge.
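To make the convergence claim concrete, here is a small sketch that pushes two different initial distributions through the same kernel. The 3-state kernel `T` is a hypothetical example (all entries positive, hence ergodic), and `step` and `run` are illustrative helper names.

```python
def step(pi, T):
    """One transition of a whole distribution: pi'(x') = sum_x pi(x) T(x'|x).

    T[x][x_next] stores T(x_next | x), so every row of T sums to 1.
    """
    n = len(pi)
    return [sum(pi[x] * T[x][x_next] for x in range(n)) for x_next in range(n)]

def run(pi, T, n_steps):
    """Apply the kernel n_steps times."""
    for _ in range(n_steps):
        pi = step(pi, T)
    return pi

# Hypothetical 3-state kernel; every entry is positive, so the chain is
# irreducible and aperiodic.
T = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

from_state_0 = run([1.0, 0.0, 0.0], T, 200)
from_state_2 = run([0.0, 0.0, 1.0], T, 200)
print(from_state_0)
print(from_state_2)
```

Both runs should print the same vector up to rounding, and applying `step` once more should leave it unchanged, which is the stationarity condition from the previous slide.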
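19 MC Concepts
Reversible (detailed balance): an MC is reversible if there exists a distribution $\pi(x)$ such that the detailed balance condition is satisfied:
$\pi(x')\,T(x | x') = \pi(x)\,T(x' | x)$
The probabilities of $x' \to x$ and $x \to x'$ can be different, but the joint of $x$ and $x'$ remains the same no matter which direction you go.
Reversible MCs always have a stationary distribution! Proof:
$\pi(x')\,T(x | x') = \pi(x)\,T(x' | x)$
$\sum_x \pi(x')\,T(x | x') = \sum_x \pi(x)\,T(x' | x)$
$\pi(x') \sum_x T(x | x') = \sum_x \pi(x)\,T(x' | x)$
$\pi(x') = \sum_x \pi(x)\,T(x' | x)$
The last line is the definition of a stationary distribution!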
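20 Why does Metropolis-Hastings work?
Recall that we draw a sample $x'$ according to $Q(x' | x)$, and then accept/reject according to $A(x' | x)$. In other words, the transition kernel (for $x' \neq x$) is
$T(x' | x) = Q(x' | x)\,A(x' | x)$
We can prove that MH satisfies detailed balance. Recall that
$A(x' | x) = \min\left(1, \frac{P(x')\,Q(x | x')}{P(x)\,Q(x' | x)}\right)$
Notice this implies the following:
if $A(x' | x) < 1$ then $\frac{P(x)\,Q(x' | x)}{P(x')\,Q(x | x')} > 1$ and thus $A(x | x') = 1$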
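21 Why does Metropolis-Hastings work?
Now suppose $A(x' | x) < 1$ and $A(x | x') = 1$. We have
$A(x' | x) = \frac{P(x')\,Q(x | x')}{P(x)\,Q(x' | x)}$
$P(x)\,Q(x' | x)\,A(x' | x) = P(x')\,Q(x | x')$
$P(x)\,Q(x' | x)\,A(x' | x) = P(x')\,Q(x | x')\,A(x | x')$
$P(x)\,T(x' | x) = P(x')\,T(x | x')$
The last line is exactly the detailed balance condition. In other words, the MH algorithm leads to a stationary distribution $P(x)$. Recall we defined $P(x)$ to be the true distribution of $x$. Thus, the MH algorithm eventually converges to the true distribution!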
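The argument can also be checked numerically on a small discrete state space. The sketch below builds the MH kernel explicitly and is only an illustration: the target `p`, the asymmetric proposal `Q` and the function name `mh_kernel` are assumptions chosen for the example.

```python
def mh_kernel(p, Q):
    """Build T(x'|x) = Q(x'|x) A(x'|x) for x' != x; rejected mass stays at x.

    p[x] is the target probability of state x; Q[x][x_next] stores Q(x_next | x).
    """
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for x_next in range(n):
            if x_next != x:
                ratio = (p[x_next] * Q[x_next][x]) / (p[x] * Q[x][x_next])
                T[x][x_next] = Q[x][x_next] * min(1.0, ratio)
        # Probability of staying put: proposing x itself, or being rejected.
        T[x][x] = 1.0 - sum(T[x])
    return T

# Hypothetical target and (deliberately asymmetric) proposal.
p = [0.1, 0.2, 0.7]
Q = [[0.2, 0.5, 0.3],
     [0.4, 0.2, 0.4],
     [0.6, 0.3, 0.1]]
T = mh_kernel(p, Q)

n = len(p)
max_balance_gap = max(abs(p[x] * T[x][y] - p[y] * T[y][x])
                      for x in range(n) for y in range(n))
max_stationarity_gap = max(abs(sum(p[x] * T[x][y] for x in range(n)) - p[y])
                           for y in range(n))
print(max_balance_gap, max_stationarity_gap)
```

Both gaps should be zero up to floating-point rounding, even though `Q` itself has nothing to do with `p`.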
22 Caveats
- Although MH eventually converges to the true distribution $P(x)$, we have no guarantees as to when this will occur.
- The burn-in period represents the un-converged part of the Markov Chain; that's why we throw those samples away!
- Knowing when to halt burn-in is an art. We will look at some techniques later in this lecture.
23 Gibbs Sampling
- Gibbs Sampling is an MCMC algorithm that samples each random variable of a graphical model, one at a time. GS is a special case of the MH algorithm.
- GS algorithms:
  - Are fairly easy to derive for many graphical models (e.g. mixture models, Latent Dirichlet allocation)
  - Have reasonable computation and memory requirements, because they sample one r.v. at a time
  - Can be Rao-Blackwellized (integrate out some r.v.s) to decrease the sampling variance
24 Gibbs Sampling
The GS algorithm:
1. Suppose the graphical model contains variables $x_1, \ldots, x_n$.
2. Initialize starting values for $x_1, \ldots, x_n$.
3. Do until convergence:
   1. Pick an ordering of the $n$ variables (can be fixed or random).
   2. For each variable $x_i$ in order:
      1. Sample $x \sim P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$, i.e. the conditional distribution of $x_i$ given the current values of all other variables.
      2. Update $x_i \leftarrow x$.
When we update $x_i$, we immediately use its new value for sampling other variables $x_j$.
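A minimal sketch of these steps on a two-variable model is given below. The model is an assumption made for illustration: a zero-mean, unit-variance bivariate Gaussian with correlation `rho`, whose full conditionals are the standard result $x_1 \mid x_2 \sim \mathcal{N}(\rho x_2, 1 - \rho^2)$ and symmetrically for $x_2$. The fixed burn-in length stands in for "until convergence".

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in, rng):
    """Gibbs sampling for a bivariate normal with unit variances and correlation rho."""
    x1, x2 = 0.0, 0.0                      # initialize starting values
    sd = math.sqrt(1.0 - rho * rho)        # std. dev. of each full conditional
    samples = []
    for t in range(burn_in + n_samples):
        x1 = rng.gauss(rho * x2, sd)       # sample x1 | x2
        x2 = rng.gauss(rho * x1, sd)       # sample x2 | x1, using the new x1 at once
        if t >= burn_in:
            samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal(0.8, 20_000, 1_000, random.Random(4))
estimated_correlation = sum(a * b for a, b in samples) / len(samples)
print(estimated_correlation)
```

Since both variables have zero mean and unit variance, the average of the product `x1 * x2` estimates the correlation and should come out near 0.8.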
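25 Markov Blankets
- The conditional $P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ looks intimidating, but recall Markov Blankets: let $MB(x_i)$ be the Markov Blanket of $x_i$, then
  $P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) = P(x_i \mid MB(x_i))$
- For a BN, the Markov Blanket of $x_i$ is the set containing its parents, children, and co-parents.
- For an MRF, the Markov Blanket of $x_i$ is its immediate neighbors.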
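For a Bayesian network, the blanket can be read off the parent sets. The sketch below assumes the usual structure of the five-variable alarm network (B and E are parents of A; A is the parent of J and M); the function name `markov_blanket` and the dictionary layout are illustrative.

```python
def markov_blanket(node, parents):
    """Markov blanket of `node` in a Bayesian network.

    `parents` maps every node to the list of its parents.
    """
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | co_parents) - {node}

# Assumed structure of the alarm network: B -> A <- E, A -> J, A -> M.
alarm_parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
for variable in alarm_parents:
    print(variable, sorted(markov_blanket(variable, alarm_parents)))
```

Under that structure the blanket of B is {A, E}, which is why sampling B only requires the conditional $P(B \mid A, E)$.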
26-33 Gibbs Sampling: An Example
Consider the alarm network with variables B, E, A, J, M. Assume we sample variables in the order B, E, A, J, M. Initialize all variables at $t = 0$ to False.
- Sampling $P(B \mid A, E)$ at $t = 1$: using Bayes' rule, $P(B \mid A, E) \propto P(A \mid B, E)\,P(B)$. Here $(A, E) = (F, F)$, so we compute the following, and sample $B = F$:
  $P(B = T \mid A = F, E = F) \propto P(A = F \mid B = T, E = F)\,P(B = T)$
  $P(B = F \mid A = F, E = F) \propto P(A = F \mid B = F, E = F)\,P(B = F)$
- Sampling $P(E \mid A, B)$: using Bayes' rule, $P(E \mid A, B) \propto P(A \mid B, E)\,P(E)$. Here $(A, B) = (F, F)$, so we compute the following, and sample $E = T$:
  $P(E = T \mid A = F, B = F) \propto P(A = F \mid B = F, E = T)\,P(E = T)$
  $P(E = F \mid A = F, B = F) \propto P(A = F \mid B = F, E = F)\,P(E = F)$
- Sampling $P(A \mid B, E, J, M)$: using Bayes' rule, $P(A \mid B, E, J, M) \propto P(J \mid A)\,P(M \mid A)\,P(A \mid B, E)$. Here $(B, E, J, M) = (F, T, F, F)$, so we compute the following, and sample $A = F$:
  $P(A = T \mid \cdot) \propto P(J = F \mid A = T)\,P(M = F \mid A = T)\,P(A = T \mid B = F, E = T)$
  $P(A = F \mid \cdot) \propto P(J = F \mid A = F)\,P(M = F \mid A = F)\,P(A = F \mid B = F, E = T)$
- Sampling $P(J \mid A)$: no need to apply Bayes' rule. Here $A = F$, so we compute the following, and sample $J = T$:
  $P(J = T \mid A = F) = 0.05$, $P(J = F \mid A = F) = 0.95$
- Sampling $P(M \mid A)$: no need to apply Bayes' rule. Here $A = F$, so we compute the following, and sample $M = F$:
  $P(M = T \mid A = F) = 0.01$, $P(M = F \mid A = F) = 0.99$
Now $t = 2$, and we repeat the procedure to sample new values of B, E, A, J, M. And similarly for $t = 3, 4$, etc.

t   B  E  A  J  M
0   F  F  F  F  F
1   F  T  F  T  F
2   F  T  T  T  T
3   T  F  T  F  T
4   T  F  T  F  F
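34 Topic Models: Collapsed Gibbs (Tom Griffiths & Mark Steyvers)
Collapsed Gibbs sampling:
- Popular inference algorithm for topic models
- Integrate out topic vectors $\pi$ and topics $B$
- Only need to sample word-topic assignments $z$
Algorithm: for all variables $z = z_1, z_2, \ldots, z_n$, draw $z_i^{(t+1)}$ from $P(z_i \mid z_{-i}, w)$, where $z_{-i} = z_1^{(t+1)}, z_2^{(t+1)}, \ldots, z_{i-1}^{(t+1)}, z_{i+1}^{(t)}, \ldots, z_n^{(t)}$.
[Figure: plate diagram of the topic model with nodes $\alpha$, $\pi$, $z$, $w$, $B$, $\beta$ and plates $N$, $M$, $K$.]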
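35-36 Collapsed Gibbs sampling
What is $P(z_i \mid z_{-i}, w)$? It is a product of two Dirichlet-Multinomial conditional distributions, a word-topic term and a doc-topic term:
$P(z_i = j \mid z_{-i}, w) \propto \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + K\alpha}$
where $W$ is the vocabulary size, $K$ is the number of topics, and the counts are:
- $n^{(w_i)}_{-i,j}$: # word positions $a$, excluding $w_i$, such that $w_a = w_i$ and $z_a = j$
- $n^{(d_i)}_{-i,j}$: # word positions $a$ in the current document $d_i$, excluding $w_i$, such that $z_a = j$
- $n^{(\cdot)}_{-i,j}$: # word positions $a$, excluding $w_i$, such that $z_a = j$
- $n^{(d_i)}_{-i,\cdot}$: # word positions $a$ in the current document $d_i$, excluding $w_i$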
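Given the four counts listed above, the conditional over topics is a few lines of arithmetic. The sketch below assumes the standard Griffiths and Steyvers form with symmetric hyperparameters, i.e. (word-topic count + beta) / (topic total + W * beta) times (doc-topic count + alpha) / (doc length + K * alpha). The function name `topic_conditional`, its argument names and the toy counts are hypothetical.

```python
def topic_conditional(n_word_topic, n_topic, n_doc_topic, n_doc, alpha, beta, vocab_size):
    """Normalized P(z_i = j | z_-i, w) for every topic j.

    All counts exclude the current position i:
      n_word_topic[j]: positions with the same word as w_i assigned to topic j
      n_topic[j]:      positions (any word) assigned to topic j
      n_doc_topic[j]:  positions in the current document assigned to topic j
      n_doc:           positions in the current document
    """
    n_topics = len(n_topic)
    weights = []
    for j in range(n_topics):
        word_term = (n_word_topic[j] + beta) / (n_topic[j] + vocab_size * beta)
        doc_term = (n_doc_topic[j] + alpha) / (n_doc + n_topics * alpha)
        weights.append(word_term * doc_term)
    total = sum(weights)
    return [w / total for w in weights]

# Toy counts: the word and the document both favor topic 0.
probs = topic_conditional([5, 0], [10, 10], [3, 1], 4,
                          alpha=0.1, beta=0.01, vocab_size=100)
print(probs)
```

A full sampler would decrement the counts for position $i$, call this function, draw a new topic from the returned probabilities, and increment the counts again.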
37-45 Collapsed Gibbs illustration
[Figure: a table listing each word position $i$ with its word $w_i$ (MATHEMATICS, KNOWLEDGE, RESEARCH, WORK, SCIENTIFIC, JOY, ...), its document $d_i$ and its topic assignment $z_i$. Starting from the assignments of iteration 1, successive slides resample one $z_i$ at a time to fill in iteration 2; the final slide shows the assignments through iteration 1000.]
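46 Gibbs Sampling is a special case of MH
The GS proposal distribution is
$Q(x_i', x_{-i} \mid x_i, x_{-i}) = P(x_i' \mid x_{-i})$
where $x_{-i}$ denotes all variables except $x_i$. Applying MH to this proposal, we find that samples are always accepted (which is exactly what GS does):
$A(x_i', x_{-i} \mid x_i, x_{-i}) = \min\left(1, \frac{P(x_i', x_{-i})\,Q(x_i, x_{-i} \mid x_i', x_{-i})}{P(x_i, x_{-i})\,Q(x_i', x_{-i} \mid x_i, x_{-i})}\right)$
$= \min\left(1, \frac{P(x_i', x_{-i})\,P(x_i \mid x_{-i})}{P(x_i, x_{-i})\,P(x_i' \mid x_{-i})}\right)$
$= \min\left(1, \frac{P(x_i' \mid x_{-i})\,P(x_{-i})\,P(x_i \mid x_{-i})}{P(x_i \mid x_{-i})\,P(x_{-i})\,P(x_i' \mid x_{-i})}\right)$
$= \min(1, 1) = 1$
GS is simply MH with a proposal that is always accepted!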
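The cancellation can be confirmed numerically on a tiny model. The joint table over two binary variables below is a made-up example, and `conditional` and `acceptance` are illustrative helper names; the update is for the first variable with the second held fixed.

```python
# Hypothetical joint distribution P(x1, x2) over two binary variables.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

def conditional(x1, x2):
    """P(x1 | x2), computed from the joint table."""
    return joint[(x1, x2)] / sum(joint[(v, x2)] for v in (0, 1))

def acceptance(x1_old, x1_new, x2):
    """MH acceptance probability when the proposal is Q = P(x1_new | x2)."""
    numerator = joint[(x1_new, x2)] * conditional(x1_old, x2)
    denominator = joint[(x1_old, x2)] * conditional(x1_new, x2)
    return min(1.0, numerator / denominator)

acceptances = [acceptance(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(acceptances)
```

Every entry should equal 1 up to floating-point rounding, whichever move is proposed.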
47 Practical Aspects of MCMC
- How do we know if our proposal $Q(x' | x)$ is any good?
  - Monitor the acceptance rate
  - Plot the autocorrelation function
- How do we know when to stop burn-in?
  - Plot the sample values vs time
  - Plot the log-likelihood vs time
48 Acceptance Rate
[Figure: a low-variance proposal $Q(x' | x)$ next to a high-variance proposal $Q(x' | x)$.]
Choosing the proposal $Q(x' | x)$ is a tradeoff:
- Narrow, low-variance proposals have high acceptance, but take many iterations to explore $P(x)$ fully because the proposed $x'$ are too close.
- Wide, high-variance proposals have the potential to explore much of $P(x)$, but many proposals are rejected, which slows down the sampler.
A good $Q(x' | x)$ proposes distant samples $x'$ with a sufficiently high acceptance rate.
49 Acceptance Rate
- Acceptance rate is the fraction of samples that MH accepts.
- General guideline: proposals should have ~0.5 acceptance rate [1].
- Gaussian special case: if both $P(x)$ and $Q(x' | x)$ are Gaussian, the optimal acceptance rate is ~0.45 for D = 1 dimension and approaches ~0.23 as D tends to infinity [2].
[1] Muller, P. (1993). A Generic Approach to Posterior Integration and Gibbs Sampling.
[2] Roberts, G.O., Gelman, A., and Gilks, W.R. (1994). Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms.
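Monitoring the acceptance rate takes only a counter. The sketch below compares a narrow and a wide random-walk proposal; the standard normal target, the two step sizes and the function name `acceptance_rate` are assumptions made for illustration.

```python
import math
import random

def acceptance_rate(step, n_steps, rng):
    """Fraction of accepted proposals for random-walk MH on a standard normal target."""
    x = 0.0
    n_accepted = 0
    for _ in range(n_steps):
        x_new = rng.gauss(x, step)
        # log P(x') - log P(x) for P = Normal(0, 1); the proposal is symmetric.
        log_ratio = 0.5 * (x * x - x_new * x_new)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_new
            n_accepted += 1
    return n_accepted / n_steps

rng = random.Random(2)
narrow_rate = acceptance_rate(0.1, 20_000, rng)
wide_rate = acceptance_rate(10.0, 20_000, rng)
print(narrow_rate, wide_rate)
```

The narrow proposal should be accepted almost always and the wide one only occasionally. Neither extreme is desirable, so in practice the step size would be tuned between them toward the guideline rates above.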
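50 Autocorrelation function
- MCMC chains always show autocorrelation (AC). AC means that adjacent samples in time are highly correlated.
- We quantify AC with the autocorrelation function of an r.v. $x$:
  $R_x(k) = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n-k} (x_t - \bar{x})^2}$
[Figure: a chain with low autocorrelation next to a chain with high autocorrelation.]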
51 Autocorrelation function
$R_x(k) = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n-k} (x_t - \bar{x})^2}$
- The first-order AC $R_x(1)$ can be used to estimate the Sample Size Inflation Factor (SSIF):
  $s_x = \frac{1 + R_x(1)}{1 - R_x(1)}$
- If we took $n$ samples with SSIF $s_x$, then the effective sample size is $n / s_x$.
- High autocorrelation leads to smaller effective sample size! We want proposals $Q(x' | x)$ with low autocorrelation.
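These two formulas translate directly into code. In the sketch below, the AR(1) chain produced by `ar1_chain` is a hypothetical stand-in for real MCMC output, used only because its lag-1 autocorrelation is known to be close to `phi`; the other function names are illustrative as well.

```python
import random

def autocorrelation(xs, k):
    """R_x(k), with both sums running over t = 1 .. n-k."""
    n = len(xs)
    mean = sum(xs) / n
    numerator = sum((xs[t] - mean) * (xs[t + k] - mean) for t in range(n - k))
    denominator = sum((xs[t] - mean) ** 2 for t in range(n - k))
    return numerator / denominator

def effective_sample_size(xs):
    """n / s_x, where s_x = (1 + R_x(1)) / (1 - R_x(1)) is the SSIF."""
    r1 = autocorrelation(xs, 1)
    ssif = (1.0 + r1) / (1.0 - r1)
    return len(xs) / ssif

def ar1_chain(phi, n, rng):
    """Stand-in for MCMC output: x_t = phi * x_{t-1} + standard normal noise."""
    x, xs = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

rng = random.Random(3)
sticky = ar1_chain(0.9, 20_000, rng)        # highly autocorrelated chain
independent = ar1_chain(0.0, 20_000, rng)   # no autocorrelation
print(autocorrelation(sticky, 1), effective_sample_size(sticky))
print(autocorrelation(independent, 1), effective_sample_size(independent))
```

The sticky chain should report a far smaller effective sample size than the independent one, even though both contain the same number of samples.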
52 Sample Values vs Time
[Figure: well-mixed chains (left) and poorly-mixed chains (right).]
- Monitor convergence by plotting samples (of r.v.s) from multiple MH runs (chains).
- If the chains are well-mixed (left), they are probably converged.
- If the chains are poorly-mixed (right), we should continue burn-in.
53 Log-likelihood vs Time
[Figure: a log-likelihood trace that has not converged next to one that has converged.]
- Many graphical models are high-dimensional, so it is hard to visualize all r.v. chains at once.
- Instead, plot the complete log-likelihood vs. time. The complete log-likelihood is an r.v. that depends on all model r.v.s.
- Generally, the log-likelihood will climb, then eventually plateau.
54 Summary
- Markov Chain Monte Carlo methods use adaptive proposals $Q(x' | x)$ to sample from the true distribution $P(x)$.
- Metropolis-Hastings allows you to specify any proposal $Q(x' | x)$, but choosing a good $Q(x' | x)$ requires care.
- Gibbs sampling sets the proposal $Q(x' | x)$ to the conditional distribution $P(x_i' \mid x_{-i})$. The acceptance rate is always 1! But remember that high acceptance usually entails slow exploration. In fact, there are better MCMC algorithms for certain models.
- Knowing when to halt burn-in is an art.
More informationEcon Statistical Properties of the OLS estimator. Sanjaya DeSilva
Econ 39 - Statstcal Propertes of the OLS estmator Sanjaya DeSlva September, 008 1 Overvew Recall that the true regresson model s Y = β 0 + β 1 X + u (1) Applyng the OLS method to a sample of data, we estmate
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationChapter Twelve. Integration. We now turn our attention to the idea of an integral in dimensions higher than one. Consider a real-valued function f : D
Chapter Twelve Integraton 12.1 Introducton We now turn our attenton to the dea of an ntegral n dmensons hgher than one. Consder a real-valued functon f : R, where the doman s a nce closed subset of Eucldean
More informationGrover s Algorithm + Quantum Zeno Effect + Vaidman
Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the
More informationDS-GA 1002 Lecture notes 5 Fall Random processes
DS-GA Lecture notes 5 Fall 6 Introducton Random processes Random processes, also known as stochastc processes, allow us to model quanttes that evolve n tme (or space n an uncertan way: the trajectory of
More informationSTATS 306B: Unsupervised Learning Spring Lecture 10 April 30
STATS 306B: Unsupervsed Learnng Sprng 2014 Lecture 10 Aprl 30 Lecturer: Lester Mackey Scrbe: Joey Arthur, Rakesh Achanta 10.1 Factor Analyss 10.1.1 Recap Recall the factor analyss (FA) model for lnear
More informationGoodness of fit and Wilks theorem
DRAFT 0.0 Glen Cowan 3 June, 2013 Goodness of ft and Wlks theorem Suppose we model data y wth a lkelhood L(µ) that depends on a set of N parameters µ = (µ 1,...,µ N ). Defne the statstc t µ ln L(µ) L(ˆµ),
More informationContinuous Time Markov Chain
Contnuous Tme Markov Chan Hu Jn Department of Electroncs and Communcaton Engneerng Hanyang Unversty ERICA Campus Contents Contnuous tme Markov Chan (CTMC) Propertes of sojourn tme Relatons Transton probablty
More information12. The Hamilton-Jacobi Equation Michael Fowler
1. The Hamlton-Jacob Equaton Mchael Fowler Back to Confguraton Space We ve establshed that the acton, regarded as a functon of ts coordnate endponts and tme, satsfes ( ) ( ) S q, t / t+ H qpt,, = 0, and
More informationLecture 21: Numerical methods for pricing American type derivatives
Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)
More informationProbability-Theoretic Junction Trees
Probablty-Theoretc Juncton Trees Payam Pakzad, (wth Venkat Anantharam, EECS Dept, U.C. Berkeley EPFL, ALGO/LMA Semnar 2/2/2004 Margnalzaton Problem Gven an arbtrary functon of many varables, fnd (some
More informationErrors for Linear Systems
Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch
More informationLecture 4. Instructor: Haipeng Luo
Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would
More informationChapter 12. Ordinary Differential Equation Boundary Value (BV) Problems
Chapter. Ordnar Dfferental Equaton Boundar Value (BV) Problems In ths chapter we wll learn how to solve ODE boundar value problem. BV ODE s usuall gven wth x beng the ndependent space varable. p( x) q(
More informationLecture 3: Probability Distributions
Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the
More informationWeek3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity
Week3, Chapter 4 Moton n Two Dmensons Lecture Quz A partcle confned to moton along the x axs moves wth constant acceleraton from x =.0 m to x = 8.0 m durng a 1-s tme nterval. The velocty of the partcle
More information18.1 Introduction and Recap
CS787: Advanced Algorthms Scrbe: Pryananda Shenoy and Shjn Kong Lecturer: Shuch Chawla Topc: Streamng Algorthmscontnued) Date: 0/26/2007 We contnue talng about streamng algorthms n ths lecture, ncludng
More informationVQ widely used in coding speech, image, and video
at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng
More informationx = , so that calculated
Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to
More informationTracking with Kalman Filter
Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,
More informationn α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0
MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector
More informationArtificial Intelligence Bayesian Networks
Artfcal Intellgence Bayesan Networks Adapted from sldes by Tm Fnn and Mare desjardns. Some materal borrowed from Lse Getoor. 1 Outlne Bayesan networks Network structure Condtonal probablty tables Condtonal
More information1 The Mistake Bound Model
5-850: Advanced Algorthms CMU, Sprng 07 Lecture #: Onlne Learnng and Multplcatve Weghts February 7, 07 Lecturer: Anupam Gupta Scrbe: Bryan Lee,Albert Gu, Eugene Cho he Mstake Bound Model Suppose there
More informationGoogle PageRank with Stochastic Matrix
Google PageRank wth Stochastc Matrx Md. Sharq, Puranjt Sanyal, Samk Mtra (M.Sc. Applcatons of Mathematcs) Dscrete Tme Markov Chan Let S be a countable set (usually S s a subset of Z or Z d or R or R d
More informationIRO0140 Advanced space time-frequency signal processing
IRO4 Advanced space tme-frequency sgnal processng Lecture Toomas Ruuben Takng nto account propertes of the sgnals, we can group these as followng: Regular and random sgnals (are all sgnal parameters determned
More informationECE559VV Project Report
ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate
More informationChapter 1. Probability
Chapter. Probablty Mcroscopc propertes of matter: quantum mechancs, atomc and molecular propertes Macroscopc propertes of matter: thermodynamcs, E, H, C V, C p, S, A, G How do we relate these two propertes?
More informationThe Feynman path integral
The Feynman path ntegral Aprl 3, 205 Hesenberg and Schrödnger pctures The Schrödnger wave functon places the tme dependence of a physcal system n the state, ψ, t, where the state s a vector n Hlbert space
More informationExpected Value and Variance
MATH 38 Expected Value and Varance Dr. Neal, WKU We now shall dscuss how to fnd the average and standard devaton of a random varable X. Expected Value Defnton. The expected value (or average value, or
More informationHidden Markov Models
CM229S: Machne Learnng for Bonformatcs Lecture 12-05/05/2016 Hdden Markov Models Lecturer: Srram Sankararaman Scrbe: Akshay Dattatray Shnde Edted by: TBD 1 Introducton For a drected graph G we can wrte
More informationBezier curves. Michael S. Floater. August 25, These notes provide an introduction to Bezier curves. i=0
Bezer curves Mchael S. Floater August 25, 211 These notes provde an ntroducton to Bezer curves. 1 Bernsten polynomals Recall that a real polynomal of a real varable x R, wth degree n, s a functon of the
More informationLecture 4: Constant Time SVD Approximation
Spectral Algorthms and Representatons eb. 17, Mar. 3 and 8, 005 Lecture 4: Constant Tme SVD Approxmaton Lecturer: Santosh Vempala Scrbe: Jangzhuo Chen Ths topc conssts of three lectures 0/17, 03/03, 03/08),
More informationTopics in Probability Theory and Stochastic Processes Steven R. Dunbar. Classes of States and Stationary Distributions
Steven R. Dunbar Department of Mathematcs 203 Avery Hall Unversty of Nebraska-Lncoln Lncoln, NE 68588-0130 http://www.math.unl.edu Voce: 402-472-3731 Fax: 402-472-8466 Topcs n Probablty Theory and Stochastc
More informationLecture 3. Ax x i a i. i i
18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest
More informationIntroduction to Algorithms
Introducton to Algorthms 6.046J/8.40J Lecture 7 Prof. Potr Indyk Data Structures Role of data structures: Encapsulate data Support certan operatons (e.g., INSERT, DELETE, SEARCH) Our focus: effcency of
More information