Markov Chain Monte-Carlo (MCMC)

Markov Chain Monte-Carlo (MCMC). What is it for and what does it look like? A. Favorov, 2003-2017, favorov@sensi.org, favorov@gmail.com

Monte Carlo method: the area of a figure

The area $\mu$ of a figure $S$ inside the unit square is unknown. Let's sample random values (r.v.): $x_i, y_i$ i.i.d. as flat$[0,1]$ (i.i.d. means Identically Independently Distributed). Clever notation: $I_i = \mathbb{1}\left[(x_i, y_i) \in S\right]$. Expectation of $I$: $E\,I_i = \mu(S)$, so $\hat\mu_m = \frac{1}{m}\sum_{i=1}^m I_i$ estimates the area.
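
A minimal Python sketch of this slide's experiment; the figure $S$ is chosen here, as an illustrative assumption, to be the quarter disk $x^2 + y^2 \le 1$, whose area is $\pi/4$:

```python
import math
import random

# Monte Carlo area estimation: sample i.i.d. flat[0,1] points and count
# how many land inside S (here the quarter disk x^2 + y^2 <= 1).
m = 100_000
hits = sum(
    1 for _ in range(m)
    if random.random() ** 2 + random.random() ** 2 <= 1.0
)
mu_hat = hits / m  # (1/m) * sum of the indicators I_i
print(f"area estimate: {mu_hat:.4f}, exact pi/4 = {math.pi / 4:.4f}")
```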

Monte Carlo method: efficiency

Law of Large Numbers: $\hat\mu_m \to \mu$ as $m \to \infty$. Central Limit Theorem: $\sqrt{m}\,(\hat\mu_m - \mu) \to N(0, \mathrm{var}\,I)$. Variance: $\mathrm{var}\,I = E(I - E\,I)^2$, also notated as $\sigma^2$.

Monte Carlo Integration

We are evaluating $I = \int_D f(x)\,dx$, where $D$ is the domain of $f$ or its subset. We can sample r.v. $x_i \in D$, i.i.d. uniformly in $D$: $E\,f(x_i) = \frac{1}{|D|}\int_D f(x)\,dx = \frac{I}{|D|}$. The Monte Carlo estimation: $\hat I_m = \frac{|D|}{m}\sum_{i=1}^m f(x_i)$; $\sqrt{m}\,(\hat I_m - I) \to N(0, |D|^2\,\mathrm{var}_D f(x))$.

Advantage:
o The multiplier $\frac{1}{\sqrt{m}}$ in the error does not depend on the space dimension.
Disadvantages:
o a lot of samples are spent in the areas where $f(x)$ is small;
o the variance $\mathrm{var}_D f(x)$, which determines the convergence time, can be large.
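
A short Python sketch of plain Monte Carlo integration; the integrand $\sin(x)$ on $D = [0, \pi]$ is an illustrative choice, not from the slide:

```python
import math
import random

# Plain Monte Carlo integration: I ~ (|D|/m) * sum f(x_i), x_i uniform in D.
def mc_integrate(f, a, b, m=100_000):
    total = sum(f(random.uniform(a, b)) for _ in range(m))
    return (b - a) * total / m

# Example: the integral of sin(x) over [0, pi] is exactly 2.
print(f"estimate: {mc_integrate(math.sin, 0.0, math.pi):.4f} (exact: 2.0)")
```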

Monte Carlo importance integration

We are evaluating $I = \int_D f(x)\,dx$. Let's sample $x_i \in D$ i.i.d. from a trial distribution $g(x)$ that looks like (resembles) $f(x)$, with $f(x) \ne 0 \Rightarrow g(x) \ne 0$. Thus $E_g \frac{f(x)}{g(x)} = \int_D \frac{f(x)}{g(x)}\,g(x)\,dx = \int_D f(x)\,dx = I$. MC evaluation: $\hat I_m = \frac{1}{m}\sum_{i=1}^m \frac{f(x_i)}{g(x_i)}$; $\sqrt{m}\,(\hat I_m - I) \to N\left(0, \mathrm{var}_D \frac{f(x)}{g(x)}\right)$. The more uniform $f/g$ is, the better.
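
A hedged Python example: importance sampling of $I = \int_0^\infty x\,e^{-x}\,dx = 1$; the integrand and the trial density $g(x) = e^{-x}$ are illustrative assumptions chosen so the weight simplifies:

```python
import random

# Importance sampling: I = integral of x * exp(-x) over [0, inf) = 1.
# Trial density g(x) = exp(-x) (Exponential(1)) resembles the integrand,
# and the weight f(x)/g(x) simplifies to x.
m = 100_000
estimate = sum(random.expovariate(1.0) for _ in range(m)) / m
print(f"importance-sampling estimate: {estimate:.4f} (exact: 1.0)")
```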

Another example of importance integration

We are evaluating $E_\pi h(x) = \int h(x)\,\pi(x)\,dx$, where $\pi(x)$ is a distribution. Sample $x_i$ from a distribution $g(x)$ so that $\pi(x) \ne 0 \Rightarrow g(x) \ne 0$. Importance weight: $w(x) = \frac{\pi(x)}{g(x)}$; $E_g w(x) = \int \frac{\pi(x)}{g(x)}\,g(x)\,dx = 1$. Sampling from $g(x)$:

$E_\pi h(x) \approx \frac{1}{m}\sum_{i=1}^m w(x_i)\,h(x_i) \approx \frac{\sum_{i=1}^m w(x_i)\,h(x_i)}{\sum_{i=1}^m w(x_i)}$.
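
A sketch of the self-normalized estimator above, assuming for illustration $\pi = N(0,1)$, a wider trial $g = N(0,2)$, and $h(x) = x^2$ (so the exact answer is $E_\pi x^2 = 1$):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Self-normalized importance sampling of E_pi[h(x)] with pi = N(0,1),
# trial g = N(0,2), h(x) = x^2.
m = 100_000
xs = [random.gauss(0.0, 2.0) for _ in range(m)]
ws = [normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0) for x in xs]
est = sum(w * x * x for w, x in zip(ws, xs)) / sum(ws)
print(f"E[x^2] estimate: {est:.4f} (exact: 1.0)")
```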

Rejection sampling (Von Neumann, 1951)

We have a distribution $\pi(x)$ and we want to sample from it. We are able to calculate $f(x) = c\,\pi(x)$ for any $x$ (any $c$, i.e. up to an unknown constant). We are able to sample from $g(x)$, and we know $M$ such that $M g(x) \ge f(x)$. Thus, we can sample $\pi(x)$:

o Draw a value $x$ from $g(x)$.
o Accept the value $x$ with the probability $\frac{f(x)}{M g(x)}$.

Why it works: $P(\text{accept} \mid x) = \frac{f(x)}{M g(x)} = \frac{c\,\pi(x)}{M g(x)}$; $P(\text{accept}) = \int \frac{f(x)}{M g(x)}\,g(x)\,dx = \frac{c}{M}$; so $P(x \mid \text{accept}) = \frac{g(x)\,P(\text{accept} \mid x)}{P(\text{accept})} = \pi(x)$.
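
A minimal Python sketch; the target Beta(2,2) density $f(x) = 6x(1-x)$, the uniform proposal $g$, and the envelope $M = 1.5$ are illustrative assumptions:

```python
import random

# Rejection sampling from f(x) = 6x(1-x) (the Beta(2,2) density) on [0,1]
# with proposal g(x) = 1 and envelope M = 1.5, so M * g(x) >= f(x).
def f(x):
    return 6.0 * x * (1.0 - x)

M = 1.5

def rejection_sample():
    while True:
        x = random.random()              # draw x from g
        if random.random() <= f(x) / M:  # accept with prob f/(M*g), g = 1
            return x

xs = [rejection_sample() for _ in range(50_000)]
print(f"sample mean: {sum(xs) / len(xs):.4f} (exact: 0.5)")
# Here f is already normalized (c = 1), so P(accept) = c/M = 2/3.
```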

Metropolis-Hastings algorithm (1953, 1970)

We want to be able to draw $x$ from a distribution $\pi(x)$. We know how to compute the value of a function $f(x)$ such that $f(x) = c\,\pi(x)$ at each point, and we are able to draw $y$ from $T(y \mid x)$ (instrumental distribution, transition kernel). Let's denote the $m$-th step result as $x^{(m)}$.

o Draw $y$ from $T(y \mid x^{(m)})$. $T(y \mid x)$ is flat (symmetric) in pure Metropolis. It is an analog of $g$ in importance sampling.
o Transition probability: $\alpha(y \mid x^{(m)}) = \min\left(1, \frac{T(x^{(m)} \mid y)\,f(y)}{T(y \mid x^{(m)})\,f(x^{(m)})}\right)$.
o The new value is accepted, $x^{(m+1)} = y$, with probability $\alpha(y \mid x^{(m)})$. Otherwise, it is rejected: $x^{(m+1)} = x^{(m)}$.
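
A sketch of pure Metropolis (symmetric Gaussian kernel, so the $T$ ratio cancels), with an illustrative unnormalized target $f(x) = \exp(-x^4)$:

```python
import math
import random

# Pure Metropolis: symmetric Gaussian kernel T, so alpha = min(1, f(y)/f(x)).
# f only needs to be known up to a constant c.
def f(x):
    return math.exp(-x ** 4)

def metropolis(n_steps=50_000, step=1.0):
    x, chain = 0.0, []
    for _ in range(n_steps):
        y = x + random.gauss(0.0, step)          # draw y from T(y|x)
        if random.random() < min(1.0, f(y) / f(x)):
            x = y                                # accept: x^(m+1) = y
        chain.append(x)                          # reject: x^(m+1) = x^(m)
    return chain

chain = metropolis()
print(f"chain mean: {sum(chain) / len(chain):.3f} (target is symmetric: ~0)")
```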

Why does it work: the local balance

Let's show that if $x$ is already distributed as $f(x)$, then the MH algorithm keeps the distribution. Local balance condition for two points $x$ and $y$:

$f(x)\,T(y \mid x)\,\alpha(y \mid x) = f(y)\,T(x \mid y)\,\alpha(x \mid y)$.

Let's check it. If $\frac{T(x \mid y)\,f(y)}{T(y \mid x)\,f(x)} < 1$, then $\alpha(y \mid x) = \frac{T(x \mid y)\,f(y)}{T(y \mid x)\,f(x)}$ and $\alpha(x \mid y) = 1$, so both sides equal $f(y)\,T(x \mid y)$; the opposite case is symmetric. The balance is stable: $f(x)\,T(y \mid x)\,\alpha(y \mid x)$ is the flow from $x$ to $y$ and $f(y)\,T(x \mid y)\,\alpha(x \mid y)$ is the flow from $y$ to $x$. The stable local balance is enough (BTW, it is not a necessary condition).
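
The local balance can also be verified numerically; this sketch checks the flow identity for every pair of states of an arbitrary (assumed) 3-state target and proposal kernel:

```python
# Numerical check of the local balance on a 3-state chain: the flow
# f(x) T(y|x) alpha(y|x) must equal f(y) T(x|y) alpha(x|y) for all x, y.
f = [0.2, 0.5, 0.3]          # target probabilities
T = [[0.1, 0.6, 0.3],        # T[x][y] = probability of proposing y from x
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]

def alpha(y, x):             # MH acceptance probability
    return min(1.0, (T[y][x] * f[y]) / (T[x][y] * f[x]))

for x in range(3):
    for y in range(3):
        flow_xy = f[x] * T[x][y] * alpha(y, x)
        flow_yx = f[y] * T[y][x] * alpha(x, y)
        assert abs(flow_xy - flow_yx) < 1e-12
print("local balance holds for every pair of states")
```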

Markov chains, Maximization, Simulated Annealing

$x^{(m)}$ created as described above is a Markov chain (MC) with transition kernel $\tilde T(x^{(m+1)} \mid x^{(m)})$ built from $T$ and $\alpha$. The fact that the chain has a stationary distribution, and the convergence of the chain to that distribution, can be proved by the MC theory methods.

Minimization: $C(x)$ is a cost (a fine); we minimize $C$ by sampling $f(x) = \exp\left(-\frac{C(x)}{t}\right)$. We can characterize the transition kernel with a temperature $t$. Then we can decrease the temperature step-by-step (simulated annealing). MCMC and SA are very effective for optimization, since gradient methods tend to get locked in a local optimum while pure MC search is extremely inefficient.
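
A simulated-annealing sketch; the double-well cost $C(x)$, the Gaussian move, and the geometric cooling schedule are all illustrative assumptions:

```python
import math
import random

# Simulated annealing: sample f(x) ~ exp(-C(x)/t) with Metropolis moves
# while cooling t, to find the global minimum of the cost C.
def C(x):
    return (x * x - 1.0) ** 2 + 0.3 * x   # double well, global minimum near x = -1

x, t, best = 3.0, 2.0, 3.0
for _ in range(20_000):
    y = x + random.gauss(0.0, 0.5)        # proposal move
    if random.random() < math.exp(min(0.0, (C(x) - C(y)) / t)):
        x = y                             # Metropolis acceptance at temperature t
    if C(x) < C(best):
        best = x
    t = max(1e-3, t * 0.9995)             # step-by-step (geometric) cooling
print(f"best x: {best:.3f}, C(best): {C(best):.4f}")
```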

MCMC prior and the Bayesian paradigm

$P(M \mid D) = \frac{P(D \mid M)\,P(M)}{P(D)}$: here the posterior $P(M \mid D)$ is expressed via the likelihood $P(D \mid M)$, the prior $P(M)$, and the evidence $P(D)$.

MCMC and its variations are often used for the best model search. Let's formulate some requirements for the algorithm, and thus for the transition kernel:
o We want it not to depend on the current data.
o We want to minimize the rejection rate.
So, an effective transition kernel is one such that the prior $P(M)$ is its stationary distribution.

Terminology: names of related algorithms

o MCMC, Metropolis, Metropolis-Hastings, hybrid Metropolis, configurational bias Monte-Carlo, exchange Monte-Carlo, multigrid Monte-Carlo (MGMC), slice sampling, RJMCMC (samples the dimensionality of the space), Multiple-Try Metropolis, Hybrid Monte-Carlo...
o Simulated annealing, Monte-Carlo annealing, statistical cooling, umbrella sampling, probabilistic hill climbing, probabilistic exchange algorithm, parallel tempering, stochastic relaxation.
o Gibbs algorithm, successive over-relaxation.

Gibbs Sampler (Geman and Geman, 1984)

Now, $x$ is a $k$-dimensional variable $(x_1, x_2, \ldots, x_k)$. Let's denote $x_{-i} = (x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k)$. On each step of the Markov chain we choose the current coordinate $i$. Then, we calculate the distribution $f(x_i \mid x_{-i}^{(m)})$ and draw the next value $y_i$ from that distribution. All other coordinates are the same as on the previous step: $y_{-i} = x_{-i}^{(m)}$. For such a transition kernel, $\alpha(y \mid x^{(m)}) = \min\left(1, \frac{T(x^{(m)} \mid y)\,f(y)}{T(y \mid x^{(m)})\,f(x^{(m)})}\right) = 1$.

o We have no rejects, so the procedure is very effective.
o The temperature decreases rather fast.
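
A Gibbs-sampler sketch for a case where the full conditionals are known exactly: a bivariate normal with correlation $\rho$ (an illustrative choice):

```python
import math
import random

# Gibbs sampler for a bivariate normal with correlation rho: each full
# conditional f(x_i | x_-i) is normal, so every draw is accepted (no rejects).
rho = 0.8
sd = math.sqrt(1.0 - rho * rho)

def gibbs(n_steps=50_000):
    x1, x2, chain = 0.0, 0.0, []
    for _ in range(n_steps):
        x1 = random.gauss(rho * x2, sd)   # draw x1 | x2
        x2 = random.gauss(rho * x1, sd)   # draw x2 | x1
        chain.append((x1, x2))
    return chain

chain = gibbs()
corr = sum(a * b for a, b in chain) / len(chain)  # means 0, variances 1
print(f"estimated correlation: {corr:.3f} (target: {rho})")
```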

Inverse transform sampling (well-known)

We want to sample from the density function $\pi(x)$. We know how to calculate the inverse of the cumulative distribution $\Phi(x) = \int_{-\infty}^{x} \pi(x')\,dx'$.

o Generate a random number from the $[0,1]$ uniform distribution; call this $u$.
o Compute the value $x$ such that $\int_{-\infty}^{x} \pi(x')\,dx' = u$, i.e. $x = \Phi^{-1}(u)$.
o $x$ is the random number that is drawn from the distribution described by $\pi(x)$.

[Figure: a uniform draw $u$ on the vertical axis is mapped through the CDF $\Phi$ to the sampled value $x$ on the horizontal axis.]
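
A sketch for a density with an analytic inverse CDF: the Exponential($\lambda$) distribution (an illustrative choice):

```python
import math
import random

# Inverse transform sampling for Exponential(lam): the CDF is
# F(x) = 1 - exp(-lam * x), so x = F^{-1}(u) = -ln(1 - u) / lam.
lam = 2.0

def exp_sample():
    u = random.random()                   # u ~ uniform on [0, 1)
    return -math.log(1.0 - u) / lam

xs = [exp_sample() for _ in range(100_000)]
print(f"sample mean: {sum(xs) / len(xs):.4f} (exact: {1.0 / lam})")
```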

Slice sampling (Neal, 2003)

Sampling of $x$ from $f(x)$ is equivalent to sampling $(x, y)$ pairs uniformly from the area under $f$. So, we introduce an auxiliary variable $y$ and iterate as follows:

o for a sample $x_t$ we choose $y_t$ uniformly at random from $[0, f(x_t)]$;
o given $y_t$ we choose $x_{t+1}$ uniformly from the interval (the slice) $\{x : f(x) \ge y_t\}$.

The sample of $x$ distributed as $f(x)$ is obtained by ignoring the $y$ values.
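
A sketch where the slice $\{x : f(x) \ge y\}$ has closed-form endpoints, assuming the illustrative target $f(x) = \exp(-x^2/2)$ (unnormalized standard normal):

```python
import math
import random

# Slice sampling for f(x) = exp(-x^2 / 2): the slice {x : f(x) >= y}
# is the interval [-w, w] with w = sqrt(-2 ln y), so the uniform draw
# over the slice is exact here.
def f(x):
    return math.exp(-0.5 * x * x)

def slice_sample(n_steps=50_000):
    x, chain = 0.0, []
    for _ in range(n_steps):
        y = max(random.uniform(0.0, f(x)), 1e-300)  # auxiliary y under the curve
        w = math.sqrt(-2.0 * math.log(y))           # slice endpoints for this f
        x = random.uniform(-w, w)                   # uniform draw from the slice
        chain.append(x)                             # the y values are ignored
    return chain

chain = slice_sample()
var = sum(x * x for x in chain) / len(chain)
print(f"sample variance: {var:.3f} (standard normal: 1.0)")
```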

Literature

Liu, J.S. (2002) Monte Carlo Strategies in Scientific Computing. Springer-Verlag, NY, Berlin, Heidelberg.
Robert, C.P. (1998) Discretization and MCMC Convergence Assessment. Springer-Verlag.
van Laarhoven, P.M.J. and Aarts, E.H.L. (1988) Simulated Annealing: Theory and Applications. Kluwer Academic Publishers.
Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.
Besag, J., Green, P., Higdon, D., and Mengersen, K. (1995) Bayesian computation and stochastic systems. Statistical Science 10(1), 3-66.
Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F., and Wootton, J.C. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208-214.
Sivia, D.S. (1996) Data Analysis: A Bayesian Tutorial. Clarendon Press, Oxford.
Neal, R.M. (2003) Slice Sampling. The Annals of Statistics 31(3), 705-767.
http://cvs.ucla.edu/mcmc/mcmc_tutorial.htm
Ross, Sheldon. A First Course in Probability.
Sobol, I.M. The Monte Carlo Method (in Russian: Соболь И.М. Метод Монте-Карло).

Sometimes, it works:
Favorov, A.V., Andreewski, T.V., Sudomoina, M.A., Favorova, O.O., Parmigiani, G., Ochs, M.F. (2005) A Markov chain Monte Carlo technique for identification of combinations of allelic variants underlying complex diseases in humans. Genetics 171(4), 2113-2121.
Favorov, A.V., Gelfand, M.S., Gerasimova, A.V., Ravcheev, D.A., Mironov, A.A., Makeev, V.J. (2005) A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length. Bioinformatics 21(10), 2240-2245.