Important sampling the union of rare events, with an application to power systems analysis
|
|
- Lester Shaw
- 5 years ago
- Views:
Transcription
1 SAMSI Monte Carlo workshop 1 Important sampling the union of rare events, with an application to power systems analysis Art B. Owen Stanford University With Yury Maximov and Michael Chertkov of Los Alamos National Laboratory.
2 SAMSI Monte Carlo workshop 2 MCQMC 2018 July 1 8, 2018 Rennes, France MC, QMC, MCMC, RQMC, SMC, MLMC, MIMC, Call for papers is now online:
3 SAMSI Monte Carlo workshop 3 Rare event sampling Motivation: an electrical grid has N nodes. Power p 1, p 2,, p N Random p i > 0, e.g., wind generation, Random p i < 0, e.g. consumption, Fixed p i for controllable nodes, AC phase angles θ i (θ 1,..., θ N ) = F(p 1,..., p N ) Constraints: θ i θ j < θ if i j ( Find P max i j θ i θ j θ ) Simplified model (p 1,..., p N ) Gaussian θ linear in p 1,..., p N
4 SAMSI Monte Carlo workshop 4 Gaussian setup For x N (0, I d ), H j = {x ω T j x τ j } find µ = P(x J j=1h j ). WLOG ω j = 1. Ordinarily τ j > 0. d hundreds. J thousands. c( 7, 6) Solid: deciles of x. Dashed:
5 SAMSI Monte Carlo workshop 5 Basic bounds Let P j P(ω T j x τ j ) = Φ( τ j ) Then max 1 j J P j µ µ µ Inclusion-exclusion J j=1 P j For u 1:J = {1, 2,..., J}, let H u = j u H j H u (x) 1{x H u } P u = P(H u ) = E(H u (x)) µ = ( 1) u 1 P u u >0 Plethora of bounds Survey by Yang, Alajaji & Takahara (2014)
6 SAMSI Monte Carlo workshop 6 Other uses Other engineering reliability False discoveries in statistics: J correlated test statistics Speculative: inference after model selection Taylor, Fithian, Markovic, Tian
7 SAMSI Monte Carlo workshop 7 Importance sampling For x p, seek η = E p (f(x)) = f(x)p(x) dx. Take ˆη = 1 n n i=1 f(x i )p(x i ) iid, x i q q(x i ) Unbiased if q(x) > 0 whenever f(x)p(x) 0. Var(ˆη) = σ 2 q/n, where Variance f 2 p 2 (fp µq) 2 σ 2 q = q µ 2 = = q Num: seek q fp/µ, i.e., nearly proportional to fp Den: watch out for small q.
8 SAMSI Monte Carlo workshop 8 Self-normalized IS ˆη SNIS = n i=1 f(x i)p(x i )/q(x i ) n i=1 p(x i)/q(x i ) Available for unnormalized p and / or q. Good for Bayes, limited effectiveness for rare events. Optimal SNIS puts 1/2 the samples in the rare event. Optimal plain IS puts all samples there. Best possible asymptotic coefficient of variation is 2/ n.
9 SAMSI Monte Carlo workshop 9 For α j 0, j α j = 1 Mixture sampling ˆη α = 1 n n i=1 f(x i )p(x i ) j α jq j (x i ), x i Defensive mixtures iid qα j α j q j Take q 0 = p, α 0 > 0. Get p/q α 1/α 0. Hesterberg (1995) Additional refs Use q j = 1 as control variates. O & Zhou (2000). Optimization over α is convex. He & O (2014). Multiple IS. Veach & Guibas (1994). Veach s Oscar. Elvira, Martino, Luengo, Bugallo (2015) Generalizations.
10 SAMSI Monte Carlo workshop 10 Instantons µ j = arg max x Hj p(x) Chertkov, Pan, Stepanov (2011) c( 7, 6) Solid dots µ j Initial thought: q j = N (µ j, I) and q α = j α jq j I.e., mixture of exponential tilting Conditional sampling q j = L(x x H j ) = p(x)h j (x)/p j
11 SAMSI Monte Carlo workshop 11 Mixture of conditional sampling α 0, α 1,..., α J 0, α j = 1, q α = α j q j, q 0 p j j µ = P(x J j=1h j ) = P(x H 1:J ) = E ( H 1:J (x) ) Mixture IS ˆµ α = 1 n = 1 n n i=1 n i=1 H 1:J (x i )p(x i ) J j=0 α jq j (x i ) H 1:J (x i ) J j=0 α jh j (x i )/P j (where q j = ph j /P j ) with H 0 1 and P 0 = 1. It would be nice to have α j /P j constant.
12 SAMSI Monte Carlo workshop 12 ALORE At Least One Rare Event Like AMIS of Cornuet, Marin, Mira, Robert (2012) Put α 0 = 0, and take α j P j. That is α j = P j J j =1 P j = P j µ, ( µ is the union bound ) ˆµ α = 1 n n i=1 Then H 1:J (x i ) J j=1 α jh j (x i )/P j = µ 1 n n i=1 H 1:J (x i ) J j=1 H j(x i ) If x q α then H 1:J (x) = 1. So ˆµ α = µ 1 n n i=1 1 S(x i ), J S(x) H j (x) j=1
13 SAMSI Monte Carlo workshop 13 Prior work Adler, Blanchet, Liu (2008, 2012) estimate ( ) w(b) = P max f(t) > b t T for a Gaussian random field f(t) over T R d. They also consider a finite set T = {t 1,..., t N }. Comparisons We use the same mixture estimate as them for finite T and Gaussian data. Their analysis is for Gaussian distributions. We consider arbitrary sets of J events. They take limits as b. We have non-asymptotic bounds and more general limits. Our analysis is limited to finite sets. They handle extrema over a continuum.
14 SAMSI Monte Carlo workshop 14 O, Maximov & Cherkov (2017) Theorem Let H 1,..., H J be events defined by x p. Let q j (x) = p(x)h j (x)/p j for P j = P(x H j ). iid Let x i qα = J j=1 α j q j for αj = P j/ µ. Take ˆµ = µ 1 n Var(ˆµ) = 1 n where T s P(S(x) = s). The RHS follows because n i=1 1 S(x i ), S(x i) = Then E(ˆµ) = µ and ( J s=1 µ J s=1 T s s µ2 T s s J s=1 ) T s = µ. J H j (x i ) j=1 µ( µ µ) n
15 SAMSI Monte Carlo workshop 15 Remarks With S(x) equal to the number of rare events at x, ˆµ = µ 1 n n i=1 1 S(x i ) 1 S(x) J = µ J ˆµ µ Robustness The usual problem with rare event estimation is getting no rare events in n tries. Then ˆµ = 0. Here the corresponding failure is never seeing S 2 rare events. Then ˆµ = µ, an upper bound and probably fairly accurate if all S(x i ) = 1 Conditioning vs sampling from N (µ j, I) Avoids wasting samples outside the failure zone. Avoids awkward likelihood ratios.
16 SAMSI Monte Carlo workshop 16 General Gaussians For y N (η, Σ) let the event be γ T j y κ j. Translate into x T ω j τ j for ω j = γt j Σ1/2 γ T j Σγ j and τ j = κ j γ T j η γ T j Σγ j Adler et al. Their context has all κ j = b
17 SAMSI Monte Carlo workshop 17 Get x N (0, I), such that x T ω τ. y will be x T ω 1) Sample u U(0, 1). Sampling 2) Let y = Φ 1 (Φ(τ) + u(1 Φ(τ))). May easily get y =. 3) Sample z N (0, I). 4) Deliver x = ωy + (I ωω T )z. 1) Sample u U(0, 1). 2) Let y = Φ 1 (uφ( τ)). 3) Sample z N (0, I). 4) Let x = ωy + (I ωω T )z. 5) Deliver x = x. Better numerics from I.E., sample x N (0, I) subject to x T ω τ and deliver x. Step 4 via ωy + z ω(ω T z).
18 SAMSI Monte Carlo workshop 18 More bounds Recall that T s = P(S(x) = s). Therefore ( J J ) µ = P j = E H j (x) = E(S) = j=1 j=1 ( ) µ = E max H j(x) 1 j J = P(S > 0), so J st s s=1 µ = E(S S > 0) P(S > 0) = µ E(S S > 0) n Var(ˆµ) = µ J s=1 Therefore T s s µ2 = ( µe(s S > 0) )( µe(s 1 S > 0) ) µ 2 = µ 2( E(S S > 0)E(S 1 S > 0) 1 )
19 SAMSI Monte Carlo workshop 19 Var(ˆµ) 1 n Bounds continued ( ) E(S S > 0) E(S 1 S > 0) 1 Lemma Let S be a random variable in {1, 2,..., J} for J N. Then E(S)E(S 1 ) J + J with equality if and only if S U{1, J}. Var(ˆµ) µ2 n Corollary J + J Thanks to Yanbo Tang and Jeffrey Negrea for an improved proof of the lemma.
20 SAMSI Monte Carlo workshop 20 Numerical comparison R function mvtnorm of Genz & Bretz (2009) gets P ( a y b ) = P ( j {a j y j b j } ) for y N (η, Σ) Their code makes sophisticated use of quasi-monte Carlo. Adaptive, up to 25,000 evals in FORTRAN. It was not designed for rare events. It computes an intersection. Usage Pack ω j into Ω and τ j into T 1 µ = P(Ω T x T ) = P(y T ), y N (0, Ω T Ω) It can handle up to 1000 inputs. IE J 1000 Upshot ALORE works (much) better for rare events. mvtnorm works better for non-rare events.
21 SAMSI Monte Carlo workshop 21 Polygon example P(J, τ) regular J sided polygon in R 2 outside circle of radius τ > 0. P = x sin(2πj/j) cos(2πj/j) T x τ, j = 1,..., J a priori bounds µ = P(x P c ) P(χ 2 (2) τ 2 ) = exp( τ 2 /2) This is pretty tight. A trigonometric argument gives 1 µ exp( τ 2 /2) 1 ( ( J tan π ) ) J π τ 2 2π Let s use J = 360. = 1 π2 τ 2 6J 2 So µ. = exp( τ 2 /2) for reasonable τ.
22 SAMSI Monte Carlo workshop 22 IS vs MVN ALORE had n = MVN had up to 25, random repeats. τ µ E((ˆµ ALORE /µ 1) 2 ) E((ˆµ MVN /µ 1) 2 ) For τ = 5 MVN had a few outliers.
23 SAMSI Monte Carlo workshop 23 Polygon again ALORE Pmvnorm Frequency Frequency Relative estimate Relative estimate Results of 100 estimates of the P(x P(360, 6)), divided by exp( 6 2 /2). Left panel: ALORE. Right panel: pmvnorm.
24 SAMSI Monte Carlo workshop 24 Symmetry Maybe the circumscribed polygon is too easy due to symmetry. Redo it with just ω j = (cos(2πj/j), sin(2πj/j)) T for the 72 prime numbers j {1, 2,..., 360}. For τ = 6 variance of ˆµ/ exp( 18) is for ALORE (n = 1000), and 8.5 for pmvnorm.
25 SAMSI Monte Carlo workshop 25 High dimensional half spaces Take random ω j R d for large d. By concentration of measure they re nearly orthogonal. µ. = 1 J P j ) = j=1(1. P j = µ j Simulation (for rare events) Dimension d U{20, 50, 100, 200, 500} Constraints J U{d/2, d, 2d} τ such that log 10 ( µ) U[4, 8] (then rounded to 2 places).
26 SAMSI Monte Carlo workshop 26 High dimensional results Estimate / Bound ALORE pmvnorm 1e 08 1e 07 1e 06 1e 05 1e 04 Union bound Results of 200 estimates of the µ for varying high dimensional problems with nearly independent events.
27 SAMSI Monte Carlo workshop 27 Power systems Network with N nodes called busses and M edges. Typically M/N is not very large. Power p i at bus i. Each bus is Fixed or Random or Slack: slack bus S has p S = i S p i. p = (p T F, p T R, p S ) T R N Randomness driven by p R N (η R, Σ RR ) Inductances B ij A Laplacian B ij 0 if i j in the network. B ii = j i B ij. Taylor approximation θ = B + p (pseudo inverse) Therefore θ is Gaussian as are all θ i θ j for i j.
28 SAMSI Monte Carlo workshop 28 Examples Our examples come from MATPOWER (a Matlab toolbox). Zimmernan, Murillo-Sánchez & Thomas (2011) We used n = 10,000. Polish winter peak grid 2383 busses, d = 326 random busses, J = 5772 phase constraints. ω ˆµ se/ˆµ µ µ π/ π/ π/ π/ ω is the phase constraint, ˆµ is the ALORE estimate, se is the estimated standard error, µ is the largest single event probability and µ is the union bound.
29 SAMSI Monte Carlo workshop 29 Pegase 2869 Fliscounakis et al. (2013) large part of the European system N = 2869 busses. d = 509 random busses. J = 7936 phase constraints. Results ω ˆµ se/ˆµ µ µ π/ π/ π/ π/ Notes θ = π/2 is unrealistically large. We got ˆµ = µ. All 10,000 samples had S = 1. Some sort of Wilson interval or Bayesian approach could help. One half space was sampled 9408 times, another 592 times.
30 SAMSI Monte Carlo workshop 30 Other models IEEE case 14 and IEEE case 300 and Pegase 1354 were all dominated by one failure mode so µ. = µ and no sampling is needed. Another model had random power corresponding to wind generators but phase failures were not rare events in that model. The Pegase model was too large for our computer. The Laplacian had 37,250 rows and columns. The Pegase 9241 model was large and slow and phase failure was not a rare event. Caveats We used a DC approximation to AC power flow (which is common) and the phase estimates were based on a Taylor approximation.
31 SAMSI Monte Carlo workshop 31 Next steps 1) Nonlinear boundaries 2) Non-Gaussian models 3) Optimizing cost subject to a constraint on µ Sampling half-spaces will work if we have convex failure regions and a log-concave nominal density.
32 SAMSI Monte Carlo workshop 32 Thanks Michael Chertkov and Yury Maximov, co-authors Center for Nonlinear Studies at LANL, for hospitality Alan Genz for help on mvtnorm Bert Zwart, pointing out Adler et al. Yanbo Tang and Jeffrey Negrea, improved Lemma proof NSF DMS & DMS DOE/GMLC 2.0 Project: Emergency monitoring and controls through new technologies and analytics Jianfeng Lu, Ilse Ipsen Sue McDonald, Kerem Jackson, Thomas Gehrmann
33 SAMSI Monte Carlo workshop 33 Bonus topic Thinning MCMC output: It really can improve statistical efficiency. Short story If it costs 1 unit to advance x i 1 x i and θ > 0 units to compute y i = f(x i ) then thinning to every k th value lets us get larger n and less variance if ACF(y i ) decays slowly.
34 SAMSI Monte Carlo workshop 34 MCMC (notation for) A simple MCMC generates: x i = ϕ(x i 1, u i ) R d, u i U(0, 1) m More general ones use u i R m i where m i may be random. E.g., step i consume m i uniform random variables. The function ϕ is constructed so x i has desired stationary distribution π. µ = We approximate f(x)π(x) dx by ˆµ = 1 n n f(x i ) i=1 For simplicity, ignore burn-in / warmup.
35 SAMSI Monte Carlo workshop 35 Thinning Get x ki and f(x ki ) for i = 1,..., n and k 1. Thinning by a factor k usually reduces autocorrelations. Then f(x ki ) are more nearly IID. Thinning can also save storage x i.
36 SAMSI Monte Carlo workshop 36 Statistical efficiency Let y i = f(x i ) R. Geyer (1992) shows that for k > 1 Link & Eaton (2011): ( 1 Var n n i=1 ) ( 1 y i Var n/k Quotes n/k i=1 y ki ) Thinning is often unnecessary and always inefficient. MacEachern & Berliner (1994): This article provides a justification of the ban against sub-sampling the output of a stationary Markov chain that is suitable for presentation in undergraduate and beginning graduate-level courses. Gamerman & Lopes (2006) on thinning the Gibbs sampler: There is no gain in efficiency, however, by this approach and estimation is shown below to be always less precise than retaining all chain values.
37 SAMSI Monte Carlo workshop 37 Not so fast The analysis assumes that we compute f(x i ) and only use every k th one. Suppose that it costs 1 unit to advance the chain: x i x i+1. θ units to compute y i = f(x). If we thin the chain we compute f less often and can use larger n. When it pays If θ is large and Corr(y i, y i+k ) decays slowly, then thinning can be much more efficienty. When can that happen? E.g., x describes a set of particles and f computes interpoint distances. Updating f will cost proportionally to updating x, maybe much more.
38 SAMSI Monte Carlo workshop 38 Thinned estimate ˆµ k = 1 n k n k kn k advances x i x i+1 and n k computations x i y i = f(x i ). i=1 f(x ik ) Budget: kn k + θn k B n k = B k + θ B k + θ. Variance Var(ˆµ k ) =. ( σ n k ρ kl ), ρ l = Corr(y t, y t+l ). l=1
39 SAMSI Monte Carlo workshop 39 Efficiency eff(k) = Var(ˆµ 1) Var(ˆµ k ). = NB: n k < n 1 but ordinarily l=1 ρ l > ρ kl. ( σ 2 n l=1 ρ ) l ( σ 2 n k l=1 ρ ) = n ( k l=1 ρ l) ( kl n l=1 ρ ) kl n k n 1 Sample size ratio B/(k + θ) B/(1 + θ) = 1 + θ k + θ Therefore eff(k) = (1 + θ)( l=1 ρ ) l (k + θ) ( l=1 ρ ) kl
40 SAMSI Monte Carlo workshop 40 Autocorrelation models ACF plots often resemble AR(1): ρ l = ρ l, 0 < ρ < 1 E.g., figures in Jackman (2009). Newman & Barkema (1999): the autocorrelation is expected to fall off exponentially at long times. Geyer (1991) notes exponential upper bound under ρ-mixing. Monotone non-negative autocorrelations ρ 1 ρ 2 ρ l ρ l+1 0
41 SAMSI Monte Carlo workshop 41 Under the AR(1) model eff(k) = eff AR (k) = (1 + θ)( l=1 ρl) (k + θ) ( l=1 ρkl) = (1 + θ)( 1 + 2ρ/(1 ρ) ). (k + θ) ( 1 + 2ρ k /(1 ρ k ) ) = 1 + θ k + θ 1 + ρ 1 ρ For large θ and ρ near 1, thinning will be very efficient. 1 ρ k 1 + ρ k
42 SAMSI Monte Carlo workshop 42 Optimal thinning factor k θ \ ρ θ is the cost to compute y = f(x). ρ is the autocorrelation.
43 SAMSI Monte Carlo workshop 43 Efficiency of optimal k θ \ ρ Versus k = 1.
44 SAMSI Monte Carlo workshop 44 Least k for 95% efficiency θ \ ρ Table 1: Smallest k to give at least 95% of the efficiency of the most efficient k, as a function of θ and the autoregression parameter ρ.
45 SAMSI Monte Carlo workshop 45 Additional Thinning can pay when ρ 1 > 0 and ρ 1 ρ 2 0. θ > 0 reduces the optimal acceptance rate from Jeffrey Rosenthal has a quantitative version.
arxiv: v1 [stat.co] 19 Oct 2017
Importance sampling the union of rare events with an application to power systems analysis arxiv:1710.06965v1 [stat.co] 19 Oct 2017 Art B. Owen Stanford University Yury Maximov Los Alamos National Laboratory
More informationStatistically efficient thinning of a Markov chain sampler
Thinning Markov chains 1 Statistically efficient thinning of a Markov chain sampler Art B. Owen Stanford University From: article of same title, O (2017) JCGS Thinning Markov chains 2 Note These are slides
More informationThe square root rule for adaptive importance sampling
The square root rule for adaptive importance sampling Art B. Owen Stanford University Yi Zhou January 2019 Abstract In adaptive importance sampling, and other contexts, we have unbiased and uncorrelated
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More informationBrief introduction to Markov Chain Monte Carlo
Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical
More informationLecture 4: Dynamic models
linear s Lecture 4: s Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu
More informationAdaptive Rejection Sampling with fixed number of nodes
Adaptive Rejection Sampling with fixed number of nodes L. Martino, F. Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, São Carlos (São Paulo). Abstract The adaptive
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationOptimal mixture weights in multiple importance sampling
Optimal mixture weights in multiple importance sampling Hera Y. He Stanford University Art B. Owen Stanford University November 204 Abstract In multiple importance sampling we combine samples from a finite
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationAdaptive Rejection Sampling with fixed number of nodes
Adaptive Rejection Sampling with fixed number of nodes L. Martino, F. Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, Brazil. Abstract The adaptive rejection sampling
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationMCMC Sampling for Bayesian Inference using L1-type Priors
MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling
More informationMonte Carlo Methods. Geoff Gordon February 9, 2006
Monte Carlo Methods Geoff Gordon ggordon@cs.cmu.edu February 9, 2006 Numerical integration problem 5 4 3 f(x,y) 2 1 1 0 0.5 0 X 0.5 1 1 0.8 0.6 0.4 Y 0.2 0 0.2 0.4 0.6 0.8 1 x X f(x)dx Used for: function
More informationCS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling
CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy
More informationBayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference
1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationAdaptive Monte Carlo methods
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationSampling from complex probability distributions
Sampling from complex probability distributions Louis J. M. Aslett (louis.aslett@durham.ac.uk) Department of Mathematical Sciences Durham University UTOPIAE Training School II 4 July 2017 1/37 Motivation
More informationIntroduction to Rare Event Simulation
Introduction to Rare Event Simulation Brown University: Summer School on Rare Event Simulation Jose Blanchet Columbia University. Department of Statistics, Department of IEOR. Blanchet (Columbia) 1 / 31
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationHastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model
UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced
More informationEco517 Fall 2013 C. Sims MCMC. October 8, 2013
Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained
More informationMCMC and Gibbs Sampling. Kayhan Batmanghelich
MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction
More informationDeblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J.
Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox fox@physics.otago.ac.nz Richard A. Norton, J. Andrés Christen Topics... Backstory (?) Sampling in linear-gaussian hierarchical
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More informationDavid Giles Bayesian Econometrics
David Giles Bayesian Econometrics 1. General Background 2. Constructing Prior Distributions 3. Properties of Bayes Estimators and Tests 4. Bayesian Analysis of the Multiple Regression Model 5. Bayesian
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically
More informationRobert Collins CSE586, PSU Intro to Sampling Methods
Robert Collins Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Robert Collins A Brief Overview of Sampling Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains
More informationDoing Bayesian Integrals
ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationBayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems
Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems John Bardsley, University of Montana Collaborators: H. Haario, J. Kaipio, M. Laine, Y. Marzouk, A. Seppänen, A. Solonen, Z.
More informationExercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters
Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for
More informationA quick introduction to Markov chains and Markov chain Monte Carlo (revised version)
A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to
More informationECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering
ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More information2 Contents. Art Owen do not distribute or post electronically without author s permission
Contents 10 Advanced variance reduction 3 10.1 Grid-based stratification....................... 3 10.2 Stratification and antithetics.................... 6 10.3 Latin hypercube sampling......................
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationAsymptotic efficiency of simple decisions for the compound decision problem
Asymptotic efficiency of simple decisions for the compound decision problem Eitan Greenshtein and Ya acov Ritov Department of Statistical Sciences Duke University Durham, NC 27708-0251, USA e-mail: eitan.greenshtein@gmail.com
More informationStatistics & Data Sciences: First Year Prelim Exam May 2018
Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book
More informationOptimization of a Convex Program with a Polynomial Perturbation
Optimization of a Convex Program with a Polynomial Perturbation Ravi Kannan Microsoft Research Bangalore Luis Rademacher College of Computing Georgia Institute of Technology Abstract We consider the problem
More informationStochastic optimization Markov Chain Monte Carlo
Stochastic optimization Markov Chain Monte Carlo Ethan Fetaya Weizmann Institute of Science 1 Motivation Markov chains Stationary distribution Mixing time 2 Algorithms Metropolis-Hastings Simulated Annealing
More informationDynamic System Identification using HDMR-Bayesian Technique
Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationSampling Methods (11/30/04)
CS281A/Stat241A: Statistical Learning Theory Sampling Methods (11/30/04) Lecturer: Michael I. Jordan Scribe: Jaspal S. Sandhu 1 Gibbs Sampling Figure 1: Undirected and directed graphs, respectively, with
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationUniversity of Toronto Department of Statistics
Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704
More informationBayesian Dynamic Linear Modelling for. Complex Computer Models
Bayesian Dynamic Linear Modelling for Complex Computer Models Fei Liu, Liang Zhang, Mike West Abstract Computer models may have functional outputs. With no loss of generality, we assume that a single computer
More informationSupplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements
Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model
More information19 : Slice Sampling and HMC
10-708: Probabilistic Graphical Models 10-708, Spring 2018 19 : Slice Sampling and HMC Lecturer: Kayhan Batmanghelich Scribes: Boxiang Lyu 1 MCMC (Auxiliary Variables Methods) In inference, we are often
More informationAnswers and expectations
Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E
More informationSemi-Parametric Importance Sampling for Rare-event probability Estimation
Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability
More informationStat 451 Lecture Notes Numerical Integration
Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
More informationSequential Monte Carlo Samplers for Applications in High Dimensions
Sequential Monte Carlo Samplers for Applications in High Dimensions Alexandros Beskos National University of Singapore KAUST, 26th February 2014 Joint work with: Dan Crisan, Ajay Jasra, Nik Kantas, Alex
More informationPrimer on statistics:
Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood
More informationMH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution
MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous
More informationAn Brief Overview of Particle Filtering
1 An Brief Overview of Particle Filtering Adam M. Johansen a.m.johansen@warwick.ac.uk www2.warwick.ac.uk/fac/sci/statistics/staff/academic/johansen/talks/ May 11th, 2010 Warwick University Centre for Systems
More informationMarkov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018
Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling
More informationSlice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method
Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at
More informationComputer Vision Group Prof. Daniel Cremers. 14. Sampling Methods
Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationMarkov chain Monte Carlo methods in atmospheric remote sensing
1 / 45 Markov chain Monte Carlo methods in atmospheric remote sensing Johanna Tamminen johanna.tamminen@fmi.fi ESA Summer School on Earth System Monitoring and Modeling July 3 Aug 11, 212, Frascati July,
More informationU Logo Use Guidelines
Information Theory Lecture 3: Applications to Machine Learning U Logo Use Guidelines Mark Reid logo is a contemporary n of our heritage. presents our name, d and our motto: arn the nature of things. authenticity
More informationExponential families also behave nicely under conditioning. Specifically, suppose we write η = (η 1, η 2 ) R k R p k so that
1 More examples 1.1 Exponential families under conditioning Exponential families also behave nicely under conditioning. Specifically, suppose we write η = η 1, η 2 R k R p k so that dp η dm 0 = e ηt 1
More informationLearning in graphical models & MC methods Fall Cours 8 November 18
Learning in graphical models & MC methods Fall 2015 Cours 8 November 18 Enseignant: Guillaume Obozinsi Scribe: Khalife Sammy, Maryan Morel 8.1 HMM (end) As a reminder, the message propagation algorithm
More informationStat 451 Lecture Notes Monte Carlo Integration
Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:
More informationGenerating Generalized Exponentially Distributed Random Variates with Transformed Density Rejection and Ratio-of-Uniform Methods
Generating Generalized Exponentially Distributed Random Variates with Transformed Density Rejection and Ratio-of-Uniform Methods Yik Yang Thesis submitted to the Faculty of the Virginia Polytechnic Institute
More informationSequential Monte Carlo Methods for Bayesian Computation
Sequential Monte Carlo Methods for Bayesian Computation A. Doucet Kyoto Sept. 2012 A. Doucet (MLSS Sept. 2012) Sept. 2012 1 / 136 Motivating Example 1: Generic Bayesian Model Let X be a vector parameter
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationMinicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics
Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture
More informationOn Bayesian Computation
On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department
More informationST 740: Markov Chain Monte Carlo
ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationMarkov Chain Monte Carlo (MCMC)
School of Computer Science 10-708 Probabilistic Graphical Models Markov Chain Monte Carlo (MCMC) Readings: MacKay Ch. 29 Jordan Ch. 21 Matt Gormley Lecture 16 March 14, 2016 1 Homework 2 Housekeeping Due
More informationMarkov chain Monte Carlo Lecture 9
Markov chain Monte Carlo Lecture 9 David Sontag New York University Slides adapted from Eric Xing and Qirong Ho (CMU) Limitations of Monte Carlo Direct (unconditional) sampling Hard to get rare events
More informationParsimonious Adaptive Rejection Sampling
Parsimonious Adaptive Rejection Sampling Luca Martino Image Processing Laboratory, Universitat de València (Spain). Abstract Monte Carlo (MC) methods have become very popular in signal processing during
More informationE cient Monte Carlo for Gaussian Fields and Processes
E cient Monte Carlo for Gaussian Fields and Processes Jose Blanchet (with R. Adler, J. C. Liu, and C. Li) Columbia University Nov, 2010 Jose Blanchet (Columbia) Monte Carlo for Gaussian Fields Nov, 2010
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More informationKatsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus. Abstract
Bayesian analysis of a vector autoregressive model with multiple structural breaks Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus Abstract This paper develops a Bayesian approach
More informationMetropolis-Hastings Algorithm
Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to
More informationTime-Sensitive Dirichlet Process Mixture Models
Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce
More informationState-Space Methods for Inferring Spike Trains from Calcium Imaging
State-Space Methods for Inferring Spike Trains from Calcium Imaging Joshua Vogelstein Johns Hopkins April 23, 2009 Joshua Vogelstein (Johns Hopkins) State-Space Calcium Imaging April 23, 2009 1 / 78 Outline
More informationSAMPLING ALGORITHMS. In general. Inference in Bayesian models
SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be
More informationControlled sequential Monte Carlo
Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 5 Sequential Monte Carlo methods I 31 March 2017 Computer Intensive Methods (1) Plan of today s lecture
More informationPattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods
Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs
More informationLecture 6: Bayesian Inference in SDE Models
Lecture 6: Bayesian Inference in SDE Models Bayesian Filtering and Smoothing Point of View Simo Särkkä Aalto University Simo Särkkä (Aalto) Lecture 6: Bayesian Inference in SDEs 1 / 45 Contents 1 SDEs
More information