Important sampling the union of rare events, with an application to power systems analysis

Size: px

Start display at page:

Download "Important sampling the union of rare events, with an application to power systems analysis"

Lester Shaw
5 years ago
Views:

1 SAMSI Monte Carlo workshop 1 Important sampling the union of rare events, with an application to power systems analysis Art B. Owen Stanford University With Yury Maximov and Michael Chertkov of Los Alamos National Laboratory.

2 SAMSI Monte Carlo workshop 2 MCQMC 2018 July 1 8, 2018 Rennes, France MC, QMC, MCMC, RQMC, SMC, MLMC, MIMC, Call for papers is now online:

3 SAMSI Monte Carlo workshop 3 Rare event sampling Motivation: an electrical grid has N nodes. Power p 1, p 2,, p N Random p i > 0, e.g., wind generation, Random p i < 0, e.g. consumption, Fixed p i for controllable nodes, AC phase angles θ i (θ 1,..., θ N ) = F(p 1,..., p N ) Constraints: θ i θ j < θ if i j ( Find P max i j θ i θ j θ ) Simplified model (p 1,..., p N ) Gaussian θ linear in p 1,..., p N

4 SAMSI Monte Carlo workshop 4 Gaussian setup For x N (0, I d ), H j = {x ω T j x τ j } find µ = P(x J j=1h j ). WLOG ω j = 1. Ordinarily τ j > 0. d hundreds. J thousands. c( 7, 6) Solid: deciles of x. Dashed:

5 SAMSI Monte Carlo workshop 5 Basic bounds Let P j P(ω T j x τ j ) = Φ( τ j ) Then max 1 j J P j µ µ µ Inclusion-exclusion J j=1 P j For u 1:J = {1, 2,..., J}, let H u = j u H j H u (x) 1{x H u } P u = P(H u ) = E(H u (x)) µ = ( 1) u 1 P u u >0 Plethora of bounds Survey by Yang, Alajaji & Takahara (2014)

6 SAMSI Monte Carlo workshop 6 Other uses Other engineering reliability False discoveries in statistics: J correlated test statistics Speculative: inference after model selection Taylor, Fithian, Markovic, Tian

7 SAMSI Monte Carlo workshop 7 Importance sampling For x p, seek η = E p (f(x)) = f(x)p(x) dx. Take ˆη = 1 n n i=1 f(x i )p(x i ) iid, x i q q(x i ) Unbiased if q(x) > 0 whenever f(x)p(x) 0. Var(ˆη) = σ 2 q/n, where Variance f 2 p 2 (fp µq) 2 σ 2 q = q µ 2 = = q Num: seek q fp/µ, i.e., nearly proportional to fp Den: watch out for small q.

8 SAMSI Monte Carlo workshop 8 Self-normalized IS ˆη SNIS = n i=1 f(x i)p(x i )/q(x i ) n i=1 p(x i)/q(x i ) Available for unnormalized p and / or q. Good for Bayes, limited effectiveness for rare events. Optimal SNIS puts 1/2 the samples in the rare event. Optimal plain IS puts all samples there. Best possible asymptotic coefficient of variation is 2/ n.

9 SAMSI Monte Carlo workshop 9 For α j 0, j α j = 1 Mixture sampling ˆη α = 1 n n i=1 f(x i )p(x i ) j α jq j (x i ), x i Defensive mixtures iid qα j α j q j Take q 0 = p, α 0 > 0. Get p/q α 1/α 0. Hesterberg (1995) Additional refs Use q j = 1 as control variates. O & Zhou (2000). Optimization over α is convex. He & O (2014). Multiple IS. Veach & Guibas (1994). Veach s Oscar. Elvira, Martino, Luengo, Bugallo (2015) Generalizations.

10 SAMSI Monte Carlo workshop 10 Instantons µ j = arg max x Hj p(x) Chertkov, Pan, Stepanov (2011) c( 7, 6) Solid dots µ j Initial thought: q j = N (µ j, I) and q α = j α jq j I.e., mixture of exponential tilting Conditional sampling q j = L(x x H j ) = p(x)h j (x)/p j

11 SAMSI Monte Carlo workshop 11 Mixture of conditional sampling α 0, α 1,..., α J 0, α j = 1, q α = α j q j, q 0 p j j µ = P(x J j=1h j ) = P(x H 1:J ) = E ( H 1:J (x) ) Mixture IS ˆµ α = 1 n = 1 n n i=1 n i=1 H 1:J (x i )p(x i ) J j=0 α jq j (x i ) H 1:J (x i ) J j=0 α jh j (x i )/P j (where q j = ph j /P j ) with H 0 1 and P 0 = 1. It would be nice to have α j /P j constant.

12 SAMSI Monte Carlo workshop 12 ALORE At Least One Rare Event Like AMIS of Cornuet, Marin, Mira, Robert (2012) Put α 0 = 0, and take α j P j. That is α j = P j J j =1 P j = P j µ, ( µ is the union bound ) ˆµ α = 1 n n i=1 Then H 1:J (x i ) J j=1 α jh j (x i )/P j = µ 1 n n i=1 H 1:J (x i ) J j=1 H j(x i ) If x q α then H 1:J (x) = 1. So ˆµ α = µ 1 n n i=1 1 S(x i ), J S(x) H j (x) j=1

13 SAMSI Monte Carlo workshop 13 Prior work Adler, Blanchet, Liu (2008, 2012) estimate ( ) w(b) = P max f(t) > b t T for a Gaussian random field f(t) over T R d. They also consider a finite set T = {t 1,..., t N }. Comparisons We use the same mixture estimate as them for finite T and Gaussian data. Their analysis is for Gaussian distributions. We consider arbitrary sets of J events. They take limits as b. We have non-asymptotic bounds and more general limits. Our analysis is limited to finite sets. They handle extrema over a continuum.

14 SAMSI Monte Carlo workshop 14 O, Maximov & Cherkov (2017) Theorem Let H 1,..., H J be events defined by x p. Let q j (x) = p(x)h j (x)/p j for P j = P(x H j ). iid Let x i qα = J j=1 α j q j for αj = P j/ µ. Take ˆµ = µ 1 n Var(ˆµ) = 1 n where T s P(S(x) = s). The RHS follows because n i=1 1 S(x i ), S(x i) = Then E(ˆµ) = µ and ( J s=1 µ J s=1 T s s µ2 T s s J s=1 ) T s = µ. J H j (x i ) j=1 µ( µ µ) n

15 SAMSI Monte Carlo workshop 15 Remarks With S(x) equal to the number of rare events at x, ˆµ = µ 1 n n i=1 1 S(x i ) 1 S(x) J = µ J ˆµ µ Robustness The usual problem with rare event estimation is getting no rare events in n tries. Then ˆµ = 0. Here the corresponding failure is never seeing S 2 rare events. Then ˆµ = µ, an upper bound and probably fairly accurate if all S(x i ) = 1 Conditioning vs sampling from N (µ j, I) Avoids wasting samples outside the failure zone. Avoids awkward likelihood ratios.

16 SAMSI Monte Carlo workshop 16 General Gaussians For y N (η, Σ) let the event be γ T j y κ j. Translate into x T ω j τ j for ω j = γt j Σ1/2 γ T j Σγ j and τ j = κ j γ T j η γ T j Σγ j Adler et al. Their context has all κ j = b

17 SAMSI Monte Carlo workshop 17 Get x N (0, I), such that x T ω τ. y will be x T ω 1) Sample u U(0, 1). Sampling 2) Let y = Φ 1 (Φ(τ) + u(1 Φ(τ))). May easily get y =. 3) Sample z N (0, I). 4) Deliver x = ωy + (I ωω T )z. 1) Sample u U(0, 1). 2) Let y = Φ 1 (uφ( τ)). 3) Sample z N (0, I). 4) Let x = ωy + (I ωω T )z. 5) Deliver x = x. Better numerics from I.E., sample x N (0, I) subject to x T ω τ and deliver x. Step 4 via ωy + z ω(ω T z).

18 SAMSI Monte Carlo workshop 18 More bounds Recall that T s = P(S(x) = s). Therefore ( J J ) µ = P j = E H j (x) = E(S) = j=1 j=1 ( ) µ = E max H j(x) 1 j J = P(S > 0), so J st s s=1 µ = E(S S > 0) P(S > 0) = µ E(S S > 0) n Var(ˆµ) = µ J s=1 Therefore T s s µ2 = ( µe(s S > 0) )( µe(s 1 S > 0) ) µ 2 = µ 2( E(S S > 0)E(S 1 S > 0) 1 )

19 SAMSI Monte Carlo workshop 19 Var(ˆµ) 1 n Bounds continued ( ) E(S S > 0) E(S 1 S > 0) 1 Lemma Let S be a random variable in {1, 2,..., J} for J N. Then E(S)E(S 1 ) J + J with equality if and only if S U{1, J}. Var(ˆµ) µ2 n Corollary J + J Thanks to Yanbo Tang and Jeffrey Negrea for an improved proof of the lemma.

20 SAMSI Monte Carlo workshop 20 Numerical comparison R function mvtnorm of Genz & Bretz (2009) gets P ( a y b ) = P ( j {a j y j b j } ) for y N (η, Σ) Their code makes sophisticated use of quasi-monte Carlo. Adaptive, up to 25,000 evals in FORTRAN. It was not designed for rare events. It computes an intersection. Usage Pack ω j into Ω and τ j into T 1 µ = P(Ω T x T ) = P(y T ), y N (0, Ω T Ω) It can handle up to 1000 inputs. IE J 1000 Upshot ALORE works (much) better for rare events. mvtnorm works better for non-rare events.

21 SAMSI Monte Carlo workshop 21 Polygon example P(J, τ) regular J sided polygon in R 2 outside circle of radius τ > 0. P = x sin(2πj/j) cos(2πj/j) T x τ, j = 1,..., J a priori bounds µ = P(x P c ) P(χ 2 (2) τ 2 ) = exp( τ 2 /2) This is pretty tight. A trigonometric argument gives 1 µ exp( τ 2 /2) 1 ( ( J tan π ) ) J π τ 2 2π Let s use J = 360. = 1 π2 τ 2 6J 2 So µ. = exp( τ 2 /2) for reasonable τ.

22 SAMSI Monte Carlo workshop 22 IS vs MVN ALORE had n = MVN had up to 25, random repeats. τ µ E((ˆµ ALORE /µ 1) 2 ) E((ˆµ MVN /µ 1) 2 ) For τ = 5 MVN had a few outliers.

23 SAMSI Monte Carlo workshop 23 Polygon again ALORE Pmvnorm Frequency Frequency Relative estimate Relative estimate Results of 100 estimates of the P(x P(360, 6)), divided by exp( 6 2 /2). Left panel: ALORE. Right panel: pmvnorm.

24 SAMSI Monte Carlo workshop 24 Symmetry Maybe the circumscribed polygon is too easy due to symmetry. Redo it with just ω j = (cos(2πj/j), sin(2πj/j)) T for the 72 prime numbers j {1, 2,..., 360}. For τ = 6 variance of ˆµ/ exp( 18) is for ALORE (n = 1000), and 8.5 for pmvnorm.

25 SAMSI Monte Carlo workshop 25 High dimensional half spaces Take random ω j R d for large d. By concentration of measure they re nearly orthogonal. µ. = 1 J P j ) = j=1(1. P j = µ j Simulation (for rare events) Dimension d U{20, 50, 100, 200, 500} Constraints J U{d/2, d, 2d} τ such that log 10 ( µ) U[4, 8] (then rounded to 2 places).

26 SAMSI Monte Carlo workshop 26 High dimensional results Estimate / Bound ALORE pmvnorm 1e 08 1e 07 1e 06 1e 05 1e 04 Union bound Results of 200 estimates of the µ for varying high dimensional problems with nearly independent events.

27 SAMSI Monte Carlo workshop 27 Power systems Network with N nodes called busses and M edges. Typically M/N is not very large. Power p i at bus i. Each bus is Fixed or Random or Slack: slack bus S has p S = i S p i. p = (p T F, p T R, p S ) T R N Randomness driven by p R N (η R, Σ RR ) Inductances B ij A Laplacian B ij 0 if i j in the network. B ii = j i B ij. Taylor approximation θ = B + p (pseudo inverse) Therefore θ is Gaussian as are all θ i θ j for i j.

28 SAMSI Monte Carlo workshop 28 Examples Our examples come from MATPOWER (a Matlab toolbox). Zimmernan, Murillo-Sánchez & Thomas (2011) We used n = 10,000. Polish winter peak grid 2383 busses, d = 326 random busses, J = 5772 phase constraints. ω ˆµ se/ˆµ µ µ π/ π/ π/ π/ ω is the phase constraint, ˆµ is the ALORE estimate, se is the estimated standard error, µ is the largest single event probability and µ is the union bound.

29 SAMSI Monte Carlo workshop 29 Pegase 2869 Fliscounakis et al. (2013) large part of the European system N = 2869 busses. d = 509 random busses. J = 7936 phase constraints. Results ω ˆµ se/ˆµ µ µ π/ π/ π/ π/ Notes θ = π/2 is unrealistically large. We got ˆµ = µ. All 10,000 samples had S = 1. Some sort of Wilson interval or Bayesian approach could help. One half space was sampled 9408 times, another 592 times.

30 SAMSI Monte Carlo workshop 30 Other models IEEE case 14 and IEEE case 300 and Pegase 1354 were all dominated by one failure mode so µ. = µ and no sampling is needed. Another model had random power corresponding to wind generators but phase failures were not rare events in that model. The Pegase model was too large for our computer. The Laplacian had 37,250 rows and columns. The Pegase 9241 model was large and slow and phase failure was not a rare event. Caveats We used a DC approximation to AC power flow (which is common) and the phase estimates were based on a Taylor approximation.

31 SAMSI Monte Carlo workshop 31 Next steps 1) Nonlinear boundaries 2) Non-Gaussian models 3) Optimizing cost subject to a constraint on µ Sampling half-spaces will work if we have convex failure regions and a log-concave nominal density.

32 SAMSI Monte Carlo workshop 32 Thanks Michael Chertkov and Yury Maximov, co-authors Center for Nonlinear Studies at LANL, for hospitality Alan Genz for help on mvtnorm Bert Zwart, pointing out Adler et al. Yanbo Tang and Jeffrey Negrea, improved Lemma proof NSF DMS & DMS DOE/GMLC 2.0 Project: Emergency monitoring and controls through new technologies and analytics Jianfeng Lu, Ilse Ipsen Sue McDonald, Kerem Jackson, Thomas Gehrmann

33 SAMSI Monte Carlo workshop 33 Bonus topic Thinning MCMC output: It really can improve statistical efficiency. Short story If it costs 1 unit to advance x i 1 x i and θ > 0 units to compute y i = f(x i ) then thinning to every k th value lets us get larger n and less variance if ACF(y i ) decays slowly.

34 SAMSI Monte Carlo workshop 34 MCMC (notation for) A simple MCMC generates: x i = ϕ(x i 1, u i ) R d, u i U(0, 1) m More general ones use u i R m i where m i may be random. E.g., step i consume m i uniform random variables. The function ϕ is constructed so x i has desired stationary distribution π. µ = We approximate f(x)π(x) dx by ˆµ = 1 n n f(x i ) i=1 For simplicity, ignore burn-in / warmup.

35 SAMSI Monte Carlo workshop 35 Thinning Get x ki and f(x ki ) for i = 1,..., n and k 1. Thinning by a factor k usually reduces autocorrelations. Then f(x ki ) are more nearly IID. Thinning can also save storage x i.

36 SAMSI Monte Carlo workshop 36 Statistical efficiency Let y i = f(x i ) R. Geyer (1992) shows that for k > 1 Link & Eaton (2011): ( 1 Var n n i=1 ) ( 1 y i Var n/k Quotes n/k i=1 y ki ) Thinning is often unnecessary and always inefficient. MacEachern & Berliner (1994): This article provides a justification of the ban against sub-sampling the output of a stationary Markov chain that is suitable for presentation in undergraduate and beginning graduate-level courses. Gamerman & Lopes (2006) on thinning the Gibbs sampler: There is no gain in efficiency, however, by this approach and estimation is shown below to be always less precise than retaining all chain values.

37 SAMSI Monte Carlo workshop 37 Not so fast The analysis assumes that we compute f(x i ) and only use every k th one. Suppose that it costs 1 unit to advance the chain: x i x i+1. θ units to compute y i = f(x). If we thin the chain we compute f less often and can use larger n. When it pays If θ is large and Corr(y i, y i+k ) decays slowly, then thinning can be much more efficienty. When can that happen? E.g., x describes a set of particles and f computes interpoint distances. Updating f will cost proportionally to updating x, maybe much more.

38 SAMSI Monte Carlo workshop 38 Thinned estimate ˆµ k = 1 n k n k kn k advances x i x i+1 and n k computations x i y i = f(x i ). i=1 f(x ik ) Budget: kn k + θn k B n k = B k + θ B k + θ. Variance Var(ˆµ k ) =. ( σ n k ρ kl ), ρ l = Corr(y t, y t+l ). l=1

39 SAMSI Monte Carlo workshop 39 Efficiency eff(k) = Var(ˆµ 1) Var(ˆµ k ). = NB: n k < n 1 but ordinarily l=1 ρ l > ρ kl. ( σ 2 n l=1 ρ ) l ( σ 2 n k l=1 ρ ) = n ( k l=1 ρ l) ( kl n l=1 ρ ) kl n k n 1 Sample size ratio B/(k + θ) B/(1 + θ) = 1 + θ k + θ Therefore eff(k) = (1 + θ)( l=1 ρ ) l (k + θ) ( l=1 ρ ) kl

40 SAMSI Monte Carlo workshop 40 Autocorrelation models ACF plots often resemble AR(1): ρ l = ρ l, 0 < ρ < 1 E.g., figures in Jackman (2009). Newman & Barkema (1999): the autocorrelation is expected to fall off exponentially at long times. Geyer (1991) notes exponential upper bound under ρ-mixing. Monotone non-negative autocorrelations ρ 1 ρ 2 ρ l ρ l+1 0

41 SAMSI Monte Carlo workshop 41 Under the AR(1) model eff(k) = eff AR (k) = (1 + θ)( l=1 ρl) (k + θ) ( l=1 ρkl) = (1 + θ)( 1 + 2ρ/(1 ρ) ). (k + θ) ( 1 + 2ρ k /(1 ρ k ) ) = 1 + θ k + θ 1 + ρ 1 ρ For large θ and ρ near 1, thinning will be very efficient. 1 ρ k 1 + ρ k

42 SAMSI Monte Carlo workshop 42 Optimal thinning factor k θ \ ρ θ is the cost to compute y = f(x). ρ is the autocorrelation.

43 SAMSI Monte Carlo workshop 43 Efficiency of optimal k θ \ ρ Versus k = 1.

44 SAMSI Monte Carlo workshop 44 Least k for 95% efficiency θ \ ρ Table 1: Smallest k to give at least 95% of the efficiency of the most efficient k, as a function of θ and the autoregression parameter ρ.

45 SAMSI Monte Carlo workshop 45 Additional Thinning can pay when ρ 1 > 0 and ρ 1 ρ 2 0. θ > 0 reduces the optimal acceptance rate from Jeffrey Rosenthal has a quantitative version.

arxiv: v1 [stat.co] 19 Oct 2017

arxiv: v1 [stat.co] 19 Oct 2017 Importance sampling the union of rare events with an application to power systems analysis arxiv:1710.06965v1 [stat.co] 19 Oct 2017 Art B. Owen Stanford University Yury Maximov Los Alamos National Laboratory