Approximating mixture distributions using finite numbers of components
1 Approximating mixture distributions using finite numbers of components
Christian Röver and Tim Friede
Department of Medical Statistics, University Medical Center Göttingen
March 17, 2016
This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement number FP HEALTH.
2 Overview
- meta-analysis example
- general problem: mixture distributions
- discrete grid approximations
- design strategy & algorithm
- application to the meta-analysis problem
3-6 Meta analysis / Context: random-effects meta-analysis (build slides)
[Figure: forest plot of trials # 1 through # 7.]
- have: estimates y_i and standard errors σ_i
- want: combined estimate Θ̂
- nuisance parameter: the between-trial heterogeneity τ
7 Meta analysis / Context: random-effects meta-analysis
[Figure: forest plot with the marginal posterior p(θ) shown for the combined effect Θ.]
estimation: via the marginal posterior distribution of the parameter θ
8 Meta analysis / Context: random-effects meta-analysis
[Figure: joint posterior contours (50%, 90%, 95%, 99%) with the marginal posteriors p(τ) and p(θ).]
two parameters: parameter estimation with two unknowns works via joint & marginal posterior distributions
9 Meta analysis / Context: random-effects meta-analysis
here it is easy to derive one of the marginals, p(τ | y), and the conditional posteriors p(θ | τ, y):
  p(τ | y) = ... (some function of the y_i, σ_i, ...)
  p(θ | τ, y) = Normal(μ = f_1(τ), σ = f_2(τ))
but the main interest is in the other marginal, p(θ | y):
  p(θ | y) = ∫ p(θ, τ | y) dτ            (integrating out τ from the joint)
           = ∫ p(θ | τ, y) p(τ | y) dτ   (conditional × marginal)
⇒ p(θ | y) is a mixture distribution
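As a numerical illustration of this identity (not part of the slides): in R, such a mixture marginal can be evaluated by integrating the normal conditional against the mixing density. The particular choices of f1, f2 and the half-normal stand-in for p(τ | y) below are made up for the sketch, not the meta-analysis formulas.

f1 <- function(tau) 0.2                 # conditional mean mu = f1(tau) (toy choice)
f2 <- function(tau) sqrt(0.1 + tau^2)   # conditional sd sigma = f2(tau) (toy choice)
dtau <- function(tau) 2 * dnorm(tau, 0, 0.5)   # half-normal stand-in for p(tau | y)
# p(theta | y) = integral over tau of p(theta | tau, y) * p(tau | y):
dtheta <- function(theta) sapply(theta, function(t)
  integrate(function(tau) dnorm(t, f1(tau), f2(tau)) * dtau(tau), 0, Inf)$value)
dtheta(c(-0.5, 0.2, 1.0))   # mixture density evaluated at a few points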
10 Meta analysis / Context: random-effects meta-analysis
[Figure: the marginal posterior p(τ), the conditional posteriors p(θ | τ_i), and the marginal posterior p(θ | y).]
11 Meta analysis / Context: random-effects meta-analysis
[Figure: as before, with the conditional mean and the conditional mean ± sd traced as functions of τ.]
12 Meta analysis / Context: random-effects meta-analysis
[Figure: a single value τ_1 marked on the marginal posterior p(τ), the corresponding conditional posterior p(θ | τ = τ_1), and the marginal posterior p(θ | y).]
13-14 Meta analysis / Context: random-effects meta-analysis (build slides)
[Figure: four values τ_1, ..., τ_4 marked on the marginal posterior p(τ), the corresponding conditional posteriors p(θ | τ = τ_i), and the marginal posterior p(θ | y).]
15 Mixture distributions / The general problem
mixture distribution:
- a convex combination of component distributions
- a distribution whose parameters are random variables
a (conditional) distribution with density p(y | x), where the parameter x follows a distribution p(x); the marginal / mixture is
  p(y) = ∫_X p(y | x) dp(x)
or, for discrete x:
  p(y) = Σ_i p(y | x_i) p(x_i)
ubiquitous in many applications:
- Student-t distribution
- negative binomial distribution
- marginal distributions
- convolutions
- ...
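To make the Student-t example concrete (a standard identity, added here for illustration, not from the slides): a t distribution arises by mixing normals over a Gamma-distributed precision, which is easy to check by simulation in R.

# Student-t as a scale mixture of normals: if the precision
# lambda ~ Gamma(nu/2, rate = nu/2) and y | lambda ~ N(0, 1/lambda),
# then marginally y ~ t with nu degrees of freedom
set.seed(1)
nu <- 5; n <- 1e5
lambda <- rgamma(n, shape = nu / 2, rate = nu / 2)
y <- rnorm(n, mean = 0, sd = 1 / sqrt(lambda))
rbind(empirical   = quantile(y, c(0.05, 0.50, 0.95)),
      theoretical = qt(c(0.05, 0.50, 0.95), df = nu))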
16 Mixture distributions / How to approximate?
approximating the continuous mixture through a discrete set of points in τ:
  actual marginal:  p(θ) = ∫ p(θ | τ) p(τ) dτ
  approximation:    p(θ) ≈ Σ_i p(θ | τ_i) π_i
Questions:
- how to set up the discrete grid of points?
- how well can we approximate?
- do we have a handle on accuracy?
17 Mixture distributions / Motivation: discretizing a mixture
[Figure: the conditional posteriors p(θ | τ_i) for four values τ_1, ..., τ_4.]
Note: the conditional distributions p(θ | τ, y) are very different for τ_1 and τ_2, and rather similar for τ_3 and τ_4.
idea: may need fewer bins for larger τ values...?
... bin spacing based on similarity / dissimilarity of the conditionals?
18 Discretizing mixture distributions / Setting up a binning
need: a discretization of the mixing distribution p(x); domain of X: ℝ (or a subset)
define bin margins x_(1) < x_(2) < ... < x_(k-1) and bins
  X_i = {x : x ≤ x_(1)}             if i = 1
        {x : x_(i-1) < x ≤ x_(i)}   if 1 < i < k
        {x : x_(k-1) < x}           if i = k
reference points: x̃_1, ..., x̃_k, where x̃_i ∈ X_i
bin probabilities: π_i = P(x_(i-1) < X ≤ x_(i)) = P(X ∈ X_i)
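In code, the bin probabilities are simply differences of the mixing distribution's CDF at the margins; a minimal R illustration, with a made-up half-normal mixing distribution (the margins anticipate the numbers on slide 20, but the scale is arbitrary):

pmix <- function(x) 2 * pnorm(x, mean = 0, sd = 12) - 1   # CDF of a half-normal (toy)
margins <- c(10, 20, 30)                                  # x_(1), x_(2), x_(3)
pi.w <- diff(c(0, pmix(margins), 1))                      # one probability per bin
sum(pi.w)   # the bins cover the whole domain, so the weights sum to 1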
19 Discretizing mixture distributions / Setting up a binned mixture
actual distribution: p(x, y); discrete approximation: q(x, y)
same marginal (mixing distribution): q(x) = p(x), but binned conditionals:
  q(y | x) = p(y | x = x̃_i) for x ∈ X_i
q is similar to p, but instead of conditioning on the exact x, it conditions on the corresponding bin's reference point x̃_i
marginal:
  q(y) = ∫ q(y | x) q(x) dx = Σ_i π_i p(y | x̃_i)
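Evaluated at a point, the approximate marginal is a finite weighted sum; a small R helper along these lines, assuming normal conditionals as in the meta-analysis setting (function and argument names are mine):

# q(y) = sum_i pi_i * p(y | x.ref_i), for normal conditionals with
# (vectorized) mean function mu(x) and standard-deviation function sdev(x)
dbinned <- function(y, x.ref, pi.w, mu, sdev)
  sapply(y, function(yy) sum(pi.w * dnorm(yy, mu(x.ref), sdev(x.ref))))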
20 Discretizing mixture distributions / Setting up a binned mixture
in the previous example:
- bin margins: τ_(1) = 10, τ_(2) = 20, τ_(3) = 30
- reference points: τ̃_1 = 5, τ̃_2 = 15, τ̃_3 = 25, τ̃_4 = 35
- probabilities: π_1 = 0.34, π_2 = 0.44, π_3 = 0.15, π_4 = 0.07
[Figure: the resulting binned approximation of the marginal posterior p(θ | y).]
21 Similarity / dissimilarity of distributions / Kullback-Leibler divergence
The Kullback-Leibler divergence of two distributions with density functions p and q is defined as
  D_KL(p(θ) ‖ q(θ)) = ∫_Θ log( p(θ) / q(θ) ) p(θ) dθ = E_p[ log( p(θ) / q(θ) ) ]
The symmetrized KL divergence of two distributions is defined as
  D_s(p(θ) ‖ q(θ)) = D_KL(p(θ) ‖ q(θ)) + D_KL(q(θ) ‖ p(θ))
the symmetrized KL divergence...
- is symmetric: D_s(p(θ) ‖ q(θ)) = D_s(q(θ) ‖ p(θ))
- is non-negative: D_s(p(θ) ‖ q(θ)) ≥ 0
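For normal distributions, which is all that is needed for the conditionals p(θ | τ, y) here, both KL terms have a closed form; the following R helper (my own naming, added for illustration) also previews the interpretation on the next slide:

# symmetrized KL divergence between N(m1, s1^2) and N(m2, s2^2), using
# D_KL(N1 || N2) = log(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 1/2
symKL <- function(m1, s1, m2, s2) {
  kl <- function(ma, sa, mb, sb)
    log(sb / sa) + (sa^2 + (ma - mb)^2) / (2 * sb^2) - 0.5
  kl(m1, s1, m2, s2) + kl(m2, s2, m1, s1)
}
symKL(0, 1, 0, 1)     # identical distributions: divergence 0
symKL(0, 1, 0.1, 1)   # two nearby unit normals: divergence 0.01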
22 Divergence / Interpretation
How to interpret divergences?
- a measure of discrepancy between distributions
- heuristically: the expected log-ratio of densities
- the relevant case here: p(x) ≈ q(x)
- D_KL(p(x) ‖ q(x)) = 0 for p = q
- D_KL(p(x) ‖ q(x)) = 0.01 corresponds to an (expected) 1% difference in densities
23 Divergence / Bin-wise maximum divergence: definition
consider: the divergence between the reference point and the other points within each bin
define
  d_i = max_{x ∈ X_i} { D_s(p(y | x) ‖ p(y | x̃_i)) } = max_{x ∈ X_i} { D_s(p(y | x) ‖ q(y | x)) }
the bin-wise maximum divergence is the worst-case discrepancy introduced within each bin
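Numerically, each d_i is a one-dimensional maximization over the bin; a minimal R sketch, reusing the symKL() helper from above and again assuming normal conditionals:

# worst-case divergence d_i within the bin (lo, hi], relative to the
# reference point x.ref; the maximum is usually attained at a bin margin
binMaxDiv <- function(lo, hi, x.ref, mu, sdev)
  optimize(function(x) symKL(mu(x.ref), sdev(x.ref), mu(x), sdev(x)),
           interval = c(lo, hi), maximum = TRUE)$objective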
24 Divergence / Bin-wise maximum divergence: example
[Figure: the reference points τ̃_1, ..., τ̃_4 and margins τ_(1), τ_(2), τ_(3) from the earlier example.]
recall: the actual parameters of the conditionals p(y | x) (in black) vs. the parameters of q(y | x) assumed through the binning (in red)
25-28 Divergence / Bin-wise maximum divergence: example (build slides)
[Figure: within the second bin (between τ_(1) and τ_(2), reference point τ̃_2), the divergence D_s(p(θ | τ) ‖ p(θ | τ̃_2)) is traced as τ moves across the bin, alongside the conditional density p(θ | τ), until its maximum is reached.]
determine the maximum d_i for each bin i (usually attained at a bin margin)
29 Bounding divergence / Idea
now consider the divergence of the true and approximate marginals p(y) and q(y) (not the conditionals!): what about D_s(p(y) ‖ q(y))?
having the individual bin-wise divergences d_i, we can show:
  D_s(p(y) ‖ q(y)) ≤ Σ_i π_i d_i ≤ max_i d_i
in other words: by bounding the bin-wise divergences (of the conditionals) we can bound the overall divergence (of the marginals)
⇒ the DIRECT (Divergence Restricted Conditional Tessellation) method
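The first inequality can be traced to the joint convexity of the KL divergence (and hence of D_s); the following derivation is my own sketch, not taken from the slides. Write both marginals as mixtures over the bins, with bin-averaged components p_i(y) = π_i^{-1} ∫_{X_i} p(y | x) dp(x):

D_s\bigl(p(y) \,\big\|\, q(y)\bigr)
  = D_s\Bigl(\sum_i \pi_i\, p_i(y) \,\Big\|\, \sum_i \pi_i\, p(y \mid \tilde{x}_i)\Bigr)
  \le \sum_i \pi_i\, D_s\bigl(p_i(y) \,\big\|\, p(y \mid \tilde{x}_i)\bigr)
  \le \sum_i \pi_i\, d_i ,

where the middle step is joint convexity, and the last step uses convexity of D_s in its first argument together with the definition of d_i; the final bound Σ_i π_i d_i ≤ max_i d_i holds since the π_i sum to 1.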
30-38 Discretizing mixtures / Sequential DIRECT algorithm (build slides)
[Figure: the divergence D_s(p(θ | τ) ‖ p(θ | τ̃_i)) as a function of τ, with the threshold δ marked; reference points and bin margins are placed alternately from left to right.]
1st reference point τ̃_1 at zero; first margin τ_(1) where the divergence reaches δ; then the next reference point τ̃_2, the next margin τ_(2), then τ̃_3, τ_(3), τ̃_4, τ_(4), and so on.
result: a binning with bounded divergence (≤ δ) per bin (when to stop?)
39 Discretizing mixtures / Sequential DIRECT algorithm (variations possible)
1. Specify δ > 0, 0 ≤ ε ≤ 1, and a starting reference point x̃_1 (e.g. the minimum possible value, or the ε/2-quantile). Define ε_1 ≥ 0 as ε_1 := P(X ≤ x̃_1). Set i = 1.
2. Set x = x̃_1. Obviously, D_s(p(y | x̃_1) ‖ p(y | x)) = 0. Now increase x as far as possible while ensuring that D_s(p(y | x̃_1) ‖ p(y | x)) ≤ δ. Use this point as the first bin margin: x_(1) = x. Compute π_1 = P(X ≤ x_(1)). Set i = i + 1.
3. Increase x until D_s(p(y | x_(i-1)) ‖ p(y | x)) = δ. Use this point as the next reference point: x̃_i = x.
4. Increase x again until D_s(p(y | x̃_i) ‖ p(y | x)) = δ. Use this point as the next bin margin: x_(i) = x.
5. Compute the bin weight π_i = P(x_(i-1) < X ≤ x_(i)).
6. If P(X > x_(i)) > (ε − ε_1), set i = i + 1 and proceed at step 3. Otherwise stop.
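To make the recipe concrete, here is an illustrative R implementation for the case of normal conditionals p(y | x) = N(μ(x), σ(x)²), reusing symKL() from above. This is a sketch under my own naming and simplifications (divergence assumed monotone in x, a finite search cap in the far tail, weights renormalized over the kept bins), not the authors' reference implementation:

directGrid <- function(pmix, qmix, mu, sdev, delta = 0.01, epsilon = 0.001) {
  # divergence between the conditionals at x and at a reference point
  div <- function(x, ref) symKL(mu(ref), sdev(ref), mu(x), sdev(x))
  cap   <- qmix(1 - epsilon / 4)  # finite far-tail search limit (simplification)
  x.ref <- qmix(epsilon / 2)      # step 1: starting reference point
  eps1  <- pmix(x.ref)            # probability mass below the starting point
  refs  <- x.ref
  margins <- numeric(0)
  repeat {
    # steps 2 & 4: push x outward until the divergence from the current
    # reference point reaches delta; that point is the next bin margin
    if (div(cap, x.ref) <= delta) break  # remaining range all within delta
    m <- uniroot(function(x) div(x, x.ref) - delta,
                 lower = x.ref, upper = cap)$root
    margins <- c(margins, m)
    # step 6: stop once the remaining tail probability is within budget
    if ((1 - pmix(m)) <= (epsilon - eps1)) break
    # step 3: next reference point, a divergence of delta beyond the margin
    x.ref <- if (div(cap, m) > delta)
      uniroot(function(x) div(x, m) - delta, lower = m, upper = cap)$root
    else cap
    refs <- c(refs, x.ref)
  }
  # step 5: bin weights from CDF differences; any ignored tail mass
  # (at most epsilon) is simply renormalized away in this sketch
  w <- diff(c(0, pmix(margins), 1))[seq_along(refs)]
  list(reference = refs, weight = w / sum(w))
}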
40-41 Discretizing mixtures / General algorithm
remaining issue: the ignored ε > 0 tail probability (usually: problems at the domain's margins)
only need to keep track of the reference ("support") points x̃_i and probabilities π_i
meta-analysis example: 35 support points required (δ = 0.01, ε = 0.001)
[Figure: the resulting grid overlaid on the joint posterior contours (50%, 90%, 95%, 99%).]
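For illustration, a toy run of the directGrid() sketch above, with a made-up half-normal mixing distribution and conditionals N(0, √(1 + τ²)); none of these numbers correspond to the slides' example:

g <- directGrid(pmix = function(x) 2 * pnorm(x, 0, 0.5) - 1,   # half-normal CDF
                qmix = function(p) qnorm((p + 1) / 2, 0, 0.5), # its quantile fn
                mu   = function(tau) 0 * tau,                  # conditional mean
                sdev = function(tau) sqrt(1 + tau^2))          # conditional sd
length(g$reference)  # number of support points for delta = 0.01, epsilon = 0.001
# approximate marginal density of theta as a finite normal mixture:
dapprox <- function(theta) sapply(theta, function(t)
  sum(g$weight * dnorm(t, 0, sqrt(1 + g$reference^2))))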
42 Conclusions
- the approximation makes it possible to compute density, quantiles, moments, ...
- the algorithm yields a quick-and-easy solution
- need to specify the error budget in terms of divergence δ and tail probability ε
- also works for discrete distributions (p(x) or p(y | x))
- meta-analysis application: implemented in the bayesmeta R package
- general procedure: C. Röver, T. Friede. Discrete approximation of a mixture distribution via restricted divergence. (submitted for publication.)
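For the meta-analysis application, the package mentioned above can be called roughly as follows; this is a hypothetical sketch with argument names as I recall them from the bayesmeta documentation, so check ?bayesmeta before relying on it:

# hypothetical call sketch -- verify against the package documentation
library("bayesmeta")
ma <- bayesmeta(y     = c(-0.31, -0.17, 0.12),  # made-up effect estimates
                sigma = c(0.22, 0.18, 0.30))    # made-up standard errors
ma  # the returned object summarizes the (grid-approximated) posterior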