Approximating mixture distributions using finite numbers of components


Approximating mixture distributions using finite numbers of components
Christian Röver and Tim Friede
Department of Medical Statistics, University Medical Center Göttingen
March 17, 2016
This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement number FP HEALTH 2013-602144.

Overview
- meta-analysis example
- general problem: mixture distributions
- discrete grid approximations
- design strategy & algorithm
- application to the meta-analysis problem

Meta-analysis: context
Random-effects meta-analysis with several studies (here #1 to #7):
- have: estimates y_i and standard errors σ_i
- want: combined estimate Θ̂
- nuisance parameter: between-trial heterogeneity τ
[Figure: forest plot of the seven study estimates with the combined estimate Θ, effect scale roughly 120 to 240.]

Meta-analysis: context
Estimation proceeds via the marginal posterior distribution of the parameter θ.
[Figure: forest plot alongside the marginal posterior density p(θ).]

Meta-analysis: context
Two parameters: estimation with two unknowns works via the joint and marginal posterior distributions.
[Figure: joint posterior of (τ, θ) with 50% / 90% / 95% / 99% credible regions, and the marginal posterior densities p(τ) and p(θ).]

Meta-analysis: context
Here it is easy to derive one of the marginals, p(τ|y), as well as the conditional posteriors p(θ|τ, y):
- p(τ|y) = ... (a function of the y_i, σ_i, ...)
- p(θ|τ, y) = Normal(μ = f_1(τ), σ = f_2(τ))
The main interest, however, is in the other marginal p(θ|y):
p(θ|y) = ∫ p(θ, τ|y) dτ = ∫ p(θ|τ, y) p(τ|y) dτ
(conditional times marginal), so p(θ|y) is a mixture distribution.
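
For the standard normal-normal random-effects model with a uniform effect prior, the conditional moments f_1(τ) and f_2(τ) have a simple closed form. A minimal R sketch, using made-up example data y and sigma (not the numbers behind the figures):

    ## conditional posterior moments f1(tau), f2(tau) of theta given tau
    ## (normal-normal model, uniform prior on theta; data are hypothetical)
    y     <- c(185, 160, 175, 190, 150, 180, 170)   # study estimates (made up)
    sigma <- c(15, 20, 18, 25, 22, 17, 19)          # standard errors (made up)

    cond.moments <- function(tau) {
      w <- 1 / (sigma^2 + tau^2)                    # inverse-variance weights
      c(mean = sum(w * y) / sum(w),                 # f1(tau): conditional mean
        sd   = sqrt(1 / sum(w)))                    # f2(tau): conditional sd
    }
    cond.moments(tau = 10)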

Meta-analysis: context
[Figure: marginal posterior p(τ|y) (left); conditional posteriors p(θ|τ_i, y) for selected values τ_1, ..., τ_4, together with the conditional mean ± sd as a function of τ (middle); and the resulting marginal posterior p(θ|y) (right).]

Mixture distributions: the general problem
A mixture distribution is
- a convex combination of component distributions,
- a distribution whose parameters are random variables.
Given a (conditional) distribution with density p(y|x) whose parameter x follows a distribution p(x), the marginal / mixture is
p(y) = ∫_X p(y|x) dP(x),
or, for discrete x, p(y) = Σ_i p(y|x_i) p(x_i).
Mixtures are ubiquitous in many applications: Student-t distribution, negative binomial distribution, marginal distributions, convolutions, ...
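
For example, the Student-t distribution arises as a scale mixture of normals. A quick numerical check in R (the choice ν = 4 and the test point are arbitrary):

    ## Student-t as a mixture: theta | w ~ Normal(0, nu/w) with w ~ chi-squared(nu)
    ## marginalizes to a t distribution with nu degrees of freedom
    nu <- 4
    t.mixture.density <- function(x) {
      integrand <- function(w) dnorm(x, mean = 0, sd = sqrt(nu / w)) * dchisq(w, df = nu)
      integrate(integrand, lower = 0, upper = Inf)$value
    }
    c(mixture = t.mixture.density(1.5), exact = dt(1.5, df = nu))   # should agree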

Mixture distributions: how to approximate?
Approximate the continuous mixture through a discrete set of points in τ:
- actual marginal: p(θ) = ∫ p(θ|τ) p(τ) dτ
- approximation: p(θ) ≈ Σ_i p(θ|τ_i) π_i
Questions:
- how to set up the discrete grid of points?
- how well can we approximate?
- do we have a handle on accuracy?

Mixture distributions: motivation for discretizing
[Figure: conditional posteriors p(θ|τ_i, y) for four values τ_1, ..., τ_4.]
Note: the conditional distributions p(θ|τ, y) are very different for τ_1 and τ_2, but rather similar for τ_3 and τ_4.
Idea: may we need fewer bins for larger τ values? Could the bin spacing be based on the similarity / dissimilarity of the conditionals?

Discretizing mixture distributions: setting up a binning
Needed: a discretization of the mixing distribution p(x); the domain of X is ℝ (or a subset thereof).
- bin margins: x_(1) < x_(2) < ... < x_(k−1)
- bins:
  X_i = {x : x ≤ x_(1)} if i = 1,
  X_i = {x : x_(i−1) < x ≤ x_(i)} if 1 < i < k,
  X_i = {x : x_(k−1) < x} if i = k.
- reference points: x̃_1, ..., x̃_k, where x̃_i ∈ X_i
- bin probabilities: π_i = P(x_(i−1) < X ≤ x_(i)) = P(X ∈ X_i)
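
Given the bin margins, the bin probabilities follow directly from the mixing distribution's CDF. A minimal R sketch, using a half-normal CDF purely as a stand-in for p(x):

    ## bin probabilities pi_i from bin margins and the mixing distribution's CDF
    bin.probs <- function(margins, cdf) {
      diff(c(0, cdf(margins), 1))       # first and last bin extend to the domain's ends
    }
    ## stand-in CDF: half-normal with scale 15 (an arbitrary assumption)
    halfnormal.cdf <- function(x) 2 * pnorm(x, mean = 0, sd = 15) - 1
    bin.probs(margins = c(10, 20, 30), cdf = halfnormal.cdf)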

Discretizing mixture distributions: setting up a binned mixture
- actual distribution: p(x, y); discrete approximation: q(x, y)
- same marginal (mixing distribution): q(x) = p(x)
- but binned conditionals: q(y|x) = p(y|x = x̃_i) for x ∈ X_i
q is similar to p, except that instead of conditioning on the exact x, one conditions on the corresponding bin's reference point x̃_i.
Marginal: q(y) = ∫ q(y|x) q(x) dx = Σ_i π_i p(y|x̃_i)

Discretizing mixture distributions: setting up a binned mixture
In the previous example:
- bin margins: τ_(1) = 10, τ_(2) = 20, τ_(3) = 30
- reference points: τ̃_1 = 5, τ̃_2 = 15, τ̃_3 = 25, τ̃_4 = 35
- probabilities: π_1 = 0.34, π_2 = 0.44, π_3 = 0.15, π_4 = 0.07
[Figure: marginal posterior p(τ|y) with the bin margins and reference points marked, and the resulting approximation to the marginal posterior p(θ|y).]
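
With these reference points and weights, the approximate marginal of θ is a four-component normal mixture. A sketch re-using cond.moments() and the hypothetical data from the earlier snippet (so the resulting numbers are illustrative only):

    ## approximate marginal q(theta) = sum_i pi_i * p(theta | tau_i)
    tau.ref <- c(5, 15, 25, 35)
    pi.ref  <- c(0.34, 0.44, 0.15, 0.07)

    approx.marginal <- function(theta) {           # theta: a vector of evaluation points
      comp <- sapply(seq_along(tau.ref), function(i) {
        m <- cond.moments(tau.ref[i])
        pi.ref[i] * dnorm(theta, mean = m["mean"], sd = m["sd"])
      })
      rowSums(comp)
    }
    theta.grid <- seq(120, 240, by = 1)
    dens <- approx.marginal(theta.grid)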

Similarity / dissimilarity of distributions: Kullback-Leibler divergence
The Kullback-Leibler divergence of two distributions with density functions p and q is defined as
D_KL(p(θ) || q(θ)) = ∫_Θ log(p(θ)/q(θ)) p(θ) dθ = E_p(θ)[log(p(θ)/q(θ))].
The symmetrized KL divergence of two distributions is defined as
D_s(p(θ) || q(θ)) = D_KL(p(θ) || q(θ)) + D_KL(q(θ) || p(θ)).
The symmetrized KL divergence
- is symmetric: D_s(p(θ) || q(θ)) = D_s(q(θ) || p(θ)),
- is always non-negative: D_s(p(θ) || q(θ)) ≥ 0.
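
Since the conditionals p(θ|τ, y) in the meta-analysis example are normal, the symmetrized KL divergence is available in closed form. A small R helper based on the standard formula for the KL divergence between two normal distributions:

    ## KL divergence between Normal(m1, s1^2) and Normal(m2, s2^2), plus symmetrized version
    kl.normal <- function(m1, s1, m2, s2) {
      log(s2 / s1) + (s1^2 + (m1 - m2)^2) / (2 * s2^2) - 0.5
    }
    kl.sym <- function(m1, s1, m2, s2) {
      kl.normal(m1, s1, m2, s2) + kl.normal(m2, s2, m1, s1)
    }
    kl.sym(m1 = 0, s1 = 1, m2 = 0.1, s2 = 1.05)    # two nearby normals: small divergence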

Divergence: interpretation
How to interpret divergences?
- a measure of discrepancy between distributions
- heuristically: the expected log ratio of densities
- the relevant case here: p(x) ≈ q(x)
- D_KL(p(x) || q(x)) = 0 for p = q
- D_KL(p(x) || q(x)) = 0.01 corresponds to an (expected) 1% difference in densities

Divergence: bin-wise maximum divergence (definition)
Consider the divergence between the reference point and the other points within each bin, and define
d_i = max_{x ∈ X_i} { D_s(p(y|x) || p(y|x̃_i)) } = max_{x ∈ X_i} { D_s(p(y|x) || q(y|x)) },
the bin-wise maximum divergence: the worst-case discrepancy introduced within each bin.
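
A sketch of how such a d_i could be computed for one bin in the normal-conditional case, re-using cond.moments() and kl.sym() from the earlier snippets (the bin margins are checked explicitly, since the maximum typically sits at a margin):

    ## bin-wise maximum divergence d_i for bin (tau.lower, tau.upper] with reference tau.ref
    binwise.max.div <- function(tau.lower, tau.upper, tau.ref) {
      ref <- cond.moments(tau.ref)
      div <- function(tau) {
        m <- cond.moments(tau)
        kl.sym(m["mean"], m["sd"], ref["mean"], ref["sd"])
      }
      interior <- optimize(div, interval = c(tau.lower, tau.upper), maximum = TRUE)$objective
      max(div(tau.lower), div(tau.upper), interior)
    }
    binwise.max.div(tau.lower = 10, tau.upper = 20, tau.ref = 15)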

Divergence: bin-wise maximum divergence (example)
[Figure: the actual parameters of the conditionals p(y|x) (black) versus the parameters of q(y|x) implied by the binning (red); and, for one bin, the divergence D_s(p(θ|τ) || p(θ|τ̃_2)) as a function of τ between the margins τ_(1) and τ_(2).]
Determine the maximum d_i for each bin i (usually attained at a bin margin).

Bounding divergence: idea
Now consider the divergence of the true and approximate marginals p(y) and q(y) (not the conditionals!): what about D_s(p(y) || q(y))?
Given the individual bin-wise divergences d_i, one can show that
D_s(p(y) || q(y)) ≤ Σ_i π_i d_i ≤ max_i d_i.
In other words: by bounding the bin-wise divergences (of the conditionals) we can bound the overall divergence (of the marginals).
This is the DIRECT (Divergence Restricted Conditional Tessellation) method.
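
A numerical illustration of the bound, using the bins, weights and helper functions from the sketches above (and treating the open-ended last bin as 30-40 for simplicity):

    ## overall divergence bound:  D_s(p(y) || q(y))  <=  sum_i pi_i d_i  <=  max_i d_i
    bins   <- rbind(c(0, 10, 5), c(10, 20, 15), c(20, 30, 25), c(30, 40, 35))
    d      <- apply(bins, 1, function(b) binwise.max.div(b[1], b[2], b[3]))
    pi.ref <- c(0.34, 0.44, 0.15, 0.07)
    c(weighted = sum(pi.ref * d), worst.case = max(d))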

Discretizing mixtures: sequential DIRECT algorithm
[Figure: divergence D_s(p(θ|τ) || p(θ|τ̃_i)) as a function of τ, with the threshold δ marked; reference points τ̃_1, τ̃_2, τ̃_3, τ̃_4, ... and bin margins τ_(1), τ_(2), τ_(3), τ_(4), ... are placed sequentially, bin by bin.]
The first reference point τ̃_1 is placed at zero, the first margin τ_(1) at 0.904, and so on.
Result: a binning with bounded divergence (≤ δ) per bin. (When to stop?)

Discretizing mixtures: sequential DIRECT algorithm (variations possible)
1. Specify δ > 0, 0 ≤ ε ≤ 1, and a starting reference point x̃_1 (e.g. the minimum possible value, or the ε/2-quantile). Define ε_1 ≥ 0 as ε_1 := P(X ≤ x̃_1). Set i = 1.
2. Set x = x̃_1. Obviously, D_s(p(y|x̃_1) || p(y|x)) = 0. Now increase x as far as possible while ensuring that D_s(p(y|x̃_1) || p(y|x)) ≤ δ. Use this point as the first bin margin: x_(1) = x. Compute π_1 = P(X ≤ x_(1)). Set i = i + 1.
3. Increase x until D_s(p(y|x_(i−1)) || p(y|x)) = δ. Use this point as the next reference point: x̃_i = x.
4. Increase x again until D_s(p(y|x̃_i) || p(y|x)) = δ. Use this point as the next bin margin: x_(i) = x.
5. Compute the bin weight π_i = P(x_(i−1) < X ≤ x_(i)).
6. If P(X > x_(i)) > (ε − ε_1), set i = i + 1 and proceed at step 3. Otherwise stop.
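
A minimal, non-optimized R sketch of this sequential procedure, built on the assumptions from the earlier snippets (normal conditionals via cond.moments(), symmetrized divergence via kl.sym(), and the half-normal stand-in CDF); a serious implementation would use proper root-finding rather than the crude step-wise search used here:

    ## sequential DIRECT-style grid construction (sketch only)
    direct.grid <- function(cdf, div, delta = 0.01, epsilon = 0.001,
                            x.start = 0, x.max = 100, step = 0.01) {
      advance <- function(x.from) {     # move right until divergence from x.from reaches delta
        x <- x.from + step              # always take at least one step
        while (x + step < x.max && div(x.from, x + step) <= delta) x <- x + step
        x
      }
      x.ref  <- x.start                 # steps 1-2: first reference point and margin
      x.marg <- advance(x.start)
      while (1 - cdf(tail(x.marg, 1)) > epsilon) {    # step 6: stop once the tail is small
        x.new  <- advance(tail(x.marg, 1))            # step 3: next reference point
        x.ref  <- c(x.ref, x.new)
        x.marg <- c(x.marg, advance(x.new))           # step 4: next bin margin
      }
      pi <- diff(c(0, cdf(x.marg)))                   # step 5: bin weights
      list(reference = x.ref, margins = x.marg, weights = pi / sum(pi))
    }

    div <- function(t1, t2) {           # symmetrized divergence of two normal conditionals
      m1 <- cond.moments(t1); m2 <- cond.moments(t2)
      kl.sym(m1["mean"], m1["sd"], m2["mean"], m2["sd"])
    }
    grid <- direct.grid(cdf = halfnormal.cdf, div = div)
    length(grid$reference)              # number of support points obtained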

Discretizing mixtures: general algorithm
- remaining issue: the ignored tail probability ε > 0 (usually: problems at the domain's margins)
- only the reference points x̃_i and probabilities π_i need to be kept track of
- meta-analysis example: 35 reference ("support") points required (δ = 0.01, ε = 0.001)
[Figure: joint posterior credible regions (50% / 90% / 95% / 99%) with the resulting support points along the τ axis.]

Conclusions
- the approximation allows computing densities, quantiles, moments, ...
- the algorithm yields a quick and easy solution
- the error budget needs to be specified in terms of the divergence δ and the tail probability ε
- the approach also works for discrete distributions (p(x) or p(y|x))
- meta-analysis application: implemented in the bayesmeta R package, http://cran.r-project.org/package=bayesmeta
- general procedure: C. Röver, T. Friede. Discrete approximation of a mixture distribution via restricted divergence. (Submitted for publication.) http://arxiv.org/abs/1602.04060
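
For the meta-analysis application, a minimal usage sketch of the bayesmeta package mentioned above (the y and sigma values are the made-up numbers from the earlier snippets; the discretization of τ happens internally):

    ## install.packages("bayesmeta")
    library(bayesmeta)
    y     <- c(185, 160, 175, 190, 150, 180, 170)   # hypothetical estimates
    sigma <- c(15, 20, 18, 25, 22, 17, 19)          # hypothetical standard errors
    ma <- bayesmeta(y = y, sigma = sigma)
    ma                                              # posterior summaries for the effect and tau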