Truncation error of a superposed gamma process in a decreasing order representation

Truncation error of a superposed gamma process in a decreasing order representation B julyan.arbel@inria.fr Í www.julyanarbel.com Inria, Mistis, Grenoble, France Joint work with Igor Pru nster (Bocconi University, Milan, Italy) Advances in Approximate Bayesian Inference NIPS Workshop, December 9, 216 1/15

Table of Contents Completely random measures Ferguson and Klass moment-based algorithm Simulation study Theoretical results Application to density estimation (xkcd) 2/15

Bayesian nonparametric priors Two main categories of priors depending on parameter spaces Spaces of functions random functions Stochastic processes s.a. Gaussian processes Random basis expansions Random densities Mixtures Spaces of probability measures discrete random measures Dirichlet process Pitman Yor Gibbs-type Species sampling processes Completely random measures [Wikipedia] [Brix, 1999] 3/15

Completely random measures µ = i 1 J i δ Zi where the jumps (J i ) i 1 and the jump points (Z i ) i 1 are independent Definition (Kingman, 1967) Random measure µ s.t. A 1,..., A d disjoint sets µ(a 1),..., µ(a d ) are mutually independent Independent Increment Processes, Lévy processes Popular models with applications in biology, sparse random graphs, survival analysis, machine learning, etc. Pivotal role in BNP (Lijoi and Prünster, 21, Jordan, 21) 4/15

Ferguson and Klass algorithm and goal Jumps in decreasing order in µ = j=1 J jδ Zj Minimal error at threshold M µ(x) µ M (X) = j=m+1 J j BNPdensity R package on CRAN, for F & K mixtures of normalized CRMs Algorithm 1 Ferguson and Klass algorithm 1: sample ξ j PP for j = 1,..., M 2: define J j = N 1 (ξ j ) for j = 1,..., M 3: sample Z j P for j = 1,..., M 4: approximate µ by µ M = M j=1 J jδ Zj 5 ξ5 ξ4 ξ3 ξ2 ξ1 J 5 J 4 J 3 J 2 J 1.5 5/15

Moment matching Assessing the error of truncation at threshold M T M = µ(x) µ M (X) = j=m+1 J j Relative error index [ ] J M e M = E FK M j=1 J j Moment-based index l M = ( 1 K K ) ( m 1/n n ˆm n 1/n ) 1/2 2 n=1 Examples of completely random measures Generalized gamma process by Brix (1999), γ [, 1), θ Superposed gamma process by Regazzini et al. (23), η N Stable-beta process by Teh and Gorur (29), σ [, 1), c > σ 6/15

Moment matching Relative error index e M Moment-based index l M.3.2 GG : a = 1 γ = γ =.25 γ =.5 γ =.75.6.4 GG : a = 1 γ = γ =.25 γ =.5 γ =.75 em l M.1.2 5 1 15 2 25 M 5 1 15 2 25 M 7/15

Evaluation of error on functionals Functional of interest: the total mass, criterion 1 = µ(x) µ M (X) Define similarly k for higher order moments of the total mass.12.1.8 γ =.75 γ =.5 γ =.25 γ = 1 =.1 4 =.1 l M.6 1 =.5 γ =.75 γ =.5.4 γ =.25.2.5.1.2.5.1.2 e M 4 =.5 γ = 8/15

Moment matching Reverse moment index M(l) = M l M = l Number of jumps M needed to achieve a given precision, here of l = 1% M(l) 15 1 5.5 γ 1 a 2. 9/15

Posterior moment match Theorem (James et al., 29, Teh and Gorur, 29) In mixture models with normalized generalized gamma (left) and the Indian buffet process based on the stable beta process (right) the posterior distribution of µ is essentially (conditional on some latent variables) µ + k j=1 J j δ Y j.6.4 IG : γ =.5, a = 1 Prior k = 1 k = 3 k = 1.6.4 SBP : σ =.5, a = 1, c = 1 Prior n = 5 n = 1 n = 2 l M l M.2.2 5 1 15 2 25 M 5 1 15 2 25 M 1/15

Posterior of inverse-gaussian process: µ + k j=1 J j δ Y j ( k ) E /E ( µ (X) ) j=1 J j Posterior moment match 1 k\n 1 3 1 1 3.34 7.3 13.5 n γ 2.65 4.68 6.5 5 n.89.98.99 5 Theorem (Arbel, De Blasi, Prünster, 215) Denote by P the true data distribution. In the NRMI model with prior guess P, the posterior of P converges weakly to P : k 1 5 n 1 if P is discrete, then P = P if P is diffuse, then P = σp + (1 σ)p 11/15

Bounding T M in probability Proposition (Arbel and Prünster, 216, Brix, 1999) Let T M be the truncation error for the Generalized Gamma or the Stable Beta Process. Then for any ɛ (, 1), ( ) P T M tm ɛ 1 ɛ 8 6 SBP : a = 1, c = 1 σ = σ =.5 for tm ɛ e CM if σ =, 1 if σ, M 1/σ 1 with ugly explicit constants depending on ɛ, γ, θ, σ and c ~ ε t M 4 2 5 1 15 2 25 M 12/15

Density estimation Mixtures of normalized random measures with independent increments Y i µ i, σ i ind k( µ i, σ i ), i = 1,..., n, (µ i, σ i ) P iid P, i = 1,..., n, P NRMI, Galaxy dataset. Kolmogorov Smirnov distance d KS ( ˆF lm, ˆF em ) between estimated cdfs ˆF lm and ˆF em under, respectively, the moment-match (with l M =.1) and the relative error (with e M =.1,.5,.1) criteria. γ e M =.1 e M =.5 e M =.1 19.4 15.5 9.2.25 31.3 23.7 15.1.5 42.4 28.9 18.3.75 64.8 41. 23.2 13/15

References Discussion Methodology based on moments for assessing quality of approximation in Ferguson and Klass algorithm, a conditional algorithm Should be preferred to relative error All-purpose criterion: validates the samples of a CRM rather than a transformation of it Going to be included in a new release of BNPdensity R package Future work: compare L 1 type bounds (Ishwaran and James, 21) in the Ferguson & Klass context and in size biased settings (see the review by Campbell et al., 216) For more details and for extensive numerical illustrations: A. and Prünster (216). A moment-matching Ferguson and Klass algorithm. Statistics and Computing. arxiv:166.2566 14/15

References References I Julyan Arbel and Igor Prünster (216). A moment-matching Ferguson and Klass algorithm. Statistics and Computing. Anders Brix (1999). Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, pages 929 953. Trevor Campbell, Jonathan H Huggins, Jonathan How, and Tamara Broderick (216). Truncated random measures. arxiv preprint arxiv:163.861. Thomas S Ferguson and Michael J Klass (1972). A representation of independent increment processes without Gaussian components. The Annals of Mathematical Statistics, 43(5):1634 1643. H. Ishwaran and L.F. James (21). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161 173. Michael I Jordan (21). Hierarchical models, nested models and completely random measures. Frontiers of Statistical Decision Making and Bayesian Analysis: in Honor of James O. Berger. New York: Springer. John Kingman (1967). Completely random measures. Pacific Journal of Mathematics, 21(1):59 78. Antonio Lijoi and Igor Prünster (21). Models beyond the Dirichlet process. Bayesian nonparametrics, 28:8. Eugenio Regazzini, Antonio Lijoi, and Igor Prünster (23). Distributional results for means of normalized random measures with independent increments. Ann. Statist., 31(2):56 585. Yee W. Teh and Dilan Gorur (29). Indian buffet processes with power-law behavior. In Y. Bengio, D. Schuurmans, J.D. Lafferty, C.K.I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1838 1846. Curran Associates, Inc. 15/15