Gibbs models estimation of galaxy point processes with the ABC Shadow algorithm



1 Gibbs models estimation of galaxy point processes with the ABC Shadow algorithm
Lluís Hurtado Gil, Universidad CEU San Pablo (lluis.hurtadogil@ceu.es)
Radu Stoica, Université de Lorraine, IECL (radu-stefan.stoica@univlorraine.fr)
València - Cosmo21, May 24th, 2018
(San Pablo CEU) Cosmo21 1 / 19

2 Overview
1. Motivation: estimate local interactions for the galaxy point distribution
2. Gibbs point processes: Geyer, Area Interaction and Connected Components
3. ABC Shadow (Stoica et al., 2016): new algorithm for posterior distribution sampling and parameter estimation
4. Validation: summary statistics, residual analysis, posterior distribution statistics
5. Results: simulated and real data
6. Conclusions and future work

3 Motivation
Gibbs models are parametric models for galaxy distributions with two probabilistic components:
1. Intensity
2. Interaction
a) Estimating the interaction parameter of a point process is always a challenging problem: we take advantage of the new possibilities given by the ABC Shadow algorithm.
b) Inhomogeneity estimation: the very next step in our study (van Lieshout 2011).
c) Galaxy distribution inside filaments: necklace pattern (Tempel et al. 2014).

4 Gibbs point processes
The probability density of a pairwise Gibbs point process is
$$f(y_1, \ldots, y_n) = \alpha \prod_{i=1}^{n(y)} \beta(y_i) \prod_{i<j} h(y_i, y_j)$$
$\beta(u)$ is the intensity function; it accounts for the inhomogeneity of the process.
$h(y_i, y_j)$ is the pairwise interaction function. Interactions are usually defined inside an interaction range $r$.
In our models, $\prod_{i<j} h(y_i, y_j) = \gamma^{t(y)}$, where $\gamma$ is the level of aggregation ($> 1$ if clustered) and $t(y)$ is the number of pairs $i \sim j$ that interact.


5 Geyer saturation process
$$f(y) = \alpha_G \, \beta_G^{n(y)} \prod_{i=1}^{n(y)} \gamma_G^{\min(s,\, t(y_i \mid y))}$$
where $s$ is a saturation constant that avoids uncontrolled aggregation of points and
$$t(y_i \mid y) = \left|\{(y_i, y_j) : j \neq i,\ \|y_i - y_j\| < r\}\right|$$
is the number of points closer than $r$ to $y_i$.
This model understands the galaxy distribution as a distance-based interaction.
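The Geyer statistic can be computed directly from pairwise distances. The following is a minimal sketch, assuming a brute-force O(n²) neighbour search and a homogeneous $\beta$; the function names and the toy pattern are illustrative and do not come from the slides. It returns the logarithm of the unnormalised density only (the constant $\alpha_G$ is left out).

```python
import math

def neighbour_counts(points, r):
    """Return t(y_i | y): for each point, the number of other points closer than r."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < r:
                counts[i] += 1
                counts[j] += 1
    return counts

def geyer_log_density(points, log_beta, log_gamma, r, s):
    """Unnormalised log f(y) = n(y) log(beta) + log(gamma) * sum_i min(s, t(y_i | y))."""
    saturated = sum(min(s, t) for t in neighbour_counts(points, r))
    return len(points) * log_beta + log_gamma * saturated

# Toy pattern (hypothetical): three mutually close points and one isolated point.
pattern = [(0.10, 0.10, 0.10), (0.15, 0.10, 0.10), (0.12, 0.14, 0.10), (0.80, 0.80, 0.80)]
print(neighbour_counts(pattern, r=0.1))  # [2, 2, 2, 0]
```

With $s = 1$ each clustered point contributes at most one factor $\gamma_G$, which is how the saturation constant caps the reward for piling points together.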

6 Connected Components process
$$f(y) = \alpha_C \, \beta_C^{n(y)} \, \gamma_C^{-U(y)}$$
where $U(y) = C(y) - n(y)$ and $C(y)$ is the number of components of points connected by friends-of-friends.
This model understands the galaxy distribution as a membership-based interaction.
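The number of friends-of-friends components can be obtained with a union-find structure. The sketch below assumes that two points are linked when they are closer than $r$, that $U(y) = C(y) - n(y)$, and that the exponent of $\gamma_C$ is $-U(y)$, so that $\gamma_C > 1$ favours clustering. The sign convention and all function names are assumptions, not taken verbatim from the slides.

```python
import math

def count_components(points, r):
    """Number of friends-of-friends components C(y): points closer than r are linked."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < r:
                parent[find(i)] = find(j)  # merge the two components
    return len({find(i) for i in range(len(points))})

def concom_log_density(points, log_beta, log_gamma, r):
    """Unnormalised log f(y) = n(y) log(beta) - log(gamma) * U(y), with U(y) = C(y) - n(y)."""
    u = count_components(points, r) - len(points)  # U(y) <= 0 (assumed convention)
    return len(points) * log_beta - log_gamma * u

# Hypothetical chain of three linked points plus one isolated point.
chain = [(0.0, 0.0, 0.0), (0.08, 0.0, 0.0), (0.16, 0.0, 0.0), (0.9, 0.9, 0.9)]
print(count_components(chain, r=0.1))
```

Note that the first and third points of the chain are farther apart than $r$ but still share a component through the middle point, which is what distinguishes this membership-based statistic from a pure pair count.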

7 Area Interaction process
$$f(y) = \alpha_A \, \beta_A^{n(y)} \, \gamma_A^{-B(y)}$$
where $B(y) = A(y) / V(r)$.
$A(y)$: volume covered by the spheres of radius $r$ centred at the points $y_i$.
$V(r)$: volume of a sphere of radius $r$.
This model understands the galaxy distribution as a volume-occupancy-based interaction.
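The covered volume $A(y)$ has no simple closed form once spheres overlap, so a Monte Carlo estimate is one possible way to evaluate $B(y) = A(y)/V(r)$. The sketch below makes several assumptions that the slides do not specify: spheres are not clipped to the observation window, the sampling box is the bounding box of the points enlarged by $r$, and the sample size and seed are arbitrary.

```python
import math
import random

def area_interaction_statistic(points, r, n_samples=20000, seed=0):
    """Monte Carlo estimate of B(y) = A(y) / V(r) for 3D points given as tuples."""
    rng = random.Random(seed)
    lo = [min(p[k] for p in points) - r for k in range(3)]
    hi = [max(p[k] for p in points) + r for k in range(3)]
    box_volume = math.prod(h - l for l, h in zip(lo, hi))
    hits = 0
    for _ in range(n_samples):
        u = tuple(rng.uniform(l, h) for l, h in zip(lo, hi))
        # The sample is covered if it falls inside at least one sphere.
        if any(math.dist(u, p) < r for p in points):
            hits += 1
    covered_volume = box_volume * hits / n_samples   # estimate of A(y)
    sphere_volume = 4.0 / 3.0 * math.pi * r ** 3     # V(r)
    return covered_volume / sphere_volume

# A single point covers exactly one sphere, so B(y) should be close to 1.
print(area_interaction_statistic([(0.5, 0.5, 0.5)], r=0.1))
```

Isolated points give $B(y) \approx n(y)$, while overlapping spheres reduce $B(y)$ below $n(y)$; two points at distance $r/2$, for instance, have an exact value of about 1.37.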

8 ABC Shadow algorithm
Let $y$ be an observed point pattern. Under the assumption of a Gibbs model, the probability density is written as
$$p(y \mid \theta) = \frac{\exp[-U(y \mid \theta)]}{c(\theta)}$$
where $U : \Omega \to \mathbb{R}^+$ is the energy function.
The model parameters $\theta$ are estimated by the MAP, obtained by maximising
$$p(\theta \mid y) = \frac{\exp[-U(y \mid \theta)]\, p(\theta)}{Z(y)\, c(\theta)}$$
with $Z(y)$ and $c(\theta)$ the respective normalising constants and $p(\theta)$ the prior knowledge of the parameters.
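The intractable constant $c(\theta)$ is what makes the posterior hard to sample. A minimal sketch of the shadow-chain idea follows, assuming the acceptance ratio takes the auxiliary-variable form $\frac{f(y \mid \psi)\, p(\psi)\, f(x \mid \theta)}{f(y \mid \theta)\, p(\theta)\, f(x \mid \psi)}$ with $f = \exp[-U]$ unnormalised, so that $c(\theta)$ never has to be evaluated. The toy model (a homogeneous Poisson process on a unit-volume window with $\theta = \log \beta$, for which $f(y \mid \theta) = \exp[\theta\, n(y)]$), the uniform prior, and the values of $\delta$ and $m$ are illustrative assumptions; see Stoica et al. (2016) for the actual algorithm.

```python
import math
import random

def simulate_count(theta, rng):
    """Draw n(x) for a Poisson process of intensity exp(theta) on a unit-volume window."""
    lam, k, t = math.exp(theta), 0, rng.expovariate(1.0)
    while t < lam:  # count unit-rate arrivals before time lam
        k += 1
        t += rng.expovariate(1.0)
    return k

def abc_shadow(n_obs, theta0, delta, m, n_samples, prior=(0.0, 8.0), seed=1):
    """Toy shadow chain for theta = log(beta); prior is uniform on the given interval."""
    rng = random.Random(seed)
    theta, samples = theta0, []
    for _ in range(n_samples):
        n_aux = simulate_count(theta, rng)  # auxiliary pattern x ~ p(x | theta)
        for _ in range(m):                  # m updates with x kept fixed
            psi = theta + rng.uniform(-delta / 2, delta / 2)
            if not prior[0] <= psi <= prior[1]:
                continue                    # zero prior density: reject
            # log of f(y|psi) f(x|theta) / (f(y|theta) f(x|psi))
            log_ratio = (psi - theta) * (n_obs - n_aux)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                theta = psi
        samples.append(theta)               # keep theta_m and restart from it
    return samples

samples = abc_shadow(n_obs=150, theta0=4.5, delta=0.02, m=100, n_samples=400)
kept = samples[100:]  # discard an assumed burn-in
print(sum(kept) / len(kept), math.log(150))
```

For this toy model the posterior mode is known to be $\log n(y)$, so the printed sample mean can be compared with it as a sanity check; for the Gibbs models of the previous slides, `simulate_count` would have to be replaced by an MCMC sampler of the point process and the count by the model's sufficient statistics.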

9 MAP computation
MAP computation is usually done through Monte Carlo methodology, which is highly costly from a computational point of view.
Several strategies are available: PL (pseudo-likelihood, not Monte Carlo), MCMCML, ABC (approximate Bayesian computation).
ABC Shadow: a new algorithm for approximate posterior sampling that combines two ideas:
- ABC principles: (Grelaud et al., 2009), (Blum, 2009), (Marin et al., 2012), (Biau et al., 2015)
- the use of auxiliary variables: (Møller et al., 2006)
Technical details: (Stoica et al., 2016)
Synthesis. ABC Shadow:
- gives better results than PL: it controls the distance of the approximated posterior to the true posterior
- has a computational cost at least equivalent to MCMCML: no need for re-sampling
- allows statistical inference based on the posterior

10 Validation
1. Summary statistics functions: pair correlation ($\xi$), Ripley's K function ($K$), empty space function ($F$) and nearest neighbour distribution ($G$). We generate samples $Y_1, Y_2, \ldots, Y_n$ given our estimated parameters $\hat\theta$ and use the summary statistics to build an envelope test for the observed real data set $y$.
2. Residuals (Baddeley et al., 2005), based on the idea "number of points in a region $B$ minus number of predicted points in that region":
$$R(B, \hat\theta) = n(y \cap B) - \int_B \lambda_{\hat\theta}(u; y)\, du$$
where $\lambda$ is the conditional intensity function $\lambda(u; y) = f(\{u\} \cup y)/f(y)$ (Papangelou, 1974).
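The residual can be approximated numerically once the conditional intensity is available. The sketch below assumes the simple pairwise model $f(y) \propto \beta^{n(y)} \gamma^{t(y)}$ with homogeneous $\beta$, for which adding a point $u$ multiplies the density by $\beta\, \gamma^{k}$ with $k$ the number of points of $y$ closer than $r$ to $u$. The box-shaped region $B$, the Monte Carlo integration, and all names are illustrative assumptions.

```python
import math
import random

def conditional_intensity(u, points, beta, gamma, r):
    """Papangelou intensity f(y U {u}) / f(y) for f(y) proportional to beta^n(y) gamma^t(y)."""
    close = sum(1 for p in points if math.dist(u, p) < r)
    return beta * gamma ** close

def residual(points, box_lo, box_hi, beta, gamma, r, n_samples=5000, seed=0):
    """R(B) = n(y in B) minus a Monte Carlo estimate of the integral of lambda over the box B."""
    rng = random.Random(seed)
    inside = sum(
        1 for p in points
        if all(l <= c <= h for c, l, h in zip(p, box_lo, box_hi))
    )
    volume = math.prod(h - l for l, h in zip(box_lo, box_hi))
    total = 0.0
    for _ in range(n_samples):
        u = tuple(rng.uniform(l, h) for l, h in zip(box_lo, box_hi))
        total += conditional_intensity(u, points, beta, gamma, r)
    return inside - volume * total / n_samples

# Hypothetical pattern of three points in the unit cube.
pts = [(0.2, 0.2, 0.2), (0.25, 0.2, 0.2), (0.7, 0.7, 0.7)]
print(residual(pts, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), beta=4.0, gamma=1.0, r=0.1))
```

With $\gamma = 1$ the model is Poisson and the residual reduces to the observed count minus $\beta\,|B|$; residuals far from zero in some region indicate that the fitted model predicts too many or too few points there.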

11 Test samples
In a cube $W = [0, 1]^3$ and with interaction radius $r = 0.1$, we generate one sample from each of our models and use ABC Shadow to estimate the true parameters:

                                   Geyer         Con-Com       Area-Int
True parameters       log β        ...           ...           ...
                      log γ        ...           ...           ...
Estimated parameters  log β̂        4.95 ± ...    ... ± ...     ... ± 0.14
                      log γ̂        0.41 ± ...    ... ± ...     ... ± 0.4

12 Test samples - Posteriors
[Figure: posterior distributions for the Geyer, Connected Components and Area Interaction test samples]

13 Test samples - Validation
[Figure: residuals $R(B, \hat\theta)$, fitted conditional intensity $\lambda_{\hat\theta}$ and data density for the Geyer, Connected Components and Area Interaction test samples]

14 Galaxy data set: data selection
Our dataset is a cube of side $50\, h^{-1}$ Mpc from the SDSS (Tempel et al., 2014). It contains 889 galaxies with luminosity threshold $M_r^{\mathrm{th}}(0) - 5 \log(h) = -20$ and a median redshift $x = \ldots$

15 Galaxy data set: model estimation
We estimate our three models for different radii:

Radius          Geyer                   ConCom                  AreaInt
r (h^-1 Mpc)    log β̂_G     log γ̂_G     log β̂_C     log γ̂_C     log β̂_A     log γ̂_A
...             ... ± ...   ... ± ...   ... ± ...   ... ± ...   ... ± ...   ... ± 0.7

- For radii bigger than $5\, h^{-1}$ Mpc (not shown) the $\log \gamma$ parameter is close to or smaller than 0, i.e., no clustering is found at these scales.
- The expected intensity for a process without interaction is $\log \beta = \ldots$ Geyer and ConCom show results below that level, evidence of the contribution of the interaction to the distribution.
- Area Interaction estimates greater $\gamma$ values for lower $\beta$ values. The larger the interaction radius, the greater the impact of the point interaction.

16 Validation: Geyer, $r = 2.5\, h^{-1}$ Mpc
Geyer is unable to recreate the steep profile of the sample.
The presence of a cluster jeopardizes the entire distribution.

17 Validation: Connected Components, $r = 3\, h^{-1}$ Mpc
The dependence on local structures is smaller, but this is still an unsatisfactory model.

18 Validation: Area Interaction, $r = 2.5\, h^{-1}$ Mpc
Much better reconstruction at middle and large interaction distances.
Still unable to correctly describe the highly clustered regions (cluster in the centre).
Problems with the void function ($F$).

19 Conclusions and future work
1. Two distance regimes: one single distribution with a homogeneous trend might not be enough to model the patterns at both small and large scales.
2. Importance of pairwise point interactions: Gibbs interaction processes can significantly contribute to the description of the galaxy pattern.
3. The Area Interaction model seems to be the most effective model, although it behaves less well at small scales.
4. Future work: introduce a non-stationary trend in the model using the intensity function ($\beta(u)$), to describe the filamentary structure (Tempel et al., 2016).
5. Priors used: non-informative, so the MAP coincides with maximum likelihood estimation.
6. Perspectives for the ABC Shadow algorithm: use priors for the interaction radius (Berthelsen et al., 2006) and mixture modelling.
