arxiv: v1 [gr-qc] 26 Nov 2009


Random template placement and prior information

arxiv:09.505v [gr-qc] 26 Nov 2009

Christian Röver
Max-Planck-Institut für Gravitationsphysik (Albert-Einstein-Institut), Callinstraße 38, 30167 Hannover, Germany.

Abstract. In signal detection problems, one is usually faced with the task of searching a parameter space for peaks in the likelihood function that indicate the presence of a signal. Random searches have proven to be very efficient as well as easy to implement, compared e.g. to searches along regular grids in parameter space. Knowledge of the parameterised shape of the signal searched for adds structure to the parameter space, i.e., there are usually regions that need to be searched densely, while in other regions a coarser search is sufficient. Prior information, on the other hand, identifies the regions in which a search is actually promising or likely to be in vain. Defining specific figures of merit allows one to combine both template metric and prior distribution and to devise optimal sampling schemes over the parameter space. We show an example related to the gravitational-wave signal from a binary inspiral event. Here the template metric and prior information are particularly contradictory, since signals from low-mass systems tolerate the least mismatch in parameter space, while high-mass systems are far more likely, as they imply a greater signal-to-noise ratio (SNR) and hence are detectable to greater distances. The derived sampling strategy is implemented in a Markov chain Monte Carlo (MCMC) algorithm, where it improves convergence.

1. Introduction

Signal detection, in gravitational-wave detection in particular, frequently entails performing a computationally expensive numerical search over a large parameter space. The search here means a search for a peak in the likelihood function, or another detection statistic, based on the data at hand and varying the unknown signal parameters.
A peak or a threshold excess then indicates the presence of a signal [1, 2]. Such brute-force searches may be implemented as grid searches, evaluating the detection statistic at regularly placed points in parameter space. Computing the detection statistic usually means evaluating the match between a signal template and the data; the spacing between evaluated points in parameter space is then usually based on a template metric, which ensures that all possible signals (corresponding to points in parameter space) have at least a certain minimal match with one of the evaluated templates (corresponding to the grid points). Instead of regularly spaced template banks, random template banks have recently gained popularity, as these are often very easily implemented and have also been shown to be very efficient, especially in higher dimensions [3]. Here the idea is to populate the parameter space randomly, but uniformly with respect to the template metric. So far, these template placement strategies have usually been based on minimax reasoning, aiming to minimize the maximal (worst-case) mismatch across the whole parameter space. Once prior information on the unknown parameters is taken into consideration, by accounting for the a priori probabilities attached to different regions of parameter space, a decision-theoretic approach allows us to devise other strategies, effectively concentrating efforts on the more promising regions of parameter space in pursuit of a certain optimality criterion [4, 5]. In fact, a minimax strategy may often only exist once one imposes hard bounds on the parameter space (thereby ensuring that an absolute worst case exists).

Markov chain Monte Carlo (MCMC) methods are now widely used for (Bayesian) parameter estimation in the signal processing stage for gravitational-wave signals [6, 7]. MCMC algorithms are, first of all, methods for stochastic integration [8, 9], although, owing to the way they work, they often behave like stochastic search algorithms as well. This is in fact a most welcome property, since the parameter estimation problem usually also includes a search/optimization problem: besides integration over the parameters' posterior distribution, it requires finding the global mode or secondary modes. Parallel tempering [10, 11] is a variety of the Metropolis-Hastings MCMC algorithm (and a special case of the Metropolis-coupled MCMC algorithm [12, 9]) aimed at enhancing these stochastic search capabilities. This is done by running several MCMC chains in parallel, where tempering at increasing temperature values is applied to subsequent chains (as in simulated annealing methods [13]), and additional steps are introduced to allow for communication between chains [14]. Parallel tempering methods have been applied to gravitational-wave data analysis for binary inspiral signals in the context of ground-based [15] and space-based (LISA) measurements [14], where they have proven advantageous especially in cases of high SNR and of posterior distributions exhibiting multiple modes or degeneracies [16, 17]. They have since also been adopted for the analysis of burst signals [18].
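The mechanics just described (several tempered chains plus swap moves that let them communicate) can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the implementation used for this paper: each chain runs random-walk Metropolis on the tempered density f(θ)^{1/T}, and once per iteration a swap between a randomly chosen pair of adjacent temperatures is accepted with probability min{1, exp[(1/T_i − 1/T_{i+1})(log f(θ_{i+1}) − log f(θ_i))]}. The bimodal target, the √T step-size scaling and all function names are hypothetical.

```python
import numpy as np


def log_target(theta):
    """Hypothetical bimodal target: unnormalized log density of an equal mixture
    of unit-variance Gaussians centred at -4 and +4."""
    return np.logaddexp(-0.5 * (theta - 4.0) ** 2, -0.5 * (theta + 4.0) ** 2)


def parallel_tempering(log_f, temps, n_iter, step=1.0, seed=0):
    """Minimal parallel tempering sketch.

    Chain i samples f(theta)**(1 / temps[i]) by random-walk Metropolis; after each
    sweep one swap between a random adjacent pair of temperatures is proposed.
    Returns the trajectory of the chain at temps[0] (the cool chain).
    """
    rng = np.random.default_rng(seed)
    temps = np.asarray(temps, dtype=float)
    k = len(temps)
    state = rng.normal(size=k)                   # current position of each chain
    logf = np.array([log_f(x) for x in state])   # untempered log f at each position
    cold = np.empty(n_iter)
    for it in range(n_iter):
        # Metropolis update within each chain; hotter chains take larger steps
        for i in range(k):
            prop = state[i] + step * np.sqrt(temps[i]) * rng.normal()
            logf_prop = log_f(prop)
            if np.log(rng.uniform()) < (logf_prop - logf[i]) / temps[i]:
                state[i], logf[i] = prop, logf_prop
        # swap move between temperatures i and i + 1 (communication between chains)
        i = rng.integers(k - 1)
        log_alpha = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (logf[i + 1] - logf[i])
        if np.log(rng.uniform()) < log_alpha:
            state[[i, i + 1]] = state[[i + 1, i]]
            logf[[i, i + 1]] = logf[[i + 1, i]]
        cold[it] = state[0]
    return cold


# geometric temperature ladder T = 1, 1.5, 2.25, ... (8 chains)
samples = parallel_tempering(log_target, 1.5 ** np.arange(8), n_iter=20000, seed=1)
print("fraction of cool-chain samples in the right-hand mode:", np.mean(samples > 0))
```

On its own, a chain at T = 1 would stay trapped in whichever mode it starts in; through the swaps, states that crossed the barrier at high temperature migrate down the ladder, so the cool chain visits both modes.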
Among the parallel Markov chains run at different temperatures within the parallel tempering implementation, the cool ones, with no tempering applied, produce samples from the posterior distribution for the stochastic integration part, while the high-temperature chains produce samples for the stochastic search. The question now is how to set up the algorithm so that the search is most efficient, given our knowledge of prior and template metric, i.e., our knowledge of where the true parameters are (un)likely to be, and of how hard one needs to look across the parameter space. The problem is of special interest in the context of binary inspiral signals, as prior and template metric are particularly contradictory: a priori one is most likely to detect an inspiral involving high masses, as these result in a high-SNR signal that is detectable to a greater distance. On the other hand, considering the template metric only, one might want to mostly try low-mass templates, since at low masses the template's parameters and the true signal parameters need to be in very close agreement for the two to match, while at high masses greater discrepancies still yield a good match. What needs to be defined is the distribution to sample from in order to find the mode(s) fastest. This is very similar to setting up a random template bank, the difference being that one does not settle on a fixed number of templates, as the MCMC sampler is in principle meant to sample indefinitely. In Sec. 2 we introduce the problem for the case of binary inspiral signals, and Sec. 3 briefly introduces the parallel tempering context. In Sec. 4 the general problem is formulated in decision-theoretic terms and solved for a particular optimality criterion. Sec. 5 shows some illustrative examples, and Sec. 6 closes with conclusions and perspectives.

2. Binary inspiral parameters

In the simplest description, a binary inspiral signal as measured by ground-based interferometers is determined by 9 parameters: sky location (declination δ, right ascension α), polarisation (ψ), component masses (m1, m2), luminosity distance (d_L), time of arrival (t_c), phase (φ) and inclination angle (ι). Assuming some prior distribution for the masses (in the following simply defined to be uniform, m1, m2 ∈ [1 M⊙, 10 M⊙]) and an isotropic distribution of events across space, while folding in the detectability as a function of signal-to-noise ratio (SNR), one can derive a joint prior distribution whose marginal distribution of the masses is shown in Fig. 1 [19, 14].

Figure 1. (Marginal) densities of the distributions π, ρ̂ and p* for the two mass parameters (m1, m2) of a binary inspiral signal; panels (axes m1, m2): prior π(θ); minimax/equalizer rule ρ̂(θ) ∝ √g(θ); optimal rule p*(θ) ∝ √(π(θ) √g(θ)). The prior (left plot) indicates that high masses are most likely, because they result in stronger signals that are detectable to greater distances. The template metric, on the other hand, implies that low masses require a dense template spacing (middle plot).

A template metric may be defined following [20, 21], assuming the metric to be constant in the space of the Newtonian and 1.5 PN chirp times λ1 and λ2, which are functions of the mass parameters. For the remaining parameters, for now, we again assume the metric to be uniform (t_c, log(d_L)) and isotropic (δ, α, ψ, ι, φ). The implied distribution in terms of (m1, m2) following from a uniform spacing in (λ1, λ2) may be derived using the reparametrisation explicated in [22]. This distribution is shown in Fig. 1 (middle plot).

3. Parallel tempering

In the context of Monte Carlo integration, tempering is utilised to prevent the integration algorithm from getting stuck in local modes of the distribution from which it is sampling. A temperature parameter T ≥ 1 is introduced, and instead of sampling from the distribution of actual interest, with density function f(θ), the modified distribution

$$ f^{(T)}(\theta) \propto f(\theta)^{1/T} \tag{1} $$

is used. The exponent 1/T is supposed to make the distribution more tractable, as it has a flattening effect on the density; the same effect is also taken advantage of in simulated annealing methods [13]. In the limit T → ∞, the density f^(T)(θ) then approaches a uniform distribution [9]. In the context of posterior inference, when the target density f(θ) is the product of prior π(θ) and likelihood L(θ), it may be more sensible to use a scheme that tempers only the likelihood part:

$$ f^{(T)}(\theta) \propto \pi(\theta)\, L(\theta)^{1/T}, \tag{2} $$

in which case f^(T)(θ) tends to the prior π(θ) as T → ∞ [14].

Neither the uniform distribution nor the prior distribution may in general be the most sensible choice, as was pointed out above, since tempering is also supposed to enhance the algorithm's stochastic search properties. Assume that one had a distribution p_∞(θ) available that leads to an optimal sampling (w.r.t. some pre-specified criterion), and that is then the desired density for T → ∞. This suggests a generalized tempering parametrisation:

$$ f^{(T)}(\theta) \propto p_\infty(\theta) \left(\frac{f(\theta)}{p_\infty(\theta)}\right)^{1/T} = p_\infty(\theta)^{1-1/T}\, f(\theta)^{1/T}. \tag{3} $$
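In log space, the generalized tempering of Eq. (3) amounts to a one-line change in how a chain at temperature T evaluates its target. The sketch below is our own illustration (the function name and the numbers are hypothetical); it also checks that the two special choices of p_∞ reproduce plain tempering and likelihood-only tempering.

```python
def tempered_log_density(log_f, log_p_inf, T):
    """Unnormalized log f^(T)(theta) for the generalized tempering of Eq. (3):
    f^(T) ∝ p_inf**(1 - 1/T) * f**(1/T), i.e. log p_inf + (log f - log p_inf) / T.

    log_f     -- log of the (unnormalized) target density at theta
    log_p_inf -- log of the limiting density p_inf at theta (any additive constant)
    T         -- temperature, T >= 1; T = 1 returns log_f, T -> inf tends to log_p_inf
    """
    if T < 1:
        raise ValueError("temperature must satisfy T >= 1")
    return log_p_inf + (log_f - log_p_inf) / T


# Special cases (hypothetical numbers), with target f = prior * likelihood
log_prior, log_lik, T = -1.2, -3.0, 4.0
plain = tempered_log_density(log_prior + log_lik, 0.0, T)           # p_inf ∝ 1
lik_only = tempered_log_density(log_prior + log_lik, log_prior, T)  # p_inf = prior
print(plain, lik_only)
```

Because only log p_∞ enters, the limiting density never needs to be normalized, which keeps the change to an existing sampler minimal.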

In the special cases p_∞(θ) ∝ 1 and p_∞(θ) = π(θ), (3) again yields the tempering schemes (1) and (2) above. The question now is how to choose such a limiting distribution p_∞(θ) based on given prior information and template metric.

4. The decision-theoretic approach

Let g(θ) be the determinant of the template metric as a function of the signal parameters. A large value of g means that templates need to be densely spaced around θ, while a smaller g indicates that a coarser spacing is sufficient. The volume covered by a template placed at parameter θ is proportional to g(θ)^{-1/2}, and hence the probability density to sample from for setting up a random template bank is given by ρ̂(θ) ∝ √g(θ) [3].

Now consider the case of the true parameter value being θ0 ∈ Θ. The actual value θ0 is unknown; what is known is the prior probability density π(θ). Whenever a template θ is placed in parameter space, it is considered a match if it is sufficiently close to the true value θ0. What exactly is sufficiently close is determined via mismatch considerations and is expressed through the template metric. Then the probability of a match is

$$ P(\text{match} \mid \theta) = c\, \frac{\pi(\theta)}{\sqrt{g(\theta)}}, \tag{4} $$

where c ∈ ℝ⁺ is a constant depending on how close a match is actually required to be. If one were to pick a single template θ, the chances of success would obviously be maximal where the above expression reaches its maximum. Analogously, consider the case of a given true value θ0 and repeated, independent guesses drawn from p(θ). Then for each single guess the probability of success is

$$ P(\text{match} \mid \theta_0) = c\, \frac{p(\theta_0)}{\sqrt{g(\theta_0)}}. \tag{5} $$

What is desired is a distribution p from which to generate independent draws so that the chances of getting a match are optimal. Whether or when one will get a match is a matter of chance, depending on both the true value θ0 ∈ Θ and the choice of p ∈ 𝒫, where 𝒫 is the space of probability distributions over Θ.
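The random template bank density ρ̂ ∝ √g and the per-draw match probability (5) can be illustrated in one dimension. The sketch below assumes a hypothetical metric with √g(θ) = θ⁻² on Θ = [0.1, 1] (so templates are needed most densely at small θ) and an arbitrary constant c; it draws templates from ρ̂ by inverse-CDF interpolation on a grid and evaluates (5) for p = ρ̂. The helper names are ours.

```python
import numpy as np


def normalize(density, grid):
    """Scale a non-negative function on an increasing grid to unit trapezoidal integral."""
    area = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid))
    return density / area


def sample_on_grid(density, grid, n, rng):
    """Draw n values from a 1-D density (known up to a constant) by inverting its
    piecewise-linear cumulative distribution function."""
    steps = 0.5 * (density[1:] + density[:-1]) * np.diff(grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    return np.interp(rng.uniform(size=n), cdf / cdf[-1], grid)


def match_probability(p_at_theta0, sqrt_g_at_theta0, c):
    """Per-draw probability of a match with the true value theta0, Eq. (5)."""
    return c * p_at_theta0 / sqrt_g_at_theta0


grid = np.linspace(0.1, 1.0, 2001)
sqrt_g = grid ** -2.0                  # hypothetical metric: sqrt(det g) = theta**-2
rho_hat = normalize(sqrt_g, grid)      # random-template-bank density, rho_hat ∝ sqrt(g)
templates = sample_on_grid(sqrt_g, grid, 5000, np.random.default_rng(0))
q = match_probability(rho_hat, sqrt_g, c=0.01)
print("share of templates below 0.2:", np.mean(templates < 0.2))
print("match probability range:", q.min(), q.max())
```

With this metric, about 5/9 of the templates fall in the first ninth of the interval (θ < 0.2), and the match probability q comes out the same at every θ0, which anticipates the equalizer property of ρ̂ discussed below.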
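Suppose we are interested in minimizing the expected number of trials T (or waiting time) until the first match. Any choice of p implies a probability distribution for T; for a given true value θ0 and a sampling distribution p, T follows a geometric distribution with probability mass function and expectation

$$ P(T = t \mid \theta_0) = \left(1 - c\,\frac{p(\theta_0)}{\sqrt{g(\theta_0)}}\right)^{t-1} c\,\frac{p(\theta_0)}{\sqrt{g(\theta_0)}}, \qquad \mathrm{E}[T \mid \theta_0] = \frac{\sqrt{g(\theta_0)}}{c\, p(\theta_0)}. \tag{6} $$

In decision-theoretic terms, we are given a state-of-nature space Θ, an action space 𝒫, and a loss function L : Θ × 𝒫 → ℝ with L(θ0, p) = E_p[T | θ0] [4, 5]. An optimal choice of p may now be determined by minimizing the expected loss; integrating over the possible values that θ0 could take, that (prior) expectation is

$$ \mathrm{E}[T] = \frac{1}{c} \int_\Theta \frac{\sqrt{g(\theta)}}{p(\theta)}\, \pi(\theta)\, d\theta, \tag{7} $$

which is minimized by choosing

$$ p^*(\theta) \propto \sqrt{\pi(\theta) \sqrt{g(\theta)}} \propto \sqrt{\pi(\theta)\, \hat\rho(\theta)}, \tag{8} $$

i.e., the optimal p* is proportional to the geometric mean of π and ρ̂, and independent of c. This follows from the Cauchy–Schwarz inequality: for any density p, (∫_Θ √(π(θ)√g(θ)) dθ)² ≤ ∫_Θ π(θ)√g(θ)/p(θ) dθ · ∫_Θ p(θ) dθ = c E[T], with equality if and only if p ∝ √(π√g).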
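Equations (7) and (8) are easy to check numerically. The sketch below discretizes a hypothetical one-dimensional problem on Θ = [0, 1], with a prior that favours large θ and a metric that demands dense templates at small θ (loosely mimicking the inspiral situation), evaluates E[T] from (7) with the trapezoidal rule, and compares p* with ρ̂, with the prior itself, and with random perturbations of p*. The shapes of π and √g and the value c = 1 are assumptions for illustration only.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral of samples y on the increasing grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def normalize(density, grid):
    """Scale a positive function on the grid to unit (trapezoidal) integral."""
    return density / trapezoid(density, grid)


def expected_waiting_time(p, prior, sqrt_g, grid, c=1.0):
    """Prior-averaged expected number of draws until a match, Eq. (7)."""
    return trapezoid(prior * sqrt_g / p, grid) / c


grid = np.linspace(0.0, 1.0, 1001)
prior = normalize(1.0 + 9.0 * grid ** 2, grid)     # hypothetical: large theta more likely
sqrt_g = 1.0 / (0.05 + grid)                       # hypothetical: dense templates at small theta
rho_hat = normalize(sqrt_g, grid)                  # minimax / equalizer rule
p_star = normalize(np.sqrt(prior * sqrt_g), grid)  # optimal rule, Eq. (8)

for name, p in [("prior", prior), ("rho_hat", rho_hat), ("p_star", p_star)]:
    print(name, expected_waiting_time(p, prior, sqrt_g, grid))
```

Because the trapezoidal rule is a positive weighted sum, the Cauchy–Schwarz argument carries over to the grid: the discretized p* is the exact minimizer there, with minimum value (∫√(π√g))²/c.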

Figure 2. Densities of the distributions π(θ), ρ̂(θ) ∝ √g(θ) and p*(θ) ∝ √(π(θ) √g(θ)) for the toy example discussed in Sec. 5.2.

The distribution defined through the density ρ̂ that is usually utilized for random template banks [3] plays a particular role in this context. From equation (5) one can see that by setting p := ρ̂ the probability of a match (and with that also the waiting time) becomes independent of the actual parameter value θ0, so that ρ̂ constitutes an equalizer rule. From (8) it follows that ρ̂ is optimal if the prior happens to be π = ρ̂. This implies that π = ρ̂ defines the least favourable prior distribution for this case, and that p = ρ̂ also constitutes the minimax strategy (independent of the particular prior π), as it minimizes the maximum of E[T | θ0] across all possible true values θ0 [4]. Since p = ρ̂ makes the match probability in (5) the same for every θ0, it actually constitutes the equalizer rule for the wider family of optimality criteria that are functions of P(match | θ0).

5. Examples

5.1. Toy example 1: Gaussian prior

Consider a parameter space Θ = ℝ where the prior is Gaussian with mean µ and variance σ²: π = N(µ, σ²), and the template metric is flat, i.e., g(θ) = γ is independent of θ. Then the equalizer rule ρ̂ does not exist, since a constant density cannot be normalized over ℝ, whereas the optimal rule would be p* = N(µ, 2σ²), because p*(θ) ∝ √π(θ) ∝ exp(−(θ − µ)²/(4σ²)).

5.2. Toy example 2: Numerical simulation

Consider a parameter space Θ = [0, 1], where the prior and template metric behave as shown in Fig. 2. For this simple case the behaviour of different sampling strategies can be simulated numerically, by drawing true parameter values θ0 from the prior distribution and then drawing guesses θ from either ρ̂ or p* in order to see how the strategies differ.

Figure 3. Cumulative distributions P(T ≤ t) of the resulting waiting times T when using sampling strategies p* and ρ̂ in the toy example of Sec. 5.2 (horizontal axis: waiting time t). The right panel shows a zoom-in on the differing tail behaviour.

Fig. 3 illustrates the distribution of the resulting waiting times T for both the minimax and the optimal strategies, ρ̂ and p*. As expected, the average waiting time is lower for p*, while the minimax strategy performs better in the unlikely worst cases.
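The numerical experiment of the second toy example can be reproduced in spirit with a short Monte Carlo sketch. Rather than simulating individual guesses, it uses (6): given θ0, the waiting time is geometric with success probability c p(θ0)/√g(θ0). The prior and metric shapes, the grid and the constant c are hypothetical stand-ins, not the ones behind Figs. 2 and 3.

```python
import numpy as np

rng = np.random.default_rng(42)
grid = np.linspace(0.0, 1.0, 501)        # uniform grid on Theta = [0, 1]
prior = 1.0 + 9.0 * grid ** 2            # hypothetical prior shape
prior /= prior.mean()                    # density: grid mean approximates the integral over [0, 1]
sqrt_g = 1.0 / (0.05 + grid)             # hypothetical metric: sqrt(det g)
rho_hat = sqrt_g / sqrt_g.mean()         # minimax / equalizer rule
p_star = np.sqrt(prior * sqrt_g)
p_star /= p_star.mean()                  # optimal rule, Eq. (8)

# choose c so that every per-guess match probability is at most 0.01
c = 0.01 / max(np.max(rho_hat / sqrt_g), np.max(p_star / sqrt_g))


def simulate_waiting_times(p, n):
    """Draw theta0 from the prior, then T | theta0 ~ Geometric(c p(theta0) / sqrt(g(theta0)))."""
    idx = rng.choice(grid.size, size=n, p=prior / prior.sum())
    return rng.geometric(c * p[idx] / sqrt_g[idx])


t_star = simulate_waiting_times(p_star, 20000)
t_rho = simulate_waiting_times(rho_hat, 20000)
print("mean waiting time   p*: %.0f   rho_hat: %.0f" % (t_star.mean(), t_rho.mean()))

# worst case over theta0 of E[T | theta0] = sqrt(g) / (c p), Eq. (6)
worst_star = np.max(sqrt_g / (c * p_star))
worst_rho = np.max(sqrt_g / (c * rho_hat))
print("worst-case E[T | theta0]   p*: %.0f   rho_hat: %.0f" % (worst_star, worst_rho))
```

The two printed comparisons correspond to the two observations above: p* wins on average, while ρ̂ has the smaller worst case, because it equalizes E[T | θ0] across θ0.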

Figure 4. This plot illustrates the behaviour of a parallel tempering algorithm utilizing the distribution p* when running on simulated data. The left panel shows how the algorithm's cool chains manage to ascend to greater likelihood values while the tempered chains keep sampling at lower likelihood values (axes: (unnormalized) log likelihood versus MCMC iteration, for chains #1 to #8 at T = 1.00, 1.50, 2.25, 3.38, 5.06, 7.59, 11.4 and 17.1). The second panel is a scatter plot of mass parameter samples, mass ratio (η) versus chirp mass (m_c), from all the different chains (after the algorithm's burn-in phase). The right panel finally shows the resulting marginal posterior density of the mass parameters derived from the cool chain #1 alone; the cross indicates the true parameter value.

5.3. Binary inspiral example

The prior π and the minimax sampling rule ρ̂ for the mass parameters of a binary inspiral event were shown in Fig. 1. The right panel of the same figure also shows the resulting optimized sampling distribution p*. The obvious discrepancy between the least favourable prior (ρ̂) and the actual prior (π) suggests that the optimization yields a real gain. Fig. 4 shows how a parallel tempering algorithm for parameter estimation behaves when the distribution p* is used as the limiting distribution p_∞ for the high-temperature chains, as described in Sec. 3, Eq. (3). The cool MCMC chains quickly converge to the true parameter values, while the higher-temperature chains keep scanning the parameter space efficiently.

6. Conclusions and outlook

We have applied a decision-theoretic approach in order to derive an optimized sampling distribution to be used within a parallel tempering MCMC implementation. The optimization step here provides a natural link between the parameter space metric and the prior information about the parameter values.
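To make this link concrete for the binary inspiral example, the sketch below evaluates π, ρ̂ and p* on the mass square [1, 10] M⊙ × [1, 10] M⊙. It rests on two assumptions of ours that are not taken from the paper. First, λ1 and λ2 are taken to scale like the standard Newtonian and 1.5 PN chirp times, τ0 ∝ M^{-5/3} η^{-1} and τ3 ∝ M^{-2/3} η^{-1} (total mass M, symmetric mass ratio η), so that ρ̂ ∝ |∂(τ0, τ3)/∂(m1, m2)| = M^{2/3} |m1 − m2| / (m1 m2)³ up to a constant. Second, the detectability-weighted prior is modelled as uniform masses times a detectable volume ∝ SNR³ with SNR ∝ 𝓜^{5/6}, i.e. π ∝ 𝓜^{5/2} for chirp mass 𝓜. All function names are hypothetical.

```python
import numpy as np


def chirp_time_scalings(m1, m2):
    """tau0 ∝ M**(-5/3)/eta = M**(1/3)/(m1 m2) and tau3 ∝ M**(-2/3)/eta = M**(4/3)/(m1 m2),
    with all constant factors dropped (assumption, see text)."""
    M = m1 + m2
    return M ** (1.0 / 3.0) / (m1 * m2), M ** (4.0 / 3.0) / (m1 * m2)


def rho_hat(m1, m2):
    """Template density implied by uniform spacing in (tau0, tau3): the absolute
    Jacobian determinant M**(2/3) |m1 - m2| / (m1 m2)**3 (zero on m1 = m2)."""
    M = m1 + m2
    return M ** (2.0 / 3.0) * np.abs(m1 - m2) / (m1 * m2) ** 3


def prior(m1, m2):
    """Hypothetical detectability-weighted prior ∝ chirp_mass**(5/2) for uniform masses."""
    chirp_mass = (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2
    return chirp_mass ** 2.5


def p_star(m1, m2):
    """Unnormalized optimal sampling density, Eq. (8): sqrt(prior * rho_hat)."""
    return np.sqrt(prior(m1, m2) * rho_hat(m1, m2))


m = np.linspace(1.0, 10.0, 181)
m1, m2 = np.meshgrid(m, m, indexing="ij")
densities = {name: f(m1, m2) / f(m1, m2).sum()
             for name, f in [("prior", prior), ("rho_hat", rho_hat), ("p_star", p_star)]}
for name, d in densities.items():
    # share of probability mass with both component masses above 5 solar masses
    print(name, d[(m1 > 5) & (m2 > 5)].sum())
```

The printed shares show the compromise described above: most of the prior mass sits at high masses, almost none of ρ̂ does, and p* lies in between.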
The particular optimality criterion chosen here (the expected time until a matching template is found, E[T | θ0]) turns out to be computationally convenient, as the resulting sampling distribution p* is independent of the particular mismatch threshold c, and is almost trivial to implement within an MCMC application. Other criteria are conceivable, though, like the probability of a missed detection within N samples, P(T > N | θ0), which may then lead to more complicated results. The general approach used here should also be useful in other contexts. It turns out that the distribution usually used for setting up random template banks constitutes the special case of a minimax strategy; this implies that the explicit specification of particular figures of merit and the consideration of prior information may yield great efficiency improvements, especially in cases where the implicitly assumed least favourable prior deviates greatly from the actual prior information, as in the binary inspiral case. In the framework discussed above, the resulting optimized sampling distribution p* even exists in cases where the minimax rule does not (as in the example of Sec. 5.1 above). This suggests that a similar approach may also make other ad-hoc fixes, like the mass parameter bounds in the binary inspiral example, dispensable, as it would naturally focus on the promising parameter range while ruling out too unlikely and too costly regions of parameter space.

Acknowledgments

The author would like to thank Chris Messenger, Reinhard Prix and Graham Woan for helpful discussions. This work was supported by the Max-Planck-Society.

References

[1] McDonough R N and Whalen A D 1995 Detection of signals in noise 2nd ed (New York: Academic Press)
[2] Wainstein L A and Zubakov V D 1962 Extraction of signals from noise (Englewood Cliffs, NJ: Prentice-Hall)
[3] Messenger C, Prix R and Papa M A 2009 Physical Review D
[4] Berger J O 1985 Statistical decision theory and Bayesian analysis 2nd ed (Springer-Verlag)
[5] Ferguson T S 1967 Mathematical Statistics: A Decision Theoretic Approach (New York: Academic Press)
[6] Christensen N and Meyer R 1998 Physical Review D
[7] Umstätter R, Meyer R, Dupuis R, Veitch J, Woan G and Christensen N 2004 Classical and Quantum Gravity 21 S1655–S1665
[8] Metropolis N and Ulam S 1949 Journal of the American Statistical Association
[9] Gilks W R, Richardson S and Spiegelhalter D J 1996 Markov chain Monte Carlo in practice (Boca Raton: Chapman & Hall / CRC)
[10] Hukushima K and Nemoto K 1996 Journal of the Physical Society of Japan
[11] Hansmann U H E 1997 Chemical Physics Letters
[12] Geyer C J 1991 Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface ed Keramidas E M (Fairfax Station: Interface Foundation) pp
[13] Press W H, Teukolsky S A, Vetterling W T and Flannery B P 1992 Numerical recipes in C: The art of scientific computing (Cambridge: Cambridge University Press)
[14] Röver C 2007 Bayesian inference on astrophysical binary inspirals based on gravitational-wave measurements Ph.D. thesis The University of Auckland URL
[15] Röver C, Meyer R and Christensen N 2007 Physical Review D
[16] van der Sluys M V, Röver C, Stroeer A, Christensen N, Kalogera V, Meyer R and Vecchio A 2008 The Astrophysical Journal Letters 688 L61–L64
[17] Raymond V, van der Sluys M V, Mandel I, Kalogera V, Röver C and Christensen N 2009 Classical and Quantum Gravity
[18] Key J S and Cornish N J 2009 Physical Review D
[19] Röver C, Meyer R, Guidi G M, Viceré A and Christensen N 2007 Classical and Quantum Gravity 24 S607–S615
[20] Owen B J and Sathyaprakash B S 1999 Physical Review D
[21] Chronopoulos A E and Apostolatos T A 2001 Physical Review D
[22] Umstätter R and Tinto M 2008 Physical Review D
