Fitting Narrow Emission Lines in X-ray Spectra
1 Fitting Narrow Emission Lines in X-ray Spectra. Taeyoung Park, Department of Statistics, University of Pittsburgh. October 11, 2007.
2 Outline of Presentation. This talk has three components: A. Motivation: looking for narrow emission lines in high-energy spectra, and spectral analysis of the high redshift quasar, PG. B. Building and fitting highly structured spectral models. C. Model-based statistical inference for line locations, and posterior predictive checks for a model component.
3 Astronomical Source. [Figure: an astronomical source.]
4 Chandra X-ray Observatory. [Figure: the Chandra X-ray Observatory.]
5 Spectrum of an Astronomical Source. An observed spectrum consists of photon counts in a number of energy bins, which we model with a non-homogeneous Poisson process. The photons originate from two components: (1) the continuum and (2) several emission lines. [Figure: observed and expected photon counts per energy bin.]
6 Typical Questions of Interest. Observed spectrum of the high redshift quasar, PG: 1. Estimate the parameters that describe the spectrum. 2. Quantify the uncertainty of the estimates. 3. Test the evidence for the emission lines. [Figure: observed photon counts per energy bin.]
7 The Basic Source Model. A simplified Poisson process for the scientific model:

X_i ~ Poisson( Λ_i = α·exp(β·i) + λ·π_i(µ, σ²) ),

where α·exp(β·i) is the continuum, λ·π_i(µ, σ²) is the emission line, and i = 1, 2, ..., n indexes the energy bins. We construct a narrow emission line model using the delta function so that (1) the emission line is contained entirely in one energy bin, but (2) we do not know which bin. That is, {π_i} can be parameterized in terms of a single location parameter µ, with σ² set equal to zero.
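The basic source model above can be sketched as a short simulation. This is illustrative only: the function and parameter names are hypothetical, and the continuum is written as α·exp(β·i) exactly as on the slide (β would typically be negative for a decaying continuum).

```python
import numpy as np

def simulate_spectrum(alpha, beta, lam, mu_bin, n_bins, rng):
    """Simulate X_i ~ Poisson(Lambda_i) for the basic source model:
    a continuum alpha*exp(beta*i) plus a delta-function emission line
    whose entire expected intensity lam falls in the single bin mu_bin.
    Names are illustrative; beta would typically be negative so the
    continuum decays with energy."""
    i = np.arange(1, n_bins + 1)              # energy-bin index i = 1, ..., n
    continuum = alpha * np.exp(beta * i)
    pi = np.zeros(n_bins)
    pi[mu_bin] = 1.0                          # delta-function line: one bin holds it all
    lam_total = continuum + lam * pi
    return rng.poisson(lam_total), lam_total

rng = np.random.default_rng(1)
counts, intensity = simulate_spectrum(100.0, -0.05, 50.0, 30, 100, rng)
```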
8 The Basic Source Model (Cont'd). The Poisson process: X_i ~ Poisson( Λ_i = α·exp(β·i) + λ·π_i(µ) ). To fit this finite mixture model with data augmentation, define

Z_il = (indicator that photon l in bin i corresponds to the emission line).

With the target posterior distribution p(Z, ψ, µ | X), where ψ = (α, β, λ), the Gibbs sampler iterates among
1. Draw Z ~ p(Z | ψ, µ, X), via Z_il ~ Bernoulli( λ·π_i(µ) / (α·exp(β·i) + λ·π_i(µ)) ),
2. Draw ψ ~ p(ψ | Z, µ, X), and
3. Draw µ ~ p(µ | Z, ψ, X).
In this case, the Gibbs sampler breaks down!
10 Why the Gibbs Sampler Fails. [Figure: total counts per bin decomposed into continuum counts and line counts, showing the current line location µ⁽⁰⁾ with π_i(µ⁽⁰⁾), a candidate µ⁽¹⁾, and the indicator draws Z_il ~ Bernoulli( λ·π_i(µ) / (α·exp(β·i) + λ·π_i(µ)) ).]
12 Improving the Convergence of the Gibbs Sampler. The Gibbs sampler breaks down because the line location µ cannot move from its starting value. Because of the binning, there are only finitely many possible line locations, so we can evaluate the observed posterior probability of µ at the midpoint of each bin. The observed posterior distribution for µ is given by

p(µ | ψ, X) = Multinomial( 1; { p(µ | ψ, X) }_{µ = E_i} ).

To improve convergence, can we capitalize on p(µ | ψ, X)? Note that p(µ | ψ, X) ∝ ∫ p(Z, ψ, µ | X) dZ, where p(Z, ψ, µ | X) is the target distribution.
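The Multinomial(1; ·) draw over bin midpoints above amounts to sampling a bin index from a discrete distribution. A minimal sketch, with illustrative names, working from unnormalized log posterior values for numerical stability:

```python
import numpy as np

def draw_mu(log_post, rng):
    """Draw the line-location bin index from the discrete distribution
    p(mu | psi, X) evaluated at every bin midpoint (a Multinomial(1; .)
    draw). `log_post` holds unnormalized log posterior values, one per
    candidate bin."""
    p = np.exp(log_post - np.max(log_post))  # subtract the max before exponentiating
    p = p / p.sum()                          # normalize to a probability vector
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
# One bin dominates the posterior by a factor of exp(50).
bin_drawn = draw_mu(np.array([0.0, 0.0, 50.0, 0.0]), rng)
```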
13 Introducing Incompatibility into the Gibbs Sampler. The target posterior distribution is given by p(Z, ψ, µ | X). Iteratively sample from the complete conditional distributions:
Step 1: Draw Z ~ p(Z | ψ, µ, X),
Step 2: Draw ψ ~ p(ψ | Z, µ, X), and
Step 3: Draw µ ~ p(µ | Z, ψ, X).
These steps form a Markov chain with stationary distribution p(Z, ψ, µ | X). Basic question: when p(µ | ψ, X) is available, should we use it or not? The answer is... use it, with caution!
15 Construction of Partially Collapsed Gibbs Samplers. Steps of construction, starting from the ordinary Gibbs sampler p(Z | ψ, µ, X) → p(ψ | Z, µ, X) → p(µ | Z, ψ, X):
1. Marginalization: move Z to the left of the conditioning sign in Step 3, giving p(Z | ψ, µ, X) → p(ψ | Z, µ, X) → p(Z, µ | ψ, X). This does not alter the stationary distribution, but improves the rate of convergence.
2. Permutation: permute the order of the steps, giving p(Z, µ | ψ, X) → p(Z | ψ, µ, X) → p(ψ | Z, µ, X). This can have minor effects on the rate of convergence, but does not affect the stationary distribution.
3. Trimming: remove Z from the draw in Step 1, since the transition kernel does not depend on this quantity, giving p(µ | ψ, X) → p(Z | ψ, µ, X) → p(ψ | Z, µ, X).
These three steps, Marginalization, Permutation, and Trimming, form a general strategy for constructing partially collapsed Gibbs (PCG) samplers.
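The trimmed schedule above can be written as a generic sampler skeleton. The callables are placeholders for the model-specific conditional draws (this is a sketch of the step ordering, not of any particular model); note that µ must be drawn before Z for the trimmed chain to keep the target stationary distribution.

```python
def pcg_step_schedule(draw_mu_marg, draw_Z, draw_psi, psi0, n_iter):
    """Skeleton of the sampler produced by Marginalization, Permutation,
    and Trimming: Step 1 draws mu with Z integrated out, then Z and psi
    are drawn as in the ordinary Gibbs sampler. The three callables are
    placeholders for the model-specific conditional draws."""
    psi = psi0
    draws = []
    for _ in range(n_iter):
        mu = draw_mu_marg(psi)   # p(mu | psi, X): Z marginalized out
        Z = draw_Z(psi, mu)      # p(Z | psi, mu, X)
        psi = draw_psi(Z, mu)    # p(psi | Z, mu, X)
        draws.append((mu, psi))
    return draws

# Deterministic stand-in draws, purely to exercise the schedule.
out = pcg_step_schedule(
    draw_mu_marg=lambda psi: psi + 1,
    draw_Z=lambda psi, mu: 2 * mu,
    draw_psi=lambda Z, mu: Z + mu,
    psi0=0,
    n_iter=2,
)
```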
17 Improved Convergence Characteristics. Theorem (Park and van Dyk, 2007): Sampling more model components in any single step of a Gibbs sampler reduces the resulting maximal autocorrelation. Directly assessing the convergence rate is difficult: the spectral radius of the forward operator generally governs convergence (Liu et al., 1995) and is bounded above by the maximal autocorrelation. By reducing conditioning in any step (i.e., partial marginalization), we reduce the maximal autocorrelation, the upper bound on the spectral radius. We thus expect partially collapsed Gibbs samplers to converge faster than ordinary Gibbs samplers.
18 Highly Structured Multilevel Model. In practice we do not observe the latent Poisson process X_i ~ Poisson( Λ_i = α·exp(β·i) + λ·π_i(µ) ). Rather, we observe

Y_j ~ Poisson( Σ_i M_ij·γ_i·Λ_i + θ_j^B ),

because the photons are subject to a series of non-trivial physical processes: γ_i, non-homogeneous stochastic censoring (absorption); M_ij, stochastic redistribution (blurring); and θ_j^B, background contamination.
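The observation model above can be sketched as a forward simulation. The index convention (absorption γ applied to the ideal bins before the redistribution matrix M) is an assumption of this sketch, and the names are illustrative:

```python
import numpy as np

def observe(lam_ideal, M, gamma, theta_bg, rng):
    """Push ideal bin intensities Lambda_i through absorption, blurring,
    and background contamination to get observed counts:
    Y_j ~ Poisson( sum_i M[i, j] * gamma[i] * Lambda_i + theta_bg[j] ).
    The index convention is an assumption of this sketch."""
    mean_obs = (gamma * lam_ideal) @ M + theta_bg   # sum over ideal bins i
    return rng.poisson(mean_obs), mean_obs

rng = np.random.default_rng(0)
lam = np.array([5.0, 2.0, 8.0])
# With identity blurring, no absorption loss, and no background,
# the observed intensity equals the ideal intensity.
y, mean_obs = observe(lam, np.eye(3), np.ones(3), np.zeros(3), rng)
```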
19 Data Augmentation. We consider a hierarchical missing data structure:
- Ideal data separated by mixture indicators: Z
- Mixed ideal data
- Mixed ideal data after absorption: X = {X_i}
- Mixed ideal data after absorption and blurring
- Mixed ideal data after absorption, blurring, and background contamination, i.e., the observed data: Y_obs = {Y_j}
20 The Gibbs Sampler. The target posterior distribution is p(X, Z, ψ, µ | Y_obs). The Gibbs sampler now iterates among
Step 1: Draw (X, Z) ~ p(X, Z | ψ, µ, Y_obs),
Step 2: Draw ψ ~ p(ψ | X, Z, µ, Y_obs), and
Step 3: Draw µ ~ p(µ | X, Z, ψ, Y_obs),
where ψ represents all model parameters but µ. With a delta function line model, this Gibbs sampler fails! The observed posterior distribution for µ is computed as

p(µ | ψ, Y_obs) = Multinomial( 1; { p(µ | ψ, Y_obs) }_{µ = E_j} ).

To improve convergence, we capitalize on p(µ | ψ, Y_obs).
21 PCG Sampler I. p(µ | ψ, Y_obs) is available, so we partially marginalize over all of the missing data, (X, Z). Starting from the Gibbs sampler p(X, Z | ψ, µ, Y_obs) → p(ψ | X, Z, µ, Y_obs) → p(µ | X, Z, ψ, Y_obs):
Marginalize: p(X, Z | ψ, µ, Y_obs) → p(ψ | X, Z, µ, Y_obs) → p(X, Z, µ | ψ, Y_obs).
Permute: p(X, Z, µ | ψ, Y_obs) → p(X, Z | ψ, µ, Y_obs) → p(ψ | X, Z, µ, Y_obs).
Trim: p(µ | ψ, Y_obs) → p(X, Z | ψ, µ, Y_obs) → p(ψ | X, Z, µ, Y_obs), which composes to p(X, Z, µ | ψ, Y_obs) → p(ψ | X, Z, µ, Y_obs).
The resulting sampler corresponds to a blocked version of the original Gibbs sampler!
23 PCG Sampler II. p(µ | X, ψ, Y_obs) is also available and quick to compute, so we partially marginalize over part of the missing data, Z. Starting from the Gibbs sampler p(X, Z | ψ, µ, Y_obs) → p(ψ | X, Z, µ, Y_obs) → p(µ | X, Z, ψ, Y_obs):
Marginalize: p(X, Z | ψ, µ, Y_obs) → p(ψ | X, Z, µ, Y_obs) → p(Z, µ | X, ψ, Y_obs).
Permute: p(Z, µ | X, ψ, Y_obs) → p(X, Z | ψ, µ, Y_obs) → p(ψ | X, Z, µ, Y_obs).
Trim: p(µ | X, ψ, Y_obs) → p(X, Z | ψ, µ, Y_obs) → p(ψ | X, Z, µ, Y_obs).
The resulting sampler cannot be blocked: it is a genuinely partially collapsed Gibbs sampler.
25 Computational Gains. Compare the Gibbs sampler, PCG I, and PCG II. The Gibbs sampler does not move from its starting value. PCG I has better convergence characteristics than PCG II, but each iteration of PCG I is more expensive. [Figure: trace plots and autocorrelation functions of µ (keV) and log λ (log counts) for PCG I and PCG II.]
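The convergence comparison in the figure rests on sample autocorrelations of the chains. A minimal helper for the lag-k autocorrelation (higher autocorrelation at a given lag means slower mixing):

```python
import numpy as np

def autocorr(chain, lag):
    """Sample lag-k autocorrelation of an MCMC chain: the normalized
    inner product of the centered chain with its lag-shifted copy."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))
```

For a perfectly alternating chain of length 10, the lag-1 autocorrelation is -(n-1)/n = -0.9, while a well-mixing chain would sit near zero.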
26 A Simulation Study. To illustrate the statistical properties of fitted spectra, we run a simulation. We generate 10 data sets (1500 counts each) from 6 spectra, using a typical continuum, effective area, and instrumental response function. The spectra contain 0, 1, or 2 lines. Each line is of moderate width (SD = 4 bins) or broad width (SD = 21 bins), and is weak or strong. Fitted models include one or two emission lines. [Figure: the six simulation spectra, Cases 1 through 6.]
27 Results: Highly Multimodal Posterior Distributions. We run the PCG samplers for model fitting; these are results with one delta function line in the fitted model. The marginal posterior density is estimated using Gaussian kernel smoothing (SD = 0.01), which smooths out lower posterior probabilities while not flattening higher posterior probabilities. The posterior density is highly multimodal. [Figure: posterior probability density functions of the line location for Cases 1 through 6.]
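The kernel-smoothed density estimate described above can be sketched directly: evaluate a Gaussian kernel of SD `bw` around each MCMC draw and average. Names are illustrative.

```python
import numpy as np

def kde_on_grid(draws, grid, bw=0.01):
    """Gaussian kernel density estimate of the marginal posterior of the
    line location from MCMC draws, evaluated on `grid`, with kernel SD
    `bw` (the slides use SD = 0.01)."""
    d = np.asarray(draws, dtype=float)[:, None]
    z = (np.asarray(grid, dtype=float)[None, :] - d) / bw
    # Average the Gaussian kernels centered at the draws.
    return np.exp(-0.5 * z ** 2).sum(axis=0) / (d.shape[0] * bw * np.sqrt(2.0 * np.pi))

# Density at the single draw's location equals the kernel peak 1/(bw*sqrt(2*pi)).
peak = kde_on_grid([0.0], np.array([0.0]), bw=1.0)[0]

# The estimate integrates to (approximately) one.
grid = np.linspace(-5.0, 5.0, 2001)
dens = kde_on_grid([0.0, 0.2], grid, bw=0.5)
mass = dens.sum() * (grid[1] - grid[0])
```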
28 Posterior Regions. The highly multimodal posterior distribution cannot be summarized in the standard ways familiar to astronomers (e.g., 5 ± 2, or asymmetric intervals). Instead, we use a transformation of the posterior density function to visualize HPD regions of varying levels: construct an HPD region for each of 100 levels equally spaced between 1% and 100%, and summarize the highly multimodal posterior distribution with this HPD distribution. [Figure: level of HPD versus line location (keV) for Cases 1 through 6.]
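For a binned posterior, an HPD region at a given level can be built greedily: include bins in decreasing order of posterior probability until the level is reached. A sketch with illustrative names; repeating it over many levels yields the "HPD distribution" summary above.

```python
import numpy as np

def hpd_mask(probs, level):
    """Highest-posterior-density region for a discrete (binned) posterior:
    add bins in decreasing order of posterior probability until at least
    `level` of the mass is covered. Returns a boolean mask over bins."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]           # bins from most to least probable
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, level)) + 1  # smallest prefix reaching the level
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[:k]] = True
    return mask
```

For a multimodal posterior this mask is typically a union of disjoint intervals, which is exactly why a single central interval is a poor summary here.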
29 Fitting Two Emission Lines. The multimodal posterior distribution may suggest multiple lines. [Figure: the joint posterior distribution of two line locations, with data generated under Case 2 (one narrow weak line at 2.85 keV); panels show the line location closest to the MLE and the other line location.]
30 Fitting Two Emission Lines (Cont'd). [Figure: the joint posterior distribution of two line locations, with data generated under Case 5 (one narrow weak line at 2.85 keV and one narrow strong line at 1.2 keV); panels show the line location closest to the MLE and the other line location.]
31 An Advantage of Model Misspecification? Consider a simple Gaussian model with known SD: Y ~ N(µ, σ²). A 95% CI for µ is given by Y ± 1.96σ. Misspecifying σ with some ς < σ results in a shorter interval, but with lower coverage. Now consider misspecifying the width of a spectral line: might there be an advantage to using a delta function line rather than a Gaussian line (with fitted width) if we know the spectral line is not too broad? The tradeoff is not as simple as in the Gaussian model, so we return to the simulation study.
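The coverage/length tradeoff in the simple Gaussian model can be checked by Monte Carlo (a sketch with illustrative names; µ is taken as 0 without loss of generality):

```python
import numpy as np

def coverage_and_length(sigma_true, sigma_used, n_sims, seed=0):
    """Monte Carlo illustration of the tradeoff: the interval
    Y +/- 1.96*sigma_used is shorter when sigma_used < sigma_true,
    but its coverage of the true mean (0 here) drops below 95%."""
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, sigma_true, n_sims)
    half = 1.96 * sigma_used
    coverage = float(np.mean(np.abs(y) <= half))  # P(interval contains 0)
    return coverage, 2.0 * half

cov_correct, len_correct = coverage_and_length(1.0, 1.0, 200_000)
cov_short, len_short = coverage_and_length(1.0, 0.5, 200_000)  # sigma misspecified low
```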
32 More Precise Inference? [Figure: HPD regions by test data index for Cases 1 through 6, comparing the delta function line model with the Gaussian line model.]
33 A Closer Look. Coverage and mean length of 95% HPD regions under the delta function line and Gaussian line models:

Case | Line type | Delta coverage(2) | Delta mean length | Gaussian coverage(2) | Gaussian mean length
1 | no lines | NA | NA | NA | NA
2 | one narrow(1) | 90% | | % |
3 | one wide | 50% | | % |
4 | one narrow | 100% | | % |
5 | two narrow | 90% | | % |
6 | two narrow | 100% | | % |
total | narrow | 95% | | % |

(1) Narrow lines are 17 bins wide (four SDs); wide lines are 85 bins wide.
(2) Coverage: % of ten 95% HPD regions containing at least one true line location.

Misspecification appears to give better mean length in all cases and better coverage for narrow lines.
34 Proceed with Caution. Exhaustive simulations are difficult: fitting involves MCMC sampling, which is slow and requires some supervision, and results may depend on the line location, line strength, line width, characteristics of the continuum, sample size, etc. Delta function emission lines are useful for exploratory data analysis and for inference when the true line is believed to be narrow, and they also show promise for inference with moderately broad lines.
35 Posterior Distribution of the Line Location. Six observations of PG were independently obtained. Under the delta function line profile, the combined posterior is

p(µ | Y_obs) = ∫ ∏_{i=1}^{6} p(µ, ψ_i | Y_i^obs) dψ_1 ... dψ_6 = ∏_{i=1}^{6} ∫ p(µ, ψ_i | Y_i^obs) dψ_i = ∏_{i=1}^{6} p(µ | Y_i^obs).

[Figure: posterior probability density functions of the line location for obs ids 47, 62, 69, 70, 71, and 1269, under the delta function line profile.]
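The factorization above combines the six observations by multiplying their per-observation posteriors bin by bin and renormalizing. A flat-prior sketch in log space (illustrative names; each input posterior is assumed strictly positive on every bin):

```python
import numpy as np

def combine_posteriors(per_obs):
    """Combine independent observations by multiplying their binned
    posteriors p(mu | Y_i^obs) bin by bin and renormalizing, following
    the slide's factorization. Works in log space for stability; every
    input posterior must be strictly positive on every bin."""
    logp = np.sum(np.log(np.asarray(per_obs, dtype=float)), axis=0)
    p = np.exp(logp - logp.max())   # shift before exponentiating
    return p / p.sum()

# Two toy observations over two bins: the product sharpens toward bin 0.
combined = combine_posteriors([[0.5, 0.5], [0.8, 0.2]])
```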
36 Results. Under the delta function emission line model: given all six observations, the posterior mode of the line location is identified at keV. The nominal 95% posterior region consists of (2.83 keV, 2.92 keV) with 94.8% of the posterior mass and (0.50 keV, 0.51 keV) with 2.2%. The detected line corresponds to 6.69 keV in the quasar rest frame, which indicates the ionization state of iron.
37 Posterior Predictive Checks. Quantify the evidence for the emission line in the spectrum using ppp-values (Meng, 1994; Protassov et al., 2002). Compare the following three models:
MODEL 0: no emission line in the spectrum.
MODEL 1: one line at 2.74 keV with unknown intensity.
MODEL 2: one line with unknown location and intensity.
The test statistic is the sum of the log likelihood ratios:

T_m(Ỹ^(l)) = Σ_{i=1}^{6} log[ sup_{θ∈Θ_m} L(θ | Ỹ_i^(l)) / sup_{θ∈Θ_0} L(θ | Ỹ_i^(l)) ], m = 1 and 2,

where Θ_m is the parameter space under MODEL m and Ỹ^(l) are the data sets simulated under MODEL 0.
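Given the observed statistic and its replicate values under MODEL 0, the ppp-value itself is a simple tail fraction. A sketch (the likelihood-ratio computation is model-specific and omitted here):

```python
import numpy as np

def ppp_value(t_obs, t_reps):
    """Posterior predictive p-value: the share of replicate test
    statistics T_m, computed from data simulated under MODEL 0 with
    parameters drawn from the posterior, that are at least as large as
    the observed T_m(Y_obs)."""
    return float(np.mean(np.asarray(t_reps, dtype=float) >= t_obs))

# Toy replicates: half of them reach the observed value 2.0.
p = ppp_value(2.0, [1.0, 2.0, 3.0, 0.5])
```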
38 Posterior Predictive P-Values. The posterior predictive method is a parameterized bootstrap that accounts for posterior uncertainty in the parameters. Small ppp-values give strong evidence for the presence of the delta function emission line in the X-ray spectrum. [Figure: null distributions of T_1(Ỹ^(l)) and T_2(Ỹ^(l)) with the observed values marked; T_1(Y_obs) = 7.450 for Model 0 vs. Model 1, and T_2(Y_obs) for Model 0 vs. Model 2, each with its ppp-value.]
39 Summary. Estimating the location of narrow emission lines causes problems for standard statistical methods and algorithms. We devise partially collapsed Gibbs (PCG) samplers that generalize the composition of conditional distributions, generalize the blocked Gibbs sampler (Liu et al., 1995), and can be viewed as a stochastic counterpart to the ECME algorithm (Liu and Rubin, 1994) and the AECM algorithm (Meng and van Dyk, 1997). We summarize the highly multimodal posterior distribution using an HPD distribution with varying levels, and use posterior predictive checks (Protassov et al., 2002) to test the evidence for a narrow emission line.
40 Thank you!
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte
More informationMARGINAL MARKOV CHAIN MONTE CARLO METHODS
Statistica Sinica 20 (2010), 1423-1454 MARGINAL MARKOV CHAIN MONTE CARLO METHODS David A. van Dyk University of California, Irvine Abstract: Marginal Data Augmentation and Parameter-Expanded Data Augmentation
More informationMCMC Sampling for Bayesian Inference using L1-type Priors
MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling
More informationComputer Vision Group Prof. Daniel Cremers. 14. Sampling Methods
Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More information10. Exchangeability and hierarchical models Objective. Recommended reading
10. Exchangeability and hierarchical models Objective Introduce exchangeability and its relation to Bayesian hierarchical models. Show how to fit such models using fully and empirical Bayesian methods.
More informationParameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1
Parameter Estimation William H. Jefferys University of Texas at Austin bill@bayesrules.net Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationBayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang
Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationGibbs Sampling. Héctor Corrada Bravo. University of Maryland, College Park, USA CMSC 644:
Gibbs Sampling Héctor Corrada Bravo University of Maryland, College Park, USA CMSC 644: 2019 03 27 Latent semantic analysis Documents as mixtures of topics (Hoffman 1999) 1 / 60 Latent semantic analysis
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationpyblocxs Bayesian Low-Counts X-ray Spectral Analysis in Sherpa
pyblocxs Bayesian Low-Counts X-ray Spectral Analysis in Sherpa Aneta Siemiginowska Harvard-Smithsonian Center for Astrophysics Vinay Kashyap (CfA), David Van Dyk (UCI), Alanna Connors (Eureka), T.Park(Korea)+CHASC
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationLikelihood NIPS July 30, Gaussian Process Regression with Student-t. Likelihood. Jarno Vanhatalo, Pasi Jylanki and Aki Vehtari NIPS-2009
with with July 30, 2010 with 1 2 3 Representation Representation for Distribution Inference for the Augmented Model 4 Approximate Laplacian Approximation Introduction to Laplacian Approximation Laplacian
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationEM Algorithm II. September 11, 2018
EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods
Prof. Daniel Cremers 11. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More information(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis
Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationBayesian Networks in Educational Assessment
Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior
More informationLuminosity Functions
Department of Statistics, Harvard University May 15th, 2012 Introduction: What is a Luminosity Function? Figure: A galaxy cluster. Introduction: Project Goal The Luminosity Function specifies the relative
More informationSTAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01
STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationStat 542: Item Response Theory Modeling Using The Extended Rank Likelihood
Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal
More informationIntroduction to Bayesian methods in inverse problems
Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction
More informationBayesian Nonparametric Models
Bayesian Nonparametric Models David M. Blei Columbia University December 15, 2015 Introduction We have been looking at models that posit latent structure in high dimensional data. We use the posterior
More informationTo Center or Not to Center: That is Not the Question An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency
To Center or Not to Center: That is Not the Question An Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency Yaming Yu Department of Statistics, University of California, Irvine
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationSequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them
HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More information