Multistage Anslysis on Solar Spectral Analyses with Uncertainties in Atomic Physical Models

Similar documents
Solar Spectral Analyses with Uncertainties in Atomic Physical Models

Accounting for Calibration Uncertainty in Spectral Analysis. David A. van Dyk

Accounting for Calibration Uncertainty in Spectral Analysis. David A. van Dyk

New Results of Fully Bayesian

Accounting for Calibration Uncertainty

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Bayesian Phylogenetics:

Fully Bayesian Analysis of Calibration Uncertainty In High Energy Spectral Analysis

Bayesian Inference for the Multivariate Normal

A new Hierarchical Bayes approach to ensemble-variational data assimilation

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

STA 4273H: Sta-s-cal Machine Learning

Big Data Inference. Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts. Josh Speagle 1,

Embedding Supernova Cosmology into a Bayesian Hierarchical Model

Bayesian Computation in Color-Magnitude Diagrams

Efficient Likelihood-Free Inference

Hale Collage. Spectropolarimetric Diagnostic Techniques!!!!!!!! Rebecca Centeno

Temperature Reconstruction from SDO:AIA Filter Images

Data Integration Model for Air Quality: A Hierarchical Approach to the Global Estimation of Exposures to Ambient Air Pollution

Statistical Data Analysis Stat 3: p-values, parameter estimation

Contents. Part I: Fundamentals of Bayesian Inference 1

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)

Introduction PCA classic Generative models Beyond and summary. PCA, ICA and beyond

ST 740: Markov Chain Monte Carlo

Bayesian Estimation with Sparse Grids

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

1.3j describe how astronomers observe the Sun at different wavelengths

Principal Components Analysis (PCA)

Using Estimating Equations for Spatially Correlated A

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Using This Flip Chart

Approximate Inference Part 1 of 2

Fractional Imputation in Survey Sampling: A Comparative Review

Data-Intensive Statistical Challenges in Astrophysics

Approximate Inference Part 1 of 2

Detection ASTR ASTR509 Jasper Wall Fall term. William Sealey Gosset

Statistical Tools and Techniques for Solar Astronomers

Using training sets and SVD to separate global 21-cm signal from foreground and instrument systematics

Data Uncertainty, MCML and Sampling Density

Large Scale Bayesian Inference

Bayesian Methods for Machine Learning

Climate Change: the Uncertainty of Certainty

L09. PARTICLE FILTERING. NA568 Mobile Robotics: Methods & Algorithms

Bayesian Inference and MCMC

Statistical techniques for data analysis in Cosmology

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Distribution of Eigenvalues of Weighted, Structured Matrix Ensembles

LECTURE 15 Markov chain Monte Carlo

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1

Using Bayes Factors for Model Selection in High-Energy Astrophysics

arxiv: v1 [stat.co] 8 Feb 2013

Solar Spectroscopy. with Hinode/EIS and IRIS. Magnus Woods With thanks to David Long UCL-Mullard Space Science Laboratory

Identifying Emission Lines in the Solar Extreme Ultraviolet (EUV) Irradiance Spectrum

Point, Interval, and Density Forecast Evaluation of Linear versus Nonlinear DSGE Models

Astronomy 1 Fall 2016

Notes on pseudo-marginal methods, variational Bayes and ABC

Accounting for Complex Sample Designs via Mixture Models

Frequentist-Bayesian Model Comparisons: A Simple Example

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Bayesian Regression Linear and Logistic Regression

Statistics: Learning models from data

AIA DATA ANALYSIS OVERVIEW OF THE AIA INSTRUMENT

Supplementary Note on Bayesian analysis

STA 294: Stochastic Processes & Bayesian Nonparametrics

ICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts

Robust and accurate inference via a mixture of Gaussian and terrors

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

Three data analysis problems

Discrete Variables and Gradient Estimators

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

STAT 518 Intro Student Presentation

Modelling and forecasting of offshore wind power fluctuations with Markov-Switching models

Density Estimation. Seungjin Choi

Measurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

Bagging During Markov Chain Monte Carlo for Smoother Predictions

GAUSSIAN PROCESS REGRESSION

Benchmark Exercises for stellar X-ray Spectroscopy Testing (BEXST): Initial Results

Estimation of Mars surface physical properties from hyperspectral images using the SIR method

Bootstrapping, Randomization, 2B-PLS

Two Statistical Problems in X-ray Astronomy

Computational statistics

Confidence Estimation Methods for Neural Networks: A Practical Comparison

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

STA 216, GLM, Lecture 16. October 29, 2007

pyblocxs Bayesian Low-Counts X-ray Spectral Analysis in Sherpa

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Afternoon Meeting on Bayesian Computation 2018 University of Reading

Multimodal Nested Sampling

MCMC Sampling for Bayesian Inference using L1-type Priors

Approximate Bayesian Computation for Astrostatistics

The Sun s Dynamic Atmosphere

Graphical Models for Collaborative Filtering

Introduction to Bayesian methods in inverse problems

Approximate Bayesian Computation

Timevarying VARs. Wouter J. Den Haan London School of Economics. c Wouter J. Den Haan

Correcting biased observation model error in data assimilation

Bayesian Additive Regression Tree (BART) with application to controlled trail data analysis

Thomas Bayes versus the wedge model: An example inference using a geostatistical prior function

Transcription:

Multistage Anslysis on Solar Spectral Analyses with Uncertainties in Atomic Physical Models Xixi Yu Imperial College London xixi.yu16@imperial.ac.uk 23 October 2018 Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 1 / 27

Acknowledgments Joint work with the International Space Science Institute (ISSI) team Improving the Analysis of Solar and Stellar Observations Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 2 / 27

The Solar Corona The solar corona is a complex and dynamic system Measuring physical properties in any solar region is important for understanding the processes that lead to these events Figure: The photospheric magnetic field measured with HMI, million degree emission observed with the AIA Fe IX 171,A channel, and high temperature loops observed with XRT Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 3 / 27

Aim We want to infer physical quantities of the solar atmosphere (density, temperature, path length, etc.), but we only observe intensity Inferences also rely on models for the underlying atomic physics How to address uncertainty in the atomic physics models? Figure: Hinode spacecraft. Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 4 / 27

An Astrophysical Perspective The importance of accounting for statistical errors is well established in astronomical analysis: a measurement is of little value without an estimate of its credible range However: Uncertainty is often ignored entirely If the second analysis is depending on the first analysis, use the best-fit value, e.g., by minimizing χ 2 results Lead to erroneous interpretation of the data! Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 5 / 27

A Statistical Perspective Problem: A two-stage analysis where primary: X ɛ secondary: Y θ, ɛ Aim: To design a robust principled method Method: To estimate θ via the analysis of Y, but still depends on ɛ that can be estimated in the primary analysis Idea: The output from the primary analysis is required for the secondary analysis Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 6 / 27

Statistical Methods Standard method p(θ Y, ˆɛ 0 ) (1) where ˆɛ 0 is the best-fit of p(ɛ X ) from the primary analysis Multiple imputation: multiple imputation combining rules with Gaussian approximation Pragmatic Bayesian method Fully Bayesian method p pb (θ, ɛ Y ) = p(θ Y, ɛ) p(ɛ) (2) p fb (θ, ɛ Y ) = p(θ Y, ɛ) p(ɛ Y ) (3) Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 7 / 27

Multiple imputation Sample M sets of independent parameter estimates from p(ɛ X ) along with their estimated variance-covariance matrices: ɛ (m) and Var(ɛ (m) ) for m = 1,..., M (4) Make Gaussian assumption and use multiple imputation combining rules (a set of simple moment calculations) Parameter estimate: ɛ = 1 M ɛ (m) (5) M Total uncertainty: m=1 T = W + (1 + 1 )B (6) M W = 1 M M m=1 Var(ɛ(m) ) and B = 1 M M 1 m=1 (ɛ(m) ɛ)(ɛ (m) ɛ) are the statistical and systematic uncertainties respectively However: M is typically small What if we have a large Monte Carlo (MC) sample? Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 8 / 27

A Particular Problem We have a large ensemble of MC sample that represents the variability from the primary analysis: M = {ɛ (m), m = 1,..., M} How to use this ensemble? Two methods: Discrete uniform Gaussian approximation via Principal Component Analysis (PCA) A Case Study in FeXIII Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 9 / 27

Physical Parameters n k 1 : number of free electrons per unit volume in plasma T k : electron temperature d k : path length through the solar atmosphere θ k = (log n k, log d k ) m: index of the emissivity curve Expected intensity of line with wavelength λ: ɛ (m) λ ɛ (m) λ (n k, T k )n 2 k d k (n k, T k ) is the plasma emissivity for the line with wavelength λ in pixel k 1 Subscript k is the pixel index Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 10 / 27

Data: Observed Intensity Data from the Extreme-Ultraviolet Imaging Spectrometer (EIS) on Hinode spacecraft. Figure: Example EIS spectrum of seven Fe XIII lines Spectral lines with wavelengths Λ = {λ 1,..., λ J } Observed intensities for K pixels and J wavelengths: ˆD = {D k = (I kλ1,..., I kλj ), k = 1,..., K} Standard deviation σ kλj are also measured Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 11 / 27

Uncertainty: Emissivity Emissivity: how strongly energy is radiated at a given wavelength Simulated from a model accounting for uncertainty in the atomic data Suppose a collection of M emissivity curves are known M = {ɛ (m) λ (n k, T k ), λ Λ, m = 1,..., M} Figure: A simplified level diagram for the transitions relevant to the 7 lines considered here. Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 12 / 27

Statistical Model Independent Prior distributions Likeihood L(m, θ k D k ) p(m, θ k ) = p(m) p(log n k ) p(log d k ) (7) m p(m) (8) log 10 n k Uniform(min = 7, max = 12) (9) log 10 d k Cauchy(center = 9, scale = 5) (10) ( indep I kλ m, n k, d k Normal Joint posterior distribution ɛ (m) λ ) (n k, T k )n 2 k d k, σkλ 2, for λ Λ (11) p(m, θ k D k ) L(m, θ k D k ) p(m, θ k ), (12) Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 13 / 27

Standard VS Pragmatic VS Fully Bayesian Models Standard method p(θ k D k, m = 1) with m = 1 the default emissivity (13) Pragmatic Bayesian posterior distribution p(m, θ k D k ) = p(θ k D k, m) p(m). (14) Fully Bayesian posterior distribution p(m, θ k D k ) = p(θ k D k, m) p(m D k ). (15) Aim: To compare the three methods, standard, pragmatic and fully Bayesian, applied to a single-pixel and multiple pixels Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 14 / 27

Prior I: Discrete Uniform (Done) Fully Bayesian Model I: Use Bayesian Methods to incorporate information in the data for narrowing the uncertainty in the atomic physics calculation Joint posterior distribution: p(m, θ k D k ) L(m, θ k D k ) p(m) p(θ k ) p(m) = 1 M, i.e., there are only M = 1000 equally likely emissivity curves as a priori Solution: Obtain a sample of M that accounts for the uncertainty in the atomic data, m m is treated as an unknown parameter Obtain sample from p(m, θ D) via two-step Monte Carlo (MC) samplers or Hamiltonian Monte Carlo (HMC) Conclusions: We are able to incorporate uncertainties in atomic physics calculations into analyses of solar spectroscopic data Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 15 / 27

Prior I: Multimodal Posterior Distributions Bimodal posterior distributions occur Two modes correspond to two emissivity curves Inaccurate relative size of two modes in HMC Reason: Not enough emissivity curves Challenge: Sparse selection of emissivity curves Density 0 10 20 30 40 H Algorithm Density 0 5 10 15 20 H Algorithm 9.20 9.25 9.30 log ne 9.30 9.40 9.50 9.60 log ds Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 16 / 27

Prior I: Multimodal Posterior Distributions A computational issue: Inaccurate relative size of two modes A computational solution: Adding a few synthetic replicate emissivity curves with augmented set M aug M: M aug /M = {w 1 Emis 471 + w 2 Emis 368 }, where (w 1, w 2 ) = (0.75, 0.25), (0.50, 0.50), &(0.25, 0.75) Density 0 10 20 30 40 H Algorithm Density 0 5 10 15 20 H Algorithm 9.20 9.25 9.30 log ne 9.30 9.40 9.50 9.60 log ds Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 17 / 27

Aim of Prior II Come up with a way to efficiently represent the high dimensional joint distribution of the uncertainty of the emissivity curves. Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 18 / 27

Comparison of Prior I and Prior II Prior I (Done) Joint posterior distribution: p(m, θ k D k ) L(m, θ k D k ) p(m) p(θ k ) p(m) = 1 M A computational trick: adding a few synthetic replicate emissivity curves Prior II Joint posterior distribution: p(ɛ(r k ), θ k D k ) L(ɛ(r k ), θ k D k ) p(r k, θ k ) r k is the PCA transformation of emissivity curve, ɛ p(r) is a high dimensional distribution An algorithm: summarizing the distribution with multivariate standard Normal distribution via PCA Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 19 / 27

Prior II: Gaussian approximation via PCA In space J = 16 PCs capture 99% of total variation PCA generated emissivity curve replicate based on the first J PCs ɛ rep = ɛ + J r j β j v j, j=i where ɛ: average of all 1000 emissivity curves r j : random variate generated from the standard Normal distribution β 2 j, v j: eigenvalue and eigenvector of component j in the PCA representation Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 20 / 27

Prior II: Plot of Original Emissivity Curves C 61 59 57 55 C C_ave 1.0 0.0 0.5 1.0 0 50 100 150 var index 0 50 100 150 var index Top panel: Each part corresponds one of the seven lines Light gray area: all 1000 emissivities Dark gray area: middle 68% of emissivities Solid black curve: ɛ Bottom panel: Same as above, but using ɛ ɛ Colored dashed curves: six randomly selected curves Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 21 / 27

Prior II: Plot of PCA Generated Emissivity Curves C C_ave 1.0 0.0 0.5 1.0 C C_ave 1.0 0.0 0.5 1.0 0 50 100 150 var index 0 50 100 150 var index Light & dark gray areas are same as above Top panel: Dashed lines: all 1500 PCA generated emissivities Dotted lines: the middle 68% of PCA emissivities Bottom panel: Colored dashed curves: selection of PCA curves Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 22 / 27

Prior II: Pragmatic VS Fully Bayesian Models Pragmatic Bayesian model Fully Bayesian model p(r, θ D) = p(θ D, r) p(r D) = p(θ D, r) p(r) (16) p(r, θ D) L(r, θ D) p(r) p(θ) K L(r, θ k D k ) p(θ k ) p(r) (17) Two different computing algorithms are used: two-step Gibbs sampler and Hamiltonian Monte Carlo (HMC) k=1 Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 23 / 27

Prior II: Data Simulation, J = 3 and K = 2 Pick the first two plixels: #1 and #2, i.e., here K = 2 Fix θ k = (log n k, log d k ) = (9.4, 9.3) for k = 1, 2 (posterior mean for Pix1 in paper). Emissivity curve = ɛ 471, which is computed from the first J = 3 PCs accounting for 42.23% of the total variance. (To make sure we are simulating and fitting under the same model.) Simulate 200 replicates for each pixel. Here, each of pixels is simulated with same values of parameters, but different replicates have different simulated data. Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 24 / 27

Prior II: Output summary, J = 3 and K = 2 std 9.35 9.40 9.45 9.50 log n1 9.35 9.40 9.45 9.50 log n2 9.05 9.15 9.25 9.35 log ds1 9.05 9.15 9.25 9.35 log ds2 pb.pca 9.35 9.40 9.45 9.50 log n1 9.35 9.40 9.45 9.50 log n2 9.05 9.15 9.25 9.35 log ds1 9.05 9.15 9.25 9.35 log ds2 fb.pca.g 9.35 9.40 9.45 9.50 log n1 9.35 9.40 9.45 9.50 log n2 9.05 9.15 9.25 9.35 log ds1 9.05 9.15 9.25 9.35 log ds2 fb.pca.h 9.35 9.40 9.45 9.50 9.35 9.40 9.45 9.50 9.05 9.15 9.25 9.35 9.05 9.15 9.25 9.35 log n1 log n2 log ds1 log ds2 Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 25 / 27

Prior II: Output summary, J = 3 and K = 2 Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 26 / 27

Prior II: Output summary, J = 3 and K = 2 Pragmatic Bayesian: biased the 68% intervals are significantly under coverage, the 95% intervals are significantly over coverage Fully Bayesian: the 68% intervals are a little bit over coverage smaller bias and RMSE the 95% coverage is better, the smaller coverage length results from the two different algorithms are almost the same and applying HMC is significantly faster, so keep using it Next step, try to increase J and K! Xixi Yu (Imperial College London) Multistage Anslysis in Astrophysics 23 October 2018 27 / 27