Robust and accurate inference via a mixture of Gaussian and t errors

Hyungsuk Tak
ICTS/SAMSI Time Series Analysis for Synoptic Surveys and Gravitational Wave Astronomy, 22 March 2017
Joint work with Justin Ellis (Caltech, JPL)

Motivation

Gaussian error, ε ~ N_1(0, V)
- Pros (thin tails): efficient inference when there are no outliers.
- Cons: can bias inference when there are outliers.

Student's t_ν error, ε ~ V^0.5 t_ν
- Pros (heavy tails): robust to outliers.
- Cons: can be less efficient because the tails are unnecessarily heavy for most of the normally observed data.

Why not use both? A mixture of Gaussian and Student's t_ν errors,
ε ~ N_1(0, V) with probability 1 − θ,
ε ~ V^0.5 t_ν with probability θ,
enables robust and accurate estimation.
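To make the mixture concrete, here is a minimal sampling sketch (my own illustration, not code from the talk); the values of V, ν, and θ below are arbitrary assumptions.

```python
# Minimal sketch: draw errors from the mixture
#   eps ~ N(0, V) with probability 1 - theta,  V^{0.5} * t_nu with probability theta.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
V, nu, theta, n = 1.0, 4, 0.1, 100_000      # illustrative values, not from the talk

z = rng.random(n) < theta                    # latent outlier indicator
eps = np.where(z,
               np.sqrt(V) * stats.t(df=nu).rvs(size=n, random_state=rng),
               rng.normal(scale=np.sqrt(V), size=n))

# The bulk behaves like the Gaussian component; the tails are heavier than Gaussian.
print(np.quantile(np.abs(eps), [0.5, 0.999]))
```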

The mixture error

A p-dimensional mixture error with a known (or very accurately estimated) variance component V is
ε | z, α ~ N_p(0, α^z V),
z | θ ~ Bernoulli(θ), θ ~ Uniform(0, 1), α ~ Inv-Gamma(ν/2, ν/2).

- z: a latent outlier indicator.
- θ: the probability of a datum being an outlier.
- α: an auxiliary variable used to express t_ν as a scale mixture of Gaussians; e.g., if x | α ~ N(0, α) and α ~ Inv-Gamma(ν/2, ν/2), then marginally x ~ t_ν.

(West, 1987; Peel and McLachlan, 2000; Gelman et al., 2013)
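A quick numerical check of the scale-mixture fact quoted above (a sketch of mine, not from the slides): draws of x | α ~ N(0, α) with α ~ Inv-Gamma(ν/2, ν/2) should be indistinguishable from t_ν draws.

```python
# Check: x | alpha ~ N(0, alpha), alpha ~ Inv-Gamma(nu/2, nu/2)  =>  x ~ t_nu.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, n = 4, 200_000

alpha = stats.invgamma(a=nu / 2, scale=nu / 2).rvs(size=n, random_state=rng)
x = rng.normal(scale=np.sqrt(alpha))              # one Gaussian draw per alpha

# Kolmogorov-Smirnov comparison against t_nu: expect a large p-value.
print(stats.kstest(x, stats.t(df=nu).cdf))
```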

Relationship with other errors

ε | z, α ~ N_p(0, α^z V), z | θ ~ Bernoulli(θ), θ ~ Uniform(0, 1), α ~ Inv-Gamma(ν/2, ν/2).

This proposed mixture error is marginally equivalent to
- a Gaussian error if z = 0 for all data;
- a t_ν error if z = 1 for all data;
- a mixture of Gaussian and t_ν errors otherwise;
- a mixture of two Gaussians if α is fixed at a constant (e.g., its MLE) (Aitkin and Wilson, 1980; Hogg et al., 2010; Vallisneri and van Haasteren, 2017).

Converting a Gaussian error to a mixture error

Conversion: we simply multiply α^z to V, with prior distributions on the additional parameters:
from ε ~ N_p(0, V)
to ε | z, α ~ N_p(0, α^z V), z | θ ~ Bernoulli(θ), θ ~ Uniform(0, 1), α ~ Inv-Gamma(ν/2, ν/2).

Implementation: we can use any sampler derived from a Gaussian error model (V → α^z V), additionally updating z, θ, and α.
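As a sketch of what "additionally updating z, θ, and α" could look like in the scalar-variance case (my own illustration under the priors above, not the authors' code), the extra conditional draws are conjugate: each z_i is Bernoulli given the current residual, θ is Beta given the z's, and each α_i is inverse-gamma (its conditional reduces to the prior when z_i = 0).

```python
# Sketch of the extra Gibbs updates when a Gaussian error N(0, V_i) is replaced by
# the mixture error N(0, alpha_i^{z_i} V_i); eps is the vector of current residuals.
import numpy as np
from scipy import stats


def update_z(eps, V, alpha, theta, rng):
    # P(z_i = 1 | .) ∝ theta * N(eps_i; 0, alpha_i V_i);  P(z_i = 0 | .) ∝ (1 - theta) * N(eps_i; 0, V_i)
    p1 = theta * stats.norm.pdf(eps, scale=np.sqrt(alpha * V))
    p0 = (1.0 - theta) * stats.norm.pdf(eps, scale=np.sqrt(V))
    return rng.random(eps.size) < p1 / (p0 + p1)


def update_theta(z, rng):
    # Uniform(0, 1) prior  =>  theta | z ~ Beta(1 + sum z, 1 + n - sum z)
    return rng.beta(1 + z.sum(), 1 + z.size - z.sum())


def update_alpha(eps, V, z, nu, rng):
    # z_i = 1: Inv-Gamma((nu + 1)/2, (nu + eps_i^2 / V_i)/2);  z_i = 0: the prior Inv-Gamma(nu/2, nu/2)
    shape = nu / 2 + z / 2
    scale = nu / 2 + z * eps**2 / (2.0 * V)
    return stats.invgamma(a=shape, scale=scale).rvs(random_state=rng)
```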

Example 1: Unknown location

Three data sets:
- Original data: y_i ~ N(0, 1), i.i.d. for i = 1, 2, ..., 20.
- Data with an outlier: the same data except y_20 = 10 or y_20 = −10.

Suppose the mean (μ) of the generative Gaussian distribution is unknown. Three error models with an improper flat prior on μ:
- Gaussian error: y_i | μ = μ + ε_i, ε_i ~ N(0, 1).
- t_4 error (Ch. 17, Gelman et al., 2013): y_i | μ = μ + ε_i, ε_i ~ t_4.
- Mixture error: y_i | μ, z_i, α_i = μ + ε_i, ε_i ~ N(0, α_i^{z_i}), z_i ~ Bernoulli(0.1), α_i ~ Inv-Gamma(2, 2).
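A compact Gibbs sampler for the mixture error model of this example could look like the following (a sketch of mine, not the code behind the reported results; the conditional updates from the conversion slide appear inline with θ fixed at 0.1, ν = 4, and V = 1). With the flat prior, μ | z, α, y is Gaussian with precision Σ_i w_i and mean Σ_i w_i y_i / Σ_i w_i, where w_i = α_i^{−z_i}.

```python
# Sketch of a Gibbs sampler for Example 1 with the mixture error
#   y_i | mu, z_i, alpha_i ~ N(mu, alpha_i^{z_i}),  z_i ~ Bernoulli(0.1),  alpha_i ~ Inv-Gamma(2, 2),
# and a flat prior on mu (an illustration, not the authors' implementation).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
theta, nu = 0.1, 4

y = rng.normal(size=20)          # "original data"; set, e.g., y[-1] = 10 to add the outlier
n = y.size
z = np.zeros(n, dtype=bool)
alpha = np.ones(n)

draws = []
for it in range(20_000):
    # mu | z, alpha, y : Gaussian with precision sum(w) and mean sum(w * y) / sum(w), w_i = alpha_i^{-z_i}
    w = 1.0 / np.where(z, alpha, 1.0)
    mu = rng.normal(loc=np.sum(w * y) / np.sum(w), scale=1.0 / np.sqrt(np.sum(w)))

    eps = y - mu

    # z_i | rest : Bernoulli, comparing the Gaussian and inflated-variance likelihoods
    p1 = theta * stats.norm.pdf(eps, scale=np.sqrt(alpha))
    p0 = (1.0 - theta) * stats.norm.pdf(eps, scale=1.0)
    z = rng.random(n) < p1 / (p0 + p1)

    # alpha_i | rest : Inv-Gamma((nu + z_i)/2, (nu + z_i * eps_i^2)/2)
    alpha = stats.invgamma(a=(nu + z) / 2, scale=(nu + z * eps**2) / 2).rvs(random_state=rng)

    draws.append(mu)

print(np.mean(draws[2000:]), np.std(draws[2000:]))   # posterior summary for mu after burn-in
```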

Example 1: Unknown location (cont.)

Marginal posterior distribution of μ based on one million posterior samples.
- Without the outlier, the dotted blue curve (mixture) passes in between the dashed (Gaussian) and solid red (t_4) curves, i.e., a mixture effect.
- With the outlier, the mixture error robustly maintains the mixture effect, enabling robust and more accurate inference.

CPU time (seconds): 11 for t_4 and 27 for the mixture.

Example 3: Pulsar timing data

Pulsar timing array for detecting gravitational waves (image credit: John Rowe, Swinburne).

"Interacting black holes in merging galaxies generate low-frequency gravitational waves. As these waves propagate through space, they cause coordinated changes in the arrival times of radio signals from pulsars, the universe's most stable natural clocks. These telltale variations can be detected by powerful radio telescopes." (NRAO Outreach)

Example 3: Pulsar timing data (cont.)

TOA − τ_TM = ε_WN + δt_TM + τ_EC + τ_DM + τ_RN + τ_GW

Example 3: Pulsar timing data (cont.)

Statistical modeling, with φ = (A, γ):
- ε_WN ~ Normal_n(0, V)
- δt_TM + τ_EC + τ_DM + τ_RN + τ_GW | φ ~ Normal_n(0, T B(φ) T')

Probability distribution: let δt be the observed residuals. Then
δt | φ ~ Normal_n(0, V + T B(φ) T').

- Likelihood function of φ:
L(φ) ∝ |V + T B(φ) T'|^(−0.5) exp( −0.5 δt' (V + T B(φ) T')^(−1) δt )
- Prior distribution f of φ: log10(A) ~ Unif(−18, −10) and γ ~ Unif(0, 7).
- Posterior distribution π of φ: π(φ | δt) ∝ L(φ) f(φ).
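For concreteness, the marginal log-likelihood above can be evaluated with a Cholesky factorization; this is a generic sketch of mine (the matrices V, T, and B(φ) are placeholders, not the actual pulsar-timing pipeline).

```python
# Sketch: log L(phi) = -0.5 * ( log|V + T B T'| + dt' (V + T B T')^{-1} dt ) + const.
import numpy as np
from scipy import linalg


def log_likelihood(dt, V, T, B):
    """dt: residual vector (n,); V: (n, n) white-noise covariance; T: (n, k); B: (k, k)."""
    C = V + T @ B @ T.T                       # marginal covariance of the residuals
    cho = linalg.cho_factor(C, lower=True)    # Cholesky factor for a stable solve
    logdet = 2.0 * np.sum(np.log(np.diag(cho[0])))
    quad = dt @ linalg.cho_solve(cho, dt)
    return -0.5 * (logdet + quad)
```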

Example 3: Pulsar timing data (cont.)

Original model (Ellis, 2016):
δt | φ ~ Normal_n(0, V + T B(φ) T')

Equivalent hierarchical model:
δt | b ~ Normal_n(T b, V)
b | φ ~ Normal_k(0, B(φ))

Converting the hierarchical model to the outlier model:
δt | b, z, α ~ Normal_n(T b, α^z V)
b | φ ~ Normal_k(0, B(φ))
z | θ ~ Bernoulli(θ), θ ~ Uniform(0, 1), α ~ Inv-Gamma(2, 2).

Example 3: Pulsar timing data (cont.)

A simulation study (done by Justin Ellis). Yellow crosses mark the synthetic outliers and red dots mark the data that our model identifies as outliers.

Example 3: Pulsar timing data (cont.)

Marginal posterior distributions of log10(A) and γ, with their scatter plot.
- Left three panels: the original and hierarchical Gaussian error models.
- Right three panels: red from t_4 errors, green from mixture errors; blue lines mark the true values.

Example 3: Pulsar timing data (cont.) Orange curve for the fit with Gaussian errors, and green curve for the fit with mixture errors. 14 / 20

Conclusion

- A mixture error model can yield more robust and accurate parameter estimation in the presence of outliers than a t_4 error model.
- It is simple, and always possible, to convert a Gaussian error into a mixture error by multiplying α^z to the (known) variance component V.
- The additional computational cost is modest.
- Detecting outliers, via the posterior of the latent indicators z, is another avenue.

References

1. Aitkin, M. and Wilson, G. T. (1980). Mixture Models, Outliers, and the EM Algorithm. Technometrics, 22(3), 325-331.
2. Geha, M., et al. (2003). Variability-Selected Quasars in MACHO Project Magellanic Cloud Fields. The Astronomical Journal, 125(1), 1.
3. Gelman, A., et al. (2013). Bayesian Data Analysis. CRC Press.
4. Kelly, B. C., et al. (2009). Are the Variations in Quasar Optical Flux Driven by Thermal Fluctuations? The Astrophysical Journal, 698(1), 895.
5. Peel, D. and McLachlan, G. J. (2000). Robust Mixture Modelling Using the t Distribution. Statistics and Computing, 10(4), 339-348.
6. Hogg, D. W., Bovy, J., and Lang, D. (2010). Data Analysis Recipes: Fitting a Model to Data. arXiv:1008.4686.
7. Vallisneri, M. and van Haasteren, R. (2017). Taming Outliers in Pulsar-Timing Data Sets with Hierarchical Likelihoods and Hamiltonian Sampling. Monthly Notices of the Royal Astronomical Society, 466, 4954.
8. West, M. (1987). On Scale Mixtures of Normal Distributions. Biometrika, 74(3), 646-648.
9. Ellis, J. (2016). Pulsar Timing Data Analysis. SAMSI ASTRO WG3.