Sampler of Interdisciplinary Measurement Error and Complex Data Problems
James Long
April 22, 2016
Outline
- Time Domain Astronomy and Variable Stars
- An Observation on Heteroskedasticity and Misspecified Models
- Period Estimation and Classification for M33 Miras
Periodic Variable Stars
Periodic variables: Stars that repeat brightness variation over a fixed period.
[Figure: light curve of a periodic variable; brightness (15.60 to 15.80) versus Time (Days), 0 to 2000.]
- Star observed $n = 367$ times.
- Data for star is $D = \{t_i, m_i, \sigma_i\}_{i=1}^{n}$.
- Observe star brightness $m_i$ at time $t_i$ with uncertainty $\sigma_i$.
Folded Light Curve of Periodic Variable
Folded light curve: Brightness versus time modulo period.
[Figure: folded light curve; brightness (15.60 to 15.80) versus Phase (Days), 0 to 3.]
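Folding can be sketched in a few lines. The example below is a minimal illustration, assuming NumPy; the simulated star (3.2-day period, amplitude, sampling times) and the helper name `fold` are hypothetical and not taken from the talk.

```python
import numpy as np

def fold(times, period):
    """Return the phase of each observation, in the same units as `times`."""
    return np.mod(times, period)

# Hypothetical example: irregularly sampled sinusoid with a 3.2-day period.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 2000.0, size=367))
true_period = 3.2
m = 15.7 + 0.08 * np.sin(2 * np.pi * t / true_period)

# Folding on the true period collapses all cycles onto a single cycle.
phase = fold(t, true_period)
order = np.argsort(phase)
phase_sorted, m_sorted = phase[order], m[order]
```

Plotting `m_sorted` against `phase_sorted` gives the folded light curve; folding on a wrong period would scatter the points instead of tracing one clean cycle.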
The OGLE III Survey [5]
- Collected hundreds of thousands of periodic variables in the Large Magellanic Cloud.
- Periodic variables belong to different classes.
- Class is related to the astrophysical reason for variation.
Two Examples
[Figure: Mira variable; light curve versus Time (Days), 0 to 2500, and folded light curve versus Phase (Days), 0 to 300.]
[Figure: RR Lyrae AB variable; light curve versus Time (Days), 0 to 2000, and folded light curve versus Phase (Days), 0 to 0.6.]
Size of Variable Star Data Sets is Growing
- Hipparcos (1989-1993): 2712
- OGLE (1992-present): 100,000s
- DES (ongoing): 10 million
- LSST (starting 2020): 1 billion
Data sets of varying quality:
[Figure: example light curve; brightness (20.5 to 22.0) versus Time (Days), 0 to 1500.]
Folded Light Curve using Two Sinusoidal Models
[Figure: light curve versus Time (Days), 0 to 3000, and folded light curve versus Phase (Days), 0.1 to 0.5; brightness axis 17.0 to 17.6.]
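Period Estimation for Variable Stars
Common Model:
$$m_i = \beta_0 + \sum_{k=1}^{K} a_k \sin(k \omega t_i + \phi_k) + \epsilon_i$$
where $\epsilon_i \sim N(0, \sigma_i^2)$.
Zechmeister [6], Schwarzenberg-Czerny [3]
Maximum likelihood estimator:
$$\hat{\omega} = \operatorname*{argmin}_{\omega} \min_{\phi, a, \beta_0} \sum_{i=1}^{n} \left( \frac{m_i - \beta_0 - \sum_{k=1}^{K} a_k \sin(k \omega t_i + \phi_k)}{\sigma_i} \right)^2$$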
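One way to compute this estimator is a grid search over $\omega$. Because $a_k \sin(k \omega t + \phi_k) = b_k \sin(k \omega t) + c_k \cos(k \omega t)$, the inner minimization is an ordinary weighted least-squares problem at each fixed $\omega$. The sketch below assumes NumPy; the function names, the frequency grid, and the simulated 7-day star are hypothetical choices for illustration, not the implementation used in the talk.

```python
import numpy as np

def harmonic_design(t, omega, K):
    """Columns 1, sin(k*omega*t), cos(k*omega*t) for k = 1..K."""
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols.append(np.sin(k * omega * t))
        cols.append(np.cos(k * omega * t))
    return np.column_stack(cols)

def weighted_rss(t, m, sigma, omega, K):
    """Inner minimization: weighted residual sum of squares at fixed omega."""
    X = harmonic_design(t, omega, K) / sigma[:, None]  # scale rows by 1/sigma_i
    y = m / sigma
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def estimate_omega(t, m, sigma, omegas, K=1):
    """Outer minimization over a user-supplied grid of angular frequencies."""
    rss = np.array([weighted_rss(t, m, sigma, w, K) for w in omegas])
    return omegas[np.argmin(rss)], rss

# Hypothetical simulated star with a 7-day period and heteroskedastic noise.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, size=200))
sigma = rng.uniform(0.02, 0.1, size=t.size)
omega_true = 2 * np.pi / 7.0
m = 15.0 + 0.3 * np.sin(omega_true * t + 0.4) + rng.normal(0.0, sigma)

# Search periods between 2 and 20 days.
omegas = np.linspace(2 * np.pi / 20.0, 2 * np.pi / 2.0, 4000)
omega_hat, rss = estimate_omega(t, m, sigma, omegas, K=1)
period_hat = 2 * np.pi / omega_hat
```

The grid must be fine relative to $2\pi$ divided by the time baseline, otherwise the narrow minimum near the true frequency can be missed.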
Question
- Misspecified models are common and can be useful.
- Heteroskedasticity in responses is common.
- Typically we weight observations by the inverse of the variance.
Question: Is this weighting helpful when the model is misspecified?
Correct Model: Weighted Fit
[Figure: Magnitude (13.6 to 14.4) versus Time (0 to 4); true light curve and estimate.]
Correct Model: Unweighted Fit
[Figure: Magnitude (13.6 to 14.4) versus Time (0 to 4); true light curve and estimate.]
Summary
- With inverse-variance weights, the fitted curve (orange line) stays close to the observations with small error (small $\sigma_i$).
- This is good when the actual light curve variation is sinusoidal.
Question: What happens for misspecified models (light curves that are not actually sinusoids)?
Misspecified Model: Weighted Fit
[Figure: Magnitude (13.6 to 14.4) versus Time (0 to 4); true light curve and estimate.]
Misspecified Model: Unweighted Fit
[Figure: Magnitude (13.6 to 14.4) versus Time (0 to 4); true light curve and estimate.]
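A comparison of this kind can be reproduced with a small simulation. The sketch below assumes NumPy and fits a single sinusoid at a known frequency to a non-sinusoidal curve, once with inverse-variance weights and once with equal weights. The sawtooth shape, the noise levels, and the function names are hypothetical stand-ins, not the curves shown in the figures.

```python
import numpy as np

def fit_sinusoid(t, m, omega, weights):
    """Weighted least-squares fit of m ~ b0 + b1*sin(omega*t) + b2*cos(omega*t)."""
    X = np.column_stack([np.ones_like(t), np.sin(omega * t), np.cos(omega * t)])
    sw = np.sqrt(weights)  # scaling rows by sqrt(w_i) gives weighted least squares
    coef, *_ = np.linalg.lstsq(X * sw[:, None], m * sw, rcond=None)
    return coef

def sawtooth_curve(t, period):
    """Hypothetical non-sinusoidal light curve: linear ramp with a jump each cycle."""
    return 14.0 + 0.4 * (np.mod(t, period) / period - 0.5)

rng = np.random.default_rng(1)
period = 1.0
omega = 2 * np.pi / period
t = np.sort(rng.uniform(0.0, 4.0, size=50))
sigma = rng.choice([0.02, 0.3], size=t.size)  # mix of precise and noisy points
m = sawtooth_curve(t, period) + rng.normal(0.0, sigma)

coef_weighted = fit_sinusoid(t, m, omega, 1.0 / sigma**2)
coef_unweighted = fit_sinusoid(t, m, omega, np.ones_like(t))
```

Under the weighted fit the few precise observations dominate, so the sinusoid is pulled toward wherever those points happen to fall on the sawtooth; the unweighted fit spreads influence evenly across phases.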
Application to Variable Star Period Estimation
- g-band light curves of 238 bright sources in Stripe 82 (SDSS-III).
- Downsampled all light curves to 10, 20, 30, and 40 observations.
- This simulates the difficult period recovery settings encountered by Pan-STARRS and DES.
- Compare period estimation using weighted and unweighted estimators.
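The downsampling step might look like the following sketch. The talk does not say how observations were selected, so uniform random subsampling without replacement is an assumption here, as are the function name and the simulated light curve.

```python
import numpy as np

def downsample(t, m, sigma, n, rng):
    """Keep n observations chosen uniformly at random, preserving time order."""
    idx = np.sort(rng.choice(t.size, size=n, replace=False))
    return t[idx], m[idx], sigma[idx]

# Hypothetical well-sampled light curve with 60 observations.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 3000.0, size=60))
m = 17.0 + 0.2 * np.sin(2 * np.pi * t / 0.6)
sigma = np.full(t.size, 0.05)

# One downsampled copy per target size used in the experiment.
subsets = {n: downsample(t, m, sigma, n, rng) for n in (10, 20, 30, 40)}
```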
Results
Fraction of periods estimated correctly for different models ($K = 1, 2, 3$) and using weights ($\Sigma^{-1}$) and unweighted ($I$).

        K = 1            K = 2            K = 3
  n     Σ^{-1}   I       Σ^{-1}   I       Σ^{-1}   I
 10     0.09     0.16    0.13     0.11    0.03     0.03
 20     0.46     0.58    0.63     0.68    0.69     0.77
 30     0.64     0.78    0.71     0.82    0.82     0.86
 40     0.75     0.79    0.80     0.85    0.87     0.92

Conclusion: Ignoring heteroskedasticity can improve model fits.
Notes
- Linear model case: $x_i \sim f_X$, $\sigma_i \sim f_\sigma$, $\sigma_i \perp x_i$, and $y_i = f(x_i) + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma_i^2)$.
- $\beta \equiv \operatorname*{argmin}_{\beta} E[(f(x) - x^T \beta)^2] = E[x x^T]^{-1} E[x f(x)]$.
- $\hat{\beta} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y$ is not efficient.
- Adaptively choose optimal weights: $(\sigma_i^2 + [\ldots])^{-1}$.
- Close connections with Y. Ma [1, 2].
- Many more details: Parameter Estimation for Misspecified Regression Models with Heteroskedastic Errors, http://arxiv.org/abs/1509.05810
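The effect of the weights in the linear model case can be explored by Monte Carlo. The sketch below assumes NumPy and a hypothetical setup: $x \sim \mathrm{Uniform}(0, 1)$, $f(x) = x^2$, regressors $(1, x)$, and weights of the form $(\sigma_i^2 + c)^{-1}$ with a nonnegative constant $c$, called `delta` in the code. The name `delta` and all numerical choices are assumptions made for illustration. Setting `delta = 0` gives inverse-variance weighting, and a very large `delta` approaches the unweighted fit.

```python
import numpy as np

def weighted_ls(X, y, weights):
    """Solve (X^T W X) beta = X^T W y for diagonal W = diag(weights)."""
    XtW = X.T * weights
    return np.linalg.solve(XtW @ X, XtW @ y)

def simulate_mse(deltas, n=100, reps=500, seed=0):
    """Monte Carlo mean squared error of beta-hat around the population target."""
    rng = np.random.default_rng(seed)
    # For x ~ Uniform(0, 1), f(x) = x^2 and regressors (1, x):
    # E[x x^T]^{-1} E[x f(x)] = (-1/6, 1).
    target = np.array([-1.0 / 6.0, 1.0])
    sq_err = np.zeros(len(deltas))
    for _ in range(reps):
        x = rng.uniform(0.0, 1.0, size=n)
        sigma = rng.choice([0.01, 1.0], size=n)  # drawn independently of x
        y = x**2 + rng.normal(0.0, sigma)
        X = np.column_stack([np.ones(n), x])
        for j, delta in enumerate(deltas):
            beta_hat = weighted_ls(X, y, 1.0 / (sigma**2 + delta))
            sq_err[j] += np.sum((beta_hat - target) ** 2)
    return sq_err / reps

deltas = [0.0, 0.01, 1e6]  # 0: inverse-variance weights; 1e6: nearly unweighted
mse = simulate_mse(deltas)
```

Comparing the entries of `mse` across `deltas` shows how the choice of weights trades off noise reduction against sensitivity to misspecification in this toy setting.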
Collaboration
- Astronomy: Lucas Macri, Wenlong Yuan
- Statistics: Shiyuan He, Jianhua Huang, James Long
Period-Luminosity Relation for Miras in the LMC
[Figure: W IV (0 to 15) versus period ($10^0$ to $10^3$, log scale) for Cepheid Fundamental Mode, Cepheid 1st Overtone, RR Lyrae A, Miras O rich, and Miras C rich.]
[Figure: example light curve; brightness (15.0 to 18.0) versus Time (Days), 0 to 2500.]
PL Relation for Miras in M33
Estimating PL Relation requires:
- Estimating periods and luminosities accurately.
- Classifying stars.
Challenging case:
[Figure: light curve; brightness (20.5 to 22.0) versus Time (Days), 0 to 1500.]
Sinusoid Fit to LMC Mira
[Figure: light curve versus Time (Days), 0 to 2500, and folded light curve versus Phase (Days), 0 to 300; brightness axis 14.0 to 16.0.]
Fit to M33 Mira
[Figure: light curve versus Time (Days), 0 to 1500, and folded light curve versus Phase (Days), 0 to 150; brightness axis 20.5 to 22.0.]
Goal: improve on the sinusoidal model to estimate periods accurately for M33 Miras.
Gaussian Process Fit to OGLE Mira
[Figure: light curve with fit; brightness (14.5 to 16.0) versus Time (Days), 0 to 2500.]
Bayes Factors for Separating Different Classes
Bibliography
[1] Yanyuan Ma, Jeng-Min Chiou, and Naisyin Wang. Efficient semiparametric estimator for heteroscedastic partially linear models. Biometrika, 93(1):75-84, 2006.
[2] Yanyuan Ma and Liping Zhu. Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(2):305-322, 2013.
[3] A. Schwarzenberg-Czerny. Fast and statistically optimal period search in uneven sampled observations. The Astrophysical Journal Letters, 460(2):L107, 1996.
[4] Branimir Sesar, Željko Ivezić, Skyler H. Grammer, Dylan P. Morgan, Andrew C. Becker, Mario Jurić, Nathan De Lee, James Annis, Timothy C. Beers, Xiaohui Fan, et al. Light curve templates and galactic distribution of RR Lyrae stars from Sloan Digital Sky Survey Stripe 82. The Astrophysical Journal, 708(1):717, 2010.
[5] A. Udalski, M. K. Szymanski, I. Soszynski, and R. Poleski. The Optical Gravitational Lensing Experiment. Final reductions of the OGLE-III data. Acta Astronomica, 58:69-87, 2008.
[6] M. Zechmeister and M. Kürster. The generalised Lomb-Scargle periodogram. A new formalism for the floating-mean and Keplerian periodograms. Astronomy & Astrophysics, 496(2):577-584, 2009.