Sampler of Interdisciplinary Measurement Error and Complex Data Problems


Sampler of Interdisciplinary Measurement Error and Complex Data Problems
James Long
April 22, 2016

Outline
  Time Domain Astronomy and Variable Stars
  An Observation on Heteroskedasticity and Misspecified Models
  Period Estimation and Classification for M33 Miras


Periodic Variable Stars
Periodic variables: stars whose brightness variation repeats over a fixed period.
[Figure: light curve of one star, magnitude versus time (days).]
The star was observed $n = 367$ times. The data for the star are $D = \{(t_i, m_i, \sigma_i)\}_{i=1}^{n}$: brightness $m_i$ is observed at time $t_i$ with uncertainty $\sigma_i$.

Folded Light Curve of Periodic Variable
Folded light curve: brightness versus time modulo period.
[Figure: folded light curve, magnitude versus phase (days).]
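Folding can be sketched in a few lines. The example below is a minimal illustration, not the code behind the slides: the function name `fold_light_curve`, the simulated sinusoidal star, and its 3-day period are all assumptions made for the demonstration.

```python
import numpy as np

def fold_light_curve(t, period):
    """Return the phase in days, i.e. the observation time modulo the period."""
    t = np.asarray(t, dtype=float)
    return np.mod(t, period)

# Hypothetical star: sinusoidal variation with a 3-day period, observed
# n = 367 times at irregular epochs spread over 2000 days.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 2000.0, size=367))
true_period = 3.0
m = 15.7 + 0.05 * np.sin(2 * np.pi * t / true_period)

# Folding maps every epoch into [0, period); sorting by phase gives the
# points in the order they would be drawn on a folded light curve.
phase = fold_light_curve(t, true_period)
order = np.argsort(phase)
phase_sorted, m_sorted = phase[order], m[order]
```

When the folding period is correct, the scattered time series collapses onto a single smooth curve in phase; a wrong period leaves it scattered.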

The OGLE-III Survey [5]
Collected hundreds of thousands of periodic variables in the Large Magellanic Cloud.
Periodic variables belong to different classes.
Class is related to the astrophysical reason for variation.
Two examples:
  Mira variable. [Figure: light curve, magnitude versus time (days), and folded light curve, magnitude versus phase (days).]
  RR Lyrae AB variable. [Figure: light curve, magnitude versus time (days), and folded light curve, magnitude versus phase (days).]

Size of Variable Star Data Sets is Growing
  Hipparcos (1989-1993): 2712
  OGLE (1992-present): 100,000s
  DES (ongoing): 10 million
  LSST (starting 2020): 1 billion
Data sets of varying quality:
[Figure: sparsely sampled light curve, magnitude versus time (days).]


Folded Light Curve using Two Sinusoidal Models
[Figure: light curve, magnitude versus time (days), and folded light curve, magnitude versus phase (days).]

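Period Estimation for Variable Stars
Common model:
$$m_i = \beta_0 + \sum_{k=1}^{K} a_k \sin(k \omega t_i + \phi_k) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_i^2)$$
Zechmeister [6], Schwarzenberg-Czerny [3]
Maximum likelihood estimator:
$$\widehat{\omega} = \operatorname*{argmin}_{\omega} \; \min_{\phi, a, \beta_0} \; \sum_{i=1}^{n} \left( \frac{m_i - \beta_0 - \sum_{k=1}^{K} a_k \sin(k \omega t_i + \phi_k)}{\sigma_i} \right)^2$$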
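One way to carry out this minimization is a grid search over $\omega$. For fixed $\omega$ each term $a_k \sin(k \omega t_i + \phi_k)$ can be rewritten as $b_k \sin(k \omega t_i) + c_k \cos(k \omega t_i)$, so the inner minimization becomes a weighted linear least-squares problem. The sketch below assumes that approach; the function names, the frequency grid, and the simulated data are illustrative choices, not taken from the slides.

```python
import numpy as np

def harmonic_design(t, omega, K):
    """Design matrix with an intercept and K sine/cosine harmonic pairs."""
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols.append(np.sin(k * omega * t))
        cols.append(np.cos(k * omega * t))
    return np.column_stack(cols)

def weighted_rss(t, m, sigma, omega, K):
    """Inner minimization: weighted residual sum of squares at a fixed omega."""
    X = harmonic_design(t, omega, K) / sigma[:, None]  # scale rows by 1/sigma_i
    y = m / sigma
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def estimate_omega(t, m, sigma, omegas, K=1):
    """Outer minimization: pick the grid frequency with the smallest RSS."""
    rss = np.array([weighted_rss(t, m, sigma, w, K) for w in omegas])
    return omegas[np.argmin(rss)]

# Hypothetical data: 3-day period, 80 epochs over 100 days, heteroskedastic noise.
rng = np.random.default_rng(0)
n, true_period = 80, 3.0
t = np.sort(rng.uniform(0.0, 100.0, size=n))
sigma = rng.uniform(0.01, 0.05, size=n)
m = 15.0 + 0.3 * np.sin(2 * np.pi * t / true_period + 0.4) + rng.normal(0.0, sigma)

# Search periods between 2 and 5 days on a grid finer than 2*pi / (time span).
omegas = np.linspace(2 * np.pi / 5.0, 2 * np.pi / 2.0, 600)
omega_hat = estimate_omega(t, m, sigma, omegas, K=1)
period_hat = 2 * np.pi / omega_hat
```

Passing `sigma = np.ones(n)` gives the unweighted estimator, which makes it easy to compare the two weighting schemes on the same data.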

Question
Misspecified models are common and can be useful.
Heteroskedasticity in responses is common.
Typically we weight observations by the inverse of their variance.
Question: Is this weighting helpful when the model is misspecified?

Correct Model: Weighted Fit
[Figure: magnitude versus time, showing the true light curve and the estimate.]

Correct Model: Unweighted Fit
[Figure: magnitude versus time, showing the true light curve and the estimate.]

Summary
With inverse-variance weights, the fitted curve (orange line) stays close to the observations with small error (small $\sigma_i$).
This is good when the actual light curve variation is sinusoidal.
Question: What happens for misspecified models (light curves that are not actually sinusoids)?

Misspecified Model: Weighted Fit
[Figure: magnitude versus time, showing the true light curve and the estimate.]

Misspecified Model: Unweighted Fit
[Figure: magnitude versus time, showing the true light curve and the estimate.]

Application to Variable Star Period Estimation
g-band light curves of 238 bright sources in Stripe 82 (SDSS-III).
All light curves were downsampled to 10, 20, 30, and 40 observations.
This simulates the difficult period recovery settings encountered by Pan-STARRS and DES.
Compare period estimation using the weighted and unweighted estimators.

Results
Fraction of periods estimated correctly for different models ($K = 1, 2, 3$), using weights ($\Sigma^{-1}$) and no weights ($I$), with $n$ observations per light curve.

         K = 1               K = 2               K = 3
  n      Σ^{-1}    I         Σ^{-1}    I         Σ^{-1}    I
  10     0.09      0.16      0.13      0.11      0.03      0.03
  20     0.46      0.58      0.63      0.68      0.69      0.77
  30     0.64      0.78      0.71      0.82      0.82      0.86
  40     0.75      0.79      0.80      0.85      0.87      0.92

Conclusion: Ignoring heteroskedasticity can improve model fits.

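Notes
Linear model case: $x_i \sim f_X$, $\sigma_i \sim f_\sigma$, $\sigma_i \perp x_i$, and
$$y_i = f(x_i) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_i^2)$$
Target parameter (best linear approximation):
$$\beta \equiv \operatorname*{argmin}_{\beta} E[(f(x) - x^T \beta)^2] = E[x x^T]^{-1} E[x f(x)]$$
The weighted estimator $\widehat{\beta} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y$ is not efficient.
Adaptively choose optimal weights of the form $(\sigma_i^2 + [\ldots])^{-1}$, where the added term is missing from the transcript.
Close connections with Y. Ma [1, 2].
Many more details: Parameter Estimation for Misspecified Regression Models with Heteroskedastic Errors, http://arxiv.org/abs/1509.05810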
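A small simulation can illustrate why inverse-variance weighting may hurt under misspecification. The setup below is an assumption made for illustration, not the experiment from the slides or the paper: $f(x) = x^2$ fitted by a straight line, $x$ uniform on $(0, 1)$, and a two-point distribution for $\sigma_i$ drawn independently of $x_i$. Because the weights are independent of $x$, both estimators target the same best linear approximation $\beta$, so they can be compared by mean squared error.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    """True regression function (assumed); a linear fit to it is misspecified."""
    return x ** 2

# Best linear approximation beta = E[x x^T]^{-1} E[x f(x)] for the design
# (1, x) with x ~ Uniform(0, 1), using E[x] = 1/2, E[x^2] = 1/3, E[x^3] = 1/4.
Exx = np.array([[1.0, 1 / 2], [1 / 2, 1 / 3]])
Exf = np.array([1 / 3, 1 / 4])
beta_star = np.linalg.solve(Exx, Exf)

def fit(X, y, w):
    """Weighted least squares with weights w, solved via row scaling."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

def one_replicate(n=200):
    x = rng.uniform(0.0, 1.0, size=n)
    # Assumed noise levels: 10% very precise points, 90% moderately precise,
    # drawn independently of x.
    sigma = np.where(rng.uniform(size=n) < 0.1, 0.001, 0.05)
    y = f(x) + rng.normal(0.0, sigma)
    X = np.column_stack([np.ones(n), x])
    b_wls = fit(X, y, 1.0 / sigma ** 2)  # Sigma^{-1} weights
    b_ols = fit(X, y, np.ones(n))        # identity weights
    return b_wls, b_ols

reps = 400
errs = np.array([[np.sum((b - beta_star) ** 2) for b in one_replicate()]
                 for _ in range(reps)])
mse_wls, mse_ols = errs.mean(axis=0)
```

In this setting the $\Sigma^{-1}$ weights concentrate almost all weight on the few very precise observations, so the weighted fit effectively uses a much smaller sample to average out the model error. That is the mechanism the slides point to, but the size of the effect depends on the assumed noise levels.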


Collaboration
Astronomy: Lucas Macri, Wenlong Yuan
Statistics: Shiyuan He, Jianhua Huang, James Long

Period-Luminosity Relation for Miras in the LMC
[Figure: period-luminosity diagram, vertical axis labeled "W IV", horizontal axis period on a log scale (10^0 to 10^3), with classes Cepheid Fundamental Mode, Cepheid 1st Overtone, RR Lyrae A, Miras O rich, and Miras C rich.]
[Figure: light curve, magnitude versus time (days).]

PL Relation for Miras in M33
Estimating the PL relation requires:
  Estimating periods and luminosities accurately.
  Classifying stars.
Challenging case:
[Figure: sparsely sampled light curve, magnitude versus time (days).]

Sinusoid Fit to LMC Mira
[Figure: light curve, magnitude versus time (days), and folded light curve, magnitude versus phase (days).]

Fit to M33 Mira
[Figure: light curve, magnitude versus time (days), and folded light curve, magnitude versus phase (days).]
Goal: improve on the sinusoidal model to estimate periods accurately for Miras in M33.

Gaussian Process Fit to OGLE Mira
[Figure: light curve, magnitude versus time (days).]

Bayes Factors for Separating Different Classes

Bibliography
[1] Yanyuan Ma, Jeng-Min Chiou, and Naisyin Wang. Efficient semiparametric estimator for heteroscedastic partially linear models. Biometrika, 93(1):75-84, 2006.
[2] Yanyuan Ma and Liping Zhu. Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(2):305-322, 2013.
[3] A. Schwarzenberg-Czerny. Fast and statistically optimal period search in uneven sampled observations. The Astrophysical Journal Letters, 460(2):L107, 1996.
[4] Branimir Sesar, Željko Ivezić, Skyler H. Grammer, Dylan P. Morgan, Andrew C. Becker, Mario Jurić, Nathan De Lee, James Annis, Timothy C. Beers, Xiaohui Fan, et al. Light curve templates and Galactic distribution of RR Lyrae stars from Sloan Digital Sky Survey Stripe 82. The Astrophysical Journal, 708(1):717, 2010.
[5] A. Udalski, M. K. Szymanski, I. Soszynski, and R. Poleski. The Optical Gravitational Lensing Experiment. Final reductions of the OGLE-III data. Acta Astronomica, 58:69-87, 2008.
[6] M. Zechmeister and M. Kürster. The generalised Lomb-Scargle periodogram: a new formalism for the floating-mean and Keplerian periodograms. Astronomy & Astrophysics, 496(2):577-584, 2009.