Stochastic Modeling of Astronomical Time Series

Similar documents
Infow and Outfow in the Broad Line Region of AGN

Time Delay Estimation: Microlensing

Optical variability of quasars: damped random walk Željko Ivezić, University of Washington with Chelsea MacLeod, U.S.

Optical short-time variability properties of AGN

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Image Processing in Astronomy: Current Practice & Challenges Going Forward

Additional Keplerian Signals in the HARPS data for Gliese 667C from a Bayesian re-analysis

Robust and accurate inference via a mixture of Gaussian and terrors

(High Energy) Astrophysics with Novel Observables

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

On characterizing the variability properties of X-ray light curves from active galaxies

Long term multi-wavelength variability studies of the blazar PKS

Measurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Outline Challenges of Massive Data Combining approaches Application: Event Detection for Astronomical Data Conclusion. Abstract

Three data analysis problems

Introduction to Machine Learning CMU-10701

Astronomical image reduction using the Tractor

Large Scale Bayesian Inference

Calibrating Environmental Engineering Models and Uncertainty Analysis

A Bayesian method for the analysis of deterministic and stochastic time series

Sampler of Interdisciplinary Measurement Error and Complex Data Problems

Fourier analysis of AGN light curves: practicalities, problems and solutions. Iossif Papadakis

Bayesian Dynamic Linear Modelling for. Complex Computer Models

Variable and Periodic Signals in Astronomy

Week 5 Quantitative Analysis of Financial Markets Characterizing Cycles

Data Analysis Methods

Variable and Periodic Signals in Astronomy

CHARACTERIZATION OF AGN VARIABILITY IN THE OPTICAL AND NIR REGIMES

Markov chain Monte Carlo methods in atmospheric remote sensing

Accounting for Calibration Uncertainty in Spectral Analysis. David A. van Dyk

17 : Markov Chain Monte Carlo

Bayesian Methods for Machine Learning

Time Series: Theory and Methods

Two Statistical Problems in X-ray Astronomy

Exploring the Origin of the BH Mass Scaling Relations

43 and 86 GHz VLBI Polarimetry of 3C Adrienne Hunacek, MIT Mentor Jody Attridge MIT Haystack Observatory August 12 th, 2004

JINA Observations, Now and in the Near Future

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Markov Chain Monte Carlo

Massive Event Detection. Abstract

STA 4273H: Statistical Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

Embedding Computer Models for Stellar Evolution into a Coherent Statistical Analysis

Development of Stochastic Artificial Neural Networks for Hydrological Prediction

Overlapping Astronomical Sources: Utilizing Spectral Information

arxiv:astro-ph/ v1 14 Sep 2005

TAKEHOME FINAL EXAM e iω e 2iω e iω e 2iω

The Large Synoptic Survey Telescope

Introduction to Probabilistic Graphical Models: Exercises

Spectral variability of AGN Dragana Ilić

Default Priors and Effcient Posterior Computation in Bayesian

How occultations improve asteroid shape models

SDSS Data Management and Photometric Quality Assessment

Hierarchical Bayesian Modeling

A short introduction to INLA and R-INLA

16 : Approximate Inference: Markov Chain Monte Carlo

Lecture 6: Bayesian Inference in SDE Models

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Fundamental Issues in Bayesian Functional Data Analysis. Dennis D. Cox Rice University

TIME SERIES ANALYSIS. Forecasting and Control. Wiley. Fifth Edition GWILYM M. JENKINS GEORGE E. P. BOX GREGORY C. REINSEL GRETA M.

Sequential Monte Carlo Methods for Bayesian Computation

If we want to analyze experimental or simulated data we might encounter the following tasks:

Accounting for Calibration Uncertainty in Spectral Analysis. David A. van Dyk

Nonparametric Bayesian Methods (Gaussian Processes)

A Review of AGN Variability Studies using Survey Data: Prospects for LSST

Exploiting Sparse Non-Linear Structure in Astronomical Data

IV. Covariance Analysis

Multimodal Nested Sampling

Riemann Manifold Methods in Bayesian Statistics

Detecting the cold neutral gas in young radio galaxies. James Allison Triggering Mechanisms for AGN

Conditional Least Squares and Copulae in Claims Reserving for a Single Line of Business

Fitting Narrow Emission Lines in X-ray Spectra

System identification and control with (deep) Gaussian processes. Andreas Damianou

Sparse Stochastic Inference for Latent Dirichlet Allocation

Lecture 9. Time series prediction

Miscellany : Long Run Behavior of Bayesian Methods; Bayesian Experimental Design (Lecture 4)

Data-Intensive Statistical Challenges in Astrophysics

Quasars in the epoch of reioniza1on

The Impact of Positional Uncertainty on Gamma-Ray Burst Environment Studies

HIERARCHICAL MODELING OF THE RESOLVE SURVEY

Expectation propagation for signal detection in flat-fading channels

Distributed Bayesian Learning with Stochastic Natural-gradient EP and the Posterior Server

Computational statistics

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

A variational radial basis function approximation for diffusion processes

Why do we care? Measurements. Handling uncertainty over time: predicting, estimating, recognizing, learning. Dealing with time

Bayesian Calibration of Simulators with Structured Discretization Uncertainty

arxiv: v1 [astro-ph.co] 6 Mar 2012

Strong gravitational lenses in the 2020s

Sensor Fusion: Particle Filter

Session 5B: A worked example EGARCH model

Lecture 7 and 8: Markov Chain Monte Carlo

Microlensing Constraints on Quasar X-ray Emission Regions

Reverberation Mapping in the Era of MOS and Time-Domain Surveys: from SDSS to MSE

Recurrent Latent Variable Networks for Session-Based Recommendation

An introduction to Markov Chain Monte Carlo techniques

EL2520 Control Theory and Practice

Time Series Analysis. Eric Feigelson Penn State University. Summer School in Statistics for Astronomers XIII June 2017

Tutorial on Approximate Bayesian Computation

Transcription:

Credit: ESO/Kornmesser Stochastic Modeling of Astronomical Time Series Brandon C. Kelly (UCSB, CGE Fellow, bckelly@physics.ucsb.edu) Aneta Siemiginowska (CfA), Malgosia Sobolewska (CfA), Andy Becker (UW), Tommaso Treu (UCSB), Matt Malkan (UCLA), Anna Pancoast (UCSB), Jong-Hak Woo (Seoul), Jill Bechtold (Arizona)

Aperiodic and Quasi-Periodic Lightcurves (Time Series of Brightness) Variable Stars Protostars Geha+(2003) Flaherty+(2013) Mushotzky+(2011) Quasars

Current and Future Data Sets

Current and Future Data Sets SDSS Stripe 82 (~1998-2008) ugriz (5-d lightcurves), down to r ~ 20, ~60 epochs, ~10,000 quasars

Current and Future Data Sets SDSS Stripe 82 (~1998-2008) ugriz (5-d lightcurves), down to r ~ 20, ~60 epochs, ~10,000 quasars Pan-STARRS Medium-Deep Survey (2010-2013) griz, r ~ 23, 300-400 epochs, ~ 7,000 quasars

Current and Future Data Sets SDSS Stripe 82 (~1998-2008) ugriz (5-d lightcurves), down to r ~ 20, ~60 epochs, ~10,000 quasars Pan-STARRS Medium-Deep Survey (2010-2013) griz, r ~ 23, 300-400 epochs, ~ 7,000 quasars DES Supernovae Survey (2012-2017) griz, r ~ 25, ~ 100 epochs, ~ 15,000 quasars Overlaps with Stripe 82 and 2 MDS fields

Current and Future Data Sets SDSS Stripe 82 (~1998-2008) ugriz (5-d lightcurves), down to r ~ 20, ~60 epochs, ~10,000 quasars Pan-STARRS Medium-Deep Survey (2010-2013) griz, r ~ 23, 300-400 epochs, ~ 7,000 quasars DES Supernovae Survey (2012-2017) griz, r ~ 25, ~ 100 epochs, ~ 15,000 quasars Overlaps with Stripe 82 and 2 MDS fields Large Synoptic Survey Telescope (LSST) (2021-2031?) ugrizy, r ~ 24.5, ~ 50-200 epochs (more in deep drilling fields ), millions of quasars

The Data Analysis Challenge: Aperiodic Lightcurves Flux Flux Time Time

The Data Analysis Challenge: Aperiodic Lightcurves Mrk 766 Vaughan & Fabian (2003) What variability features can we measure for quasar lightcurves?

Quantifying Variability with the Power Spectral Density (PSD) Variance / Frequency Frequency [arbitrary] Time [arbitrary]

Quantifying Variability with the Power Spectral Density (PSD) Variance / Frequency Frequency [arbitrary] Time [arbitrary]

Quantifying Variability with the Power Spectral Density (PSD) Variance / Frequency Frequency [arbitrary] Time [arbitrary]

Quantifying Variability with the Power Spectral Density (PSD) Variance / Frequency Frequency [arbitrary] Time [arbitrary]

Disadvantages of Traditional Non-parameteric Tools for Quantifying Aperiodic Variability Lomb-Scargle Periodogram Measurements True Mock Data True Empirical Structure Function

Disadvantages of Traditional Non-parameteric Tools for Quantifying Aperiodic Variability Lomb-Scargle Periodogram Measurements True Mock Data True Empirical Structure Function

Tools for Characterizing Aperiodic (Quasar) Variability: What should they do?

Tools for Characterizing Aperiodic (Quasar) Variability: What should they do? Handle irregular/arbitrary sampling patterns and measurement errors

Tools for Characterizing Aperiodic (Quasar) Variability: What should they do? Handle irregular/arbitrary sampling patterns and measurement errors Produce interpretable results: Connection to physical models Connection to features in the power spectrum

Tools for Characterizing Aperiodic (Quasar) Variability: What should they do? Handle irregular/arbitrary sampling patterns and measurement errors Produce interpretable results: Connection to physical models Connection to features in the power spectrum Fast & scalable to massive time domain surveys By the end of LSST we will have billions of multiwavelength lightcurves with ~ 50-1000+ epochs

Tools for Characterizing Aperiodic (Quasar) Variability: What should they do? Handle irregular/arbitrary sampling patterns and measurement errors Produce interpretable results: Connection to physical models Connection to features in the power spectrum Fast & scalable to massive time domain surveys By the end of LSST we will have billions of multiwavelength lightcurves with ~ 50-1000+ epochs Handle multiwavelength/multivariate time series Account for correlations/time lags among lightcurves in different bands

Two approaches to (stochastic) modeling of real lightcurves: Frequency Domain and Time Domain

Monte-Carlo Methods (Done+1992,Uttley +2002,Emmanaloupolous 2010,2013) Extremely flexible, limited only by ability to do simulation Can be computationally expensive Reliance on χ 2 may not provide optimal use of information in lightcurve

Monte-Carlo Methods (Done+1992,Uttley +2002,Emmanaloupolous 2010,2013) Extremely flexible, limited only by ability to do simulation Can be computationally expensive Reliance on χ 2 may not provide optimal use of information in lightcurve

Monte-Carlo Methods (Done+1992,Uttley +2002,Emmanaloupolous 2010,2013) Extremely flexible, limited only by ability to do simulation Can be computationally expensive Reliance on χ 2 may not provide optimal use of information in lightcurve

Monte-Carlo Methods (Done+1992,Uttley +2002,Emmanaloupolous 2010,2013) Extremely flexible, limited only by ability to do simulation Can be computationally expensive Reliance on χ 2 may not provide optimal use of information in lightcurve

Gaussian Processes (Rybicki & Press 1992, Kelly+ 2009,2011, Miller+2010) loglik = log 1 2 (y µ)t 1 (y µ) Likelihood-based approach, enables Bayesian inference Statistically powerful, but limited by Gaussian assumption In general, computationally expensive (O(n 3 ))

Gaussian Processes (Rybicki & Press 1992, Kelly+ 2009,2011, Miller+2010) = loglik = log 0 @ n n 1 1 2 (y µ)t 1 (y µ) ij =Cov(y(t i ),y(t j )) A Z 1 = PSD(f)e 2 if t i 1 t j df Likelihood-based approach, enables Bayesian inference Statistically powerful, but limited by Gaussian assumption In general, computationally expensive (O(n 3 ))

Simple and fast tool: First order continuous autoregressive process (CAR(1), Kelly+2009) Lightcurve White Noise Power Spectrum dl(t) = dt Characteristic Time Scale (L(t) µ)+ dw (t) LC Mean 1/τ Frequency σ 2 Solution provides likelihood function, enables maximum-likelihood or Bayesian inference Fitting is fast! Only O(n) operations to evaluate likelihood function (e.g., Kelly+2009, Kozlowski +2010) or do interpolation

Fitting the CAR(1) model: Illustration τ=5 days σ=1 τ=1 day σ=1

Fitting the CAR(1) model: Illustration τ=5 days σ=1 τ=1 day σ=1

CAR(1) does well on optical lightcurves with typical sampling of current surveys MacLeod+(2010), ~10,000 quasars from stripe 82 Kozlowski+(2010), Ogle-III Kelly+(2009), AGN Watch

Trends involving the CAR(1) process parameters Kelly+(in prep)

Using the CAR(1) model to find quasars MacLeod+(2011) Quasars Based on Stripe82 variable point sources Stars Quasars Butler & Bloom (2011) Stars Works because quasars have more correlated variability on longer time scales compared to stars

Current Work: More Flexible Stochastic Models dl p (t) dt + p dl p 1 (t) dt p 1 +...+ 1 L(t) = q d q (t) dt q + q 1 d q 1 (t) dt q 1 +...+ (t) Continuous-time autoregressive moving average models (CARMA(p,q)) provide flexible modeling of variability Power spectrum is a rational function P (!) = 2 P q k=1 k(i!) k 2 P p j=1 j(i!) j 2

Example: Quasar vs Variable Stars

Example: Quasar vs Variable Stars Quasar Kelly+(in prep) LPV, AGB LPV, RGB 68% Probability Intervals

Calculation of the Likelihood function CARMA models have a state space representation: Measured Lightcurve Moving Average Coefficients Measurement Error y i = b T X i + i, i N(0,V i ) X i = A i X i 1 + u i, u i N(0, i ) Vector of lightcurve derivatives Depends autoregressive coefficients Innovation (source of stochasticity) Likelihood calculated from Kalman Recursions in O(n) operations: p(y 1,...,y n,, 2 )=p(y 1,, 2 ) ny i=2 p(y i y i 1,...,y 1,,, 2 )

Computational Techniques Use Robust Adaptive Metropolis Algorithm (Vihola 2012) Likelihood space often multimodal, so also do parallel tempering Can be slow (~minutes for ~ 100,000 iterations for ~ 100 epochs) due to complicated posterior space Exploring alternative parameterizations for improving efficiency Sampling methods/global optimization algorithms can be efficiently parallelized, exploit highperformance computing, GPUs?

Credit: ESO/Kornmesser Main Takeaway Point: Stochastic modeling provides a useful and powerful framework to quantify quasar variability that can be applied to lightcurves of arbitrary sampling and with measurement error.

Time Domain Stochastic Modeling: Outstanding issues and directions for future work

Time Domain Stochastic Modeling: Outstanding issues and directions for future work Multivariate lightcurves: Explicit modeling of time lags and correlation structure between different wavelengths Vector-valued CARMA(p,q) processes may provide general framework

Time Domain Stochastic Modeling: Outstanding issues and directions for future work Multivariate lightcurves: Explicit modeling of time lags and correlation structure between different wavelengths Vector-valued CARMA(p,q) processes may provide general framework Moving beyond a single stationary Gaussian process: Direct modeling of stochastic process + flares, other state changes (Sobolewska+, in prep) Using alternatives to Gaussian noise (Emmanaloupolous+2013) Non-stationary and non-linear models

Time Domain Stochastic Modeling: Outstanding issues and directions for future work Multivariate lightcurves: Explicit modeling of time lags and correlation structure between different wavelengths Vector-valued CARMA(p,q) processes may provide general framework Moving beyond a single stationary Gaussian process: Direct modeling of stochastic process + flares, other state changes (Sobolewska+, in prep) Using alternatives to Gaussian noise (Emmanaloupolous+2013) Non-stationary and non-linear models Uttley+(2005)

Time Domain Stochastic Modeling: Outstanding issues and directions for future work Multivariate lightcurves: Explicit modeling of time lags and correlation structure between different wavelengths Vector-valued CARMA(p,q) processes may provide general framework Moving beyond a single stationary Gaussian process: Direct modeling of stochastic process + flares, other state changes (Sobolewska+, in prep) Using alternatives to Gaussian noise (Emmanaloupolous+2013) Non-stationary and non-linear models

Time Domain Stochastic Modeling: Outstanding issues and directions for future work Multivariate lightcurves: Explicit modeling of time lags and correlation structure between different wavelengths Vector-valued CARMA(p,q) processes may provide general framework Moving beyond a single stationary Gaussian process: Direct modeling of stochastic process + flares, other state changes (Sobolewska+, in prep) Using alternatives to Gaussian noise (Emmanaloupolous+2013) Non-stationary and non-linear models Building astrophysically-motivated stochastic models Stochastic partial differential calculations + accretion flow models?