AN ASTRONOMER'S INTRODUCTION TO GAUSSIAN PROCESSES. Dan Foreman-Mackey // github.com/dfm // dfm.io

1 AN ASTRONOMER'S INTRODUCTION TO GAUSSIAN PROCESSES Dan Foreman-Mackey CCPP@NYU // github.com/dfm // dfm.io

2 Photo credit: Flickr user lizphung (CC BY-NC-ND)

3 github.com/dfm/gp

4 gaussianprocess.org/gpml Rasmussen & Williams

5 I write code for good & astrophysics.

6 I (probably) do data science.

7 not data science. Photo credit James Silvester silvesterphoto.tumblr.com

8 data science. Photo credit: Flickr user Marcin Wichary (CC BY)

9 Data Science: PHYSICS, DATA, $p(\mathrm{data} \mid \mathrm{physics})$

10 I work with Kepler data.

11 I get really passionate about NOISE

12 [Figure: relative flux vs. time [KBJD], KIC 33000]

13 Kepler 3

14 Kepler 3

15 [Figure: relative flux vs. time [KBJD], Kepler 3]

16 HACKS* (*ALL HACKS and I'm righteous!)

17 Why not model all the things? with, for example, a Gaussian Process

18 1. The power of correlated noise.

19 $y = m\,x + b$

20 The true covariance of the observations.

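21 Let's assume that the noise is independent Gaussian with known variance:

$$\log p(y \mid x, \sigma, \theta) = -\frac{1}{2} \sum_{n=1}^{N} \left[ \frac{[y_n - f_\theta(x_n)]^2}{\sigma_n^2} + \log\left(2\pi\sigma_n^2\right) \right]$$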
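The independent-noise likelihood on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the straight-line model $f_\theta(x) = m\,x + b$ from slide 19; the function name `lnlike_independent` and the synthetic data are made up for the example and are not from the talk.

```python
import numpy as np

def lnlike_independent(theta, x, y, yerr):
    """Log-likelihood for independent Gaussian noise with known variances."""
    m, b = theta
    resid = y - (m * x + b)   # y_n - f_theta(x_n)
    var = yerr ** 2           # sigma_n^2
    return -0.5 * np.sum(resid ** 2 / var + np.log(2 * np.pi * var))

# Synthetic data (hypothetical values, for illustration only)
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(-5, 5, 50))
yerr = 0.1 + 0.4 * rng.uniform(size=50)
y = 0.5 * x - 0.3 + yerr * rng.normal(size=50)

print(lnlike_independent((0.5, -0.3), x, y, yerr))
```

The value is highest near the parameters used to generate the data, which is what makes it usable as an objective for fitting or sampling.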

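22 Or equivalently

$$\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^\mathsf{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi$$

23 Or equivalently, with $C$ the data covariance:

$$\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^\mathsf{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi$$

24 Or equivalently, with $C$ the data covariance and $r$ the residual vector:

$$\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^\mathsf{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi$$

$$r = \begin{bmatrix} y_1 - f_\theta(x_1) & y_2 - f_\theta(x_2) & \cdots & y_N - f_\theta(x_N) \end{bmatrix}^\mathsf{T}$$

25 Or equivalently, with $C$ the data covariance and $r$ the residual vector:

$$\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^\mathsf{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi$$

$$r = \begin{bmatrix} y_1 - f_\theta(x_1) & y_2 - f_\theta(x_2) & \cdots & y_N - f_\theta(x_N) \end{bmatrix}^\mathsf{T}$$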
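The equivalence between the sum over data points and the matrix form can be checked numerically. The sketch below assumes a diagonal covariance $C = \mathrm{diag}(\sigma_n^2)$; the function names `lnlike_matrix` and `lnlike_sum` and the random residuals are illustrative only.

```python
import numpy as np

def lnlike_matrix(r, C):
    """-1/2 r^T C^-1 r - 1/2 log det C - N/2 log(2 pi) for a dense covariance C."""
    _, logdet = np.linalg.slogdet(C)
    alpha = np.linalg.solve(C, r)   # C^-1 r without forming the inverse
    return -0.5 * (r @ alpha + logdet + len(r) * np.log(2 * np.pi))

def lnlike_sum(r, sigma):
    """The same quantity written as a sum over independent data points."""
    return -0.5 * np.sum(r ** 2 / sigma ** 2 + np.log(2 * np.pi * sigma ** 2))

# Hypothetical residuals and uncertainties
rng = np.random.default_rng(0)
sigma = rng.uniform(0.5, 2.0, size=10)
r = sigma * rng.normal(size=10)
C = np.diag(sigma ** 2)

print(lnlike_matrix(r, C), lnlike_sum(r, sigma))
```

The two printed numbers agree to floating-point precision when $C$ is diagonal; the matrix form is the one that generalizes to correlated noise.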

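26 Linear least-squares.

$$\begin{bmatrix} m \\ b \end{bmatrix} = S\, A^\mathsf{T} C^{-1} y, \qquad S = \left[ A^\mathsf{T} C^{-1} A \right]^{-1}$$

$[m, b]^\mathsf{T}$ is the maximum-likelihood solution and (in this case only) the mean of the posterior; $S$ is the posterior covariance.

$$A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix}, \qquad C = \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_N^2 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

27 Linear least-squares.

$$\begin{bmatrix} m \\ b \end{bmatrix} = S\, A^\mathsf{T} C^{-1} y, \qquad S = \left[ A^\mathsf{T} C^{-1} A \right]^{-1}$$

$[m, b]^\mathsf{T}$ is the maximum-likelihood solution and (in this case only) the mean of the posterior; $S$ is the posterior covariance, assuming uniform priors.

$$A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix}, \qquad C = \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_N^2 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$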
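The closed-form solution on this slide translates almost directly into NumPy. This is a sketch under the slide's assumptions (a straight-line model and a known covariance $C$, diagonal or dense); the function name `linear_least_squares` and the noise-free test data are illustrative.

```python
import numpy as np

def linear_least_squares(x, y, C):
    """Return ([m, b], S) for the line y = m x + b with data covariance C."""
    A = np.vander(x, 2)               # design matrix with columns [x, 1]
    Cinv_A = np.linalg.solve(C, A)    # C^-1 A without forming C^-1
    Cinv_y = np.linalg.solve(C, y)    # C^-1 y
    S = np.linalg.inv(A.T @ Cinv_A)   # S = [A^T C^-1 A]^-1
    w = S @ (A.T @ Cinv_y)            # [m, b] = S A^T C^-1 y
    return w, S

# Noise-free synthetic check (hypothetical values)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
C = 0.01 * np.eye(len(x))
w, S = linear_least_squares(x, y, C)
print(w)
print(S)
```

Nothing in the function requires $C$ to be diagonal, so the same call works once a full covariance matrix is available.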

28 truth

29 posterior constraint? [figure label: truth]

30 posterior constraint? [figure label: truth]

31 But we know the true covariance matrix.

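32

$$\log p(y \mid x, \sigma, \theta) = -\frac{1}{2}\, r^\mathsf{T} C^{-1} r - \frac{1}{2} \log\det C - \frac{N}{2} \log 2\pi$$

33 Linear least-squares.

$$\begin{bmatrix} m \\ b \end{bmatrix} = S\, A^\mathsf{T} C^{-1} y, \qquad S = \left[ A^\mathsf{T} C^{-1} A \right]^{-1}$$

$[m, b]^\mathsf{T}$ is the maximum-likelihood solution and (in this case only) the mean of the posterior; $S$ is the posterior covariance.

$$A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_N & 1 \end{bmatrix}, \qquad C = \begin{bmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_N^2 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$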

34 Before

35 After

36 the responsible scientist. Photo credit: Flickr user MyFWCmedia (CC BY-ND)

37 So we're finished, right?

38 In The Real World, we never know the noise.

39 Just gotta model it!

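41 Model it!

$$\log p(y \mid x, \sigma, \theta, \alpha) = -\frac{1}{2}\, r^\mathsf{T} K_\alpha^{-1} r - \frac{1}{2} \log\det K_\alpha + C$$

where $C$ is an additive constant and

$$[K_\alpha]_{ij} = \sigma_i^2\, \delta_{ij} + k_\alpha(x_i, x_j)$$

for example

$$k_\alpha(x_i, x_j) = a \exp\left( -\frac{[x_i - x_j]^2}{2\, l^2} \right)$$

drop-in replacement for your current log-likelihood function!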
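To make the "drop-in replacement" concrete, here is a hedged sketch of a log-likelihood for a straight line plus correlated noise. It assumes the exponential-squared example kernel from this slide with the amplitude `a` multiplying the exponential (as in the code on slide 62), and it samples the hyperparameters in log space; the function name `lnlike_gp`, the parameter ordering, and the synthetic data are illustrative choices, not the talk's implementation. The additive constant is written out explicitly as $-\frac{N}{2}\log 2\pi$.

```python
import numpy as np

def lnlike_gp(params, x, y, yerr):
    """Line model plus GP noise: params = (m, b, ln a, ln l)."""
    m, b, ln_a, ln_l = params
    a, l = np.exp(ln_a), np.exp(ln_l)
    r = y - (m * x + b)                                  # residual vector
    dx = x[:, None] - x[None, :]
    # K_ij = sigma_i^2 delta_ij + a exp(-(x_i - x_j)^2 / (2 l^2))
    K = np.diag(yerr ** 2) + a * np.exp(-0.5 * dx ** 2 / l ** 2)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (r @ np.linalg.solve(K, r) + logdet + len(x) * np.log(2 * np.pi))

# Hypothetical data for illustration
rng = np.random.default_rng(7)
x = np.sort(rng.uniform(-5, 5, 40))
yerr = 0.3 * np.ones_like(x)
y = 0.5 * x - 0.3 + yerr * rng.normal(size=len(x))

print(lnlike_gp((0.5, -0.3, np.log(0.1), np.log(1.0)), x, y, yerr))
```

In the limit of vanishing amplitude the kernel term disappears and the function reduces to the independent-noise likelihood, which is a useful sanity check before handing it to a sampler.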

42 emcee: the MCMC Hammer // it's hammer time! // arxiv.org/abs/ // dan.iel.fm/emcee

43 [Figure: pairwise parameter plot with axis labels m, b, ln a, ln s]

46 Prediction?

49 take a deep breath. Photo credit: Flickr user kpjas (CC BY-SA)

50 The formal Gaussian process.

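51 The model.

$$\log p(y \mid x, \sigma, \theta, \alpha) = -\frac{1}{2}\, [y - f_\theta(x)]^\mathsf{T}\, K_\alpha(x, \sigma)^{-1}\, [y - f_\theta(x)] - \frac{1}{2} \log\det K_\alpha(x, \sigma) - \frac{N}{2} \log 2\pi$$

where

$$[K_\alpha(x, \sigma)]_{ij} = \sigma_i^2\, \delta_{ij} + k_\alpha(x_i, x_j)$$

drop-in replacement for your current log-likelihood function!

52 The model.

$$y \sim \mathcal{N}\left( f_\theta(x),\, K_\alpha(x, \sigma) \right)$$

where

$$[K_\alpha(x, \sigma)]_{ij} = \sigma_i^2\, \delta_{ij} + k_\alpha(x_i, x_j)$$

drop-in replacement for your current log-likelihood function!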

53 the data are drawn from one HUGE* Gaussian (*the dimension is the number of data points.)

54 A generative model: $y \sim \mathcal{N}\left( f_\theta(x),\, K_\alpha(x, \sigma) \right)$, a probability distribution for $y$ values

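55 Likelihood samples.

$$k_\alpha(x_i, x_j) = \exp\left( -\frac{[x_i - x_j]^2}{2\, \ell^2} \right)$$

[Figure: samples for the exponential squared kernel; panels l = 0.5, l = 1, l = ; horizontal axis t]

56 Likelihood samples.

$$k_\alpha(x_i, x_j) = \exp\left( -\frac{[x_i - x_j]^2}{2\, \ell^2} \right)$$

[Figure: samples for the exponential squared kernel; panels l = 0.5, l = 1, l = ; horizontal axis t]

57 Likelihood samples.

$$k_\alpha(x_i, x_j) = \left[ 1 + \frac{\sqrt{3}\, |x_i - x_j|}{\ell} \right] \exp\left( -\frac{\sqrt{3}\, |x_i - x_j|}{\ell} \right) \cos\left( \frac{2\pi\, |x_i - x_j|}{P} \right)$$

[Figure: samples for the quasi-periodic kernel; panels l = , P = 3; l = 3, P = 3; l = 3, P = ; horizontal axis t]

58 Likelihood samples.

$$k_\alpha(x_i, x_j) = \left[ 1 + \frac{\sqrt{3}\, |x_i - x_j|}{\ell} \right] \exp\left( -\frac{\sqrt{3}\, |x_i - x_j|}{\ell} \right) \cos\left( \frac{2\pi\, |x_i - x_j|}{P} \right)$$

[Figure: samples for the quasi-periodic kernel; panels l = , P = 3; l = 3, P = 3; l = 3, P = ; horizontal axis t]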
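Samples like the ones on these slides can be drawn by factorizing the kernel matrix and multiplying by white noise. The sketch below assumes a zero mean function and reads the quasi-periodic kernel as a Matérn-3/2 envelope times a cosine; the function names, the `jitter` term added for numerical stability, and the grid of times are assumptions made for the example.

```python
import numpy as np

def exp_squared(dt, ell):
    """Exponential-squared kernel as a function of the time lag dt."""
    return np.exp(-0.5 * dt ** 2 / ell ** 2)

def quasi_periodic(dt, ell, period):
    """Matern-3/2 envelope times a cosine (assumed form of the slide's kernel)."""
    u = np.sqrt(3.0) * np.abs(dt) / ell
    return (1.0 + u) * np.exp(-u) * np.cos(2 * np.pi * dt / period)

def sample_prior(kernel, t, n_samples, rng, jitter=1e-6, **hyper):
    """Draw n_samples zero-mean functions evaluated at times t."""
    dt = t[:, None] - t[None, :]
    K = kernel(dt, **hyper) + jitter * np.eye(len(t))  # jitter keeps K positive definite
    L = np.linalg.cholesky(K)                          # K = L L^T
    return (L @ rng.normal(size=(len(t), n_samples))).T

t = np.linspace(0, 10, 100)
rng = np.random.default_rng(1)
smooth = sample_prior(exp_squared, t, 3, rng, ell=1.0)
periodic = sample_prior(quasi_periodic, t, 3, rng, ell=3.0, period=3.0)
print(smooth.shape, periodic.shape)
```

Each row of the returned array is one draw from the "HUGE Gaussian"; plotting rows against `t` for different `ell` and `period` values shows how the hyperparameters control smoothness and periodicity.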

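59 The conditional distribution

$$y \sim \mathcal{N}\left( f_\theta(x),\, K_\alpha(x, \sigma) \right)$$

$$\begin{bmatrix} y \\ y_\star \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} f_\theta(x) \\ f_\theta(x_\star) \end{bmatrix},\, \begin{bmatrix} K_{\alpha,x,x} & K_{\alpha,x,\star} \\ K_{\alpha,\star,x} & K_{\alpha,\star,\star} \end{bmatrix} \right)$$

$$y_\star \mid y \sim \mathcal{N}\left( K_{\alpha,\star,x}\, K_{\alpha,x,x}^{-1}\, [y - f_\theta(x)] + f_\theta(x_\star),\; K_{\alpha,\star,\star} - K_{\alpha,\star,x}\, K_{\alpha,x,x}^{-1}\, K_{\alpha,x,\star} \right)$$

just see Rasmussen & Williams (Chapter )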
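The conditional mean and covariance on this slide can be computed directly. This is a minimal sketch, assuming an exponential-squared kernel with amplitude `a` and length scale `l`, a zero mean function by default, and that the measurement variances enter only the training block $K_{\alpha,x,x}$; the names `exp_squared` and `predict` and the toy data are hypothetical.

```python
import numpy as np

def exp_squared(x1, x2, a, l):
    """Kernel matrix between two sets of inputs."""
    dx = x1[:, None] - x2[None, :]
    return a * np.exp(-0.5 * dx ** 2 / l ** 2)

def predict(x, y, yerr, x_star, a, l, mean=lambda x: np.zeros_like(x)):
    """Mean and covariance of y_star given y."""
    K_xx = exp_squared(x, x, a, l) + np.diag(yerr ** 2)   # training block, with noise
    K_sx = exp_squared(x_star, x, a, l)                   # cross block
    K_ss = exp_squared(x_star, x_star, a, l)              # test block
    alpha = np.linalg.solve(K_xx, y - mean(x))
    mu = K_sx @ alpha + mean(x_star)
    cov = K_ss - K_sx @ np.linalg.solve(K_xx, K_sx.T)
    return mu, cov

# Toy data (hypothetical)
x = np.arange(6.0)
y = np.sin(x)
yerr = 1e-3 * np.ones_like(x)

mu, cov = predict(x, y, yerr, np.linspace(0, 5, 50), a=1.0, l=1.0)
print(mu.shape, cov.shape)
```

With small error bars the predictive mean passes close to the observed points, and far from the data the predictive variance returns to the kernel amplitude `a`.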

60 What's the catch? Kepler = Big Data (by some definition). Note: I hate myself for this slide too.

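61 Computational complexity.

$$\log p(y \mid x, \sigma, \theta, \alpha) = -\frac{1}{2}\, [y - f_\theta(x)]^\mathsf{T}\, K_\alpha(x, \sigma)^{-1}\, [y - f_\theta(x)] - \frac{1}{2} \log\det K_\alpha(x, \sigma) - \frac{N}{2} \log 2\pi$$

compute factorization // evaluate log-det // apply inverse

naïvely: $\mathcal{O}(N^3)$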
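The three steps named on this slide (factorize, evaluate the log-determinant, apply the inverse) can be sketched with a Cholesky factorization and checked against NumPy's direct routines. The kernel matrix, the added diagonal term, and the random vector below are made-up test inputs.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 200))
# Hypothetical kernel matrix: exponential-squared plus a diagonal noise term
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 0.1 * np.eye(len(x))
y = rng.normal(size=len(x))

# 1. Compute the factorization K = L L^T; this is the O(N^3) step.
L = np.linalg.cholesky(K)

# 2. Evaluate the log-determinant from the diagonal of L: log det K = 2 sum(log L_ii).
logdet = 2 * np.sum(np.log(np.diag(L)))

# 3. Apply the inverse: y^T K^-1 y = z^T z with L z = y.
#    np.linalg.solve does not exploit the triangular structure of L;
#    scipy.linalg.solve_triangular would make this step O(N^2).
z = np.linalg.solve(L, y)
quad = z @ z

print(logdet, quad)
```

Once the factorization is available, the other two steps are cheap by comparison, which is why the cost of the factorization dominates for large $N$.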

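62

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def simple_gp_lnlike(x, y, yerr, a, s):
        # Pairwise differences between the input coordinates
        r = x[:, None] - x[None, :]
        # Covariance: measurement variances plus exponential-squared kernel
        C = np.diag(yerr**2) + a*np.exp(-0.5*r**2/(s*s))
        # Cholesky factorization of the covariance matrix
        factor, flag = cho_factor(C)
        # log det C is twice the sum of the log of the factor's diagonal
        logdet = 2*np.sum(np.log(np.diag(factor)))
        return -0.5 * (np.dot(y, cho_solve((factor, flag), y))
                       + logdet + len(x)*np.log(2*np.pi))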

63 [Figure: $\log_{10}$ runtime/seconds vs. $\log_{10} N$]

64 [Figure panels: exponential squared; quasi-periodic]

65 [Figure panels: exponential squared; quasi-periodic]

66 [Figure panels: exponential squared; quasi-periodic; horizontal axis t]

67 "Aren't kernel matrices Hierarchical Off-Diagonal Low-Rank?" (no astronomer ever)

68 $K^{(3)} = K_3\, K_2\, K_1\, K_0$ [Figure legend: full rank; low-rank; identity matrix; zero matrix] Ambikasaran, DFM, et al. (arxiv: )
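One way to see why off-diagonal low-rank structure is plausible for kernel matrices is to look at the singular values of an off-diagonal block. The sketch below is only an illustration on an exponential-squared kernel with sorted, evenly spaced inputs; it is not the factorization from Ambikasaran, DFM, et al., and the grid, length scale, and tolerance are arbitrary choices.

```python
import numpy as np

# Sorted inputs and an exponential-squared kernel matrix (unit length scale)
x = np.linspace(0, 10, 200)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# Off-diagonal block: covariance between the first and second halves of the data
off_diag = K[:100, 100:]

# Count singular values above a relative tolerance to get the numerical rank
sv = np.linalg.svd(off_diag, compute_uv=False)
rank = int(np.sum(sv > 1e-10 * sv[0]))

print("block shape:", off_diag.shape, "numerical rank:", rank)
```

The numerical rank comes out far smaller than the block dimension, so the block can be stored and applied as a product of thin matrices; applying the same idea recursively to the diagonal blocks is what the hierarchical scheme exploits.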

69 github.com/dfm/george

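70

    import numpy as np
    from george import GaussianProcess, kernels

    # API as written on the slide; it may differ in current george releases.
    def george_lnlike(x, y, yerr, a, s):
        kernel = a * kernels.RBFKernel(s)
        gp = GaussianProcess(kernel)
        # Factorize the covariance matrix for these inputs and error bars
        gp.compute(x, yerr)
        return gp.lnlikelihood(y)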

71 [Figure: $\log_{10}$ runtime/seconds vs. $\log_{10} N$]

72 [Figure: $\log_{10}$ runtime/seconds vs. $\log_{10} N$]

73 and short cadence data? one month of data in 4 seconds

74 3. Applications to Kepler data.

75 Parameter Recovery

76 [Figure: KIC injection; horizontal axis: time since transit]

77 [Figure: KIC injection; horizontal axis: time since transit]

78 [Figure: KIC injection; horizontal axis: time since transit]

79 after median-filter detrending (figure generated: github.com/dfm/tr)

80 [Figure: KIC injection; horizontal axis: time since transit]

81 [Figure: KIC injection; horizontal axis: time since transit]

82 using Gaussian process noise model (figure generated: github.com/dfm/tr)

83 [Figure: horizontal axis: time [days]] Ambikasaran, DFM, et al. (arxiv: )

84 [Figure: pairwise parameter plot with axis labels q, q, t, r/r⋆, f⋆, b]

85 KOI with Bekki Dawson, et al.

88 Stellar Rotation with Ruth Angus

89 figures from Ruth Angus (Oxford)

90 figures from Ruth Angus (Oxford)

91 4. Conclusions & Summary.

92 Correlated noise matters. A Gaussian process provides a drop-in replacement likelihood function, if you can compute it.

93 Resources: gaussianprocess.org/gpml // github.com/dfm/gp // github.com/dfm/george
