Nonparametric Estimation of Luminosity Functions
1 Nonparametric Estimation of Luminosity Functions Chad Schafer, Department of Statistics, Carnegie Mellon University, cschafer@stat.cmu.edu
2 Luminosity Functions The luminosity function gives the number density of objects (quasars, galaxies, galaxy clusters, ...) as a function of luminosity (L) or absolute magnitude (M). The 2-d luminosity function allows for dependence on redshift (z). Schechter form (Schechter, 1976):
$$\phi_S(L)\,dL = n_* \left(\frac{L}{L_*}\right)^{\alpha} \exp\left(-\frac{L}{L_*}\right) d\left(\frac{L}{L_*}\right)$$
or, equivalently, in absolute magnitude,
$$\phi_S(M)\,dM \propto 10^{-0.4(\alpha+1)(M - M_*)} \exp\left(-10^{-0.4(M - M_*)}\right) dM$$
3 Schechter (1976)
4 From Binney and Merrifield (1998): This formula was initially motivated by a simple model of galaxy formation (Press and Schechter, 1974), but it has proved to have a wider range of application than originally envisaged... With larger, deeper surveys, the limitations of the simple Schechter function start to become apparent. Particular problems arise at the faint end, and with evolution of the luminosity function with redshift. Blanton et al. (2003) model the 2-d galaxy luminosity function as
$$\Phi(M, z) = n\,10^{0.4(z - z_0)P} \sum_{k=1}^{K} \frac{\Phi_k}{\sqrt{2\pi\sigma_M^2}} \exp\left[-\frac{(M - M_k + (z - z_0)Q)^2}{2\sigma_M^2}\right]$$
5 Quasars Among the furthest observable objects. Quasar PKS 2349, observed by the Hubble Space Telescope in 1996. Image courtesy of NASA.
6 Absolute magnitude vs. redshift for a sample of 15,057 quasars from the Sloan Digital Sky Survey (Richards et al. 2006).
7 Luminosity Functions as Probability Density Functions Integrating a luminosity function over a bin gives the count in that bin:
$$\text{count with } -22 \leq M \leq -21 = \int_{-22}^{-21} \phi(M)\,dM$$
This is proportional to the probability that a randomly chosen such object satisfies $-22 \leq M \leq -21$. Hence, estimating $\phi(M)$ is analogous to density estimation.
8 Density Estimation Basic problem: estimate f, where we assume that $X_1, X_2, \ldots, X_n \sim f$, i.e., for $a \leq b$,
$$P(a \leq X_i \leq b) = \int_a^b f(x)\,dx$$
Parametric approach: assume f belongs to some family of densities (e.g., Gaussian), and estimate the unknown parameters via maximum likelihood. Nonparametric approach: the estimate is built by smoothing the available data; slower rate of convergence, but less bias. The histogram is a simple example of a nonparametric density estimator.
9 Density Estimation Sample of size 303 (latitude of initial point). Note that $\bar{x} = 20.18$, $s = \ldots$
10 Parametric Estimator (density vs. latitude of initial point). Assuming Gaussian, the estimates are $\hat{\mu} = \ldots$ and $\hat{\sigma} = \ldots$
11 Parametric Estimator (density vs. latitude of initial point). Compared with the histogram estimate of the density.
12 Advantages of Parametric Form Relatively simple to fit via maximum likelihood; compact functional form; easy calculation of probabilities; smaller errors, on average, provided the assumption regarding the density is correct.
13 Errors in Density Estimators Error at one value x: $(\hat{f}(x) - f(x))^2$. Error accumulated over all x: $\mathrm{ISE} = \int (\hat{f}(x) - f(x))^2\,dx$. Mean integrated squared error:
$$\mathrm{MISE} = E\left[\int (\hat{f}(x) - f(x))^2\,dx\right]$$
14 The Bias-Variance Tradeoff Can write
$$\mathrm{MISE} = \int b^2(x)\,dx + \int v(x)\,dx$$
where the bias at x is $b(x) = E(\hat{f}(x)) - f(x)$ and the variance at x is $v(x) = V(\hat{f}(x))$.
15 The Bias-Variance Tradeoff (plot: squared bias, variance, and their sum MSE as a function of the amount of smoothing; the optimal smoothing lies between too little and too much).
16 Histograms Create bins $B_1, B_2, \ldots, B_m$ of width h. Define
$$\hat{f}_n(x) = \sum_{j=1}^{m} \frac{\hat{p}_j}{h} I(x \in B_j)$$
where $\hat{p}_j$ is the proportion of observations in $B_j$. Note that $\int \hat{f}_n(x)\,dx = 1$. A histogram is an example of a nonparametric density estimator. Note that it is controlled by the tuning parameter h, the bin width.
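The histogram estimator above can be sketched directly from its formula. The data here are a synthetic standard normal sample, not the latitude data from the slides:

```python
import numpy as np

def hist_density(data, m, lo, hi):
    """Histogram density estimator: f_n(x) = sum_j (p_j / h) * I(x in B_j)."""
    h = (hi - lo) / m                       # bin width, the tuning parameter
    counts, edges = np.histogram(data, bins=m, range=(lo, hi))
    p = counts / len(data)                  # proportion of observations per bin
    def f_n(x):
        j = np.clip(((np.asarray(x) - lo) // h).astype(int), 0, m - 1)
        return p[j] / h
    return f_n

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 500)
f_n = hist_density(data, m=20, lo=-4.0, hi=4.0)

# The estimate integrates to 1 (the 500 draws almost surely fall in [-4, 4])
grid = np.linspace(-4.0, 4.0, 8001)
area = np.trapz(f_n(grid), grid)
print(round(area, 2))
```

Varying `m` (equivalently h) reproduces the over- and under-smoothing seen on the next slides.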
17 Histograms (density vs. latitude of initial point). Seems like too many bins, i.e., h is too small.
18 Histograms (density vs. latitude of initial point). Seems like too few bins, i.e., h is too large.
19 Histograms (density vs. latitude of initial point). Seems better.
20 Histograms (density vs. latitude of initial point). Clear that the normal curve is not a good fit.
21 Histograms For histograms,
$$\mathrm{MISE} = E\left[\int (\hat{f}(x) - f(x))^2\,dx\right] \approx \frac{h^2}{12} \int (f'(u))^2\,du + \frac{1}{nh}$$
The value of h that minimizes this is
$$h_* = \frac{1}{n^{1/3}} \left(\frac{6}{\int (f'(u))^2\,du}\right)^{1/3}$$
and then $\mathrm{MISE} \approx C_2\,n^{-2/3}$.
22 Kernel Density Estimation We can improve on histograms by introducing smoothing:
$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$$
Here, $K(\cdot)$ is the kernel function, itself a smooth density.
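A direct implementation of this estimator, on synthetic data; the Gaussian kernel is one of the choices introduced on the next slide:

```python
import numpy as np

def kde(x, data, h):
    """f_hat(x) = (1/(n h)) * sum_i K((x - X_i) / h), with Gaussian kernel K."""
    u = (np.asarray(x)[..., None] - data) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).sum(axis=-1) / (data.size * h)

rng = np.random.default_rng(3)
data = rng.normal(0.0, 1.0, 300)
grid = np.linspace(-6.0, 6.0, 2001)
f_hat = kde(grid, data, h=0.3)

# f_hat is a proper density: nonnegative and integrating to (essentially) 1
print(round(float(np.trapz(f_hat, grid)), 3))
```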
23 Kernels A kernel is any smooth function K such that $K(x) \geq 0$ and
$$\int K(x)\,dx = 1, \quad \int x K(x)\,dx = 0, \quad \sigma_K^2 \equiv \int x^2 K(x)\,dx > 0$$
Some commonly used kernels are the following:
the boxcar kernel: $K(x) = \frac{1}{2} I(|x| < 1)$,
the Gaussian kernel: $K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$,
the Epanechnikov kernel: $K(x) = \frac{3}{4}(1 - x^2) I(|x| < 1)$,
the tricube kernel: $K(x) = \frac{70}{81}(1 - |x|^3)^3 I(|x| < 1)$.
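The defining properties can be checked numerically for the three compactly supported kernels (the 70/81 on the tricube is its standard normalizing constant, which the transcription had dropped):

```python
import numpy as np

t = np.linspace(-1.0, 1.0, 200001)   # support of the compact kernels

kernels = {
    "boxcar":       0.5 * np.ones_like(t),
    "epanechnikov": 0.75 * (1.0 - t**2),
    "tricube":      (70.0 / 81.0) * (1.0 - np.abs(t)**3) ** 3,
}

for name, K in kernels.items():
    area   = np.trapz(K, t)           # should be 1
    mean   = np.trapz(t * K, t)       # should be 0, by symmetry
    sigma2 = np.trapz(t**2 * K, t)    # should be positive
    print(name, round(area, 4), sigma2 > 0)
```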
24 Kernels (plots of the four kernels).
25 Kernel Density Estimation Places a smoothed-out lump of mass of size 1/n over each data point. The shape of the lumps is controlled by $K(\cdot)$; their width is controlled by h.
26 Kernel Density Estimation (density vs. latitude of initial point). h chosen too large.
27 Kernel Density Estimation (density vs. latitude of initial point). h chosen too small.
28 Kernel Density Estimation (density vs. latitude of initial point). Just right.
29 Kernel Density Estimation
$$\mathrm{MISE} \approx \frac{1}{4} c_1^2 h^4 \int (f''(x))^2\,dx + \frac{\int K^2(x)\,dx}{nh}$$
The optimal bandwidth is
$$h_* = \left(\frac{c_2}{c_1^2 A(f)\,n}\right)^{1/5}$$
where $c_1 = \int x^2 K(x)\,dx$, $c_2 = \int K(x)^2\,dx$, and $A(f) = \int (f''(x))^2\,dx$. Then $\mathrm{MISE} \approx C_3\,n^{-4/5}$.
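Plugging a Gaussian kernel and a Gaussian truth $f = N(0, \sigma^2)$ into the bandwidth formula gives the familiar normal-reference rule $h_* = (4\sigma^5/(3n))^{1/5}$. The sketch below verifies this by computing $c_1$, $c_2$, and $A(f)$ numerically (the values of $\sigma$ and n are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

sigma, n = 1.3, 500

K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
c1 = quad(lambda u: u**2 * K(u), -np.inf, np.inf)[0]       # = 1
c2 = quad(lambda u: K(u) ** 2, -np.inf, np.inf)[0]         # = 1/(2 sqrt(pi))

def fpp(x):
    # second derivative of the N(0, sigma^2) density
    return (x**2 / sigma**4 - 1.0 / sigma**2) * \
           np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

A = quad(lambda x: fpp(x) ** 2, -np.inf, np.inf)[0]        # = 3/(8 sqrt(pi) sigma^5)

h_star = (c2 / (c1**2 * A * n)) ** 0.2
print(round(h_star, 6), round((4.0 * sigma**5 / (3.0 * n)) ** 0.2, 6))  # the two agree
```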
30 The Bias-Variance Tradeoff For many smoothers,
$$\mathrm{MISE} \approx c_1 h^4 + \frac{c_2}{nh}$$
which is minimized at $h_* = O(n^{-1/5})$. Hence $\mathrm{MISE} = O(n^{-4/5})$, whereas for parametric problems $\mathrm{MISE} = O(1/n)$.
31 Comparisons So, for a parametric form, if it is correct, $\mathrm{MISE} \approx C_1/n$. For histograms, $\mathrm{MISE} \approx C_2\,n^{-2/3}$. For estimators based on smoothing, $\mathrm{MISE} \approx C_3\,n^{-4/5}$.
32 Comparisons (plot of f(x): truth vs. closest normal). Truth is standard normal.
33 Comparisons (MSE vs. sample size n, for the normal and kernel estimators). Assuming normality has its advantages.
34 Comparisons (plot of f(x): truth vs. closest normal). Truth is a t-distribution with 5 degrees of freedom.
35 Comparisons (MSE vs. sample size n, for the normal and kernel estimators). The normal assumption costs, even for moderate sample sizes.
36 Cross-Validation An objective, data-driven choice of h is possible:
$$\mathrm{MISE} = E\left[\int (\hat{f}(x) - f(x))^2\,dx\right] = E\left[\int \hat{f}(x)^2\,dx\right] - 2E\left[\int \hat{f}(x) f(x)\,dx\right] + \int f(x)^2\,dx = J(h) + \int f(x)^2\,dx$$
Unbiased estimator of J(h):
$$\hat{J}(h) = \int \hat{f}_n(x)^2\,dx - \frac{2}{n} \sum_{i=1}^{n} \hat{f}_{(-i)}(X_i)$$
where $\hat{f}_{(-i)}$ is the density estimator obtained after removing the i-th observation.
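A sketch of this least-squares cross-validation criterion for the Gaussian-kernel estimator. Rather than n refits, the leave-one-out values use the identity $\hat{f}_{(-i)}(X_i) = (n\hat{f}(X_i) - K(0)/h)/(n-1)$; the data and bandwidth grid are synthetic:

```python
import numpy as np

def kde(x, data, h):
    u = (np.asarray(x)[..., None] - data) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)).sum(axis=-1) / (data.size * h)

def J_hat(data, h, grid):
    """J_hat(h) = int f_hat(x)^2 dx - (2/n) sum_i f_hat_{(-i)}(X_i)."""
    n = data.size
    term1 = np.trapz(kde(grid, data, h) ** 2, grid)
    # leave-one-out without refitting: subtract each point's own kernel mass
    K0 = 1.0 / np.sqrt(2.0 * np.pi)
    loo = (n * kde(data, data, h) - K0 / h) / (n - 1)
    return term1 - 2.0 * loo.mean()

rng = np.random.default_rng(4)
data = rng.normal(0.0, 1.0, 200)
grid = np.linspace(-6.0, 6.0, 2001)
hs = np.linspace(0.05, 1.5, 30)
scores = np.array([J_hat(data, h, grid) for h in hs])
h_cv = hs[np.argmin(scores)]
print(h_cv)
```

The minimizer should sit near the $n^{-1/5}$-scale optimum, though LSCV is known to be variable from sample to sample.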
37 Local Density Estimation The idea of local modelling (recall local polynomial regression) extends to density estimation. The local likelihood density estimator, developed independently by Loader (1996) and Hjort and Jones (1996), allows for fitting nonparametric models of this type.
38 Local Density Estimation The model assumes that for u near x,
$$\log(f(u)) \approx a_0 + a_1(u - x) + \cdots + a_p(u - x)^p$$
Thus
$$\log(\hat{f}(u)) = \hat{a}_0 + \hat{a}_1(u - x) + \cdots + \hat{a}_p(u - x)^p$$
is the local (near x) estimate of $\log(f(\cdot))$. The estimates are obtained by maximizing a kernel-weighted likelihood, repeated for a grid of values x.
39 Log Density A few local linear estimates of $\log(\hat{f}(\cdot))$.
40 Log Density Smoothed together to get $\log(\hat{f}(\cdot))$.
41 Back to Luminosity Functions... Simple parametric forms are not realistic for large data sets. One could build more complex parametric forms, e.g., Blanton et al. (2003):
$$\Phi(M, z) = n\,10^{0.4(z - z_0)P} \sum_{k=1}^{K} \frac{\Phi_k}{\sqrt{2\pi\sigma_M^2}} \exp\left[-\frac{(M - M_k + (z - z_0)Q)^2}{2\sigma_M^2}\right]$$
Nonparametric approaches need to account for observational limitations, and to distinguish physical variation from observational variation.
42 Truncation Suppose that $X_1, X_2, \ldots, X_m \sim f$, but the data are truncated: we only get to see those $X_i$ such that $X_i \leq k$. Formally, we get to observe $X_1, X_2, \ldots, X_n \sim f_k$, where
$$f_k(x) = \begin{cases} \dfrac{f(x)}{\int_{-\infty}^{k} f(u)\,du}, & \text{if } x \leq k \\[1ex] 0, & \text{if } x > k \end{cases}$$
Clearly, truncation must be taken into account when using the observed data to estimate f.
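One way to take truncation into account is to put the renormalized density $f_k$ directly into the likelihood, so each observed point contributes $f(x)/\int_{-\infty}^k f$. The sketch below does this for a one-sidedly truncated $N(\mu, 1)$ sample; the Gaussian family and all numbers are illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

k = 0.5                                     # truncation point: only X <= k observed
rng = np.random.default_rng(5)
full = rng.normal(1.0, 1.0, 20000)          # true mu = 1
obs = full[full <= k]                       # truncation discards the rest

def negloglik(mu):
    # each observed point contributes f(x; mu) / F(k; mu)
    return -(norm.logpdf(obs, mu, 1).sum() - obs.size * norm.logcdf(k, mu, 1))

mu_hat = minimize_scalar(negloglik, bounds=(-3.0, 4.0), method="bounded").x
naive = obs.mean()
print(mu_hat, naive)   # mu_hat is near 1; the naive mean is biased low
```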
43 Absolute magnitude vs. redshift for the sample of 15,057 quasars from the Sloan Digital Sky Survey (Richards et al. 2006).
44 1/V_max Estimator (Schmidt, 1968; Eales, 1993) Observe objects of absolute magnitudes $M_1, M_2, \ldots, M_n$. Divide the absolute-magnitude range into bins. Estimate $\phi$ over the j-th bin as
$$\hat{\phi}(M) = \sum_{i \in \text{bin } j} \frac{1}{V_{\max}(i)}, \quad M \in \text{bin } j$$
where $V_{\max}(i)$ is the volume over which an object of magnitude $M_i$ can be detected by the survey. This is a weighted histogram.
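A sketch of this weighted histogram. The `vmax` function here is a made-up monotone stand-in for the survey-specific detectability volume; in a real analysis it comes from the survey's flux limit and geometry, and the magnitudes are synthetic:

```python
import numpy as np

def vmax(M):
    # hypothetical detectability volume (Gpc^3): brighter objects visible farther
    return 10.0 ** (-0.4 * (M + 26.0)) + 0.05

def phi_vmax(M, edges):
    """phi_hat over bin j = sum over objects i in bin j of 1 / Vmax(i)."""
    phi = np.zeros(len(edges) - 1)
    idx = np.digitize(M, edges) - 1          # bin index of each object
    for j in range(len(phi)):
        phi[j] = np.sum(1.0 / vmax(M[idx == j]))
    return phi

rng = np.random.default_rng(6)
M = rng.uniform(-28.0, -22.0, 1000)          # synthetic absolute magnitudes
edges = np.linspace(-28.0, -22.0, 13)
phi = phi_vmax(M, edges)
print(phi.shape, bool(np.all(phi >= 0.0)))
```

Dividing each bin by its width would put the estimate on a per-magnitude scale, as in the plotted luminosity functions later in the deck.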
45 Nonparametric Maximum Likelihood (NPMLE) Lynden-Bell (1971): considered the case of one-sided truncation. Efron and Petrosian (1999): two-sided (double) truncation. Know that the NPMLE will assign all probability to observed values, but the probabilities will not be equal. NPMLE estimators for this case are implemented in the R package DTDA.
46 NPMLE Examples (data value vs. observation number). Vertical lines are truncation ranges, dots are data values.
47 NPMLE Examples (data value vs. observation number). The probability assigned to each by the NPMLE.
48 NPMLE Examples (data value vs. observation number). Differential truncation.
49 NPMLE Examples (data value vs. observation number). Both of the intervals $(-\infty, 2)$ and $(2, \infty)$ have probability ...
50 NPMLE Examples (data value vs. observation number). An example where it performs poorly.
51 NPMLE Examples (data value vs. observation number).
52 NPMLE Examples (data value vs. observation number). Example from Efron and Petrosian (1999).
53 NPMLE Examples (data value vs. observation number).
54 Efron and Petrosian (1999) Quasar Example
55 Efron and Petrosian (1999) Quasar Example
56 Efron and Petrosian (1999) Quasar Example The SEF fit is a special exponential family model that assumes
$$\log f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$$
This model can be fit via maximum likelihood. BUT: each of these fits assumes independence between redshift and luminosity, i.e., no evolution of the luminosity function with redshift.
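The special exponential family can be fit by maximum likelihood with a numerically computed normalizing constant ($\beta_0$ is absorbed into it). The sketch below uses synthetic, untruncated $N(0, 0.5^2)$ data, whose log-density has $\beta_2 = -2$ and $\beta_1 = \beta_3 = 0$; all specifics are illustrative, and this omits the truncation weighting of the real quasar analysis:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.5, 400)                 # true log-density: -2 x^2 + const
grid = np.linspace(-2.5, 2.5, 2001)           # numerical support

def eta(beta, t):
    return beta[0] * t + beta[1] * t**2 + beta[2] * t**3

def negloglik(beta):
    # beta0 is absorbed into log Z, the normalizing constant
    logZ = np.log(np.trapz(np.exp(eta(beta, grid)), grid))
    return x.size * logZ - np.sum(eta(beta, x))

fit = minimize(negloglik, x0=np.zeros(3), method="Nelder-Mead")
b1, b2, b3 = fit.x
print(b1, b2, b3)   # expect roughly (0, -2, 0)
```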
57 Absolute magnitude vs. redshift for the sample of 15,057 quasars from the Sloan Digital Sky Survey (Richards et al. 2006).
58 The Challenge: Model Selection Goal: given the truncated data shown in the scatter plot, estimate the bivariate density $\phi(z, M)$. Too strong: assume independence, $\phi(z, M) = f(z)g(M)$, or assume a parametric form $\phi(z, M, \theta)$. Too weak: use a fully nonparametric approach.
59 A Semiparametric Model (Schafer, 2007) We utilize a semiparametric model which places minimal assumptions on the form of the bivariate density and greatly diminishes bias due to truncation and boundary effects. Decompose the bivariate density $\phi(z, M)$ as
$$\log(\phi(z, M)) = f(z) + g(M) + h(z, M, \theta)$$
where f(z) and g(M) are estimated using nonparametric methods and $h(z, M, \theta)$ takes an assumed parametric form; it is intended to model the dependence between the two random variables. We use $h(z, M, \theta) = \theta z M$, but another choice could be made, e.g., one motivated by a physical theory.
60 Fitting the Model: The Motivating (Naive) Idea Assume independence and write $\phi(z, M) = f(z)g(M)$, where $f(\cdot)$ is the density for redshift and $g(\cdot)$ is the density for absolute magnitude. (Figure: the truncation region A in the (redshift, absolute magnitude) plane, with the cross-section A(4.0, ·) marked.) Let A denote the region outside of which the data are truncated, and let $A(z, \cdot)$ denote the cross-section of A at z. Assume, for the moment, that $g(\cdot)$ is known.
61 The available data allow for estimation of
$$f^*(z) \equiv \int_{A(z,\cdot)} \phi(z, M)\,dM = f(z) \int_{A(z,\cdot)} g(M)\,dM$$
for all z, since it is the density of the observable redshifts. Assuming for the moment that $g(\cdot)$ were known, it is possible to estimate $f(\cdot)$ using
$$\hat{f}(z) = \hat{f}^*(z) \Big/ \int_{A(z,\cdot)} g(M)\,dM.$$
Starting with an initial guess at $g(\cdot)$, we could iterate between assuming $g(\cdot)$ is known and estimating $f(\cdot)$, and vice versa. But the variance of this estimator could be huge: consider the behavior of $\hat{f}(z)$ for z where $\int_{A(z,\cdot)} g(M)\,dM$ is small.
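On a discrete grid, the alternating scheme looks like the sketch below. The truncation region, grids, and the true f and g are all invented, and the "data" is the exact truncated density rather than a finite sample, so the iteration should recover f and g (up to normalization) without the sampling variance discussed above:

```python
import numpy as np

nz, nM = 40, 50
z = np.linspace(0.1, 4.0, nz)
M = np.linspace(-30.0, -24.0, nM)
# toy truncation region: at redshift z, only M < -23 - 0.5 z is observable
A = M[None, :] < (-23.0 - 0.5 * z[:, None])

f = np.exp(-0.5 * ((z - 1.5) / 0.8) ** 2); f /= f.sum()    # true redshift density
g = np.exp(-0.5 * ((M + 27.0) / 1.5) ** 2); g /= g.sum()   # true magnitude density
phi_obs = np.where(A, f[:, None] * g[None, :], 0.0)
phi_obs /= phi_obs.sum()             # density of the *observable* (z, M) pairs

f_hat = np.full(nz, 1.0 / nz)        # initial guesses
g_hat = np.full(nM, 1.0 / nM)
for _ in range(300):
    # f*(z) = f(z) * (mass of g over A(z, .)); divide it out, then renormalize
    f_hat = phi_obs.sum(axis=1) / (A * g_hat[None, :]).sum(axis=1)
    f_hat /= f_hat.sum()
    g_hat = phi_obs.sum(axis=0) / (A * f_hat[:, None]).sum(axis=0)
    g_hat /= g_hat.sum()

print(np.max(np.abs(f_hat - f)), np.max(np.abs(g_hat - g)))
```

With noisy finite-sample estimates of $\hat{f}^*$ in place of `phi_obs.sum(axis=1)`, the division by a small observed mass is exactly what inflates the variance.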
62 Depicting $A(1.5, \cdot)$ in the (redshift, absolute magnitude) plane.
63 (Figure: assumed $g(\cdot)$ vs. absolute magnitude, split into observed and unobserved portions.) 24% of the area under the assumed $g(\cdot)$ falls in $A(1.5, \cdot)$.
64 Percent of the area under $g(\cdot)$ in $A(z, \cdot)$, as a function of z.
65 Estimated redshift density (vs. redshift). A smooth $\hat{f}^*(\cdot)$ does not lead to a smooth $\hat{f}(\cdot)$.
66 Local Density Estimation Instead, we write
$$\log(f^*(z)) = \log(f(z)) + \log\left(\int_{A(z,\cdot)} g(M)\,dM\right)$$
and treat $\log\left(\int_{A(z,\cdot)} g(M)\,dM\right)$ as an offset term in a model for $\log(f^*(z))$, and consider only forms for $f(\cdot)$ which are controlled by smoothing parameters chosen by the user. The local likelihood density estimator, developed independently by Loader (1996) and Hjort and Jones (1996), allows for fitting nonparametric models of this type; the offset is easily incorporated.
67 M-estimators This formulation allows the estimation problem to be couched as an M-estimator, i.e., choose the parameters of the local models to maximize the criterion
$$\sum_{i=1}^{n} \psi(z_i, M_i)$$
This leads to estimates of standard errors, etc. The parametric component $h(z, M, \theta)$ is also easily included in the offset. Weights can be used, which allows for a varying selection function:
$$\sum_{i=1}^{n} w_i\,\psi(z_i, M_i)$$
68 Bandwidth Selection via Cross-Validation One ideal choice of the smoothing parameters would minimize the integrated mean squared error
$$E\left[\int_A \left(\hat{\phi}(z, M) - \phi(z, M)\right)^2 dz\,dM\right].$$
An unbiased estimator of the integrated mean squared error (up to a constant) is
$$\int_A \hat{\phi}^2(z, M)\,dz\,dM - \frac{2}{n} \sum_{j=1}^{n} \hat{\phi}_{(-j)}(z_j, M_j)$$
where $\hat{\phi}_{(-j)}(z_j, M_j)$ denotes the estimate found when omitting the j-th data pair from the analysis; this can be efficiently estimated here.
69 Absolute magnitude vs. redshift for the sample of 15,057 quasars from the Sloan Digital Sky Survey.
70 Same plot, with dots scaled by the value of the selection function for that observation.
71 Key Aspects of the Approach Allows for the removal of the independence assumption, without assuming a parametric form. The iterative algorithm is not computationally burdensome and must converge to a unique estimate. Statistical theory leads to good approximations to standard errors for all estimated quantities. The cross-validation estimator of integrated mean squared error is easily calculated, leading to good bandwidth selection. Natural incorporation of the selection function.
72 (Figure: LSCV criterion vs. $\lambda_z$, for $\lambda_M = 0.15, 0.16, 0.17, 0.18, 0.19$.) Finding the optimal bandwidths.
73 Estimate of the bivariate density, i.e., the 2-d luminosity function (absolute magnitude vs. redshift).
74 (Figure: $\log(\rho(z))$ for $M < \ldots$, in Mpc$^{-3}$, vs. redshift.) Marginal density for redshift.
75 Slices of the bivariate estimate, $\Phi(M_i)$ (Mpc$^{-3}$ mag$^{-1}$) vs. $M_i$, at z = 0.49, 1.25, 2.01, 2.8, 3.75, and ..., compared with Richards et al. (2006).
76 Estimates based on simulated data: $\Phi(M)$ (Mpc$^{-3}$ mag$^{-1}$) vs. M, in panels at z = 1, z = ..., z = 3, and z = ...
77 Nonparametric Bayesian Approach Kelly et al. (2008): model the 2-d luminosity function as a mixture of Gaussians. The parameterized form allows for Bayesian inference, with natural uncertainty measures for any function of the luminosity function.
78 Figure 3(a) from Kelly et al. (2008).
79 Conclusion Luminosity functions as probability densities. The potential of nonparametric density estimation. Moving beyond the independence assumption and binning.
80 References
Binney and Merrifield (1998). Galactic Astronomy. Princeton University Press.
Blanton et al. (2003). ApJ, 592.
Eales (1993). ApJ.
Efron and Petrosian (1999). JASA.
Kelly et al. (2008). ApJ.
Lynden-Bell (1971). MNRAS.
Press and Schechter (1974). ApJ, 187.
Richards et al. (2006). AJ, 131.
Schafer (2007). ApJ, 661.
Schechter (1976). ApJ, 203.
Schmidt (1968). ApJ, 151.
Takeuchi et al. (2000). ApJS, 129.
81 Acknowledgements Peter Freeman, Chris Genovese, and Larry Wasserman, Department of Statistics, CMU. Sloan Digital Sky Survey. Research supported by NSF grants #... and #... For more information, see cschafer/bivtrunc, or contact Chad at cschafer@stat.cmu.edu.
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More information41903: Introduction to Nonparametrics
41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific
More informationDESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA
Statistica Sinica 18(2008), 515-534 DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Kani Chen 1, Jianqing Fan 2 and Zhezhen Jin 3 1 Hong Kong University of Science and Technology,
More informationA Large-Sample Approach to Controlling the False Discovery Rate
A Large-Sample Approach to Controlling the False Discovery Rate Christopher R. Genovese Department of Statistics Carnegie Mellon University Larry Wasserman Department of Statistics Carnegie Mellon University
More informationApproximate Bayesian Computation for Astrostatistics
Approximate Bayesian Computation for Astrostatistics Jessi Cisewski Department of Statistics Yale University October 24, 2016 SAMSI Undergraduate Workshop Our goals Introduction to Bayesian methods Likelihoods,
More informationData Analysis Methods
Data Analysis Methods Successes, Opportunities, and Challenges Chad M. Schafer Department of Statistics & Data Science Carnegie Mellon University cschafer@cmu.edu The Astro/Data Science Community LSST
More informationLECTURE NOTE #3 PROF. ALAN YUILLE
LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.
More informationDensity Estimation (II)
Density Estimation (II) Yesterday Overview & Issues Histogram Kernel estimators Ideogram Today Further development of optimization Estimating variance and bias Adaptive kernels Multivariate kernel estimation
More informationECON 721: Lecture Notes on Nonparametric Density and Regression Estimation. Petra E. Todd
ECON 721: Lecture Notes on Nonparametric Density and Regression Estimation Petra E. Todd Fall, 2014 2 Contents 1 Review of Stochastic Order Symbols 1 2 Nonparametric Density Estimation 3 2.1 Histogram
More informationMaximum Likelihood Estimation. only training data is available to design a classifier
Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationA Novel Nonparametric Density Estimator
A Novel Nonparametric Density Estimator Z. I. Botev The University of Queensland Australia Abstract We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationAdditive Isotonic Regression
Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive
More informationGALAXY GROUPS IN THE THIRD DATA RELEASE OF THE SLOAN DIGITAL SKY SURVEY
The Astrophysical Journal, 630:759 763, 2005 September 10 # 2005. The American Astronomical Society. All rights reserved. Printed in U.S.A. GALAXY GROUPS IN THE THIRD DATA RELEASE OF THE SLOAN DIGITAL
More informationExcerpts from previous presentations. Lauren Nicholson CWRU Departments of Astronomy and Physics
Excerpts from previous presentations Lauren Nicholson CWRU Departments of Astronomy and Physics Part 1: Review of Sloan Digital Sky Survey and the Galaxy Zoo Project Part 2: Putting it all together Part
More informationInferring Galaxy Morphology Through Texture Analysis
Inferring Galaxy Morphology Through Texture Analysis 1 Galaxy Morphology Kinman Au, Christopher Genovese, Andrew Connolly Galaxies are not static objects; they evolve by interacting with the gas, dust
More informationCosmology in the Very Local Universe - Why Flow Models Matter
Cosmology in the Very Local Universe - Why Flow Models Matter Karen L. Masters Department of Astronomy, Cornell University, Ithaca, NY, 14853, USA While much of the focus of observational cosmology is
More informationLectures in AstroStatistics: Topics in Machine Learning for Astronomers
Lectures in AstroStatistics: Topics in Machine Learning for Astronomers Jessi Cisewski Yale University American Astronomical Society Meeting Wednesday, January 6, 2016 1 Statistical Learning - learning
More informationMean-Shift Tracker Computer Vision (Kris Kitani) Carnegie Mellon University
Mean-Shift Tracker 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Mean Shift Algorithm A mode seeking algorithm Fukunaga & Hostetler (1975) Mean Shift Algorithm A mode seeking algorithm
More informationData Analysis I. Dr Martin Hendry, Dept of Physics and Astronomy University of Glasgow, UK. 10 lectures, beginning October 2006
Astronomical p( y x, I) p( x, I) p ( x y, I) = p( y, I) Data Analysis I Dr Martin Hendry, Dept of Physics and Astronomy University of Glasgow, UK 10 lectures, beginning October 2006 4. Monte Carlo Methods
More informationA tailor made nonparametric density estimate
A tailor made nonparametric density estimate Daniel Carando 1, Ricardo Fraiman 2 and Pablo Groisman 1 1 Universidad de Buenos Aires 2 Universidad de San Andrés School and Workshop on Probability Theory
More informationMore on Estimation. Maximum Likelihood Estimation.
More on Estimation. In the previous chapter we looked at the properties of estimators and the criteria we could use to choose between types of estimators. Here we examine more closely some very popular
More informationDepartment of Statistics Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA citizenship: U.S.A.
Chad M. Schafer Department of Statistics Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 cschafer@cmu.edu citizenship: U.S.A. Positions Held Associate Professor Department of Statistics,
More informationPreface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation
Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric
More informationApproximate Bayesian Computation for the Stellar Initial Mass Function
Approximate Bayesian Computation for the Stellar Initial Mass Function Jessi Cisewski Department of Statistics Yale University SCMA6 Collaborators: Grant Weller (Savvysherpa), Chad Schafer (Carnegie Mellon),
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More informationAutomatic Local Smoothing for Spectral Density. Abstract. This article uses local polynomial techniques to t Whittle's likelihood for spectral density
Automatic Local Smoothing for Spectral Density Estimation Jianqing Fan Department of Statistics University of North Carolina Chapel Hill, N.C. 27599-3260 Eva Kreutzberger Department of Mathematics University
More informationMarginal Specifications and a Gaussian Copula Estimation
Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required
More informationIntroduction to Machine Learning
Introduction to Machine Learning 3. Instance Based Learning Alex Smola Carnegie Mellon University http://alex.smola.org/teaching/cmu2013-10-701 10-701 Outline Parzen Windows Kernels, algorithm Model selection
More informationCOMP 551 Applied Machine Learning Lecture 20: Gaussian processes
COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More information