Lecture 9
A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics
Spring 2015
http://www.astro.cornell.edu/~cordes/a6523

Applications:
- Comparison of frequentist and Bayesian inference
- Introduction to data mining in large-scale surveys

Reading: Gregory, chapters 6 and 7; Hancock article

Lecture 10 (Thursday 26 Feb): Adam Brazier (Cornell Center for Advanced Computing) will talk about astronomy-survey workflows and the how-to of databases.
Topics for Lecture 10 this week

Sensor data (e.g. telescope data) often require further filtering and cross-comparison of the global output. By storing the output in a database, we can query our data products efficiently and with a wide variety of qualifiers and filters (a minimal query sketch follows below). Databases, particularly relational databases, are used in many fields, including industry, to store information in a form that can be queried efficiently. We will introduce the relational database structure, how such databases can be queried, how they should be designed, and how they can be incorporated into the scientific workflow.
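To make the querying idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The candidates table and its columns are hypothetical stand-ins for real pipeline output, not an actual survey schema.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE candidates (
        id INTEGER PRIMARY KEY,
        ra REAL, dec REAL,   -- sky position (degrees)
        period_ms REAL,      -- candidate spin period (ms)
        snr REAL             -- detection signal-to-noise ratio
    )
""")
conn.executemany(
    "INSERT INTO candidates (ra, dec, period_ms, snr) VALUES (?, ?, ?, ?)",
    [(83.6, 22.0, 33.1, 42.0), (83.7, 22.1, 33.1, 9.5), (250.0, -28.0, 5.7, 6.1)],
)

# Query with qualifiers and filters: bright candidates in a small sky region.
rows = conn.execute(
    "SELECT id, period_ms, snr FROM candidates "
    "WHERE snr > 8 AND ra BETWEEN 83 AND 84 ORDER BY snr DESC"
).fetchall()
print(rows)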
Topics Plan

- Bayesian inference
  - Detection problems
  - Matched filtering and localization
  - Modeling (linear, nonlinear)
  - Cost functions
  - Parameter estimation and errors
- Optimization methods
  - Hill climbing, annealing, genetic algorithms
  - MCMC variants (Gibbs, Hamiltonian)
- Generalized spectral analysis
  - Lomb-Scargle
  - Maximum entropy
  - High-resolution methods
  - Bayesian approaches
  - Wavelets
- Principal components
  - Cholesky decomposition
- Large-scale surveys in astronomy
  - Time domain
  - Spectral line
  - Images and image cubes
- Detection & characterization of events, sources, objects
  - Known object types
  - Unknown object types
  - Current algorithms
- Data mining tools
  - Databases
  - Distributed processing
Some Matrix Manipulations

We will consider matrix manipulations in our treatment of model fitting, optimization, and basis vectors.

Dot product, which we can write in several ways:
$$c = \mathbf{a} \cdot \mathbf{b} = \sum_j a_j b_j = a_j b_j \quad \text{(summation convention for repeated indices)}$$

Derivative:
$$\nabla_{\mathbf{a}}\, c = \hat{a}_i \frac{\partial}{\partial a_i}\,(a_j b_j) = \hat{a}_i b_i = \mathbf{b}$$

Transformation:
$$\mathbf{y} = \mathbf{A}\mathbf{x} = \mathrm{col}(A_{ij} x_j), \qquad \nabla_{\mathbf{x}}\,\mathbf{y} = \hat{x}_k \frac{\partial}{\partial x_k}\, A_{ij} x_j = \hat{x}_k A_{ik} = \mathbf{A}$$

Quadratic form:
$$Q = \mathbf{x}^T \mathbf{A}\mathbf{x} = \text{scalar}, \qquad \nabla_{\mathbf{x}}\, Q = \hat{x}_k \frac{\partial}{\partial x_k}\,(A_{ij} x_i x_j) = \hat{x}_k\,(A_{kj} x_j + A_{ik} x_i) = \mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x} = (\mathbf{A} + \mathbf{A}^T)\,\mathbf{x}$$
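The quadratic-form gradient is easy to verify numerically. A minimal NumPy sketch (names and sizes are illustrative) compares the analytic result $(\mathbf{A} + \mathbf{A}^T)\mathbf{x}$ against a central finite difference:

import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N))   # a general (non-symmetric) matrix
x = rng.standard_normal(N)

analytic = (A + A.T) @ x          # gradient of x^T A x

# Central finite difference along each coordinate direction.
eps = 1e-6
numeric = np.array([
    ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
    for e in np.eye(N)
])
print(np.allclose(analytic, numeric))   # expect True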
Consider vectors $\mathbf{A}$, $\mathbf{B}$ and matrix $\mathbf{C}$ with dimensions $N \times 1$, $N \times 1$, and $N \times N$, respectively. Show that:

(a) $\nabla_{\mathbf{A}}\,(\mathbf{A}^\dagger \mathbf{B}) = \mathbf{B}$

(b) $\nabla_{\mathbf{A}}\,|\mathbf{A}|^2 = 2\mathbf{A}$

(c) $\nabla_{\mathbf{A}}\,(\mathbf{A}^\dagger \mathbf{C} \mathbf{A}) = (\mathbf{C} + \mathbf{C}^T)\,\mathbf{A}$ for real $\mathbf{A}$.

(d) $\nabla_{\mathbf{A}}\,(\mathbf{A}^\dagger \mathbf{C} \mathbf{A}) = \mathbf{C}^\dagger \mathbf{A} + \mathbf{C} \mathbf{A}$ for complex $\mathbf{A}$.

(e) $(\mathbf{C}\mathbf{A})^\dagger = \mathbf{A}^\dagger \mathbf{C}^\dagger$

(f) If $\mathbf{A}$ is a zero-mean stochastic process (e.g. a vector of $N$ measurements of a noiselike signal), its covariance matrix can be written as $\mathbf{C} = \langle \mathbf{A}\mathbf{A}^\dagger \rangle$.

Here the notation is: $*$ = conjugate; $T$ = transpose; $\dagger$ = transpose conjugate.
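Item (f) can be checked by simulation. The following sketch (sizes and variances are arbitrary choices) estimates $\mathbf{C} = \langle \mathbf{A}\mathbf{A}^\dagger \rangle$ by averaging outer products over many realizations of a zero-mean complex noise vector:

import numpy as np

rng = np.random.default_rng(1)
N, trials = 3, 200_000

# Zero-mean complex Gaussian noise with a known, diagonal covariance.
sigma = np.array([1.0, 2.0, 0.5])
A = sigma * (rng.standard_normal((trials, N)) + 1j * rng.standard_normal((trials, N)))

# Sample estimate of C = <A A^dagger>, averaged over realizations.
C_hat = np.einsum("ti,tj->ij", A, A.conj()) / trials
print(np.round(C_hat.real, 2))   # expect ~diag(2 sigma^2); off-diagonal terms ~0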
Inference Questions

- Finding objects (detection)
  - Data domains: $t$, $\nu$, $\lambda$, $\theta$; $(\theta, t, \lambda)$; $(\theta, t, \lambda, \pi)$, etc. ($\pi$ = polarization)
  - Blind surveys (no prior info)
  - We typically know a lot about object types and shapes + instrumental resolutions (PSF)
- Characterizing individual objects
  - Parametric models (estimation)
  - e.g. $M$, $T$, $R$, $V$, line ratios, spin periods, orbits, tests of physics
- Modeling object populations
  - Distributions of objects of the same class: stars, galaxies, asteroids, gas clouds
  - e.g. HR diagram, SMBH-galaxy halo, exoplanet distributions, prevalence of Earth-like planets
- Comparison of populations (cross-catalog)
  - Supernova remnants and NS, BH
  - GRBs and galaxies (partial correlation)
Examples of Detections

- Change points
  - Change in mean
  - Change in variance
  - Change in slope
  - Comparison of models with and without change points
- Finding clusters in measurement space or parameter space (one type of object detection)
  - Clustering algorithms
  - Bayesian blocks
- Object shape known: matched filtering (see the sketch below)
  - Correlation-function based
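When the object shape is known, the matched filter correlates the data against a template of that shape. A minimal sketch, assuming a Gaussian pulse template with an illustrative amplitude and injection location; with a unit-energy template and unit-variance noise, the output is calibrated directly in units of signal-to-noise:

import numpy as np

rng = np.random.default_rng(2)
n, width, amp = 1024, 10.0, 8.0

# Unit-energy Gaussian pulse template.
template = np.exp(-0.5 * (np.arange(-50, 51) / width) ** 2)
template /= np.sqrt(np.sum(template ** 2))

data = rng.standard_normal(n)       # white noise, sigma = 1
data[400:501] += amp * template     # inject a pulse centered near sample 450

# Correlate data with the template; output is in units of the noise sigma.
mf = np.correlate(data, template, mode="same")
print("peak S/N:", mf.max(), "at sample", mf.argmax())   # expect ~amp near 450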
Bayesian Model Comparison

Bayesian parameter estimation:
$$P(\theta \,|\, D, M) = \frac{P(\theta \,|\, M)\, P(D \,|\, \theta, M)}{P(D \,|\, M)}$$
where the normalization is
$$P(D \,|\, M) = \int d\theta\; P(\theta \,|\, M)\, P(D \,|\, \theta, M).$$
We can rewrite this in terms of the likelihood $\mathcal{L}(\theta \,|\, D, M) = P(D \,|\, \theta, M)$ as
$$P(\theta \,|\, D, M) = \frac{P(\theta \,|\, M)\, \mathcal{L}(\theta \,|\, D, M)}{\int d\theta\; P(\theta \,|\, M)\, \mathcal{L}(\theta \,|\, D, M)}.$$

Global likelihood:
$$\mathcal{L}(M) \equiv P(D \,|\, M) = \int d\theta\; P(\theta \,|\, M)\, \mathcal{L}(\theta \,|\, D, M)$$

Comparison of alternative models $M_i$, $i = 1, 2, \ldots$: we extrapolate from parameter space to model space, so the likelihood of model $M_i$ is $\mathcal{L}(M_i)$.

Prior for model $M_i$: $P(M_i \,|\, I)$

Posterior for model $M_i$:
$$P(M_i \,|\, D, I) = \frac{P(M_i \,|\, I)\, \mathcal{L}(M_i)}{P(D \,|\, I)}$$
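For a one-parameter model the global likelihood is a single integral over the prior, so it can be evaluated by direct quadrature. A minimal sketch, assuming toy Gaussian data with known noise level and a flat prior (all numbers illustrative):

import numpy as np

data = np.array([1.2, 0.8, 1.1, 0.9, 1.3])   # toy measurements
sigma = 0.3                                  # assumed known noise level

theta = np.linspace(-5.0, 5.0, 10_001)       # parameter grid
dtheta = theta[1] - theta[0]
prior = np.full_like(theta, 1.0 / 10.0)      # flat prior over [-5, 5]

# Gaussian likelihood L(theta) = prod_i N(d_i; theta, sigma^2), on the grid.
loglike = -0.5 * np.sum((data[:, None] - theta) ** 2, axis=0) / sigma ** 2
loglike -= data.size * np.log(np.sqrt(2 * np.pi) * sigma)

# Global likelihood L(M) = integral of prior * likelihood over theta.
global_like = np.sum(prior * np.exp(loglike)) * dtheta
print("L(M) =", global_like)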
Odds ratio (an implementation of Occam's razor):
$$O_{ij} = \frac{P(M_i \,|\, D, I)}{P(M_j \,|\, D, I)} = \underbrace{\frac{P(M_i \,|\, I)}{P(M_j \,|\, I)}}_{\text{ratio of priors}} \times \underbrace{\frac{\mathcal{L}(M_i)}{\mathcal{L}(M_j)}}_{\text{Bayes factor } B_{ij}}$$

For the Bayes factor we need the global likelihood for each model:
$$\mathcal{L}(M_i) = \int d\theta\; P(\theta \,|\, M_i)\, \mathcal{L}(\theta \,|\, D, M_i)$$

1D case: suppose the prior is flat and uninformative with width $\Delta\theta$, and suppose the likelihood function $\mathcal{L}(\theta \,|\, D, M_i)$ is unimodal with width $\delta\theta < \Delta\theta$. Then we can approximate the global likelihood for $M_i$ as
$$\mathcal{L}(M_i) = \int d\theta\; P(\theta \,|\, M_i)\, \mathcal{L}(\theta \,|\, D, M_i) \approx \mathcal{L}(\theta \,|\, D, M_i)\Big|_{\max} \frac{\delta\theta}{\Delta\theta} = \mathcal{L}(\hat\theta \,|\, D, M_i)\, \frac{\delta\theta}{\Delta\theta}$$

The Bayes factor becomes
$$B_{ij} = \frac{\mathcal{L}(M_i)}{\mathcal{L}(M_j)} \approx \frac{\mathcal{L}(\hat\theta_i \,|\, D, M_i)}{\mathcal{L}(\hat\theta_j \,|\, D, M_j)} \times \frac{\delta\theta_i\, \Delta\theta_j}{\delta\theta_j\, \Delta\theta_i}$$

There is a tradeoff between the amplitudes of the maximum likelihoods, the widths of the likelihood functions, and the widths of the priors: the Bayes factor penalizes models that require larger volumes of parameter space to be searched.
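A small numerical illustration of the Occam penalty, assuming toy data and a flat prior of width $\Delta\theta = 10$ (all choices illustrative). M1 fixes the mean at zero (no free parameters), while M2 lets the mean vary; M2's global likelihood is suppressed by roughly $\delta\theta/\Delta\theta$ relative to its peak, so its extra parameter must earn its keep:

import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal(20) + 0.3   # toy data: true mean 0.3, sigma = 1

def loglike(theta):
    return -0.5 * np.sum((data - theta) ** 2) - 0.5 * data.size * np.log(2 * np.pi)

# M1: mean fixed at 0 -> global likelihood is just L(0).
L_M1 = np.exp(loglike(0.0))

# M2: mean free, flat prior over [-5, 5], integrated numerically.
theta = np.linspace(-5.0, 5.0, 20_001)
like = np.exp(np.array([loglike(th) for th in theta]))
L_M2 = np.sum(like / 10.0) * (theta[1] - theta[0])

# With equal model priors, the odds ratio is just the Bayes factor.
print("O_21 =", L_M2 / L_M1)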
Figure 1: A noninformative prior for the mean, $\mu$. In this case, a flat prior PDF, $f_\mu(\mu)$, is shown along with a likelihood function, $\mathcal{L}(\mu)$, that is much narrower than the prior. The peak of $\mathcal{L}$ is the maximum likelihood estimate for $\mu$ and is the arithmetic mean of the data: $\hat\mu = N^{-1} \sum_{i=1}^{N} x_i$.
Frequentist & Bayesian Approaches to a Simple Model

M1 = constant: $y = a$
M2 = line: $y = a + b\,x$

A sketch comparing the two approaches on these models follows below.
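A minimal sketch of both approaches on toy data (the data, noise level, priors, and grids are all illustrative assumptions): a frequentist least-squares fit for each model with the $\Delta\chi^2$ between them, and a grid-based Bayes factor with flat priors. The common $(2\pi\sigma^2)^{-N/2}$ likelihood normalization is omitted because it cancels in the ratio:

import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 25)
y = 1.0 + 0.5 * x + 0.2 * rng.standard_normal(x.size)   # truth: a line
sigma = 0.2

def chi2(model):
    return np.sum((y - model) ** 2) / sigma ** 2

# Frequentist: least-squares fits and the chi-square difference.
a1 = y.mean()                       # best-fit constant
b2, a2 = np.polyfit(x, y, 1)        # best-fit slope and intercept
print("delta chi^2 =", chi2(a1) - chi2(a2 + b2 * x))

# Bayesian: global likelihoods on parameter grids with flat priors.
a = np.linspace(0.0, 2.0, 201)      # prior width 2 in a
b = np.linspace(-2.0, 2.0, 401)     # prior width 4 in b
da, db = a[1] - a[0], b[1] - b[0]

L_M1 = np.sum(np.exp([-0.5 * chi2(ai) for ai in a])) * da / 2.0

A, B = np.meshgrid(a, b, indexing="ij")
r = y - (A[..., None] + B[..., None] * x)               # residuals on the grid
L_M2 = np.sum(np.exp(-0.5 * np.sum(r ** 2, axis=-1) / sigma ** 2)) * da * db / (2.0 * 4.0)

print("Bayes factor B_21 =", L_M2 / L_M1)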