A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring 2015


1 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring 2015

Lecture 12. Applications:
- Model comparison
- Some least-squares lessons

Reading: Assignment 3
- some least-squares problems
- analyze the F test for M1, M2
- analyze the Bayesian odds ratio for M1, M2
- a maximum entropy problem

2 Next few lectures

Revisit spectral analysis. Issues with Fourier-based estimators:
- 100% errors unless the number of degrees of freedom is increased
- Nonuniform sampling: Lomb-Scargle method
  - still suffers from large sidelobes
  - can apply the CLEAN algorithm
- Poor performance with red processes (leakage = bias)

Alternative approaches:
- Bayesian method
- Cholesky decomposition
- Maximum entropy method
- Designer methods for specific criteria (e.g. no bias)
- Etc.

3 Data and fit

[Figure: simulated data with S/N = 1 and N = 100, shown with a line fit, a constant fit, and the residuals of each. For the line fit, χ²_l = 90.2 and reduced χ²_{r,l} = 0.9; the constant-fit values χ²_c and χ²_{r,c} are not recoverable from the extraction.]

4 [Figure: panels of simulated data and the resulting distributions of F₁₂ = (constant fit):(line fit) over many realizations, for (N = 100, SNR = 0.0), (N = 100, SNR = 1.0), (N = 10, SNR = 10.0), and a fourth N = 100 case; each panel is annotated with its mean ⟨F₁₂⟩. The numerical values are not recoverable from the extraction.]

5 Frequentist-Bayesian Model Comparisons: A Simple Example

Consider data that consist of a signal y with additive noise. Data vector (N elements):

D = y + n

The additive noise n has zero mean and a diagonal covariance matrix:

\langle n \rangle = 0, \qquad C = {\rm diag}(\sigma_j^2).

Linear model:

y = X\theta, \qquad X = design matrix, \quad \theta = parameter vector with M elements.

For any given choice of θ, we can estimate the noise vector as n̂ = D − y.

Cost function for least squares [sometimes written as C(θ) or Q(θ)]:

\chi^2(\theta) = \hat n^T C^{-1} \hat n = (D - y)^T C^{-1} (D - y)

Minimizing the cost function, \nabla_\theta \chi^2(\theta) = 0, gives a system of equations with solution

\hat\theta = (X^T C^{-1} X)^{-1} X^T C^{-1} D.

The existence of a solution depends on whether the matrices are invertible. A bad choice of basis functions in X may lead to an ill-conditioned matrix; singular value decomposition or a better choice of basis functions may be needed.

For a noise PDF that is Gaussian, the likelihood function is

L(\theta) \propto e^{-\chi^2(\theta)/2}.

The covariance matrix of the parameters is

P_\theta = \langle \delta\theta\, \delta\theta^T \rangle = (X^T C^{-1} X)^{-1}.
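A minimal numpy sketch of these last two results (the helper name wlsq and its interface are illustrative, not from the lecture; for an ill-conditioned system one would use an SVD-based solve rather than a direct inverse):

    import numpy as np

    def wlsq(X, D, sigma):
        """Least squares with diagonal C = diag(sigma_j^2):
        returns theta_hat = (X^T C^-1 X)^-1 X^T C^-1 D and P_theta."""
        Cinv = np.diag(1.0 / sigma**2)            # C^{-1} for diagonal C
        P_theta = np.linalg.inv(X.T @ Cinv @ X)   # parameter covariance matrix
        theta_hat = P_theta @ (X.T @ Cinv @ D)    # LSQ / max-likelihood estimate
        return theta_hat, P_theta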

6 Comments

For a linear model, the solution is unique if there is no noise. The least-squares solution with non-zero noise is also unique: there is only one minimum in the cost function, which is quadratic in δθ = θ − θ_true. However, different realizations of the data will yield different solutions, whose range is quantified by P_θ.

7 Consider data D where each element is

d_i = y_i + n_i = a + b x_i + n_i,

i.e. a straight line plus noise. The design matrix is then a 2-column, N-row array and θ is a 2-element vector:

X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \qquad \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}.

Now suppose we do not know the form of the data but we wish to find the best model. As the universe of models, consider just the pair

M1: y_i = θ_1 (constant)
M2: y_i = θ_1 + θ_2 x_i (line)

For a given realization of a data set, we can estimate the parameters of each model. We then want to test how good each model is and test the models against each other. These are the questions of statistical inference:

1. How do we decide whether each model is a good fit or not?
2. Given that M1 is a subset of M2 (the case θ_2 = 0), how do we gauge whether the extra parameter in M2 is warranted (demanded) by the data?
3. What are acceptable values for the estimates of the parameters of the better model?

The answers to these questions are different for frequentist vs. Bayesian approaches.
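A hedged sketch of the two fits, using the wlsq helper above (the intercept, slope, and noise level are made-up numbers for illustration):

    rng = np.random.default_rng(0)
    N = 100
    x = np.arange(N, dtype=float)
    sigma = np.ones(N)                            # unit measurement errors
    D = 1.0 + 0.02 * x + rng.normal(0.0, 1.0, N)  # d_i = a + b x_i + n_i

    X1 = np.ones((N, 1))                          # M1: constant
    X2 = np.column_stack([np.ones(N), x])         # M2: straight line
    for X in (X1, X2):
        theta, P = wlsq(X, D, sigma)
        chi2 = np.sum(((D - X @ theta) / sigma)**2)
        print(theta, chi2 / (N - X.shape[1]))     # estimates, reduced chi^2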

8 Gaussian Noise Model

Let the noise be N(0, C):

f_n(n) = \frac{1}{(2\pi)^{N/2} (\det C)^{1/2}}\, e^{-\frac{1}{2} n^T C^{-1} n}

Note that the argument of the exponential is a quadratic form.

The likelihood function for the parameters is obtained by using the estimate n̂ = D − y(θ), which depends on a particular choice of the parameter vector θ:

L(\theta) = \frac{1}{(2\pi)^{N/2} (\det C)^{1/2}}\, e^{-\frac{1}{2} \hat n^T C^{-1} \hat n}

For this case, and this case only, minimizing χ² yields results identical to maximizing L:

least-squares estimate = maximum-likelihood estimate.

Different situations are encountered:
1. The noise covariance matrix C is known (in shape and in element-by-element values).
2. The form of C is known but the values are not.
3. Nothing is known about C a priori.

In the example here we assume case 1, that C is known.

9 Frequentist Approach

Testing a model: calculate the minimum of the cost function, χ²_min = χ²(θ̂). For shorthand, we will call the minimum just χ².

If the model is a good (if not perfect) match to the true underlying model, we expect that over an ensemble of realizations

\langle \chi^2 \rangle = N - M = number of degrees of freedom,

\sigma_{\chi^2} = \sqrt{2(N-M)} \quad \Longrightarrow \quad \frac{\sigma_{\chi^2}}{\langle \chi^2 \rangle} = \left( \frac{2}{N-M} \right)^{1/2}.

The number of degrees of freedom matters (N and M could both be large). Note that χ²(θ) varies quadratically in δθ = θ − θ̂. Why?

Reduced chi-square: χ²_r = χ²/(N − M). A good model has χ²_r = 1 and σ_{χ²_r} = [2/(N − M)]^{1/2}.

A model can be assessed by calculating the probability that the estimated χ²_r is statistically consistent with 1, given N − M. See §7.2 of Gregory.
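A hedged scipy sketch of that consistency check, using the line-fit χ² quoted on the earlier example slide (N = 100 points, M = 2 parameters):

    from scipy import stats

    nu = 100 - 2                       # N - M degrees of freedom
    chi2_min = 90.2                    # line-fit value from the example slide
    p = stats.chi2.sf(chi2_min, nu)    # P(chi^2 >= chi2_min | model correct)
    print(p)                           # a very small p would reject the model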

10 Frequentist Approach (continued)

Parameter estimation errors: use P_θ to obtain the variances of each parameter (diagonal elements) and the correlations between parameters (off-diagonal elements). The interpretation of the parameter values is this: these are the variations in estimates of the parameters expected from different realizations of the data chosen from an ensemble.

Model comparison with the F test: while models can be individually assessed as above, they can also be compared by using the quantity

F_{12} = \frac{\chi^2_{r1}}{\chi^2_{r2}} = \frac{\chi^2_1/(N - M_1)}{\chi^2_2/(N - M_2)}.

The PDF of F_{12} has an analytical form (see §6.5, Equation 6.36 of Gregory). One can calculate whether one model is better than another by calculating the probability that the value of F_{12} would be obtained by chance if the reduced χ² were the same in both models. Generally one would say that model M2 is preferred over M1 if the probability of obtaining F_{12} is smaller than some selected amount, like 5% or 1%.

11 h"p://en.wikipedia.org/wiki/f- distribu7on

12 F test in Python:

    import numpy as np
    from scipy import stats

    dof1, dof2 = 99, 98   # example degrees of freedom (see question below)

    Fmean = dof2 / (dof2 - 2.)                                # mean of F
    Fsig = (dof2 / (dof2 - 2.)) * np.sqrt((2.*dof2 + 2.*dof1 - 4.)
                                          / (dof1 * (dof2 - 4.)))  # std dev
    Fmode = dof2 * (dof1 - 2.) / (dof1 * (dof2 + 2.))         # mode of F

    Fmin = 0.
    Fmax = Fmean + 4.*Fsig                 # cover the bulk of the PDF
    Fvec = np.arange(Fmin, Fmax, 0.01)
    Fpdf = stats.f.pdf(Fvec, dof1, dof2)   # probability density
    Fcdf = stats.f.cdf(Fvec, dof1, dof2)   # cumulative distribution

The F distribution gives the probability that a given ratio of two χ² random variables can occur by chance. This can be used to determine, at some probability level, whether two values of χ² are actually different.

For 99 and 98 degrees of freedom, how large does F need to be? (Homework!)
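One way to attack that question is the inverse CDF (percent-point function); a sketch, reusing dof1, dof2, and the stats import above:

    F95 = stats.f.ppf(0.95, dof1, dof2)   # F exceeded by chance 5% of the time
    F99 = stats.f.ppf(0.99, dof1, dof2)   # F exceeded by chance 1% of the time
    print(F95, F99)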

13 [Figure: the data-and-fits example of slide 3, repeated: S/N = 1, N = 100 data with line and constant fits, residuals, and χ²_l = 90.2, χ²_{r,l} = 0.9 for the line fit.]

14 [Figure: the F₁₂ = (constant fit):(line fit) histograms of slide 4, repeated, for the (N, SNR) cases shown there with their means ⟨F₁₂⟩.]

15 Two examples of F distributions and the locations of the 95% and 99% probability areas.

[Figure: PDFs and CDFs of the F distribution for (N_dof1 = 9, N_dof2 = 8) and (N_dof1 = 99, N_dof2 = 98), with the critical values F95 and F99 marked in each case; the numerical values are not recoverable from the extraction.]

16 [Figure: PDF and CDF of the F distribution for N_dof1 = 99, N_dof2 = 98, with F95 and F99 marked.]

17 [Figure: PDF and CDF of the F distribution for N_dof1 = 9, N_dof2 = 8, with F95 and F99 marked.]

18 Bayesian Approach

For the Bayesian approach we calculate the likelihood function L(θ) and multiply by a prior for the parameters to get the posterior PDF:

P(\theta\,|\,D, I) = \frac{P(\theta\,|\,I)\, L(\theta)}{\int d\theta\, P(\theta\,|\,I)\, L(\theta)}

From the posterior PDF we obtain the M-dimensional PDF for the parameters. This has a different interpretation than the frequentist approach: the posterior PDF expresses our uncertainty about the parameters for a specific data set, given background and prior information.

We compare models using the odds ratio (§3.5 of Gregory and the notes on the website, "Bayesian Model Comparison"). Here we assume that the two models have equal priors (we have no a priori preference) and thus concentrate on the Bayes factor

B_{21} = \frac{L(M_2)}{L(M_1)} = ratio of global likelihoods of M_2 and M_1.

The global likelihoods are just the denominators in the posterior PDFs, so

B_{21} = \frac{L(M_2)}{L(M_1)} = \frac{\int d\theta\, P(\theta\,|\,M_2)\, L(\theta\,|\,D, M_2)}{\int d\theta\, P(\theta\,|\,M_1)\, L(\theta\,|\,D, M_1)}

Use flat priors with widths Δa_1 for M1 and Δa_2, Δb_2 for M2. Also assume the likelihood functions are narrower than the priors, with widths δa_1 for M1 and δa_2, δb_2 for M2. Then

B_{21} \approx \frac{L(\hat\theta_2\,|\,D, M_2)}{L(\hat\theta_1\,|\,D, M_1)} \cdot \frac{\delta a_2\, \delta b_2}{\Delta a_2\, \Delta b_2} \cdot \frac{\Delta a_1}{\delta a_1}

For M2 to be superior to M1 (higher odds ratio), the likelihood function has to be sufficiently large to offset the penalty of the extra parameter contained in the larger volume in parameter space.

19 Bayesian Approach (continued)

For the particular M1 and M2 models (constant vs. line fits), the ratio of likelihoods is just

\frac{L(\hat\theta_2\,|\,D, M_2)}{L(\hat\theta_1\,|\,D, M_1)} = \frac{e^{-\chi^2_2/2}}{e^{-\chi^2_1/2}} = e^{(\chi^2_1 - \chi^2_2)/2} = e^{\Delta\chi^2/2}

and the Bayes factor becomes

B_{21} \approx e^{\Delta\chi^2/2} \cdot \frac{\delta a_2\, \delta b_2}{\Delta a_2\, \Delta b_2} \cdot \frac{\Delta a_1}{\delta a_1}
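A small numerical sketch of this final expression (the Δχ² value and all prior and likelihood widths below are illustrative numbers, not from the lecture):

    import numpy as np

    dchi2 = 9.0                    # chi_1^2 - chi_2^2 from fitting both models
    da1, Da1 = 0.1, 10.0           # likelihood width and prior range of a, M1
    da2, db2 = 0.1, 0.02           # likelihood widths of a, b under M2
    Da2, Db2 = 10.0, 1.0           # prior ranges of a, b under M2

    occam = (da2 * db2) / (Da2 * Db2) * (Da1 / da1)   # Occam penalty factor
    B21 = np.exp(dchi2 / 2.0) * occam
    print(B21)                     # B21 > 1 favors the line model M2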

20 THE ASTROPHYSICAL JOURNAL, 505:315-338, 1998 September 20
© 1998 The American Astronomical Society. All rights reserved. Printed in U.S.A.

NEUTRON STAR POPULATION DYNAMICS. II. THREE-DIMENSIONAL SPACE VELOCITIES OF YOUNG PULSARS

J. M. CORDES
Department of Astronomy and NAIC, Space Sciences Building, Cornell University, Ithaca, NY; cordes@spacenet.tn.cornell.edu

AND

DAVID F. CHERNOFF
Department of Astronomy, Space Sciences Building, Cornell University, Ithaca, NY; chernoff@spacenet.tn.cornell.edu

Received 1997 July 21; accepted 1998 April 24

ABSTRACT

We use astrometric, distance, and spindown data on pulsars to (1) estimate three-dimensional velocity components, birth distances from the Galactic plane, and ages of individual objects; (2) determine the distribution of space velocities and the scale height of pulsar progenitors; (3) test spindown laws for pulsars; (4) test for correlations between space velocities and other pulsar parameters; and (5) place empirical requirements on mechanisms that can produce high-velocity neutron stars. Our approach incorporates measurement errors, uncertainties in distances, deceleration in the Galactic potential, and differential Galactic rotation. We focus on a sample of proper motion measurements of young (<10 Myr) pulsars whose trajectories may be accurately and simply modeled. This sample of 49 pulsars excludes millisecond pulsars and other objects that may have undergone accretion-driven spinup. We estimate velocity components and birth z distance on a case-by-case basis assuming that the actual age equals the conventional spindown age for a braking index n = 3, no torque decay, and birth periods much shorter than present-day periods. Every sample member could have originated within 0.3 kpc of the Galactic plane while still having reasonable present-day peculiar radial velocities. For the 49 object sample, the scale height of the progenitors is ~0.13 kpc, and the three-dimensional velocities are distributed in two components with characteristic speeds of 175 (+19/-24) km/s and 700 (+300/-132) km/s, representing ~86% and ~14% of the population, respectively. The sample velocities are inconsistent with a single-component Gaussian model and are well described by a two-component Gaussian model but do not require models of additional complexity. From the best-fit distribution, we estimate that about 20% of the known pulsars will escape the Galaxy, assuming an escape speed of 500 km/s. The best-fit, dual-component model, if augmented by an additional, low-velocity (<50 km/s) component, tolerates, at most, only a small extra contribution in number, less than 5%. The best three-component models do not show a preference for filling in the probability distribution at speeds intermediate to 175 and 700 km/s but are nearly degenerate with the best two-component models. We estimate that the high-velocity tail (>1000 km/s) may be underrepresented (in the observed sample) by a factor ~2.3 owing to selection effects in pulsar surveys. The estimates of scale height and velocity parameters are insensitive to the explicit relation of chronological and spindown ages. A further analysis starting from our inferred velocity distribution allows us to test spindown laws and age estimates. There exist comparably good descriptions of the data involving different combinations of braking index and torque decay timescale. We find that a braking index of 2.5 is favored if torque decay occurs on a timescale of ~3 Myr, while braking indices ~4.5 ± 0.5 are preferred if there is no torque decay. For the sample as a whole, the most probable chronological ages are typically smaller than conventional spindown ages by factors as large as 2. We have also searched for correlations between three-dimensional speeds of individual pulsars and combinations of spin period and period derivative. None appears to be significant. We argue that correlations identified previously between velocity and (apparent) magnetic moment reflect the different evolutionary paths taken by young, isolated (nonbinary), high-field pulsars and older, low-field pulsars that have undergone accretion-driven spinup. We conclude that any such correlation measures differences in spin and velocity selection in the evolution of the two populations and is not a measure of processes taking place in the core collapse that produces neutron stars in the first place. We assess mechanisms for producing high-velocity neutron stars, including disruption of binary systems by symmetric supernovae and neutrino, baryonic, or electromagnetic rocket effects during or shortly after the supernova. The largest velocities seen (~1600 km/s), along with the paucity of low-velocity pulsars, suggest that disruption of binaries by symmetric explosions is insufficient. Rocket effects appear to be a necessary and general phenomenon. The required kick amplitudes and the absence of a magnetic field-velocity correlation do not yet rule out any of the rocket models. However, the required amplitudes suggest that the core collapse process in a supernova is highly dynamic and aspherical and that the impulse delivered to the neutron star is larger than existing simulations of core collapse have achieved.

Subject headings: binaries: close - pulsars: general - stars: distances - stars: evolution - stars: kinematics - stars: neutron

- Neutron stars = runaway population, <V> ~ 500 km/s
- Escape velocity from the Milky Way ~ 500 km/s at the solar circle
- Bayesian analysis of a pulsar sample from the MW population of NS
- Measurements = proper motions and distance proxies
- Model takes into account deceleration in the Galactic potential
- No a priori model for the likelihood function

21 4.3. Parameterization and Likelihood Function

The likelihood function for the parameters is the product over N_psr pulsars,

L = \prod_{k=1}^{N_{\rm psr}} L_k,    (24)

where the likelihood factor L_k for each pulsar is simply equation (23) evaluated using the measured proper motions (and errors) along with the distance constraints D_L, D_U, the direction n̂, and the age t. To apply equation (23), we adopt a parametric approach, where we assume particular forms for the PDFs in z_0 and V_0^{(P)}. Specifically, we assume that z_0 and V_0^{(P)} have a multicomponent Gaussian PDF of the form

f(z_0, V_0^{(P)}) = \sum_{j=1}^{n_g} w_j\, g_{1d}(z_0, h_{zj})\, g_{3d}(V_0^{(P)}, \sigma_{Vj}).    (25)

In equation (25), g_{1d}(p, σ) is a standard one-dimensional Gaussian PDF with zero mean and standard deviation σ,

g_{1d}(p, \sigma) = (2\pi\sigma^2)^{-1/2} \exp(-p^2/2\sigma^2),

while g_{3d} is a three-dimensional Gaussian function,

g_{3d}(q, \sigma) = (2\pi\sigma^2)^{-3/2} \exp(-q^2/2\sigma^2).

The weights w_j sum to unity. The PDF is defined so that f\, dz_0\, d^3V_0^{(P)} is the infinitesimal probability.

We have chosen a multicomponent Gaussian model because its analytical properties allow it to fit a wide range of shapes for the actual distributions in z_0 and V^{(P)}. There is not necessarily an implied physical basis for this choice of form: the different Gaussian velocity components need not correspond to different population components.

22 4.4. Results and Comparison of Models Using Odds Ratios

The parameters to be determined for a pulsar sample are (1) n_g standard deviations for velocities, σ_Vj; (2) n_h ≤ n_g scale heights for the birth altitude, h_zj; and (3) n_g − 1 weights, w_j, for a total of 2n_g + n_h − 1 parameters.

We considered a set of models with increasingly complex velocity and birth-height distributions. We label the models using n_g.n_h as the model number, where the first number represents the value of n_g and the second represents the value of n_h, and briefly describe them below:

Model 1.1 - a single-component model (n_g = n_h = 1);
Model 2.1 - a two-component velocity model with a single scale height (n_g = 2, n_h = 1);
Model 2.2 - a model with two scale-height and two velocity components (n_g = 2, n_h = 2); and
Model 3.1 - a three-component velocity model with a single scale height (n_g = 3, n_h = 1).

We assume flat priors for the parameters in selected ranges that are listed in Table 2. We generated the posterior probability distribution of the parameters, which is just the ...

23 TABLE 2
PARAMETER SEARCH RANGES

    Parameter       Range
    w_1             Δw_1 = 1
    w_2             Δw_2 = 1
    h_z1 (kpc)      Δh_z1 = 0.5
    h_z2 (kpc)      Δh_z2 = 0.5
    σ_V1 (km/s)     Δσ_V1 = 2000
    σ_V2 (km/s)     Δσ_V2 = 1800
    σ_V3 (km/s)     Δσ_V3 = 300

[The minimum and maximum columns of the table are not recoverable from the extraction; only the ranges survive.]

... evaluation of the likelihood in the selected ranges. The modes of the posterior, i.e., the maximum likelihood results, and log L appear in Table 3. These are the "best" values of the parameters for each model. The analysis was repeated for several age cuts for the sample: τ_{S,max} = 1, 5, and 10 Myr (for n = 3 and no torque decay). The mode of the distribution is typically well defined. For example, in Figure 4 we display contours of log L (log likelihood) for model 1.1 with τ_{S,max} = 10 Myr. The mode is identified by the cross. In Figure 5 we display contours for model 2.1 in two-dimensional surfaces that pass through the mode.

The results in Table 3 clearly suggest the presence of a significant high-velocity component when the models are sufficiently complex. At the same time, examination of the likelihood contours in the simplest model does not provide any obvious indications that its single ~300 km/s component is inadequate. To make such a determination, we must compare the models in a systematic manner.

To compare models, we use the odds ratio (cf. Gregory & Loredo 1992) to quantify goodness of fit while penalizing models with more parameters. We assume that all models are equally probable, a priori.

FIG. 4. - Contour plot of the log likelihood function for the single-Gaussian component model (n_g = 1, n_h = 1) as a function of the rms velocity and scale height. Contours are spaced by log 2.

24 FIG. 5. - Contour plots of the log likelihood function for the double-Gaussian component model (n_g = 2, n_h = 1) as a function of values for pairs of parameters and for two-dimensional slices through the best-fit model. Contours are spaced by log 2.

We take the single-Gaussian component model with two parameters as our reference model, M_1.1. The odds ratio for the Mth model relative to M_1.1 becomes

O_{M, M_{1.1}} \equiv \frac{f(D\,|\,M)}{f(D\,|\,M_{1.1})} = \frac{\Lambda_M\, V_{M_{1.1}}}{\Lambda_{M_{1.1}}\, V_M},    (28)

which we evaluate through numerical integration of the likelihood function for each model over a uniform grid in parameter space with bounds given in Table 2. The associated volumes for the four models are

V_{1.1} = \Delta h_{z1}\, \Delta\sigma_{V1},
V_{2.1} = V_{1.1}\, \Delta w_1\, \Delta\sigma_{V2}\, A_{2.1},
V_{2.2} = V_{1.1}\, \Delta w_1\, \Delta\sigma_{V2}\, \Delta h_{z2}\, A_{2.2},
V_{3.1} = V_{1.1}\, \Delta w_1\, \Delta w_2\, \Delta\sigma_{V2}\, \Delta\sigma_{V3}\, A_{3.1},    (29)

where A_{2.1}, A_{2.2}, and A_{3.1} ≤ 1 are factors that account for the overlap in our search of parameter space. [The rest of this slide's text (the resulting odds for the individual models and the discussion of the ~175 km/s component and scale height) is too garbled in the extraction to recover.]

25 The odds ratio reduces to the "Bayes factor," which is the ratio of global likelihoods of two models (Gregory & Loredo's eq. [2.12]). The global likelihood for a model M given data D is

f(D\,|\,M) = \int d\theta\, f_h(\theta\,|\,M)\, L(\theta),    (26)

where f_h(θ | M) is the prior PDF for the model parameters θ and L(θ) is the likelihood function as we have used it in this paper. We have already assumed that the prior PDF is flat, and we have seen that L(θ) is strongly peaked. Thus, letting θ̂ be the parameters that maximize L(θ),

f(D\,|\,M) \approx f_h(\hat\theta\,|\,M) \int d\theta\, L(\theta) \equiv \Lambda_M / V_M,    (27)

where the last equation defines the integrated likelihood, \Lambda_M \equiv \int d\theta\, L(\theta), and the volume in parameter space that is searched, V_M = [f_h(\hat\theta\,|\,M)]^{-1}.

TABLE 3
THREE-DIMENSIONAL VELOCITY PDF MODELS

Columns: Model (n_g.n_h), w_1, w_2, h_z1 (kpc), h_z2 (kpc), σ_V1 (km/s), σ_V2 (km/s), σ_V3 (km/s), N_parms, log L, Odds, with one block each for the τ_S < 10 Myr, τ_S < 5 Myr, and τ_S < 1 Myr age cuts (each with its own N_psr). [The numerical entries of the table are not recoverable from the extraction.]
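Equation (26) is easy to approximate on a grid; a minimal one-parameter sketch with a flat prior (the Gaussian-shaped toy likelihood and its numbers are illustrative only):

    import numpy as np

    lo, hi = 0.0, 10.0                            # flat prior support
    theta = np.linspace(lo, hi, 2001)
    loglike = -0.5 * ((theta - 3.0) / 0.2)**2     # toy peaked log likelihood
    like = np.exp(loglike - loglike.max())        # rescale to avoid underflow
    Lambda = np.sum(like) * (theta[1] - theta[0]) # integrated likelihood
    global_like = Lambda / (hi - lo)              # f(D|M) for the flat prior
    # the rescaling constant exp(max) cancels in an odds ratio of two models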

26 TABLE 4
BEST-FIT PARAMETERS FOR MODEL 2.1

    Parameter       Value
    w_1             ≈0.86 (−0.12; the upper error is lost in the extraction)
    h_z1 (kpc)      ≈0.13 (+0.04/−0.02)
    σ_V1 (km/s)     175 (+19/−24)
    σ_V2 (km/s)     700 (+295/−132)

[The central values for w_1 and h_z1 are restored from the abstract, which quotes ~86% of the population and a ~0.13 kpc scale height.]

[Figure: marginalized PDFs for the individual parameters of the best-fit model, n_g = 2, n_h = 1.]

27 Least Squares Examples

I. A polynomial model for a time series y_i with errors ε_i:

y_i = \sum_{j=1}^{k} X_{ij}\, \theta_j + \epsilon_i = \sum_{j=1}^{k} t_i^{\,j-1}\, \theta_j + \epsilon_i.

The model is linear in the parameters even though it is nonlinear in the independent variable t_i. If the polynomial order is p, then k = p + 1. The design matrix and parameter vector are

X = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^p \\ 1 & t_2 & t_2^2 & \cdots & t_2^p \\ \vdots & & & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^p \end{pmatrix}, \qquad \theta = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{p+1} \end{pmatrix}.

Define

T_k = \sum_{i=1}^{n} t_i^k \qquad \text{and} \qquad \overline{t^k y} = \frac{1}{n} \sum_{i=1}^{n} t_i^k y_i

(note the 1/n in front of the last sum), with \bar y \equiv \overline{t^0 y}.

28 Then the product of the design matrix with itself is the k × k = (p+1) × (p+1) matrix

X^T X = \begin{pmatrix} T_0 & T_1 & T_2 & \cdots & T_p \\ T_1 & T_2 & T_3 & \cdots & T_{p+1} \\ T_2 & T_3 & T_4 & \cdots & T_{p+2} \\ \vdots & & & & \vdots \\ T_p & T_{p+1} & T_{p+2} & \cdots & T_{2p} \end{pmatrix}

and

X^T y = \begin{pmatrix} \sum_i y_i \\ \sum_i t_i y_i \\ \vdots \\ \sum_i t_i^p y_i \end{pmatrix} = n \begin{pmatrix} \bar y \\ \overline{ty} \\ \vdots \\ \overline{t^p y} \end{pmatrix}.

The least-squares solution \hat\theta = (X^T X)^{-1} X^T y requires the inverse of X^T X, which will exist if the determinant is nonzero.
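A numpy sketch of these normal equations for a quadratic, p = 2 (the synthetic coefficients and noise level are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 2
    t = np.arange(n, dtype=float)
    y = 0.5 + 0.1*t - 0.002*t**2 + rng.normal(0.0, 1.0, n)

    X = np.vander(t, p + 1, increasing=True)    # columns 1, t, ..., t^p
    XtX = X.T @ X                               # entries are the sums T_{j+k}
    theta_hat = np.linalg.solve(XtX, X.T @ y)   # (X^T X)^{-1} X^T y
    # for large p this matrix is ill-conditioned; prefer a QR/SVD solve then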

29 First-order polynomial: something we can easily solve.

y_i = \theta_1 + \theta_2 t_i

X^T X = \begin{pmatrix} T_0 & T_1 \\ T_1 & T_2 \end{pmatrix}

(X^T X)^{-1} = \frac{1}{\det(X^T X)} (\text{matrix of cofactors}) = \frac{1}{T_0 T_2 - T_1^2} \begin{pmatrix} T_2 & -T_1 \\ -T_1 & T_0 \end{pmatrix}

30 Then, using T_0 = n,

\hat\theta = (X^T X)^{-1} X^T y = \frac{n}{T_0 T_2 - T_1^2} \begin{pmatrix} T_2 & -T_1 \\ -T_1 & T_0 \end{pmatrix} \begin{pmatrix} \bar y \\ \overline{ty} \end{pmatrix} = \frac{n}{n T_2 - T_1^2} \begin{pmatrix} \bar y\, T_2 - \overline{ty}\, T_1 \\ n\, \overline{ty} - \bar y\, T_1 \end{pmatrix}.

So the individual parameters are

\hat\theta_1 = \frac{n(\bar y\, T_2 - \overline{ty}\, T_1)}{n T_2 - T_1^2} \qquad \text{and} \qquad \hat\theta_2 = \frac{n(n\, \overline{ty} - \bar y\, T_1)}{n T_2 - T_1^2}.

31 Assuming the errors ε_i are stationary and statistically independent with variance ⟨ε_i²⟩ = σ², the covariance matrix of the parameters is

P = \sigma^2 (X^T X)^{-1} = \begin{pmatrix} \sigma_{\theta_1}^2 & \sigma_{\theta_1}\sigma_{\theta_2}\rho_{\theta_1\theta_2} \\ \sigma_{\theta_1}\sigma_{\theta_2}\rho_{\theta_1\theta_2} & \sigma_{\theta_2}^2 \end{pmatrix} = \frac{\sigma^2}{n T_2 - T_1^2} \begin{pmatrix} T_2 & -T_1 \\ -T_1 & n \end{pmatrix},

where the correlation coefficient is

\rho_{\theta_1\theta_2} = \frac{\langle \delta\theta_1\, \delta\theta_2 \rangle}{\sigma_{\theta_1}\sigma_{\theta_2}} = \frac{-T_1}{\sqrt{n T_2}} \qquad \text{(negatively correlated)}.

32 For n ≫ 1 and uniform sampling t_i = i, i = 1, …, n, we have T_1 ≈ n²/2 and T_2 ≈ n³/3, so

\sigma_{\theta_1} \approx \frac{2\sigma}{\sqrt{n}}, \qquad \sigma_{\theta_2} \approx \frac{\sqrt{12}\,\sigma}{n^{3/2}}, \qquad \rho_{\theta_1\theta_2} \approx \frac{-n^2/2}{\sqrt{n \cdot n^3/3}} = -\frac{\sqrt{3}}{2} \approx -0.87 \quad \text{(highly anticorrelated)}.

The anticorrelation means that any error in one parameter is compensated by the error in the other.
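These asymptotic results are easy to verify numerically; a sketch with unit noise, so that P = (XᵀX)⁻¹:

    import numpy as np

    n = 1000
    t = np.arange(1.0, n + 1.0)              # uniform sampling t_i = i
    X = np.column_stack([np.ones(n), t])
    P = np.linalg.inv(X.T @ X)               # covariance for sigma = 1
    s1, s2 = np.sqrt(P[0, 0]), np.sqrt(P[1, 1])
    print(s1 / (2.0 / np.sqrt(n)))           # -> ~1: sigma_theta1 ~ 2 sigma/sqrt(n)
    print(s2 / (np.sqrt(12.0) / n**1.5))     # -> ~1: sigma_theta2 ~ sqrt(12) sigma/n^{3/2}
    print(P[0, 1] / (s1 * s2))               # -> ~ -sqrt(3)/2 = -0.866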

33 Better parameterization for the first-order polynomial: orthogonal polynomials. E.g.

y_i = \theta_1 + \theta_2 (t_i - \bar t), \qquad \bar t = \frac{1}{n} \sum_{i=1}^{n} t_i.

Now the design matrix and the various products are

X = \begin{pmatrix} 1 & t_1 - \bar t \\ 1 & t_2 - \bar t \\ \vdots & \vdots \\ 1 & t_n - \bar t \end{pmatrix}, \qquad X^T X = \begin{pmatrix} T_0 & T_1 \\ T_1 & T_2 \end{pmatrix} = \begin{pmatrix} T_0 & 0 \\ 0 & T_2 \end{pmatrix}, \qquad (X^T X)^{-1} = \frac{1}{T_0 T_2} \begin{pmatrix} T_2 & 0 \\ 0 & T_0 \end{pmatrix},

where the T_k are now computed with t_i − t̄ (so T_1 = 0). The solution is now

\hat\theta_1 = \bar y, \qquad \hat\theta_2 = \frac{n\, \overline{(t - \bar t)\, y}}{T_2}.

The error on θ_2 is the same as before, but the parameters are now uncorrelated: ρ_{θ1θ2} = 0.
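Continuing the numerical sketch above, centering t makes the off-diagonal term vanish:

    Xc = np.column_stack([np.ones(n), t - t.mean()])   # orthogonal basis
    Pc = np.linalg.inv(Xc.T @ Xc)
    print(Pc[0, 1])     # ~ 0: theta_1 and theta_2 now uncorrelated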

34 II. Sinusoids

Consider the linear model y = Xθ + ε, where X comprises complex exponentials and the parameter vector θ comprises Fourier amplitudes:

X_{nm} = e^{2\pi i n m / N} \equiv W_N^{nm}, \qquad n = 0, \ldots, N-1, \quad m = 0, \ldots, k-1,

where W_N \equiv e^{2\pi i / N} is the Nth root of 1 on the unit circle. For k = N we have

X = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N & W_N^2 & \cdots & W_N^{N-1} \\ 1 & W_N^2 & W_N^4 & \cdots & W_N^{2(N-1)} \\ \vdots & & & & \vdots \\ 1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)^2} \end{pmatrix}.
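A quick sketch that builds this k = N design matrix and checks it against numpy's FFT, which uses the opposite sign convention, so that θ = fft(y)/N:

    import numpy as np

    N = 8
    idx = np.arange(N)
    X = np.exp(2j * np.pi * np.outer(idx, idx) / N)   # X_nm = W_N^{nm}

    y = np.random.default_rng(2).normal(size=N)
    theta = np.linalg.solve(X, y.astype(complex))     # Fourier amplitudes
    print(np.allclose(theta, np.fft.fft(y) / N))      # True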
