Multivariate Analysis and Likelihood Inference


Outline

1. Joint Distribution of Random Variables
2. Principal Component Analysis (PCA)
3. Multivariate Normal Distribution
4. Likelihood Inference

Joint density function

The distribution of a random vector $X = (X_1, \dots, X_k)^T$ is characterized by its joint density function $f$ such that
$$P\{X_1 \le a_1, \dots, X_k \le a_k\} = \int_{-\infty}^{a_k} \cdots \int_{-\infty}^{a_1} f(x_1, \dots, x_k)\, dx_1 \cdots dx_k$$
when $X$ has a differentiable distribution function (the left-hand side of the display above), and for the case of discrete $X_i$,
$$f(a_1, \dots, a_k) = P\{X_1 = a_1, \dots, X_k = a_k\}.$$

Change of variables

Suppose $X$ has a differentiable distribution function with density function $f_X$. Let $g : \mathbb{R}^k \to \mathbb{R}^k$ be continuously differentiable and one-to-one, and let $Y = g(X)$. Since $g$ is a one-to-one transformation, we can solve $Y = g(X)$ to obtain $X = h(Y)$, where $h = (h_1, \dots, h_k)^T$ is the inverse of the transformation $g$. The density function $f_Y$ of $Y$ is given by
$$f_Y(y) = f_X(h(y))\, \bigl|\det(\partial h / \partial y)\bigr|,$$
where $\partial h / \partial y$ is the Jacobian matrix $(\partial h_i / \partial y_j)_{1 \le i,j \le k}$. The inverse function theorem of multivariate calculus gives $\partial h / \partial y = (\partial g / \partial x)^{-1}$, or equivalently $\partial x / \partial y = (\partial y / \partial x)^{-1}$.
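
The change-of-variables formula can be checked numerically. Below is a minimal sketch, assuming an invertible linear map $g(x) = Ax + b$; the matrix $A$, shift $b$, and test point are illustrative choices, not from the notes. For this $g$ the formula must reproduce the known $N(b, AA^T)$ density.

```python
# Minimal numerical check of f_Y(y) = f_X(h(y)) |det(dh/dy)| for g(x) = A x + b.
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.5], [0.0, 1.0]])   # nonsingular, so g is one-to-one
b = np.array([1.0, -1.0])
fX = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))  # density of X

def fY(y):
    """Density of Y = A X + b via the change-of-variables formula."""
    x = np.linalg.solve(A, y - b)            # h(y) = A^{-1}(y - b)
    jac_det = 1.0 / abs(np.linalg.det(A))    # |det(dh/dy)| = 1 / |det(A)|
    return fX.pdf(x) * jac_det

# Y is N(b, A A^T), so the formula should match the normal density directly.
y0 = np.array([0.3, 0.7])
direct = multivariate_normal(mean=b, cov=A @ A.T).pdf(y0)
print(np.isclose(fY(y0), direct))  # True
```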

Mean and covariance matrix

The mean vector of $Y = (Y_1, \dots, Y_k)^T$ is $E(Y) = (EY_1, \dots, EY_k)^T$. The covariance matrix of $Y$ is
$$\mathrm{Cov}(Y) = (\mathrm{Cov}(Y_i, Y_j))_{1 \le i,j \le k} = E\bigl\{(Y - E(Y))(Y - E(Y))^T\bigr\}.$$
The following linearity (and bilinearity) properties hold for $E(Y)$ (and $\mathrm{Cov}(Y)$): for a nonrandom $p \times k$ matrix $A$ and $p \times 1$ vector $B$,
$$E(AY + B) = A\,E(Y) + B, \qquad \mathrm{Cov}(AY + B) = A\,\mathrm{Cov}(Y)\,A^T.$$

Eigenvalue and eigenvector

Definition 1. Let $V$ be a $p \times p$ matrix. A complex number $\lambda$ is called an eigenvalue of $V$ if there exists a $p \times 1$ vector $a \ne 0$ such that $Va = \lambda a$. Such a vector $a$ is called an eigenvector of $V$ corresponding to the eigenvalue $\lambda$.

We can rewrite $Va = \lambda a$ as $(V - \lambda I)a = 0$. Since $a \ne 0$, this implies that $\lambda$ is a solution of the equation $\det(V - \lambda I) = 0$. Since $\det(V - \lambda I)$ is a polynomial of degree $p$ in $\lambda$, there are $p$ eigenvalues, not necessarily distinct. If $V$ is symmetric, then all its eigenvalues are real and can be ordered as $\lambda_1 \ge \cdots \ge \lambda_p$. Moreover,
$$\mathrm{tr}(V) = \lambda_1 + \cdots + \lambda_p, \qquad \det(V) = \lambda_1 \cdots \lambda_p.$$
If $a$ is an eigenvector of $V$ corresponding to the eigenvalue $\lambda$, then so is $ca$ for $c \ne 0$. Moreover, premultiplying $\lambda a = Va$ by $a^T$ yields $\lambda = a^T V a / \|a\|^2$.

Eigenvalue and eigenvector

We shall now focus on the case where $V$ is the covariance matrix of a random vector $x = (X_1, \dots, X_p)^T$.

Proposition 1. The eigenvalues of $V$ are real and nonnegative, since $V$ is symmetric and nonnegative definite.

Consider the linear combination $a^T x$ with $\|a\| = 1$ that has the largest variance over all such linear combinations. Maximizing $a^T V a$ ($= \mathrm{Var}(a^T x)$) over $a$ with $\|a\| = 1$ yields $Va = \lambda a$. Since $a \ne 0$, this implies that $\lambda$ is an eigenvalue of $V$ and $a$ is the corresponding eigenvector, and that $\lambda = a^T V a$. Denote $\lambda_1 = \max_{a : \|a\| = 1} a^T V a$ and let $a_1$ be the corresponding eigenvector with $\|a_1\| = 1$.

Next consider $a^T x$ that maximizes $\mathrm{Var}(a^T x) = a^T V a$ subject to $a_1^T a = 0$ and $\|a\| = 1$. Its solution should satisfy
$$\frac{\partial}{\partial a_i}\bigl\{a^T V a + \lambda(1 - a^T a) + \eta\, a_1^T a\bigr\} = 0 \quad \text{for } i = 1, \dots, p.$$
This implies that $\lambda$ is an eigenvalue of $V$ with corresponding unit eigenvector $a_2$ that is orthogonal to $a_1$. Proceeding inductively in this way, we obtain the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of $V$ with the optimization characterization
$$\lambda_{k+1} = \max_{a : \|a\| = 1,\ a^T a_j = 0 \text{ for } 1 \le j \le k} a^T V a.$$
The maximizer $a_{k+1}$ of the above equation is an eigenvector corresponding to the eigenvalue $\lambda_{k+1}$.

Definition 2. $a_i^T x$ is called the $i$th principal component of $x$.
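
A small simulation can illustrate the variational characterization $\lambda_1 = \max_{\|a\|=1} a^T V a$; the matrix $V$ below is an arbitrary illustrative covariance matrix, not from the notes.

```python
# Sketch: the leading eigenvector maximizes a^T V a over unit vectors a.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
V = B @ B.T                              # symmetric, nonnegative definite

eigvals, eigvecs = np.linalg.eigh(V)     # eigh sorts eigenvalues ascending
lam1, a1 = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue and its eigenvector

# Compare a1^T V a1 with a^T V a for many random unit vectors a.
a = rng.standard_normal((3, 10000))
a /= np.linalg.norm(a, axis=0)
quad_forms = ((V @ a) * a).sum(axis=0)   # column-wise quadratic forms a^T V a

print(lam1 >= quad_forms.max())          # True
print(np.isclose(lam1, a1 @ V @ a1))     # lambda = a^T V a at the maximizer
```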

Principal components

(a) From the preceding derivation, $\lambda_i = \mathrm{Var}(a_i^T x)$.

(b) The elements of the eigenvectors $a_i$ are called factor loadings in PCA. Note that $(a_1, \dots, a_p)$ is an orthogonal matrix, i.e.,
$$I = (a_1, \dots, a_p)(a_1, \dots, a_p)^T = a_1 a_1^T + \cdots + a_p a_p^T.$$
Moreover,
$$V = \lambda_1 a_1 a_1^T + \cdots + \lambda_p a_p a_p^T.$$

(c) Since $V = \mathrm{Cov}(x)$ and $x = (X_1, \dots, X_p)^T$, $\mathrm{tr}(V) = \sum_{i=1}^p \mathrm{Var}(X_i)$. Hence it follows from the fact $\mathrm{tr}(V) = \sum_{i=1}^p \lambda_i$ that
$$\lambda_1 + \cdots + \lambda_p = \sum_{i=1}^p \mathrm{Var}(X_i).$$

PCA of sample covariance matrix

Suppose $x_1, \dots, x_n$ are $n$ independent observations from a multivariate population with mean $\mu$ and covariance matrix $V$. The mean vector $\mu$ can be estimated by $\bar{x} = \sum_{i=1}^n x_i / n$, and the covariance matrix can be estimated by the sample covariance matrix
$$\widehat{V} = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T \big/ (n-1).$$
Let $\hat{a}_j = (\hat{a}_{1j}, \dots, \hat{a}_{pj})^T$ be the eigenvector corresponding to the $j$th largest eigenvalue $\hat{\lambda}_j$ of the sample covariance matrix $\widehat{V}$.
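
The following sketch carries out PCA on a simulated sample; the sample size, dimension, and true covariance below are illustrative.

```python
# Sketch: PCA via eigendecomposition of the sample covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
true_cov = np.diag([4.0, 2.0, 1.0, 0.5])
x = rng.multivariate_normal(np.zeros(p), true_cov, size=n)

xbar = x.mean(axis=0)
V_hat = (x - xbar).T @ (x - xbar) / (n - 1)   # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(V_hat)
order = np.argsort(eigvals)[::-1]             # sort eigenvalues descending
lam_hat = eigvals[order]                      # lambda_1 >= ... >= lambda_p
a_hat = eigvecs[:, order]                     # columns are a_1, ..., a_p

# Proportion of the total variance explained by each principal component.
print(lam_hat / lam_hat.sum())
```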

PCA of sample covariance matrix

For fixed $p$, the following asymptotic results have been established under the assumption $\lambda_1 > \cdots > \lambda_p > 0$, as $n \to \infty$:

(i) $\sqrt{n}(\hat{a}_i - a_i)$ has a limiting normal distribution with mean 0 and covariance matrix
$$\lambda_i \sum_{h \ne i} \frac{\lambda_h}{(\lambda_i - \lambda_h)^2}\, a_h a_h^T.$$
Here $\hat{a}_i$ and $a_i$ are chosen so that they have the same sign.

(ii) $\sqrt{n}(\hat{\lambda}_i - \lambda_i)$ has a limiting $N(0, 2\lambda_i^2)$ distribution. Moreover, for $i \ne j$, the limiting distribution of $\sqrt{n}(\hat{\lambda}_i - \lambda_i,\, \hat{\lambda}_j - \lambda_j)$ is that of two independent normals.

Approximating covariance structure

Even when $p$ is large, if many eigenvalues $\lambda_h$ are small compared with the largest $\lambda_i$'s, the covariance structure of the data can be approximated by a few principal components with large eigenvalues.

Representation of data in terms of principal components

Let $x_i = (x_{i1}, \dots, x_{ip})^T$, $i = 1, \dots, n$, be the multivariate sample. Let $X_k = (x_{1k}, \dots, x_{nk})^T$, $1 \le k \le p$, and define
$$Y_j = \hat{a}_{1j} X_1 + \cdots + \hat{a}_{pj} X_p, \quad 1 \le j \le p,$$
where $\hat{a}_j = (\hat{a}_{1j}, \dots, \hat{a}_{pj})^T$ is the eigenvector corresponding to the $j$th largest eigenvalue $\hat{\lambda}_j$ of the sample covariance matrix $\widehat{V}$, with $\|\hat{a}_j\| = 1$. Since the matrix $\widehat{A} := (\hat{a}_{ij})_{1 \le i,j \le p}$ is orthogonal (i.e., $\widehat{A}\widehat{A}^T = I$), it follows that the observed data $X_k$ can be expressed in terms of the principal components $Y_j$ as
$$X_k = \hat{a}_{k1} Y_1 + \cdots + \hat{a}_{kp} Y_p.$$
Moreover, the sample correlation matrix of the transformed data $y_{tj}$ is the identity matrix.

PCA of correlation matrices

An alternative to $\mathrm{Cov}(x)$ is the correlation matrix $\mathrm{Corr}(x)$. Since a primary goal of PCA is to uncover low-dimensional subspaces around which $x$ tends to concentrate, scaling the components of $x$ by their standard deviations may work better for this purpose, and applying PCA to sample correlation matrices may give more stable results.
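
The effect of scaling can be seen in a small sketch comparing PCA of the sample covariance matrix with PCA of the sample correlation matrix; the wildly mixed coordinate scales below are an illustrative choice.

```python
# Sketch: covariance-based vs. correlation-based PCA under mixed scales.
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((500, 4)) * np.array([10.0, 1.0, 1.0, 0.1])

V_hat = np.cov(x, rowvar=False)        # sample covariance matrix
R_hat = np.corrcoef(x, rowvar=False)   # sample correlation matrix

for name, M in [("covariance", V_hat), ("correlation", R_hat)]:
    lam = np.linalg.eigvalsh(M)[::-1]  # eigenvalues in descending order
    print(name, (lam / lam.sum()).round(3))
# Covariance PCA is dominated by the large-variance coordinate;
# correlation PCA treats the standardized coordinates evenly.
```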

Example: PCA of U.S. Treasury-LIBOR swap rates

[Figure 1: Daily swap rates for four of the eight maturities $T = 1, 2, 3, 4, 5, 7, 10, 30$ years (plotted series: swp1y, swp4y, swp10y, swp30y), from July 3, 2000 to July 15, 2005.]

We now apply PCA to account for the variance of daily changes in swap rates with a few principal components (or factors).

[Figure 2: The differenced series for the 1-year (top) and 30-year (bottom) swap rates.]

Let $d_{kt} = r_{kt} - r_{k,t-1}$ denote the daily change in the $k$-year swap rate during this period.

The sample mean vector $\hat{\mu}$ and the sample covariance and correlation matrices $\widehat{\Sigma}$ and $\widehat{R}$ are computed from the differenced data $\{(d_{1t}, \dots, d_{8t}),\ 1 \le t \le 1256\}$.

[Table 1: PCA of the covariance matrix of the swap rate data: eigenvalues $\hat{\lambda}_i$, proportions of variance explained, and factor loadings for PC1 through PC8.]

[Table 2: PCA of the correlation matrix of the swap rate data: eigenvalues $\hat{\lambda}_i$, proportions of variance explained, and factor loadings for PC1 through PC8.]

Let $x_{tk} = (d_{kt} - \hat{\mu}_k)/\hat{\sigma}_k$, where $\hat{\sigma}_k$ is the sample standard deviation of the $d_{kt}$. We represent $X_k = (x_{1k}, \dots, x_{nk})^T$ in terms of the principal components $Y_1, \dots, Y_8$:
$$X_k = \hat{a}_{k1} Y_1 + \cdots + \hat{a}_{kp} Y_p.$$
Tables 1 and 2 show the PCA results using the sample covariance matrix $\widehat{\Sigma}$ and the sample correlation matrix $\widehat{R}$. From Table 2, the proportions of the overall variance explained by the first three principal components are 90.8%, 6.7%, and 1.3%, respectively. Hence 98.9% of the overall variance is explained by the first three principal components, yielding the approximation
$$(d_{kt} - \hat{\mu}_k)/\hat{\sigma}_k \doteq \hat{a}_{k1} Y_1 + \hat{a}_{k2} Y_2 + \hat{a}_{k3} Y_3$$
for the differenced swap rates.

[Figure 3: Eigenvectors (factor loadings) of the first three principal components, which represent the parallel shift, tilt, and convexity components of swap rate movements.]

The graphs of the factor loadings show the following constituents of interest rate (yield curve) movements documented in the literature:

(a) Parallel shift component. The factor loadings of the first principal component are roughly constant over different maturities: a change in the swap rate for one maturity is accompanied by roughly the same change for the other maturities.

(b) Tilt component. The factor loadings of the second principal component change monotonically with maturity, so changes in short-maturity and long-maturity swap rates in this component have opposite signs.

(c) Curvature component. The factor loadings of the third principal component differ between the midterm rates and the average of the short- and long-term rates, revealing a curvature of the loading graph that resembles the convex shape of the relationship between the rates and their maturities.

Joint, marginal and conditional distributions

An $m \times 1$ random vector $Y$ is said to have a multivariate normal distribution $N(\mu, V)$ if its density function is
$$f(y) = \frac{1}{(2\pi)^{m/2} \sqrt{\det(V)}} \exp\Bigl\{-\frac{1}{2}(y - \mu)^T V^{-1} (y - \mu)\Bigr\}, \quad y \in \mathbb{R}^m.$$
Suppose that $Y \sim N(\mu, V)$ is partitioned as
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix},$$
where $Y_1$ and $\mu_1$ have dimension $r\ (< m)$ and $V_{11}$ is $r \times r$. Then $Y_1 \sim N(\mu_1, V_{11})$, $Y_2 \sim N(\mu_2, V_{22})$, and the conditional distribution of $Y_1$ given $Y_2 = y_2$ is
$$N\bigl(\mu_1 + V_{12} V_{22}^{-1}(y_2 - \mu_2),\ V_{11} - V_{12} V_{22}^{-1} V_{21}\bigr).$$
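
The conditional-distribution formula translates directly into code. Below is a sketch for an illustrative 3-dimensional example with $r = 1$; the mean $\mu$, covariance $V$, and observed value $y_2$ are arbitrary choices.

```python
# Sketch: parameters of Y1 | Y2 = y2 for a partitioned multivariate normal.
import numpy as np

mu = np.array([1.0, 0.0, -1.0])
V = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
r = 1                                   # dimension of Y1
mu1, mu2 = mu[:r], mu[r:]
V11, V12 = V[:r, :r], V[:r, r:]
V21, V22 = V[r:, :r], V[r:, r:]

y2 = np.array([0.5, -0.5])              # observed value of Y2
cond_mean = mu1 + V12 @ np.linalg.solve(V22, y2 - mu2)
cond_cov = V11 - V12 @ np.linalg.solve(V22, V21)
print(cond_mean, cond_cov)              # parameters of Y1 | Y2 = y2
```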

Orthogonality and independence

Let $\zeta$ be a standard normal random vector of dimension $p$, and let $A$ and $B$ be $r \times p$ and $q \times p$ nonrandom matrices such that $AB^T = 0$. Then the components of $U_1 := A\zeta$ are uncorrelated with those of $U_2 := B\zeta$, because $E(U_1 U_2^T) = A\,E(\zeta\zeta^T)\,B^T = AB^T = 0$. Note that
$$U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} A \\ B \end{pmatrix}\zeta$$
has a multivariate normal distribution with mean 0 and covariance matrix
$$V = \begin{pmatrix} AA^T & 0 \\ 0 & BB^T \end{pmatrix}.$$
The density function of $U$ is therefore a product of the density functions of $U_1$ and $U_2$, so $U_1$ and $U_2$ are independent. Note also that $E(U_1 U_2^T) = 0$ expresses the orthogonality of $U_1$ and $U_2$ in the mean-square sense.

Sample covariance matrix and Wishart distribution

Let $Y_1, \dots, Y_n$ be independent $m \times 1$ random $N(\mu, \Sigma)$ vectors, with $n > m$ and $\Sigma$ positive definite. Define
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i, \qquad W = \sum_{i=1}^n (Y_i - \bar{Y})(Y_i - \bar{Y})^T.$$
It can be shown that the sample mean vector $\bar{Y} \sim N(\mu, \Sigma/n)$, and that $\bar{Y}$ and the sample covariance matrix $W/(n-1)$ are independent. We next generalize $\sum_{t=1}^n (y_t - \bar{y})^2 / \sigma^2 \sim \chi^2_{n-1}$ to the multivariate case.

Wishart distribution

Suppose $Y_1, \dots, Y_n$ ($n \ge m$) are independent $N(0, \Sigma)$ random vectors. Then the random matrix $W = \sum_{i=1}^n Y_i Y_i^T$ is said to have a Wishart distribution, denoted by $W_m(\Sigma, n)$.

The density function of the Wishart distribution $W_m(\Sigma, n)$ generalizes the chi-square density:
$$f(W) = \frac{\det(W)^{(n-m-1)/2} \exp\bigl\{-\tfrac{1}{2}\mathrm{tr}(\Sigma^{-1}W)\bigr\}}{[2^m \det(\Sigma)]^{n/2}\, \Gamma_m(n/2)}, \quad W > 0,$$
in which $W > 0$ denotes that $W$ is positive definite and $\Gamma_m(\cdot)$ denotes the multivariate gamma function
$$\Gamma_m(t) = \pi^{m(m-1)/4} \prod_{i=1}^m \Gamma\Bigl(t - \frac{i-1}{2}\Bigr).$$
Note that the usual gamma function corresponds to the case $m = 1$.

The following properties of the Wishart distribution are generalizations of some well-known properties of the chi-square distribution and the gamma$(\alpha, \beta)$ distribution.

(a) If $W \sim W_m(\Sigma, n)$, then $E(W) = n\Sigma$.

(b) Let $W_1, \dots, W_k$ be independently distributed with $W_j \sim W_m(\Sigma, n_j)$, $j = 1, \dots, k$. Then $\sum_{j=1}^k W_j \sim W_m(\Sigma, \sum_{j=1}^k n_j)$.

(c) Let $W \sim W_m(\Sigma, n)$ and let $A$ be a nonrandom $m \times m$ nonsingular matrix. Then $AWA^T \sim W_m(A\Sigma A^T, n)$. In particular, $a^T W a \sim (a^T \Sigma a)\,\chi^2_n$ for all nonrandom vectors $a \ne 0$.
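
Property (a) is easy to check by Monte Carlo. The sketch below assumes scipy's parameterization, in which $W_m(\Sigma, n)$ corresponds to `wishart(df=n, scale=Sigma)`; the particular $\Sigma$ and $n$ are illustrative.

```python
# Monte Carlo check of E(W) = n * Sigma for the Wishart distribution.
import numpy as np
from scipy.stats import wishart

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
n = 10                                      # degrees of freedom
samples = wishart(df=n, scale=Sigma).rvs(size=20000, random_state=0)
print(samples.mean(axis=0))                 # approximately n * Sigma
print(n * Sigma)
```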

Multivariate-t distribution

If $Z \sim N(0, \Sigma)$ and $W \sim W_m(\Sigma, k)$ such that the $m \times 1$ vector $Z$ and the $m \times m$ matrix $W$ are independent, then $(W/k)^{-1/2} Z$ is said to have the $m$-variate $t$-distribution with $k$ degrees of freedom. Its density function is
$$f(t) = \frac{\Gamma\bigl((k+m)/2\bigr)}{(\pi k)^{m/2}\,\Gamma(k/2)} \Bigl(1 + \frac{t^T t}{k}\Bigr)^{-(k+m)/2}, \quad t \in \mathbb{R}^m.$$
If $T$ has the $m$-variate $t$-distribution with $k$ degrees of freedom such that $k \ge m$, then
$$\frac{k - m + 1}{km}\, T^T T \ \text{has the}\ F_{m,\,k-m+1}\text{-distribution}. \tag{1}$$

Applying (1) to Hotelling's $T^2$-statistic
$$T^2 = n(\bar{Y} - \mu)^T \bigl(W/(n-1)\bigr)^{-1} (\bar{Y} - \mu), \tag{2}$$
where $\bar{Y}$ and $W$ are as defined above, yields
$$\frac{n - m}{m(n-1)}\, T^2 \sim F_{m,\,n-m}, \tag{3}$$
noting that
$$\bigl[W/(n-1)\bigr]^{-1/2}\, \sqrt{n}(\bar{Y} - \mu), \quad \text{with } W \sim W_m(\Sigma, n-1) \text{ and } \sqrt{n}(\bar{Y} - \mu) \sim N(0, \Sigma),$$
has the multivariate $t$-distribution with $n - 1$ degrees of freedom.
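
A sketch of the resulting test, assuming simulated data and the exact $F$-distribution in (3); the sample size, dimension, and null value $\mu_0$ are illustrative.

```python
# Sketch: Hotelling's T^2 test of H0: mu = mu0 via the F-distribution in (3).
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
n, m = 50, 3
Y = rng.multivariate_normal(np.zeros(m), np.eye(m), size=n)
mu0 = np.zeros(m)

Ybar = Y.mean(axis=0)
S = np.cov(Y, rowvar=False)                  # W / (n - 1)
T2 = n * (Ybar - mu0) @ np.linalg.solve(S, Ybar - mu0)

F_stat = (n - m) / (m * (n - 1)) * T2        # follows F_{m, n-m} under H0
p_value = f.sf(F_stat, m, n - m)
print(T2, F_stat, p_value)
```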

Multivariate-t distribution

In the definition of the multivariate $t$-distribution, it is assumed that $Z$ and $W$ share the same $\Sigma$. More generally, we can consider the case where $Z \sim N(0, V)$ instead. By considering $V^{-1/2} Z$ in place of $Z$, we can assume that $V = I_m$. Then the density function of $(W/k)^{-1/2} Z$, with independent $Z \sim N(0, I_m)$ and $W \sim W_m(\Sigma, k)$, has the general form
$$f(t) = \frac{\Gamma\bigl((k+m)/2\bigr)}{(\pi k)^{m/2}\,\Gamma(k/2)\,\sqrt{\det(\Sigma)}} \Bigl(1 + \frac{t^T \Sigma^{-1} t}{k}\Bigr)^{-(k+m)/2}, \quad t \in \mathbb{R}^m.$$
This density function can be used to define multivariate $t$-copulas.

Method of maximum likelihood (MLE)

Suppose $X_1, \dots, X_n$ have joint density function $f_\theta(x_1, \dots, x_n)$, $\theta \in \Theta \subseteq \mathbb{R}^p$. The likelihood function based on the observations $X_1, \dots, X_n$ is $L(\theta) = f_\theta(X_1, \dots, X_n)$, and the MLE $\hat{\theta}$ of $\theta$ is the maximizer of $L(\theta)$, or equivalently of the log-likelihood $l(\theta) = \log f_\theta(X_1, \dots, X_n)$.

Example: MLE in Gaussian regression models

Consider the regression model $y_t = \beta^T x_t + \epsilon_t$, in which the regressors $x_t \in \mathcal{F}_{t-1} = \{y_{t-1}, x_{t-1}, \dots, y_1, x_1\}$ and the $\epsilon_t$ are i.i.d. $N(0, \sigma^2)$. Letting $\theta = (\beta, \sigma^2)$, the likelihood function is
$$L(\theta) = \prod_{t=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\bigl\{-(y_t - \beta^T x_t)^2 / (2\sigma^2)\bigr\}.$$
It then follows that the MLE of $\beta$ is the same as the OLS estimator $\hat{\beta}$ that minimizes $\sum_{t=1}^n (y_t - \beta^T x_t)^2$, and the MLE of $\sigma^2$ is $\hat{\sigma}^2 = n^{-1}\sum_{t=1}^n (y_t - \hat{\beta}^T x_t)^2$.
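
The closed form of the Gaussian-regression MLE makes it easy to sketch; the simulated design and true $\beta$ below are illustrative.

```python
# Sketch: in the Gaussian regression model, the MLE of beta is the OLS fit
# and the MLE of sigma^2 uses the divisor n (not n - p).
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.7, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS = MLE of beta
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / n                    # MLE of sigma^2
print(beta_hat, sigma2_hat)
```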

Consistency of MLE

The classical asymptotic theory of maximum likelihood deals with i.i.d. $X_t$, so that $l(\theta) = \sum_{t=1}^n \log f_\theta(X_t)$. Let $\theta_0$ denote the true parameter value. By the law of large numbers,
$$n^{-1} l(\theta) \to E_{\theta_0}\{\log f_\theta(X_1)\} \quad \text{as } n \to \infty, \tag{4}$$
with probability 1, for every fixed $\theta$.

Jensen's inequality

Suppose $X$ has a finite mean and $\varphi : \mathbb{R} \to \mathbb{R}$ is convex, i.e., $\lambda\varphi(x) + (1-\lambda)\varphi(y) \ge \varphi(\lambda x + (1-\lambda)y)$ for all $0 < \lambda < 1$ and $x, y$. (If the inequality above is strict, then $\varphi$ is called strictly convex.) Then $E\varphi(X) \ge \varphi(EX)$, and the inequality is strict if $\varphi$ is strictly convex, unless $X$ is degenerate (i.e., it takes only one value).

Consistency of MLE

Since $-\log x$ is a (strictly) convex function, it follows from Jensen's inequality that
$$E_{\theta_0}\Bigl(\log \frac{f_\theta(X_1)}{f_{\theta_0}(X_1)}\Bigr) \le \log E_{\theta_0}\Bigl(\frac{f_\theta(X_1)}{f_{\theta_0}(X_1)}\Bigr) = \log 1 = 0;$$
the inequality above is strict unless the random variable $f_\theta(X_1)/f_{\theta_0}(X_1)$ is degenerate under $P_{\theta_0}$, which means $P_{\theta_0}\{f_\theta(X_1) = f_{\theta_0}(X_1)\} = 1$, since $E_{\theta_0}\{f_\theta(X_1)/f_{\theta_0}(X_1)\} = 1$. Thus, except in the case where $f_\theta(X_1) = f_{\theta_0}(X_1)$ with probability 1,
$$E_{\theta_0}\log f_\theta(X_1) < E_{\theta_0}\log f_{\theta_0}(X_1) \quad \text{for } \theta \ne \theta_0.$$
This suggests that the MLE is consistent (i.e., it converges to $\theta_0$ with probability 1 as $n \to \infty$). A rigorous proof of consistency of the MLE involves sharpening the argument in (4) by considering, instead of fixed $\theta$, $\sup_{\theta \in U} n^{-1} l(\theta)$ over neighborhoods $U$ of points $\theta \ne \theta_0$.

Asymptotic normality of MLE

Suppose $f_\theta(x)$ is smooth in $\theta$. Then we can find the MLE by solving the equation $\nabla l(\theta) = 0$, where $\nabla l$ is the gradient vector of partial derivatives $\partial l / \partial\theta_i$. For large $n$, the consistent estimate $\hat{\theta}$ is near $\theta_0$, and therefore Taylor's theorem yields
$$0 = \nabla l(\hat{\theta}) = \nabla l(\theta_0) + \nabla^2 l(\theta^*)(\hat{\theta} - \theta_0), \tag{5}$$
where $\theta^*$ lies between $\hat{\theta}$ and $\theta_0$ (and is therefore close to $\theta_0$) and
$$\nabla^2 l(\theta) = \Bigl(\frac{\partial^2 l}{\partial\theta_i\,\partial\theta_j}\Bigr)_{1 \le i,j \le p} \tag{6}$$
is the Hessian matrix of second partial derivatives. From (5) it follows that
$$\hat{\theta} - \theta_0 = -\bigl(\nabla^2 l(\theta^*)\bigr)^{-1} \nabla l(\theta_0). \tag{7}$$

Asymptotic normality of MLE

The central limit theorem (CLT) can be applied to
$$\nabla l(\theta_0) = \sum_{t=1}^n \nabla \log f_\theta(X_t)\big|_{\theta=\theta_0}, \tag{8}$$
noting that the gradient vector $\nabla \log f_\theta(X_t)|_{\theta=\theta_0}$ has mean 0 and covariance matrix
$$I(\theta_0) = \Bigl(-E_{\theta_0}\Bigl\{\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \log f_\theta(X_t)\Big|_{\theta=\theta_0}\Bigr\}\Bigr)_{1 \le i,j \le p}. \tag{9}$$
The matrix $I(\theta_0)$ is called the Fisher information matrix. By the law of large numbers,
$$-n^{-1} \nabla^2 l(\theta_0) = -n^{-1} \sum_{t=1}^n \nabla^2 \log f_\theta(X_t)\big|_{\theta=\theta_0} \to I(\theta_0) \tag{10}$$
with probability 1.
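
A minimal sketch of this machinery for an i.i.d. $N(\mu, 1)$ sample, assuming scipy for the numerical maximization; the true mean 2.0 and sample size are illustrative. In this model the observed information is exactly $n$, so the calculation can be verified by hand.

```python
# Sketch: numerical MLE and the observed information for N(mu, 1) data.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=1.0, size=400)
n = len(x)

negloglik = lambda mu: 0.5 * np.sum((x - mu) ** 2)   # -l(mu) up to constants
mu_hat = minimize_scalar(negloglik).x                # MLE (equals x.mean() here)

obs_info = n                      # observed information: -d^2 l / d mu^2 = n
se = 1.0 / np.sqrt(obs_info)      # asymptotic standard error of mu_hat
print(mu_hat, (mu_hat - 1.96 * se, mu_hat + 1.96 * se))  # approx. 95% CI
```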

Asymptotic normality of MLE

Moreover, $n^{-1}\nabla^2 l(\theta^*) = n^{-1}\nabla^2 l(\theta_0) + o(1)$ with probability 1, by the consistency of $\hat{\theta}$. From (7) and (10), it follows that
$$\sqrt{n}(\hat{\theta} - \theta_0) = -\bigl(n^{-1}\nabla^2 l(\theta^*)\bigr)^{-1} \bigl(\nabla l(\theta_0)/\sqrt{n}\bigr) \tag{11}$$
converges in distribution to $N(0, I^{-1}(\theta_0))$ as $n \to \infty$, in view of the following theorem.

Slutsky's theorem

If $X_n$ converges in distribution to $X$ and $Y_n$ converges in probability to some nonrandom $C$, then $Y_n X_n$ converges in distribution to $CX$.

It is often more convenient to work with the observed Fisher information matrix $-\nabla^2 l(\hat{\theta})$ than with the Fisher information matrix $nI(\theta_0)$, and to use the following variant of the limiting normal distribution of (11):
$$(\hat{\theta} - \theta_0)^T \bigl(-\nabla^2 l(\hat{\theta})\bigr) (\hat{\theta} - \theta_0) \ \text{has a limiting } \chi^2_p\text{-distribution}. \tag{12}$$

Asymptotic normality of MLE

The preceding asymptotic theory has assumed that the $X_t$ are i.i.d. so that the law of large numbers and CLT can be applied. More generally, without assuming that the $X_t$ are independent, the log-likelihood function can still be written as a sum:
$$l_n(\theta) = \sum_{t=1}^n \log f_\theta(X_t \mid X_1, \dots, X_{t-1}). \tag{13}$$
It can be shown that $\{\nabla l_n(\theta_0),\, n \ge 1\}$ is a martingale, or equivalently,
$$E\Bigl\{\nabla \log f_\theta(X_t \mid X_1, \dots, X_{t-1})\big|_{\theta=\theta_0} \,\Big|\, X_1, \dots, X_{t-1}\Bigr\} = 0,$$
and martingale strong laws and CLTs can be used to establish the consistency and asymptotic normality of $\hat{\theta}$ under certain conditions, so (12) still holds in this more general setting.

Asymptotic inference: confidence ellipsoid

In view of (12), an approximate $100(1-\alpha)\%$ confidence ellipsoid for $\theta_0$ is
$$\Bigl\{\theta : (\hat{\theta} - \theta)^T \bigl(-\nabla^2 l_n(\hat{\theta})\bigr) (\hat{\theta} - \theta) \le \chi^2_{p;1-\alpha}\Bigr\}, \tag{14}$$
where $\chi^2_{p;1-\alpha}$ is the $(1-\alpha)$th quantile of the $\chi^2_p$-distribution.

Asymptotic inference: hypothesis testing

To test $H_0 : \theta \in \Theta_0$, where $\Theta_0$ is a $q$-dimensional subspace of the parameter space with $0 \le q < p$, the generalized likelihood ratio (GLR) statistic is
$$\Lambda = 2\Bigl\{l_n(\hat{\theta}) - \sup_{\theta \in \Theta_0} l_n(\theta)\Bigr\}, \tag{15}$$
where $l_n(\theta)$ is the log-likelihood function defined in (13). The derivation of the limiting normal distribution of (11) can be modified to show that
$$\Lambda \ \text{has a limiting } \chi^2_{p-q} \text{ distribution under } H_0. \tag{16}$$
Hence the GLR test with significance level (type I error probability) $\alpha$ rejects $H_0$ if $\Lambda$ exceeds $\chi^2_{p-q;1-\alpha}$.
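
The following sketch carries out a GLR test, assuming an i.i.d. $N(\mu, \sigma^2)$ sample and $H_0 : \mu = 0$, so that $p = 2$ free parameters and $q = 1$ under $H_0$; the data-generating values are illustrative.

```python
# Sketch: generalized likelihood ratio test of H0: mu = 0 in N(mu, sigma^2).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
x = rng.normal(loc=0.3, scale=1.0, size=100)
n = len(x)

def max_loglik(mu_fixed=None):
    """Maximized log-likelihood, profiling out sigma^2 at its MLE given mu."""
    mu = x.mean() if mu_fixed is None else mu_fixed
    s2 = np.mean((x - mu) ** 2)              # MLE of sigma^2 given mu
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1.0)

Lambda = 2 * (max_loglik() - max_loglik(mu_fixed=0.0))
p_value = chi2.sf(Lambda, df=1)              # p - q = 2 - 1 = 1
print(Lambda, p_value)
```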

Asymptotic inference: delta method

Let $g : \mathbb{R}^p \to \mathbb{R}$. If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$. If $g$ is continuously differentiable in some neighborhood of the true parameter value $\theta_0$, then $g(\hat{\theta}) = g(\theta_0) + (\nabla g(\theta^*))^T (\hat{\theta} - \theta_0)$, where $\theta^*$ lies between $\hat{\theta}$ and $\theta_0$. Therefore $g(\hat{\theta}) - g(\theta_0)$ is asymptotically normal with mean 0 and variance
$$\bigl(\nabla g(\hat{\theta})\bigr)^T \bigl(-\nabla^2 l_n(\hat{\theta})\bigr)^{-1} \bigl(\nabla g(\hat{\theta})\bigr). \tag{17}$$
This is called the delta method for evaluating the asymptotic variance of $g(\hat{\theta})$. The above asymptotic normality of $g(\hat{\theta}) - g(\theta_0)$ can be used to construct confidence intervals for $g(\theta_0)$.

Parametric bootstrap

When $\theta_0$ is unknown, the parametric bootstrap draws bootstrap samples $(X_{b,1}, \dots, X_{b,n})$, $b = 1, \dots, B$, from $F_{\hat{\theta}}$, where $\hat{\theta}$ is the MLE of $\theta$. Then we can use these bootstrap samples to estimate the bias and standard error of $\hat{\theta}$. To construct a bootstrap confidence region for $g(\theta_0)$, we can use
$$\bigl(g(\hat{\theta}) - g(\theta_0)\bigr) \Big/ \sqrt{\bigl(\nabla g(\hat{\theta})\bigr)^T \bigl(-\nabla^2 l_n(\hat{\theta})\bigr)^{-1} \bigl(\nabla g(\hat{\theta})\bigr)}$$
as an approximate pivot whose (asymptotic) distribution does not depend on the unknown $\theta_0$, and use the quantiles of its bootstrap distribution in place of standard normal quantiles.
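
A short sketch of the delta method, assuming the $N(\mu, 1)$ model from the earlier sketch (observed information $n$) and the illustrative functional $g(\mu) = e^\mu$.

```python
# Sketch: delta-method confidence interval for g(mu) = exp(mu), N(mu, 1) model.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=0.5, scale=1.0, size=300)
n = len(x)

mu_hat = x.mean()                        # MLE of mu
g_hat = np.exp(mu_hat)                   # MLE of g(mu) = exp(mu)
grad_g = np.exp(mu_hat)                  # g'(mu) evaluated at the MLE
var_g = grad_g * (1.0 / n) * grad_g      # (grad g)^T (obs. info)^{-1} (grad g)

se = np.sqrt(var_g)
print(g_hat - 1.96 * se, g_hat + 1.96 * se)   # approx. 95% CI for exp(mu)
```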
