Multivariate Analysis and Likelihood Inference
Multivariate Analysis and Likelihood Inference

Outline
1. Joint Distribution of Random Variables
2. Principal Component Analysis (PCA)
3. Multivariate Normal Distribution
4. Likelihood Inference
Joint density function

The distribution of a random vector $X = (X_1, \dots, X_k)^T$ is characterized by its joint density function $f$ such that
$$P\{X_1 \le a_1, \dots, X_k \le a_k\} = \int_{-\infty}^{a_k} \cdots \int_{-\infty}^{a_1} f(x_1, \dots, x_k)\, dx_1 \cdots dx_k$$
when $X$ has a differentiable distribution function (the left-hand side above), and for the case of discrete $X_i$,
$$f(a_1, \dots, a_k) = P\{X_1 = a_1, \dots, X_k = a_k\}.$$

Change of variables

Suppose $X$ has a differentiable distribution function with density function $f_X$. Let $g : \mathbb{R}^k \to \mathbb{R}^k$ be continuously differentiable and one-to-one, and let $Y = g(X)$. Since $g$ is a one-to-one transformation, we can solve $Y = g(X)$ to obtain $X = h(Y)$; $h = (h_1, \dots, h_k)^T$ is the inverse of the transformation $g$. The density function $f_Y$ of $Y$ is given by
$$f_Y(y) = f_X(h(y))\, \bigl|\det(\partial h/\partial y)\bigr|,$$
where $\partial h/\partial y$ is the Jacobian matrix $(\partial h_i/\partial y_j)_{1 \le i,j \le k}$. The inverse function theorem of multivariate calculus gives $\partial h/\partial y = (\partial g/\partial x)^{-1}$, or equivalently $\partial x/\partial y = (\partial y/\partial x)^{-1}$.
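As a numerical sanity check (not part of the original notes), the change-of-variables formula can be verified for the one-dimensional map $g(x) = e^x$: if $X \sim N(0,1)$, then $f_Y(y) = f_X(\log y)\,|1/y|$ should coincide with the lognormal density. A minimal sketch using NumPy and SciPy:

```python
import numpy as np
from scipy import stats

# Y = g(X) = exp(X) with X ~ N(0, 1); inverse h(y) = log(y), dh/dy = 1/y.
# Change of variables: f_Y(y) = f_X(log y) * |1/y|, the lognormal density.
y = np.linspace(0.1, 5.0, 50)
f_Y = stats.norm.pdf(np.log(y)) / y          # f_X(h(y)) |det(dh/dy)|
f_lognorm = stats.lognorm.pdf(y, s=1.0)      # reference lognormal density (s = 1)
max_err = np.max(np.abs(f_Y - f_lognorm))    # should be at machine-precision level
```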
Mean and covariance matrix

The mean vector of $Y = (Y_1, \dots, Y_k)^T$ is $E(Y) = (EY_1, \dots, EY_k)^T$. The covariance matrix of $Y$ is
$$\mathrm{Cov}(Y) = (\mathrm{Cov}(Y_i, Y_j))_{1 \le i,j \le k} = E\bigl\{(Y - E(Y))(Y - E(Y))^T\bigr\}.$$
The following linearity (and bilinearity) properties hold for $EY$ (and $\mathrm{Cov}(Y)$). For a nonrandom $p \times k$ matrix $A$ and $p \times 1$ vector $B$,
$$E(AY + B) = A\,EY + B, \qquad \mathrm{Cov}(AY + B) = A\,\mathrm{Cov}(Y)\,A^T.$$

Eigenvalue and eigenvector

Definition 1. Let $V$ be a $p \times p$ matrix. A complex number $\lambda$ is called an eigenvalue of $V$ if there exists a $p \times 1$ vector $a \ne 0$ such that $Va = \lambda a$. Such a vector $a$ is called an eigenvector of $V$ corresponding to the eigenvalue $\lambda$.

We can rewrite $Va = \lambda a$ as $(V - \lambda I)a = 0$. Since $a \ne 0$, this implies that $\lambda$ is a solution of the equation $\det(V - \lambda I) = 0$. Since $\det(V - \lambda I)$ is a polynomial of degree $p$ in $\lambda$, there are $p$ eigenvalues, not necessarily distinct. If $V$ is symmetric, then all its eigenvalues are real and can be ordered as $\lambda_1 \ge \cdots \ge \lambda_p$. Moreover,
$$\mathrm{tr}(V) = \lambda_1 + \cdots + \lambda_p, \qquad \det(V) = \lambda_1 \cdots \lambda_p.$$
If $a$ is an eigenvector of $V$ corresponding to the eigenvalue $\lambda$, then so is $ca$ for $c \ne 0$. Moreover, premultiplying $\lambda a = Va$ by $a^T$ yields $\lambda = a^T V a / \|a\|^2$.
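The facts $\mathrm{tr}(V) = \sum_i \lambda_i$, $\det(V) = \prod_i \lambda_i$, and the Rayleigh-quotient identity $\lambda = a^T V a / \|a\|^2$ can be checked numerically; a small sketch with NumPy (the random symmetric matrix here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
V = A @ A.T                      # symmetric nonnegative definite, like a covariance matrix
lam, vecs = np.linalg.eigh(V)    # real eigenvalues since V is symmetric (ascending order)
trace_ok = np.isclose(np.trace(V), lam.sum())       # tr(V) = sum of eigenvalues
det_ok = np.isclose(np.linalg.det(V), lam.prod())   # det(V) = product of eigenvalues
# Premultiplying Va = lambda*a by a^T gives lambda = a^T V a / ||a||^2
a = vecs[:, -1]                  # eigenvector of the largest eigenvalue
rayleigh = a @ V @ a / (a @ a)
```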
Eigenvalue and eigenvector

We shall now focus on the case where $V$ is the covariance matrix of a random vector $x = (X_1, \dots, X_p)^T$.

Proposition 1. The eigenvalues of $V$ are real and nonnegative since $V$ is symmetric and nonnegative definite.

Consider the linear combination $a^T x$ with $\|a\| = 1$ that has the largest variance over all such linear combinations. Maximizing $a^T V a$ ($= \mathrm{Var}(a^T x)$) over $a$ with $\|a\| = 1$ yields $Va = \lambda a$. Since $a \ne 0$, this implies that $\lambda$ is an eigenvalue of $V$ and $a$ is the corresponding eigenvector, and that $\lambda = a^T V a$. Denote $\lambda_1 = \max_{a : \|a\| = 1} a^T V a$ and let $a_1$ be the corresponding eigenvector with $\|a_1\| = 1$.

Next consider $a^T x$ that maximizes $\mathrm{Var}(a^T x) = a^T V a$ subject to $a_1^T a = 0$ and $\|a\| = 1$. Its solution should satisfy
$$\frac{\partial}{\partial a_i}\bigl\{a^T V a + \lambda(1 - a^T a) + \eta\, a_1^T a\bigr\} = 0 \quad \text{for } i = 1, \dots, p.$$
This implies that $\lambda$ is an eigenvalue of $V$ with corresponding unit eigenvector $a_2$ that is orthogonal to $a_1$. Proceeding inductively in this way, we obtain the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of $V$ with the optimization characterization
$$\lambda_{k+1} = \max_{a : \|a\| = 1,\ a^T a_j = 0 \text{ for } 1 \le j \le k} a^T V a.$$
The maximizer $a_{k+1}$ of the above equation is an eigenvector corresponding to the eigenvalue $\lambda_{k+1}$.

Definition 2. $a_i^T x$ is called the $i$th principal component of $x$.
Principal components

(a) From the preceding derivation, $\lambda_i = \mathrm{Var}(a_i^T x)$.

(b) The elements of the eigenvectors $a_i$ are called factor loadings in PCA. Note that $(a_1, \dots, a_p)$ is an orthogonal matrix, i.e.,
$$I = (a_1, \dots, a_p)(a_1, \dots, a_p)^T = a_1 a_1^T + \cdots + a_p a_p^T.$$
Moreover,
$$V = \lambda_1 a_1 a_1^T + \cdots + \lambda_p a_p a_p^T.$$

(c) Since $V = \mathrm{Cov}(x)$ and $x = (X_1, \dots, X_p)^T$, $\mathrm{tr}(V) = \sum_{i=1}^p \mathrm{Var}(X_i)$. Hence it follows from the fact $\mathrm{tr}(V) = \sum_{i=1}^p \lambda_i$ that
$$\lambda_1 + \cdots + \lambda_p = \sum_{i=1}^p \mathrm{Var}(X_i).$$

PCA of sample covariance matrix

Suppose $x_1, \dots, x_n$ are $n$ independent observations from a multivariate population with mean $\mu$ and covariance matrix $V$. The mean vector $\mu$ can be estimated by $\bar{x} = \sum_{i=1}^n x_i / n$, and the covariance matrix can be estimated by the sample covariance matrix
$$\widehat{V} = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T / (n - 1).$$
Let $\hat{a}_j = (\hat{a}_{1j}, \dots, \hat{a}_{pj})^T$ be the eigenvector corresponding to the $j$th largest eigenvalue $\hat{\lambda}_j$ of the sample covariance matrix $\widehat{V}$.
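The sample-covariance PCA described above can be sketched in a few lines of NumPy (the simulated data and population covariance matrix are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
pop_cov = np.array([[4.0, 1.0, 0.0],
                    [1.0, 2.0, 0.0],
                    [0.0, 0.0, 1.0]])        # illustrative population covariance
x = rng.multivariate_normal(np.zeros(3), pop_cov, size=n)
xbar = x.mean(axis=0)
V_hat = (x - xbar).T @ (x - xbar) / (n - 1)  # sample covariance matrix
lam, a = np.linalg.eigh(V_hat)               # ascending order; reverse for largest first
lam, a = lam[::-1], a[:, ::-1]               # lam[j-1], a[:, j-1] = j-th largest pair
total_var = np.trace(V_hat)                  # eigenvalues sum to the total variance
```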
PCA of sample covariance matrix

For fixed $p$, the following asymptotic results have been established under the assumption $\lambda_1 > \cdots > \lambda_p > 0$ as $n \to \infty$:

(i) $\sqrt{n}(\hat{a}_i - a_i)$ has a limiting normal distribution with mean $0$ and covariance matrix
$$\sum_{h \ne i} \frac{\lambda_i \lambda_h}{(\lambda_i - \lambda_h)^2}\, a_h a_h^T.$$
Note that $\hat{a}_i$ and $a_i$ are chosen so that they have the same sign.

(ii) $\sqrt{n}(\hat{\lambda}_i - \lambda_i)$ has a limiting $N(0, 2\lambda_i^2)$ distribution. Moreover, for $i \ne j$, the limiting distribution of $\sqrt{n}(\hat{\lambda}_i - \lambda_i, \hat{\lambda}_j - \lambda_j)$ is that of two independent normals.

Approximating the covariance structure

Even when $p$ is large, if many eigenvalues $\lambda_h$ are small compared with the largest $\lambda_i$'s, the covariance structure of the data can be approximated by a few principal components with large eigenvalues.
Representation of data in terms of principal components

Let $x_i = (x_{i1}, \dots, x_{ip})^T$, $i = 1, \dots, n$, be the multivariate sample. Let $X_k = (x_{1k}, \dots, x_{nk})^T$, $1 \le k \le p$, and define
$$Y_j = \hat{a}_{1j} X_1 + \cdots + \hat{a}_{pj} X_p, \quad 1 \le j \le p,$$
where $\hat{a}_j = (\hat{a}_{1j}, \dots, \hat{a}_{pj})^T$ is the eigenvector corresponding to the $j$th largest eigenvalue $\hat{\lambda}_j$ of the sample covariance matrix $\widehat{V}$, with $\|\hat{a}_j\| = 1$. Since the matrix $\widehat{A} := (\hat{a}_{ij})_{1 \le i,j \le p}$ is orthogonal (i.e., $\widehat{A}\widehat{A}^T = I$), it follows that the observed data $X_k$ can be expressed in terms of the principal components $Y_j$ as
$$X_k = \hat{a}_{k1} Y_1 + \cdots + \hat{a}_{kp} Y_p.$$
Moreover, the sample correlation matrix of the transformed data $y_{tj}$ is the identity matrix.

PCA of correlation matrices

An alternative to $\mathrm{Cov}(x)$ is the correlation matrix $\mathrm{Corr}(x)$. Since a primary goal of PCA is to uncover low-dimensional subspaces around which $x$ tends to concentrate, scaling the components of $x$ by their standard deviations may work better for this purpose, and applying PCA to sample correlation matrices may give more stable results.
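A short sketch (with simulated, illustrative data) confirming both the reconstruction $X_k = \sum_j \hat{a}_{kj} Y_j$ and the fact that the sample principal components are uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.0]])      # hypothetical mixing matrix for the sample
x = rng.standard_normal((n, 3)) @ mix
xc = x - x.mean(axis=0)                # centered data
V_hat = xc.T @ xc / (n - 1)
lam, A = np.linalg.eigh(V_hat)         # columns of A are the unit eigenvectors
Y = xc @ A                             # columns are the sample principal components
reconstructed = Y @ A.T                # X_k = sum_j a_kj Y_j, since A A^T = I
max_recon_err = np.max(np.abs(reconstructed - xc))
cov_Y = Y.T @ Y / (n - 1)              # diagonal: principal components are uncorrelated
off_diag = cov_Y - np.diag(np.diag(cov_Y))
```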
Example: PCA of U.S. Treasury-LIBOR swap rates

[Figure 1: Daily swap rates of four of the eight maturities $T = 1, 2, 3, 4, 5, 7, 10, 30$ from July 3, 2000 to July 15, 2005.]

We now apply PCA to account for the variance of daily changes in swap rates with a few principal components (or factors).

[Figure 2: The differenced series for 1-year (top) and 30-year (bottom) swap rates.]

Let $d_{kt} = r_{kt} - r_{k,t-1}$ denote the daily changes in the $k$-year swap rates during the period.
Example: PCA of U.S. Treasury-LIBOR swap rates

The sample mean vector $\hat{\mu}$, sample covariance matrix $\widehat{\Sigma}$, and sample correlation matrix $\widehat{R}$ are computed from the differenced data $\{(d_{1t}, \dots, d_{8t}),\ 1 \le t \le 1256\}$. [Numerical entries of $\hat{\mu}$, $\widehat{\Sigma}$, and $\widehat{R}$ not recovered in transcription.]
Example: PCA of U.S. Treasury-LIBOR swap rates

[Table 1: PCA of covariance matrix of swap rate data: eigenvalues $\hat{\lambda}_i$, proportions of variance, and factor loadings for PC1-PC8; numerical entries not recovered in transcription.]

[Table 2: PCA of correlation matrix of swap rate data: eigenvalues $\hat{\lambda}_i$, proportions of variance, and factor loadings for PC1-PC8; numerical entries not recovered in transcription.]
Example: PCA of U.S. Treasury-LIBOR swap rates

Let $x_{tk} = (d_{tk} - \hat{\mu}_k)/\hat{\sigma}_k$, where $\hat{\sigma}_k$ is the sample standard deviation of $d_{tk}$. We represent $X_k = (x_{1k}, \dots, x_{nk})^T$ in terms of the principal components $Y_1, \dots, Y_8$:
$$X_k = \hat{a}_{k1} Y_1 + \cdots + \hat{a}_{kp} Y_p.$$
Tables 1 and 2 show the PCA results using the sample covariance matrix $\widehat{\Sigma}$ and the sample correlation matrix $\widehat{R}$. From Table 2, the proportions of the overall variance explained by the first three principal components are 90.8%, 6.7%, and 1.3%, respectively. Hence 98.9% of the overall variance is explained by the first three principal components, yielding the approximation
$$(d_k - \hat{\mu}_k)/\hat{\sigma}_k \doteq \hat{a}_{k1} Y_1 + \hat{a}_{k2} Y_2 + \hat{a}_{k3} Y_3$$
for the differenced swap rates.

[Figure 3: Eigenvectors of the first three principal components, which represent the parallel shift, tilt, and convexity components of swap rate movements.]
Example: PCA of U.S. Treasury-LIBOR swap rates

The graphs of the factor loadings show the following constituents of interest rate or yield curve movements documented in the literature:

(a) Parallel shift component. The factor loadings of the first principal component are roughly constant over different maturities. This means that a change in the swap rate for one maturity is accompanied by roughly the same change for other maturities.

(b) Tilt component. The factor loadings of the second principal component change monotonically with maturity. Changes in short-maturity and long-maturity swap rates in this component have opposite signs.

(c) Curvature component. The factor loadings of the third principal component are different for the midterm rates and the average of short- and long-term rates, revealing a curvature of the graph that resembles the convex shape of the relationship between the rates and their maturities.

Joint, marginal and conditional distributions

An $m \times 1$ random vector $Y$ is said to have a multivariate normal distribution $N(\mu, V)$ if its density function is
$$f(y) = \frac{1}{(2\pi)^{m/2} \sqrt{\det(V)}} \exp\Bigl\{-\frac{1}{2}(y - \mu)^T V^{-1} (y - \mu)\Bigr\}, \quad y \in \mathbb{R}^m.$$
Suppose that $Y \sim N(\mu, V)$ is partitioned as
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix},$$
where $Y_1$ and $\mu_1$ have dimension $r\,(< m)$ and $V_{11}$ is $r \times r$. Then $Y_1 \sim N(\mu_1, V_{11})$, $Y_2 \sim N(\mu_2, V_{22})$, and the conditional distribution of $Y_1$ given $Y_2 = y_2$ is
$$N\bigl(\mu_1 + V_{12} V_{22}^{-1}(y_2 - \mu_2),\ V_{11} - V_{12} V_{22}^{-1} V_{21}\bigr).$$
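The conditional-distribution formula can be evaluated directly; the following sketch uses hypothetical numbers for $\mu$, $V$, and the conditioning value $y_2$:

```python
import numpy as np

# Partitioned multivariate normal: conditional distribution of Y1 given Y2 = y2.
mu = np.array([1.0, 2.0, 0.0])               # illustrative mean vector
V = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])              # illustrative covariance matrix
r = 1                                        # Y1 is the first component
mu1, mu2 = mu[:r], mu[r:]
V11, V12 = V[:r, :r], V[:r, r:]
V21, V22 = V[r:, :r], V[r:, r:]
y2 = np.array([2.5, -0.5])                   # observed value of Y2
cond_mean = mu1 + V12 @ np.linalg.solve(V22, y2 - mu2)   # mu1 + V12 V22^{-1}(y2 - mu2)
cond_cov = V11 - V12 @ np.linalg.solve(V22, V21)         # V11 - V12 V22^{-1} V21
```

Conditioning always shrinks the variance: here `cond_cov[0, 0]` is strictly smaller than the marginal variance `V[0, 0]`.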
Orthogonality and independence

Let $\zeta$ be a standard normal random vector of dimension $p$ and let $A$, $B$ be $r \times p$ and $q \times p$ nonrandom matrices such that $AB^T = 0$. Then the components of $U_1 := A\zeta$ are uncorrelated with those of $U_2 := B\zeta$ because $E(U_1 U_2^T) = A\,E(\zeta\zeta^T)\,B^T = 0$. Note that
$$U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} A \\ B \end{pmatrix} \zeta$$
has a multivariate normal distribution with mean $0$ and covariance matrix
$$V = \begin{pmatrix} AA^T & 0 \\ 0 & BB^T \end{pmatrix}.$$
Then the density function of $U$ is a product of the density functions of $U_1$ and $U_2$, so $U_1$ and $U_2$ are independent. Note also that $U_1^T U_2 = 0$; i.e., they are orthogonal vectors.

Sample covariance matrix and Wishart distribution

Let $Y_1, \dots, Y_n$ be independent $m \times 1$ random $N(\mu, \Sigma)$ vectors with $n > m$ and positive definite $\Sigma$. Define
$$\bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i, \qquad W = \sum_{i=1}^n (Y_i - \bar{Y})(Y_i - \bar{Y})^T.$$
It can be shown that the sample mean vector $\bar{Y} \sim N(\mu, \Sigma/n)$, and that $\bar{Y}$ and the sample covariance matrix $W/(n-1)$ are independent. We next generalize $\sum_{t=1}^n (y_t - \bar{y})^2 / \sigma^2 \sim \chi^2_{n-1}$ to the multivariate case.

Wishart distribution

Suppose $Y_1, \dots, Y_n$ are independent $m \times 1$ $N(0, \Sigma)$ random vectors. Then the random matrix $W = \sum_{i=1}^n Y_i Y_i^T$ is said to have a Wishart distribution, denoted by $W_m(\Sigma, n)$.
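The construction $W = \sum_{i=1}^n Y_i Y_i^T$ gives a direct way to simulate Wishart draws; a Monte Carlo sketch (with an illustrative $\Sigma$) checking the mean property $E(W) = n\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n_df, B = 2, 10, 2000
Sigma = np.array([[2.0, 0.7],
                  [0.7, 1.0]])               # hypothetical covariance matrix
L = np.linalg.cholesky(Sigma)
# W = sum_{i=1}^{n} Y_i Y_i^T with Y_i ~ N(0, Sigma) has the W_m(Sigma, n) distribution.
Z = rng.standard_normal((B, n_df, m)) @ L.T  # B replicates of n_df N(0, Sigma) vectors
W = np.einsum('bij,bik->bjk', Z, Z)          # B Wishart draws (sum of outer products)
W_mean = W.mean(axis=0)                      # Monte Carlo estimate of E(W) = n * Sigma
rel_err = np.max(np.abs(W_mean - n_df * Sigma)) / np.max(n_df * Sigma)
```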
Wishart distribution

The density function of the Wishart distribution $W_m(\Sigma, n)$ generalizes that of the chi-square distribution:
$$f(W) = \frac{\det(W)^{(n-m-1)/2} \exp\bigl\{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1} W)\bigr\}}{[2^m \det(\Sigma)]^{n/2}\, \Gamma_m(n/2)}, \quad W > 0,$$
in which $W > 0$ denotes that $W$ is positive definite and $\Gamma_m(\cdot)$ denotes the multivariate gamma function
$$\Gamma_m(t) = \pi^{m(m-1)/4} \prod_{i=1}^m \Gamma\Bigl(t - \frac{i-1}{2}\Bigr).$$
Note that the usual gamma function corresponds to the case $m = 1$.

The following properties of the Wishart distribution are generalizations of some well-known properties of the chi-square and gamma$(\alpha, \beta)$ distributions.

(a) If $W \sim W_m(\Sigma, n)$, then $E(W) = n\Sigma$.

(b) Let $W_1, \dots, W_k$ be independently distributed with $W_j \sim W_m(\Sigma, n_j)$, $j = 1, \dots, k$. Then $\sum_{j=1}^k W_j \sim W_m(\Sigma, \sum_{j=1}^k n_j)$.

(c) Let $W \sim W_m(\Sigma, n)$ and let $A$ be a nonrandom $m \times m$ nonsingular matrix. Then $AWA^T \sim W_m(A\Sigma A^T, n)$. In particular, $a^T W a \sim (a^T \Sigma a)\,\chi^2_n$ for all nonrandom vectors $a \ne 0$.
Multivariate-t distribution

If $Z \sim N(0, \Sigma)$ and $W \sim W_m(\Sigma, k)$ such that the $m \times 1$ vector $Z$ and the $m \times m$ matrix $W$ are independent, then $(W/k)^{-1/2} Z$ is said to have the $m$-variate $t$-distribution with $k$ degrees of freedom. This distribution has the density function
$$f(t) = \frac{\Gamma\bigl((k+m)/2\bigr)}{(\pi k)^{m/2}\, \Gamma(k/2)} \Bigl(1 + \frac{t^T t}{k}\Bigr)^{-(k+m)/2}, \quad t \in \mathbb{R}^m.$$
If $t$ has the $m$-variate $t$-distribution with $k$ degrees of freedom such that $k \ge m$, then
$$\frac{k - m + 1}{km}\, t^T t \ \text{has the } F_{m, k-m+1}\text{-distribution}. \tag{1}$$

Applying (1) to Hotelling's $T^2$-statistic
$$T^2 = n(\bar{Y} - \mu)^T \bigl(W/(n-1)\bigr)^{-1} (\bar{Y} - \mu), \tag{2}$$
where $\bar{Y}$ and $W$ are the sample mean vector and the matrix of centered outer products defined above, yields
$$\frac{n - m}{m(n-1)}\, T^2 \sim F_{m, n-m}, \tag{3}$$
noting that
$$\bigl[W/(n-1)\bigr]^{-1/2} \sqrt{n}\,(\bar{Y} - \mu) = \bigl[W_m(\Sigma, n-1)/(n-1)\bigr]^{-1/2} N(0, \Sigma)$$
has the multivariate $t$-distribution.
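Hotelling's $T^2$ and the $F$-statistic in (3) can be computed as follows; the sample here is simulated under a hypothetical $\mu$ and $\Sigma$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, m = 50, 3
mu = np.zeros(m)                               # hypothesized mean under H0
Sigma = np.eye(m) + 0.5                        # illustrative covariance matrix
Y = rng.multivariate_normal(mu, Sigma, size=n)
Ybar = Y.mean(axis=0)
W = (Y - Ybar).T @ (Y - Ybar)                  # (n - 1) times the sample covariance
S = W / (n - 1)
T2 = n * (Ybar - mu) @ np.linalg.solve(S, Ybar - mu)  # Hotelling's T^2
F_stat = (n - m) / (m * (n - 1)) * T2                 # ~ F_{m, n-m} under H0
p_value = 1 - stats.f.cdf(F_stat, m, n - m)
```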
Multivariate-t distribution

In the definition of the multivariate $t$-distribution, it is assumed that $Z$ and $W$ share the same $\Sigma$. More generally, we can consider the case where $Z \sim N(0, V)$ instead. By considering $V^{-1/2} Z$ instead of $Z$, we can assume that $V = I_m$. Then the density function of $(W/k)^{-1/2} Z$, with independent $Z \sim N(0, I_m)$ and $W \sim W_m(\Sigma, k)$, has the general form
$$f(t) = \frac{\Gamma\bigl((k+m)/2\bigr)}{(\pi k)^{m/2}\, \Gamma(k/2)\, \sqrt{\det(\Sigma)}} \Bigl(1 + \frac{t^T \Sigma^{-1} t}{k}\Bigr)^{-(k+m)/2}, \quad t \in \mathbb{R}^m.$$
This density function can be used to define multivariate $t$-copulas.

Method of maximum likelihood (MLE)

Suppose $X_1, \dots, X_n$ have joint density function $f_\theta(x_1, \dots, x_n)$, $\theta \in \mathbb{R}^p$. The likelihood function based on the observations $X_1, \dots, X_n$ is $L(\theta) = f_\theta(X_1, \dots, X_n)$, and the MLE $\hat{\theta}$ of $\theta$ is the maximizer of $L(\theta)$, or equivalently of the log-likelihood $l(\theta) = \log f_\theta(X_1, \dots, X_n)$.

Example: MLE in Gaussian regression models

Consider the regression model $y_t = \beta^T x_t + \epsilon_t$, in which the regressors $x_t$ are measurable with respect to $\mathcal{F}_{t-1} = \sigma\{y_{t-1}, x_{t-1}, \dots, y_1, x_1\}$, and the $\epsilon_t$ are i.i.d. $N(0, \sigma^2)$. Letting $\theta = (\beta, \sigma^2)$, the likelihood function is
$$L(\theta) = \prod_{t=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\bigl\{-(y_t - \beta^T x_t)^2 / (2\sigma^2)\bigr\}.$$
It then follows that the MLE of $\beta$ is the same as the OLS estimator $\hat{\beta}$ that minimizes $\sum_{t=1}^n (y_t - \beta^T x_t)^2$, and the MLE of $\sigma^2$ is $\hat{\sigma}^2 = n^{-1} \sum_{t=1}^n (y_t - \hat{\beta}^T x_t)^2$.
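A sketch (with simulated regressors and a hypothetical $\beta$) illustrating that the MLE of $\beta$ coincides with OLS and that the MLE of $\sigma^2$ uses the divisor $n$ rather than $n - p$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 2
X = rng.standard_normal((n, p))              # illustrative regressors
beta_true = np.array([1.5, -0.5])            # hypothetical true coefficients
sigma_true = 0.3
y = X @ beta_true + sigma_true * rng.standard_normal(n)
# The Gaussian MLE of beta minimizes sum (y_t - beta^T x_t)^2, i.e., it is OLS.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = np.mean(resid ** 2)             # MLE of sigma^2: divisor n, not n - p
```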
Consistency of MLE

The classical asymptotic theory of maximum likelihood deals with i.i.d. $X_t$, so that $l(\theta) = \sum_{t=1}^n \log f_\theta(X_t)$. Let $\theta_0$ denote the true parameter value. By the law of large numbers,
$$n^{-1} l(\theta) \to E_{\theta_0}\{\log f_\theta(X_1)\} \quad \text{as } n \to \infty, \tag{4}$$
with probability 1, for every fixed $\theta$.

Jensen's inequality

Suppose $X$ has a finite mean and $\varphi : \mathbb{R} \to \mathbb{R}$ is convex, i.e., $\lambda\varphi(x) + (1 - \lambda)\varphi(y) \ge \varphi(\lambda x + (1 - \lambda)y)$ for all $0 < \lambda < 1$ and $x, y \in \mathbb{R}$. (If the inequality is strict, then $\varphi$ is called strictly convex.) Then $E\varphi(X) \ge \varphi(EX)$, and the inequality is strict if $\varphi$ is strictly convex, unless $X$ is degenerate (i.e., it takes only one value).

Consistency of MLE

Since $-\log x$ is a (strictly) convex function, it follows from Jensen's inequality that
$$E_{\theta_0}\Bigl(\log \frac{f_\theta(X_1)}{f_{\theta_0}(X_1)}\Bigr) \le \log E_{\theta_0}\Bigl(\frac{f_\theta(X_1)}{f_{\theta_0}(X_1)}\Bigr) = \log 1 = 0;$$
the inequality is strict unless the random variable $f_\theta(X_1)/f_{\theta_0}(X_1)$ is degenerate under $P_{\theta_0}$, which means $P_{\theta_0}\{f_\theta(X_1) = f_{\theta_0}(X_1)\} = 1$ since $E_{\theta_0}\{f_\theta(X_1)/f_{\theta_0}(X_1)\} = 1$. Thus, except in the case where $f_\theta(X_1) = f_{\theta_0}(X_1)$ with probability 1,
$$E_{\theta_0} \log f_\theta(X_1) < E_{\theta_0} \log f_{\theta_0}(X_1) \quad \text{for } \theta \ne \theta_0.$$
This suggests that the MLE is consistent (i.e., it converges to $\theta_0$ with probability 1 as $n \to \infty$). A rigorous proof of consistency of the MLE involves sharpening the argument in (4) by considering (instead of fixed $\theta$) $\sup_{\theta \in U} n^{-1} l(\theta)$ over neighborhoods $U$ of $\theta \ne \theta_0$.
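The strict form of Jensen's inequality underlying this consistency argument, $E \log X < \log EX$ for nondegenerate positive $X$, can be illustrated by simulation (an Exponential(1) sample is used here purely as an example):

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(1.0, size=100_000)   # nondegenerate positive random variable
lhs = np.mean(np.log(x))                 # estimate of E log X  (= -Euler gamma here)
rhs = np.log(np.mean(x))                 # estimate of log E X  (= log 1 = 0 here)
gap = rhs - lhs                          # strictly positive: log is strictly concave
```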
Asymptotic normality of MLE

Suppose $f_\theta(x)$ is smooth in $\theta$. Then we can find the MLE by solving the equation $\nabla l(\theta) = 0$, where $\nabla l$ is the (gradient) vector of partial derivatives $\partial l / \partial \theta_i$. For large $n$, the consistent estimate $\hat{\theta}$ is near $\theta_0$, and therefore Taylor's theorem yields
$$0 = \nabla l(\hat{\theta}) = \nabla l(\theta_0) + \nabla^2 l(\theta^*)(\hat{\theta} - \theta_0), \tag{5}$$
where $\theta^*$ lies between $\hat{\theta}$ and $\theta_0$ (and is therefore close to $\theta_0$) and
$$\nabla^2 l(\theta) = \Bigl(\frac{\partial^2 l}{\partial \theta_i \partial \theta_j}\Bigr)_{1 \le i,j \le p} \tag{6}$$
is the (Hessian) matrix of second partial derivatives. From (5) it follows that
$$\hat{\theta} - \theta_0 = -\bigl(\nabla^2 l(\theta^*)\bigr)^{-1} \nabla l(\theta_0). \tag{7}$$

The central limit theorem (CLT) can be applied to
$$\nabla l(\theta_0) = \sum_{t=1}^n \nabla \log f_\theta(X_t)\big|_{\theta = \theta_0}, \tag{8}$$
noting that the gradient vector $\nabla \log f_\theta(X_t)|_{\theta = \theta_0}$ has mean $0$ and covariance matrix
$$I(\theta_0) = \Bigl(-E_{\theta_0}\Bigl\{\frac{\partial^2}{\partial \theta_i \partial \theta_j} \log f_\theta(X_t)\Big|_{\theta = \theta_0}\Bigr\}\Bigr)_{1 \le i,j \le p}. \tag{9}$$
The matrix $I(\theta_0)$ is called the Fisher information matrix. By the law of large numbers,
$$n^{-1} \nabla^2 l(\theta_0) = n^{-1} \sum_{t=1}^n \nabla^2 \log f_\theta(X_t)\big|_{\theta = \theta_0} \to -I(\theta_0) \tag{10}$$
with probability 1.
Asymptotic normality of MLE

Moreover, $n^{-1} \nabla^2 l(\theta^*) = n^{-1} \nabla^2 l(\theta_0) + o(1)$ with probability 1, by the consistency of $\hat{\theta}$. From (7) and (10), it follows that
$$\sqrt{n}\,(\hat{\theta} - \theta_0) = -\bigl(n^{-1} \nabla^2 l(\theta^*)\bigr)^{-1} \bigl(\nabla l(\theta_0)/\sqrt{n}\bigr) \tag{11}$$
converges in distribution to $N(0, I^{-1}(\theta_0))$ as $n \to \infty$, in view of the following theorem.

Slutsky's theorem

If $X_n$ converges in distribution to $X$ and $Y_n$ converges in probability to some nonrandom $C$, then $Y_n X_n$ converges in distribution to $CX$.

It is often more convenient to work with the observed Fisher information matrix $-\nabla^2 l(\hat{\theta})$ than with the Fisher information matrix $nI(\theta_0)$, and to use the following variant of the limiting normal distribution of (11):
$$(\hat{\theta} - \theta_0)^T \bigl(-\nabla^2 l(\hat{\theta})\bigr)(\hat{\theta} - \theta_0) \ \text{has a limiting } \chi^2_p\text{-distribution}. \tag{12}$$

The preceding asymptotic theory has assumed that the $X_t$ are i.i.d. so that the law of large numbers and the CLT can be applied. More generally, without assuming that the $X_t$ are independent, the log-likelihood function can still be written as a sum:
$$l_n(\theta) = \sum_{t=1}^n \log f_\theta(X_t \mid X_1, \dots, X_{t-1}). \tag{13}$$
It can be shown that $\{\nabla l_n(\theta_0),\ n \ge 1\}$ is a martingale, or equivalently,
$$E\bigl\{\nabla \log f_\theta(X_t \mid X_1, \dots, X_{t-1})\big|_{\theta = \theta_0} \,\bigm|\, X_1, \dots, X_{t-1}\bigr\} = 0,$$
and martingale strong laws and CLTs can be used to establish the consistency and asymptotic normality of $\hat{\theta}$ under certain conditions, so (12) still holds in this more general setting.
Asymptotic inference: confidence ellipsoid

In view of (12), an approximate $100(1-\alpha)\%$ confidence ellipsoid for $\theta_0$ is
$$\bigl\{\theta : (\hat{\theta} - \theta)^T \bigl(-\nabla^2 l_n(\hat{\theta})\bigr)(\hat{\theta} - \theta) \le \chi^2_{p;\,1-\alpha}\bigr\}, \tag{14}$$
where $\chi^2_{p;\,1-\alpha}$ is the $(1-\alpha)$th quantile of the $\chi^2_p$-distribution.

Asymptotic inference: hypothesis testing

To test $H_0 : \theta \in \Theta_0$, where $\Theta_0$ is a $q$-dimensional subspace of the parameter space with $0 \le q < p$, the generalized likelihood ratio (GLR) statistic is
$$\Lambda = 2\bigl\{l_n(\hat{\theta}) - \sup_{\theta \in \Theta_0} l_n(\theta)\bigr\}, \tag{15}$$
where $l_n(\theta)$ is the log-likelihood function defined in (13). The derivation of the limiting normal distribution of (11) can be modified to show that
$$\Lambda \ \text{has a limiting } \chi^2_{p-q} \text{ distribution under } H_0. \tag{16}$$
Hence the GLR test with significance level (type I error probability) $\alpha$ rejects $H_0$ if $\Lambda$ exceeds $\chi^2_{p-q;\,1-\alpha}$.
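A sketch of the GLR test for a one-parameter exponential model (the model and null hypothesis are illustrative, not from the notes): here $p = 1$ and $q = 0$, so $\Lambda$ is compared with the $\chi^2_1$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 200
theta0 = 1.0                        # H0: theta = theta0 (a point null, so q = 0, p = 1)
x = rng.exponential(1 / theta0, size=n)
theta_hat = 1 / x.mean()            # unrestricted MLE of the Exponential rate

def loglik(theta):
    # Exponential(theta) log-likelihood: n log(theta) - theta * sum(x)
    return n * np.log(theta) - theta * x.sum()

Lambda = 2 * (loglik(theta_hat) - loglik(theta0))   # GLR statistic (15)
p_value = 1 - stats.chi2.cdf(Lambda, df=1)          # limiting chi^2_{p-q} = chi^2_1
```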
Asymptotic inference: delta method

Let $g : \mathbb{R}^p \to \mathbb{R}$. If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$. If $g$ is continuously differentiable in some neighborhood of the true parameter value $\theta_0$, then
$$g(\hat{\theta}) = g(\theta_0) + \bigl(\nabla g(\theta^*)\bigr)^T (\hat{\theta} - \theta_0),$$
where $\theta^*$ lies between $\hat{\theta}$ and $\theta_0$. Therefore $g(\hat{\theta}) - g(\theta_0)$ is asymptotically normal with mean $0$ and variance
$$\bigl(\nabla g(\hat{\theta})\bigr)^T \bigl(-\nabla^2 l_n(\hat{\theta})\bigr)^{-1} \bigl(\nabla g(\hat{\theta})\bigr). \tag{17}$$
This is called the delta method for evaluating the asymptotic variance of $g(\hat{\theta})$. The above asymptotic normality of $g(\hat{\theta}) - g(\theta_0)$ can be used to construct confidence intervals for $g(\theta_0)$.

Parametric bootstrap

When $\theta_0$ is unknown, the parametric bootstrap draws bootstrap samples $(X_{b,1}, \dots, X_{b,n})$, $b = 1, \dots, B$, from $F_{\hat{\theta}}$, where $\hat{\theta}$ is the MLE of $\theta$. These bootstrap samples can then be used to estimate the bias and standard error of $\hat{\theta}$. To construct a bootstrap confidence region for $g(\theta_0)$, we can use
$$\bigl(g(\hat{\theta}) - g(\theta_0)\bigr) \Big/ \sqrt{\bigl(\nabla g(\hat{\theta})\bigr)^T \bigl(-\nabla^2 l_n(\hat{\theta})\bigr)^{-1} \bigl(\nabla g(\hat{\theta})\bigr)}$$
as an approximate pivot whose (asymptotic) distribution does not depend on the unknown $\theta_0$, and use the quantiles of its bootstrap distribution in place of standard normal quantiles.
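A parametric-bootstrap sketch for a hypothetical Exponential($\theta$) model, estimating the bias and standard error of the MLE $\hat{\theta} = 1/\bar{x}$; the bootstrap standard error should be close to the asymptotic value $\hat{\theta}/\sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, B = 100, 500
theta0 = 2.0                       # hypothetical true rate (unknown in practice)
x = rng.exponential(1 / theta0, size=n)
theta_hat = 1 / x.mean()           # MLE of the rate
# Parametric bootstrap: resample from the fitted model F_{theta_hat}.
boot = np.empty(B)
for b in range(B):
    xb = rng.exponential(1 / theta_hat, size=n)
    boot[b] = 1 / xb.mean()        # MLE recomputed on each bootstrap sample
bias_est = boot.mean() - theta_hat # bootstrap estimate of the bias of theta_hat
se_est = boot.std(ddof=1)          # bootstrap standard error of theta_hat
```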
More informationMLES & Multivariate Normal Theory
Merlise Clyde September 6, 2016 Outline Expectations of Quadratic Forms Distribution Linear Transformations Distribution of estimates under normality Properties of MLE s Recap Ŷ = ˆµ is an unbiased estimate
More informationMatrix Approach to Simple Linear Regression: An Overview
Matrix Approach to Simple Linear Regression: An Overview Aspects of matrices that you should know: Definition of a matrix Addition/subtraction/multiplication of matrices Symmetric/diagonal/identity matrix
More informationLecture 3 September 1
STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have
More informationMultivariate Statistics
Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical
More informationST 740: Linear Models and Multivariate Normal Inference
ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /
More informationCommon-Knowledge / Cheat Sheet
CSE 521: Design and Analysis of Algorithms I Fall 2018 Common-Knowledge / Cheat Sheet 1 Randomized Algorithm Expectation: For a random variable X with domain, the discrete set S, E [X] = s S P [X = s]
More informationStatistics of stochastic processes
Introduction Statistics of stochastic processes Generally statistics is performed on observations y 1,..., y n assumed to be realizations of independent random variables Y 1,..., Y n. 14 settembre 2014
More informationStatistics 351 Probability I Fall 2006 (200630) Final Exam Solutions. θ α β Γ(α)Γ(β) (uv)α 1 (v uv) β 1 exp v }
Statistics 35 Probability I Fall 6 (63 Final Exam Solutions Instructor: Michael Kozdron (a Solving for X and Y gives X UV and Y V UV, so that the Jacobian of this transformation is x x u v J y y v u v
More informationLecture 1: August 28
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random
More informationSTAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.
STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review
More informationTHE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam
THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay Solutions to Final Exam 1. (13 pts) Consider the monthly log returns, in percentages, of five
More informationEC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)
1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For
More informationQualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama
Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Instructions This exam has 7 pages in total, numbered 1 to 7. Make sure your exam has all the pages. This exam will be 2 hours
More informationChp 4. Expectation and Variance
Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.
More informationPart IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015
Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.
More informationTAMS39 Lecture 10 Principal Component Analysis Factor Analysis
TAMS39 Lecture 10 Principal Component Analysis Factor Analysis Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content - Lecture Principal component analysis
More informationTesting a Normal Covariance Matrix for Small Samples with Monotone Missing Data
Applied Mathematical Sciences, Vol 3, 009, no 54, 695-70 Testing a Normal Covariance Matrix for Small Samples with Monotone Missing Data Evelina Veleva Rousse University A Kanchev Department of Numerical
More informationEconomics 583: Econometric Theory I A Primer on Asymptotics
Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013 The two main concepts in asymptotic theory that we will use are Consistency Asymptotic Normality Intuition consistency:
More information7 Multivariate Statistical Models
7 Multivariate Statistical Models 7.1 Introduction Often we are not interested merely in a single random variable but rather in the joint behavior of several random variables, for example, returns on several
More informationLecture 2: Review of Probability
Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................
More informationδ -method and M-estimation
Econ 2110, fall 2016, Part IVb Asymptotic Theory: δ -method and M-estimation Maximilian Kasy Department of Economics, Harvard University 1 / 40 Example Suppose we estimate the average effect of class size
More informationNotes on the Multivariate Normal and Related Topics
Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions
More informationCanonical Correlation Analysis of Longitudinal Data
Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate
More informationIntroduction to Normal Distribution
Introduction to Normal Distribution Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 17-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Introduction
More informationGaussian random variables inr n
Gaussian vectors Lecture 5 Gaussian random variables inr n One-dimensional case One-dimensional Gaussian density with mean and standard deviation (called N, ): fx x exp. Proposition If X N,, then ax b
More informationGaussian vectors and central limit theorem
Gaussian vectors and central limit theorem Samy Tindel Purdue University Probability Theory 2 - MA 539 Samy T. Gaussian vectors & CLT Probability Theory 1 / 86 Outline 1 Real Gaussian random variables
More informationMultivariate Gaussian Analysis
BS2 Statistical Inference, Lecture 7, Hilary Term 2009 February 13, 2009 Marginal and conditional distributions For a positive definite covariance matrix Σ, the multivariate Gaussian distribution has density
More informationTopics in Probability and Statistics
Topics in Probability and tatistics A Fundamental Construction uppose {, P } is a sample space (with probability P), and suppose X : R is a random variable. The distribution of X is the probability P X
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationReview of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley
Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate
More information3. Probability and Statistics
FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important
More informationSampling Distributions
Merlise Clyde Duke University September 8, 2016 Outline Topics Normal Theory Chi-squared Distributions Student t Distributions Readings: Christensen Apendix C, Chapter 1-2 Prostate Example > library(lasso2);
More information1. Let A be a 2 2 nonzero real matrix. Which of the following is true?
1. Let A be a 2 2 nonzero real matrix. Which of the following is true? (A) A has a nonzero eigenvalue. (B) A 2 has at least one positive entry. (C) trace (A 2 ) is positive. (D) All entries of A 2 cannot
More informationProbability Lecture III (August, 2006)
robability Lecture III (August, 2006) 1 Some roperties of Random Vectors and Matrices We generalize univariate notions in this section. Definition 1 Let U = U ij k l, a matrix of random variables. Suppose
More informationLIST OF FORMULAS FOR STK1100 AND STK1110
LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function
More informationChapter 6. Convergence. Probability Theory. Four different convergence concepts. Four different convergence concepts. Convergence in probability
Probability Theory Chapter 6 Convergence Four different convergence concepts Let X 1, X 2, be a sequence of (usually dependent) random variables Definition 1.1. X n converges almost surely (a.s.), or with
More informationSubmitted to the Brazilian Journal of Probability and Statistics
Submitted to the Brazilian Journal of Probability and Statistics Multivariate normal approximation of the maximum likelihood estimator via the delta method Andreas Anastasiou a and Robert E. Gaunt b a
More informationMonte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan
Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis
More informationStatement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.
MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationFinal Exam. 1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given.
1. (6 points) True/False. Please read the statements carefully, as no partial credit will be given. (a) If X and Y are independent, Corr(X, Y ) = 0. (b) (c) (d) (e) A consistent estimator must be asymptotically
More informationThe Delta Method and Applications
Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order
More information2. Matrix Algebra and Random Vectors
2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns
More informationPrincipal component analysis
Principal component analysis Angela Montanari 1 Introduction Principal component analysis (PCA) is one of the most popular multivariate statistical methods. It was first introduced by Pearson (1901) and
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationMa 3/103: Lecture 24 Linear Regression I: Estimation
Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the
More informationChapter 2: Fundamentals of Statistics Lecture 15: Models and statistics
Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:
More informationIII - MULTIVARIATE RANDOM VARIABLES
Computational Methods and advanced Statistics Tools III - MULTIVARIATE RANDOM VARIABLES A random vector, or multivariate random variable, is a vector of n scalar random variables. The random vector is
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationTable of Contents. Multivariate methods. Introduction II. Introduction I
Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation
More information