Probability Lecture III (August, 2006)

1 Some Properties of Random Vectors and Matrices

We generalize univariate notions in this section.

Definition 1 Let U = (U_{ij})_{k×l} be a matrix of random variables, and suppose E|U_{ij}| < ∞ for all i, j. Define the expectation of U by E(U) = (E(U_{ij}))_{k×l}.

The following are some properties of random vectors. Throughout, U denotes a random k-vector and V a random l-vector.

1. If A_{m×k}, B_{m×l} are nonrandom and E(U), E(V) are defined, then

E(AU + BV) = A E(U) + B E(V).

Definition 2 For a random vector U, suppose E(U_i²) < ∞ for i = 1, …, k, or equivalently E(|U|²) < ∞, where |·| denotes Euclidean distance. Define the variance of U, often called the variance-covariance matrix, by

Var(U) = E[(U − E(U))(U − E(U))^T] = (Cov(U_i, U_j))_{k×k},

a symmetric matrix.

2. If A is m×k as before, then Var(AU) = A Var(U) A^T. Note that Var(U) is k×k, while Var(AU) is m×m.

3. Let c_{k×1} denote a constant vector. Then Var(U + c) = Var(U), and Var(c) = 0_{k×k}.

4. The variance of any random vector is a nonnegative definite symmetric matrix. To see this, note that if a_{k×1} is constant, we can apply Var(AU) = A Var(U) A^T to obtain

Var(a^T U) = Var(∑_{j=1}^k a_j U_j) = a^T Var(U) a = ∑_{i,j} a_i a_j Cov(U_i, U_j).

Because the variance of any random variable is nonnegative and a is arbitrary, we conclude from these equalities that Var(U) is a nonnegative definite symmetric matrix.

Definition 3 Define the moment generating function (m.g.f.) of U_{k×1} for t ∈ R^k by

M(t) = M_U(t) = E(e^{t^T U}) = E(e^{∑_{j=1}^k t_j U_j}).

5. If U_{k×1}, V_{k×1} are independent, then

M_{U+V}(t) = M_U(t) M_V(t).
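Property 2 is purely algebraic, so it can be sanity-checked numerically. Below is a minimal sketch in Python with NumPy; the matrices A and L, the mean vector, and the sample size are arbitrary illustrative choices, not from the text. Since the sample covariance matrix is a quadratic function of the data, the identity Var(AU) = A Var(U) A^T holds for the empirical covariance as well, up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative choices: U is a random 3-vector, A a nonrandom 2 x 3 matrix.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
mu = np.array([1.0, -2.0, 0.5])

n = 100_000
U = rng.standard_normal((n, 3)) @ L.T + mu   # n draws of U, one per row; Var(U) = L L^T

var_U = np.cov(U, rowvar=False)              # empirical k x k variance-covariance matrix
var_AU = np.cov(U @ A.T, rowvar=False)       # empirical m x m variance of AU

print(np.allclose(var_AU, A @ var_U @ A.T))                # True (property 2)
print(np.allclose(var_U, np.cov(U + 7.0, rowvar=False)))   # True (property 3, c = (7,7,7)^T)
```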

2 The Bivariate Normal Distribution

The family of k-variate normal distributions arises on theoretical grounds when we consider the limiting behavior of sums of independent k-vectors of random variables. In this section we focus on the case k = 2, where all properties can be derived relatively easily.

A planar vector (X, Y) has a bivariate normal distribution if, and only if, there exist constants a_{ij}, 1 ≤ i, j ≤ 2, µ_1, µ_2, and independent standard normal random variables Z_1, Z_2 such that

X = µ_1 + a_{11} Z_1 + a_{12} Z_2,
Y = µ_2 + a_{21} Z_1 + a_{22} Z_2.

In matrix notation, if A = (a_{ij}), µ = (µ_1, µ_2)^T, X = (X, Y)^T, Z = (Z_1, Z_2)^T, the definition is equivalent to

X = AZ + µ.    (1)

Two important properties follow from the definition.

Proposition 4 The marginal distributions of the components of a bivariate normal random vector are (univariate) normal or degenerate (concentrated at one point).

Note that the converse of the proposition is not true (see Problem B.4.10 in Bickel and Doksum [2001]). Also, note that

E(X) = µ_1 + a_{11} E(Z_1) + a_{12} E(Z_2) = µ_1,    E(Y) = µ_2,

and define

σ_1 = √Var(X),    σ_2 = √Var(Y).

Then X has a N(µ_1, σ_1²) and Y a N(µ_2, σ_2²) distribution.

Proposition 5 If we apply an affine transformation g(x) = Cx + d to a vector X which has a bivariate normal distribution, then g(X) also has such a distribution. This is clear because

CX + d = C(AZ + µ) + d = (CA)Z + (Cµ + d).    (2)

We define the variance-covariance matrix of (X, Y) (or of the distribution of (X, Y)) as the matrix of central second moments

Σ = [ σ_1²       ρσ_1σ_2 ]
    [ ρσ_1σ_2    σ_2²    ],    (3)

where

ρ = Cor(X, Y) = Cov(X, Y) / (σ_1 σ_2).

This symmetric matrix is in many ways the right generalization of the variance to two dimensions.

Theorem 6 Suppose that σ_1 σ_2 ≠ 0 and |ρ| < 1. Then the density of X is

p_X(x) = (1 / (2π √(det Σ))) exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) ).    (4)

Remark 7 From (3) we see that Σ is nonsingular iff σ_1 σ_2 ≠ 0 and |ρ| < 1. Bivariate normal distributions with σ_1 σ_2 ≠ 0 and |ρ| < 1 are referred to as nondegenerate, whereas the others are degenerate.

Remark 8 When ρ = 0, p_X(x) becomes the joint density of two independent normal variables. Thus, in the bivariate normal case, zero correlation is equivalent to independence.

Exercise 9 Given nonnegative constants σ_1, σ_2, a number ρ such that |ρ| < 1, and numbers µ_1, µ_2, let Z_1, Z_2 be independent standard normal random variables and construct a random vector (X, Y)^T, where

X = µ_1 + σ_1 Z_1,
Y = µ_2 + σ_2 (ρ Z_1 + √(1 − ρ²) Z_2).

Check that (X, Y)^T has a bivariate normal distribution with vector of means (µ_1, µ_2)^T and variance-covariance matrix Σ as given in (3).
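The construction in Exercise 9 is easy to check by simulation. Below is a minimal sketch in Python with NumPy; the particular values of µ_1, µ_2, σ_1, σ_2, ρ and the sample size are arbitrary illustrative choices. The sample mean vector and sample covariance matrix of (X, Y) should approximate (µ_1, µ_2)^T and the Σ of (3).

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary illustrative parameters: any sigma1, sigma2 >= 0 and |rho| < 1 work.
mu1, mu2 = 1.0, -2.0
sigma1, sigma2, rho = 2.0, 1.5, 0.6
n = 200_000

# Exercise 9: X = mu1 + sigma1*Z1,  Y = mu2 + sigma2*(rho*Z1 + sqrt(1 - rho^2)*Z2).
Z1 = rng.standard_normal(n)
Z2 = rng.standard_normal(n)
X = mu1 + sigma1 * Z1
Y = mu2 + sigma2 * (rho * Z1 + np.sqrt(1.0 - rho**2) * Z2)

Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])   # the matrix in (3)

print(X.mean(), Y.mean())           # approx mu1, mu2
print(np.cov(np.vstack([X, Y])))    # approx Sigma
print(Sigma)
```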

3 Convergence in Probability vs. in Distribution

3.0.1 Chebyshev's Inequality

If X is any random variable and a > 0 is a constant, then

P(|X| ≥ a) ≤ E(X²) / a².

3.1 Convergence in Probability

Definition 10 If a sequence of random variables {Z_n} is such that P(|Z_n − α| > ε) approaches zero as n approaches infinity for every ε > 0, where α is some scalar, then Z_n is said to converge in probability to α, written Z_n →_P α.

Definition 11 A sequence of random vectors Z_n = (Z_{n1}, Z_{n2}, …, Z_{nd})^T converges in probability to Z = (Z_1, Z_2, …, Z_d)^T iff

|Z_n − Z| →_P 0,

or equivalently Z_{nj} →_P Z_j for 1 ≤ j ≤ d.

3.1.1 A Law of Large Numbers

Theorem 12 Let X_1, X_2, …, X_i, … be a sequence of independent random variables with E(X_i) = µ and Var(X_i) = σ². Let X̄_n = (1/n) ∑_{i=1}^n X_i. Then, for any ε > 0,

P(|X̄_n − µ| > ε) → 0 as n → ∞.

Proof. We first find E(X̄_n) and Var(X̄_n):

E(X̄_n) = (1/n) ∑_{i=1}^n E(X_i) = µ.

Since the X_i are independent,

Var(X̄_n) = (1/n²) ∑_{i=1}^n Var(X_i) = σ²/n.

The desired result now follows immediately from Chebyshev's inequality applied to X̄_n − µ:

P(|X̄_n − µ| > ε) ≤ Var(X̄_n)/ε² = σ²/(nε²) → 0 as n → ∞.
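The proof is quantitative: Chebyshev's inequality bounds P(|X̄_n − µ| > ε) by σ²/(nε²). The following is a minimal simulation sketch in Python with NumPy; the Uniform(0, 1) summands (µ = 1/2, σ² = 1/12), the tolerance ε = 0.05, and the replication count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Uniform(0, 1) summands: mu = 1/2, sigma^2 = 1/12 (arbitrary illustrative choice).
mu, sigma2, eps = 0.5, 1.0 / 12.0, 0.05
reps = 1_000   # number of independent realizations of Xbar_n

for n in (100, 1_000, 10_000):
    xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    freq = np.mean(np.abs(xbar - mu) > eps)   # estimates P(|Xbar_n - mu| > eps)
    bound = sigma2 / (n * eps**2)             # Chebyshev bound from the proof
    print(f"n={n:>6}  P(|Xbar_n - mu| > eps) ~ {freq:.4f}  <=  {bound:.4f}")
```

The bound is conservative, but both the empirical frequency and the bound tend to zero as n grows, as Theorem 12 asserts.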

3.2 Convergence in Distribution

Definition 14 Let X_1, X_2, … be a sequence of random variables with cumulative distribution functions F_1, F_2, …, and let X be a random variable with distribution function F. We say that X_n converges in distribution to X if

lim_{n→∞} F_n(x) = F(x)

at every point x at which F is continuous.

Definition 15 A sequence {Z_n} of random vectors converges in law (in distribution) to Z, written Z_n →_L Z, iff

h(Z_n) →_L h(Z)

for all continuous functions h : R^d → R.

3.2.1 Central Limit Theorem

Theorem 16 (The Multivariate Central Limit Theorem) Let X_1, X_2, … be independent and identically distributed random k-vectors with E|X_1|² < ∞. Let E(X_1) = µ, Var(X_1) = Σ, and let S_n = ∑_{i=1}^n X_i. Then, for every continuous function g : R^k → R,

g((S_n − nµ)/√n) →_L g(Z),

where Z ~ N_k(0, Σ).

3.2.2 The O_P and o_P Notation

The following asymptotic order-in-probability notation is useful:

U_n = o_P(1) iff U_n →_P 0;
U_n = O_P(1) iff for every ε > 0 there is an M < ∞ such that P(|U_n| ≥ M) ≤ ε for all n;
U_n = o_P(V_n) iff U_n/V_n = o_P(1);
U_n = O_P(V_n) iff U_n/V_n = O_P(1);
U_n ≍_P V_n iff U_n = O_P(V_n) and V_n = O_P(U_n).

Note that O_P(1) o_P(1) = o_P(1), O_P(1) + o_P(1) = O_P(1), and U_n →_L U implies U_n = O_P(1).

Example 17 Suppose Z_1, Z_2, …, Z_n are iid as Z_1 with E|Z_1|² < ∞. Set µ = E(Z_1); then Z̄_n = µ + O_P(n^{−1/2}) by the central limit theorem.
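Example 17 can be checked numerically: √n (Z̄_n − µ) should be O_P(1), that is, its spread should stabilize rather than grow or vanish as n increases (by the CLT it is approximately N(0, Var(Z_1))). Below is a minimal sketch in Python with NumPy; the Exponential(1) summands (µ = 1, variance 1) and the replication count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Exponential(1) summands: mu = 1, Var(Z_1) = 1 (arbitrary illustrative choice).
mu, reps = 1.0, 1_000

for n in (10, 100, 1_000, 10_000):
    zbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    scaled = np.sqrt(n) * (zbar - mu)
    # The standard deviation of sqrt(n)*(Zbar_n - mu) stays near 1 for every n,
    # which is the O_P(1) behavior claimed in Example 17.
    print(f"n={n:>6}  sd of sqrt(n)*(Zbar_n - mu) ~ {scaled.std():.3f}")
```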

Exercise 18 Let X_i be the last digit of D_i², where D_i is a random digit between 0 and 9. For instance, if D_i = 7 then D_i² = 49 and X_i = 9. Let X̄_n = (X_1 + ⋯ + X_n)/n be the average of a large number n of such last digits, obtained from independent random digits D_1, …, D_n.

a) Predict the value of X̄_n for large n.

b) Find a number ε such that for n = 10,000 the chance that your prediction is off by more than ε is about 1 in 200.

c) Find approximately the least value of n such that your prediction of X̄_n is correct to within 0.01 with probability at least 0.99.

d) Which can be predicted more accurately for large n: the value of X̄_n, or the value of D̄_n = (D_1 + ⋯ + D_n)/n?

e) If you just had to predict the first digit of X̄_100, what digit should you choose to maximize your chance of being correct, and what is that chance?

References

[1] Peter J. Bickel and Kjell A. Doksum (2001). Mathematical Statistics: Basic Ideas and Selected Topics, Vol. I, 2nd Edition. Prentice Hall.

[2] Geoffrey Grimmett and David Stirzaker (2002). Probability and Random Processes, 3rd Edition. Oxford University Press.

[3] Jim Pitman (1993). Probability. Springer-Verlag New York, Inc.

[4] John A. Rice (1995). Mathematical Statistics and Data Analysis, 2nd Edition. Duxbury Press.