Probability 2 - Notes 10: Some Useful Inequalities

Lemma. If $X$ is a random variable and $g(x) \ge 0$ for all $x$ in the support of $f_X$, then $P(g(X) \ge 1) \le E[g(X)]$.

Proof (continuous case).
$$P(g(X) \ge 1) = \int_{x:\,g(x)\ge 1} f_X(x)\,dx \le \int_{x:\,g(x)\ge 1} g(x)f_X(x)\,dx \le \int_{-\infty}^{\infty} g(x)f_X(x)\,dx = E[g(X)].$$

Corollaries.

1. Markov's Inequality. For any $h > 0$, $P(|X| \ge h) \le E[|X|]/h$. When $X$ only takes non-negative values, then for any $h > 0$, $P(X \ge h) \le E[X]/h$.

Proof. Take $g(x) = |x|/h$ in the lemma. If $X$ only takes non-negative values, take $g(x) = x/h$ in the lemma.

2. Chebyshev's Inequality. If $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$, which are finite, then for any $h > 0$, $P(|X - \mu| \ge h) \le \sigma^2/h^2$.

Proof. Take $g(x) = \left(\frac{x-\mu}{h}\right)^2$ in the lemma.

Note: Chebyshev's inequality can be used to derive the weak law of large numbers. This is specified in the theorem below.

Theorem. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables, each with finite mean $\mu$ and finite variance $\sigma^2$. Then for any $\varepsilon > 0$ and $\delta > 0$ there exists an $N$ such that $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \delta$ for all $n \ge N$, where $\bar{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j$.

Proof. Note that $E[\bar{X}_n] = \mu$ and $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$. Consider any $\varepsilon > 0$ and $\delta > 0$. Apply Chebyshev's inequality to $\bar{X}_n$ and let $h = \varepsilon$. Then $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2} \le \delta$ provided $n \ge \frac{\sigma^2}{\varepsilon^2\delta}$. Therefore we need only choose $N \ge \frac{\sigma^2}{\varepsilon^2\delta}$ to obtain the result.

Note. Observe that $\lim_{n\to\infty} P(|\bar{X}_n - \mu| \ge \varepsilon) = 0$ for any $\varepsilon > 0$. We say that $\bar{X}_n$ converges in probability to $\mu$ as $n$ tends to infinity.

Some examples using the inequalities.

1. From Markov's inequality with $h = N\mu$, if $X$ is a non-negative random variable, $P(X > N\mu) \le \frac{\mu}{N\mu} = \frac{1}{N}$ for any $N > 0$.

2. If $\sigma^2 = 0$ then from Chebyshev's inequality, for any $h > 0$, $P(|X - \mu| < h) = 1 - P(|X - \mu| \ge h) \ge 1 - \frac{\sigma^2}{h^2} = 1$. Hence $P(X = \mu) = \lim_{h\to 0} P(|X - \mu| < h) = 1$. So variance zero implies the random variable takes a single value with probability 1.

3. When $\sigma^2 > 0$, Chebyshev's inequality gives a lower bound on the probability that $X$ lies within $k$ standard deviations of the mean. Take $h = k\sigma$. Then $P(|X - \mu| < k\sigma) = 1 - P(|X - \mu| \ge k\sigma) \ge 1 - \frac{\sigma^2}{(k\sigma)^2} = 1 - \frac{1}{k^2}$.

4. When $\sigma = 1$, how large a sample is needed if we want to be at least 95% certain that the sample mean lies within 0.5 of the true mean? We use Chebyshev's inequality for $\bar{X}_n$ with $h = 0.5$. Then $P(|\bar{X}_n - \mu| < 0.5) = 1 - P(|\bar{X}_n - \mu| \ge 0.5) \ge 1 - \frac{\sigma^2}{n(0.5)^2} = 1 - \frac{4}{n} \ge 0.95$ provided $n \ge \frac{4}{0.05} = 80$. So we need a minimum sample size of 80.
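The bound behind the weak law can be checked numerically. The following is a minimal sketch (assuming numpy; the Exponential(1) distribution, the tolerance $\varepsilon$ and the sample sizes are illustrative choices, not part of the notes) comparing the empirical frequency of $|\bar{X}_n - \mu| \ge \varepsilon$ with the Chebyshev bound $\sigma^2/(n\varepsilon^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0          # Exponential(1) has mean 1 and variance 1
eps = 0.1                      # tolerance epsilon
reps = 10_000                  # Monte Carlo replications

for n in [50, 200, 800]:
    # sample mean of n i.i.d. Exponential(1) variables, replicated reps times
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    empirical = np.mean(np.abs(xbar - mu) >= eps)
    bound = min(sigma2 / (n * eps**2), 1.0)
    print(f"n={n:4d}  empirical {empirical:.4f}  Chebyshev bound {bound:.4f}")
```

Both columns shrink as $n$ grows, as the weak law requires; the empirical probability is typically far below the bound, which is the usual price Chebyshev's inequality pays for its generality.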

The Central Limit Theorem.

Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables, each with finite mean $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n$ be the sample mean based on $X_1,\ldots,X_n$. Then we can find an approximation for $P(\bar{X}_n \le A)$ when $n$ is large by writing the event for $\bar{X}_n$ in terms of the standardized variable $Z_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ (i.e. $P(\bar{X}_n \le A) = P\left(Z_n \le \frac{\sqrt{n}(A-\mu)}{\sigma}\right)$) and proving that
$$\lim_{n\to\infty} P(Z_n \le z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx,$$
which is the c.d.f. of $N(0,1)$. The proof of this result uses the m.g.f. and the following lemma.

Lemma. Let $Z_1, Z_2, \ldots$ be a sequence of random variables. If $\lim_{n\to\infty} M_{Z_n}(t) = M(t)$, which is the m.g.f. of a distribution with c.d.f. $F$, then $\lim_{n\to\infty} F_{Z_n}(z) = F(z)$ at all points $z$ for which $F(z)$ is continuous.

Theorem (The Central Limit Theorem). Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables whose common m.g.f. exists in an open region about zero (and so is differentiable there), with finite mean $\mu$ and finite variance $\sigma^2$. Let $Z_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$. Then $\lim_{n\to\infty} P(Z_n \le z) = \Phi(z)$.

Proof. Let $U_j = (X_j - \mu)/\sigma$ and let $M_U(t)$ be the common m.g.f. Then $M_U(t) = e^{-\mu t/\sigma} M_X(t/\sigma)$ exists in an open interval about $t = 0$, with $M_U(0) = 1$, $M_U'(0) = E[U] = 0$ and $M_U''(0) = E[U^2] = \mathrm{Var}(U) = 1$. So $U_1, U_2, \ldots$ are i.i.d. with mean zero and variance one. Now $Z_n = \frac{1}{\sqrt{n}}\sum_{j=1}^{n} U_j$, so
$$M_{Z_n}(t) = E\left[e^{t\sum_j U_j/\sqrt{n}}\right] = \prod_{j=1}^{n} E\left[e^{tU_j/\sqrt{n}}\right] = \left(M_U(t/\sqrt{n})\right)^n.$$
Taking logs to base $e$ gives $\ln(M_{Z_n}(t)) = n\ln(M_U(t/\sqrt{n}))$. Now let $x = 1/\sqrt{n}$ and use L'Hopital's rule (twice). Then
$$\lim_{n\to\infty} n\ln(M_U(t/\sqrt{n})) = \lim_{x\to 0} \frac{\ln(M_U(xt))}{x^2} = \lim_{x\to 0} \frac{tM_U'(xt)/M_U(xt)}{2x} = \lim_{x\to 0} \frac{t^2\left(M_U''(xt)M_U(xt) - (M_U'(xt))^2\right)/(M_U(xt))^2}{2} = \frac{t^2\left(M_U''(0)M_U(0) - (M_U'(0))^2\right)}{2(M_U(0))^2} = \frac{t^2}{2}.$$
Hence $\lim_{n\to\infty} \ln(M_{Z_n}(t)) = t^2/2$ and so $\lim_{n\to\infty} M_{Z_n}(t) = e^{t^2/2}$. Since this is the m.g.f. of a $N(0,1)$ distribution, using the lemma proves that
$$\lim_{n\to\infty} P(Z_n \le z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx.$$
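The limit can also be seen by simulation. Below is a sketch (again assuming numpy; the Exponential(1) distribution and the grid of $z$ values are arbitrary choices) that simulates the standardized variable $Z_n$ and compares its empirical c.d.f. with $\Phi(z)$:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0                     # Exponential(1): mean 1, s.d. 1
n, reps = 500, 20_000

# Z_n = sqrt(n) (Xbar_n - mu) / sigma for each replication
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z_n = sqrt(n) * (xbar - mu) / sigma

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))   # N(0,1) c.d.f.
for z in (-1.5, 0.0, 1.5):
    print(f"z={z:+.1f}  P(Z_n<=z) ~ {np.mean(z_n <= z):.4f}  Phi(z)={Phi(z):.4f}")
```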

The bivariate and multivariate normal distribution.

An indirect method was used on problem sheet 9 to get you to derive standard results for a bivariate normal distribution. The results are summarised below. The results may be proved directly; however, this is messy unless you use matrix and vector notation. Once you do this, results can just as easily be obtained for the multivariate normal, so we may just as well derive results immediately for the more general case.

Summary of results for the bivariate normal distribution.

1. If $X_1$ and $X_2$ have the bivariate normal distribution then the joint p.d.f. is
$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\}$$
for all $x_1, x_2$. The distribution has parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho$. The parameter $\rho$ is restricted so that $-1 < \rho < 1$.

2. The joint m.g.f. is
$$M_{X_1,X_2}(t_1,t_2) = e^{\mu_1 t_1 + \mu_2 t_2 + \frac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)}.$$
This can be used to identify the parameters and find the marginal distributions: $M_{X_1}(t_1) = M_{X_1,X_2}(t_1,0) = e^{\mu_1 t_1 + \frac{1}{2}\sigma_1^2 t_1^2}$. Hence $X_1 \sim N(\mu_1,\sigma_1^2)$. Similarly $X_2 \sim N(\mu_2,\sigma_2^2)$. Differentiating the joint m.g.f. in the standard manner shows that $\rho(X_1,X_2) = \rho$.

3. $X_1$ and $X_2$ are independent iff $\rho = 0$. This is easily seen from either the joint p.d.f. or the joint m.g.f.

4. The conditional distribution of $X_2 \mid X_1 = x_1$ is normal with mean linear in $x_1$ and variance which does not depend on $x_1$. A similar result holds for the conditional distribution of $X_1 \mid X_2 = x_2$.
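Results 1-3 are easy to probe by sampling. A sketch (assuming numpy; the parameter values are arbitrary illustrative choices): drawing from the bivariate normal and computing sample moments recovers the marginal means and variances and the correlation $\rho$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu1, mu2 = 1.0, -2.0
s1, s2, rho = 1.0, 2.0, 0.6            # sigma_1, sigma_2 and correlation rho

m = np.array([mu1, mu2])
V = np.array([[s1**2,     rho*s1*s2],
              [rho*s1*s2, s2**2    ]])  # variance-covariance matrix

X = rng.multivariate_normal(m, V, size=200_000)
print("sample means:", X.mean(axis=0))                 # ~ (mu1, mu2)
print("sample covariance:\n", np.cov(X, rowvar=False)) # ~ V
print("sample correlation:", np.corrcoef(X.T)[0, 1])   # ~ rho
```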

Using vector and matrix notation, write
$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix};\quad \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix};\quad \mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \end{pmatrix};\quad \mathbf{m} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix};\quad V = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$
Then $\mathbf{m}$ is the vector of means and $V$ is the variance-covariance matrix. Note that $|V| = \sigma_1^2\sigma_2^2(1-\rho^2)$ and
$$V^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} \frac{1}{\sigma_1^2} & \frac{-\rho}{\sigma_1\sigma_2} \\ \frac{-\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{pmatrix}.$$
Hence $f_{\mathbf{X}}(\mathbf{x}) = (2\pi)^{-2/2}|V|^{-1/2} e^{-\frac{1}{2}(\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m})}$ for all $\mathbf{x}$. Also $M_{\mathbf{X}}(\mathbf{t}) = e^{\mathbf{t}^T\mathbf{m} + \frac{1}{2}\mathbf{t}^T V\mathbf{t}}$.

The Multivariate Normal Distribution.

We again use matrix and vector notation, but now there are $n$ random variables, so that $\mathbf{X}$, $\mathbf{x}$, $\mathbf{t}$ and $\mathbf{m}$ are now $n$-vectors with $i$th entries $X_i$, $x_i$, $t_i$ and $\mu_i$, and $V$ is the $n \times n$ matrix with $ii$th entry $\sigma_i^2$ and $ij$th entry (for $i \ne j$) $\sigma_{ij}$. Note that $V$ is symmetric, so that $V^T = V$. The joint p.d.f. is
$$f_{\mathbf{X}}(\mathbf{x}) = (2\pi)^{-n/2}|V|^{-1/2} e^{-\frac{1}{2}(\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m})}$$
for all $\mathbf{x}$. We say that $\mathbf{X} \sim N(\mathbf{m},V)$.

We can find the joint m.g.f. quite easily:
$$M_{\mathbf{X}}(\mathbf{t}) = E\left[e^{\sum_j t_j X_j}\right] = E\left[e^{\mathbf{t}^T\mathbf{X}}\right] = \int\cdots\int (2\pi)^{-n/2}|V|^{-1/2} e^{-\frac{1}{2}\left((\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{t}^T\mathbf{x}\right)}\,dx_1\cdots dx_n.$$
We do the equivalent of completing the square, i.e. we write
$$(\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{t}^T\mathbf{x} = (\mathbf{x}-\mathbf{m}-\mathbf{a})^T V^{-1}(\mathbf{x}-\mathbf{m}-\mathbf{a}) + b$$
for a suitable choice of the $n$-vector $\mathbf{a}$ of constants and a constant $b$. Then
$$M_{\mathbf{X}}(\mathbf{t}) = e^{-b/2}\int\cdots\int (2\pi)^{-n/2}|V|^{-1/2} e^{-\frac{1}{2}(\mathbf{x}-\mathbf{m}-\mathbf{a})^T V^{-1}(\mathbf{x}-\mathbf{m}-\mathbf{a})}\,dx_1\cdots dx_n = e^{-b/2},$$
since the remaining integrand is the p.d.f. of a $N(\mathbf{m}+\mathbf{a},V)$ distribution and so integrates to 1. We just need to find $\mathbf{a}$ and $b$. Expanding, we have
$$(\mathbf{x}-\mathbf{m}-\mathbf{a})^T V^{-1}(\mathbf{x}-\mathbf{m}-\mathbf{a}) + b = (\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{a}^T V^{-1}(\mathbf{x}-\mathbf{m}) + \mathbf{a}^T V^{-1}\mathbf{a} + b$$
$$= (\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{a}^T V^{-1}\mathbf{x} + \left[2\mathbf{a}^T V^{-1}\mathbf{m} + \mathbf{a}^T V^{-1}\mathbf{a} + b\right].$$
This has to equal $(\mathbf{x}-\mathbf{m})^T V^{-1}(\mathbf{x}-\mathbf{m}) - 2\mathbf{t}^T\mathbf{x}$ for all $\mathbf{x}$. Hence we need $\mathbf{a}^T V^{-1} = \mathbf{t}^T$ and $b = -\left[2\mathbf{a}^T V^{-1}\mathbf{m} + \mathbf{a}^T V^{-1}\mathbf{a}\right]$. Hence $\mathbf{a} = V\mathbf{t}$ and $b = -\left[2\mathbf{t}^T\mathbf{m} + \mathbf{t}^T V\mathbf{t}\right]$. Therefore
$$M_{\mathbf{X}}(\mathbf{t}) = e^{-b/2} = e^{\mathbf{t}^T\mathbf{m} + \frac{1}{2}\mathbf{t}^T V\mathbf{t}}.$$
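The closed form $M_{\mathbf{X}}(\mathbf{t}) = e^{\mathbf{t}^T\mathbf{m} + \frac{1}{2}\mathbf{t}^T V\mathbf{t}}$ can be sanity-checked by Monte Carlo. A sketch (assuming numpy; $\mathbf{m}$, $V$ and $\mathbf{t}$ are arbitrary choices, with $\mathbf{t}$ kept small so the estimate of $E[e^{\mathbf{t}^T\mathbf{X}}]$ has manageable variance):

```python
import numpy as np

rng = np.random.default_rng(3)
m = np.array([1.0, 0.0, -1.0])
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])            # symmetric positive definite
t = np.array([0.2, -0.1, 0.3])

X = rng.multivariate_normal(m, V, size=1_000_000)
mc = np.mean(np.exp(X @ t))                # Monte Carlo estimate of E[e^{t^T X}]
exact = np.exp(t @ m + 0.5 * t @ V @ t)    # e^{t^T m + t^T V t / 2}
print(f"Monte Carlo {mc:.4f}  vs  formula {exact:.4f}")
```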

Results obtained using the m.g.f.

1. Any (non-empty) subset of multivariate normals is multivariate normal. Simply put $t_j = 0$ for all $j$ for which $X_j$ is not in the subset. For example, $M_{X_1}(t_1) = M_{X_1,\ldots,X_n}(t_1,0,\ldots,0) = e^{t_1\mu_1 + t_1^2\sigma_1^2/2}$. Hence $X_1 \sim N(\mu_1,\sigma_1^2)$. A similar result holds for each $X_i$. This identifies the parameters $\mu_i$ and $\sigma_i^2$ as the mean and variance of $X_i$. Also
$$M_{X_1,X_2}(t_1,t_2) = M_{X_1,\ldots,X_n}(t_1,t_2,0,\ldots,0) = e^{t_1\mu_1 + t_2\mu_2 + \frac{1}{2}\left(t_1^2\sigma_1^2 + 2\sigma_{12}t_1t_2 + \sigma_2^2t_2^2\right)}.$$
Hence $X_1$ and $X_2$ have the bivariate normal distribution with $\sigma_{12} = \mathrm{Cov}(X_1,X_2)$. A similar result holds for the joint distribution of $X_i$ and $X_j$ for $i \ne j$. This identifies $V$ as the variance-covariance matrix for $X_1,\ldots,X_n$.

2. $\mathbf{X}$ is a vector of independent random variables iff $V$ is diagonal (i.e. all off-diagonal entries are zero, so that $\sigma_{ij} = 0$ for $i \ne j$).

Proof. From (1), if the $X$'s are independent then $\sigma_{ij} = \mathrm{Cov}(X_i,X_j) = 0$ for all $i \ne j$, so that $V$ is diagonal. Conversely, if $V$ is diagonal then $\mathbf{t}^T V\mathbf{t} = \sum_j \sigma_j^2 t_j^2$ and hence
$$M_{\mathbf{X}}(\mathbf{t}) = e^{\mathbf{t}^T\mathbf{m} + \frac{1}{2}\mathbf{t}^T V\mathbf{t}} = \prod_j e^{\mu_j t_j + \frac{1}{2}\sigma_j^2 t_j^2} = \prod_j M_{X_j}(t_j).$$
By the uniqueness of the joint m.g.f., $X_1,\ldots,X_n$ are independent.

3. Linearly independent linear functions of multivariate normal random variables are multivariate normal random variables. If $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$, where $A$ is an $n \times n$ non-singular matrix and $\mathbf{b}$ is a (column) $n$-vector of constants, then $\mathbf{Y} \sim N(A\mathbf{m} + \mathbf{b},\, AVA^T)$.

Proof. Use the joint m.g.f.:
$$M_{\mathbf{Y}}(\mathbf{t}) = E\left[e^{\mathbf{t}^T\mathbf{Y}}\right] = E\left[e^{\mathbf{t}^T(A\mathbf{X}+\mathbf{b})}\right] = e^{\mathbf{t}^T\mathbf{b}} E\left[e^{(A^T\mathbf{t})^T\mathbf{X}}\right] = e^{\mathbf{t}^T\mathbf{b}} M_{\mathbf{X}}(A^T\mathbf{t}) = e^{\mathbf{t}^T\mathbf{b}} e^{(A^T\mathbf{t})^T\mathbf{m} + \frac{1}{2}(A^T\mathbf{t})^T V(A^T\mathbf{t})} = e^{\mathbf{t}^T(A\mathbf{m}+\mathbf{b}) + \frac{1}{2}\mathbf{t}^T(AVA^T)\mathbf{t}}.$$
This is just the m.g.f. for the multivariate normal distribution with vector of means $A\mathbf{m} + \mathbf{b}$ and variance-covariance matrix $AVA^T$. Hence, from the uniqueness of the joint m.g.f., $\mathbf{Y} \sim N(A\mathbf{m} + \mathbf{b},\, AVA^T)$. Note that, from (1), a subset of the $Y$'s is multivariate normal.
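Result 3 can be probed empirically as well. A sketch (assuming numpy; $A$, $\mathbf{b}$, $\mathbf{m}$ and $V$ are arbitrary choices with $A$ non-singular): the sample mean and covariance of $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$ should approach $A\mathbf{m} + \mathbf{b}$ and $AVA^T$.

```python
import numpy as np

rng = np.random.default_rng(4)
m = np.array([0.0, 1.0])
V = np.array([[1.0, 0.5],
              [0.5, 2.0]])
A = np.array([[1.0,  1.0],
              [2.0, -1.0]])               # non-singular
b = np.array([3.0, 0.0])

X = rng.multivariate_normal(m, V, size=500_000)
Y = X @ A.T + b                           # each row is A x + b

print("A m + b:", A @ m + b, " sample mean:", Y.mean(axis=0))
print("A V A^T:\n", A @ V @ A.T)
print("sample covariance:\n", np.cov(Y, rowvar=False))
```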

NOTE. The results concerning the vector of means and the variance-covariance matrix for linear functions of random variables hold regardless of the joint distribution of $X_1,\ldots,X_n$. We define the expectation of a vector of random variables $\mathbf{X}$, $E[\mathbf{X}]$, to be the vector of the expectations, and the expectation of a matrix of random variables $Y$, $E[Y]$, to be the matrix of the expectations. Then the variance-covariance matrix of $\mathbf{X}$ is just $E[(\mathbf{X} - E[\mathbf{X}])(\mathbf{X} - E[\mathbf{X}])^T]$. The following results are easily obtained:

(i) Let $A$ be an $m \times n$ matrix of constants, $B$ be an $m \times k$ matrix of constants and $Y$ be an $n \times k$ matrix of random variables. Then $E[AY + B] = AE[Y] + B$.

Proof. The $ij$th entry of $E[AY + B]$ is $E[\sum_r A_{ir}Y_{rj} + B_{ij}] = \sum_r A_{ir}E[Y_{rj}] + B_{ij}$, which is the $ij$th entry of $AE[Y] + B$. The result is then immediate.

(ii) Let $C$ be a $k \times m$ matrix of constants and $Y$ be an $n \times k$ matrix of random variables. Then $E[YC] = E[Y]C$.

Proof. Just transpose the equation; the result then follows from (i).

Hence if $\mathbf{Z} = A\mathbf{X} + \mathbf{b}$, where $A$ is an $m \times n$ matrix of constants, $\mathbf{b}$ is an $m$-vector of constants and $\mathbf{X}$ is an $n$-vector of random variables with $E[\mathbf{X}] = \mathbf{m}$ and variance-covariance matrix $V$, then
$$E[\mathbf{Z}] = E[A\mathbf{X} + \mathbf{b}] = AE[\mathbf{X}] + \mathbf{b} = A\mathbf{m} + \mathbf{b}.$$
Also the variance-covariance matrix of $\mathbf{Z}$ is just
$$E[(\mathbf{Z} - E[\mathbf{Z}])(\mathbf{Z} - E[\mathbf{Z}])^T] = E[A(\mathbf{X}-\mathbf{m})(\mathbf{X}-\mathbf{m})^T A^T] = A\,E[(\mathbf{X}-\mathbf{m})(\mathbf{X}-\mathbf{m})^T]\,A^T = AVA^T.$$

Example. Suppose that $E[X_1] = 1$, $E[X_2] = 0$, $\mathrm{Var}(X_1) = 2$, $\mathrm{Var}(X_2) = 4$ and $\mathrm{Cov}(X_1,X_2) = 1$. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 + aX_2$. Find the means, variances and covariance, and hence find $a$ so that $Y_1$ and $Y_2$ are uncorrelated.

Writing this in vector and matrix notation, we have $E[\mathbf{Y}] = A\mathbf{m}$ and the variance-covariance matrix for $\mathbf{Y}$ is just $AVA^T$, where
$$\mathbf{m} = \begin{pmatrix} 1 \\ 0 \end{pmatrix},\quad V = \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix},\quad A = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}.$$

Therefore
$$A\mathbf{m} = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
and
$$AVA^T = \begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & a \end{pmatrix}^T = \begin{pmatrix} 8 & 3 + 5a \\ 3 + 5a & 2 + 2a + 4a^2 \end{pmatrix}.$$
Hence $Y_1$ and $Y_2$ have means 1 and 1, variances 8 and $2 + 2a + 4a^2$, and covariance $3 + 5a$. They are therefore uncorrelated iff $3 + 5a = 0$, i.e. iff $a = -\frac{3}{5}$.
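The algebra is easy to double-check numerically (a sketch assuming numpy, using only the moments given in the example; no distributional assumption is needed since just means and covariances are involved):

```python
import numpy as np

m = np.array([1.0, 0.0])                  # E[X1] = 1, E[X2] = 0
V = np.array([[2.0, 1.0],
              [1.0, 4.0]])                # Var(X1)=2, Var(X2)=4, Cov(X1,X2)=1

for a in (1.0, -3/5):                     # a = -3/5 should give zero covariance
    A = np.array([[1.0, 1.0],
                  [1.0, a  ]])
    # E[Y] = A m and Cov(Y) = A V A^T, as derived in the notes
    print(f"a={a:+.2f}  E[Y]={A @ m}  Cov(Y)=\n{A @ V @ A.T}")
```

At $a = -3/5$ the off-diagonal entry of $AVA^T$ is 0, confirming that $Y_1$ and $Y_2$ are uncorrelated.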