The $n$th moment $M_n$ of a RV $X$ is defined to be $E(X^n)$, $M_n = E(X^n)$.

Andrew Dabrowski


We'll get to the Central Limit Theorem in a few slides, but first some preliminaries. The tool we need to prove the theorem is the sequence of moments of a probability distribution. Moments provide a way of describing distributions different from density functions or c.d.f.s, and better suited to the Central Limit Theorem.

The $n$th moment $M_n$ of a RV $X$ is defined to be $E(X^n)$: $M_n = E(X^n)$. This may not exist, especially when $n$ is large. But if it does exist for all $n$, those numbers together tell you pretty much everything you need to know about $X$. For example, $\mu_X = M_1$ and $\sigma_X = \sqrt{M_2 - M_1^2}$. If $g(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k$, then $E(g(X)) = a_0 + a_1 M_1 + a_2 M_2 + \cdots + a_k M_k$. In other words, we can compute the expected value of any polynomial function of $X$ just using the $M_n$'s.
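
As a sanity check (my own sketch, not from the slides), here is a minimal Python example: it estimates the first few moments of a random variable from a sample and verifies that $E(g(X))$ for a polynomial $g$ can be computed from the moments alone. The Exponential(2) sample and the particular polynomial are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)   # any RV with finite moments will do

# Sample estimates of the first few moments M_n = E(X^n)
M = {n: np.mean(x**n) for n in range(1, 4)}

mu = M[1]                                        # mean from the first moment
sigma = np.sqrt(M[2] - M[1]**2)                  # s.d. from the first two moments

# E(g(X)) for g(x) = 3 + 2x - x^2, computed two ways
a = [3.0, 2.0, -1.0]
direct = np.mean(a[0] + a[1]*x + a[2]*x**2)      # direct Monte Carlo estimate
from_moments = a[0] + a[1]*M[1] + a[2]*M[2]      # using only the moments
print(mu, sigma, direct, from_moments)           # the last two agree up to sampling noise
```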

We can even compute things like $P(a \le X \le a + dx)$. How? Define
$$b(x) = \frac{1}{\sqrt{2\pi}\,\epsilon}\, e^{-\frac{1}{2}\left(\frac{x-a}{\epsilon}\right)^2}.$$
This is just a very narrow bell curve, in which 99% of the bell is within $3\epsilon$ of $a$. Note that $b$ is equivalent to a normal p.d.f., but I'm not using it here as a p.d.f.; I'm using it like the $g$ on the previous slide: I'm going to find $E(b(X))$. Since $b$ is essentially a polynomial (an infinite one, but still...), it has the same property as $g$: its expectation is determined by the $M_n$'s. And the shape of $b$ is useful because it focuses attention on a small range of $x$ values.

Now suppose that $X$ is a continuous RV with p.d.f. $f$. If $\epsilon$ is small enough, then we can assume that $f$ is constant over almost the entire bell. That means
$$E(b(X)) = \int b(x) f(x)\,dx = \int_{|x-a|>3\epsilon} b(x) f(x)\,dx + \int_{|x-a|\le 3\epsilon} b(x) f(x)\,dx \approx 0 + f(a) \int_{|x-a|\le 3\epsilon} b(x)\,dx \approx f(a).$$
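
A small numerical illustration of this approximation (my own sketch, not part of the original slides): take $X \sim \text{Exp}(1)$, so $f(a) = e^{-a}$ for $a \ge 0$, and check that $E(b(X))$ with a narrow bell centered at $a$ is close to $f(a)$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=2_000_000)   # X ~ Exp(1), so f(a) = exp(-a) for a >= 0

a, eps = 1.0, 0.01                               # narrow bell centered at a
b = np.exp(-0.5 * ((x - a) / eps) ** 2) / (np.sqrt(2 * np.pi) * eps)

print(np.mean(b))    # Monte Carlo estimate of E(b(X))
print(np.exp(-a))    # true density f(a); the two should be close
```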

This means that the density function $f$ of $X$ is determined by the expected values of RVs like $b(X)$. And that in turn means that $f$ is determined by the $M_n$'s, because although $b(x)$ is not a polynomial, it is a limit of polynomials:
$$b(x) = \frac{1}{\sqrt{2\pi}\,\epsilon}\, e^{-\frac{1}{2}\left(\frac{x-a}{\epsilon}\right)^2} = \frac{1}{\sqrt{2\pi}\,\epsilon} \sum_{k=0}^{\infty} \frac{\left(-\frac{1}{2}\left(\frac{x-a}{\epsilon}\right)^2\right)^k}{k!}.$$
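
To see the "limit of polynomials" claim concretely, here is a tiny sketch (the values of $a$, $\epsilon$, and $x$ are arbitrary choices of mine) showing the partial sums of the exponential series converging to $b(x)$ at a single point.

```python
import math

a, eps = 0.0, 1.0
x = 0.7
t = -0.5 * ((x - a) / eps) ** 2
scale = 1.0 / (math.sqrt(2.0 * math.pi) * eps)

# Partial sums of sum_k t^k / k! converge to e^t, and hence (after scaling) to b(x)
partial = 0.0
for k in range(8):
    partial += t**k / math.factorial(k)
    print(k, scale * partial)

print("exact:", scale * math.exp(t))
```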

In a similar way it can be shown that those functions $b$ also determine the probabilities in discrete distributions. I'll consider it established now that the $M_n$'s determine all the properties of a distribution. Hence if we can show that two distributions have finite and equal $M_n$ for all $n$, we will know they are actually the same distribution. Now let's use the moments $M_n$ to prove the Central Limit Theorem.

Central Limit Theorem

Theorem. Let $X_i$ for $i = 1, 2, 3, \ldots$ be i.i.d. RVs with mean 0 and s.d. 1. Define
$$A_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i.$$
Then as $n \to \infty$ the distribution of $A_n$ approaches the standard normal.

Note that the particular distribution of the $X_i$'s does not matter: the limiting distribution is the same no matter what you start out with.
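
Here is a quick simulation sketch of the theorem's statement (my own illustration, with uniform $X_i$ standardized to mean 0 and s.d. 1 as an arbitrary choice): for large $n$ the quantiles of $A_n$ line up with those of the standard normal.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_A_n(n, reps=100_000):
    # X_i ~ Uniform(0,1), standardized to mean 0 and s.d. 1; summed column by column
    s = np.zeros(reps)
    for _ in range(n):
        s += (rng.random(reps) - 0.5) * np.sqrt(12.0)
    return s / np.sqrt(n)

A = sample_A_n(n=200)
print(A.mean(), A.std())                         # approximately 0 and 1
for q, z in [(0.025, -1.96), (0.5, 0.0), (0.975, 1.96)]:
    print(q, np.quantile(A, q), z)               # empirical quantile vs. standard normal quantile
```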

The basic idea is to show that the moments of the limiting distribution depend only on the first and second moments of the $X_i$, and on nothing else. Let's warm up by computing a few easy moments of $A_n$. First moment:
$$E(A_n) = E\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i\right) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} E(X_i) = 0$$
because all the $X_i$ have mean 0.

Second moment:
$$E(A_n^2) = E\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i\right)^2 = \frac{1}{n}\left[\sum_{i=1}^{n} E(X_i^2) + \sum_{i \ne j} E(X_i X_j)\right] = \frac{1}{n}\,[n + 0] = 1.$$
So now we know that $A_n$ has mean 0 and s.d. 1 for all $n$. We conclude therefore that the limiting distribution of the $A_n$ also has mean 0 and s.d. 1.

Third moment:
$$E(A_n^3) = E\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i\right)^3 = \frac{1}{n^{3/2}}\left[\sum_{i=1}^{n} E(X_i^3) + \sum_{i \ne j} E(X_i X_j^2) + \sum_{\text{distinct } i,j,k} E(X_i X_j X_k)\right]$$
$$= \frac{1}{n^{3/2}}\left[n\,E(X_1^3) + n(n-1)\,E(X_1)E(X_2^2) + \binom{n}{3} E(X_1)^3\right] = \frac{1}{n^{3/2}}\,(n\,M_3),$$
where $M_3$ is the third moment of $X_1$. But this expression goes to 0 as $n \to \infty$. So the third moment of the limiting distribution of $A_n$ is 0.
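
A numerical check of this decay (again my own sketch, using a standardized Exponential(1), for which $M_3 = 2$): the empirical third moment of $A_n$ tracks $M_3/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_A_n(n, reps=200_000):
    s = np.zeros(reps)
    for _ in range(n):                           # accumulate the sum to keep memory modest
        s += rng.exponential(size=reps) - 1.0    # standardized Exp(1): mean 0, s.d. 1, M_3 = 2
    return s / np.sqrt(n)

for n in [10, 100, 1000]:
    A = sample_A_n(n)
    print(n, np.mean(A**3), 2.0 / np.sqrt(n))    # empirical vs. M_3 / sqrt(n), up to Monte Carlo noise
```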

Note that so far the only higher moment of the $X_i$ that has appeared has been $M_3$, and that ended up cancelling as $n \to \infty$. One obvious pattern you'll have noticed is that any term involving a factor like $E(X_i)$ drops out, because that value is 0. So the only terms to worry about are those in which each factor is a second or higher moment of $X_i$. There's another pattern lurking here though...

Consider an example: suppose we're computing the fifth moment of $A_n$. One expression we'll have to deal with is
$$\sum_{i \ne j} E(X_i^2)\,E(X_j^3).$$
Each summand is equal to $M_2 M_3$, and the number of summands is no more than (actually fewer than) $n^2$, so this expression is at most $n^2 M_2 M_3$. Now recall that $A_n$ includes a factor of $\frac{1}{\sqrt{n}}$, so that the fifth moment of $A_n$ includes a factor of $\left(\frac{1}{\sqrt{n}}\right)^5 = \frac{1}{n^{5/2}}$. Multiplying these two expressions together gives $\frac{M_2 M_3}{\sqrt{n}}$, which once again approaches 0 as $n \to \infty$.

Evidently very few terms survive the limit. In fact, when computing the $k$th moment of $A_n$, the only expressions that don't cancel in the limit are those involving only second moments of the $X_i$:
$$\frac{1}{n^{k/2}} \sum_{\text{distinct } i_1, i_2, \ldots, i_{k/2}} E(X_{i_1}^2)\,E(X_{i_2}^2) \cdots E(X_{i_{k/2}}^2) = \frac{1}{n^{k/2}} \binom{n}{k/2} M_2^{k/2}.$$
Since $\binom{n}{k/2}$ is on the order of $n^{k/2}$, this term does not cancel as $n \to \infty$. Note that even this is only possible if $k$ is even, so we conclude that the odd moments of $A_n$ approach 0 in the limit.
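
For even $k$ the surviving terms should reproduce the standard normal's moments. As a rough numerical check (my own sketch, reusing the standardized Exponential(1) inputs from above), the empirical 4th and 6th moments of $A_n$ drift toward the standard normal values 3 and 15.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_A_n(n, reps=200_000):
    s = np.zeros(reps)
    for _ in range(n):
        s += rng.exponential(size=reps) - 1.0    # standardized Exp(1): mean 0, s.d. 1
    return s / np.sqrt(n)

for n in [10, 100, 1000]:
    A = sample_A_n(n)
    print(n, np.mean(A**4), np.mean(A**6))       # should approach 3 and 15 as n grows
```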

OK, that was the hard part; the rest is easy. Now we know that the limiting distribution of $A_n$ depends only on $M_1$ and $M_2$. That means that any $X_i$'s with the same mean (0) and the same s.d. (1) will have the same limiting distribution. Well, it just so happens that we know a distribution with mean 0 and s.d. 1 which also just so happens to play very well with linear combinations of itself. I refer of course to the standard normal distribution. If the $X_i$ are standard normal, then $A_n$, being a linear combination of independent standard normals, is also normal. Moreover, we calculated the mean and s.d. of $A_n$ earlier, and they turned out to be 0 and 1 respectively. I.e., if the $X_i$'s are standard normal then so are the $A_n$'s. And if every single $A_n$ is standard normal, then the limiting distribution is as well.
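
As a one-line check of that last claim (standard facts about sums of independent normals, not spelled out on the slides): if the $X_i$ are independent $N(0,1)$, then
$$\sum_{i=1}^{n} X_i \sim N(0, n), \qquad\text{so}\qquad A_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i \sim N\!\left(0, \tfrac{n}{n}\right) = N(0, 1).$$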

To put this together: since there is some distribution for the $X_i$ which produces the standard normal as the limit of $A_n$, every distribution, once it's been standardized to have mean 0 and s.d. 1, must produce the standard normal in the limit as well. That's because we have shown that the limiting distribution depends only on the mean and s.d. of the original distribution. Q.E.D.