GSBA 603: Empirical Research Methods I (Fall)
Instructor: Dr. Gareth James
Course Notes


Contents

1 Random Variables
  1.1 Quick Review
    1.1.1 Discrete Random Variables
    1.1.2 Continuous Random Variables
  1.2 Transformations of a Random Variable
    1.2.1 Transformations Using the cdf
    1.2.2 Direct Approach
    1.2.3 An Application to Simulating Random Variables

2 Joint Distributions
  2.1 Discrete Random Variables
    2.1.1 Joint Probability Mass Functions
    2.1.2 Marginal Distributions
    2.1.3 Joint Cumulative Distribution Functions
    2.1.4 n Different Random Variables
  2.2 Continuous Random Variables
  2.3 Independent Random Variables
  2.4 Conditional Distributions
    2.4.1 Conditional Probabilities and Independence
    2.4.2 Generating Random Variables from a Joint Distribution

3 Moments
  3.1 The Expected Value of a Random Variable
    3.1.1 Expectations for Discrete Random Variables
    3.1.2 Expectations for Continuous Random Variables
    3.1.3 Expectations of Functions
    3.1.4 Some Useful Expectation Results
  3.2 Variance and Standard Deviation
    3.2.1 Some Useful Results
    3.2.2 Chebyshev's Inequality
  3.3 Covariance and Correlation
    3.3.1 Covariance
    3.3.2 Correlation
  3.4 Conditional Expectation
    3.4.1 A Very Useful Theorem
  3.5 Moment Generating Functions
    3.5.1 Useful Properties

    3.5.2 Moment Generating Functions of Some Important Distributions

4 Limit Theorems
  4.1 Convergence of Random Variables
    4.1.1 Convergence in Probability
    4.1.2 Convergence in Distribution
    4.1.3 Almost Sure Convergence
  4.2 The Law of Large Numbers
  4.3 The Central Limit Theorem

5 Distributions Derived from the Normal
  5.1 χ², t and F distributions
  5.2 Estimating Population Parameters

6 Survey Sampling
  6.1 Simple Random Sampling
  6.2 Estimation of a Ratio
    6.2.1 Ratio Estimate of μ_y
  6.3 Stratified Random Sampling
    6.3.1 What is Stratified Sampling?
    6.3.2 Properties of Stratified Estimates
    6.3.3 Methods of Allocation
    6.3.4 Relative Precision of Stratified and Simple Random Sampling

7 Point Estimation
  7.1 Method of Moments
    7.1.1 Consistency of Method of Moments Estimators
  7.2 Method of Maximum Likelihood
  7.3 Methods of Evaluating Estimators
    7.3.1 Using MSE to Evaluate an Estimator
    7.3.2 A Problem in Paradise
  7.4 Uniform Minimum Variance Unbiased Estimators
  7.5 Back to Maximum Likelihood Estimators
    7.5.1 Large Sample Properties of MLEs
    7.5.2 Some Examples

8 Hypothesis Testing
  8.1 Recap
  8.2 Neyman-Pearson Paradigm
    8.2.1 Type I and II Errors
    8.2.2 p-values
  8.3 Optimal Tests: Neyman-Pearson Lemma
    8.3.1 Simple vs Simple Hypothesis
    8.3.2 Composite Alternative Hypotheses
  8.4 Generalized Likelihood Ratio Tests
    8.4.1 The Test Statistic
    8.4.2 Asymptotic Distribution of the Test Statistic
  8.5 Some Applications of the Generalized Likelihood Ratio Test

9 Analysis of Variance
  9.1 One Way Analysis of Variance
    9.1.1 Notation and Terminology
    9.1.2 The Model
    9.1.3 Fitting the Model and the Sum of Squares Decomposition
    9.1.4 Hypothesis Testing and the ANOVA Table
  9.2 Estimation and Testing of Factor Level Effects
    9.2.1 Single Factor Level Mean
    9.2.2 Differences Between Factor Levels
    9.2.3 Contrasts
    9.2.4 Linear Combinations
  9.3 Multiple Comparison Problems
    9.3.1 Tukey Multiple Comparison Procedure
    9.3.2 The Bonferroni Method
  9.4 Two Way Analysis of Variance
    9.4.1 Notation
    9.4.2 The Two Way Model
    9.4.3 Additive Models
    9.4.4 The Sums of Squares Decomposition and ANOVA Table
    9.4.5 An Example
  9.5 Examining Factor Level Effects (Two Way ANOVA)
    9.5.1 No Interactions
    9.5.2 Interactions

10 The Bootstrap

Chapter 1

Random Variables

1.1 Quick Review

Definition 1: A random variable is a function from a sample space, S, into the real numbers.

There are two types of random variables, discrete and continuous. The following is a formal definition of the difference.

Definition 2: A random variable X is continuous if F_X(x) is a continuous function of x. A random variable X is discrete if F_X(x) is a step function.

1.1.1 Discrete Random Variables

Discrete random variables only take on a countable (not necessarily finite) set of values. If X takes on values x_1, x_2, ... then there is a function p such that p(x_i) = P(X = x_i), with the constraints

    p(x_i) ≥ 0 for all i   and   Σ_i p(x_i) = 1

p is called the probability mass function (pmf). For example the pmf for a Binomial random variable is

    p(x) = C(n, x) p^x (1 − p)^(n−x)

Another useful function is the cumulative distribution function (cdf)

    F_X(b) = P(X ≤ b) = Σ_{i ≤ b} P(X = i)   if X takes on integer values

Note that p(b) = F(b) − F(b − 1) if X takes on integer values. The cdf is non-decreasing, it is defined for −∞ < b < ∞, and

    lim_{x→−∞} F(x) = 0,   lim_{x→∞} F(x) = 1

Statistical distributions are used to model populations; as such, we usually deal with a family of distributions, e.g. the Binomial family. These families are indexed by parameters. For the Binomial there are two parameters, n and p. Some common families of random variables that we will use during this course are:

Bernoulli
    X = 1 with probability p, 0 with probability 1 − p

Binomial
    X = X_1 + X_2 + ··· + X_n where X_i ~ Bern(p)
    p(x) = C(n, x) p^x (1 − p)^(n−x)

Geometric (time until first success)
    P(X = x | p) = p(1 − p)^(x−1),   x = 1, 2, 3, ...

Poisson
    P(X = x | λ) = e^(−λ) λ^x / x!,   x = 0, 1, 2, 3, ...
    Note that the Poisson is the limit of a Binomial as n → ∞ and np → λ.

Uniform
    P(X = x | N) = 1/N,   x = 1, 2, ..., N

1.1.2 Continuous Random Variables

These random variables take on an uncountable set of values in some range. P(X = x) = 0 for all x, but P(a ≤ X ≤ b) > 0. To understand why this is the case consider the following example. Let X be the weight of a randomly chosen "one pound bag of sugar". Even though the bag is nominally one pound, in reality this will only approximate the true weight, so X will be a random variable (i.e. every bag will have a slightly different weight). Now suppose we have a scale that reads weights to one decimal place. Then

    P(Scale reads one pound) = P(0.95 < X < 1.05) = 0.9 (say)

Now suppose we get a more accurate scale that reads weights to two decimal places. Then

    P(Scale reads one pound) = P(0.995 < X < 1.005) = 0.2 (say)

In theory we can continue to get more accurate scales and the probability will continue to decline. So if we get a scale that reads weights accurate to one hundred decimal places then

    P(Scale reads one pound) = P(0.99···95 < X < 1.00···05) = 0.00···01 (say)

So no matter what probability you give that the bag will weigh exactly one pound, I can find a scale that is accurate enough so that the probability is less than that number. Thus the only possible probability is zero.

Using a pmf makes no sense in this case. Instead we use a probability density function (pdf), f_X(x). We choose f_X(x) so that

    f_X(x) ≥ 0,   ∫_{−∞}^{∞} f_X(x) dx = 1

and

    P(a < X < b) = ∫_a^b f_X(x) dx

We still use a cdf

    F_X(x) = P(X ≤ x) = ∫_{−∞}^x f_X(t) dt

By the fundamental theorem of calculus

    f_X(x) = F′_X(x)   and   P(a < X < b) = F_X(b) − F_X(a)

Just as with discrete random variables there are many common "families of continuous random variables":

Uniform[0, 1]
    f(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise

Uniform[a, b]
    f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise

Exponential
    f(x) = λ e^(−λx) for x ≥ 0, and 0 otherwise
    Note that we can easily calculate the cdf for the exponential:
    F(x) = ∫_0^x λ e^(−λt) dt = [−e^(−λt)]_0^x = 1 − e^(−λx)

Gamma
    f(x) = (λ^α / Γ(α)) x^(α−1) e^(−λx) for x ≥ 0, and 0 for x < 0,
    where Γ(α) = ∫_0^∞ t^(α−1) e^(−t) dt and α > 0, λ > 0.

Normal
    f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)),   −∞ < x < ∞

Note a Standard Normal is a normal with mean zero and variance one. It is usually denoted by Z. Its density is

    f(z) = φ(z) = (1/√(2π)) exp(−z²/2),   −∞ < z < ∞

and its cdf is

    F(x) = Φ(x) = ∫_{−∞}^x φ(z) dz = ∫_{−∞}^x (1/√(2π)) exp(−z²/2) dz

Note that the pmf (discrete random variables), pdf (continuous) and cdf all uniquely define a random variable's distribution.

1.2 Transformations of a Random Variable

We often know X ~ f_X(x) but want to know the distribution of Y = g(X). How would we calculate f_Y(y)? There are two commonly used procedures. The first is to calculate the cdf for Y and differentiate this to obtain the density function. The second is to calculate the density function directly.

1.2.1 Transformations Using the cdf

We will illustrate this approach through a series of examples.

Example one
Suppose Y = aX + b, a > 0 (a and b constants). Then

    F_Y(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X ≤ (y − b)/a) = F_X((y − b)/a)

Therefore

    f_Y(y) = d/dy F_X((y − b)/a) = (1/a) f_X((y − b)/a)

For example if X ~ N(μ, σ²) and Y = aX + b then

    f_Y(y) = (1/(aσ√(2π))) exp(−((y − b)/a − μ)²/(2σ²)) = (1/(aσ√(2π))) exp(−(y − b − aμ)²/(2a²σ²))

So Y ~ N(aμ + b, a²σ²). Hence if we let

    Z = (X − μ)/σ   (a = 1/σ, b = −μ/σ)

then

    Z ~ N(μ/σ − μ/σ, σ²/σ²) = N(0, 1)

This provides a justification for the procedure of standardizing a normal and then looking up the probability using standard normal tables, i.e.

    P(x_0 ≤ X ≤ x_1) = P((x_0 − μ)/σ ≤ (X − μ)/σ ≤ (x_1 − μ)/σ)
                     = P((x_0 − μ)/σ ≤ Z ≤ (x_1 − μ)/σ)
                     = P(Z ≤ (x_1 − μ)/σ) − P(Z ≤ (x_0 − μ)/σ)
                     = Φ((x_1 − μ)/σ) − Φ((x_0 − μ)/σ)

Note Φ(z) = ∫_{−∞}^z φ(x) dx.

Example two
Suppose we wish to find the density of X = Z² where Z is a standard normal.

    F_X(x) = P(X ≤ x) = P(Z² ≤ x) = P(−√x ≤ Z ≤ √x) = Φ(√x) − Φ(−√x)

Note by the Fundamental Theorem of Calculus, (d/dx)Φ(x) = φ(x). Therefore

    f_X(x) = d/dx F_X(x)
           = d/dx Φ(√x) − d/dx Φ(−√x)
           = (1/2) x^(−1/2) φ(√x) + (1/2) x^(−1/2) φ(−√x)
           = x^(−1/2) φ(√x)   by symmetry of φ
           = x^(−1/2) (1/√(2π)) e^(−x/2),   x > 0

This is the chi-square density with 1 degree of freedom (χ²_1).
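This result is easy to check numerically. The following is a minimal sketch (assuming NumPy and SciPy are available; neither is part of the notes) that simulates Z² and compares its empirical cdf with the χ²_1 cdf at a few points.

    # Simulation check of Example two: the empirical cdf of Z^2 should match chi-square(1).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    z = rng.standard_normal(100_000)   # Z ~ N(0, 1)
    x = z**2                           # X = Z^2

    for q in [0.5, 1.0, 2.0, 4.0]:
        empirical = np.mean(x <= q)            # P(X <= q) estimated from the sample
        theoretical = stats.chi2.cdf(q, df=1)  # chi-square(1) cdf
        print(f"q = {q}: empirical {empirical:.4f}, chi2(1) cdf {theoretical:.4f}")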

1.2.2 Direct Approach

In certain circumstances it is possible to directly calculate the density for a transformed variable. In particular if Y = g(X) and g is monotone increasing or decreasing then

    f_Y(y) = |dx/dy| f_X(x)   where x = g^(−1)(y)                                  (1.1)

Example one
If X ~ U[0, 1] and Y = 1/X then g(x) = 1/x. Hence if y = g(x) = 1/x then x = 1/y and

    dx/dy = −1/y²

We know that

    f_X(x) = 1,   0 ≤ x ≤ 1   (i.e. 0 ≤ 1/y ≤ 1)

Therefore

    f_Y(y) = |dx/dy| f_X(x) = |−1/y²| · 1 = 1/y² for y ≥ 1, and 0 for y < 1

It is not too hard to see where (1.1) comes from. Suppose for example g is monotone increasing. Then

    F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g^(−1)(y)) = F_X(g^(−1)(y))

and

    f_Y(y) = dF_Y(y)/dy = dF_X(g^(−1)(y))/dy = (dg^(−1)(y)/dy) f_X(g^(−1)(y)) = (dx/dy) f_X(x)   where x = g^(−1)(y)

This is exactly the same as (1.1) since dx/dy must be positive if g is monotone increasing. Similarly if g is monotone decreasing we get a similar formula (try it for yourself).

1.2.3 An Application to Simulating Random Variables

Suppose that U ~ U[0, 1] and we want to produce a random variable X with cdf F. If we set X = F^(−1)(U) then the cdf of X will be F. Why?

    P(X ≤ x) = P(F^(−1)(U) ≤ x) = P(U ≤ F(x)) = F(x)

because P(U ≤ y) = y provided 0 ≤ y ≤ 1.

Example one
Suppose we want to generate a random variable with an Exp(λ) distribution, i.e. T ~ Exp(λ). The exponential cdf is

    F(t) = 1 − e^(−λt)

but we need F^(−1). Set

    x = 1 − e^(−λt)  ⟹  1 − x = e^(−λt)  ⟹  t = −log(1 − x)/λ  ⟹  F^(−1)(t) = −log(1 − t)/λ

Therefore

    T = F^(−1)(U) = −log(1 − U)/λ ~ Exp(λ)   provided U ~ U[0, 1]

Note V = 1 − U is also U[0, 1] because

    P(V ≤ v) = P(1 − U ≤ v) = P(U ≥ 1 − v) = 1 − (1 − v) = v

Therefore

    T = −log(U)/λ ~ Exp(λ)
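The inverse-cdf recipe above translates directly into a few lines of code. The following is a minimal sketch (assuming NumPy is available; the rate value 2.0 is only an illustrative choice) that generates exponential draws from uniforms and checks the sample mean against 1/λ.

    # Inverse-cdf simulation of Exp(lambda): T = -log(U)/lambda with U ~ Uniform[0, 1].
    import numpy as np

    rng = np.random.default_rng(1)
    lam = 2.0                          # illustrative rate parameter
    u = rng.uniform(size=100_000)      # U_1, ..., U_n ~ Uniform[0, 1]
    t = -np.log(u) / lam               # T_i ~ Exp(lambda)

    print("sample mean:", t.mean())                # should be close to 1/lambda = 0.5
    print("sample P(T <= 1):", np.mean(t <= 1.0))  # should be close to F(1)
    print("theoretical F(1):", 1 - np.exp(-lam * 1.0))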


Chapter 2

Joint Distributions

In the previous chapter we reviewed basic properties of univariate (one dimensional) random variables. However we are often interested in several different random variables, in which case we need to understand their joint distribution, i.e. their joint relationship.

Definition 3: An n-dimensional random vector is a function from a sample space S into R^n, n-dimensional Euclidean space.

2.1 Discrete Random Variables

2.1.1 Joint Probability Mass Functions

Definition 4: If X takes on values x_1, x_2, ... and Y takes on y_1, y_2, ... their "joint probability mass function", p(x, y), is

    p(x_i, y_j) = P(X = x_i, Y = y_j) = Probability that X = x_i and Y = y_j

Example one
Suppose we throw two dice. Then the sample space has 36 possible outcomes. If X is the sum of the two dice and Y is the absolute difference then the joint pmf can be written as a table with one column for each value of x and one row for each value of y. (The table itself is omitted here; it can be reproduced by enumerating the 36 outcomes, as sketched below.) For example the only way for X to equal 6 and Y to equal 0 is if both dice come up as 3s. Hence

    P(X = 6, Y = 0) = 1/36
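As a hedged illustration (plain Python, no external libraries), the joint pmf can be recomputed by brute-force enumeration of the 36 equally likely outcomes:

    # Recompute the joint pmf of X = sum and Y = |difference| for two fair dice.
    from collections import Counter
    from fractions import Fraction

    counts = Counter()
    for d1 in range(1, 7):
        for d2 in range(1, 7):
            counts[(d1 + d2, abs(d1 - d2))] += 1   # (X, Y) for this outcome

    pmf = {xy: Fraction(c, 36) for xy, c in counts.items()}
    print(pmf[(6, 0)])   # 1/36, matching the text
    print(pmf[(7, 1)])   # 1/18: the outcomes (3,4) and (4,3)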

2.1.2 Marginal Distributions

What if we want to calculate a probability for a single variable? Suppose for the previous example we wanted to calculate

    P(Y = 4) = P(Y = 4, X = 2) + P(Y = 4, X = 3) + ··· + P(Y = 4, X = 12)
             = P(Y = 4, X = 6) + P(Y = 4, X = 8)
             = 1/18 + 1/18 = 1/9

In general we call the probabilities for a single random variable the "Marginal Distribution". For example the marginal distribution of Y is equal to

    p_Y(y) = P(Y = y) = Σ_i p(x_i, y)

Similarly

    p_X(x) = P(X = x) = Σ_i p(x, y_i)

2.1.3 Joint Cumulative Distribution Functions

We also define the joint cdf as

    F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = Σ_{t_1 ≤ x} Σ_{t_2 ≤ y} p_{X,Y}(t_1, t_2)

Notice that

    F_X(x) = P(X ≤ x) = P(X ≤ x, Y ≤ ∞) = F_{X,Y}(x, ∞)

and

    F_Y(y) = P(Y ≤ y) = P(X ≤ ∞, Y ≤ y) = F_{X,Y}(∞, y)

so we can reconstruct the "marginal cdfs" from the joint cdf.

2.1.4 n Different Random Variables

Suppose we have n different random variables X_1, X_2, ..., X_n. We define marginal pmfs and joint pmfs and cdfs in a similar way for multiple random variables:

    p(x_1, x_2, ..., x_n) = P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n)
    p_{X_1}(x_1) = Σ_{x_2, ..., x_n} p(x_1, x_2, ..., x_n)
    F(x_1, x_2, ..., x_n) = P(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_n ≤ x_n)

2.2 Continuous Random Variables

Analogous results hold for continuous random variables. We define the joint probability density function, f(x, y), as the curve such that for any "reasonable" set A

    P((X, Y) ∈ A) = ∬_A f(x, y) dy dx

As with discrete random variables the joint cdf is defined as

    F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du

Notice that this implies we can use the following formula to calculate the joint density function:

    f(x, y) = ∂²F(x, y)/∂x∂y

The marginal cdf of X is

    F_X(x) = P(X ≤ x) = lim_{y→∞} F(x, y) = ∫_{−∞}^x ∫_{−∞}^∞ f(u, v) dv du

Finally the marginal density of X is

    f_X(x) = ∫_{−∞}^∞ f(x, y) dy

Example one
Suppose X and Y are both normal random variables. Then their joint distribution is called a "bivariate normal". Its joint pdf is

    f(x, y) = (1/(2π σ_x σ_y √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ (x − μ_x)²/σ_x² + (y − μ_y)²/σ_y² − 2ρ(x − μ_x)(y − μ_y)/(σ_x σ_y) ] }

This distribution has five parameters:

    −∞ < μ_x < ∞,   −∞ < μ_y < ∞,   σ_x > 0,   σ_y > 0,   −1 ≤ ρ ≤ 1

This distribution has many nice properties.

1. The marginal distributions are normal, i.e. X ~ N(μ_x, σ_x²) and Y ~ N(μ_y, σ_y²).
2. ρ is equal to the correlation between X and Y, i.e. cor(X, Y) = ρ.
3. X and Y are independent if and only if ρ = 0.
4. A linear combination of X and Y is also normal, i.e. aX + bY ~ N(aμ_x + bμ_y, a²σ_x² + b²σ_y² + 2abρσ_xσ_y).

2.3 Independent Random Variables

Definition 5: Random variables X_1, X_2, ..., X_n are independent if

    F(x_1, x_2, ..., x_n) = F_{X_1}(x_1) F_{X_2}(x_2) ··· F_{X_n}(x_n)

or equivalently

    p(x_1, x_2, ..., x_n) = p_{X_1}(x_1) p_{X_2}(x_2) ··· p_{X_n}(x_n)   or   f(x_1, x_2, ..., x_n) = f_{X_1}(x_1) f_{X_2}(x_2) ··· f_{X_n}(x_n)

Example one
Suppose that a node in a communication network has the property that if two packets of information arrive within time τ of each other they will collide and then have to be retransmitted. If the times of the two packets are independent and uniform on [0, T], what is the probability they will collide?

Let T_1 and T_2 be the times of arrival of the two packets. Then T_1, T_2 ~ U[0, T]. Therefore

    f_{T_1}(t_1) = f_{T_2}(t_2) = 1/T

and since they are independent

    f_{T_1,T_2}(t_1, t_2) = 1/T²,   0 ≤ t_1 ≤ T, 0 ≤ t_2 ≤ T

Hence

    P(|T_1 − T_2| ≤ τ) = ∬_R (1/T²) dt_1 dt_2 = (1/T²) ∬_R dt_1 dt_2 = (1/T²)(T² − (T − τ)²) = 1 − (1 − τ/T)²

where R is the region where |t_1 − t_2| ≤ τ.

Example two
Suppose that X_1 and X_2 are both Poisson random variables with parameters λ_1 and λ_2 and that they are independent. What is the distribution of X_1 + X_2? Since X_1 and X_2 are independent we know

    P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1) P(X_2 = x_2) = (e^(−λ_1) λ_1^(x_1) / x_1!)(e^(−λ_2) λ_2^(x_2) / x_2!)

Therefore

    P(X_1 + X_2 = n) = Σ_{i=0}^n P(X_1 = i, X_2 = n − i)
                     = Σ_{i=0}^n (e^(−λ_1) λ_1^i / i!)(e^(−λ_2) λ_2^(n−i) / (n − i)!)
                     = e^(−(λ_1+λ_2)) (1/n!) Σ_{i=0}^n (n!/(i!(n − i)!)) λ_1^i λ_2^(n−i)
                     = e^(−(λ_1+λ_2)) ((λ_1 + λ_2)^n / n!) Σ_{i=0}^n C(n, i) (λ_1/(λ_1 + λ_2))^i (λ_2/(λ_1 + λ_2))^(n−i)
                     = e^(−(λ_1+λ_2)) (λ_1 + λ_2)^n / n!

(the last sum equals 1 because it is a Binomial pmf summed over all its values). Therefore

    X_1 + X_2 ~ Poisson(λ_1 + λ_2)

2.4 Conditional Distributions

Often we need to incorporate new information into a probability calculation. For example suppose in the previous example with the two dice we were interested in P(X = 7), i.e. the probability that the sum of the two dice was 7. Then we would add up the possible ways this could happen and see that

    P(X = 7) = 1/6

However suppose that we were also told that Y = 4, i.e. the difference between the dice was 4, but not told what X is. Then it is clear that there is no way the dice can sum to 7. So "given that Y = 4" the probability that X = 7 is zero. This is called a conditional probability and is defined as follows.

Definition 6: For discrete random variables the conditional probability that X = x, given that we know Y = y, is equal to

    P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = p_{XY}(x, y)/p_Y(y)

For continuous random variables the "conditional pdf" is defined as

    f_{X|Y}(x|y) = f_{XY}(x, y)/f_Y(y)

and

    P(a ≤ X ≤ b | Y = y) = ∫_a^b f_{X|Y}(x|y) dx

Notice that for the above example this definition makes sense because P(X = 7, Y = 4) = 0, so P(X = 7 | Y = 4) = 0. Suppose however we were interested in the probability that X = 6 given

that we knew Y = 4. If Y = 4 it is possible for X to be 6 or 8, so the conditional probability is neither 0 nor 1. In fact

    P(X = 6 | Y = 4) = P(X = 6, Y = 4)/P(Y = 4) = (1/18)/(1/9) = 1/2

Again this corresponds to our intuition because, given that Y = 4, X is equally likely to be 6 or 8 so the new probability is 1/2. Note that the unconditional probability is P(X = 6) = 5/36, so the new piece of information produced a large increase in the probability.

2.4.1 Conditional Probabilities and Independence

Recall that if X and Y are independent then

    p_{XY}(x, y) = p_X(x) p_Y(y)   ⟹   p_X(x) = p_{XY}(x, y)/p_Y(y)

Therefore if X and Y are independent

    p_X(x) = p_{X|Y}(x|y)

In other words the probability that X = x is unchanged by the new information that Y = y. This makes sense because if the two random variables are independent then knowledge about one should not add anything to our knowledge about the other. Similarly if X and Y are independent continuous random variables then

    f_X(x) = f_{X|Y}(x|y)

2.4.2 Generating Random Variables from a Joint Distribution

Example one
Suppose X and Y have the following joint pdf

    f_{XY}(x, y) = e^(−y),   0 ≤ x ≤ y ≤ ∞

What is f_{Y|X}(y|x)? In other words what is the distribution of Y given that we know X = x? We know that

    f_{Y|X}(y|x) = f_{XY}(x, y)/f_X(x)

so we need to calculate

    f_X(x) = ∫_{−∞}^∞ f_{XY}(x, y) dy = ∫_x^∞ e^(−y) dy = [−e^(−y)]_x^∞ = e^(−x)

In other words X ~ exp(1). Therefore

    f_{Y|X}(y|x) = e^(−y)/e^(−x) = e^(−(y−x)),   y ≥ x

This tells us that Y | X = x is exponential on the interval [x, ∞). Suppose we wanted to generate two random variables with this joint distribution. Since they are not independent we cannot simply generate X and Y separately from their marginal distributions, i.e. f_{XY}(x, y) ≠ f_X(x) f_Y(y). However we know that

    f_{XY}(x, y) = f_X(x) f_{Y|X}(y|x)

So we can generate X from its marginal distribution (i.e. exponential(1)) and then generate Y from its conditional distribution given X (i.e. exponential on the interval from X to ∞).

Notice that we could just as easily calculate the conditional distribution of X | Y = y. Again we first need to calculate the marginal distribution of Y:

    f_Y(y) = ∫_{−∞}^∞ f_{XY}(x, y) dx = ∫_0^y e^(−y) dx = y e^(−y),   y ≥ 0

This tells us that Y ~ Gamma(α = 2, λ = 1). Therefore

    f_{X|Y}(x|y) = f_{XY}(x, y)/f_Y(y) = e^(−y)/(y e^(−y)) = 1/y,   0 ≤ x ≤ y

Therefore X | Y = y ~ U[0, y]. So an alternative method for generating X and Y would be to first generate Y from a Gamma(α = 2, λ = 1) distribution and then produce X as a uniform random variable between 0 and Y.
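Both simulation schemes are easy to code. The sketch below (assuming NumPy is available) implements each one and checks that they produce the same sample moments; since X ~ exp(1) and Y ~ Gamma(2, 1), the sample means should be near 1 and 2 respectively.

    # Two ways of simulating from f(x, y) = exp(-y), 0 <= x <= y.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000

    # Method 1: X ~ Exp(1), then Y | X = x ~ x + Exp(1)
    x1 = rng.exponential(scale=1.0, size=n)
    y1 = x1 + rng.exponential(scale=1.0, size=n)

    # Method 2: Y ~ Gamma(alpha=2, lambda=1), then X | Y = y ~ Uniform[0, y]
    y2 = rng.gamma(shape=2.0, scale=1.0, size=n)
    x2 = rng.uniform(low=0.0, high=y2)

    # Both schemes target the same joint density, so the sample moments should agree.
    print("method 1:", x1.mean(), y1.mean())   # roughly (1, 2)
    print("method 2:", x2.mean(), y2.mean())   # roughly (1, 2)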


Chapter 3

Moments

3.1 The Expected Value of a Random Variable

The expected value of a random variable is a measure of the average value it takes on. It can also be thought of as the long run average. If the experiment (e.g. tossing the dice) is repeated an infinite number of times the average value of the random variables produced will equal their expected value.

3.1.1 Expectations for Discrete Random Variables

Definition 7: If X is a discrete random variable then

    μ_x = EX = Σ_i x_i P(X = x_i)

provided that Σ_i |x_i| P(X = x_i) < ∞.

Example one
Suppose we bet $1 that an odd number comes up on the roulette wheel. There are 38 numbers on a roulette wheel (1 through 36 plus 0 and 00). Therefore if X is the net winning from playing once,

    X = 1 with probability 18/38, −1 with probability 20/38

Therefore

    EX = 18/38 − 20/38 = −1/19

In other words if you played roulette for a long time, on average you would lose $1/19 every time you played.

Example two
Suppose that X is a geometric random variable with probability of success p. Then

    P(X = k) = p q^(k−1),   k = 1, 2, ...,   q = 1 − p

and the expected value of X is

    EX = Σ_{k=1}^∞ k p q^(k−1)
       = p Σ_{k=1}^∞ k q^(k−1)
       = p Σ_{k=1}^∞ d/dq q^k
       = p d/dq Σ_{k=1}^∞ q^k
       = p d/dq (q/(1 − q))
       = p/(1 − q)²
       = 1/p

Example three
Suppose that X ~ Poisson(λ). Then the expected value of X is

    EX = Σ_{k=0}^∞ k e^(−λ) λ^k/k! = e^(−λ) Σ_{k=1}^∞ λ^k/(k − 1)! = λ e^(−λ) Σ_{j=0}^∞ λ^j/j!   (j = k − 1)

but

    Σ_{j=0}^∞ λ^j/j! = e^λ   ⟹   EX = λ e^(−λ) e^λ = λ

Example four
Suppose X = 2^i with probability 1/2^i, i = 1, 2, 3, ... Then

    Σ_i P(X = x_i) = Σ_i 1/2^i = (1/2)/(1 − 1/2) = 1

so this is a well defined random variable. However

    Σ_i x_i P(X = x_i) = Σ_i 2^i · (1/2^i) = Σ_i 1 = ∞

Therefore EX is not defined.

3.1.2 Expectations for Continuous Random Variables

Definition 8: If X is a continuous random variable with pdf f(x), then

    EX = ∫_{−∞}^∞ x f(x) dx

provided ∫_{−∞}^∞ |x| f(x) dx < ∞.

Example one
Suppose X ~ Gamma(α, λ). Then

    EX = ∫_0^∞ x (λ^α/Γ(α)) x^(α−1) e^(−λx) dx
       = (λ^α/Γ(α)) ∫_0^∞ x^α e^(−λx) dx
       = (λ^α/Γ(α)) (Γ(α + 1)/λ^(α+1)) ∫_0^∞ (λ^(α+1)/Γ(α + 1)) x^α e^(−λx) dx
       = Γ(α + 1)/(Γ(α) λ)
       = α/λ

(the last integral equals 1 because the integrand is the Gamma(α + 1, λ) density)

Example two
Suppose X ~ Cauchy. Then

    f(x) = 1/(π(1 + x²))

Since the density is symmetric around zero it seems that the expected value should be zero. However

    ∫_{−∞}^∞ |x|/(π(1 + x²)) dx = ∞

so the expectation is not defined.

3.1.3 Expectations of Functions

Let Y = g(X). If we know the distribution of X how do we calculate the expected value of Y?

Definition 9: If X is discrete then

    EY = Σ_x g(x) P(X = x)

If X is continuous then

    EY = ∫_{−∞}^∞ g(x) f(x) dx

3.1.4 Some Useful Expectation Results

1. If X_1, X_2, ..., X_n are random variables and a and b_i are constants then

    E(a + Σ_i b_i X_i) = a + Σ_i b_i EX_i

2. If X and Y are independent random variables then

    E(XY) = (EX)(EY)

Example one
Suppose X ~ Bin(n, p), then

    EX = Σ_{k=0}^n k C(n, k) p^k (1 − p)^(n−k)

This is a difficult sum to calculate. However we know that

    X = X_1 + X_2 + ··· + X_n

where

    X_i = 1 if success on the ith trial, 0 otherwise

and

    EX_i = 1 · p + 0 · (1 − p) = p

Therefore by the first useful result

    EX = EX_1 + EX_2 + ··· + EX_n = np

Example two
Suppose X ~ N(μ, σ²) and Y ~ Bin(n, p) and they are independent. Then

    E(XY) = (EX)(EY) = μnp

3.2 Variance and Standard Deviation

Definition 10: The variance of X is defined as

    σ_x² = Var(X) = E(X − EX)² = Σ_i (x_i − μ)² P(X = x_i)        (discrete random variable)
                               = ∫_{−∞}^∞ (x − μ)² f(x) dx        (continuous random variable)

The standard deviation of X is

    SD(X) = σ = √(σ²) = √Var(X)

The variance measures the "average squared distance of a random variable from its mean". The standard deviation is a measure of the "average distance of a random variable from its mean". Note that the standard deviation is measured in the same units as X.

Example one
Suppose X ~ Bernoulli(p), then

    Var(X) = (0 − p)²(1 − p) + (1 − p)² p = p²(1 − p) + p(1 − p)² = p(1 − p)

3.2.1 Some Useful Results

1. For any constants a and b and any random variable X,

    Var(aX + b) = a² Var(X)

2. For any random variable X,

    Var(X) = EX² − (EX)²

3. If X_1, X_2, ..., X_n are independent random variables then

    Var(X_1 + X_2 + ··· + X_n) = Var(X_1) + Var(X_2) + ··· + Var(X_n)

In particular if X and Y are independent random variables then

    Var(X + Y) = Var(X − Y) = Var(X) + Var(Y)

Notice that for any random variables X_1, X_2, ..., X_n

    E(Σ_i X_i) = Σ_i EX_i

but it is only true that

    Var(Σ_i X_i) = Σ_i Var(X_i)

if the random variables are independent.

Example one
Suppose X ~ U[0, 1]. Then

    EX = ∫_{−∞}^∞ x f(x) dx = ∫_0^1 x dx = 1/2

and

    EX² = ∫_{−∞}^∞ x² f(x) dx = ∫_0^1 x² dx = 1/3

Therefore

    Var(X) = EX² − (EX)² = 1/3 − 1/4 = 1/12

and

    SD(X) = √(1/12)

Example two
Suppose X ~ Bin(n, p). What is Var(X)? We know that Var(X) = EX² − (EX)² and that EX = np, so all we need to do is calculate EX². However

    EX² = Σ_{k=0}^n k² C(n, k) p^k (1 − p)^(n−k) = ?

This sum is difficult to calculate. On the other hand we know that X = X_1 + X_2 + ··· + X_n where the X_i are independent Bernoulli random variables. Therefore we know

    Var(X) = Var(X_1 + X_2 + ··· + X_n) = Var(X_1) + Var(X_2) + ··· + Var(X_n) = np(1 − p)

3.2.2 Chebyshev's Inequality

Chebyshev's Inequality is a very useful result for proving "Limit Theorems" which we will explore in the next chapter.

Theorem 1: For any random variable X and any t > 0

    P(|X − μ| ≥ t) ≤ σ²/t²

where μ = EX and σ² = Var(X).

Proof
Let R = {x : |x − μ| ≥ t}. Then

    P(|X − μ| ≥ t) = ∫_R f(x) dx

But if x ∈ R then (x − μ)²/t² ≥ 1 because |x − μ| ≥ t. Therefore

    P(|X − μ| ≥ t) = ∫_R f(x) dx ≤ ∫_R ((x − μ)²/t²) f(x) dx ≤ ∫_{−∞}^∞ ((x − μ)²/t²) f(x) dx = σ²/t²

Notice that an alternative way to write the inequality is

    P(|X − μ| ≥ kσ) ≤ 1/k²   (t = kσ)
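Because Chebyshev's inequality holds for every distribution, it is usually quite loose, which is easy to see numerically. The sketch below (assuming NumPy is available; the Exp(1) choice is only for illustration) estimates the tail probability P(|X − μ| ≥ kσ) by simulation and compares it with the bound 1/k².

    # Simulation check of Chebyshev's inequality for X ~ Exp(1), where mu = sigma = 1.
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.exponential(scale=1.0, size=1_000_000)
    mu, sigma = 1.0, 1.0

    for k in [1.5, 2.0, 3.0]:
        tail = np.mean(np.abs(x - mu) >= k * sigma)   # estimated P(|X - mu| >= k*sigma)
        print(f"k = {k}: tail probability {tail:.4f} <= bound {1 / k**2:.4f}")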

3.3 Covariance and Correlation

So far we have learnt about the expected value of a random variable (a measure of its average) and the variance (a measure of how close the random variable usually is to its expected value). In this section we will talk about two related concepts, namely covariance and correlation.

3.3.1 Covariance

Definition 11: The Covariance between two random variables, X and Y, is defined as

    Cov(X, Y) = E[(X − EX)(Y − EY)]

or equivalently

    Cov(X, Y) = E(XY) − (EX)(EY)

Covariance is a measure of how two random variables vary together.

- Cov(X, Y) > 0 means "if X is large then Y tends to be large also". X and Y are said to have a positive relationship.
- Cov(X, Y) < 0 means "if X is large then Y tends to be small". X and Y are said to have a negative relationship.
- Cov(X, Y) = 0 means "there is no clear trend".

Example one
Suppose X and Y have the following joint distribution. (The tables in these three examples were garbled in this copy of the notes; they are reconstructed here with X and Y each taking the values −1, 0 and 1, which is consistent with the calculations that follow.)

                 y = −1   y = 0   y = 1
    x = −1        1/3      0       0      1/3
    x =  0         0      1/3      0      1/3
    x =  1         0       0      1/3     1/3
                  1/3     1/3     1/3

Then EX = 0, EY = 0 and E(XY) = 2/3, so Cov(X, Y) = 2/3 − 0 · 0 = 2/3. Therefore X and Y will have a positive relationship.

Example two
Suppose X and Y have the following joint distribution:

                 y = −1   y = 0   y = 1
    x = −1         0       0      1/3     1/3
    x =  0         0      1/3      0      1/3
    x =  1        1/3      0       0      1/3
                  1/3     1/3     1/3

Then EX = 0, EY = 0 and E(XY) = −2/3, so Cov(X, Y) = −2/3 − 0 · 0 = −2/3. Therefore X and Y have a negative relationship.

Example three
Suppose X and Y have the following joint distribution:

                 y = −1   y = 0   y = 1
    x = −1        1/9     1/9     1/9     1/3
    x =  0        1/9     1/9     1/9     1/3
    x =  1        1/9     1/9     1/9     1/3
                  1/3     1/3     1/3

Then EX = 0, EY = 0 and E(XY) = 0, so Cov(X, Y) = 0 − 0 · 0 = 0. Therefore X and Y have no clear relationship.

Useful results for covariance

The covariance between two random variables is especially useful for calculating the variance of a sum of random variables. This is commonly used in finance for calculating the risk of a portfolio.

1. For any random variables X and Y

    Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

(Recall Var(X + Y) = Var(X) + Var(Y) if X and Y are independent.)

2. For any random variables X and Y

    Var(X − Y) = Var(X) + Var(Y) − 2 Cov(X, Y)

3. If X and Y are independent then Cov(X, Y) = 0. However note that the fact Cov(X, Y) = 0 does not mean X and Y are independent!

We will prove the first result.

Proof of first result

    Var(X + Y) = E((X + Y)²) − [E(X + Y)]²
               = E(X² + 2XY + Y²) − [EX + EY]²
               = EX² + 2E(XY) + EY² − (EX)² − (EY)² − 2(EX)(EY)
               = [EX² − (EX)²] + [EY² − (EY)²] + 2[E(XY) − (EX)(EY)]
               = Var(X) + Var(Y) + 2 Cov(X, Y)

Notice that since Cov(X, Y) can be negative this means that Var(X + Y) can be zero even if Var(X) and Var(Y) are greater than zero!

Example four
Imagine you have two fair coins.

    X = 1 if coin 1 lands heads, 0 if coin 1 lands tails
    Y = 1 if coin 2 lands heads, 0 if coin 2 lands tails

Let Z be the number of heads showing on the two coins (i.e. Z = X + Y). What is the variance of Z? If the coins are independent then

    Var(Z) = Var(X + Y) = Var(X) + Var(Y) = 1/4 + 1/4 = 1/2

This is greater than zero, as you might expect, because Z could be 0, 1 or 2. However now imagine that you glue the coins together, side by side, so that one head and one tail is always showing. Then either coin can land heads or tails but there will always be exactly one head showing. Thus Var(X) > 0 and Var(Y) > 0, but Var(X + Y) = 0.

3.3.2 Correlation

There is one obvious problem with covariance. It is not scale invariant.

Example one
Suppose I am measuring the relationship between heights of fathers in meters (X) and heights of sons in meters (Y) and that Cov(X, Y) = 1.2. If I decide to measure heights in centimeters instead, with S the height of fathers and T the height of sons, I get S = 100X and T = 100Y so

    Cov(S, T) = Cov(100X, 100Y) = 100 · 100 · Cov(X, Y) = 12,000

The covariance has increased by a factor of 10,000 yet the relationship is exactly the same. This means that it is impossible to tell whether the covariance you have observed is large or not, because it is entirely dependent on the scale that is used. To avoid this problem we often use correlation instead.

Definition 12: The correlation between two random variables, X and Y, is denoted as Cor(X, Y) or ρ and is equal to

    ρ = Cor(X, Y) = Cov(X, Y)/√(Var(X) Var(Y))

Correlation has the following properties.

1. For any random variables, X and Y, −1 ≤ ρ ≤ 1.

2. If ρ > 0 then X and Y have a positive relationship. In particular if ρ = 1 then Y = aX + b for some constants a and b (a > 0).

3. If ρ < 0 then X and Y have a negative relationship. In particular if ρ = −1 then Y = −aX + b for some constants a and b (a > 0).

4. ρ is scale invariant. In other words no matter what units we use to measure X and Y, ρ will remain unchanged.

This last property is very important. It means that if ρ is large (i.e. close to 1 or −1) then that implies a strong relationship no matter what units we are using.

Example one continued
Recall X = heights of fathers measured in meters and Y = heights of sons measured in meters. Suppose that Var(X) = Var(Y) = 2 and Cov(X, Y) = 1.2. Then

    ρ = Cov(X, Y)/√(Var(X) Var(Y)) = 1.2/√(2 · 2) = 0.6

This indicates a "fairly" strong positive relationship. Also recall that S = heights of fathers measured in centimeters and T = heights of sons measured in centimeters. We have already seen that Cov(S, T) = 12,000. We can also calculate

    Var(S) = Var(100X) = 100² Var(X) = 20,000

and similarly Var(T) = 20,000. Therefore

    ρ = Cov(S, T)/√(Var(S) Var(T)) = 12,000/√(20,000 · 20,000) = 0.6

The fact that the units have changed has had no effect on ρ. Correlations will be used later in this course and in GSBA 604 when you study regression.

3.4 Conditional Expectation

Recall that the conditional distribution of Y given X = x is

    P(Y = y | X = x) = P(Y = y, X = x)/P(X = x)

if X and Y are discrete, or

    f(y | X = x) = f_{XY}(x, y)/f_X(x)

if X and Y are continuous. These conditional distributions have all the normal properties of "regular" distributions. In particular they have a "conditional mean". The conditional mean is simply the expected value of Y given that we now know X = x.

Definition 13: The conditional expectation of Y given X = x is

    E(Y | X = x) = Σ_y y p_{Y|X}(y|x)

if X and Y are discrete, and

    E(Y | X = x) = ∫_{−∞}^∞ y f_{Y|X}(y|x) dy

if they are continuous. We also define the conditional expectation of a function of Y as

    E(h(Y) | X = x) = ∫_{−∞}^∞ h(y) f_{Y|X}(y|x) dy

Example one
Recall the example from the previous chapter where

    f_{XY}(x, y) = e^(−y),   0 ≤ x ≤ y ≤ ∞

We have already shown that

    f_{Y|X}(y|x) = e^(−(y−x)),   y ≥ x   and   f_{X|Y}(x|y) = 1/y,   0 ≤ x ≤ y

Therefore we can calculate the conditional expectations

    E(X | Y = y) = ∫_0^y x (1/y) dx = (1/y)(y²/2) = y/2

and

    E(Y | X = x) = ∫_x^∞ y e^(−(y−x)) dy
                 = ∫_0^∞ (z + x) e^(−z) dz   (z = y − x)
                 = ∫_0^∞ z e^(−z) dz + ∫_0^∞ x e^(−z) dz
                 = 1 + x

3.4.1 A Very Useful Theorem

The following theorem relates unconditional and conditional expectations and variances.

Theorem 2: For any two random variables X and Y:

1. E_Y(Y) = E_X[E_Y(Y | X)]

2. Var_Y(Y) = Var_X[E_Y(Y | X)] + E_X[Var_Y(Y | X)]

where Var_Y(Y | X) = E(Y² | X) − [E(Y | X)]².

Example one
Suppose X ~ Poisson(λ) and (Y | X = x) ~ Bin(x, p). What is EY and Var(Y)?

    EY = E[E(Y | X)] = E[Xp] = p EX = pλ

and

    Var(Y) = Var(E(Y | X)) + E(Var(Y | X)) = Var(Xp) + E(Xp(1 − p)) = p²λ + λp(1 − p) = pλ

In fact it can be shown that Y ~ Poisson(pλ). What about Cov(X, Y)?

    E(XY) = E_X[E(XY | X)] = E_X[X E(Y | X)] = E[X²p] = p E[X²] = p(Var(X) + (EX)²) = p(λ + λ²)

Therefore

    Cov(X, Y) = E(XY) − (EX)(EY) = p(λ + λ²) − λ · pλ = pλ

3.5 Moment Generating Functions

We have already seen that both the density and cdf provide a unique characterization of a distribution. Here we present a third alternative, the "Moment Generating Function".

Definition 14: The moment generating function (mgf) of a random variable X is

    M_X(t) = E(e^(tX))

provided the expectation is defined, i.e.

    M_X(t) = Σ_x e^(tx) p(x)   if X is discrete,   ∫_{−∞}^∞ e^(tx) f(x) dx   if X is continuous

The moments of a random variable are defined as follows.

Definition 15: The rth moment of a random variable is E(X^r), provided the expectation exists. The rth central moment is E[(X − EX)^r], provided the expectation exists.

For example EX is the first moment, EX² is the second moment etc., and Var(X) is the second central moment (the first central moment is always zero).

3.5.1 Useful Properties

Moment generating functions have a number of properties that make them very useful.

1. If the mgf exists for t in an open interval containing zero, it uniquely determines the probability distribution.

2. If the mgf exists in an open interval containing zero and EX^r exists then

    M^(r)(0) = EX^r

3. If X has mgf M_X(t) then for any constants a and b

    M_{aX+b}(t) = e^(bt) M_X(at)

4. If X and Y are independent random variables then

    M_{X+Y}(t) = M_X(t) M_Y(t)

Also if X_1, X_2, ..., X_n are independent random variables and Z = Σ_{i=1}^n X_i then

    M_Z(t) = Π_{i=1}^n M_{X_i}(t)

5. If X_1, X_2, ..., X_n are independent and identically distributed (iid) then

    M_Z(t) = (M_X(t))^n

3.5.2 Moment Generating Functions of Some Important Distributions

    Distribution        MGF
    Normal(μ, σ²)       exp(μt + σ²t²/2)
    Normal(0, 1)        exp(t²/2)
    Uniform(a, b)       (e^(bt) − e^(at))/(t(b − a))
    Uniform(0, 1)       (e^t − 1)/t
    Binomial(n, p)      (pe^t + 1 − p)^n
    Poisson(λ)          exp(λ(e^t − 1))
    Gamma(α, λ)         (λ/(λ − t))^α
    Chi-square(n)       (1 − 2t)^(−n/2)

Example one
Suppose X ~ Poisson(λ). Then

    M_X(t) = E(e^(Xt)) = Σ_{k=0}^∞ e^(tk) e^(−λ) λ^k/k! = e^(−λ) Σ_{k=0}^∞ (λe^t)^k/k! = e^(−λ) exp(λe^t) = exp(λ(e^t − 1))

(the second-to-last step follows because Σ_{k=0}^∞ exp(−λe^t)(λe^t)^k/k! = 1, being a Poisson(λe^t) pmf summed over all its values)

Example two
Suppose X ~ Gamma(α, λ). Then

    M_X(t) = E(e^(Xt)) = ∫_0^∞ e^(tx) (λ^α/Γ(α)) x^(α−1) e^(−λx) dx
           = (λ^α/Γ(α)) ∫_0^∞ x^(α−1) e^(−(λ−t)x) dx
           = (λ^α/(λ − t)^α) ∫_0^∞ ((λ − t)^α/Γ(α)) x^(α−1) e^(−(λ−t)x) dx
           = (λ/(λ − t))^α

(the last integral equals 1 because the integrand is the Gamma(α, λ − t) density)

Example three
Suppose X ~ N(0, 1). Then

    M_X(t) = ∫_{−∞}^∞ (1/√(2π)) e^(tx) e^(−x²/2) dx

but notice

    −x²/2 + tx = −(x² − 2tx + t²)/2 + t²/2 = −(x − t)²/2 + t²/2

Therefore

    M_X(t) = e^(t²/2) ∫_{−∞}^∞ (1/√(2π)) e^(−(x−t)²/2) dx = e^(t²/2)

(the integral equals 1 because the integrand is the N(t, 1) density)

Example four
Suppose Y ~ N(μ, σ²). What is M_Y(t)? We know that if X ~ N(0, 1) and Y = μ + σX then Y ~ N(μ, σ²). Therefore we can use property 3 along with example three to calculate the mgf:

    M_Y(t) = M_{σX+μ}(t) = e^(μt) M_X(σt) = e^(μt) e^(σ²t²/2) = e^(μt + σ²t²/2)
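These closed forms are easy to sanity check by estimating E[e^(tX)] from a large sample. The sketch below (assuming NumPy is available; the parameter values are only illustrative) does this for the Poisson and Normal mgfs.

    # Estimate E[exp(tX)] by simulation and compare with the mgf formulas above.
    import numpy as np

    rng = np.random.default_rng(4)
    t = 0.3
    lam, mu, sigma = 2.0, 1.0, 2.0          # illustrative parameter values

    x_pois = rng.poisson(lam, size=500_000)
    x_norm = rng.normal(mu, sigma, size=500_000)

    print("Poisson:", np.exp(t * x_pois).mean(), "vs", np.exp(lam * (np.exp(t) - 1)))
    print("Normal :", np.exp(t * x_norm).mean(), "vs", np.exp(mu * t + sigma**2 * t**2 / 2))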

Example five
Suppose X_1 ~ Poisson(λ_1) and X_2 ~ Poisson(λ_2) and they are independent. What is the mgf of Y = X_1 + X_2?

    M_Y(t) = M_{X_1+X_2}(t) = M_{X_1}(t) M_{X_2}(t)   (by property 4)
           = exp[λ_1(e^t − 1)] exp[λ_2(e^t − 1)]
           = exp[(λ_1 + λ_2)(e^t − 1)]

Therefore Y ~ Poisson(λ_1 + λ_2).

Example six
Suppose X_1 ~ Gamma(α_1, λ) and X_2 ~ Gamma(α_2, λ) and they are independent. Let Y = X_1 + X_2. Then

    M_Y(t) = M_{X_1}(t) M_{X_2}(t) = (λ/(λ − t))^(α_1) (λ/(λ − t))^(α_2) = (λ/(λ − t))^(α_1+α_2)

Therefore Y ~ Gamma(α_1 + α_2, λ).

Example seven
Suppose that X ~ N(μ, σ²). What is EX and Var(X)? We know from example four that

    M_X(t) = exp[μt + σ²t²/2]

Therefore

    M′_X(t) = (μ + σ²t) exp[μt + σ²t²/2]
    M″_X(t) = (μ + σ²t)² exp[μt + σ²t²/2] + σ² exp[μt + σ²t²/2]

By property 2 we know

    EX = M′_X(0) = (μ + 0) exp[0 + 0] = μ

and

    EX² = M″_X(0) = (μ + 0)² exp[0 + 0] + σ² exp[0 + 0] = μ² + σ²

    ⟹ Var(X) = EX² − (EX)² = μ² + σ² − μ² = σ²

Example eight
Suppose X ~ N(0, 1). What is the distribution of Y = X²?

    M_Y(t) = E(e^(X²t)) = ∫_{−∞}^∞ e^(x²t) (1/√(2π)) e^(−x²/2) dx
           = ∫_{−∞}^∞ (1/√(2π)) e^(−x²(1−2t)/2) dx
           = (1 − 2t)^(−1/2) ∫_{−∞}^∞ (1/√(2π)) (1 − 2t)^(1/2) e^(−x²(1−2t)/2) dx
           = (1 − 2t)^(−1/2)

(the last integral equals 1 because the integrand is the N(0, (1 − 2t)^(−1)) density). This is the mgf of a χ²_1 random variable, so Y has a chi-squared distribution with one degree of freedom.

Chapter 4

Limit Theorems

4.1 Convergence of Random Variables

Intuition suggests that if we toss a fair coin many times the proportion of heads should be "close" to 1/2, but what exactly does "close" mean? Let

    X_i = 1 if the ith toss is a head, 0 if the ith toss is a tail

Then the proportion of heads in n tosses of the coin is

    p̂_n = X̄_n = (1/n) Σ_{i=1}^n X_i

So our intuition suggests that

    X̄_n → 1/2 as n → ∞

However X̄_n is a random variable. What does it mean for a random variable to converge to a constant? First we will review what it means for a sequence of (non random) numbers to converge. If X̄_n was not random, i.e. it was predetermined whether each X_i was 0 or 1, then we would say

    lim_{n→∞} X̄_n = 1/2

or

    For every ε > 0 there exists N such that |X̄_n − 1/2| < ε for every n > N.

Unfortunately this won't work for random numbers. After all there is some (very small) probability that every toss could be a head and then X̄_n = 1! It turns out that for random variables there are several different ways of defining convergence.

4.1.1 Convergence in Probability

While it is theoretically possible for the coin to land heads every time, the chance of this happening must be unbelievably small. This suggests that a reasonable definition of convergence for random numbers may be that "the probability that X̄_n is close to 1/2 becomes very high as n becomes large". This sort of convergence is called Convergence in Probability.

Definition 16: The random variable Z_n is said to converge in probability to μ (Z_n →ᵖ μ) if, for every ε > 0,

    P(|Z_n − μ| < ε) → 1 as n → ∞

or equivalently

    P(|Z_n − μ| > ε) → 0 as n → ∞

Example one
X̄_n →ᵖ 1/2 if for every ε > 0

    P(|X̄_n − 1/2| < ε) → 1 as n → ∞

4.1.2 Convergence in Distribution

It is possible for a random variable to converge, not to a constant (as with X̄_n converging to 1/2), but to another random variable. In other words the distribution of the random variable converges to the distribution of another random variable.

Example one
An example is the Binomial distribution converging to a Poisson distribution. Suppose that X_n ~ Bin(n, p_n) where p_n = λ/n. Then as n → ∞, X_n will converge to a Poisson(λ) distribution.

Recall that the distribution of a random variable is uniquely defined by its cdf. The cdf is non random so it makes sense to talk about a cdf converging to another cdf. This is the idea behind convergence in distribution.

Definition 17: Let X_1, X_2, ..., X_n, ... be a sequence of random variables with cdfs F_1, F_2, ... and let X be a random variable with cdf F. We say that X_n converges in distribution to X (X_n ⇒ X) if

    lim_{n→∞} F_n(x) = F(x)

for every value of x at which F is continuous.

Example two
Suppose we have a sequence of random variables X_1, X_2, ... with cdfs F_1, F_2, ... and that

    lim_{n→∞} F_n(x) = 1 − e^(−2x) for every x ≥ 0

Then we would say that X_n converges in distribution to an exponential random variable with λ = 2, because 1 − e^(−2x) is the cdf for an exp(2) random variable.

4.1.3 Almost Sure Convergence

The strongest form of convergence is called almost sure convergence. We will not use almost sure convergence in this course but we give the definition for completeness.

Definition 18: Z_n is said to converge almost surely to μ (Z_n → μ a.s.) if, for every ε > 0, |Z_n − μ| > ε only a finite number of times with probability 1. In other words, beyond some point in the sequence the difference is always less than ε, but that point is random.

Theorem 3 shows that almost sure convergence is a stronger form of convergence than convergence in probability.

Theorem 3: If Z_n → μ a.s. (almost surely) then Z_n →ᵖ μ. The converse is not necessarily true.

4.2 The Law of Large Numbers

We can use the previous definitions for convergence to prove one of the most widely used results in statistics.

Theorem 4 (Law of Large Numbers - Weak): Let X_1, X_2, ..., X_n, ... be a sequence of independent random variables with EX_i = μ and Var(X_i) = σ² < ∞. Let X̄_n = (1/n) Σ_{i=1}^n X_i. Then X̄_n →ᵖ μ, i.e. for every ε > 0

    P(|X̄_n − μ| < ε) → 1 as n → ∞

Proof
First notice that

    E X̄_n = E(Σ_i X_i/n) = (1/n) Σ_i EX_i = (1/n) nμ = μ

and

    Var(X̄_n) = Var((1/n) Σ_i X_i) = (1/n²) Σ_i Var(X_i)   (by independence)
             = (1/n²) nσ² = σ²/n

We will now make use of Chebyshev's inequality. Recall Chebyshev says that for every random variable X

    P(|X − EX| > t) ≤ Var(X)/t²

Therefore

    P(|X̄_n − μ| > ε) = P(|X̄_n − EX̄_n| > ε) ≤ Var(X̄_n)/ε²   (by Chebyshev)
                     = σ²/(nε²) → 0 as n → ∞

Hence X̄_n →ᵖ μ.

There is a second version of this theorem called the "Strong Law of Large Numbers" (SLLN). This says that not only does X̄_n converge in probability but also almost surely. The proof of the SLLN is beyond the scope of this course.

Example one
Suppose we toss a coin n times. Let

    X_i = 1 if the ith toss is a head, 0 if the ith toss is a tail

Then the proportion of heads in n tosses of the coin is

    p̂_n = X̄_n = (1/n) Σ_{i=1}^n X_i

Since each coin toss is independent of the previous tosses and EX_i = 1/2, the law of large numbers (LLN) tells us that p̂_n (i.e. the proportion of heads) will converge in probability to 1/2.

Example two
Suppose we wish to calculate

    I(f) = ∫_0^1 f(x) dx

where f is a known function. If f is fairly simple we can use calculus. However often the function is too complicated to allow us to calculate the integral analytically. (This is especially true for multidimensional integrals.) A popular way to estimate this integral is called "Monte Carlo". Monte Carlo works in the following way. First generate a large number of uniform[0, 1] random variables, i.e. X_1, X_2, ..., X_n. Then compute

    Î(f) = (1/n) Σ_{i=1}^n f(X_i)

Î(f) is an estimate for I(f). Why? First realize that f(X_1), f(X_2), ..., f(X_n) are also random variables because X_1, X_2, ..., X_n are random. Therefore Î(f) is an average of iid random variables, so the LLN tells us that

    Î(f) →ᵖ E[f(X_i)]

but

    E[f(X_i)] = ∫_0^1 f(x) · 1 dx = ∫_0^1 f(x) dx = I(f)
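A minimal sketch of this Monte Carlo estimator is shown below (assuming NumPy is available; the integrand e^(−x²) is just an illustrative choice, picked because it has no elementary antiderivative).

    # Monte Carlo estimate of the integral of f over [0, 1] using uniform draws.
    import numpy as np

    rng = np.random.default_rng(5)

    def f(x):
        return np.exp(-x**2)          # illustrative integrand

    n = 100_000
    x = rng.uniform(size=n)           # X_1, ..., X_n ~ Uniform[0, 1]
    i_hat = f(x).mean()               # I_hat(f) = (1/n) * sum of f(X_i)

    print("Monte Carlo estimate:", i_hat)   # about 0.7468 for this integrand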

so if n is large Î(f) should be close to I(f), i.e. Î(f) ≈ I(f).

Note that if you want to make lots of money, quit 603 now, learn all you can about Monte Carlo, call yourself a Finance person and go work on Wall Street. Solving these integrals is very important in finance!

Example three
Recall that we are often interested in σ², the population variance, e.g. for confidence intervals or hypothesis tests. Usually σ² is unknown but we estimate it using

    S² = (1/n) Σ_{i=1}^n (X_i − X̄)² = (1/n) Σ_{i=1}^n X_i² − X̄²

Notice that the LLN tells us that

    (1/n) Σ_i X_i² →ᵖ EX²   and   X̄ →ᵖ EX

This also implies that X̄² →ᵖ (EX)². Therefore

    S² = (1/n) Σ_i X_i² − X̄² →ᵖ EX² − (EX)² = σ²

More generally it is always true that

    (1/n) Σ_{i=1}^n X_i^r →ᵖ EX^r

provided EX^r exists.

4.3 The Central Limit Theorem

In this section we will prove the fundamental theorem of statistics, the Central Limit Theorem. First however we will mention an important theorem which demonstrates a relationship between moment generating functions and convergence in distribution.

Theorem 5: Let X_1, X_2, ..., X_n, ... be random variables with mgfs M_1, M_2, ..., M_n, ... and let X be a random variable with mgf M_X. If

    M_n(t) → M_X(t)

for every t in some open interval containing zero, then X_n ⇒ X. In other words X_n converges in distribution to X.

This theorem is very useful in proving convergence in distribution because it means we only need to prove that the mgf converges, and we have already seen that mgfs have many desirable properties.

Suppose that X_1, X_2, ..., X_n is a sequence of iid random variables and that

    S_n = Σ_{i=1}^n X_i

Then we are often interested in calculating

    P(a ≤ S_n ≤ b)

Usually the exact distribution of S_n is unknown. The LLN tells us that

    S_n/n →ᵖ μ

So, for large n, S_n ≈ nμ, but what do we mean by ≈, i.e. how close is S_n likely to be to nμ? We use the Central Limit Theorem to examine the fluctuations around nμ and hence estimate the above probability.

Theorem 6 (Central Limit Theorem): Let X_1, X_2, ..., X_n be an iid sequence of random variables having mean 0, variance σ² < ∞ and mgf M. Let

    S_n = Σ_{i=1}^n X_i

Then

    lim_{n→∞} P(S_n/(σ√n) ≤ x) = Φ(x),   −∞ < x < ∞

or in other words

    S_n/(σ√n) ⇒ N(0, 1)

Proof
Let

    Z_n = S_n/(σ√n)

We need to show that the mgf of Z_n tends to the mgf of a standard normal.

    M_{S_n}(t) = [M(t)]^n   and   M_{Z_n}(t) = M_{S_n}(t/(σ√n)) = [M(t/(σ√n))]^n

Taylor's theorem tells us that we can write M(s) as

    M(s) = M(0) + sM′(0) + (s²/2)M″(0) + s²ε_s

where ε_s → 0 as s → 0. But we know that

    M(0) = 1,   M′(0) = EX = 0,   M″(0) = EX² = σ²

So

    M(t/(σ√n)) = 1 + (1/2)(t/(σ√n))²σ² + (t/(σ√n))² ε_{t/(σ√n)} = 1 + (t²/2 + δ_n)/n

where δ_n = (t²/σ²) ε_{t/(σ√n)} → 0 as n → ∞. Therefore

    M_{Z_n}(t) = [1 + (t²/2 + δ_n)/n]^n

Now recall that

    lim_{n→∞} (1 + a/n)^n = e^a

and since t²/2 + δ_n → t²/2,

    M_{Z_n}(t) = [1 + (t²/2 + δ_n)/n]^n → e^(t²/2)

Therefore Z_n ⇒ N(0, 1).

Corollary 1: If X_1, X_2, ..., X_n are iid with mean μ and variance σ² then

    (S_n − nμ)/(σ√n) ⇒ N(0, 1)

and

    (S_n/n − μ)/(σ/√n) = (X̄ − μ)/(σ/√n) ⇒ N(0, 1)

Even though the CLT only guarantees that for infinite n we will converge to a normal distribution, it is still very useful for finite n because in practice if n is "large" (n > 30 is used as a very general rule) then it is often true that

    P(a ≤ S_n ≤ b) = P((a − nμ)/(σ√n) ≤ (S_n − nμ)/(σ√n) ≤ (b − nμ)/(σ√n))
                   ≈ P((a − nμ)/(σ√n) ≤ Z ≤ (b − nμ)/(σ√n)),   Z ~ N(0, 1)
                   = Φ((b − nμ)/(σ√n)) − Φ((a − nμ)/(σ√n))

Example one
Suppose that we want to construct a 95% confidence interval for μ using X_1, X_2, ..., X_n iid with EX = μ and Var(X) = σ², i.e. find c_1 and c_2 such that

    P(c_1 ≤ μ ≤ c_2) = 0.95

One way to do this would be to use Chebyshev's inequality, i.e.

    P(|X̄ − μ| > k√Var(X̄)) ≤ 1/k²
    ⟹ P(|X̄ − μ| > kσ/√n) ≤ 1/k²
    ⟹ P(X̄ − kσ/√n ≤ μ ≤ X̄ + kσ/√n) ≥ 1 − 1/k²
    ⟹ P(X̄ − 4.47σ/√n ≤ μ ≤ X̄ + 4.47σ/√n) ≥ 0.95   (1 − 1/4.47² = 0.95)

On the other hand if we use the CLT then

    P(−k ≤ (X̄ − μ)/(σ/√n) ≤ k) ≈ Φ(k) − Φ(−k)
    ⟹ P(X̄ − kσ/√n ≤ μ ≤ X̄ + kσ/√n) ≈ Φ(k) − Φ(−k)
    ⟹ P(X̄ − 1.96σ/√n ≤ μ ≤ X̄ + 1.96σ/√n) ≈ Φ(1.96) − Φ(−1.96) = 0.95

Therefore we get two possible confidence intervals:

    Chebyshev confidence interval = [X̄ − 4.47σ/√n, X̄ + 4.47σ/√n]
    CLT confidence interval = [X̄ − 1.96σ/√n, X̄ + 1.96σ/√n]

Notice that the forms of the two intervals are almost identical except that the Chebyshev interval is more than twice as wide. For example if n = 100, σ = 10 and X̄ = 5 then

    Chebyshev confidence interval = [5 − 4.47 · 10/√100, 5 + 4.47 · 10/√100] = [0.53, 9.47]
    CLT confidence interval = [5 − 1.96 · 10/√100, 5 + 1.96 · 10/√100] = [3.04, 6.96]

Example two
We can also use the CLT to derive the normal approximation to the Binomial distribution. Recall if X ~ Bin(n, p) then

    X = X_1 + X_2 + ··· + X_n

where X_1, X_2, ..., X_n are iid Bernoulli(p), i.e.

    X_i = 1 with probability p, 0 with probability 1 − p

Since X is a sum of iid random variables we can use the CLT. The CLT tells us

    P((X − EX)/√Var(X) ≤ z) ≈ Φ(z)   for n large

but we know

    EX = np,   Var(X) = np(1 − p)

Therefore

    P((X − np)/√(np(1 − p)) ≤ z) ≈ Φ(z)   for n large

So for example suppose an airline accepts reservations from 410 people for a flight with 400 seats, and people turn up for their flights independently with probability 0.95. What is the probability too many people turn up? If X is the number of people that turn up then

    X ~ Bin(n = 410, p = 0.95)

Therefore

    P(X ≥ 401) = P((X − np)/√(np(1 − p)) ≥ (401 − 410 · 0.95)/√(410 · 0.95 · (1 − 0.95)))
               ≈ P(Z ≥ 2.6) = 0.0045
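The approximation can be checked against the exact Binomial tail probability. The following sketch (assuming SciPy is available, and using the same numbers as the example) computes both.

    # Exact Binomial tail probability versus the CLT approximation for the airline example.
    from scipy import stats

    n, p = 410, 0.95
    mu = n * p                        # expected number of people who turn up
    sd = (n * p * (1 - p)) ** 0.5     # standard deviation, about 4.4

    exact = 1 - stats.binom.cdf(400, n, p)         # P(X >= 401), exact
    approx = 1 - stats.norm.cdf((401 - mu) / sd)   # CLT approximation used in the text

    print("exact:", exact, "normal approximation:", approx)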


Chapter 5

Distributions Derived from the Normal

In the previous chapter we showed that by the CLT

    [X̄ − 1.96σ/√n, X̄ + 1.96σ/√n]

is an approximate 95% confidence interval for μ provided n is large enough. However what if σ is unknown? In this case it is common to use

    [X̄ − t_{n−1}(0.975) S/√n, X̄ + t_{n−1}(0.975) S/√n]

where t is the t distribution and

    S² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)²

In this chapter we provide definitions for the t distribution as well as the χ² and F distributions and explain how they are used.

5.1 χ², t and F distributions

We begin with a definition for the χ² distribution.

Definition 19: If Z ~ N(0, 1) then U = Z² is called a chi-square random variable with 1 degree of freedom (U ~ χ²_1).

Notice if X ~ N(μ, σ²) then (X − μ)/σ ~ N(0, 1), so ((X − μ)/σ)² ~ χ²_1.

Definition 20: If U_1, U_2, ..., U_n are independent χ²_1 random variables then the distribution of V = U_1 + U_2 + ··· + U_n is called the chi-square distribution with n degrees of freedom (V ~ χ²_n). Its density is

    f(v) = (1/(2^(n/2) Γ(n/2))) v^((n/2)−1) e^(−v/2),   v ≥ 0

Notice that this is the density for a Gamma(α = n/2, λ = 1/2), so EV = n and Var(V) = 2n. Also notice that if U and V are independent with U ~ χ²_n and V ~ χ²_m then U + V ~ χ²_(n+m).

Next we define the t distribution.

Definition 21: If Z ~ N(0, 1) and V ~ χ²_n and Z and V are independent, then the distribution of

    T = Z/√(V/n)

is called the t distribution with n degrees of freedom (t_n). The density function of t_n is

    f(t) = (Γ[(n + 1)/2]/(√(nπ) Γ(n/2))) (1 + t²/n)^(−(n+1)/2)

Notice that f(t) = f(−t), so the t distribution is symmetric about zero. As n → ∞, t_n ⇒ N(0, 1). Why? If V ~ χ²_n then V = U_1 + U_2 + ··· + U_n where U_i ~ χ²_1. So by the Law of Large Numbers

    V/n →ᵖ EU_1 = 1

Therefore

    √(V/n) →ᵖ 1   and   T = Z/√(V/n) ⇒ Z ~ N(0, 1)

In practice for more than 25 or 30 degrees of freedom the two distributions are very close.

The last distribution we define is the F distribution.

Definition 22: Let U and V be independent chi-square random variables with m and n degrees of freedom respectively. Then the distribution of

    W = (U/m)/(V/n)

is called the F distribution with m and n degrees of freedom (F_{m,n}). Its density is

    f(w) = (Γ((m + n)/2)/(Γ(m/2)Γ(n/2))) (m/n)^(m/2) w^(m/2−1) (1 + (m/n)w)^(−(m+n)/2),   w ≥ 0

The F distribution is instrumental in ANOVA calculations which we will cover at the end of this course.
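The construction in Definition 21, and the convergence of t_n to N(0, 1), can both be seen in a short simulation. The sketch below (assuming NumPy and SciPy are available) builds T = Z/√(V/n) from independent pieces and compares a tail probability with the t_n and standard normal values.

    # Simulate T = Z / sqrt(V/n) and compare with the t and standard normal distributions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    size = 500_000

    for n in [3, 30]:
        z = rng.standard_normal(size)
        v = rng.chisquare(df=n, size=size)     # V ~ chi-square with n degrees of freedom
        t = z / np.sqrt(v / n)
        print(f"n = {n}: P(T > 2) ~ {np.mean(t > 2):.4f},"
              f" t_{n} tail {1 - stats.t.cdf(2, df=n):.4f},"
              f" N(0,1) tail {1 - stats.norm.cdf(2):.4f}")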


More information

Chapter 2. Discrete Distributions

Chapter 2. Discrete Distributions Chapter. Discrete Distributions Objectives ˆ Basic Concepts & Epectations ˆ Binomial, Poisson, Geometric, Negative Binomial, and Hypergeometric Distributions ˆ Introduction to the Maimum Likelihood Estimation

More information

E[X n ]= dn dt n M X(t). ). What is the mgf? Solution. Found this the other day in the Kernel matching exercise: 1 M X (t) =

E[X n ]= dn dt n M X(t). ). What is the mgf? Solution. Found this the other day in the Kernel matching exercise: 1 M X (t) = Chapter 7 Generating functions Definition 7.. Let X be a random variable. The moment generating function is given by M X (t) =E[e tx ], provided that the expectation exists for t in some neighborhood of

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

STAT 418: Probability and Stochastic Processes

STAT 418: Probability and Stochastic Processes STAT 418: Probability and Stochastic Processes Spring 2016; Homework Assignments Latest updated on April 29, 2016 HW1 (Due on Jan. 21) Chapter 1 Problems 1, 8, 9, 10, 11, 18, 19, 26, 28, 30 Theoretical

More information

1 Review of Probability and Distributions

1 Review of Probability and Distributions Random variables. A numerically valued function X of an outcome ω from a sample space Ω X : Ω R : ω X(ω) is called a random variable (r.v.), and usually determined by an experiment. We conventionally denote

More information

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Exam P Review Sheet. for a > 0. ln(a) i=0 ari = a. (1 r) 2. (Note that the A i s form a partition)

Exam P Review Sheet. for a > 0. ln(a) i=0 ari = a. (1 r) 2. (Note that the A i s form a partition) Exam P Review Sheet log b (b x ) = x log b (y k ) = k log b (y) log b (y) = ln(y) ln(b) log b (yz) = log b (y) + log b (z) log b (y/z) = log b (y) log b (z) ln(e x ) = x e ln(y) = y for y > 0. d dx ax

More information

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables

Chapter 2. Some Basic Probability Concepts. 2.1 Experiments, Outcomes and Random Variables Chapter 2 Some Basic Probability Concepts 2.1 Experiments, Outcomes and Random Variables A random variable is a variable whose value is unknown until it is observed. The value of a random variable results

More information

CS145: Probability & Computing

CS145: Probability & Computing CS45: Probability & Computing Lecture 5: Concentration Inequalities, Law of Large Numbers, Central Limit Theorem Instructor: Eli Upfal Brown University Computer Science Figure credits: Bertsekas & Tsitsiklis,

More information

p. 6-1 Continuous Random Variables p. 6-2

p. 6-1 Continuous Random Variables p. 6-2 Continuous Random Variables Recall: For discrete random variables, only a finite or countably infinite number of possible values with positive probability (>). Often, there is interest in random variables

More information

University of Regina. Lecture Notes. Michael Kozdron

University of Regina. Lecture Notes. Michael Kozdron University of Regina Statistics 252 Mathematical Statistics Lecture Notes Winter 2005 Michael Kozdron kozdron@math.uregina.ca www.math.uregina.ca/ kozdron Contents 1 The Basic Idea of Statistics: Estimating

More information

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable Distributions of Functions of Random Variables 5.1 Functions of One Random Variable 5.2 Transformations of Two Random Variables 5.3 Several Random Variables 5.4 The Moment-Generating Function Technique

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

MAS113 Introduction to Probability and Statistics. Proofs of theorems

MAS113 Introduction to Probability and Statistics. Proofs of theorems MAS113 Introduction to Probability and Statistics Proofs of theorems Theorem 1 De Morgan s Laws) See MAS110 Theorem 2 M1 By definition, B and A \ B are disjoint, and their union is A So, because m is a

More information

Week 12-13: Discrete Probability

Week 12-13: Discrete Probability Week 12-13: Discrete Probability November 21, 2018 1 Probability Space There are many problems about chances or possibilities, called probability in mathematics. When we roll two dice there are possible

More information

Northwestern University Department of Electrical Engineering and Computer Science

Northwestern University Department of Electrical Engineering and Computer Science Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability

More information

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN Lecture Notes 5 Convergence and Limit Theorems Motivation Convergence with Probability Convergence in Mean Square Convergence in Probability, WLLN Convergence in Distribution, CLT EE 278: Convergence and

More information

1.1 Review of Probability Theory

1.1 Review of Probability Theory 1.1 Review of Probability Theory Angela Peace Biomathemtics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology. CRC Press,

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information

Discrete Distributions

Discrete Distributions A simplest example of random experiment is a coin-tossing, formally called Bernoulli trial. It happens to be the case that many useful distributions are built upon this simplest form of experiment, whose

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

1 Basic continuous random variable problems

1 Basic continuous random variable problems Name M362K Final Here are problems concerning material from Chapters 5 and 6. To review the other chapters, look over previous practice sheets for the two exams, previous quizzes, previous homeworks and

More information

18.440: Lecture 28 Lectures Review

18.440: Lecture 28 Lectures Review 18.440: Lecture 28 Lectures 17-27 Review Scott Sheffield MIT 1 Outline Continuous random variables Problems motivated by coin tossing Random variable properties 2 Outline Continuous random variables Problems

More information

t x 1 e t dt, and simplify the answer when possible (for example, when r is a positive even number). In particular, confirm that EX 4 = 3.

t x 1 e t dt, and simplify the answer when possible (for example, when r is a positive even number). In particular, confirm that EX 4 = 3. Mathematical Statistics: Homewor problems General guideline. While woring outside the classroom, use any help you want, including people, computer algebra systems, Internet, and solution manuals, but mae

More information

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3 Probability Paul Schrimpf January 23, 2018 Contents 1 Definitions 2 2 Properties 3 3 Random variables 4 3.1 Discrete........................................... 4 3.2 Continuous.........................................

More information

Chapter 2: Random Variables

Chapter 2: Random Variables ECE54: Stochastic Signals and Systems Fall 28 Lecture 2 - September 3, 28 Dr. Salim El Rouayheb Scribe: Peiwen Tian, Lu Liu, Ghadir Ayache Chapter 2: Random Variables Example. Tossing a fair coin twice:

More information

Probability- the good parts version. I. Random variables and their distributions; continuous random variables.

Probability- the good parts version. I. Random variables and their distributions; continuous random variables. Probability- the good arts version I. Random variables and their distributions; continuous random variables. A random variable (r.v) X is continuous if its distribution is given by a robability density

More information

We introduce methods that are useful in:

We introduce methods that are useful in: Instructor: Shengyu Zhang Content Derived Distributions Covariance and Correlation Conditional Expectation and Variance Revisited Transforms Sum of a Random Number of Independent Random Variables more

More information

Lecture 11. Probability Theory: an Overveiw

Lecture 11. Probability Theory: an Overveiw Math 408 - Mathematical Statistics Lecture 11. Probability Theory: an Overveiw February 11, 2013 Konstantin Zuev (USC) Math 408, Lecture 11 February 11, 2013 1 / 24 The starting point in developing the

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

1 Probability theory. 2 Random variables and probability theory.

1 Probability theory. 2 Random variables and probability theory. Probability theory Here we summarize some of the probability theory we need. If this is totally unfamiliar to you, you should look at one of the sources given in the readings. In essence, for the major

More information

MATH Notebook 4 Fall 2018/2019

MATH Notebook 4 Fall 2018/2019 MATH442601 2 Notebook 4 Fall 2018/2019 prepared by Professor Jenny Baglivo c Copyright 2004-2019 by Jenny A. Baglivo. All Rights Reserved. 4 MATH442601 2 Notebook 4 3 4.1 Expected Value of a Random Variable............................

More information

STAT Chapter 5 Continuous Distributions

STAT Chapter 5 Continuous Distributions STAT 270 - Chapter 5 Continuous Distributions June 27, 2012 Shirin Golchi () STAT270 June 27, 2012 1 / 59 Continuous rv s Definition: X is a continuous rv if it takes values in an interval, i.e., range

More information

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1).

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1). Name M362K Final Exam Instructions: Show all of your work. You do not have to simplify your answers. No calculators allowed. There is a table of formulae on the last page. 1. Suppose X 1,..., X 1 are independent

More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear

More information

Random Variables. Saravanan Vijayakumaran Department of Electrical Engineering Indian Institute of Technology Bombay

Random Variables. Saravanan Vijayakumaran Department of Electrical Engineering Indian Institute of Technology Bombay 1 / 13 Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay August 8, 2013 2 / 13 Random Variable Definition A real-valued

More information

Probability. Table of contents

Probability. Table of contents Probability Table of contents 1. Important definitions 2. Distributions 3. Discrete distributions 4. Continuous distributions 5. The Normal distribution 6. Multivariate random variables 7. Other continuous

More information

Statistics 100A Homework 5 Solutions

Statistics 100A Homework 5 Solutions Chapter 5 Statistics 1A Homework 5 Solutions Ryan Rosario 1. Let X be a random variable with probability density function a What is the value of c? fx { c1 x 1 < x < 1 otherwise We know that for fx to

More information

Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R

Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R Random Variables Definition: A random variable X is a real valued function that maps a sample space S into the space of real numbers R. X : S R As such, a random variable summarizes the outcome of an experiment

More information

CSE 312, 2017 Winter, W.L. Ruzzo. 7. continuous random variables

CSE 312, 2017 Winter, W.L. Ruzzo. 7. continuous random variables CSE 312, 2017 Winter, W.L. Ruzzo 7. continuous random variables The new bit continuous random variables Discrete random variable: values in a finite or countable set, e.g. X {1,2,..., 6} with equal probability

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

Stationary independent increments. 1. Random changes of the form X t+h X t fixed h > 0 are called increments of the process.

Stationary independent increments. 1. Random changes of the form X t+h X t fixed h > 0 are called increments of the process. Stationary independent increments 1. Random changes of the form X t+h X t fixed h > 0 are called increments of the process. 2. If each set of increments, corresponding to non-overlapping collection of

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

Probability. Paul Schrimpf. January 23, UBC Economics 326. Probability. Paul Schrimpf. Definitions. Properties. Random variables.

Probability. Paul Schrimpf. January 23, UBC Economics 326. Probability. Paul Schrimpf. Definitions. Properties. Random variables. Probability UBC Economics 326 January 23, 2018 1 2 3 Wooldridge (2013) appendix B Stock and Watson (2009) chapter 2 Linton (2017) chapters 1-5 Abbring (2001) sections 2.1-2.3 Diez, Barr, and Cetinkaya-Rundel

More information

STA 256: Statistics and Probability I

STA 256: Statistics and Probability I Al Nosedal. University of Toronto. Fall 2017 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. There are situations where one might be interested

More information

Theory of probability and mathematical statistics

Theory of probability and mathematical statistics Theory of probability and mathematical statistics Tomáš Mrkvička Bibliography [1] J. [2] J. Andďż l: Matematickďż statistika, SNTL/ALFA, Praha 1978 Andďż l: Statistickďż metody, Matfyzpress, Praha 1998

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

Week 9 The Central Limit Theorem and Estimation Concepts

Week 9 The Central Limit Theorem and Estimation Concepts Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population

More information

This exam is closed book and closed notes. (You will have access to a copy of the Table of Common Distributions given in the back of the text.

This exam is closed book and closed notes. (You will have access to a copy of the Table of Common Distributions given in the back of the text. TEST #3 STA 5326 December 4, 214 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. (You will have access to

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

Introduction to Probability Theory

Introduction to Probability Theory Introduction to Probability Theory Ping Yu Department of Economics University of Hong Kong Ping Yu (HKU) Probability 1 / 39 Foundations 1 Foundations 2 Random Variables 3 Expectation 4 Multivariate Random

More information

Chapter 4 continued. Chapter 4 sections

Chapter 4 continued. Chapter 4 sections Chapter 4 sections Chapter 4 continued 4.1 Expectation 4.2 Properties of Expectations 4.3 Variance 4.4 Moments 4.5 The Mean and the Median 4.6 Covariance and Correlation 4.7 Conditional Expectation SKIP:

More information

Chapter 4 : Expectation and Moments

Chapter 4 : Expectation and Moments ECE5: Analysis of Random Signals Fall 06 Chapter 4 : Expectation and Moments Dr. Salim El Rouayheb Scribe: Serge Kas Hanna, Lu Liu Expected Value of a Random Variable Definition. The expected or average

More information

Joint Distribution of Two or More Random Variables

Joint Distribution of Two or More Random Variables Joint Distribution of Two or More Random Variables Sometimes more than one measurement in the form of random variable is taken on each member of the sample space. In cases like this there will be a few

More information

Lecture 1. ABC of Probability

Lecture 1. ABC of Probability Math 408 - Mathematical Statistics Lecture 1. ABC of Probability January 16, 2013 Konstantin Zuev (USC) Math 408, Lecture 1 January 16, 2013 1 / 9 Agenda Sample Spaces Realizations, Events Axioms of Probability

More information

CME 106: Review Probability theory

CME 106: Review Probability theory : Probability theory Sven Schmit April 3, 2015 1 Overview In the first half of the course, we covered topics from probability theory. The difference between statistics and probability theory is the following:

More information

Math 416 Lecture 3. The average or mean or expected value of x 1, x 2, x 3,..., x n is

Math 416 Lecture 3. The average or mean or expected value of x 1, x 2, x 3,..., x n is Math 416 Lecture 3 Expected values The average or mean or expected value of x 1, x 2, x 3,..., x n is x 1 x 2... x n n x 1 1 n x 2 1 n... x n 1 n 1 n x i p x i where p x i 1 n is the probability of x i

More information

8 Laws of large numbers

8 Laws of large numbers 8 Laws of large numbers 8.1 Introduction We first start with the idea of standardizing a random variable. Let X be a random variable with mean µ and variance σ 2. Then Z = (X µ)/σ will be a random variable

More information

MATH/STAT 3360, Probability Sample Final Examination Model Solutions

MATH/STAT 3360, Probability Sample Final Examination Model Solutions MATH/STAT 3360, Probability Sample Final Examination Model Solutions This Sample examination has more questions than the actual final, in order to cover a wider range of questions. Estimated times are

More information

Review of Probability. CS1538: Introduction to Simulations

Review of Probability. CS1538: Introduction to Simulations Review of Probability CS1538: Introduction to Simulations Probability and Statistics in Simulation Why do we need probability and statistics in simulation? Needed to validate the simulation model Needed

More information

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

STAT2201. Analysis of Engineering & Scientific Data. Unit 3 STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random

More information

7 Random samples and sampling distributions

7 Random samples and sampling distributions 7 Random samples and sampling distributions 7.1 Introduction - random samples We will use the term experiment in a very general way to refer to some process, procedure or natural phenomena that produces

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

ORF 245 Fundamentals of Statistics Great Expectations

ORF 245 Fundamentals of Statistics Great Expectations ORF 245 Fundamentals of Statistics Great Expectations Robert Vanderbei Fall 2015 Slides last edited on November 16, 2015 http://www.princeton.edu/ rvdb Definition The expectation of a random variable is

More information

Math 510 midterm 3 answers

Math 510 midterm 3 answers Math 51 midterm 3 answers Problem 1 (1 pts) Suppose X and Y are independent exponential random variables both with parameter λ 1. Find the probability that Y < 7X. P (Y < 7X) 7x 7x f(x, y) dy dx e x e

More information

IE 230 Probability & Statistics in Engineering I. Closed book and notes. 120 minutes.

IE 230 Probability & Statistics in Engineering I. Closed book and notes. 120 minutes. Closed book and notes. 10 minutes. Two summary tables from the concise notes are attached: Discrete distributions and continuous distributions. Eight Pages. Score _ Final Exam, Fall 1999 Cover Sheet, Page

More information