Chapter 1 Random Variables, Sampling and Estimation

1.1 Introduction

This chapter covers the most important basic statistical theory you need in order to understand the econometric material coming in the next chapters. The key topics we will review are the following:

Descriptive statistics, e.g. mean and variance.
Probability, e.g. events, relative frequency, marginal and conditional probability distributions.
Random variables, probability distributions, and expectations.
Sampling, e.g. simple random sampling.
Estimation, e.g. the distinction between an estimator and an estimate.
Statistical inference, e.g. t and F tests.

1.2 Probabilities

1.2.1 Events

Random experiment. Process leading to two or more possible outcomes, with uncertainty as to which outcome will occur. Examples: the flip of a coin, the toss of a die, a student takes a class and either obtains an A or not.

Sample space. Set of all basic outcomes of a random experiment.
When flipping a coin, S = [head, tail]. When taking a class, S = [A, B, C, D, F, drop]. When tossing a die, S = [1, 2, 3, 4, 5, 6]. No two outcomes can occur simultaneously.

Event. Subset of basic outcomes in the sample space. For event E1: pass the class, the subset of basic outcomes is [A, B, C].

Intersection of events. When two events E1 and E2 have some basic outcomes in common. It is denoted by E1 ∩ E2. Event E1: individuals with a college degree. Event E2: individuals who are married. E1 ∩ E2: individuals who have a college degree and are married.

Joint probability. Probability that the intersection occurs.

Mutually exclusive events. E1 and E2 are mutually exclusive if E1 ∩ E2 is empty.

Union of events. Denoted by E1 ∪ E2. At least one of these events occurs: either E1, E2, or both.

Complement. The complement of E is denoted by Ē and is the set of basic outcomes of a random experiment that belong to S, but not to E. Likewise, E is the complement of Ē. E and Ē are mutually exclusive events.

1.2.2 Probability postulates

Given a random experiment, we want to determine the probability that a particular event will occur. A probability is a measure from 0 to 1.
A probability of 0 means the event will not occur; a probability of 1 means the event is certain. When the outcomes are equally likely to occur, the probability of an event E is:

P(E) = N_E / N

N_E: number of outcomes in event E. N: total number of outcomes in the sample space S.

Example 1: Flip of a coin. If event E is head, then P(E) = 1/2, with N_E = 1 and N = 2.

Example 2: Event E is winning the lottery. If there are 1000 lottery tickets and you bought 2, then P(E) = 2/1000 = 0.002.

Some probability rules

P(E ∪ Ē) = P(E) + P(Ē) = 1, so P(Ē) = 1 − P(E).

Conditional probability. P(E1 | E2): probability that E1 occurs, given that E2 has already occurred. P(E1 | E2) = P(E1 ∩ E2) / P(E2), given that P(E2) > 0.

Addition rule. P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2).

Statistically independent events. E1 and E2 are statistically independent if P(E1 ∩ E2) = P(E1)P(E2), in which case P(E1 | E2) = P(E1)P(E2) / P(E2) = P(E1).

1.3 Discrete random variables and expectations

1.3.1 Discrete random variables

Random variable. Variable that takes numerical values determined by the outcome of a random experiment. Examples: hourly wage, GDP, inflation, the number obtained when tossing a die. Notation: random variable X can take possible values x1, x2, ..., xn.

Discrete random variable. A random variable that takes a countable number of values.
Examples: number of years of education.

Continuous random variable. A random variable that can take any value on an interval. Examples: wage, GDP, exact weight.

Consider tossing two dice (green and red). This will yield 36 possible outcomes, because the green die can take 6 possible values and the red die can also take 6 values, 6 × 6 = 36. The possible outcomes are shown in Table 1.1. Let's define the random variable X to be the sum of the two dice. Then X can take 11 possible values, from 2 to 12. This information is summarized in the following tables.

Table 1.1 Outcomes with two dice

red \ green   1   2   3   4   5   6
1             2   3   4   5   6   7
2             3   4   5   6   7   8
3             4   5   6   7   8   9
4             5   6   7   8   9  10
5             6   7   8   9  10  11
6             7   8   9  10  11  12

Table 1.2 Frequencies and probability distributions

Value of X       2     3     4     5     6     7     8     9     10    11    12
Frequency        1     2     3     4     5     6     5     4     3     2     1
Probability (p)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

1.3.2 Expected value of random variables

Let E(X) be the expected value of the random variable X. The expected value of a discrete random variable is the weighted average of all its possible values, taking the probability of each outcome as its weight. If the random variable X can take the particular values x1, x2, ..., xn and the probability of x_i is given by p_i, then the expected value is:

E(X) = x1 p1 + x2 p2 + ... + xn pn = Σ_{i=1}^{n} x_i p_i.   (1.1)

We can also write the expected value as E(X) = µ_X. For the previous example we can calculate that the expected value is:
E(X) = 2(1/36) + 3(2/36) + ... + 12(1/36) = 252/36 = 7   (1.2)

Table 1.3 Expected value of X, two dice example

X    p     X·p
2    1/36   2/36
3    2/36   6/36
4    3/36  12/36
5    4/36  20/36
6    5/36  30/36
7    6/36  42/36
8    5/36  40/36
9    4/36  36/36
10   3/36  30/36
11   2/36  22/36
12   1/36  12/36
Total E(X) = Σ_{i=1}^{n} x_i p_i = 252/36 = 7

1.3.3 Expected value rules

E(X + Y + Z) = E(X) + E(Y) + E(Z)   (1.3)

E(bX) = b E(X) for a constant b   (1.4)

E(b) = b   (1.5)

For the example where Y = b1 + b2 X, with b1 and b2 constants, we want to calculate E(Y):

E(Y) = E(b1 + b2 X)   (1.6)
     = E(b1) + E(b2 X)
     = b1 + b2 E(X)

1.3.4 Variance of a discrete random variable

Let var(X) be the variance of the random variable X. var(X) is a useful measure of the dispersion of its probability distribution. It is defined as the expected value of the square of the difference between X and its mean, that is, of (X − µ_X)², where µ_X is the population mean of X.

var(X) = σ²_X = E[(X − µ_X)²]   (1.7)
= (x1 − µ_X)² p1 + (x2 − µ_X)² p2 + ... + (xn − µ_X)² pn = Σ_{i=1}^{n} (x_i − µ_X)² p_i   (1.8)

Taking the square root of the variance (σ²_X) one can obtain the standard deviation, σ_X. The standard deviation also serves as a measure of dispersion of the probability distribution. A useful way to write the variance is:

σ²_X = E(X²) − µ²_X.   (1.9)

From the previous example of tossing two dice, the population variance can be calculated as follows:

Table 1.4 Population variance, X from the two dice example

X    p     X − µ_X   (X − µ_X)²   (X − µ_X)² p
2    1/36    −5         25           0.69
3    2/36    −4         16           0.89
4    3/36    −3          9           0.75
5    4/36    −2          4           0.44
6    5/36    −1          1           0.14
7    6/36     0          0           0.00
8    5/36     1          1           0.14
9    4/36     2          4           0.44
10   3/36     3          9           0.75
11   2/36     4         16           0.89
12   1/36     5         25           0.69
Total                                5.83

1.3.5 Probability density

Because discrete random variables take only a countable number of values, they are easy to summarize graphically. The probability distribution is the graph that links all the values that a random variable can take with their corresponding probabilities. For the two dice example above, see Figure 1.1.
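The calculations in Tables 1.3 and 1.4 can be reproduced with a few lines of code. The sketch below uses exact fractions so that the results match the tables exactly; the variable names are illustrative only.

```python
from fractions import Fraction

# Probability distribution of X = sum of two dice (Table 1.2).
p = {x: Fraction(f, 36)
     for x, f in zip(range(2, 13), [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1])}

# E(X) = sum of x_i * p_i, equation (1.1) and Table 1.3.
mu = sum(x * px for x, px in p.items())

# var(X) = E[(X - mu)^2], equation (1.8) and Table 1.4,
# and the shortcut E(X^2) - mu^2 from equation (1.9).
var = sum((x - mu) ** 2 * px for x, px in p.items())
var_alt = sum(x ** 2 * px for x, px in p.items()) - mu ** 2

print(mu)               # 7
print(var, float(var))  # 35/6, approximately 5.83 as in Table 1.4
print(var == var_alt)   # True
```

Note that the 5.83 total in Table 1.4 is a rounded decimal; the exact population variance is 35/6.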
Fig. 1.1 Discrete probabilities, X from the two dice example

1.4 Continuous random variables

1.4.1 Probability density

Continuous random variables can take any value on an interval. This means that they can take an infinite number of different values, hence it is not possible to obtain a graph like the one presented in Figure 1.1 for a continuous random variable. Instead, we will define the probability of a random variable lying within a given interval, for example, the probability that the height of an individual is between 5.5 and 6 feet. This is depicted in Figure 1.2 as the shaded area below the probability density curve for the values of X between 5.5 and 6. The probability of the random variable X, written as a function of the random variable, is known as the probability density function. We write it as f(x). Then, with a little math, we can easily find the area under the curve. Recall that the area under a curve can be obtained by taking the integral.

Probability density function. A function that describes the relative likelihood for a random variable to occur at a given point.

∫_{5.5}^{6} f(x) dx = 0.18   (1.10)
∫ f(x) dx = 1
Fig. 1.2 Continuous probabilities, X from the height example

The first line in the equation above just calculates the integral under the curve f(x) between the points 5.5 and 6. The second line shows that the whole area under the curve presented in Figure 1.2 is equal to one. This is for the same reason why the bars in Figure 1.1 also sum to one; the total probability is always equal to one.

1.4.2 Normal distribution

The normal distribution is the most widely known continuous probability distribution. The graph associated with its probability density function has a bell shape and is known as the Gaussian function or bell curve. Its probability density function is given by:

f(x) = (1 / √(2πσ²)) e^{−(x − µ)² / (2σ²)}   (1.11)

where µ is the mean and σ² is the variance. Figure 1.2 is an example of this distribution.
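Equation (1.11) is easy to check numerically. The sketch below evaluates the standard normal density and approximates the area under the curve with a midpoint Riemann sum; the integration range and grid size are arbitrary choices for the illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Gaussian density from equation (1.11)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def area(a, b, n=100_000):
    """Midpoint Riemann sum approximating the area under f between a and b."""
    h = (b - a) / n
    return sum(normal_pdf(a + (i + 0.5) * h) for i in range(n)) * h

print(round(normal_pdf(0.0), 4))  # 0.3989, the peak of the curve at the mean
print(round(area(-8.0, 8.0), 4))  # 1.0: the total probability is one
```

The symmetry of the bell curve also shows up directly: normal_pdf(c) equals normal_pdf(−c) for any c when µ = 0.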
1.4.3 Expected value and variance of a continuous random variable

The basic difference between a discrete and a continuous random variable is that the second can take infinitely many possible values, hence the summation signs that are used to calculate the expected value and the variance of a discrete random variable cannot be used for a continuous random variable. Instead, we use integral signs. For the expected value we have:

E(X) = ∫ x f(x) dx   (1.12)

where the integration is performed over the interval for which f(x) is defined. For the variance we have:

σ²_X = E[(X − µ_X)²] = ∫ (x − µ_X)² f(x) dx   (1.13)

1.5 Covariance and correlation

1.5.1 Covariance

When dealing with two variables, the first question you want to answer is whether these variables move together or whether they move in opposite directions. The covariance will help us answer that question. For two random variables X and Y, the covariance is defined as:

cov(X, Y) = σ_XY = E[(X − µ_X)(Y − µ_Y)]   (1.14)

where µ_X and µ_Y are the population means of X and Y, respectively. When two random variables are independent, their covariance is equal to zero. When σ_XY > 0 we say that the variables move together; when σ_XY < 0 they move in opposite directions.

1.5.2 Correlation

One concern when using cov(X, Y) as a measure of association is that the result is measured in the units of X times the units of Y. The correlation coefficient, which is dimensionless, overcomes this difficulty. For variables X and Y the correlation coefficient is defined as:
corr(X, Y) = ρ_XY = σ_XY / √(σ²_X σ²_Y)   (1.15)

The correlation coefficient is a number between −1 and 1. When it is positive, we say that there is a positive correlation between X and Y and that these two variables move in the same direction. When it is negative, we say that they move in opposite directions.

1.6 Sampling and estimators

Notice that in the two dice example we know the population characteristics, that is, the probability distribution. From this probability distribution it is easy to obtain the population mean and variance. However, what happens most of the time is that we need to rely on a data set to get estimates of the population parameters (e.g. the mean and the variance). In that case the estimates of the population parameters are obtained using estimators, and the sample needs to have certain characteristics. These estimators and the sampling procedure are the subject of this section.

1.6.1 Sampling

The most common way to obtain a sample from the population is through simple random sampling.

Simple random sampling. A procedure to obtain a sample from the population, where each of the observations is chosen randomly and entirely by chance. This means that each observation in the population has the same probability of being chosen.

Once the sample of the random variable X has been generated, each of the observations can be denoted by {x1, x2, ..., xn}.¹

¹ The textbook Dougherty (2007) makes the distinction between the specific values of the random variable X before and after they are known, and emphasizes this distinction by using uppercase and lowercase letters. This distinction is useful only in some cases, and that is why most textbooks do not make it. We will not emphasize the distinction and will use only lowercase letters.
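As an illustration of simple random sampling, the sketch below draws a sample without replacement from a small artificial population, with every observation equally likely to be selected. The population of "hourly wages" here is entirely made up for the example.

```python
import random
import statistics

random.seed(7)

# Hypothetical population of 1,000 hourly wages (illustrative numbers only).
population = [round(random.lognormvariate(2.5, 0.5), 2) for _ in range(1000)]

# Simple random sample of n = 50: each observation in the population
# has the same probability of being chosen.
sample = random.sample(population, k=50)

print(len(sample))                             # 50
print(round(statistics.fmean(population), 2))  # population mean
print(round(statistics.fmean(sample), 2))      # sample estimate of it
```

The sample mean will generally differ from the population mean; how far off it tends to be is exactly what the next sections on estimators and their variance address.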
1.6.2 Estimators

Estimator. A general rule (mathematical formula) for estimating an unknown population parameter given a sample of data. For example, an estimator for the population mean is the sample mean:

X̄ = (1/n)(x1 + x2 + ... + xn) = (1/n) Σ_{i=1}^{n} x_i.   (1.16)

An interesting feature of this estimator is that the variance of X̄ is 1/n times the variance of X. The derivation is the following:

σ²_X̄ = var(X̄)   (1.17)
σ²_X̄ = var{(1/n)(x1 + x2 + ... + xn)}   (1.18)
σ²_X̄ = (1/n²) var{x1 + x2 + ... + xn}   (1.19)
σ²_X̄ = (1/n²){var(x1) + var(x2) + ... + var(xn)}   (1.20)
σ²_X̄ = (1/n²){σ²_X + σ²_X + ... + σ²_X}   (1.21)
σ²_X̄ = (1/n²){n σ²_X} = σ²_X / n   (1.22)

The step from (1.19) to (1.20) uses the fact that under simple random sampling the observations are independent. Graphically, this result is shown in Figure 1.3. The distribution of X has a higher variance (it is more dispersed) than the distribution of X̄.

1.7 Unbiasedness and efficiency

1.7.1 Unbiasedness

Because estimators are random variables, we can take expectations of them. If the expectation of the estimator is equal to the true population parameter, then we say that the estimator is unbiased. Let θ be the population parameter and let θ̂ be a point estimator of θ. Then, θ̂ is unbiased if:

E(θ̂) = θ   (1.23)

Example. The sample mean of X is an unbiased estimator of the population mean µ_X:
Fig. 1.3 Probability density functions of X and X̄, centered at µ_X.

E(X̄) = E((1/n) Σ_{i=1}^{n} x_i) = (1/n) Σ_{i=1}^{n} E(x_i)   (1.24)
     = (1/n) Σ_{i=1}^{n} µ_X = (1/n) n µ_X = µ_X

Unbiased estimator. An estimator is unbiased if its expected value is equal to the true population parameter. The bias of an estimator is just the difference between its expected value and the true population parameter:

Bias(θ̂) = E(θ̂) − θ   (1.25)

1.7.2 Efficiency

It is not only important that an estimator is on average correct (unbiased), but also that it has a high probability of being close to the true parameter. When comparing two estimators, θ̂1 and θ̂2, we say that θ̂1 is more efficient if var(θ̂1) < var(θ̂2). A comparison of the efficiency of these two estimators is presented in Figure 1.4. The estimator with the higher variance, θ̂2, is more dispersed.
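Results (1.22) and (1.24) can be verified exactly for a tiny case: enumerate every equally likely sample of n = 2 rolls of a fair die and compute the mean and variance of X̄ directly. This is a minimal sketch for one concrete case, not a general proof.

```python
from fractions import Fraction
from itertools import product

n = 2
# Every equally likely sample of n rolls of a fair die.
samples = list(product(range(1, 7), repeat=n))
p = Fraction(1, len(samples))

# The sample mean computed for each possible sample.
xbars = [Fraction(sum(s), n) for s in samples]

e_xbar = sum(xb * p for xb in xbars)
var_xbar = sum((xb - e_xbar) ** 2 * p for xb in xbars)

mu_x = Fraction(7, 2)     # population mean of one die roll
var_x = Fraction(35, 12)  # population variance of one die roll

print(e_xbar == mu_x)         # True: the sample mean is unbiased, as in (1.24)
print(var_xbar == var_x / n)  # True: var of the mean is sigma^2/n, as in (1.22)
```

Using exact fractions avoids any floating-point rounding, so the two equalities hold exactly rather than approximately.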
Fig. 1.4 Efficiency of estimators θ̂1 and θ̂2, with var(θ̂1) < var(θ̂2).

Most efficient estimator. The estimator with the smallest variance among all unbiased estimators.

1.7.3 Unbiasedness versus efficiency

Both unbiasedness and efficiency are desired properties of an estimator. However, there may be conflicts in the selection between two estimators θ̂1 and θ̂2 if, for example, θ̂1 is more efficient, but it is also biased. This case is presented in Figure 1.5. The simplest way to select between these two estimators is to pick the one that yields the smallest mean square error (MSE):

MSE(θ̂) = var(θ̂) + bias(θ̂)²   (1.26)

1.8 Estimators for the variance, covariance, and correlation

While we have already seen the population formulas for the variance, covariance, and correlation, it is important to keep in mind that we do not have the whole population. The data sets we will be working with are just samples from the populations. The formula for the sample variance is:
Fig. 1.5 θ̂2 is unbiased, but θ̂1 is more efficient.

s²_X = (1/(n−1)) Σ_{i=1}^{n} (x_i − X̄)²   (1.27)

Notice how we changed the notation from σ² to s². The first denotes the population variance, while the second refers to the sample variance. An estimator for the population covariance is given by:

s_XY = (1/(n−1)) Σ_{i=1}^{n} (x_i − X̄)(y_i − Ȳ).   (1.28)

Finally, the formula for the correlation coefficient, r_XY, is:

r_XY = Σ_{i=1}^{n} (x_i − X̄)(y_i − Ȳ) / √( Σ_{i=1}^{n} (x_i − X̄)² Σ_{i=1}^{n} (y_i − Ȳ)² ).   (1.29)

1.9 Asymptotic properties of estimators

Asymptotic properties of estimators just refers to their properties when the number of observations in the sample grows large and approaches infinity.
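Before moving on, the sample estimators in equations (1.27)–(1.29) translate directly into code. A minimal sketch follows; the example data are made up for the illustration, with y roughly twice x so the two series are strongly positively correlated.

```python
import math
import statistics

def sample_var(xs):
    """s^2 = (1/(n-1)) * sum of (x_i - xbar)^2, equation (1.27)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_cov(xs, ys):
    """s_XY = (1/(n-1)) * sum of (x_i - xbar)(y_i - ybar), equation (1.28)."""
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_corr(xs, ys):
    """r_XY from equation (1.29)."""
    return sample_cov(xs, ys) / math.sqrt(sample_var(xs) * sample_var(ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]  # roughly 2x, so r should be close to 1

print(round(sample_var(xs), 3))       # 2.5
print(round(sample_corr(xs, ys), 3))  # 0.999
```

As a cross-check, sample_var agrees with statistics.variance from the standard library, which also uses the n − 1 divisor.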
Fig. 1.6 The estimator is biased for small samples, but consistent (shown for n = 40, 250, and 1000).

1.9.1 Consistency

An estimator θ̂ is said to be consistent if its bias becomes smaller as the sample size grows large. Consistency is important because many of the most common estimators used in econometrics are biased; the minimum we should then expect from these estimators is that the bias becomes small as we are able to obtain larger data sets. Figure 1.6 illustrates the concept of consistency by showing how an estimator of the population parameter θ becomes unbiased as n grows.

1.9.2 Central limit theorem

Having normally distributed random variables is important because we can then construct, for example, confidence intervals for their mean. However, what if a random variable does not follow a normal distribution? The central limit theorem gives us the answer.

Central limit theorem. States the conditions under which the mean of a sufficiently large number of independent random variables (with finite mean and variance) will approximately follow a normal distribution.

Hence, even if we do not know the underlying distribution of a random variable, we will still be able to construct confidence intervals that will be approximately valid. As a numerical example, let's assume that the random variable X follows a
uniform distribution on [−0.5, 0.5]. Hence, it is equally likely that this random variable takes any value within this range. Figure 1.7 shows the distribution of the average of this random variable for n = 10, 20, and 100. All three of these distributions look very close to a normal distribution.

Fig. 1.7 Distribution of the sample mean of a uniform distribution, for n = 10, 20, and 100.
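The uniform example can be replicated by simulation. The sketch below draws repeated samples of each size, as in Figure 1.7, and checks two implications of the central limit theorem: the standard deviation of the sample mean is close to σ/√n (here σ² = 1/12), and about 68% of the sample means fall within one standard deviation of the true mean, as they would under a normal distribution. The sample sizes and repetition count are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)

# X ~ Uniform(-0.5, 0.5): mean 0, variance 1/12.
def sample_means(n, reps=20_000):
    """Draw reps independent samples of size n and return their means."""
    return [statistics.fmean(random.uniform(-0.5, 0.5) for _ in range(n))
            for _ in range(reps)]

for n in (10, 20, 100):
    means = sample_means(n)
    sd = statistics.pstdev(means)
    # Share of sample means within one standard deviation of zero;
    # for a normal distribution this is about 0.68.
    share = sum(abs(m) < sd for m in means) / len(means)
    print(n, round(sd, 4), round(math.sqrt(1 / (12 * n)), 4), round(share, 2))
```

For each n, the empirical standard deviation in the second column should nearly match the theoretical σ/√n in the third, and the last column should hover around 0.68.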