Quantitative Techniques - Lecture 8: Estimation

Key words: Estimation, hypothesis testing, bias, efficiency, least squares

Hypothesis testing when the population variance is not known
Properties of estimates
Approaches to estimation

Difference between Means

When we know the population variance we can generally use a z test, or some variant, for hypothesis testing, as long as we know how the variable in question is distributed. For example, if two populations are normally distributed with variances \sigma_1^2 and \sigma_2^2, the sampling distribution of the difference between the sample means is also normally distributed, with variance

    \sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma_1^2 / n_1 + \sigma_2^2 / n_2    (1)

where n_1 and n_2 are the sample sizes. You can then use this to test whether the populations from which the samples are drawn have the same or different means.

Example: Two groups of students get mean scores of 110 and 115 respectively. We know that \sigma_1 = 8 and \sigma_2 = 10. Sample sizes are 16 and 25. Do the students come from populations with different means? Using our standard methods from last week:

    \bar{x}_1 = 110,  n_1 = 16    (2)
    \bar{x}_2 = 115,  n_2 = 25    (3)

1. Define the so-called null hypothesis to be tested. In this case it is that \mu_1 - \mu_2 = 0.

2. Calculate the so-called test statistic, which follows a particular distribution. In our case we calculate (\bar{x}_1 - \bar{x}_2) / \sigma_{\bar{x}_1 - \bar{x}_2}. Since we are using the standard normal distribution this is z, i.e.

    z = (110 - 115) / \sigma_{\bar{x}_1 - \bar{x}_2}    (4)
    \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2} = \sqrt{64/16 + 100/25} = \sqrt{8} = 2.8284    (5)

    \Rightarrow z = -5 / 2.8284 = -1.7678    (6)

3. Decide on a significance level. In this case we will use 5 per cent for a two-tailed test.

4. Look up the critical value z* which gives you the correct area under the tail(s). In this case z* = 1.96.

5. Compare z with z*. If |z| > z*, reject the null hypothesis. Otherwise accept the "null." In this case |z| < z*, so we accept the null hypothesis of no difference in population means.

Hypothesis testing when the population variance is not known

Lab 7 asks you to investigate the properties of the difference between sample means (or the mean of the sample differences, as it is expressed there). If we do not know the variance we shall have to estimate it. It seems natural to base the estimate on \sum (x_i - \bar{x})^2. Consider first the expression (x_i - \mu)^2 for any i. Its expected value is, by the definition of the variance, \sigma^2, so the expected value of (1/N) \sum (x_i - \mu)^2 is simply \sigma^2. Is E[(1/N) \sum (x_i - \bar{x})^2] also equal to \sigma^2? To answer this we need to remind ourselves that in a recent lab we showed that the sample mean is a least squares estimator, i.e. it is the value of \hat{\mu} which minimises \sum (x_i - \hat{\mu})^2. So only in the unusual circumstance that \bar{x} = \mu (probability = 0) will \sum (x_i - \bar{x})^2 = \sum (x_i - \mu)^2; otherwise \sum (x_i - \bar{x})^2 is smaller. It turns out that

    E{ \sum (x_i - \bar{x})^2 } = (N - 1) \sigma^2    (7)

This means that an unbiased estimate of \sigma^2 is:

    s^2 = \sum (x_i - \bar{x})^2 / (N - 1)    (8)
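The factor (N - 1) in (7) and (8) is easy to check by simulation. The following sketch (not part of the lecture; the sample size, variance and number of replications are chosen only for illustration) draws many samples from a population with known variance and compares the divide-by-N estimator with the divide-by-(N-1) estimator:

```python
import numpy as np

# Simulation sketch: compare the two variance estimators on many samples
# drawn from a population whose true variance we know (sigma^2 = 4).
rng = np.random.default_rng(0)
N, sigma2, reps = 5, 4.0, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, N))
biased = samples.var(axis=1, ddof=0)    # divide by N
unbiased = samples.var(axis=1, ddof=1)  # divide by N - 1, as in (8)

print(round(biased.mean(), 2))    # close to (N-1)/N * sigma2 = 3.2
print(round(unbiased.mean(), 2))  # close to sigma2 = 4.0
```

On average the divide-by-N estimator recovers only (N - 1)/N of the true variance, exactly as equation (7) predicts, while the (N - 1) divisor removes the bias.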
The square root of this is STDEV in Excel. If our sample is the whole population we instead use STDEVP, which divides by N instead of N - 1. For future reference let's call the divide-by-N formula s_p.

There is one other complication. Since we do not know \sigma with accuracy, the denominator of our z statistic is subject to error, which raises the uncertainty surrounding our so-called z. In these circumstances we should really be using something called a "Student" t statistic. Who knows what the "Student" refers to?

Properties of estimates and estimators

An estimator is a method or formula for getting an estimate. We have already introduced the concept of bias in an estimator when we saw that E(s_p^2) < \sigma^2. Biasedness is one of several possible properties of an estimator.

1. An estimate \tilde{\theta} is unbiased if E(\tilde{\theta}) = \theta, i.e. if the expected value equals the true value. Unbiasedness is obviously a good property.

2. Some unbiased estimates are better than others. Each estimator will have a sampling distribution and a variance. An estimator that has less variance than any other with which it is compared is an efficient estimator.

3. Some estimators may be biased in small samples, but the bias gradually disappears as the sample size grows. Such estimators are said to be asymptotically unbiased.

4. For some estimators the variance tends to zero as the sample size increases.

5. Roughly combining the last two we get the concept of consistency. An estimator is consistent if the probability that |\tilde{\theta} - \theta| is less than some small value approaches 1.00 as the sample size increases without limit. In effect this means that the estimator approaches the true value as the sample size increases.

Approaches to estimation

For more details see Ashenfelter et al. chapter 7 and the overhead slides. These include the methods of:

1. Least squares
2. Maximum likelihood
3. Moments
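The least squares idea can be illustrated with the fact quoted earlier: the sample mean is the value of \hat{\mu} that minimises \sum (x_i - \hat{\mu})^2. A minimal sketch, using made-up data and a brute-force grid search rather than calculus:

```python
import numpy as np

# Sketch: minimise sum((x_i - mu_hat)^2) over a fine grid of candidate
# values and check that the minimiser is the sample mean (toy data).
x = np.array([3.0, 7.0, 8.0, 12.0])
grid = np.linspace(x.min(), x.max(), 100_001)

ssq = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)  # sum of squares at each candidate
best = grid[ssq.argmin()]

print(best)      # approximately 7.5
print(x.mean())  # 7.5
```

The grid minimiser agrees with the sample mean to the resolution of the grid, which is the least squares property the lecture relies on.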
1 Regression Analysis

So far we have been looking at estimation in the context of estimating population means and of hypothesis testing, such as testing for differences between population means. Economists' main quantitative tool is regression analysis (sometimes it seems like the only tool!). The basic model is as follows. Assume that there is a data generating process where some observed variable X causes variations in another variable Y. X is called the independent or explanatory variable and Y the dependent variable. Sometimes we call Y the LHS or endogenous variable and X the RHS or exogenous variable (LHS for left hand side, RHS for right hand side). This is because we express the causal relationship as a mathematical function. For convenience of exposition we suppose this can be represented as a linear relationship:

    Y_i = a + b X_i + u_i    (9)

Here the subscript i refers to a particular observation. The term u_i is present to indicate the following possibilities:

1. There are other determinants of Y that we have not incorporated into our model.
2. Y may be subject to measurement error.

The term u_i is variously referred to as the disturbance or error term. It helps if E(u_i) = 0, but we shall discuss this assumption later.

Now consider the problem of estimating the parameters a and b of this function, as well as the individual u_i's. Denote estimates by a ^ (hat) symbol:

    Y_i = \hat{a} + \hat{b} X_i + \hat{u}_i    (10)

We can represent this as a fitted line through a series of scatter points, with the deviations of the individual points from the line corresponding to the \hat{u}_i for each observation. We call each \hat{u}_i the residual. The fitted line is

    \hat{Y}_i = \hat{a} + \hat{b} X_i    (11)
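A data generating process like (9) is easy to simulate. The sketch below uses illustrative parameter values and distributions chosen here for the example (they are not taken from the lecture):

```python
import numpy as np

# Sketch of the data generating process in equation (9):
# Y_i = a + b * X_i + u_i, with E(u_i) = 0.
rng = np.random.default_rng(1)
n = 16                            # number of observations
a, b = 10.0, 2.0                  # illustrative parameter values

X = rng.uniform(0, 35, size=n)    # explanatory (RHS) variable
u = rng.normal(0, 8, size=n)      # disturbance term, mean zero
Y = a + b * X + u                 # observed dependent (LHS) variable
```

Given such simulated data, the true a and b are known, so you can see directly how close estimated coefficients come to them; this is exactly what the laboratory exercise does.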
\hat{Y} is called the fitted value of Y. There are various ways of fitting this line, including minimising the sum of absolute deviations \sum |\hat{u}_i| (try it in a spreadsheet!), but the commonest method is to minimise \sum \hat{u}_i^2. This is the least squares (LS) estimate. The LS estimate has some desirable properties:

1. If E(u_i) = 0 and E(X_i u_i) = 0 (X and u uncorrelated) then \hat{a} and \hat{b} are unbiased.

2. If u_i is independently distributed as a normal distribution with constant variance then \hat{a} and \hat{b} are
   a) maximum likelihood estimates and therefore consistent;
   b) efficient (minimum variance);
   c) distributed according to the "Student" t distribution, with means a and b respectively.

We shall look at the meaning of, and issues related to, independence and constant variance of u_i next week.

Example: I generated the following data based on the true model Y = 10 + 2X + u_i (just as in Laboratory 8):

Obs    X       Y
1      2.4     57.31
2      10.76   30.71
3      16.48   45.9
4      10.07   44.44
5      9.53    79.56
6      8.06    59.6
7      14.41   38.70
8      4.69    54.13
9      15.87   39.07
10     19.4    51.50
11     17.33   61.4
12     11.3    37.50
13     3.38    75.13
14     1.77    47.95
15     1.88    55.50
16     16.77   4.4

Using Excel this generated the following output:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.815138238
R Square            0.664450348
Adjusted R Square   0.640482516
Standard Error      8.022730001
Observations        16

ANOVA
             df    SS            MS            F             Significance F
Regression    1    1784.342127   1784.342127   27.72258833   0.000119488
Residual     14    901.098753    64.364196
Total        15    2685.44088

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    16.2094        6.9589           2.3293   0.0353    1.2840      31.1348
X            1.8480         0.3510           5.2652   0.0001    1.0952      2.6007

Let's spend a little time looking at this output.

Regression Statistics: these are various measures of goodness of fit.

Multiple R: the square root of R Square; the correlation coefficient between Y and \hat{Y}.

R Square = regression sum of squares / total sum of squares.

Adjusted R Square: adjusted for degrees of freedom (number of observations less number of parameters).

Standard Error: a measure of the residual variance.

ANOVA = ANalysis Of VAriance. These are various sums of squares. Which do you think correspond to:

    \sum (Y - \bar{Y})^2
    \sum (Y - \hat{Y})^2 = \sum \hat{u}_i^2
    \sum (\hat{Y} - \bar{Y})^2

What does "df" stand for? SS? MS? How does it look as if F is calculated?
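As a sketch of how the summary statistics fit together, the whole table can be rebuilt from just the regression and residual sums of squares shown above (values taken to the precision printed):

```python
# How the Excel summary statistics are related, starting from the two
# sums of squares in the ANOVA table above.
ss_reg = 1784.3417      # regression sum of squares
ss_res = 901.098753     # residual sum of squares
df_reg, df_res = 1, 14  # degrees of freedom

ss_tot = ss_reg + ss_res                        # total sum of squares
r_square = ss_reg / ss_tot                      # R Square
f_stat = (ss_reg / df_reg) / (ss_res / df_res)  # F = MS_reg / MS_res
std_err = (ss_res / df_res) ** 0.5              # Standard Error = sqrt(MS_res)

print(round(r_square, 4), round(f_stat, 2), round(std_err, 2))
# -> 0.6645 27.72 8.02
```

This reproduces the R Square, F and Standard Error figures in the output, which answers the question above about how F is calculated.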
Last three rows: the header, then Intercept and X. For each estimated coefficient the columns give Coefficients, Standard Error, t Stat, P-value, Lower 95% and Upper 95%.

Would you say this regression line fairly represented the data generation process? Is the estimate of b biased? With different samples, some of which might produce higher \hat{b}'s, what do you think would happen to the \hat{a}?

1.1 Calculating \hat{a} and \hat{b}

The least squares estimators are

    \hat{b} = \sum (X - \bar{X})(Y - \bar{Y}) / \sum (X - \bar{X})^2,    \hat{a} = \bar{Y} - \hat{b} \bar{X}    (12)

You should remember this. Note that this is not symmetric between X and Y.

Reading

Ashenfelter chapters 7-10.

Exercises

Kraus p 14 Q.4. Chapter 9 problems 9.6 questions 1 and 2. Problems 10.8 questions 1 and 3.
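The formulas in (12) can be checked numerically. A minimal sketch on made-up data that lies exactly on the line Y = 1 + 2X:

```python
import numpy as np

# Sketch of the least squares formulas in (12), on toy data constructed
# so that the fitted line should recover a_hat = 1 and b_hat = 2 exactly.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # exactly Y = 1 + 2X

b_hat = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
a_hat = Y.mean() - b_hat * X.mean()

print(a_hat, b_hat)  # -> 1.0 2.0
```

Note that the denominator involves deviations of X only, which is the asymmetry between X and Y mentioned above: regressing X on Y is a different calculation, not a simple rearrangement of this one.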