Quantitative Techniques - Lecture 8: Estimation


Key words: estimation, hypothesis testing, bias, efficiency, least squares

Topics: hypothesis testing when the population variance is not known; properties of estimates; approaches to estimation.

Difference between Means

When we know the population variance we can generally use a z test, or some variant, for hypothesis testing, as long as we know how the variable in question is distributed. For example, if two populations are normally distributed with variances \sigma_1^2 and \sigma_2^2, the sampling distribution of the difference between the sample means is also normally distributed, with variance

    \sigma_{\bar{x}_1 - \bar{x}_2}^2 = \sigma_1^2 / n_1 + \sigma_2^2 / n_2    (1)

where n_1 and n_2 are the sample sizes. You can then use this to test whether the populations from which samples are drawn have the same or different means.

Example: Two groups of students get mean scores of 110 and 115 respectively. We know that \sigma_1 = 8 and \sigma_2 = 10. Sample sizes are 16 and 25. Do the students come from populations with different means? Using our standard methods from last week:

    \bar{x}_1 = 110, \quad n_1 = 16    (2)
    \bar{x}_2 = 115, \quad n_2 = 25    (3)

1. Define the so-called null hypothesis to be tested. In this case it is that \mu_1 - \mu_2 = 0.
2. Calculate the so-called test statistic, which follows a particular distribution. In our case we calculate (\bar{x}_1 - \bar{x}_2) / \sigma_{\bar{x}_1 - \bar{x}_2}. Since we are using the standard normal distribution this is z, i.e.

    z = (110 - 115) / \sigma_{\bar{x}_1 - \bar{x}_2}    (4)

    \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2} = \sqrt{64/16 + 100/25} = \sqrt{8} = 2.8284    (5)

    \Rightarrow z = -5 / 2.8284 = -1.7678    (6)

3. Decide on a significance level. In this case we will use 5 per cent for a two-tailed test.
4. Look up the critical value z^* which gives you the correct area under the tail(s). In this case z^* = 1.96.
5. Compare |z| with z^*. If |z| > z^*, reject the null hypothesis; otherwise accept the "null". In this case |z| < z^*, so we accept the null hypothesis of no difference in population means.

Hypothesis testing when the population variance is not known

Lab 7 asks you to investigate the properties of the difference between sample means (or the mean of the sample difference, as it is expressed there). If we do not know the variance we shall have to estimate it. It seems natural to base the estimate on \sum (x_i - \bar{x})^2. Consider first the expression (x_i - \mu)^2 for any i. Its expected value is, by the definition of the variance, \sigma^2, so the expected value of \sum_{i=1}^{N} (x_i - \mu)^2 is simply N \sigma^2. Is E[\sum (x_i - \bar{x})^2] also equal to N \sigma^2? To answer this we need to remind ourselves that in a recent lab we showed that the sample mean is a least squares estimator, i.e. it is the value of \hat{\mu} which minimises \sum (x_i - \hat{\mu})^2. So only in the unusual circumstance that \bar{x} = \mu (an event of probability 0) will \sum (x_i - \bar{x})^2 = \sum (x_i - \mu)^2. It turns out that

    E[\sum (x_i - \bar{x})^2] = (N - 1) \sigma^2    (7)

This means that an unbiased estimate of \sigma^2 is:

    s^2 = \sum (x_i - \bar{x})^2 / (N - 1)    (8)
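The five-step test in the worked example above can be reproduced numerically. A minimal sketch in Python (the variable names are mine; the numbers are the example's):

```python
import math

# Example data: known population standard deviations and sample sizes
xbar1, sigma1, n1 = 110, 8, 16
xbar2, sigma2, n2 = 115, 10, 25

# Standard error of the difference between sample means, equation (1)
se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # sqrt(64/16 + 100/25) = sqrt(8)

# Test statistic under the null hypothesis mu1 - mu2 = 0
z = (xbar1 - xbar2) / se                          # -5 / 2.8284 = -1.7678

# Two-tailed test at the 5 per cent significance level
z_star = 1.96
reject = abs(z) > z_star                          # False: accept the null
```

Since |z| = 1.77 < 1.96, the code reaches the same conclusion as step 5.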

The square root of s^2 is STDEV in Excel. If our sample is the whole population we need to use STDEVP instead, which divides by N rather than N - 1. For future reference let's call the divide-by-N formula s_p^2.

There is one other complication. Since we do not know \sigma with accuracy, the denominator of our z statistic is subject to error, which raises the uncertainty surrounding our so-called z. In these circumstances we should really be using something called a "Student" t statistic. Who knows what the "Student" refers to?

Properties of estimates and estimators

An estimator is a method or formula for getting an estimate. We have already introduced the concept of bias in an estimator when we saw that E(s_p^2) < \sigma^2. Biasedness is one of several possible properties of an estimator.

1. An estimate \tilde{\theta} is unbiased if E(\tilde{\theta}) = \theta, i.e. if the expected value equals the true value. Unbiasedness is obviously a good property.
2. Some unbiased estimates are better than others. Each estimator will have a sampling distribution and a variance. An estimator that has less variance than any other with which it is compared is an efficient estimator.
3. Some estimators may be biased in small samples, but the bias gradually disappears as the sample grows. Such estimators are said to be asymptotically unbiased.
4. For some estimators the variance tends to zero as the sample size increases.
5. Roughly combining the last two we get the concept of consistency. An estimator is consistent if the probability that |\tilde{\theta} - \theta| is less than some small value approaches 1.00 as the sample size increases without limit. In effect this means that the estimator approaches the true value as the sample size increases.

Approaches to estimation

For more details see Ashenfelter et al., chapter 7, and the overhead slides. These include the methods of:

1. Least squares
2. Maximum likelihood
3. Moments
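Equation (7), and the downward bias of the divide-by-N formula s_p^2, can be checked by simulation. A minimal sketch (the sample size, sigma, and replication count are made-up settings):

```python
import random

random.seed(1)
N, sigma, reps = 10, 2.0, 20000   # made-up settings; true variance sigma^2 = 4

sum_sq_mean = 0.0   # running mean of sum (x_i - xbar)^2 across replications
for _ in range(reps):
    xs = [random.gauss(0.0, sigma) for _ in range(N)]
    xbar = sum(xs) / N
    sum_sq_mean += sum((x - xbar) ** 2 for x in xs) / reps

# Equation (7): E[sum (x_i - xbar)^2] = (N - 1) * sigma^2 = 9 * 4 = 36
s2 = sum_sq_mean / (N - 1)   # unbiased estimate of sigma^2, equation (8): about 4
sp2 = sum_sq_mean / N        # divide-by-N version: about 3.6, biased downward
```

With N = 10 the divide-by-N estimate averages about 0.9 sigma^2; the gap shrinks as N grows, which is the asymptotic-unbiasedness point above.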

1 Regression Analysis

So far we have been looking at estimation in the context of population means and hypothesis testing, such as testing for differences between population means. Economists' main quantitative tool is regression analysis (sometimes it seems like the only tool!). The basic model is as follows. Assume that there is a data generating process where some observed variable X causes variations in another variable Y. X is called the independent or explanatory variable and Y the dependent variable. Sometimes we call Y the LHS or endogenous variable and X the RHS or exogenous variable (LHS for left-hand side, etc.). This is because we express the causal relationship as a mathematical function. For convenience of exposition we suppose this can be represented as a linear relationship:

    Y_i = a + b X_i + u_i    (9)

Here the subscript i refers to a particular observation. The term u_i is present to indicate the following possibilities:

1. There are other determinants of Y that we have not incorporated into our model.
2. Y may be subject to measurement error.

The term u_i is variously referred to as the disturbance or error term. It helps if E(u_i) = 0, but we shall discuss this assumption later. Now consider the problem of estimating the parameters a and b of this function, as well as the individual u_i's. Denote estimates by a ^ ("hat") symbol:

    Y_i = \hat{a} + \hat{b} X_i + \hat{u}_i    (10)

We can represent this as a fitted line through a series of scatter points, with the deviations of the individual points corresponding to the \hat{u}_i for each observation. We call each \hat{u}_i the residual. The fitted line is

    \hat{Y}_i = \hat{a} + \hat{b} X_i    (11)

\hat{Y} is called the fitted value of Y. There are various ways of fitting this line, including minimising the sum of absolute deviations \sum |\hat{u}_i| (try it in a spreadsheet!), but the commonest method is to minimise \sum \hat{u}_i^2. This is the least squares (LS) estimate. The LS estimate has some desirable properties:

1. If E(u_i) = 0 and E(X_i u_i) = 0 (X and u uncorrelated) then \hat{a} and \hat{b} are unbiased.
2. If u_i is independently distributed as a normal distribution with constant variance then \hat{a} and \hat{b} are (a) maximum likelihood and therefore consistent; (b) efficient (minimum variance); (c) distributed according to the "Student" t distribution with means a and b respectively.

We shall look at the meaning of, and issues related to, independence and constant variance of u_i next week.

Example: I generated the following data based on the true model Y = 10 + 2X + u_i (just as in Laboratory 8):

Obs     X        Y
 1     22.4     57.31
 2     10.76    30.71
 3     16.48    45.9
 4     10.07    44.44
 5     29.53    79.56
 6     28.06    59.6
 7     14.41    38.70
 8     24.69    54.13
 9     15.87    39.07
10     19.4     51.50
11     17.33    61.4
12     11.3     37.50
13      3.38    75.13
14      1.77    47.95
15      1.88    55.50
16     16.77     4.4

Using Excel this generated the following output:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.815138238

R Square            0.664450348
Adjusted R Square   0.640482516
Standard Error      8.022730001
Observations        16

ANOVA
             df    SS            MS            F         Significance F
Regression    1    1784.342127   1784.342127   27.7226   0.000119488
Residual     14    901.098753    64.364196
Total        15    2685.440880

            Coefficients   Standard Error   t Stat   p-value   Lower 95%   Upper 95%
Intercept   16.2094        6.9589           2.3293   0.0353    1.2840      31.1348
X           1.8480         0.3510           5.2652   0.0001    1.0952      2.6007

Let's spend a little time looking at this output.

Regression Statistics: these are various measures of goodness of fit.
Multiple R: the square root of R Square; the correlation coefficient between Y and \hat{Y}.
R Square: regression sum of squares / total sum of squares.
Adjusted R Square: adjusted for degrees of freedom (number of observations less number of parameters).
Standard Error: a measure of the residual variance.

ANOVA = ANalysis Of VAriance. These are various sums of squares. Which do you think corresponds to each of:

    \sum (Y - \bar{Y})^2
    \sum (Y - \hat{Y})^2 = \sum \hat{u}_i^2
    \sum (\hat{Y} - \bar{Y})^2

What does "df" stand for? SS? MS? How does it look as if F is calculated?
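The correspondence between the ANOVA rows and those sums of squares can be checked directly. A minimal sketch on made-up data (not the table above):

```python
# Made-up sample for illustration
x = [2.0, 5.0, 8.0, 11.0, 14.0, 17.0]
y = [14.0, 21.5, 24.0, 33.5, 36.0, 45.0]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Least squares fit of Y on X
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
y_hat = [a + b * xi for xi in x]

# The three sums of squares in the ANOVA table
ss_total = sum((yi - ybar) ** 2 for yi in y)                  # Total:      sum (Y - Ybar)^2
ss_resid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # Residual:   sum u_hat^2
ss_regr = sum((yh - ybar) ** 2 for yh in y_hat)               # Regression: sum (Yhat - Ybar)^2

r_squared = ss_regr / ss_total                  # R Square
f_stat = (ss_regr / 1) / (ss_resid / (n - 2))   # MS regression / MS residual
```

Total SS equals Regression SS plus Residual SS; R Square is their ratio; and F is the ratio of the two mean squares, each mean square being its SS divided by its df.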

The last three rows: Intercept? X? Coefficients, Standard Error, t Stat, p-value, Lower 95%, Upper 95%?

Would you say this regression line fairly represents the data generating process? Is the estimate of b biased? With different samples, which might produce higher \hat{b}s, what do you think would happen to the \hat{a}s?

1.1 Calculating \hat{a} and \hat{b}

The least squares estimators are

    \hat{b} = \sum (X - \bar{X})(Y - \bar{Y}) / \sum (X - \bar{X})^2, \quad \hat{a} = \bar{Y} - \hat{b} \bar{X}    (12)

You should remember this. Note that it is not symmetric between X and Y.

Reading

Ashenfelter, chapters 7-10.

Exercises

Kraus p 14 Q.4. Chapter 9 problems 9.6, questions 1 and 2. Problems 10.8, questions 1 and 3.
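The formulas in (12), and the asymmetry between X and Y, can be verified numerically. A minimal sketch (the five data points are made up):

```python
# Made-up sample (illustration only)
x = [2.0, 5.0, 8.0, 11.0, 14.0]
y = [14.5, 19.0, 27.5, 31.0, 38.5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)

# Regression of Y on X, equation (12)
b_hat = sxy / sxx             # 180 / 90 = 2.0
a_hat = ybar - b_hat * xbar   # 26.1 - 2.0 * 8 = 10.1

# Not symmetric: regressing X on Y divides by Syy instead of Sxx
b_reverse = sxy / syy
```

Reversing the roles of X and Y divides the same cross-product by \sum (Y - \bar{Y})^2 instead, so the reverse regression line is not simply the inverse of the original one.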