An Introduction to Parameter Estimation


Introduction

This document combines several important econometric foundations and corresponds to other documents such as the Introduction to Probability, Criteria for Estimators and Estimator of the Variance. There is some overlap between these documents; however, the information is not presented in identical formats, so they may prove useful as complementary explanations and definitions.

The Disturbance Term ε

Without the disturbance term the relationship is deterministic. With a disturbance term the relationship is no longer exact, and we say it is stochastic. The disturbance term is required for three main reasons:

1. Omission of the influence of innumerable chance events. Not all causal factors are known, and even known variables may have effects too small to model. The error term is often viewed as the sum of many omitted variables, each with a small impact.
2. Measurement error: some variables cannot be measured accurately.
3. Human indeterminacy: humans are not perfectly rational and may behave differently on two separate occasions under identical conditions.

The Difference between Estimates and Estimators

An estimate is the actual number we compute for β, but we never really know the true value of β. When an economic relationship involves more than one parameter, we obtain better estimates by estimating the parameters together: although we may be interested in only one parameter, keeping the others in the model makes our parameter of interest more accurate in how it describes the relationship. For this reason β rarely refers to a single parameter value; instead it refers to the vector

β = (β_1, β_2, …, β_{k−1}, β_k)′

and an estimate of this β is a vector of numbers such as (3.79, 8.63, 5.22, 31.4)′.
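To make the deterministic/stochastic distinction concrete, the short Python sketch below generates data from both versions of a simple linear relationship. The parameter values (intercept 2, slope 3) and the sample size are illustrative assumptions, not values taken from the text.

import numpy as np

rng = np.random.default_rng(0)

n = 30
x = rng.uniform(1, 10, size=n)

# Deterministic relationship: y is an exact function of x.
y_deterministic = 2 + 3 * x

# Stochastic relationship: adding a disturbance term makes y inexact.
eps = rng.normal(0, 1, size=n)   # the disturbance term ε
y_stochastic = 2 + 3 * x + eps

# Identical x values now map to y values that differ by the draw of ε.
print(y_deterministic[:3])
print(y_stochastic[:3])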

Estimator and Estimate

Econometricians focus not on the numerical estimate itself, but rather on the method or formula by which the data are transformed into an estimate. This transformation method is called the estimation method, or the estimator, and it is what justifies the estimate. Since there is no way of knowing the disturbance terms, an estimate from a single sample could be quite inaccurate. An inaccurate estimate is nonetheless defended by the fact that the estimator that produced it usually produces estimates that are fairly accurate. An estimate therefore cannot be justified by itself; it requires the estimator to justify it.

There are infinitely many ways to estimate β, which means there are infinitely many estimators, so producing an estimator for β is not difficult. This leaves the question of how to know which estimator is preferred. A good estimator will produce a good estimate of β, but a good estimator in one situation may be a poor estimator in another. The meaning of "preferred" depends on what the person is trying to estimate and the context in which they are working. Estimators that meet certain criteria are called good estimators; whether they are preferred depends on who is doing the estimating and under what circumstances.

Sampling Distributions

Suppose that you have 30 observations on variables x and y and you know that y has been produced by x. Each y observation is created by multiplying β by x and then adding an error ε:

y_1 = βx_1 + ε_1
y_2 = βx_2 + ε_2
…
y_30 = βx_30 + ε_30

We are interested in estimating the unknown parameter β. Thinking that an average would give a good estimate, we propose the formula

β̂ = ȳ / x̄

This will produce a number, say 1.334, but how do we know whether this is a good estimator of β? Substituting the model into the formula gives

β̂ = (βx̄ + ε̄) / x̄ = β + ε̄ / x̄

so β̂ equals β plus an expression involving the error terms. Because the values of the error term are positive half the time and negative half the time, it is reasonable to expect them to roughly cancel when summed, so the last term should go to zero and we should have a good estimator.

If β̂ = 1.334, how close is this estimate to the actual β? That depends on the distribution of the errors. If the sample happens to contain mainly large positive errors, our estimate will overestimate β; if it contains large negative errors, it will dramatically underestimate it.
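The decomposition β̂ = β + ε̄/x̄ can be checked numerically. The sketch below is a minimal illustration: the true β = 1.3, the uniform range for x, and the error variance are assumed values chosen for the example. It draws one sample of 30 observations, computes β̂ = ȳ/x̄, and confirms that the estimation error equals ε̄/x̄.

import numpy as np

rng = np.random.default_rng(1)

beta = 1.3                      # true parameter (assumed for the example)
n = 30
x = rng.uniform(1, 10, size=n)
eps = rng.normal(0, 1, size=n)
y = beta * x + eps

beta_hat = y.mean() / x.mean()          # the proposed estimator β̂ = ȳ/x̄

# The decomposition β̂ = β + ε̄/x̄ holds exactly in the sample:
print(beta_hat)
print(beta + eps.mean() / x.mean())     # identical to beta_hat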

If the sample of 30 observations provides errors that are fairly typical of the population of errors, the estimate should be very close to β. The important thing to note is that the set of 30 unknown error terms in the sample determines the estimate β̂ produced by the formula, so there is no way of knowing whether a particular estimate equals the true value. The sampling distribution is the answer to this problem.

Suppose that we know the true value of β, apply our formula to the 30 observations, and then repeat the process a million times, each time with a new set of 30 errors. We can then use all the resulting values of β̂ to construct a histogram. The histogram counts the frequency of different outcomes and allocates them to predetermined bins. It should show that very high and very low values are rare, while estimates close to the true value are more common, having a higher frequency. The histogram thus shows the relative probabilities of obtaining different values of β̂ during this repeated-sampling procedure. This distribution is called the sampling distribution of β̂. The sampling distribution of a statistic tells us the relative frequency with which values of the statistic would occur if we repeated the process, drawing new sets of errors each time.

The reason for choosing β̂ over some other formula that produces an alternative estimate is that we know the properties of β̂, defined in terms of its sampling distribution. For example, we know that β̂ is unbiased if the mean of its sampling distribution equals the parameter β it is trying to estimate. The properties of a sampling distribution depend on the process used to generate the data, and discovering them is what econometric theory tries to do. The OLS estimator has an attractive sampling distribution for some applications but not all; in fact, no method of estimation possesses attractive sampling distributions for all types of applications. All statistics have sampling distributions: an F value has its own distribution, so it is not just parameter estimates that have sampling distributions.

Calculating Sampling Distributions

1. The first approach follows the simulation method above. The derivation also shows the importance of the distribution of the data: if there is in fact a nonzero intercept, so that y_i = α + βx_i + ε_i, the same formula gives

β̂ = ȳ / x̄ = β + (α + ε̄) / x̄

so the sampling distribution of β̂ is centred away from β by α/x̄, and a simulation like the one below makes the shift visible.
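A repeated-sampling experiment like the one described above is easy to simulate. The sketch below uses assumed values throughout: β = 1.3, 30 observations, normal errors, and 100,000 rather than a million repetitions to keep it quick. It builds the sampling distribution of β̂ = ȳ/x̄ and shows how a nonzero intercept α shifts it away from β.

import numpy as np

rng = np.random.default_rng(2)

beta, n, reps = 1.3, 30, 100_000
x = rng.uniform(1, 10, size=n)          # keep x fixed across repetitions

def draw_beta_hat(alpha):
    """One repetition: new errors, same x, return β̂ = ȳ/x̄."""
    eps = rng.normal(0, 1, size=n)
    y = alpha + beta * x + eps
    return y.mean() / x.mean()

estimates = np.array([draw_beta_hat(alpha=0.0) for _ in range(reps)])
counts, bins = np.histogram(estimates, bins=50)   # the histogram of β̂

print(estimates.mean())          # close to the true β of 1.3

# With a nonzero intercept the distribution is centred near β + α/x̄:
biased = np.array([draw_beta_hat(alpha=2.0) for _ in range(reps)])
print(biased.mean(), beta + 2.0 / x.mean())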

2. The classical linear model. In the model y = Xβ + ε with V(ε) = σ²I and normally distributed errors, the sampling distribution of β̂_OLS is normal with mean β and variance σ²(X′X)⁻¹, written N(β, σ²(X′X)⁻¹).

3. When N values of x are drawn randomly from a distribution with mean μ and variance σ², the sampling distribution of x̄ has mean μ and variance σ²/N. By the Central Limit Theorem, as N gets large this sampling distribution becomes approximately normal in shape.

Variance

Suppose a random variable x has probability density function f(x); here x can be thought of as a coefficient estimate β̂, in which case f(β̂) is its sampling distribution. The variance of x is defined as

V(x) = E[(x − Ex)²] = ∫ (x − Ex)² f(x) dx

That is, the variance is a weighted average of the squared difference between x and its mean, taken over all possible values of x, where the weights are the probabilities of those x values occurring. Equivalently: if you randomly select an x value and square the difference between it and the mean of x to get a number J, the variance is the average value of J you would obtain if you repeated this an infinite number of times.

Covariance

Two variables x and y have a joint density function. If you randomly draw a pair of x and y values, subtract the respective mean from each, and multiply the two deviations together, the average of this product over infinitely many repetitions is the covariance:

C(x, y) = E[(x − Ex)(y − Ey)]

Variance-Covariance Matrix

Ω = [ σ²    σ_xy  σ_xy
      σ_xy  σ²    σ_xy
      σ_xy  σ_xy  σ²  ]
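Point 3 is easy to verify by simulation. The sketch below is a minimal check with assumed values μ = 5, σ = 2, and N = 40. It draws many samples from a deliberately non-normal distribution, compares the empirical variance of x̄ with σ²/N, and relies on the Central Limit Theorem for the roughly bell-shaped histogram of the sample means.

import numpy as np

rng = np.random.default_rng(3)

mu, sigma, N, reps = 5.0, 2.0, 40, 50_000

# Draw `reps` samples of size N from a non-normal distribution
# (exponential, shifted to have mean mu and variance sigma**2).
samples = rng.exponential(scale=sigma, size=(reps, N)) + (mu - sigma)
means = samples.mean(axis=1)

print(means.mean())               # ≈ μ
print(means.var(), sigma**2 / N)  # empirical variance of x̄ vs σ²/N

# A histogram of `means` is approximately normal, as the CLT predicts.
counts, bins = np.histogram(means, bins=50)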

The variance-covariance matrix is a square matrix with the variances of the individual elements of x down the diagonal and the covariances between these elements in the off-diagonal positions. In the OLS model y = Xβ + ε, the variance-covariance matrix of the estimator becomes

V(β̂_OLS) = σ²(X′X)⁻¹

If there is only an intercept, X is a column of 1s and

V(β̂_OLS) = V(ȳ) = V(y)/N

Estimation

The actual estimates of these variances can be calculated in the following ways.

1. For a variable x with N observations, V(x) is usually estimated by

s² = Σ(x_i − x̄)² / (N − 1)

2. If we also have N corresponding observations on y, then C(x, y) is estimated by

s_xy = Σ(x_i − x̄)(y_i − ȳ) / (N − 1)

3. If x is the error in a regression model with k explanatory variables (including the intercept), then V(x) is estimated by

s² = SSE / (N − k)

4. To estimate the variance-covariance matrix, we combine 1 and 2 to fill in its individual elements; see the sketch below.

Introduction to the Properties of Variance

The variance of the sample mean equals the variance of x divided by the number of observations in the sample:

V(x̄) = V(x)/N

For a linear function y = a + bx of x, where a and b are constants,

V(y) = b²V(x)

The variance of the sum, or difference, of two random variables k and j is found as follows:

u = k ± j,  V(u) = V(k) + V(j) ± 2C(k, j)
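The estimation formulas above translate directly into code. The sketch below uses simulated data with assumed parameter values; it computes s², s_xy, and the regression error variance SSE/(N − k), and assembles the 2×2 estimated variance-covariance matrix from formulas 1 and 2.

import numpy as np

rng = np.random.default_rng(4)

N = 100
x = rng.normal(0, 1, size=N)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=N)   # assumed DGP for the example

# Formula 1: sample variance with the (N - 1) degrees-of-freedom correction.
s2_x = np.sum((x - x.mean())**2) / (N - 1)

# Formula 2: sample covariance of x and y.
s_xy = np.sum((x - x.mean()) * (y - y.mean())) / (N - 1)

# Formula 3: error variance from a regression with k = 2 regressors
# (intercept plus x), estimated as SSE / (N - k).
X = np.column_stack([np.ones(N), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
sse = np.sum((y - X @ beta_hat)**2)
s2_err = sse / (N - 2)

# Formula 4: combine 1 and 2 into the estimated variance-covariance matrix.
s2_y = np.sum((y - y.mean())**2) / (N - 1)
omega_hat = np.array([[s2_x, s_xy],
                      [s_xy, s2_y]])

print(omega_hat)
print(np.cov(x, y))   # NumPy's ddof=1 default gives the same matrix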

The variance of the sample proportion of successes p̂ is

V(p̂) = p(1 − p)/N

Asymptotics

Asymptotics gives us a way of obtaining information about finite-sample distributions and producing good approximations to them. Asymptotic distribution theory is concerned with the behaviour of β̂_N as the sample size N becomes very large (approaching infinity).

Convergence in probability: the distribution of β̂_N collapses and becomes concentrated on a particular value as the sample size increases. This idea is the foundation for the concept of consistency.

Convergence in distribution: the distribution of β̂_N approximates a known distribution, such as the normal distribution, when the sample size becomes large. This provides the basis for hypothesis-testing procedures.

Convergence in Probability

If β̂_N → k as N → ∞, written

plim β̂_N = k

then β̂_N is said to converge in probability to k, or to have a probability limit of k. If k equals the parameter β being estimated, the estimator is said to be consistent; if they are not equal, k − β is the asymptotic bias of β̂ as an estimator of β. For the sample mean, the variance approaches zero as the sample size becomes large, so the sample mean converges in quadratic mean and is a consistent estimator of the mean μ.

Consistency of OLS

In the model y = Xβ + ε,

β̂_OLS = β + (X′X)⁻¹X′ε

Now we apply probability limits (one can work with expectations in a similar way):

plim β̂_OLS = β + plim[(X′X)⁻¹X′ε]

Dividing each factor by the number of observations N and applying Slutsky's theorem,

plim[(X′X)⁻¹X′ε] = plim(X′X/N)⁻¹ · plim(X′ε/N)

By dividing by N we are looking at average values, and under standard assumptions these remain finite as the sample becomes infinite. Because plim(X′ε/N) = 0, the second term vanishes — everything multiplying it drops out — and we are left with

plim β̂_OLS = β
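Consistency can be illustrated by watching the spread of β̂_OLS shrink as N grows. The sketch below uses an assumed data-generating process, a single regressor with β = 2 and standard-normal errors; it computes the OLS slope for increasing sample sizes and reports how the estimates concentrate around β.

import numpy as np

rng = np.random.default_rng(5)

beta, reps = 2.0, 2_000

def ols_slope(n):
    """OLS slope in a one-regressor model without intercept."""
    x = rng.normal(0, 1, size=n)
    eps = rng.normal(0, 1, size=n)
    y = beta * x + eps
    return (x @ y) / (x @ x)      # (X'X)^{-1} X'y for a single regressor

for n in (10, 100, 1_000, 10_000):
    estimates = np.array([ols_slope(n) for _ in range(reps)])
    # The distribution of β̂ collapses toward β = 2 as N grows.
    print(n, estimates.mean(), estimates.std())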

Asymptotic Variance

Because OLS is unbiased in small samples, it is also unbiased in large samples, and the variance of β̂_OLS can be written as

σ²(X′X)⁻¹ = (σ²/N)(X′X/N)⁻¹

This approaches zero as the sample size gets very large, which is easy to see because N appears in the denominator.

Convergence in Distribution

Suppose the sample size becomes very large and the distribution f_N of β̂_N becomes almost identical to a specific distribution f. We seek such an f so that we can use it as an approximation for the unknown small-sample distribution of β̂_N. There is a problem, however: in many applications the distribution collapses to a spike, so it makes no sense to use it as an approximation for the small-sample distribution. We therefore normalize β̂_N to prevent the distribution from collapsing, by focusing on the distribution of √N(β̂_N − β). In the OLS case, when the sample size becomes large, the mean of √N(β̂_OLS − β) is 0 and its variance is σ²(X′X/N)⁻¹.

How do we know what form this distribution (e.g. the normal distribution) takes as N gets large? We can use the Central Limit Theorem to solve this problem: it tells us that a sample-mean statistic is distributed normally when the sample size becomes very large. In the case of OLS, if the errors are distributed normally then the exact distribution of β̂_OLS is normal with mean β and variance σ²(X′X)⁻¹. If the errors are not distributed normally, this exact distribution is difficult to use for testing, so we use an approximation to it, called the asymptotic distribution of β̂_OLS.
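The role of the √N normalization and the Central Limit Theorem can be checked by simulation. The sketch below assumes a one-regressor model with β = 2 and deliberately non-normal, uniform errors; it shows that the spread of √N(β̂_OLS − β) stays roughly constant as N grows, instead of collapsing, and that its distribution is approximately normal.

import numpy as np

rng = np.random.default_rng(6)

beta, reps = 2.0, 20_000

def scaled_error(n):
    """Return √N(β̂_OLS − β) for one simulated sample of size n."""
    x = rng.normal(0, 1, size=n)
    eps = rng.uniform(-1, 1, size=n)      # non-normal errors with mean zero
    y = beta * x + eps
    beta_hat = (x @ y) / (x @ x)
    return np.sqrt(n) * (beta_hat - beta)

for n in (25, 100, 400):
    z = np.array([scaled_error(n) for _ in range(reps)])
    # The standard deviation stabilizes rather than shrinking to zero,
    # and a histogram of z is approximately normal, as the CLT predicts.
    print(n, z.mean(), z.std())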