ECONOMETRICS Introduction & First Principles. David C. Broadstock, Research Institute of Economics & Management, China. OSEC Pre-Master Course, 2015.

COURSE OUTLINE Part 1. Introduction to econometrics: including a review of the types of data used and basic notation for cross-sectional, time-series and panel data. OLS and the linear regression. Understanding test statistics, with a review of Monte-Carlo simulation. Mis-specification testing: heteroskedasticity, heterogeneity and structural breaks. Panel data. Time-series methods. Advanced topics.

CLASS SCHEDULE Dates/locations of econometrics classes

TABLE: Class schedule
Date     Venue                            Lecture     Lecturer
21 Mar   Saturday morning, C205 Jingshi   Lecture 1   David Broadstock
28 Mar   Saturday morning, C205 Jingshi   Lecture 2   David Broadstock
11 Apr   Saturday morning, C205 Jingshi   Lecture 3   David Broadstock
18 Apr   Saturday morning, C205 Jingshi   Lecture 4   David Broadstock
25 Apr   Saturday morning, C205 Jingshi   Lecture 5   «Guest»
09 May   Saturday morning, C205 Jingshi   Lecture 6   David Broadstock
16 May   Saturday morning, C205 Jingshi   Lecture 7   David Broadstock
23 May   Saturday morning, C205 Jingshi   Lecture 8   David Broadstock
TBA      TBA                              Final Exam

GRADING/CLASS REQUIREMENTS This course is aimed at reviewing the overall merits and approaches of (statistical) econometric methods. Attendance = 5%. Homework = 25%. In-class presentation = 10%. Final Exam = 60%. Each week one group will present a paper of my choosing, aiming for 15 minutes. These presentations will be used to tackle issues relating to econometric practice from a wider viewpoint. We shall begin with the birth of Econometrica and the first winner of the Nobel Prize in Economics!

SOME ONLINE RESOURCES Use the web - it is invaluable! This course is aimed at reviewing the overall merits and approaches of (statistical) econometric methods. The website of the Wooldridge book (datasets available). The Econometrics Journal's online resources: http://econometriclinks.com, which links to a list of different econometrics software packages (many are free). You should also be able to implement basic operations in a spreadsheet, or find software within the university. You may also wish to check the course webpage, where lecture slides and homework materials will be posted.

INTRODUCTION Today's learning outcomes. To get familiar with each other. To spark an interest in econometric thinking. To develop an appreciation of the different types of data in the real world. To understand the components of a simple linear model. To understand the components of an estimated simple linear model. To understand what the least squares criterion is. To be aware of the different types of data we may work with. To begin to consider the notion of causality (probably next week).

ECONOMETRICS - WHAT IS IT? Analysis of statistical relationships in economic data. A highly useful tool, with lots of applications: in business, in government, in research centres, in your 3rd year dissertation, in postgraduate studies. Takes an effort to learn, but once learned, it is very rewarding, both intellectually and in terms of your employability. If you don't learn econometrics while at university, you'll probably never learn it!

HOW DO WE THINK OF THE PROBLEM Part 1/2. Electricity consumption (Y) depends on income (X). However, consumption also depends on several other economic variables that we ignore for now. In economics we express our ideas about relationships between economic variables using the mathematical concept of a function. For example, to express the relationship between electricity consumption Y_i and income X_i for individual i, we may write

Y_i = f(X_i)    (1)

However, when studying this relationship one recognizes that actual consumption by an individual is the result of a systematic part and a random and unpredictable component u_i that we call the random term:

Y_i = f(X_i) + u_i    (2)

The random term accounts for the many factors affecting consumption that we have omitted from this simple model, and it also reflects the intrinsic uncertainty in the behaviour of individuals.

HOW DO WE THINK OF THE PROBLEM Part 2/2. To complete the specification of the econometric model, we must also say something about the form of the algebraic relationship among our economic variables. In the example relating annual electricity expenditure to annual income, we assume that the systematic part of the demand relation is linear:

f(X_i) = β_1 + β_2 X_i    (3)

The corresponding econometric model is

Y_i = β_1 + β_2 X_i + u_i    (4)

This, together with some assumptions about u_i and X_i, is called the simple linear regression model.
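
A minimal simulation sketch can make equation (4) concrete. This is not from the original slides; the parameter values β_1 = 2, β_2 = 0.5, the noise scale and the income range are all assumptions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative (assumed) parameter values for the simple linear regression model
beta1, beta2 = 2.0, 0.5      # intercept and slope
n = 200                      # sample size

X = rng.uniform(10, 50, n)   # income for each individual i
u = rng.normal(0, 1, n)      # random term: omitted factors and intrinsic noise
Y = beta1 + beta2 * X + u    # equation (4): Y_i = beta_1 + beta_2 X_i + u_i
```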

THE PROBLEM OF INFERENCE We will spend some time on this. Given that the data have been generated by a model of the form Y_i = β_1 + β_2 X_i + u_i, what are the values of β_1 and β_2 in such a model? Using a sample of observations on Y and X we would like to obtain estimates b_1 and b_2 of these parameters. We would like to test hypotheses on these parameters; for example, we want to know if b_1 and b_2 are somehow informative/stable. Also, we would like to understand if we are using the correct model in the first place.

SOME JARGON There is a lot... Y_i = f(X_i) + u_i with f(X_i) = β_1 + β_2 X_i (plus assumptions) is the linear regression model. f(X_i) = β_1 + β_2 X_i is called the linear regression. β_1 and β_2 are the parameters: β_1 is the intercept parameter and β_2 is the slope parameter. Since df(X)/dX = d(β_1 + β_2 X)/dX = β_2, β_2 tells us the increase in E(Y|X) for each unit increase in X. Y is the dependent variable. X is the independent variable, or the regressor. u is the error term.

SOME MORE JARGON... get used to the terms Given two estimates b_1 and b_2, the quantity Ŷ_i = b_1 + b_2 X_i is the predicted value for individual i. The difference Y_i − Ŷ_i = û_i is the residual. Notice that the residuals and the errors are different things: the error of a sample is the deviation of the sample from the (unobservable) true function value, while the residual of a sample is the difference between the sample and the estimated function value. (Wikipedia!)
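
To see the distinction numerically, here is a small sketch (not from the slides; the data-generating values are assumed, as in the earlier sketch) that fits a line and compares the residuals û_i with the unobservable errors u_i:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(10, 50, 200)
u = rng.normal(0, 1, 200)        # errors: deviations from the *true* line
Y = 2.0 + 0.5 * X + u            # assumed true model, for illustration

b2, b1 = np.polyfit(X, Y, 1)     # any least-squares fit yields estimates b1, b2
Y_hat = b1 + b2 * X              # predicted value for each individual i
u_hat = Y - Y_hat                # residuals: deviations from the *estimated* line

print(np.allclose(u, u_hat))     # False: residuals and errors are different things
```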

BEFORE DISCUSSING ESTIMATION... let's think about data and variables a little There are a number of different types of data structures - we will not handle them all in this course, but here are some of the main ones to be aware of: cross-sectional, panel, time series, discrete/categorical outcomes, count-data/truncated data, big-data.

CROSS-SECTIONAL DATA Cross-sectional data is the simplest type of data available to us. We rarely work with this type of data in empirical research, but the teaching of econometrics and the development of econometric theory still take advantage of it: many innovative estimators start life without considering aspects of time or individual-type heterogeneity, these (necessary) refinements often being considered once the cross-sectional world is understood. Imagine we have a sample of households for a single year. Denote each household i, and the total number of households in the sample as I. Note, we could easily have a cross section of individuals, countries, firms etc. We typically denote the data Y_i for the dependent variable and X_i for independent variables. An example cross-sectional demand function might look something like: Q_i = α + β_P P_i + β_Y Y_i + ε_i

PLOTTING CROSS-SECTION DATA Below are some hypothetical data on price, income and quantity consumed - we will revisit next week why it might be interesting to work with hypothetical data sometimes. [Figure: scatterplots of quantity Q against price P, and of Q against income Y.]

PANEL DATA Panel data is an intuitively simple extension to cross-sectional data. In short, panel data includes information on individuals i = 1, ..., I, similar in this sense to cross-sectional data, but in addition records this information over many time periods t = 1, ..., T (e.g. years, quarters, months etc.). Time periods are ideally (though not always) equidistant, meaning that the amount of time passed between periods 1 and 2 is the same as between 2 and 3 and all other sequential periods. We typically denote the data Y_it for the dependent variable and X_it for independent variables. An example panel-data demand function might look something like: Q_it = α_i + β_P P_it + β_Y Y_it + ε_it. Note: there is an important difference between panel data (sometimes the name given to 'repeated cross-sections') and longitudinal data (sometimes referred to as a 'pure panel').

PANEL-DATA ILLUSTRATED Below is an example of energy demand for 17 countries. [Figure: OECD 17 log energy consumption per capita, log(EPC), 1960-2000.]

TIME SERIES DATA Time series data is data which concentrates on a single individual (again noting the general interpretation we have for the term 'individual') over multiple time periods. It allows for a more comprehensive understanding of trends in the world and the exploration of phenomena such as periods of economic boom, or sudden price collapses. We typically denote the data Y_t for the dependent variable and X_t for independent variables. An example time-series demand function might look something like: Q_t = α + β_P P_t + β_Y Y_t + ε_t. A particular interest in time series is the treatment of high-frequency data, which are increasingly commonplace.

TIME-SERIES DATA ILLUSTRATED Below is an example of UK gasoline demand. [Figure: UK real (weighted) gasoline price, log(price), 1960-2010.]

DISCRETE/CATEGORICAL OUTCOMES Discrete or categorical variables can be found in cross-sections, panel data and time series. These variables represent things that have a finite number of outcomes, for example the outcome of tossing a coin, or perhaps the decision by OPEC to change supply. Discrete variables can be simple: consider for example a variable OPEC_t intended to reflect the decision by OPEC to change (either increase or decrease) oil supply in period t. We could have:

OPEC_t = 0 if supply remains unchanged; 1 if supply changes

We must be careful to consider if the discrete variable has an order. For instance, choosing which color to paint your car requires comparing many colors, which have no natural order. Alternatively, a variable describing satisfaction (1 = unhappy, 5 = very happy) has an intuitive ordering to it. This can have important consequences for estimation.

COUNT-DATA/TRUNCATED DATA Count-data and truncated data are two other types of special variables, which can again have important consequences for estimation when present: Count data will generally be variables measured in integers (e.g. they cannot be obtained in parts) and which may take any integer value from 0 to ∞. Truncated data is often continuous over a range; for example, the share of gasoline in total energy will be a maximum of 1 and a minimum of zero - but is perfectly continuous on this range and not restricted to integer values.

BIG DATA Big Data represents 'the Information assets characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value' [De Mauro et al. (2015)]. 'Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data - the equivalent of 167 times the information contained in all the books in the US Library of Congress.' [The Economist (2010)] I am presently trying to work with a 400GB dataset on energy consumption across 16,000 households - simply opening the data is beyond the capacity of most software and PCs, not to mention the programming ability of the average economist...

VARIABLE TYPES Most data types can contain one or more of the following. We will review four general types of variables: Continuous variables These are variables that in their purest sense can take any value from minus infinity to plus infinity. We will in most cases assume this for the dependent variable. Bounded/truncated variables Variables of this form may not go above/below certain values. They can be bounded from one or two sides. This can have important effects on density estimation, for example (solved via reflection). Discrete As already discussed, these variables depict selection among a finite set of outcomes and can lead to using quite specialized statistical approaches. Latent variables See two slides later.

VARIABLE TYPES ILLUSTRATED Below are some examples of how different variables look. [Figure: three histograms - a continuous normal variable (mean = 0, sd = 1); a 'truncated from the left' normal variable (lower = 0, upper = infinity); and a 'truncated from above and below' normal variable (lower = 0, upper = 1).]

LATENT VARIABLES Latent variables, and more generally latent information, are things economists have used in various ways for some time. Beyond simply unobserved, latent variables often fall into the territory of being unobservable; in economics this might for example take the form of preferences. This will at first seem counter-intuitive: how can the behavior of something immeasurable be quantified? The answers to this are interesting and highlight the more elegant aspects of economics and statistics - incorporating wisdom into rigorous mathematical structures. We will consider technological progress and underlying trends as some intuitive pedagogical examples. A relatively mainstream example would also be factor analysis and its variants.

GUESSING THE PARAMETERS Art or science? Let us turn attention back to the econometrics, and more specifically the nature of estimation. So, how do we determine b_1 and b_2? Fit the line in a way that our prediction mistakes are minimised in the best possible way. Does this mean we can minimize the sum of the residuals?

GUESSING THE PARAMETERS The least squares criterion Instead of choosing b_1 and b_2 to minimize the sum of the residuals, we choose b_1 and b_2 to minimize the sum of the squared residuals:

û_1² + û_2² + û_3² + ... + û_n²    (5)
= (Y_1 − b_1 − b_2 X_1)² + (Y_2 − b_1 − b_2 X_2)² + (Y_3 − b_1 − b_2 X_3)² + ... + (Y_n − b_1 − b_2 X_n)²    (6)

Notice that the sum of the squared residuals can be zero if and only if all the residuals are zero.
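
A short sketch (not from the slides; data simulated as before) that writes the criterion (6) as a function, so different candidate lines can be compared directly:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(10, 50, 200)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 200)   # assumed true model, for illustration

def ssr(b1, b2):
    """Sum of squared residuals for the candidate line b1 + b2*X, equation (6)."""
    return np.sum((Y - b1 - b2 * X) ** 2)

print(ssr(2.0, 0.5))   # near the minimum on these simulated data
print(ssr(0.0, 1.0))   # an arbitrary guess gives a much larger criterion value
```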

THE LEAST SQUARES CRITERION Our first estimator The study of the linear regression model is the focus of this course. A good starting point for understanding the linear regression model is Chapters 1 and 2 of Wooldridge. This course requires that you are familiar with the material in the review section at the end of the book. It is highly recommended that you read through it during the beginning of the term.

A RESEARCH FORMAT Before we go any further with the mechanics. 1. It all starts with a problem (or question). 2. Economic theory gives us a way of thinking about the problem: what economic variables are involved, and what is the possible direction of the relationship(s)? 3. The working economic model leads to an econometric model: we must choose a functional form and make some assumptions about the nature of the error term. 4. Sample data are obtained, and a desirable method of statistical analysis is chosen, based on our initial assumptions and our understanding of how the data were collected. 5. Estimates of the unknown parameters are obtained with the help of a statistical software package; predictions are made and hypothesis tests are performed. 6. Model diagnostics are performed to check the validity of the assumptions that were made. For example, were all of the right-hand-side explanatory variables relevant? Was the correct functional form used? 7. The economic consequences and the implications of the empirical results are analyzed and evaluated. What economic resource allocation and distribution results are implied, and what are their policy-choice implications? What remaining questions might be answered with further study or new and better data?

ORDINARY LEAST SQUARES (OLS) The workhorse of econometrics. Getting back to the idea of guessing the parameters, we have already introduced the least squares criterion. In the next slides we review the mechanics of least squares estimation. It is little more than standard optimization, but over a large number of observations. We concentrate on the simple linear regression for illustration, i.e. the case where there is only one X variable, since the mechanics get much more involved with more than one X, at which point matrix algebra becomes preferable.

MINIMIZATION (1/6) Simple linear regression The minimization problem

min_{b_1, b_2} Σ (Y_i − b_1 − b_2 X_i)²    (7)

leads to the following first order conditions:

∂/∂b_1 Σ (Y_i − b_1 − b_2 X_i)² = 0    (8a)
∂/∂b_2 Σ (Y_i − b_1 − b_2 X_i)² = 0    (8b)

MINIMIZATION (2/6) Simple linear regression By taking the derivatives of each term of the sums, we can re-write the first order conditions as

Σ ∂/∂b_1 (Y_i − b_1 − b_2 X_i)² = 0    (9a)
Σ ∂/∂b_2 (Y_i − b_1 − b_2 X_i)² = 0    (9b)

Then, evaluating these two derivatives we obtain

−2 Σ (Y_i − b_1 − b_2 X_i) = 0    (10a)
−2 Σ X_i (Y_i − b_1 − b_2 X_i) = 0    (10b)

MINIMIZATION (3/6) Simple linear regression The −2 term cancels from each equation giving:

Σ (Y_i − b_1 − b_2 X_i) = 0    (11a)
Σ X_i (Y_i − b_1 − b_2 X_i) = 0    (11b)

Expanding out the brackets and re-arranging these we obtain the normal equations:

Σ Y_i = n b_1 + b_2 Σ X_i    (12a)
Σ X_i Y_i = b_1 Σ X_i + b_2 Σ X_i²    (12b)

MINIMIZATION (4/6) Simple linear regression Now we can solve for b_1 and b_2 simultaneously, by multiplying (12a) by Σ X_i and by multiplying (12b) by n, to give:

Σ X_i Σ Y_i = n b_1 Σ X_i + b_2 (Σ X_i)²    (13a)
n Σ X_i Y_i = n b_1 Σ X_i + n b_2 Σ X_i²    (13b)

MINIMIZATION (5/6) Simple linear regression Subtracting (13a) from (13b):

n Σ X_i Y_i − Σ X_i Σ Y_i = b_2 [n Σ X_i² − (Σ X_i)²]    (14)

From which it follows that:

b_2 = [n Σ X_i Y_i − Σ X_i Σ Y_i] / [n Σ X_i² − (Σ X_i)²]    (15)

MINIMIZATION (6/6) Simple linear regression Now, given b_2, we can recover b_1 by recalling normal equation (12a) and re-arranging to give:

n b_1 = Σ Y_i − b_2 Σ X_i    (16a)
b_1 = (Σ Y_i)/n − b_2 (Σ X_i)/n = Ȳ − b_2 X̄    (16b)

MINIMIZATION SUMMARY The ordinary least squares estimator Note that the slope of the estimated relationship (b_2) is equivalent to cov(X, Y) divided by var(X): b_2 = cov(X, Y)/var(X). Also note that equation (16b) implies that the estimated line always passes through the means of X_i and Y_i: Ȳ = b_1 + b_2 X̄. [This will be important to remember for hypothesis testing and model validation.] Since these formulas work for any values of the sample data, they are the least squares estimators.
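
A compact sketch of these closed-form estimators (not from the slides; data simulated as before), including a check of the pass-through-the-means property:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(10, 50, 200)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 200)   # assumed true model, for illustration

b2 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)  # b2 = cov(X, Y) / var(X)
b1 = Y.mean() - b2 * X.mean()                        # b1 = Ybar - b2 * Xbar
print(b1, b2)                                        # close to (2.0, 0.5)

# The estimated line always passes through the sample means
print(np.isclose(Y.mean(), b1 + b2 * X.mean()))      # True
```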

THE MAIN ASSUMPTIONS To be reviewed in multiple regression SLR.1 Linear in parameters:

y = β_0 + β_1 x + u    (17)

SLR.2 Random sampling: we can take a random sample of size n, {(x_i, y_i) : i = 1, 2, ..., n}, from the population model. SLR.3 Zero conditional mean: E(u|x) = 0. SLR.4 Sample variation in the independent variable: in the sample, the independent variables x_i, i = 1, ..., n, are not all equal to the same constant. This requires some variation in x in the population.

MEANING OF LINEAR REGRESSION Linearity in parameters This can be a little confusing, as it is possible to specify some non-linear relationships using the simple linear regression model. Clarity over this confusion comes in terms of the role of the parameters: the parameters in the equation must enter in a linear fashion, however it is still possible for variables to enter into the equation non-linearly, e.g. interaction terms, which we come back to in a future class. Can anybody think of examples when we might wish to try and control for non-linear behaviour? Econometrics can deal with non-linear models (i.e. models non-linear in parameters), but this is beyond the scope of this course.

MULTIPLE LINEAR REGRESSION When there is more than one X.

Y_i = b_1 + b_2 X_1i + b_3 X_2i + ... + b_k X_ki + u_i    (18)

The X variables can be transformations, e.g. X_2i = X_1i². The only limitation on the number of included variables is the sample size (and the desired degrees of freedom for inference - to be discussed next week). Ceteris paribus: the specification of the standard multiple linear regression allows for a 'holding other things fixed' interpretation/control environment. However, it does not require anything to be fixed during data collection in order to work.

MINIMIZATION (1/8) Multiple regression The minimization problem

min_{b_1, b_2, b_3} Σ (Y_i − b_1 − b_2 X_1 − b_3 X_2)²    (19)

leads to the following first order conditions:

∂/∂b_1 Σ (Y_i − b_1 − b_2 X_1 − b_3 X_2)² = 0    (20a)
∂/∂b_2 Σ (Y_i − b_1 − b_2 X_1 − b_3 X_2)² = 0    (20b)
∂/∂b_3 Σ (Y_i − b_1 − b_2 X_1 − b_3 X_2)² = 0    (20c)

MINIMIZATION (2/8) Multiple regression By taking the derivatives of each term of the sums and evaluating the three derivatives we obtain:

−2 Σ (Y_i − b_1 − b_2 X_1 − b_3 X_2) = 0    (21a)
−2 Σ X_1 (Y_i − b_1 − b_2 X_1 − b_3 X_2) = 0    (21b)
−2 Σ X_2 (Y_i − b_1 − b_2 X_1 − b_3 X_2) = 0    (21c)

MINIMIZATION Multiple regression This is where Wooldridge stops the derivation for the multiple linear regression.

MINIMIZATION (3/8) Multiple regression Canceling the −2 terms from each equation and re-arranging we obtain the normal equations:

Σ Y_i = n b_1 + b_2 Σ X_1 + b_3 Σ X_2    (22a)
Σ X_1 Y_i = b_1 Σ X_1 + b_2 Σ X_1² + b_3 Σ X_1 X_2    (22b)
Σ X_2 Y_i = b_1 Σ X_2 + b_2 Σ X_1 X_2 + b_3 Σ X_2²    (22c)

MINIMIZATION Multiple regression When it comes to solving this set of equations it is convenient to apply matrix algebra or Cramer's rule. It is not required for this course to use these methods, and it is not really required to derive the least squares estimators for the multiple regression model, though it is instructive to do so. The following slides outline an approach to deriving the parameter estimates using the same notation as applied for the simple linear regression.

MINIMIZATION (4/8) Multiple regression In order to solve this set of equations it is useful to recall the structure of the two-(X)-variable regression model:

Y = b̂_1 + b̂_2 X_1 + b̂_3 X_2 + û    (23)

Averaging over the sample observations (noting also that the residuals average to zero) gives:

Ȳ = b̂_1 + b̂_2 X̄_1 + b̂_3 X̄_2    (24)

Now subtracting (24) from (23) gives the 'deviation form', in which lower-case letters denote deviations from sample means:

y = b̂_2 x_1 + b̂_3 x_2 + û    (25)

MINIMIZATION (5/8) Multiple regression The intercept b_1 disappears from the deviations form of the regression but is easily recovered by re-arranging the averages form of the equation to give:

b̂_1 = Ȳ − b̂_2 X̄_1 − b̂_3 X̄_2    (26)

In order to determine the values for b_2 and b_3, as in the case of the simple linear regression, we wish to minimise the sum of the squared residuals for the deviations form of the regression:

min_{b_2, b_3} Σ (y − b_2 x_1 − b_3 x_2)²    (27)

MINIMIZATION (6/8) Multiple regression Evaluating the first order derivatives gives the following first order conditions:

∂/∂b_2 Σ (y_i − b_2 x_1 − b_3 x_2)² = 0    (28a)
∂/∂b_3 Σ (y_i − b_2 x_1 − b_3 x_2)² = 0    (28b)

which give respectively:

Σ x_1 y = b_2 Σ x_1² + b_3 Σ x_1 x_2    (29a)
Σ x_2 y = b_2 Σ x_1 x_2 + b_3 Σ x_2²    (29b)

MINIMIZATION (7/8) Multiple regression In order to eliminate one of the unknown parameters we multiply equation (29a) by Σ x_2² and (29b) by Σ x_1 x_2 to give:

Σ x_1 y Σ x_2² = b_2 Σ x_1² Σ x_2² + b_3 Σ x_1 x_2 Σ x_2²    (30a)
Σ x_2 y Σ x_1 x_2 = b_2 (Σ x_1 x_2)² + b_3 Σ x_1 x_2 Σ x_2²    (30b)

Then subtract (30b) from (30a) to give:

Σ x_1 y Σ x_2² − Σ x_2 y Σ x_1 x_2 = b_2 [Σ x_1² Σ x_2² − (Σ x_1 x_2)²]    (31)

MINIMIZATION (8/8) Multiple regression This can be re-arranged to give:

b_2 = [Σ x_1 y Σ x_2² − Σ x_2 y Σ x_1 x_2] / [Σ x_1² Σ x_2² − (Σ x_1 x_2)²]    (32)

In a similar fashion we can find that:

b_3 = [Σ x_2 y Σ x_1² − Σ x_1 y Σ x_1 x_2] / [Σ x_1² Σ x_2² − (Σ x_1 x_2)²]    (33)
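
The deviation-form formulas (32)-(33) and the intercept recovery (26) can be checked against a matrix least-squares solve. A sketch with made-up data (all values assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 5, n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(0, 1, n)   # illustrative two-regressor DGP

# Deviation form: lower-case letters are deviations from sample means
x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()
den = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2) ** 2
b2 = (np.sum(x1 * y) * np.sum(x2**2) - np.sum(x2 * y) * np.sum(x1 * x2)) / den  # (32)
b3 = (np.sum(x2 * y) * np.sum(x1**2) - np.sum(x1 * y) * np.sum(x1 * x2)) / den  # (33)
b1 = Y.mean() - b2 * X1.mean() - b3 * X2.mean()                                 # (26)

# Cross-check against numpy's matrix least-squares solution
Z = np.column_stack([np.ones(n), X1, X2])
print(np.allclose(np.linalg.lstsq(Z, Y, rcond=None)[0], [b1, b2, b3]))  # True
```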

R² (GOODNESS-OF-FIT) The coefficient of determination - multiple regression Given SST (the total sum of squares), SSE (the explained sum of squares) and SSR (the sum of squared residuals) we can define R² as the ratio of the explained variation to the total variation; thus, it is interpreted as the fraction of the sample variation in y that is explained by x:

R² = SSE/SST = 1 − SSR/SST    (34)

where

SST = Σ (y_i − ȳ)²    (35a)
SSE = Σ (ŷ_i − ȳ)²    (35b)
SSR = Σ û_i²    (35c)

R² (GOODNESS-OF-FIT) The coefficient of determination - some notes R² provides an extremely useful measure of the ability of the specified regression equation to explain the variation in the dependent variable. R² never decreases, and usually increases, when another independent variable is added to a regression. This makes R² a poor tool for deciding whether one additional variable or many additional variables should be added to a model. However, as we will see in week 4 and in later weeks, the R² does provide useful information for considering whether groups of independent variables are useful in explaining the dependent variable.
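
A quick numerical sketch (simulated data, as before) confirming that the two expressions in (34) agree, since SST = SSE + SSR for OLS with an intercept:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(10, 50, 200)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, 200)   # assumed true model, for illustration
b2, b1 = np.polyfit(X, Y, 1)
Y_hat = b1 + b2 * X

SST = np.sum((Y - Y.mean()) ** 2)        # total sum of squares, (35a)
SSE = np.sum((Y_hat - Y.mean()) ** 2)    # explained sum of squares, (35b)
SSR = np.sum((Y - Y_hat) ** 2)           # sum of squared residuals, (35c)

print(np.isclose(SSE / SST, 1 - SSR / SST))   # True: the definitions in (34) agree
```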

THE MAIN ASSUMPTIONS In the context of the multiple regression In the context of the more general multiple regression we have a slightly different set of assumptions: MLR.1 Linear in parameters:

y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_k x_k + u    (36)

MLR.2 Random sampling: we have a random sample of n observations, {(x_i1, x_i2, ..., x_ik, y_i) : i = 1, 2, ..., n}, from the population model described by assumption MLR.1. MLR.3 Zero conditional mean: E(u|x_1, x_2, ..., x_k) = 0. MLR.4 No perfect collinearity: in the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables. MLR.5 Homoskedasticity: Var(u|x_1, x_2, ..., x_k) = σ².

MLR.1 LINEAR IN PARAMETERS The meaning of linear regression This can be a little confusing, as it is possible to specify some non-linear relationships using the simple linear regression model. The answer comes in terms of the role of the parameters: the parameters in the equation must enter in a linear fashion, however it is still possible for variables to enter into the equation non-linearly. Can anybody think of examples when we might wish to try and control for non-linear behaviour? Econometrics can deal with non-linear models (i.e. models non-linear in parameters), but this is beyond the scope of this course.

MLR.2 RANDOM SAMPLING The meaning of linear regression We assume that the data are randomly drawn from the population. For OLS to be unbiased, this assumption needs to hold for the population.

MLR.3 ZERO CONDITIONAL MEAN Exogenous explanatory variables One way in which this assumption can fail is if the functional relationship between the explained and the explanatory variables is mis-specified, for example omitting a quadratic term when in fact it should be present. Functional form mis-specification can be a problem and its detection is considered in Chapter 9 of Wooldridge. Omitting an important variable that is correlated with any of the included variables x_1, x_2, ..., x_k can also cause MLR.3 to fail, as it will mean that there is still important information contained in the residual term. Next week we will show how this can generate bias in the results, and in the following weeks we will consider what can be done to remedy it.

MLR.4 NO PERFECT COLLINEARITY Sample properties In the sample (and therefore the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables. This says nothing about the relationship defined by MLR.3 (i.e. the relationship with u); rather, it relates to the relationships among any of the included variables x_1, x_2, ..., x_k. If one of the independent variables is an exact linear function of another, then we say that it is perfectly collinear and the model cannot be estimated by OLS. This assumption does still allow for correlation between x_1, x_2, ..., x_k - it is the co-movement in the variables which determines the value of the coefficients.

RECALL REGRESSION ESTIMATOR Why X must take at least two values Regarding constant terms, from simple OLS recall that:

b_2 = Σ (Y_i − Ȳ)(X_i − X̄) / Σ (X_i − X̄)²    (37)

If X had no variation at all, so that each X_i were equal to the mean of X, the OLS estimator would not be defined, as one cannot divide by zero. But even if there is some variation, more of it would be better for the precision of the OLS estimator, as will be seen later. Regarding the slope parameters, consider the equation for b_2 given earlier: if x_2 is an exact linear function of x_1, the denominator below is zero:

b_2 = [Σ x_1 y Σ x_2² − Σ x_2 y Σ x_1 x_2] / [Σ x_1² Σ x_2² − (Σ x_1 x_2)²]    (38)
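
Both failure modes can be seen numerically; a sketch with assumed toy data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Case 1: no variation in X - every X_i equals the mean of X
X = np.full(100, 3.0)
print(np.sum((X - X.mean()) ** 2))   # 0.0: the denominator of (37) vanishes

# Case 2: perfect collinearity - x2 is an exact linear function of x1
x1 = rng.normal(size=100)
x2 = 2.0 * x1
x1, x2 = x1 - x1.mean(), x2 - x2.mean()                      # deviation form
print(np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2) ** 2)  # ~0: (38) breaks down
```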

MLR.5 HOMOSKEDASTICITY Var(u|x_1, x_2, ..., x_k) = σ². This assumption means that the variance in the error term, u, conditional on the explanatory variables, is the same for all combinations of outcomes of the explanatory variables. If this variance changes with any of the explanatory variables, then the residual process is said to exhibit heteroskedasticity. Heteroskedasticity generates problems that can have important implications for model inference, and will be reviewed in week 7. Assumptions MLR.1-MLR.5 are known as the Gauss-Markov assumptions for cross-sectional regression. The set of assumptions developed so far is only appropriate for cross-sectional regression; stating the assumptions for time series or for panel data is much more difficult, though there are similarities.

OLS VARIANCE AND COVARIANCE Parameter uncertainty It will prove important for us to understand the variance and covariance of the OLS estimates for individual parameters. Below are definitions that we will return to later in the course:

Var(b_j) = σ² / [SST_j (1 − R_j²)]    (39a)
se(b_j) = √Var(b_j)    (39b)

where σ² is the error variance, defined on the following slide, SST_j is the total sample variation in x_j, and R_j² is the R² from regressing x_j on all of the other regressors. The variance for any given parameter is therefore a combination of the (average) model uncertainty, inversely weighted by the ability of the model to describe the data.

ERROR VARIANCE Model uncertainty We wish to estimate σ² = E(u²), for which the expectation is equivalent to n⁻¹ Σ u_i². However, u_i² is unobserved and so we replace it with the residuals û_i² from the estimated regression. Instead of taking a direct average, the denominator is equal to n − k − 1 as opposed to n, and is referred to as the degrees of freedom:

σ̂² = (Σ û_i²) / (n − k − 1) = SSR / (n − k − 1)    (40)

You may wish to read Wooldridge regarding degrees of freedom.
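
Putting (39a) and (40) together for the one-regressor case, where R_j² = 0 so that Var(b_2) = σ̂²/SST_x; a sketch on simulated data as before:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 1                                # one regressor
X = rng.uniform(10, 50, n)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, n)      # assumed true model, for illustration
b2, b1 = np.polyfit(X, Y, 1)
u_hat = Y - (b1 + b2 * X)                    # residuals

sigma2_hat = np.sum(u_hat ** 2) / (n - k - 1)   # SSR / (n - k - 1), equation (40)
SST_x = np.sum((X - X.mean()) ** 2)             # total sample variation in X
se_b2 = np.sqrt(sigma2_hat / SST_x)             # (39a)-(39b) with R_j^2 = 0
print(se_b2)
```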

BLUE Best Linear Unbiased Estimator Best: refers to the estimator with the smallest variance; this is met insofar as minimising the squared residuals is the objective function for OLS. Linear: means that b_1 and b_2 are linear estimators; that is, they are linear functions of the random variable Y. Unbiased: an unbiased estimator is an estimator for which E(β̂_j) = β_j. Estimator: simply refers to the fact that we are looking at an estimator. Under the main MLR assumptions the OLS estimators β̂_j are the BLUEs of β_j for all j; hence the OLS estimators are BLUE (by the Gauss-Markov theorem), and hence the assumptions are often referred to as the Gauss-Markov assumptions. The importance of these assumptions is that when the standard set of assumptions holds we need not look for alternative unbiased estimators.

OMITTED VARIABLE BIAS 1/3 The consequences of leaving something important out Imagine the true PRF is:

Y = β_1 + β_2 X_1 + β_3 X_2 + u

However, we estimate:

Ỹ = β̃_1 + β̃_2 X_1

Note that we use ˜ rather than ˆ to emphasize that β̃_j comes from an underspecified model. We know that the OLS estimator of β̃_2 is:

β̃_2 = Σ (X_i − X̄)(Y_i − Ȳ) / Σ (X_i − X̄)² = Σ (X_i − X̄) Y_i / Σ (X_i − X̄)²

where X denotes the included regressor X_1.

OMITTED VARIABLE BIAS 2/3 The consequences of leaving something important out Since we know that Y = β_1 + β_2 X_1 + β_3 X_2 + u, we can re-write the numerator of β̃_2 as follows (the β_1 term vanishes because Σ (X_i − X̄) = 0):

Σ (X_i − X̄)(β_1 + β_2 X_1i + β_3 X_2i + u_i)
= β_2 Σ (X_i − X̄)² + β_3 Σ (X_i − X̄) X_2i + Σ (X_i − X̄) u_i
= β_2 SST_1 + β_3 Σ (X_i − X̄) X_2i + Σ (X_i − X̄) u_i

OMITTED VARIABLE BIAS 3/3 The consequences of leaving something important out Dividing by SST_1 and taking the expectation conditional on the independent variables (noting that E(u) = 0) we have:

E(β̃_2) = β_2 + β_3 [Σ (X_i − X̄) X_2i / Σ (X_i − X̄)²]

The ratio Σ (X_i − X̄) X_2i / Σ (X_i − X̄)² is equivalent to the slope coefficient from a regression of X_2 on X_1, which we could define as X̃_2 = δ̃_1 + δ̃_2 X_1. We can therefore see that:

E(β̃_2) = β_2 + β_3 δ̃_2, so E(β̃_2) − β_2 = β_3 δ̃_2

where E(β̃_2) − β_2 is defined as the bias.
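
A small Monte-Carlo sketch of this result (all parameter values assumed for illustration): regressing Y on X_1 alone, the slope estimates should average to β_2 + β_3 δ_2.

```python
import numpy as np

rng = np.random.default_rng(42)
beta2, beta3 = 1.0, 2.0                  # assumed true slopes
n, reps = 500, 2000
estimates = np.empty(reps)

for r in range(reps):
    X1 = rng.normal(size=n)
    X2 = 0.5 * X1 + rng.normal(size=n)   # X2 correlated with X1: delta_2 = 0.5
    Y = beta2 * X1 + beta3 * X2 + rng.normal(size=n)
    # Underspecified regression of Y on X1 alone
    estimates[r] = np.sum((X1 - X1.mean()) * Y) / np.sum((X1 - X1.mean()) ** 2)

print(estimates.mean())   # ~ beta2 + beta3 * delta_2 = 1.0 + 2.0 * 0.5 = 2.0
```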

Thanks for listening! Any questions/comments are warmly welcomed. davidbroadstock@swufe.edu.cn

INTERPRETATION OF MODEL OUTPUT The impact of different functional forms (given that y = β_0 + β_1 x)

Model         Dependent variable   Independent variable   Interpretation of β_1
level-level   y                    x                      Δy = β_1 Δx
level-log     y                    ln(x)                  Δy = (β_1/100) %Δx
log-level     ln(y)                x                      %Δy = (100 β_1) Δx
log-log       ln(y)                ln(x)                  %Δy = β_1 %Δx
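
As a closing sketch (simulated data; the elasticity 0.8 is an assumed value), the log-log row can be checked numerically: the fitted slope is the elasticity, i.e. the percentage change in y for a 1% change in x.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(1, 100, 5000)
log_y = 0.8 * np.log(x) + rng.normal(0, 0.05, 5000)   # log-log DGP, elasticity 0.8

b1, b0 = np.polyfit(np.log(x), log_y, 1)              # regress ln(y) on ln(x)
print(b1)   # ~0.8: a 1% increase in x goes with a ~0.8% increase in y
```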