Metropolitan State University
ECON 497: Research and Forecasting
Lecture Notes 13: Dummy Dependent Variable Techniques
Studenmund Chapter 13

Basically, if you have a dummy dependent variable you will be estimating a probability. Probabilities are necessarily restricted to fall in the range [0,1] and this puts special conditions on the regression. Just doing a linear regression can result in estimated probabilities that are either negative or greater than 1 and are a bit nonsensical. As a result, there are other techniques for estimating these relationships that can generate better results.

The Linear Probability Model

The second or third best way to estimate models with dummy dependent variables is to simply estimate the model as you normally might:

$D_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

For example, if you have a sample of the U.S. adult population and you're trying to determine the probability that a person is incarcerated, you might estimate the equation:

$D_i = \beta_0 + \beta_1 \text{AGE}_i + \beta_2 \text{GENDER}_i + \varepsilon_i$

where
$D_i$ is a dummy variable taking the value 1 if a person is incarcerated and 0 if not
$\text{AGE}_i$ is the person's age
$\text{GENDER}_i$ is a dummy variable equal to 1 if the person is male and 0 otherwise

Imagine that the estimated coefficients are:

$\hat{D}_i = 0.0043 - 0.0001 \cdot \text{AGE}_i + 0.0052 \cdot \text{GENDER}_i$

Interpretation of the estimated coefficients is straightforward. If there are two women, one of whom is one year older than the other, the estimated probability that the older one will be incarcerated will be 0.0001 less than the estimated probability that the younger one will be.
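To see how a linear probability model turns coefficients into predictions, the illustrative estimates above can be evaluated directly. The following is a minimal Python sketch; the function name `lpm_predict` and the ages tried are assumptions made here for illustration and are not part of the original notes.

```python
def lpm_predict(age, male):
    """Predicted incarceration probability from the illustrative LPM estimates."""
    return 0.0043 - 0.0001 * age + 0.0052 * male

# Evaluate the fitted line for women (male = 0) and men (male = 1) at a few ages.
for age in (20, 40, 60):
    woman = lpm_predict(age, 0)
    man = lpm_predict(age, 1)
    print(f"age {age}: woman {woman:+.4f}, man {man:+.4f}")

# Nothing in the linear form keeps predictions inside [0, 1]:
# a sufficiently old woman gets a negative "probability".
assert lpm_predict(60, 0) < 0
```

The gap between the two predictions equals the GENDER coefficient at every age. That constant gap is what makes the coefficients easy to read, and it is also why the fitted line can run below zero.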

If there are a man and a woman of the same age, the predicted probability that the man will be incarcerated is 0.0052 greater than the predicted probability that the woman will be incarcerated.

Interestingly, a woman of age 43 will have a predicted probability of 0.0000 and women older than this will have negative predicted probabilities of incarceration.

Studenmund describes issues regarding the linear probability model and you should read this discussion. One thing I will point out is that the adjusted $R^2$ is not an accurate measure of overall fit in a linear probability model with dummies as dependent variables.

The Weighted Least Squares Approach

The most complicated point from Studenmund's discussion of the linear probability model is the discussion of weighted least squares. This technique is designed to get around the problem of heteroskedasticity (which we haven't really discussed yet) and can be summarized as follows:

1. Due to the structure of the linear probability model, the error terms are not identically distributed. Specifically, error terms will have greater variance when the actual probability is close to 0.5 and smaller variance when the actual probability is close to zero or one. Because the error terms are not all identically distributed (they have different variances), there is a problem with heteroskedasticity. Coefficient estimates, however, will be unbiased as long as the other classical assumptions are satisfied.

2. To address this problem, do the standard linear regression and then use the estimated coefficients to generate the predicted probabilities ($\hat{D}_i$) for each observation. Excel will do this for you if you ask it nicely.

3. Use these predicted probabilities ($\hat{D}_i$) to generate a new value, which is equal to the square root of $\hat{D}_i(1 - \hat{D}_i)$. Call this $Z_i = [\hat{D}_i(1 - \hat{D}_i)]^{1/2}$.

4. Divide the dependent and explanatory variables by this new variable ($Z_i$).

5. Now, redo the regression using the values of the dependent and explanatory variables that have been divided by $Z_i$.
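The five steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Studenmund's own procedure: the data are made up, the helper `fit_two_columns` is a hypothetical name, there is only one explanatory variable, and Z is taken to be the square root of D-hat times (1 - D-hat), the usual standard deviation of a 0/1 outcome. The constant term is divided by Z along with everything else, so the second regression has no separate intercept.

```python
import math

def fit_two_columns(c0, c1, y):
    """Least-squares coefficients of y on columns c0 and c1 (no extra intercept)."""
    s00 = sum(a * a for a in c0)
    s01 = sum(a * b for a, b in zip(c0, c1))
    s11 = sum(b * b for b in c1)
    t0 = sum(a * v for a, v in zip(c0, y))
    t1 = sum(b * v for b, v in zip(c1, y))
    det = s00 * s11 - s01 * s01
    return (s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det

# Hypothetical data: one explanatory variable x and a 0/1 outcome d.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
d = [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
ones = [1.0] * len(x)

# Steps 1-2: ordinary linear regression, then predicted probabilities.
b0_ols, b1_ols = fit_two_columns(ones, x, d)
d_hat = [b0_ols + b1_ols * xi for xi in x]

# Step 3: Z = sqrt(D_hat * (1 - D_hat)); undefined if D_hat leaves (0, 1).
if not all(0.0 < p < 1.0 for p in d_hat):
    raise ValueError("a predicted probability is outside (0, 1)")
z = [math.sqrt(p * (1.0 - p)) for p in d_hat]

# Step 4: divide the dependent variable and every regressor
# (the constant included) by Z.
const_w = [1.0 / zi for zi in z]
x_w = [xi / zi for xi, zi in zip(x, z)]
d_w = [di / zi for di, zi in zip(d, z)]

# Step 5: rerun the regression on the transformed variables.
b0_wls, b1_wls = fit_two_columns(const_w, x_w, d_w)
print("OLS:", round(b0_ols, 4), round(b1_ols, 4))
print("WLS:", round(b0_wls, 4), round(b1_wls, 4))
```

The sketch simply stops if any fitted value falls outside (0, 1), because the weight is then undefined. With real data that case needs a deliberate decision rather than a silent fix.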
The standard errors and the t-statistics for the estimated coefficients will be different and more accurate. Briefly, the idea behind this is that observations whose error terms have greater variance should have less influence than those whose error terms have smaller variance. The closer $\hat{D}_i$ is to 0.5, the larger the variance of the error term is likely to be, so the observations are weighted by $1/[\hat{D}_i(1 - \hat{D}_i)]^{1/2} = [\hat{D}_i(1 - \hat{D}_i)]^{-1/2}$.

The Binomial Logit Model

The proper way to estimate these models is by using the binomial logit model. To do this, the dependent variable needs to be transformed. The equation to be estimated is:

$\ln\left(\dfrac{D_i}{1 - D_i}\right) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

The dependent variable is the log of the odds and is equal to infinity if $D_i = 1$ and to negative infinity if $D_i = 0$. The predicted probability is equal to

$\hat{D}_i = \dfrac{1}{1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i})}}$

The interpretation of the estimated coefficients is less straightforward here. Estimated coefficients show the effect of a change in an explanatory variable on the predicted log of the odds, not on the probability itself. Basically, from a coefficient alone you can only tell whether an explanatory variable has a positive or negative impact on the probability, not how large that impact is.

An additional complication is that logit models cannot be estimated using OLS, so they can't really be done in Excel. This is something you need a real statistical analysis package to do.

Examples solicited from students.

Binomial Probit Model

This is a model based on some slightly different assumptions than the binomial logit model. In most cases the results from the two models are nearly identical. If you're estimating either a logit or a probit model, it's usually just one additional command to also estimate the other. You should do this, just for completeness and to check that your important results are robust to changes in the model used.
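Two points from the logit and probit discussion above can be checked numerically: a logit coefficient moves the log of the odds by a fixed amount but moves the probability by an amount that depends on the starting point, and the logit and probit curves are very close once the index is rescaled. The Python sketch below uses only the standard library. The coefficients `b0` and `b1` are hypothetical, and the scaling constant 1.702 is a commonly quoted approximation rather than anything taken from these notes.

```python
import math

def logit_prob(index):
    """Logistic CDF: maps the predicted log of the odds to a probability."""
    return 1.0 / (1.0 + math.exp(-index))

def probit_prob(index):
    """Standard normal CDF, built from math.erf."""
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))

# Hypothetical logit estimates for a single explanatory variable.
b0, b1 = -4.0, 0.5

def change_from_one_more_unit(x):
    """Change in predicted probability when x rises by one unit."""
    return logit_prob(b0 + b1 * (x + 1)) - logit_prob(b0 + b1 * x)

# The log of the odds always moves by b1, but the probability
# does not move by a fixed amount.
print("change starting at x = 0:", round(change_from_one_more_unit(0), 4))
print("change starting at x = 8:", round(change_from_one_more_unit(8), 4))

# Logit and probit curves nearly coincide once the index is rescaled.
grid = [i / 100.0 for i in range(-400, 401)]
max_gap = max(abs(logit_prob(1.702 * v) - probit_prob(v)) for v in grid)
print("largest gap between the two curves:", round(max_gap, 4))
```

Because the same coefficient produces different probability changes at different starting values, the size of an effect has to be reported at a stated set of explanatory-variable values.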

If a presenter is annoying you in some way as they discuss their binomial logit or binomial probit results, you can make yourself equally annoying by asking if they estimated the other model and if their results were robust to the change. This is kind of a cheap question and it really shouldn't disturb them too much because, if they've been even slightly responsible, they will have done both.

The real difference between the logit and probit models is that they make slightly different assumptions about the distribution of the underlying probabilities. The probit uses the cumulative distribution function of the normal distribution, while the logit uses the cumulative logistic function, which makes the log of the odds linear in the explanatory variables. Here's Studenmund's take on all this:

"From a researcher's point of view, the biggest differences between the two models are that the probit is based on the cumulative normal distribution and that the probit estimation procedure uses more computer time than does the logit. As computer programs are improved, and as computer time continues to fall in price, this latter difference may eventually disappear. Since the probit is similar to the logit and is more expensive to run, why would you ever estimate one? The answer is that since the probit is based on the normal distribution, it's quite theoretically appealing (because many economic variables are normally distributed). With extremely large samples, this advantage falls away, since maximum likelihood procedures can be shown to be asymptotically normal under fairly general conditions."

Multinomial Logit Model

If you have a qualitative dependent variable that can take multiple values, you may wish to estimate a multinomial logit model. This can be a bit tricky and uncooperative, and it can potentially require a lot of computing time, a compliant data set with lots of observations of each qualitative outcome and, most importantly, a big chunk of your life and your sanity, not necessarily in that order.

Basically, the results from a multinomial logit model tell you about the effect that a change in the value of a variable has on the relative probabilities of two of the possible outcomes. Doing this with some degree of reliability apparently requires a data set in which you have a couple hundred observations of each of the qualitative outcomes.

Example: Voting Choice

Imagine that you have voting records showing demographic information for a lot of people and who they voted for (Democrat, Republican, Libertarian) in the last election. You might use a multinomial logit model to identify the factors that have a significant impact on making someone vote Libertarian rather than Republican or Democrat.

Example: Transportation Choice

Imagine that you get hold of a transportation survey from the Puget Sound Regional Council and you want to model the transportation choice of adults based on such things as income, number of children, commute distance, etc. You might use a multinomial logit model with each possible choice as one possible value of the dependent variable.
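As a rough illustration of what multinomial logit output means, the Python sketch below computes outcome probabilities for the voting example from one linear index per outcome. Everything numeric here is hypothetical: the coefficients, the choice of Democrat as the base outcome, and income measured in thousands of dollars are assumptions for illustration, and actually estimating such a model requires a statistical package.

```python
import math

def mnl_probs(indices):
    """Multinomial logit probabilities from each outcome's linear index."""
    top = max(indices.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in indices.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def vote_indices(income):
    """Hypothetical coefficients; Democrat is the base outcome with index 0."""
    return {
        "Democrat": 0.0,
        "Republican": -0.5 + 0.02 * income,
        "Libertarian": -2.0 + 0.03 * income,
    }

idx = vote_indices(50.0)
probs = mnl_probs(idx)
for party, p in probs.items():
    print(party, round(p, 3))

# The log of the ratio of two outcome probabilities equals the
# difference between their indices, so coefficients describe
# relative probabilities of pairs of outcomes.
log_ratio = math.log(probs["Libertarian"] / probs["Republican"])
print("log ratio:", round(log_ratio, 4),
      "index difference:", round(idx["Libertarian"] - idx["Republican"], 4))
```

The last two printed numbers agree, which is the sense in which the results speak to the relative probabilities of two outcomes rather than to any single probability.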