Models with Discrete Dependent Variables

Based on J. Scott Long and Jeremy Freese (2006), Regression Models for Categorical Dependent Variables Using Stata, 2nd Edition (College Station, TX: Stata Corporation).

1 Introduction

1.1 Course Objective

This class deals with regression models for categorical dependent variables (CDVs), i.e., variables that are binary, ordinal, nominal or count.

Binary variables have two categories indicating that an event has occurred or that some characteristic is present, e.g., Did a citizen vote in the last election? Do you consider yourself Chinese? Do you support independence?

Ordinal variables have multiple categories that can be ranked, e.g., answers to survey questions that use strongly agree, disagree, and strongly disagree, or often, occasionally, seldom, and never.

Nominal variables have multiple categories that cannot be ranked, e.g., policy preferences (prefer unification, undetermined, prefer independence) or ethnic identities that include Taiwanese, dual, and Chinese.

Count variables indicate the number of times that some event has occurred, e.g., How many political demonstrations occurred last year? How many jobs did someone have?

1.2 Readings

J. Scott Long and Jeremy Freese. 2006. Regression Models for Categorical Dependent Variables Using Stata. Second Edition. (College Station, TX: Stata Corporation)

J. Scott Long. 1997. Regression Models for Categorical and Limited Dependent Variables. (Thousand Oaks, CA: Sage Publications)

J. Scott Long, translated by 郑旭智 et al. 2002. 类别与受限依变项的回归统计模式 (Chinese edition of Regression Models for Categorical and Limited Dependent Variables). (Taipei: 弘智文化事业有限公司)

1.3 Before You Start: Installing SPost

In order to try the examples below, you must install the Stata post-estimation commands (SPost). While you are in Stata, type "search spost, net". Among the links shown on the screen, click the one for SPost; another link saying "click here to install" will then appear. Click on it to install the SPost commands.

1.4 When dependent variables are categorical, the linear regression model is inappropriate because it leads to biased, inefficient and nonsensical answers.

Examples: Binary Dependent Variables

Did a citizen vote in the last election? Do you consider yourself Chinese?

Consider a variable y where:

y value    probability
0          1/4
1          3/4

Then:

$$E(y) = [0 \times \Pr(y=0)] + [1 \times \Pr(y=1)] = [0 \times 1/4] + [1 \times 3/4] = 3/4$$

More generally, let y be a binary random variable with outcomes 0 and 1. Then:

$$E(y) = [0 \times \Pr(y=0)] + [1 \times \Pr(y=1)] = \Pr(y=1)$$

i.e., the unconditional expected value of a binary variable is its probability in the population. Thus, the conditional expectation is:

$$E(y \mid X) = [0 \times \Pr(y=0 \mid X)] + [1 \times \Pr(y=1 \mid X)] = \Pr(y=1 \mid X)$$

For example, with a single independent variable, income $X_i$:

$$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad Y_i = \begin{cases} 1 & \text{if the person voted} \\ 0 & \text{if she did not vote} \end{cases}$$

The expected outcome is:

$$E(Y_i \mid X_i) = \Pr(Y_i = 1 \mid X_i) = \alpha + \beta X_i$$
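As a quick check of the arithmetic above, the short Python sketch below computes E(y) for the example distribution with exact fractions and confirms that it equals Pr(y = 1). It is illustrative only; the helper name `expected_value` is hypothetical and not part of the course materials.

```python
from fractions import Fraction


def expected_value(dist):
    """Return E(y) = sum over outcomes of (value * probability)."""
    return sum(value * prob for value, prob in dist.items())


# Distribution from the example: Pr(y=0) = 1/4, Pr(y=1) = 3/4
dist = {0: Fraction(1, 4), 1: Fraction(3, 4)}

e_y = expected_value(dist)
print(e_y)              # E(y)
print(e_y == dist[1])   # for a 0/1 variable, E(y) equals Pr(y=1)
```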

The model is linear in the probability: the linear probability model (LPM).

Estimation with Stata (a do-file):

```stata
* include version number
version 11
* if a log file is open, close it
capture log close
* don't pause when output scrolls off the page
set more off
* log results to file brm_binlfp2
log using brm_binlfp2, replace
* open data file binlfp2
use binlfp2, clear
* describe provides information on the dataset
* (menu: Data > Describe data > Describe data in memory)
describe
* summarize provides summary statistics
* (menu: Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics)
summarize
* regression with OLS
* (menu: Statistics > Linear models and related > Linear regression)
regress lfp k5 k618 age wc hc lwg inc
```
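The LPM is simply OLS applied to a 0/1 outcome. The Python sketch below illustrates this with a closed-form one-regressor OLS fit on a small hypothetical toy dataset (not the Mroz data used in the do-file); the function name `ols_simple` is likewise hypothetical. It shows that the fitted "probabilities" can fall outside [0, 1] even within the sample: here the fitted values at x = 0 and x = 9 are roughly -0.13 and 1.13.

```python
def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical toy data: y is binary (0/1), x is a single regressor
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = ols_simple(x, y)
fitted = [a + b * xi for xi in x]   # LPM "predicted probabilities"

print(f"intercept={a:.4f}, slope={b:.4f}")
print(f"min fitted={min(fitted):.3f}, max fitted={max(fitted):.3f}")
```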

Output page:

```
. *open data file binlfp2
. use binlfp2, clear
(Data from 1976 PSID-T Mroz)

. *describe provides information on the dataset
. describe

Contains data from binlfp2.dta
  obs:           753                          Data from 1976 PSID-T Mroz
 vars:             8                          30 Apr 2001 16:17
 size:        13,554 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
lfp             byte   %9.0g       lfplbl     Paid Labor Force: 1=yes 0=no
k5              byte   %9.0g                  # kids < 6
k618            byte   %9.0g                  # kids 6-18
age             byte   %9.0g                  Wife's age in years
wc              byte   %9.0g       collbl     Wife College: 1=yes 0=no
hc              byte   %9.0g       collbl     Husband College: 1=yes 0=no
lwg             float  %9.0g                  Log of wife's estimated wages
inc             float  %9.0g                  Family income excluding wife's
-------------------------------------------------------------------------------
Sorted by:  lfp

. *summarize provides summary statistics
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         lfp |       753    .5683931    .4956295          0          1
          k5 |       753    .2377158     .523959          0          3
        k618 |       753    1.353254    1.319874          0          8
         age |       753    42.53785    8.072574         30         60
          wc |       753    .2815405    .4500494          0          1
-------------+--------------------------------------------------------
          hc |       753    .3917663    .4884694          0          1
         lwg |       753    1.097115    .5875564  -2.054124   3.218876
         inc |       753    20.12897     11.6348  -.0290001         96

. regress lfp k5 k618 age wc hc lwg inc

      Source |       SS       df       MS              Number of obs =     753
-------------+------------------------------           F(  7,   745) =   18.83
       Model |  27.7657494     7  3.96653564           Prob > F      =  0.0000
    Residual |  156.962006   745  .210687257           R-squared     =  0.1503
-------------+------------------------------           Adj R-squared =  0.1423
       Total |  184.727756   752  .245648611           Root MSE      =  .45901
```
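The fit statistics in the header of the regress output follow from the sums of squares and degrees of freedom by standard identities. The Python sketch below recomputes them from the numbers reported above; it is only a check of the arithmetic, not part of the Stata workflow.

```python
import math

# Sums of squares and degrees of freedom from the regress output above
ss_model, df_model = 27.7657494, 7
ss_resid, df_resid = 156.962006, 745
ss_total, df_total = 184.727756, 752

ms_model = ss_model / df_model   # mean square for the model
ms_resid = ss_resid / df_resid   # mean square for the residual

f_stat = ms_model / ms_resid                     # F(7, 745)
r2 = ss_model / ss_total                         # R-squared
adj_r2 = 1 - ms_resid / (ss_total / df_total)    # Adj R-squared
root_mse = math.sqrt(ms_resid)                   # Root MSE

print(f"F = {f_stat:.2f}, R2 = {r2:.4f}, "
      f"adj R2 = {adj_r2:.4f}, Root MSE = {root_mse:.5f}")
```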

```
------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          k5 |   -.294836   .0359027    -8.21   0.000    -.3653185   -.2243534
        k618 |   -.011215   .0139627    -0.80   0.422     -.038626     .016196
         age |  -.0127411   .0025377    -5.02   0.000     -.017723   -.0077591
          wc |    .163679   .0458284     3.57   0.000     .0737109    .2536471
          hc |    .018951    .042533     0.45   0.656    -.0645477    .1024498
         lwg |   .1227402   .0301915     4.07   0.000     .0634697    .1820107
         inc |  -.0067603   .0015708    -4.30   0.000     -.009844   -.0036767
       _cons |   1.143548   .1270527     9.00   0.000      .894124    1.392972
------------------------------------------------------------------------------
```

The interpretation is straightforward: e.g., for every additional child under six, the predicted probability of a woman being employed decreases by about .295, holding other variables constant.

Structural defects of the LPM:

Functional form: The effect of a variable is the same regardless of the values of the other variables. Yet it is often substantively reasonable that the effects of independent variables have diminishing returns as the predicted probability approaches 0 or 1.

Nonsensical predictions (predicted probabilities outside the 0 to 1 bounds): a 35-year-old woman with four young children, who did not attend college and whose husband did not either, and who is average on the other variables, has a predicted probability of being employed of -.498. If a woman has four young children compared to no young children, her predicted probability of employment decreases by 1.18 (4 × (-.295)). Does this make substantive sense?

Heteroscedasticity: since y is binary with p = Pr(y = 1 | X) = Xβ,

$$\mathrm{Var}(y \mid X) = p(1-p) = X\beta\,(1 - X\beta)$$

so the error variance depends on X and is largest at p = 0.5:

p = E(y|X)    Var(y|X) = p(1 - p)
0.1           0.09
0.2           0.16
0.3           0.21
0.4           0.24
0.5           0.25
0.6           0.24
0.7           0.21
0.8           0.16
0.9           0.09
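The variance table above can be reproduced directly from Var(y|X) = p(1 - p). The Python sketch below prints the same two columns and confirms that the variance peaks at p = 0.5 and is symmetric around it; the function name `binary_variance` is a hypothetical helper introduced for illustration.

```python
def binary_variance(p):
    """Var(y|X) = p(1 - p) for a binary y with Pr(y=1|X) = p."""
    return p * (1 - p)


# Reproduce the table: p from 0.1 to 0.9 in steps of 0.1
grid = [k / 10 for k in range(1, 10)]
for p in grid:
    print(f"{p:.1f}  {binary_variance(p):.2f}")

# The variance is largest at p = 0.5
p_max = max(grid, key=binary_variance)
print(p_max)
```

Because the LPM sets p = Xβ, every observation with a different X has a different error variance, which is why OLS standard errors from the LPM are not reliable without a correction.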

```stata
* predicted probability
prvalue, x(age=35 k5=4 wc=0 hc=0) rest(mean)
```

Output page:

```
. prvalue, x(age=35 k5=4 wc=0 hc=0) rest(mean)

regress: Predictions for lfp

Predicted value of y: -.4983298
              95% ci: (-.7566589,-.2400007)

           k5       k618        age         wc         hc        lwg        inc
x=          4  1.3532537         35          0          0  1.0971148  20.128965
```

Models to be Considered

Type of Variable    Models
Binary              Logit and Probit
Ordinal             Ordered Logit and Ordered Probit
Nominal             Multinomial Logit
Count               Poisson and Negative Binomial
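For the LPM, the prvalue prediction is just the linear index x'b evaluated at the chosen covariate profile. The Python sketch below reproduces it by hand from the rounded coefficients in the regress output and the x values listed by prvalue; because the coefficients are rounded, the result matches -.4983298 only to about three decimal places. The helper name `lpm_predict` is hypothetical.

```python
# Coefficients from the LPM regress output (lfp on k5 k618 age wc hc lwg inc)
coef = {
    "k5": -0.294836, "k618": -0.011215, "age": -0.0127411,
    "wc": 0.163679, "hc": 0.018951, "lwg": 0.1227402,
    "inc": -0.0067603, "_cons": 1.143548,
}

# Profile used by prvalue: age=35, k5=4, wc=0, hc=0, others at their means
x = {
    "k5": 4, "k618": 1.3532537, "age": 35, "wc": 0, "hc": 0,
    "lwg": 1.0971148, "inc": 20.128965,
}


def lpm_predict(coef, x):
    """Linear prediction x'b = _cons + sum of coefficient * value."""
    return coef["_cons"] + sum(coef[name] * value for name, value in x.items())


p_hat = lpm_predict(coef, x)
print(f"predicted Pr(lfp=1) = {p_hat:.4f}")   # a negative "probability"

# Change in prediction from 0 to 4 young children, other values held fixed
k5_effect = 4 * coef["k5"]
print(f"effect of 4 vs 0 kids under 6 = {k5_effect:.3f}")
```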

1.5 Maximum Likelihood Estimation

Assume that s is the number of men in the sample, N is the sample size, and π is the population probability of being male. The probability of having s men in a sample of size N, with π being the population probability of being male, is

$$\Pr(s \mid \pi, N) = \frac{N!}{s!\,(N-s)!}\,\pi^{s}(1-\pi)^{N-s}$$

E.g., the probability of having 3 men in a sample of 10 given π = 0.5 is

$$\Pr(s = 3 \mid \pi = 0.5, N = 10) = \frac{10!}{3!\,(10-3)!}\,0.5^{3}\,(1-0.5)^{7} \approx 0.117$$

This is a typical problem in probability: given the values of the parameters π and N, what is the probability of a particular outcome s? In statistics the question is reversed: given the sample information s and N, what is π for the population? The ML estimate is the value of the parameter that makes the observed data most likely.

The likelihood function: when N and π are held constant and s varies, we have a probability function; when π varies and N and s are held constant, we refer to it as a likelihood function, which answers the question "how likely is π given s and N?":

$$L(\pi \mid s = 3, N = 10) = \frac{10!}{3!\,(10-3)!}\,\pi^{3}(1-\pi)^{7}$$

For this example, the likelihood equation simply follows the binomial formula.
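The worked probability above can be checked with the binomial formula directly. The Python sketch below uses `math.comb` for the N!/(s!(N - s)!) term; the helper name `binomial_prob` is hypothetical.

```python
from math import comb


def binomial_prob(s, n, pi):
    """Pr(s | pi, N): probability of s men in a sample of size n."""
    return comb(n, s) * pi**s * (1 - pi) ** (n - s)


p = binomial_prob(3, 10, 0.5)
print(round(p, 3))   # 120 / 1024, about 0.117
```

Viewed as a function of s with π and N fixed, these values form a probability distribution, so they sum to 1 over s = 0, ..., N.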

The ML estimate is the value of the parameter that maximizes the likelihood of observing the sample data.
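To see the ML idea numerically, the Python sketch below evaluates L(π | s = 3, N = 10) over a fine grid of candidate values and picks the maximizer. A grid search is used here purely for illustration (it is not the estimation method used in the course); for the binomial likelihood the maximizer is known in closed form as s/N, so the grid search should land on 0.3. The helper name `likelihood` is hypothetical.

```python
from math import comb


def likelihood(pi, s=3, n=10):
    """L(pi | s, N): the binomial formula viewed as a function of pi."""
    return comb(n, s) * pi**s * (1 - pi) ** (n - s)


# Evaluate the likelihood on a grid of candidate values for pi in (0, 1)
grid = [k / 1000 for k in range(1, 1000)]
pi_hat = max(grid, key=likelihood)

print(pi_hat)     # grid maximizer
print(3 / 10)     # closed-form ML estimate s / N
```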