Limited Dependent Variables


1. What if the left-hand side variable is not a continuous thing spread from minus infinity to plus infinity? That is, given a model $y_i = f(x_i, \beta, \varepsilon_i)$, what if:

a. $y_i$ is bounded below at zero, such as wages or height;

b. Or, $y_i$ is bounded above at a known top-code, such as top-coded income in BLS data;

c. Or, $y_i$ is discrete, taking on just the values 0 or 1, such as in yes/no responses to survey questions, or bankruptcy indicators for firms;

d. Or, $y_i$ is discrete, but ordinal and many-valued, such as data that have been grouped into ranges (often the case for income data), or responses to survey questions about the depth of agreement (strongly agree, somewhat agree, don't agree at all);

e. Or, $y_i$ is discrete and many-valued, but not ordinal, such as transit mode choices (did you take the bus, drive a car, bicycle or walk?).

2. Let us begin with the simplest case: $y_i$ is a zero-one binary variable, reflecting the answer to a yes/no question, coded 1 for yes, 0 for no.

a. Consider an OLS regression of $y_i$ on $x_i$. If greater values of $x$ are associated with higher probabilities of a yes, then the OLS regression coefficient on $x$ will reflect this as a positive coefficient. Indeed, the coefficient will have a mean value equal to the marginal impact of $x$ on the probability that $y_i$ is a yes. Given that, we call this the linear probability model. OLS regressions for a linear probability model deliver consistent estimates of coefficients for an underlying model where

$y_i = x_i'\beta + \varepsilon_i, \qquad P[y_i = 1 \mid x_i] = x_i'\beta.$

To see this, think about a model where $P[y_i=1]=0.5$ for $x_i=1$ and $P[y_i=1]=0.6$ for $x_i=2$. In this case, 50% of cases will have $y=1$ and 50% will have $y=0$ at $x=1$, but 60% will have $y=1$ and 40% will have $y=0$ at $x=2$. The regression wants to find the conditional expectation of $y$ given $x$, and given the 0/1 coding of $y$, this conditional expectation is exactly the probability we are seeking (a code sketch of this example follows below).

b. This linear probability model is unsatisfactory, though, for at least two reasons:

i. The disturbances are not distributed nicely. In the example above, the disturbances are evenly split between the values $-0.5$ and $0.5$ at $x=1$, and take on the values $-0.6$ and $0.4$ 40% and 60% of the time, respectively, at $x=2$. Although they are everywhere mean-zero, they are heteroskedastic, because their distribution is different at different values of $x$. (A different distribution with the same mean usually implies a different variance.)

ii. The model is aesthetically unappealing because we are estimating a model which we know could not generate the data. In the data, probabilities are bounded between 0 and 1 and so cannot be linear in $x$ everywhere, but we are estimating a model in which the probability is linear in $x$.

c. The solutions to both these problems come with a single strategy: write down a model which could conceivably generate the data, and try to estimate its parameters. The cost of doing this is that we typically have to write out an explicit distribution for the disturbances (which we don't have to do when we use OLS and asymptotic variances and tests).
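To make 2.a and 2.b.i concrete, here is a minimal simulation sketch, assuming only numpy; the data, seed, and variable names are hypothetical, chosen to match the example above. OLS on the 0/1 outcome recovers the intercept and slope of the probability, and the residual variance differs across values of $x$.

```python
# Sketch of the linear probability model, using the numbers from the
# example in 2.a (P[y=1] = 0.5 at x=1 and 0.6 at x=2). Data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.integers(1, 3, size=n)             # x is 1 or 2, as in the example
p = 0.4 + 0.1 * x                          # true P[y=1|x]: 0.5 at x=1, 0.6 at x=2
y = (rng.random(n) < p).astype(float)      # 0/1 outcomes

X = np.column_stack([np.ones(n), x])       # intercept plus regressor
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS estimates (should be near 0.4 and 0.1):", beta_hat)

# The heteroskedasticity of 2.b.i: the residual distribution differs across x.
e = y - X @ beta_hat
for v in (1, 2):
    print(f"residual variance at x={v}:", round(e[x == v].var(), 4))
```

With these particular probabilities the two variances (about 0.25 and 0.24) are close, but they are genuinely different, and the gap widens as the probabilities move away from one half.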

3. Binary choice models: Logits and Probits.

a. Latent variable approach. Assume that there is some latent (unobserved) variable $y_i^*$ which determines the observed variable $y_i$, and which behaves nicely (usually, linearly).

b. $y_i^* = x_i'\beta + \varepsilon_i$, $\varepsilon_i \sim f(\varepsilon)$, with $y_i = 1$ if $y_i^* \ge 0$ and $y_i = 0$ otherwise.

c. Here, $f$ is some nice distribution for the disturbance terms, perhaps normal, perhaps not. Note that this distribution is the same for each and every $i$; the disturbances of the linear probability model could not satisfy any such a priori distribution, because they are certainly heteroskedastic.

4. The method of maximum likelihood says: "find the parameters which maximise the probability of seeing the sample that you saw."

a. For a general model $y_i = x_i'\beta + \varepsilon_i$, $\varepsilon_i \sim f(\varepsilon)$, the probability of seeing a whole set of observations is really just the probability of seeing that set of disturbances. Consequently, maximum likelihood is a method of moments approach, because maximising the likelihood implies solving a first-order condition, and we solve the first-order condition by plugging in residuals for disturbances.

i. For any $i$, $P[y_i, x_i] = P[\varepsilon_i = y_i - x_i'\beta] = f(y_i - x_i'\beta)$, and since we know $f$, and the disturbances are independent, the probability of seeing the whole set of observations, known as the likelihood, is:

ii. $L = P[y_1, \ldots, y_N, x_1, \ldots, x_N; \beta] = \prod_{i=1}^{N} f(y_i - x_i'\beta).$

iii. The method of maximum likelihood says: choose $\beta$ by maximising $L$.

iv. The likelihood above is expressed over disturbances, but once we choose a $\beta$, we are working with residuals. So, implementation of this maximisation involves substituting residuals for disturbances.

v. Products are a drag to work with, so if you take the log, it turns into a sum:

vi. $\ln L = \ln P[y_1, \ldots, y_N, x_1, \ldots, x_N; \beta] = \sum_{i=1}^{N} \ln f(y_i - x_i'\beta).$

vii. If $f$ were the normal density function, $f(\varepsilon_i) = \exp\!\left(-\varepsilon_i^2/2\sigma^2\right)/\sigma\sqrt{2\pi}$, then $\ln f(\varepsilon_i) = -\ln\sigma - \tfrac{1}{2}\ln 2\pi - \varepsilon_i^2/2\sigma^2$,

viii. $\ln L = \sum_{i=1}^{N} \left( -\ln\sigma - \tfrac{1}{2}\ln 2\pi - (y_i - x_i'\beta)^2/2\sigma^2 \right)$, and the first-order condition for maximisation is

ix. $\partial \ln L / \partial \beta = \sum_{i=1}^{N} (y_i - x_i'\beta)\, x_i / \sigma^2 = 0$, aka $X'e = 0$. This expression is a moment condition, because the first-order condition requires that a particular weighted average of disturbances equals zero. We solve it by substituting residuals for disturbances.

x. So, if you know $f$, you can typically do ML. ML with normality on linear models yields the OLS estimator (sketched in code below).
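A sketch of point 4.x on simulated data, assuming scipy is available (all names and numbers are hypothetical): maximising the normal log-likelihood of 4.viii numerically returns the same $\beta$ as the OLS formula.

```python
# Sketch: ML with normal disturbances on a linear model reproduces OLS (4.x).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + 1.5 * rng.normal(size=n)

def neg_loglik(theta):
    # theta = (beta0, beta1, log sigma); log-parameterise sigma so it stays positive
    beta, sigma2 = theta[:2], np.exp(2 * theta[2])
    e = y - X @ beta                       # residuals substituted for disturbances
    return 0.5 * n * np.log(2 * np.pi * sigma2) + e @ e / (2 * sigma2)

ml = minimize(neg_loglik, x0=np.zeros(3), method="BFGS").x[:2]
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("ML :", ml)
print("OLS:", ols)                         # identical up to optimiser tolerance
```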

5. Consider the binary choice latent variable model as a maximum likelihood problem.

a. $y_i^* = x_i'\beta + \varepsilon_i$, $\varepsilon_i \sim f(\varepsilon)$, with $y_i = 1$ if $y_i^* \ge 0$ and $y_i = 0$ otherwise.

b. Define the cumulative distribution function $F$ dual to $f$ as $F(v) = \int_{-\infty}^{v} f(u)\, du$, so that $F(v)$ gives the probability of the disturbance being less than $v$.

c. What is the probability of seeing any particular $(y_i, x_i)$ pair?

$P[y_i = 1 \mid x_i] = P[x_i'\beta + \varepsilon_i \ge 0] = P[\varepsilon_i \ge -x_i'\beta] = 1 - F(-x_i'\beta)$

d. and $P[y_i = 0 \mid x_i] = P[x_i'\beta + \varepsilon_i < 0] = P[\varepsilon_i < -x_i'\beta] = F(-x_i'\beta)$,

e. which sum to one.

f. ASSUME that $f$ is symmetric, so that a lot of minuses disappear. If $f$ is symmetric, then $1 - F(-x_i'\beta) = F(x_i'\beta)$ and $F(-x_i'\beta) = 1 - F(x_i'\beta)$.

g. The likelihood is thus given by $L = \prod_{i=1}^{N} F(x_i'\beta)^{y_i} \left[ 1 - F(x_i'\beta) \right]^{1-y_i}.$

h. And the log-likelihood is $\ln L = \sum_{i=1}^{N} \left[ y_i \ln F(x_i'\beta) + (1 - y_i) \ln\left(1 - F(x_i'\beta)\right) \right]$ (rendered in code below).
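The log-likelihood in 5.h can be written once for an arbitrary cdf $F$; plugging in the normal or the logistic cdf then gives the probit or logit objective of the next two sections. A minimal sketch, with function names and test values of my own invention:

```python
# Sketch: the binary-choice log-likelihood of 5.h for an arbitrary cdf F.
import numpy as np
from scipy.stats import norm, logistic

def binary_loglik(beta, y, X, F):
    """ln L = sum_i [ y_i ln F(x_i'b) + (1 - y_i) ln(1 - F(x_i'b)) ]."""
    p = np.clip(F(X @ beta), 1e-12, 1 - 1e-12)   # guard the logs at 0 and 1
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# The same objective serves both models: F = norm.cdf gives the probit,
# F = logistic.cdf gives the logit.
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0])
b = np.array([0.2, 0.8])
print("probit ln L:", binary_loglik(b, y, X, norm.cdf))
print("logit  ln L:", binary_loglik(b, y, X, logistic.cdf))
```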

j. Differentiating,

$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{N} \left[ y_i \frac{\partial \ln F(x_i'\beta)}{\partial \beta} + (1 - y_i) \frac{\partial \ln\left(1 - F(x_i'\beta)\right)}{\partial \beta} \right] = \sum_{i=1}^{N} \left[ \frac{y_i}{F(x_i'\beta)} - \frac{1 - y_i}{1 - F(x_i'\beta)} \right] f(x_i'\beta)\, x_i = 0.$

This is the moment condition for the binary discrete choice maximum likelihood problem. It is analogous to $X'e = 0$ in OLS.

k. If the probabilities are linear in $x_i$, then $F(x_i'\beta) = x_i'\beta$ and $f(x_i'\beta) = 1$.

l. In this case, the first-order condition on the likelihood function becomes

$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{N} \frac{y_i - x_i'\beta}{x_i'\beta \left(1 - x_i'\beta\right)}\, x_i = 0,$

which, like the OLS normal equations, sets a weighted average of the residuals $y_i - x_i'\beta$ against $x_i$ to zero.

m. That is why we call the OLS approach to this model the linear probability model: it results in a linear optimisation problem which has a linear solution.

6. PROBIT model

a. Denote the standard normal density as $\phi(u) = \exp(-u^2/2)/\sqrt{2\pi}$, and denote the cumulative distribution function of the standard normal as $\Phi(v) = \int_{-\infty}^{v} \phi(u)\, du$.

b. So, the probability of seeing a particular pair $(y_i, x_i)$ is one or the other of these:

$P[y_i = 1 \mid x_i] = P[x_i'\beta + \varepsilon_i \ge 0] = P[\varepsilon_i \ge -x_i'\beta] = \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)$

c. $P[y_i = 0 \mid x_i] = P[x_i'\beta + \varepsilon_i < 0] = \Phi\!\left(-\frac{x_i'\beta}{\sigma}\right) = 1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)$

d. These sum to one.

e. The likelihood is thus given by

$L = \prod_{i=1}^{N} \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)^{y_i} \left[ 1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right) \right]^{1-y_i},$

f. And the log-likelihood is

g. $\ln L = \sum_{i=1}^{N} \left[ y_i \ln \Phi\!\left(\frac{x_i'\beta}{\sigma}\right) + (1 - y_i) \ln\left( 1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right) \right) \right].$

h. Here, $\beta$ and $\sigma$ are not identified, because scaling these parameters by any common scalar does not affect the likelihood.

i. So, one must impose a restriction. Typically, we either impose that one element of $\beta$ is 1, or that $\sigma = 1$. They are equally valid identifying restrictions, so it is up to convenience to choose. Let us choose $\sigma = 1$. First-order conditions are then

j. $\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{N} \left[ \frac{y_i}{\Phi(x_i'\beta)} - \frac{1 - y_i}{1 - \Phi(x_i'\beta)} \right] \phi(x_i'\beta)\, x_i = 0.$

k. These first-order conditions are nonlinear in the parameters, and so the solution must be found by iteration (see the sketch below).

l. The difficult thing here in finding a solution is that the parameters are inside the normal pdf, which has an explicit analytic form, and inside the normal cdf, which does not have a closed analytic form.

m. The picture for this is the mapping from $\hat{y}$ into $y$: the index $\hat{y}_i = x_i'\beta$ maps onto a number line, and if $\hat{y}_i + \varepsilon_i$ is less than zero, then $y_i$ is zero. For that reason, these are sometimes called index models. The probit puts a normal pdf on the density of $\varepsilon_i$; $\varepsilon_i < -x_i'\beta$ corresponds to an index less than zero.
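Items 6.j through 6.l say the probit first-order conditions have no closed-form solution and must be iterated. A sketch on simulated data (numbers hypothetical), using scipy's BFGS as the iterative solver:

```python
# Sketch: probit by maximum likelihood, solved by iteration as in 6.k.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) >= 0).astype(float)  # sigma fixed at 1

def neg_loglik(beta):
    p = np.clip(norm.cdf(X @ beta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
print("probit beta-hat:", beta_hat, "(true: -0.5, 1.0)")
```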

7. LOGIT model

a. The logit model is a simple alternative to the probit model which has a similar distribution for the disturbance terms, but is much easier to solve.

b. Consider the logistic cumulative distribution function, denoted with a capital lambda:

$F(v) = \Lambda(v) = \frac{\exp(v)}{1 + \exp(v)}.$

The associated probability density function is given by

$\lambda(v) = \frac{\partial \Lambda(v)}{\partial v} = \Lambda(v)\left(1 - \Lambda(v)\right)$

(checked numerically in the sketch below).
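A quick numerical check of the density claim in 7.b, on arbitrary values: the finite-difference derivative of $\Lambda$ matches $\Lambda(v)(1-\Lambda(v))$.

```python
# Sketch: verify that dΛ/dv = Λ(v)(1 - Λ(v)) for the logistic cdf of 7.b.
import numpy as np

def Lam(v):
    return np.exp(v) / (1.0 + np.exp(v))

v = np.linspace(-3.0, 3.0, 7)
h = 1e-6
numeric = (Lam(v + h) - Lam(v - h)) / (2 * h)   # finite-difference derivative
analytic = Lam(v) * (1.0 - Lam(v))              # the closed form claimed in 7.b
print("max discrepancy:", np.abs(numeric - analytic).max())  # ~1e-11
```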

c. If we substitute these into the first-order condition for the likelihood function,

$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{N} \left[ \frac{y_i}{\Lambda(x_i'\beta)} - \frac{1 - y_i}{1 - \Lambda(x_i'\beta)} \right] \Lambda(x_i'\beta)\left(1 - \Lambda(x_i'\beta)\right) x_i$

$= \sum_{i=1}^{N} \left[ y_i \left(1 - \Lambda(x_i'\beta)\right) - (1 - y_i)\,\Lambda(x_i'\beta) \right] x_i = \sum_{i=1}^{N} \left( y_i - \Lambda(x_i'\beta) \right) x_i = 0.$

d. You can see how this might be easier to solve with a computer. No ratios. Everything in there can be expressed analytically.

8. Interpretation of Parameters

a. The parameter $\beta$ tells you the marginal effect of $x_i$ on the index $x_i'\beta$, the argument of $F$. $F(x_i'\beta)$ gives the expectation of $y_i$ given $x_i$, but because $F$ is nonlinear in its argument, $\beta$ does not give the derivative of $E[y_i \mid x_i]$ with respect to $x_i$. In particular,

b. $E[y_i \mid x_i] = F(x_i'\beta) = P[y_i = 1 \mid x_i]$, so $\frac{\partial E[y_i \mid x_i]}{\partial x_i} = \beta\, f(x_i'\beta).$

c. So, the marginal effect depends on $x_i$. In a probit, the marginal effect is given by $\beta\, \phi(x_i'\beta)$, and in a logit, the marginal effect is given by $\beta\, \Lambda(x_i'\beta)\left(1 - \Lambda(x_i'\beta)\right) = \beta P_i (1 - P_i)$, where $P_i$ is the probability that $y_i = 1$ (sketched below).
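A sketch of the marginal effects in 8.b and 8.c, with hypothetical fitted coefficients: the effect is $\beta$ scaled by the density evaluated at $x_i'\beta$, so it differs across observations; averaging it over the sample is one common summary.

```python
# Sketch: marginal effects of 8.c at each observation (coefficients hypothetical).
import numpy as np
from scipy.stats import norm

beta = np.array([-0.5, 1.0])
X = np.column_stack([np.ones(5), np.linspace(-2.0, 2.0, 5)])
xb = X @ beta

me_probit = beta[1] * norm.pdf(xb)              # beta * phi(x'b)
P = np.exp(xb) / (1.0 + np.exp(xb))
me_logit = beta[1] * P * (1.0 - P)              # beta * P(1 - P)

print("probit effects:", me_probit.round(3))
print("logit  effects:", me_logit.round(3))
print("average probit effect:", me_probit.mean().round(3))
```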

9. Multinomial choice with ordered choices

a. Consider a latent variable model with ordered choices given by:

b. $y_i^* = x_i'\beta + \varepsilon_i$, $\varepsilon_i \sim f(\varepsilon)$, with

$y_i = 0$ if $y_i^* < 0$; $\quad y_i = 1$ if $0 \le y_i^* < \mu_1$; $\quad y_i = 2$ if $\mu_1 \le y_i^* < \mu_2$; $\ \ldots\ $; $\quad y_i = M$ if $y_i^* \ge \mu_{M-1}$.

c. This is extremely similar to the models above, except that there is an additional unknown set of parameters: the thresholds $\mu_1, \ldots, \mu_{M-1}$.

d. Assume that $f$ is the normal probability density function. (You could assume any pdf, as long as its form is known.)

e. Then, the probabilities of the various outcomes are given by:

$P[y_i = 0 \mid x_i] = \Phi(-x_i'\beta)$
$P[y_i = 1 \mid x_i] = \Phi(\mu_1 - x_i'\beta) - \Phi(-x_i'\beta)$
$P[y_i = 2 \mid x_i] = \Phi(\mu_2 - x_i'\beta) - \Phi(\mu_1 - x_i'\beta)$
$\ldots$
$P[y_i = M \mid x_i] = 1 - \Phi(\mu_{M-1} - x_i'\beta)$

f. Given that these are the probabilities, the marginal effects of $x$ are given by differentiating these probabilities with respect to $x$.

g. These probabilities can be crammed into a log-likelihood function, which can be maximised with respect to the parameters $\beta, \mu_1, \ldots, \mu_{M-1}$.

10. Multinomial choice with unordered choices

a. Probabilities of the various choices are modelled directly. Here, we assume that $y_i$ could take on the values $j = 0, \ldots, M$. There are going to be $M$ latent indexes which generate the probabilities of being observed in the $M+1$ possible categories. Logit distributions are easy to work with here.

b. $P[y_i = j \mid x_i] = \frac{\exp(x_i'\beta_j)}{1 + \sum_{k=1}^{M} \exp(x_i'\beta_k)}$, with $\beta_0$ normalised to zero. If $M = 1$, then this reverts to the binary logit. Given this structure, the probability that $y_i$ and $x_i$ are observed together is $P[y_i \mid x_i] = \prod_{j=0}^{M} P[y_i = j \mid x_i]^{d_{ij}}$,

c. with $d_{ij}$ a dummy equal to 1 if $y_i$ is $j$.

d. The log likelihood is given by $\ln L = \sum_{i=1}^{N} \sum_{j=0}^{M} d_{ij} \ln P[y_i = j \mid x_i]$ (implemented in the sketch below).

e. And the first-order condition is

$\frac{\partial \ln L}{\partial \beta_j} = \sum_{i=1}^{N} \left( d_{ij} - P[y_i = j \mid x_i] \right) x_i = 0, \qquad j = 1, \ldots, M.$

f. This simple derivative is a consequence of the use of the logit function.

g. Chapters 21 and 22 in Greene are very good on limited dependent variables.
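Finally, a sketch of the unordered multinomial logit of section 10 on simulated data (all names and numbers hypothetical), with the base category's $\beta_0$ normalised to zero as in 10.b. It builds the probabilities, the dummies $d_{ij}$ of 10.c, and the log-likelihood of 10.d, then maximises by iteration.

```python
# Sketch: multinomial logit of section 10, with beta_0 normalised to zero.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, k, M = 1_500, 2, 2                          # M + 1 = 3 categories: j = 0, 1, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
B_true = np.array([[0.5, -1.0], [-0.5, 1.0]])  # beta_1, beta_2 (beta_0 = 0)

def probs(B):
    eta = np.column_stack([np.zeros(n), X @ B.T])    # index 0 for base category
    e = np.exp(eta - eta.max(axis=1, keepdims=True)) # stabilised softmax
    return e / e.sum(axis=1, keepdims=True)

y = np.array([rng.choice(M + 1, p=row) for row in probs(B_true)])
D = np.eye(M + 1)[y]                           # dummies d_ij as in 10.c

def neg_loglik(b):
    P = np.clip(probs(b.reshape(M, k)), 1e-12, None)
    return -np.sum(D * np.log(P))              # -ln L = -sum_i sum_j d_ij ln P_ij

B_hat = minimize(neg_loglik, x0=np.zeros(M * k), method="BFGS").x.reshape(M, k)
print("B-hat:\n", B_hat.round(2), "\nB-true:\n", B_true)
```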