
1 Ordinary Least Squares (OLS)

1.1 Single Linear Regression Model

assumptions of the Classical Linear Regression Model (CLRM)
  (i) true relationship: $y_i = \alpha + \beta x_i + \varepsilon_i$, $i = 1, \dots, N$, where
      $\alpha, \beta$ = population parameters
      $\hat{\alpha}, \hat{\beta}$ = parameter estimates
      $\varepsilon_i$ = idiosyncratic error term (reflects randomness, unobserved factors)
      $\hat{\varepsilon}_i$ = estimated residual
  (ii) $E[\varepsilon_i] = 0$: error is mean zero
  (iii) $E[\varepsilon_i^2] = \sigma^2 \ \forall i$: equal to the variance of $\varepsilon$ given (ii); variance identical across observations
  (iv) $E[\varepsilon_i \varepsilon_j] = 0 \ \forall i \neq j$: zero covariance between the errors of any two observations

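  (v) $E[x_i \varepsilon_i] = 0$: x is uncorrelated with the error
  (vi) $\varepsilon_i \sim N(0, \sigma^2)$: errors distributed normally; not needed for unbiasedness or consistency, only for std errors

estimation
  given a random sample $\{y_i, x_i\}_{i=1}^N$, OLS minimizes the sum of the squared residuals
    $$\hat{\alpha}, \hat{\beta} = \arg\min_{\alpha, \beta} \sum_{i=1}^N (y_i - \alpha - \beta x_i)^2$$
  solution implies
    $$\hat{\beta}_{OLS} = \frac{Cov(y_i, x_i)}{Var(x_i)} = \frac{\sum_{i=1}^N (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^N (x_i - \bar{x})^2} = \frac{\sum_{i=1}^N y_i (x_i - \bar{x})}{\sum_{i=1}^N x_i (x_i - \bar{x})}$$
    $$\hat{\alpha}_{OLS} = \bar{y} - \hat{\beta}_{OLS}\, \bar{x}$$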
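The closed-form solution for the single-regressor case can be checked numerically. The sketch below is only an illustration: the data values and the helper name `ols_simple` are assumptions made up for this example, not part of the lecture.

```python
# Hypothetical toy data, chosen only to illustrate the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def ols_simple(x, y):
    """Return (alpha_hat, beta_hat) for y_i = alpha + beta * x_i + e_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # beta_hat = sum (y_i - y_bar)(x_i - x_bar) / sum (x_i - x_bar)^2
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    # alpha_hat = y_bar - beta_hat * x_bar
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

alpha_hat, beta_hat = ols_simple(x, y)
# OLS residuals sum to zero and are orthogonal to x by construction.
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
print(alpha_hat, beta_hat)
```

The two first-order conditions of the minimization problem show up as the residuals summing to zero and being orthogonal to x.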

properties
  $\hat{\alpha}, \hat{\beta}$ are unbiased (finite sample property), consistent (asymptotic property)
  $\hat{\alpha}, \hat{\beta}$ are efficient
    $$Var(\hat{\beta}) = \frac{\sigma^2}{\sum_{i=1}^N (x_i - \bar{x})^2} = \frac{\sigma^2}{N\, Var(x)}$$
    smallest variance of any linear, unbiased estimator (Gauss-Markov Theorem), where linear estimators are those in the class $\tilde{\beta} = \sum_i \omega_i y_i$ and $\omega_i$ is some weighting function

1.2 Multiple Regression Model

assumptions
  (i) true relationship
    $$y_i = \alpha + \sum_{k=1}^K \beta_k x_{ki} + \varepsilon_i = \alpha + \underbrace{x_i}_{1 \times K \text{ vector}} \underbrace{\beta}_{K \times 1 \text{ vector}} + \varepsilon_i, \quad i = 1, \dots, N$$
      where K = # of independent variables (also known as regressors or covariates)
  (ii) $E[x_{ki} \varepsilon_i] = 0 \ \forall k$
  (iii) x's are linearly independent (no perfect multicollinearity)
  other assumptions follow from CLRM

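estimation
  given a random sample $\{y_i, x_i\}_{i=1}^N$, OLS minimizes the sum of the squared residuals
    $$\hat{\alpha}, \hat{\beta} = \arg\min_{\alpha, \beta} \sum_{i=1}^N (y_i - \alpha - x_i \beta)^2$$
  solution implies
    $$\hat{\beta}_{OLS} = (x'x)^{-1}(x'y)$$
    $$\hat{\alpha}_{OLS} = \bar{y} - \bar{x}\, \hat{\beta}_{OLS}$$
    where x is an N×K matrix and y is an N×1 vector (measured in deviations from their sample means when $\alpha$ is recovered separately as above)
  estimators retain same properties as in CLRM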

2 Maximum Likelihood Estimation (MLE)

alternative estimation technique to OLS
  useful in nonlinear models
  equivalent to OLS in the classical linear regression model
intuition: y depends on x, $\theta$ (e.g., $\theta = \{\beta, \sigma\}$); choose $\hat{\theta}_{ML}$ to maximize the probability of the realized data $\{y_i, x_i\}_{i=1}^N$
the likelihood function gives the total probability of observing the realized data as a function of $\theta$

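general model
  $y_i = f(x_i, \beta) + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
  $\theta = \{\beta, \sigma\}$
  data = $\{y_i, x_i\}_{i=1}^N$
implies
  $$L(\theta) = \Pr(y_1, \dots, y_N \mid x_1, \dots, x_N, \theta) = \Pr(y_1 \mid x_1, \theta) \cdots \Pr(y_N \mid x_N, \theta) = \prod_i \Pr(y_i \mid x_i, \theta)$$
taking logs implies
  $$\ln[L(\theta)] = \sum_i \ln[\Pr(y_i \mid x_i, \theta)]$$
and
  $$\hat{\theta}_{ML} = \arg\max_\theta \ln[L(\theta)]$$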

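example: Classical Linear Regression Model
  $y_i = x_i \beta + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
what is $\Pr(y_i \mid x_i, \theta)$?
  $$\Pr(y_i \mid x_i, \theta) = \Pr(\varepsilon_i \mid x_i, \theta) = \Pr(y_i - x_i\beta \mid x_i, \theta) = \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{y_i - x_i\beta}{\sigma} \right)^2 \right\}}_{\text{NORMAL PDF}}$$
this implies
  $$\hat{\theta}_{ML} = \arg\max_\theta \sum_{i=1}^N \ln\left[ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{y_i - x_i\beta}{\sigma} \right)^2 \right\} \right]$$
  $$= \arg\max_\theta \; -\frac{N}{2}\ln(2\pi) - N \ln\sigma - \frac{1}{2} \sum_{i=1}^N \left( \frac{y_i - x_i\beta}{\sigma} \right)^2$$
which, with respect to $\beta$, is maximized by minimizing the sum of squared residuals (thus, identical to OLS)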

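alternative notation
  $$\Pr(y_i \mid x_i, \theta) = \frac{1}{\sigma} \phi\left( \frac{\varepsilon_i}{\sigma} \right)$$
  where $\phi(\cdot)$ is the std normal pdf, $1/\sigma$ is the Jacobian
this implies
  $$\hat{\theta}_{ML} = \arg\max_\theta \sum_{i=1}^N \ln\left[ \frac{1}{\sigma} \phi\left( \frac{\varepsilon_i}{\sigma} \right) \right] = \arg\max_\theta \; -N \ln\sigma + \sum_{i=1}^N \ln \phi\left( \frac{\varepsilon_i}{\sigma} \right)$$
which is algebraically equivalent to the above representation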

properties
  consistent: $\operatorname{plim} \hat{\theta}_{ML} = \theta$
  asymptotically normal: $\hat{\theta}_{ML} \overset{a}{\sim} N(\theta, \Sigma_\theta)$
  asymptotically efficient

hypothesis testing
  Wald test
    equivalent to F-test in OLS
    requires estimation of only the unrestricted model
  Likelihood ratio test
    intuition: imposing restrictions can never improve the value of the likelihood fn; the amount by which imposing the restriction(s) lowers the likelihood fn provides some indication of the validity of the restriction (the bigger the decrease, the more likely the restriction(s) do not hold)
    requires estimation of the unrestricted and restricted models
    test statistic
      $$LR = 2\left\{ \ln[L(\hat{\theta}_{UR})] - \ln[L(\hat{\theta}_R)] \right\} \ge 0$$
      where $LR \sim \chi^2_q$ and q = # of restrictions being tested
maximization typically by numerical methods as analytical derivatives are messy
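A minimal sketch of the likelihood ratio computation follows. The two log-likelihood values are hypothetical placeholders rather than results from any model in these notes, and the function `lr_test` is an illustrative helper. To stay within the standard library, the p-value is coded only for q = 1 and q = 2, where the chi-squared survival function has a simple closed form.

```python
import math

def lr_test(loglik_unrestricted, loglik_restricted, q):
    """Return (LR statistic, p-value) for q restrictions, q in {1, 2}."""
    # LR = 2 * { ln L(theta_UR) - ln L(theta_R) }, non-negative in theory.
    lr = max(2.0 * (loglik_unrestricted - loglik_restricted), 0.0)
    if q == 1:
        # P(chi2_1 > lr) = erfc(sqrt(lr / 2))
        p_value = math.erfc(math.sqrt(lr / 2.0))
    elif q == 2:
        # P(chi2_2 > lr) = exp(-lr / 2)
        p_value = math.exp(-lr / 2.0)
    else:
        raise ValueError("closed-form p-value coded only for q = 1 or q = 2")
    return lr, p_value

# Hypothetical maximized log-likelihood values.
lr_stat, p_q1 = lr_test(-100.0, -103.0, q=1)
_, p_q2 = lr_test(-100.0, -103.0, q=2)
print(lr_stat, p_q1, p_q2)
```

For a general number of restrictions, a statistics library's chi-squared distribution would normally be used instead of these special cases.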

3 Limited Dependent Variable (LDV) Models

class of models where y depends on x, but y is not continuous
common cases
  $y \in \{0, 1\}$: binary model
    estimation: probit, logit, LPM
  $y \in [a, b]$: censored model
    a, b are known
    a, b may vary by i
    if $a = -\infty$, $b = \infty$, then continuous
    estimation: censored regression, tobit
  $y \in \{0, 1, 2, \dots\}$: count model
    estimation: poisson, negative binomial

  $y \in \{0, 1, 2, \dots\}$: qualitative response (QR) model
    values correspond to choices with no natural ordering
    examples: brand choice, mode of transportation
    estimation: multinomial logit, multinomial probit, conditional logit, nested logit
  $y \in \{0, 1, 2, \dots\}$: ordered QR model
    values correspond to choices with natural ordering
    distance between values may vary between choices
    examples: OLF, PT, FT; <HS, HS, College, >College
    estimation: ordered logit, ordered probit

3.1 Binary Models

applicable to problems where the dependent variable is binary
examples: LFP, default on a loan, belong to a FTA, etc.
Linear Probability Model (LPM)
  $y_i = x_i\beta + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma_i^2)$
  estimated by OLS
  problems
    heteroskedasticity, since
      $\varepsilon_i = -x_i\beta$ if $y_i = 0$
      $\varepsilon_i = 1 - x_i\beta$ if $y_i = 1$
    implies
      $Var(\varepsilon_i \mid x_i) = x_i\beta(1 - x_i\beta)$
    predictions are not bounded by 0 and 1: $x_i\hat{\beta} \notin [0, 1]$ is possible, so they do not correspond to probabilities

solution: model $\Pr(y_i = 1 \mid x_i)$ using proper functional form
  $\Pr(y_i = 1 \mid x_i) = F(x_i\beta)$
  where $F(\cdot)$ satisfies
    $\lim_{x_i\beta \to \infty} F(x_i\beta) = 1$
    $\lim_{x_i\beta \to -\infty} F(x_i\beta) = 0$
  obvious candidates are CDFs, since these map numbers from the entire real number line to the unit interval
two parametric solutions
  $F(\cdot) = \Phi(\cdot)$, the std normal CDF: $\Phi(x_i\beta) = \int_{-\infty}^{x_i\beta} \phi(u)\,du$ (probit)
  $F(\cdot) = \Lambda(\cdot)$, the logistic CDF: $\Lambda(x_i\beta) = \dfrac{\exp(x_i\beta)}{1 + \exp(x_i\beta)}$ (logit)

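interpretation of $\beta$
  $$\frac{\partial \Pr(y_i = 1)}{\partial x_j} = \underbrace{\frac{\partial F(x_i\beta)}{\partial (x_i\beta)}\, \beta_j}_{\text{Marginal Effect}}$$
  where
    $\partial F(x_i\beta) / \partial (x_i\beta) = \phi(x_i\beta)$ (probit)
    $\partial F(x_i\beta) / \partial (x_i\beta) = \Lambda(x_i\beta)[1 - \Lambda(x_i\beta)]$ (logit)
notes
  since $\beta_j$ is more difficult to interpret (it's not the change in the probability given a unit change in x), typically report marginal effects
  marginal effects are observation-specific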

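common reporting options
  (i) marginal effects evaluated at the sample mean
    $\phi(\bar{x}\hat{\beta})\,\hat{\beta}_j$ OR $\Lambda(\bar{x}\hat{\beta})[1 - \Lambda(\bar{x}\hat{\beta})]\,\hat{\beta}_j$
  (ii) the sample mean of the marginal effects
    $\frac{1}{N}\sum_{i=1}^N \phi(x_i\hat{\beta})\,\hat{\beta}_j$ OR $\frac{1}{N}\sum_{i=1}^N \Lambda(x_i\hat{\beta})[1 - \Lambda(x_i\hat{\beta})]\,\hat{\beta}_j$
  (iii) marginal effects evaluated at some combination of values of interest
    $\phi(x^o\hat{\beta})\,\hat{\beta}_j$ OR $\Lambda(x^o\hat{\beta})[1 - \Lambda(x^o\hat{\beta})]\,\hat{\beta}_j$
    where $x^o$ is a vector of values of interest (e.g., white, male, 40 years old, with a college diploma)
caution needed for marginal effects of dummy vars, interaction effects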
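Options (i) and (ii) generally give different numbers, which a short sketch can make concrete. The regressor matrix, the coefficient vector and the function name `marginal_effects` below are all hypothetical and chosen only for illustration; in practice the coefficients would come from a fitted probit or logit.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def marginal_effects(X, beta, j, model="probit"):
    """Return (ME at the sample mean, sample mean of the MEs) for regressor j."""
    def scale(index):
        # dF/d(x*beta): phi(.) for probit, Lambda(.)[1 - Lambda(.)] for logit
        if model == "probit":
            return norm_pdf(index)
        lam = logistic_cdf(index)
        return lam * (1.0 - lam)

    n, k = len(X), len(beta)
    x_bar = [sum(row[c] for row in X) / n for c in range(k)]
    index_at_mean = sum(b * v for b, v in zip(beta, x_bar))
    me_at_mean = scale(index_at_mean) * beta[j]
    avg_me = sum(scale(sum(b * v for b, v in zip(beta, row))) for row in X) / n * beta[j]
    return me_at_mean, avg_me

# Hypothetical data: a constant and one continuous regressor.
X = [[1.0, 0.2], [1.0, 1.0], [1.0, 1.8], [1.0, 2.6], [1.0, 3.4]]
beta_hat = [-0.5, 0.8]  # assumed estimates, not taken from the notes

me_mean_logit, ame_logit = marginal_effects(X, beta_hat, 1, model="logit")
me_mean_probit, ame_probit = marginal_effects(X, beta_hat, 1, model="probit")
print(me_mean_logit, ame_logit, me_mean_probit, ame_probit)
```

Option (iii) corresponds to calling the same scale function at a chosen vector of values rather than at the sample mean.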

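estimation by MLE
  $$\ln[L(\beta)] = \sum_i \ln[\Pr(y_i \mid x_i, \beta)]$$
  where
    $\Pr(y_i \mid x_i, \beta) = F(x_i\beta)$ if $y_i = 1$
    $\Pr(y_i \mid x_i, \beta) = 1 - F(x_i\beta)$ if $y_i = 0$
  which implies
    $$\ln[L(\beta)] = \sum_{i: y_i = 1} \ln[F(x_i\beta)] + \sum_{i: y_i = 0} \ln[1 - F(x_i\beta)] = \sum_i \ln\left\{ F(x_i\beta)^{y_i} \left[1 - F(x_i\beta)\right]^{1 - y_i} \right\}$$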
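As a rough illustration of the numerical maximization, the sketch below fits a logit with a constant and one regressor by Newton-Raphson. The data are invented, and the function names and the fixed iteration count are assumptions made for this example; it is not the routine used by any particular package.

```python
import math

def logit_loglik(beta, x, y):
    """ln L = sum_i { y_i ln F(x_i b) + (1 - y_i) ln[1 - F(x_i b)] }, F logistic."""
    a, b = beta
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

def logit_newton(x, y, iters=25):
    """Maximize the logit log-likelihood for a constant and one regressor."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            g0 += yi - p                 # score w.r.t. the constant
            g1 += (yi - p) * xi          # score w.r.t. the slope
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi           # entries of X'WX (minus the Hessian)
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} * score, using the 2x2 inverse
        a += (h11 * g0 - h01 * g1) / det
        b += (-h01 * g0 + h00 * g1) / det
    return a, b

# Hypothetical data (not perfectly separated, so the MLE exists).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
a_hat, b_hat = logit_newton(x, y)
print(a_hat, b_hat, logit_loglik((a_hat, b_hat), x, y))
```

Because the logit log-likelihood is globally concave, Newton's method from a zero starting value is usually adequate for small, well-behaved problems like this one.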

latent variable framework
  the probit/logit model can be recast in a latent (unobserved) variable framework
  model
    $y_i^* = x_i\beta + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N/L$ (normal or logistic)
    $y_i = 1$ if $y_i^* > 0$
    $y_i = 0$ if $y_i^* \le 0$
    $y_i^*$ unobserved, $y_i$ observed
  given data $\{y_i, x_i\}_{i=1}^N$, estimate $\beta$ via MLE
  again, what is $\Pr(y_i \mid x_i, \theta)$?
    if $y_i = 1$: $\Pr(y_i^* > 0 \mid x_i, \theta) = \Pr(\varepsilon_i > -x_i\beta) = \Pr(\varepsilon_i < x_i\beta) = F\left(\frac{x_i\beta}{\sigma}\right)$
    if $y_i = 0$: $\Pr(y_i^* \le 0 \mid x_i, \theta) = \Pr(\varepsilon_i \le -x_i\beta) = F\left(\frac{-x_i\beta}{\sigma}\right) = 1 - F\left(\frac{x_i\beta}{\sigma}\right)$
  which implies identical form of $L(\theta)$
  note, $\sigma$ is not identified since it would appear everywhere as $\beta/\sigma$, so it is normalized to one
STATA: -probit-, -logit-, -dprobit-, -margin-, -mfx-

3.2 Censored Regression Models

applicable to problems where the dependent variable is censored (potentially from above and below) at certain thresholds
note, while the dependent variable is censored, the x's are always observed
examples: income/wealth may be top-coded, age at first birth for nonmothers, duration of unemployment spells for currently unemployed
latent variable setup
  $y_i^* = x_i\beta + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
  $y_i = b_i$ if $y_i^* \ge b_i$
  $y_i = y_i^*$ if $y_i^* \in (a_i, b_i)$
  $y_i = a_i$ if $y_i^* \le a_i$
  $y_i^*$ unobserved, $y_i$ observed
terminology
  right-censoring: if $b_i < \infty$
  left-censoring: if $a_i > -\infty$
given data $\{y_i, x_i\}_{i=1}^N$, estimate $\beta$ via MLE

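again, what is $\Pr(y_i \mid x_i, \theta)$?
  if $y_i = b_i$: $\Pr(y_i^* \ge b_i \mid x_i, \theta) = \Pr(\varepsilon_i \ge b_i - x_i\beta) = 1 - \Pr(\varepsilon_i < b_i - x_i\beta) = 1 - \Phi\left(\frac{b_i - x_i\beta}{\sigma}\right)$
  if $y_i \in (a_i, b_i)$: $\Pr(y_i \mid x_i, \theta) = \frac{1}{\sigma}\phi\left(\frac{y_i - x_i\beta}{\sigma}\right)$
  if $y_i = a_i$: $\Pr(y_i^* \le a_i \mid x_i, \theta) = \Pr(\varepsilon_i \le a_i - x_i\beta) = \Phi\left(\frac{a_i - x_i\beta}{\sigma}\right)$
log likelihood fn given by
  $$\ln[L(\theta)] = \sum_{i: y_i = a_i} \ln\left[\Phi\left(\frac{a_i - x_i\beta}{\sigma}\right)\right] + \sum_{i: y_i = b_i} \ln\left[1 - \Phi\left(\frac{b_i - x_i\beta}{\sigma}\right)\right] + \sum_{i: y_i \in (a_i, b_i)} \ln\left[\frac{1}{\sigma}\phi\left(\frac{y_i - x_i\beta}{\sigma}\right)\right]$$
notes
  interpretation of $\beta$ is the impact of a change in x on $y^*$ (the latent variable)
  $\sigma$ is identified as long as $y_i \in (a_i, b_i)$ for some i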

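special case: tobit model
  $a_i = 0$, $b_i = \infty \ \forall i$
  examples: labor supply, R&D expenditures by a firm
  implies the following setup
    $y_i^* = x_i\beta + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
    $y_i = y_i^*$ if $y_i^* > 0$
    $y_i = 0$ if $y_i^* \le 0$
    $y_i^*$ unobserved, $y_i$ observed
  estimate via MLE, where log likelihood fn given by
    $$\ln[L(\theta)] = \underbrace{\sum_{i: y_i = 0} \ln\left[\Phi\left(\frac{-x_i\beta}{\sigma}\right)\right]}_{\text{as in probit}} + \underbrace{\sum_{i: y_i > 0} \ln\left[\frac{1}{\sigma}\phi\left(\frac{y_i - x_i\beta}{\sigma}\right)\right]}_{\text{as in CLRM}}$$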
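The two pieces of the tobit log likelihood can be written out directly. The sketch below only evaluates the function at given parameter values (it does not maximize it); the data, the parameter values and the name `tobit_loglik` are hypothetical, and a constant plus one regressor is assumed.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(alpha, beta, sigma, x, y):
    """Tobit log likelihood with censoring from below at zero."""
    ll = 0.0
    for xi, yi in zip(x, y):
        xb = alpha + beta * xi
        if yi <= 0:
            # censored observation: ln Phi(-x_i b / sigma), as in probit
            ll += math.log(norm_cdf(-xb / sigma))
        else:
            # uncensored observation: ln[(1/sigma) phi((y_i - x_i b)/sigma)], as in CLRM
            ll += math.log(norm_pdf((yi - xb) / sigma) / sigma)
    return ll

# Hypothetical data with two observations censored at zero.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 0.0, 1.5, 4.2, 6.1]

ll_plausible = tobit_loglik(-2.0, 2.0, 1.0, x, y)
ll_poor = tobit_loglik(0.0, 0.0, 1.0, x, y)
print(ll_plausible, ll_poor)
```

A numerical optimizer would then search over (alpha, beta, sigma), with sigma kept positive, for the values that maximize this function.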

interpretation of $\beta$ is the impact of a change in x on $y^*$ (the latent variable)
  not directly comparable to OLS, which estimates $\partial E[y_i \mid x_i] / \partial x_j$
marginal effects (for comparison to OLS)
  $$\frac{\partial E[y_i \mid x_i]}{\partial x_j} = \beta_j\, \Phi\left(\frac{x_i\beta}{\sigma}\right)$$
  these are observation-specific
example
  [Figure: Tobit Illustration with One Regressor. Horizontal axis: x. Series shown: OLS, Data Points, Tobit. True values: alpha= 2, beta=2.]
STATA: -tobit-, -cnreg-

3.3 Count Models

applicable to situations where the dependent variable is a non-negative integer count of events
dependent variable typically takes on only a few values
examples: # of children, # of patents held by a firm, # of doctor visits per year, # of cigarettes smoked per day

3.3.1 Poisson Model

setup
  model the expected number of events conditional on x
    $E[y_i \mid x_i] = F(x_i\beta)$ where $F(\cdot) \ge 0$
  functional form: $F(\cdot) = \exp\{x_i\beta\}$
estimation via MLE
  need a distribution for $y_i$, which cannot be normal
  assume Poisson distribution, which depends only on the mean, given by $\exp\{x_i\beta\}$

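again, what is $\Pr(y_i \mid x_i, \theta)$?
  $$\Pr(y_i \mid x_i, \theta) = \frac{\exp\{-\lambda_i\}\, \lambda_i^{y_i}}{y_i!} \quad \text{(Poisson pdf)}$$
  where $\lambda_i = \exp\{x_i\beta\}$
  can show that $E[y_i \mid x_i] = Var(y_i \mid x_i) = \lambda_i$
log likelihood fn given by
  $$\ln[L(\theta)] = \sum_i \left[ -\lambda_i + y_i \ln\lambda_i - \ln(y_i!) \right]$$
interpretation of $\beta$
  as if ln(y) is the dependent variable; implies %Δ, or elasticity if ln(x)
marginal effects
  $$\frac{\partial E[y_i \mid x_i]}{\partial x_i} = \lambda_i \beta$$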
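The Poisson log likelihood is simple to code. As a sketch, the example below uses a constant-only model, for which the maximizer is known analytically (the fitted mean equals the sample mean, so the constant equals the log of the sample mean); the count data and the function name are hypothetical.

```python
import math

def poisson_loglik(beta, X, y):
    """ln L = sum_i [ -lambda_i + y_i ln(lambda_i) - ln(y_i!) ], lambda_i = exp(x_i b)."""
    ll = 0.0
    for row, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, row))
        lam = math.exp(xb)
        # math.lgamma(y + 1) equals ln(y!)
        ll += -lam + yi * xb - math.lgamma(yi + 1)
    return ll

# Hypothetical counts and a constant-only design matrix.
y = [0, 1, 1, 2, 3, 5, 8, 0, 2, 3]
X = [[1.0] for _ in y]

beta0_mle = math.log(sum(y) / len(y))   # closed-form MLE in the constant-only case
ll_at_mle = poisson_loglik([beta0_mle], X, y)
ll_below = poisson_loglik([beta0_mle - 0.1], X, y)
ll_above = poisson_loglik([beta0_mle + 0.1], X, y)
print(beta0_mle, ll_at_mle, ll_below, ll_above)
```

With regressors there is no closed form, and the same function would be handed to a numerical optimizer.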

shortcoming
  Poisson assumes the mean and variance are identical
  regression-based test (Cameron & Trivedi 1990)
    $H_0: Var(y_i) = E[y_i]$
    $H_1: Var(y_i) = E[y_i] + \alpha\, g(E[y_i])$
    regress
      $$z_i = \frac{(y_i - \hat{\lambda}_i)^2 - y_i}{\hat{\lambda}_i \sqrt{2}}$$
    on either (i) a constant or (ii) $\hat{\lambda}_i$ and no constant; the t-statistic is a test of the null hypothesis

Wooldridge proposes a correction to the standard errors when there is over- or under-dispersion
  assumes proportionality
    $Var(y \mid x) = \sigma^2 E[y \mid x]$
    where $\sigma^2$ is unknown
    $\sigma^2 > 1 \Rightarrow$ overdispersion; $\sigma^2 < 1 \Rightarrow$ under-dispersion
  define $\hat{u}_i = y_i - \hat{\lambda}_i$
  obtain
    $$\hat{\sigma}^2 = \frac{1}{n - k - 1} \sum_i \frac{\hat{u}_i^2}{\hat{\lambda}_i}$$
  multiply usual standard errors by $\hat{\sigma} = \sqrt{\hat{\sigma}^2} > 0$
alternative estimation: negative binomial model
STATA: -poisson-, -nbreg-
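The dispersion estimate is a one-line calculation once fitted means are available. In this sketch the counts are hypothetical and the fitted means come from a constant-only Poisson model (so every fitted mean equals the sample mean and k = 0 regressors besides the constant); the function name is an assumption for illustration.

```python
import math

def dispersion_estimate(y, lam_hat, k):
    """sigma2_hat = (1 / (n - k - 1)) * sum_i u_hat_i^2 / lam_hat_i."""
    n = len(y)
    return sum((yi - li) ** 2 / li for yi, li in zip(y, lam_hat)) / (n - k - 1)

# Hypothetical counts; fitted means from a constant-only Poisson model.
y = [0, 1, 1, 2, 3, 5, 8, 0, 2, 3]
y_bar = sum(y) / len(y)
lam_hat = [y_bar] * len(y)

sigma2_hat = dispersion_estimate(y, lam_hat, k=0)
se_scale = math.sqrt(sigma2_hat)   # factor applied to the usual standard errors
print(sigma2_hat, se_scale)
```

A value of `sigma2_hat` above one, as here, points to overdispersion, so the usual Poisson standard errors would be scaled up by `se_scale`.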

3.3.2 Zero-Inflated Poisson Model (skip)

applicable to count models with a mass at zero, or where the decision between zero and some positive count is different than the decision among positive counts
notation
  observations assumed to be drawn from either of two regimes
  observations in regime 1, given by $z_i = 1$, always have a count of zero
  outcomes for observations in regime 2, given by $z_i = 2$, follow a Poisson process, with possible outcomes given by $y_i \ge 0$

MLE set-up entails
  $\Pr(y = 0 \mid x) = \Pr(z = 1 \mid x) + \Pr(y = 0 \mid x, z = 2) \Pr(z = 2 \mid x)$
  $\Pr(y = j \mid x) = \Pr(y = j \mid x, z = 2) \Pr(z = 2 \mid x)$, $j = 1, 2, \dots$
log likelihood fn given by
  $$\ln[L(\theta)] = \sum_{i: y_i = 0} \ln\left\{ F(x_i\gamma) + [1 - F(x_i\gamma)] \exp(-\lambda_i) \right\} + \sum_{i: y_i > 0} \left\{ \ln[1 - F(x_i\gamma)] - \lambda_i + y_i \ln\lambda_i - \ln(y_i!) \right\}$$
  where $F(\cdot)$ is either the logistic or normal CDF
notes
  as in the poisson model, $\lambda_i = \exp\{x_i\beta\}$
  $\beta$ is not constrained to be equal to $\gamma$
  some elements of $\beta$ or $\gamma$ may be constrained to be zero (i.e., the regressors in the two parts of the model need not be the same)
STATA: -zip-, -zinb-

3.3.3 Application: Talley et al. (2005)

question: determinants of crew injuries in ship accidents
data structure
  sample includes accidents investigated by US Coast Guard
    includes US ships anywhere, and foreign ships in US waters
  outcomes
    # deaths, # injuries, # missing
    small, non-negative integer count data
  lots of covariates; mix of discrete and continuous
  data over 11 years, 1991-2001
model
  a bit structural, then arrives at reduced form eqn (5)
  count data ⇒ Poisson, negative binomial
  separate model by dependent variable and type of ship (freight, tanker, tugboat)
results: interpreted as %Δ, also report marginal effects

shortcoming
  non-random sample selection; only accidents included
  does not account for impact of x's on probability of an accident, only severity conditional on an accident occurring

3.4 Qualitative Response Models

applicable to analyses of choice by agents among many (unordered) alternatives
  e.g., brand choice, mode of transportation, type of mortgage (fixed-30 yr, fixed-15 yr, adjustable rate, ...), type of school (public, charter, private-nonreligious, private-religious, ...), occupation
dependent variable typically coded as (positive) integers corresponding to specific choice; value/order of actual numbers is irrelevant

3.4.1 Multinomial Logit

setup
  choose among J + 1 alternatives: $y_i \in \{0, 1, \dots, J\}$
  x's are observation-specific attributes, not choice-specific attributes
again, what is $\Pr(y_i = j \mid x_i, \theta)$, $j = 0, 1, \dots, J$?
  $\Pr(y_i = j \mid x_i, \theta) = F(x_i, \theta)$
  $F(\cdot)$ should satisfy two criteria
    $F(\cdot) \in (0, 1)$
    $\sum_j \Pr(y_i = j \mid x_i, \theta) = 1 \ \forall i$
  functional form of $F(\cdot)$
    $$\Pr(y_i = j \mid x_i, \theta) = \frac{\exp(x_i\beta_j)}{\sum_{k=0}^J \exp(x_i\beta_k)}$$
    known as multinomial logit

notes
  $\beta$'s are choice-specific (subscripted by j)
  MLE based on the above model is not identified, since $\beta_j^* = \beta_j + c$ leaves $L(\theta)$ unchanged; implies an infinite # of solutions
  to identify the model, normalize $\beta_0 = 0$
  identification also requires that the x's be observation-specific, not choice-specific; cannot identify choice-specific effects of choice-specific variables
likelihood function
  $$\ln[L(\theta)] = \sum_{i: y_i = 0} \ln\left( \frac{1}{1 + \sum_{k=1}^J \exp(x_i\beta_k)} \right) + \sum_{i: y_i \neq 0} \sum_{j=1}^J d_{ij} \ln\left( \frac{\exp(x_i\beta_j)}{1 + \sum_{k=1}^J \exp(x_i\beta_k)} \right)$$
  where $d_{ij} = 1$ if $y_i = j$, 0 otherwise
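The identification problem can be seen numerically: adding the same vector c to every choice's coefficients leaves all choice probabilities unchanged. The sketch below uses hypothetical covariates and coefficients, and the function name `mnl_probs` is an illustrative assumption.

```python
import math

def mnl_probs(x_i, betas):
    """Multinomial logit probabilities for choices 0..J.

    betas is a list of J + 1 coefficient vectors; under the usual
    normalization the first one (the base choice) is all zeros.
    """
    v = [sum(b * xv for b, xv in zip(beta_j, x_i)) for beta_j in betas]
    v_max = max(v)                       # subtract the max for numerical stability
    e = [math.exp(vj - v_max) for vj in v]
    total = sum(e)
    return [ej / total for ej in e]

# Hypothetical observation (constant plus one attribute) and coefficients.
x_i = [1.0, 0.7]
betas = [[0.0, 0.0], [0.4, -1.0], [-0.2, 0.5]]   # beta_0 normalized to zero

probs = mnl_probs(x_i, betas)

# Shift every beta_j by the same vector c: probabilities do not change.
c = [3.0, -2.0]
shifted = [[b + cv for b, cv in zip(beta_j, c)] for beta_j in betas]
probs_shifted = mnl_probs(x_i, shifted)
print(probs, probs_shifted)
```

Since the shifted and unshifted coefficients fit the data equally well, fixing $\beta_0 = 0$ is what pins down a unique solution.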

interpretation
  log odds ratio (relative to base choice)
    $$\ln\left( \frac{P_{ij}}{P_{i0}} \right) = x_i\beta_j$$
    where $P_{ij} = \Pr(y_i = j)$
    implies $\beta_j$ is the % change in the odds ratio from a unit change in x
  log odds ratio (relative to any other choice)
    $$\ln\left( \frac{P_{ij}}{P_{ij'}} \right) = x_i(\beta_j - \beta_{j'})$$

shortcoming: Independence of Irrelevant Alternatives (IIA)
  due to assumption of independent, homoskedastic errors
  expressed as
    $$\ln\left( \frac{P_{ij}}{P_{ij'}} \right) \neq f(P_{ik}), \quad k \neq j, j'$$
  Hausman test
    estimate full model ⇒ $\hat{\beta}_j$, $j = 1, \dots, J$
    estimate partial model on only a subset of choices ⇒ $\tilde{\beta}_j$, $j = 1, \dots, J_P$, where $J_P < J$
    if IIA holds: $\tilde{\beta}_j \approx \hat{\beta}_j$, $j = 1, \dots, J_P$
    thus, if the partial model and full model yield estimates that are too different, then we reject IIA
  multinomial probit does not impose IIA, but estimation is more complex since it involves multiple integration
  nested logit is also a possible alternative

random utility model (RUM)
  model can be derived from a RUM
  utility of observation i from choice j given by
    $U_{ij} = V_{ij} + \varepsilon_{ij} = x_i\beta_j + \varepsilon_{ij}$
    where $\varepsilon$ is iid from a Type I extreme value dbn, with $G(\varepsilon) = \exp(-e^{-\varepsilon})$
  observations choose to maximize utility
  implies
    $$\Pr(y_i = j) = \Pr(U_{ij} \ge U_{ij'} \ \forall j' \neq j) = \frac{\exp(V_{ij})}{\sum_{k=0}^J \exp(V_{ik})}$$
    which is identical to above
STATA: -mlogit-, -mprobit-

3.4.2 Conditional Logit

multinomial logit does not identify effects of choice-specific x's
usual model with such data is a conditional logit
results analogous to many pairwise binary logit models
MLE
  again, what is $\Pr(y_i = j \mid x_j, \theta)$, $j = 0, 1, \dots, J$?
    $\Pr(y_i = j \mid x_j, \theta) = F(x_j, \theta)$
  $F(\cdot)$ should satisfy two criteria
    $F(\cdot) \in (0, 1)$
    $\sum_j \Pr(y_i = j \mid x_j, \theta) = 1 \ \forall i$
  functional form of $F(\cdot)$
    $$\Pr(y_i = j \mid x_j, \theta) = \frac{\exp(x_j\beta)}{\sum_{k=0}^J \exp(x_k\beta)}$$
    known as conditional logit
  likelihood function
    $$\ln[L(\theta)] = \sum_i \sum_j d_{ij} \ln\left( \frac{\exp(x_j\beta)}{\sum_{k=0}^J \exp(x_k\beta)} \right)$$

log odds ratio
  $$\ln\left( \frac{P_{ij}}{P_{ij'}} \right) = (x_j - x_{j'})\beta$$
  which still implies IIA
interpretation
  marginal effects: own continuous attributes
    $$\frac{\partial \Pr(y_i = j)}{\partial x_j} = \left\{ \Pr(y_i = j)\left[1 - \Pr(y_i = j)\right] \right\} \beta$$
  marginal effects: other continuous attributes
    $$\frac{\partial \Pr(y_i = j)}{\partial x_{j'}} = \left\{ -\Pr(y_i = j) \Pr(y_i = j') \right\} \beta, \quad j' \neq j$$
notes
  model allows attributes of all other choices to affect the probability of each option being chosen
  however, the two derivatives must necessarily be of opposite sign
    may not be a good property (e.g., FDI is increasing in own market size, and market size of neighbors)
  can similarly be derived from a random utility model
STATA: -clogit-

3.4.3 Nested Logit (skip)

reframes the problem as a sequential, multi-stage decision
  e.g., a business may first decide on a state in which to build a new plant, and then choose a county within the state
notation
  J = total number of potential choices (e.g., the number of counties in the US)
  L = number of subgroups (e.g., the number of states in the US)
  $J_l$ = number of choices within subgroup l, $l = 1, \dots, L$ (e.g., the number of counties in state l)
  implies the following identity: $J = \sum_{l=1}^L J_l$
  $x_{j|l}$ = attributes of choice j in subgroup l
  $z_l$ = attributes of subgroup l

MLE
  probability of a given choice given by
    $$\Pr(y_i = j, l) = \frac{\exp(x_{j|l}\beta + z_l\gamma)}{\sum_{h=1}^L \sum_{k=1}^{J_h} \exp(x_{k|h}\beta + z_h\gamma)}$$
    which is the unconditional probability
  the unconditional probability is equal to the conditional times the marginal probability
    $$\Pr(y_i = j, l) = \Pr(y_i = j \mid l) \Pr(y_i = l) = \frac{\exp(x_{j|l}\beta)}{\sum_{k=1}^{J_l} \exp(x_{k|l}\beta)} \cdot \frac{\exp(z_l\gamma) \sum_{k=1}^{J_l} \exp(x_{k|l}\beta)}{\sum_{h=1}^L \sum_{k=1}^{J_h} \exp(x_{k|h}\beta + z_h\gamma)}$$
  define the inclusive value as
    $$I_l = \ln\left[ \sum_{k=1}^{J_l} \exp(x_{k|l}\beta) \right]$$
  re-arranging, we get
    $$\Pr(y_i = j, l) = \Pr(y_i = j \mid l) \Pr(y_i = l) = \frac{\exp(x_{j|l}\beta)}{\sum_{k=1}^{J_l} \exp(x_{k|l}\beta)} \cdot \frac{\exp(z_l\gamma + \tau_l I_l)}{\sum_{h=1}^L \exp(z_h\gamma + \tau_h I_h)}$$
  restricting $\tau_l = 1$, $l = 1, \dots, L$, yields the conditional logit model

error term is homoskedastic within each subgroup, heteroskedastic across subgroups
log odds ratio
  any two choices
    $$\ln\left( \frac{\Pr(y_i = j, l)}{\Pr(y_i = j', l')} \right) = \ln\left( \frac{\exp(x_{j|l}\beta) \exp(z_l\gamma + \tau_l I_l) \big/ \sum_{k=1}^{J_l} \exp(x_{k|l}\beta)}{\exp(x_{j'|l'}\beta) \exp(z_{l'}\gamma + \tau_{l'} I_{l'}) \big/ \sum_{k=1}^{J_{l'}} \exp(x_{k|l'}\beta)} \right)$$
    $$= (x_{j|l} - x_{j'|l'})\beta + (z_l - z_{l'})\gamma + (\tau_l - 1) I_l - (\tau_{l'} - 1) I_{l'}$$
    which incorporates all attributes of each subgroup l, l'
  any two choices within a subgroup
    $$\ln\left( \frac{\Pr(y_i = j, l)}{\Pr(y_i = j', l)} \right) = (x_{j|l} - x_{j'|l})\beta$$
    which implies IIA
  any two subgroups
    $$\ln\left( \frac{\Pr(y_i = l)}{\Pr(y_i = l')} \right) = (z_l - z_{l'})\gamma + (\tau_l I_l - \tau_{l'} I_{l'})$$
    which incorporates all attributes of each subgroup l, l'

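interpretation
  marginal effects: own continuous attributes
    $$\frac{\partial \Pr(y_i = j, l)}{\partial x_{j|l}} = \Pr(y_i = j, l) \left\{ \left[1 - \Pr(y_i = j \mid l)\right] + \tau_l \Pr(y_i = j \mid l) \left[1 - \Pr(y_i = l)\right] \right\} \beta$$
  marginal effects: continuous attributes from j', l
    $$\frac{\partial \Pr(y_i = j, l)}{\partial x_{j'|l}} = \Pr(y_i = j, l) \left\{ -\Pr(y_i = j' \mid l) + \tau_l \Pr(y_i = j' \mid l) \left[1 - \Pr(y_i = l)\right] \right\} \beta$$
    which may or may not be of the same sign as $\beta$
  marginal effects: continuous attributes from j', l'
    $$\frac{\partial \Pr(y_i = j, l)}{\partial x_{j'|l'}} = \Pr(y_i = j, l) \left\{ -\tau_{l'} \Pr(y_i = j' \mid l') \Pr(y_i = l') \right\} \beta$$
    which is of the opposite sign of $\beta$
STATA: -nlogit-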

3.4.4 Application: Co & List (2004)

question: analyze determinants of location choice of FDI in the US; in particular, are knowledge spillovers important?
data structure
  outcome is the state chosen by new foreign-owned plants
  plants entered the US between 1986 and 1993
  x's are state-specific attributes in 1986
  of particular interest is the x measuring knowledge spillovers, measured by patent counts
model: discrete choice data, with choice-specific x's ⇒ conditional logit model
results
  min, median, and max elasticity estimates reported
  elasticity given by
    $$\frac{\partial \ln(P_j)}{\partial \ln(x_{jm})} = x_{jm}\beta_m (1 - P_j)$$
  knowledge spillovers, unionization rates positively associated with probability of receiving a new plant
  energy expenditures, pollution costs negatively associated with probability of receiving a new plant

3.5 Ordered Response Models

applicable to analyses of choice by agents among many ordered alternatives
  e.g., labor force status (OLF, PT, FT), schooling (<HS, HS, some coll, BA/BS, MA, PhD), bond ratings (AAA, ..., D)
outcomes coded as discrete integers: 0, 1, 2, ...
relative to other models
  MNL, MNP fail to account for the ordinal nature of the data
  OLS treats distance between choices as identical (e.g., moving from OLF to PT is equivalent to moving from PT to FT)

latent variable framework
  similar to probit/logit framework
  model
    $y_i^* = x_i\beta + \varepsilon_i$, $\varepsilon_i \overset{iid}{\sim} N/L$
    $y_i = 0$ if $y_i^* \le 0$
    $y_i = 1$ if $y_i^* \in (0, \mu_1]$
    $y_i = 2$ if $y_i^* \in (\mu_1, \mu_2]$
    ...
    $y_i = J$ if $y_i^* > \mu_{J-1}$
    $y_i^*$ unobserved, $y_i$ observed
  where the $\mu$'s are cutoff parameters that are unknown and to be estimated
  given data $\{y_i, x_i\}_{i=1}^N$, estimate $\beta$ via MLE

again, what is $\Pr(y_i \mid x_i, \theta)$?
  if $y_i = 0$: $\Pr(y_i^* \le 0 \mid x_i, \theta) = \Pr(\varepsilon_i \le -x_i\beta) = F(-x_i\beta) = 1 - F(x_i\beta)$
  if $y_i = j$, $j = 1, \dots, J - 1$: $\Pr(\mu_{j-1} < y_i^* \le \mu_j \mid x_i, \theta) = \Pr(\varepsilon_i < \mu_j - x_i\beta) - \Pr(\varepsilon_i < \mu_{j-1} - x_i\beta) = F(\mu_j - x_i\beta) - F(\mu_{j-1} - x_i\beta)$
  if $y_i = J$: $\Pr(y_i^* > \mu_{J-1} \mid x_i, \theta) = \Pr(\varepsilon_i > \mu_{J-1} - x_i\beta) = 1 - \Pr(\varepsilon_i < \mu_{J-1} - x_i\beta) = 1 - F(\mu_{J-1} - x_i\beta)$
  where $\mu_0 = 0$ and $\sigma$ is normalized to one, and $0 < \mu_1 < \dots < \mu_{J-1}$
form of $F(\cdot)$ defines the ordered probit/logit models
interpretation
  marginal effects are outcome-specific
  in general, $\partial \Pr(y = j \mid x) / \partial x$ equals
    $-f(x\beta)\beta$ if $j = 0$
    $\left[ f(\mu_{j-1} - x\beta) - f(\mu_j - x\beta) \right]\beta$ if $j = 1, \dots, J - 1$
    $f(\mu_{J-1} - x\beta)\beta$ if $j = J$
  where $f(\cdot)$ is the corresponding pdf and $\mu_0 = 0$
STATA: -oprobit-, -ologit-
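The outcome probabilities for the ordered probit case can be computed directly from the cutoffs. The index value and cutoffs below are hypothetical, and the function name is an illustrative assumption; the code follows the normalization $\mu_0 = 0$, $\sigma = 1$.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, mus):
    """Probabilities of outcomes 0..J given the index x*beta.

    mus = [mu_1, ..., mu_{J-1}] must be increasing and positive; mu_0 = 0.
    """
    cuts = [0.0] + list(mus)
    cdf_vals = [norm_cdf(c - xb) for c in cuts]
    probs = [cdf_vals[0]]                                  # Pr(y = 0) = F(-x*beta)
    probs += [cdf_vals[j] - cdf_vals[j - 1]                # F(mu_j - xb) - F(mu_{j-1} - xb)
              for j in range(1, len(cuts))]
    probs.append(1.0 - cdf_vals[-1])                       # Pr(y = J) = 1 - F(mu_{J-1} - xb)
    return probs

# Hypothetical index and cutoffs for J = 3 (four ordered outcomes).
probs = ordered_probit_probs(xb=0.5, mus=[1.0, 2.0])
print(probs)
```

Replacing `norm_cdf` with the logistic CDF would give the ordered logit probabilities under the same cutoff structure.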

3.5.1 Application: Kalb & Williams (2003)

question: analyze the differential effects of attributes on juvenile criminal behavior by gender
data structure
  sample of individuals born in 1958 and residing in Philadelphia at some point between ages of 10 and 18
  complete criminal history
  retrospective data on other variables collected in 1988
  dependent variable = # of arrests (0, 1, or 2+)
  other variables represent family background variables
  issues with respect to sample weighting and missing data can be ignored for our purposes
model: ordered, integer data ⇒ ordered probit model

results
  coefficient estimates and marginal effects are reported
  results fairly similar across gender
    non-whites, those leaving school early are more likely to commit criminal acts
    no impact of past physical/sexual abuse
  some differences by gender
    maternal labor supply has no effect on boys, decreases probability of criminal behavior for girls
    father with HS diploma and fewer siblings have no effect on girls, decrease probability of criminal behavior for boys