The Geometry of Logit and Probit

Similar documents
Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Lecture Notes on Linear Regression

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Limited Dependent Variables

BIO Lab 2: TWO-LEVEL NORMAL MODELS with school children popularity data

APPENDIX A Some Linear Algebra

Marginal Effects in Probit Models: Interpretation and Testing. 1. Interpreting Probit Coefficients

Interval Estimation in the Classical Normal Linear Regression Model. 1. Introduction

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes

Linear Regression Analysis: Terminology and Notation

Basically, if you have a dummy dependent variable you will be estimating a probability.

Linear Approximation with Regularization and Moving Least Squares

10-701/ Machine Learning, Fall 2005 Homework 3

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Lecture 10 Support Vector Machines II

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

β0 + β1xi. You are interested in estimating the unknown parameters β

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

Lecture 6: Introduction to Linear Regression

Interpreting Slope Coefficients in Multiple Linear Regression Models: An Example

Econ Statistical Properties of the OLS estimator. Sanjaya DeSilva

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1

Mathematical Preparations

Lecture 3: Probability Distributions

The Second Anti-Mathima on Game Theory

Chapter 11: Simple Linear Regression and Correlation

Report on Image warping

Systems of Equations (SUR, GMM, and 3SLS)

Logistic Regression Maximum Likelihood Estimation

Formulas for the Determinant

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

Goodness of fit and Wilks theorem

PHYS 705: Classical Mechanics. Calculus of Variations II

Laboratory 3: Method of Least Squares

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation

Module 2. Random Processes. Version 2 ECE IIT, Kharagpur

Assortment Optimization under MNL

THE SUMMATION NOTATION Ʃ

Laboratory 1c: Method of Least Squares

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM

Difference Equations

8.6 The Complex Number System

Composite Hypotheses testing

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS

β0 + β1xi. You are interested in estimating the unknown parameters β

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012

Lecture 16 Statistical Analysis in Biomaterials Research (Part II)

8 : Learning in Fully Observed Markov Networks. 1 Why We Need to Learn Undirected Graphical Models. 2 Structural Learning for Completely Observed MRF

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Affine transformations and convexity

1 Matrix representations of canonical matrices

Chapter 8 Indicator Variables

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

University of California at Berkeley Fall Introductory Applied Econometrics Final examination

MMA and GCMMA two methods for nonlinear optimization

Comparison of Regression Lines

Lecture 12: Discrete Laplacian

Basic R Programming: Exercises

Outline. Zero Conditional mean. I. Motivation. 3. Multiple Regression Analysis: Estimation. Read Wooldridge (2013), Chapter 3.

Chapter 15 - Multiple Regression

Feb 14: Spatial analysis of data fields

Maximum Likelihood Estimation (MLE)

= = = (a) Use the MATLAB command rref to solve the system. (b) Let A be the coefficient matrix and B be the right-hand side of the system.

x i1 =1 for all i (the constant ).

8/25/17. Data Modeling. Data Modeling. Data Modeling. Patrice Koehl Department of Biological Sciences National University of Singapore

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise.

Notes on Frequency Estimation in Data Streams

Introduction to Dummy Variable Regressors. 1. An Example of Dummy Variable Regressors

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

Société de Calcul Mathématique SA

Feature Selection: Part 1

STAT 3008 Applied Regression Analysis

Singular Value Decomposition: Theory and Applications

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

9. Binary Dependent Variables

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Generalized Linear Methods

The KMO Method for Solving Non-homogenous, m th Order Differential Equations

xp(x µ) = 0 p(x = 0 µ) + 1 p(x = 1 µ) = µ

Salmon: Lectures on partial differential equations. Consider the general linear, second-order PDE in the form. ,x 2

Open Systems: Chemical Potential and Partial Molar Quantities Chemical Potential

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Homework Notes Week 7

More metrics on cartesian products

Effects of Ignoring Correlations When Computing Sample Chi-Square. John W. Fowler February 26, 2012

Dummy variables in multiple variable regression model

Supplementary Notes for Chapter 9 Mixture Thermodynamics

Bézier curves. Michael S. Floater. September 10, These notes provide an introduction to Bézier curves. i=0

The Order Relation and Trace Inequalities for. Hermitian Operators

β0 + β1xi and want to estimate the unknown

Edge Isoperimetric Inequalities

Prof. Dr. I. Nasser Phys 630, T Aug-15 One_dimensional_Ising_Model

Transcription:

The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters. To recap, the normal vector s denoted as N j and ts reflecton as -N j. The normal vector s perpendcular to the cuttng plane. The cuttng plane n two dmensons s defned by the equaton N j (W Z j ) + N j (W Z j ) = 0 () where N j and N j are the components of the normal vector and (Z j, Z j ) s the mdpont of the roll call outcomes (see Fgure.). Any pont, (W, W ), that satsfes the equaton above les on the cuttng plane. For example, f the normal vector s (3, -) and the roll call mdpont s (, 0), then ths produces the equaton: 3(w ) (w 0) = 3w 3 w = 3w w 3 = 0 or 3w w = 3 so that (0, -3/), (/3, -), (, 3/), etc., all le on the plane. In three dmensons the cuttng plane s defned by the equaton N j (W Z j ) + N j (W Z j ) + N j3 (W 3 Z j3 ) = 0 () Where, as above, any pont, (W, W, W 3 ), that satsfes the equaton above les on the cuttng plane. For example, f the normal vector s (, -, ) and the roll call mdpont s (,, ), ths produces the equaton: (w ) (w ) + (w 3 ) = w w + w 3 = 0 or w w + w 3 = so that (0, 0, ), (, 0, -), (0,, 3), etc., all le on the plane.

For s dmensons the cuttng plane s defned by the vector equaton N j (W Z j ) = 0 (book,.8) (3) where N j s the s by normal vector such that N j N j =, W and Z j are s by vectors and 0 s an s by vector of zeroes. The normal vector s constraned to be of unt length for roll call votng work but t s a general vector n other applcatons. In general, f W A and W B are both ponts n the plane, N j W A = N j W B = c j, where c j s a scalar constant. Geometrcally, every pont n the plane projects onto the same pont on the lne defned by the normal vector, N j and ts reflecton -N j (see Fgure.). Ths projecton pont for the general case (N j not necessarly normalzed to one; that s, N j N j = ): M j = c j s N k= N j jk (book,.9) (4) To see ths consder the example of the 3-dmensonal plane above: w w = w w + w3 = w 3 [ ] (5) We need to fnd the pont on the normal vector [ - ] that satsfes equaton (5); that s, the pont where the plane passes through the normal vector tself. Let k be a scalar constant. The soluton s: k k = 4k + 4k + k =, so that k= 9 k [ ]

And the pont s 9 9 9 whch s gven n equaton (4). Note that, by constructon, N j M j = c j. In addton, n the roll call context, because the mdpont of the Yea and Nay polcy ponts, Z j, s on the cuttng plane, t also projects to the pont M j. The cuttng plane passes through the lne formed by the normal vector and ts reflecton (see Fgure.) at the pont M j. In the case of a smple probt analyss, the cuttng plane conssts of all possble legslator deal ponts such that the probablty of the correspondng legslator votng Yea or votng Nay s exactly.5; namely: P(legslator votes Yea) = P(legslator votes Nay) = 0+ + +... + ss Φ s = - 0+ + +... + ss Φ s = Φ ( 0) =.5 Where Φ(.) s the dstrbuton functon for the normal and,,, s are legslator s coordnates on the s dmensons. Because the s and s cannot be separately dentfed, the usual assumpton s to set s =. The equaton above reduces to: 0 + + + + s s = 0 or + + + s s = = - 0 (6) where s the s-length vector of legslator coordnates: = 3 s 3

and s an s-length vector of the coeffcents,, 3,, s ; that s: = 3 s Note that the expresson = - 0 s exactly the same as N j W = c j, whch was used above. Namely, set N j = and every pont n the plane projects onto the pont: M j = - 0 s k= k. (book,.0) (7) In other words, n a regular probt context the coeffcents on the ndependent varables form a normal vector to a plane that passes through the pont - 0 s k= k. Note that ths pont s fxed wth regard to s. In partcular, 0 + + +... + ss 0 = + s s s So that - = - s s 0 0 s s k k k= s k= (8) When there s complete separaton that s, no error, ths plays an mportant role below. are: The smple logt case s dentcal to probt when s =. The logt probabltes 4

( 0 + + +... + s s ) e = + e + e ( + + +... + ) ( + + +... + ) 0 s s 0 s s =.5 Cancelng out the denomnator and takng the natural log of both sdes yelds the same equaton as probt: 0 + + + + s s = 0 In both probt and logt the coeffcents on the ndependent varables form a normal vector to a plane that passes through the pont - 0 s k= k. The cosne of the angle between the normal vectors from probt and logt should be very close to one. That s: ' P L cosθ = P L (9) where θ s the angle between the two normal vectors, and. s the correspondng norm of the normal vector. Computng cosθ s a useful check on the two estmaton technques. Several examples of ths are shown n the Appendx. The Geometry of Complete Separaton (Perfect Votng) To smplfy the notaton below let the normal probablty densty functon be 0 + σ 0 + and the dstrbuton functon be Φ σ whch I wll smplfy to just and Φ, respectvely. I leave s n the expressons because t s the source of the dentfcaton problem. Namely, as a practcal matter, the cuttng plane s dentfed (up to a slght wggle, dependng upon the number of ponts) but, because there s no error, 5

6 s 0 and the observed coeffcent vector explodes to entres of + or - because of the mplct dvson of the coeffcents by s. In a standard presentaton the th row of the matrx of ndependent varables would be: 3 s, and the coeffcent vector would be 0 3 s. But for clarty of presentaton I want to separate the ntercept term, 0, from the other s coeffcents. In the general context my use of Yea and Nay below corresponds to the bnary dependent varable beng and 0, respectvely. The frst dervatves for the Probt Log-lkelhood functon are: ln L = Yea Nay Yea Nay Yea Nay s s Yea Nay Φ Φ Φ Φ (0) Let a and b be ndces from to s. Treatng 0 separately, the second dervatves are

+ + ( ) 0 0 +( Φ) ln L σ σ = 0 Yea Φ Nay ( ) + + ( ) 0 0 +( Φ) ln L σ σ = a 0 a Yea Φ Nay ( ) a (A) (B) And the remanng are: b + +b ( ) 0 0 +( Φ) ln L σ σ = a b ab ba bb Yea Φ Nay ( ) (C) Before turnng the pure case of complete separaton, consder the ntermedate but vexng case where one of the ndependent varables s an ndcator varable that separates wth respect to the bnary dependent varable. That s, whenever the ndcator s the correspondng value of the dependent varable s. Let the ndcator varable be. The frst partal dervatve s: ln L = + = = 0 Φ Φ Φ Φ () Yea,x= Yea,x= 0 Nay,x= 0 Yea,x= Because multplcaton by zero cancels the nd and 3 rd terms and the case Nay, x =, by defnton, does not occur. Hence, for equaton () to hold t must be the case that because =, 0 + σ + K σ = 0, where + Φ Φ + K σ σ Yea,x= Yea,x= 0 K + + +... + = s 0 3 3 s s. 7

The K can be treated as constants so that for equaton () to hold t must be the case that +. Note that ths means that the normal vector tself s not dentfed. Wth the K as constants so that s s a constant, then there wll be a dfferent normal vector wth every change n the value of so that the cuttng plane s not dentfed. The case of complete separaton has a dfferent geometry. By defnton, there exsts a plane that perfectly dvdes the Yeas ( s) from the Nays ( 0 s). Let N j be the s by normal vector to ths plane such that N j N j = (I do not need the j subscrpt here but I retan t for notatonal consstency). Hence, as dscussed above, f A and B are both ponts n the plane, then N j A = N j B = c j, where c j s a scalar constant. For complete separaton ether we have: f Yea, then N > c ; and f Nay, then N < c (3) j j j j or f Yea, then N < c ; and f Nay, then N > c j j j j Wthout loss of generalty I wll assume that equaton (3) s the correct polarty. Usng equaton (6) above, ths s equvalent to: 0 0 f Yea, then ; and f Nay, then > < (4) s s s s Or more smply: + +... + + +... + f Yea then > and f Nay, then < s s 0 s s 0 s s, 0; 0 However, f ths s true then the lkelhood functon forces the followng: 0 + 0 + f Yea, then Φ ; and f Nay, then Φ 0 σ σ 8

because σ 0. Agan, however, note that the cuttng plane s dentfed up to a slght wggle because the s are fxed constants and, by defnton, the plane perfectly dvdes the Yeas ( s) from the Nays ( 0 s). Ths plane passes through the s-dmensonal space of the ndependent varables and separates those cases correspondng to Yeas ( s) from those correspondng to Nays ( 0 s). In actual estmaton, the lkelhood functon for the lnear probt (and logt) problem s globally convex the nverse of the Hessan (equaton system above) s negatve defnte. Hence, the gradent vector wll quckly clmb up the surface of the lkelhood functon to the global maxmum. In the case of perfect separaton what ths means n practce s that the gradent vector explodes as t approaches the global maxmum of zero; that s, the vector becomes nfntely long. Therefore, a straghtforward soluton to ths problem s to apply the constrant = at each teraton and then adjustng the standard devaton term, σ, to compensate. When the standard devaton term begns to vansh, stop. It s then a smple matter to take the normal vector and fnd c j wth the Jance algorthm. Note that ths produces the coeffcent vector correspondng to a perfect specfcaton for ths partcular set of ndependent varables,, and the bnary dependent varable, y. However, snce there s no error, there s no probablty, and no standard errors. 9

Appendx Here are two examples from the NES 000 electon study. The varables are whether or not the respondent voted (0=not voted, > 0 voted), and the ndependent varables are party (0 6), ncome ( categores), race (0 = whte; = black), sex (0 = male, = female), south (0=north, =south), educaton (=hgh school, =some college, 3=college). Frst Example: Smple Probt and Smple Logt. probt voted party ncome race sex south educaton age Iteraton 0: log lkelhood = -680.766 Iteraton : log lkelhood = -607.997 Iteraton : log lkelhood = -607.895 Iteraton 3: log lkelhood = -607.890 Iteraton 4: log lkelhood = -607.890 Probt regresson Number of obs = 06 LR ch(7) = 47.08 Prob > ch = 0.0000 Log lkelhood = -607.890 Pseudo R = 0.080 voted Coef. Std. Err. z P> z [95% Conf. Interval] -------------+---------------------------------------------------------------- party.0905.03884 0.90 0.369 -.0755.0655 ncome.0358093.03463.70 0.007.009847.06776 race.03704.733654 0. 0.83 -.306875.376894 sex -3.3e-06.084055-0.00.000 -.64743.647366 south -.7693.08979 -.97 0.048 -.350758 -.003089 educaton.43938.0566663 7.48 0.000.3874.53500 age.094557.00784 6.99 0.000.040043.04907 _cons -.503988.0846-7.44 0.000 -.900458 -.0757. logt voted party ncome race sex south educaton age Iteraton 0: log lkelhood = -680.766 Iteraton : log lkelhood = -607.9598 Iteraton : log lkelhood = -605.9353 Iteraton 3: log lkelhood = -605.953 Iteraton 4: log lkelhood = -605.953 Logstc regresson Number of obs = 06 LR ch(7) = 49.60 Prob > ch = 0.0000 Log lkelhood = -605.953 Pseudo R = 0.099 voted Coef. Std. Err. z P> z [95% Conf. Interval] -------------+---------------------------------------------------------------- party.036497.035805.0 0.33 -.03407.06364 ncome.063748.09564.7 0.007.0738.073686 race.0769.8544 0.7 0.789 -.486799.6350639 sex.0598.406487 0. 0.90 -.597536.9579 south -.300596.48330 -.03 0.043 -.5938 -.0098743 educaton.70976.0964645 7.47 0.000.53909.90048 age.0335865.004884 6.97 0.000.0446.0430303 _cons -.6774.35473-7.46 0.000-3.30649 -.93899 0

Probt Normalzed (Normal Vector) Logt Normalzed (Normal Vector) Party 0.04498 0.04586 Income 0.077377 0.078054 Race 0.0807 0.096566 Sex -0.000007 0.0068 South -0.38079-0.380975 Educaton 0.9605 0.93764 Age 0.04040 0.04568 The correlaton (Cosne) between the two normal vectors = 0.99976.

Second Example: Ordered Probt and Ordered Logt. oprobt party voted ncome race sex south educaton age Iteraton 0: log lkelhood = -055.46 Iteraton : log lkelhood = -954.4476 Iteraton : log lkelhood = -954.38 Iteraton 3: log lkelhood = -954.38 Ordered probt regresson Number of obs = 06 LR ch(7) = 0.73 Prob > ch = 0.0000 Log lkelhood = -954.38 Pseudo R = 0.049 party Coef. Std. Err. z P> z [95% Conf. Interval] -------------+---------------------------------------------------------------- voted.4435.0395748 0.7 0.000.3465583.506887 ncome.076777.0095894.84 0.065 -.007.036476 race -.8669836.43798-6.05 0.000 -.47807 -.586603 sex -.0837943.06489 -.9 0.96 -.08568.04368 south.46657.06937 3.56 0.000.08085.385055 educaton -.009693.043899-0.04 0.964 -.0880097.08407 age -.0066873.004-3. 0.00 -.00886 -.004886 -------------+---------------------------------------------------------------- /cut -.7585.5773 -.04965 -.45474 /cut -.60099.4865 -.573096.065898 /cut3.007637.47737 -.088787.490344 /cut4.588.485304.307086.8937 /cut5.968794.50958.665737.580 /cut6.47069.557338.65386.77585. ologt party voted ncome race sex south educaton age Iteraton 0: log lkelhood = -055.46 Iteraton : log lkelhood = -958.0084 Iteraton : log lkelhood = -957.35 Iteraton 3: log lkelhood = -957.306 Ordered logstc regresson Number of obs = 06 LR ch(7) = 96.3 Prob > ch = 0.0000 Log lkelhood = -957.306 Pseudo R = 0.0477 party Coef. Std. Err. z P> z [95% Conf. Interval] -------------+---------------------------------------------------------------- voted.69534.06894 0.8 0.000.564633.89647 ncome.095464.060355.84 0.065 -.00886.0609755 race -.45739.404073-6.06 0.000 -.9858 -.986005 sex -.444.0974 -.3 0.89 -.35958.07083 south.3940648.7609 3.36 0.00.644337.636959 educaton -.0745.074965-0.7 0.865 -.59667.34767 age -.05873.0036647-3.6 0.00 -.087699 -.0044047 -------------+---------------------------------------------------------------- /cut -.355.54506 -.8436 -.866589 /cut -.407.47775 -.9066953.06455 /cut3.7795.460457 -.04886.76099 /cut4.8034034.477555.3785.88995 /cut5.534033.59847.0389.09874 /cut6.47.679.898674.96765

Probt Normalzed (Normal Vector) Logt Normalzed (Normal Vector) voted 0.4474 0.46670 Income 0.07680 0.07706 Race -0.867086-0.873346 Sex -0.083804-0.08646 South 0.46686 0.3645 Educaton -0.00970-0.007638 Age -0.006688-0.006944 The correlaton (Cosne) between the two normal vectors = 0.99995. 3