Logit regression


Logit regression models the probability of Y = 1 as the cumulative standard logistic distribution function, evaluated at z = β0 + β1X:

Pr(Y = 1|X) = F(β0 + β1X)

F is the cumulative logistic distribution function:

F(β0 + β1X) = 1 / (1 + e^{−(β0 + β1X)})

9-1

Logistic regression, ctd.

Pr(Y = 1|X) = F(β0 + β1X)

where F(β0 + β1X) = 1 / (1 + e^{−(β0 + β1X)}).

Example: β0 = −3, β1 = 2, X = 0.4, so β0 + β1X = −3 + 2×0.4 = −2.2 and

Pr(Y = 1|X = 0.4) = 1/(1 + e^{2.2}) = 0.0998

Why bother with logit if we have probit?
o Historically, numerically convenient
o In practice, very similar to probit

9-2
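The slide's arithmetic can be checked directly. A minimal Python sketch (the helper name `logistic_cdf` is ours, not from the slides):

```python
import math

def logistic_cdf(z):
    # cumulative standard logistic distribution function: F(z) = 1/(1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# the slide's example: beta0 = -3, beta1 = 2, X = 0.4
beta0, beta1, x = -3.0, 2.0, 0.4
z = beta0 + beta1 * x    # -3 + 2*0.4 = -2.2
p = logistic_cdf(z)      # Pr(Y = 1 | X = 0.4)
print(round(p, 4))       # 0.0998
```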

Predicted probabilities from estimated probit and logit models usually are very close.

9-3

Estimation and Inference in Probit (and Logit) Models (SW Section 9.3)

Probit model: Pr(Y = 1|X) = Φ(β0 + β1X)

Estimation and inference:
o How to estimate β0 and β1?
o What is the sampling distribution of the estimators?
o Why can we use the usual methods of inference?

First discuss nonlinear least squares (easier to explain).
Then discuss maximum likelihood estimation (what is actually done in practice).

9-4

Probit estimation by nonlinear least squares

Recall OLS:

min_{b0,b1} Σ_{i=1}^{n} [Yi − (b0 + b1Xi)]²

The result is the OLS estimators β̂0 and β̂1.

In probit, we have a different regression function, the nonlinear probit model. So we could estimate β0 and β1 by nonlinear least squares:

min_{b0,b1} Σ_{i=1}^{n} [Yi − Φ(b0 + b1Xi)]²

Solving this yields the nonlinear least squares estimator of the probit coefficients.

9-5

Nonlinear least squares, ctd.

min_{b0,b1} Σ_{i=1}^{n} [Yi − Φ(b0 + b1Xi)]²

How to solve this minimization problem?
o Calculus doesn't give an explicit solution, since the first-order conditions are nonlinear.
o It must be solved numerically using the computer, e.g. by the trial-and-error method of trying one set of values for (b0, b1), then trying another, and another, ...
o Better idea: use specialized minimization algorithms that search more efficiently than trial and error.

In practice, nonlinear least squares isn't used because it isn't efficient: an estimator with a smaller variance is available, the maximum likelihood estimator.

9-6
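The "trial and error" idea can be made concrete with a crude grid search over candidate (b0, b1) values. This is only an illustration: the data are simulated with assumed true coefficients (−1, 2), and the search window around them is our choice; real software uses efficient minimization algorithms instead.

```python
import math
import random

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def nls_objective(b0, b1, xs, ys):
    # sum of squared residuals: sum_i [Y_i - Phi(b0 + b1*X_i)]^2
    return sum((y - phi(b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# simulate from a probit model with assumed true coefficients (-1, 2)
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(500)]
ys = [1 if random.random() < phi(-1.0 + 2.0 * x) else 0 for x in xs]

# trial and error: try every (b0, b1) on a 0.1-spaced grid
best = min((nls_objective(i / 10, j / 10, xs, ys), i / 10, j / 10)
           for i in range(-20, 1)      # b0 candidates in [-2.0, 0.0]
           for j in range(10, 31))     # b1 candidates in [1.0, 3.0]
sse, b0_hat, b1_hat = best
print(b0_hat, b1_hat)  # should land near -1 and 2
```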

Probit estimation by maximum likelihood

The likelihood function is the conditional density of Y1,...,Yn given X1,...,Xn, treated as a function of the unknown parameters β0 and β1.
o The maximum likelihood estimator (MLE) is the value of (β0, β1) that maximizes the likelihood function.
o The MLE is the value of (β0, β1) that best describes the full distribution of the data.
o Like the nonlinear least squares estimator, the MLE of β0 and β1 in the probit and logit models must be found numerically.

9-7

In large samples, the MLE is:
o consistent,
o normally distributed, and
o efficient (has the smallest variance of all estimators)

9-8

Special case: the probit MLE with no X

Y = 1 with probability p, Y = 0 with probability 1 − p  (Bernoulli distribution)

Data: Y1,...,Yn, i.i.d.

Derivation of the likelihood starts with the density of Y1:

Pr(Y1 = 1) = p and Pr(Y1 = 0) = 1 − p

so

Pr(Y1 = y1) = p^{y1} (1 − p)^{1−y1}

9-9

Pr(Y1 = y1) = p^{y1} (1 − p)^{1−y1}

For y1 = 0: Pr(Y1 = 0) = p^0 (1 − p)^{1−0} = 1 − p
For y1 = 1: Pr(Y1 = 1) = p^1 (1 − p)^{1−1} = p

9-10

Joint density of (Y1, Y2): because Y1 and Y2 are independent,

Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1) × Pr(Y2 = y2)
= [p^{y1} (1 − p)^{1−y1}] × [p^{y2} (1 − p)^{1−y2}]

Joint density of (Y1,...,Yn):

Pr(Y1 = y1, Y2 = y2, ..., Yn = yn)
= [p^{y1} (1 − p)^{1−y1}] × ... × [p^{yn} (1 − p)^{1−yn}]
= p^{Σ_{i=1}^{n} yi} (1 − p)^{n − Σ_{i=1}^{n} yi}

The likelihood is the joint density, treated as a function of the unknown parameters, which here is p:

f(p; Y1,...,Yn) = p^{Σ_{i=1}^{n} Yi} (1 − p)^{n − Σ_{i=1}^{n} Yi}

9-11

The MLE maximizes the likelihood. It's standard to work with the log likelihood, ln[f(p; Y1,...,Yn)]:

(The parameters that maximize the likelihood function also maximize the log likelihood function, but the latter is more convenient to work with.)

ln[f(p; Y1,...,Yn)] = (Σ_{i=1}^{n} Yi) ln(p) + (n − Σ_{i=1}^{n} Yi) ln(1 − p)

d ln f(p; Y1,...,Yn)/dp = (Σ_{i=1}^{n} Yi) (1/p) − (n − Σ_{i=1}^{n} Yi) (1/(1 − p)) = 0

9-12

Solving for p yields the MLE; that is, p̂_MLE satisfies

(Σ_{i=1}^{n} Yi) (1/p̂_MLE) − (n − Σ_{i=1}^{n} Yi) (1/(1 − p̂_MLE)) = 0

or

(Σ_{i=1}^{n} Yi)(1 − p̂_MLE) = (n − Σ_{i=1}^{n} Yi) p̂_MLE

or

Ȳ(1 − p̂_MLE) = (1 − Ȳ) p̂_MLE

or

p̂_MLE = Ȳ = fraction of 1s

9-13

The MLE i the o-x case (Beroulli distributio): p ˆ MLE = Y = fractio of s For Y i i.i.d. Beroulli, the MLE is the atural estimator of p, the fractio of s, which is Y We already kow the essetials of iferece: o I large, the samplig distributio of p ˆ MLE = Y is ormally distributed o Thus iferece is as usual: hypothesis testig via t-statistic, 95% cofidece iterval as p ˆ MLE +.96SE 9-4

The probit likelihood with one X

The derivation starts with the density of Y1, given X1:

Pr(Y1 = 1|X1) = Φ(β0 + β1X1)
Pr(Y1 = 0|X1) = 1 − Φ(β0 + β1X1)

so

Pr(Y1 = y1|X1) = [Φ(β0 + β1X1)]^{y1} [1 − Φ(β0 + β1X1)]^{1−y1}

The probit likelihood function is the joint density of Y1,...,Yn given X1,...,Xn, treated as a function of β0, β1:

9-15

f(β0, β1; Y1,...,Yn|X1,...,Xn)
= {[Φ(β0 + β1X1)]^{Y1} [1 − Φ(β0 + β1X1)]^{1−Y1}} × ...
× {[Φ(β0 + β1Xn)]^{Yn} [1 − Φ(β0 + β1Xn)]^{1−Yn}}

o Can't solve for the maximum explicitly
o Must maximize using numerical methods
o As in the case of no X, in large samples:
  - β̂0_MLE, β̂1_MLE are consistent
  - β̂0_MLE, β̂1_MLE are normally distributed
  - Their standard errors can be computed
  - Testing and confidence intervals proceed as usual

9-16
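The numerical maximization the slide mentions can be sketched in a few lines. This is only an illustration: the data are simulated from a probit model with assumed coefficients (0.5, 1.0), and the optimizer is plain gradient ascent with numerical derivatives; actual software uses far better Newton-type algorithms.

```python
import math
import random

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(b0, b1, xs, ys):
    # sum_i: Y_i ln Phi(b0 + b1 X_i) + (1 - Y_i) ln[1 - Phi(b0 + b1 X_i)]
    return sum(y * math.log(phi(b0 + b1 * x)) +
               (1 - y) * math.log(1.0 - phi(b0 + b1 * x))
               for x, y in zip(xs, ys))

# simulate from a probit model with assumed true coefficients (0.5, 1.0)
random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(400)]
ys = [1 if random.random() < phi(0.5 + 1.0 * x) else 0 for x in xs]

# crude gradient ascent with two-sided numerical derivatives
b0, b1, step, h = 0.0, 0.0, 0.002, 1e-6
for _ in range(400):
    g0 = (probit_loglik(b0 + h, b1, xs, ys) - probit_loglik(b0 - h, b1, xs, ys)) / (2 * h)
    g1 = (probit_loglik(b0, b1 + h, xs, ys) - probit_loglik(b0, b1 - h, xs, ys)) / (2 * h)
    b0, b1 = b0 + step * g0, b1 + step * g1
print(round(b0, 2), round(b1, 2))  # should land near 0.5 and 1.0
```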

The logit likelihood with one X

o The only difference between probit and logit is the functional form used for the probability: Φ is replaced by the cumulative logistic function.
o Otherwise, the likelihood is similar; for details see SW App. 9.2
o As with probit:
  - β̂0_MLE, β̂1_MLE are consistent
  - β̂0_MLE, β̂1_MLE are normally distributed
  - Their standard errors can be computed
  - Testing and confidence intervals proceed as usual

9-17

Measures of fit

The R² and adjusted R² (R̄²) don't make sense here (why?). So, two other specialized measures are used:

1. The fraction correctly predicted = fraction of Y's for which the predicted probability is >50% (when Yi = 1) or is <50% (when Yi = 0).

2. The pseudo-R² measures the fit using the likelihood function: it measures the improvement in the value of the log likelihood, relative to having no X's (see SW App. 9.2). It simplifies to the R² in the linear model with normally distributed errors.

9-18
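Both measures can be computed by hand for a fitted binary-response model. A toy Python sketch using the logit link: the data and the "fitted" coefficients b0, b1 are hypothetical, and the pseudo-R² is written in the likelihood-ratio form 1 − ln L(full) / ln L(no X's).

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def loglik(b0, b1, xs, ys):
    # binary-response log likelihood with a logistic link
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic_cdf(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# hypothetical data and fitted coefficients
xs = [0.1, 0.8, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = -5.0, 3.0

# 1. fraction correctly predicted: predict Y=1 when fitted probability > 50%
preds = [1 if logistic_cdf(b0 + b1 * x) > 0.5 else 0 for x in xs]
frac_correct = sum(p == y for p, y in zip(preds, ys)) / len(ys)

# 2. pseudo-R^2 = 1 - lnL(full) / lnL(no X's); with no X's the MLE
#    probability is Ybar, i.e. an intercept equal to its log odds
ybar = sum(ys) / len(ys)
b0_null = math.log(ybar / (1.0 - ybar))
pseudo_r2 = 1.0 - loglik(b0, b1, xs, ys) / loglik(b0_null, 0.0, xs, ys)
print(frac_correct, round(pseudo_r2, 2))
```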

Summary: distribution of the MLE

o The MLE is normally distributed for large n
o We worked through this result in detail for the probit model with no X's (the Bernoulli distribution)
o For large n, confidence intervals and hypothesis testing proceed as usual
o If the model is correctly specified, the MLE is efficient; that is, it has a smaller large-n variance than all other estimators (we didn't show this)
o These methods extend to other models with discrete dependent variables, for example count data (# crimes/day); see SW App. 9.2

9-19