Chapter 14 Logistic Regression Models

In the linear regression model $y = X\beta + \varepsilon$, there are two types of variables: the explanatory variables $X_1, X_2, \ldots, X_k$ and the study variable $y$. These variables can be measured on a continuous scale or recorded as indicator variables. When the explanatory variables are qualitative, their values are expressed through indicator variables and the dummy variable models are used. When the study variable is qualitative, its values can be expressed using an indicator variable taking only two possible values, 0 and 1. In such a case, logistic regression is used. For example, $y$ can denote outcomes like success or failure, yes or no, like or dislike, which can be coded by the two values 0 and 1.

Consider the model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i = x_i'\beta + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
where $x_i = (1, x_{i1}, x_{i2}, \ldots, x_{ik})'$ and $\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_k)'$.

The study variable takes only the two values $y_i = 0$ or $1$. Assume that $y_i$ follows a Bernoulli distribution with parameter $\pi_i$, so its probability distribution is
$$y_i = 1 \text{ with } P(y_i = 1) = \pi_i, \qquad y_i = 0 \text{ with } P(y_i = 0) = 1 - \pi_i.$$

Assuming $E(\varepsilon_i) = 0$,
$$E(y_i) = 1 \cdot \pi_i + 0 \cdot (1 - \pi_i) = \pi_i.$$

From the model $y_i = x_i'\beta + \varepsilon_i$, we have $E(y_i) = x_i'\beta$, and therefore
$$E(y_i) = x_i'\beta = \pi_i = P(y_i = 1).$$

Thus the response function $E(y_i)$ is simply the probability that $y_i = 1$.
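As a quick numerical sanity check of $E(y_i) = \pi_i$, one can simulate Bernoulli data. This is a minimal sketch assuming NumPy is available; the probability 0.3 is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
pi = 0.3                                  # hypothetical success probability
y = rng.binomial(1, pi, size=100_000)     # indicator study variable: 1 = success, 0 = failure
print(y.mean())                           # close to E(y) = 1*pi + 0*(1 - pi) = pi
```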

Note that $\varepsilon_i = y_i - x_i'\beta$, so
- when $y_i = 1$, then $\varepsilon_i = 1 - x_i'\beta$;
- when $y_i = 0$, then $\varepsilon_i = -x_i'\beta$.

Recall that earlier $\varepsilon_i$ was assumed to follow a normal distribution when $y_i$ was not an indicator variable. When $y_i$ is an indicator variable, $\varepsilon_i$ takes only two values, so it cannot be assumed to follow a normal distribution.

In the usual regression model the errors are homoskedastic, i.e., $Var(\varepsilon_i) = \sigma^2$ and so $Var(y_i) = \sigma^2$. When $y_i$ is an indicator variable,
$$Var(y_i) = E[y_i - E(y_i)]^2 = (1 - \pi_i)^2 \pi_i + (0 - \pi_i)^2 (1 - \pi_i) = \pi_i(1 - \pi_i) = E(y_i)[1 - E(y_i)].$$
Thus $Var(y_i)$ depends on the mean of $y_i$ and is a function of $\pi_i$.

Moreover, since $E(y_i) = \pi_i$ and $\pi_i$ is a probability, $0 \le \pi_i \le 1$, and thus there is a constraint on $E(y_i)$, namely $0 \le E(y_i) \le 1$. This puts a strong constraint on the choice of a linear response function: one cannot fit a model in which the predicted values lie outside the interval from 0 to 1.

When $y$ is a dichotomous variable, empirical evidence suggests that a function $E(y)$ defined on the whole real line and mapped into $[0, 1]$ has a sigmoid shape, i.e., a nonlinear S-shape.

[Figure: S-shaped (sigmoid) curves of $E(y)$ against $x$, one increasing and one decreasing.]
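Both points above can be checked numerically. The following sketch (assuming NumPy; all data are simulated and hypothetical) verifies that the variance of an indicator variable equals $\pi(1 - \pi)$ and shows that an ordinary least squares fit of a linear response function to 0/1 data produces fitted values outside $[0, 1]$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Variance of an indicator variable equals pi * (1 - pi).
pi0 = 0.3
y0 = rng.binomial(1, pi0, size=100_000)
print(y0.var(), pi0 * (1 - pi0))                 # both close to 0.21

# Least squares fit of a linear response function to 0/1 responses.
x = np.linspace(-4, 4, 400)
y = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))  # sigmoid-shaped E(y)
X = np.column_stack([np.ones_like(x), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_ols
print(fitted.min(), fitted.max())                # predictions fall below 0 and above 1
```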

A natural choice for $E(y)$ would be the cumulative distribution function of a random variable. In particular, the logistic distribution, whose cumulative distribution function is the simplified logistic function, yields a good link and gives
$$E(y_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} = \frac{1}{1 + \exp(-x_i'\beta)} = \pi_i.$$

Linear predictor and link functions:
The systematic component in $E(y)$ is the linear predictor, denoted as
$$\eta_i = \sum_{j=0}^{k} \beta_j x_{ij} = x_i'\beta, \quad i = 1, 2, \ldots, n.$$

The link function in a generalized linear model relates the linear predictor $\eta_i$ to the mean response $\mu_i$. Thus
$$g(\mu_i) = \eta_i \quad \text{or} \quad \mu_i = g^{-1}(\eta_i).$$

In the usual linear model based on a normally distributed study variable, the link $g(\mu_i) = \mu_i$ is used and is called the identity link. A link function maps the range of $\mu_i$ onto the whole real line, provides a good empirical approximation, and carries a meaningful interpretation in real applications.

In the case of logistic regression, the link function is defined as
$$\eta_i = \ln\frac{\pi_i}{1 - \pi_i}.$$
This transformation is called the logit transformation of the probability $\pi_i$, and $\pi_i/(1 - \pi_i)$ is called the odds. The link $\eta_i$ is also called the log-odds.
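As a quick check that the logit link and the logistic function are inverses of each other, the following sketch (assuming NumPy; the linear predictor values are hypothetical) maps $\eta$ to $\pi$ and back.

```python
import numpy as np

def logit(pi):
    """Logit link: maps a probability in (0, 1) to the whole real line (the log-odds)."""
    return np.log(pi / (1 - pi))

def logistic(eta):
    """Inverse link: maps the linear predictor to a probability in (0, 1)."""
    return np.exp(eta) / (1 + np.exp(eta))

eta = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])   # hypothetical linear predictor values
pi = logistic(eta)
print(pi)                                     # probabilities strictly between 0 and 1
print(logit(pi))                              # recovers eta (up to rounding)
```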

This link function is obtained as follows:
$$\pi_i = \frac{1}{1 + \exp(-\eta_i)}$$
or $\pi_i\left[1 + \exp(-\eta_i)\right] = 1$,
or $\exp(-\eta_i) = \dfrac{1 - \pi_i}{\pi_i}$,
or $e^{\eta_i} = \dfrac{\pi_i}{1 - \pi_i}$,
$$\Rightarrow \eta_i = \ln\frac{\pi_i}{1 - \pi_i} = g(\mu_i), \quad \text{with } \mu_i = \pi_i.$$

Note: Similar to the logit function, there are other functions which have the same shape as the logistic function, and the probability $\pi_i$ can also be transformed through them. Two such popular functions are the probit transformation and the complementary log-log transformation. The probit transformation is based on transforming $\pi_i$ through the cumulative distribution function of the normal distribution, and the probit regression model is based on it. The complementary log-log transformation of $\pi_i$ is $\ln[-\ln(1 - \pi_i)]$.
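The following sketch (assuming NumPy and SciPy) compares the three transformations mentioned in the note above (logit, probit, and complementary log-log) on a few illustrative probabilities.

```python
import numpy as np
from scipy.stats import norm

pi = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
logit = np.log(pi / (1 - pi))          # ln(pi / (1 - pi))
probit = norm.ppf(pi)                  # inverse of the normal cumulative distribution function
cloglog = np.log(-np.log(1 - pi))      # ln[-ln(1 - pi)]
print(np.column_stack([pi, logit, probit, cloglog]))
```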

Maximum likelihood estimation of parameters:
Consider the general form of the logistic regression model
$$y_i = E(y_i) + \varepsilon_i,$$
where the $y_i$ are independent Bernoulli random variables with parameter $\pi_i$ and
$$E(y_i) = \pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$$

The probability mass function of $y_i$ is
$$f_i(y_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i}, \quad y_i = 0 \text{ or } 1, \; i = 1, 2, \ldots, n.$$

The likelihood function is
$$L(y_1, y_2, \ldots, y_n; \beta_0, \beta_1, \ldots, \beta_k) = \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1 - \pi_i)^{1 - y_i},$$
so
$$\ln L = \sum_{i=1}^{n}\left[y_i \ln \pi_i + (1 - y_i)\ln(1 - \pi_i)\right] = \sum_{i=1}^{n} y_i \ln\frac{\pi_i}{1 - \pi_i} + \sum_{i=1}^{n}\ln(1 - \pi_i).$$

Since
$$1 - \pi_i = \frac{1}{1 + \exp(x_i'\beta)} \quad \text{and} \quad \ln\frac{\pi_i}{1 - \pi_i} = x_i'\beta,$$
we get
$$\ln L = \sum_{i=1}^{n} y_i x_i'\beta - \sum_{i=1}^{n}\ln\left[1 + \exp(x_i'\beta)\right].$$

Suppose repeated observations are available at each level of the $x$-variables. Let $y_i$ be the number of 1's observed for the $i$th observation and $n_i$ the number of trials at that observation. Then
$$\ln L = \sum_{i=1}^{n}\left[y_i \ln \pi_i + (n_i - y_i)\ln(1 - \pi_i)\right].$$

The maximum likelihood estimate $\hat\beta$ of $\beta$ is obtained by numerical maximization. If $V(\varepsilon) = \Omega$, then asymptotically
$$E(\hat\beta) = \beta, \qquad V(\hat\beta) = \left(X'\Omega^{-1}X\right)^{-1}.$$

After obtaining $\hat\beta$, the linear predictor is estimated by
$$\hat\eta_i = x_i'\hat\beta.$$
The fitted value is
$$\hat y_i = \hat\pi_i = \frac{\exp(\hat\eta_i)}{1 + \exp(\hat\eta_i)} = \frac{1}{1 + \exp(-\hat\eta_i)} = \frac{\exp(x_i'\hat\beta)}{1 + \exp(x_i'\hat\beta)}.$$
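A minimal sketch of the numerical maximization described above, assuming NumPy and SciPy are available; the design matrix X, the responses y, and the true coefficients are simulated and purely hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Simulate hypothetical data: x_i' = (1, x_i1), true beta = (-0.5, 1.2).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
pi_true = 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.2]))))
y = rng.binomial(1, pi_true)

def neg_log_lik(beta):
    # ln L = sum_i y_i x_i'beta - sum_i ln(1 + exp(x_i'beta)); return its negative.
    eta = X @ beta
    return -(y @ eta - np.sum(np.log1p(np.exp(eta))))

beta_hat = minimize(neg_log_lik, x0=np.zeros(X.shape[1]), method="BFGS").x
eta_hat = X @ beta_hat                              # estimated linear predictor
pi_hat = np.exp(eta_hat) / (1 + np.exp(eta_hat))    # fitted values y_hat = pi_hat
print(beta_hat)                                     # close to the true (-0.5, 1.2)
```

In practice one would normally use a packaged routine (for example statsmodels' Logit or scikit-learn's LogisticRegression) rather than hand-coded maximization, but the sketch makes the log-likelihood being maximized explicit.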

Interpretation of parameters:
To understand the interpretation of the $\beta$'s in the logistic regression model, first consider a simple related case with only one explanatory variable:
$$\eta(x) = \beta_0 + \beta_1 x.$$

After fitting the model, $\hat\beta_0$ and $\hat\beta_1$ are obtained as estimators of $\beta_0$ and $\beta_1$, respectively. The fitted linear predictor at $x = x_i$ is
$$\hat\eta(x_i) = \hat\beta_0 + \hat\beta_1 x_i,$$
which is the log-odds at $x = x_i$. The fitted value at $x = x_i + 1$ is
$$\hat\eta(x_i + 1) = \hat\beta_0 + \hat\beta_1(x_i + 1),$$
which is the log-odds at $x = x_i + 1$. Thus
$$\hat\beta_1 = \hat\eta(x_i + 1) - \hat\eta(x_i) = \ln[\text{odds}(x_i + 1)] - \ln[\text{odds}(x_i)] = \ln\frac{\text{odds}(x_i + 1)}{\text{odds}(x_i)}$$
$$\Rightarrow \frac{\text{odds}(x_i + 1)}{\text{odds}(x_i)} = \exp(\hat\beta_1).$$

This is termed the odds ratio; it is the estimated multiplicative change in the odds of success when the value of the explanatory variable increases by one unit.

When there is more than one explanatory variable in the model, the interpretation of the $\beta_j$'s is similar to the single-explanatory-variable case: $\exp(\hat\beta_j)$ is the odds ratio associated with the explanatory variable $x_j$, keeping the other explanatory variables constant. This is similar to the interpretation of $\beta_j$ in the multiple linear regression model. If there is an $m$-unit change in the explanatory variable, the estimated odds ratio is $\exp(m\hat\beta_j)$.
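The following small sketch (assuming NumPy; the fitted coefficients and the value of m are hypothetical) verifies that the ratio of the odds at x + 1 and at x equals exp(beta1_hat), and shows the corresponding quantity for an m-unit change.

```python
import numpy as np

beta0_hat, beta1_hat = -0.5, 0.8       # hypothetical fitted coefficients
m = 3                                  # hypothetical m-unit change

def odds(x):
    """Fitted odds exp(eta_hat(x)) at a given value of the explanatory variable."""
    return np.exp(beta0_hat + beta1_hat * x)

x = 2.0
print(odds(x + 1) / odds(x))           # equals exp(beta1_hat) regardless of x
print(np.exp(beta1_hat))               # the odds ratio for a one-unit change
print(odds(x + m) / odds(x), np.exp(m * beta1_hat))   # odds ratio for an m-unit change
```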

Test of hypothesis:
The test of hypothesis for the parameters of the logistic regression model is based on asymptotic theory. It is a large-sample likelihood ratio test based on a statistic termed the deviance.

A model that contains as many parameters as observations, and therefore fits the sample data perfectly, is termed the saturated model. The statistic that compares the log-likelihoods of the fitted and saturated models is called the model deviance. It is defined as
$$\lambda(\hat\beta) = 2\left[\ln L(\text{saturated model}) - \ln L(\hat\beta)\right],$$
where $\ln L(\cdot)$ is the log-likelihood and $\hat\beta$ is the maximum likelihood estimate of $\beta$.

In the case of the logistic regression model, $y_i = 0$ or $1$ and the $\pi_i$ are completely unrestricted, so the likelihood of the saturated model is maximized at $\hat\pi_i = y_i$ and its maximum value is
$$\max L(\text{saturated model}) = 1 \;\Rightarrow\; \ln \max L(\text{saturated model}) = 0.$$

Let $\hat\beta$ be the maximum likelihood estimator of $\beta$; the log-likelihood is maximized at $\beta = \hat\beta$, with
$$\ln L(\hat\beta) = \sum_{i=1}^{n} y_i x_i'\hat\beta - \sum_{i=1}^{n}\ln\left[1 + \exp(x_i'\hat\beta)\right],$$
so that $\lambda(\hat\beta) = -2\ln L(\hat\beta)$.

Assuming that the logistic regression function is correct, the large-sample distribution of the likelihood ratio test statistic $\lambda(\hat\beta)$ is approximately $\chi^2(n - p)$ when $n$ is large. A large value of $\lambda(\hat\beta)$ indicates that the model is not adequate, while a small value indicates that the model fits well and is nearly as good as the saturated model. Note that generally the fitted model has a smaller number of parameters than the saturated model, which is based on all $n$ parameters. Thus, at the $\alpha\%$ level of significance, the fitted model is judged inadequate if $\lambda(\hat\beta) > \chi^2_{\alpha}(n - p)$.
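A minimal sketch of the deviance computation and the chi-square comparison described above, assuming NumPy and SciPy; the data and fit are simulated and hypothetical, following the same scheme as the estimation sketch earlier.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.2])))))

def log_lik(beta):
    eta = X @ beta
    return y @ eta - np.sum(np.log1p(np.exp(eta)))

beta_hat = minimize(lambda b: -log_lik(b), np.zeros(X.shape[1]), method="BFGS").x

# With 0/1 responses the saturated model has log-likelihood 0, so the
# deviance reduces to lambda(beta_hat) = -2 * ln L(beta_hat).
deviance = -2 * log_lik(beta_hat)
df = n - X.shape[1]                          # n - p degrees of freedom
critical = chi2.ppf(0.95, df)                # 5% level of significance
print(deviance, critical, deviance > critical)
```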