Week 7: Binary Outcomes (Scott Long Chapter 3 Part 2)
1 Week 7: Binary Outcomes (Scott Long Chapter 3 Part 2)
Tsun-Feng Chiang, School of Economics, Henan University, Kaifeng, China. April 29. (Slide 1 of 38)
2 ML Estimation for Probit and Logit
Suppose N observations are drawn independently from the population and the choice y is binary. We want to estimate $\beta$ given the data. The likelihood function is the product of each observation's probability $p_i$ of its observed choice:

$$L(\beta \mid y, X) = \prod_{i=1}^{N} p_i \qquad (eq. 7)$$

where

$$p_i = \begin{cases} \Pr(y_i = 1 \mid x_i) & \text{if } y_i = 1 \text{ is observed} \\ 1 - \Pr(y_i = 1 \mid x_i) & \text{if } y_i = 0 \text{ is observed} \end{cases}$$

Let $N_1$ be the number of observations with $y_i = 1$ and $N_2$ the number with $y_i = 0$. Then (eq. 7) can be split into two groups according to the choices.
3 ML Estimation for Probit and Logit

$$L(\beta \mid y, X) = \prod_{y=1}^{N_1} \Pr(y_i = 1 \mid x_i) \prod_{y=0}^{N_2} [1 - \Pr(y_i = 1 \mid x_i)]$$

From (eq. 4), the equation above can be rewritten as

$$L(\beta \mid y, X) = \prod_{y=1}^{N_1} F(x_i\beta) \prod_{y=0}^{N_2} [1 - F(x_i\beta)]$$

Take logs to obtain the log likelihood function:

$$\ln L(\beta \mid y, X) = \sum_{y=1}^{N_1} \ln F(x_i\beta) + \sum_{y=0}^{N_2} \ln[1 - F(x_i\beta)]$$

Unlike the linear regression model in Chapter 2.6, where the ML estimators have closed-form solutions, algebraic maximization of $\ln L(\beta \mid y, X)$ is rarely possible.
4 ML Estimation for Probit and Logit
Numerical Methods for ML Estimation
The log likelihood functions are complicated, so numerical methods are usually used to derive the maximum likelihood estimates. They start with a guess of the parameter values and iterate to improve on that guess. Assume we are trying to estimate the vector of parameters $\beta$. We begin with an initial guess $\beta_0$, called the start values, and attempt to improve on this guess by adding a vector $\zeta_0$ of adjustments:

$$\beta_1 = \beta_0 + \zeta_0$$

Continue to update the previous iteration according to the equation

$$\beta_{t+1} = \beta_t + \zeta_t$$
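The update rule above can be sketched for the logit log likelihood. This is an illustrative Newton-Raphson implementation with made-up data and start values of zero, not the lecture's labor-force data:

```python
import numpy as np

# Made-up data: a column of ones (intercept) and one regressor.
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.], [1., 4.], [1., 5.]])
y = np.array([0., 0., 1., 0., 1., 1.])

beta = np.zeros(2)                                # start values beta_0
for t in range(50):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))         # Pr(y_i = 1 | x_i)
    grad = X.T @ (y - p)                          # gamma_t = d lnL / d beta
    if np.abs(grad).max() < 1e-10:                # stop when gradient ~ 0
        break
    H = -(X * (p * (1.0 - p))[:, None]).T @ X     # Hessian of lnL
    zeta = np.linalg.solve(-H, grad)              # zeta_t = D_t gamma_t
    beta = beta + zeta                            # beta_{t+1} = beta_t + zeta_t
```

With Newton-Raphson the direction matrix is the inverse of the negative Hessian, so each step solves a small linear system instead of forming the inverse explicitly.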
5 ML Estimation for Probit and Logit
Figure (Train, p. 186): Maximum Likelihood Estimate, one-parameter example
Iterations stop when the gradient of the log likelihood function is close to 0 or the estimates do not change from one step to the next, $\beta_{T+1} = \beta_T$. Then $\beta_T$ is the maximum likelihood estimate.
6 ML Estimation for Probit and Logit
The problem is how to find a $\zeta_t$ that reduces the number of iterations, that is, reaches $\beta_T$ quickly. It is useful to think of $\zeta_t$ as consisting of two parts:

$$\zeta_t = D_t \gamma_t$$

where
- $\gamma_t$ is a gradient vector defined as $\partial \ln L / \partial \beta_t$, which indicates the direction of the change in the log likelihood for a change in the parameters. In the one-parameter example, when the direction or slope is positive (negative), $\beta_t$ is increased (decreased) in the next step.
- $D_t$ is a direction matrix that reflects the curvature of the log likelihood function; that is, it indicates how rapidly the gradient is changing. In the one-parameter example, when the slope is changing rapidly (slowly), the next step for $\beta_t$ is smaller (larger).
7 ML Estimation for Probit and Logit
Figure (Train, p. 188): Gradient Vector $\gamma_t$, one-parameter example
8 ML Estimation for Probit and Logit
Figure (Train, p. 188): Direction Matrix $D_t$, one-parameter example
9 ML Estimation for Probit and Logit
Different numerical methods use different direction matrices; the following are the most commonly used:
- The Newton-Raphson Method: $D_t = \left(-\dfrac{\partial^2 \ln L}{\partial \beta_t \partial \beta_t'}\right)^{-1}$
- The Method of Scoring: $D_t = \left(-E\left[\dfrac{\partial^2 \ln L}{\partial \beta_t \partial \beta_t'}\right]\right)^{-1}$
- The BHHH Method: $D_t = \left(\sum_{i=1}^{N} \dfrac{\partial \ln L_i}{\partial \beta_t} \dfrac{\partial \ln L_i}{\partial \beta_t'}\right)^{-1}$
10 ML Estimation for Probit and Logit
Since $\hat\beta$ is obtained numerically, the covariance matrix must also be estimated numerically. Let $\hat\beta$ be the maximum likelihood estimate; the different numerical methods imply different estimates of the covariance matrix:
- The Newton-Raphson Method: $\mathrm{Var}(\hat\beta) = \left(-\sum_{i=1}^{N} \dfrac{\partial^2 \ln L_i}{\partial \hat\beta \partial \hat\beta'}\right)^{-1}$
- The Method of Scoring: $\mathrm{Var}(\hat\beta) = \left(-E\left[\dfrac{\partial^2 \ln L}{\partial \hat\beta \partial \hat\beta'}\right]\right)^{-1}$
- The BHHH Method: $\mathrm{Var}(\hat\beta) = \left(\sum_{i=1}^{N} \dfrac{\partial \ln L_i}{\partial \hat\beta} \dfrac{\partial \ln L_i}{\partial \hat\beta'}\right)^{-1}$
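As a sketch of these estimators, one can fit a toy logit and compare the Newton-Raphson covariance (inverse negative Hessian) with the BHHH covariance (inverse outer product of the per-observation gradients). The data here are made up for illustration, not the lecture's dataset:

```python
import numpy as np

X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.], [1., 4.], [1., 5.]])
y = np.array([0., 0., 1., 0., 1., 1.])

beta = np.zeros(2)
for _ in range(50):                          # quick Newton-Raphson fit
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    H = -(X * (p * (1.0 - p))[:, None]).T @ X
    beta = beta + np.linalg.solve(-H, X.T @ (y - p))

p = 1.0 / (1.0 + np.exp(-(X @ beta)))
H = -(X * (p * (1.0 - p))[:, None]).T @ X
var_nr = np.linalg.inv(-H)                   # (-d2 lnL / db db')^{-1}
g = X * (y - p)[:, None]                     # per-observation scores
var_bhhh = np.linalg.inv(g.T @ g)            # (sum_i g_i g_i')^{-1}
se_nr = np.sqrt(np.diag(var_nr))
se_bhhh = np.sqrt(np.diag(var_bhhh))
```

The two standard-error vectors agree asymptotically but differ in finite samples, which is why software reports which estimator it uses.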
11 ML Estimation for Probit and Logit
Problems with Numerical Methods
ML estimation by numerical methods can run into several problems:
- Cannot find $\hat\beta$ after many iterations: flat log likelihood function.
- Wrong estimates are obtained: local maximum.
- ML estimates do not exist: no variation in an independent variable for one of the outcomes.
These problems can come from the following sources:
- Number of observations. A small sample might explain why the model does not converge.
- Scaling of variables. When the standard deviation of a variable is very large or small relative to the other variables, the ML estimates may not be found.
- Distribution of outcomes. If there are few observations in one outcome, convergence may be difficult.
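The "ML estimates do not exist" case can be seen directly: when a regressor perfectly separates the outcomes, the log likelihood keeps rising as the slope grows, so no finite maximum exists. A minimal check with made-up data:

```python
import numpy as np

x = np.array([0., 1., 2., 3.])
y = np.array([0., 0., 1., 1.])     # x < 1.5 gives y = 0, x > 1.5 gives y = 1

def loglik(alpha, beta):
    p = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Scale the parameters along a separating direction: lnL keeps increasing
# toward 0, so a numerical search never settles on a finite estimate.
lls = [loglik(-1.5 * c, c) for c in (1.0, 4.0, 16.0)]
```

In practice this shows up as coefficients and standard errors that blow up as the iterations proceed.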
12 ML Estimation for Probit and Logit
Figure: Logit Analysis of Labor Force Participation
13 ML Estimation for Probit and Logit
Figure: Probit Analysis of Labor Force Participation
14 ML Estimation for Probit and Logit
My R Code: To run the probit and logit, use the command glm.
> labor = read.csv(file.choose(), header = TRUE)
Logit:
> labor_logit <- glm(lfp ~ k5 + k618 + age + wc + hc + lwg + inc, family = binomial(link = "logit"), data = labor)
> summary(labor_logit)
Probit:
> labor_probit <- glm(lfp ~ k5 + k618 + age + wc + hc + lwg + inc, family = binomial(link = "probit"), data = labor)
> summary(labor_probit)
15 The Probability Curve and Parameters
Consider a model with a single x:

$$\Pr(y = 1 \mid x) = F(\alpha + \beta x)$$

A change in $\alpha$ shifts the probability curve (or cdf) in parallel. When $\beta$ is positive, a smaller intercept shifts the curve right and a larger intercept shifts it left. (Visual intuition: to reach a given probability, say 0.5, a smaller (larger) x is needed when $\alpha$ is larger (smaller).)
A change in $\beta$ changes the slope of the probability curve (or cdf). The larger $\beta$ is, the steeper the slope. (Visual intuition: to achieve a given change in probability, less (more) change in x is needed when the slope is larger (smaller).)
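Both claims can be checked numerically with the logistic cdf; the parameter values below are arbitrary illustrations:

```python
import math

def F(z):                          # logistic cdf
    return 1.0 / (1.0 + math.exp(-z))

# A larger alpha shifts the curve left: at every x the probability is higher.
for x in (-2.0, 0.0, 2.0):
    assert F(1.0 + 0.8 * x) > F(-1.0 + 0.8 * x)

# A larger beta steepens the curve: the slope where Pr = 0.5
# (at x = -alpha/beta) equals f(0) * beta = beta / 4 for the logit.
def slope_at_half(alpha, beta, h=1e-6):
    x0 = -alpha / beta
    return (F(alpha + beta * (x0 + h)) - F(alpha + beta * (x0 - h))) / (2 * h)

assert slope_at_half(0.0, 2.0) > slope_at_half(0.0, 0.5)
```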
16 The Probability Curve and Parameters
Figure 3.8 A: Effects of Changing $\alpha$
17 The Probability Curve and Parameters
Figure 3.8 B: Effects of Changing $\beta$
18 The Probability Curve and Parameters
Now consider a model with one more independent variable z:

$$\Pr(y = 1 \mid x, z) = F(\alpha + \beta_1 x + \beta_2 z)$$

If we assign a value $z^*$ to z, the model above can be written as

$$\Pr(y = 1 \mid x, z = z^*) = F(\alpha + \beta_1 x + \beta_2 z^*) = F((\alpha + \beta_2 z^*) + \beta_1 x)$$

The term $\beta_2 z^*$ becomes part of the intercept. Therefore, when the value of z changes, the probability curve shifts in parallel with respect to x. This means the effect of a variable on the probability depends on the values of the other variables.
19 The Probability Curve and Parameters
Figure 3.9: How z Affects the Effect of x
20 The Probability Curve and Parameters
Figure 3.9: Values of z Create Parallel Curves with Respect to x
21 Interpretation
Interpretation - Predicted Probabilities
To interpret the estimated results from the logit and probit models, probabilities are the fundamental statistic:

$$\text{Probit:} \quad \Pr(y = 1 \mid x) = \Phi(x\hat\beta) = \int_{-\infty}^{x\hat\beta} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$$

$$\text{Logit:} \quad \Pr(y = 1 \mid x) = \Lambda(x\hat\beta) = \frac{\exp(x\hat\beta)}{1 + \exp(x\hat\beta)}$$

Since the model is nonlinear, no single method of interpretation can fully describe the relationship between a variable and the outcome. What to interpret, and how, depends on your research purpose or questions.
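These two response probabilities can be evaluated directly; a quick comparison at a few arbitrary values of the index $x\hat\beta$:

```python
import math

def probit_p(xb):                  # Phi(xb), via the error function
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

def logit_p(xb):                   # Lambda(xb)
    return math.exp(xb) / (1.0 + math.exp(xb))

for xb in (-2.0, 0.0, 2.0):
    print(xb, round(probit_p(xb), 3), round(logit_p(xb), 3))
```

Both curves pass through 0.5 at $x\hat\beta = 0$; the probit approaches 0 and 1 faster in the tails, which is why the two models give similar but not identical fitted probabilities.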
22 Interpretation
The range of probabilities
The minimum and maximum probabilities in the sample are defined as

$$\min \Pr(y = 1 \mid x) = \min_i F(x_i\hat\beta), \qquad \max \Pr(y = 1 \mid x) = \max_i F(x_i\hat\beta)$$

Listing the largest and smallest predicted probabilities can suggest which variables are important. However, the range is easily affected by extreme values of x.
The effect of each variable on the predicted probabilities
We can also see how the probability changes when one variable changes, but this requires controlling for the values of the other variables, usually by fixing them at their means. For example, we can compute the predicted change in the probability as $x_k$ changes from its minimum to its maximum:

$$\Pr(y = 1 \mid \bar{x}, \max x_k) - \Pr(y = 1 \mid \bar{x}, \min x_k)$$
23 Interpretation
Figure: Probability for Maximum k5 (Table 3.4)
Take an example using R, letting k5 be $x_k$. Create a new data frame where k5 is at its maximum from the original data, using the command with:
> labor_newdata_k5max <- with(labor, data.frame(k5=max(k5), k618=mean(k618), age=mean(age), wc=mean(wc), hc=mean(hc), lwg=mean(lwg), inc=mean(inc)))
> labor_newdata_k5max
Predict the probability using the command predict:
> labor_newdata_k5max$k5maxprob <- predict(labor_probit, newdata=labor_newdata_k5max, type="response")
> labor_newdata_k5max
24 Interpretation
Figure: Probability for Minimum k5 (Table 3.4)
Similarly, we can create a data frame where k5 is at its minimum and the other variables are at their means, and calculate the predicted probability difference

$$\Pr(y = 1 \mid \bar{x}, \max k5) - \Pr(y = 1 \mid \bar{x}, \min k5)$$
25 Interpretation
Probabilities Over the Range of a Variable
When there is more than one variable, we can compare the effects of two variables while the remaining variables are held constant. However, only one variable's values can be changed at a time. For example, suppose we want to examine the effects of $x_j$ and $x_l$ while the others are at their means. First fix $x_l$ at a value $\bar{x}_l$ and allow $x_j$ to move over a range; the predicted probability for the probit is

$$\Pr(y = 1 \mid \bar{x}, x_l = \bar{x}_l, x_j) = \Phi(\alpha + \beta_1\bar{x}_1 + \beta_2\bar{x}_2 + \cdots + \beta_l\bar{x}_l + \cdots + \beta_j x_j + \cdots + \beta_k\bar{x}_k)$$

This gives probabilities over some range of $x_j$ when $x_l = \bar{x}_l$. Second, set $x_l$ to $\dot{x}_l$ and again allow $x_j$ to move over the same range. The predicted probability is

$$\Pr(y = 1 \mid \bar{x}, x_l = \dot{x}_l, x_j) = \Phi(\alpha + \beta_1\bar{x}_1 + \beta_2\bar{x}_2 + \cdots + \beta_l\dot{x}_l + \cdots + \beta_j x_j + \cdots + \beta_k\bar{x}_k)$$

We can then compare how the probability changes with $x_j$ under the two different values of $x_l$.
26 Interpretation
Figure: Probabilities Over the Range of Age (Figure 3.10)
Create a data frame where age ranges from 30 to 60 in steps of 5, and the wife's college indicator is 0:
> labor_newdata2 <- with(labor, data.frame(k5=mean(k5), k618=mean(k618), wc=0, hc=mean(hc), lwg=mean(lwg), inc=mean(inc), age=seq(from = 30, to = 60, length.out = 7)))
> labor_newdata2
27 Interpretation
Figure: Probabilities Over the Range of Age (Figure 3.10 Continued)
Predict the probabilities over the range of age:
> labor_newdata2$ageprob <- predict(labor_probit, newdata=labor_newdata2, type="response")
> labor_newdata2
28 Interpretation
Figure 3.10: Probabilities Over the Range of Age for Two Wife's Education Levels
29 Interpretation
Changes in Probabilities: Marginal Effects
How can the effect of the independent variables be summarized? Because the scale of the latent $y^*$ is arbitrary, $\beta$ cannot be interpreted directly. The marginal effects of x on the probabilities are a better summary. The marginal effect is the change in $\Pr(y = 1 \mid x)$ for a change of $\delta$ in $x_l$, holding all other x at specific values. There are two kinds of marginal effects:
- Marginal change: an infinitely small change in $x_l$, i.e., $\delta \to 0$.
- Discrete change: a finite change in $x_l$.
The two measures agree when the probability curve is linear.
30 Interpretation
Figure 3.13: Marginal Change ($\delta$ infinitely small) vs. Discrete Change ($\delta = 1$)
31 Interpretation
Marginal Change
Let F be the cdf and f the pdf of the distribution. The marginal change of a variable $x_l$ on the probability is derived as

$$\frac{\partial \Pr(y = 1 \mid x)}{\partial x_l} = \frac{\partial F(x\beta)}{\partial x_l} = \frac{dF(x\beta)}{d(x\beta)} \frac{\partial x\beta}{\partial x_l} = f(x\beta)\beta_l \qquad (eq. 8)$$

For the probit model,

$$\frac{\partial \Pr(y = 1 \mid x)}{\partial x_l} = \phi(x\beta)\beta_l$$

and for the logit model,

$$\frac{\partial \Pr(y = 1 \mid x)}{\partial x_l} = \lambda(x\beta)\beta_l = \frac{\exp(x\beta)}{[1 + \exp(x\beta)]^2}\beta_l = \Pr(y = 1 \mid x)[1 - \Pr(y = 1 \mid x)]\beta_l \qquad (eq. 9)$$
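(eq. 9) is easy to verify against a numerical derivative; the coefficients below are made up for illustration:

```python
import math

alpha, beta_l = 0.5, 1.2                          # made-up coefficients
P = lambda x: 1.0 / (1.0 + math.exp(-(alpha + beta_l * x)))

x = 0.7
p = P(x)
analytic = p * (1.0 - p) * beta_l                 # eq. 9
h = 1e-6
numeric = (P(x + h) - P(x - h)) / (2.0 * h)       # numerical slope
assert abs(analytic - numeric) < 1e-6
```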
32 Interpretation
The marginal effect is the slope of the probability curve relating $x_l$ to $\Pr(y = 1 \mid x)$, holding all other variables constant. The sign of the marginal effect is determined by $\beta_l$, since $f(x\beta)$ is always positive. The magnitude of the change depends on the magnitude of $\beta_l$ and the value of $x\beta$.
Assume $\beta_l$ is positive; from (eq. 9), the effect of $x_l$ on the probability is positive. But when we consider another variable $x_j$, the situation is more complicated (though tractable). Differentiating (eq. 8) with respect to $x_j$ and using the logit cdf,

$$\frac{\partial^2 \Pr(y = 1 \mid x)}{\partial x_l \partial x_j} = \beta_l \beta_j \Pr(y = 1 \mid x)[1 - \Pr(y = 1 \mid x)][1 - 2\Pr(y = 1 \mid x)]$$

(How to derive it?) Assume $\beta_j$ is also positive. When $\Pr(y = 1 \mid x) < 0.5$, an increase in $x_j$ makes the slope of the probability with respect to $x_l$ increase; when $\Pr(y = 1 \mid x) > 0.5$, an increase in $x_j$ makes that slope decrease (see Figure 3.9).
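The cross-partial asked about on this slide follows from applying (eq. 9) twice; writing $P = \Pr(y = 1 \mid x)$ for the logit:

$$\frac{\partial^2 P}{\partial x_l \partial x_j}
= \frac{\partial}{\partial x_j}\bigl[P(1 - P)\beta_l\bigr]
= \beta_l (1 - 2P)\frac{\partial P}{\partial x_j}
= \beta_l \beta_j P(1 - P)(1 - 2P)$$

where the last step uses $\partial P / \partial x_j = P(1 - P)\beta_j$, i.e., (eq. 9) with $j$ in place of $l$.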
33 Interpretation
Figure 3.12: Marginal Effect in the Binary Response Model ($\beta$ is positive)
34 Interpretation
Overall, several things affect the size of the marginal effect (this applies to both the marginal change and the discrete change):
- The parameter of the variable of interest, i.e., $\beta_l$ in our previous example of (eq. 8)
- The start value of the variable of interest, i.e., $x_l$
- The amount of change in $x_l$
- The values and parameters of the other variables
Since the value of the marginal effect depends on the levels of all variables, we must decide which values of the variables to use when computing the effect. One method is to compute the average over all observations:

$$\mathrm{mean}\left(\frac{\partial \Pr(y = 1 \mid x)}{\partial x_l}\right) = \frac{1}{N}\sum_{i=1}^{N} f(x_i\beta)\beta_l$$
35 Interpretation
Another method is to compute the marginal effect at the means of the independent variables:

$$\frac{\partial \Pr(y = 1 \mid \bar{x})}{\partial x_l} = f(\bar{x}\beta)\beta_l$$

However, these two methods are limited, primarily because of dummy variables: it is inappropriate to take a derivative of, or average, a dummy variable, so these measures cannot show how the predicted probability changes when a dummy variable changes. This motivates the discrete change.
Discrete Change
(i) A Unit Change in $x_l$. If a variable $x_l$ increases from a start value $x_l^0$ to $x_l^0 + 1$, the change in probability is defined as

$$\frac{\Delta \Pr(y = 1 \mid \bar{x})}{\Delta x_l} = \Pr(y = 1 \mid \bar{x}, x_l^0 + 1) - \Pr(y = 1 \mid \bar{x}, x_l^0)$$

The start value affects the change in the probability. Usually $x_l^0 = \bar{x}_l$.
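The two derivative-based summaries generally disagree for nonlinear models; a small logit illustration with made-up numbers:

```python
import math

def lam(z):                        # logit pdf, lambda(z)
    e = math.exp(z)
    return e / (1.0 + e) ** 2

alpha, beta_l = -1.0, 0.8          # made-up coefficients
xs = [0.0, 1.0, 2.0, 3.0, 4.0]    # made-up sample values of x_l

# Average marginal effect: mean of f(x_i beta) * beta_l over observations.
ame = sum(lam(alpha + beta_l * x) * beta_l for x in xs) / len(xs)

# Marginal effect at the mean: f(xbar beta) * beta_l.
xbar = sum(xs) / len(xs)
mem = lam(alpha + beta_l * xbar) * beta_l
```

Averaging the pdf over the sample and evaluating it at the sample mean give different answers here, which is why the evaluation point must always be reported.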
36 Interpretation
Alternatively, we can use a unit change centered around $\bar{x}_l$:

$$\frac{\Delta \Pr(y = 1 \mid \bar{x})}{\Delta x_l} = \Pr\left(y = 1 \mid \bar{x}, \bar{x}_l + \tfrac{1}{2}\right) - \Pr\left(y = 1 \mid \bar{x}, \bar{x}_l - \tfrac{1}{2}\right)$$

(ii) A Standard Deviation Change in $x_l$. Similar to the change centered around $\bar{x}_l$, but with $\tfrac{1}{2}$ replaced by $\tfrac{s_l}{2}$, where $s_l$ is the standard deviation of $x_l$:

$$\frac{\Delta \Pr(y = 1 \mid \bar{x})}{\Delta x_l} = \Pr\left(y = 1 \mid \bar{x}, \bar{x}_l + \tfrac{s_l}{2}\right) - \Pr\left(y = 1 \mid \bar{x}, \bar{x}_l - \tfrac{s_l}{2}\right)$$

(iii) A Change from 0 to 1 (or 1 to 0) for Dummy Variables. When $x_l$ is a dummy variable, its mean $\bar{x}_l$ is not meaningful, and both $\bar{x}_l + \tfrac{1}{2}$ and $\bar{x}_l - \tfrac{1}{2}$ could fall outside the range of $x_l$. Consequently, the preferred measure of discrete change for a dummy variable sets the start value to 0 (1) and the end value to 1 (0):

$$\frac{\Delta \Pr(y = 1 \mid \bar{x})}{\Delta x_l} = \Pr(y = 1 \mid \bar{x}, x_l = 1\,(0)) - \Pr(y = 1 \mid \bar{x}, x_l = 0\,(1))$$
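The discrete-change measures can be computed side by side for a logit; all values below (coefficients, mean, standard deviation) are invented for illustration:

```python
import math

P = lambda xb: math.exp(xb) / (1.0 + math.exp(xb))   # logit probability

alpha, b_l = -0.5, 1.0             # made-up coefficients
xbar_l, s_l = 1.2, 0.6             # made-up mean and sd of x_l

# (i) unit change from the mean, and the centered version
unit     = P(alpha + b_l * (xbar_l + 1.0)) - P(alpha + b_l * xbar_l)
centered = P(alpha + b_l * (xbar_l + 0.5)) - P(alpha + b_l * (xbar_l - 0.5))

# (ii) a standard deviation change centered on the mean
sd_change = P(alpha + b_l * (xbar_l + s_l / 2)) - P(alpha + b_l * (xbar_l - s_l / 2))

# (iii) 0 -> 1 change, as for a dummy variable
dummy = P(alpha + b_l * 1.0) - P(alpha + b_l * 0.0)
```

Because the curve is nonlinear, the four numbers differ even though they all describe "the effect of $x_l$", so the chosen measure should be stated explicitly.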
37 Interpretation
The change in probability depends on the values of the other variables. Previously we fixed them at their means, but for dummy variables the mean is not a reasonable value. Another way to handle this is to choose a baseline observation whose dummy variables are at their modal values and whose continuous variables are at their means. For example, suppose $x_1$ and $x_2$ are continuous and $x_3$ and $x_4$ are dummy variables. If for most observations $x_3 = 1$ and $x_4 = 0$, the discrete change in the probability with respect to $x_1$ for the baseline observation is

$$\frac{\Delta \Pr(y = 1 \mid \bar{x}_1, \bar{x}_2, x_3 = 1, x_4 = 0)}{\Delta x_1}$$
38 Interpretation
2nd Midterm
- Date: Tuesday, May 27th, 2014
- Time: 9:00 am - 11:30 am
- Location: The Conference Room for Lecture
- Coverage: Scott Long Chapters 2.6, 3 and 4
- Others: Closed book, closed notes. A simple calculator for taking exponents and logarithms.
Chapter 864 Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X Introduction Logistic regression expresses the relationship between a binary response variable and one or more
More informationGeneralized Linear Models
Generalized Linear Models Lecture 7. Models with binary response II GLM (Spring, 2018) Lecture 7 1 / 13 Existence of estimates Lemma (Claudia Czado, München, 2004) The log-likelihood ln L(β) in logistic
More informationChapter 11. Regression with a Binary Dependent Variable
Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score
More informationModel Estimation Example
Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions
More informationGradient-Based Learning. Sargur N. Srihari
Gradient-Based Learning Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationSupport Vector Machines and Bayes Regression
Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationRegression with Nonlinear Transformations
Regression with Nonlinear Transformations Joel S Steele Portland State University Abstract Gaussian Likelihood When data are drawn from a Normal distribution, N (µ, σ 2 ), we can use the Gaussian distribution
More informationLikelihood-Based Methods
Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)
More informationFSAN815/ELEG815: Foundations of Statistical Learning
FSAN815/ELEG815: Foundations of Statistical Learning Gonzalo R. Arce Chapter 14: Logistic Regression Fall 2014 Course Objectives & Structure Course Objectives & Structure The course provides an introduction
More informationPOLI 7050 Spring 2008 February 27, 2008 Unordered Response Models I
POLI 7050 Spring 2008 February 27, 2008 Unordered Response Models I Introduction For the next couple weeks we ll be talking about unordered, polychotomous dependent variables. Examples include: Voter choice
More informationTMA 4275 Lifetime Analysis June 2004 Solution
TMA 4275 Lifetime Analysis June 2004 Solution Problem 1 a) Observation of the outcome is censored, if the time of the outcome is not known exactly and only the last time when it was observed being intact,
More informationECON Introductory Econometrics. Lecture 11: Binary dependent variables
ECON4150 - Introductory Econometrics Lecture 11: Binary dependent variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 11 Lecture Outline 2 The linear probability model Nonlinear probability
More informationIntroduction To Logistic Regression
Introduction To Lecture 22 April 28, 2005 Applied Regression Analysis Lecture #22-4/28/2005 Slide 1 of 28 Today s Lecture Logistic regression. Today s Lecture Lecture #22-4/28/2005 Slide 2 of 28 Background
More informationGeneralized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model
Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example
More informationEconomics Applied Econometrics II
Economics 217 - Applied Econometrics II Professor: Alan Spearot Email: aspearot@ucsc.edu Office Hours: 10AM-12PM Monday, 459 Engineering 2 TA: Bryan Pratt Email: brpratt@ucsc.edu Section times: Friday,
More informationNon-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationMachine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart
Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural
More informationLogistic Regression and Generalized Linear Models
Logistic Regression and Generalized Linear Models Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/2 Topics Generative vs. Discriminative models In
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationIterative Reweighted Least Squares
Iterative Reweighted Least Squares Sargur. University at Buffalo, State University of ew York USA Topics in Linear Classification using Probabilistic Discriminative Models Generative vs Discriminative
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationWeek 10: Theory of the Firm (Jehle and Reny, Chapter 3)
Week 10: Theory of the Firm (Jehle and Reny, Chapter 3) Tsun-Feng Chiang* *School of Economics, Henan University, Kaifeng, China November 22, 2015 First Last (shortinst) Short title November 22, 2015 1
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationEconometrics Problem Set 10
Econometrics Problem Set 0 WISE, Xiamen University Spring 207 Conceptual Questions Dependent variable: P ass Probit Logit LPM Probit Logit LPM Probit () (2) (3) (4) (5) (6) (7) Experience 0.03 0.040 0.006
More informationBinary Outcomes. Objectives. Demonstrate the limitations of the Linear Probability Model (LPM) for binary outcomes
Binary Outcomes Objectives Demonstrate the limitations of the Linear Probability Model (LPM) for binary outcomes Develop latent variable & transformational approach for binary outcomes Present several
More informationLecture 15: Logistic Regression
Lecture 15: Logistic Regression William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 15 What we ll learn in this lecture Model-based regression and classification Logistic regression
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationFor iid Y i the stronger conclusion holds; for our heuristics ignore differences between these notions.
Large Sample Theory Study approximate behaviour of ˆθ by studying the function U. Notice U is sum of independent random variables. Theorem: If Y 1, Y 2,... are iid with mean µ then Yi n µ Called law of
More information5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is
Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do
More informationHomework Solutions Applied Logistic Regression
Homework Solutions Applied Logistic Regression WEEK 6 Exercise 1 From the ICU data, use as the outcome variable vital status (STA) and CPR prior to ICU admission (CPR) as a covariate. (a) Demonstrate that
More informationThe classifier. Theorem. where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know
The Bayes classifier Theorem The classifier satisfies where the min is over all possible classifiers. To calculate the Bayes classifier/bayes risk, we need to know Alternatively, since the maximum it is
More informationThe classifier. Linear discriminant analysis (LDA) Example. Challenges for LDA
The Bayes classifier Linear discriminant analysis (LDA) Theorem The classifier satisfies In linear discriminant analysis (LDA), we make the (strong) assumption that where the min is over all possible classifiers.
More informationMaximum Likelihood Estimation. only training data is available to design a classifier
Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional
More informationstcrmix and Timing of Events with Stata
stcrmix and Timing of Events with Stata Christophe Kolodziejczyk, VIVE August 30, 2017 Introduction I will present a Stata command to estimate mixed proportional hazards competing risks models (stcrmix).
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 12: Logistic regression (v1) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 30 Regression methods for binary outcomes 2 / 30 Binary outcomes For the duration of this
More information