Outline. 1 Association in tables. 2 Logistic regression. 3 Extensions to multinomial, ordinal regression. sociology. 4 Multinomial logistic regression

Size: px
Start display at page:

Download "Outline. 1 Association in tables. 2 Logistic regression. 3 Extensions to multinomial, ordinal regression. sociology. 4 Multinomial logistic regression"

Transcription

1 UL Winter School Quantitative Stream Unit B2: Categorical Data Analysis Brendan Halpin Department of Socioy University of Limerick January 15 16, Overview Overview This 3-day section focuses on categorical data Analysis of data in tables, association between categorical variables : regression with a binary variable as dependent Extensions of istic regression multinomial, ordinal. Draws heavily on Chs 8 (part) and 15 of Agresti & Finlay 3 4 Association Tables display association between categorical variables Made evident by patterns of percentages Tested by χ 2 test How do we characterise association? Is there association? What form does it take? How strong is it? 5 6 Is there association? What form does it take? This is what the χ 2 test determines evidence of association Does not characterise nature or size! Depends on N Other tests exist, such as Fisher s exact test Examine percentages Compare observed and expected: residuals Standardised residuals: behave like z, i.e., should lie in range 2 : +2 about 95% of time, if independence is true z = O E E(1 row proportion)(1 col proportion) O E = E(1 R T )(1 C T ) 7 8

2 How strong is it? Ordinal variables Many possible measures of association Difference in proportions Ratio of proportions or relative rate Ratio of odds or odds ratio (see Ordinal variables may have more structured association Simpler pattern, anaous to correlation X high, Y high; X low, Y low 9 10 Characterising ordinal association Higher order tables Focus on concordant/discordant pairs Pairs of cases which differ on both variables Concordant: case that is higher on one variable also higher on other Discordant: higher on one, lower on the other Gamma, ˆγ = C D C+D Values range 1 γ +1 like correlation in interpretation Has asymptotic standard error t-test possible We can consider association in higher-order tables, e.g., 3-way Is the association between A and B the same for different values of C? Does the association between A and B dissapear if we control for C? Simpson s paradox etc. Scouting 1/3 Scouting example (ch 10): negative association between scouting and delinquency Control for family characteristics (church attendance) and it disappears See also death penalty example: note pattern of odds ratios Cochran-Mantel-Haenszel test: 2 2 k table H 0 : within each of k 2 2 panels, OR = 1 delinq scout 0 1 Total Total Scouting 2/3 Scouting 3/ church and delinq scout church scout Total Total

3 Loglinear modelling More complex questions and larger tables can be handled by linear modelling Treats all variables as dependent variables Can test null hypothesis of independence, as well as specified patterns of interaction Linear Probability Model OLS regression requires interval dependent variable Binary or yes/no dependent variables are not suitable Nor are rates, e.g., n successes out of m trials Errors are distinctly not normal While predicted value can be read as a probability, can depart from 0:1 range Particular difficulties with multiple explanatory variables. OLS gives the linear probability model in this case: Pr(Y = 1) = a + bx data is 0/1, prediction is probability Assumptions violated, but if predicted probabilities in range , not too bad See credit card example: becomes unrealistic only at very low or high income Logistic transformation Probability is bounded [0 : 1] OLS predicted value is unbounded How to transform probability to : range? Odds: p 1 p Log of odds: range is 0 : p 1 p has range : uses this as the dependent variable: ( ) Pr(Y = 1) = a + bx 1 Pr(Y = 1) Alternatively: Pr(Y = 1) 1 Pr(Y = 1) = ea+bx Or: Pr(Y = 1) = ea+bx 1 + e a+bx = e a bx Parameters The b parameter ) is the effect of a unit change in X on ( Pr(Y =1) 1 Pr(Y =1) This implies a multiplicative change of e b Pr(Y =1) in 1 Pr(Y =1), in the Odds Thus an odds ratio But the effect of b on P depends on the level of b See credit card example Death penalty example allows us to see the link between odds ratios and estimates In practice, inference is similar to OLS though based on a different ic For each explanatory variable, H 0 : β = 0 is the interesting null z = ˆβ SE is approximately normally distributed (large sample property) ( ) 2 More usually, the Wald test is used: ˆβ SE has a χ 2 distribution with one degree of freedom 23 24

4 Likelihood ratio tests Nested models The likelihood ratio test is thought more robust than the Wald test for smaller samples Where l 0 is the likelihood of the model without X j, and l 1 that with it, the quantity ( 2 l ) 0 = 2 ( l 0 l 1 ) l 1 is χ 2 distributed with one degree of freedom ( More generally, 2 lo l 1 ) tests nested models: where model 1 contains all the variables in model 0, plus m extra ones, it tests the null that all the extra $β$s are zero (χ 2 with m df) If we compare a model against the null model (no explanatory variables, it tests H 0 : β 1 = β 2 =... = β k = 0 Strong anay with F -test in OLS Maximum likelihood Maximum likelihood estimation Maximum likelihood Iterative search What is this likelihood? Unlike OLS, istic regression (and many, many other models) are extimated by maximum likelihood estimation In general this works by choosing values for the parameter estimates which maximise the probability (likelihood) of observing the actual data OLS can be ML estimated, and yields exactly the same results Sometimes the values can be chosen analytically A likelihood function is written, defining the probability of observing the actual data given parameter estimates Differential calculus derives the values of the parameters that maximise the likelihood, for a given data set Often, such closed form solutions are not possible, and the values for the parameters are chosen by a systematic computerised search (multiple iterations) Extremely flexible, allows estimation of a vast range of complex models within a single framework Maximum likelihood Likelihood as a quantity Tabular data Tabular data Either way, a given model yields a specific maximum likelihood for a give data set This is a probability, henced bounded [$0:1$] Reported as -likelihood, hence bounded [$- :0$] Thus is usually a large negative number Where an iterative solution is used, likelihood at each stage is usually reported normally getting nearer 0 at each step If all the explanatory variables are categorical (or have few fixed values) your data set can be represented as a table If we think of it as a table where each cell contains n yeses and m n noes (n successes out of m trials) we can fit grouped istic regression n successes out of m trials implies a binomial distribution of degree m n m n = α + βx The parameter estimates will be exactly the same as if the data were treated individually Tabular data Tabular data and goodness of fit Goodness of fit and accuracy of classification Fit with individual data But unlike with individual data, we can calculate goodness of fit, by relating observed successes to predicted in each cell If these are close we cannot reject the null hypothesis that the model is incorrect (i.e., you want a high p-value) Where l i is the likelihood of the current model, and l s is the likelihood of the saturated model the test statistic is ( 2 l ) i l s The saturated model predicts perfectly and has as many parameters as there are settings (cells in the table) The test has df of number of settings less number of parameters estimated, and is χ 2 distributed Where the number of settings (combinations of values of explanatory variables) is large, this approach to fit is not feasible Cannot be used with continuous covariates Hosmer-Lemeshow statistic attempts to create an anay Divide sample into deciles of predicted probability Calculate a fit measure based on observed and predicted numbers in the ten groups Simulation shows this is χ 2 distributed with 2 df Not a perfect solution, sensitive to how the cuts are made Pseudo-R 2 measures exist, but none approaches the clean interpretation as in OLS See general/psuedo_rsquareds.htm 31 32

5 Goodness of fit and accuracy of classification Predicting outcomes Goodness of fit and accuracy of classification Some problems Another way of assessing the adequacy of a it model is its accuracy of classification: True yes True no Predicted yes a c Predicted no b d Proportion correctly classified: a a+d a+b+c+d d c+d b+d Sensitivity: a+b ; Specificity: b False positive: ; False negative: c a+c Stata: estat class Zero cells in tables can cause problems: no yeses or no noes for particular settings Not automatically a problem but can give rise to attempts to estimate a parameter as or + If this happens, you will see a large parameter estimate and a huge standard error In individual data, sometimes certain combinations of variables have only successes or only failures In Stata, these cases are dropped from estimation you need to be aware of this as it changes the interpretation (you may wish to drop one of the offending variables instead) Extensions to multinomial, ordinal regression Multinomial istic regression Multinomial istic regression baseline category extension of binary What if we have multiple possible outcomes, not just two? is binary: yes/no Many interesting dependent variables have multiple categories voting intention by party first destination after second-level education housing tenure type We can use binary istic by recoding into two categories dropping all but two categories But that would lose information Multinomial istic regression baseline category extension of binary Multinomial istic regression Another idea: Pick one of the J categories as baseline For each of J 1 other categories, fit binary models contrasting that category with baseline Multinomial istic effectively does that, fitting J 1 models simultaneously P(Y = j) P(Y = J) = α j + β j X, j = 1,..., c 1 Which category is baseline is not critically important, but better for interpretation if it is reasonably large and coherent (i.e. "Other" is a poor choice) Multinomial istic regression baseline category extension of binary Predicting p from formula Multinomial istic regression Interpreting example, inference Example π j = α j + β j X π J π j αj +βj X = e π J αj +βj X π j = π J e J 1 J 1 π J = 1 π k = 1 π J π J = k=1 k=1 e αk+βkx J 1 k=1 eαk+βkx = 1 J k=1 eαk+βkx π j = αj +βj e X J k=1 eαk+βkx Let s attempt to predict housing tenure Owner occupier Local authority renter Private renter using age and employment status Employed Unemployed Not in labour force mit ten3 age i.eun 39 40

6 Multinomial istic regression Interpreting example, inference Stata output Multinomial istic regression Interpreting example, inference Interpretation Multinomial istic regression Number of obs = LR chi2(6) = Prob > chi2 = Log likelihood = Pseudo R2 = ten3 Coef. Std. Err. z P>z [95% Conf. Interval] 1 (base outcome) 2 age eun _cons age eun _cons Stata chooses category 1 (owner) as baseline Each panel is similar in interpretation to a binary regression on that category versus baseline Effects are on the of the odds of being in category j versus the baseline Multinomial istic regression Interpreting example, inference Ordinal it At one level inference is the same: Wald test for H o : β k = 0 LR test between nested models However, each variable has J 1 parameters Better to consider the LR test for dropping the variable across all contrasts: H 0 : j : β j k = 0 Thus retain a variable even for contrasts where it is insignificant as long as it has an effect overall Which category is baseline affects the parameter estimates but not the fit (-likelihood, predicted values, LR test on variables) Ordinal it Predicting ordinal outcomes Ordinal it Stereotype it Stereotype it While mit is attractive for multi-category outcomes, it is imparsimonious For nominal variables this is necessary, but for ordinal variables there should be a better way We consider three useful models Stereotype it it Continuation ratio or sequential it Each approaches the problem is a different way If outcome is ordinal we should see a pattern in the parameter estimates:. mit educ c.age i.sex if age>30 [...] Multinomial istic regression Number of obs = LR chi2(4) = Prob > chi2 = Log likelihood = Pseudo R2 = educ Coef. Std. Err. z P>z [95% Conf. Interval] Hi age sex _cons Med age sex _cons Lo (base outcome) Ordinal it Stereotype it Ordered parameter estimates Ordinal it Stereotype it Scale factor Low education is the baseline The effect of age: for high vs low for medium vs low 0.000, implicitly for low vs low Sex: , and Stereotype it fits a scale factor φ to the parameter estimates to capture this pattern Compare mit: P(Y = j) P(Y = J) = α j + β 1j X 1 + β 2j X, j = 1,..., J 1 with sit P(Y = j) P(Y = J) = α j + φ j β 1 X 1 + φ j β 2 X 2, j = 1,..., J 1 φ is zero for the baseline category, and 1 for the maximum It won t necessarily rank your categories in the right order: sometimes the effects of other variables do not coincide with how you see the ordinality 47 48

7 Ordinal it Stereotype it Sit example Ordinal it Stereotype it Interpreting φ. sit educ age i.sex if age>30 [...] Stereotype istic regression Number of obs = Wald chi2(2) = Log likelihood = Prob > chi2 = ( 1) [phi1_1]_cons = 1 educ Coef. Std. Err. z P>z [95% Conf. Interval] age sex /phi1_1 1 (constrained) /phi1_ /phi1_3 0 (base outcome) /theta /theta /theta3 0 (base outcome) (educ=lo is the base outcome) With low education as the baseline, we find φ estimates thus: High 1 Medium Low 0 That is, averaging across the variables, the effect of medium vs low is times that of high vs low The /theta terms are the α j s Ordinal it Stereotype it Surprises from sit Ordinal it Stereotype it Recover by including non-linear age sit is not guaranteed to respect the order Stereotype istic regression Number of obs = Wald chi2(2) = Log likelihood = Prob > chi2 = ( 1) [phi1_1]_cons = 1 educ Coef. Std. Err. z P>z [95% Conf. Interval] age sex /phi1_1 1 (constrained) /phi1_ /phi1_3 0 (base outcome) /theta /theta /theta3 0 (base outcome) (educ=lo is the base outcome) age has a strongly non-linear effect and changes the order of φ Stereotype istic regression Number of obs = Wald chi2(3) = Log likelihood = Prob > chi2 = ( 1) [phi1_1]_cons = 1 educ Coef. Std. Err. z P>z [95% Conf. Interval] age c.age#c.age sex /phi1_1 1 (constrained) /phi1_ /phi1_3 0 (base outcome) /theta /theta /theta3 0 (base outcome) (educ=lo is the base outcome) Ordinal it Stereotype it Stereotype it Ordinal it The proportional odds model Stereotype it treats ordinality as ordinality in terms of the explanatory variables There can be therefore disagreements between variables about the pattern of ordinality It can be extended to more dimensions, which makes sense for categorical variables whose categories can be thought of as arrayed across more than one dimension See Long and Freese, Ch 6.8 The most commonly used ordinal istic model has another ic It assumes the ordinal variable is based on an unobserved latent variable Unobserved cutpoints divide the latent variable into the groups indexed by the observed ordinal variable The model estimates the effects on the of the odds of being higher rather than lower across the cutpoints Ordinal it The model Ordinal it J 1 contrasts again, but different For j = 1 to J 1, P(Y > j) P(Y <= j) = α j + βx Only one β per variable, whose interpretation is the effect on the odds of being higher rather than lower One α per contrast, taking account of the fact that there are different proportions in each one But rather than compare categories against a baseline it splits into high and low, with all the data involved each time 55 56

8 Ordinal it An example Ordinal it Ordered istic: Stata output Using data from the BHPS, we predict the probability of each of 5 ordered responses to the assertion "homosexual relationships are wrong" Answers from 1: strongly agree, to 5: strongly disagree Sex and age as predictors descriptively women and younger people are more likely to disagree (i.e., have high values) Ordered istic regression Number of obs = LR chi2(2) = Prob > chi2 = Log likelihood = Pseudo R2 = ropfamr Coef. Std. Err. z P>z [95% Conf. Interval] 2.rsex rage /cut /cut /cut /cut Ordinal it Interpretation Ordinal it Linear predictor The betas are straightforward: The effect for women is The OR is e.8339 or Women s odds of being on the "approve" rather than the "disapprove" side of each contrast are times as big as men s Each year of age reduced the -odds by (OR 0.964). The cutpoints are odd: Stata sets up the model in terms of cutpoints in the latent variable, so they are actually α j Thus the α + βx or linear predictor for the contrast between strongly agree (1) and the rest is (2-5 versus 1) female age Between strongly disagree (5) and the rest (1-4 versus 5) and so on female age Ordinal it Predicted odds Ordinal it Predicted odds per contrast The predicted -odds lines are straight and parallel The highest relates to the 1-4 vs 5 contrast Parallel lines means the effect of a variable is the same across all contrasts Exponentiating, this means that the multiplicative effect of a variable is the same on all contrasts: hence "proportional odds" This is a key assumption Ordinal it Predicted probabilities relative to contrasts Ordinal it Predicted probabilities relative to contrasts We predict the probabilities of being above a particular contrast in the standard way Since age has a negative effect, downward sloping sigmoid curves Sigmoid curves are also parallel (same shape, shifted left-right) We get probabilities for each of the five states by subtraction 63 64

9 Ordinal it Ordinal it Testing proportional odds The key elements of inference are standard: Wald tests and LR tests Since there is only one parameter per variable it is more straightforward than MNL However, the key assumption of proportional odds (that there is only one parameter per variable) is often wrong. The effect of a variable on one contrast may differ from another Long and Freese s SPost Stata add-on contains a test for this It is possible to fit each contrast as a binary it The brant command does this, and tests that the parameter estimates are the same across the contrast It needs to use Stata s old-fashioned xi: prefix to handle categorical variables: xi: oit ropfamr i.rsex rage brant, detail Ordinal it Brant test output Ordinal it What to do?. brant, detail Estimated coefficients from j-1 binary regressions y>1 y>2 y>3 y>4 _Irsex_ rage _cons Brant Test of Parallel Regression Assumption Variable chi2 p>chi2 df All _Irsex_ rage A significant test statistic provides evidence that the parallel regression assumption has been violated. In this case the assumption is violated for both variables, but looking at the individual estimates, the differences are not big It s a big data set (14k cases) so it s easy to find departures from assumptions However, the departures can be meaningful. In this case it is worth fitting the "Generalised Ordinal Logit" model Ordinal it Generalised Ordinal Logit Ordinal it Sequential it Sequential it This extends the proportional odds model in this fashion P(Y > j) P(Y <= j) = α j + β j x That is, each variable has a per-contrast parameter At the most imparsimonious this is like a reparameterisation of the MNL in ordinal terms However, can constrain βs to be constant for some variables Get something intermediate, with violations of PO accommodated, but the parsimony of a single parameter where that is acceptable Download Richard William s goit2 to fit this model: ssc install goit2 Different ways of looking at ordinality suit different ordinal regression formations categories arrayed in one (or more) dimension(s): sit categories derived by dividing an unobserved continuum: oit etc categories that represent successive stages: the continuation-ratio model Where you get to higher stages by passing through lower ones, in which you could also stay Educational qualification: you can only progress to the next stage if you have completed all the previous ones Promotion: you can only get to a higher grade by passing through the lower grades Ordinal it Sequential it "Continuation ratio" model Ordinal it Sequential it J 1 contrasts again, again different Here the question is, given you reached level j, what is your chance of going further: P(Y > j) P(Y = j) = α + βx j For each level, the sample is anyone in level j or higher, and the outcome is being in level j + 1 or higher That is, for each contrast except the lowest, you drop the cases that didn t make it that far But rather than splitting high and low, with all the data involved each time, it drops cases below the baseline 71 72

10 Ordinal it Sequential it Fitting CR Ordinal it Sequential it seqit This model implies one equation for each contrast Can be fitted by hand by defining outcome variable and subsample for each contrast (ed has 4 values): Maarten Buis s seqit does it more or less automatically: gen con1 = ed>1 gen con2 = ed>2 replace con2 =. if ed<=1 gen con3 = ed>3 replace con3 =. if ed<=2 it con1 odoby i.osex it con2 odoby i.osex it con3 odoby i.osex seqit ed odoby i.osex, tree(1 : 2 3 4, 2 : 3 4, 3 : 4 ) you need to specify the contrasts You can impose constraints to make parameters equal across contrasts Alternative specific options Cit and the basic idea If we have information on the outcomes (e.g., how expensive they are) but not on the individuals, we can use cit Alternative specific options ascit and practical examples Modelling count data: Poisson distribution If we have information on the outcomes and also on the individuals, we can use ascit, alternative-specific it See examples in Long and Freese Count data raises other problems: Non-negative, integer, open-ended Neither a rate nor suitable for a continuous error distribution If the mean count is high (e.g., counts of newspaper readership) the normal approximation is good If the events are independent, with expected value (mean) µ, their distribution is Poisson: P(X = k) = µk e µ k! The variance is equal to the mean: also µ Poisson Density Poisson regression In the GLM framework we can use a regression to predict the mean value, and use the Poisson distribution to account for variation around the predicted value: Poisson regression Log link: effects on the count are multiplicative Poisson distribution: variation around the predicted count (not ) is Poisson In Stata, either poisson count xvar1 xvar2 or glm count xvar1 xvar2, link() family(poisson) 79 80

11 Over dispersion Negative binomial regression If the variance is higher than predicted, the SEs are not correct This may be due to omitted variables Or to events which are not independent of each other Adding relevant variables can help Negative binomial regression is a form of poisson regression that takes an extra parameter to account for overdispersion. If there is no overdispersion, poisson is more efficient Zero inflated, zero-truncated Another issue with count data is the special role of zero Sometimes we have zero-inflated data, where there is a subset of the population which is always zero and a subset which varies (and may be zero). This means that there are more zeros than poisson would predict, which will bias its estimates. Zero-inflated poisson (zip) can take account of this, by modelling the process that leads to the all-zeroes state Zero-inflated negative binomial (zinb) is a version that also copes with over dispersion Zero truncation also arises, for example in data collection schemes that only see cases with at least one event. ztp deals with this. 83

Outline. 1 Introduction. 2 Logistic regression. 3 Margins and graphical representations of effects. 4 Multinomial logistic regression.

Outline. 1 Introduction. 2 Logistic regression. 3 Margins and graphical representations of effects. 4 Multinomial logistic regression. Outline Brendan Halpin, Socioical Research Methods Cluster, Dept of Socioy, University of Limerick June 20-21 2016 1 2 3 4 Multinomial istic regression 5 6 1 2 What this is About these notes What this

More information

Binary Dependent Variables

Binary Dependent Variables Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Exercise 7.4 [16 points]

Exercise 7.4 [16 points] STATISTICS 226, Winter 1997, Homework 5 1 Exercise 7.4 [16 points] a. [3 points] (A: Age, G: Gestation, I: Infant Survival, S: Smoking.) Model G 2 d.f. (AGIS).008 0 0 (AGI, AIS, AGS, GIS).367 1 (AG, AI,

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

ECON 594: Lecture #6

ECON 594: Lecture #6 ECON 594: Lecture #6 Thomas Lemieux Vancouver School of Economics, UBC May 2018 1 Limited dependent variables: introduction Up to now, we have been implicitly assuming that the dependent variable, y, was

More information

Homework Solutions Applied Logistic Regression

Homework Solutions Applied Logistic Regression Homework Solutions Applied Logistic Regression WEEK 6 Exercise 1 From the ICU data, use as the outcome variable vital status (STA) and CPR prior to ICU admission (CPR) as a covariate. (a) Demonstrate that

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Statistical Modelling with Stata: Binary Outcomes

Statistical Modelling with Stata: Binary Outcomes Statistical Modelling with Stata: Binary Outcomes Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 21/11/2017 Cross-tabulation Exposed Unexposed Total Cases a b a + b Controls

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Generalised linear models. Response variable can take a number of different formats

Generalised linear models. Response variable can take a number of different formats Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion

More information

Logistic Regression. Continued Psy 524 Ainsworth

Logistic Regression. Continued Psy 524 Ainsworth Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression

More information

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi.

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi. Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 14, 2018 This handout steals heavily

More information

Lecture 2: Poisson and logistic regression

Lecture 2: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

Modelling Rates. Mark Lunt. Arthritis Research UK Epidemiology Unit University of Manchester

Modelling Rates. Mark Lunt. Arthritis Research UK Epidemiology Unit University of Manchester Modelling Rates Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 05/12/2017 Modelling Rates Can model prevalence (proportion) with logistic regression Cannot model incidence in

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Fitting Stereotype Logistic Regression Models for Ordinal Response Variables in Educational Research (Stata)

Fitting Stereotype Logistic Regression Models for Ordinal Response Variables in Educational Research (Stata) Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 31 11-2014 Fitting Stereotype Logistic Regression Models for Ordinal Response Variables in Educational Research (Stata) Xing Liu

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

Comparing IRT with Other Models

Comparing IRT with Other Models Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Lecture 5: Poisson and logistic regression

Lecture 5: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p ) Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Analysing categorical data using logit models

Analysing categorical data using logit models Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

More information

Chapter 11. Regression with a Binary Dependent Variable

Chapter 11. Regression with a Binary Dependent Variable Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score

More information

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson Lecture 10: Alternatives to OLS with limited dependent variables PEA vs APE Logit/Probit Poisson PEA vs APE PEA: partial effect at the average The effect of some x on y for a hypothetical case with sample

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Logistic Regression. Building, Interpreting and Assessing the Goodness-of-fit for a logistic regression model

Logistic Regression. Building, Interpreting and Assessing the Goodness-of-fit for a logistic regression model Logistic Regression In previous lectures, we have seen how to use linear regression analysis when the outcome/response/dependent variable is measured on a continuous scale. In this lecture, we will assume

More information

Lab 10 - Binary Variables

Lab 10 - Binary Variables Lab 10 - Binary Variables Spring 2017 Contents 1 Introduction 1 2 SLR on a Dummy 2 3 MLR with binary independent variables 3 3.1 MLR with a Dummy: different intercepts, same slope................. 4 3.2

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Chapter 5: Logistic Regression-I

Chapter 5: Logistic Regression-I : Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation NELS 88 Table 2.3 Adjusted odds ratios of eighth-grade students in 988 performing below basic levels of reading and mathematics in 988 and dropping out of school, 988 to 990, by basic demographics Variable

More information

Group Comparisons: Differences in Composition Versus Differences in Models and Effects

Group Comparisons: Differences in Composition Versus Differences in Models and Effects Group Comparisons: Differences in Composition Versus Differences in Models and Effects Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 Overview.

More information

crreg: A New command for Generalized Continuation Ratio Models

crreg: A New command for Generalized Continuation Ratio Models crreg: A New command for Generalized Continuation Ratio Models Shawn Bauldry Purdue University Jun Xu Ball State University Andrew Fullerton Oklahoma State University Stata Conference July 28, 2017 Bauldry

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs

Immigration attitudes (opposes immigration or supports it) it may seriously misestimate the magnitude of the effects of IVs Logistic Regression, Part I: Problems with the Linear Probability Model (LPM) Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 22, 2015 This handout steals

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Testing and Model Selection

Testing and Model Selection Testing and Model Selection This is another digression on general statistics: see PE App C.8.4. The EViews output for least squares, probit and logit includes some statistics relevant to testing hypotheses

More information

Problem Set 10: Panel Data

Problem Set 10: Panel Data Problem Set 10: Panel Data 1. Read in the data set, e11panel1.dta from the course website. This contains data on a sample or 1252 men and women who were asked about their hourly wage in two years, 2005

More information

Logistic Regression Models for Multinomial and Ordinal Outcomes

Logistic Regression Models for Multinomial and Ordinal Outcomes CHAPTER 8 Logistic Regression Models for Multinomial and Ordinal Outcomes 8.1 THE MULTINOMIAL LOGISTIC REGRESSION MODEL 8.1.1 Introduction to the Model and Estimation of Model Parameters In the previous

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Inference for Binomial Parameters

Inference for Binomial Parameters Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for

More information

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

ECO220Y Simple Regression: Testing the Slope

ECO220Y Simple Regression: Testing the Slope ECO220Y Simple Regression: Testing the Slope Readings: Chapter 18 (Sections 18.3-18.5) Winter 2012 Lecture 19 (Winter 2012) Simple Regression Lecture 19 1 / 32 Simple Regression Model y i = β 0 + β 1 x

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1 Mediation Analysis: OLS vs. SUR vs. ISUR vs. 3SLS vs. SEM Note by Hubert Gatignon July 7, 2013, updated November 15, 2013, April 11, 2014, May 21, 2016 and August 10, 2016 In Chap. 11 of Statistical Analysis

More information

CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

More information

7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure).

7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure). 1 Neuendorf Logistic Regression The Model: Y Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and dichotomous (binomial; 2-value), categorical/nominal data for a single DV... bear in mind that

More information

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

LOGISTICS REGRESSION FOR SAMPLE SURVEYS 4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Statistical Modelling in Stata 5: Linear Models

Statistical Modelling in Stata 5: Linear Models Statistical Modelling in Stata 5: Linear Models Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 07/11/2017 Structure This Week What is a linear model? How good is my model? Does

More information

Ordinal Variables in 2 way Tables

Ordinal Variables in 2 way Tables Ordinal Variables in 2 way Tables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 C.J. Anderson (Illinois) Ordinal Variables

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Chapter 19: Logistic regression

Chapter 19: Logistic regression Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3 1 Department of Biostatistics, UCLA School

More information

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21

Sections 2.3, 2.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1 / 21 Sections 2.3, 2.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 21 2.3 Partial association in stratified 2 2 tables In describing a relationship

More information

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

A Journey to Latent Class Analysis (LCA)

A Journey to Latent Class Analysis (LCA) A Journey to Latent Class Analysis (LCA) Jeff Pitblado StataCorp LLC 2017 Nordic and Baltic Stata Users Group Meeting Stockholm, Sweden Outline Motivation by: prefix if clause suest command Factor variables

More information

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015

Logistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 Logistic regression: Why we often can do what we think we can do Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 1 Introduction Introduction - In 2010 Carina Mood published an overview article

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 20, 2018

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 20, 2018 Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 20, 2018 References: Long 1997, Long and Freese 2003 & 2006 & 2014,

More information

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys

More information

Chapter 2: Describing Contingency Tables - II

Chapter 2: Describing Contingency Tables - II : Describing Contingency Tables - II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]

More information

ECON Introductory Econometrics. Lecture 11: Binary dependent variables

ECON Introductory Econometrics. Lecture 11: Binary dependent variables ECON4150 - Introductory Econometrics Lecture 11: Binary dependent variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 11 Lecture Outline 2 The linear probability model Nonlinear probability

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 7 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 68 Outline of Lecture 7 1 Empirical example: Italian labor force

More information

ssh tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm

ssh tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm Kedem, STAT 430 SAS Examples: Logistic Regression ==================================== ssh abc@glue.umd.edu, tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm a. Logistic regression.

More information

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi LOGISTIC REGRESSION Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi- lmbhar@gmail.com. Introduction Regression analysis is a method for investigating functional relationships

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Comparing Logit & Probit Coefficients Between Nested Models (Old Version)

Comparing Logit & Probit Coefficients Between Nested Models (Old Version) Comparing Logit & Probit Coefficients Between Nested Models (Old Version) Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised March 1, 2016 NOTE: Long and Freese s spost9

More information

A Re-Introduction to General Linear Models (GLM)

A Re-Introduction to General Linear Models (GLM) A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information