Analysis of Categorical Data. Nick Jackson, University of Southern California, Department of Psychology, 10/11/2013
2 Overview
- Data Types
- Contingency Tables
- Logit Models
  - Binomial
  - Ordinal
  - Nominal
3 Things not covered (but still fit into the topic)
- Matched pairs/repeated measures: McNemar's Chi-Square
- Reliability: Cohen's Kappa
- ROC
- Poisson (Count) models
- Categorical SEM: Tetrachoric Correlation
- Bernoulli Trials
4 Data Types (Levels of Measurement)
- Discrete/Categorical/Qualitative
  - Nominal/Multinomial
    - Properties: values arbitrary (no magnitude); no direction (no ordering)
    - Example: Race: 1=AA, 2=Ca, 3=As
    - Measures: mode, relative frequency
  - Rank Order/Ordinal
    - Properties: values semi-arbitrary (no magnitude?); have direction (ordering)
    - Example: Likert scales (LICK-URT): 1-5, Strongly Disagree to Strongly Agree
    - Measures: mode, relative frequency, median. Mean?
  - Binary/Dichotomous/Binomial
    - Properties: 2 levels; special case of ordinal or multinomial
    - Examples: Gender (Multinomial), Disease (Y/N)
    - Measures: mode, relative frequency. Mean?
- Continuous/Quantitative
5 Contingency Tables (Code 1.1)
- Often called two-way tables or cross-tabs
- Have dimensions I x J
- Can be used to test hypotheses of association between categorical variables
Example 2 x 3 table: Gender (Female, Male) by Age Group (<40 Years, 40-50 Years, >50 Years).
6 Contingency Tables: Test of Independence
Chi-Square Test of Independence ($\chi^2$):
- Calculate $\chi^2$
- Determine DF: (I-1) * (J-1)
- Compare to the $\chi^2$ critical value for the given DF
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ = observed frequency in cell $i$, $E_i$ = expected frequency in cell $i$ (for the cell in row $i$ and column $j$, $E_{i,j} = \frac{R_i C_j}{N}$), and $n$ = number of cells in the table.
2 x 3 table (Gender by Age Group) margins: C1=265, C2=331, C3=264; R1=156, R2=664; N=820.
7 Contingency Tables: Test of Independence (Code 1.2)
Pearson Chi-Square Test of Independence ($\chi^2$):
- $H_0$: No association
- $H_A$: Association ... where, how?
- Not appropriate when the expected cell frequency ($E_i$) is < 5; use Fisher's Exact Test instead
Result for the 2 x 3 table (Gender by Age Group; margins C1=265, C2=331, C3=264; R1=156, R2=664; N=820): $\chi^2$ (df = 2) = 23.39, p < …
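The test of independence above can be sketched in a few lines of Python. The slide's cell counts did not survive transcription, so the 2 x 3 table below is hypothetical, and the function name `chi_square_independence` is illustrative rather than anything from the original course code. The p-value shortcut `exp(-x / 2)` is exact only for df = 2.

```python
import math

def chi_square_independence(observed):
    """Return (statistic, df, expected) for an I x J table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: E_ij = R_i * C_j / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Hypothetical 2 x 3 counts (NOT the slide's data, whose cells were lost).
table = [[10, 20, 30],
         [20, 20, 20]]
stat, df, expected = chi_square_independence(table)

# For df = 2 only, the chi-square upper tail is exactly exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.4f}")

# Check the expected-count rule of thumb (all E >= 5) before trusting the test.
print("smallest expected count:", min(min(row) for row in expected))

# The same closed form applied to the statistic reported on the slide (df = 2).
p_slide = math.exp(-23.39 / 2)
print(f"p for chi2 = 23.39 on 2 df: {p_slide:.2e}")
```

If any expected count falls below 5, the slide's advice applies and an exact test is the safer choice.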
8 Contingency Tables 2x2
                              Disorder (Outcome)
                              Yes      No       Total
Risk Factor/Exposure   Yes    a        b        a+b
                       No     c        d        c+d
                       Total  a+c      b+d      a+b+c+d
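9 Contingency Tables: Measures of Association
                   Alcohol Use: Yes   Alcohol Use: No
Depression: Yes    a = 25             c = 20
Depression: No     b = 10             d = 45
Probability:
- Depression given alcohol use: $P(D \mid A) = \frac{a}{a+b} = \frac{25}{35} = 0.714$
- Depression given NO alcohol use: $P(D \mid \bar{A}) = \frac{c}{c+d} = \frac{20}{65} = 0.308$
Odds:
- Depression given alcohol use: $\text{Odds}(D \mid A) = \frac{P(D \mid A)}{1 - P(D \mid A)} = \frac{0.714}{0.286} = 2.5$
- Depression given NO alcohol use: $\text{Odds}(D \mid \bar{A}) = \frac{P(D \mid \bar{A})}{1 - P(D \mid \bar{A})} = \frac{0.308}{0.692} = 0.44$
Contrasting probability: Relative Risk $(RR) = \frac{P(D \mid A)}{P(D \mid \bar{A})} = \frac{0.714}{0.308} = 2.32$
- Individuals who used alcohol were 2.32 times as likely to have depression as those who did not use alcohol.
Contrasting odds: Odds Ratio $(OR) = \frac{\text{Odds}(D \mid A)}{\text{Odds}(D \mid \bar{A})} = \frac{2.5}{0.444} = 5.62$
- The odds of depression among alcohol users were 5.62 times the odds among nonusers.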
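The arithmetic on this slide can be reproduced directly. This is a minimal sketch using the slide's counts (a = 25, b = 10, c = 20, with d = 45 taken from the next slide's `d = 45*i` and the stated odds 0.44 = 20/45); the variable names are illustrative.

```python
# 2 x 2 counts: exposure = alcohol use, outcome = depression
a, b = 25, 10   # alcohol users: depressed, not depressed
c, d = 20, 45   # non-users:     depressed, not depressed

p_exposed = a / (a + b)        # P(D | A)
p_unexposed = c / (c + d)      # P(D | no A)

odds_exposed = p_exposed / (1 - p_exposed)
odds_unexposed = p_unexposed / (1 - p_unexposed)

relative_risk = p_exposed / p_unexposed
odds_ratio = odds_exposed / odds_unexposed

# The odds ratio also equals the cross-product ratio ad / bc.
cross_product = (a * d) / (b * c)

print(f"P(D|A) = {p_exposed:.3f}, P(D|no A) = {p_unexposed:.3f}")
print(f"odds   = {odds_exposed:.2f} vs {odds_unexposed:.2f}")
print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.3f}, ad/bc = {cross_product:.3f}")
```

Computing the OR as ad/bc avoids the intermediate rounding that creeps in when the probabilities are rounded first.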
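10 Why Odds Ratios?
The non-depressed counts are multiplied by i, for i = 1 to 45:
                   Alcohol Use: Yes   Alcohol Use: No   Total
Depression: Yes    a = 25             c = 20            45
Depression: No     b = 10*i           d = 45*i          55*i
Total              (25 + 10*i)        (20 + 45*i)       (45 + 55*i)
As i grows, depression becomes rarer overall. The OR stays at 5.62 for every i, while the RR changes with the overall probability and approaches the OR as the outcome becomes rare.
[Figure: OR and RR (y-axis) plotted against the overall probability of depression (x-axis)]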
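The point of this slide can be checked numerically. The sketch below recomputes RR and OR for each i from 1 to 45 using the slide's counts (a = 25, c = 20, b = 10*i, d = 45*i); only the variable names are assumptions.

```python
a, c = 25, 20   # depressed: alcohol users, non-users (held fixed)

rows = []
for i in range(1, 46):
    b, d = 10 * i, 45 * i                      # non-depressed counts scale with i
    overall = (a + c) / (a + b + c + d)        # overall probability of depression
    rr = (a / (a + b)) / (c / (c + d))         # relative risk
    odds_ratio = (a * d) / (b * c)             # cross-product odds ratio
    rows.append((i, overall, rr, odds_ratio))

# Print every 11th row to keep the output short.
for i, overall, rr, odds_ratio in rows[::11]:
    print(f"i={i:2d}  P(D)={overall:.3f}  RR={rr:.3f}  OR={odds_ratio:.3f}")
```

The OR is unchanged because the factor i cancels in ad/bc, which is why it remains estimable under sampling schemes that fix the number of cases and non-cases.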
11 The Generalized Linear Model
- General Linear Model (LM)
  - Continuous outcomes (DV)
  - Linear regression, t-test, Pearson correlation, ANOVA, ANCOVA
- Generalized Linear Model (GLM)
  - John Nelder and Robert Wedderburn
  - Maximum likelihood estimation
  - Continuous, categorical, and count outcomes
  - Distribution family and link functions
  - Error distributions that are not normal
12 Logistic Regression
"This is the most important model for categorical response data" - Agresti (Categorical Data Analysis, 2nd Ed.)
- Binary response
- Predicting probability (related to the probit model)
- Assume (the usual):
  - Independence
  - NOT homoscedasticity or normal errors
  - Linearity (in the log odds)
  - Also ... adequate cell sizes
13 Logistic Regression: The Model
In terms of the probability of success $\pi(x)$:
$$Y = \pi(x) = \frac{e^{\alpha + \beta_1 x_1}}{1 + e^{\alpha + \beta_1 x_1}}$$
In terms of logits (log odds):
$$\text{logit}[\pi(x)] = \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta_1 x_1$$
The logit transform gives us a linear equation.
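A small sketch of the two forms of the model, showing that the logit of the predicted probability is linear in x. The coefficient values below are hypothetical, chosen only for illustration, and the function names are not from the original slides.

```python
import math

def predicted_probability(x, alpha, beta):
    """pi(x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    z = alpha + beta * x
    return math.exp(z) / (1 + math.exp(z))

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

alpha, beta = -3.0, 0.05   # hypothetical intercept and slope

for x in (20, 40, 60, 80):
    pi = predicted_probability(x, alpha, beta)
    # The probability follows an S-shaped curve; its logit rises by beta per unit x.
    print(f"x={x}  pi={pi:.3f}  logit={logit(pi):.3f}")
```

Each 20-unit step in x adds 20 * beta = 1.0 to the logit, while the change in probability depends on where on the curve the step is taken.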
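14 Logistic Regression: Example (Code 2.1)
The Output as Logits
Outcome table (Freq., Percent): Not Depressed …, Depressed …
Logits: $H_0$: $\beta = 0$
Y=Depressed      Coef    SE   Z   P     CI
α (_constant)    -1.51   …    …   < …   …, …
Conversion to probability: $\frac{e^{\beta}}{1 + e^{\beta}} = \frac{e^{-1.51}}{1 + e^{-1.51}} \approx 0.18$
What does $H_0$: $\beta = 0$ mean? $\frac{e^{\beta}}{1 + e^{\beta}} = \frac{e^{0}}{1 + e^{0}} = 0.5$
Conversion to odds: $e^{\beta} = e^{-1.51} = 0.22$. Also = 0.1805/0.8195 = 0.22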
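15 Logistic Regression: Example (Code 2.2)
The Output as ORs
Outcome table (Freq., Percent): Not Depressed …, Depressed …
Odds Ratios: $H_0$: OR = 1 (equivalently $\beta = 0$ on the logit scale)
Y=Depressed      OR      SE   Z   P     CI
α (_constant)    0.220   …    …   < …   …, …
Conversion to probability: $\frac{OR}{1 + OR} = \frac{0.220}{1 + 0.220} \approx 0.18$
Conversion to logit (log odds!): $\ln(OR) = \text{logit}$, so $\ln(0.220) = -1.51$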
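The conversions between the logit, odds, and probability scales on these two slides can be sketched as follows, starting from the intercept of -1.51 quoted in the slides. The variable names are illustrative; because -1.51 is itself rounded, the results agree with the slide's 0.1805 only to about two decimals.

```python
import math

alpha = -1.51                      # intercept on the logit scale (from the slide)

odds = math.exp(alpha)             # logit -> odds
probability = odds / (1 + odds)    # odds  -> probability
back_to_logit = math.log(odds)     # odds  -> logit

print(f"odds        = {odds:.3f}")
print(f"probability = {probability:.3f}")
print(f"logit       = {back_to_logit:.2f}")

# Under H0 (logit = 0) the odds are 1 and the probability is 0.5.
null_probability = math.exp(0) / (1 + math.exp(0))
print(f"probability under H0 = {null_probability}")
```

For an intercept-only model the exponentiated constant is the odds of the outcome in the sample, which is why it matches 0.1805/0.8195.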
16 Logistic Regression: Example (Code 2.3)
Logistic regression with a single continuous predictor:
$$\log\left(\frac{\pi(\text{depressed})}{1 - \pi(\text{depressed})}\right) = \alpha + \beta(\text{age})$$
AS LOGITS:
Y=Depressed      Coef    SE   Z   P     CI
α (_constant)    …       …    …   < …   …, …
β (age)          0.013   …    …   …     …, …
Interpretation:
- A 1 unit increase in age results in a 0.013 increase in the log-odds of depression.
- Hmmmm ... I have no concept of what a log-odds is. Interpret as something else.
- The coefficient is > 0, so as age increases the risk of depression increases.
- OR = e^0.013 = 1.013. For a 1 unit increase in age, the odds of depression are multiplied by 1.013.
- We could also say: for a 1 unit increase in age there is a 1.3% increase in the odds of depression [(OR - 1) * 100 = % change].
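The interpretation steps above amount to exponentiating the coefficient. A minimal sketch, using the age coefficient of 0.013 quoted on the slide; the 10-year comparison is an added illustration, not something reported in the original.

```python
import math

beta_age = 0.013                              # log-odds change per 1 unit of age

odds_ratio = math.exp(beta_age)               # multiplicative change in odds per unit
percent_change = (odds_ratio - 1) * 100       # (OR - 1) * 100

# For a k-unit change the log-odds change is k * beta, so the OR is exp(k * beta).
odds_ratio_10 = math.exp(10 * beta_age)

print(f"OR per 1 unit   = {odds_ratio:.3f} ({percent_change:.1f}% higher odds)")
print(f"OR per 10 units = {odds_ratio_10:.3f}")
```

Note that the 10-unit OR is the 1-unit OR raised to the 10th power, not 10 times the percent change.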
17 Logistic Regression: GOF
Overall Model Likelihood-Ratio Chi-Square
- Omnibus test for the model
- Overall model fit? Relative to other models
- Compares the specified model with the null model (no predictors)
- $\chi^2 = -2(LL_0 - LL_1)$, DF = K parameters estimated
18 Logistic Regression: GOF (Summary Measures) (Code 2.4)
- Pseudo-$R^2$
  - Not the same meaning as in linear regression
  - There are many of them (Cox and Snell, McFadden)
  - Only comparable within nested models of the same outcome
- Hosmer-Lemeshow
  - Models with continuous predictors
  - Is the model a better fit than the NULL model? ($\chi^2$)
  - $H_0$: good fit for the data, so we want p > 0.05
  - Order the predicted probabilities, group them (g = 10) by quantiles, chi-square of Group * Outcome using …; df = g - 2
  - Conservative (rarely rejects the null)
- Pearson Chi-Square
  - Models with categorical predictors
  - Similar to Hosmer-Lemeshow
- ROC: Area Under the Curve
  - Predictive accuracy/classification
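The Hosmer-Lemeshow recipe described above (order the predicted probabilities, split into g groups, compare observed and expected events per group) can be sketched as below. This is a simplified illustration, not the course's implementation: it splits into equal-sized chunks, assumes at least g observations and fitted probabilities strictly between 0 and 1, and runs on simulated, hypothetical data.

```python
import random

def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df) for 0/1 outcomes y and fitted probabilities p."""
    # Order observations by predicted probability (stable for ties).
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        size = len(chunk)
        expected = sum(prob for prob, _ in chunk)      # expected events in group
        observed = sum(outcome for _, outcome in chunk)  # observed events in group
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return statistic, groups - 2

# Hypothetical data: outcomes simulated from the probabilities themselves,
# so the "model" is correctly specified by construction.
rng = random.Random(0)
p = [0.05 + 0.9 * i / 199 for i in range(200)]
y = [1 if rng.random() < prob else 0 for prob in p]

stat, df = hosmer_lemeshow(y, p)
print(f"H-L chi2 = {stat:.2f} on {df} df")
```

The statistic is referred to a chi-square distribution with g - 2 degrees of freedom; a large p-value is read as no evidence of poor fit.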
19 Logistic Regression: GOF (Diagnostic Measures) (Code 2.5)
- Outliers in Y (Outcome)
  - Pearson residuals: square root of the contribution to the Pearson $\chi^2$
  - Deviance residuals: square root of the contribution to the likelihood-ratio test statistic of a saturated model vs the fitted model
- Outliers in X (Predictors)
  - Leverage (Hat Matrix/Projection Matrix): maps the influence of observed on fitted values
- Influential Observations
  - Pregibon's Delta-Beta influence statistic
  - Similar to Cook's D in linear regression
- Detecting Problems
  - Residuals vs predictors
  - Leverage vs residuals
  - Boxplot of Delta-Beta
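For ungrouped binary data, the two residuals listed above have simple closed forms. The sketch below assumes one 0/1 observation per row and a fitted probability strictly between 0 and 1; the function names and example values are illustrative.

```python
import math

def pearson_residual(y, p):
    """(observed - fitted) scaled by the binomial standard deviation."""
    return (y - p) / math.sqrt(p * (1 - p))

def deviance_residual(y, p):
    """Signed square root of the observation's contribution to the deviance."""
    log_lik = y * math.log(p) + (1 - y) * math.log(1 - p)
    return math.copysign(math.sqrt(-2 * log_lik), y - p)

# Hypothetical cases: an event the model considered unlikely (p = 0.05)
# and an event it considered likely (p = 0.90).
for y, p in [(1, 0.05), (1, 0.90), (0, 0.90)]:
    print(f"y={y} p={p:.2f}  pearson={pearson_residual(y, p):6.2f}  "
          f"deviance={deviance_residual(y, p):6.2f}")
```

Observations with large residuals in absolute value are the "outliers in Y" the slide refers to; squaring and summing the Pearson residuals gives the Pearson chi-square, and doing the same with deviance residuals gives the deviance.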
20 Logistic Regression: GOF
$$\log\left(\frac{\pi(\text{depressed})}{1 - \pi(\text{depressed})}\right) = \alpha + \beta_1(\text{age})$$
- L-R $\chi^2$ (df = 1): 2.47, p = …
- H-L GOF: number of groups: 10; H-L $\chi^2$: 7.12; DF: 8; P: …
- McFadden's $R^2$: …
Y=Depressed      Coef   SE   Z   P     CI
α (_constant)    …      …    …   < …   …, …
β (age)          …      …    …   …     …, …
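The likelihood-ratio statistic $-2(LL_0 - LL_1)$ and the p-values for the chi-square statistics reported here can be computed with the standard library alone. In this sketch `chi2_sf` is a hand-written helper using the closed-form series for integer degrees of freedom, the log-likelihood values are hypothetical, and McFadden's $R^2$ is taken as $1 - LL_1/LL_0$.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

# Hypothetical log-likelihoods for a null model and a one-predictor model.
ll_null, ll_model = -250.0, -245.0
lr_statistic = -2 * (ll_null - ll_model)
mcfadden_r2 = 1 - ll_model / ll_null
print(f"LR chi2 = {lr_statistic:.2f}, p = {chi2_sf(lr_statistic, 1):.4f}, "
      f"McFadden R2 = {mcfadden_r2:.3f}")

# p-values implied by the statistics reported on the slide.
p_lr = chi2_sf(2.47, 1)   # L-R chi2 = 2.47 on 1 df
p_hl = chi2_sf(7.12, 8)   # H-L chi2 = 7.12 on 8 df
print(f"L-R p = {p_lr:.3f}, H-L p = {p_hl:.3f}")
```

Both implied p-values are above 0.05, so on these numbers age does not significantly improve on the null model, and the Hosmer-Lemeshow test does not flag poor fit.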
21 Logistic Regression: Diagnostics (Code 2.6)
Linearity in the Log-Odds
- Use a lowess (loess) plot of Depressed vs Age
[Figure: Lowess smoother, logit transformed smooth; y-axis: Depressed (Logit); x-axis: age; bandwidth = .8]
22 Logistic Regression: Example (Code 2.7)
Logistic regression with a single categorical predictor:
$$\log\left(\frac{\pi(\text{depressed})}{1 - \pi(\text{depressed})}\right) = \alpha + \beta_1(\text{gender})$$
AS OR:
Y=Depressed      OR      SE   Z   P     CI
α (_constant)    …       …    …   < …   …, …
β (male)         0.299   …    …   < …   …, …
Interpretation:
- The odds of depression for males are 0.299 times the odds for females.
- We could also say: the odds of depression are (1 - 0.299 = .701) 70.1% less in males compared to females.
- Or why not just make males the reference so the OR is greater than 1.
- Or we could just take the inverse and accomplish the same thing: 1/0.299 = 3.34
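Switching the reference category inverts the odds ratio and flips the sign of the log odds. A minimal sketch using the OR of 0.299 implied by the slide's "1/0.299" and "70.1% less"; the variable names are illustrative.

```python
import math

or_male_vs_female = 0.299

percent_lower = (1 - or_male_vs_female) * 100     # % lower odds for males
or_female_vs_male = 1 / or_male_vs_female         # same effect, other reference

# On the logit scale the two codings differ only in sign.
beta_male = math.log(or_male_vs_female)
beta_female = math.log(or_female_vs_male)

print(f"males vs females: OR = {or_male_vs_female:.3f} ({percent_lower:.1f}% lower odds)")
print(f"females vs males: OR = {or_female_vs_male:.2f}")
print(f"log odds: {beta_male:.3f} and {beta_female:.3f}")
```

Either coding describes the same association; the choice of reference level only affects which direction is easier to read.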
23 Ordinal Logistic Regression
- Also called Ordered Logistic or Proportional Odds Model
- Extension of the binary logistic model
- >2 ordered responses
- New assumption! Proportional odds
  - The predictor's effect on the outcome is the same across levels of the outcome
  - Example: BMI3GRP (1=Normal Weight, 2=Overweight, 3=Obese)
    - Bmi3grp (1 vs 2,3) = B(age)
    - Bmi3grp (1,2 vs 3) = B(age)
24 Ordinal Logistic Regression: The Model
- A latent variable model (Y*)
- j = number of levels - 1
$$Y = \text{logit}(p_1 + p_2 + \dots + p_j) = \ln\left(\frac{p_1 + p_2 + \dots + p_j}{1 - p_1 - p_2 - \dots - p_j}\right) = \alpha_j + \beta x$$
From the equation we can see that the odds ratio is assumed to be independent of the category j.
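The cumulative-logit equation can be turned into category probabilities by differencing the cumulative probabilities. The sketch below follows the slide's form, logit P(Y <= j) = alpha_j + beta * x, with hypothetical cutpoints and slope; some software instead uses alpha_j - beta * x, which reverses the sign of beta, so check the parameterization before interpreting direction.

```python
import math

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

def category_probabilities(x, cutpoints, beta):
    """P(Y = 1), ..., P(Y = J) under logit P(Y <= j) = alpha_j + beta * x."""
    cumulative = [inv_logit(alpha + beta * x) for alpha in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs

def cumulative_odds(x, alpha, beta):
    """Odds of Y <= j at a given x."""
    p = inv_logit(alpha + beta * x)
    return p / (1 - p)

cutpoints, beta = [-1.0, 1.0], 0.5    # hypothetical: 3 ordered levels, 2 cutpoints

probs = category_probabilities(2.0, cutpoints, beta)
print("category probabilities at x = 2:", [round(p, 3) for p in probs])

# Proportional odds: the odds ratio for a 1-unit change in x is exp(beta)
# at every cutpoint.
ratios = [cumulative_odds(3.0, a, beta) / cumulative_odds(2.0, a, beta) for a in cutpoints]
print("odds ratios per cutpoint:", [round(r, 3) for r in ratios])
```

The identical ratios at each cutpoint are the "independent of the category j" property stated on the slide.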
25 Ordinal Logistic Regression Example (Code 3.1)
AS LOGITS:
Y=bmi3grp           Coef   SE   Z   P     CI
β1 (age)            …      …    …   < …   …, …
β2 (blood_press)    …      …    …   …     …, …
Threshold1/cut1     …      …              …, …
Threshold2/cut2     …      …              …, …
For a 1 unit increase in Blood Pressure there is a … increase in the log-odds of being in a higher bmi category.
AS OR:
Y=bmi3grp           OR     SE   Z   P     CI
β1 (age)            …      …    …   < …   …, …
β2 (blood_press)    …      …    …   …     …, …
Threshold1/cut1     …      …              …, …
Threshold2/cut2     …      …              …, …
For a 1 unit increase in Blood Pressure the odds of being in a higher bmi category are … times greater.
26 Ordinal Logistic Regression: GOF (Code 3.2)
Assessing the Proportional Odds Assumption
- Brant Test of Parallel Regression
  - $H_0$: proportional odds, thus want p > 0.05
  - Tests each predictor separately and overall
- Score Test of Parallel Regression
  - $H_0$: proportional odds, thus want p > 0.05
- Approx Likelihood-ratio test
  - $H_0$: proportional odds, thus want p > 0.05
27 Ordinal Logistic Regression: GOF (Code 3.3)
- Pseudo $R^2$
- Diagnostic measures
  - Performed on the j-1 binomial logistic regressions
28 Multinomial Logistic Regression
- Also called multinomial logit/polytomous logistic regression
- Same assumptions as the binary logistic model
- >2 non-ordered responses
- Or you've failed to meet the proportional (parallel) odds assumption of the ordinal logistic model
29 Multinomial Logistic Regression: The Model
- j = levels for the outcome
- J = reference level
- $\pi_j(x) = P(Y = j \mid x)$, where x is a fixed setting of an explanatory variable
$$\text{logit}[\pi_j(x)] = \ln\left(\frac{\pi_j(x)}{\pi_J(x)}\right) = \alpha_j + \beta_{j1} x_1 + \dots + \beta_{jp} x_p$$
- Notice how it appears we are estimating a Relative Risk and not an Odds Ratio. It's actually an OR.
- Similar to conducting separate binary logistic models, but with better type 1 error control
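The baseline-category logits above determine all J probabilities once the reference category is fixed. A minimal sketch with one predictor and hypothetical coefficients (the function name and values are illustrative, not from the slides):

```python
import math

def multinomial_probabilities(x, coefs):
    """Category probabilities from baseline-category logits.

    coefs holds one (alpha_j, beta_j) pair per non-reference category;
    the reference category J is returned last.
    """
    etas = [alpha + beta * x for alpha, beta in coefs]
    denom = 1.0 + sum(math.exp(eta) for eta in etas)
    probs = [math.exp(eta) / denom for eta in etas]
    probs.append(1.0 / denom)          # reference category J
    return probs

coefs = [(0.2, 0.1), (-1.0, 0.2)]      # hypothetical: 3 outcome levels

p0 = multinomial_probabilities(1.0, coefs)
p1 = multinomial_probabilities(2.0, coefs)
print("probabilities at x = 1:", [round(p, 3) for p in p0])

# exp(beta_j) is the factor by which pi_j / pi_J changes per unit of x.
ratio_change = (p1[1] / p1[-1]) / (p0[1] / p0[-1])
print(f"change in pi_2/pi_J per unit x: {ratio_change:.4f}  (exp(beta_2) = {math.exp(0.2):.4f})")
```

The quantity being exponentiated is a ratio of two category probabilities, which is why it can look like a relative risk even though it behaves as an odds within the pair of categories being compared.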
30 Multinomial Logistic Regression Example (Code 4.1)
Does degree of supernatural belief indicate a religious preference?
AS OR: Y=religion (ref=catholic(1))
                      OR   SE   Z   P     CI
Protestant (2)
  β (supernatural)    …    …    …   …     …, …
  α (_constant)       …    …    …   …     …, …
Evangelical (3)
  β (supernatural)    …    …    …   …     …, …
  α (_constant)       …    …    …   < …   …, …
For a 1 unit increase in supernatural belief, there is a 21.8% increase [(OR - 1) * 100 = % change] in the odds of being an Evangelical compared to Catholic.
31 Multinomial Logistic Regression GOF
- Limited GOF tests
  - Look at the LR chi-square and compare nested models
  - "Essentially, all models are wrong, but some are useful" - George E.P. Box
- Pseudo $R^2$
- Similar to ordinal: perform tests on the j-1 binomial logistic regressions
32 Resources
- Categorical Data Analysis by Alan Agresti
- UCLA Stat Computing:
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationIntroduction to logistic regression
Introduction to logistic regression Tuan V. Nguyen Professor and NHMRC Senior Research Fellow Garvan Institute of Medical Research University of New South Wales Sydney, Australia What we are going to learn
More informationIntroduction To Logistic Regression
Introduction To Lecture 22 April 28, 2005 Applied Regression Analysis Lecture #22-4/28/2005 Slide 1 of 28 Today s Lecture Logistic regression. Today s Lecture Lecture #22-4/28/2005 Slide 2 of 28 Background
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More informationAn introduction to biostatistics: part 1
An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random
More informationLogistic Regression Models for Multinomial and Ordinal Outcomes
CHAPTER 8 Logistic Regression Models for Multinomial and Ordinal Outcomes 8.1 THE MULTINOMIAL LOGISTIC REGRESSION MODEL 8.1.1 Introduction to the Model and Estimation of Model Parameters In the previous
More informationNELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation
NELS 88 Table 2.3 Adjusted odds ratios of eighth-grade students in 988 performing below basic levels of reading and mathematics in 988 and dropping out of school, 988 to 990, by basic demographics Variable
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationGood Confidence Intervals for Categorical Data Analyses. Alan Agresti
Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline
More informationUnit 5 Logistic Regression
BIOSTATS 640 - Spring 2017 5. Logistic Regression Page 1 of 65 Unit 5 Logistic Regression To all the ladies present and some of those absent - Jerzy Neyman What behaviors influence the chances of developing
More informationLogistic Regression: Regression with a Binary Dependent Variable
Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationData-analysis and Retrieval Ordinal Classification
Data-analysis and Retrieval Ordinal Classification Ad Feelders Universiteit Utrecht Data-analysis and Retrieval 1 / 30 Strongly disagree Ordinal Classification 1 2 3 4 5 0% (0) 10.5% (2) 21.1% (4) 42.1%
More informationThe Flight of the Space Shuttle Challenger
The Flight of the Space Shuttle Challenger On January 28, 1986, the space shuttle Challenger took off on the 25 th flight in NASA s space shuttle program. Less than 2 minutes into the flight, the spacecraft
More informationPoisson regression: Further topics
Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to
More informationCHAPTER 1: BINARY LOGIT MODEL
CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual
More informationTento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/
Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.
More information