Logistic Regression
- Ariel Newton
Usual linear regression (repetition)

$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + e_i, \quad e_i \sim N(0, \sigma^2)$

or, equivalently:

$y_i \sim N(b_0 + b_1 x_{1i} + b_2 x_{2i}, \sigma^2)$

Example (DGA, p. 336): $E(\text{PEmax}) = b_0 + 1.024 \cdot \text{weight} + 0.147 \cdot \text{height}$ (the numerical intercept is not preserved in this transcription).

Interpretation of linear regression

- For a given height, PEmax grows by 1.024 cmH2O per kg body weight.
- For a given weight, PEmax grows by 0.147 cmH2O per cm body height.
- The effect of a single explanatory variable is conditional on the other variables present in the model.
- The effect of each explanatory variable is linear.

Other types of outcomes

- 0-1 variable
- Number / frequency (a count)

These are integers, so the error term cannot be normal. Instead, look at the mean value:

$E(y) = b_0 + b_1 x_{1i} + b_2 x_{2i}$

Still we have a problem, because the mean is restricted to a range while the linear predictor is not:

- the mean of a 0-1 variable $X$ is $E(X) = P(X = 1) = p$, with $p \in [0, 1]$;
- a count has its mean value in $[0, +\infty)$.

0-1 response variable: Wound infection (dependence on age and on operation time?)

[Data listing with columns p, inf, optime, age; values not preserved.]
Analysis of a 0-1 response variable

Response variable: binary (0 / 1).
How is the dependence on operation time (optime) and age (age) described?

Model for $p = P\{\text{wound infection}\}$, with $p \in [0, 1]$?

It is not good to use $p = a + b_1 x_1 + b_2 x_2$, since this would usually not stay within $[0, 1]$.

Logistic regression

- Binary outcome (e.g. 1 for "success"): $Y \in \{0, 1\}$
- Probability of success: $p = P\{Y = 1\} \in [0, 1]$
- Odds of success: $\omega = \frac{p}{1 - p} \in [0, +\infty]$, so that $p = \frac{\omega}{1 + \omega}$
- Odds ratio (2 groups): $OR = \frac{p_1}{1 - p_1} \Big/ \frac{p_2}{1 - p_2} \in [0, +\infty]$

Logistic regression (ctd.)

- Log-odds: $\text{logit}(p) = \ln\left(\frac{p}{1 - p}\right) \in [-\infty, +\infty]$; logit is the link function.
- Linear predictor: $\text{logit}(p) = b_0 + b_1 x_1 + b_2 x_2 = \eta$
- Predicted odds: $\omega = \exp(\eta)$
- Predicted probability: $p = \frac{\omega}{1 + \omega} = \frac{\exp(\eta)}{1 + \exp(\eta)}$
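The conversions between probability, odds, and log-odds defined above can be checked with a short Python sketch. The function names are illustrative and do not come from the slides; the input values are arbitrary.

```python
import math

def odds(p):
    """Odds p / (1 - p) for a probability 0 <= p < 1."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds ln(p / (1 - p)): the link function of logistic regression."""
    return math.log(odds(p))

def inv_logit(eta):
    """Predicted probability exp(eta) / (1 + exp(eta)) for a linear predictor eta."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds_ratio(p1, p2):
    """Odds ratio comparing group 1 (probability p1) with group 2 (probability p2)."""
    return odds(p1) / odds(p2)

p = 0.2                               # arbitrary example probability
print("odds:", odds(p))
print("logit:", logit(p))
print("back to p:", inv_logit(logit(p)))
print("OR for 0.5 vs 0.2:", odds_ratio(0.5, 0.2))
```

Any real number is a valid value of the linear predictor, and `inv_logit` always maps it back into (0, 1), which is the reason for modelling logit(p) rather than p itself.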
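Logistic regression: interpretation

Two groups, with probabilities $p_1$ and $p_2$:

$\text{logit}(p_1) - \text{logit}(p_2) = \ln\left(\frac{p_1}{1 - p_1}\right) - \ln\left(\frac{p_2}{1 - p_2}\right) = \ln\left(\frac{p_1}{1 - p_1} \Big/ \frac{p_2}{1 - p_2}\right) = \ln(OR)$

A linear model for logit(p) yields comparisons via odds ratios.

Logistic regression in wound infection data

$Y = 1$: post-operative wound infection; $Y = 0$: no post-operative wound infection
$p = P\{\text{post-operative wound infection}\}$
$x_1$ = operation time in minutes
$x_2$ = age in years

Estimated model (the numerical estimates are not preserved in this transcription and are written as $\hat b_0, \hat b_1, \hat b_2$):

$\text{logit}(p) = \hat b_0 + \hat b_1 x_1 + \hat b_2 x_2$

$p = \frac{\exp(\hat b_0 + \hat b_1 x_1 + \hat b_2 x_2)}{1 + \exp(\hat b_0 + \hat b_1 x_1 + \hat b_2 x_2)}$

Interpretation of logistic regression

Same operation time ($T$), age difference of 10 years ($A + 10$ vs. $A$):

$\text{logit}(p_1) = \hat b_0 + \hat b_1 T + \hat b_2 (A + 10)$
$\text{logit}(p_2) = \hat b_0 + \hat b_1 T + \hat b_2 A$
$\ln(OR_{A+10,A}) = \text{logit}(p_1) - \text{logit}(p_2) = 10\,\hat b_2 = 0.353$
$OR_{A+10,A} = \exp(0.353) = 1.423$

What does that mean?

If age increases by 10 years, the odds of getting a wound infection increase by a factor of 1.423, i.e. by 42.3%.

An odds ratio compares the odds of disease between two levels of an explanatory variable; it is a ratio of odds, not a difference.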
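The 10-year odds ratio can be reproduced with a few lines of Python. This is only a sketch: the value 0.353 is the log odds ratio quoted in the slides, and the per-year coefficient is derived from it under the model's assumption that age acts linearly on the logit scale. The variable names are illustrative.

```python
import math

# Log odds ratio for a 10-year age difference, as quoted in the slides.
log_or_10_years = 0.353

# Assumption: age enters the linear predictor linearly, so the 10-year
# log odds ratio is 10 times the per-year age coefficient.
b_age = log_or_10_years / 10

or_10_years = math.exp(10 * b_age)   # odds ratio for A + 10 vs. A
or_1_year = math.exp(b_age)          # odds ratio for A + 1 vs. A
percent_increase = (or_10_years - 1) * 100

print("OR per 10 years:", round(or_10_years, 3))
print("OR per 1 year:  ", round(or_1_year, 4))
print("Increase in odds per 10 years (%):", round(percent_increase, 1))
```

Note that the 10-year odds ratio is the 1-year odds ratio raised to the 10th power, not 10 times the 1-year odds ratio.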
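Calculation of probabilities

$\text{logit}(p) = \ln\left(\frac{p}{1 - p}\right) = b_0 + b_1 x_{1i} + b_2 x_{2i}$

$p = \frac{\exp(b_0 + b_1 x_{1i} + b_2 x_{2i})}{1 + \exp(b_0 + b_1 x_{1i} + b_2 x_{2i})}$

$1 - p = \frac{1}{1 + \exp(b_0 + b_1 x_{1i} + b_2 x_{2i})}$

The example yields, for optime = 200 min and age = 60 years:

$\text{logit}(p) = \hat b_0 + 200\,\hat b_1 + 60\,\hat b_2, \qquad p = \frac{e^{\text{logit}(p)}}{1 + e^{\text{logit}(p)}}$

[Numerical values not preserved.]

[Figure: Dependence of p on age for different operation times. Predicted probability against age, for operation times of 30, 120 and 240 min.]

[Figure: Dependence of p on operation time for different ages. Predicted probability against operation time, for ages 50, 60 and 75 years.]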
What does the intercept mean here?

$\text{logit}(p) = \ln\left(\frac{p}{1 - p}\right) = b_0 + b_1 x_{1i} + b_2 x_{2i}$

$x_{1i} = x_{2i} = 0 \;\Rightarrow\; \ln\left(\frac{p}{1 - p}\right) = b_0$

The intercept is the log-odds of disease for a person with 0 in all covariates. In the wound infection case this would be a person aged 0 years who is operated on for 0 minutes: not very meaningful!

Wound infection data analyzed in SAS

Direct input of data:

    data brem ;
      input inf optime age ;
      cards ;
    : :
    ;
    run ;

Analyst: Open

Direct programming:

    proc logistic data = brem descending ;
      model inf = optime age ;
    run ;

or:

    proc genmod data = brem descending ;
      model inf = optime age / dist = binomial link = logit ;
      estimate "Operation" optime 1 / exp ;
      estimate "age" age 1 / exp ;
    run ;

Analyst:

- Statistics/Regression/Logistic
- click "Single trial" under Dependent type
- choose inf as Dependent; specify Model Pr{..} as 1
- choose optime and age as Quantitative

SAS output

    The LOGISTIC Procedure

    Model Information
    Data Set                     WORK.BREM
    Response Variable            inf
    Number of Response Levels    2
    Number of Observations       194
    Model                        binary logit
    Optimization Technique       Fisher's scoring

    Response Profile
    Ordered Value    inf    Total Frequency
    [values not preserved]

    Probability modeled is inf=1.

    Model Fit Statistics
    Criterion    Intercept Only    Intercept and Covariates
    AIC
    SC
    -2 Log L
    [values not preserved]
    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept                                                           <.0001
    optime
    age

    Odds Ratio Estimates
    Effect    Point Estimate    95% Wald Confidence Limits
    optime
    age

[Apart from the intercept p-value, the numeric entries are not preserved.]

or, from PROC GENMOD:

    The GENMOD Procedure

    Model Information
    Data Set              WORK.BREM
    Distribution          Binomial
    Link Function         Logit
    Dependent Variable    inf
    Observations Used     194

    PROC GENMOD is modeling the probability that inf='1'.

    Criteria For Assessing Goodness Of Fit
    Criterion          DF    Value    Value/DF
    Deviance
    Scaled Deviance

    Analysis Of Parameter Estimates
    Parameter    Estimate    Standard Error    Wald 95% Confidence Limits    Chi-Square    Pr > ChiSq
    Intercept                                                                              <.0001
    optime
    age
    Scale

    Contrast Estimate Results
    Label             Estimate    Standard Error    Confidence Limits    Chi-Square    Pr > ChiSq
    Operation
    Exp(Operation)
    age
    Exp(age)

[Numeric entries not preserved.]

Confidence intervals

$(1 - \alpha)$ c.i. $=$ estimate $\pm z_{1 - \alpha/2} \times$ std.error

95% confidence interval for the OR associated with a difference of 1 year in the age at operation:

- For ln(OR): $\hat b_2 \pm 1.96 \times \text{s.e.}(\hat b_2)$
- For OR: the exponential of these limits, i.e. $\left(e^{\hat b_2 - 1.96\,\text{s.e.}(\hat b_2)};\; e^{\hat b_2 + 1.96\,\text{s.e.}(\hat b_2)}\right)$

95% confidence interval for the OR associated with a difference of 10 years in age at operation:

- For ln(OR): $10\,\hat b_2 \pm 1.96 \times 10 \times \text{s.e.}(\hat b_2)$
- For OR: the exponential of these limits or, equivalently, the 1-year limits raised to the 10th power.

[Numerical limits not preserved.]
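The Wald interval for an odds ratio can be sketched in Python as follows. The function name and the `units` argument are illustrative. The per-year coefficient 0.0353 is inferred from the 10-year log odds ratio of 0.353 quoted in the slides; the standard error 0.012 is a made-up placeholder, since the actual value is not available here.

```python
import math

def odds_ratio_ci(estimate, std_error, units=1.0, z=1.96):
    """Wald confidence interval for the odds ratio associated with a
    difference of `units` in a covariate.

    `estimate` and `std_error` are on the log-odds scale per one unit of
    the covariate; z = 1.96 gives an approximate 95% interval.
    Returns (odds ratio, lower limit, upper limit).
    """
    log_or = units * estimate
    half_width = z * units * std_error
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))

b_age = 0.0353    # per-year coefficient implied by 10 * b = 0.353
se_age = 0.012    # hypothetical standard error, for illustration only

or1, lo1, hi1 = odds_ratio_ci(b_age, se_age)               # 1-year difference
or10, lo10, hi10 = odds_ratio_ci(b_age, se_age, units=10)  # 10-year difference

print("1 year:  ", or1, (lo1, hi1))
print("10 years:", or10, (lo10, hi10))
```

The interval is built on the log-odds scale, where the normal approximation is used, and only then exponentiated; this is why it is not symmetric around the odds ratio itself.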
Confidence intervals in SAS using Proc Genmod

Program:

    proc genmod data = brem descending ;
      model inf = optime age / dist = binomial ;
      estimate "Op60" optime 60 / exp ;
      estimate "A10" age 10 / exp ;
    run ;

Output:

    Parameter    Estimate    Standard Error    Wald 95% Conf. Limits    Chi-Square    Pr > ChiSq
    Op60
    Exp(Op60)
    A10
    Exp(A10)
    [values not preserved]

Effect of scaling and centering of covariates

Program:

    data brem ;
      set brem ;
      a50 = ( age - 50 ) / 10 ;
      op1 = ( optime - 60 ) / 60 ;
    run ;

    proc logistic data = brem descending ;
      model inf = op1 a50 ;
    run ;

Scaling and centering

Output:

    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept                                                           <.0001
    op1
    a50

    Odds Ratio Estimates
    Effect    Point Estimate    95% Wald Confidence Limits
    op1
    a50
    [values not preserved]

The intercept refers to the log(odds) for a person with value 0 in all covariates, but this is now a person aged 50 years, operated on for 1 hour.

If the covariates are divided by a factor:

- the estimates are multiplied by this factor;
- the standard errors are multiplied by this factor;
- Wald's test and the p values remain the same.

If the covariates are centered around a value:

- the estimates of the covariate effects are not changed (only the intercept changes);
- the standard errors of the covariate effects are not changed;
- Wald's test and the p values for the covariate effects remain the same.
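The scaling and centering rules can be verified numerically. The sketch below fits a one-covariate logistic regression by Newton-Raphson in plain Python, once on raw age and once on (age - 50) / 10. The data are made up for illustration (they are not the wound infection data), and the helper names are hypothetical.

```python
import math

def score_and_information(b0, b1, x, y):
    """Score vector and Fisher information for logit(p) = b0 + b1 * x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit; returns (b0, b1, se_b0, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard errors from the inverse information at the final estimates.
    _, (h00, h01, h11) = score_and_information(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)

# Made-up data: number of infections among 5 patients at each age.
infected_per_age = {30: 1, 40: 1, 50: 2, 60: 2, 70: 3, 80: 4}
age, inf = [], []
for a, k in infected_per_age.items():
    for j in range(5):
        age.append(a)
        inf.append(1 if j < k else 0)

raw = fit_logistic(age, inf)
a50 = [(a - 50) / 10 for a in age]       # centered at 50, in units of 10 years
scaled = fit_logistic(a50, inf)

print("raw age:       ", raw)
print("(age - 50)/10: ", scaled)
print("Wald z, slope: ", raw[1] / raw[3], scaled[1] / scaled[3])
```

In this sketch the slope and its standard error are both multiplied by 10, so the Wald statistic is unchanged, while the intercept moves from the log-odds at age 0 to the log-odds at age 50.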
The intercept refers to the log odds for a person with covariate values equal to those used for centering:

$\widehat{\text{odds}}_{50,60} = \exp(\hat b_0)$

$\hat p_{50,60} = \frac{1}{1 + 1/\widehat{\text{odds}}_{50,60}} = 0.052$

$\text{c.i.}(\text{odds}_{50,60}) = \exp\left(\hat b_0 \pm 1.96 \times \text{s.e.}(\hat b_0)\right)$

$\text{c.i.}(p_{50,60}) = \left(\frac{1}{1 + 1/\text{odds}_{\text{lower}}},\; \frac{1}{1 + 1/\text{odds}_{\text{upper}}}\right) = (0.024,\; 0.112)$

[The intercept estimate, its standard error, and the limits for the odds are not preserved.]

The infection probability for a "0-person" (50 years old, operated on for 1 hour) is 0.052, with a 95% c.i. of (0.024, 0.112).

Wald's test: Model reduction

For testing the importance of a single covariate, e.g. $H_0: \beta_k = 0$. Under $H_0$, we have approximately:

$\frac{\text{estimate}}{\text{std.err.}} \sim N(0, 1)$

or:

$\left(\frac{\text{estimate}}{\text{std.err.}}\right)^2 \sim \chi^2_1$

This is calculated in SAS by default, for each parameter separately.

Likelihood-ratio test: Model reduction

$-2 \ln(\text{likelihood ratio}) \sim \chi^2_{df}$

The likelihood ratio is the ratio between the maximized likelihood functions under two different models, of which the smaller one lacks df (one or more) parameters.

The deviance is the likelihood-ratio test statistic for comparing the current model with a model that has one parameter per observation. Thus, the corresponding df is the number of observations minus the number of parameters in the current model. E.g., in our example we have 194 observations and 3 parameters in our model (intercept, optime, age), so the deviance has df = 191.

The deviance on its own is not meaningful! However, the difference in deviances between two (nested) models corresponds to the likelihood-ratio test between the two models. It is assessed with the help of the $\chi^2$ distribution with df equal to the difference in the numbers of parameters in the two models.

E.g., test of the model with both optime and age vs. the model with only optime: deviance on 191 df vs. deviance on 192 df, giving $\chi^2 = 7.869$ as the difference of the two deviances, df = 1 (the two deviances and the p value are not preserved; the result is a bit different from the Wald test).
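For tests with 1 degree of freedom, the p value can be computed with the Python standard library alone, because a chi-square variable with 1 df is the square of a standard normal variable. The sketch below applies this to the likelihood-ratio statistic 7.869 quoted above; the function names are illustrative, and the numbers passed to `wald_test` are hypothetical placeholders.

```python
import math

def chi2_sf_1df(x):
    """Upper tail probability P(X > x) for a chi-square variable with 1 df.

    Uses X = Z**2 with Z standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

def wald_test(estimate, std_error):
    """Wald chi-square statistic (estimate / std_error)**2 and its p value."""
    chi_square = (estimate / std_error) ** 2
    return chi_square, chi2_sf_1df(chi_square)

# Likelihood-ratio test: difference in deviances quoted in the slides, df = 1.
lr_statistic = 7.869
lr_p_value = chi2_sf_1df(lr_statistic)
print("LR test p value:", lr_p_value)   # about 0.005

# Wald test with hypothetical estimate and standard error (not from the slides).
wald_chi_square, wald_p_value = wald_test(0.0353, 0.012)
print("Wald chi-square and p value:", wald_chi_square, wald_p_value)
```

This shortcut only covers df = 1; for tests with more degrees of freedom a general chi-square distribution function is needed.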
Data from DGA: 2 × k table with ordered categories

[Table: caesarean section (CS: Yes / No) by shoe size in ordered categories, with row and column totals; cell counts not preserved.]

Recall (lecture on categorical data):

$\chi^2$ test for independence: 9.29, with 5 df (p value not preserved).

Partition of the $\chi^2$ test into tests for linearity and for trend:

$\chi^2_{\text{total}}(5) = \chi^2_{\text{lin}}(4) + \chi^2_{\text{trend}}(1)$

with $\chi^2_{\text{total}} = 9.29$ (the two components are not preserved).

Logistic regression:

    Model                              deviance    df    p
    logit(p_i) = beta_i
                                       Test for linearity
    logit(p_i) = alpha + beta * s_i
                                       Test for trend
    logit(p_i) = mu
    [values not preserved]

Analysis of shoe size data:

    data shoe ;
      input cs $ shoeno number ;
      cards ;
    [12 data lines: six with cs = Y and six with cs = N; shoeno and number values not preserved]
    ;
    run ;

Direct programming:

Shoeno as class variable:

    proc logistic data = shoe descending ;
      weight number ;
      class shoeno / param = ref ref = last ;
      model cs = shoeno ;
    run ;

Shoeno as quantitative variable:

    proc logistic data = shoe descending ;
      weight number ;
      model cs = shoeno ;
    run ;
Analyst:

Shoeno as class variable:

- Statistics/Regression/Logistic
- choose cs as Dependent, and shoeno as Class
- under Variables, choose number as Weight
- double-click the Code node and copy the program to the editor
- add two options to the class statement: class shoeno / param=ref ref=last

For a direct interpretation of the parameter estimates, the last 2 steps are essential! (If the 1st level should be the reference, use ref=first instead.)

Shoeno as quantitative variable: select shoeno as Quantitative instead of Class.

Full model (shoe size as a class variable)

    Response Profile
    Ordered Value    cs    Total Frequency    Total Weight
    1                Y
    2                N

    Probability modeled is cs='Y'.

    Class Level Information
    Class     Value    Design Variables
    shoeno

    Model Fit Statistics
    Criterion    Intercept Only    Intercept and Covariates
    AIC
    SC
    -2 Log L

    Type III Analysis of Effects
    Effect    DF    Wald Chi-Square    Pr > ChiSq
    shoeno

    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept                                                           <.0001
    shoeno (five rows, one per non-reference level)

    Odds Ratio Estimates
    Effect    Point Estimate    95% Wald Confidence Limits
    shoeno (five rows, each level vs. the reference level)

[Numeric entries and level labels not preserved.]

$p := P(cs = Y \mid shoeno = 3.5) = ?$

$\widehat{OR} = \ldots, \qquad \hat p = \frac{\exp(\ldots)}{1 + \exp(\ldots)} = \ldots$ [values not preserved]

Model with linear effect of shoe size

    Model Fit Statistics
    Criterion    Intercept Only    Intercept and Covariates
    AIC
    SC
    -2 Log L

    Testing Global Null Hypothesis: BETA=0
    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio
    Score
    Wald

    Analysis of Maximum Likelihood Estimates
    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept
    shoeno

    Odds Ratio Estimates
    Effect    Point Estimate    95% Wald Confidence Limits
    shoeno

[Numeric entries not preserved.]
Model comparisons

    model             deviance    df    difference in deviance    df    p
    full
    linear
    intercept only
    [values not preserved]

The test in the second last line is the trend test.

Exercise: Use the output to calculate the predicted values for the probability of a caesarean section for women with shoe sizes 4, 5 and 6, respectively, from the model with a linear effect of shoe size.

Case-control studies

In a case-control study, one chooses:

- cases (diseased), as verified from a register or similar;
- controls, who are persons representing the population to which the cases belong.

Thus, persons in case-control studies are chosen according to the outcome. Typically, the proportion of cases and controls is specified beforehand.

If a variable is important for the development of the disease, its distribution will differ between cases and controls.

The probability of being a case (in the population), P{disease}, cannot be estimated from a case-control study. But the effects of covariates on the disease probability can be estimated!
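Case-control studies

Prevalence in the population: $p = P\{\text{case}\}$, so that $\frac{p}{1 - p} = \text{odds(case)}$.

Selection fractions, i.e. inclusion probabilities $\pi_0$ and $\pi_1$:

$P\{\text{inclusion in study} \mid \text{case}\} = \pi_1$
$P\{\text{inclusion in study} \mid \text{control}\} = \pi_0$

In a case-control study one observes the number of cases and the number of controls, conditional on their actually being in the study. These depend on diverse covariates (which one is interested in) and on the inclusion probabilities (which one is not interested in).

[Diagram: the population splits into cases (probability $p$) and controls (probability $1 - p$); cases are included with probability $\pi_1$ and not included with probability $1 - \pi_1$; controls are included with probability $\pi_0$ and not included with probability $1 - \pi_0$.]

$P\{\text{case \& included}\} = p\,\pi_1$
$P\{\text{control \& included}\} = (1 - p)\,\pi_0$

$\text{odds}(\text{case} \mid \text{included}) = \frac{p\,\pi_1}{(1 - p)\,\pi_0} = \frac{p}{1 - p} \times \frac{\pi_1}{\pi_0}$

Logistic regression

Model for the population:

$\ln\left[\frac{p}{1 - p}\right] = b_0 + b_1 x_1 + b_2 x_2$

Model for the observed:

$\ln[\text{odds}(\text{case} \mid \text{incl.})] = \ln\left[\frac{p}{1 - p}\right] + \ln\left[\frac{\pi_1}{\pi_0}\right] = \left(\ln\left[\frac{\pi_1}{\pi_0}\right] + b_0\right) + b_1 x_1 + b_2 x_2$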
Analysis of $P(\text{case} \mid \text{inclusion})$, i.e. binary observations:

$Y = 1$: case; $Y = 0$: control

- Effects of covariates are estimated correctly!
- The intercept has no meaning: it depends on $\pi_0$ and $\pi_1$, which are usually unknown.
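The intercept shift under case-control sampling can be illustrated with a short Python sketch. All numbers are hypothetical: a population model with one covariate, coefficients b0 = -4.0 and b1 = 0.7, and inclusion probabilities of 0.9 for cases and 0.01 for controls. The function names are illustrative as well.

```python
import math

def population_prob(x, b0, b1):
    """Disease probability in the population: logit(p) = b0 + b1 * x."""
    eta = b0 + b1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def log_odds_included(x, b0, b1, pi_case, pi_control):
    """Log odds of being a case among those included in the study."""
    p = population_prob(x, b0, b1)
    return math.log((p * pi_case) / ((1.0 - p) * pi_control))

b0, b1 = -4.0, 0.7                  # hypothetical population coefficients
pi_case, pi_control = 0.9, 0.01     # hypothetical inclusion probabilities

observed_intercept = log_odds_included(0, b0, b1, pi_case, pi_control)
observed_slope = (log_odds_included(1, b0, b1, pi_case, pi_control)
                  - log_odds_included(0, b0, b1, pi_case, pi_control))

print("population intercept:", b0)
print("observed intercept:  ", observed_intercept)   # b0 + ln(pi_case / pi_control)
print("population slope:    ", b1)
print("observed slope:      ", observed_slope)       # unchanged
```

The slope, and hence the odds ratio, is the same in the population and among the included persons, whereas the intercept is shifted by ln(pi_case / pi_control), which is unknown in practice. This relies on the inclusion probabilities not depending on the covariate.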
Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression
More informationCase-control studies C&H 16
Case-control studies C&H 6 Bendix Carstensen Steno Diabetes Center & Department of Biostatistics, University of Copenhagen bxc@steno.dk http://bendixcarstensen.com PhD-course in Epidemiology, Department
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationLab 8. Matched Case Control Studies
Lab 8 Matched Case Control Studies Control of Confounding Technique for the control of confounding: At the design stage: Matching During the analysis of the results: Post-stratification analysis Advantage
More informationLogistic Regression Analyses in the Water Level Study
Logistic Regression Analyses in the Water Level Study A. Introduction. 166 students participated in the Water level Study. 70 passed and 96 failed to correctly draw the water level in the glass. There
More informationInvestigating Models with Two or Three Categories
Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables
ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53
More informationCorrelation and regression
1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationLogistisk regression T.K.
Föreläsning 13: Logistisk regression T.K. 05.12.2017 Your Learning Outcomes Odds, Odds Ratio, Logit function, Logistic function Logistic regression definition likelihood function: maximum likelihood estimate
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationReview: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:
Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationAnalysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013
Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not
More informationUnit 5 Logistic Regression Practice Problems
Unit 5 Logistic Regression Practice Problems SOLUTIONS R Users Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004. Exercises
More informationSTA102 Class Notes Chapter Logistic Regression
STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationGeneralized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model
Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example
More informationLongitudinal Modeling with Logistic Regression
Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationIntroduction to the Analysis of Tabular Data
Introduction to the Analysis of Tabular Data Anthropological Sciences 192/292 Data Analysis in the Anthropological Sciences James Holland Jones & Ian G. Robertson March 15, 2006 1 Tabular Data Is there
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the
More informationChapter 4: Generalized Linear Models-II
: Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More information1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches
Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model
More informationRegression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102
Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationIntroduction to logistic regression
Introduction to logistic regression Tuan V. Nguyen Professor and NHMRC Senior Research Fellow Garvan Institute of Medical Research University of New South Wales Sydney, Australia What we are going to learn
More informationLogistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy
Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided
More informationHomework Solutions Applied Logistic Regression
Homework Solutions Applied Logistic Regression WEEK 6 Exercise 1 From the ICU data, use as the outcome variable vital status (STA) and CPR prior to ICU admission (CPR) as a covariate. (a) Demonstrate that
More information