Categorical data analysis Chapter 5
1 Categorical data analysis Chapter 5
2 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases as |β| increases. When β = 0, the curve flattens to a horizontal straight line, and Y is independent of X. π(x) approaches 1 at the same rate that it approaches 0. The odds multiply by e^β for every 1-unit increase in x. In other words, e^β is an odds ratio: the odds at X = x + 1 divided by the odds at X = x.
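The odds-ratio interpretation of e^β can be checked numerically. A minimal sketch (the parameter values and function names here are illustrative, not from any fitted model):

```python
import math

def logistic(alpha, beta, x):
    """pi(x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    eta = alpha + beta * x
    return math.exp(eta) / (1 + math.exp(eta))

def odds(p):
    return p / (1 - p)

# illustrative parameter values
alpha, beta = -2.0, 0.7
for x in [0.0, 1.5, 3.0]:
    ratio = odds(logistic(alpha, beta, x + 1)) / odds(logistic(alpha, beta, x))
    # the odds ratio for a 1-unit increase is exp(beta), at every x
    assert abs(ratio - math.exp(beta)) < 1e-9
```

The loop confirms that the odds ratio for a 1-unit increase does not depend on where x sits on the curve, which is exactly what makes e^β a clean one-number summary.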
3 Linear approximation
4 Linear approximation π(x) ≈ a + bx, where b = βπ(x)(1 − π(x)). When x = −α/β, π(x) = 1/2 and the slope at this x is β(1/2)(1/2) = β/4. This value x = −α/β, which makes π(x) = 1/2, is called the median effective level. In toxicology studies it is called LD50 (LD = lethal dose), the dose with a 50% chance of a lethal result. From this linear approximation, near the x where π(x) = 1/2, a change in x of 1/β corresponds to a change in π(x) of roughly (1/β)(β/4) = 1/4; that is, 1/β approximates the distance between the x values where π(x) = 0.5 and where π(x) = 0.25 or 0.75.
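The β/4 slope at the median effective level is easy to verify numerically. The sketch below uses the crab width fit logit = −12.351 + 0.497x that appears later in these notes:

```python
import math

def logistic(alpha, beta, x):
    eta = alpha + beta * x
    return math.exp(eta) / (1 + math.exp(eta))

alpha, beta = -12.351, 0.497   # crab width fit used later in the notes
x50 = -alpha / beta            # median effective level: pi(x50) = 1/2
assert abs(logistic(alpha, beta, x50) - 0.5) < 1e-9

# numerical slope at x50 vs. the beta/4 approximation
h = 1e-6
slope = (logistic(alpha, beta, x50 + h) - logistic(alpha, beta, x50 - h)) / (2 * h)
assert abs(slope - beta / 4) < 1e-6
```

The central-difference slope matches β/4 = 0.497/4 ≈ 0.124, the maximum rate of change of the fitted curve.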
5 Interpreting parameters An alternative way to interpret the effect reports the values of π(x) at certain x values, such as the minimum, maximum, and quartiles. The change in π(x) over the middle half of x values, from the lower quartile to the upper quartile, is a useful summary of the effect. It can be compared with the corresponding change over the middle half of values of other quantitative predictors. The intercept parameter α is not usually of particular interest. However, by centering the predictor about 0 [i.e., replacing x by (x − x̄)], α becomes the logit at x = x̄, and thus e^α/(1 + e^α) = π(x̄). As in ordinary regression, centering is also helpful in complex models containing quadratic or interaction terms, to reduce correlations among model parameter estimates.
6 Looking at the data before fitting Plot sample proportions or logits against x. When x is categorical, group the data at each setting of x and plot the sample logit against x. A small adjustment is needed when the sample success proportion is 0 or 1: use log[(y_i + 1/2)/(n_i − y_i + 1/2)]. When x is continuous, we could group the data with nearby x values into categories and then plot. Alternatively, fit a generalized additive model (GAM) to smooth the trend. A GAM replaces the linear predictor of a GLM by a smooth function. A plot of this fit reveals whether severe discrepancies occur from the S-shaped trend predicted by logistic regression.
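The 1/2 adjustment can be sketched as a small helper (`adjusted_logit` is a hypothetical name, not a library function):

```python
import math

def adjusted_logit(y, n):
    """Empirical logit log[(y + 1/2)/(n - y + 1/2)]; finite even when y = 0 or y = n."""
    return math.log((y + 0.5) / (n - y + 0.5))

# the plain logit log[y/(n - y)] is undefined at y = 0 or y = n;
# the adjusted version stays finite and is antisymmetric in y vs. n - y
assert abs(adjusted_logit(0, 10) + adjusted_logit(10, 10)) < 1e-12
assert abs(adjusted_logit(5, 10)) < 1e-12   # y/n = 1/2 gives logit 0
```

This is only for plotting; the model itself is still fitted to the raw counts.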
7 Example: horseshoe crab mating revisited Define for crab i, y i = 1 if she has at least one satellite and y i = 0 otherwise.
8 Example: horseshoe crab mating revisited The fitted model is logit[π̂(x)] = −12.351 + 0.497x. When x = 26.3 cm, the mean width level in this sample, π̂(x) = 0.674. π̂(x) = 0.5 when x = −α̂/β̂ = 12.351/0.497 = 24.8, which is the median effective level. The estimated odds of a satellite multiply by exp(β̂) = exp(0.497) = 1.64 for each 1-cm increase in width; that is, there is a 64% increase.
9 Example: horseshoe crab mating revisited At the mean width, π̂(x) = 0.674, and π̂(x) increases by about β̂[π̂(x)(1 − π̂(x))] = 0.497(0.674)(0.326) = 0.11 for a 1-cm increase in width. The lower quartile, median, and upper quartile for width are 24.9, 26.1, and 27.7; π̂(x) at those values equals 0.51, 0.65, and 0.81, increasing by 0.3 over the x values for the middle half of the sample. With the female crab's weight as the predictor, logit[π̂(x)] = −3.695 + 1.815x. A 1-kg increase in weight is not comparable to a 1-cm increase in width, so comparing the β coefficients does not make sense. The quartiles for weight are 2.00, 2.35, and 2.85; π̂(x) at those values equals 0.48, 0.64, and 0.81, increasing by 0.33 over the middle half of the sampled weights. The effect is similar to that of width, which is not surprising as these predictors are very highly correlated.
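The quartile-based summary above can be reproduced from the fitted width model (coefficients −12.351 and 0.497 as reported in these notes):

```python
import math

def pi_hat(alpha, beta, x):
    eta = alpha + beta * x
    return math.exp(eta) / (1 + math.exp(eta))

# width fit reported in the notes: logit = -12.351 + 0.497 x
a, b = -12.351, 0.497
quartiles = [24.9, 26.1, 27.7]   # lower quartile, median, upper quartile of width
probs = [pi_hat(a, b, x) for x in quartiles]

# roughly 0.51, 0.65, 0.81: a rise of about 0.3 over the middle half of widths
assert abs(probs[0] - 0.51) < 0.01
assert abs(probs[2] - probs[0] - 0.30) < 0.02
```

The same three-line computation applied to the weight quartiles gives the 0.48 to 0.81 rise quoted above, which is what makes the two predictors comparable on the probability scale.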
10 Logistic regression with retrospective studies In case-control studies, the explanatory variable X rather than the response variable Y is random. Applying logistic regression to case-control data effectively models P(Y = 1 | Z = 1, x), where Z is an indicator of whether a subject is sampled (1 = yes, 0 = no). Assume a logistic model for P(Y = 1 | x), that is, logit[P(Y = 1 | x)] = α + βx. It can be shown that logit[P(Y = 1 | Z = 1, x)] = α* + βx, where α* = α + log(ρ_1/ρ_0) with ρ_1 = P(Z = 1 | y = 1) and ρ_0 = P(Z = 1 | y = 0) representing the probabilities of sampling a case and a control, respectively.
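The intercept-shift result can be verified by applying Bayes' rule directly. The population model and sampling rates below are made-up values for illustration:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

alpha, beta = -3.0, 0.8          # hypothetical population model logit P(Y=1|x)
rho1, rho0 = 0.9, 0.001          # case / control sampling probabilities

def p_sampled(x):
    """P(Y = 1 | Z = 1, x) by Bayes' rule under outcome-dependent sampling."""
    pi = math.exp(alpha + beta * x) / (1 + math.exp(alpha + beta * x))
    return rho1 * pi / (rho1 * pi + rho0 * (1 - pi))

# the retrospective logit has the same slope beta, only a shifted intercept
for x in [0.0, 1.0, 2.5]:
    expected = alpha + math.log(rho1 / rho0) + beta * x
    assert abs(logit(p_sampled(x)) - expected) < 1e-9
```

The check makes the key point concrete: heavily oversampling cases (ρ_1 ≫ ρ_0) moves the intercept but leaves β untouched.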
11 Logistic regression with retrospective studies When Y is random, as under multinomial, Poisson, or independent multinomial (row totals fixed) sampling, ρ_1 = ρ_0; that is, the sampling rate for cases is the same as for controls. For most case-control studies, ρ_1 > ρ_0, so the estimated intercept is larger than it would be if the study were prospective. With case-control studies, it is not possible to estimate β in binary-response models with links other than the logit. This is an important advantage of the logit link and is one reason why logistic regression models are so popular in biomedical studies.
12 Inference about model parameters and probabilities For H_0: β = 0: Wald test: z = β̂/SE. Likelihood-ratio test: −2[loglikelihood(β = 0) − loglikelihood(β̂)]. Score test: based on the standardized derivative of the log-likelihood at the null value β = 0. A 95% confidence interval for the linear predictor is α̂ + β̂x_0 ± 1.96(SE), where SE is the square root of the estimate of var(α̂ + β̂x_0) = var(α̂) + x_0² var(β̂) + 2x_0 cov(α̂, β̂). A 95% CI for π(x_0) is then obtained by substituting each endpoint into the inverse transformation π(x_0) = exp(logit)/[1 + exp(logit)].
13 Example: inference for horseshoe crab mating data
14 Example: inference for horseshoe crab mating data At width x = 26.5, the estimated logit is −12.351 + 0.497(26.5) = 0.826 and π̂(x) = 0.695. Software reports the estimated covariance matrix, from which var(α̂) = 6.91, var(β̂) = 0.01, and cov(α̂, β̂) = −0.262. At x = 26.5 the estimated variance of the logit is 0.0356, so the 95% CI for logit[π(26.5)] equals 0.826 ± (1.96)√0.0356, or (0.456, 1.196). This translates to the interval (0.61, 0.77) for the probability of satellites (e.g., exp(0.456)/[1 + exp(0.456)] = 0.61). Since corr(α̂, β̂) is near −1.0, for better computational precision fit the model using the centered predictor x* = x − 26.5, so that α̂ and its SE are then the estimated logit and its SE at x = 26.5.
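The interval arithmetic for this slide can be reproduced directly from the two reported quantities (the logit 0.826 and its variance 0.0356 at x = 26.5):

```python
import math

def expit(eta):
    return math.exp(eta) / (1 + math.exp(eta))

# values reported in the notes for the crab width fit at x = 26.5
logit_hat = 0.826
var_logit = 0.0356            # estimated var(a + b*26.5)

half_width = 1.96 * math.sqrt(var_logit)
lo, hi = logit_hat - half_width, logit_hat + half_width
assert (round(lo, 3), round(hi, 3)) == (0.456, 1.196)

# transform the endpoints to the probability scale
assert (round(expit(lo), 2), round(expit(hi), 2)) == (0.61, 0.77)
```

Note that the interval is built on the logit scale and only then mapped through expit; building it directly on the probability scale would not respect the [0, 1] bounds.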
15 Example: inference for horseshoe crab mating data We could ignore the model fit and simply use sample proportions (i.e., the saturated model) to estimate such probabilities. Six female crabs in the sample had x = 26.5, and four of them had satellites. The sample proportion estimate at x = 26.5 is π̂ = 4/6 = 0.67. The 95% score CI based on these six observations alone equals (0.3, 0.9). If the logistic model approximates the true probabilities decently, its estimator tends to be closer than the sample proportion to the true value, unless each sample proportion is based on an extremely large sample.
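The (0.3, 0.9) score interval can be reproduced with the Wilson (score) formula for a binomial proportion; `score_ci` is a hypothetical helper name:

```python
import math

def score_ci(y, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = y / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = score_ci(4, 6)
assert (round(lo, 1), round(hi, 1)) == (0.3, 0.9)
```

The width of this interval, based on only six crabs, illustrates why the model-based estimate is usually preferable.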
16 Checking goodness of fit: grouped and ungrouped data With ungrouped data, or with continuous or nearly continuous predictors, X² and G² do not have limiting chi-squared distributions. Two popular alternatives for checking goodness of fit in this case: group the observed and fitted values for a partition of the space of x values, or group observed and fitted values according to the estimated probabilities of success from the model fitted to the original ungrouped data.
17 Partition x space
18 Partition x space In each width category, the fitted value for a yes response is the sum of the estimated probabilities π̂(x) for all crabs in that category. X² = 5.3 and G² = 6.2 with df = 8 − 2 = 6. Neither X² nor G² shows evidence of lack of fit (P > 0.4). As the number of explanatory variables increases, this strategy loses effectiveness: simultaneous grouping of values for each variable can produce a contingency table with a large number of cells, most of which have very small counts (the curse of dimensionality).
19 Partition according to estimated probabilities One common approach forms the groups in the partition so they have approximately equal size. With 10 groups, the first pair of observed counts and corresponding fitted counts refers to the n/10 observations having the highest estimated probabilities, the next pair refers to the n/10 observations having the second decile of estimated probabilities, and so on. The Hosmer-Lemeshow goodness-of-fit statistic is X² = Σ_{j=1}^{g} (Σ_i y_ij − Σ_i π̂_ij)² / [(Σ_i π̂_ij)(1 − Σ_i π̂_ij/n_j)], where g is the number of partitions and π̂_ij denotes the fitted probability, for observation i in group j, from the model fitted to the ungrouped data.
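A sketch of the grouping-and-comparison idea is below. This follows one common textbook form of the statistic; grouping conventions (ties, group sizes) vary across software, and the function name is hypothetical:

```python
def hosmer_lemeshow(y, pi, g=10):
    """HL statistic: sort by fitted probability, split into g groups,
    and compare observed vs. fitted success counts within each group."""
    pairs = sorted(zip(pi, y))           # (fitted prob, observed 0/1)
    n = len(pairs)
    stat = 0.0
    for j in range(g):
        chunk = pairs[j * n // g:(j + 1) * n // g]
        if not chunk:
            continue
        obs = sum(yi for _, yi in chunk)          # observed successes
        fit = sum(pi_i for pi_i, _ in chunk)      # sum of fitted probabilities
        m = len(chunk)
        stat += (obs - fit) ** 2 / (fit * (1 - fit / m))
    return stat

# toy check: when observed group totals equal the fitted totals the statistic is 0
pi = [0.2] * 5 + [0.8] * 5
y_match = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]   # 1 success in low group, 4 in high
assert abs(hosmer_lemeshow(y_match, pi, g=2)) < 1e-12
```

Discrepant group totals make each squared term, and hence the statistic, grow; large values signal lack of fit.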
20 Partition according to estimated probabilities When the number of distinct patterns of covariate values equals the sample size, the null distribution of the above statistic is approximately chi-squared with df = g − 2. For the horseshoe crab data with continuous width predictor, the Hosmer-Lemeshow statistic with g = 10 groups equals 3.5 with df = 8, indicating a decent fit.
21 Wald inference can be suboptimal Its results depend on the scale of the parameterization. For example, for the model logit(π) = α, the hypothesis α = 0 is equivalent to π = 0.5, but the Wald test statistics differ for the two parameterizations. Evaluations reveal that the Wald test for α = 0 tends to be too conservative and the one for π = 0.5 tends to be too liberal. When a true effect is relatively large, the Wald test is not as powerful as the likelihood-ratio and score tests. For the single binomial case, for example, suppose n = 25: we would regard y = 24 as stronger evidence than y = 23 against H_0: α = 0, yet the Wald statistic equals 9.7 when y = 24 and 11.0 when y = 23. For comparison, the likelihood-ratio statistics are 26.3 and 20.7.
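The aberrant Wald behavior is easy to reproduce for the single-binomial case; a sketch with hypothetical helper names:

```python
import math

def wald_sq(y, n):
    """Squared Wald statistic for H0: alpha = 0 (i.e. pi = 1/2) on the logit scale."""
    alpha_hat = math.log(y / (n - y))
    se = math.sqrt(1 / y + 1 / (n - y))
    return (alpha_hat / se) ** 2

def lr_stat(y, n):
    """Likelihood-ratio statistic for H0: pi = 1/2."""
    p = y / n
    return 2 * (y * math.log(p / 0.5) + (n - y) * math.log((1 - p) / 0.5))

# y = 24 is stronger evidence than y = 23, yet the Wald statistic shrinks...
assert wald_sq(24, 25) < wald_sq(23, 25)
assert round(wald_sq(24, 25), 1) == 9.7
# ...while the likelihood-ratio statistic behaves sensibly
assert lr_stat(24, 25) > lr_stat(23, 25)
assert round(lr_stat(24, 25), 1) == 26.3
```

The culprit is the SE in the Wald denominator, which blows up as p̂ approaches 1, deflating the statistic exactly when the evidence is strongest.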
22 Logistic models with categorical (factor) predictors Consider a single-factor logistic model where X has I categories, and for each value of X let y_i be the number of successes out of n_i trials. There are two equivalent ways to represent factors. ANOVA-type representation: log[π_i/(1 − π_i)] = α + β_i, i = 1, ..., I; the factor has as many parameters as categories. Indicator-variable representation: logit(π_i) = α + β_1 x_1 + β_2 x_2 + ... + β_I x_I, where x_i = 1 for observations in row i and x_i = 0 otherwise.
23 Effect coding With I groups of data, the model can have only I free parameters. However, there are I + 1 parameters in the model, so a constraint is needed to remove one. Set one of the β_i to zero, for example β_I = 0. With this constraint, α is the main effect of category I, which we call the baseline category. All the other β_i represent effect differences between category i and the baseline category; that is, β_i is the difference between the logits in rows i and I. The main effect of category i is then α + β_i.
24 Effect coding Alternatively, set Σ_{i=1}^{I} β_i = 0. Then α represents the average effect of the categories, and the β_i represent deviations from the average effect; that is, β_i is the log odds ratio between row i and the average of all rows. For different constraints, the estimates of the β's are different, but the estimates of the mean response and of contrasts between β's remain the same.
25 Example: Alcohol and Infant Malformation revisited logit(π_i) = α + β_i is a saturated model, and the estimated linear predictors α̂ + β̂_i are the sample logits. Table 5.3 shows that, except for the slight reversal between the first and second categories of alcohol consumption, the sample logits and hence the sample proportions of malformation cases increase as alcohol consumption increases.
26 Example: Alcohol and Infant Malformation revisited The test of independence is equivalent to testing H_0: β_i = 0, i = 1, ..., I. The Pearson statistic is X² = 12.1 (P = 0.02) and the likelihood-ratio statistic is G² = 6.2 (P = 0.19). The P-values using the exact conditional distributions of X² and G² are 0.03 and 0.13.
27 Linear logit model for I × 2 contingency tables The near-monotone increase in the sample logits in Table 5.3 suggests that a linear logit model may fit well. For ordered factor categories, we may assign scores that describe distances between categories of X and fit the linear (or a more complex) logit model logit(π_i) = α + βx_i. With scores (x_1 = 0, x_2 = 0.5, x_3 = 1.5, x_4 = 4.0, x_5 = 7.0), Table 5.4 shows the results.
28 Linear logit model for I × 2 contingency tables The linear logit model fits nearly as well as the saturated model: X² = 2.05 and G² = 1.95 with df = 3.
29 Alcohol and infant malformation revisited Pearson test of independence: X²(I) = 12.1 with P-value 0.02. With scores (0, 0.5, 1.5, 4.0, 7.0), the score test, also called the Cochran-Armitage trend test, has z² = 6.57 with P-value 0.01. The test suggests strong evidence of a positive slope. The Wald statistic for the linear logit model equals (β̂/SE)² = (0.3166/0.1254)² = 6.37 (P = 0.012), and the likelihood-ratio statistic equals 4.25 (P = 0.039). With highly unbalanced counts, it is best not to use the Wald approach. The asymptotics for the Cochran-Armitage trend test, however, work well even for quite small n when the n_i are equal and the x_i are equally spaced.
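The Cochran-Armitage statistic can be computed from the I × 2 table directly. The sketch below uses the malformation counts from the Table 5.3 example (cases 48, 38, 5, 1, 1 out of row totals 17114, 14502, 793, 127, 38) and reproduces z² = 6.57:

```python
def cochran_armitage_z2(scores, cases, totals):
    """Cochran-Armitage trend test statistic z^2 for an I x 2 table."""
    N = sum(totals)
    p = sum(cases) / N                                    # overall success rate
    sxy = sum(x * y for x, y in zip(scores, cases))
    sxn = sum(x * n for x, n in zip(scores, totals))
    num = sxy - p * sxn                                   # covariance-type numerator
    xbar = sxn / N
    ss = sum(n * (x - xbar) ** 2 for x, n in zip(scores, totals))
    return num ** 2 / (p * (1 - p) * ss)

scores = [0, 0.5, 1.5, 4.0, 7.0]               # scores used in the notes
cases = [48, 38, 5, 1, 1]                      # malformations present
totals = [17114, 14502, 793, 127, 38]          # row totals
assert round(cochran_armitage_z2(scores, cases, totals), 2) == 6.57
```

Referred to a chi-squared distribution with df = 1, 6.57 gives the P-value of about 0.01 quoted above.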
30 Model smoothing improves precision of estimation and test power Example: Skin damage and Leprosy
31 Example: Skin damage and Leprosy G²(I) = 7.28 (df = 4) does not show much evidence of association (P = 0.12). G²(I | L) = 6.65 with df = 1 (P = 0.01) gives strong evidence of more positive clinical change at the higher level of infiltration. G²(L) = 0.63 (df = 3) suggests that the linear logit model fits well.
32 Multiple logistic regression logit[π(x)] = α + β_1 x_1 + β_2 x_2 + ... + β_p x_p. For qualitative predictors, we use indicator variables for their categories. The parameter β_j refers to the effect of x_j on the log odds that Y = 1, adjusting for the other x_k. For instance, exp(β_j) is the multiplicative effect on the odds of a 1-unit increase in x_j, holding the other x_k fixed.
33 Logistic models for multiway contingency tables
34 Logistic models for multiway contingency tables Let X be the indicator for AZT treatment (x = 1 for immediate AZT use, x = 0 otherwise) and Z be the indicator for race (z = 1 for whites, z = 0 for blacks). logit[P(Y = 1)] = α + β_1 x + β_2 z. The model assumes a homogeneous XY association; that is, the conditional odds ratio between X and Y is the same at each level of Z. Conditional independence between X and Y given Z is equivalent to β_1 = 0. Adding the interaction between X and Z, the model has as many parameters as there are X-Z combinations (one per logit) and therefore becomes a saturated model.
35 Logistic models for multiway contingency tables
36 Logistic models for multiway contingency tables α is the log odds of developing AIDS symptoms for black subjects without immediate AZT use. β_1 is the increment to the log odds for those with immediate AZT use. β_2 is the increment to the log odds for white subjects. For each race, the estimated odds ratio between immediate AZT use and development of AIDS symptoms equals exp(−0.720) = 0.49. The 95% Wald confidence interval for this effect is exp[−0.720 ± 1.96(0.279)] = (0.28, 0.84).
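The odds ratio and its Wald interval follow directly from the slope estimate; the sketch below takes β̂_1 ≈ −0.720, the center implied by the reported interval exp[· ± 1.96(0.279)] = (0.28, 0.84):

```python
import math

# AZT-effect estimate and its standard error, as reported in the notes
beta1_hat, se = -0.720, 0.279

odds_ratio = math.exp(beta1_hat)
ci = (math.exp(beta1_hat - 1.96 * se), math.exp(beta1_hat + 1.96 * se))
assert round(odds_ratio, 2) == 0.49
assert (round(ci[0], 2), round(ci[1], 2)) == (0.28, 0.84)
```

As with the probability CI earlier, the interval is symmetric on the log-odds scale and only then exponentiated.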
37 Different coding schemes
38 Different coding schemes For each coding scheme, at a given combination of AZT use and race, the estimated probability of developing AIDS symptoms is the same. For instance, the intercept estimate plus the estimate for immediate AZT use plus the estimate for being white is −1.738 for each scheme, so the estimated probability that white veterans with immediate AZT use develop AIDS symptoms equals exp(−1.738)/[1 + exp(−1.738)] = 0.15.
39 Example: Horseshoe Crab Satellites revisited
40 Example: Horseshoe Crab Satellites revisited
41 Example: Horseshoe Crab Satellites revisited For dark crabs, logit[P̂(Y = 1)] = −12.715 + 0.468x; by contrast, for medium-light crabs, logit[P̂(Y = 1)] = (−12.715 + 1.330) + 0.468x = −11.385 + 0.468x. At the average width of 26.3 cm, P̂(Y = 1) = 0.399 for dark crabs and 0.715 for medium-light crabs. At any given width, the estimated odds that a medium-light crab has a satellite are exp(1.330) = 3.8 times the estimated odds for a dark crab. At width x = 26.3, the odds equal 0.715/0.285 = 2.51 for a medium-light crab and 0.399/0.601 = 0.66 for a dark crab.
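The parallel-logit structure on this slide can be reproduced with the coefficients as given in the notes (dark is the baseline color; medium-light gets the increment 1.330):

```python
import math

def expit(eta):
    return math.exp(eta) / (1 + math.exp(eta))

def logit_hat(x, medium_light):
    """Fitted logit for the color + width model; dark is the baseline."""
    return -12.715 + (1.330 if medium_light else 0.0) + 0.468 * x

x = 26.3   # average width
p_dark = expit(logit_hat(x, medium_light=False))
p_ml = expit(logit_hat(x, medium_light=True))
assert abs(p_dark - 0.399) < 0.002
assert abs(p_ml - 0.715) < 0.002

# at any width the odds ratio is exp(1.330), since the logits are parallel
ratio = (p_ml / (1 - p_ml)) / (p_dark / (1 - p_dark))
assert round(ratio, 1) == round(math.exp(1.330), 1) == 3.8
```

Because the two fitted logits differ only in the intercept, the 3.8 odds ratio holds at every width, not just at 26.3 cm.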
42 Example: Horseshoe Crab Satellites revisited To test the color effect, we test H_0: β_1 = β_2 = β_3 = 0. Comparing the models with and without the color covariate, the difference in deviance is −2(L_0 − L_1) = 7.0 with df = 3. The P-value is 0.07, which provides slight evidence of a color effect. The model assumes a lack of interaction between color and width in their effects. Comparing the models with and without the interaction, the difference in deviance is 4.4 with df = 3. The evidence of interaction is weak (P = 0.22).
43 Example: Horseshoe Crab Satellites revisited
44 Quantitative treatment of ordinal predictor Color has ordered categories, from lightest to darkest. Assigning scores c = (1, 2, 3, 4) to the color categories, the model treats the color predictor as quantitative with a linear effect: logit[P(Y = 1)] = α + β_1 c + β_2 x. The fitted parameters are α̂ = −10.071, β̂_1 = −0.509 (SE = 0.224), and β̂_2 = 0.458 (SE = 0.104).
45 Quantitative treatment of ordinal predictor The likelihood-ratio statistic comparing this fit to the more complex model having a separate parameter for each color equals 1.66 (df = 2). With P = 0.44, the simpler linear model seems adequate. Note that in the qualitative-color model the color parameter estimates are (1.33, 1.40, 1.11, 0); the first three colors are quite similar, so another potential scoring is (1, 1, 1, 0). The model fit is then logit[P̂(Y = 1)] = −12.980 + 1.300c + 0.478x. The likelihood-ratio statistic comparing the model with color scores (1, 1, 1, 0) to the model with a separate parameter for each color equals 0.5 (df = 2), showing that this simpler model is also adequate.
46 More on interpretations Instantaneous rate of change in probability: adjusting for the other predictors, as a function of a quantitative predictor x_j, π̂ has instantaneous rate of change β̂_j π̂(1 − π̂). For example, at predictor settings at which π̂ = 0.5, the approximate effect of a 1-cm increase in width is (0.478)(0.5)(0.5) = 0.12. We can summarize the effect of x_j on the probability scale by averaging the instantaneous rates over the sample: (1/n) Σ_{i=1}^{n} β̂_j π̂(x_i1, ..., x_ip)[1 − π̂(x_i1, ..., x_ip)].
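The averaging formula is a one-liner given the fitted probabilities. A sketch with a hypothetical helper name and made-up fitted probabilities (only the 0.478 width coefficient is from the notes):

```python
def avg_marginal_effect(beta_j, fitted_probs):
    """Average of the instantaneous rates beta_j * pi_i * (1 - pi_i) over the sample."""
    return sum(beta_j * p * (1 - p) for p in fitted_probs) / len(fitted_probs)

# hypothetical fitted probabilities for four observations
probs = [0.2, 0.5, 0.65, 0.8]
ame = avg_marginal_effect(0.478, probs)

# each term is at most beta_j / 4 (attained at pi = 1/2), so the average is too
assert 0 < ame < 0.478 / 4
```

This average marginal effect answers, on the probability scale, the question the odds ratio answers on the odds scale.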
47 More on interpretation Alternatively, describe the effect of x_j by setting the other predictors at their sample means and computing the estimated probabilities at the upper and lower quartiles of x_j.
48 More on interpretation Standardized coefficients: coefficients are standardized in order to compare predictors having different units. Standardize the predictors to (x_j − x̄_j)/s_{x_j}; the standardized coefficients are then β̂_j s_{x_j}. With binary color, the standard deviation of width is 2.109 cm, so the standardized coefficient for width is 0.478(2.109) = 1.01. When width is replaced by weight, the standardized coefficient is 1.729(0.577) = 1.00. The unstandardized estimates 0.478 and 1.729 are quite different, but width and weight have similar effects, conditional on whether or not a crab is dark.
STAT 7030: Categorical Data Analysis, 5. Logistic Regression. Peng Zeng, Department of Mathematics and Statistics, Auburn University, Fall 2012.
More informationLOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi
LOGISTIC REGRESSION Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi- lmbhar@gmail.com. Introduction Regression analysis is a method for investigating functional relationships
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationLogistic Regression. Continued Psy 524 Ainsworth
Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression
More informationCohen s s Kappa and Log-linear Models
Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance
More informationLogistic Regression Models for Multinomial and Ordinal Outcomes
CHAPTER 8 Logistic Regression Models for Multinomial and Ordinal Outcomes 8.1 THE MULTINOMIAL LOGISTIC REGRESSION MODEL 8.1.1 Introduction to the Model and Estimation of Model Parameters In the previous
More informationINTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y
INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y Predictor or Independent variable x Model with error: for i = 1,..., n, y i = α + βx i + ε i ε i : independent errors (sampling, measurement,
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationStat 642, Lecture notes for 04/12/05 96
Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationMultiple Logistic Regression for Dichotomous Response Variables
Multiple Logistic Regression for Dichotomous Response Variables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline
More informationBIOS 625 Fall 2015 Homework Set 3 Solutions
BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationChapter 14 Logistic regression
Chapter 14 Logistic regression Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 62 Generalized linear models Generalize regular regression
More information11. Generalized Linear Models: An Introduction
Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationSections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal
More informationLogistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20
Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)
More information12 Modelling Binomial Response Data
c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual
More informationSTAT 525 Fall Final exam. Tuesday December 14, 2010
STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
More informationChapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models
Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 29 14.1 Regression Models
More informationBIOS 2083 Linear Models c Abdus S. Wahed
Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter
More informationIntroducing Generalized Linear Models: Logistic Regression
Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationCOMPLEMENTARY LOG-LOG MODEL
COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More informationHomework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.
EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the
More informationSTAT 526 Spring Midterm 1. Wednesday February 2, 2011
STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationLogistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction
More informationGeneralized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.
Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint
More informationPubHlth Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide
PubHlth 640 - Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide Unit 3 (Discrete Distributions) Take care to know how to do the following! Learning Objective See: 1. Write down
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationLecture 14 Simple Linear Regression
Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent
More informationSTA102 Class Notes Chapter Logistic Regression
STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response
More informationDescribing Contingency tables
Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.
More informationStat 704: Data Analysis I, Fall 2010
Stat 704: Data Analysis I, Fall 2010 Generalized linear models Generalize regular regression to non-normal data {(Y i,x i )} N i=1, most often Bernoulli or Poisson Y i. The general theory of GLMs has been
More informationy response variable x 1, x 2,, x k -- a set of explanatory variables
11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationGeneralized Linear Modeling - Logistic Regression
1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationCategorical Predictor Variables
Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively
More information