Categorical data analysis Chapter 5


1 Categorical data analysis Chapter 5

2 Interpreting parameters in logistic regression
The sign of $\beta$ determines whether $\pi(x)$ is increasing or decreasing as x increases.
The rate of climb or descent increases as $|\beta|$ increases.
When $\beta = 0$, the curve flattens to a horizontal straight line, and Y is independent of X.
$\pi(x)$ approaches 1 at the same rate that it approaches 0.
The odds multiply by $e^{\beta}$ for every 1-unit increase in x. In other words, $e^{\beta}$ is an odds ratio: the odds at X = x + 1 divided by the odds at X = x.

3 Linear approximation

4 Linear approximation
Near a given x, $\pi(x) \approx a + bx$, where the slope is $b = \beta\pi(x)[1 - \pi(x)]$.
When $x = -\alpha/\beta$, $\pi(x) = 1/2$ and the slope at this x is $\beta(1/2)(1/2) = \beta/4$.
The value $x = -\alpha/\beta$, which makes $\pi(x) = 1/2$, is called the median effective level. In toxicology studies it is called LD50 (LD = lethal dose), the dose with a 50% chance of a lethal result.
From this linear approximation, near the x where $\pi(x) = 1/2$, a change in x of $1/\beta$ corresponds to a change in $\pi(x)$ of roughly $(1/\beta)(\beta/4) = 1/4$; that is, $1/\beta$ approximates the distance between the x values where $\pi(x) = 0.5$ and where $\pi(x) = 0.25$ or $0.75$.
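The slope result can be checked numerically. The sketch below is only an illustration: the values of `alpha` and `beta` are hypothetical, not estimates from these slides. It compares the formula $\beta\pi(x)[1 - \pi(x)]$ with a finite-difference slope at the median effective level, and then evaluates the exact probability one step of $1/\beta$ above that level.

```python
import math

def pi(x, alpha, beta):
    """Logistic success probability at x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = -12.0, 0.5

x_med = -alpha / beta                        # median effective level
p_med = pi(x_med, alpha, beta)               # probability at that level
slope_formula = beta * p_med * (1 - p_med)   # beta * pi * (1 - pi)

# Central finite difference as an independent check of the slope.
h = 1e-5
slope_numeric = (pi(x_med + h, alpha, beta) - pi(x_med - h, alpha, beta)) / (2 * h)

# Exact probability one step of 1/beta above the median effective level.
p_step = pi(x_med + 1 / beta, alpha, beta)

print(x_med, p_med, slope_formula, slope_numeric, p_step)
```

The exact probability at $x = -\alpha/\beta + 1/\beta$ is $e/(1 + e) \approx 0.73$, close to the 0.75 suggested by the linear approximation.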

5 Interpreting parameters
An alternative way to interpret the effect is to report the values of $\pi(x)$ at certain x values, such as the minimum, maximum, and quartiles.
The change in $\pi(x)$ over the middle half of x values, from the lower quartile to the upper quartile, is a useful summary of the effect. It can be compared with the corresponding change over the middle half of the values of other quantitative predictors.
The intercept parameter $\alpha$ is not usually of particular interest. However, by centering the predictor about 0 [i.e., replacing x by $(x - \bar{x})$], $\alpha$ becomes the logit at $x = \bar{x}$, and thus $e^{\alpha}/(1 + e^{\alpha}) = \pi(\bar{x})$.
As in ordinary regression, centering is also helpful in complex models containing quadratic or interaction terms, because it reduces correlations among the model parameter estimates.

6 Looking at the data before fitting
Plot sample proportions or logits against x.
When x is categorical, group the data at each setting of x and plot the sample logit against x. A small adjustment is needed when the sample success proportion is 0 or 1: use $\log[(y_i + 1/2)/(n_i - y_i + 1/2)]$.
When x is continuous, we could group the data with nearby x values into categories and then plot.
Alternatively, fit a generalized additive model (GAM) to smooth the trend. A GAM replaces the linear predictor of a GLM by a smooth function. A plot of this fit reveals whether severe discrepancies occur from the S-shaped trend predicted by logistic regression.
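A minimal sketch of the adjusted sample logit, assuming the usual adjustment that adds 1/2 to both the success and failure counts. The grouped counts are hypothetical and serve only to show that the adjustment keeps the logit finite when a group has 0 or $n_i$ successes.

```python
import math

def empirical_logit(y, n):
    """Sample logit with 1/2 added to the success and failure counts."""
    return math.log((y + 0.5) / (n - y + 0.5))

# Hypothetical grouped data: (successes, trials) at each setting of x.
groups = [(0, 10), (3, 10), (5, 10), (10, 10)]
logits = [empirical_logit(y, n) for y, n in groups]

print([round(v, 3) for v in logits])
```

Plotting these values against the group settings of x gives the exploratory plot described above.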

7 Example: horseshoe crab mating revisited
For crab i, define $y_i = 1$ if she has at least one satellite and $y_i = 0$ otherwise.

8 Example: horseshoe crab mating revisited
At x = 26.3 cm, the mean width in this sample, $\hat\pi(x) = 0.674$.
$\hat\pi(x) = 0.5$ when $x = -\hat\alpha/\hat\beta = -\hat\alpha/0.497 = 24.8$, which is the median effective level.
The estimated odds of a satellite multiply by $\exp(\hat\beta) = \exp(0.497) = 1.64$ for each 1-cm increase in width; that is, there is a 64% increase.

9 Example: horseshoe crab mating revisited
At the mean width, $\hat\pi(x) = 0.674$, and $\hat\pi(x)$ increases by about $\hat\beta[\hat\pi(x)(1 - \hat\pi(x))] = 0.497(0.674)(0.326) = 0.11$ for a 1-cm increase in width.
The lower quartile, median, and upper quartile for width are 24.9, 26.1, and 27.7; $\hat\pi(x)$ at those values equals 0.51, 0.65, and 0.81, increasing by 0.30 over the x values for the middle half of the sample.
With the female crab's weight as the predictor, the model is fitted in the same form, $\text{logit}[\hat\pi(x)] = \hat\alpha + \hat\beta x$.
A 1-kg increase in weight is not comparable to a 1-cm increase in width, so comparing the $\hat\beta$ coefficients directly does not make sense.
The quartiles for weight are 2.00, 2.35, and 2.85; $\hat\pi(x)$ at those values equals 0.48, 0.64, and 0.81, increasing by 0.33 over the middle half of the sampled weights.
The effect is similar to that of width, which is not surprising because these predictors are very highly correlated.
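The arithmetic behind these summaries can be reproduced directly. The sketch below uses only numbers quoted on the slides (the width estimate 0.497, the fitted probability 0.674 at the mean width, and the fitted probabilities at the quartiles); the variable names are illustrative.

```python
import math

beta_hat = 0.497   # estimated width effect quoted on the slides
pi_mean = 0.674    # fitted probability at the mean width

# Multiplicative effect on the odds of a 1-cm increase in width.
odds_multiplier = math.exp(beta_hat)

# Approximate change in the fitted probability per 1-cm increase at the mean width.
slope_at_mean = beta_hat * pi_mean * (1 - pi_mean)

# Change in fitted probability from the lower to the upper quartile.
width_change = 0.81 - 0.51
weight_change = 0.81 - 0.48

print(round(odds_multiplier, 2), round(slope_at_mean, 2))
print(round(width_change, 2), round(weight_change, 2))
```

Comparing the two quartile-to-quartile changes puts width and weight on a common probability scale, which the raw coefficients do not.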

10 Logistic regression with retrospective studies
In case-control studies, the explanatory variable X rather than the response variable Y is random.
Applying logistic regression to case-control data effectively models $P(Y = 1 \mid Z = 1, x)$, where Z indicates whether a subject is sampled (1 = yes, 0 = no).
Assume a logistic model for $P(Y = 1 \mid x)$, that is, $\text{logit}[P(Y = 1 \mid x)] = \alpha + \beta x$.
It can be shown that $\text{logit}[P(Y = 1 \mid Z = 1, x)] = \alpha^* + \beta x$, where $\alpha^* = \alpha + \log(\rho_1/\rho_0)$ with $\rho_1 = P(Z = 1 \mid y = 1)$ and $\rho_0 = P(Z = 1 \mid y = 0)$, the probabilities of sampling a case and a control, respectively.

11 Logistic regression with retrospective studies
When Y is random, as in multinomial, Poisson, or independent multinomial sampling (row totals fixed), $\rho_1 = \rho_0$; that is, the sampling rate is the same for cases and controls.
For most case-control studies, $\rho_1 > \rho_0$, so the estimated intercept is larger than it would be if the study were prospective.
With case-control studies, it is not possible to estimate $\beta$ in binary-response models with links other than the logit. This is an important advantage of the logit link and is one reason why logistic regression models are so popular in biomedical studies.
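The intercept shift can be verified with Bayes' theorem. This is a sketch under stated assumptions: the parameter values and sampling probabilities are hypothetical, and sampling is assumed to depend on y only (not on x), as the definitions of $\rho_1$ and $\rho_0$ imply.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical population model and sampling probabilities.
alpha, beta = -3.0, 0.8
rho1, rho0 = 0.9, 0.05   # P(Z=1 | y=1) and P(Z=1 | y=0)

def prob_case_given_sampled(x):
    """P(Y = 1 | Z = 1, x) by Bayes' theorem."""
    p = expit(alpha + beta * x)
    return rho1 * p / (rho1 * p + rho0 * (1.0 - p))

# Difference between the sampled logit and the population logit at several x.
shifts = [logit(prob_case_given_sampled(x)) - (alpha + beta * x)
          for x in (-1.0, 0.0, 2.5)]
expected_shift = math.log(rho1 / rho0)

print(shifts, expected_shift)
```

The shift is the same at every x, so only the intercept changes and the slope $\beta$ is preserved.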

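12 Inference about model parameters and probabilities
For $H_0: \beta = 0$:
Wald test: $z = \hat\beta/SE$.
Likelihood-ratio test: $-2(L_0 - L_1)$, where $L_0$ is the maximized log-likelihood under $\beta = 0$ and $L_1$ is the maximized log-likelihood at $\hat\beta$.
Score test: the standardized derivative of the log-likelihood, evaluated at $\beta = 0$.
A 95% confidence interval for the linear predictor at $x_0$ is $\hat\alpha + \hat\beta x_0 \pm 1.96(SE)$, where SE is the square root of the estimated variance $\widehat{\text{var}}(\hat\alpha + \hat\beta x_0) = \widehat{\text{var}}(\hat\alpha) + x_0^2\,\widehat{\text{var}}(\hat\beta) + 2x_0\,\widehat{\text{cov}}(\hat\alpha, \hat\beta)$.
A 95% CI for $\pi(x_0)$ is then obtained by substituting each endpoint into the inverse transformation $\pi(x_0) = \exp(\text{logit})/[1 + \exp(\text{logit})]$.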

13 Example: inference for horseshoe crab mating data

14 Example: inference for horseshoe crab mating data
At width x = 26.5, the estimated logit is $\hat\alpha + \hat\beta(26.5) = 0.826$ and $\hat\pi(x) = e^{0.826}/(1 + e^{0.826}) \approx 0.70$.
Software reports $\widehat{\text{var}}(\hat\alpha) = 6.91$ and $\widehat{\text{var}}(\hat\beta) = 0.01$, together with $\widehat{\text{cov}}(\hat\alpha, \hat\beta)$.
At x = 26.5 the estimated variance of the logit is 0.0356, so the 95% CI for $\text{logit}[\pi(26.5)]$ equals $0.826 \pm 1.96\sqrt{0.0356}$, or (0.456, 1.196).
This translates to the interval (0.61, 0.77) for the probability of satellites (e.g., $\exp(0.456)/[1 + \exp(0.456)] = 0.61$).
Since $\text{corr}(\hat\alpha, \hat\beta)$ is near $-1.0$, for better computational precision fit the model using the predictor $x^* = x - 26.5$, so that $\hat\alpha$ and its SE are the estimated logit and its SE at x = 26.5.
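The interval calculation can be sketched as follows, taking 0.826 as the estimated logit at x = 26.5 and 0.0356 as its estimated variance, both read from this slide. The helper `expit` is an illustrative name for the inverse logit.

```python
import math

def expit(t):
    """Inverse logit transformation."""
    return math.exp(t) / (1.0 + math.exp(t))

logit_hat = 0.826    # estimated logit at x = 26.5
var_logit = 0.0356   # estimated variance of that logit
z = 1.96

half_width = z * math.sqrt(var_logit)
logit_ci = (logit_hat - half_width, logit_hat + half_width)

# Transform each endpoint to the probability scale.
prob_ci = tuple(expit(end) for end in logit_ci)

print([round(v, 3) for v in logit_ci], [round(v, 2) for v in prob_ci])
```

Because the inverse logit is monotone, transforming the endpoints preserves the 95% coverage of the interval on the logit scale.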

15 Example: inference for horseshoe crab mating data
We could ignore the model fit and simply use sample proportions (i.e., the saturated model) to estimate such probabilities.
Six female crabs in the sample had x = 26.5, and four of them had satellites. The sample proportion estimate at x = 26.5 is $\hat\pi = 4/6 = 0.67$.
The 95% score CI based on these six observations alone equals (0.3, 0.9).
If the logistic model approximates the true probabilities decently, its estimator tends to be closer than the sample proportion to the true value, unless each sample proportion is based on an extremely large sample.
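The score interval quoted here can be reproduced with the closed-form (Wilson) expression for a binomial proportion. This is a sketch assuming that closed form is the interval intended on the slide; the function name is illustrative.

```python
import math

def score_interval(y, n, z=1.96):
    """Score (Wilson) confidence interval for a binomial proportion."""
    p = y / n
    denom = 1.0 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Four of the six crabs with width 26.5 had satellites.
lo, hi = score_interval(4, 6)

print(round(lo, 2), round(hi, 2))
```

This interval is much wider than the model-based interval (0.61, 0.77), which illustrates the gain in precision from using the model.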

16 Checking goodness of fit: grouped and ungrouped data
With ungrouped data, or with continuous or nearly continuous predictors, $X^2$ and $G^2$ do not have limiting chi-squared distributions.
Two popular alternatives for checking goodness of fit in this case:
(1) group the observed and fitted values for a partition of the space of x values;
(2) group the observed and fitted values according to the estimated probabilities of success, using the original ungrouped data.

17 Partition x space

18 Partition x space
In each width category, the fitted value for a yes response is the sum of the estimated probabilities $\hat\pi(x)$ for all crabs in that category.
$X^2 = 5.3$ and $G^2 = 6.2$ with df = 8 − 2 = 6. Neither $X^2$ nor $G^2$ shows evidence of lack of fit (P > 0.4).
As the number of explanatory variables increases, this strategy loses effectiveness. Simultaneous grouping of values for each variable can produce a contingency table with a large number of cells, most of which have very small counts (the curse of dimensionality).

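19 Partition according to estimated probabilities
One common approach forms the groups in the partition so that they have approximately equal size.
With 10 groups, the first pair of observed counts and corresponding fitted counts refers to the n/10 observations having the highest estimated probabilities, the next pair refers to the n/10 observations having the second decile of estimated probabilities, and so on.
Hosmer and Lemeshow goodness-of-fit statistic: $\sum_{i=1}^{g} \frac{\left(\sum_j y_{ij} - \sum_j \hat\pi_{ij}\right)^2}{\left(\sum_j \hat\pi_{ij}\right)\left[1 - \left(\sum_j \hat\pi_{ij}\right)/n_i\right]}$, where g is the number of groups in the partition, $n_i$ is the number of observations in group i, $y_{ij}$ is the binary outcome for observation j in group i, and $\hat\pi_{ij}$ denotes the corresponding fitted probability for the model fitted to the ungrouped data.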

20 Partition according to estimated probabilities
When the number of distinct patterns of covariate values equals the sample size, the null distribution of the above statistic is approximately chi-squared with df = g − 2.
For the horseshoe crab data with the continuous width predictor, the Hosmer-Lemeshow statistic with g = 10 groups equals 3.5 with df = 8, indicating a decent fit.
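A minimal sketch of the Hosmer-Lemeshow calculation, assuming the standard form of the statistic: for each group, the squared difference between the observed and fitted numbers of successes, divided by the fitted total times one minus the group's average fitted probability. The function name, the equal-size grouping rule, and the toy data are illustrative assumptions; the crab data are not reproduced here.

```python
def hosmer_lemeshow(y, p_hat, g=10):
    """Hosmer-Lemeshow statistic from binary outcomes y and fitted probabilities p_hat.

    Observations are sorted by fitted probability and split into g groups of
    approximately equal size. Assumes each group's fitted total is strictly
    between 0 and the group size.
    """
    order = sorted(range(len(y)), key=lambda k: p_hat[k])
    n = len(order)
    stat = 0.0
    for i in range(g):
        idx = order[i * n // g:(i + 1) * n // g]
        n_i = len(idx)
        if n_i == 0:
            continue
        observed = sum(y[k] for k in idx)       # observed successes in group i
        expected = sum(p_hat[k] for k in idx)   # fitted successes in group i
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / n_i))
    return stat

# Toy data: two groups of two observations, all fitted probabilities 0.5.
stat_good = hosmer_lemeshow([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5], g=2)
stat_poor = hosmer_lemeshow([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5], g=2)

print(stat_good, stat_poor)
```

The resulting value would be compared with a chi-squared distribution with df = g − 2.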

21 Wald inference can be suboptimal
Its results depend on the scale of the parameterization. For example, for the model $\text{logit}(\pi) = \alpha$, the hypothesis $\alpha = 0$ is equivalent to $\pi = 0.5$, but the Wald test statistics are different for the two parameters.
Evaluations reveal that the Wald test for $\alpha = 0$ tends to be too conservative and the one for $\pi = 0.5$ tends to be too liberal.
When a true effect is relatively large, the Wald test is not as powerful as the likelihood-ratio and score tests.
For the single binomial case with n = 25, for example, we would regard y = 24 as stronger evidence than y = 23 against $H_0: \alpha = 0$, yet the Wald statistic equals 9.7 when y = 24 and 11.0 when y = 23. For comparison, the likelihood-ratio statistics are 26.3 and 20.7.
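The binomial example can be reproduced with a few lines. The sketch assumes the Wald statistic is computed on the logit scale, with $\hat\alpha = \log[y/(n - y)]$ and estimated variance $1/[n\hat\pi(1 - \hat\pi)]$, and that the likelihood-ratio statistic compares the fit at $\hat\pi = y/n$ with the fit at $\pi = 0.5$; the function names are illustrative.

```python
import math

def wald_logit(y, n):
    """Squared Wald statistic for H0: alpha = 0 on the logit scale (0 < y < n)."""
    p = y / n
    alpha_hat = math.log(p / (1 - p))
    se = math.sqrt(1.0 / (n * p * (1 - p)))
    return (alpha_hat / se) ** 2

def lr_statistic(y, n):
    """Likelihood-ratio statistic for H0: pi = 0.5 (0 < y < n)."""
    return 2 * (y * math.log(y / (n / 2)) + (n - y) * math.log((n - y) / (n / 2)))

for y in (24, 23):
    print(y, round(wald_logit(y, 25), 1), round(lr_statistic(y, 25), 1))
```

The Wald statistic decreases as y moves from 23 to 24, while the likelihood-ratio statistic increases, which is the aberrant behavior described above.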

22 Logistic models with categorical (factor) predictors
Consider a single-factor logistic model where X has I categories, and for each value of X, let $y_i$ be the number of successes out of $n_i$ trials.
There are two equivalent ways to represent factors:
ANOVA-type representation: $\log\frac{\pi_i}{1 - \pi_i} = \alpha + \beta_i$, $i = 1, \ldots, I$. The factor has as many parameters as categories.
Indicator-variable representation: $\text{logit}(\pi_i) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_I x_I$, where $x_i = 1$ for observations in row i and $x_i = 0$ otherwise.

23 Effect coding
With I groups of data, the model can have only I free parameters. However, there are I + 1 parameters in the model. Therefore, a constraint is needed to remove one parameter.
Set one of the $\beta_i$ to zero, for example, $\beta_I = 0$.
With this constraint, $\alpha$ is the logit for category I, which we call the baseline category.
All the other $\beta_i$ represent effect differences between category i and the baseline category; that is, $\beta_i$ is the difference between the logits in rows i and I. The logit for category i is then $\alpha + \beta_i$.

24 Effect coding
Set $\sum_{i=1}^{I} \beta_i = 0$.
Then $\alpha$ represents the average effect of the categories, and the $\beta_i$ represent deviations from that average; that is, $\beta_i$ is the difference between the logit in row i and the average of the logits over all rows.
For different constraints, the estimates of the $\beta_i$ are different, but the estimates of the mean response and of contrasts between the $\beta_i$ remain the same.
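The invariance under different constraints can be illustrated for the saturated one-factor model, where the fitted logits equal the sample logits. The sample logits below are hypothetical values chosen only for illustration.

```python
# Hypothetical sample logits for a factor with I = 4 categories.
sample_logits = [-1.2, -0.4, 0.1, 0.9]

# Baseline constraint: beta_I = 0, so alpha is the logit of the last category.
alpha_base = sample_logits[-1]
beta_base = [l - alpha_base for l in sample_logits]

# Sum-to-zero constraint: alpha is the average logit.
alpha_sum = sum(sample_logits) / len(sample_logits)
beta_sum = [l - alpha_sum for l in sample_logits]

# Fitted logits and a contrast between categories 1 and 2 under each coding.
fitted_base = [alpha_base + b for b in beta_base]
fitted_sum = [alpha_sum + b for b in beta_sum]
contrast_base = beta_base[0] - beta_base[1]
contrast_sum = beta_sum[0] - beta_sum[1]

print(beta_base, beta_sum)
print(contrast_base, contrast_sum)
```

The individual $\beta_i$ differ between the two codings, but the fitted logits and the contrast agree.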

25 Example: Alcohol and Infant Malformation revisited
$\text{logit}(\pi_i) = \alpha + \beta_i$ is a saturated model, and the estimated linear predictors $\hat\alpha + \hat\beta_i$ are the sample logits.
Table 5.3 shows that, except for the slight reversal between the first and second categories of alcohol consumption, the sample logits and hence the sample proportions of malformation cases increase as alcohol consumption increases.

26 Example: Alcohol and Infant Malformation revisited
The test of independence is equivalent to testing $H_0: \beta_i = 0$, $i = 1, \ldots, I$.
The Pearson statistic is $X^2 = 12.1$ (P = 0.02) and the likelihood-ratio statistic is $G^2 = 6.2$ (P = 0.19).
The P-values using the exact conditional distributions of $X^2$ and $G^2$ are 0.03 and 0.13.

27 Linear logit model for I × 2 contingency tables
The near-monotone increase in the sample logits in Table 5.3 indicates that the linear logit model may fit better.
For ordered factor categories, we may assign scores that describe distances between categories of X and fit the linear logit model (or a more complex one): $\text{logit}(\pi_i) = \alpha + \beta x_i$.
With scores $(x_1 = 0, x_2 = 0.5, x_3 = 1.5, x_4 = 4.0, x_5 = 7.0)$, Table 5.4 shows the results.

28 Linear logit model for I × 2 contingency tables
The linear logit model fits adequately compared with the saturated model: $X^2 = 2.05$ and $G^2 = 1.95$ with df = 3.

29 Alcohol and infant malformation revisited
Pearson test: $X^2(I) = 12.1$ with P-value 0.02.
With scores (0, 0.5, 1.5, 4.0, 7.0), the score test, also called the Cochran-Armitage trend test, has $z^2 = 6.57$ with P-value 0.01. The test suggests strong evidence of a positive slope.
The Wald statistic for the linear logit model equals $(\hat\beta/SE)^2 = (0.3166/0.1254)^2 = 6.37$ (P = 0.012), and the likelihood-ratio statistic equals 4.25 (P = 0.039).
With highly unbalanced counts, it is best not to use the Wald approach. The asymptotics for the Cochran-Armitage trend test, however, work well even for quite small n when the $n_i$ are equal and the $x_i$ are equally spaced.
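The Wald statistic and the df = 1 P-values can be checked with the standard library. The sketch relies on the fact that a chi-squared variable with df = 1 is a squared standard normal, so its upper-tail probability is `math.erfc(sqrt(x / 2))`; the function name is illustrative.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variable with df = 1."""
    return math.erfc(math.sqrt(x / 2.0))

beta_hat, se = 0.3166, 0.1254   # slope estimate and SE quoted on the slide
wald = (beta_hat / se) ** 2

p_wald = chi2_sf_df1(wald)    # Wald test
p_lr = chi2_sf_df1(4.25)      # likelihood-ratio statistic quoted on the slide
p_trend = chi2_sf_df1(6.57)   # Cochran-Armitage (score) statistic

print(round(wald, 2), round(p_wald, 3), round(p_lr, 3), round(p_trend, 3))
```

All three tests use df = 1 because they focus on the single slope parameter, unlike the general test of independence with df = I − 1.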

30 Model smoothing improves precision of estimation and test power Example: Skin damage and Leprosy

31 Example: Skin damage and Leprosy
$G^2(I) = 7.28$ (df = 4) does not show much evidence of association (P = 0.12).
$G^2(I \mid L) = 6.65$ with df = 1 (P = 0.01). It gives strong evidence of more positive clinical change at the higher level of infiltration.
$G^2(L) = 0.63$ (df = 3) suggests that the linear logit model fits well.

32 Multiple logistic regression
$\text{logit}[\pi(x)] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
For qualitative predictors, we use indicator variables for their categories.
The parameter $\beta_j$ refers to the effect of $x_j$ on the log odds that Y = 1, adjusting for the other $x_k$. For instance, $\exp(\beta_j)$ is the multiplicative effect on the odds of a 1-unit increase in $x_j$ when the levels of the other $x_k$ are held fixed.

33 Logistic models for multiway contingency tables

34 Logistic models for multiway contingency tables
Let X be the indicator for AZT treatment (x = 1 for immediate AZT use, x = 0 otherwise) and Z be the indicator for race (z = 1 for whites, z = 0 for blacks).
$\text{logit}[P(Y = 1)] = \alpha + \beta_1 x + \beta_2 z$
The model assumes a homogeneous XY association; that is, the conditional odds ratio between X and Y is the same at each level of Z.
Conditional independence between X and Y given Z is equivalent to $\beta_1 = 0$.
When the interaction between X and Z is added, the model has as many parameters as the number of cells and therefore becomes a saturated model.

35 Logistic models for multiway contingency tables

36 Logistic models for multiway contingency tables
$\alpha$ is the log odds of developing AIDS symptoms for black subjects without immediate AZT use.
$\beta_1$ is the increment to the log odds for those with immediate AZT use.
$\beta_2$ is the increment to the log odds for white subjects.
For each race, the estimated odds ratio between immediate AZT use and development of AIDS symptoms equals $\exp(\hat\beta_1)$.
The Wald confidence interval for this effect is $\exp[\hat\beta_1 \pm 1.96(0.279)] = (0.28, 0.84)$.

37 Different coding schemes

38 Different coding schemes
For each coding scheme, at a given combination of AZT use and race, the estimated probability of developing AIDS symptoms is the same.
For instance, the intercept estimate plus the estimate for immediate AZT use plus the estimate for being white is −1.738 for each scheme, so the estimated probability that white veterans with immediate AZT use develop AIDS symptoms equals $\exp(-1.738)/[1 + \exp(-1.738)] = 0.15$.

39 Example: Horseshoe Crab Satellites revisited

40 Example: Horseshoe Crab Satellites revisited

41 Example: Horseshoe Crab Satellites revisited
For dark crabs, $\text{logit}[\hat{P}(Y = 1)] = \hat\alpha + \hat\beta x$; by contrast, for medium-light crabs, $\text{logit}[\hat{P}(Y = 1)] = (\hat\alpha + 1.330) + \hat\beta x$, where x is width.
At the average width of 26.3 cm, $\hat{P}(Y = 1) = 0.399$ for dark crabs and 0.715 for medium-light crabs.
At any given width, the estimated odds that a medium-light crab has a satellite are $\exp(1.330) = 3.8$ times the estimated odds for a dark crab.
At width x = 26.3, the odds equal 0.715/0.285 = 2.51 for a medium-light crab and 0.399/0.601 = 0.66 for a dark crab.
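The link between the fitted probabilities and the odds ratio can be checked with the numbers quoted on this slide. The variable names in the sketch are illustrative.

```python
import math

# Fitted probabilities at width 26.3 cm, as quoted on the slide.
p_medium_light = 0.715
p_dark = 0.399

odds_medium_light = p_medium_light / (1 - p_medium_light)
odds_dark = p_dark / (1 - p_dark)

# Odds ratio from the fitted probabilities versus exp of the color coefficient.
ratio_from_probs = odds_medium_light / odds_dark
ratio_from_model = math.exp(1.330)

print(round(odds_medium_light, 2), round(odds_dark, 2))
print(round(ratio_from_probs, 1), round(ratio_from_model, 1))
```

Because the model has no color-by-width interaction, the same odds ratio applies at every width, not only at 26.3 cm.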

42 Example: Horseshoe Crab Satellites revisited
To test the color effect, we test $H_0: \beta_1 = \beta_2 = \beta_3 = 0$.
Comparing the models with and without the color covariate, the difference in deviance is $-2(L_0 - L_1) = 7.0$, which has df = 3. The P-value is 0.07, which provides slight evidence of a color effect.
The model assumes a lack of interaction between color and width in their effects. Comparing the models with and without the interaction, the difference in deviance is 4.4 with df = 3. The evidence of interaction is weak (P = 0.22).
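The P-values for these deviance comparisons come from chi-squared upper-tail probabilities. The sketch below computes them with the standard library, using the finite-sum expressions that hold for integer degrees of freedom; the function name `chi2_sf` is an illustrative choice, and a statistics library would normally be used instead.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: normal tail plus a finite sum of half-integer gamma terms.
    terms = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms

p_color = chi2_sf(7.0, 3)         # color effect
p_interaction = chi2_sf(4.4, 3)   # color-by-width interaction

print(round(p_color, 2), round(p_interaction, 2))
```

Both tests have df = 3 because each compares models that differ by three color-related parameters.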

43 Example: Horseshoe Crab Satellites revisited

44 Quantitative treatment of ordinal predictor
Color has ordered categories, from lightest to darkest. Let the scores c = (1, 2, 3, 4) represent the color categories; the model then treats the color predictor as quantitative with a linear effect: $\text{logit}[P(Y = 1)] = \alpha + \beta_1 c + \beta_2 x$.
Along with the intercept estimate $\hat\alpha$, the fitted parameters are $\hat\beta_1 = -0.509$ (SE = 0.224) and $\hat\beta_2 = 0.458$ (SE = 0.104).

45 Quantitative treatment of ordinal predictor
The likelihood-ratio statistic comparing this fit to the more complex model having a separate parameter for each color equals 1.66 (df = 2). With P = 0.44, the simpler linear model seems adequate.
Note that in the qualitative-color model the color parameter estimates are (1.33, 1.40, 1.11, 0), so the first three colors are quite similar. Thus another potential scoring is (1, 1, 1, 0).
The model fit has the form $\text{logit}[\hat{P}(Y = 1)] = \hat\alpha + \hat\beta_1 c + \hat\beta_2 x$.
The likelihood-ratio statistic comparing the model with binary scores (1, 1, 1, 0) to the model with a separate parameter for each color equals 0.5 (df = 2), showing that this simpler model is also adequate.

46 More on interpretations
Instantaneous rate of change in probability: adjusting for other predictors, as a function of a quantitative predictor $x_j$, $\hat\pi$ has instantaneous rate of change $\hat\beta_j\hat\pi(1 - \hat\pi)$.
For example, at predictor settings at which $\hat\pi = 0.5$, the approximate effect of a 1-cm increase in width is (0.478)(0.5)(0.5) = 0.12.
We could summarize the effect of $x_j$ on the probability scale by averaging the instantaneous rates for the sample: $\frac{1}{n}\sum_{i=1}^{n} \hat\beta_j\,\hat\pi(x_{i1}, \ldots, x_{ip})[1 - \hat\pi(x_{i1}, \ldots, x_{ip})]$.
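A minimal sketch of the averaged instantaneous rate of change. The coefficient 0.478 is the width estimate quoted on the slide; the list of fitted probabilities is hypothetical and stands in for the fitted values of a real sample.

```python
def average_rate_of_change(beta_j, fitted_probs):
    """Average of beta_j * pi_hat * (1 - pi_hat) over the sample."""
    return sum(beta_j * p * (1 - p) for p in fitted_probs) / len(fitted_probs)

beta_width = 0.478   # width coefficient quoted on the slide

# Rate of change at a setting where the fitted probability is 0.5.
rate_at_half = beta_width * 0.5 * 0.5

# Hypothetical fitted probabilities for three observations.
fitted = [0.2, 0.5, 0.8]
avg_rate = average_rate_of_change(beta_width, fitted)

print(round(rate_at_half, 2), round(avg_rate, 4))
```

The average is smaller than the rate at $\hat\pi = 0.5$ because $\hat\pi(1 - \hat\pi)$ is largest at 0.5.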

47 More on interpretation
Describe the effect of $x_j$ by setting the other predictors at their sample means and computing the estimated probabilities at the upper and lower quartiles of $x_j$.

48 More on interpretation
Standardized coefficients: coefficients are standardized in order to compare predictors having different units.
Standardize the predictors: $(x_j - \bar{x}_j)/s_{x_j}$.
Standardize the coefficients: $\hat\beta_j s_{x_j}$.
With binary color in the model, the standard deviation of width is 2.109 cm, and the standardized coefficient is 0.478(2.109) = 1.01.
When width is replaced by weight, the standardized coefficient is 1.729(0.577) = 1.00.
The unstandardized estimates 0.478 and 1.729 are quite different, but width and weight have similar effects, conditional on whether or not a crab is dark.
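The standardized coefficients are simple products, reproduced below from the estimates and standard deviations quoted on this slide. The function name is illustrative.

```python
def standardized_coefficient(beta_hat, sd_x):
    """Effect of a one-standard-deviation increase in the predictor, on the logit scale."""
    return beta_hat * sd_x

std_width = standardized_coefficient(0.478, 2.109)    # width, in cm
std_weight = standardized_coefficient(1.729, 0.577)   # weight, in kg

print(round(std_width, 2), round(std_weight, 2))
```

On the standardized scale the two predictors have nearly identical effects, even though the raw coefficients differ by a factor of more than three.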


More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Solutions for Examination Categorical Data Analysis, March 21, 2013

Solutions for Examination Categorical Data Analysis, March 21, 2013 STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.

More information

BIOSTATS Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS

BIOSTATS Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS BIOSTATS 640 - Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS Practice Question 1 Both the Binomial and Poisson distributions have been used to model the quantal

More information

Basic Medical Statistics Course

Basic Medical Statistics Course Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable

More information

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi LOGISTIC REGRESSION Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi- lmbhar@gmail.com. Introduction Regression analysis is a method for investigating functional relationships

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Logistic Regression. Continued Psy 524 Ainsworth

Logistic Regression. Continued Psy 524 Ainsworth Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Logistic Regression Models for Multinomial and Ordinal Outcomes

Logistic Regression Models for Multinomial and Ordinal Outcomes CHAPTER 8 Logistic Regression Models for Multinomial and Ordinal Outcomes 8.1 THE MULTINOMIAL LOGISTIC REGRESSION MODEL 8.1.1 Introduction to the Model and Estimation of Model Parameters In the previous

More information

INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y

INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y INTRODUCING LINEAR REGRESSION MODELS Response or Dependent variable y Predictor or Independent variable x Model with error: for i = 1,..., n, y i = α + βx i + ε i ε i : independent errors (sampling, measurement,

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Multiple Logistic Regression for Dichotomous Response Variables

Multiple Logistic Regression for Dichotomous Response Variables Multiple Logistic Regression for Dichotomous Response Variables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline

More information

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Chapter 14 Logistic regression

Chapter 14 Logistic regression Chapter 14 Logistic regression Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 62 Generalized linear models Generalize regular regression

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models

Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 29 14.1 Regression Models

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

PubHlth Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide

PubHlth Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide PubHlth 640 - Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide Unit 3 (Discrete Distributions) Take care to know how to do the following! Learning Objective See: 1. Write down

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Lecture 14 Simple Linear Regression

Lecture 14 Simple Linear Regression Lecture 4 Simple Linear Regression Ordinary Least Squares (OLS) Consider the following simple linear regression model where, for each unit i, Y i is the dependent variable (response). X i is the independent

More information

STA102 Class Notes Chapter Logistic Regression

STA102 Class Notes Chapter Logistic Regression STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Stat 704: Data Analysis I, Fall 2010

Stat 704: Data Analysis I, Fall 2010 Stat 704: Data Analysis I, Fall 2010 Generalized linear models Generalize regular regression to non-normal data {(Y i,x i )} N i=1, most often Bernoulli or Poisson Y i. The general theory of GLMs has been

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Generalized Linear Modeling - Logistic Regression

Generalized Linear Modeling - Logistic Regression 1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information