Lecture 14: Introduction to Poisson Regression
|
|
- Morris Owens
- 5 years ago
- Views:
Transcription
1 Lecture 14: Introduction to Poisson Regression Ani Manichaikul 8 May / 52
2 Overview Modelling counts Contingency tables Poisson regression models 2 / 52
3 Modelling counts I Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week Number of coughs in a concert hall, per minute Number of customers arriving in a shop, daily 3 / 52
4 Modelling counts II Counts show up in all kinds of scenarios in our every day life In public health, count data are particularly important in measuring disease rates: We often count the number of people acquiring a particular disease in a given year Modelling counts gives us a framework in which to estimate disease incidence Poisson regression will allow us to measure association between incidence, and risk factors of interest adjusting for other related covariates 4 / 52
5 A whole new model to deal with counts? I So far we have talked about: Linear regression: for normally distributed errors Logistic regression: for binomial distributed errors Typically, our count data do not follow either of these distributions Counts are not binary (0 / 1) Counts are discrete, not continuous Counts typically have a right skewed distribution 5 / 52
6 A whole new model to deal with counts? II So far, the regression strategies we ve discussed allow us to model: Expected values and expected increase in linear regression Log odds or log odds ratios in logistic regression In modelling counts, we are typically more interested in Incidence rates Incidence ratios (when comparing across levels of a risk factor) Poisson regression will provide us with a framework to handle counts properly! 6 / 52
7 Poisson Assumptions Before going on to talk about Poisson regression, let s revisit some key concepts about the Poisson distribution: The occurrences of a random event in an interval of time are independent In theory, an infinite number of occurrences of the event are possible (though perhaps rare) within the interval In any extremely small portion of the interval, the probability of more than one occurrence of the event is approximately zero 7 / 52
8 Poisson Probability I The probability of x occurrence of an event in an interval is: P(X = x) = e λ λ x x!, x = 0, 1, 2,... where λ = the expected number of occurrences in the interval e = a constant ( 2.718) For the Poisson distribution: mean = variance = λ Mode = floor (λ), largest integer less than λ We can also think of λ as the rate parameter 8 / 52
9 Poisson Probability II Poisson probability mass function for λ = 5 Poisson probablity, λ=5 Pr(X=x) x 9 / 52
10 Poisson Probability III Poisson probability mass function for λ = 30 Poisson probablity, λ=30 Pr(X=x) x 10 / 52
11 Poisson and Binomial The Poisson distribution can be used to approximate a binomial distribution when: n is large and p is very small, or np = λ is fixed, and n becomes infinitely large 11 / 52
12 Example: Cancer in a large population Yearly cases of esophageal cancer in a large city 30 cases observed in 1990 P(X = 30) = e λ λ 30 30! λ = yearly average number of cases of esophageal cancer 12 / 52
13 Example: Belief in Afterlife I Men and women were asked whether or not they believed in afterlife (General Social Survey 1991) Possible responses were: yes, no, or unsure Y N or U M F Total / 52
14 Example: Belief in Afterlife II Question: Is belief in the afterlife independent of gender? We could address this question using a χ 2 test To perform a χ 2 test: Begin by stating our model: independece Then, we calculate expected cell counts for each entry in the table 14 / 52
15 Example: Belief in Afterlife III Under an independence model, the probability of belief in afterlife for both genders would be estimated as number who believe ˆp = P(belief in afterlife) = total number asked = Then, we calculate the expected counts as: E(males answering yes) = # men asked ˆp = = 432 E(females answering yes) = # women asked ˆp = = / 52
16 Example: Belief in Afterlife IV The observed and expected cell counts are as follows: Y N or U M 435 (432) 147 (150) 582 F 375 (378) 134 (131) 509 Total Then, the χ 2 statistic is calculated as: i,j (O ij E ij ) 2 E ij = ( ) ( )2 150 ( ) = ( ) / 52
17 Example: Belief in Afterlife V Remeber, we are testing the null hypothesis of independence, versus the alternative that the proportion believing in afterlife differs across genders: H 0 : H a : p male = p female = p overall p male p female We decided to perform the hypothesis test using a χ 2 test. Here, the appropriate degrees of freedom are: (number of rows 1)(number of columns 1) = (2 1)(2 1) = 1 17 / 52
18 Example: Belief in Afterlife VI Let s test the hypothesis at level α = Look up the appropriate critical value in R: > qchisq(1-0.05, df=1) [1] Our observed χ 2 statistic of is much smaller than the critical value We can also look up the corresponding p-value for our test: > pchisq(0.173, df=1, lower.tail=f) [1] Conclusion: fail to reject the null hypothesis 18 / 52
19 Example: Belief in Afterlife Poisson Model I Just now, we had to calculate the expected counts to perform the χ 2 test Actually, we could have written down a linear model to express the expected counts systematically Make use of the Poisson approximation to binomial Model: Y ij Poisson(λ ij ) λ ij = λ α male γresponse Here we interpret λ ij as the Poisson rate for the cell in the i th row and j th column λ is the baseline rate, α is the male effect, and γ is the response 19 / 52
20 Example: Belief in Afterlife Poisson Model II Recall that the log of a product is equal to the sum of the log: log(a b) = log(a) + log(b) So, let s log-transform our model... originally we had: λ ij = λ α male γresponse Taking the log of both sides, we have: log(λ ij ) = log λ + log(α male ) + log(γ response ) This is looking like a linear model / 52
21 Example: Belief in Afterlife Poisson Model III We stated the systematic portion of this model as log(λ ij ) = log λ + log(α male ) + log(γ response ) which we can also write using β s to look like other linear models: log(λ ij ) = β 0 + β 1 I (male) + β 2 I (response) The probabilistic portion of this model enters as: Y ij Poisson(λ ij ) 21 / 52
22 Example: Belief in Afterlife Poisson Model IV In this Poisson regression model: The outcome is the log of the expected cell count The baseline β 0 is the log expected cell count for females responding no β 1 is the increase in log expected cell count for males compared to females β 2 is the increase in log expected cell count for the response yes compared to no 22 / 52
23 Fitting the afterlife model in R I Data format We began with the view of our data in a table Y N or U M 435 (432) 147 (150) 582 F 375 (378) 134 (131) 509 Total However, R expects to see unique observations listed, together with relevant covariates (indicators here), as follows: count male yes / 52
24 Fitting the afterlife model in R II Once the data is entered in R, we can analyze it using the glm command, specifying the family as poisson: > summary(out<-glm(count ~ male + yes, family=poisson)) Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) <2e-16 *** male * yes <2e-16 *** --- Signif. codes: 0 *** ** 0.01 * Null deviance: on 3 degrees of freedom Residual deviance: on 1 degrees of freedom AIC: / 52
25 Fitting the afterlife model in R III So we fit the model: log(λ ij ) = β 0 + β 1 I (male) + β 2 I (response) and our fitted model is: log(λ ij ) = I (male) I (response) 25 / 52
26 Predicting log expected cell counts I Using the fitted model: log(λ ij ) = β 0 + β 1 I (male) + β 2 I (response) we can get predicted values for log counts in each of the four cells: For females responding no : log(e(count female, no)) = = 4.88 For males responding no : log(e(count male, no)) = = / 52
27 Predicting log expected cell counts II Using the fitted model: log(λ ij ) = β 0 + β 1 I (male) + β 2 I (response) we can get predicted values for log counts in each the four cells: For females responding yes : log(e(count female, yes)) = = 5.94 For males responding yes : log(e(count male, yes)) = = / 52
28 Predicting expected cell counts I Using the fitted model: log(λ ij ) = β 0 + β 1 I (male) + β 2 I (response) we can get predicted values for counts in each of the four cells: For females responding no : E(count female, no) = exp(4.88) = 131 For males responding no : E(count male, no) = exp( ) = exp(5.01) = / 52
29 Predicting expected cell counts II Using the fitted model: log(λ ij ) = β 0 + β 1 I (male) + β 2 I (response) we can get predicted values for counts in each of the four cells: For females responding yes : E(count female, yes) = exp( ) = exp(5.94) = 378 For males responding yes : E(count male, yes) = exp( ) = exp(6.07) = / 52
30 Comparing model fit in R with our fit by hand When we did our χ 2 test, we calculated the expected cell counts by hand: Estimate ˆp = , probability of responding yes under the independence model ˆp (or 1 ˆp) by the number of males (or females) to get the expected number in each cell Expected cell counts represented the numbers we should have gotten under a perfect independence model We obtained the expected cell counts: Y N or U M F Total which are exactly what we got by Poisson regression! 30 / 52
31 Afterlife coefficients I By fitting the independence model, we force the relative rate of responding yes versus no to the question of belief in the afterline to be fixed across males and females Deviation from the independence model suggests the proportion of those believing in afterlife differs by gender 31 / 52
32 Afterlife coefficients II Under the independence model: β 0 = 4.88 is the log expected count of females responding no, the baseline group β 1 = is the difference in log expected counts comparing males to females β 2 = 1.05 is the difference in log expected counts for yes responses compared to no responses 32 / 52
33 Afterlife coefficients III Under the independence model: exp(β 0 ) = is the expected count for females responding no, the baseline group expβ 1 = 1.14 is the ratio comparing the counts of males to females exp(β 2 )=2.85 is the ratio of the number of yes responses compared to no responses 33 / 52
34 A regression model for count data Contingency table: regression model Suppose we have r categories on one axis and c categories on the other We can think of an r c contingency table with r rows and c columns Independence model: log(e(y ij )) = log(λ ij ) = log(λ) + log(α i ) + log(γ j ) To test independence, we write out χ 2 test statistic as: (O ij E ij ) 2 i,j E ij which is distributed as χ 2 (r 1)(c 1) of independence under the null hypothesis 34 / 52
35 Relation to other types of regression Both logistic and poisson regression are types of linear models Extend ideas from linear regression with continuous outcomes to: binary outcomes (logistic regression) count data (Poisson regression) We say logistic regression uses the logit link Poisson regression uses a log link Poisson models is also called log-linear models Regression Outcome Link Link type Distribution name function Linear Normal identity f (µ) = µ Logistic Binary logit logit(p) = log( p 1 p ) Log-linear Poisson log log(p) 35 / 52
36 Interpretation I Under the model log(e(y ij )) = log(λ ij ) = log(λ) + log(α i ) + log(γ j ) we can interpret: λ ij is the predicted rate for the i th row and j th column of our contingency table λ is the baseline rate 36 / 52
37 Interpretation II Under the model log(e(y ij )) = log(λ ij ) = log(λ) + log(α i ) + log(γ j ) we can interpret: α i is the rate ratio, the effect of being in row i compared to baseline (and holding column fixed) γ j is the rate ratio, the effect of being in column j compared to baseline (and holding row fixed) The rate ratio is the factor by which the baseline is multiplied to get the rate, or expected cell count, for our particular non-baseline category 37 / 52
38 Example: Customers at a lumber company I Outcome: Y = number of customers visiting store from region Predictors: X 1 : number of housing units in region X 2 : average household income X 3 : average housing unit age in region X 4 : distance to nearest competitor X 5 : average distance to store in miles Counts are obtained for 110 regions, so our n= / 52
39 Example: Customers at a lumber company II Poisson (log-linear) regression model Given observations and covariates: Y i, X ij, 1 i n, 1 j p Systematic component of model: log(λ i ) = β 0 + β 1 X 1 + β 2 X 2 + β 3 X 3 + β 4 X 4 + β 5 X 5 Probabilistic component of model: Y i X i Poisson(λ i ) 39 / 52
40 Example: Customers at a lumber company III Here s a look at some of the data: Customers Housing Income Age Competitor Store / 52
41 The Distribution of Customer counts Histogram of Customers The customer counts distribution is clearly not normally distributed Linear regression would not work well here Log-linear regression will work just fine Frequency Customers 41 / 52
42 The fitted model I > summary(lumber.glm <- glm(customers ~ Housing + Income +Age +Competitor +Store, family=poisson()) ) Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) 2.942e e < 2e-16 *** Housing 6.058e e e-05 *** Income e e e-08 *** Age e e * Competitor 1.684e e e-11 *** Store e e e-15 *** --- Signif. codes: 0 *** ** 0.01 * Null deviance: on 109 degrees of freedom Residual deviance: on 104 degrees of freedom 42 / 52
43 The fitted model II We wrote down the model: log(λ i ) = β 0 + β 1 X 1 + β 2 X 2 + β 3 X 3 + β 4 X 4 + β 5 X 5 and obtained the parameter estimates in R to be: log(λ i ) = Housing i Income i β 3 Age i Competitor i 0.129Store i 43 / 52
44 Interpretation I We interpret ˆβ 0 = 2.94 as the baseline log expected count, or log rate, in the group with all covariates (housing, income, age, distance to nearest competitor, and average distance to store) set to zero exp( ˆβ 0 ) = exp(2.94) = 18.9 is the expected count of customers in the baseline group This baseline value does not quite make sense It may be helpful to center our covariates, but this is not a big deal if we don t care about baseline because our primary inference is about the increase with respect to covariates 44 / 52
45 Interpretation II We interpret ˆβ 1 = as the increase in log expected count, or the log rate ratio comparing districts whose number of housing units differ by one, adjusting for other covariates exp( ˆβ 1 ) = exp( ) = is the rate ratio comparing districts whose mean housing units differ by one, adjusting for other covariates exp(100 ˆβ 1 ) = exp( ) = is the rate ratio comparing districts whose mean housing units differ by one hundred Keeping other factors constant, a 100 unit increase in housing units, would yield an expected 6.2% increase in customer count 45 / 52
46 Interpretation III Although our reported estimate for β 1 is very small, so is the standard error ( ) The z-statistic reported in R is 4.26, with a p-value of We can conclude that there is a statistically significant association between the customer count and mean housing units in the region We could have calculated this z-statistic ourselves as This z-statistic is also called a Wald statistic ˆβ 1 SE( ˆβ 1 ) 46 / 52
47 Interpretation IV We interpret ˆβ 2 = to be the increase in log expected count, or the log rate ratio comparing districts whose mean income differs by $1, adjusting for other covariates exp( ˆβ 2 ) = exp( ) = is the relative rate comparing regions whose mean income differs by $1, adjusting for other covariates 47 / 52
48 Interpretation V exp(10000 ˆβ 2 ) = exp( ) = is the relative rate comparing regions whose mean income differs by $10000, adjusting for other covariates Keeping other factors constant, the relative rate comparing regions whose average income differs by $10000 is For a $10000 change in average income, we would predict an 11% decrease in the rate of customers visiting the store 48 / 52
49 Interpretation VI A 95% confidence interval for β 2 is ˆβ 2 ± 1.96 SE( ˆβ 2 ) = ± = ( , ) A 95% confidence interval for exp(β 2 ) is e ( , ) = ( , ) A 95% confidence interval for exp(10000β 2 ) is e (10000 ( ),10000 ( )) = (0.854, 0.927) 49 / 52
50 Interpretation VII The 95% confidence interval for the relative rate of customers visiting the store is (0.854, 0.927) for a $10,000 increase in average income for the region Question: Based on this model, if we are going to choose a location to build a new store, should we choose areas with higher or lower income? Does it matter? 50 / 52
51 Summary I Poisson regression gives us a framework in which to build models for count data It is a special case of generalized linear models, so it is closely related to linear and logistic regression modelling All of the same modelling techniques will carry over from linear regression: Adjustment for confounding Allowing for effect modification by fitting interactions Splines and polynomial terms 51 / 52
52 Summary II Can test significance of regression coefficients using z-statistics (also called Wald statistic) Since Poisson regression specifies a model, we can calculate likelihood likelihood ratio tests will apply for testing nested models Will need a new interpretation for regression coefficients: Intercept term interpreted as log rate for baseline Coefficients for covariates interpreted as log rate ratio, or log relative rate Exponentiate to get rid of the logs 52 / 52
Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationLog-linear Models for Contingency Tables
Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationBMI 541/699 Lecture 22
BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationModeling Overdispersion
James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in
More informationMultinomial Logistic Regression Models
Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.
More informationIntroduction to the Analysis of Tabular Data
Introduction to the Analysis of Tabular Data Anthropological Sciences 192/292 Data Analysis in the Anthropological Sciences James Holland Jones & Ian G. Robertson March 15, 2006 1 Tabular Data Is there
More informationLogistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression
Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024
More informationSTA102 Class Notes Chapter Logistic Regression
STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response
More informationTruck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation
Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationCohen s s Kappa and Log-linear Models
Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance
More informationGeneralised linear models. Response variable can take a number of different formats
Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion
More information7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis
Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression
More informationSection Poisson Regression
Section 14.13 Poisson Regression Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 26 Poisson regression Regular regression data {(x i, Y i )} n i=1,
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationHypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)
Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect
More informationLecture 2: Probability and Distributions
Lecture 2: Probability and Distributions Ani Manichaikul amanicha@jhsph.edu 17 April 2007 1 / 65 Probability: Why do we care? Probability helps us by: Allowing us to translate scientific questions info
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationRegression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102
Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationLecture 10: Introduction to Logistic Regression
Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More informationST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses
ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities
More informationLogistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction
More informationLecture 7 Time-dependent Covariates in Cox Regression
Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationProbability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?
Probability: Why do we care? Lecture 2: Probability and Distributions Sandy Eckel seckel@jhsph.edu 22 April 2008 Probability helps us by: Allowing us to translate scientific questions into mathematical
More informationLogistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy
Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided
More informationIntroducing Generalized Linear Models: Logistic Regression
Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and
More information22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then
More informationFaculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics
Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3
More informationSection IX. Introduction to Logistic Regression for binary outcomes. Poisson regression
Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about
More informationWe know from STAT.1030 that the relevant test statistic for equality of proportions is:
2. Chi 2 -tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which
More informationLogistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20
Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)
More informationToday. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News:
Today HW 1: due February 4, 11.59 pm. Aspects of Design CD Chapter 2 Continue with Chapter 2 of ELM In the News: STA 2201: Applied Statistics II January 14, 2015 1/35 Recap: data on proportions data: y
More informationChapter 26: Comparing Counts (Chi Square)
Chapter 6: Comparing Counts (Chi Square) We ve seen that you can turn a qualitative variable into a quantitative one (by counting the number of successes and failures), but that s a compromise it forces
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More informationChapter 20: Logistic regression for binary response variables
Chapter 20: Logistic regression for binary response variables In 1846, the Donner and Reed families left Illinois for California by covered wagon (87 people, 20 wagons). They attempted a new and untried
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks
More informationTwo Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00
Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section
More informationInference and Regression
Name Inference and Regression Final Examination, 2015 Department of IOMS This course and this examination are governed by the Stern Honor Code. Instructions Please write your name at the top of this page.
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationAn Introduction to Mplus and Path Analysis
An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression
More informationModel Estimation Example
Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions
More informationR Hints for Chapter 10
R Hints for Chapter 10 The multiple logistic regression model assumes that the success probability p for a binomial random variable depends on independent variables or design variables x 1, x 2,, x k.
More informationST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios
ST3241 Categorical Data Analysis I Two-way Contingency Tables 2 2 Tables, Relative Risks and Odds Ratios 1 What Is A Contingency Table (p.16) Suppose X and Y are two categorical variables X has I categories
More informationLecture 4 Multiple linear regression
Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters
More informationLecture 5: ANOVA and Correlation
Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More informationConfidence Intervals for the Odds Ratio in Logistic Regression with One Binary X
Chapter 864 Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X Introduction Logistic regression expresses the relationship between a binary response variable and one or more
More informationMcGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper
Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions
More informationHypothesis testing. Data to decisions
Hypothesis testing Data to decisions The idea Null hypothesis: H 0 : the DGP/population has property P Under the null, a sample statistic has a known distribution If, under that that distribution, the
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationModeling the Mean: Response Profiles v. Parametric Curves
Modeling the Mean: Response Profiles v. Parametric Curves Jamie Monogan University of Georgia Escuela de Invierno en Métodos y Análisis de Datos Universidad Católica del Uruguay Jamie Monogan (UGA) Modeling
More informationBinomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials
Lecture : Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 27 Binomial Model n independent trials (e.g., coin tosses) p = probability of success on each trial (e.g., p =! =
More informationLogistic Regressions. Stat 430
Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to
More informationAn Introduction to Path Analysis
An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving
More informationConfidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s
Chapter 866 Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s Introduction Logistic regression expresses the relationship between a binary response variable and one or
More informationReview: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:
Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic
More informationLab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )
Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was
More informationRegression Methods for Survey Data
Regression Methods for Survey Data Professor Ron Fricker! Naval Postgraduate School! Monterey, California! 3/26/13 Reading:! Lohr chapter 11! 1 Goals for this Lecture! Linear regression! Review of linear
More informationHomework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.
EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests
More informationPoisson regression: Further topics
Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to
More informationModule 03 Lecture 14 Inferential Statistics ANOVA and TOI
Introduction of Data Analytics Prof. Nandan Sudarsanam and Prof. B Ravindran Department of Management Studies and Department of Computer Science and Engineering Indian Institute of Technology, Madras Module
More informationLogistic Regression - problem 6.14
Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values
More informationExercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer
Solutions to Exam in 02402 December 2012 Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer 3 1 5 2 5 2 3 5 1 3 Exercise IV.2 IV.3 IV.4 V.1
More informationGeneralized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model
Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example
More informationSTAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).
STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example
More informationLecture 5: Poisson and logistic regression
Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction
More informationStatistics 203 Introduction to Regression Models and ANOVA Practice Exam
Statistics 203 Introduction to Regression Models and ANOVA Practice Exam Prof. J. Taylor You may use your 4 single-sided pages of notes This exam is 7 pages long. There are 4 questions, first 3 worth 10
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 Work all problems. 60 points are needed to pass at the Masters Level and 75
More informationAnalysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013
Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not
More informationLecture 1 Introduction to Multi-level Models
Lecture 1 Introduction to Multi-level Models Course Website: http://www.biostat.jhsph.edu/~ejohnson/multilevel.htm All lecture materials extracted and further developed from the Multilevel Model course
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationHomework 1 Solutions
36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports
More informationLecture 2: Poisson and logistic regression
Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction
More informationLecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti
Lecture 2: Categorical Variable A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti 1 Categorical Variable Categorical variable is qualitative
More information1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches
Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model
More informationLogistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S
Logistic regression analysis Birthe Lykke Thomsen H. Lundbeck A/S 1 Response with only two categories Example Odds ratio and risk ratio Quantitative explanatory variable More than one variable Logistic
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationRon Heck, Fall Week 3: Notes Building a Two-Level Model
Ron Heck, Fall 2011 1 EDEP 768E: Seminar on Multilevel Modeling rev. 9/6/2011@11:27pm Week 3: Notes Building a Two-Level Model We will build a model to explain student math achievement using student-level
More information