Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.
|
|
- Ruby Wright
- 5 years ago
- Views:
Transcription
1 EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. or equivalently E(Y/t) = µ. The expected rate of arrests is constant over different teams. This model is plausible if those who attend the games for different terms are the same (similar kind of people who attend different games) Show E(Y) = µt. E(Y/t) = µ. offset= log(t). Model with distribution=poisson and link=identity. Model with distribution=poisson, link=identity, and log(e(y/t)) = logµ = α where log(µ) = α. Estimated Model: Ŷ =.0004t. The estimated expected rate of arrests equals.0004; or we estimate that for every 10 thousand people who attend a game 4 will be arrested. There are two (equivalent) ways in which you could have fit this model: Method 1: E(Y) = α (number attend) = αt In SAS, proc genmod data=fights order=data; model arrests=attend / dist=poisson link=identity noint; run; Note that the options noint mean no intercept. In R, glm(arrests -1 + attend,data=fights,family=poisson(link= identity )) Note that 1 means no intercept. 1
2 With this method the estimate model is Ŷ =.0004t. Method 2: log(e(y)) = α+log(number attend) In SAS, proc genmod data=fights order=data; model arrests= / dist=poisson link=identity offset=logt; run; and in R: glm(arrests offset(t), data=fights,family=poisson)) The estimate model is E(Y) ˆ = exp( )t =.0004t and has the same interpretation as model from Method I. (b) See figure Soccer: Number of Arrests by Number Attending Number Arrested Bournemouth West_Brom Hudderfield Bradford_City Middlesbro BirminghamIpswich_Town Crystal_Palace Leicester_City Blackburn Shrewsbury Swindon_Town Barnsley Stoke_City Sheffield_Utd Millwall Hull_City Plymouth Oldham Reading Aston_Villa Leeds_United Manchester_City Number Attending Game (in thousands) (c) The negative binominal is fit by only changing dist=poisson to dist=negbin in SAS and using glm.nb in R (regardless of whether you use method I or II). I ll show Method II here. 2
3 Poisson Negative Binomial Parameter Estimate SE Wald p-value Est. se Wald p α , 556 < < /φ = D Fit Statistics df G X AIC Note: 95% CI for 1/φ is (0.1355,0.5023) Note that the results from R are a bit different from SAS: Parameter Estimate SE z p-value Est. se z p α < <.01 1/φ = D Fit Statistics df G X 2 AIC (d) The parameter estimates for α for the two models are very similar in value (i.e., for SAS, versus and for R versus ); however, their standard errors differ considerably (i.e., SAS: for Poisson and and R: versus ). This is consistent with the need for the Negative Binomial due to overdispersion; that is, the se from Poisson are too small. (e) There is evidence in support of the Negative Binomial model being the better one (don t need all of these for full credit): The estimated standard errors for Poisson and Negative binomial are very different and that for NB is much larger. This is consistent with there being over dispersion in the data that is a problem for the Poisson. The 95% CI for the dispersion parameter (0.14,0.50) does not include 1 and suggests there is overdispersion in the data.**** Don t need this for for R. G 2 (and X 2 *** don t need this for R) indicate an acceptable fit of the model to the data (i.e., comparing them to a χ 2 with ν = 22 would yield a large p-value. However, this is not the case for the Poisson. The various information criteria (only AIC reported above) are all smaller for the NB than the Poisson. The smaller the value, the better the model. The adjusted Pearson residuals for negative binomial are much smaller than for Poisson (all except 1 are < 1.96, the exception is Bradford City with 2.09) 3
4 It is reasonable to expect that the crowds over teams are heterogeneous, perhaps due to living in different cities with different SES, crowding, etc. Before starting to analyze the data for the next three problems (i.e., 3.13., 3.14 and zero inflated, do a little bit of exploratory data analysis: 1. Compute the mean and variance of number of satellites. Compare. The mean is less than the variance (i.e., 2.92 < (3.15) 2 ), which suggests overdispersion. 2. Plot a histogram of the number of satellites. Comment. Below is a graph of the distribution of satellites. Notice that there are a lot with 0. This could Other than this end of the distribution, Poisson may be OK. A lot of 0s could explain why we have overdispersion (model fitting will help us decide for sure). This might be best fit using a zero-inflated Poission. Figure 1: The distribution of the counts. 3. Look at the relationship between number of satellites (or log of the number) by weight. Comment. 4
5 Also a look at the number of satellites versus weight with a smooth curve (actually a cubic regression) in the Figure 3. 5
6 Figure 2: Initial look at the data: counts versus explanatory variable with a cubic regression curve and log(count) versus explanatory variable with a linear regression curve overlayed.. 6
7 It appears that there is an outlier in terms of weight (i.e., weight> 5). I deleted it and recomputed the mean and variance, but doesn t change results much, so I left it in for the homework answers. And now to do the problems... Problem 3.13 on page 94 of Agresti (2007). The data with SAS code to create a SAS data set is on the course web-site. Note that I re-scaled weight to kg. 1. The prediction equation is ˆµ i = exp( (weight) i ) 2. The estimated mean for a female weighing 2.44kg is ˆµ = exp( (2.44)) = exp(1.0095) = For a one kg increase in weight, the (mean) number of satellites is exp(.5893) = 1.80 times (or 80% larger). Although a 95% confidence interval of ˆβ is given in the SAS output, this comes from ˆβ ± 1.96(ŝe) ± 1.96(0.0650) ± (0.4619, ) The 95% confidence interval for the multiplicative effect (i.e., exp(β)) is found by taking exp of the end-points of the interval for β: (exp(0.4619), exp(0.7167)) (1.59, 2.05) 4. A Wald test: H o : β = 0 versus H a : β 0. X 2 = ( ) = Comparing a chi-square distribution with df = 1 yields a very small p-value; therefore, reject H o and conclude the data support the hypothesis that the number of satellites is related to weight of the female crab. 5. A likelihood ratio test: 2( ) = 71.93, which has a very small p-value (compare to chi-square with df = 1). Conclusion is the same as in part (d). 7
8 Problem 3.14 on page 94 of Agresti (2007). Fitting a negative binomial model The prediction equation is ˆµ = exp( (weight)). The dispersion parameter is 1/φ = D(in Agresti notation) = The estimated standard error of the dispersion parameter is There is evidence that the Negative Binomial gives a better fit than the Poisson: The 95% confidence interval for 1/φ = D is (0.6948,1.4533). The value 0 is not in this interval which suggests we need the scale parameter. All of the global fit statistics are much better for the Negative Binomial than the Poisson: Poisson Negative Binomial Criterion DF Value Value/DF Value Value/DF Deviance Pearson Chi-Square AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) Since the Poisson is a special case of the binomial, we could do a likelihood ratio test (i.e., LR= = , df = 1, p is tiny). Also, according to the information criteria, the Negative Binomial has smaller values and this indicate it s better than the Poisson model. Graphics indicate that the Negative Binomial out-performs the Poisson (I didn t expect graphs, but it you did them, Great!). The Negative Binomial includes more points within the 95% confidence bands and fits the distribution of counts better than the Poisson (however there is room for improve...zip does the best). 8
9 Figure 3: The models were fit to data and then grouped to see how well the models are fitting the data. 9
10 Figure 4: To see how well the various models are doing in terms of fitting the distribution of number of satellites. Neither the Poisson or Negative Binomial are really doing that well; however, the ZIP does pretty good. 10
11 2. A 95% confidence interval for β with the Negative Binomial is ˆβ ± 1.96(se) = ± 1.96(0.1769) = ± (0.4136, ) Versus the one from the Poisson regression that was (0.4619, ) that has half-length equal to The one from the Negative Binomial is wider than the Poisson because the greater the estimated variance with the Negative Binomial (i.e., ˆµ i ˆµ 2 i ) results in greater estimated standard error for β (see page 82 of the text). Fit a zero inflated Poisson regression using weight as a predictor of the mean and width as a predictor in a logit model for the mixing probability. I fit several ZIP models, but the one that seemed to the best in terms of fit of model to data and parameter estimates are significant is one with weight as a predictor in the Poisson regression and width as a predictor in a logit model for the mixing probability. The results are Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X Log Likelihood Full Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) Algorithm converged. Analysis Of Maximum Likelihood Parameter Estimates Standard Wald 95% Confidence Wald Parameter DF Estimate Error Limits Chi-Square Pr > ChiSq 11
12 Intercept <.0001 weight Scale NOTE: The scale parameter was held fixed. Analysis Of Maximum Likelihood Zero Inflation Parameter Estimates Standard Wald 95% Confidence Wald Parameter DF Estimate Error Limits Chi-Square Pr > ChiSq Intercept <.0001 width <.0001 So the estimated model for the probability is ˆπ i = exp( (width) i)) 1+exp( (width) i ). The odds of being in the zero class is exp(.5005) = 0.61 times the odds for a one unit increase in width. In other words, the wider the crab, the less likely they re in the zero-class. The estimated probability of a count: P(Y i = y) = ˆµ i = exp( (weight) i ) { ˆπi +(1 ˆπ i )exp( ˆµ i ) for y = 0 (1 ˆπ i ) exp( ˆµ i)ˆµ y i y! for y > 0 Since exp(0.1945) = 1.21, the expected number of satellites is 1.21 than the mean number of satellites with one unit less in weight. 12
13 Figure 5: Observed and fitted from ZIP with logit model. 13
Section Poisson Regression
Section 14.13 Poisson Regression Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 26 Poisson regression Regular regression data {(x i, Y i )} n i=1,
More informationAnalysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013
Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationQ30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only
Moyale Observed counts 12:28 Thursday, December 01, 2011 1 The FREQ Procedure Table 1 of by Controlling for site=moyale Row Pct Improved (1+2) Same () Worsened (4+5) Group only 16 51.61 1.2 14 45.16 1
More informationSections 4.1, 4.2, 4.3
Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear
More informationChapter 4: Generalized Linear Models-II
: Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More informationST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses
ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities
More informationLogistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression
Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024
More informationContrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:
Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.
More informationOverdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion
Biostokastikum Overdispersion is not uncommon in practice. In fact, some would maintain that overdispersion is the norm in practice and nominal dispersion the exception McCullagh and Nelder (1989) Overdispersion
More informationChapter 5: Logistic Regression-I
: Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More informationStatistical Inference & Model Checking for Poisson Regression
Statistical Inference & Model Checking for Poisson Regression Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline Inference
More informationChapter 4: Generalized Linear Models-I
: Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationChapter 14 Logistic and Poisson Regressions
STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y =
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationYou can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.
The GENMOD Procedure MODEL Statement MODEL response = < effects > < /options > ; MODEL events/trials = < effects > < /options > ; You can specify the response in the form of a single variable or in the
More informationCohen s s Kappa and Log-linear Models
Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance
More informationA CASE STUDY IN HANDLING OVER-DISPERSION IN NEMATODE COUNT DATA SCOTT EDWIN DOUGLAS KREIDER. B.S., The College of William & Mary, 2008 A THESIS
A CASE STUDY IN HANDLING OVER-DISPERSION IN NEMATODE COUNT DATA by SCOTT EDWIN DOUGLAS KREIDER B.S., The College of William & Mary, 2008 A THESIS submitted in partial fulfillment of the requirements for
More informationMultinomial Logistic Regression Models
Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationSTAT 525 Fall Final exam. Tuesday December 14, 2010
STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationReview. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationModel Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV
More informationReview: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:
Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic
More information(c) Interpret the estimated effect of temperature on the odds of thermal distress.
STA 4504/5503 Sample questions for exam 2 1. For the 23 space shuttle flights that occurred before the Challenger mission in 1986, Table 1 shows the temperature ( F) at the time of the flight and whether
More informationˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.
Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationWhere have all the puffins gone 1. An Analysis of Atlantic Puffin (Fratercula arctica) Banding Returns in Newfoundland
Where have all the puffins gone 1 An Analysis of Atlantic Puffin (Fratercula arctica) Banding Returns in Newfoundland Dave Fifield 1 With apologies to Pete Seeger. Introduction Seabirds are the most visible
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationGeneralized linear models
Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data
More informationFaculty of Health Sciences. Correlated data. Count variables. Lene Theil Skovgaard & Julie Lyng Forman. December 6, 2016
Faculty of Health Sciences Correlated data Count variables Lene Theil Skovgaard & Julie Lyng Forman December 6, 2016 1 / 76 Modeling count outcomes Outline The Poisson distribution for counts Poisson models,
More informationPoisson Data. Handout #4
Poisson Data The other response variable of interest records the number of blue spots observed after incubation. This type of data, i.e. count data, is often skewed showing numerous small values with occasional
More informationLogistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S
Logistic regression analysis Birthe Lykke Thomsen H. Lundbeck A/S 1 Response with only two categories Example Odds ratio and risk ratio Quantitative explanatory variable More than one variable Logistic
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationModel Estimation Example
Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationSTA6938-Logistic Regression Model
Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of
More informationAn Introduction to the Analysis of Rare Events
PharmaSUG - Paper AA- An Introduction to the Analysis of Rare Events Nate Derby, Stakana Analytics, Seattle, WA ABSTRACT Analyzing rare events like disease incidents, natural disasters, or component failures
More informationWrap-up. The General Linear Model is a special case of the Generalized Linear Model. Consequently, we can carry out any GLM as a GzLM.
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Analysis of Continuous Data ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV (Ch13,
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationRegression Models for Risks(Proportions) and Rates. Proportions. E.g. [Changes in] Sex Ratio: Canadian Births in last 60 years
Regression Models for Risks(Proportions) and Rates Proportions E.g. [Changes in] Sex Ratio: Canadian Births in last 60 years Parameter of Interest: (male)... just above 0.5 or (male)/ (female)... "ODDS"...
More informationST3241 Categorical Data Analysis I Logistic Regression. An Introduction and Some Examples
ST3241 Categorical Data Analysis I Logistic Regression An Introduction and Some Examples 1 Business Applications Example Applications The probability that a subject pays a bill on time may use predictors
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationMultiple Logistic Regression for Dichotomous Response Variables
Multiple Logistic Regression for Dichotomous Response Variables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline
More informationStatistical View of Least Squares
May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationCOMPLEMENTARY LOG-LOG MODEL
COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α
More informationBMI 541/699 Lecture 22
BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based
More informationThe material for categorical data follows Agresti closely.
Exam 2 is Wednesday March 8 4 sheets of notes The material for categorical data follows Agresti closely A categorical variable is one for which the measurement scale consists of a set of categories Categorical
More informationLogistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy
Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided
More informationCategorical data analysis Chapter 5
Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More informationHomework 1 Solutions
36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports
More informationCount data page 1. Count data. 1. Estimating, testing proportions
Count data page 1 Count data 1. Estimating, testing proportions 100 seeds, 45 germinate. We estimate probability p that a plant will germinate to be 0.45 for this population. Is a 50% germination rate
More informationBIOS 625 Fall 2015 Homework Set 3 Solutions
BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationGeneralized Linear Models for Count, Skewed, and If and How Much Outcomes
Generalized Linear Models for Count, Skewed, and If and How Much Outcomes Today s Class: Review of 3 parts of a generalized model Models for discrete count or continuous skewed outcomes Models for two-part
More informationModeling Overdispersion
James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in
More informationStat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010
1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationMohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago
Mohammed Research in Pharmacoepidemiology (RIPE) @ National School of Pharmacy, University of Otago What is zero inflation? Suppose you want to study hippos and the effect of habitat variables on their
More informationSubject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study
Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study 1.4 0.0-6 7 8 9 10 11 12 13 14 15 16 17 18 19 age Model 1: A simple broken stick model with knot at 14 fit with
More informationST505/S697R: Fall Homework 2 Solution.
ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)
More information22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then
More informationExplanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine.
Horseshoe crab example: There are 173 female crabs for which we wish to model the presence or absence of male satellites dependant upon characteristics of the female horseshoe crabs. 1 satellite present
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationAMS 7 Correlation and Regression Lecture 8
AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation
More informationConfirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes
Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes Li wan Chen, LENDIS Corporation, McLean, VA Forrest Council, Highway Safety Research Center,
More informationA strategy for modelling count data which may have extra zeros
A strategy for modelling count data which may have extra zeros Alan Welsh Centre for Mathematics and its Applications Australian National University The Data Response is the number of Leadbeater s possum
More informationThe GENMOD Procedure (Book Excerpt)
SAS/STAT 9.22 User s Guide The GENMOD Procedure (Book Excerpt) SAS Documentation This document is an individual chapter from SAS/STAT 9.22 User s Guide. The correct bibliographic citation for the complete
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,
More informationGeneralized Additive Models
Generalized Additive Models The Model The GLM is: g( µ) = ß 0 + ß 1 x 1 + ß 2 x 2 +... + ß k x k The generalization to the GAM is: g(µ) = ß 0 + f 1 (x 1 ) + f 2 (x 2 ) +... + f k (x k ) where the functions
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationPoisson Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Poisson Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Poisson Regression 1 / 49 Poisson Regression 1 Introduction
More informationSAS/STAT 14.2 User s Guide. The GENMOD Procedure
SAS/STAT 14.2 User s Guide The GENMOD Procedure This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute Inc.
More informationLab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )
Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationMath 3330: Solution to midterm Exam
Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the
More informationGeneralized Models: Part 1
Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More informationSTA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6
STA 8 Applied Linear Models: Regression Analysis Spring 011 Solution for Homework #6 6. a) = 11 1 31 41 51 1 3 4 5 11 1 31 41 51 β = β1 β β 3 b) = 1 1 1 1 1 11 1 31 41 51 1 3 4 5 β = β 0 β1 β 6.15 a) Stem-and-leaf
More informationGood Confidence Intervals for Categorical Data Analyses. Alan Agresti
Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline
More informationSections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal
More informationGeneralized Linear Models
Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n
More informationThe GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next
Book Contents Previous Next SAS/STAT User's Guide Overview Getting Started Syntax Details Examples References Book Contents Previous Next Top http://v8doc.sas.com/sashtml/stat/chap29/index.htm29/10/2004
More informationChapter 1. Modeling Basics
Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical
More informationMixed models in R using the lme4 package Part 5: Generalized linear mixed models
Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed
More information