A course in statistical modelling. session 06b: Modelling count data
|
|
- Alexander Barber
- 5 years ago
- Views:
Transcription
1 A Course in Statistical Modelling University of Glasgow 29 and 30 January, 2015 session 06b: Modelling count data Graeme Hutcheson 1 Luiz Moutinho 2 1 Manchester Institute of Education Manchester university 2 Adam Smith Business School Glasgow University Count data is common in the social sciences and is often the variable that is being modelled (the response). Examples of count variables include the number of children in a family, number of credit cards, number of applications, number of parking tickets or the number of excluded pupils. It is important to note that counts are different to continuous data (for example, they can t assume negative values) and need to be analysed using a different technique. The technique we use to analyse count data is Poisson regression, which is a GLM with a log link (the random and systematic components of the model are linked with a log function - giving a log-linear model). In other words, it is the log of the response variable that is linearly related to the explanatory variables.
2 To illustrate the need for the Poisson regression, the following data are analysed using OLS regression (for continuous data) and compared to an analysis using Poisson regression (for count data). It should be obvious from the output which of these models is more appropriate for our data. count data set The example we are going to use here is a simple made up one investigating the relationship between the number of children (0 to 4) and the salary earned by female employees in a particular company. Our model is... children salary which also shows how the dataset needs to be structured... number of children salary ( 1000) The dataset poissonexample.csv is available from the course web-site.
3 An OLS regression model of number of children... children OLS regression The OLS regression model shows a significant negative relationship between children and salary. This model is not particularly accurate, however, as the predicted number of children assume negative values (for salries above 125); something that is not possible salary OLS regression is NOT a good technique to use to model count data... A Poisson regression model of number of children... Poisson regression: default plot children The top graphic shows the default effect plot (Y is on a log scale) and shows clearly that the Poisson model is linear. Of particular note is the non linear scaling of the Y-axis. children salary Poisson regression: rescale.axis salary The lower graphic shows the effect plot drawn with the addition of the command rescale.axis=false and clearly shows that the predicted number of children is non-linear over the salary range and never goes below zero. The Poisson model is a more accurate model of count data than the OLS model...
4 The example above demonstrated that count data behave differently to continuous data making an OLS regression model inappropriate (for the same reason, t-tests and ANOVAs computed on count data are also inappropriate). This is particularly obvious when predictions of counts assume negative values (interpreting effect displays are useful for demonstrating this). During this course we will model count data using Poisson regression, which is simply a GLM model with a log link. A full description of the model and examples are provided below... Poisson regression: an example analysis The following example of Poisson regression uses an actual dataset (Arrests) that is available as part of the effects library. These data give information about the number of police data bases a person appears on (checks), their colour, age and sex, the year in which they were arrested and whether they are currently in employment. Load these data using the Rcmdr menu options...
5 Recoding year as categorical... NOTE: this dataset codes the variable year as continuous; a variable it is probably best to consider as categorical. Recode this variable into a categorical variable (yearcat) using the Rcmdr menus... Defining the model... Previous research indicates that a model of interest is... checks sex*yearcat + colour*yearcat + citizen*yearcat + age We are particularly interested in changes in sex, colour and citizen over the years... The following analyses show a Poisson regression model run using the Rcmdr. The task here is to interpret the effect displays... these should give you a clear picture of the relationships in the data and should agree with the results from the standard output...
6 Running a poisson regression model... age effect plot sex*yearcat effect plot checks checks sex : Female sex : Male age yearcat yearcat*colour effect plot yearcat*citizen effect plot colour : Black colour : White 2.0 citizen : No citizen : Yes checks checks yearcat yearcat
7 Estimate Std. Error z value Pr(> z ) (Intercept) ** sex[t.male] e-06 *** yearcat[t.1998] yearcat[t.1999] ** yearcat[t.2000] ** yearcat[t.2001] * yearcat[t.2002] colour[t.white] ** citizen[t.yes] age < 2e-16 *** sex[t.male]:yearcat[t.1998] * sex[t.male]:yearcat[t.1999] *** sex[t.male]:yearcat[t.2000] *** sex[t.male]:yearcat[t.2001] ** sex[t.male]:yearcat[t.2002] yearcat[t.1998]:colour[t.white] yearcat[t.1999]:colour[t.white] ** yearcat[t.2000]:colour[t.white] yearcat[t.2001]:colour[t.white] yearcat[t.2002]:colour[t.white] yearcat[t.1998]:citizen[t.yes] yearcat[t.1999]:citizen[t.yes] yearcat[t.2000]:citizen[t.yes] yearcat[t.2001]:citizen[t.yes] yearcat[t.2002]:citizen[t.yes] LR Chisq Df Pr(>Chisq) sex < 2.2e-16 *** yearcat *** colour < 2.2e-16 *** citizen *** age < 2.2e-16 *** sex:yearcat ** yearcat:colour yearcat:citizen
8 The effect displays and the standard output provide similar impressions of the results. The effect displays are, however, much easier to interpret, particularly in the presence of interaction terms. The effect displays give the same information as the regression parameters. The regression estimate shows that for each unit increase in age, the log of the number of checks increases by This can be verified from the effect plot as the number of checks increases from 1.55 to 2.60 when age increases by 40. A unit increase is therefore (log(2.6) - log(1.55))/40 = The conclusions about significance are similar for both reporting methods, although the effect displays do indicate that the year 1997 may be responsible for most, if not all, of the significance (a result easy to miss from the standard output). The effect displays give more detailed information than the standard output and do not require any mathematical manipulation of parameters. They are easier to understand and more informative. The standard output is mostly useful to verify and quantify certain aspects of the model. It is not required for UNDERSTANDING the model.
9 The analysis of contingency tables... The Poisson regression models are particularly useful as they allow the analysis of contingency table data... Consider the following contingency table, taken from... Hutcheson, G. D. and Schaefer, L. (2012). Test selection in the 21st century. Journal of Modelling in Management, 7,3: Group A B C North Region South West 1 2 1
10 The information we have here is cell count - a count variable. In order to investigate the relationship between region and group, we need to look at the interaction model (to see if region influences group). The model for this is... cell count region*group Which suggests we need three columns of data; one for cell count, one for region and one for group... frequency group region 2 A north 2 A south 1 A west 1 B north 2 B south 2 B west 3 C north 1 C south 1 C west
11 Poisson model of contingency table: effect display group*region effect plot region : C count region : A region : B 4 2 A B C group Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) group[t.b] group[t.c] region[t.south] region[t.west] group[t.b]:region[t.south] group[t.c]:region[t.south] group[t.b]:region[t.west] group[t.c]:region[t.west] Analysis of Deviance Table (Type II tests) Response: count LR Chisq Df Pr(>Chisq) group region group:region
12 This model shows no significant interactions, which is hardly surprising given the small sample size involved. What is particularly interesting, however, are the statistics for the group:region interaction, which are exactly the same as the standard contingency table chi-square test. It is also interesting to note that this analysis gives the same results as a multinomial regression predicting one of the variables (region or group). The model for this is... region group Which suggests a different data structure consisting of just two groups. The structure of the data can easily be changed using the R command... CONTINGENCYtable01LONG <- as.data.frame(lapply(contingencytable01, function(x) rep(x, CONTINGENCYtable01$count))) EXERCISES...
13 HairEyeColor... Load the HairEyeColor contingency table from the datasets library... Plot the effect displays for the three way interaction hair*eye*sex What does this suggest to you? Does it agree with the tabular output of parameter values and significance estimates? rescale the axis for the effect display... What does this suggest to you? Does it agree with the tabular output of parameter values and significance estimates? You may construct an animation - going through all hair colours or genders using the given.values = c(sexmale = 1) command... Compare Poisson analysis to multinomial A useful exercise, if you have tim,e, is to compare the Poisson regression analyses with a multinomial logit model. First, you will need to transform the HairEyeColor contingency table into a long-format data frame... HairEyeColorLong <- as.data.frame(lapply(haireyecolor, function(x) rep(x, HairEyeColor$Freq))) You should get the same significance values for the models...
A course in statistical modelling. session 09: Modelling count variables
A Course in Statistical Modelling SEED PGR methodology training December 08, 2015: 12 2pm session 09: Modelling count variables Graeme.Hutcheson@manchester.ac.uk blackboard: RSCH80000 SEED PGR Research
More informationGeneralized Linear Models
Generalized Linear Models Methods@Manchester Summer School Manchester University July 2 6, 2018 Generalized Linear Models: a generic approach to statistical modelling www.research-training.net/manchester2018
More informationGLM models and OLS regression
GLM models and OLS regression Graeme Hutcheson, University of Manchester These lecture notes are based on material published in... Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist:
More informationStatistical Models for Management. Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon. February 24 26, 2010
Statistical Models for Management Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon February 24 26, 2010 Graeme Hutcheson, University of Manchester GLM models and OLS regression The
More informationLog-linear Models for Contingency Tables
Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A
More informationGeneralised linear models. Response variable can take a number of different formats
Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion
More informationAnalysing categorical data using logit models
Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester
More informationCohen s s Kappa and Log-linear Models
Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance
More informationTwo Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00
Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section
More informationBivariate data analysis
Bivariate data analysis Categorical data - creating data set Upload the following data set to R Commander sex female male male male male female female male female female eye black black blue green green
More informationQ30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only
Moyale Observed counts 12:28 Thursday, December 01, 2011 1 The FREQ Procedure Table 1 of by Controlling for site=moyale Row Pct Improved (1+2) Same () Worsened (4+5) Group only 16 51.61 1.2 14 45.16 1
More informationBMI 541/699 Lecture 22
BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More informationOverdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion
Biostokastikum Overdispersion is not uncommon in practice. In fact, some would maintain that overdispersion is the norm in practice and nominal dispersion the exception McCullagh and Nelder (1989) Overdispersion
More informationReview of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models
Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses
More informationGeneralized linear models
Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More informationReview: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:
Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationMultiple Regression: Chapter 13. July 24, 2015
Multiple Regression: Chapter 13 July 24, 2015 Multiple Regression (MR) Response Variable: Y - only one response variable (quantitative) Several Predictor Variables: X 1, X 2, X 3,..., X p (p = # predictors)
More informationPaper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD
Paper: ST-161 Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop Institute @ UMBC, Baltimore, MD ABSTRACT SAS has many tools that can be used for data analysis. From Freqs
More informationNormal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,
Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability
More informationFREC 608 Guided Exercise 9
FREC 608 Guided Eercise 9 Problem. Model of Average Annual Precipitation An article in Geography (July 980) used regression to predict average annual rainfall levels in California. Data on the following
More informationLet s see if we can predict whether a student returns or does not return to St. Ambrose for their second year.
Assignment #13: GLM Scenario: Over the past few years, our first-to-second year retention rate has ranged from 77-80%. In other words, 77-80% of our first-year students come back to St. Ambrose for their
More informationModel Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV
More informationMULTINOMIAL LOGISTIC REGRESSION
MULTINOMIAL LOGISTIC REGRESSION Model graphically: Variable Y is a dependent variable, variables X, Z, W are called regressors. Multinomial logistic regression is a generalization of the binary logistic
More informationInvestigating Models with Two or Three Categories
Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might
More informationSection 5: Dummy Variables and Interactions
Section 5: Dummy Variables and Interactions Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Example: Detecting
More informationLecture Notes 12 Advanced Topics Econ 20150, Principles of Statistics Kevin R Foster, CCNY Spring 2012
Lecture Notes 2 Advanced Topics Econ 2050, Principles of Statistics Kevin R Foster, CCNY Spring 202 Endogenous Independent Variables are Invalid Need to have X causing Y not vice-versa or both! NEVER regress
More information13.1 Categorical Data and the Multinomial Experiment
Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationx3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators
Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.
More informationWrap-up. The General Linear Model is a special case of the Generalized Linear Model. Consequently, we can carry out any GLM as a GzLM.
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Analysis of Continuous Data ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV (Ch13,
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More informationOne-Way ANOVA. Some examples of when ANOVA would be appropriate include:
One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationContingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878
Contingency Tables I. Definition & Examples. A) Contingency tables are tables where we are looking at two (or more - but we won t cover three or more way tables, it s way too complicated) factors, each
More information7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis
Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression
More informationExample: Poisondata. 22s:152 Applied Linear Regression. Chapter 8: ANOVA
s:5 Applied Linear Regression Chapter 8: ANOVA Two-way ANOVA Used to compare populations means when the populations are classified by two factors (or categorical variables) For example sex and occupation
More informationLogistic Regression - problem 6.14
Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values
More informationStat 8053, Fall 2013: Multinomial Logistic Models
Stat 8053, Fall 2013: Multinomial Logistic Models Here is the example on page 269 of Agresti on food preference of alligators: s is size class, g is sex of the alligator, l is name of the lake, and f is
More informationRegression with Qualitative Information. Part VI. Regression with Qualitative Information
Part VI Regression with Qualitative Information As of Oct 17, 2017 1 Regression with Qualitative Information Single Dummy Independent Variable Multiple Categories Ordinal Information Interaction Involving
More informationLogistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy
Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided
More informationUnit 5 Logistic Regression Practice Problems
Unit 5 Logistic Regression Practice Problems SOLUTIONS R Users Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004. Exercises
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationA Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks
More informationcor(dataset$measurement1, dataset$measurement2, method= pearson ) cor.test(datavector1, datavector2, method= pearson )
Tutorial 7: Correlation and Regression Correlation Used to test whether two variables are linearly associated. A correlation coefficient (r) indicates the strength and direction of the association. A correlation
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationLogistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression
Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationIntroducing Generalized Linear Models: Logistic Regression
Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and
More informationAnalysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013
Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationNon-Gaussian Response Variables
Non-Gaussian Response Variables What is the Generalized Model Doing? The fixed effects are like the factors in a traditional analysis of variance or linear model The random effects are different A generalized
More informationAnnouncements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall)
Announcements: You can turn in homework until 6pm, slot on wall across from 2202 Bren. Make sure you use the correct slot! (Stats 8, closest to wall) We will cover Chs. 5 and 6 first, then 3 and 4. Mon,
More informationSPSS LAB FILE 1
SPSS LAB FILE www.mcdtu.wordpress.com 1 www.mcdtu.wordpress.com 2 www.mcdtu.wordpress.com 3 OBJECTIVE 1: Transporation of Data Set to SPSS Editor INPUTS: Files: group1.xlsx, group1.txt PROCEDURE FOLLOWED:
More informationValue Added Modeling
Value Added Modeling Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background for VAMs Recall from previous lectures
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationMcGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper
Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions
More informationCorrelation & Simple Regression
Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.
More informationCHAPTER 10. Regression and Correlation
CHAPTER 10 Regression and Correlation In this Chapter we assess the strength of the linear relationship between two continuous variables. If a significant linear relationship is found, the next step would
More informationy response variable x 1, x 2,, x k -- a set of explanatory variables
11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate
More informationFrequency Distribution Cross-Tabulation
Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables
ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53
More informationMultinomial Logistic Regression Models
Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word
More informationRegression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.
Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression
More informationTables and Figures. This draft, July 2, 2007
and Figures This draft, July 2, 2007 1 / 16 Figures 2 / 16 Figure 1: Density of Estimated Propensity Score Pr(D=1) % 50 40 Treated Group Untreated Group 30 f (P) 20 10 0.01~.10.11~.20.21~.30.31~.40.41~.50.51~.60.61~.70.71~.80.81~.90.91~.99
More informationLecture 24: Partial correlation, multiple regression, and correlation
Lecture 24: Partial correlation, multiple regression, and correlation Ernesto F. L. Amaral November 21, 2017 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015. Statistics: A
More information5:1LEC - BETWEEN-S FACTORIAL ANOVA
5:1LEC - BETWEEN-S FACTORIAL ANOVA The single-factor Between-S design described in previous classes is only appropriate when there is just one independent variable or factor in the study. Often, however,
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationMixed effects models
Mixed effects models The basic theory and application in R Mitchel van Loon Research Paper Business Analytics Mixed effects models The basic theory and application in R Author: Mitchel van Loon Research
More information22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)
22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are
More informationTopic 28: Unequal Replication in Two-Way ANOVA
Topic 28: Unequal Replication in Two-Way ANOVA Outline Two-way ANOVA with unequal numbers of observations in the cells Data and model Regression approach Parameter estimates Previous analyses with constant
More informationExplanatory Variables Must be Linear Independent...
Explanatory Variables Must be Linear Independent... Recall the multiple linear regression model Y j = β 0 + β 1 X 1j + β 2 X 2j + + β p X pj + ε j, i = 1,, n. is a shorthand for n linear relationships
More informationST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses
ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities
More information11 CHI-SQUARED Introduction. Objectives. How random are your numbers? After studying this chapter you should
11 CHI-SQUARED Chapter 11 Chi-squared Objectives After studying this chapter you should be able to use the χ 2 distribution to test if a set of observations fits an appropriate model; know how to calculate
More informationReview of Multiple Regression
Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate
More informationMultiple Regression. Peerapat Wongchaiwat, Ph.D.
Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com The Multiple Regression Model Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (X i ) Multiple Regression Model
More informationTento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/
Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.
More informationy i s 2 X 1 n i 1 1. Show that the least squares estimators can be written as n xx i x i 1 ns 2 X i 1 n ` px xqx i x i 1 pδ ij 1 n px i xq x j x
Question 1 Suppose that we have data Let x 1 n x i px 1, y 1 q,..., px n, y n q. ȳ 1 n y i s 2 X 1 n px i xq 2 Throughout this question, we assume that the simple linear model is correct. We also assume
More informationHomework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.
EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests
More informationInterpreting and using heterogeneous choice & generalized ordered logit models
Interpreting and using heterogeneous choice & generalized ordered logit models Richard Williams Department of Sociology University of Notre Dame July 2006 http://www.nd.edu/~rwilliam/ The gologit/gologit2
More informationLab # 11: Correlation and Model Fitting
Lab # 11: Correlation and Model Fitting Objectives: 1. Correlations between variables 2. Data Manipulation, creation of squares 3. Model fitting with regression 4. Comparison of models Correlations between
More informationPoisson Data. Handout #4
Poisson Data The other response variable of interest records the number of blue spots observed after incubation. This type of data, i.e. count data, is often skewed showing numerous small values with occasional
More informationAnswer to exercise: Blood pressure lowering drugs
Answer to exercise: Blood pressure lowering drugs The data set bloodpressure.txt contains data from a cross-over trial, involving three different formulations of a drug for lowering of blood pressure:
More informationCorrelation and Regression (Excel 2007)
Correlation and Regression (Excel 2007) (See Also Scatterplots, Regression Lines, and Time Series Charts With Excel 2007 for instructions on making a scatterplot of the data and an alternate method of
More information7 Ordered Categorical Response Models
7 Ordered Categorical Response Models In several chapters so far we have addressed modeling a binary response variable, for instance gingival bleeding upon periodontal probing, which has only two expressions,
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More information(Where does Ch. 7 on comparing 2 means or 2 proportions fit into this?)
12. Comparing Groups: Analysis of Variance (ANOVA) Methods Response y Explanatory x var s Method Categorical Categorical Contingency tables (Ch. 8) (chi-squared, etc.) Quantitative Quantitative Regression
More informationSociology 362 Data Exercise 6 Logistic Regression 2
Sociology 362 Data Exercise 6 Logistic Regression 2 The questions below refer to the data and output beginning on the next page. Although the raw data are given there, you do not have to do any Stata runs
More informationAn Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012
An Introduction to Multilevel Models PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 Today s Class Concepts in Longitudinal Modeling Between-Person vs. +Within-Person
More informationStatistical methods for Education Economics
Statistical methods for Education Economics Massimiliano Bratti http://www.economia.unimi.it/bratti Course of Education Economics Faculty of Political Sciences, University of Milan Academic Year 2007-08
More informationOrdinal Variables in 2 way Tables
Ordinal Variables in 2 way Tables Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 C.J. Anderson (Illinois) Ordinal Variables
More informationA SHORT INTRODUCTION TO PROBABILITY
A Lecture for B.Sc. 2 nd Semester, Statistics (General) A SHORT INTRODUCTION TO PROBABILITY By Dr. Ajit Goswami Dept. of Statistics MDKG College, Dibrugarh 19-Apr-18 1 Terminology The possible outcomes
More informationMath 138 Summer Section 412- Unit Test 1 Green Form, page 1 of 7
Math 138 Summer 1 2013 Section 412- Unit Test 1 Green Form page 1 of 7 1. Multiple Choice. Please circle your answer. Each question is worth 3 points. (a) Social Security Numbers are illustrations of which
More information