A Course in Statistical Modelling. Session 09: Modelling count variables
- Alan Wade
A Course in Statistical Modelling
SEED PGR methodology training
December 08, 2015: 12-2pm
session 09: Modelling count variables
blackboard: RSCH80000 SEED PGR Research Modules
internet: Manchester Institute of Education, University of Manchester

Count data are common in the social sciences and are often the variable that is being modelled (the response). Examples of count variables include the number of children in a family, number of credit cards, number of applications, number of parking tickets or the number of excluded pupils. It is important to note that counts are different to continuous data (for example, they are whole numbers and can't assume negative values) and need to be analysed using a different technique. The technique we use to analyse count data is Poisson regression, which is a GLM with a log link (the random and systematic components of the model are linked with a log function, giving a log-linear model). In other words, it is the log of the expected value of the response (the expected count), not the log of the observed counts themselves, that is linearly related to the explanatory variables.
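The link between the log and the expected count can be seen in a minimal R sketch. It uses simulated data (the variables x and y, and the values 0.5 and 0.3 used to generate them, are illustrative assumptions, not course data) and fits a Poisson GLM with glm(..., family = poisson). The fitted counts are the exponential of the linear predictor, so they can never be negative.

```r
# Simulated (hypothetical) data: log(E[y]) = 0.5 + 0.3 * x
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- rpois(length(x), lambda = exp(0.5 + 0.3 * x))

# Poisson regression: Poisson random component, log link
fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)   # estimates on the log scale, should be roughly 0.5 and 0.3

# Linear predictor (log scale) and fitted counts (count scale)
eta <- predict(fit, type = "link")
mu  <- predict(fit, type = "response")

# The fitted counts are exp(linear predictor), so they are always positive
all.equal(unname(mu), unname(exp(eta)))
min(mu) > 0
```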
To illustrate the need for Poisson regression, the following data are analysed using OLS regression (for continuous data) and compared to an analysis using Poisson regression (for count data). It should be obvious from the output which of these models is more appropriate for our data.

count data set
The example we are going to use here is a simple made-up one investigating the relationship between the number of children (0 to 4) and the salary earned by female employees in a particular company. Our model is...
children ~ salary
which also shows how the dataset needs to be structured: one column for the number of children and one for salary (in thousands). The dataset poissonexample.csv is available from the course web-site.
Figure: scatterplot showing the relationship between number of children (children) and salary
The observed data suggest that those with 0 and 1 child tend to have higher salaries, although this is not an obvious relationship to depict. The green straight line is an OLS regression model of these data and the red curved line is a line of local best-fit. It is noticeable that the OLS regression line does not appear to be a particularly accurate fit (see the effect display below).

An OLS regression model of number of children...
Figure: effect display for the OLS regression model children ~ salary
The effect display of the OLS regression model suggests a significant negative relationship between children and salary. This model is not particularly accurate, however, as the predicted number of children becomes negative for salaries above 125, something that is not possible for a count. OLS regression is, therefore, NOT a good technique to use to model these data...
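The negative predictions can be reproduced with a short R sketch. The ten observations below are hypothetical values chosen to mimic the pattern described (they are not the contents of poissonexample.csv); the column names children and salary follow the model children ~ salary.

```r
# Hypothetical data: salary in thousands, number of children 0-4
children_data <- data.frame(
  salary   = c(20, 30, 40, 50, 60, 70, 80, 90, 100, 110),
  children = c(4, 3, 3, 2, 2, 1, 1, 0, 1, 0)
)

# OLS regression treats the count as if it were continuous
ols <- lm(children ~ salary, data = children_data)
coef(ols)   # intercept about 4.418, slope about -0.0418

# Predicted number of children at three salaries
new_salaries <- data.frame(salary = c(60, 110, 150))
predict(ols, newdata = new_salaries)   # about 1.91, -0.18, -1.85
```

With these made-up values the OLS line crosses zero at a salary of about 106, so even the highest observed salary receives an impossible negative prediction.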
A Poisson regression model of number of children...
This model is essentially the same as the OLS regression model computed earlier, but with a Poisson random component and a log link.
Figure: Poisson regression effect displays for children ~ salary (top: default plot; bottom: drawn with rescale.axis = FALSE)
The top graphic shows the default effect plot (Y is depicted using a log scale) and shows clearly that the Poisson model is linear on the log scale. Of particular note is the non-linear scaling of the Y-axis. The lower graphic shows the effect plot drawn with the addition of the argument rescale.axis = FALSE, which depicts Y as a count. This graphic clearly shows that the predicted number of children is non-linear over the salary range and never goes below zero. The Poisson model is a more accurate model of these data than the OLS model and more closely fits the line of local best-fit shown previously.
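A matching R sketch fits the Poisson model to hypothetical data (the same illustrative values as an OLS illustration would use, not poissonexample.csv). Predictions at equally spaced salaries differ by a constant amount on the log (link) scale and by a constant ratio on the count scale, which is why the default effect plot is a straight line while the rescaled one curves towards zero.

```r
# Hypothetical data: salary in thousands, number of children 0-4
children_data <- data.frame(
  salary   = c(20, 30, 40, 50, 60, 70, 80, 90, 100, 110),
  children = c(4, 3, 3, 2, 2, 1, 1, 0, 1, 0)
)

# Poisson random component with a log link
pois <- glm(children ~ salary, family = poisson, data = children_data)
b <- coef(pois)[["salary"]]

# Equally spaced salaries, including values beyond the observed range
new_salaries <- data.frame(salary = c(50, 100, 150, 200))

# Log scale: predictions fall by the same amount (50 * b) at each step
link_pred <- predict(pois, newdata = new_salaries, type = "link")
diff(link_pred)

# Count scale: predictions shrink by the same factor exp(50 * b), never below zero
count_pred <- predict(pois, newdata = new_salaries, type = "response")
count_pred
exp(50 * b)
```

If the effects package is installed, plot(allEffects(pois)) should draw the corresponding effect display.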
The example above demonstrated that count data behave differently to continuous data, making an OLS regression model inappropriate (for the same reason, t-tests and ANOVAs computed on count data are also inappropriate). This is particularly obvious when predictions of counts assume negative values (effect displays are useful for demonstrating this). During this course we will model count data using Poisson regression, which is simply a GLM with a Poisson random component and a log link. A full description of the model and examples are provided below...

Poisson regression: an example analysis
The following example of Poisson regression uses an actual dataset (Arrests) that is available as part of the effects library. These data give information about the number of police databases a person appears on (checks), their colour, age and sex, the year in which they were arrested, whether they are currently in employment and whether they are citizens. Load these data using the Rcmdr menu options...
Recoding year as categorical...
NOTE: this dataset codes the variable year as numeric (continuous), but it is probably best considered as categorical. Recode this variable into a categorical variable (yearcat) using the Rcmdr menus...

Defining the model...
Previous research indicates that a model of interest is...
checks ~ sex*yearcat + colour*yearcat + citizen*yearcat + age
We are particularly interested in how the effects of sex, colour and citizen change over the years... The following analyses show a Poisson regression model run using the Rcmdr. The task here is to interpret the effect displays... these should give you a clear picture of the relationships in the data and should agree with the results from the standard output...
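The same recode and model can be written as a few lines of R, sketched here as an alternative to the Rcmdr menus. It assumes the carData package is installed (in recent versions of effects the Arrests data are supplied by carData). With R's default contrasts the coefficients are labelled sexMale, colourWhite and so on, rather than the bracketed labels in the Rcmdr output.

```r
# Arrests data: supplied by the carData package in recent versions of effects
data("Arrests", package = "carData")

# year is stored as a number (1997-2002); recode it as a categorical variable
Arrests$yearcat <- factor(Arrests$year)

# Poisson regression for the number of police databases (checks)
arrests_model <- glm(checks ~ sex*yearcat + colour*yearcat + citizen*yearcat + age,
                     family = poisson, data = Arrests)
summary(arrests_model)
```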
Running a Poisson regression model...
Poisson regression model: standard output (TYPE III tests); each coefficient is reported with Estimate, Std. Error, z value and Pr(>|z|), and the significance of each term is:
(Intercept) **
sex[t.male] p < 1e-05 ***
yearcat[t.1998]
yearcat[t.1999] **
yearcat[t.2000] **
yearcat[t.2001] *
yearcat[t.2002]
colour[t.white] **
citizen[t.yes]
age p < 2e-16 ***
sex[t.male]:yearcat[t.1998] *
sex[t.male]:yearcat[t.1999] ***
sex[t.male]:yearcat[t.2000] ***
sex[t.male]:yearcat[t.2001] **
sex[t.male]:yearcat[t.2002]
yearcat[t.1998]:colour[t.white]
yearcat[t.1999]:colour[t.white] **
yearcat[t.2000]:colour[t.white]
yearcat[t.2001]:colour[t.white]
yearcat[t.2002]:colour[t.white]
yearcat[t.1998]:citizen[t.yes]
yearcat[t.1999]:citizen[t.yes]
yearcat[t.2000]:citizen[t.yes]
yearcat[t.2001]:citizen[t.yes]
yearcat[t.2002]:citizen[t.yes]
Poisson regression model: Analysis of Deviance table (TYPE II tests); each term is reported with LR Chisq, Df and Pr(>Chisq), and the significance of each term is:
sex p < 2.2e-16 ***
yearcat ***
colour p < 2.2e-16 ***
citizen ***
age p < 2.2e-16 ***
sex:yearcat **
yearcat:colour
yearcat:citizen
Note the big differences between the significance values for some of the parameters provided by the TYPE II and TYPE III tests (for example, the main effect term for the variable colour). This is because, with the yearcat:colour interaction in the model, the colour coefficient in the standard output only tests the colour difference in the reference year (1997), whereas the TYPE II test assesses colour without the interactions that contain it.
Figure: effect plots (Y-axis: checks) for age, sex*yearcat (panels sex: Female, sex: Male), yearcat*colour (panels colour: Black, colour: White) and yearcat*citizen (panels citizen: No, citizen: Yes)
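Both kinds of test can be requested in R. This sketch assumes the car package (whose Anova() function produces Type II tables like the one above) and carData are installed; car::Anova() gives likelihood-ratio Type II tests for a glm by default, while summary() gives the coefficient-by-coefficient z tests shown as the standard output.

```r
data("Arrests", package = "carData")
Arrests$yearcat <- factor(Arrests$year)
arrests_model <- glm(checks ~ sex*yearcat + colour*yearcat + citizen*yearcat + age,
                     family = poisson, data = Arrests)

# Type II likelihood-ratio tests (one row per model term)
type2 <- car::Anova(arrests_model, type = "II")
type2

# The colour coefficient from summary(): the colour effect in the reference year (1997)
coefs <- summary(arrests_model)$coefficients
coefs["colourWhite", ]

# Side by side: p-value for colour under each approach
c(type_II = type2["colour", "Pr(>Chisq)"],
  coefficient_z = coefs["colourWhite", "Pr(>|z|)"])
```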
The effect displays and the standard output provide similar impressions of the results. The effect displays are, however, much easier to interpret, particularly in the presence of interaction terms. The effect displays give the same information as the regression parameters. For example, the regression estimate for age gives the increase in the log of the expected number of checks for each unit (one year) increase in age. This can be verified from the effect plot, as the number of checks increases from 1.55 to 2.60 when age increases by 40. A unit increase is therefore...
(log(2.6) - log(1.55))/40 ≈ 0.0129
The conclusions about significance are similar for both reporting methods, although the effect displays do indicate that the year 1997 may be responsible for most, if not all, of the significance (a result that is easy to miss from the standard output). The effect displays give more detailed information than the standard output and do not require any mathematical manipulation of parameters. They are easier to understand and more informative. The standard output is mostly useful to verify and quantify certain aspects of the model. It is not required for UNDERSTANDING the model.
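The arithmetic can be checked in R. The values 1.55, 2.60 and 40 are read from the age effect plot, as in the text; the last step shows the same slope as a multiplicative change in the expected count.

```r
# Values read from the age effect plot
checks_at_start <- 1.55
checks_at_end   <- 2.60
age_change      <- 40

# Slope on the log scale: change in log(expected checks) per year of age
b_age <- (log(checks_at_end) - log(checks_at_start)) / age_change
b_age        # about 0.0129

# On the count scale the slope is multiplicative
exp(b_age)   # about 1.013, i.e. roughly 1.3% more checks per extra year of age
```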
The analysis of contingency tables...
The Poisson regression models are particularly useful as they allow the analysis of contingency table data... Consider the following contingency table, which shows the group someone belongs to and the region in which they live, taken from...
Hutcheson, G. D. and Schaefer, L. (2012). Test selection in the 21st century. Journal of Modelling in Management, 7(3).

              Group
Region      A   B   C
North       2   1   3
South       2   2   1
West        1   2   1
The information we have here is cell count, a count variable. In order to investigate the relationship between region and group, we need to look at the interaction model (to see whether group membership depends on region). The model for this is...
cell count ~ region*group
which suggests we need three columns of data: one for cell count, one for region and one for group. The model tells us how to structure the data.

count  group  region
2      A      north
2      A      south
1      A      west
1      B      north
2      B      south
2      B      west
3      C      north
1      C      south
1      C      west
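The three-column structure can be typed straight into R. This sketch builds the data frame under the name CONTINGENCYtable01 used later in these notes and checks that cross-tabulating the counts with xtabs() gives back the contingency table.

```r
# One row per cell of the table: count, group, region
CONTINGENCYtable01 <- data.frame(
  count  = c(2, 2, 1, 1, 2, 2, 3, 1, 1),
  group  = factor(rep(c("A", "B", "C"), each = 3)),
  region = factor(rep(c("north", "south", "west"), times = 3))
)
CONTINGENCYtable01

# Cross-tabulating the counts recovers the 3 x 3 region-by-group table
tab <- xtabs(count ~ region + group, data = CONTINGENCYtable01)
tab
```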
Poisson model of contingency table
Coefficients (Estimate, Std. Error, z value, Pr(>|z|)) are reported for: (Intercept), group[t.b], group[t.c], region[t.south], region[t.west], group[t.b]:region[t.south], group[t.c]:region[t.south], group[t.b]:region[t.west], group[t.c]:region[t.west]
Analysis of Deviance Table (Type II tests), Response: count
LR Chisq, Df and Pr(>Chisq) are reported for: group, region, group:region
Poisson model of contingency table: effect display
Figure: group*region effect plot (Y-axis: count; X-axis: group A, B, C; one panel for each region)
This model shows no significant interaction, which is hardly surprising given the small sample size involved. What is particularly interesting, however, are the statistics for the group:region interaction, which are exactly the same as those from the likelihood-ratio (G²) test provided as part of the standard chi-square output. This test is available as part of the Deducer library, which can be installed using the Packages tab in the lower-right pane of RStudio. The likelihood-ratio test uses the following command...
likelihood.test(group, region)
which suggests that the data need to be represented in two columns, one giving information about group and the other about region.
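The equivalence between the group:region test and G² can be checked in base R, as a sketch that does not need Deducer. The likelihood-ratio test for the interaction compares the saturated model (count ~ region*group) with the independence model (count ~ region + group); G² is then computed from the observed and expected counts for comparison.

```r
CONTINGENCYtable01 <- data.frame(
  count  = c(2, 2, 1, 1, 2, 2, 3, 1, 1),
  group  = factor(rep(c("A", "B", "C"), each = 3)),
  region = factor(rep(c("north", "south", "west"), times = 3))
)

# Saturated (interaction) model and independence (main-effects) model
saturated   <- glm(count ~ region * group, family = poisson, data = CONTINGENCYtable01)
independent <- glm(count ~ region + group, family = poisson, data = CONTINGENCYtable01)

# Likelihood-ratio test for the region:group interaction (df = 4)
anova(independent, saturated, test = "Chisq")

# The same statistic computed directly as G^2 = 2 * sum(O * log(O / E))
observed <- xtabs(count ~ region + group, data = CONTINGENCYtable01)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
G2 <- 2 * sum(observed * log(observed / expected))
c(G2 = G2, deviance_difference = deviance(independent) - deviance(saturated))
```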
Rearranging the data-frame
The data-frame used for the Poisson regression model above (the data frame that included cell count) can be rearranged to include just information about region and group, with one row per case. The original data-frame (CONTINGENCYtable01) can be rearranged in R using the command...
CONTINGENCYtable01LONG <- as.data.frame(lapply(CONTINGENCYtable01, function(x) rep(x, CONTINGENCYtable01$count)))
or can be downloaded from the course web-site in file CONTINGENCYtable01LONG.csv.

Computing the likelihood-ratio statistic
Load Deducer and run the likelihood test...
library(Deducer)
likelihood.test(CONTINGENCYtable01LONG$group, CONTINGENCYtable01LONG$region)
Log likelihood ratio (G-test) test of independence without correction
Log likelihood ratio statistic (G) = , X-squared df = 4, p-value =
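If Deducer is not available on your system, a base-R sketch gives the same G statistic from the long-format data. It uses the rearranging command above and then computes G = 2 Σ O log(O/E), with the expected counts taken from chisq.test(); chisq.test() warns that some expected counts are small, so that warning is suppressed here.

```r
CONTINGENCYtable01 <- data.frame(
  count  = c(2, 2, 1, 1, 2, 2, 3, 1, 1),
  group  = factor(rep(c("A", "B", "C"), each = 3)),
  region = factor(rep(c("north", "south", "west"), times = 3))
)

# One row per case: repeat every column 'count' times, then drop the count
CONTINGENCYtable01LONG <- as.data.frame(
  lapply(CONTINGENCYtable01, function(x) rep(x, CONTINGENCYtable01$count))
)
CONTINGENCYtable01LONG$count <- NULL
nrow(CONTINGENCYtable01LONG)   # 15 cases

# G-test of independence computed by hand
observed <- table(CONTINGENCYtable01LONG$region, CONTINGENCYtable01LONG$group)
expected <- suppressWarnings(chisq.test(observed)$expected)
G  <- 2 * sum(observed * log(observed / expected))
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p  <- pchisq(G, df, lower.tail = FALSE)
c(G = G, df = df, p = p)
```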
It is also interesting to note that this analysis gives the same results as a multinomial regression predicting one of the variables (region or group). The model for this is...
region ~ group
which suggests a different data structure consisting of just two columns. A multinomial model can be obtained using the following commands from the Rcmdr.
Screenshots: run the MNL model; get the analysis of deviance table
Results from the MNL model
Analysis of Deviance Table (Type II tests), Response: group
LR Chisq, Df and Pr(>Chisq) are reported for: region
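The multinomial result can be reproduced with multinom() from the nnet package (one of R's recommended packages). Rather than expanding to one row per case, this sketch fits region ~ group to the cell counts using the weights argument; the drop in deviance from adding group should equal the Poisson G² for the interaction, up to the convergence tolerance of multinom().

```r
library(nnet)

CONTINGENCYtable01 <- data.frame(
  count  = c(2, 2, 1, 1, 2, 2, 3, 1, 1),
  group  = factor(rep(c("A", "B", "C"), each = 3)),
  region = factor(rep(c("north", "south", "west"), times = 3))
)

# Multinomial logit models for region, each cell weighted by its count
mnl_null <- multinom(region ~ 1, data = CONTINGENCYtable01,
                     weights = count, trace = FALSE)
mnl_full <- multinom(region ~ group, data = CONTINGENCYtable01,
                     weights = count, trace = FALSE)

# Likelihood-ratio statistic for group (drop in deviance, df = 4)
lr_mnl <- deviance(mnl_null) - deviance(mnl_full)

# Poisson G^2 for the region:group interaction (the saturated deviance is zero)
pois_ind <- glm(count ~ region + group, family = poisson, data = CONTINGENCYtable01)
lr_pois <- deviance(pois_ind)

c(multinomial = lr_mnl, poisson = lr_pois)
pchisq(lr_mnl, df = 4, lower.tail = FALSE)
```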
EXERCISES...
HairEyeColor...
Load the HairEyeColor contingency table from the datasets library... Plot the effect displays for the three-way interaction hair*eye*sex. What does this suggest to you? Does it agree with the tabular output of parameter values and significance estimates? Rescale the axis for the effect display (rescale.axis = FALSE)... What does this suggest to you? Does it agree with the tabular output of parameter values and significance estimates? You may construct an animation, going through all hair colours or genders, using the given.values argument (for example, given.values = c(sexmale = 1))...
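As a starting point for the exercise, the sketch below converts HairEyeColor (a 4 x 4 x 2 table of Hair, Eye and Sex) to a data frame of cell counts and fits the Poisson model with the three-way interaction. In R the variables are called Hair, Eye, Sex and Freq; the model with all interactions is saturated, so the three-way term is tested by comparing it with the model that drops Hair:Eye:Sex.

```r
# HairEyeColor: a 4 x 4 x 2 table from the datasets package
HairEyeColorDF <- as.data.frame(HairEyeColor)   # columns Hair, Eye, Sex, Freq
head(HairEyeColorDF)

# Poisson model with the three-way interaction (a saturated model)
hec_model <- glm(Freq ~ Hair * Eye * Sex, family = poisson, data = HairEyeColorDF)

# Likelihood-ratio test of the three-way term: drop it and compare
hec_no3way <- update(hec_model, . ~ . - Hair:Eye:Sex)
anova(hec_no3way, hec_model, test = "Chisq")
```

With the effects package installed, plot(effects::Effect(c("Hair", "Eye", "Sex"), hec_model)) should draw the three-way effect display.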
Compare Poisson analysis to multinomial
A useful exercise, if you have time, is to compare the Poisson regression analyses with a multinomial logit model. First, you will need to transform the HairEyeColor contingency table into a data frame of cell counts and then into a long-format data frame (one row per person)...
HairEyeColorDF <- as.data.frame(HairEyeColor)
HairEyeColorLong <- as.data.frame(lapply(HairEyeColorDF, function(x) rep(x, HairEyeColorDF$Freq)))
You should get the same significance values for the models...
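A sketch of one such comparison, assuming the nnet package for multinom(): after converting to long format, a multinomial model for Eye tests whether the effect of Hair differs by Sex (the Hair:Sex term), and the matching Poisson comparison is between the saturated model and the model with all two-way terms. The two likelihood-ratio statistics should agree, up to the convergence tolerance of multinom().

```r
library(nnet)

# Cell counts, then one row per person (592 rows)
HairEyeColorDF <- as.data.frame(HairEyeColor)
HairEyeColorLong <- as.data.frame(
  lapply(HairEyeColorDF, function(x) rep(x, HairEyeColorDF$Freq))
)
HairEyeColorLong$Freq <- NULL
nrow(HairEyeColorLong)

# Multinomial logit for Eye: does the effect of Hair differ by Sex?
mn_main <- multinom(Eye ~ Hair + Sex, data = HairEyeColorLong, trace = FALSE, maxit = 1000)
mn_int  <- multinom(Eye ~ Hair * Sex, data = HairEyeColorLong, trace = FALSE, maxit = 1000)
lr_mn <- deviance(mn_main) - deviance(mn_int)

# Matching Poisson models: all two-way terms versus the saturated model
pois_main <- glm(Freq ~ (Hair + Eye + Sex)^2, family = poisson, data = HairEyeColorDF)
pois_sat  <- glm(Freq ~ Hair * Eye * Sex, family = poisson, data = HairEyeColorDF)
lr_pois <- deviance(pois_main) - deviance(pois_sat)

c(multinomial = lr_mn, poisson = lr_pois)
```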
More informationQuestion 1 carries a weight of 25%; Question 2 carries 20%; Question 3 carries 20%; Question 4 carries 35%.
UNIVERSITY OF EAST ANGLIA School of Economics Main Series PGT Examination 017-18 ECONOMETRIC METHODS ECO-7000A Time allowed: hours Answer ALL FOUR Questions. Question 1 carries a weight of 5%; Question
More informationx3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators
Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.
More informationFrequency Distribution Cross-Tabulation
Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape
More informationMultinomial Logistic Regression Models
Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word
More informationAnalysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013
Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationBinary Dependent Variables
Binary Dependent Variables In some cases the outcome of interest rather than one of the right hand side variables - is discrete rather than continuous Binary Dependent Variables In some cases the outcome
More informationLogistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy
Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided
More informationAP Statistics L I N E A R R E G R E S S I O N C H A P 7
AP Statistics 1 L I N E A R R E G R E S S I O N C H A P 7 The object [of statistics] is to discover methods of condensing information concerning large groups of allied facts into brief and compendious
More informationNon-Gaussian Response Variables
Non-Gaussian Response Variables What is the Generalized Model Doing? The fixed effects are like the factors in a traditional analysis of variance or linear model The random effects are different A generalized
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationMixed models in R using the lme4 package Part 7: Generalized linear mixed models
Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of
More informationStatistical methods for Education Economics
Statistical methods for Education Economics Massimiliano Bratti http://www.economia.unimi.it/bratti Course of Education Economics Faculty of Political Sciences, University of Milan Academic Year 2007-08
More informationStat 8053, Fall 2013: Multinomial Logistic Models
Stat 8053, Fall 2013: Multinomial Logistic Models Here is the example on page 269 of Agresti on food preference of alligators: s is size class, g is sex of the alligator, l is name of the lake, and f is
More informationCorrelation & Simple Regression
Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More information26 Chapter 4 Classification
26 Chapter 4 Classification The preceding tree cannot be simplified. 2. Consider the training examples shown in Table 4.1 for a binary classification problem. Table 4.1. Data set for Exercise 2. Customer
More informationLab # 11: Correlation and Model Fitting
Lab # 11: Correlation and Model Fitting Objectives: 1. Correlations between variables 2. Data Manipulation, creation of squares 3. Model fitting with regression 4. Comparison of models Correlations between
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your
More informationChapter 9 Regression with a Binary Dependent Variable. Multiple Choice. 1) The binary dependent variable model is an example of a
Chapter 9 Regression with a Binary Dependent Variable Multiple Choice ) The binary dependent variable model is an example of a a. regression model, which has as a regressor, among others, a binary variable.
More informationPractice exam questions
Practice exam questions Nathaniel Higgins nhiggins@jhu.edu, nhiggins@ers.usda.gov 1. The following question is based on the model y = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 3 + u. Discuss the following two hypotheses.
More informationMath 138 Summer Section 412- Unit Test 1 Green Form, page 1 of 7
Math 138 Summer 1 2013 Section 412- Unit Test 1 Green Form page 1 of 7 1. Multiple Choice. Please circle your answer. Each question is worth 3 points. (a) Social Security Numbers are illustrations of which
More informationlme4 Luke Chang Last Revised July 16, Fitting Linear Mixed Models with a Varying Intercept
lme4 Luke Chang Last Revised July 16, 2010 1 Using lme4 1.1 Fitting Linear Mixed Models with a Varying Intercept We will now work through the same Ultimatum Game example from the regression section and
More informationCorrelation and Regression (Excel 2007)
Correlation and Regression (Excel 2007) (See Also Scatterplots, Regression Lines, and Time Series Charts With Excel 2007 for instructions on making a scatterplot of the data and an alternate method of
More informationFREC 608 Guided Exercise 9
FREC 608 Guided Eercise 9 Problem. Model of Average Annual Precipitation An article in Geography (July 980) used regression to predict average annual rainfall levels in California. Data on the following
More informationSoc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis
Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Problem 1. The files
More informationECON 497 Midterm Spring
ECON 497 Midterm Spring 2009 1 ECON 497: Economic Research and Forecasting Name: Spring 2009 Bellas Midterm You have three hours and twenty minutes to complete this exam. Answer all questions and explain
More informationSociology 593 Exam 2 March 28, 2002
Sociology 59 Exam March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably means that
More information