Generalized Linear Models

1 Generalized Linear Models
Summer School, Manchester University, July 2-6, 2018
Generalized Linear Models: a generic approach to statistical modelling
Graeme.Hutcheson@manchester.ac.uk, University of Manchester

2 The slides and R-files for this session are available on the course website...
Lecture Slides: Manchester/2018Manchester04GLM.pdf
R-notebook: Manchester/2018Manchester04.Rmd
...or from the course DVD.

8 This course uses a system of analysis that represents research questions in the form of equations. For example...
    mathematics test score ~ gender
    success (yes/no) ~ age
    salary ~ gender + age + ethnicity
    number of arrests ~ age*gender
Representing research questions in this way explicitly identifies the relationships to be tested, the structure of the data and how the model is entered into the analysis programme.

10 The formulation of the research question in equation format is also useful as it is the same representation as that used by the Generalized Linear Model (GLM), a statistical model that may be applied to a range of analytical problems.
This lecture provides a relatively non-technical introduction to the GLM. Those looking for more detailed treatments are advised to look at the following texts...
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd edition). Chapman & Hall/CRC.
Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist: Introductory statistics using generalized linear models. Sage Publications.

14 The Generalized Linear Model
In its simplest form, the GLM is a statistical technique that predicts a single variable (the response variable), using one or more other variables (the explanatory variables).
The response and explanatory variables (also known as the random and systematic components of the model) are linked (~) according to a function that takes account of the measurement scale of the response variable.
    response variable ~ explanatory variables
There are many link functions available for GLMs to take account of the different ways in which the random component (the variable that is being predicted) is distributed (e.g. as a number, category, count, skewed, etc.). This course introduces three links that enable continuous, categorical and count response variables to be modelled.

17 To model a continuous response variable, an identity link is used.
To model a count response variable, a log link is used.
To model a categorical variable, a logit link is used.

    response variable   research equation   link function   linear model
    continuous          Y ~ X               identity        Y = X
    count               Y ~ X               log             log(Y) = X
    categorical         Y ~ X               logit           logit(Y) = X

These will be explained in detail in later sessions. It is enough at this point to realise that different response variables can be modelled by changing the link function.
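
As a rough sketch of the point above, the same research equation can be fitted with each of the three links just by changing the family argument of glm(). The data below are simulated purely for illustration; the variable names (x, y_cont, y_count, y_bin) and all the generating values are assumptions, not course data.

```r
# Simulated data: every variable and coefficient here is invented for illustration.
set.seed(1)
n <- 200
x <- runif(n, 0, 10)

y_cont  <- 2 + 0.5 * x + rnorm(n)                            # continuous response
y_count <- rpois(n, lambda = exp(0.2 + 0.1 * x))             # count response
y_bin   <- rbinom(n, size = 1, prob = plogis(-2 + 0.4 * x))  # binary (categorical) response

# The formula is the same each time; only the family and link change.
m_identity <- glm(y_cont  ~ x, family = gaussian(link = "identity"))
m_log      <- glm(y_count ~ x, family = poisson(link = "log"))
m_logit    <- glm(y_bin   ~ x, family = binomial(link = "logit"))

# Collect the link used by each model and its estimated slope for x.
links  <- c(family(m_identity)$link, family(m_log)$link, family(m_logit)$link)
slopes <- unname(c(coef(m_identity)["x"], coef(m_log)["x"], coef(m_logit)["x"]))
print(data.frame(link = links, slope = slopes))
```

Each slope is on the scale of its own link (Y, log(Y) and logit(Y) respectively), which is why the three estimates are not directly comparable with one another.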

23 In practice, if we know the measurement scale of the variable being predicted, we can identify an appropriate GLM technique to use...
If Y is continuous: OLS regression.
If Y is a count: Poisson regression.
If Y is ordered categorical: Proportional-odds regression.
If Y is unordered categorical: Multinomial regression.
GLMs are particularly powerful because they are all conceptually very similar. Learning to apply and interpret results from one technique greatly helps in applying and interpreting results from the others.

26 GLM: a statistical representation
Up to now, models have been represented using variable names. This is useful because it is how models are conceptualised and entered into the statistical software.
It is also useful to represent models in a more detailed statistical form, one that corresponds to the way the results are produced and reported.
The statistical representation simply includes some parameters that quantify the relationships in the data. We not only want to know that Y and X are related, we also want to know HOW they are related, i.e. when X changes by a set amount, what is the effect on Y?

30 GLM: the parameters
A conceptual model of ice cream consumption (from the IceCream dataset) is...
    consumption ~ temperature
which is represented statistically as...
    consumption = β0 + β1 temperature
where
β0 estimates consumption when temperature is zero.
β1 estimates the change in consumption for a unit increase in temperature.
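
The two interpretations above can be checked numerically. This is a minimal sketch on simulated stand-in data (the data frame ice and its generating values are assumptions, not the course IceCream file): the intercept equals the prediction at temperature = 0, and the slope equals the difference between two predictions one unit apart.

```r
# Stand-in data, simulated for illustration only (not the course IceCream data).
set.seed(42)
temperature <- runif(30, 25, 75)
consumption <- 0.2 + 0.003 * temperature + rnorm(30, sd = 0.03)
ice <- data.frame(consumption, temperature)

fit <- glm(consumption ~ temperature, family = gaussian(identity), data = ice)
b0 <- unname(coef(fit)["(Intercept)"])
b1 <- unname(coef(fit)["temperature"])

# Predictions at temperature 0, 50 and 51.
p <- unname(predict(fit, newdata = data.frame(temperature = c(0, 50, 51))))

cat("b0 =", b0, " prediction at temperature 0 =", p[1], "\n")
cat("b1 =", b1, " prediction(51) - prediction(50) =", p[3] - p[2], "\n")
```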

34 The most interesting statistic for us is the parameter that estimates the relationship between temperature and consumption: the parameter β1. The formal description of this parameter is...
    For a unit increase in X, the estimated change in Y is β1.
    For a unit increase in temperature, the estimated change in consumption is β1.
The values of β0 and β1 are obtained from the statistical output for the model.

36 The following output was obtained by putting the model consumption ~ temperature into the Rcmdr GLM menu (as consumption is continuous, select the identity link from the Gaussian family)...
    glm(formula = Consumption ~ Temperature, family = gaussian(identity), data = IceCream)
    Coefficients:
                 Estimate  Std. Error
    (Intercept)
    Temperature
These parameters are also represented in the effect display (use the Rcmdr menu options Models, graphs, effect plot...).

39 [Figure: Temperature effect plot; x-axis Temperature, y-axis Consumption]
Computing β1: as temperature increases by 40 (30 to 70), consumption increases by 0.125 (0.3 to 0.425). For a unit increase in temperature, consumption increases by 0.125/40 ≈ 0.0031.
Computing β0: the value of consumption when temperature = 0. A simple estimation from the graph shows β0 = 0.2.
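
The graphical calculation can be reproduced with a few lines of arithmetic. The sketch below uses only the two points read from the effect plot in the slide; extrapolating the line back to temperature = 0 is an added step, shown here as a check on the graphical estimate of β0.

```r
# Two points read from the effect plot in the slide.
temp <- c(30, 70)
cons <- c(0.3, 0.425)

b1 <- diff(cons) / diff(temp)   # change in consumption per unit of temperature
b0 <- cons[1] - b1 * temp[1]    # extend the line back to temperature = 0

print(round(c(b0 = b0, b1 = b1), 4))
```

The slope comes out at roughly 0.003 and the intercept at roughly 0.2, in line with the estimates read from the graph.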

42 Categorical explanatory variables...
It is useful at this stage to look at categorical explanatory variables. A detailed description of analysing categorical explanatory variables is available in...
Hutcheson, G. D. (2011). Categorical Explanatory Variables. Journal of Modelling in Management. 6, 2: research-training.net/reading/jmm6(2)contrastcodingupdated.pdf
At a basic level, a categorical variable is divided into a number of binary comparisons: each category is compared to a specific category (the reference).
The following slide shows a model of examination score and whether it is related to the school a child attends. school is an unordered categorical variable with three categories (schoolA, schoolB and schoolC).

47 The conceptual model
    examination score ~ school
is represented statistically as
    score = β0 + β1 schoolB + β2 schoolC
where
β0 estimates score when schoolB and schoolC are both zero (this is equivalent to the score for schoolA).
β1 estimates the change in score for schoolB compared to schoolA.
β2 estimates the change in score for schoolC compared to schoolA.
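
To make the binary comparisons concrete, the sketch below shows how R expands a three-category factor into two 0/1 indicator columns, and that the fitted coefficients are the reference-group mean and the differences from it. The scores are invented for illustration and are not the schools.csv data; base R labels the coefficients SCHOOLschoolB and SCHOOLschoolC rather than the SCHOOL[T.schoolB] style that Rcmdr prints.

```r
# Invented scores for three schools (not the schools.csv data).
schools <- data.frame(
  SCHOOL = factor(rep(c("schoolA", "schoolB", "schoolC"), each = 4)),
  SCORE  = c(60, 62, 64, 66,  68, 70, 72, 74,  52, 55, 57, 60)
)

# The factor becomes two indicator columns; schoolA (first level) is the reference.
X <- model.matrix(~ SCHOOL, data = schools)
print(unique(X))

fit <- glm(SCORE ~ SCHOOL, family = gaussian(identity), data = schools)
group_means <- tapply(schools$SCORE, schools$SCHOOL, mean)

# Intercept = mean of schoolA; the other coefficients are differences from it.
print(coef(fit))
print(group_means - group_means["schoolA"])
```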

49 The following output from the schools.csv dataset was obtained by putting the model score ~ school into the Rcmdr GLM menu (as score is continuous, select the identity link)...
    glm(formula = SCORE ~ SCHOOL, family = gaussian(identity), data = schools)
    Coefficients:
                        Estimate  Std. Error
    (Intercept)
    SCHOOL[T.schoolB]
    SCHOOL[T.schoolC]
The predicted score at schoolA is 63.2. SchoolB is 7.4 points higher (70.6) and schoolC is 7.6 points lower (56.6).

53 This result can be easily seen in the effect display that accompanies the model.
[Figure: SCHOOL effect plot; x-axis SCHOOL (schoolA, schoolB, schoolC), y-axis SCORE]
β0 = 63.2: the value of score for schoolA (the reference category).
β1: schoolB is 7.4 units higher than schoolA.
β2: schoolC is 7.6 units lower than schoolA (plotted at 56.6).

56 Interpreting parameters for other GLMs
The interpretation of the regression coefficients is essentially the same for all GLMs. For example, to model the number of checks a suspect appears on and whether this is related to the person's age (these data are from the Arrests dataset from the effects package)...
    number of checks ~ age
As the number of checks is a count variable, a log link should be used. The statistical model for this is...
    log(checks) = β0 + β1 age
The parameter β0 indicates the value of log(checks) when age equals zero. The parameter β1 indicates the change in log(checks) for a unit change in age.
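
A small sketch of the same interpretation under a log link, using simulated counts (the variables and generating values are assumptions, not the Arrests data). On the link scale the coefficients behave exactly as in the identity-link case; the last step, which is not in the slides, shows the equivalent statement on the count scale.

```r
# Simulated counts for illustration only (not the Arrests data).
set.seed(7)
age    <- sample(15:65, 500, replace = TRUE)
checks <- rpois(500, lambda = exp(0.1 + 0.015 * age))

fit <- glm(checks ~ age, family = poisson(log))
b <- unname(coef(fit))   # b[1] = intercept, b[2] = slope for age

# On the link (log) scale the model is a straight line in age.
eta <- unname(predict(fit, newdata = data.frame(age = c(0, 30, 31)), type = "link"))
cat("log(checks) at age 0:", eta[1], " b0 =", b[1], "\n")
cat("change in log(checks) per year:", eta[3] - eta[2], " b1 =", b[2], "\n")

# On the count scale, one extra year multiplies the expected count by exp(b1).
mu <- unname(predict(fit, newdata = data.frame(age = c(30, 31)), type = "response"))
cat("ratio of expected counts:", mu[2] / mu[1], " exp(b1) =", exp(b[2]), "\n")
```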

58 The following output was obtained by putting the model checks ~ age into the Rcmdr GLM menu (as checks is a count, select the log link from the Poisson family)...
    glm(formula = checks ~ age, family = poisson(log), data = Arrests)
    Coefficients:
                 Estimate  Std. Error
    (Intercept)
    age
The (Intercept) estimate is the value of log(checks) when age = 0. The age estimate is the change in log(checks) for a unit increase in age.

61 This result can be easily seen in the effect display that accompanies the model. The following graph shows the relationship between age and log(checks)...
[Figure: age effect plot; x-axis age, y-axis checks]
β0 (see above): a simple estimation from the graph shows that when age = 0, log(checks) = 0.15.
β1 (see above): as age increases from 20 to 60 (an increase of 40), log(checks) increases from 0.43 to 1 (an increase of 0.57). For each unit increase in age, log(checks) increases by 0.57/40 ≈ 0.014.

63 The model parameters for GLM models have an easy interpretation; one that is essentially the same for all GLM models. The parameters can be interpreted from the standard statistical output shown in the tables, or from the effect displays.

64 Assessing significance...

67 Assessing significance for GLMs
In order to interpret the models, it is useful to know the significance associated with each of the parameter estimates. Although consumption may increase by β1 for each unit increase in temperature, the estimate alone gives no indication of whether this increase may have been due to chance.
In addition to the parameter estimates, we usually also want to know which parameters, or groups of parameters, are significant.
This is where GLMs excel, as they employ a simple method for determining significance, one that applies generally to all GLMs.

70 Deviance Significance is assessed in GLMs using a common method based on a statistic known as the deviance. The deviance is simply a measure of the difference between the values predicted by the model and the actual values (predicted values compared to the observed data). If the model provides a good prediction of the response variable, the deviance will be relatively small. If the model does not provide accurate predictions of the response variable, the deviance will be relatively large. The deviance basically gives an indication of how well the model fits the data.
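
For the identity-link (Gaussian) case the deviance has a familiar form: it is the sum of squared differences between the observed and predicted values. The sketch below checks this on simulated data (the variables x and y are assumptions for illustration); for other families the deviance is computed differently, but it plays the same role.

```r
# Simulated data for illustration only.
set.seed(3)
x <- runif(30, 25, 75)
y <- 0.2 + 0.003 * x + rnorm(30, sd = 0.03)

fit <- glm(y ~ x, family = gaussian(identity))

rss_model <- sum((y - fitted(fit))^2)  # squared distance from the model's predictions
rss_null  <- sum((y - mean(y))^2)      # squared distance from the overall mean

cat("residual deviance:", deviance(fit), " sum of squared residuals:", rss_model, "\n")
cat("null deviance:", fit$null.deviance, " total sum of squares:", rss_null, "\n")
```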

73 Using the deviance, it is easy to determine the significance of individual variables and/or groups of variables by comparing nested models. For example, to assess whether temperature is a significant predictor of ice cream consumption, we can compare a model of consumption that includes temperature with a model that does not.
    consumption = β0 + β1 temperature    (model including temperature)
    consumption = β0                     deviance = 0.126
The effect that temperature has had on the model is to reduce the deviance by the difference between the deviances of these two models. All of this information is given in the analysis output...
The Null deviance is the deviance in the response variable without taking into account any other information. The Residual deviance is the deviance in the model that includes the explanatory variables.

74
    glm(formula = Consumption ~ Temperature, family = gaussian(identity), data = IceCream)
    Coefficients:
                 Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)                                 e-09
    Temperature                                 e-07
    Null deviance:     on 29 degrees of freedom
    Residual deviance: on 28 degrees of freedom

    Analysis of Deviance Table (Type II tests)
    Response: Consumption
                 SS  Df  F  Pr(>F)
    Temperature             e-07
    Residuals

76 The model above was obtained by running a GLM on the IceCream data and then requesting an analysis of deviance table. These commands can be issued via the Rcmdr menus, or directly from a script file using the commands...
    library(car)  # provides Anova()
    GLM.1 <- glm(Consumption ~ Temperature, family=gaussian(identity), data=IceCream)
    summary(GLM.1)
    Anova(GLM.1, type="II", test="F")
The change in deviance associated with removing temperature from the model is assessed for significance using the F-test. A detailed description of significance tests and deviance is provided in...
Hutcheson, G. D. and Moutinho, L. (2008). Statistical Modelling for Management. Sage Publications.

78 [Figure: Temperature effect plot; x-axis Temperature, y-axis Consumption]
Significance of temperature: a visual indication of the significance of temperature can be seen in the effect display. It is easy to see from the line and its associated confidence intervals (the shaded area of the plot) that predictions of consumption differ as temperature changes. Information about temperature is, therefore, important for predicting consumption.

79 The significance of temperature in the model can be manually calculated by comparing the nested models...
    model01: consumption = β0
    model02: consumption = β0 + β1 temperature
This can be easily achieved using the Rcmdr menu option Models, Hypothesis tests, Compare two models...

81 The output clearly shows which models are being compared. Note that the statistics for Temperature are the same as those provided previously...
    Rcmdr> anova(model01, model02, test="F")
    Analysis of Deviance Table
    Model 1: Consumption ~ 1
    Model 2: Consumption ~ Temperature
      Resid. Df  Resid. Dev  Df  Deviance  F  Pr(>F)
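
The comparison can be run from a script with base R alone. The sketch below uses simulated stand-in data (not the course IceCream file) and the model names from the slide; it also recomputes the F statistic by hand from the two deviances, to show where the figure in the table comes from.

```r
# Stand-in data, simulated for illustration only.
set.seed(3)
Temperature <- runif(30, 25, 75)
Consumption <- 0.2 + 0.003 * Temperature + rnorm(30, sd = 0.03)

model01 <- glm(Consumption ~ 1,           family = gaussian(identity))
model02 <- glm(Consumption ~ Temperature, family = gaussian(identity))

tab <- anova(model01, model02, test = "F")
print(tab)

# The same F statistic from the two deviances: the drop in deviance per
# parameter added, divided by the residual deviance per degree of freedom.
drop_dev <- deviance(model01) - deviance(model02)
f_manual <- (drop_dev / 1) / (deviance(model02) / df.residual(model02))
cat("drop in deviance =", drop_dev, " manual F =", f_manual, "\n")
```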

84 The same underlying theory may be applied to testing categorical explanatory variables and different response variables. The following example shows a binary response variable being predicted by a categorical and a continuous explanatory variable. These data are from the Arrests dataset in the effects package (to reproduce these results, don't forget to change the variable year to a categorical variable, yearcat).
The model we will be investigating is...
    released ~ yearcat + age
which is represented statistically as...
    logit(released) = β0 + β1 year1998 + β2 year1999 + β3 year2000 + β4 year2001 + β5 year2002 + β6 age

87 or represented more succinctly as...
    logit(released) = β0 + β(1-5) yearcat + β6 age
The categorical variable yearcat is represented using 5 binary comparisons; each year is compared to the reference category.
A GLM (using the logit link) is shown below...

88
    glm(formula = released ~ age + yearcat, family = binomial(logit), data = Arrests)
    Coefficients:
                     Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)                                     < 2e-16
    age
    yearcat[T.1998]
    yearcat[T.1999]
    yearcat[T.2000]                                 e-06
    yearcat[T.2001]
    yearcat[T.2002]
    Null deviance:     on 5225 degrees of freedom
    Residual deviance: on 5219 degrees of freedom

90 The logit model of released shows the deviance scores at the bottom of the tabular output. The Null deviance tells us how much deviance there is in the variable released (an empty model) and the Residual deviance tells us how much deviance there is in the model with all the explanatory variables included.
In order to find out if each variable is significant, we need to look at the analysis of deviance table...
    Analysis of Deviance Table (Type II tests)
    Response: released
             LR Chisq  Df  Pr(>Chisq)
    age                    **
    yearcat                ***

92 From the Analysis of Deviance table, we can see that including yearcat in the model results in a reduction in deviance. This is the result of comparing the deviance of the following models...
    logit(released) = β0 + β1 year1998 + β2 year1999 + β3 year2000 + β4 year2001 + β5 year2002 + β6 age
and
    logit(released) = β0 + β6 age
This change in deviance is significant and provides evidence that the variable yearcat may be influential in predicting whether or not someone is released.

93 These models can be compared using the Rcmdr menu option Models, Hypothesis tests, Compare two models...
    Analysis of Deviance Table
    Model 1: released ~ age
    Model 2: released ~ yearcat + age
      Resid. Df  Resid. Dev  Df  Deviance  Pr(>Chi)
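
The same comparison can be sketched in a script for a logit model. The data below are simulated (the variable names follow the slide, but the values and effect sizes are invented, and released is coded 0/1 rather than No/Yes); the drop in deviance is referred to a chi-square distribution with degrees of freedom equal to the number of parameters added.

```r
# Simulated stand-in for the Arrests example; all effects are invented.
set.seed(11)
n <- 1000
age      <- sample(15:65, n, replace = TRUE)
yearcat  <- factor(sample(1997:2002, n, replace = TRUE))
eta      <- 1.2 + 0.01 * age - 0.5 * (yearcat == "2000")
released <- rbinom(n, size = 1, prob = plogis(eta))

m_age  <- glm(released ~ age,           family = binomial(logit))
m_full <- glm(released ~ yearcat + age, family = binomial(logit))

tab <- anova(m_age, m_full, test = "Chisq")
print(tab)

# The test statistic is the drop in deviance; yearcat adds 5 parameters
# (six years, one of which is the reference category).
drop_dev <- deviance(m_age) - deviance(m_full)
p_manual <- pchisq(drop_dev, df = 5, lower.tail = FALSE)
cat("drop in deviance =", drop_dev, " p =", p_manual, "\n")
```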

95 Including the variable age in the model results in a reduction in deviance. This is assessed by removing 1 parameter (β6) from the model (df = 1), that is, by comparing...
    logit(released) = β0 + β1 year1998 + β2 year1999 + β3 year2000 + β4 year2001 + β5 year2002 + β6 age
and
    logit(released) = β0 + β1 year1998 + β2 year1999 + β3 year2000 + β4 year2001 + β5 year2002
This change in deviance is significant and provides evidence that the variable age may be influential in predicting whether or not someone is released.

96 These models can be compared using the Rcmdr menu option Models, Hypothesis tests, Compare two models...
    Analysis of Deviance Table
    Model 1: released ~ yearcat
    Model 2: released ~ yearcat + age
      Resid. Df  Resid. Dev  Df  Deviance  Pr(>Chi)

97 These results can also be inferred from the effect displays. Although these do not provide estimates of significance, the graphics provide enough information to identify the important relationships visually.

[Figure: two effect plots, "yearcat effect plot" (released against yearcat) and "age effect plot" (released against age)]

98 The significance of each category to the prediction of the response variable in logit models is estimated using the z-distribution in the standard regression output. It should be noted that this is a large-sample approximation and the deviance statistic is preferable for assessing significance. Full information about this is available in... Hutcheson, G. D. and Moutinho, L. (2008). Statistical Modelling for Management. Sage Publications.

101 Type II and Type III ANOVA tests
The choice of ANOVA test that is used to compare models is very important, particularly for models with interactions.

Type III ANOVA tests are those that are computed for individual parameters and are provided in the standard tabular output.

Type II ANOVA tests are those that are computed for variables and are provided in the Analysis of Deviance table.

The difference between the two types of test is very important and will be highlighted using examples later in the course. It can be illustrated using the following model (from the ICEcream data), which contains an interaction...

consumption ~ temperature * income

102 consumption ~ temperature * income

Type III tests
Coefficients:
                      Estimate    Std. Error   t value   Pr(>|t|)
(Intercept)           1.216e...   ...          ...       ...
Temperature           ...         ...          ...       ...
Income                7.459e...   ...          ...       ...
Temperature:Income    6.250e...   ...          ...       ...

Type II tests
                      SS     Df     F      Pr(>F)
Temperature           ...    ...    ...    ...e-08 ***
Income                ...    ...    ...    ... **
Temperature:Income    ...    ...    ...    ...

104 The significance of the variables temperature and income differs depending on the type of test used. The type III test for temperature compares the models...

$$\text{consumption} = \beta_0 + \beta_1\,\text{temperature} + \beta_2\,\text{income} + \beta_3\,\text{temperature:income}$$

and

$$\text{consumption} = \beta_0 + \beta_2\,\text{income} + \beta_3\,\text{temperature:income}$$

whilst the type II test for temperature compares the models...

$$\text{consumption} = \beta_0 + \beta_1\,\text{temperature} + \beta_2\,\text{income}$$

and

$$\text{consumption} = \beta_0 + \beta_2\,\text{income}$$

106 The type III test does not provide an indication of the significance of the variable temperature. To assess temperature, we need to compare a model that includes temperature with a model that does not, but BOTH of the models in the type III comparison include temperature (it remains in the temperature:income interaction term).

The type II test does compare models differentiated on the basis of temperature and is, therefore, more appropriate for assessing the effect of temperature.
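The two comparisons can be reproduced by fitting the pairs of models explicitly. The sketch below uses base R only and a simulated data frame `ice` (hypothetical values standing in for the ICEcream data, which is not reproduced here). It checks that the per-parameter t-test for temperature in the interaction model is the same test as the "type III" model comparison, and then runs the "type II" comparison of the main-effects models.

```r
# Simulated stand-in for the ICEcream data (hypothetical values only)
set.seed(2)
n <- 30
ice <- data.frame(
  temperature = runif(n, 25, 75),
  income      = runif(n, 75, 95)
)
ice$consumption <- 0.2 + 0.003 * ice$temperature + 0.002 * ice$income +
  rnorm(n, sd = 0.03)

# Model with the interaction; the coefficient table holds the per-parameter tests
m_int  <- lm(consumption ~ temperature * income, data = ice)
t_temp <- summary(m_int)$coefficients["temperature", "t value"]

# "Type III" comparison: drop only the temperature main effect,
# keeping the interaction (so temperature is still in both models)
m_no_temp <- lm(consumption ~ income + temperature:income, data = ice)
f_type3   <- anova(m_no_temp, m_int)$F[2]

# "Type II" comparison: models with and without temperature, no interaction
m_main <- lm(consumption ~ temperature + income, data = ice)
m_inc  <- lm(consumption ~ income, data = ice)
type2  <- anova(m_inc, m_main)
print(type2)

# The F from the type III comparison equals the squared t from the table
c(t_squared = t_temp^2, F_type3 = f_type3)

# If the car package is installed, car::Anova(m_int, type = "II") is one
# way to obtain per-variable type II tests in a single call.
```

The design choice here is to build each comparison from two explicit `lm()` fits, so that it is visible which terms each test actually removes.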

109 Traditional tests and equivalent GLM models
The GLM models reproduce or replace many of the traditional tests. For example, tests for independent group designs...

Traditional test | GLM
One independent variable: t-test (unrelated), Mann-Whitney, 1-way ANOVA (unrelated), Kruskal-Wallis, Jonckheere trend, chi-square (contingency table), etc. | Y ~ X
Multiple independent variables: complex selection of multi-way ANOVA models, multi-way contingency tables (log-linear) | Y ~ X1 + X2

110 Traditional tests and equivalent GLM models
... and tests for dependent (or matched) group designs...

Traditional test | GLM
One independent variable: paired t-test, Wilcoxon, 1-way ANOVA (related), Friedman, Page's L trend, etc. | Y ~ subject + X
Multiple independent variables: complex selection of multi-way ANOVA models, multi-way contingency tables (log-linear) | Y ~ subject + X1 + X2

112 In order to realise the power of the GLMs, it is important to understand the equivalence of the traditional tests and the GLMs. Detailed information about this is provided in...

Hutcheson, G. D. and Schaefer, L. (2012). Test selection in the 21st century. Journal of Modelling in Management, 7, 3: http: //
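The equivalence between a traditional test and its GLM can be checked numerically. The sketch below is an illustration on simulated data (all data frames and values are hypothetical): it compares an unrelated t-test with the model Y ~ X, and a paired t-test with the model Y ~ subject + X. In both cases the t statistics agree in size; the sign depends only on which group is treated as the reference category.

```r
set.seed(3)

# Independent groups: unrelated t-test versus Y ~ X
d <- data.frame(x = factor(rep(c("a", "b"), each = 20)))
d$y <- rnorm(40, mean = ifelse(d$x == "a", 10, 11))

t_unrelated <- unname(t.test(y ~ x, data = d, var.equal = TRUE)$statistic)
t_glm       <- summary(glm(y ~ x, family = gaussian, data = d))$coefficients["xb", "t value"]

# Matched groups: paired t-test versus Y ~ subject + X
n <- 15
subj_eff <- rnorm(n, mean = 50, sd = 5)  # hypothetical subject-level differences
p <- data.frame(
  subject = factor(rep(1:n, times = 2)),
  x       = factor(rep(c("pre", "post"), each = n), levels = c("pre", "post"))
)
p$y <- subj_eff[as.integer(p$subject)] + ifelse(p$x == "post", 2, 0) + rnorm(2 * n)

t_paired     <- unname(t.test(p$y[p$x == "post"], p$y[p$x == "pre"],
                              paired = TRUE)$statistic)
t_glm_paired <- summary(glm(y ~ subject + x, family = gaussian,
                            data = p))$coefficients["xpost", "t value"]

# Compare the absolute values of the statistics
rbind(unrelated = c(traditional = abs(t_unrelated), glm = abs(t_glm)),
      paired    = c(traditional = abs(t_paired),    glm = abs(t_glm_paired)))
```

The unrelated t-test is run with `var.equal = TRUE` because the Gaussian GLM assumes a common error variance; the default Welch test would give a slightly different statistic.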

119 The usefulness of the equation format for representing statistical models is evident when the analyses are run in the software. In order to select the appropriate GLM...

1. Define equation
2. Identify scale of measurement for the response variable
3. Select the appropriate model in the Rcmdr:
   Generalized linear model... for continuous and count responses
   Multinomial logit model... for unordered categorical responses
   Ordinal regression model... for ordered categorical responses


121 Then enter the equation...
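Outside the Rcmdr menus, the same three steps can be sketched at the R console, where the research-question equation is entered directly as a model formula. The example below is an assumption-laden illustration: the data frame `d` is simulated, the variable names are hypothetical, and the pairing of each menu entry with `glm()`, `nnet::multinom()` and `MASS::polr()` is taken here as the usual correspondence rather than something the slides state.

```r
library(MASS)  # polr(): ordinal regression
library(nnet)  # multinom(): multinomial logit

# Simulated data with one response of each measurement scale (hypothetical)
set.seed(4)
n <- 300
d <- data.frame(
  age    = round(runif(n, 18, 65)),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
d$salary      <- 15000 + 400 * d$age + rnorm(n, sd = 5000)   # continuous
d$violations  <- rpois(n, lambda = exp(-1 + 0.02 * d$age))   # count
d$destination <- factor(sample(c("beach", "city", "mountain"),
                               n, replace = TRUE))           # unordered categories
d$grade       <- factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                        levels = c("low", "mid", "high"),
                        ordered = TRUE)                      # ordered categories

# Continuous response: generalized linear model with an identity link
m_cont  <- glm(salary ~ age + gender, family = gaussian(link = "identity"), data = d)

# Count response: generalized linear model with a log link
m_count <- glm(violations ~ age + gender, family = poisson(link = "log"), data = d)

# Unordered categorical response: multinomial logit model
m_nom   <- multinom(destination ~ age + gender, data = d, trace = FALSE)

# Ordered categorical response: ordinal regression model
m_ord   <- polr(grade ~ age + gender, data = d, Hess = TRUE)
```

In each call the right-hand side of the formula stays the same; only the model function and link change with the scale of measurement of the response.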


126 Exercises...

127 Your research model is...

salary ~ age + qualification

where salary and age are continuous and qualification is recorded in 4 ordered categories.
1. Which link function is appropriate for this model?
2. How many parameters would you expect to see in the model for qualification?
3. Which statistical technique would you use?
4. What is the statistical model (the linear model)?

128 Your research model is...

A-level ~ IQ + gender

where A-level is recorded as 6 ordered categories, IQ is recorded as continuous, and gender is recorded as 2 unordered categories.
1. Which link function is appropriate for this model?
2. How many parameters would you expect to see in the model for IQ?
3. Which statistical technique would you use?
4. What is the statistical model (the linear model)?

129 Your research model is...

traffic violations ~ age gender

where traffic violations is recorded as the number of recorded violations (0 to 7), age is recorded as continuous, and gender is recorded as 2 unordered categories.
1. Which link function is appropriate for this model?
2. How many parameters would you expect to see in the model (including the intercept)?
3. Which statistical technique would you use?
4. What is the statistical model (the linear model)?
5. Would you use a TYPE II or a TYPE III test to ascertain if age is significant?

130 Your research model is...

Holiday destination ~ age gender

where Holiday destination is recorded as 6 unordered categories, age is recorded as continuous, and gender is recorded as 2 unordered categories.
1. Which link function is appropriate for this model?
2. Which statistical technique would you use?
3. Would you use a TYPE II or a TYPE III test to ascertain if gender is significant?


More information

Joseph O. Marker Marker Actuarial a Services, LLC and University of Michigan CLRS 2010 Meeting. J. Marker, LSMWP, CLRS 1

Joseph O. Marker Marker Actuarial a Services, LLC and University of Michigan CLRS 2010 Meeting. J. Marker, LSMWP, CLRS 1 Joseph O. Marker Marker Actuarial a Services, LLC and University of Michigan CLRS 2010 Meeting J. Marker, LSMWP, CLRS 1 Expected vs Actual Distribution Test distributions of: Number of claims (frequency)

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

More information

Extensions of One-Way ANOVA.

Extensions of One-Way ANOVA. Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa18.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Handling Categorical Predictors: ANOVA

Handling Categorical Predictors: ANOVA Handling Categorical Predictors: ANOVA 1/33 I Hate Lines! When we think of experiments, we think of manipulating categories Control, Treatment 1, Treatment 2 Models with Categorical Predictors still reflect

More information

Extensions of One-Way ANOVA.

Extensions of One-Way ANOVA. Extensions of One-Way ANOVA http://www.pelagicos.net/classes_biometry_fa17.htm What do I want You to Know What are two main limitations of ANOVA? What two approaches can follow a significant ANOVA? How

More information

Simple Linear Regression: One Quantitative IV

Simple Linear Regression: One Quantitative IV Simple Linear Regression: One Quantitative IV Linear regression is frequently used to explain variation observed in a dependent variable (DV) with theoretically linked independent variables (IV). For example,

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Logistic Regressions. Stat 430

Logistic Regressions. Stat 430 Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

lme4 Luke Chang Last Revised July 16, Fitting Linear Mixed Models with a Varying Intercept

lme4 Luke Chang Last Revised July 16, Fitting Linear Mixed Models with a Varying Intercept lme4 Luke Chang Last Revised July 16, 2010 1 Using lme4 1.1 Fitting Linear Mixed Models with a Varying Intercept We will now work through the same Ultimatum Game example from the regression section and

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Reaction Days

Reaction Days Stat April 03 Week Fitting Individual Trajectories # Straight-line, constant rate of change fit > sdat = subset(sleepstudy, Subject == "37") > sdat Reaction Days Subject > lm.sdat = lm(reaction ~ Days)

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Poisson Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Poisson Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Poisson Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Poisson Regression 1 / 49 Poisson Regression 1 Introduction

More information

Chapter 4. Regression Models. Learning Objectives

Chapter 4. Regression Models. Learning Objectives Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

A Re-Introduction to General Linear Models (GLM)

A Re-Introduction to General Linear Models (GLM) A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing

More information

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Chapter 5: Logistic Regression-I

Chapter 5: Logistic Regression-I : Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Interactions in Logistic Regression

Interactions in Logistic Regression Interactions in Logistic Regression > # UCBAdmissions is a 3-D table: Gender by Dept by Admit > # Same data in another format: > # One col for Yes counts, another for No counts. > Berkeley = read.table("http://www.utstat.toronto.edu/~brunner/312f12/

More information