Generalized Linear Models
- Esmond Hunter
- 5 years ago
Generalized Linear Models
Summer School, Manchester University, July 2-6, 2018
Generalized Linear Models: a generic approach to statistical modelling
Graeme.Hutcheson@manchester.ac.uk
University of Manchester
The slides and R-files for this session are available on the course website...
Lecture Slides: Manchester/2018Manchester04GLM.pdf
R-notebook: Manchester/2018Manchester04.Rmd
...or from the course DVD.
This course uses a system of analysis that represents research questions in the form of equations. For example...
mathematics test score ~ gender
success (yes/no) ~ age
salary ~ gender + age + ethnicity
number of arrests ~ age*gender
Representing research questions in this way explicitly identifies the relationships to be tested, the structure of the data and how the model is entered into the analysis programme.
The formulation of the research question in equation format is also useful, as it is the same representation as that used by the Generalized Linear Model (GLM), a statistical model that may be applied to a range of analytical problems.
This lecture provides a relatively non-technical introduction to the GLM. Those looking for more detailed treatments are advised to look at the following texts...
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models (2nd edition). Chapman & Hall/CRC.
Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist: Introductory Statistics Using Generalized Linear Models. Sage Publications.
The Generalized Linear Model
In its simplest form, the GLM is a statistical technique that predicts a single variable (the response variable) using one or more other variables (the explanatory variables).
The response and explanatory variables (also known as the random and systematic components of the model) are linked (~) according to a function that takes account of the measurement scale of the response variable:
response variable ~ explanatory variables
Many link functions are available for GLMs, to take account of the different ways in which the random component (the variable that is being predicted) is distributed (e.g. as a number, a category, a count, a skewed variable, etc.). This course introduces three links that enable continuous, categorical and count response variables to be modelled.
To model a continuous response variable, an identity link is used. To model a count response variable, a log link is used. To model a categorical variable, a logit link is used.
response variable | research equation | link function | linear model
continuous | Y ~ X | identity | $Y = \beta_0 + \beta_1 X$
count | Y ~ X | log | $\log(Y) = \beta_0 + \beta_1 X$
categorical | Y ~ X | logit | $\mathrm{logit}(Y) = \beta_0 + \beta_1 X$
These will be explained in detail in later sessions. At this point, it is enough to realise that different response variables can be modelled by changing the link function.
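A minimal sketch of the three links in the table, written in Python rather than R so that it runs standalone. Strictly, each link g is applied to the expected value of Y (a mean, an expected count or a probability), and its inverse maps the linear model back onto that scale. The dictionary `LINKS` and the helper `predict_mean` are illustrative names, not part of R or Rcmdr.

```python
import math

# Each entry holds (link g, inverse link g^-1).
# g maps the mean of the response onto the scale of beta0 + beta1 * X.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    "logit": (lambda p: math.log(p / (1 - p)),
              lambda eta: 1 / (1 + math.exp(-eta))),
}


def predict_mean(link, beta0, beta1, x):
    """Mean of the response implied by g(mean) = beta0 + beta1 * x."""
    _, inverse = LINKS[link]
    return inverse(beta0 + beta1 * x)


# Applying a link and then its inverse returns the original mean.
for name, (g, g_inv) in LINKS.items():
    print(name, round(g_inv(g(0.25)), 6))
```

With hypothetical coefficients beta0 = 0 and beta1 = 1, `predict_mean("logit", 0.0, 1.0, 0.0)` gives a probability of 0.5, while the same linear prediction under the log link gives an expected count of 1.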
In practice, if we know the measurement scale of the variable being predicted, we can identify an appropriate GLM technique to use...
If Y is continuous: OLS regression.
If Y is a count: Poisson regression.
If Y is ordered categorical: Proportional-odds regression.
If Y is unordered categorical: Multinomial regression.
GLM models are particularly powerful, as they are all conceptually very similar. Learning to apply and interpret results from one technique greatly helps in applying and interpreting results from the others.
GLM: a statistical representation
Up till now, models have been represented using variable names. This way of looking at models is very useful, as it is the way that models are conceptualised and input into the statistical software.
It is also useful, however, to represent models using a more detailed statistical representation, one that corresponds to the way that the results are produced and reported.
The statistical representation simply includes some parameters that quantify the relationships in the data. We not only want to know that Y and X are related; we also want to know HOW they are related, i.e. when X changes by a set amount, what is the effect on Y?
GLM: the parameters
A conceptual model of ice cream consumption (from the IceCream dataset) is...
consumption ~ temperature
which is represented statistically as...
$\text{consumption} = \beta_0 + \beta_1\,\text{temperature}$
where
$\beta_0$ estimates consumption when temperature is zero.
$\beta_1$ estimates the change in consumption for a unit increase in temperature.
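To make the meaning of the two parameters concrete, here is a small Python sketch that fits $\text{consumption} = \beta_0 + \beta_1\,\text{temperature}$ by least squares on a handful of hypothetical values (they are not the IceCream data). It uses the textbook closed form $\beta_1 = S_{xy}/S_{xx}$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$, then checks that raising temperature by one unit changes the prediction by exactly $\beta_1$.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical temperatures and consumption values (not the IceCream data).
temperature = [30, 40, 50, 60, 70]
consumption = [0.30, 0.33, 0.36, 0.39, 0.42]
b0, b1 = fit_simple_ols(temperature, consumption)


def predict(t):
    """Predicted consumption at temperature t."""
    return b0 + b1 * t


# A unit increase in temperature changes the prediction by beta1.
print(round(b1, 4), round(predict(51) - predict(50), 4))
```

These made-up points lie exactly on a line, so the sketch recovers $\beta_1 = 0.003$ and $\beta_0 = 0.21$; with real data the estimates come from the GLM output instead.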
The most interesting statistic for us is the parameter that estimates the relationship between temperature and consumption: the parameter $\beta_1$. The formal description of this parameter is...
For a unit increase in X, the estimated change in Y is $\beta_1$.
For a unit increase in temperature, the estimated change in consumption is $\beta_1$.
The values for the parameters are obtained from the statistical output for the model... $\beta_0 =$ and $\beta_1 =$
The following output was obtained by putting the model consumption ~ temperature into the Rcmdr GLM menu (as consumption is continuous, select the identity link from the Gaussian family)...
glm(formula = Consumption ~ Temperature, family = gaussian(identity), data = IceCream)
Coefficients:
            Estimate Std. Error
(Intercept)
Temperature
These parameters are also represented in the effect display (use the Rcmdr menu options Models, Graphs, Effect plot...).
Figure: Temperature effect plot (Consumption against Temperature)
Computing $\beta_1$: as temperature increases by 40 (from 30 to 70), consumption increases by 0.125 (from 0.3 to 0.425). For a unit increase in temperature, consumption therefore increases by about 0.003 (0.125 / 40 = 0.003125).
Computing $\beta_0$: this is the value of consumption when temperature = 0. A simple estimation from the graph shows $\beta_0 = 0.2$.
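The graph-reading arithmetic above can be sketched in a few lines of Python: the two points read off the effect plot, (30, 0.3) and (70, 0.425), define a straight line whose slope estimates $\beta_1$ and whose value at temperature 0 estimates $\beta_0$. The helper name `line_from_two_points` is illustrative only.

```python
def line_from_two_points(p1, p2):
    """Intercept and slope of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope


# Points read off the Temperature effect plot: (temperature, consumption).
beta0, beta1 = line_from_two_points((30, 0.3), (70, 0.425))
print(round(beta1, 6), round(beta0, 2))
```

This gives $\beta_1 = 0.003125$ and an extrapolated intercept of 0.20625, in line with the rough value of 0.2 read from the graph.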
Categorical explanatory variables...
It is useful at this stage to look at categorical explanatory variables. A detailed description of analysing categorical explanatory variables is available in...
Hutcheson, G. D. (2011). Categorical Explanatory Variables. Journal of Modelling in Management, 6(2). research-training.net/reading/jmm6(2)contrastcodingupdated.pdf
At a basic level, a categorical variable is divided into a number of binary comparisons: each category is compared to a specific category (the reference).
The following slide shows a model of examination score and whether this is related to the school a child attends. school is an unordered categorical variable with three categories (schoola, schoolb and schoolc).
The conceptual model
examination score ~ school
is represented statistically as
$\text{score} = \beta_0 + \beta_1\,\text{schoolb} + \beta_2\,\text{schoolc}$
where
$\beta_0$ estimates score when schoolb and schoolc are both zero (this is equivalent to the score for schoola).
$\beta_1$ estimates the change in score for schoolb compared to schoola.
$\beta_2$ estimates the change in score for schoolc compared to schoola.
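The binary comparisons behind this model can be sketched in Python: school is turned into two 0/1 indicator columns (schoolb and schoolc), with schoola as the reference. For a model whose only predictor is one categorical variable, the least-squares estimates under this coding are the reference-group mean and the differences between each group mean and that reference. The scores and the helper names `treatment_code` and `group_mean` are hypothetical; they are not the schools.csv data.

```python
def treatment_code(levels, reference):
    """0/1 indicator columns for every level except the reference."""
    others = [lev for lev in sorted(set(levels)) if lev != reference]
    return {lev: [1 if value == lev else 0 for value in levels] for lev in others}


def group_mean(scores, levels, level):
    """Mean score for the observations in one category."""
    chosen = [s for s, lev in zip(scores, levels) if lev == level]
    return sum(chosen) / len(chosen)


# Hypothetical scores for three schools (not the schools.csv data).
school = ["schoola", "schoola", "schoolb", "schoolb", "schoolc", "schoolc"]
score = [60, 66, 70, 72, 55, 57]

dummies = treatment_code(school, reference="schoola")

# With one categorical predictor, least squares reproduces the group means.
beta0 = group_mean(score, school, "schoola")           # schoola mean
beta1 = group_mean(score, school, "schoolb") - beta0   # schoolb minus schoola
beta2 = group_mean(score, school, "schoolc") - beta0   # schoolc minus schoola

# score = beta0 + beta1*schoolb + beta2*schoolc gives each school's mean.
fitted = [beta0 + beta1 * b + beta2 * c
          for b, c in zip(dummies["schoolb"], dummies["schoolc"])]
print(dummies)
print(beta0, beta1, beta2)
print(fitted)
```

A schoola pupil has schoolb = schoolc = 0, so the prediction is just $\beta_0$; this is why the intercept is interpreted as the reference-category score.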
The following output from the schools.csv dataset was obtained by putting the model score ~ school into the Rcmdr GLM menu (as score is continuous, select the identity link from the Gaussian family)...
glm(formula = SCORE ~ SCHOOL, family = gaussian(identity), data = schools)
Coefficients:
                  Estimate Std. Error
(Intercept)
SCHOOL[T.schoolB]
SCHOOL[T.schoolC]
The predicted score at schoola is 63.2; schoolb is 7.4 points higher (70.6) and schoolc is 7.6 points lower (56.6).
This result can be easily seen in the effect display that accompanies the model (figure: SCORE plotted for each SCHOOL category: schoola, schoolb and schoolc).
$\beta_0 = 63.2$: the value of score for schoola (the reference category).
$\beta_1 = 7.4$: schoolb is 7.4 units higher than schoola.
$\beta_2 = -7.6$: schoolc is 7.6 units lower than schoola (the plot marks schoolc at 56.6).
Interpreting parameters for other GLMs
The interpretation of the regression coefficients is essentially the same for all GLMs. For example, suppose we want to model the number of checks a suspect appears on and whether this is related to the person's age (these data are from the Arrests dataset in the effects package)...
number of checks ~ age
As the number of checks is a count variable, a log link should be used. The statistical model for this is...
$\log(\text{checks}) = \beta_0 + \beta_1\,\text{age}$
The parameter $\beta_0$ indicates the value of log(checks) when age equals zero. The parameter $\beta_1$ indicates the change in log(checks) for a unit change in age. (Strictly, the model is for the log of the expected number of checks.)
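For readers curious about what the software does with a log link, here is a minimal Python sketch that fits $\log(\text{checks}) = \beta_0 + \beta_1\,\text{age}$ by maximum likelihood using Newton-Raphson, which for the Poisson family with its canonical log link is the same as iteratively reweighted least squares. The ages and counts are hypothetical (not the Arrests data), and `fit_poisson_log` is an illustrative name; in practice `glm(..., family = poisson(log))` does this for you.

```python
import math


def fit_poisson_log(x, y, iterations=100, tol=1e-10):
    """Maximum-likelihood (beta0, beta1) for log(E[y]) = beta0 + beta1 * x,
    found by Newton-Raphson (equivalent to IRLS for the log link)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the empty model
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X'(y - mu).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix X'WX with W = diag(mu).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse(X'WX) times the score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical ages and numbers of checks (not the Arrests data).
age = [18, 22, 25, 30, 35, 40, 45, 50, 55, 60]
checks = [0, 1, 1, 1, 2, 2, 2, 3, 2, 4]
b0, b1 = fit_poisson_log(age, checks)
print(round(b0, 3), round(b1, 4))
```

At the solution the fitted expected counts add up to the observed total, which is a handy check that the iterations have converged.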
The following output was obtained by putting the model checks ~ age into the Rcmdr GLM menu (as checks is a count, select the log link from the Poisson family)...
glm(formula = checks ~ age, family = poisson(log), data = Arrests)
Coefficients:
            Estimate Std. Error
(Intercept)
age
When age = 0, log(checks) =
For a unit increase in age, log(checks) changes by
This result can be easily seen in the effect display that accompanies the model... The following graph shows the relationship between age and log(checks).
Figure: age effect plot (checks against age)
$\beta_0$ (the value given in the output above): a simple estimation from the graph shows that when age = 0, log(checks) = 0.15.
$\beta_1$ (the value given in the output above): as age increases from 20 to 60 (an increase of 40), log(checks) increases from 0.43 to 1 (an increase of 0.57). For each unit increase in age, log(checks) therefore increases by 0.57 / 40 ≈ 0.014.
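The same two-point reading can be sketched in Python for the log scale, using the values read off the age effect plot. Because the link is the log, exponentiating $\beta_1$ also gives the multiplicative change in the expected number of checks for each extra year of age. That back-transformation is a standard property of the log link, added here for illustration; it is not shown on the slide.

```python
import math

# Values read off the age effect plot: (age, log(checks)).
age1, log1 = 20, 0.43
age2, log2 = 60, 1.00

beta1 = (log2 - log1) / (age2 - age1)  # change in log(checks) per year
beta0 = log1 - beta1 * age1            # extrapolated value at age 0
rate_ratio = math.exp(beta1)           # multiplicative change in expected checks
print(round(beta1, 5), round(beta0, 3), round(rate_ratio, 4))
```

The slope is 0.01425 and the implied intercept is 0.145, consistent with the 0.15 read from the graph. exp(0.01425) ≈ 1.014, i.e. roughly a 1.4% increase in the expected number of checks per year of age.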
The model parameters for GLM models have an easy interpretation, one that is essentially the same for all GLM models. The parameters can be interpreted from the standard statistical output shown in the tables, or from the effect displays.
Assessing significance...
Assessing significance for GLMs
In order to interpret the models, it is useful to know the significance associated with each of the parameter estimates. Although consumption may increase by $\beta_1$ for each unit increase in temperature, there is no indication of whether this increase may have been due to chance.
In addition to the parameter estimates, we usually also want to know which parameters, or groups of parameters, are significant.
This is where GLMs excel, as they employ a simple method for determining significance, one that applies generally to all GLMs.
Deviance
Significance is assessed in GLMs using a common method based on a statistic known as the deviance.
The deviance is simply a measure of the difference between the values predicted by the model and the actual values (predicted values compared to the observed data). If the model provides a good prediction of the response variable, the deviance will be relatively small; if the model does not provide accurate predictions of the response variable, the deviance will be relatively large. (For a Gaussian model with the identity link, the deviance is the residual sum of squares.)
The deviance basically gives an indication of how well the model fits the data.
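Here is a small Python sketch of two deviance formulas, to show how the deviance measures the gap between observed values y and model predictions mu. Gaussian: the residual sum of squares. Poisson: $2\sum [y \log(y/\mu) - (y - \mu)]$, taking $0 \log 0 = 0$. The observed counts and both sets of predictions are hypothetical.

```python
import math


def gaussian_deviance(y, mu):
    """Gaussian (identity-link) deviance: the residual sum of squares."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)), with 0*log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2 * total


observed = [2, 0, 3, 1]
good_fit = [1.9, 0.2, 2.8, 1.1]  # hypothetical predictions close to the data
poor_fit = [1.5, 1.5, 1.5, 1.5]  # hypothetical predictions far from the data

# Predictions that track the data give a smaller deviance.
print(poisson_deviance(observed, good_fit) < poisson_deviance(observed, poor_fit))
```

Both functions return 0 when the predictions equal the observations exactly, and grow as the predictions move away from the data.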
Using the deviance, it is easy to determine the significance of individual variables and/or groups of variables by comparing nested models. For example, to assess whether temperature is a significant predictor of ice cream consumption, we can compare a model of consumption that includes temperature with a model that does not.
$\text{consumption} = \beta_0 + \beta_1\,\text{temperature}$    deviance =
$\text{consumption} = \beta_0$    deviance = 0.126
The effect that temperature has had on the model is to reduce the deviance by the difference between these two values. All of this information is given in the analysis output...
The Null deviance is the deviance in the response variable without taking into account any other information (the model with only $\beta_0$). The Residual deviance is the deviance in the model that includes the explanatory variables.
glm(formula = Consumption ~ Temperature, family = gaussian(identity), data = IceCream)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               e-09
Temperature                               e-07
Null deviance:  on 29 degrees of freedom
Residual deviance:  on 28 degrees of freedom
Analysis of Deviance Table (Type II tests)
Response: Consumption
            SS Df F Pr(>F)
Temperature           e-07
Residuals
The model above was obtained by running a GLM on the IceCream data and then requesting an analysis of deviance table. These commands can be issued via the Rcmdr menus, or entered directly into a script file using the commands...
library(car)  # provides Anova()
# IceCream is assumed to be loaded already (e.g. imported via Rcmdr)
GLM.1 <- glm(Consumption ~ Temperature, family = gaussian(identity), data = IceCream)
summary(GLM.1)
Anova(GLM.1, type = "II", test = "F")
The change in deviance associated with removing temperature from the model is assessed for significance using the F-test. A detailed description of significance tests and deviance is provided in...
Hutcheson, G. D. and Moutinho, L. (2008). Statistical Modelling for Management. Sage Publications.
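The null deviance, the residual deviance and the F-test on their difference can be sketched in Python for a Gaussian model. The five data points are hypothetical: the IceCream output above has 30 observations, hence its 29 and 28 degrees of freedom. The statistic used is the standard nested-model F, (drop in deviance / drop in df) / (residual deviance / residual df). A p-value would then come from the F distribution, for example `scipy.stats.f.sf(f_stat, 1, resid_df)`, which is left out here to keep the sketch dependency-free.

```python
def fit_simple_ols(x, y):
    """Least-squares (beta0, beta1) for y = beta0 + beta1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    return y_bar - beta1 * x_bar, beta1


def rss(y, fitted):
    """Gaussian deviance: the residual sum of squares."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data (not the IceCream values).
temperature = [30, 40, 50, 60, 70]
consumption = [0.31, 0.32, 0.37, 0.38, 0.42]
n = len(consumption)

# Null model: consumption = beta0, so every fitted value is the mean.
y_bar = sum(consumption) / n
null_dev, null_df = rss(consumption, [y_bar] * n), n - 1

# Model with temperature: consumption = beta0 + beta1 * temperature.
b0, b1 = fit_simple_ols(temperature, consumption)
resid_dev = rss(consumption, [b0 + b1 * t for t in temperature])
resid_df = n - 2

# F statistic for the drop in deviance when temperature is added.
drop = null_dev - resid_dev
f_stat = (drop / (null_df - resid_df)) / (resid_dev / resid_df)
print(round(null_dev, 5), round(resid_dev, 5), round(f_stat, 2))
```

Here adding temperature reduces the deviance from 0.0082 to 0.00036, a drop of 0.00784 on 1 degree of freedom, giving F ≈ 65.33 on 1 and 3 degrees of freedom.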
Significance of temperature: a visual indication of the significance of temperature can be seen in the effect display. It is easy to see from the line and its associated confidence interval (the shaded area of the plot) that predictions of consumption differ as temperature changes. Information about temperature is, therefore, important for predicting consumption.
The significance of temperature in the model can be calculated manually by comparing the nested models...
model01: $\text{consumption} = \beta_0$
model02: $\text{consumption} = \beta_0 + \beta_1\,\text{temperature}$
This can be easily achieved using the Rcmdr Models, Hypothesis tests, Compare two models... menu option.
The output clearly shows which models are being compared. Note that the statistics for Temperature are the same as those provided previously...
Rcmdr> anova(model01, model02, test="F")
Analysis of Deviance Table
Model 1: Consumption ~ 1
Model 2: Consumption ~ Temperature
  Resid. Df Resid. Dev Df Deviance F Pr(>F)
The same underlying theory may be applied to testing categorical explanatory variables and different response variables. The following example shows a binary response variable being predicted by a categorical and a continuous explanatory variable. These data are from the Arrests dataset in the effects package (to reproduce these results, don't forget to convert the variable year into a categorical variable, yearcat).
The model we will be investigating is...
released ~ yearcat + age
which is represented statistically as...
$\mathrm{logit}(\text{released}) = \beta_0 + \beta_1\,\text{year}_{1998} + \beta_2\,\text{year}_{1999} + \beta_3\,\text{year}_{2000} + \beta_4\,\text{year}_{2001} + \beta_5\,\text{year}_{2002} + \beta_6\,\text{age}$
or represented more succinctly as...
$\mathrm{logit}(\text{released}) = \beta_0 + \sum_{y=1}^{5} \beta_y\,\text{yearcat}_y + \beta_6\,\text{age}$
The categorical variable yearcat is represented using 5 binary comparisons; each year is compared to the reference category (1997).
A GLM (using the logit link) is shown below...
glm(formula = released ~ age + yearcat, family = binomial(logit), data = Arrests)
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)                                 < 2e-16
age
yearcat[T.1998]
yearcat[T.1999]
yearcat[T.2000]                               e-06
yearcat[T.2001]
yearcat[T.2002]
Null deviance:  on 5225 degrees of freedom
Residual deviance:  on 5219 degrees of freedom
The logit model of released shows the deviance values at the bottom of the tabular output. The Null deviance tells us how much deviance there is in the variable released (an empty model), and the Residual deviance tells us how much deviance there is in the model with all the explanatory variables included.
In order to find out if each variable is significant, we need to look at the analysis of deviance table...
Analysis of Deviance Table (Type II tests)
Response: released
        LR Chisq Df Pr(>Chisq)
age                          **
yearcat                      ***
From the Analysis of Deviance table, we can see that including yearcat in the model results in a reduction in deviance (the LR Chisq value for yearcat). This is the result of comparing the deviance of the following models...
$\mathrm{logit}(\text{released}) = \beta_0 + \beta_1\,\text{year}_{1998} + \beta_2\,\text{year}_{1999} + \beta_3\,\text{year}_{2000} + \beta_4\,\text{year}_{2001} + \beta_5\,\text{year}_{2002} + \beta_6\,\text{age}$
and
$\mathrm{logit}(\text{released}) = \beta_0 + \beta_6\,\text{age}$
This change in deviance is significant and provides evidence that the variable yearcat may be influential in predicting whether or not someone is released.
These models can be compared using the Rcmdr Models, Hypothesis tests, Compare two models... menu option.
Analysis of Deviance Table
Model 1: released ~ age
Model 2: released ~ yearcat + age
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
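Including the variable age in the model results in a reduction in deviance (the LR Chisq value for age). This comparison involves removing 1 parameter from the model (df = 1):
$\mathrm{logit}(\text{released}) = \beta_0 + \beta_1\,\text{year}_{1998} + \beta_2\,\text{year}_{1999} + \beta_3\,\text{year}_{2000} + \beta_4\,\text{year}_{2001} + \beta_5\,\text{year}_{2002} + \beta_6\,\text{age}$
and
$\mathrm{logit}(\text{released}) = \beta_0 + \beta_1\,\text{year}_{1998} + \beta_2\,\text{year}_{1999} + \beta_3\,\text{year}_{2000} + \beta_4\,\text{year}_{2001} + \beta_5\,\text{year}_{2002}$
This change in deviance is significant and provides evidence that the variable age may be influential in predicting whether or not someone is released.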
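For a binary (logit) model, the change in deviance is compared with a chi-square distribution on the number of parameters removed. With 1 degree of freedom, as for age, the tail probability has a closed form, $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, so a Python sketch needs only the standard library. The two deviance values are hypothetical, not the Arrests results.

```python
import math


def lr_test_df1(dev_without, dev_with):
    """Change-in-deviance (likelihood-ratio) test for dropping one parameter.

    For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    """
    change = dev_without - dev_with
    p_value = math.erfc(math.sqrt(change / 2))
    return change, p_value


# Hypothetical residual deviances for released ~ yearcat and released ~ yearcat + age.
change, p = lr_test_df1(dev_without=112.0, dev_with=102.0)
print(change, round(p, 4))
```

A reduction of 10 on 1 degree of freedom gives p ≈ 0.0016. The familiar 5% critical value of 3.84 corresponds to p = 0.05. For yearcat (5 degrees of freedom) the general chi-square distribution is needed, which is what the Analysis of Deviance table reports.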
These models can be compared using the Rcmdr Models, Hypothesis tests, Compare two models... menu option.
Analysis of Deviance Table
Model 1: released ~ yearcat
Model 2: released ~ yearcat + age
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
These results can also be inferred from the effect displays. Although these do not provide estimates of significance, the graphics provide enough information to identify the important relationships visually.
Figure: yearcat effect plot and age effect plot (released against yearcat, and released against age)
98 In the standard regression output for logit models, the significance of each parameter (including each category of a categorical variable) is estimated using the z-distribution. It should be noted that this is a large-sample approximation and the deviance statistic is preferable for assessing significance. Full information about this is available in...
Hutcheson, G. D. and Moutinho, L. (2008). Statistical Modelling for Management. Sage Publications.
101 Type II and Type III ANOVA tests
The choice of ANOVA test used to compare models is very important, particularly for models with interactions.
Type III ANOVA tests are computed for individual parameters and are provided in the standard tabular output.
Type II ANOVA tests are computed for variables and are provided in the Analysis of Deviance table.
The difference between the two types of test is very important and will be highlighted using examples later in the course. It can be illustrated using the following model (from the ICEcream data), which contains an interaction...
consumption ~ temperature * income
102 consumption ~ temperature * income
Type III tests
Coefficients:
                     Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)          1.216e…    …            …         …
Temperature          …          …            …         …
Income               7.459e…    …            …         …
Temperature:Income   6.250e…    …            …         …
Type II tests
                     SS   Df   F   Pr(>F)
Temperature          …    …    …   …e-08 ***
Income               …    …    …   …     **
Temperature:Income   …    …    …   …
104 The significance of the variables temperature and income differs depending on the type of test used. The Type III test for temperature compares the models...
$\text{consumption} = \beta_0 + \beta_1\,\text{temperature} + \beta_2\,\text{income} + \beta_3\,\text{temperature:income}$
and
$\text{consumption} = \beta_0 + \beta_2\,\text{income} + \beta_3\,\text{temperature:income}$
whilst the Type II test for temperature compares the models...
$\text{consumption} = \beta_0 + \beta_1\,\text{temperature} + \beta_2\,\text{income}$
and
$\text{consumption} = \beta_0 + \beta_2\,\text{income}$
106 The Type III test does not provide an indication of the significance of the variable temperature. For that, we need to compare a model that includes temperature with a model that does not, but BOTH of these models include temperature (the second one through the temperature:income interaction).
The Type II test does compare models that differ only in whether temperature is included and is, therefore, more appropriate for assessing the effect of temperature.
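To make the two comparisons concrete, the Python sketch below (standard library only; the lecture itself uses R) fits each pair of models by ordinary least squares and reports the extra sum of squares attributed to temperature under each comparison. The data are made-up numbers, not the ICEcream data, and `solve` and `rss` are illustrative helpers.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)   # normal equations (X'X) beta = X'y
    return sum((yi - sum(bj * v for bj, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


# Hypothetical data (NOT the ICEcream data)
temperature = [2, 4, 5, 7, 8, 10, 11, 13, 9, 6, 3, 12]
income = [5, 3, 6, 4, 7, 5, 8, 6, 3, 7, 4, 5]
consumption = [1.1, 1.6, 2.0, 2.3, 2.9, 3.1, 3.8, 4.0, 2.6, 2.4, 1.2, 3.5]
interaction = [t * i for t, i in zip(temperature, income)]

# Type III comparison for temperature (as on the slide): both models keep temperature:income
type3_ss = (rss([income, interaction], consumption)
            - rss([temperature, income, interaction], consumption))
# Type II comparison for temperature: models with and without temperature, no interaction
type2_ss = rss([income], consumption) - rss([temperature, income], consumption)

print(f"extra SS for temperature, Type III comparison: {type3_ss:.4f}")
print(f"extra SS for temperature, Type II comparison:  {type2_ss:.4f}")
```

A further symptom of the problem described above: adding a constant to income leaves the Type II sum of squares unchanged, but generally changes the Type III comparison, because temperature × (income + c) equals temperature:income plus c × temperature, so the "reduced" model quietly retains part of the temperature effect.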
109 Traditional tests and equivalent GLM models
The GLM models reproduce or replace many of the traditional tests. For example, tests for independent group designs...
Traditional test                                          GLM
One independent variable:
  t-test (unrelated); Mann-Whitney; 1-way ANOVA           Y ~ X
  (unrelated); Kruskal-Wallis; Jonckheere Trend;
  chi-square (contingency table); etc.
Multiple independent variables:
  complex selection of multi-way ANOVA models;            Y ~ X1 + X2
  multi-way contingency tables (log-linear)
110 Traditional tests and equivalent GLM models... and tests for dependent (or matched) group designs...
Traditional test                                          GLM
One independent variable:
  paired t-test; Wilcoxon; 1-way ANOVA (related);         Y ~ subject + X
  Friedman; Page's L trend; etc.
Multiple independent variables:
  complex selection of multi-way ANOVA models;            Y ~ subject + X1 + X2
  multi-way contingency tables (log-linear)
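For the independent-groups case, the equivalence between the unrelated t-test and the model Y ~ X can be checked numerically. In the Python sketch below the group variable X is coded 0/1, and the pooled-variance t statistic equals the t statistic for the slope of the least-squares regression of Y on X. The scores are hypothetical and the function names are illustrative.

```python
import math
from statistics import mean


def pooled_t(a, b):
    """Independent-samples (unrelated) t statistic with pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_within = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss_within / (na + nb - 2)
    return (mb - ma) / math.sqrt(sp2 * (1 / na + 1 / nb))


def regression_t(x, y):
    """t statistic for the slope in the least-squares model y ~ x."""
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


group_a = [12.1, 14.3, 11.8, 13.5, 12.9]         # hypothetical scores, coded X = 0
group_b = [15.2, 14.8, 16.1, 13.9, 15.5, 16.4]   # hypothetical scores, coded X = 1
y = group_a + group_b
x = [0] * len(group_a) + [1] * len(group_b)

print(f"t-test: t = {pooled_t(group_a, group_b):.4f}")
print(f"Y ~ X:  t = {regression_t(x, y):.4f}")
```

The two agree because the fitted slope is the difference in group means, and the regression's residual sum of squares is the pooled within-group sum of squares.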
112 In order to realise the power of the GLMs, it is important to understand the equivalence of the traditional tests and the GLMs. Detailed information about this is provided in...
Hutcheson, G. D. and Schaefer, L. (2012). Test selection in the 21st century. Journal of Modelling in Management, 7(3): http://
119 The usefulness of the equation format for representing statistical models is evident when the analyses are run in the software. In order to select the appropriate GLM:
1. Define the equation.
2. Identify the scale of measurement of the response variable.
3. Select the appropriate model in the Rcmdr (Statistics > Fit models):
- Generalized linear model... for continuous and count responses
- Multinomial logit model... for unordered categorical responses
- Ordinal regression model... for ordered categorical responses
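Steps 2 and 3 amount to a lookup from the response variable's scale of measurement to a model. The Python sketch below writes that lookup down. The Rcmdr dialog names come from the slide; the error families and link functions are the usual defaults and are an assumption added here, as is the "binary" entry (binary responses, such as released, are also fitted through the Generalized linear model dialog with a binomial family). `MODEL_FOR_SCALE` and `select_glm` are hypothetical names.

```python
# Response scale -> (Rcmdr dialog, error family, default link). The families
# and links are the usual defaults, assumed here rather than taken from the slides.
MODEL_FOR_SCALE = {
    "continuous": ("Generalized linear model...", "gaussian", "identity"),
    "count": ("Generalized linear model...", "poisson", "log"),
    # Binary responses (as in the logit(released) example) also use the GLM dialog
    "binary": ("Generalized linear model...", "binomial", "logit"),
    "unordered categorical": ("Multinomial logit model...", "multinomial", "logit"),
    "ordered categorical": ("Ordinal regression model...",
                            "cumulative (proportional odds)", "logit"),
}


def select_glm(response_scale: str):
    """Step 3: look up the model for the response variable's scale of measurement."""
    try:
        return MODEL_FOR_SCALE[response_scale]
    except KeyError:
        raise ValueError(f"unknown scale of measurement: {response_scale!r}") from None


print(select_glm("count"))
```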
121 Then enter the equation...
126 Exercises...
127 Your research model is... salary ~ age + qualification
where salary and age are continuous and qualification is recorded in 4 ordered categories.
1. Which link function is appropriate for this model?
2. How many parameters would you expect to see in the model for qualification?
3. Which statistical technique would you use?
4. What is the statistical model (the linear model)?
128 Your research model is... A-level ~ IQ + gender
where A-level is recorded as 6 ordered categories, IQ is recorded as continuous, and gender is recorded as 2 unordered categories.
1. Which link function is appropriate for this model?
2. How many parameters would you expect to see in the model for IQ?
3. Which statistical technique would you use?
4. What is the statistical model (the linear model)?
129 Your research model is... traffic violations ~ age gender
where traffic violations is recorded as the number of recorded violations (0 to 7), age is recorded as continuous, and gender is recorded as 2 unordered categories.
1. Which link function is appropriate for this model?
2. How many parameters would you expect to see in the model (including the intercept)?
3. Which statistical technique would you use?
4. What is the statistical model (the linear model)?
5. Would you use a Type II or a Type III test to ascertain if age is significant?
130 Your research model is... Holiday destination ~ age gender
where Holiday destination is recorded as 6 unordered categories, age is recorded as continuous, and gender is recorded as 2 unordered categories.
1. Which link function is appropriate for this model?
2. Which statistical technique would you use?
3. Would you use a Type II or a Type III test to ascertain if gender is significant?
More informationChapter 4. Regression Models. Learning Objectives
Chapter 4 Regression Models To accompany Quantitative Analysis for Management, Eleventh Edition, by Render, Stair, and Hanna Power Point slides created by Brian Peterson Learning Objectives After completing
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More information1 Introduction to Minitab
1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you
More informationA Re-Introduction to General Linear Models (GLM)
A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing
More informationReview of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models
Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationOne-Way ANOVA. Some examples of when ANOVA would be appropriate include:
One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationChapter 5: Logistic Regression-I
: Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More informationInteractions in Logistic Regression
Interactions in Logistic Regression > # UCBAdmissions is a 3-D table: Gender by Dept by Admit > # Same data in another format: > # One col for Yes counts, another for No counts. > Berkeley = read.table("http://www.utstat.toronto.edu/~brunner/312f12/
More information