Model Estimation Example


Ronald H. Heck
EDEP 606: Multivariate Methods (S2013)
April 7, 2013

As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions of various model estimation methods come up regularly in factor analysis, structural equation models, mixed (or multilevel) models, and generalized linear models (i.e., models for dichotomous, ordinal, multinomial, and count outcomes). Model estimation attempts to determine the extent to which a model-implied covariance (or correlation) matrix is a good approximation of the sample covariance matrix. In general, confirmation of a proposed model relies on the retention of the null hypothesis, that is, that the data are consistent with the model hypothesized (Marcoulides & Hershberger, 1997). Failure to reject this null hypothesis implies that the proposed model is a plausible representation of the data, although it is important to note that it may not be the only plausible representation of the data.

As Marcoulides and Hershberger (1997) note, evaluating the difference between the two covariance matrices based on the proposed model depends on the estimation method used to solve for the model's parameters [e.g., generalized least squares (GLS), maximum likelihood (ML), weighted least squares (WLS)]. Each approach proceeds iteratively to solve the model-implied equations until an optimal solution for the model parameters is obtained (i.e., where the implied covariance matrix is close to the observed covariance matrix). The difference between the model-implied matrix and the sample matrix is described by a discrepancy function, that is, a way of weighting the differences between the observed (S) and model-implied (Ŝ) covariance matrices. In matrix terms, we can define this as

F = (s − ŝ)′W(s − ŝ),   (1)

where s and ŝ are the nonduplicated elements of the observed and implied covariance matrices S and Ŝ, arranged as vectors.
The goal of the analysis is to minimize this function by taking partial derivatives of it with respect to the model parameters that determine the elements of the implied covariance matrix Ŝ. So, for example, if we have a 3 x 3 covariance matrix, the lower part of the matrix would become a six-element vector (3 variances and 3 covariances), and (s − ŝ) would contain the differences between the elements in the two covariance matrices (Loehlin, 1992). The exact form of the discrepancy function is different for each estimation method, and each can have its own set of advantages and disadvantages. In Eq. 1 above, W is a weight matrix, and different versions of it (i.e., ML, GLS, WLS) will yield different criteria for weighting the differences between the corresponding elements in the observed and implied covariance matrices. If W in Eq. 1 is an identity (I) matrix (which has 1s as the diagonal elements and 0s as the off-diagonal elements), the expression reduces to (s − ŝ)′(s − ŝ). This is just the sum of the squared differences between the elements of the observed and implied covariance

matrices, which happens to be the ordinary least squares (OLS) criterion. Unweighted least squares (ULS) estimation is the same as OLS, in that the weight matrix is also just an identity matrix. Loehlin describes this in terms of squaring because the expression above amounts to the product of the two deviation vectors. If the two matrices are identical, the value of the expression will be 0. The greater the difference between the two matrices, the larger the squared differences in their elements will be. The sum of these is the discrepancy function (F). The larger the discrepancy function becomes, the worse the fit, which implies less similarity between the elements in the two matrices. Model estimation involves trying to minimize F by seeking values of the unknown model parameters that make the implied covariance matrix as much like the observed covariance matrix as possible (Loehlin, 1992). OLS is most appropriate, for example, when the variables in the covariance matrix are measured on the same type of scale.

In comparison to OLS (or ULS) estimation, GLS, ML, and WLS require considerably more computation. As Loehlin (1992) notes, for variables that are normally distributed (or relatively so), Eq. 1 reduces to the following:

F = 1/2 tr{[(S − Ŝ)V]²},   (2)

where tr is the trace of the matrix (i.e., the sum of the diagonal elements) and V is another weight matrix. This formulation helps clarify the differences between ULS, GLS, and ML estimation. As noted, for ULS the weight matrix V is the identity (V = I). For GLS, it is the inverse of the sample covariance matrix (V = S⁻¹), and for ML it is defined as the inverse of the model-implied covariance matrix (V = Ŝ⁻¹).
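As a concrete sketch of Eqs. 1 and 2 (a minimal Python illustration with invented matrices, not code from any of the cited sources):

```python
import numpy as np

def vech(m):
    """Stack the nonduplicated (lower-triangular) elements of a
    symmetric matrix into a vector, e.g., 3 variances plus 3
    covariances for a 3 x 3 covariance matrix."""
    return m[np.tril_indices(m.shape[0])]

def discrepancy(S, S_hat, W=None):
    """Eq. 1: F = (s - s_hat)' W (s - s_hat). With W = I this is the
    OLS/ULS criterion: the sum of squared element differences."""
    d = vech(S) - vech(S_hat)
    if W is None:
        W = np.eye(d.size)  # identity weight matrix
    return d @ W @ d

def trace_discrepancy(S, S_hat, V):
    """Eq. 2: F = 1/2 * tr{[(S - S_hat) V]^2}. The choice of V
    distinguishes ULS (V = I), GLS (V = inv(S)), ML (V = inv(S_hat))."""
    D = (S - S_hat) @ V
    return 0.5 * np.trace(D @ D)

# Hypothetical observed and model-implied 3 x 3 covariance matrices
S = np.array([[4.0, 1.2, 0.8],
              [1.2, 3.0, 0.5],
              [0.8, 0.5, 2.5]])
S_hat = np.array([[4.0, 1.0, 0.9],
                  [1.0, 3.0, 0.5],
                  [0.9, 0.5, 2.5]])

print(discrepancy(S, S))         # identical matrices -> 0
print(discrepancy(S, S_hat))     # OLS sum of squared differences
print(trace_discrepancy(S, S_hat, np.eye(3)))             # ULS
print(trace_discrepancy(S, S_hat, np.linalg.inv(S)))      # GLS
print(trace_discrepancy(S, S_hat, np.linalg.inv(S_hat)))  # ML
```

Each criterion is 0 when the two matrices coincide; they differ only in how the element-by-element discrepancies are weighted.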
Because the ML discrepancy function uses the inverse of the model-implied covariance matrix Ŝ, which has to be recalculated at each iteration, ML estimation is more challenging under certain conditions. It should be noted that ML is typically defined somewhat differently from Eq. 2 (Loehlin, 1992):

F_ML = tr(SŜ⁻¹) − p + ln|Ŝ| − ln|S|,   (3)

that is, the discrepancy function is defined in terms of the trace (i.e., the sum of the diagonal elements) of the product of the sample covariance matrix and the inverse of the model-implied covariance matrix, and the natural logarithms of the determinants of the model-implied and sample covariance matrices, given the number of variables (p) in the matrix. This leads to similar minimizing of the discrepancy function (Loehlin, 1992); however, it is often advantageous to work with logarithms, which can make solving the discrepancy function easier. Note that in each of these cases, we are assuming that only the covariance matrices are being estimated (modeling mean structures simply requires additional terms added to each discrepancy function).

As this discussion suggests, each general approach to model estimation rests on a somewhat different set of assumptions and statistical theory underlying the estimation of various kinds of models. Since GLS uses the inverse of the sample covariance matrix S as the weight matrix, an advantage is that the weight matrix only needs to be calculated once, since S does not change (GLS has therefore been described as ML estimation with a single iteration). As noted, ML depends on the model-implied covariance matrix and therefore typically requires more complex calculations (with multivariate normality and large sample sizes, however, GLS and

ML will produce very similar estimates). In cases where the outcomes are categorical (e.g., dichotomous, ordinal), estimation is considerably more complex than in OLS regression models for continuous outcomes, since it depends on estimating probability relationships that follow sampling distributions other than the normal distribution. Such models (referred to as generalized linear models) therefore require iterative techniques such as ML to solve the implied set of relationships.

GLS and ML can be used to derive a chi-square fit index through the calculation of

χ² = (N − 1)F_min,   (4)

where F_min is the value of the discrepancy function at the point of best fit and N is the sample size. As you have likely encountered, however, this model fit index is not always favored because of its reliance on sample size, which can lead to rejecting relatively good-fitting models in larger samples. Empirical work suggests ML estimation will work reasonably well with skewness within +/-2 and kurtosis within +/-7 (West et al., 1995). WLS can also provide a chi-square fit index, and it does not depend on multivariate normality (i.e., it is often used with ordinal types of outcomes in SEM). However, WLS is based on the variances and covariances among the vector elements (s) of the observed covariance matrix S. So as the original covariance matrix S gets larger, the vector s of its nonduplicated elements increases rapidly in length, and the weight matrix, whose size is the square of the length of that vector, can become quite large and demanding in terms of the calculation of model parameters. Therefore, WLS typically requires much larger sample sizes than ML and GLS estimation. For ML and GLS, model convergence problems certainly increase in samples of 100 or fewer (and with fewer than 3 indicators per factor in factor models). Heywood cases in factor models are also very likely to occur under those sorts of conditions.
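Tying Eqs. 3 and 4 together, here is a minimal numerical sketch (the matrices and sample size are hypothetical, chosen only to illustrate the calculations):

```python
import numpy as np

def f_ml(S, S_hat):
    """Eq. 3: F_ML = tr(S S_hat^-1) - p + ln|S_hat| - ln|S|.
    Equals 0 when the model-implied matrix reproduces the observed
    matrix exactly."""
    p = S.shape[0]
    return (np.trace(S @ np.linalg.inv(S_hat)) - p
            + np.log(np.linalg.det(S_hat)) - np.log(np.linalg.det(S)))

# Hypothetical observed and model-implied covariance matrices
S = np.array([[4.0, 1.2],
              [1.2, 3.0]])
S_hat = np.array([[4.0, 1.0],
                  [1.0, 3.0]])

F_min = f_ml(S, S_hat)        # discrepancy at the (assumed) best fit
N = 500                       # hypothetical sample size
chi_square = (N - 1) * F_min  # Eq. 4: model chi-square

print(f_ml(S, S))   # ~0 for a perfectly fitting model
print(chi_square)
```

Note how the same F_min yields a larger chi-square as N grows, which is exactly the sample-size sensitivity described above.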
Where one has enough cases and at least 3 indicators per factor, convergence becomes less of a problem.

ML Estimation

ML estimation is probably the approach most often used to estimate various types of models with interval and categorical outcomes, but it does depend on relatively large sample sizes (we can use restricted maximum likelihood in small samples). ML estimation determines the optimal population values for the parameters in a model, that is, the values that reduce the discrepancy between the observed and implied matrices, given the current parameter estimates (Hox, 2010). As noted, in ML estimation the discrepancy function is defined in terms of a likelihood function (or likelihood) that the model with a particular set of estimates could have produced the observed covariance matrix. In many cases (since the functions involved may be exponential in nature), it is more convenient to work in terms of the natural logarithm of the likelihood function, called the log-likelihood. One advantage of the log-likelihood is that its terms are additive (instead of multiplicative). Because the likelihood of the data can vary from 0.0 to 1.0, rather than maximizing the likelihood function directly, ML uses a more conceptually convenient function that is inversely related to the likelihood function (the discrepancy function described previously), such that the smaller this discrepancy function is, the greater the likelihood that the model with a particular set of parameter estimates could have produced the sample covariance matrix (S). The value will be 0 if the model fits the data perfectly (i.e., the natural log of 1 = 0).
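The product-versus-sum point can be made concrete with a small sketch (the probabilities below are invented purely for illustration):

```python
import math

# Hypothetical predicted probabilities a model assigns to five
# observed responses
probs = [0.8, 0.6, 0.9, 0.7, 0.5]

# The likelihood of the sample is the product of the individual
# probabilities: a number between 0 and 1 that shrinks quickly
likelihood = math.prod(probs)

# The log-likelihood replaces the product with a sum of logs,
# which is numerically far more convenient
log_likelihood = sum(math.log(p) for p in probs)

print(likelihood)      # a small positive number
print(log_likelihood)  # negative, since it is the log of a value in (0, 1)
```

The two quantities carry the same information: the log of the product equals the sum of the logs.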

Note also that the log-likelihood function is negative because the logarithm of a number between 0 and 1 is negative (e.g., the natural log of 0.2 is -1.61). Estimating the parameters involves making a series of iterative guesses to determine an optimal set of values for the model's parameters, that is, values that minimize -1 times the natural logarithm of the likelihood of the data. Arriving at a set of final estimates is known as model convergence (i.e., the point where the estimates no longer change and the likelihood is therefore at its maximum value). It is important that the model actually reaches convergence, as the resulting parameter estimates will not be trustworthy if it has not. Sometimes increasing the number of iterations will result in a model that converges, but often the failure of the model to converge on a unique solution is an indication that it needs to be changed and re-estimated. Keep in mind that even if a model converges, that does not mean the estimates are the right ones, given the sample data. In the same way, we would not conclude that because we fail to reject a model as consistent with the observed data, it is the only model that would fit this criterion.

For models with categorical outcomes, the likelihood function is a little different from that for models with continuous outcomes (owing to their different sampling distributions), but the principle of model estimation is the same. In this latter case, ML estimation often employs Fisher scoring, which uses a likelihood function that captures the probability of the observed data over a range of parameter values. For Poisson or binomial distributions this algorithm simplifies to the Newton-Raphson procedure (Azen & Walker, 2011).
Both algorithms proceed by making an initial guess for all the model parameters and then adjusting that guess to increase the likelihood function. This is repeated until the estimates no longer change and the iteration process has converged on the values of the final ML estimates (Azen & Walker, 2011). ML estimation produces a model deviance statistic (often referred to as -2LL, or -2 times the log likelihood), which is an indicator of how well the model fits the data. We multiply the log likelihood by -2 so it can be expressed easily as a positive number. Models with lower deviance (i.e., a smaller discrepancy function) fit the data better than models with larger deviance. Once we have a solution that converges, we can assess how well the proposed model fits the data using various model fit indices. We can also look at the residuals (or residual matrix), which describe the difference between the model-implied covariance matrix and the actual covariance matrix. Large residuals imply that some aspects of the proposed model do not fit the data well.

An Example Using an Ordinal Outcome

Let's say we wish to estimate a model where the outcome is ordinal and there are two predictors (score on a math test and gender). We will use GENLIN in IBM SPSS, since we can easily print relevant information about the model estimation procedures.

Model 1: Threshold-Only Model (no predictors)

We first estimate a baseline model with no predictors. Below (Table 1) we have information about the type of model: the probability distribution is multinomial, which is appropriate for ordinal outcomes, and the link function (because the outcome is not continuous) is the cumulative logit.
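The iterate-until-converged scheme described above can be sketched for a simple binary logistic model (a minimal illustration with simulated data; this is not the GENLIN ordinal procedure itself, and all names and values are invented):

```python
import numpy as np

def logistic_newton(X, y, tol=1e-8, max_iter=25):
    """Newton-Raphson for binary logistic regression: start from an
    initial guess (all zeros), then repeatedly update the coefficients
    until the log likelihood stops changing (convergence)."""
    beta = np.zeros(X.shape[1])
    prev_ll = -np.inf
    ll = prev_ll
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # predicted probabilities
        ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        if abs(ll - prev_ll) < tol:               # convergence criterion met
            break
        prev_ll = ll
        W = p * (1 - p)                           # weights for the Hessian
        H = X.T @ (X * W[:, None])                # observed information
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta, ll

# Small simulated data set: intercept plus one predictor
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
y = (rng.random(200) < true_p).astype(float)

beta, ll = logistic_newton(X, y)
print(beta)     # estimated intercept and slope
print(-2 * ll)  # the model deviance (-2LL)
```

Each pass is one "iteration" of the sort reported in an iteration history table; the loop stops when the log likelihood changes by less than the tolerance.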

Table 1. Model Information
Dependent Variable: courses (a)
Probability Distribution: Multinomial
Link Function: Cumulative logit
a. The procedure applies the cumulative link function to the dependent variable values in ascending order.

Below (Table 2) we have the distribution of perceptions about taking additional math courses past Algebra I: 45% of students perceived they would not take any further math classes beyond Algebra I, about 38.5% perceived they would take one additional course, and another 15.3% perceived they would take two additional courses. We can also see that only about 1.3% perceived they would take 3-4 additional courses beyond Algebra I.

Table 2. Categorical Variable Information (N and percent for each category of the dependent variable, courses)

We first estimate a model with just the thresholds (i.e., the intercepts). We can see in Table 3 that at the first iteration we have the initial log likelihood estimate. Because the likelihood, or probability, of the data can vary from 0.0 to 1.0, it is common to take the log of it. The log likelihood in the table is interpreted as the negative natural log of the likelihood function. The log of 1 is 0 (which would indicate no discrepancy), so a large initial log likelihood corresponds to a likelihood function that is quite small (just above 0), which suggests that the current model does not fit the data very well. As we add variables, the log likelihood is reduced (i.e., moves closer to 0), which amounts to reducing the discrepancy function (or maximizing the likelihood that the proposed model accounted for the observed data).

Table 3. Iteration History (update type, number of step-halvings, log likelihood, and the threshold parameters [courses=0] through [courses=3] and scale at each iteration)
a. The kernel of the log likelihood function is displayed.
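To illustrate how the cumulative logit link turns thresholds into category probabilities, here is a sketch (the threshold values are hypothetical, chosen only to roughly reproduce the percentages described for Table 2):

```python
import math

def cumulative_probs(thresholds, eta=0.0):
    """Cumulative logit model: P(Y <= j) = 1 / (1 + exp(-(tau_j - eta))).
    With no predictors (eta = 0), the thresholds alone determine the
    cumulative proportions of the outcome categories."""
    return [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]

def category_probs(thresholds, eta=0.0):
    """Differences between adjacent cumulative probabilities give the
    probability of each individual category."""
    cum = cumulative_probs(thresholds, eta) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical thresholds chosen to mimic cumulative proportions of
# about 0.45, 0.835, and 0.987 across the four categories
taus = [math.log(0.45 / 0.55),
        math.log(0.835 / 0.165),
        math.log(0.987 / 0.013)]

print([round(p, 3) for p in category_probs(taus)])
```

Adding predictors shifts eta for each case, moving probability mass across the categories while the thresholds stay fixed.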

Below we can examine various fit criteria.

Table 4. Goodness of Fit (value, df, and value/df for the deviance, scaled deviance, Pearson chi-square, and scaled Pearson chi-square, along with the log likelihood, Akaike's Information Criterion (AIC), the finite-sample corrected AIC (AICC), the Bayesian Information Criterion (BIC), and the consistent AIC (CAIC))
a. The kernel of the log likelihood function is displayed and used in computing the information criteria.

Here are the thresholds between the various categories of the outcome variable.

Table 5. Parameter Estimates (B, standard error, and Wald chi-square hypothesis test with df and significance for the thresholds [courses=0] through [courses=3]; the scale parameter is fixed at the displayed value)

Model 2: Adding Two Predictors (test score and gender)

Of greater interest is what happens when we add predictors to the model. Our assumption is that adding gender and previous test performance will reduce the size of the log likelihood function.

Table 6. Continuous Variable Information (N, minimum, maximum, mean, and standard deviation for the covariates test and female)

Below (Table 7) we can see the iteration history for estimating the model with two predictors. We have the initial estimate of the log likelihood (which is for the model with no predictors). Then the model begins to iterate (using maximum likelihood) to solve the equations in a way that maximizes the likelihood of the estimated effects of each predictor on the outcome. You can see that it takes several trials, or iterations, to reach an optimal solution for the population estimates from the sample data. You can also see that at each iteration the estimates of the test score effect and the female (or gender) effect change a little, until the convergence criteria are satisfied.

Table 7. Iteration History (update type, number of step-halvings, log likelihood, thresholds [courses=0] through [courses=3], test1, female, and scale at each iteration; an initial scoring step is followed by several Newton steps)
Model: (Threshold), test1, female
a. All convergence criteria are satisfied.
b. The kernel of the log likelihood function is displayed.
Redundant parameters are not displayed. Their values are always zero in all iterations.

Next, in Table 8 we see a summary of the various fit indices for the model.

Table 8. Goodness of Fit (value, df, and value/df for the deviance, scaled deviance, Pearson chi-square, and scaled Pearson chi-square, along with the log likelihood, AIC, AICC, BIC, and CAIC)
Model: (Threshold), test1, female

a. The kernel of the log likelihood function is displayed and used in computing the information criteria.

From Table 8 we can see that the log likelihood has been reduced considerably in this model. Some of the other model fitting information may be familiar to you (e.g., AIC and BIC). AIC and BIC are estimated from the log likelihood (with additional penalty terms). For example, for the AIC index (where k is the number of parameters in the model):

AIC = 2k + (-2LL) = 2(6) + (-2LL for the fitted model).

The likelihood ratio chi-square, which is calculated directly from the change in the log likelihoods between the initial (no predictors) model and the second model (with 2 predictors), can be used to construct a test of whether Model 2 fits the data better than Model 1 (the baseline model).

Table 9. Omnibus Test (likelihood ratio chi-square, df, and significance)
a. Compares the fitted model against the thresholds-only model.

We can see that the chi-square is significant with 2 degrees of freedom (for the two added predictors). Here is how we calculate the coefficient from the change in log likelihoods: take the difference between the initial log likelihood and the log likelihood for the model with 2 predictors, and multiply it by 2; the result is the likelihood ratio chi-square on 2 df.
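These two calculations can be sketched as follows (the log likelihood values here are invented placeholders, not the handout's actual values):

```python
import math

def aic(log_lik, k):
    """AIC = 2k + (-2LL): the -2 log likelihood penalized by twice
    the number of estimated parameters k."""
    return 2 * k + (-2 * log_lik)

def lr_chi_square(ll_baseline, ll_full):
    """Likelihood ratio chi-square: twice the improvement in log
    likelihood from the baseline to the fuller model. Its df equals
    the number of added predictors."""
    return 2 * (ll_full - ll_baseline)

# Hypothetical log likelihoods standing in for the handout's values
ll_model1 = -9244.3  # thresholds-only baseline (Model 1)
ll_model2 = -9168.7  # after adding test score and gender (Model 2)

print(aic(ll_model2, k=6))                  # k = 6 parameters, as in the text
print(lr_chi_square(ll_model1, ll_model2))  # compare to chi-square with 2 df
```

A likelihood ratio chi-square well beyond the critical value for 2 df would lead us to prefer Model 2, matching the omnibus test logic above.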

Finally, we can see the summary of the parameters in the model. We can see that the earlier test score (I think it is an 8th-grade test) is a significant predictor of students' perceptions of math course taking beyond Algebra I, while gender is not.

Table 10. Parameter Estimates (B, standard error, and Wald chi-square hypothesis test with df and significance for the thresholds [courses=0] through [courses=3], test, and female; the scale parameter is fixed at the displayed value)
Model: (Threshold), test1, female

We could add further variables and see whether we could reduce the log likelihood further, but we will stop here for now. This should provide some sense of how model estimation proceeds and how the criteria used to estimate the model result in a set of parameters and model fit criteria that can be used to evaluate how well the proposed model compares against the actual sample covariance matrix.

References

Azen, R., & Walker, C. (2011). Categorical data analysis for the behavioral and social sciences. New York: Routledge.

Hox, J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York: Routledge.

Loehlin, J. C. (1992). Latent variable models: An introduction to factor, path, and structural analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Marcoulides, G., & Hershberger, S. (1997). Multivariate statistical methods: A short course. Mahwah, NJ: Lawrence Erlbaum.

West, S., Finch, J., & Curran, P. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications. Thousand Oaks, CA: Sage.


More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Step 2: Select Analyze, Mixed Models, and Linear.

Step 2: Select Analyze, Mixed Models, and Linear. Example 1a. 20 employees were given a mood questionnaire on Monday, Wednesday and again on Friday. The data will be first be analyzed using a Covariance Pattern model. Step 1: Copy Example1.sav data file

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

Models for Binary Outcomes

Models for Binary Outcomes Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Logistic Regression. Continued Psy 524 Ainsworth

Logistic Regression. Continued Psy 524 Ainsworth Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Frequency Distribution Cross-Tabulation

Frequency Distribution Cross-Tabulation Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study 1.4 0.0-6 7 8 9 10 11 12 13 14 15 16 17 18 19 age Model 1: A simple broken stick model with knot at 14 fit with

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Statistics 3858 : Contingency Tables

Statistics 3858 : Contingency Tables Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

More information

More Accurately Analyze Complex Relationships

More Accurately Analyze Complex Relationships SPSS Advanced Statistics 17.0 Specifications More Accurately Analyze Complex Relationships Make your analysis more accurate and reach more dependable conclusions with statistics designed to fit the inherent

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Random Intercept Models

Random Intercept Models Random Intercept Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline A very simple case of a random intercept

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Can you tell the relationship between students SAT scores and their college grades?

Can you tell the relationship between students SAT scores and their college grades? Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Model fit evaluation in multilevel structural equation models

Model fit evaluation in multilevel structural equation models Model fit evaluation in multilevel structural equation models Ehri Ryu Journal Name: Frontiers in Psychology ISSN: 1664-1078 Article type: Review Article Received on: 0 Sep 013 Accepted on: 1 Jan 014 Provisional

More information

Introduction to Within-Person Analysis and RM ANOVA

Introduction to Within-Person Analysis and RM ANOVA Introduction to Within-Person Analysis and RM ANOVA Today s Class: From between-person to within-person ANOVAs for longitudinal data Variance model comparisons using 2 LL CLP 944: Lecture 3 1 The Two Sides

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

A Threshold-Free Approach to the Study of the Structure of Binary Data

A Threshold-Free Approach to the Study of the Structure of Binary Data International Journal of Statistics and Probability; Vol. 2, No. 2; 2013 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education A Threshold-Free Approach to the Study of

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

More information

General structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models

General structural model Part 2: Categorical variables and beyond. Psychology 588: Covariance structure and factor models General structural model Part 2: Categorical variables and beyond Psychology 588: Covariance structure and factor models Categorical variables 2 Conventional (linear) SEM assumes continuous observed variables

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION Ernest S. Shtatland, Ken Kleinman, Emily M. Cain Harvard Medical School, Harvard Pilgrim Health Care, Boston, MA ABSTRACT In logistic regression,

More information

1. BINARY LOGISTIC REGRESSION

1. BINARY LOGISTIC REGRESSION 1. BINARY LOGISTIC REGRESSION The Model We are modelling two-valued variable Y. Model s scheme Variable Y is the dependent variable, X, Z, W are independent variables (regressors). Typically Y values are

More information

A Re-Introduction to General Linear Models

A Re-Introduction to General Linear Models A Re-Introduction to General Linear Models Today s Class: Big picture overview Why we are using restricted maximum likelihood within MIXED instead of least squares within GLM Linear model interpretation

More information

Advanced Quantitative Data Analysis

Advanced Quantitative Data Analysis Chapter 24 Advanced Quantitative Data Analysis Daniel Muijs Doing Regression Analysis in SPSS When we want to do regression analysis in SPSS, we have to go through the following steps: 1 As usual, we choose

More information

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi.

Using the same data as before, here is part of the output we get in Stata when we do a logistic regression of Grade on Gpa, Tuce and Psi. Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 14, 2018 This handout steals heavily

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017

MLMED. User Guide. Nicholas J. Rockwood The Ohio State University Beta Version May, 2017 MLMED User Guide Nicholas J. Rockwood The Ohio State University rockwood.19@osu.edu Beta Version May, 2017 MLmed is a computational macro for SPSS that simplifies the fitting of multilevel mediation and

More information