Running Head: PROBLEMS WITH STEPWISE IN DA

Problems with Stepwise Procedures in Discriminant Analysis

James M. Graham

Texas A&M University

Graham, J. M. (2001, January). Problems with stepwise procedures in discriminant analysis. Paper presented at the annual meeting of the Southwest Educational Research Association, New Orleans, LA.
Abstract

Stepwise procedures are a common analytic technique used in discriminant analysis to reduce the number of variables. Despite the frequency of their use, these procedures are rife with errors and are best replaced with more accurate procedures such as all-possible-subsets analyses. The rationale behind stepwise procedures is discussed. Problems with stepwise procedures for both descriptive and predictive discriminant analysis are outlined, and alternative solutions to the stepwise problem are explored. Heuristic examples are used throughout to make the discussion clearer.
Problems with Stepwise Procedures in Discriminant Analysis

Stepwise procedures are a common analytic procedure used in psychological and educational research to reduce the number of variables and to order variables in a given analysis (Thompson, 1995). As noted by Huberty (1994), "It is quite common to find the use of stepwise analyses reported in empirically based journal articles" (p. 261). However, the use of stepwise procedures entails a number of problems which can lead to misleading and inaccurate results. As a result, Cliff (1987) noted that "a large proportion of the published results using this method probably present conclusions that are not supported by the data." The practice of employing stepwise procedures is particularly confounding, as software is available to conduct all-possible-subsets analyses, a better alternative to stepwise procedures (Lautenschlager, 1991; McCabe, 1975; Morris & Meshbane, 1994).

Stepwise procedures are most commonly employed in multiple regression and discriminant analysis. As a detailed description of the problems with stepwise procedures as they apply to multiple regression already exists (Thompson, 1995), the present paper focuses on stepwise procedures as applied to descriptive and predictive discriminant analysis (DDA and PDA, respectively). The problems occurring when stepwise procedures
are applied to both DDA and PDA are examined, and heuristic data sets are used to provide examples. Throughout the discussion, stepwise results are compared to the results of all-possible-subsets analyses, which are presented as a viable alternative to the stepwise problem.

Stepwise Procedures

Variable Selection

At times, researchers with a large number of dependent variables are not sure whether all variables in their analysis are necessary or useful (Klecka, 1980). Stepwise procedures are often used to reduce the number of dependent variables to the best set of variables for describing group differences (DDA) or for predicting group membership (PDA). Selecting a smaller set of variables is often necessary, for a variety of reasons. The concept of parsimony is especially relevant here, as a smaller number of variables may be easier to interpret and may provide a simpler solution than a larger number of variables. Researchers may also have a large number of dependent measures and want to remove variables which are redundant or which measure the same aspect of group differences. Finally, variable reduction may be necessary due to the cost of administering a large number of instruments in a research or clinical setting. In the case of an ongoing research project, it may be useful for researchers to
determine which measures are unnecessary for future studies (Huberty, 1989).

Variable Ordering

Results from stepwise procedures are also often used to order variables in terms of their importance in describing group differences in DDA, or in predicting group membership in PDA (Huberty, 1989). For instance, the variable selected in the first step of a stepwise procedure might be labeled the single most important variable (the one with the best descriptive or predictive ability) in the study. The variable selected in the second step might be labeled the second most important, and so on.

How Stepwise Procedures Work

To better understand the problems inherent in stepwise procedures, it is important to understand how they work. Stepwise procedures in discriminant analysis seek to select or order variables by their contribution to separation between groups. The default method for many statistical software programs is to examine the Wilks' Lambda for each variable; consequently, this is the most frequently used criterion, though there are other methods (Mahalanobis distance, unexplained variance, Rao's V, etc.). The astute reader will recognize that the Wilks' Lambda is of no interest
when using PDA. This will be discussed at greater length later; the following description applies only to DDA.

Forward stepwise procedures. In the first step of a forward stepwise analysis, each variable is entered into a separate analysis, and the variable with the best univariate discrimination (the lowest Wilks' Lambda, in most cases) is selected. Next, each remaining variable is paired with the first and entered into a separate analysis. The variable which, when paired with the first, provides the best multivariate discrimination (again, most often the lowest Wilks' Lambda) is selected next. The third step matches each remaining variable with the first two, and so on. This process continues until either all variables are selected or the decrease in Wilks' Lambda is insufficient to warrant further variable selection, as determined by the F-ratio. In addition, stepwise procedures may also remove variables already included if they are found to become irrelevant (better explained by other variables) over the course of the analysis. It is important to note that while stepwise procedures can remove variables, they often do not.

Backward stepwise procedures. Stepwise procedures can also be used in a reverse manner, to similar effect. Initially, all variables are included in a discriminant analysis. Next, separate analyses are run, each removing a single variable. The variable which contributes the least to
group differences (the variable which, when removed, results in the smallest increase in Wilks' Lambda) is removed. This process continues until either all variables are removed or the change in the Wilks' Lambda is too great.

Problems With Stepwise Procedures in DDA

Variable Selection

Stepwise procedures in DDA often seek to answer the question, "Which subset of variables will best describe group differences?" However, stepwise procedures do not necessarily select the best set of variables for describing group differences (Huberty & Barton, 1989). By entering variables one at a time, stepwise procedures do not include all of the information supplied jointly by two or more variables not already included in the analysis (Huberty, 1989). To determine the subset of variables which best describes the differences between groups, it is necessary to conduct an all-possible-subsets analysis, which compares all possible groupings of the variables, rather than to use stepwise procedures.

The following example uses data included in version 9.0 of SPSS (SPSS, 1998), entitled "Employee data.sav." In this example, five variables (educational level, employment category, current salary, beginning salary, and previous experience) are used to describe the differences between minorities and nonminorities. Table 1 summarizes the SPSS output for this
analysis, using stepwise discriminant analysis. As shown in Table 1, the use of stepwise discriminant analysis has reduced the initial set of five dependent variables to two: variables 3 and 5. During the first step of the analysis, stepwise procedures selected variable 3 as the single variable which best describes the differences between minorities and non-minorities. In the next step, variable 5 was selected as the variable which best describes the differences between groups, given the presence of variable 3.

INSERT TABLE 1 ABOUT HERE

Next, consider the all-possible-subsets analysis conducted using MCCABEPC (Lautenschlager, 1991) on the same data, presented in Table 2. These results show, once again, that variable 3 provides the best univariate description of group differences. The best group of two variables, however, is not variables 3 and 5, as shown in the stepwise analysis. While the stepwise analysis has identified the best set of two variables given the presence of variable 3, the all-possible-subsets analysis has identified variables 4 and 5 as the best group of two variables. In fact, of the top ten subsets of two variables, less than half even contain variable 3!
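The greedy logic just described can be sketched in a few lines. The code below is an illustrative reimplementation, not SPSS's algorithm: the function names and the synthetic example data are the author's of this sketch, and Wilks' Lambda is computed directly as det(W)/det(T) from the within-groups and total SSCP matrices.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' Lambda for the variables in `cols`: det(W) / det(T), where
    W is the pooled within-groups SSCP matrix and T the total SSCP matrix.
    Lower values indicate better group separation."""
    Xs = X[:, cols]
    dev_total = Xs - Xs.mean(axis=0)
    T = dev_total.T @ dev_total
    W = np.zeros_like(T)
    for g in np.unique(y):
        dev_g = Xs[y == g] - Xs[y == g].mean(axis=0)
        W += dev_g.T @ dev_g
    return np.linalg.det(W) / np.linalg.det(T)

def forward_stepwise(X, y, n_select):
    """Greedy forward selection: at each step, add the variable that yields
    the lowest multivariate Wilks' Lambda together with those already chosen.
    Subsets that exclude an earlier pick are never examined -- the flaw
    discussed in the text."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_select):
        best = min(remaining, key=lambda j: wilks_lambda(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because each step conditions on all earlier picks, the first variable selected is guaranteed to appear in every subset the procedure ever evaluates, which is exactly why the best pair in the example above (variables 4 and 5) was never considered.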
INSERT TABLE 2 ABOUT HERE

Stepwise procedures do not always select the best subset of variables of a given size. To determine the best subset, it is important to run an all-possible-subsets analysis. In the above example, variable 3 was initially selected by the stepwise procedure because it provides the best univariate description of group differences. Because stepwise procedures select all variables after the first based on the presence of all previous variables, they fail to consider subsets which do not include the previous variables. In the above example, the best subset of two variables was never considered by the stepwise procedure, because it did not include the initial variable selected by the stepwise procedure (variable 3)!

Variable Ordering

Results from stepwise procedures are often used to place variables in order of importance. For instance, in the above example, variable 3 might be called the most important variable in describing group differences and variable 5 the second most important. However, it is important to realize that, due to shared variance, a variable which describes a large amount of shared differences may be entered late, or not at all (Huberty, 1989). Again, in the above example, variable 4, which provides
the second best univariate description of group differences, is never entered into the stepwise DDA. It is a mistake to use the results of a stepwise procedure to order variables according to their importance.

Capitalization on Sampling Error

While all statistical analyses are affected by sampling error to some extent, stepwise procedures are especially susceptible to it. This is because stepwise procedures select the variable with the lowest Wilks' Lambda to be entered, no matter how small the difference. If an error is made in selecting a variable (due to sampling error), all subsequent steps will be incorrect. As a result, stepwise procedures often produce results which are unlikely to replicate in subsequent samples (Thompson, 1995).

All-possible-subsets analyses are less susceptible to sampling error than stepwise procedures, as error within the sample is not compounded as it is in stepwise procedures. In addition, while stepwise analyses provide only a single set of variables, all-possible-subsets analyses allow the researcher to examine a number of equally plausible subsets, providing an opportunity to select a set of variables based on theory. For example, consider the scree plot of the best subsets of a given size shown in Figure 1, taken from the all-possible-subsets analysis above. From this, we can see that there is little
difference between the best subsets of 5, 4, 3, and 2 variables. From this, the researcher might discern that a two-variable solution provides the best subset size. Figure 2 shows the Wilks' Lambda values for the best ten variable pairs. This figure shows that the first three pairs of variables appear equally plausible in comparison with the others. To invoke the work of William of Ockham, it would make sense for the researcher to choose, from among the first three, the pair of variables which is the easiest to explain.

INSERT FIGURES 1 AND 2 ABOUT HERE

McCabe (1975) provides an excellent demonstration of how stepwise methods capitalize on sampling error. In McCabe's example, a population of 400 individuals was used to draw 100 samples. Table 3 shows, for a given subset size, the number (out of 100) of samples with the selected subset in the top grouping of best subsets for the population. For example, for the best subset of two variables, 30 out of 100 all-possible-subsets analyses actually selected the best subset of two predictors, while only 5 out of 100 stepwise analyses did so. This remarkable difference, which holds across all subset sizes, demonstrates the extent to which stepwise DDA procedures are affected by sampling error.
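An exhaustive scan of the kind MCCABEPC performs can be sketched as follows. This is a self-contained toy, not the published program: it simply scores every subset of a given size by Wilks' Lambda (computed as det(W)/det(T)) and returns the top candidates so that near-ties, like those visible in the scree plots, can be inspected and resolved on theoretical grounds.

```python
from itertools import combinations

import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' Lambda (det(W) / det(T)) for the variables in `cols`."""
    Xs = X[:, cols]
    dev = Xs - Xs.mean(axis=0)
    T = dev.T @ dev
    W = np.zeros_like(T)
    for g in np.unique(y):
        d = Xs[y == g] - Xs[y == g].mean(axis=0)
        W += d.T @ d
    return np.linalg.det(W) / np.linalg.det(T)

def best_subsets(X, y, size, top=10):
    """Score every subset of `size` variables and return the `top` best
    (lowest Wilks' Lambda first). Unlike a stepwise pass, no subset is
    skipped, so the researcher sees all nearly-equivalent candidates."""
    scored = [(wilks_lambda(X, y, list(c)), c)
              for c in combinations(range(X.shape[1]), size)]
    scored.sort(key=lambda t: t[0])
    return scored[:top]
```

Because the scan is exhaustive, the returned best subset is by construction at least as good (within the sample) as whatever a greedy stepwise pass would have produced, and the runner-up list is what makes a Figure 2-style plot of equally plausible pairs possible.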
INSERT TABLE 3 ABOUT HERE

Problems with Stepwise in PDA

Selection Criteria

The problems with stepwise procedures inherent in DDA also exist when stepwise procedures are applied to PDA. However, these problems are overshadowed by the fact that the stepwise procedures in common statistical packages are designed not for PDA, but for DDA (Huberty & Wisenbaker, 1992). While DDA describes group differences by examining the discriminant function coefficients, PDA is concerned only with predicting group membership (Thompson, 1998). As a result, PDA does not utilize tests of statistical significance such as Wilks' Lambda. Instead, PDA is concerned only with hit rates, the number of cases correctly classified (Huberty, 1994). This distinction is important because, while Wilks' Lambda cannot be adversely affected (made higher) by adding variables to a DDA, hit rates can be made worse (Huberty, 1984; Thompson, 1998). In DDA, a completely worthless variable (one which does not contribute to group separation or description) would be given a weight of 0 and its impact essentially removed from the analysis. In PDA, however, the same worthless variable would
contribute noise to the prediction analysis, making group prediction less accurate.

Perhaps the most commonly used statistical software package, SPSS (1998), includes PDA as an option under DDA. As a result, it is possible to run a stepwise PDA analysis and receive classification results at each step. The variables selected by this analysis, however, are still selected in terms of group separation (e.g., Wilks' Lambda), not group classification (Huberty, 1984)! Using DDA stepwise selection procedures to obtain results for PDA is both inaccurate and inappropriate.

To demonstrate the gross problems with the use of stepwise procedures in PDA, consider the following example. The data were taken from Holzinger and Swineford (1939), a classic data set often used for heuristic examples. The analyses were conducted using all 301 cases, with scores on the first 14 ability measures used to predict track (June or February promotion). Table 4 shows the results of the stepwise procedure performed using SPSS with equal priors assumed. As shown, the stepwise analysis selected 5 of the original 14 variables, for a 68.4% hit rate. It is also important to note that, while the discriminant function coefficients are reported by SPSS and reproduced in Table 4, they are not relevant to PDA. This
information is provided to demonstrate that stepwise procedures select variables not because of their contribution to classification, but on the basis of their contribution to group description, which is strictly a DDA concept.

INSERT TABLE 4 ABOUT HERE

Table 5 shows the results of the all-possible-subsets analysis of the same data, performed using a program developed by Morris and Meshbane (1994). As shown, the actual best subset of five variables, with a hit rate of 69.1%, includes only three of the variables selected by the stepwise analysis. Furthermore, it can be seen that 2 additional subsets of five variables perform equally well as the subset selected by the stepwise procedure. By looking at the best ten subsets of any size, presented in Table 6, it can be seen that 6 subsets outperform the stepwise subset, and that at least 4 other subsets do just as well! In this example, the stepwise analysis has not selected the best subset of five variables, primarily because the selection criterion used is irrelevant to PDA.

INSERT TABLES 5 AND 6 ABOUT HERE
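The hit-rate criterion that PDA actually cares about can be computed directly. The sketch below is illustrative, not the Morris and Meshbane program: it uses a nearest-centroid rule as a simplified stand-in for full linear classification functions, and the function name and data are hypothetical.

```python
import numpy as np

def hit_rate(X, y, cols):
    """Resubstitution hit rate of a nearest-centroid classifier using only
    the variables in `cols`. This is the quantity PDA should optimize;
    note that Wilks' Lambda appears nowhere in the computation."""
    Xs = X[:, cols]
    groups = np.unique(y)
    centroids = np.array([Xs[y == g].mean(axis=0) for g in groups])
    # Squared distance from every case to every group centroid.
    d2 = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predicted = groups[d2.argmin(axis=1)]
    return float((predicted == y).mean())
```

Ranking subsets by this quantity, rather than by Wilks' Lambda, is what an all-possible-subsets PDA does; a worthless variable can lower the hit rate outright, whereas it can never worsen Wilks' Lambda.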
Ignores Priors

In PDA, it is often of interest to the researcher to discover the rates of group membership in the general population and to use these rates as the basis for the analysis. Doing so can greatly increase the predictive ability of the analysis. However, while using non-equal priors does change hit rates, it has no effect on how stepwise procedures select variables for analysis. Table 7 shows the results of a stepwise analysis, using the same data as above but with the priors set to the rates found in the sample. It can be seen here that the stepwise analysis has selected the same subset of five variables as when equal priors were assumed, with a hit rate of 75.4%. These variables have been selected, once again, by the same descriptive criteria used when equal priors were assumed. A comparison of the discriminant function coefficients (again, of no interest in PDA) shown in Tables 4 and 7 reveals that they are identical.

INSERT TABLE 7 ABOUT HERE

An all-possible-subsets analysis with the same unequal priors reveals that, when the rates of the priors are changed, a completely different set of variables than that found while assuming equal priors provides the best predictive ability. As seen in
Tables 8 and 9, a hit rate of 76.7% is obtained from the actual best subset of five variables. Seven other subsets of five variables, and at least 10 subsets of any size, outperform the stepwise subset.

INSERT TABLES 8 AND 9 ABOUT HERE

While stepwise procedures can often provide inaccurate and misleading results in DDA, their use in PDA is always incorrect. The distinction between DDA and PDA, which is of great importance, is ignored when the stepwise procedures found in common software packages are performed. In addition, the use of such procedures ignores prior group membership rates when selecting variables. Even if one were to develop a stepwise procedure which used hit rates as its selection criterion, the analysis would still be riddled with the problems inherent in all stepwise procedures: capitalization on sampling error and failure to always select the best subset of variables.

Summary

As demonstrated, stepwise procedures as they are commonly applied in DDA not only capitalize on sampling error, but also do not always fulfill their primary function: to select the best subset of variables for describing group differences.
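How priors enter the classification rule, while leaving a Wilks'-Lambda-driven selection untouched, can be shown with a minimal sketch. Assumptions: a nearest-centroid score with a log-prior adjustment stands in for SPSS's full classification functions, and the function name and toy data are hypothetical.

```python
import numpy as np

def classify_with_priors(X, y, X_new, priors):
    """Assign each row of X_new to the group with the highest score, where
    score = -0.5 * squared distance to the group centroid + log(prior).
    Changing the priors shifts the decision boundary and hence the hit
    rate -- yet a stepwise routine driven by Wilks' Lambda would select
    the same variables either way, since priors never enter that criterion."""
    groups = np.unique(y)
    centroids = np.array([X[y == g].mean(axis=0) for g in groups])
    d2 = ((X_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    scores = -0.5 * d2 + np.log(np.asarray(priors, dtype=float))
    return groups[scores.argmax(axis=1)]
```

A borderline case slightly nearer group 0's centroid is assigned to group 0 under equal priors, but flips to group 1 once group 1 receives a sufficiently large prior, which is why a variable subset tuned under one set of priors need not be best under another.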
The use of stepwise procedures in PDA is never appropriate, as the selection criteria apply only to DDA, not PDA. Stepwise procedures as they are commonly applied in DA are rife with problems. With the availability of all-possible-subsets software, there is no excuse not to use the more accurate method of selecting variables. In addition to providing more accurate results, all-possible-subsets analyses provide a number of equally plausible solutions to the analysis, allowing the researcher to invoke conscious thought, theory, and parsimony when selecting the best subset of variables.
References

Cliff, N. (1987). Analyzing multivariate data. San Diego, CA: Harcourt Brace Jovanovich.

Holzinger, K. L., & Swineford, F. (1939). A study in factor analysis: The stability of the bi-factor solution (No. 48). Chicago: University of Chicago.

Huberty, C. J. (1984). Issues in the use and interpretation of discriminant analysis. Psychological Bulletin, 95.

Huberty, C. J. (1989). Problems with stepwise methods: Better alternatives. In B. Thompson (Ed.), Advances in social science methodology (Vol. 1). Greenwich, CT: JAI Press.

Huberty, C. J. (1994). Applied discriminant analysis. New York: Wiley.

Huberty, C. J., & Barton, R. M. (1989). An introduction to discriminant analysis. Measurement and Evaluation in Counseling and Development, 22.

Huberty, C. J., & Wisenbaker, J. M. (1992). Discriminant analysis: Potential improvements in typical practice. In B. Thompson (Ed.), Advances in social science methodology (Vol. 2). Greenwich, CT: JAI Press.

Klecka, W. R. (1980). Discriminant analysis. Thousand Oaks, CA: Sage Publications.

Lautenschlager, G. J. (1991). MCCABEPC: Computing all possible subsets for discriminant analyses [Computer software]. Based on G. P. McCabe's mainframe FORTRAN program.

McCabe, G. P. (1975). Computations for variable selection in discriminant analysis. Technometrics, 17.

Morris, J. D., & Meshbane, A. (1994). CLASSVSP.EXE [Computer software]. In C. J. Huberty, Applied discriminant analysis. New York: Wiley.

SPSS (Version 9.0) [Computer software]. (1998). Chicago: SPSS, Inc.

Thompson, B. (1995). Stepwise regression and stepwise discriminant analysis need not apply here: A guidelines editorial. Educational and Psychological Measurement, 55.

Thompson, B. (1998, April). Five methodology errors in educational research: The pantheon of statistical significance and other faux pas. Invited address presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Table 1

Stepwise DDA Results

Step | Variables in Analysis | Wilks' Lambda

Note: At each step, the variable that minimizes the overall Wilks' Lambda is entered. a. F level, tolerance, or VIN insufficient for further computation.
Table 2

All Possible Subsets DDA Results

Subset Size | Variables | Wilks' Lambda

Note: Best 10 subsets of size 3, 4, and 5 computed but excluded from this table.
Table 3

Simulated Comparison of Stepwise and All Possible Subsets in DDA

Subset Size | Procedure (APSS / Stepwise) | Number of samples with selected subset in the top grouping

Note: Table adapted from McCabe, 1975.
Table 4

Stepwise PDA Results (Equal Priors Assumed)

Standardized discriminant function coefficients (Function 1) for the variables entered, including T5 (.447) and T8 (.557)

Classification results: predicted group membership (1, 2) versus original group

Note: 68.4% of original cases correctly classified.
Table 5

All Possible Subsets PDA Results (Equal Priors Assumed): Best Subsets of 5 Variables

Variables | Hits (#1, #2, Total) | % Hits
Table 6

All Possible Subsets PDA Results (Equal Priors Assumed): Best 10 Subsets

Variables | Hits (#1, #2, Total) | % Hits
Table 7

Stepwise PDA Results (Priors From Sample)

Standardized discriminant function coefficients (Function 1) for the variables entered, including T5 (.447) and T8 (.557)

Classification results: predicted group membership (1, 2) versus original group

Note: 75.4% of original cases correctly classified.
Table 8

All Possible Subsets PDA Results (Priors From Sample): Best Subsets of 5 Variables

Variables | Hits (#1, #2, Total) | % Hits
Table 9

All Possible Subsets PDA Results (Priors From Sample): Best 10 Subsets

Variables | Hits (#1, #2, Total) | % Hits
Figure 1. Scree plot of best subsets of a given size in DDA (Wilks' Lambda by best subset size).
Figure 2. Scree plot of the best ten subsets of 2 variables in DDA (Wilks' Lambda for each pair; pairs shown, best first: 4,5; 3,5; 2,5; 1,3; 2,3; 3,4; 1,5; 1,4; 2,4; 1,2).
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationOhio s State Tests ITEM RELEASE SPRING 2016 ALGEBRA I
Ohio s State Tests ITEM RELEASE SPRING 2016 ALGEBRA I Table of Contents Questions 1 4: Content Summary and Answer Key... ii Question 1: Question and Scoring Guidelines... 1 Question 1: Sample Response...
More informationNeuendorf MANOVA /MANCOVA. Model: X1 (Factor A) X2 (Factor B) X1 x X2 (Interaction) Y4. Like ANOVA/ANCOVA:
1 Neuendorf MANOVA /MANCOVA Model: X1 (Factor A) X2 (Factor B) X1 x X2 (Interaction) Y1 Y2 Y3 Y4 Like ANOVA/ANCOVA: 1. Assumes equal variance (equal covariance matrices) across cells (groups defined by
More informationFigure 1: Conventional labelling of axes for diagram of frequency distribution. Frequency of occurrence. Values of the variable
1 Social Studies 201 September 20-22, 2004 Histograms See text, section 4.8, pp. 145-159. Introduction From a frequency or percentage distribution table, a statistical analyst can develop a graphical presentation
More informationDescribing distributions with numbers
Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central
More informationNeuendorf MANOVA /MANCOVA. Model: X1 (Factor A) X2 (Factor B) X1 x X2 (Interaction) Y4. Like ANOVA/ANCOVA:
1 Neuendorf MANOVA /MANCOVA Model: X1 (Factor A) X2 (Factor B) X1 x X2 (Interaction) Y1 Y2 Y3 Y4 Like ANOVA/ANCOVA: 1. Assumes equal variance (equal covariance matrices) across cells (groups defined by
More informationNeuendorf MANOVA /MANCOVA. Model: MAIN EFFECTS: X1 (Factor A) X2 (Factor B) INTERACTIONS : X1 x X2 (A x B Interaction) Y4. Like ANOVA/ANCOVA:
1 Neuendorf MANOVA /MANCOVA Model: MAIN EFFECTS: X1 (Factor A) X2 (Factor B) Y1 Y2 INTERACTIONS : Y3 X1 x X2 (A x B Interaction) Y4 Like ANOVA/ANCOVA: 1. Assumes equal variance (equal covariance matrices)
More informationIntroduction to Statistical modeling: handout for Math 489/583
Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect
More informationLogistic Regression: Regression with a Binary Dependent Variable
Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationApplication of Indirect Race/ Ethnicity Data in Quality Metric Analyses
Background The fifteen wholly-owned health plans under WellPoint, Inc. (WellPoint) historically did not collect data in regard to the race/ethnicity of it members. In order to overcome this lack of data
More informationTypes of Statistical Tests DR. MIKE MARRAPODI
Types of Statistical Tests DR. MIKE MARRAPODI Tests t tests ANOVA Correlation Regression Multivariate Techniques Non-parametric t tests One sample t test Independent t test Paired sample t test One sample
More informationCOMBINING ROBUST VARIANCE ESTIMATION WITH MODELS FOR DEPENDENT EFFECT SIZES
COMBINING ROBUST VARIANCE ESTIMATION WITH MODELS FOR DEPENDENT EFFECT SIZES James E. Pustejovsky, UT Austin j o i n t wo r k w i t h : Beth Tipton, Nor t h we s t e r n Unive r s i t y Ariel Aloe, Unive
More information10. Alternative case influence statistics
10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the
More informationANOVA, ANCOVA and MANOVA as sem
ANOVA, ANCOVA and MANOVA as sem Robin Beaumont 2017 Hoyle Chapter 24 Handbook of Structural Equation Modeling (2015 paperback), Examples converted to R and Onyx SEM diagrams. This workbook duplicates some
More informationOhio s State Tests ITEM RELEASE SPRING 2016 INTEGRATED MATHEMATICS II
Ohio s State Tests ITEM RELEASE SPRING 2016 INTEGRATED MATHEMATICS II Table of Contents Questions 1 4: Content Summary and Answer Key... ii Question 1: Question and Scoring Guidelines... 1 Question 1:
More informationDISCRIMINANT ANALYSIS IN THE STUDY OF ROMANIAN REGIONAL ECONOMIC DEVELOPMENT
ANALELE ŞTIINŢIFICE ALE UNIVERSITĂŢII ALEXANDRU IOAN CUZA DIN IAŞI Tomul LIV Ştiinţe Economice 007 DISCRIMINANT ANALYSIS IN THE STUDY OF ROMANIAN REGIONAL ECONOMIC DEVELOPMENT ELISABETA JABA * DĂNUŢ VASILE
More informationwe first add 7 and then either divide by x - 7 = 1 Adding 7 to both sides 3 x = x = x = 3 # 8 1 # x = 3 # 4 # 2 x = 6 1 =?
. Using the Principles Together Applying Both Principles a Combining Like Terms a Clearing Fractions and Decimals a Contradictions and Identities EXAMPLE Solve: An important strategy for solving new problems
More informationDescribing distributions with numbers
Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central
More informationConservative variance estimation for sampling designs with zero pairwise inclusion probabilities
Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Peter M. Aronow and Cyrus Samii Forthcoming at Survey Methodology Abstract We consider conservative variance
More informationChapter 13 Section D. F versus Q: Different Approaches to Controlling Type I Errors with Multiple Comparisons
Explaining Psychological Statistics (2 nd Ed.) by Barry H. Cohen Chapter 13 Section D F versus Q: Different Approaches to Controlling Type I Errors with Multiple Comparisons In section B of this chapter,
More informationMultiple Regression of Students Performance Using forward Selection Procedure, Backward Elimination and Stepwise Procedure
ISSN 2278 0211 (Online) Multiple Regression of Students Performance Using forward Selection Procedure, Backward Elimination and Stepwise Procedure Oti, Eric Uchenna Lecturer, Department of Statistics,
More informationThe Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs
The Application and Promise of Hierarchical Linear Modeling (HLM) in Studying First-Year Student Programs Chad S. Briggs, Kathie Lorentz & Eric Davis Education & Outreach University Housing Southern Illinois
More informationreview session gov 2000 gov 2000 () review session 1 / 38
review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review
More informationDetermination of Density 1
Introduction Determination of Density 1 Authors: B. D. Lamp, D. L. McCurdy, V. M. Pultz and J. M. McCormick* Last Update: February 1, 2013 Not so long ago a statistical data analysis of any data set larger
More information2012 Assessment Report. Mathematics with Calculus Level 3 Statistics and Modelling Level 3
National Certificate of Educational Achievement 2012 Assessment Report Mathematics with Calculus Level 3 Statistics and Modelling Level 3 90635 Differentiate functions and use derivatives to solve problems
More informationChapter 4: Regression Models
Sales volume of company 1 Textbook: pp. 129-164 Chapter 4: Regression Models Money spent on advertising 2 Learning Objectives After completing this chapter, students will be able to: Identify variables,
More informationThree Factor Completely Randomized Design with One Continuous Factor: Using SPSS GLM UNIVARIATE R. C. Gardner Department of Psychology
Data_Analysis.calm Three Factor Completely Randomized Design with One Continuous Factor: Using SPSS GLM UNIVARIATE R. C. Gardner Department of Psychology This article considers a three factor completely
More informationBayesian Analysis of Multivariate Normal Models when Dimensions are Absent
Bayesian Analysis of Multivariate Normal Models when Dimensions are Absent Robert Zeithammer University of Chicago Peter Lenk University of Michigan http://webuser.bus.umich.edu/plenk/downloads.htm SBIES
More informationTest Yourself! Methodological and Statistical Requirements for M.Sc. Early Childhood Research
Test Yourself! Methodological and Statistical Requirements for M.Sc. Early Childhood Research HOW IT WORKS For the M.Sc. Early Childhood Research, sufficient knowledge in methods and statistics is one
More informationHow to Write a Laboratory Report
How to Write a Laboratory Report For each experiment you will submit a laboratory report. Laboratory reports are to be turned in at the beginning of the lab period, one week following the completion of
More informationCorrelation and Regression (Excel 2007)
Correlation and Regression (Excel 2007) (See Also Scatterplots, Regression Lines, and Time Series Charts With Excel 2007 for instructions on making a scatterplot of the data and an alternate method of
More informationDimensionality Reduction Techniques (DRT)
Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,
More informationCHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the
CHAPTER 4 VARIABILITY ANALYSES Chapter 3 introduced the mode, median, and mean as tools for summarizing the information provided in an distribution of data. Measures of central tendency are often useful
More informationMultiple regression: Model building. Topics. Correlation Matrix. CQMS 202 Business Statistics II Prepared by Moez Hababou
Multiple regression: Model building CQMS 202 Business Statistics II Prepared by Moez Hababou Topics Forward versus backward model building approach Using the correlation matrix Testing for multicolinearity
More informationStatistical Tools for Multivariate Six Sigma. Dr. Neil W. Polhemus CTO & Director of Development StatPoint, Inc.
Statistical Tools for Multivariate Six Sigma Dr. Neil W. Polhemus CTO & Director of Development StatPoint, Inc. 1 The Challenge The quality of an item or service usually depends on more than one characteristic.
More informationProblems with parallel analysis in data sets with oblique simple structure
Methods of Psychological Research Online 2001, Vol.6, No.2 Internet: http://www.mpr-online.de Institute for Science Education 2001 IPN Kiel Problems with parallel analysis in data sets with oblique simple
More informationLogistic Regression Models for Multinomial and Ordinal Outcomes
CHAPTER 8 Logistic Regression Models for Multinomial and Ordinal Outcomes 8.1 THE MULTINOMIAL LOGISTIC REGRESSION MODEL 8.1.1 Introduction to the Model and Estimation of Model Parameters In the previous
More informationRobustness of the Quadratic Discriminant Function to correlated and uncorrelated normal training samples
DOI 10.1186/s40064-016-1718-3 RESEARCH Open Access Robustness of the Quadratic Discriminant Function to correlated and uncorrelated normal training samples Atinuke Adebanji 1,2, Michael Asamoah Boaheng
More informationEXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science
EXAMINATION: QUANTITATIVE EMPIRICAL METHODS Yale University Department of Political Science January 2014 You have seven hours (and fifteen minutes) to complete the exam. You can use the points assigned
More information2 Prediction and Analysis of Variance
2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering
More informationCHAPTER 2. Types of Effect size indices: An Overview of the Literature
CHAPTER Types of Effect size indices: An Overview of the Literature There are different types of effect size indices as a result of their different interpretations. Huberty (00) names three different types:
More informationExploratory Factor Analysis and Principal Component Analysis
Exploratory Factor Analysis and Principal Component Analysis Today s Topics: What are EFA and PCA for? Planning a factor analytic study Analysis steps: Extraction methods How many factors Rotation and
More informationIrrational Thoughts. Aim. Equipment. Irrational Investigation: Teacher Notes
Teacher Notes 7 8 9 10 11 12 Aim Identify strategies and techniques to express irrational numbers in surd form. Equipment For this activity you will need: TI-Nspire CAS TI-Nspire CAS Investigation Student
More informationFigure Figure
Figure 4-12. Equal probability of selection with simple random sampling of equal-sized clusters at first stage and simple random sampling of equal number at second stage. The next sampling approach, shown
More informationEstimating Coefficients in Linear Models: It Don't Make No Nevermind
Psychological Bulletin 1976, Vol. 83, No. 2. 213-217 Estimating Coefficients in Linear Models: It Don't Make No Nevermind Howard Wainer Department of Behavioral Science, University of Chicago It is proved
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationChapter 19: Logistic regression
Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog
More informationIB Physics STUDENT GUIDE 13 and Processing (DCP)
IB Physics STUDENT GUIDE 13 Chapter Data collection and PROCESSING (DCP) Aspect 1 Aspect Aspect 3 Levels/marks Recording raw data Processing raw data Presenting processed data Complete/ Partial/1 Not at
More informationResearchers often record several characters in their research experiments where each character has a special significance to the experimenter.
Dimension reduction in multivariate analysis using maximum entropy criterion B. K. Hooda Department of Mathematics and Statistics CCS Haryana Agricultural University Hisar 125 004 India D. S. Hooda Jaypee
More informationAbility Metric Transformations
Ability Metric Transformations Involved in Vertical Equating Under Item Response Theory Frank B. Baker University of Wisconsin Madison The metric transformations of the ability scales involved in three
More informationSRMR in Mplus. Tihomir Asparouhov and Bengt Muthén. May 2, 2018
SRMR in Mplus Tihomir Asparouhov and Bengt Muthén May 2, 2018 1 Introduction In this note we describe the Mplus implementation of the SRMR standardized root mean squared residual) fit index for the models
More informationGroup Dependence of Some Reliability
Group Dependence of Some Reliability Indices for astery Tests D. R. Divgi Syracuse University Reliability indices for mastery tests depend not only on true-score variance but also on mean and cutoff scores.
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationOhio s State Tests ITEM RELEASE SPRING 2016 INTEGRATED MATHEMATICS I
Ohio s State Tests ITEM RELEASE SPRING 2016 INTEGRATED MATHEMATICS I Table of Contents Questions 1 3: Content Summary and Answer Key... ii Question 1: Question and Scoring Guidelines... 1 Question 1: Sample
More informationRobustness of factor analysis in analysis of data with discrete variables
Aalto University School of Science Degree programme in Engineering Physics and Mathematics Robustness of factor analysis in analysis of data with discrete variables Student Project 26.3.2012 Juha Törmänen
More informationTesting and Interpreting Interaction Effects in Multilevel Models
Testing and Interpreting Interaction Effects in Multilevel Models Joseph J. Stevens University of Oregon and Ann C. Schulte Arizona State University Presented at the annual AERA conference, Washington,
More informationSystematic error, of course, can produce either an upward or downward bias.
Brief Overview of LISREL & Related Programs & Techniques (Optional) Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised April 6, 2015 STRUCTURAL AND MEASUREMENT MODELS:
More informationIs economic freedom related to economic growth?
Is economic freedom related to economic growth? It is an article of faith among supporters of capitalism: economic freedom leads to economic growth. The publication Economic Freedom of the World: 2003
More informationDevelopment of a Data Mining Methodology using Robust Design
Development of a Data Mining Methodology using Robust Design Sangmun Shin, Myeonggil Choi, Youngsun Choi, Guo Yi Department of System Management Engineering, Inje University Gimhae, Kyung-Nam 61-749 South
More informationConventional And Robust Paired And Independent-Samples t Tests: Type I Error And Power Rates
Journal of Modern Applied Statistical Methods Volume Issue Article --3 Conventional And And Independent-Samples t Tests: Type I Error And Power Rates Katherine Fradette University of Manitoba, umfradet@cc.umanitoba.ca
More informationCollege Teaching Methods & Styles Journal Second Quarter 2005 Volume 1, Number 2
Illustrating the Central Limit Theorem Through Microsoft Excel Simulations David H. Moen, (Email: dmoen@usd.edu ) University of South Dakota John E. Powell, (Email: jpowell@usd.edu ) University of South
More informationMultivariate analysis
Multivariate analysis Prof dr Ann Vanreusel -Multidimensional scaling -Simper analysis -BEST -ANOSIM 1 2 Gradient in species composition 3 4 Gradient in environment site1 site2 site 3 site 4 site species
More informationCorrelation & Simple Regression
Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.
More informationContrast Analysis: A Tutorial
A peer-reviewed electronic journal. Copyright is retained by the first or sole author, who grants right of first publication to Practical Assessment, Research & Evaluation. Permission is granted to distribute
More information