Box-Cox Transformations
- Dale Gregory
Revised: 10/10/2017
© 2017 by Statgraphics Technologies, Inc.

Contents: Summary, Data Input, Analysis Summary, Analysis Options, Plot of Fitted Model, MSE Comparison Plot, MSE Comparison Table, Skewness and Kurtosis Plot, Lack-of-Fit Test, Observed versus Predicted, Residual Plots, Unusual Residuals, Influential Points, Forecasts, Save Results, Calculations

Summary

The Box-Cox Transformations procedure is designed to determine an optimal transformation for Y while fitting a linear regression model. It is useful when the variability of Y changes as a function of X. Often, an appropriate transformation of Y both stabilizes the variance and makes the deviations around the model more normally distributed.

The class of transformations considered is the set of power transformations defined by

$$Y' = (Y + \lambda_2)^{\lambda_1} \qquad (1)$$
in which the data are raised to a power $\lambda_1$ after being shifted by an amount $\lambda_2$. Often, the shift parameter $\lambda_2$ is set equal to 0. This class includes square roots, logarithms, reciprocals, and other common transformations, depending on the power. Examples include:

Power                 Transformation         Description
$\lambda_1 = 2$       $Y' = Y^2$             square
$\lambda_1 = 1$       $Y' = Y$               untransformed data
$\lambda_1 = 0.5$     $Y' = \sqrt{Y}$        square root
$\lambda_1 = 1/3$     $Y' = \sqrt[3]{Y}$     cube root
$\lambda_1 = 0$       $Y' = \ln(Y)$          logarithm
$\lambda_1 = -0.5$    $Y' = 1/\sqrt{Y}$      inverse square root
$\lambda_1 = -1$      $Y' = 1/Y$             reciprocal

Note that as $\lambda_1 \to 0$, the power transformation approaches a logarithm.

Sample StatFolio: boxcox.sgp

Sample Data: The file plasma.sgd contains data presented by Neter et al. (1998) showing the plasma level of a polyamine for n = 25 healthy children. A portion of the data is shown below:

(Table columns: Age, Plasma Level)

It is desired to determine a model relating the plasma level to the age of the child.
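The family of transformations in equation (1) can be sketched in a few lines of Python. This is an illustrative helper, not Statgraphics code: the function name `power_transform` and the requirement that the shifted value be positive are assumptions made here so that fractional powers and the logarithm are well defined.

```python
import math

def power_transform(y, power, shift=0.0):
    """Apply the power transformation of equation (1) to one value.

    `power` plays the role of lambda1 and `shift` the role of lambda2.
    A power of 0 is treated as the natural logarithm, as in the table.
    """
    shifted = y + shift
    if shifted <= 0:
        # Assumption: restrict to positive shifted values so that roots,
        # reciprocals, and logarithms are all defined.
        raise ValueError("y + shift must be positive")
    if power == 0:
        return math.log(shifted)
    return shifted ** power

# Hypothetical values, chosen only to illustrate the table rows.
values = [4.0, 9.0, 16.0]
print([power_transform(v, 0.5) for v in values])   # square roots
print([power_transform(v, -1) for v in values])    # reciprocals
print([power_transform(v, 0) for v in values])     # logarithms
```

Each row of the table corresponds to one call with a different `power` argument.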
Data Input

The data input dialog box requests the names of the columns containing the dependent variable Y and the independent variable X:

Y: numeric column containing the n observations for the dependent variable Y.
X: numeric column containing the n values for the independent variable X.
Select: subset selection.

Analysis Summary

In relating the two variables, the procedure fits a model of the form

$$W = \beta_0 + \beta_1 X \qquad (2)$$

where the dependent variable W is related to Y according to

$$W = \begin{cases} \dfrac{(Y + \lambda_2)^{\lambda_1} - 1}{K_1} & \text{if } \lambda_1 \neq 0 \\[2ex] K_2 \ln(Y + \lambda_2) & \text{if } \lambda_1 = 0 \end{cases} \qquad (3)$$

and

$$K_2 = \left( \prod_{i=1}^{n} (Y_i + \lambda_2) \right)^{1/n} \qquad (4)$$

$$K_1 = \lambda_1 K_2^{\lambda_1 - 1} \qquad (5)$$
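Equations (3) through (5) can be sketched as follows. The function names `geometric_mean` and `standardized_transform` are hypothetical, and the input values are made up for illustration; the code simply mirrors the formulas above.

```python
import math

def geometric_mean(values):
    """Geometric mean, computed on the log scale for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def standardized_transform(y, power, shift=0.0):
    """Return the standardized values W of equation (3) for a list y."""
    shifted = [v + shift for v in y]
    if min(shifted) <= 0:
        raise ValueError("all y + shift values must be positive")
    k2 = geometric_mean(shifted)            # equation (4)
    if power == 0:
        return [k2 * math.log(v) for v in shifted]
    k1 = power * k2 ** (power - 1)          # equation (5)
    return [(v ** power - 1) / k1 for v in shifted]

# Hypothetical response values, not the plasma data.
y = [1.0, 2.0, 4.0]
print(standardized_transform(y, power=-0.5))
print(standardized_transform(y, power=0))
```

With a power of 1 the constant K1 equals 1, so W reduces to Y + shift - 1, which is a quick sanity check on the scaling.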
Note that $K_2$ is the geometric mean of $Y + \lambda_2$. Following Box and Cox (1964), the optimal transformation is the one that minimizes the mean squared error for W. The reason for using the standardized variable W instead of $Y'$ is to adjust the magnitude of the error sum of squares for the effect of the power transformation.

The Analysis Summary shows the optimal power and the resulting model:

Box-Cox Transformations - Plasma Level vs. Age
Power =
Shift = 0.0
Dependent variable: Plasma Level
Independent variable: Age
Number of observations: 25

Parameter    Estimate    Standard Error    T Statistic    P-Value
Intercept
Slope

Analysis of Variance
Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
Model
Residual
Total (Corr.)

Correlation Coefficient =
R-squared = percent
Standard Error of Est. =
Approximate 95% confidence interval for power: to

Included in the output are:

Power and shift parameters: the values of $\lambda_1$ and $\lambda_2$. By default, the power parameter is optimized, while the shift parameter is set to 0. This may be changed using Analysis Options. Also included at the bottom of the screen is an approximate confidence interval for $\lambda_1$ at the default system confidence level.

Coefficients: the estimated coefficients, standard errors, t-statistics, and P-values. The estimates of the model coefficients can be used to write the fitted equation, which in the example has the form

W = (intercept) + (slope) × Age (6)

The t-statistic tests the null hypothesis that the corresponding model parameter equals 0, versus the alternative hypothesis that it does not equal 0. Small P-values (less than 0.05 if operating at the 5% significance level) indicate that a model coefficient is significantly different from 0. In the sample data, both the intercept and slope are statistically significant.

Analysis of Variance: decomposition of the variability of the dependent variable W into a model sum of squares and a residual or error sum of squares. Of particular interest is the F
test and its associated P-value, which tests the statistical significance of the fitted model. A small P-value (less than 0.05 if operating at the 5% significance level) indicates that a significant linear relationship exists between W and X. In the sample data, the model is highly significant.

Statistics: summary statistics for the fitted model, including:

Correlation coefficient - measures the strength of the linear relationship between W and X on a scale ranging from -1 (perfect negative linear correlation) to +1 (perfect positive linear correlation).
R-squared - represents the percentage of the variability in W that has been explained by the fitted regression model, ranging from 0% to 100%.
Standard Error of Est. - the estimated standard deviation of the residuals (the deviations around the model). This value is used to create prediction limits for new observations.
Mean Absolute Error - the average absolute value of the residuals.

In the sample data, the transformation selected is very close to an inverse square root, implying that $1/\sqrt{\text{Plasma Level}}$ is a linear function of Age. According to the confidence interval, however, the actual optimal transformation could be anywhere between a reciprocal and a logarithm.

Analysis Options

Power: the value of the power parameter $\lambda_1$. If Optimize is selected, this serves as the starting value of the optimization search when OK is pressed. If Optimize is not selected, this is the value used for the transformation.
Shift: the value of the shift parameter $\lambda_2$. This value is added to the dependent variable Y before the power transformation is performed, as in equation (1).
Optimize: whether to optimize the power parameter or use the specified value.
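The summary statistics listed under Analysis Summary can be computed from an ordinary least squares fit of W on X. The sketch below is a minimal illustration using textbook formulas; the function name `simple_regression_summary` and the sample values are hypothetical, and the program's exact computations are not documented here.

```python
import math

def simple_regression_summary(x, w):
    """Least squares fit of w on x with the summary statistics named above."""
    n = len(x)
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sww = sum((wi - w_bar) ** 2 for wi in w)
    sxw = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
    slope = sxw / sxx
    intercept = w_bar - slope * x_bar
    residuals = [wi - (intercept + slope * xi) for xi, wi in zip(x, w)]
    sse = sum(e * e for e in residuals)
    return {
        "intercept": intercept,
        "slope": slope,
        "correlation": sxw / math.sqrt(sxx * sww),
        "r_squared_percent": 100.0 * (1.0 - sse / sww),
        # Two estimated coefficients, so n - 2 residual degrees of freedom.
        "std_error_est": math.sqrt(sse / (n - 2)),
        "mean_abs_error": sum(abs(e) for e in residuals) / n,
    }

# Hypothetical transformed data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [2.1, 3.9, 6.2, 7.8, 10.1]
for name, value in simple_regression_summary(x, w).items():
    print(name, round(value, 4))
```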
Plot of Fitted Model

This pane shows the fitted model, together with confidence limits and prediction limits if desired.

[Figure: Plot of Fitted Model (Power = -0.504); x-axis: Age, y-axis: Plasma Level]

The plot includes:

The line of best fit or prediction equation. This is the equation that would be used to predict values of the dependent variable Y given values of the independent variable X. Note that it does a relatively good job of picking up the increased variability of Plasma Level at low Ages, as well as the curvature in the relationship.

Confidence intervals for the mean response at X. These are the inner bounds in the above plot and describe how well the location of the line has been estimated given the available data sample. As the size of the sample n increases, these bounds will become tighter. You should also note that the width of the bounds varies as a function of X, with the line estimated most precisely near the average value $\bar{x}$.

Prediction limits for new observations. These are the outer bounds in the above plot and describe how precisely one could predict where a single new observation would lie. Regardless of the size of the sample, new observations will vary around the true line.

The inclusion of confidence limits and prediction limits and their default confidence level is determined by settings on the ANOVA/Regression tab of the Preferences dialog box, accessible from the Edit menu.
Pane Options

Include: the limits to include on the plot.
Confidence Level: the confidence percentage for the limits.
X-Axis Resolution: the number of values of X at which the line is determined when plotting. Higher resolutions result in smoother plots.
Type of Limits: whether to plot two-sided confidence intervals or one-sided confidence bounds.
Shade 2-sided limits: if checked, the area between the limits will be shaded using a fill color.
MSE Comparison Plot

When optimizing the transformation, the power is sought that minimizes the mean squared error of the fit of W as a function of X. To illustrate the result of the search, the MSE Comparison Plot shows the mean squared error in the vicinity of the optimal value:

[Figure: MSE Comparison; x-axis: lambda1, y-axis: MSE]

Vertical lines are drawn at the derived value of $\lambda_1$ and its confidence limits. Notice that the MSE reaches a minimum near -0.5, although it is relatively flat in a wide region around the optimal value, indicating that the power could be changed quite a bit without hurting the model substantially.

Pane Options

Minimum Lambda1: smallest value of $\lambda_1$ to include in the plot.
Maximum Lambda1: largest value of $\lambda_1$ to include in the plot.
Resolution: number of different values of $\lambda_1$ at which to calculate the MSE.
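The quantity shown in the MSE Comparison Plot can be approximated with a simple grid evaluation: for each candidate power, standardize Y as in equations (3) through (5), regress W on X, and record the residual mean square. The sketch below assumes a plain grid rather than whatever search the program performs internally; the helper names and the data are hypothetical. The data are generated so that ln(Y) is exactly linear in X, which means the grid should favor a power of 0.

```python
import math

def standardized_w(y, power, shift=0.0):
    """Standardized transformation W of equation (3)."""
    shifted = [v + shift for v in y]
    k2 = math.exp(sum(math.log(v) for v in shifted) / len(shifted))
    if power == 0:
        return [k2 * math.log(v) for v in shifted]
    k1 = power * k2 ** (power - 1)
    return [(v ** power - 1) / k1 for v in shifted]

def regression_mse(x, w):
    """Residual mean square from a least squares line of w on x."""
    n = len(x)
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w)) / sxx
    intercept = w_bar - slope * x_bar
    sse = sum((wi - intercept - slope * xi) ** 2 for xi, wi in zip(x, w))
    return sse / (n - 2)

def mse_by_power(x, y, powers, shift=0.0):
    """Return (power, MSE) pairs, one per candidate power."""
    return [(p, regression_mse(x, standardized_w(y, p, shift))) for p in powers]

# Hypothetical data with an exactly log-linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [math.exp(1.0 + 0.5 * xi) for xi in x]
powers = [-1.0, -0.5, 0.0, 0.5, 1.0]

table = mse_by_power(x, y, powers)
best_power, best_mse = min(table, key=lambda row: row[1])
for p, mse in table:
    print(p, mse)
print("power with smallest MSE:", best_power)
```

Because W is standardized, the MSE values are comparable across powers, which is what makes a direct minimum over the grid meaningful.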
MSE Comparison Table

This table tabulates the values plotted by the MSE Comparison Plot.

(Table: MSE Comparison Table; Shift (lambda2): 0.0; columns: lambda1, MSE)

The Pane Options are the same as for the plot.

Skewness and Kurtosis Plot

This plot shows the values of the standardized skewness and standardized kurtosis as a function of the power parameter $\lambda_1$.

[Figure: Skewness and Kurtosis Plot (lambda2 = 0.0); x-axis: lambda1; series: skewness, kurtosis]

The standardized skewness and standardized kurtosis should both be between -2 and +2 for a transformation that adequately normalizes the data. The plot shows horizontal lines at -2 and +2, with the vertical lines indicating the optimal value of $\lambda_1$ and its confidence limits.
Clearly, there is a wide range of values for $\lambda_1$ that would create a reasonable transformation of the data.

Lack-of-Fit Test

When more than one observation has been recorded at the same value of X, a lack-of-fit test can be performed to determine whether the selected model adequately describes the relationship between Y and X. The Lack-of-Fit pane displays the following table:

Analysis of Variance with Lack-of-Fit
Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
Model
Residual
  Lack-of-Fit
  Pure Error
Total (Corr.)

The lack-of-fit test decomposes the residual sum of squares of the transformed values W into 2 components:

1. Pure error: variability of the W values at the same value of X.
2. Lack-of-fit: variability of the average W values around the fitted model.

Of primary interest is the P-value for lack-of-fit. A small P-value (below 0.05 if operating at the 5% significance level) indicates that the selected model does not adequately describe the observed relationship. For the example data, the large P-value indicates that the linear model adequately explains the relationship between Plasma Level and Age.

Observed versus Predicted

The Observed versus Predicted plot shows the observed values of Y on the vertical axis and the predicted values $\hat{Y}$ on the horizontal axis, in the untransformed metric.

[Figure: Plot of Plasma Level; x-axis: predicted, y-axis: observed]
If the model fits well, the points should be randomly scattered around the diagonal line. It is sometimes possible to see curvature in this plot, which would indicate the need for a curvilinear model rather than a linear model. In this case, the change in variability in the above plot as the predicted values increase is not a concern, since that was stabilized by the Box-Cox transformation.

Residual Plots

As with all statistical models, it is good practice to examine the residuals. In a regression, the residuals are defined by

$$e_i = W_i - \hat{W}_i \qquad (7)$$

i.e., the residuals are the differences between the transformed data values and the fitted linear regression model.

The Box-Cox Transformations procedure creates 3 residual plots:

1. versus X.
2. versus predicted value $\hat{W}$.
3. versus row number.

Residuals versus X

This plot is helpful in visualizing how well the transformation accounted for any curvature in the data.

[Figure: Residual Plot; x-axis: Age, y-axis: Studentized residual]

The residuals should be randomly scattered around zero.
Residuals versus Predicted

This plot is helpful in visualizing how well the model dealt with any heteroscedasticity in the data.

[Figure: Residual Plot; x-axis: predicted Plasma Level, y-axis: Studentized residual]

If the transformation was effective, the variability should be approximately equal everywhere.

Residuals versus Observation

This plot shows the residuals versus row number in the datasheet:

[Figure: Residual Plot; x-axis: row number, y-axis: Studentized residual]

If the data are arranged in chronological order, any pattern in the data might indicate an outside influence.

Pane Options

The following residuals may be plotted on each residual plot:
1. Residuals - the residuals from the least squares fit.
2. Studentized residuals - the difference between the observed values $w_i$ and the predicted values $\hat{w}_i$ when the model is fit using all observations except the i-th, divided by the estimated standard error. These residuals are sometimes called externally deleted residuals, since they measure how far each value is from the fitted model when that model is fit using all of the data except the point being considered. This is important, since a large outlier might otherwise affect the model so much that it would not appear to be unusually far away from the line.

Unusual Residuals

Once the model has been fit, it is useful to study the residuals to determine whether any outliers exist that should be removed from the data. The Unusual Residuals pane lists all observations that have Studentized residuals of 2.0 or greater in absolute value.

(Table: Unusual Residuals; columns: Row, X, Y, Predicted Y, Residual, Studentized Residual)

Studentized residuals greater than 3 in absolute value correspond to points more than 3 standard deviations from the fitted model, which is an extremely rare event for a normal distribution. Note that row #18 is more than 2.5 standard deviations out and would be worth investigating further.

Points can be removed from the fit while examining the Plot of the Fitted Model by clicking on a point and then pressing the Exclude/Include button on the analysis toolbar:

[Figure: Plot of Fitted Model with an excluded point; x-axis: Age, y-axis: Plasma Level]

Excluded points are marked with an X. For the sample data, removing row #18 has little effect on the fitted model or optimal transformation.
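Studentized deleted residuals do not require refitting the model n times. For a straight-line fit they can be obtained from the ordinary residuals, the leverages, and the error sum of squares using a standard textbook identity. The sketch below assumes that identity and a simple linear regression with two coefficients; the function name and the data (which contain one deliberately planted outlier) are hypothetical.

```python
import math

def studentized_deleted_residuals(x, w):
    """Externally Studentized residuals for a least squares line of w on x."""
    n = len(x)
    p = 2  # number of estimated coefficients (intercept and slope)
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w)) / sxx
    intercept = w_bar - slope * x_bar
    residuals = [wi - (intercept + slope * xi) for xi, wi in zip(x, w)]
    sse = sum(e * e for e in residuals)
    # Leverage of each point in a simple linear regression.
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Identity: t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_i) - e_i^2)).
    return [
        e * math.sqrt((n - p - 1) / (sse * (1.0 - h) - e * e))
        for e, h in zip(residuals, leverages)
    ]

# Hypothetical data: roughly w = 1 + 2x, with row 5 shifted upward.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w = [3.1, 4.9, 7.2, 8.8, 16.0, 13.1, 14.9, 17.0]

t = studentized_deleted_residuals(x, w)
unusual_rows = [i + 1 for i, ti in enumerate(t) if abs(ti) >= 2.0]
print("rows with |Studentized residual| >= 2:", unusual_rows)
```

The 2.0 cutoff matches the threshold used by the Unusual Residuals pane; the ordinary residual for the planted outlier is much smaller relative to its standard error, which is why the deleted form is preferred.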
Influential Points

In fitting a regression model, not all observations have an equal influence on the parameter estimates in the fitted model. In a simple linear regression, points located at very low or very high values of X have greater influence than those located nearer to the mean of X. The Influential Points pane displays any observations that have high influence on the fitted model:

(Table: Influential Points; columns: Row, X, Y, Predicted Y, Studentized Residual, Leverage)
Average leverage of single data point =

The above table shows every point with leverage equal to 3 or more times that of an average data point, where the leverage of an observation is a measure of its influence on the estimated model coefficients. In general, values with leverage exceeding 5 times that of an average data value should be examined closely, since they have unusually large impact on the fitted model. In the sample data, there are no observations with unusually large leverage.

Forecasts

The Forecasts pane creates predictions using the fitted model.

(Table: Predicted Values; columns: X, Predicted Y, 95.00% Prediction Limits (Lower, Upper), 95.00% Confidence Limits (Lower, Upper))

Included in the table are:

X - the value of the independent variable at which the prediction is to be made.
Predicted Y - the predicted value of the dependent variable using the fitted model.
Prediction limits - prediction limits for new observations at the selected level of confidence (corresponds to the outer bounds on the plot of the fitted model).
Confidence limits - confidence limits for the mean value of Y at the selected level of confidence (corresponds to the inner bounds on the plot of the fitted model).

For example, at X = 3, 95% of all children would be expected to have plasma levels between 5.47 and ...
Pane Options

Confidence Level: confidence percentage for the intervals.
Type of Limits: whether to display two-sided limits or one-sided bounds.
Forecast at X: up to 10 values of X at which to make predictions.

Save Results

The following results may be saved to the datasheet:

1. Predicted Values - the predicted value of Y corresponding to each of the n observations.
2. Lower Limits for Predictions - the lower prediction limits for each predicted value.
3. Upper Limits for Predictions - the upper prediction limits for each predicted value.
4. Lower Limits for Forecast Means - the lower confidence limits for the mean value of Y at each of the n values of X.
5. Upper Limits for Forecast Means - the upper confidence limits for the mean value of Y at each of the n values of X.
6. Residuals - the n residuals.
7. Studentized Residuals - the n Studentized residuals.
8. Leverages - the leverage values corresponding to the n values of X.
9. Transformed Data - the n transformed values W.

Note: If limits are saved, they will correspond to the settings on the Forecasts pane. If two-sided limits are displayed in the Forecasts table, then the saved limits will also be two-sided. If one-sided bounds are displayed in the table, then the saved limits will also be one-sided.

Calculations
The linear regression is performed on the transformed values W. Prediction limits are calculated in the transformed metric and inverted before being displayed. For details on the calculations, see the Simple Regression documentation.
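Inverting a value from the transformed metric amounts to solving equation (3) for Y. The sketch below is an assumption about how that inversion can be written, not a description of the program's internals; the function name is hypothetical, `k2` must be the geometric mean from equation (4) computed on the data used in the fit, and the numeric inputs are made up.

```python
import math

def inverse_standardized_transform(w, power, k2, shift=0.0):
    """Map a value w in the transformed metric back to the scale of Y.

    Solves equation (3) for Y, using K1 from equation (5).
    """
    if power == 0:
        return math.exp(w / k2) - shift
    k1 = power * k2 ** (power - 1)
    base = k1 * w + 1.0
    if base <= 0:
        # Outside the range that equation (3) can produce.
        raise ValueError("w is outside the invertible range for this power")
    return base ** (1.0 / power) - shift

# Hypothetical limits in the transformed metric, for illustration only.
k2 = 3.0
power = -0.5
lower_w, upper_w = -4.0, 2.0
print(inverse_standardized_transform(lower_w, power, k2),
      inverse_standardized_transform(upper_w, power, k2))
```

Because W in equation (3) is an increasing function of Y for any power (K1 carries the sign of the power), a lower limit in the transformed metric maps to a lower limit in the original metric, and likewise for upper limits.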
More informationStat 101 Exam 1 Important Formulas and Concepts 1
1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative
More informationRegression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear
Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear relationship between: - one independent variable X and -
More informationBE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club
BE640 Intermediate Biostatistics 2. Regression and Correlation Simple Linear Regression Software: SAS Emergency Calls to the New York Auto Club Source: Chatterjee, S; Handcock MS and Simonoff JS A Casebook
More informationSingle and multiple linear regression analysis
Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationMultivariate Capability Analysis Using Statgraphics. Presented by Dr. Neil W. Polhemus
Multivariate Capability Analysis Using Statgraphics Presented by Dr. Neil W. Polhemus Multivariate Capability Analysis Used to demonstrate conformance of a process to requirements or specifications that
More informationMULTIPLE LINEAR REGRESSION IN MINITAB
MULTIPLE LINEAR REGRESSION IN MINITAB This document shows a complicated Minitab multiple regression. It includes descriptions of the Minitab commands, and the Minitab output is heavily annotated. Comments
More informationQuestion Possible Points Score Total 100
Midterm I NAME: Instructions: 1. For hypothesis testing, the significant level is set at α = 0.05. 2. This exam is open book. You may use textbooks, notebooks, and a calculator. 3. Do all your work in
More informationTest 3 Practice Test A. NOTE: Ignore Q10 (not covered)
Test 3 Practice Test A NOTE: Ignore Q10 (not covered) MA 180/418 Midterm Test 3, Version A Fall 2010 Student Name (PRINT):............................................. Student Signature:...................................................
More informationFrom Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...
From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. Contents About This Book... xiii About The Author... xxiii Chapter 1 Getting Started: Data Analysis with JMP...
More informationUnit 10: Simple Linear Regression and Correlation
Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for
More informationUsing Tables and Graphing Calculators in Math 11
Using Tables and Graphing Calculators in Math 11 Graphing calculators are not required for Math 11, but they are likely to be helpful, primarily because they allow you to avoid the use of tables in some
More informationAMS 7 Correlation and Regression Lecture 8
AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation
More informationDensity Temp vs Ratio. temp
Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,
More informationRegression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin
Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n
More informationHow to Run the Analysis: To run a principal components factor analysis, from the menus choose: Analyze Dimension Reduction Factor...
The principal components method of extraction begins by finding a linear combination of variables that accounts for as much variation in the original variables as possible. This method is most often used
More informationPsychology Seminar Psych 406 Dr. Jeffrey Leitzel
Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting
More informationTopic 18: Model Selection and Diagnostics
Topic 18: Model Selection and Diagnostics Variable Selection We want to choose a best model that is a subset of the available explanatory variables Two separate problems 1. How many explanatory variables
More informationy = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output
12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation y = a + bx y = dependent variable a = intercept b = slope x = independent variable Section 12.1 Inference for Linear
More informationAnalysis of Bivariate Data
Analysis of Bivariate Data Data Two Quantitative variables GPA and GAES Interest rates and indices Tax and fund allocation Population size and prison population Bivariate data (x,y) Case corr® 2 Independent
More informationSTAT 350 Final (new Material) Review Problems Key Spring 2016
1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,
More informationIndependent Samples ANOVA
Independent Samples ANOVA In this example students were randomly assigned to one of three mnemonics (techniques for improving memory) rehearsal (the control group; simply repeat the words), visual imagery
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationRegression Analysis. BUS 735: Business Decision Making and Research
Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn
More informationCh 13 & 14 - Regression Analysis
Ch 3 & 4 - Regression Analysis Simple Regression Model I. Multiple Choice:. A simple regression is a regression model that contains a. only one independent variable b. only one dependent variable c. more
More information13 Simple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity
More informationPassing-Bablok Regression for Method Comparison
Chapter 313 Passing-Bablok Regression for Method Comparison Introduction Passing-Bablok regression for method comparison is a robust, nonparametric method for fitting a straight line to two-dimensional
More informationLab 1 Uniform Motion - Graphing and Analyzing Motion
Lab 1 Uniform Motion - Graphing and Analyzing Motion Objectives: < To observe the distance-time relation for motion at constant velocity. < To make a straight line fit to the distance-time data. < To interpret
More informationSTA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007
STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.
More informationBasic Business Statistics 6 th Edition
Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based
More informationRegression diagnostics
Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model
More informationStatistics II Exercises Chapter 5
Statistics II Exercises Chapter 5 1. Consider the four datasets provided in the transparencies for Chapter 5 (section 5.1) (a) Check that all four datasets generate exactly the same LS linear regression
More informationRegression used to predict or estimate the value of one variable corresponding to a given value of another variable.
CHAPTER 9 Simple Linear Regression and Correlation Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. X = independent variable. Y = dependent
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationASSIGNMENT 3 SIMPLE LINEAR REGRESSION. Old Faithful
ASSIGNMENT 3 SIMPLE LINEAR REGRESSION In the simple linear regression model, the mean of a response variable is a linear function of an explanatory variable. The model and associated inferential tools
More informationChapter 12: Multiple Regression
Chapter 12: Multiple Regression 12.1 a. A scatterplot of the data is given here: Plot of Drug Potency versus Dose Level Potency 0 5 10 15 20 25 30 0 5 10 15 20 25 30 35 Dose Level b. ŷ = 8.667 + 0.575x
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationIndex I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474
Index A Absolute value explanation of, 40, 81 82 of slope of lines, 453 addition applications involving, 43 associative law for, 506 508, 570 commutative law for, 238, 505 509, 570 English phrases for,
More informationMULTIPLE REGRESSION METHODS
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 MULTIPLE REGRESSION METHODS I. AGENDA: A. Residuals B. Transformations 1. A useful procedure for making transformations C. Reading:
More informationINFERENCE FOR REGRESSION
CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We
More informationLectures on Simple Linear Regression Stat 431, Summer 2012
Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population
More information22S39: Class Notes / November 14, 2000 back to start 1
Model diagnostics Interpretation of fitted regression model 22S39: Class Notes / November 14, 2000 back to start 1 Model diagnostics 22S39: Class Notes / November 14, 2000 back to start 2 Model diagnostics
More information