Box-Cox Transformations


Revised: 10/10/2017
© 2017 by Statgraphics Technologies, Inc.

Contents: Summary, Data Input, Analysis Summary, Analysis Options, Plot of Fitted Model, MSE Comparison Plot, MSE Comparison Table, Skewness and Kurtosis Plot, Lack-of-Fit Test, Observed versus Predicted, Residual Plots, Unusual Residuals, Influential Points, Forecasts, Save Results, Calculations

Summary

The Box-Cox Transformations procedure is designed to determine an optimal transformation for Y while fitting a linear regression model. It is useful when the variability of Y changes as a function of X. Often, an appropriate transformation of Y both stabilizes the variance and makes the deviations around the model more normally distributed. The class of transformations considered is the power transformations defined by

$$Y^{(\lambda)} = (Y + \lambda_2)^{\lambda_1} \qquad (1)$$

in which the data are raised to a power λ1 after shifting them by an amount λ2. Often, the shift parameter λ2 is set equal to 0. This class includes square roots, logarithms, reciprocals, and other common transformations, depending on the power. Examples include:

Power          Transformation    Description
λ1 = 2         Y²                square
λ1 = 1         Y                 untransformed data
λ1 = 0.5       √Y                square root
λ1 = 1/3       ∛Y                cube root
λ1 = 0         ln(Y)             logarithm
λ1 = -1/2      1/√Y              inverse square root
λ1 = -1        1/Y               reciprocal

Note that as λ1 → 0, the power transformation in its scaled form ((Y + λ2)^λ1 - 1)/λ1 approaches ln(Y + λ2), which is why the logarithm occupies the λ1 = 0 position.

Sample StatFolio: boxcox.sgp

Sample Data:
The file plasma.sgd contains data presented by Neter et al. (1998) showing the plasma level of a polyamine for n = 25 healthy children. The file contains two columns, Age and Plasma Level. It is desired to determine a model relating the plasma level to the age of the child.
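The power family in equation (1), including the logarithm at λ1 = 0, can be sketched in Python as follows. This is a minimal illustration rather than Statgraphics code: the function name `power_transform` is hypothetical, and the code assumes Y + λ2 > 0 so that fractional powers and the logarithm are defined.

```python
import math


def power_transform(y, lambda1, lambda2=0.0):
    """Shifted power transformation (Y + lambda2) ** lambda1 of equation (1).

    The natural logarithm is used when lambda1 == 0, matching the table.
    Assumption: Y + lambda2 > 0, so every power and the log are defined.
    """
    shifted = y + lambda2
    if shifted <= 0:
        raise ValueError("Y + lambda2 must be positive")
    if lambda1 == 0:
        return math.log(shifted)
    return shifted ** lambda1


# Apply each power from the table to Y = 4 with no shift.
for lam in (2, 1, 0.5, 1 / 3, 0, -0.5, -1):
    print(f"lambda1 = {lam:6.3f}  ->  {power_transform(4.0, lam):.4f}")
```

Setting lambda2 to a positive value is one way to handle data that contain zeros, since the shift is applied before the power.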

Data Input

The data input dialog box requests the names of the columns containing the dependent variable Y and the independent variable X:

Y: numeric column containing the n observations for the dependent variable Y.
X: numeric column containing the n values for the independent variable X.
Select: subset selection.

Analysis Summary

In relating the two variables, the procedure fits a model of the form

$$W = \beta_0 + \beta_1 X \qquad (2)$$

where the dependent variable W is related to Y according to

$$W = \begin{cases} \dfrac{(Y + \lambda_2)^{\lambda_1} - 1}{K_1} & \text{if } \lambda_1 \neq 0 \\[1ex] K_2 \ln(Y + \lambda_2) & \text{if } \lambda_1 = 0 \end{cases} \qquad (3)$$

and

$$K_2 = \left( \prod_{i=1}^{n} (Y_i + \lambda_2) \right)^{1/n} \qquad (4)$$

$$K_1 = \lambda_1 K_2^{\lambda_1 - 1} \qquad (5)$$
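Equations (3) to (5) can be computed directly. The sketch below is an illustrative Python version with hypothetical function names; it assumes every Y + λ2 is positive so that K2, the geometric mean in equation (4), exists. It also shows why the two branches of equation (3) fit together: for a tiny nonzero λ1 the power branch gives almost the same W values as the logarithmic branch.

```python
import math


def geometric_mean(values):
    """K2 of equation (4), computed on the log scale for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def standardized_box_cox(ys, lambda1, lambda2=0.0):
    """Return the standardized values W of equation (3) for every Y."""
    shifted = [y + lambda2 for y in ys]
    if min(shifted) <= 0:
        raise ValueError("every Y + lambda2 must be positive")
    k2 = geometric_mean(shifted)                        # equation (4)
    if lambda1 == 0:
        return [k2 * math.log(s) for s in shifted]      # log branch of (3)
    k1 = lambda1 * k2 ** (lambda1 - 1)                  # equation (5)
    return [(s ** lambda1 - 1) / k1 for s in shifted]   # power branch of (3)


ys = [1.0, 2.0, 4.0, 8.0]
print(standardized_box_cox(ys, 1.0))    # lambda1 = 1 gives K1 = 1, so W = Y - 1
print(standardized_box_cox(ys, 1e-6))   # nearly equal to the lambda1 = 0 values
print(standardized_box_cox(ys, 0.0))
```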

Note that K2 is the geometric mean of Y + λ2. Following Box and Cox (1964), the optimal transformation is the one that minimizes the mean squared error for W. The reason for using the standardized variable W instead of Y is to adjust the magnitude of the error sum of squares for the effect of the power transformation: scaling by K1 (or by K2 in the logarithmic case) keeps W in comparable units for every power, so the mean squared errors obtained with different values of λ1 can be compared directly.

The Analysis Summary shows the optimal power and the resulting model:

[Output: Box-Cox Transformations - Plasma Level vs. Age. Power = -0.504, Shift = 0.0. Dependent variable: Plasma Level. Independent variable: Age. Number of observations: 25. Coefficient table (Estimate, Standard Error, T Statistic, P-Value) for Intercept and Slope; Analysis of Variance table (Sum of Squares, Df, Mean Square, F-Ratio, P-Value) for Model, Residual, Total (Corr.); Correlation Coefficient; R-squared (percent); Standard Error of Est.; approximate 95% confidence interval for power.]

Included in the output are:

Power and shift parameters: the values of λ1 and λ2. By default, the power parameter is optimized, while the shift parameter is set to 0. This may be changed using Analysis Options. Also included at the bottom of the output is an approximate confidence interval for λ1 at the default system confidence level.

Coefficients: the estimated coefficients, standard errors, t-statistics, and P-values. The estimates of the model coefficients can be used to write the fitted equation, which in the example has the form

$$W = b_0 + b_1 \cdot \text{Age} \qquad (6)$$

where b0 and b1 are the estimated intercept and slope. The t-statistic tests the null hypothesis that the corresponding model parameter equals 0, versus the alternative hypothesis that it does not equal 0. Small P-values (less than 0.05 if operating at the 5% significance level) indicate that a model coefficient is significantly different from 0. In the sample data, both the intercept and slope are statistically significant.

Analysis of Variance: decomposition of the variability of the dependent variable W into a model sum of squares and a residual or error sum of squares. Of particular interest is the F-test and its associated P-value, which tests the statistical significance of the fitted model. A small P-value (less than 0.05 if operating at the 5% significance level) indicates that a significant linear relationship exists between the transformed Y (that is, W) and X. In the sample data, the model is highly significant.

Statistics: summary statistics for the fitted model, including:

Correlation coefficient - measures the strength of the linear relationship between W and X on a scale ranging from -1 (perfect negative linear correlation) to +1 (perfect positive linear correlation).
R-squared - represents the percentage of the variability in W that has been explained by the fitted regression model, ranging from 0% to 100%.
Standard Error of Est. - the estimated standard deviation of the residuals (the deviations around the model). This value is used to create prediction limits for new observations.
Mean Absolute Error - the average absolute value of the residuals.

In the sample data, the transformation selected is very close to an inverse square root, implying that 1/√(Plasma Level) is approximately a linear function of Age. According to the confidence interval, however, the actual optimal transformation could be anywhere between a reciprocal and a logarithm.

Analysis Options

Power: the value of the power parameter λ1. If Optimize is selected, this serves as the starting value of the optimization search when OK is pressed. If Optimize is not selected, this is the value used for the transformation.
Shift: the value of the shift parameter λ2. This value is added to the dependent variable Y before the power transformation is performed.
Optimize: whether to optimize the power parameter or use the specified value.
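The statistics listed above all come from an ordinary least-squares fit of W on X. The following sketch uses a hypothetical helper, `fit_line`, and is not the Statgraphics implementation; it computes the intercept, slope, residuals of the form W - Ŵ, mean squared error, standard error of estimate, R-squared, correlation coefficient, the slope's t-statistic, and mean absolute error. P-values would additionally need a t or F distribution function (for example from SciPy), which is omitted to keep the sketch dependency-free. The data in the usage lines are made up.

```python
import math


def fit_line(xs, ws):
    """Least-squares fit of W = b0 + b1 * X, as in equation (2)."""
    n = len(xs)
    xbar = sum(xs) / n
    wbar = sum(ws) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxw = sum((x - xbar) * (w - wbar) for x, w in zip(xs, ws))
    b1 = sxw / sxx                                   # slope
    b0 = wbar - b1 * xbar                            # intercept
    residuals = [w - (b0 + b1 * x) for x, w in zip(xs, ws)]
    sse = sum(e * e for e in residuals)              # residual sum of squares
    sst = sum((w - wbar) ** 2 for w in ws)           # total (corrected) SS
    mse = sse / (n - 2)                              # residual mean square, n - 2 df
    return {
        "intercept": b0,
        "slope": b1,
        "mse": mse,
        "std_error_of_est": math.sqrt(mse),
        "r_squared_percent": 100.0 * (1.0 - sse / sst),
        "correlation": math.copysign(math.sqrt(1.0 - sse / sst), b1),
        "t_slope": b1 / math.sqrt(mse / sxx),        # tests H0: slope = 0
        "mean_abs_error": sum(abs(e) for e in residuals) / n,
        "residuals": residuals,
    }


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ws = [1.0, 3.1, 4.9, 7.2, 8.8]     # hypothetical transformed values W
fit = fit_line(xs, ws)
print(fit["intercept"], fit["slope"], fit["r_squared_percent"])
```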

Plot of Fitted Model

This pane shows the fitted model, together with confidence limits and prediction limits if desired.

[Figure: Plot of Fitted Model, Power = -0.504, Shift = 0.0; Plasma Level versus Age]

The plot includes:

The line of best fit or prediction equation, displayed on the scale of Y. This is the equation that would be used to predict values of the dependent variable Y given values of the independent variable X. Note that it does a relatively good job of picking up the increased variability of Plasma Level at low Ages, as well as the curvature in the relationship.

Confidence intervals for the mean response at X. These are the inner bounds in the above plot and describe how well the location of the line has been estimated given the available data sample. As the size of the sample n increases, these bounds will become tighter. You should also note that the width of the bounds varies as a function of X, with the line estimated most precisely near the average value x̄.

Prediction limits for new observations. These are the outer bounds in the above plot and describe how precisely one could predict where a single new observation would lie. Regardless of the size of the sample, new observations will vary around the true line.

The inclusion of confidence limits and prediction limits and their default confidence level is determined by settings on the ANOVA/Regression tab of the Preferences dialog box, accessible from the Edit menu.
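The inner and outer bounds described above follow the usual simple-regression formulas, applied on the W scale. A minimal sketch, assuming the caller supplies the two-sided t critical value with n - 2 degrees of freedom (for instance `scipy.stats.t.ppf(0.975, n - 2)` for 95% limits); the function name `limits_at` is hypothetical. The term h0 is smallest at x0 = x̄, which is why the bands are narrowest near the average X.

```python
import math


def limits_at(x0, xs, ws, t_crit):
    """Fitted W at X = x0 with confidence limits for the mean response
    and prediction limits for a single new observation (W scale)."""
    n = len(xs)
    xbar = sum(xs) / n
    wbar = sum(ws) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (w - wbar) for x, w in zip(xs, ws)) / sxx
    b0 = wbar - b1 * xbar
    sse = sum((w - (b0 + b1 * x)) ** 2 for x, w in zip(xs, ws))
    s = math.sqrt(sse / (n - 2))                     # standard error of estimate
    w0 = b0 + b1 * x0
    h0 = 1.0 / n + (x0 - xbar) ** 2 / sxx            # smallest when x0 equals xbar
    half_conf = t_crit * s * math.sqrt(h0)           # inner bounds
    half_pred = t_crit * s * math.sqrt(1.0 + h0)     # outer bounds
    return w0, (w0 - half_conf, w0 + half_conf), (w0 - half_pred, w0 + half_pred)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ws = [1.0, 3.1, 4.9, 7.2, 8.8]     # hypothetical transformed values W
# 3.182 is approximately the 97.5th percentile of t with 3 degrees of freedom.
print(limits_at(2.0, xs, ws, 3.182))
```

To display the bounds on the Y scale, each limit would then be mapped back through the inverse of equation (3).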

Pane Options

Include: the limits to include on the plot.
Confidence Level: the confidence percentage for the limits.
X-Axis Resolution: the number of values of X at which the line is determined when plotting. Higher resolutions result in smoother plots.
Type of Limits: whether to plot two-sided confidence intervals or one-sided confidence bounds.
Shade 2-sided limits: if checked, the area between the limits will be shaded using a fill color.

MSE Comparison Plot

When optimizing the transformation, the procedure searches for the power that minimizes the mean squared error of the fit of W as a function of X. To illustrate the result of the search, the MSE Comparison Plot shows the mean squared error in the vicinity of the optimal value:

[Figure: MSE Comparison, lambda2 = 0.0; MSE versus lambda1]

Vertical lines are drawn at the derived λ1 and its confidence limits. Notice that the MSE reaches a minimum near -0.5, although it is relatively flat in a wide region around the optimal value, indicating that the power could be changed quite a bit without hurting the model substantially.

Pane Options

Minimum Lambda1: smallest value of λ1 to include in the plot.
Maximum Lambda1: largest value of λ1 to include in the plot.
Resolution: number of different values of λ1 at which to calculate the MSE.
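The search can be imitated with a simple grid, in the spirit of the Minimum Lambda1, Maximum Lambda1, and Resolution options. The sketch below is an assumption-laden illustration, not the procedure's actual optimizer: all names are hypothetical, the data are made up so that 1/√Y is nearly linear in X, and the confidence interval uses the standard Box-Cox likelihood-ratio rule (keep every λ1 with n·ln(MSE/MSE_min) ≤ 3.841, the 95th percentile of chi-square with 1 df), which may differ from the approximation Statgraphics reports. The interval is reported as the span of accepted grid points and cannot extend beyond the grid.

```python
import math


def standardized_w(ys, lambda1, lambda2=0.0):
    """W of equations (3) to (5); powers within 1e-12 of 0 use the log branch."""
    shifted = [y + lambda2 for y in ys]
    k2 = math.exp(sum(math.log(s) for s in shifted) / len(shifted))
    if abs(lambda1) < 1e-12:
        return [k2 * math.log(s) for s in shifted]
    k1 = lambda1 * k2 ** (lambda1 - 1)
    return [(s ** lambda1 - 1) / k1 for s in shifted]


def line_mse(xs, ws):
    """Residual mean square from the least-squares line of W on X."""
    n = len(xs)
    xbar, wbar = sum(xs) / n, sum(ws) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (w - wbar) for x, w in zip(xs, ws)) / sxx
    b0 = wbar - b1 * xbar
    return sum((w - b0 - b1 * x) ** 2 for x, w in zip(xs, ws)) / (n - 2)


def mse_comparison(xs, ys, lam_min=-2.0, lam_max=2.0, resolution=81, lambda2=0.0):
    """(lambda1, MSE) pairs on an evenly spaced grid, like the MSE Comparison Table."""
    step = (lam_max - lam_min) / (resolution - 1)
    grid = [lam_min + i * step for i in range(resolution)]
    return [(lam, line_mse(xs, standardized_w(ys, lam, lambda2))) for lam in grid]


def power_interval(table, n, chi2_crit=3.841):
    """Best grid power plus the span of powers passing the likelihood-ratio rule."""
    best_lam, best_mse = min(table, key=lambda row: row[1])
    inside = [lam for lam, mse in table
              if n * math.log(mse / best_mse) <= chi2_crit]
    return best_lam, min(inside), max(inside)


# Hypothetical data in which 1 / sqrt(Y) is linear in X apart from small noise.
xs = [float(x) for x in range(1, 11)]
ys = [(1.0 + 0.5 * x + (0.05 if x % 2 else -0.05)) ** -2 for x in range(1, 11)]
table = mse_comparison(xs, ys)
print(power_interval(table, len(xs)))
```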

MSE Comparison Table

This table tabulates the values plotted by the MSE Comparison Plot.

[Table: MSE Comparison Table, Shift (lambda2): 0.0; columns lambda1 and MSE]

The Pane Options are the same as for the plot.

Skewness and Kurtosis Plot

This plot shows the values of the standardized skewness and standardized kurtosis as a function of the power parameter λ1:

[Figure: Skewness and Kurtosis Plot, lambda2 = 0.0; standardized skewness and standardized kurtosis versus lambda1]

The standardized skewness and standardized kurtosis should both be between -2 and +2 for a transformation that adequately normalizes the data. The plot shows horizontal lines at -2 and +2, with the vertical lines indicating the optimal value of λ1 and its confidence limits.
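Standardized skewness and kurtosis are commonly defined as the sample skewness divided by √(6/n) and the sample excess kurtosis divided by √(24/n), their approximate standard errors under normality. The sketch below adopts those simple moment-based definitions as an assumption; Statgraphics may apply small-sample corrections, and the text does not say whether the statistics are computed on the transformed data or on the residuals, so the hypothetical function just takes a list of values. Tracing the plot would mean calling it once per value of λ1.

```python
import math


def standardized_skewness_kurtosis(values):
    """Return (standardized skewness, standardized kurtosis) from moment estimates."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness / math.sqrt(6.0 / n), excess_kurtosis / math.sqrt(24.0 / n)


def looks_normal(values, limit=2.0):
    """True when both standardized statistics fall inside [-limit, +limit]."""
    z_skew, z_kurt = standardized_skewness_kurtosis(values)
    return abs(z_skew) <= limit and abs(z_kurt) <= limit


print(standardized_skewness_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0]))
```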

Clearly, there is a wide range of values for λ1 that would create a reasonable transformation of the data.

Lack-of-Fit Test

When more than one observation has been recorded at the same value of X, a lack-of-fit test can be performed to determine whether the selected model adequately describes the relationship between Y and X. The Lack-of-Fit pane displays the following table:

[Table: Analysis of Variance with Lack-of-Fit; rows Model, Residual, Lack-of-Fit, Pure Error, Total (Corr.); columns Sum of Squares, Df, Mean Square, F-Ratio, P-Value]

The lack-of-fit test decomposes the residual sum of squares of the transformed values W into 2 components:

1. Pure error: variability of the W values at the same value of X.
2. Lack-of-fit: variability of the average W values around the fitted model.

Of primary interest is the P-value for lack-of-fit. A small P-value (below 0.05 if operating at the 5% significance level) indicates that the selected model does not adequately describe the observed relationship. For the example data, the large P-value indicates that the linear model adequately explains the relationship between Plasma Level and Age.

Observed versus Predicted

The Observed versus Predicted plot shows the observed values of Y on the vertical axis and the predicted values Ŷ on the horizontal axis, in the untransformed metric:

[Figure: Plot of Plasma Level; observed versus predicted]
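The decomposition above can be sketched as follows. This hypothetical helper groups W values by identical X, takes the within-group variation as pure error, and attributes the rest of the residual sum of squares to lack-of-fit; the degrees of freedom are (number of distinct X values - 2) and (n - number of distinct X values). It returns the F-ratio only; a P-value could be obtained with, for example, `scipy.stats.f.sf(f_ratio, df_lof, df_pure)`.

```python
from collections import defaultdict


def lack_of_fit(xs, ws):
    """Split the residual SS of the line W = b0 + b1*X into pure error
    and lack-of-fit, returning sums of squares, df, and the F-ratio."""
    n = len(xs)
    xbar, wbar = sum(xs) / n, sum(ws) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (w - wbar) for x, w in zip(xs, ws)) / sxx
    b0 = wbar - b1 * xbar
    ss_resid = sum((w - b0 - b1 * x) ** 2 for x, w in zip(xs, ws))

    groups = defaultdict(list)                  # W values sharing the same X
    for x, w in zip(xs, ws):
        groups[x].append(w)
    ss_pure = sum(sum((w - sum(g) / len(g)) ** 2 for w in g)
                  for g in groups.values())
    ss_lof = ss_resid - ss_pure
    df_lof = len(groups) - 2                    # distinct X values minus 2 parameters
    df_pure = n - len(groups)
    if df_lof < 1 or df_pure < 1:
        raise ValueError("need replicated X values and at least 3 distinct X values")
    f_ratio = (ss_lof / df_lof) / (ss_pure / df_pure)
    return {"ss_lof": ss_lof, "df_lof": df_lof, "ss_pure": ss_pure,
            "df_pure": df_pure, "f_ratio": f_ratio}


res = lack_of_fit([1, 1, 2, 2, 3, 3], [1.0, 1.2, 2.1, 1.9, 3.2, 2.8])
print(res)
```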

If the model fits well, the points should be randomly scattered around the diagonal line. It is sometimes possible to see curvature in this plot, which would indicate the need for a curvilinear model rather than a linear model. In this case, the change in variability in the above plot as the predicted values increase is not a concern, since that was stabilized by the Box-Cox transformation.

Residual Plots

As with all statistical models, it is good practice to examine the residuals. In a regression, the residuals are defined by

$$e_i = W_i - \hat{W}_i \qquad (7)$$

i.e., the residuals are the differences between the transformed data values and the fitted linear regression model. The Box-Cox Transformations procedure creates 3 residual plots:

1. versus X.
2. versus predicted value Ŵ.
3. versus row number.

Residuals versus X

This plot is helpful in visualizing how well the transformation accounted for any curvature in the data.

[Figure: Residual Plot; Studentized residual versus Age]

The residuals should be randomly scattered around zero.

Residuals versus Predicted

This plot is helpful in visualizing how well the model dealt with any heteroscedasticity in the data.

[Figure: Residual Plot; Studentized residual versus predicted Plasma Level]

If the transformation was effective, the variability should be approximately equal everywhere.

Residuals versus Observation

This plot shows the residuals versus row number in the datasheet:

[Figure: Residual Plot; Studentized residual versus row number]

If the data are arranged in chronological order, any pattern in the data might indicate an outside influence.

Pane Options

The following residuals may be plotted on each residual plot:

1. Residuals: the residuals from the least squares fit.
2. Studentized residuals: the difference between the observed values wi and the predicted values ŵi when the model is fit using all observations except the i-th, divided by the estimated standard error. These residuals are sometimes called externally deleted residuals, since they measure how far each value is from the fitted model when that model is fit using all of the data except the point being considered. This is important, since a large outlier might otherwise affect the model so much that it would not appear to be unusually far away from the line.

Unusual Residuals

Once the model has been fit, it is useful to study the residuals to determine whether any outliers exist that should be removed from the data. The Unusual Residuals pane lists all observations that have Studentized residuals of 2.0 or greater in absolute value.

[Table: Unusual Residuals; columns Row, X, Y, Predicted Y, Residual, Studentized Residual]

Studentized residuals greater than 3 in absolute value correspond to points more than 3 standard deviations from the fitted model, which is an extremely rare event for a normal distribution. Note that row #18 is more than 2.5 standard deviations out and would be worth investigating further.

Points can be removed from the fit while examining the Plot of the Fitted Model by clicking on a point and then pressing the Exclude/Include button on the analysis toolbar:

[Figure: Plot of Fitted Model; Plasma Level versus Age]

Excluded points are marked with an X. For the sample data, removing row #18 has little effect on the fitted model or optimal transformation.
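Externally studentized residuals do not require refitting the model n times: the residual variance with point i left out follows from the full fit as (SSE - e_i²/(1 - h_i))/(n - 3) for a straight line, where h_i is the point's leverage. The sketch below applies that standard identity and flags rows at the 2.0 cutoff used by the Unusual Residuals pane; the function names and the data are hypothetical.

```python
import math


def studentized_residuals(xs, ws):
    """Externally studentized (deleted) residuals for the line W = b0 + b1*X."""
    n, p = len(xs), 2
    xbar, wbar = sum(xs) / n, sum(ws) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (w - wbar) for x, w in zip(xs, ws)) / sxx
    b0 = wbar - b1 * xbar
    resid = [w - (b0 + b1 * x) for x, w in zip(xs, ws)]
    sse = sum(e * e for e in resid)
    out = []
    for x, e in zip(xs, resid):
        h = 1.0 / n + (x - xbar) ** 2 / sxx                # leverage of this point
        # Residual variance of the fit with this point left out.
        s2_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        out.append(e / math.sqrt(s2_deleted * (1.0 - h)))
    return out


def unusual_rows(xs, ws, cutoff=2.0):
    """1-based row numbers whose |studentized residual| >= cutoff."""
    return [i + 1 for i, r in enumerate(studentized_residuals(xs, ws))
            if abs(r) >= cutoff]


xs = [float(x) for x in range(1, 11)]
ws = [x + (0.1 if x % 2 else -0.1) for x in range(1, 11)]
ws[5] += 3.0                      # plant an outlier in row 6
print(unusual_rows(xs, ws))
```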

Influential Points

In fitting a regression model, not all observations have an equal influence on the parameter estimates in the fitted model. In a simple linear regression, points located at very low or very high values of X have greater influence than those located nearer to the mean of X. The Influential Points pane displays any observations that have high influence on the fitted model:

[Table: Influential Points; columns Row, X, Y, Predicted Y, Studentized Residual, Leverage. Average leverage of single data point = 0.08]

The above table shows every point with leverage equal to 3 or more times that of an average data point, where the leverage of an observation is a measure of its potential influence on the estimated model coefficients. In general, values with leverage exceeding 5 times that of an average data value should be examined closely, since they have unusually large impact on the fitted model. In the sample data, there are no observations with unusually large leverage.

Forecasts

The Forecasts pane creates predictions using the fitted model.

[Table: Predicted Values; columns X, Predicted Y, 95.00% Prediction Limits (Lower, Upper), 95.00% Confidence Limits (Lower, Upper)]

Included in the table are:

X - the value of the independent variable at which the prediction is to be made.
Predicted Y - the predicted value of the dependent variable using the fitted model.
Prediction limits - prediction limits for new observations at the selected level of confidence (corresponds to the outer bounds on the plot of the fitted model).
Confidence limits - confidence limits for the mean value of Y at the selected level of confidence (corresponds to the inner bounds on the plot of the fitted model).

For example, at X = 3, 95% of all children would be expected to have plasma levels between 5.47 and …
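For a straight-line fit, the leverage of row i is h_i = 1/n + (x_i - x̄)²/Sxx, and the leverages sum to the number of coefficients (2), so the average leverage is 2/n; with n = 25 that is 0.08. The sketch below, with hypothetical function names, applies the "3 or more times the average" rule used by the Influential Points pane.

```python
def leverages(xs):
    """Leverage h_i = 1/n + (x_i - xbar)^2 / Sxx for a straight-line fit."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1.0 / n + (x - xbar) ** 2 / sxx for x in xs]


def influential_rows(xs, multiple=3.0, n_params=2):
    """Average leverage (n_params / n) and the 1-based rows at or above
    `multiple` times that average."""
    h = leverages(xs)
    average = n_params / len(xs)
    return average, [i + 1 for i, hi in enumerate(h) if hi >= multiple * average]


avg, rows = influential_rows([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0])
print(avg, rows)
```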

Pane Options

Confidence Level: confidence percentage for the intervals.
Type of Limits: whether to display two-sided limits or one-sided bounds.
Forecast at X: up to 10 values of X at which to make predictions.

Save Results

The following results may be saved to the datasheet:

1. Predicted Values: the predicted value of Y corresponding to each of the n observations.
2. Lower Limits for Predictions: the lower prediction limits for each predicted value.
3. Upper Limits for Predictions: the upper prediction limits for each predicted value.
4. Lower Limits for Forecast Means: the lower confidence limits for the mean value of Y at each of the n values of X.
5. Upper Limits for Forecast Means: the upper confidence limits for the mean value of Y at each of the n values of X.
6. Residuals: the n residuals.
7. Studentized Residuals: the n Studentized residuals.
8. Leverages: the leverage values corresponding to the n values of X.
9. Transformed Data: the n transformed values W.

Note: If limits are saved, they will correspond to the settings on the Forecasts pane. If two-sided limits are displayed in the Forecasts table, then the saved limits will also be two-sided. If one-sided bounds are displayed in the table, then the saved limits will also be one-sided.

Calculations

The linear regression is performed on the transformed values W. Prediction limits are calculated in the transformed metric and inverted before being displayed. For details on the calculations, see the Simple Regression documentation.
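Inverting a limit means solving equation (3) for Y. Because dW/dY = ((Y + λ2)/K2)^(λ1 - 1) is positive for every λ1, W increases with Y even for negative powers, so a lower limit on the W scale maps to a lower limit on the Y scale. The sketch below is a hypothetical illustration of that inversion, assuming K2 is the geometric mean computed from the fitted data; it raises an error when K1·W + 1 is not positive, since such W values have no counterpart on the Y scale. The values of K2 and λ1 in the usage lines are made up.

```python
import math


def forward_w(y, lambda1, k2, lambda2=0.0):
    """Equation (3) for a single Y, given K2 from the fitted data."""
    s = y + lambda2
    if lambda1 == 0:
        return k2 * math.log(s)
    return (s ** lambda1 - 1) / (lambda1 * k2 ** (lambda1 - 1))


def inverse_w(w, lambda1, k2, lambda2=0.0):
    """Map a value on the W scale back to the Y scale by inverting (3)."""
    if lambda1 == 0:
        return math.exp(w / k2) - lambda2
    k1 = lambda1 * k2 ** (lambda1 - 1)
    base = k1 * w + 1.0
    if base <= 0:
        raise ValueError("W value lies outside the range of the transformation")
    return base ** (1.0 / lambda1) - lambda2


# Two hypothetical W-scale limits, built here from Y = 3 and Y = 6,
# mapped back to the Y scale; the order of the limits is preserved.
k2, lam = 0.2, -0.5
w_lower, w_upper = forward_w(3.0, lam, k2), forward_w(6.0, lam, k2)
print(inverse_w(w_lower, lam, k2), inverse_w(w_upper, lam, k2))
```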


More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Using SPSS for One Way Analysis of Variance

Using SPSS for One Way Analysis of Variance Using SPSS for One Way Analysis of Variance This tutorial will show you how to use SPSS version 12 to perform a one-way, between- subjects analysis of variance and related post-hoc tests. This tutorial

More information

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS The data used in this example describe teacher and student behavior in 8 classrooms. The variables are: Y percentage of interventions

More information

Ratio of Polynomials Fit Many Variables

Ratio of Polynomials Fit Many Variables Chapter 376 Ratio of Polynomials Fit Many Variables Introduction This program fits a model that is the ratio of two polynomials of up to fifth order. Instead of a single independent variable, these polynomials

More information

Analysis of Covariance (ANCOVA) with Two Groups

Analysis of Covariance (ANCOVA) with Two Groups Chapter 226 Analysis of Covariance (ANCOVA) with Two Groups Introduction This procedure performs analysis of covariance (ANCOVA) for a grouping variable with 2 groups and one covariate variable. This procedure

More information

Remedial Measures Wrap-Up and Transformations-Box Cox

Remedial Measures Wrap-Up and Transformations-Box Cox Remedial Measures Wrap-Up and Transformations-Box Cox Frank Wood October 25, 2011 Last Class Graphical procedures for determining appropriateness of regression fit - Normal probability plot Tests to determine

More information

Ridge Regression. Chapter 335. Introduction. Multicollinearity. Effects of Multicollinearity. Sources of Multicollinearity

Ridge Regression. Chapter 335. Introduction. Multicollinearity. Effects of Multicollinearity. Sources of Multicollinearity Chapter 335 Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear relationship between: - one independent variable X and -

More information

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club BE640 Intermediate Biostatistics 2. Regression and Correlation Simple Linear Regression Software: SAS Emergency Calls to the New York Auto Club Source: Chatterjee, S; Handcock MS and Simonoff JS A Casebook

More information

Single and multiple linear regression analysis

Single and multiple linear regression analysis Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Multivariate Capability Analysis Using Statgraphics. Presented by Dr. Neil W. Polhemus

Multivariate Capability Analysis Using Statgraphics. Presented by Dr. Neil W. Polhemus Multivariate Capability Analysis Using Statgraphics Presented by Dr. Neil W. Polhemus Multivariate Capability Analysis Used to demonstrate conformance of a process to requirements or specifications that

More information

MULTIPLE LINEAR REGRESSION IN MINITAB

MULTIPLE LINEAR REGRESSION IN MINITAB MULTIPLE LINEAR REGRESSION IN MINITAB This document shows a complicated Minitab multiple regression. It includes descriptions of the Minitab commands, and the Minitab output is heavily annotated. Comments

More information

Question Possible Points Score Total 100

Question Possible Points Score Total 100 Midterm I NAME: Instructions: 1. For hypothesis testing, the significant level is set at α = 0.05. 2. This exam is open book. You may use textbooks, notebooks, and a calculator. 3. Do all your work in

More information

Test 3 Practice Test A. NOTE: Ignore Q10 (not covered)

Test 3 Practice Test A. NOTE: Ignore Q10 (not covered) Test 3 Practice Test A NOTE: Ignore Q10 (not covered) MA 180/418 Midterm Test 3, Version A Fall 2010 Student Name (PRINT):............................................. Student Signature:...................................................

More information

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author... From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. Contents About This Book... xiii About The Author... xxiii Chapter 1 Getting Started: Data Analysis with JMP...

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Using Tables and Graphing Calculators in Math 11

Using Tables and Graphing Calculators in Math 11 Using Tables and Graphing Calculators in Math 11 Graphing calculators are not required for Math 11, but they are likely to be helpful, primarily because they allow you to avoid the use of tables in some

More information

AMS 7 Correlation and Regression Lecture 8

AMS 7 Correlation and Regression Lecture 8 AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n

More information

How to Run the Analysis: To run a principal components factor analysis, from the menus choose: Analyze Dimension Reduction Factor...

How to Run the Analysis: To run a principal components factor analysis, from the menus choose: Analyze Dimension Reduction Factor... The principal components method of extraction begins by finding a linear combination of variables that accounts for as much variation in the original variables as possible. This method is most often used

More information

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel

Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Psychology Seminar Psych 406 Dr. Jeffrey Leitzel Structural Equation Modeling Topic 1: Correlation / Linear Regression Outline/Overview Correlations (r, pr, sr) Linear regression Multiple regression interpreting

More information

Topic 18: Model Selection and Diagnostics

Topic 18: Model Selection and Diagnostics Topic 18: Model Selection and Diagnostics Variable Selection We want to choose a best model that is a subset of the available explanatory variables Two separate problems 1. How many explanatory variables

More information

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation y = a + bx y = dependent variable a = intercept b = slope x = independent variable Section 12.1 Inference for Linear

More information

Analysis of Bivariate Data

Analysis of Bivariate Data Analysis of Bivariate Data Data Two Quantitative variables GPA and GAES Interest rates and indices Tax and fund allocation Population size and prison population Bivariate data (x,y) Case corr&reg 2 Independent

More information

STAT 350 Final (new Material) Review Problems Key Spring 2016

STAT 350 Final (new Material) Review Problems Key Spring 2016 1. The editor of a statistics textbook would like to plan for the next edition. A key variable is the number of pages that will be in the final version. Text files are prepared by the authors using LaTeX,

More information

Independent Samples ANOVA

Independent Samples ANOVA Independent Samples ANOVA In this example students were randomly assigned to one of three mnemonics (techniques for improving memory) rehearsal (the control group; simply repeat the words), visual imagery

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Regression Analysis. BUS 735: Business Decision Making and Research

Regression Analysis. BUS 735: Business Decision Making and Research Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn

More information

Ch 13 & 14 - Regression Analysis

Ch 13 & 14 - Regression Analysis Ch 3 & 4 - Regression Analysis Simple Regression Model I. Multiple Choice:. A simple regression is a regression model that contains a. only one independent variable b. only one dependent variable c. more

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

Passing-Bablok Regression for Method Comparison

Passing-Bablok Regression for Method Comparison Chapter 313 Passing-Bablok Regression for Method Comparison Introduction Passing-Bablok regression for method comparison is a robust, nonparametric method for fitting a straight line to two-dimensional

More information

Lab 1 Uniform Motion - Graphing and Analyzing Motion

Lab 1 Uniform Motion - Graphing and Analyzing Motion Lab 1 Uniform Motion - Graphing and Analyzing Motion Objectives: < To observe the distance-time relation for motion at constant velocity. < To make a straight line fit to the distance-time data. < To interpret

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

Statistics II Exercises Chapter 5

Statistics II Exercises Chapter 5 Statistics II Exercises Chapter 5 1. Consider the four datasets provided in the transparencies for Chapter 5 (section 5.1) (a) Check that all four datasets generate exactly the same LS linear regression

More information

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable.

Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. CHAPTER 9 Simple Linear Regression and Correlation Regression used to predict or estimate the value of one variable corresponding to a given value of another variable. X = independent variable. Y = dependent

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

ASSIGNMENT 3 SIMPLE LINEAR REGRESSION. Old Faithful

ASSIGNMENT 3 SIMPLE LINEAR REGRESSION. Old Faithful ASSIGNMENT 3 SIMPLE LINEAR REGRESSION In the simple linear regression model, the mean of a response variable is a linear function of an explanatory variable. The model and associated inferential tools

More information

Chapter 12: Multiple Regression

Chapter 12: Multiple Regression Chapter 12: Multiple Regression 12.1 a. A scatterplot of the data is given here: Plot of Drug Potency versus Dose Level Potency 0 5 10 15 20 25 30 0 5 10 15 20 25 30 35 Dose Level b. ŷ = 8.667 + 0.575x

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Index I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474

Index I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474 Index A Absolute value explanation of, 40, 81 82 of slope of lines, 453 addition applications involving, 43 associative law for, 506 508, 570 commutative law for, 238, 505 509, 570 English phrases for,

More information

MULTIPLE REGRESSION METHODS

MULTIPLE REGRESSION METHODS DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 MULTIPLE REGRESSION METHODS I. AGENDA: A. Residuals B. Transformations 1. A useful procedure for making transformations C. Reading:

More information

INFERENCE FOR REGRESSION

INFERENCE FOR REGRESSION CHAPTER 3 INFERENCE FOR REGRESSION OVERVIEW In Chapter 5 of the textbook, we first encountered regression. The assumptions that describe the regression model we use in this chapter are the following. We

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

22S39: Class Notes / November 14, 2000 back to start 1

22S39: Class Notes / November 14, 2000 back to start 1 Model diagnostics Interpretation of fitted regression model 22S39: Class Notes / November 14, 2000 back to start 1 Model diagnostics 22S39: Class Notes / November 14, 2000 back to start 2 Model diagnostics

More information