Nonlinear Regression. Summary. Sample StatFolio: nonlinear reg.sgp

Nonlinear Regression

© 2013 by StatPoint Technologies, Inc.

Summary
Analysis Summary
Plot of Fitted Model
Response Surface Plots
Analysis Options
Reports
Correlation Matrix
Observed versus Predicted
Residual Plots
Unusual Residuals
Influential Points
Save Results
Calculations

Summary

The Nonlinear Regression procedure fits a user-specified function relating a single dependent variable Y to one or more independent variables X. The model is estimated using nonlinear least squares. The fitted model may be plotted, forecasts generated from it, and unusual residuals identified.

Sample StatFolio: nonlinear reg.sgp

Sample Data

The file nonlin.sgd contains data on the amount of available chlorine in samples of a product as a function of the number of weeks since it was produced. The data, from Draper and Smith (1998), consists of n = 44 samples, a portion of which are shown below:

[Table: columns Weeks and Chlorine]

It is desired to fit the following model to the data:

$$\text{chlorine} = a + (0.49 - a)\, e^{-b(\text{weeks} - 8)} \qquad (1)$$

This model, suggested by a subject matter expert, contains two unknowns: a, the asymptotic baseline value reached at large values of weeks, and b, the exponential rate of decay.
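The behavior of model (1) can be checked with a few lines of Python. This is only a sketch: the function name `chlorine_model` and the parameter values `a_demo` and `b_demo` are assumptions chosen for illustration, not estimates taken from the sample data.

```python
import math

def chlorine_model(weeks, a, b):
    """Model (1): a + (0.49 - a) * exp(-b * (weeks - 8))."""
    return a + (0.49 - a) * math.exp(-b * (weeks - 8.0))

# Hypothetical parameter values, used only to illustrate the shape of the curve.
a_demo, b_demo = 0.39, 0.10

# At weeks = 8 the exponent is zero, so the model returns 0.49 for any a and b.
start_value = chlorine_model(8, a_demo, b_demo)

# For large weeks (and b > 0) the exponential term vanishes and the model tends to a.
late_value = chlorine_model(500, a_demo, b_demo)

print(start_value, late_value)
```

The two evaluations show the roles of the parameters: the curve is pinned at 0.49 when weeks = 8, and a is the level it approaches as weeks grows.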

Data Input

The first of two data input dialog boxes requests the name of the dependent variable and the model to be fit:

Dependent Variable: numeric column containing the n values of Y.

Function: a STATGRAPHICS expression representing the function to be fit. It must include one or more names of numeric columns, representing the independent variables. It may also include functions such as SQRT or EXP. Any unrecognized names are considered to represent model parameters that need to be estimated.

Weight: an optional numeric column containing weights to be applied to the squared residuals when performing a weighted least squares fit.

Select: subset selection.

The second dialog box requests initial estimates for each of the unknown model parameters:

Enter an initial estimate for each parameter. The program will begin with the initial estimates and perform a numerical search to find estimates that minimize the residual sum of squares. Depending upon the complexity of the model, poor initial estimates may or may not lead to an optimal solution. In all but the simplest cases, intelligent selection of initial estimates can greatly improve the chances of obtaining a good solution. Typically, it is important to at least give estimates with the proper sign (positive or negative), since the search procedure might otherwise move in an entirely wrong direction.
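To make the role of the initial estimates concrete, the following sketch evaluates the (optionally weighted) residual sum of squares that the search minimizes. The data are synthetic and noise-free, generated from hypothetical values a = 0.39 and b = 0.10; they are not the nonlin.sgd data, and the function names are assumptions.

```python
import math

def model(weeks, a, b):
    """Model being fit: a + (0.49 - a) * exp(-b * (weeks - 8))."""
    return a + (0.49 - a) * math.exp(-b * (weeks - 8.0))

def residual_sum_of_squares(params, weeks, chlorine, weights=None):
    """Weighted sum of squared residuals; weights default to 1 for every row."""
    a, b = params
    if weights is None:
        weights = [1.0] * len(weeks)
    return sum(w * (y - model(x, a, b)) ** 2
               for x, y, w in zip(weeks, chlorine, weights))

# Synthetic, noise-free data from hypothetical parameters (illustration only).
weeks = [float(w) for w in range(8, 44, 2)]
chlorine = [model(w, 0.39, 0.10) for w in weeks]

rss_truth = residual_sum_of_squares((0.39, 0.10), weeks, chlorine)
rss_good_sign = residual_sum_of_squares((0.1, 0.1), weeks, chlorine)
rss_wrong_sign = residual_sum_of_squares((0.1, -0.1), weeks, chlorine)

print(rss_truth, rss_good_sign, rss_wrong_sign)
```

With a negative starting value for b the model grows exponentially instead of decaying, so the starting residual sum of squares is far larger than with the correctly signed estimate. This illustrates why the sign of the initial estimates matters.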

Analysis Summary

The Analysis Summary shows the results of the fit.

Nonlinear Regression - chlorine
Dependent variable: chlorine
Independent variables: weeks
Function to be estimated: a+(0.49-a)*exp(-b*(weeks-8))
Initial parameter estimates:
  a = 0.1
  b = 0.1
Number of observations: 44

Estimation method: Marquardt
Estimation stopped due to convergence of residual sum of squares.
Number of iterations: 4
Number of function calls: 14

Estimation Results
Columns: Parameter, Estimate, Asymptotic Standard Error, Asymptotic 95.0% Confidence Interval (Lower, Upper)
Rows: a, b

Analysis of Variance
Columns: Source, Sum of Squares, Df, Mean Square
Rows: Model, Residual, Total, Total (Corr.)

R-Squared = percent
R-Squared (adjusted for d.f.) = percent
Standard Error of Est. =
Mean absolute error =
Durbin-Watson statistic =
Lag 1 residual autocorrelation =

Residual Analysis
Columns: Estimation, Validation
Rows: n (Estimation: 44), MSE, MAE, MAPE, ME, MPE

Included in the output are:

Data Summary: a summary of the input data.

Function to be Estimated: the function to be estimated and the initial parameter estimates.

Estimation Statistics: the method of estimation used and the number of iterations and function calls performed.

Parameter Estimates: the estimated parameters with approximate confidence intervals. Confidence intervals that do not contain 0 indicate that the model parameter is statistically significant at the stated confidence level.

Analysis of Variance: decomposition of the variability of the dependent variable Y into a model sum of squares and a residual or error sum of squares.

Statistics: summary statistics for the fitted model, including:

R-Squared - represents the percentage of the variability in Y which has been explained by the fitted regression model, ranging from 0% to 100%. For the sample data, the regression has accounted for about 87.3% of the variability amongst the observed chlorine concentrations.

Adjusted R-Squared - the R-squared statistic, adjusted for the number of coefficients in the model. This value is often used to compare models with different numbers of coefficients.

Standard Error of Est. - the estimated standard deviation of the residuals (the deviations around the model). This value is used to create prediction limits for new observations.

Mean Absolute Error - the average absolute value of the residuals.

Durbin-Watson Statistic - a measure of serial correlation in the residuals. If the residuals vary randomly, this value should be close to 2. A small P-value indicates a non-random pattern in the residuals. For data recorded over time, a small P-value could indicate that some trend over time has not been accounted for.

Lag 1 Residual Autocorrelation - the estimated correlation between consecutive residuals, on a scale of -1 to 1. Values far from 0 indicate that significant structure remains unaccounted for by the model.

Residual Analysis - if a subset of the rows in the datasheet have been excluded from the analysis using the Select field on the data input dialog box, the fitted model is used to make predictions of the Y values for those rows. This table shows statistics on the prediction errors, defined by

$$e_i = y_i - \hat{y}_i \qquad (2)$$

Included are the mean squared error (MSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), the mean error (ME), and the mean percentage error (MPE). These validation statistics can be compared to the statistics for the fitted model to determine how well that model predicts observations outside of the data used to fit it.

For the sample data, the fitted model is

$$\text{chlorine} = \hat{a} + (0.49 - \hat{a})\,\exp(-\hat{b}\,(\text{weeks} - 8)) \qquad (3)$$

where $\hat{a}$ and $\hat{b}$ are the estimates listed under Estimation Results. The model begins with chlorine = 0.49 at weeks = 8 and drops exponentially to a baseline at approximately 0.39 as weeks increase.
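The residual-based statistics described above are simple to compute by hand. The sketch below uses the usual textbook definitions and plain averages over n; whether STATGRAPHICS uses exactly these divisors is an assumption, and the function names and demo values are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def lag1_autocorrelation(residuals):
    """Estimated correlation between consecutive residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum((residuals[i] - mean) * (residuals[i - 1] - mean)
              for i in range(1, n))
    den = sum((e - mean) ** 2 for e in residuals)
    return num / den

def prediction_error_stats(observed, predicted):
    """MSE, MAE, MAPE, ME and MPE of the errors e_i = y_i - yhat_i (plain means over n)."""
    errors = [y - f for y, f in zip(observed, predicted)]
    n = len(errors)
    return {
        "MSE": sum(e * e for e in errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / y) for e, y in zip(errors, observed)) / n,
        "ME": sum(errors) / n,
        "MPE": 100.0 * sum(e / y for e, y in zip(errors, observed)) / n,
    }

# Hypothetical values, for illustration only.
residuals_demo = [0.01, -0.01, 0.01, -0.01]
dw_demo = durbin_watson(residuals_demo)
r1_demo = lag1_autocorrelation(residuals_demo)

observed_demo = [0.49, 0.47, 0.45, 0.43]
predicted_demo = [0.48, 0.48, 0.44, 0.44]
stats_demo = prediction_error_stats(observed_demo, predicted_demo)

print(dw_demo, r1_demo, stats_demo)
```

The alternating residuals in the demo give a Durbin-Watson value well above 2 and a negative lag 1 autocorrelation, the pattern expected when consecutive residuals tend to have opposite signs.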

Plot of Fitted Model

The Plot of Fitted Model pane plots the fitted model versus any one of the independent variables, with the other variables set equal to values specified on the Pane Options dialog box.

[Figure: Plot of Fitted Model; chlorine versus weeks]

Pane Options

Select any one variable to plot on the horizontal axis, together with its range. For the other variables, enter values to be substituted into the fitted model.

Response Surface Plots

If more than one independent variable is included in the model, surface and contour plots can be created. For example, Draper and Smith (1998) report on an experiment in which the fraction of material Y remaining after a chemical reaction was described by the model

$$Y = \exp\left\{-\theta_1 X_1 \exp\left[-\theta_2\left(\frac{1}{X_2} - \frac{1}{620}\right)\right]\right\} \qquad (4)$$

where $X_1$ was the reaction time in minutes and $X_2$ was the reaction temperature in degrees Kelvin. The data is saved in the file nlreact.sgd and the analysis in nlreact.sgp. A surface plot of the fitted model is shown below:

[Figure: Estimated Response Surface; material versus time and temperature]

In a surface plot, the height of the surface represents the predicted value of Y. The second option labeled Response Surface Plots on the Graphical Options menu creates a contour plot:

[Figure: Contours of Estimated Response Surface; axes time and temperature, contours of material]

In a contour plot of the above form, each line represents combinations of $X_1$ and $X_2$ that result in the same predicted value for Y. Various other formats are available using Pane Options.

Pane Options

Type: choose from a 3-D Surface Plot, where the height of the surface represents the value of Y versus any two independent variables; a 2-D Contour Plot, where lines or colored regions represent the value of Y as a function of any two independent variables; a 2-D Square Plot, where the predicted value of Y is shown at different combinations of 2 independent variables; or a 3-D Cube Plot, in which the predicted value of Y is shown at different combinations of 3 independent variables.

Contours: the limits and spacing of the contour lines or regions. The contours may be drawn as solid Lines representing a single value of Y, Painted Regions representing intervals, or using a Continuous range of colors.

Resolution: the number of divisions along each axis at which the value of Y is plotted. Increasing the resolution may improve the quality of the plot, but it can also increase the length of time required to draw it.

Surface: for a surface plot, the number of divisions along each axis between the lines used to draw the surface. The surface may be drawn as a Wire Frame (transparent mesh), as a solid colored surface, or contoured (colored according to values of Y). Contours below puts a contour plot in the bottom of the cube. Show Points plots the observations with lines drawn to the surface.

Factors: press this button to select the factors to be plotted. A dialog box similar to that described for the Plot of Fitted Model will be displayed.

Example Contour Plot with Continuous Colors

[Figure: Contours of Estimated Response Surface; axes time and temperature, colored by material]

Example Surface Plot with Contour Below and Show Points Selected

[Figure: Estimated Response Surface; material versus time and temperature]

Analysis Options

The Analysis Options dialog box controls the algorithm used to fit the model:

Method: method used to estimate the model parameters. The Gauss-Newton method uses a linearization technique that fits a sequence of linear regression models to locate the minimum residual sum of squares. The Steepest-Descent method follows the gradient of the residual sum of squares surface. Marquardt's method, the default, is a fast and reliable compromise between the other two.

Stopping Criterion 1: The algorithm is assumed to have converged when the relative change in the residual sum of squares from one iteration to the next is less than this value.

Stopping Criterion 2: The algorithm is assumed to have converged when the relative change in all parameter estimates from one iteration to the next is less than this value.

Maximum Iterations: Estimation stops if convergence is not achieved within this many iterations.

Maximum Function Calls: Estimation stops if convergence is not achieved when the function being fit has been evaluated this many times. Multiple function evaluations are done during each iteration.

Marquardt Parameter: The magnitude of the Marquardt parameter controls the extent to which the other two methods are traded off against each other. For details on the Marquardt algorithm, see Box, Jenkins and Reinsel (1994).

Confidence Level: the percentage used to calculate the asymptotic confidence intervals for the model coefficients.
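The options above map naturally onto a small Marquardt-style iteration. The following is a minimal sketch for the two-parameter chlorine model, not the program's actual implementation: the damping rule (multiply or divide the Marquardt parameter `lam` by 10), the default tolerances, and the synthetic noise-free data generated from hypothetical values a = 0.39, b = 0.15 are all assumptions. A small `lam` behaves like Gauss-Newton, a large `lam` like a short, scaled gradient step.

```python
import math

def model(weeks, a, b):
    return a + (0.49 - a) * math.exp(-b * (weeks - 8.0))

def rss(weeks, y, a, b):
    return sum((yi - model(x, a, b)) ** 2 for x, yi in zip(weeks, y))

def fit_marquardt(weeks, y, a, b, lam=0.01, tol_rss=1e-8, tol_param=1e-6,
                  max_iter=50):
    """Return (a, b, rss, iterations, reason) for the two-parameter model."""
    current = rss(weeks, y, a, b)
    for iteration in range(1, max_iter + 1):
        # Accumulate J'J and J'r from the analytic partial derivatives.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for x, yi in zip(weeks, y):
            t = x - 8.0
            decay = math.exp(-b * t)
            ja = 1.0 - decay                 # d(model)/da
            jb = -(0.49 - a) * t * decay     # d(model)/db
            r = yi - (a + (0.49 - a) * decay)
            s_aa += ja * ja
            s_ab += ja * jb
            s_bb += jb * jb
            g_a += ja * r
            g_b += jb * r
        # Increase the damping until the step no longer increases the RSS.
        while True:
            m_aa = s_aa * (1.0 + lam)
            m_bb = s_bb * (1.0 + lam)
            det = m_aa * m_bb - s_ab * s_ab
            if det <= 0.0:
                return a, b, current, iteration, "singular normal equations"
            da = (g_a * m_bb - s_ab * g_b) / det
            db = (m_aa * g_b - s_ab * g_a) / det
            trial = rss(weeks, y, a + da, b + db)
            if trial <= current:
                break
            lam *= 10.0
            if lam > 1e10:
                return a, b, current, iteration, "no further decrease"
        rel_rss = (current - trial) / current if current > 0.0 else 0.0
        rel_par = max(abs(da) / max(abs(a), 1e-12),
                      abs(db) / max(abs(b), 1e-12))
        a, b, current = a + da, b + db, trial
        lam = max(lam / 10.0, 1e-12)
        if rel_rss < tol_rss:       # analogue of Stopping Criterion 1
            return a, b, current, iteration, "convergence of residual sum of squares"
        if rel_par < tol_param:     # analogue of Stopping Criterion 2
            return a, b, current, iteration, "convergence of parameter estimates"
    return a, b, current, max_iter, "maximum iterations reached"

# Synthetic, noise-free data from hypothetical parameters (illustration only).
weeks = [float(w) for w in range(8, 44, 2)]
chlorine = [model(w, 0.39, 0.15) for w in weeks]

a_hat, b_hat, final_rss, iterations, reason = fit_marquardt(weeks, chlorine, 0.1, 0.1)
print(a_hat, b_hat, iterations, reason)
```

Starting from a = 0.1 and b = 0.1, the same initial estimates used in the Analysis Summary, the sketch recovers the generating parameters of the synthetic data. On real, noisy data the residual sum of squares does not fall to zero, and the first stopping criterion is the one more likely to end the search.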

Reports

The Reports pane creates predictions using the fitted model. By default, the table includes a line for each row in the datasheet that has complete information on the X variables and a missing value for the Y variable. This allows you to add rows to the bottom of the datasheet corresponding to levels at which you want predictions without affecting the fitted model. For example, suppose a prediction is desired at Weeks = 50 (admittedly an extrapolation of the model). In row #45 of the datasheet, the value 50 would be added to the Weeks column but the Chlorine column would be left blank. The resulting table is shown below:

Regression Results for chlorine
Columns: Row, Fitted Value, Stnd. Error for Forecast, Lower 95.0% CL for Forecast, Upper 95.0% CL for Forecast, Lower 95.0% CL for Mean, Upper 95.0% CL for Mean

Included in the table are:

Row - the row number in the datasheet containing the values of the independent variables.

Fitted Value - the predicted value of the dependent variable using the fitted model.

Standard Error for Forecast - the estimated standard error for predicting a single new observation.

Confidence Limits for Forecast - prediction limits for new observations.

Confidence Limits for Mean - confidence limits for the mean value of Y at the settings of the independent variables.

For row #45, the Fitted Value column gives the predicted chlorine level at Weeks = 50. A new sample at Weeks = 50 would be expected to fall between the lower and upper limits for the forecast with 95% confidence (provided the extrapolation held). The mean chlorine level at 50 weeks is estimated to lie somewhere between the lower and upper limits for the mean.

Using Pane Options, additional information about the predicted values and residuals for the data used to fit the model can also be included in the table.

Pane Options

You may include:

Observed Y - the observed values of the dependent variable.

Fitted Y - the predicted values from the fitted model.

Residuals - the ordinary residuals (observed minus predicted).

Studentized Residuals - the Studentized deleted residuals, as described under Residual Plots.

Standard Errors for Forecasts - the standard errors for new observations at values of the independent variables corresponding to each row of the datasheet.

Confidence Limits for Individual Forecasts - confidence intervals for new observations.

Confidence Limits for Forecast Means - confidence intervals for the mean value of Y at values of the independent variables corresponding to each row of the datasheet.

Correlation Matrix

The Correlation Matrix displays estimates of the correlation between the estimated coefficients.

[Table: Asymptotic correlation matrix for coefficient estimates; rows and columns a, b]

This table can be helpful in determining how well the effects of different independent variables have been separated from each other.
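A correlation matrix of this kind is obtained by scaling each covariance by the two corresponding standard errors. The sketch below shows that conversion; the covariance values in `cov_demo` are hypothetical and are not results from the chlorine fit.

```python
import math

def correlation_from_covariance(cov):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

# Hypothetical covariance matrix for the estimates of a and b (illustration only).
cov_demo = [[4.0e-6, -3.0e-5],
            [-3.0e-5, 9.0e-4]]

corr_demo = correlation_from_covariance(cov_demo)
print(corr_demo)
```

Off-diagonal values near +1 or -1 would suggest that the data cannot separate the two parameters well, since a change in one estimate can be largely offset by a change in the other.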

Observed versus Predicted

The Observed versus Predicted plot shows the observed values of Y on the vertical axis and the predicted values $\hat{Y}$ on the horizontal axis.

[Figure: Plot of chlorine; observed versus predicted]

If the model fits well, the points should be randomly scattered around the diagonal line. It is sometimes possible to see curvature in this plot, which would indicate the need for a curvilinear model rather than a linear model. Any change in variability from low values of Y to high values of Y might also indicate the need to transform the dependent variable before fitting a model to the data.

Residual Plots

As with all statistical models, it is good practice to examine the residuals. In a regression, the residuals are defined by

$$e_i = y_i - \hat{y}_i \qquad (5)$$

i.e., the residuals are the differences between the observed data values and the fitted model.

The Nonlinear Regression procedure creates various types of residual plots, depending on Pane Options.

Scatterplot versus X

This plot is helpful in visualizing any need for a different model.

[Figure: Residual Plot; Studentized residual versus predicted chlorine]

Normal Probability Plot

This plot can be used to determine whether or not the deviations around the line follow a normal distribution, which is the assumption used to form the prediction intervals.

[Figure: Normal Probability Plot for chlorine; percentage versus Studentized residual]

If the deviations follow a normal distribution, they should fall approximately along a straight line. In the above plot, the data deviate quite a bit from the straight line, indicating that the deviations follow a distribution with longer tails than that of a normal distribution.

Residual Autocorrelations

This plot shows the autocorrelation between residuals as a function of the number of rows between them in the datasheet.

[Figure: Residual Autocorrelations for chlorine; autocorrelation versus lag]

It is only relevant if the data have been collected sequentially. Any bars extending beyond the probability limits would indicate significant dependence between residuals separated by the indicated lag, which would violate the assumption of independence made when fitting the regression model.

Pane Options

Plot: the type of residuals to plot:

1. Residuals - the residuals from the least squares fit.

2. Studentized residuals - the difference between the observed values $y_i$ and the predicted values $\hat{y}_i$ when the model is fit using all observations except the i-th, divided by the estimated standard error. These residuals are sometimes called externally deleted residuals, since they measure how far each value is from the fitted model when that model is fit using all of the data except the point being considered. This is important, since a large outlier might otherwise affect the model so much that it would not appear to be unusually far away from the line.

Type: the type of plot to be created. A Scatterplot is used to test for curvature. A Normal Probability Plot is used to determine whether the model residuals come from a normal distribution. An Autocorrelation Function is used to test for dependence between consecutive residuals.

Plot Versus: for a Scatterplot, the quantity to plot on the horizontal axis.

Number of Lags: for an Autocorrelation Function, the maximum number of lags. For small data sets, the number of lags plotted may be less than this value.

Confidence Level: for an Autocorrelation Function, the level used to create the probability limits.

Unusual Residuals

Once the model has been fit, it is useful to study the residuals to determine whether any outliers exist that should be removed from the data. The Unusual Residuals pane lists all observations that have Studentized residuals of 2.0 or greater in absolute value.

Unusual Residuals for chlorine
Columns: Row, Y, Predicted Y, Residual, Studentized Residual

Studentized residuals greater than 3 in absolute value correspond to points more than 3 standard deviations from the fitted model, which is a rare event for a normal distribution. Row #17 is more than 3.5 standard deviations from the fitted model, which is a very rare event if the deviations follow a normal distribution.

Note: Points can be removed from the fit while examining the Plot of the Fitted Model by clicking on a point and then pressing the Exclude/Include button on the analysis toolbar. Excluded points are marked with an X.

Influential Points

In fitting a regression model, not all observations have an equal influence on the parameter estimates in the fitted model. In a simple regression, points located at very low or very high values of X have greater influence than those located nearer to the mean of X. The Influential Points pane displays any observations that have high influence on the fitted model:

Influential Points for chlorine
Columns: Row, Leverage, Mahalanobis Distance, DFITS, Cook's Distance
Average leverage of single data point =

Points are placed on this list for one of the following reasons:

Leverage - measures how distant an observation is from the mean of all n observations in the space of the independent variables. The higher the leverage, the greater the impact of the point on the fitted values $\hat{y}$. Points are placed on the list if their leverage is more than 3 times that of an average data point.

Mahalanobis Distance - measures the distance of a point from the center of the collection of points in the multivariate space of the independent variables. Since this distance is related to leverage, it is not used to select points for the table.

DFITS - measures the difference between the predicted values $\hat{y}_i$ when the model is fit with and without the i-th data point. Points are placed on the list if the absolute value of DFITS exceeds $2\sqrt{p/n}$, where p is the number of coefficients in the fitted model.
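The two selection rules can be written as a short filter. This sketch assumes the leverages and DFITS values have already been computed, and it takes the average leverage to be p/n (the usual result when leverages sum to the number of coefficients). The function name and the demo values are hypothetical.

```python
import math

def influential_rows(leverages, dfits, p):
    """Return (row, reason) pairs for rows exceeding either cutoff (rows are 1-based)."""
    n = len(leverages)
    average_leverage = p / n                 # assumed: leverages sum to p
    dfits_cutoff = 2.0 * math.sqrt(p / n)
    flagged = []
    for row, (h, d) in enumerate(zip(leverages, dfits), start=1):
        if h > 3.0 * average_leverage:
            flagged.append((row, "leverage"))
        if abs(d) > dfits_cutoff:
            flagged.append((row, "DFITS"))
    return flagged

# Hypothetical diagnostics for n = 8 rows and p = 2 coefficients (illustration only).
leverages_demo = [0.10, 0.20, 0.80, 0.15, 0.25, 0.20, 0.15, 0.15]
dfits_demo = [0.2, -0.3, 0.5, -1.4, 0.1, 0.0, 0.3, -0.2]

flagged_demo = influential_rows(leverages_demo, dfits_demo, p=2)
print(flagged_demo)
```

In the demo the average leverage is 2/8 = 0.25, so the leverage cutoff is 0.75 and the DFITS cutoff is 2·sqrt(0.25) = 1.0. For the chlorine example, with p = 2 and n = 44, the same rules would use an average leverage of 2/44.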

Save Results

The following results may be saved to the datasheet:

1. Predicted Values - the predicted value of Y corresponding to each of the n observations.
2. Standard Errors of Predictions - the standard errors for the n predicted values.
3. Lower Limits for Predictions - the lower prediction limits for each predicted value.
4. Upper Limits for Predictions - the upper prediction limits for each predicted value.
5. Standard Errors of Means - the standard errors for the mean value of Y at each of the n values of X.
6. Lower Limits for Forecast Means - the lower confidence limits for the mean value of Y at each of the n values of X.
7. Upper Limits for Forecast Means - the upper confidence limits for the mean value of Y at each of the n values of X.
8. Residuals - the n residuals.
9. Studentized Residuals - the n Studentized residuals.
10. Leverages - the leverage values corresponding to the n values of X.
11. DFITS Statistics - the value of the DFITS statistic corresponding to the n values of X.
12. Mahalanobis Distances - the Mahalanobis distance corresponding to the n values of X.
13. Coefficients - the estimated model coefficients.
14. Function - a text string containing the STATGRAPHICS expression for the function that was fit.

Calculations

Parameter estimates are found by numerically minimizing the residual sum of squares. The variance-covariance matrix of the coefficients is estimated from the partial derivatives in the neighborhood of the least squares solution.
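One common way to carry out such a calculation is the approximation Cov ≈ s²(JᵀJ)⁻¹, where J holds the partial derivatives of the model with respect to each parameter and s² is the residual mean square. The sketch below applies it to the two-parameter chlorine model using central finite differences. That the program uses exactly this formula is an assumption, the data are synthetic, and the hypothetical generating values (0.39, 0.10) stand in for the least squares estimates.

```python
import math

def model(weeks, a, b):
    return a + (0.49 - a) * math.exp(-b * (weeks - 8.0))

def asymptotic_covariance(weeks, y, a, b, step=1e-6):
    """Approximate s^2 * (J'J)^-1 for the two parameters, with J from finite differences."""
    n, p = len(weeks), 2
    s_aa = s_ab = s_bb = rss = 0.0
    for x, yi in zip(weeks, y):
        # Central-difference partial derivatives near the supplied estimates.
        ja = (model(x, a + step, b) - model(x, a - step, b)) / (2.0 * step)
        jb = (model(x, a, b + step) - model(x, a, b - step)) / (2.0 * step)
        s_aa += ja * ja
        s_ab += ja * jb
        s_bb += jb * jb
        rss += (yi - model(x, a, b)) ** 2
    s2 = rss / (n - p)                       # residual mean square
    det = s_aa * s_bb - s_ab * s_ab
    # Closed-form inverse of the 2x2 matrix J'J, scaled by s^2.
    return [[s2 * s_bb / det, -s2 * s_ab / det],
            [-s2 * s_ab / det, s2 * s_aa / det]]

# Synthetic data: hypothetical curve plus a small alternating disturbance.
weeks = [float(w) for w in range(8, 44, 2)]
chlorine = [model(w, 0.39, 0.10) + 0.005 * (-1) ** i for i, w in enumerate(weeks)]

cov = asymptotic_covariance(weeks, chlorine, 0.39, 0.10)
std_errors = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
print(cov, std_errors)
```

The square roots of the diagonal elements play the role of the asymptotic standard errors, and the off-diagonal element, once scaled by them, gives the asymptotic correlation between the two estimates.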

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure. STATGRAPHICS Rev. 9/13/213 Calibration Models Summary... 1 Data Input... 3 Analysis Summary... 5 Analysis Options... 7 Plot of Fitted Model... 9 Predicted Values... 1 Confidence Intervals... 11 Observed

More information

Ridge Regression. Summary. Sample StatFolio: ridge reg.sgp. STATGRAPHICS Rev. 10/1/2014

Ridge Regression. Summary. Sample StatFolio: ridge reg.sgp. STATGRAPHICS Rev. 10/1/2014 Ridge Regression Summary... 1 Data Input... 4 Analysis Summary... 5 Analysis Options... 6 Ridge Trace... 7 Regression Coefficients... 8 Standardized Regression Coefficients... 9 Observed versus Predicted...

More information

Polynomial Regression

Polynomial Regression Polynomial Regression Summary... 1 Analysis Summary... 3 Plot of Fitted Model... 4 Analysis Options... 6 Conditional Sums of Squares... 7 Lack-of-Fit Test... 7 Observed versus Predicted... 8 Residual Plots...

More information

Box-Cox Transformations

Box-Cox Transformations Box-Cox Transformations Revised: 10/10/2017 Summary... 1 Data Input... 3 Analysis Summary... 3 Analysis Options... 5 Plot of Fitted Model... 6 MSE Comparison Plot... 8 MSE Comparison Table... 9 Skewness

More information

How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion

How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion by Dr. Neil W. Polhemus July 28, 2005 Introduction When fitting statistical models, it is usually assumed that the error variance is the

More information

Multivariate T-Squared Control Chart

Multivariate T-Squared Control Chart Multivariate T-Squared Control Chart Summary... 1 Data Input... 3 Analysis Summary... 4 Analysis Options... 5 T-Squared Chart... 6 Multivariate Control Chart Report... 7 Generalized Variance Chart... 8

More information

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of

More information

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials. One-Way ANOVA Summary The One-Way ANOVA procedure is designed to construct a statistical model describing the impact of a single categorical factor X on a dependent variable Y. Tests are run to determine

More information

DOE Wizard Screening Designs

DOE Wizard Screening Designs DOE Wizard Screening Designs Revised: 10/10/2017 Summary... 1 Example... 2 Design Creation... 3 Design Properties... 13 Saving the Design File... 16 Analyzing the Results... 17 Statistical Model... 18

More information

Automatic Forecasting

Automatic Forecasting Automatic Forecasting Summary The Automatic Forecasting procedure is designed to forecast future values of time series data. A time series consists of a set of sequential numeric data taken at equally

More information

Item Reliability Analysis

Item Reliability Analysis Item Reliability Analysis Revised: 10/11/2017 Summary... 1 Data Input... 4 Analysis Options... 5 Tables and Graphs... 5 Analysis Summary... 6 Matrix Plot... 8 Alpha Plot... 10 Correlation Matrix... 11

More information

Factor Analysis. Summary. Sample StatFolio: factor analysis.sgp

Factor Analysis. Summary. Sample StatFolio: factor analysis.sgp Factor Analysis Summary... 1 Data Input... 3 Statistical Model... 4 Analysis Summary... 5 Analysis Options... 7 Scree Plot... 9 Extraction Statistics... 10 Rotation Statistics... 11 D and 3D Scatterplots...

More information

Arrhenius Plot. Sample StatFolio: arrhenius.sgp

Arrhenius Plot. Sample StatFolio: arrhenius.sgp Summary The procedure is designed to plot data from an accelerated life test in which failure times have been recorded and percentiles estimated at a number of different temperatures. The percentiles P

More information

Distribution Fitting (Censored Data)

Distribution Fitting (Censored Data) Distribution Fitting (Censored Data) Summary... 1 Data Input... 2 Analysis Summary... 3 Analysis Options... 4 Goodness-of-Fit Tests... 6 Frequency Histogram... 8 Comparison of Alternative Distributions...

More information

Correspondence Analysis

Correspondence Analysis STATGRAPHICS Rev. 7/6/009 Correspondence Analysis The Correspondence Analysis procedure creates a map of the rows and columns in a two-way contingency table for the purpose of providing insights into the

More information

Principal Components. Summary. Sample StatFolio: pca.sgp

Principal Components. Summary. Sample StatFolio: pca.sgp Principal Components Summary... 1 Statistical Model... 4 Analysis Summary... 5 Analysis Options... 7 Scree Plot... 8 Component Weights... 9 D and 3D Component Plots... 10 Data Table... 11 D and 3D Component

More information

Multiple Variable Analysis

Multiple Variable Analysis Multiple Variable Analysis Revised: 10/11/2017 Summary... 1 Data Input... 3 Analysis Summary... 3 Analysis Options... 4 Scatterplot Matrix... 4 Summary Statistics... 6 Confidence Intervals... 7 Correlations...

More information

Ratio of Polynomials Fit One Variable

Ratio of Polynomials Fit One Variable Chapter 375 Ratio of Polynomials Fit One Variable Introduction This program fits a model that is the ratio of two polynomials of up to fifth order. Examples of this type of model are: and Y = A0 + A1 X

More information

Seasonal Adjustment using X-13ARIMA-SEATS

Seasonal Adjustment using X-13ARIMA-SEATS Seasonal Adjustment using X-13ARIMA-SEATS Revised: 10/9/2017 Summary... 1 Data Input... 3 Limitations... 4 Analysis Options... 5 Tables and Graphs... 6 Analysis Summary... 7 Data Table... 9 Trend-Cycle

More information

How To: Analyze a Split-Plot Design Using STATGRAPHICS Centurion

How To: Analyze a Split-Plot Design Using STATGRAPHICS Centurion How To: Analyze a SplitPlot Design Using STATGRAPHICS Centurion by Dr. Neil W. Polhemus August 13, 2005 Introduction When performing an experiment involving several factors, it is best to randomize the

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Probability Plots. Summary. Sample StatFolio: probplots.sgp

Probability Plots. Summary. Sample StatFolio: probplots.sgp STATGRAPHICS Rev. 9/6/3 Probability Plots Summary... Data Input... 2 Analysis Summary... 2 Analysis Options... 3 Uniform Plot... 3 Normal Plot... 4 Lognormal Plot... 4 Weibull Plot... Extreme Value Plot...

More information

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 32 multiple choice

More information

Ratio of Polynomials Fit Many Variables

Ratio of Polynomials Fit Many Variables Chapter 376 Ratio of Polynomials Fit Many Variables Introduction This program fits a model that is the ratio of two polynomials of up to fifth order. Instead of a single independent variable, these polynomials

More information

Fractional Polynomial Regression

Fractional Polynomial Regression Chapter 382 Fractional Polynomial Regression Introduction This program fits fractional polynomial models in situations in which there is one dependent (Y) variable and one independent (X) variable. It

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice

The Model Building Process Part I: Checking Model Assumptions Best Practice The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test

More information

Regression Diagnostics Procedures

Regression Diagnostics Procedures Regression Diagnostics Procedures ASSUMPTIONS UNDERLYING REGRESSION/CORRELATION NORMALITY OF VARIANCE IN Y FOR EACH VALUE OF X For any fixed value of the independent variable X, the distribution of the

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1) Authored by: Sarah Burke, PhD Version 1: 31 July 2017 Version 1.1: 24 October 2017 The goal of the STAT T&E COE

More information

Circle a single answer for each multiple choice question. Your choice should be made clearly.

Circle a single answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 4, 215 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 31 questions. Circle

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

Canonical Correlations

Canonical Correlations Canonical Correlations Summary The Canonical Correlations procedure is designed to help identify associations between two sets of variables. It does so by finding linear combinations of the variables in

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

STATISTICS 110/201 PRACTICE FINAL EXAM

STATISTICS 110/201 PRACTICE FINAL EXAM STATISTICS 110/201 PRACTICE FINAL EXAM Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS The data used in this example describe teacher and student behavior in 8 classrooms. The variables are: Y percentage of interventions

Analysis of Covariance (ANCOVA) with Two Groups

Analysis of Covariance (ANCOVA) with Two Groups Chapter 226 Analysis of Covariance (ANCOVA) with Two Groups Introduction This procedure performs analysis of covariance (ANCOVA) for a grouping variable with 2 groups and one covariate variable. This procedure

NCSS Statistical Software. Harmonic Regression. This section provides the technical details of the model that is fit by this procedure.

NCSS Statistical Software. Harmonic Regression. This section provides the technical details of the model that is fit by this procedure. Chapter 460 Introduction This program calculates the harmonic regression of a time series. That is, it fits designated harmonics (sinusoidal terms of different wavelengths) using our nonlinear regression

8. Example: Predicting University of New Mexico Enrollment

8. Example: Predicting University of New Mexico Enrollment 8. Example: Predicting University of New Mexico Enrollment year (1=1961) 6 7 8 9 10 6000 10000 14000 0 5 10 15 20 25 30 6 7 8 9 10 unem (unemployment rate) hgrad (highschool graduates) 10000 14000 18000

An area chart emphasizes the trend of each value over time. An area chart also shows the relationship of parts to a whole.

An area chart emphasizes the trend of each value over time. An area chart also shows the relationship of parts to a whole. Excel 2003 Creating a Chart Introduction Page 1 By the end of this lesson, learners should be able to: Identify the parts of a chart Identify different types of charts Create an Embedded Chart Create a

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/28/12. Multiple Regression (3) (Chapter 15, Hines) Dr. Maddah ENMG 617 EM Statistics 11/28/12 Multiple Regression (3) (Chapter 15, Hines) Problems in multiple regression: Multicollinearity This arises when the independent variables x 1, x 2,, x k, are

Multiple Regression Examples

Multiple Regression Examples Multiple Regression Examples Example: Tree data. we have seen that a simple linear regression of usable volume on diameter at chest height is not suitable, but that a quadratic model y = β 0 + β 1 x +

Assumptions in Regression Modeling

Assumptions in Regression Modeling Fall Semester, 2001 Statistics 621 Lecture 2 Robert Stine 1 Assumptions in Regression Modeling Preliminaries Preparing for class Read the casebook prior to class Pace in class is too fast to absorb without

Bivariate data analysis

Bivariate data analysis Bivariate data analysis Categorical data - creating data set Upload the following data set to R Commander sex female male male male male female female male female female eye black black blue green green

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006 Chapter 17 Simple Linear Regression and Correlation 17.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

Design of Experiments

Design of Experiments Design of Experiments Dr. Shashank Shekhar, MSE, IIT Kanpur, Feb 19th 2016, TEQIP (IIT Kanpur) Data Analysis 2 Draw Conclusions Ask a Question Analyze data What to

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

Non-linear least squares

Non-linear least squares Non-linear least squares Concept of non-linear least squares We have extensively studied linear least squares or linear regression. We see that there is a unique regression line that can be determined

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author...

From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. About This Book... xiii About The Author... From Practical Data Analysis with JMP, Second Edition. Full book available for purchase here. Contents About This Book... xiii About The Author... xxiii Chapter 1 Getting Started: Data Analysis with JMP...

10 Model Checking and Regression Diagnostics

10 Model Checking and Regression Diagnostics 10 Model Checking and Regression Diagnostics The simple linear regression model is usually written as $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ where the $\varepsilon_i$ are independent normal random variables with mean 0 and variance

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

Detecting and Assessing Data Outliers and Leverage Points

Detecting and Assessing Data Outliers and Leverage Points Chapter 9 Detecting and Assessing Data Outliers and Leverage Points Section 9.1 Background Background Because OLS estimators arise due to the minimization of the sum of squared errors, large residuals

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

A Second Course in Statistics: Regression Analysis

A Second Course in Statistics: Regression Analysis FIFTH EDITION A Second Course in Statistics: Regression Analysis WILLIAM MENDENHALL University of Florida TERRY SINCICH University of South Florida PRENTICE HALL Upper Saddle River, New Jersey 07458

Appendix A Summary of Tasks. Appendix Table of Contents

Appendix A Summary of Tasks. Appendix Table of Contents Appendix A Summary of Tasks Appendix Table of Contents Reporting Tasks...357 ListData...357 Tables...358 Graphical Tasks...358 BarChart...358 PieChart...359 Histogram...359 BoxPlot...360 Probability Plot...360

Electromagnetic Forces on Parallel Current-

Electromagnetic Forces on Parallel Current- Page 1 of 5 Tutorial Models : Electromagnetic Forces on Parallel Current-Carrying Wires Electromagnetic Forces on Parallel Current- Carrying Wires Introduction One ampere is defined as the constant current

appstats8.notebook October 11, 2016

appstats8.notebook October 11, 2016 Chapter 8 Linear Regression Objective: Students will construct and analyze a linear model for a given set of data. Fat Versus Protein: An Example pg 168 The following is a scatterplot of total fat versus

Multiphysics Modeling

Multiphysics Modeling 11 Multiphysics Modeling This chapter covers the use of FEMLAB for multiphysics modeling and coupled-field analyses. It first describes the various ways of building multiphysics models. Then a step-by-step

Image Analysis Technique for Evaluation of Air Permeability of a Given Fabric

Image Analysis Technique for Evaluation of Air Permeability of a Given Fabric International Journal of Engineering Research and Development ISSN: 2278-067X, Volume 1, Issue 10 (June 2012), PP.16-22 www.ijerd.com Image Analysis Technique for Evaluation of Air Permeability of a Given

STAT 4385 Topic 06: Model Diagnostics

STAT 4385 Topic 06: Model Diagnostics STAT 4385 Topic 06: Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 1/ 40 Outline Several Types of Residuals Raw, Standardized, Studentized

Independent Samples ANOVA

Independent Samples ANOVA Independent Samples ANOVA In this example students were randomly assigned to one of three mnemonics (techniques for improving memory) rehearsal (the control group; simply repeat the words), visual imagery

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES 4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES FOR SINGLE FACTOR BETWEEN-S DESIGNS Planned or A Priori Comparisons We previously showed various ways to test all possible pairwise comparisons for

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

Introduction to Linear regression analysis. Part 2. Model comparisons

Introduction to Linear regression analysis. Part 2. Model comparisons Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual

Statistics II Exercises Chapter 5

Statistics II Exercises Chapter 5 Statistics II Exercises Chapter 5 1. Consider the four datasets provided in the transparencies for Chapter 5 (section 5.1) (a) Check that all four datasets generate exactly the same LS linear regression

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

CHAPTER 5. Outlier Detection in Multivariate Data

CHAPTER 5. Outlier Detection in Multivariate Data CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships Objectives 2.3 Least-squares regression Regression lines Prediction and Extrapolation Correlation and r 2 Transforming relationships Adapted from authors slides 2012 W.H. Freeman and Company Straight Line

Using Microsoft Excel

Using Microsoft Excel Using Microsoft Excel Objective: Students will gain familiarity with using Excel to record data, display data properly, use built-in formulae to do calculations, and plot and fit data with linear functions.

Computer simulation of radioactive decay

Computer simulation of radioactive decay Computer simulation of radioactive decay By now you should have worked your way through the introduction to Maple, as well as the introduction to data analysis using Excel. Now we will explore radioactive

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

THE CRYSTAL BALL FORECAST CHART

THE CRYSTAL BALL FORECAST CHART One-Minute Spotlight THE CRYSTAL BALL FORECAST CHART Once you have run a simulation with Oracle's Crystal Ball, you can view several charts to help you visualize, understand, and communicate the simulation

Regression. Marc H. Mehlman University of New Haven

Regression. Marc H. Mehlman University of New Haven Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

Determination of Density 1

Determination of Density 1 Introduction Determination of Density 1 Authors: B. D. Lamp, D. L. McCurdy, V. M. Pultz and J. M. McCormick* Last Update: February 1, 2013 Not so long ago a statistical data analysis of any data set larger

Simple Linear Regression

Simple Linear Regression Simple Linear Regression OI CHAPTER 7 Important Concepts Correlation (r or R) and Coefficient of determination (R 2 ) Interpreting y-intercept and slope coefficients Inference (hypothesis testing and confidence

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories. Chapter Goals To understand the methods for displaying and describing relationship among variables. Formulate Theories Interpret Results/Make Decisions Collect Data Summarize Results Chapter 7: Is There

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

Decision 411: Class 7

Decision 411: Class 7 Decision 411: Class 7 Confidence limits for sums of coefficients Use of the time index as a regressor The difficulty of predicting the future Confidence intervals for sums of coefficients Sometimes the

Lab 1 Uniform Motion - Graphing and Analyzing Motion

Lab 1 Uniform Motion - Graphing and Analyzing Motion Lab 1 Uniform Motion - Graphing and Analyzing Motion Objectives: < To observe the distance-time relation for motion at constant velocity. < To make a straight line fit to the distance-time data. < To interpret

Hotelling's One-Sample T2

Hotelling's One-Sample T2 Chapter 405 Hotelling's One-Sample T2 Introduction The one-sample Hotelling's T2 is the multivariate extension of the common one-sample or paired Student's t-test. In a one-sample t-test, the mean response

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let's now see what the same sort of model might

Relate Attributes and Counts

Relate Attributes and Counts Relate Attributes and Counts This procedure is designed to summarize data that classifies observations according to two categorical factors. The data may consist of either: 1. Two Attribute variables.

Chapter 2: Looking at Data Relationships (Part 3)

Chapter 2: Looking at Data Relationships (Part 3) Chapter 2: Looking at Data Relationships (Part 3) Dr. Nahid Sultana Chapter 2: Looking at Data Relationships 2.1: Scatterplots 2.2: Correlation 2.3: Least-Squares Regression 2.5: Data Analysis for Two-Way

Multivariate Capability Analysis Using Statgraphics. Presented by Dr. Neil W. Polhemus

Multivariate Capability Analysis Using Statgraphics. Presented by Dr. Neil W. Polhemus Multivariate Capability Analysis Using Statgraphics Presented by Dr. Neil W. Polhemus Multivariate Capability Analysis Used to demonstrate conformance of a process to requirements or specifications that

Genstat. Regression.

Genstat. Regression. Genstat Regression www.vsni.co.uk A Guide to Regression, Nonlinear and Generalized Linear Models in Genstat (18 th Edition) by Roger Payne. Genstat is developed by VSN International Ltd, in collaboration

Single Sample Means. SOCY601 Alan Neustadtl

Single Sample Means. SOCY601 Alan Neustadtl Single Sample Means SOCY601 Alan Neustadtl The Central Limit Theorem If we have a population measured by a variable with a mean µ and a standard deviation σ, and if all possible random samples of size

Regression Model Specification in R/Splus and Model Diagnostics. Daniel B. Carr

Regression Model Specification in R/Splus and Model Diagnostics. Daniel B. Carr Regression Model Specification in R/Splus and Model Diagnostics By Daniel B. Carr Note 1: See 10 for a summary of diagnostics 2: Books have been written on model diagnostics. These discuss diagnostics

Using SPSS for One Way Analysis of Variance

Using SPSS for One Way Analysis of Variance Using SPSS for One Way Analysis of Variance This tutorial will show you how to use SPSS version 12 to perform a one-way, between-subjects analysis of variance and related post-hoc tests. This tutorial
