Better Exponential Curve Fitting Using Excel

Similar documents
ABSTRACT EXPONENTIAL CURVE FIT

Six Sigma Black Belt Study Guides

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Module 2A Turning Multivariable Models into Interactive Animated Simulations

An area chart emphasizes the trend of each value over time. An area chart also shows the relationship of parts to a whole.

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

Chapte The McGraw-Hill Companies, Inc. All rights reserved.

Regression and Nonlinear Axes

LOOKING FOR RELATIONSHIPS

Describing the Relationship between Two Variables

Error Analysis, Statistics and Graphing Workshop

Contents. 9. Fractional and Quadratic Equations 2 Example Example Example

THE CRYSTAL BALL SCATTER CHART

Simple Linear Regression

Math 243 OpenStax Chapter 12 Scatterplots and Linear Regression OpenIntro Section and

Computer simulation of radioactive decay

How to Make or Plot a Graph or Chart in Excel

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal

18-Dec-12 PHYS Simple Pendulum. To investigate the fundamental physical properties of a simple pendulum.

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

1 Introduction to Minitab

Regression Using an Excel Spreadsheet Using Technology to Determine Regression

Using Microsoft Excel

5-Sep-15 PHYS101-2 GRAPHING

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression

Boyle s Law: A Multivariable Model and Interactive Animated Simulation

Lecture Prepared By: Mohammad Kamrul Arefin Lecturer, School of Business, North South University

Data Analysis. with Excel. An introduction for Physical scientists. LesKirkup university of Technology, Sydney CAMBRIDGE UNIVERSITY PRESS

M61 1 M61.1 PC COMPUTER ASSISTED DETERMINATION OF ANGULAR ACCELERATION USING TORQUE AND MOMENT OF INERTIA

Basic Business Statistics 6 th Edition

Taguchi Method and Robust Design: Tutorial and Guideline

Review of Section 1.1. Mathematical Models. Review of Section 1.1. Review of Section 1.1. Functions. Domain and range. Piecewise functions

Excel for Scientists and Engineers Numerical Method s. E. Joseph Billo

Polynomial Models Studio Excel 2007 for Windows Instructions

Ratio of Polynomials Search One Variable

Linear Regression 3.2

Lab #10 Atomic Radius Rubric o Missing 1 out of 4 o Missing 2 out of 4 o Missing 3 out of 4

Mnova Software for Analyzing Reaction Monitoring NMR Spectra

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

AP Statistics. Chapter 9 Re-Expressing data: Get it Straight

How to Run the Analysis: To run a principal components factor analysis, from the menus choose: Analyze Dimension Reduction Factor...

Chapter 12 - Part I: Correlation Analysis

SME 864 Mark Urban-Lurain

Index. Cambridge University Press Data Analysis for Physical Scientists: Featuring Excel Les Kirkup Index More information

COLLEGE ALGEBRA FINAL REVIEW 9) 4 = 7. 13) 3log(4x 4) + 8 = ) Write as the sum of difference of logarithms; express powers as factors.

Regression Models. Chapter 4

Box-Cox Transformations

ASSIGNMENT 3 SIMPLE LINEAR REGRESSION. Old Faithful

1 M62 M62.1 CONSERVATION OF ANGULAR MOMENTUM FOR AN INELASTIC COLLISION

Correlation and Regression

How many states. Record high temperature

Chapter 3: Examining Relationships

SOLVING PROBLEMS BASED ON WINQSB FORECASTING TECHNIQUES

Non-Linear Regression

CURVE FITTING LEAST SQUARE LINE. Consider the class of linear function of the form. = Ax+ B...(1)

Fault Tolerant Computing CS 530DL

Parametric Estimating Nonlinear Regression

Statistics for Managers using Microsoft Excel 6 th Edition

Experiment: Go-Kart Challenge

6.0 Graphs. Graphs illustrate a relationship between two quantities or two sets of experimental data.

4.1 Least Squares Prediction 4.2 Measuring Goodness-of-Fit. 4.3 Modeling Issues. 4.4 Log-Linear Models

EXPERIMENT 30A1: MEASUREMENTS. Learning Outcomes. Introduction. Experimental Value - True Value. 100 True Value

Chapter 1 Linear Equations and Graphs

TALLINN UNIVERSITY OF TECHNOLOGY, INSTITUTE OF PHYSICS 6. THE TEMPERATURE DEPENDANCE OF RESISTANCE

Remember that C is a constant and ë and n are variables. This equation now fits the template of a straight line:

you-try-it-01.xlsx Step-by-Step Guide ver. 3/1/2010

Notebook Tab 6 Pages 183 to ConteSolutions

Chapter 16. Simple Linear Regression and dcorrelation

Topics Covered in Math 115

Chapter 16. Simple Linear Regression and Correlation

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Chapter 14 Student Lecture Notes 14-1

Advanced Regression Topics: Violation of Assumptions

Economics 203: Intermediate Microeconomics. Calculus Review. A function f, is a rule assigning a value y for each value x.

Intermediate Algebra Chapter 12 Review

Experiment: Oscillations of a Mass on a Spring

Lab 1 Uniform Motion - Graphing and Analyzing Motion

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

SECTION 5.1: Polynomials

UNST 232 Mentor Section Assignment 5 Historical Climate Data

regression analysis is a type of inferential statistics which tells us whether relationships between two or more variables exist

One-sided and two-sided t-test

Chapter 10. Correlation and Regression. McGraw-Hill, Bluman, 7th ed., Chapter 10 1

Photometry of Supernovae with Makali i

Introduce Exploration! Before we go on, notice one more thing. We'll come back to the derivation if we have time.

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Business Statistics. Lecture 9: Simple Regression

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Physics Department. Experiment 03: Work and Energy

1 A Review of Correlation and Regression

Ratio of Polynomials Fit One Variable

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS

Regression Analysis. BUS 735: Business Decision Making and Research

Appendix 4. Some Equations for Curve Fitting

4.1 Introduction. 4.2 The Scatter Diagram. Chapter 4 Linear Correlation and Regression Analysis

Phenomenological Models

Estimating Dates: Scatterplots

Regression. Simple Linear Regression Multiple Linear Regression Polynomial Linear Regression Decision Tree Regression Random Forest Regression

Analysis of Bivariate Data

Transcription:

Better Exponential Curve Fitting Using Excel Mike Middleton DSI 2010 San Diego Michael R. Middleton, Ph.D. Decision Toolworks Mike@DecisionToolworks.com 415.310.7190

Background The exponential function, Y=c*EXP(b*x), is useful for fitting some non-linear single-bulge data patterns. In Excel, you can create an XY (Scatter) chart and add a best-fit trendline based on the exponential function. Problem: Regarding the fitted curve for Excel s Exponential Trendline, (1) the reported value for R Squared is incorrect, and (2) the fitted values do not minimize Sum of Squared Deviations. DSI 2010 San Diego www.decisiontoolworks.com 2

Cisco Revenue Example Data from example originally presented in Winston (2004) Model for growth of Cisco revenue during 1900-1999 Potentially useful for projecting revenues and determining company value For 1900-1999, Cisco revenue seems to grow by approximately the same percentage each year The exponential function, Y=c*EXP(b*X), has the property that for each unit increase in X the value of Y increases by a constant percentage DSI 2010 San Diego www.decisiontoolworks.com 3

Revenue, in Millions Cisco Data and XY Chart 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 A B C D E F G H I $ Millions Year Revenue Cisco Annual Revenue, 1990-1999 X Y $20,000 1 $70 2 $183 3 $340 $15,000 4 $649 5 $1,243 6 $1,979 $10,000 7 $4,096 8 $6,440 $5,000 9 $8,459 10 $12,154 $0 0 2 4 6 8 10 Year In Excel 2010, select data A4:B13. Insert XY Scatter chart. Use Chart Tools Layout to add chart title and axes titles. Right-click a data point to select the data series, and choose Add Trendline from the shortcut menu. DSI 2010 San Diego www.decisiontoolworks.com 4

Trendline Dialog Box DSI 2010 San Diego www.decisiontoolworks.com 5

Revenue, in Millions Excel Chart with Exponential Trendline 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 A B C D E F G H I $ Millions Year Revenue Cisco Annual Revenue, 1990-1999 X Y $20,000 1 $70 y = 58.553e 2 $183 R² = 0.9828 3 $340 $15,000 4 $649 5 $1,243 6 $1,979 $10,000 7 $4,096 8 $6,440 $5,000 9 $8,459 10 $12,154 $0 0 2 4 6 8 10 Year Next, compute the fitted values for Y, and use worksheet functions and formulas to compute the actual value of R Squared DSI 2010 San Diego www.decisiontoolworks.com 6

Actual R Squared for Exponential Trendline 1 2 3 4 5 6 7 8 9 10 11 12 13 A B C D E F G H $ Millions Exponential Trendline Year Revenue X Y Fitted Y 1 $70 $103 SS Total 156,733,316 Total SS 2 $183 $183 SS Regression 125,667,007 Explained SS 3 $340 $323 SS Residual 31,066,309 Unexplained SS, SSD 4 $649 $571 5 $1,243 $1,009 R Squared 0.802 Explained SS / Total SS 6 $1,979 $1,783 7 $4,096 $3,151 StDev(Residuals) $1,763 8 $6,440 $5,568 9 $8,459 $9,840 10 $12,154 $17,389 Excel s Trendline reports R Squared = 0.9828 Actual R Squared = 0.802 Approximately 80% of the variation in Y is explained by X using the fitted exponential function DSI 2010 San Diego www.decisiontoolworks.com 7

Shortcut Excel functions for R Squared calculations 1 2 3 4 5 6 7 8 9 10 11 12 13 A B C D E F G H $ Millions Exponential Trendline Year Revenue X Y Fitted Y 1 $70 $103 SS Total =COUNT(B4:B13)*VARP(B4:B13) Total SS 2 $183 $183 SS Regression =F4-F6 Explained SS 3 $340 $323 SS Residual =SUMXMY2(B4:B13,C4:C13) Unexplained SS, SSD 4 $649 $571 5 $1,243 $1,009 R Squared =F5/F4 Explained SS / Total SS 6 $1,979 $1,783 7 $4,096 $3,151 StDev(Residuals) =SQRT(F6/COUNT(B4:B13)) 8 $6,440 $5,568 9 $8,459 $9,840 10 $12,154 $17,389 Note that we cannot use Excel s worksheet functions RSQ or PEARSON^2 or CORREL^2 to compute R Squared because those functions are based on a linear fit between Y and X. DSI 2010 San Diego www.decisiontoolworks.com 8

Setup display for better fit using Excel s Solver 1 2 3 4 5 6 7 8 9 10 11 12 13 A B C D E F G H $ Millions Coeff c Coeff b Year Revenue 58.55266 0.569367 X Y c*exp(b*x) 1 $70 $103 SS Total 156,733,316 Total SS 2 $183 $183 SS Regression 125,666,623 Explained SS 3 $340 $323 SS Residual 31,066,693 Unexplained SS, SSD 4 $649 $571 5 $1,243 $1,009 R Squared 0.802 Explained SS / Total SS 6 $1,979 $1,783 7 $4,096 $3,151 StDev(Residuals) $1,763 8 $6,440 $5,568 9 $8,459 $9,840 10 $12,154 $17,389 Tentative values for coefficients in E2:F2 (Solver Changing Cells ) Formula for fitted value in C4 depends on coefficients and X, copied to C5:C13 Sum of Squared Deviations formula in F6 (Solver Objective ) to be minimized DSI 2010 San Diego www.decisiontoolworks.com 9

Setup formulas for better fit using Excel s Solver 1 2 3 4 5 6 7 8 9 10 11 12 13 A B C D E F G H $ Millions Coeff c Coeff b Year Revenue 58.55266 0.569367 X Y c*exp(b*x) 1 $70 =$E$2*EXP($F$2*A4) SS Total =COUNT(B4:B13)*VARP(B4:B13) Total SS 2 $183 =$E$2*EXP($F$2*A5) SS Regression =F4-F6 Explained SS 3 $340 =$E$2*EXP($F$2*A6) SS Residual =SUMXMY2(B4:B13,C4:C13) Unexplained SS, SSD 4 $649 =$E$2*EXP($F$2*A7) 5 $1,243 =$E$2*EXP($F$2*A8) R Squared =F5/F4 Explained SS / Total SS 6 $1,979 =$E$2*EXP($F$2*A9) 7 $4,096 =$E$2*EXP($F$2*A10) StDev(Residuals) =SQRT(F6/COUNT(B4:B13)) 8 $6,440 =$E$2*EXP($F$2*A11) 9 $8,459 =$E$2*EXP($F$2*A12) 10 $12,154 =$E$2*EXP($F$2*A13) Tentative values for coefficients in E2:F2 (Solver Changing Cells ) Formula for fitted value in C4 depends on coefficients and X (absolute references to E2:F2, relative reference to A4), copied to C5:C13 Sum of Squared Deviations formula in F6 (Solver Objective ) to be minimized DSI 2010 San Diego www.decisiontoolworks.com 10

Excel 2010 Solver Parameters Dialog Box DSI 2010 San Diego www.decisiontoolworks.com 11

Excel 2010 Solver Options Dialog Boxes DSI 2010 San Diego www.decisiontoolworks.com 12

Results for Exponential Fit using Solver 1 2 3 4 5 6 7 8 9 10 11 12 13 A B C D E F G H $ Millions Coeff c Coeff b Year Revenue 217.0084285 0.40542436 X Y c*exp(b*x) 1 $70 $325 SS Total 156,733,316 Total SS 2 $183 $488 SS Regression 154,746,736 Explained SS 3 $340 $732 SS Residual 1,986,580 Unexplained SS, SSD 4 $649 $1,098 5 $1,243 $1,648 R Squared 0.987 Explained SS / Total SS 6 $1,979 $2,471 7 $4,096 $3,707 StDev(Residuals) $446 8 $6,440 $5,560 9 $8,459 $8,339 10 $12,154 $12,509 Excel s Trendline reported R Squared = 0.9828, but its actual R Squared = 0.802 and StDev(Residuals) = $1,763 Solver s better fit has actual R Squared = 0.987 and StDev(Residuals) = $446 DSI 2010 San Diego www.decisiontoolworks.com 13

Revenue, in Millions Visual Comparison of Fits $18,000 Cisco Annual Revenue, 1990-1999 $16,000 Trendline $14,000 $12,000 $10,000 Solver $8,000 $6,000 $4,000 $2,000 $0 0 2 4 6 8 10 Year DSI 2010 San Diego www.decisiontoolworks.com 14

Comparison of Current/Previous Ratios $ Millions R^2=0.802, SD(Resid)=$1763 R^2=0.987, SD(Resid)=$446 Year Revenue Actual Trendline Exponential Solver Fit Exponential X Y Current/Previous Fitted Y Current/Previous Fitted Y Current/Previous 1 $70 $103 $325 2 $183 2.614 $183 1.767 $488 1.500 3 $340 1.858 $323 1.767 $732 1.500 4 $649 1.909 $571 1.767 $1,098 1.500 5 $1,243 1.915 $1,009 1.767 $1,648 1.500 6 $1,979 1.592 $1,783 1.767 $2,471 1.500 7 $4,096 2.070 $3,151 1.767 $3,707 1.500 8 $6,440 1.572 $5,568 1.767 $5,560 1.500 9 $8,459 1.314 $9,840 1.767 $8,339 1.500 10 $12,154 1.437 $17,389 1.767 $12,509 1.500 Average Ratio, 2 to 10 1.809 Average Ratio, 3 to 10 1.708 Average Ratio, 8 to 10 1.441 DSI 2010 San Diego www.decisiontoolworks.com 15

Excel s Method for Fitting Exponential Trendline, 1 of 2 The exponential model creates a trendline using the equation y = c * e bx. Excel uses a log transformation of the original y data to determine fitted values, so the values of the dependent variable in your data set must be positive. The exponential trendline feature does not find values of b and c that minimize the sum of squared deviations between actual y and predicted y (= c * e bx ). Instead, Excel's method takes the logarithm of both sides of the exponential formula, which then can be written as Ln(y) = Ln(c) + b * x and uses standard linear regression with Ln(y) as the dependent variable and x as the explanatory variable. That is, Excel finds the intercept and slope that minimize the sum of squared deviations between actual Ln(y) and predicted Ln(y), using the formula Ln(y) = Intercept + Slope * x. Therefore, the Intercept value corresponds to Ln(c), and c in the exponential formula is equal to Exp(Intercept). The Slope value corresponds to b in the exponential formula. - Middleton (1995) DSI 2010 San Diego www.decisiontoolworks.com 16

Ln(Y), Ln(Revenue) Ln(Y), Ln(Revenue) Excel s Method for Fitting Exponential Trendline, 2 of 2 X Y Ln(Y) 1 70 4.248495 2 183 5.209486 3 340 5.828946 4 649 6.475433 5 1243 7.125283 6 1979 7.590347 7 4096 8.317766 8 6440 8.770284 9 8459 9.042986 10 12154 9.405414 12 10 8 6 4 2 0 Plot of Ln(Y) vs X 0 5 10 X, Year 12 10 8 6 4 2 0 Linear Fit for Ln(Y) vs X Ln(y) = 0.5694x + 4.0699 R² = 0.9828 0 5 10 X, Year Y = c*exp(b*x) LN(Y) = LN(c) + b*x Fit: LN(Y) = 4.0699 + 0.5694*X b = 0.5694 c = EXP(LN(c)) c = EXP(4.0699) c = 58.55 DSI 2010 San Diego www.decisiontoolworks.com 17

General Steps for Curve Fitting Goal: explain variation in a variable of interest, Y Prepare a histogram for Y, the dependent (or response) variable Find data for explanatory variable(s) that make sense Look at the data: plot XY (Scatter) charts to see relationships Propose a functional form for the relationship, based on knowledge of the underlying process, visual examination of the plot, parsimony, etc. Determine values for the parameters of the function best fit, minimize sum of squared deviations answers the question: What is the relationship? Perform diagnostics, e.g., R Squared, StDev(Residuals), etc. answers the question: How good is the relationship? Use the function prediction for cross-sectional data, mostly interpolation forecasts for time-series data, mostly extrapolation DSI 2010 San Diego www.decisiontoolworks.com 18

Summary of Excel Trendline Options Exponential: Y=c*EXP(b*X), transforms data before fit, not the best fit, inaccurate R Squared Linear: Y=b 0 +b 1 *X, OK Logarithmic: Y = c*ln(x)+b, OK Polynomial: Y=b 0 +b 1 *X 1 +b 2 *X 2 +, OK Power: Y = c*x^b, transforms data before fit, not the best fit, inaccurate R Squared Moving Average: OK, but non-standard diagnostics DSI 2010 San Diego www.decisiontoolworks.com 19

Excel s Method for Fitting Power Trendline The power model creates a trendline using the equation y = c * x b. Excel uses a log transformation of the original x and y data to determine fitted values, so the values of both the dependent and explanatory variables in your data set must be positive. The power trendline feature does not find values of b and c that minimize the sum of squared deviations between actual y and predicted y (= c * x b ). Instead, Excel's method takes the logarithm of both sides of the power formula, which then can be written as Ln(y) = Ln(c) + b * Ln(x), and uses standard linear regression with Ln(y) as the dependent variable and Ln(x) as the explanatory variable. That is, Excel finds the intercept and slope that minimize the sum of squared deviations between actual Ln(y) and predicted Ln(y), using the formula Ln(y) = Intercept + Slope * Ln(x). Therefore, the Intercept value corresponds to Ln(c), and c in the power formula is equal to Exp(Intercept). The Slope value corresponds to b in the power formula. DSI 2010 San Diego www.decisiontoolworks.com 20

References Middleton, M.R. 1995. Data Analysis Using Microsoft Excel 5.0. Duxbury Press, Belmont, CA. Winston, W.L. 2004. Microsoft Excel Data Analysis and Business Modeling. Microsoft Press, Redmond, WA. DSI 2010 San Diego www.decisiontoolworks.com 21

Better Exponential Curve Fitting Using Excel Mike Middleton, DSI 2010 San Diego Michael R. Middleton, Ph.D. Decision Toolworks Mike@DecisionToolworks.com 415.310.7190 PowerPoint Slides, Slides PDF File, and Excel Workbook http://www.decisiontoolworks.com/expcurvefit2010.pptx http://www.decisiontoolworks.com/expcurvefit2010.pdf http://www.decisiontoolworks.com/expcurvefit2010.xlsx