Introduction to Regression


Introduction to Regression. Using Multiple Linear Regression: Derived variables. Many alternative models. Which model to choose? Model Criticism. Modelling Objective. Model Details. Data and Residuals. Assumptions.

Data Like This. Values of coefficients. Sampling Distributions. Standard Errors. 95% Confidence Intervals. 95% Prediction Intervals. ANOVA etc.

Derived variables: General. Logs. Proportions and Ratios. Too many (derived) variables: Redundancy; many versions of same model. Indicator variables: categorical data. Time series applications: Indicator variables (eg seasonal effects), Lagged variables, Differences, Logs and Rate of Return.

Gas Consumption vs Temp
Period 1 Fitted Line Plot: Gas = 6.854 - 0.3932 Temperature (S 0.2814, R-Sq 94.4%, R-Sq(adj) 94.1%)
Period 2 Fitted Line Plot: Gas = 4.724 - 0.2779 Temperature (S 0.354848, R-Sq 81.3%, R-Sq(adj) 80.6%)
[Comparative plot of the two fitted lines, Gas vs Temperature]
Weekly gas consumption (in 1000 cubic feet) and the average outside temperature (in degrees Celsius) at one house in south-east England for two "heating seasons", one of 26 weeks before, and one of 30 weeks after cavity-wall insulation was installed. The object of the exercise was to assess the effect of the insulation on gas consumption. The house thermostat was set at 20 C throughout.

Objective. Nominal focus on prediction: predict gas consumption in future for this house, knowing temp and whether or not insulated. Actual interest: does insulation make a difference? At all temps? How much? Slope? Intercept? SEs? Data Like This.

Using an Indicator variable
Two parallel data sets:
Insulated Week Temperature Gas    Insulated Week Temperature Gas
0 1 -0.8 7.2                      1 27 -0.7 4.8
0 2 -0.7 6.9                      1 28  0.8 4.6
0 3  0.4 6.4                      1 29  1.0 4.7
0 4  2.5 6.0                      1 30  1.4 4.0
etc                               etc
One stacked data set:
Week Insulation Temperature Gas
22 0  7.6 3.5
23 0  8.0 4.0
24 0  8.5 3.6
25 0  9.1 3.1
26 0 10.2 2.6
27 1 -0.7 4.8
28 1  0.8 4.6
29 1  1.0 4.7
etc

Simple Regression & Indicator Variable
Fitted Line Plot: Gas = 4.750 - 1.267 Insulated (S 0.987577, R-Sq 29.8%, R-Sq(adj) 28.5%)
Fitted Line Plot: Temperature = 5.350 - 0.8867 Insulated (S 2.7812, R-Sq 2.6%, R-Sq(adj) 0.8%)
Gas vs Insulated: Insulated = 0, Avg Gas = 4.750; Insulated = 1, Avg Gas = 3.483; Diff = -1.267
Temp vs Insulated: Coeff = change per unit increase. Random Error. Design Implications.

SLR with indicator var & T-test
Fitted Line Plot: Gas = 4.750 - 1.267 Insulated (S 0.987577, R-Sq 29.8%, R-Sq(adj) 28.5%)
Two-sample T for Gas
Insulated  N  Mean   StDev  SE Mean
0         26  4.750  1.16   0.23
1         30  3.483  0.806  0.15
Difference = μ(0) - μ(1); T-Value = 4.79; P-Value = 0.000; DF = 54; Using Pooled StDev = 0.9876
Regression Analysis: Gas versus Insulated
S 0.987577, R-sq 29.79%, R-sq(adj) 28.49%, R-sq(pred) 24.53%
Coefficients
Term       Coef    SE Coef  T-Value  P-Value
Constant   4.750   0.194    24.53    0.000
Insulated  -1.267  0.265    -4.79    0.000

Indicator Variables in Regression
Response variable Y = Gas. Predictors x1 = Temp, x2 = Insulated (0/1).
Statistical Model: Y = β0 + β1 x1 + β2 x2 + ε;  ε ~ N(0, σ²)
When x2 = 0:  Y = β0 + β1 x1
When x2 = 1:  Y = (β0 + β2) + β1 x1
Common slopes; difference between intercepts = β2; no interaction. Binary indicator variable.
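As a hedged illustration of the common-slope indicator model (the lecture uses Minitab; the sketch below is Python with synthetic, noise-free data and made-up coefficient values, so least squares recovers them exactly):

```python
import numpy as np

# Sketch of Y = b0 + b1*x1 + b2*x2 with a 0/1 indicator x2.
# Data are synthetic; coefficients 6.5, -0.34, -1.6 are chosen, not fitted
# from the lecture's gas data.
temp = np.linspace(-1.0, 10.0, 60)             # x1: outside temperature
insulated = (np.arange(60) % 2).astype(float)  # x2: 0/1 indicator
gas = 6.5 - 0.34 * temp - 1.6 * insulated      # generated without noise

X = np.column_stack([np.ones(60), temp, insulated])  # design matrix
b, *_ = np.linalg.lstsq(X, gas, rcond=None)
# b[0] is the intercept for x2 = 0; b[0] + b[2] the intercept for x2 = 1;
# b[1] is the common slope for both groups.
```

The single fitted slope `b[1]` applies to both groups; only the intercept shifts by `b[2]`, exactly as on the slide.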

Multiple Regression Output
Regression Analysis: Gas versus Temperature, Insulated
The regression equation is Gas = 6.55 - 0.337 Temperature - 1.57 Insulated
Predictor    Coef     SE Coef
Constant     6.551    0.1181
Temperature  -0.3367  0.0177
Insulated    -1.5652  0.0970
β̂2 = -1.565, SE(β̂2) = 0.097
Rough 95% CI: -1.57 ± 2(0.097) = (-1.76, -1.37); previously, mean diff -1.27 ± 2(0.274)
Parallel lines
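The slide's rough interval is one line of arithmetic; a minimal sketch (coefficient and SE taken from the regression output above):

```python
# Rough 95% CI rule from the slide: estimate ± 2 * SE, applied to the
# insulation coefficient (-1.5652, SE 0.097) from the multiple regression.
coef, se = -1.5652, 0.097
lo, hi = coef - 2 * se, coef + 2 * se
print(round(lo, 2), round(hi, 2))   # → -1.76 -1.37
```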

Implementation: Categorical Variable

Regression Output: Categorical Var
Regression Analysis: Gas versus Temperature, Insulated
Categorical predictor coding (1, 0)
Model Summary: S 0.357412, R-sq 90.97%
Coefficients
Term         Coef     SE Coef  T-Value  P-Value
Constant     6.551    0.118    55.48    0.000
Temperature  -0.3367  0.0178   -18.95   0.000
Insulated 1  -1.5652  0.0971   -16.13   0.000
Regression Equation
Insulated 0: Gas = 6.551 - 0.3367 Temperature
Insulated 1: Gas = 4.986 - 0.3367 Temperature

Aside: Omitted predictors. Hidden/Lurking variables. A subset of the data, used in the exam, uninformed by insulation status: slope positive; on avg, gas consumption increases with temp! Knowing insulation status: slopes negative; on avg, gas consumption decreases with temp.

Interaction? Refine the question: different slopes as well?

Indicator Variables in Regression
Response variable Y = Gas. Predictors x1 = Temp, x2 = Insulated (0/1), x3 = Temp × Insulated.
Combined statistical model: Y = β0 + β1 x1 + β2 x2 + β3 x3 + ε;  ε ~ N(0, σ²)
When x2 = 0:  Y = β0 + β1 x1
When x2 = 1:  Y = (β0 + β2) + (β1 + β3) x1
β2 = diff in intercepts; β3 = diff in slopes
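The interaction term is just another derived column. A hedged sketch (synthetic, noise-free data with chosen coefficients, not the lecture's fit):

```python
import numpy as np

# Sketch of the interaction model: the derived column x3 = x1 * x2 lets
# the indicator change the slope as well as the intercept.
# Coefficients 6.85, -0.39, -2.13, 0.12 are chosen for the demo.
temp = np.linspace(-1.0, 10.0, 40)         # x1
ins = (np.arange(40) % 2).astype(float)    # x2: 0/1 indicator
gas = 6.85 - 0.39 * temp - 2.13 * ins + 0.12 * temp * ins

X = np.column_stack([np.ones(40), temp, ins, temp * ins])
b0, b1, b2, b3 = np.linalg.lstsq(X, gas, rcond=None)[0]
# x2 = 0 line: intercept b0, slope b1
# x2 = 1 line: intercept b0 + b2, slope b1 + b3
```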

New Derived Variable

Modelling two regression lines
Regression Analysis: Gas versus Temperature, Insulated, Ins X Temp
Gas = 6.85 - 0.393 Temperature - 2.13 Insulated + 0.115 Ins X Temp
Predictor    Coef      SE Coef
Constant     6.8538    0.1360
Temperature  -0.39324  0.02249
Insulated    -2.1300   0.1801
Ins X Temp   0.11530   0.03211
S = 0.323  R-Sq = 92.8%  R-Sq(adj) = 92.4%
Which coeff is most fundamental to the theory of heat loss?

Alt Models of two regression lines
a) One model, with interaction; b) Two separate models. Nearly equivalent.
Two sep lin regs: Gas vs Temp. Exercise: compare coeff ests, 95% ints.
Response variable Y = Gas. Predictors x1 = Temp, x2 = Insulated (0/1).
Two Statistical Models:
x2 = 0:  Y = α_NoIns + β_NoIns x1 + ε;  ε ~ N(0, σ²_NoIns)
x2 = 1:  Y = α_Ins + β_Ins x1 + ε;  ε ~ N(0, σ²_Ins)

Multiple indicator variables. Will also meet: Redundancy; multiple formulations of same model.

Housing Completions, quarterly, 1978 to 2000
Quarter 1978 1979 1980 1981 1982 1983 1984 1985
Q1 5777 7276 58 6642 5981 4859 5129 4947
Q2 4772 4510 6001 4710 488 5862 4671 5188
Q3 4579 4278 5879 5570 554 466 4947 90
Q4 424 4274 68 614 4894 4564 195 60
Quarter 1986 1987 1988 1989 1990 1991 1992 1993
Q1 5186 4144 682 554 4296 4692 4155 684
Q2 719 6 298 985 4477 898 560 4487
Q3 45 491 747 5277 5011 4600 5919 5121
Q4 726 478 477 4484 4752 5282 505 6009
Quarter 1994 1995 1996 1997 1998 1999 2000
Q1 4291 5770 6582 744 8010 990 1002
Q2 5266 6149 720 8799 9506 10227 11590
Q3 6871 6806 764 9140 1010 10788 11892
Q4 7160 7879 871 10081 11474 12079 1287

[Figure: Time Series Plot of Completions. Housing Completions, quarterly, 1978 to 2000, marked by Quarter Q1-Q4]
Take objective: forecast one quarter ahead.

Aside: Cubic/Quadratic Regression. Fitted Line Plot options: Log, Quadratic, Cubic.
Fitted Line Plot: Comps = -1.44E+10 + 217840 time - 10988 time**2 + 1.848 time**3 (Regression, 95% PI; S 822.624, R-Sq 88.3%, R-Sq(adj) 87.9%)

Modelling Options
Focus on stable linear structure post 1993. Assume this structure will continue. Exploit structure: extension of Indicator Vars. Disadvantage: smaller data set.
One model for entire data set. Note: structure has changed; might change again. Exploit weaker structure: use Lagged variables. Advantage: use all data.

Comps, quarterly, 1993 to 2000 [Time Series Plot of Completions, by Quarter Q1-Q4]
Target is 2001 Q1. Use Q1 data only? OR use all 1993-2000 data? 4 parallel lines more efficient. Why/What sense? Option 1: work since 1993.

Completions, Q1 only
Fitted Line Plot: Completions = -1945191 + 977.8 year (S 316.477, R-Sq 98.5%, R-Sq(adj) 98.3%)
Other Qs: 4 separate lines. Later, use Time since 1978: changes intercept only.
Pred = -1945191 + 977.8 × 2001.00, ± 2(316.5) → (9795, 11061)

Linear in Time plus Quarterly Ind Vars
Create set of binary variables Q1, Q2, Q3, Q4.
Comps = β1 Q1 + β2 Q2 + β3 Q3 + β4 Q4 + γ Time + ε
Year.Quarter  time     Time since 1978  Comps  Q1 Q2 Q3 Q4
1993 Q1       1993.00  15.00            3684   1  0  0  0
1993 Q2       1993.25  15.25            4487   0  1  0  0
1993 Q3       1993.50  15.50            5089   0  0  1  0
1993 Q4       1993.75  15.75            6041   0  0  0  1
1994 Q1       1994.00  16.00            4291   1  0  0  0
1994 Q2       1994.25  16.25            5266   0  1  0  0
1994 Q3       1994.50  16.50            685    0  0  1  0
1994 Q4       1994.75  16.75            7196   0  0  0  1

Multiple Indicator Vars: Tech Issue
Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4
* Q4 is highly correlated with other X variables
* Q4 has been removed from the equation.
The regression equation is Comps = -9452 + 986 Time since 1978 - 1792 Q1 - 1139 Q2 - 758 Q3
Model: Y = β0 + β1 Q1 + β2 Q2 + β3 Q3 + β4 Q4 + γ t + ε
Interp of β0: t = 0 and all Qi = 0
Redundancy. Alternatives: No Constant (use indicator variables only, equiv); enter "Quarter" as categorical variable.
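The redundancy Minitab flags can be seen directly from the design matrix. A small sketch with a synthetic quarterly layout (not the completions data):

```python
import numpy as np

# The "dummy variable trap": with a constant AND all four quarterly
# indicators, Q1 + Q2 + Q3 + Q4 equals the constant column, so the
# design matrix is rank-deficient. Dropping one dummy, or dropping the
# constant, restores full rank.
quarters = np.tile(np.arange(4), 5)                    # 5 years of quarters
Q = (quarters[:, None] == np.arange(4)).astype(float)  # indicator columns
t = np.arange(20, dtype=float)                         # time index
const = np.ones(20)

X_full = np.column_stack([const, t, Q])          # 6 columns, rank only 5
X_drop = np.column_stack([const, t, Q[:, :3]])   # drop Q4: full rank 5
X_nocon = np.column_stack([t, Q])                # no constant: full rank 5
ranks = [np.linalg.matrix_rank(M) for M in (X_full, X_drop, X_nocon)]
print(ranks)   # → [5, 5, 5]
```

`X_full` has six columns but only rank five, which is exactly why the software removes Q4 or offers a no-constant option.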

Multiple Indicator Vars: Tech Issue
Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4
* Q4 is highly correlated with other X variables
* Q4 has been removed from the equation.
Comps = -9452 + 986 Time since 1978 - 1792 Q1 - 1139 Q2 - 758 Q3;  S = 297.82
OR (no-constant option)
Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4
Comps = 986 Time since 1978 - 11244 Q1 - 10592 Q2 - 10210 Q3 - 9452 Q4;  S = 297.82
Note -11244 = -9452 - 1792; -9452 = -9452 + 0; etc


Categorical Variable approach
Model Summary: S 297.82, R-sq 98.76%
Regression Equations:
Q1: Comps = -11244 + 986.5 t
Q2: Comps = -10592 + 986.5 t
Q3: Comps = -10210 + 986.5 t
Q4: Comps = -9452 + 986.5 t
Coefficients
Term             Coef    SE Coef
Constant         -11244  347
time since 1978  986.5   22.9
Quarter: Q2 652 (SE 149); Q3 1034 (SE 149); Q4 1792 (SE 150)
Consider Q2 - Q1 at t = 0

Derived variables and Transforms in Time Series: Lags, Differences, Rates of Return, Log scale.

All Comps, quarterly, 1978 to 2000 [Time Series Plot of Completions, by Quarter Q1-Q4]
Option 2: use all data, but diff model.

Auto-Regression for Time Series
Basic idea: next value like last value (Lag1)
Fitted Line Plot: Comps = 564.6 + 0.9171 Lag1Comp (S 1167.61, R-Sq 76.1%, R-Sq(adj) 75.8%)

Auto-Regression for Time Series
Basic idea: next value like last value (Lag1)
Auto Regression: Y(t) = β0 + β_lag1 Y(t-1) + ε(t);  or Y(t) = β0 + β_lag1 Y(t-1) + β_lag4 Y(t-4) + ε(t)
Year Quarter  Comps  Lag1Comp  Lag4Comp
1978 Q1       5777
1978 Q2       4772   5777
1978 Q3       4588   4772
1978 Q4       424    4588
1979 Q1       7276   424       5777
1979 Q2       451    7276      4772
1979 Q3       4284   451       4588
1979 Q4       4257   4284      424
1980 Q1       778    4257      7276
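The lag columns are built by shifting the series and dropping the rows whose lags are missing. A hedged sketch on a synthetic quarterly series (not the completions data), fit by least squares:

```python
import numpy as np

# Build Lag1 and Lag4 columns by shifting, drop the first 4 observations
# (their lags are missing), and fit Y_t = b0 + b1*Y_{t-1} + b4*Y_{t-4}.
# The series is simulated from a seasonal autoregression with small noise.
rng = np.random.default_rng(1)
y = np.full(120, 5000.0)
for t in range(4, 120):
    y[t] = 300 + 0.30 * y[t - 1] + 0.65 * y[t - 4] + rng.normal(0, 1)

resp = y[4:]     # Y_t for t = 4..119
lag1 = y[3:-1]   # Y_{t-1}, aligned with resp
lag4 = y[:-4]    # Y_{t-4}, aligned with resp
X = np.column_stack([np.ones(len(resp)), lag1, lag4])
b = np.linalg.lstsq(X, resp, rcond=None)[0]
forecast = b @ [1.0, y[-1], y[-4]]   # one-quarter-ahead prediction
```

The alignment (`y[3:-1]` and `y[:-4]` against `y[4:]`) is the whole trick; the forecast then just plugs the latest observed values into the fitted equation.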

Using two lagged variables
Regression Analysis: Comps versus Lag1Comp, Lag4Comp
The regression equation is Comps = -87 + 0.28 Lag1Comp + 0.782 Lag4Comp;  S = 780.7
Comp Q4 2000 = 12873, Comp Q1 2000 = 10023
95% Pred Int: Comp Q1 2001 = 11892 ± 2(780.7) = (10331, 13453)

Using Lagged Variables
Basic idea: current quarter like prev quarter and same Q last year.
[Matrix Plot of Completions, Lag1Comp, Lag4Comp]

Comparison: forecasting models for Comps. Linear in Time + quarter indicators vs Lag1 and Lag4.
Modelling Options:
1 Parallel Linear Regressions: Y(t) = β1 Q1 + β2 Q2 + β3 Q3 + β4 Q4 + γ t + ε(t)
2 Seasonal AutoRegression: Y(t) = β0 + β1 Y(t-1) + β4 Y(t-4) + ε(t)
More efficient for prediction? Fewer modelling assumptions. Different modelling strategy.
Year  t      Q   Comps  Lag1   Lag4   Q1 Q2 Q3  Lin in time + Q  Lag1 and Lag4
2000  22.00  Q1  1002   12079  990    1  0  0   10451            1140.17
2000  22.25  Q2  11590  1002   10227  0  1  0   1147.5           10989.57
2000  22.50  Q3  11892  11590  10788  0  0  1   11945            11850.74
2000  22.75  Q4  1287   11892  12079  0  0  0   12979.5          12959.5
2001  23.00  Q1  ?      1287   1002   1  0  0   1147             11891.51
2001  23.25  Q2  ?      ?      11590  0  1  0   12.5             ?
2001  23.50  Q3  ?      ?      11892  0  0  1   1291             ?
2001  23.75  Q4  ?      ?      1287   0  0  0   1965.5           ?
2002  24.00  Q1  ?      ?      ?      1  0  0   1242             ?

Model Criticism. Criticism: does it make sense? Are there outliers? Choice amongst alternatives: R², SE.

Extra: Logs, lags and differences. Financial data: IBM share price. Natural language: % change. MINITAB language: logs.

Financial Series - IBM Prices daily. Simple Reg on Time.

Log IBM Prices: Log(Y_t) vs t; Log(Y_t) vs log(Y_{t-1})
IBM Prices: Logprice = 1.64 + 0.000561 t (Regression, 95% PI; S 0.0394, R-Sq 94.4%, R-Sq(adj) 94.4%)
IBM Prices: Logprice = 0.002264 + 0.9990 lag1logprice (Regression, 95% PI; S 0.0080199, R-Sq 99.8%, R-Sq(adj) 99.8%)

Modeled in Log Scale, presented in original units: Log(Y_t) vs t; Log(Y_t) vs log(Y_{t-1})
IBM Prices: log10(price) = 1.64 + 0.000561 t (Regression, 95% PI; S 0.0394, R-Sq 94.4%, R-Sq(adj) 94.4%)
IBM Prices: log10(price) = 0.002264 + 0.9990 log10(lag1price) (Regression, 95% PI; S 0.0080199, R-Sq 99.8%, R-Sq(adj) 99.8%)

Differences / Ratios
First Differences: Today - Yesterday. Seasonal Diffs: This Q - same Q last year.
Ratio: Y(t) / Y(t-1)
Rate of Return: 100 × (Y(t) - Y(t-1)) / Y(t-1) = 100 × (Ratio - 1)
Log(Ratio) = Log(Y(t)) - Log(Y(t-1))
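These derived series are one-liners; a small sketch on a made-up price series:

```python
import numpy as np

# First differences, ratios, % rate of return, and log-ratios for a
# short made-up price series.
p = np.array([100.0, 102.0, 101.0, 104.0])
diff1 = p[1:] - p[:-1]                     # today - yesterday
ratio = p[1:] / p[:-1]                     # Y(t) / Y(t-1)
ror = 100 * (ratio - 1)                    # rate of return, in percent
logratio = np.log(p[1:]) - np.log(p[:-1])  # log Y(t) - log Y(t-1)
# for small changes, 100 * logratio is close to the % rate of return
```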

Financial Series - IBM Prices daily
Simple Regression of Daily Diffs vs Time: Lag1diff = 0.01424 + 0.000109 t (Regression, 95% PI; S 0.951260, R-Sq 0.1%, R-Sq(adj) 0.0%)

Financial Series - IBM Prices daily
Simple Regression of First Diffs of LogPrice vs Time: Lag1difflog = 0.000568 + 0.000000 t (Regression, 95% PI; S 0.0080216, R-Sq 0.0%, R-Sq(adj) 0.0%)

Financial Series - IBM Prices daily: Interpretation
Lag1difflog = 0.000568 + 0.000000 t (S 0.0080216, R-Sq 0.0%)
log P_t - log P_{t-1}: no time trend
log(P_t / P_{t-1}) ≈ 0.00057, or in (0.00057 - 0.016, 0.00057 + 0.016) = (-0.0154, 0.0166)
P_t / P_{t-1} ≈ 10^0.00057, or in (10^-0.0154, 10^0.0166) ≈ 1.001, or in (0.96, 1.04)
In summary: rate of return ≈ 0.1% per day, ± about 4%

Financial Series
Day to day changes most naturally expressed as % change: price tomorrow = price today × small change
Log(price t+1) = Log(price t) + Log(small change)
Average drift per day (for logs) is 0.00057, ie about 0.1% growth pd = 61% pa
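The slide's compounding arithmetic, as a sketch:

```python
# Average daily drift 0.00057 on the log10 scale: about 0.1% per day,
# compounding to roughly 61% growth over a 365-day year.
drift = 0.00057                      # per day, log10 scale
daily = 10 ** drift                  # daily growth factor
annual = 10 ** (365 * drift)         # compound over a year
print(round(daily, 4), round(annual, 2))   # → 1.0013 1.61
```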

Financial Series: Confidence in future prediction
(per day, log10 scale) pt est 0.0006, lo -0.0154, hi 0.0166;  10^ Factor: 1.0013, 0.9652, 1.0390
Eg initial capital 1000:
Day 1: 1001.3 (965, 1039)
Day 2: 1002.6 (932, 1079)
Day 3: 1003.9 (899, 1122)
Day 4: 1005.3 (868, 1165)
Day 5: 1006.6 (838, 1211)
...
Day 364: 1612.4 (0.0, infinity)
Day 365: 1614.5 (0.0, infinity)
61% per annum??
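The capital table is driven by compounding the daily point estimate and the daily 95% limits. A sketch of the first five days (the per-day limits -0.0154 and 0.0166 are taken from the interpretation slide; small rounding differences from the table are expected):

```python
# Compound the daily point estimate and 95% limits (log10 scale) into
# capital paths starting from 1000, as in the slide's table.
est, lo, hi = 0.0006, -0.0154, 0.0166   # per day, log10 scale
capital = 1000.0
rows = []
for day in range(1, 6):
    rows.append((day,
                 round(capital * 10 ** (day * est), 1),   # point estimate
                 round(capital * 10 ** (day * lo)),       # lower limit
                 round(capital * 10 ** (day * hi))))      # upper limit
# day 5 row: point estimate ≈ 1007, interval roughly (838, 1211)
```

The interval limits compound geometrically, which is why they diverge to (0, infinity) by day 365 while the point estimate grows only ~61%.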

Derived Variables
Why use derived variables? Adding extra variables gives more options.
Challenge: is there a cost? Which is best?
Scientific insight can give a powerful & simple analysis.