A Short Introduction to Curve Fitting and Regression by Brad Morantz

Similar documents
Quantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Inferences for Regression

The Multiple Regression Model

Regression Models - Introduction

Analysis of Variance. Source DF Squares Square F Value Pr > F. Model <.0001 Error Corrected Total

STAT5044: Regression and Anova. Inyoung Kim

THE EFFECTS OF MULTICOLLINEARITY IN ORDINARY LEAST SQUARES (OLS) ESTIMATION

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Regression. Simple Linear Regression Multiple Linear Regression Polynomial Linear Regression Decision Tree Regression Random Forest Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

Regression Analysis for Data Containing Outliers and High Leverage Points

Residuals from regression on original data 1

General Linear Statistical Models

ECON3150/4150 Spring 2015

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

ISQS 5349 Final Exam, Spring 2017.

Basic Business Statistics 6 th Edition

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

Chapter Learning Objectives. Regression Analysis. Correlation. Simple Linear Regression. Chapter 12. Simple Linear Regression

Formal Statement of Simple Linear Regression Model

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Chapter 1. Linear Regression with One Predictor Variable

Prediction Intervals in the Presence of Outliers

Inference for the Regression Coefficient

Statistical Modelling in Stata 5: Linear Models

Statistical Techniques II EXST7015 Simple Linear Regression

Ch 2: Simple Linear Regression

Applied Statistics and Econometrics

Correlation Analysis

Analysis of Bivariate Data

Simple Linear Regression

CAS MA575 Linear Models

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

The Simple Linear Regression Model

Statistics for Managers using Microsoft Excel 6 th Edition

The Classical Linear Regression Model

Multiple Regression. Peerapat Wongchaiwat, Ph.D.

Assignment 9 Answer Keys

Answer Keys to Homework#10

Business Statistics. Lecture 10: Correlation and Linear Regression

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES

Unbalanced Data in Factorials Types I, II, III SS Part 1

SIMPLE REGRESSION ANALYSIS. Business Statistics

Regression. Marc H. Mehlman University of New Haven

The Simple Regression Model. Part II. The Simple Regression Model

Multiple Regression Analysis

Business Statistics. Chapter 14 Introduction to Linear Regression and Correlation Analysis QMIS 220. Dr. Mohammad Zainal

Confidence Intervals, Testing and ANOVA Summary

Model Selection. Frank Wood. December 10, 2009

Categorical Predictor Variables

Inference for Regression Simple Linear Regression

Statistics II Exercises Chapter 5

Design & Analysis of Experiments 7E 2009 Montgomery

STAT5044: Regression and Anova

STA 4210 Practise set 2a

Density Temp vs Ratio. temp

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Mathematics for Economics MA course

Regression Models - Introduction

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators

Question Possible Points Score Total 100

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT 350: Summer Semester Midterm 1: Solutions

How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion

14 Multiple Linear Regression

Econometrics I KS. Module 1: Bivariate Linear Regression. Alexander Ahammer. This version: March 12, 2018

Diagnostics and Remedial Measures

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Inference for Regression Inference about the Regression Model and Using the Regression Line

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

Regression: Ordinary Least Squares

MATH 644: Regression Analysis Methods

Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Charlotte Wickham. stat511.cwick.co.nz. Nov

Introduction to Linear regression analysis. Part 2. Model comparisons

Package ForecastCombinations

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION

2.2 Classical Regression in the Time Series Context

Section 11: Quantitative analyses: Linear relationships among variables

Chapter 10. Supplemental Text Material

The prediction of house price

MS&E 226: Small Data

Chapter 2: simple regression model

Random and Mixed Effects Models - Part II

ECON3150/4150 Spring 2016

Essential of Simple regression

Correlation and Regression

Data Set 8: Laysan Finch Beak Widths

ECON The Simple Regression Model

NATCOR Regression Modelling for Time Series

Correlation. A statistics method to measure the relationship between two variables. Three characteristics

Chapter 16. Simple Linear Regression and dcorrelation

Basic Business Statistics, 10/e

An Introduction to Bayesian Linear Regression

ST430 Exam 1 with Answers

Simple Linear Regression

Simple Linear Regression for the Climate Data

Transcription:

A Short Introduction to Curve Fitting and Regression by Brad Morantz bradscientist@machine-cognition.com

Overview What can regression do for me? Example Model building Error Metrics OLS Regression Robust Regression Correlation Regression Table Limitations Another method - Maximum Likelihood

Example You have the mass, payload mass, and distance of a number of missiles. You need an equation for maximum distance based upon mass of missile & payload Regression will let you build a model, based upon these observations that will be of the form Max distance = B 1 * total mass + B 2 * payload mass + constant With this equation, you can answer questions

Model Building Why build a model? It can help us to forecast Can assist in design Two ways Causal factors Data driven Regression is the latter Model based upon observations

MSE Residual Difference between model and actual e = Y - Ŷ If we summed them, the negative and positive would cancel out If we took Absolute value, would not be differentiable about origin So we take sum of squares MSE is mean of sum of squares Plot of values Line is the model Dots are the actual

RMSE RMSE = sqrt(mse) If we accept that MSE is estimator of variance, then RMSE is estimator of standard deviation. It is also a metric of how much error there is, or how well a model fits a set of data MSE and RMSE are often used as cost functions in many mathematical operations

OLS Regression Ordinary Least Squares Yhat = B 0 + B 1 X 1 +.... + B n X n + e B 0 is the Y intercept B n is the coefficient for each variate X n is the variate e is the error The program calculates the coefficients to produce a line/model with the least amount of squared error (MSE)

Robust Regression Based on work by Kaufmann & Rousseuw Uses median instead of mean Not standard practice No automatic routines to do it Is less affected by outliers

Simple Test of Model Plot out the residuals Look at this plot Are the residuals approximately constant? Or do you see a trend that they are growing or attenuating? (heteroscedasticity) If it is this latter situation, the causal factors are changing and the model will need to be modified

Linear Correlation Coefficient Shows the relationship between two variables Positive means that they travel in the same direction Negative means that they go in opposite directions Like covariance, but it has no scale Can use for comparisons Range is from -1 to +1 Zero means no relationship, or independent R xx = X T X (which becomes easy to implement in a matrix language like Fortran or Matlab)

Correlation and Dependence If two items are correlated, then r 0 They are not independent If they are independent, then r = 0 P(a) = P(a b); b is of no effect on a Need to have some theory to support above

R 2 Pearsonian correlation coefficient Square of r, the correlation coefficient Fraction of the variability in the system that is explained by this model Real world is usually 10% to 70% Adjusted R 2 Only use for model building Penalizes for additional causal variables

Reading a Regression Table Y X 1.113711 1.575213 1.446594 2.122573 2.505273 3.399938 5.416551 3.836957 5.875342 4.120897 6.141051 5.128728 6.484368 6.373329 8.516112 8.217644 8.853574 8.980098 9.272497 9.876384 12 10 8 6 4 2 X X Residuals X Residual Plo 2 1 0-1 0 2 4 6 8 10 12-2 X 0 0 2 4 6 8 10 This residual plot looks acceptable About as much on top as on bottom SUMMARY OUTPUT Regression Statistics Multiple R 0.9483 R Square 0.899273 Adjusted R 0.886682 Standard E 1.008021 Observatio 10 The R Square is very good, so is the adjusted R Square This model is very explanatory ANOVA df SS MS F ignificance F Regressio 1 72.573 72.573 71.42264 2.94E-05 Residual 8 8.128851 1.016106 Total 9 80.70185 Coefficienttandard Er t Stat P-value Lower 95%Upper 95%Lower 95.0Upper 95.0% Intercept 0.295648 0.7 0.422355 0.683888-1.318555 1.909852-1.318555 1.909852 X 0.982041 0.116201 8.451192 2.94E-05 0.71408 1.250002 0.71408 1.250002 The model is good as the F value is >> 3 and the significance is << than 0.05 X is a good coefficient, as it is statistically significant, but the intercept is not with a P of 0.68 and a t <3

Limitations of OLS Built on the assumption of linear relationship Error grows when non-linear Can use variable transformation Harder to interpret Limited to one dependent variable Limited to range of data, can not accurately interpolate out of it

Maximum Likelihood Can calculate regression coefficients If probability distribution of error terms is available Calculates the minimum variance unbiased estimators Calculates the best way to fit a mathematical model to the data Choose the estimate for unknown population parameter which would maximize the probability of getting what we have

References Applied Linear Statistical Models by Neter, Kutner, Nachtsheim, & Wasserman A Handbook for Linear Regression by Younger Introduction to Linear Regression Analysis by Montgomery & Peck The Application of Regression Analysis by Wittink A Second Course in Business Statistics: Regression Analysis by Mendenhall & McClave Most good statistics books have some information on regression analysis