Example. Row Hydrogen Carbon


SMAM 319 Least Squares Example

Heating and combustion analyses were performed in order to study the composition of moon rocks collected by the Apollo 14 and 15 crews. Recorded in c1 and c2 of the Minitab output are the determinations of hydrogen (H) and carbon (C) in parts per million (ppm) for 11 specimens.

Row   Hydrogen   Carbon
  1      120.0    105.0
  2       82.0    110.0
  3       90.0     99.0
  4        8.0     22.0
  5       38.0     50.0
  6       20.0     50.0
  7        2.8      7.3
  8       66.0     74.0
  9        2.0      7.7
 10       20.0     45.0
 11       85.0     51.0

Some of the items below will be computed on a hand-held calculator.

$$\sum x_i = 533.8, \qquad \sum x_i^2 = 43124.84, \qquad \sum y_i = 621, \qquad \sum y_i^2 = 48624.58, \qquad \sum x_i y_i = 43760.84$$

The computing formula for the slope, with $n = 11$, is

$$b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{11(43760.84) - (533.8)(621)}{11(43124.84) - (533.8)^2} = \frac{149879.44}{189430.80} = 0.79121$$
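As a cross-check on the hand calculation, the following minimal Python sketch (standard library only) recomputes the five column sums and the slope from the computing formula. The data lists are transcribed from the table above; the variable names are illustrative choices, not part of the original notes.

```python
# Moon-rock data transcribed from the table above (ppm): x = hydrogen, y = carbon
hydrogen = [120.0, 82.0, 90.0, 8.0, 38.0, 20.0, 2.8, 66.0, 2.0, 20.0, 85.0]
carbon = [105.0, 110.0, 99.0, 22.0, 50.0, 50.0, 7.3, 74.0, 7.7, 45.0, 51.0]

n = len(hydrogen)
sum_x = sum(hydrogen)
sum_y = sum(carbon)
sum_xx = sum(x * x for x in hydrogen)
sum_yy = sum(y * y for y in carbon)
sum_xy = sum(x * y for x, y in zip(hydrogen, carbon))

# Computing formula for the least squares slope
numerator = n * sum_xy - sum_x * sum_y
denominator = n * sum_xx - sum_x ** 2
b = numerator / denominator

print(f"n = {n}, sum x = {sum_x:.1f}, sum x^2 = {sum_xx:.2f}, sum y = {sum_y:.0f}")
print(f"sum y^2 = {sum_yy:.2f}, sum xy = {sum_xy:.2f}")
print(f"b = {numerator:.2f} / {denominator:.2f} = {b:.5f}")
```

Carrying the slope at full precision gives b ≈ 0.79121, which agrees with the hand value.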

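$$a = \frac{\sum y_i - b\sum x_i}{n} = \frac{621 - (0.7912)(533.8)}{11} = 18.0598$$

(using the slope rounded to 0.7912). The least squares equation is $\hat{y} = 18.0598 + 0.7912x$. The predicted values are obtained by substituting each x value into the least squares equation. The differences between the observed and the predicted values are called the residuals. For example, when $x = 8$ the observed value is $y = 22$. The predicted value is $\hat{y} = 18.0598 + 0.7912(8) = 24.39$, so the residual is $22 - 24.39 = -2.39$.

One way to find SSE is to find the sum of the squares of the residuals; SSR may then be found from SST by subtraction. This method is prone to roundoff errors, so it is better to find SSR first. Use the following computing formulas:

$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad S_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}, \qquad S_{xy} = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}$$

$$\text{SSR} = bS_{xy} = (0.79121)(13625.4) = 10780.6, \qquad \text{SST} = S_{yy} = 13566.3, \qquad \text{SSE} = \text{SST} - \text{SSR} = 2785.7$$

$$100R^2 = 100\left(\frac{10780.6}{13566.3}\right) = 79.5\%$$

so 79.5% of the variation is accounted for.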
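A sketch along the same lines, assuming the same transcribed data, finds the intercept, the residual at x = 8, and the sums of squares in the recommended order (SSR first, SSE by subtraction), then cross-checks SSE against the direct sum of squared residuals. Names such as `resid4` and `sse_direct` are hypothetical.

```python
hydrogen = [120.0, 82.0, 90.0, 8.0, 38.0, 20.0, 2.8, 66.0, 2.0, 20.0, 85.0]
carbon = [105.0, 110.0, 99.0, 22.0, 50.0, 50.0, 7.3, 74.0, 7.7, 45.0, 51.0]
n = len(hydrogen)

# Corrected sums via the computing formulas
s_xx = sum(x * x for x in hydrogen) - sum(hydrogen) ** 2 / n
s_yy = sum(y * y for y in carbon) - sum(carbon) ** 2 / n
s_xy = sum(x * y for x, y in zip(hydrogen, carbon)) - sum(hydrogen) * sum(carbon) / n

b = s_xy / s_xx
a = (sum(carbon) - b * sum(hydrogen)) / n

# Residual for the specimen with x = 8 (row 4)
x4, y4 = hydrogen[3], carbon[3]
resid4 = y4 - (a + b * x4)

# Find SSR first, then SSE by subtraction
ssr = b * s_xy
sst = s_yy
sse = sst - ssr

# Cross-check: SSE as the sum of squared residuals
sse_direct = sum((y - (a + b * x)) ** 2 for x, y in zip(hydrogen, carbon))

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"residual at x = {x4}: {resid4:.2f}")
print(f"SSR = {ssr:.1f}, SST = {sst:.1f}, SSE = {sse:.1f}, 100R^2 = {100 * ssr / sst:.1f}%")
```

With the slope carried at full precision the intercept comes out near 18.0593, in line with Minitab's 18.059; the hand value 18.0598 reflects rounding b to 0.7912 before computing a.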

Worksheet size: 100000 cells

MTB > name c1='hydrogen'
MTB > name c2='carbon'
MTB > set c1
DATA> 120 82 90 8 38 20 2.8 66.0 2 20 85
DATA> end
MTB > set c2
DATA> 105 110 99 22 50 50 7.3 74 7.7 45 51
DATA> end
MTB > print c1 c2

Data Display

Row   Hydrogen   Carbon
  1      120.0    105.0
  2       82.0    110.0
  3       90.0     99.0
  4        8.0     22.0
  5       38.0     50.0
  6       20.0     50.0
  7        2.8      7.3
  8       66.0     74.0
  9        2.0      7.7
 10       20.0     45.0
 11       85.0     51.0

Consider the scatterplot.

MTB > plot c2*c1

[Character plot of Carbon (vertical axis, 0 to 105) against Hydrogen (horizontal axis, 0 to 125)]

The graph suggests that a straight line might be a reasonable fit. The correlation coefficient is given in the computation below.

MTB > corr c2 c1

Correlations (Pearson)

Correlation of Carbon and Hydrogen = 0.891

The regression line is given below.

MTB > regress c2 on 1 c1

Regression Analysis

The regression equation is
Carbon = 18.1 + 0.791 Hydrogen

Predictor       Coef     Stdev   t-ratio       p
Constant      18.059     8.394      2.15   0.060
Hydrogen      0.7912    0.1341      5.90   0.000

s = 17.59     R-sq = 79.5%     R-sq(adj) = 77.2%

Analysis of Variance

SOURCE        DF       SS       MS       F       p
Regression     1    10781    10781   34.83   0.000
Error          9     2786      310
Total         10    13566
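The entries of the Minitab table can be reproduced from the textbook formulas. The sketch below is a hypothetical reconstruction in plain Python, assuming the usual definitions s = √(SSE/(n − 2)), SE(b) = s/√S_xx, SE(a) = s√(1/n + x̄²/S_xx) and adjusted R² = 1 − MSE/(SST/(n − 1)); p-values are omitted because the standard library has no t or F distribution.

```python
import math

hydrogen = [120.0, 82.0, 90.0, 8.0, 38.0, 20.0, 2.8, 66.0, 2.0, 20.0, 85.0]
carbon = [105.0, 110.0, 99.0, 22.0, 50.0, 50.0, 7.3, 74.0, 7.7, 45.0, 51.0]
n = len(hydrogen)
mean_x = sum(hydrogen) / n
mean_y = sum(carbon) / n

s_xx = sum((x - mean_x) ** 2 for x in hydrogen)
s_yy = sum((y - mean_y) ** 2 for y in carbon)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hydrogen, carbon))

b = s_xy / s_xx
a = mean_y - b * mean_x

ssr = b * s_xy
sse = s_yy - ssr
mse = sse / (n - 2)          # error mean square (9 df here)
s = math.sqrt(mse)           # Minitab's "s"

se_b = s / math.sqrt(s_xx)                         # Stdev of the slope
se_a = s * math.sqrt(1 / n + mean_x ** 2 / s_xx)   # Stdev of the constant
t_a, t_b = a / se_a, b / se_b                      # t-ratios
f_stat = ssr / mse                                 # regression MS / error MS (1 df)

r = s_xy / math.sqrt(s_xx * s_yy)                  # Pearson correlation
r_sq = ssr / s_yy
r_sq_adj = 1 - mse / (s_yy / (n - 1))

print(f"Constant {a:8.3f} {se_a:8.3f} {t_a:6.2f}")
print(f"Hydrogen {b:8.4f} {se_b:8.4f} {t_b:6.2f}")
print(f"s = {s:.2f}  R-sq = {100 * r_sq:.1f}%  R-sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_stat:.2f}, r = {r:.3f}")
```

For a single predictor the F statistic is the square of the slope's t-ratio, which is why 5.90² ≈ 34.83 in the output above.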

Unusual Observations
Obs.  Hydrogen   Carbon     Fit  Stdev.Fit  Residual  St.Resid
 11         85    51.00   85.31       7.21    -34.31    -2.14R

R denotes an obs. with a large st. resid.

The regression line accounts for 79.5% of the variation. We can find the predicted values and plot them.

MTB > let c5=18.058+.7912*c1
MTB > name c5='predict'
MTB > print c1 c5

Row   Hydrogen   predict
  1      120.0   113.002
  2       82.0    82.936
  3       90.0    89.266
  4        8.0    24.388
  5       38.0    48.124
  6       20.0    33.882
  7        2.8    20.273
  8       66.0    70.277
  9        2.0    19.640
 10       20.0    33.882
 11       85.0    85.310

In the plot below the letter B marks the predicted values and the letter A marks the observed values.

MTB > gstd
* NOTE * Standard Graphics are enabled.
         Professional Graphics are disabled.
         Use the GPRO command to enable Professional Graphics.
MTB > mplot c2*c1 c5*c1
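The flagged row can be checked by computing each point's leverage h_i = 1/n + (x_i − x̄)²/S_xx, the standard deviation of the fit s√h_i, and the standardized residual e_i/(s√(1 − h_i)). This sketch assumes those textbook definitions and a cutoff of |St.Resid| > 2 for the R flag; it is an illustration, not Minitab's internal code.

```python
import math

hydrogen = [120.0, 82.0, 90.0, 8.0, 38.0, 20.0, 2.8, 66.0, 2.0, 20.0, 85.0]
carbon = [105.0, 110.0, 99.0, 22.0, 50.0, 50.0, 7.3, 74.0, 7.7, 45.0, 51.0]
n = len(hydrogen)
mean_x = sum(hydrogen) / n
mean_y = sum(carbon) / n
s_xx = sum((x - mean_x) ** 2 for x in hydrogen)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hydrogen, carbon))
b = s_xy / s_xx
a = mean_y - b * mean_x

fitted = [a + b * x for x in hydrogen]
residuals = [y - f for y, f in zip(carbon, fitted)]
s = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# Leverage grows with the distance of x_i from the mean of x
leverage = [1 / n + (x - mean_x) ** 2 / s_xx for x in hydrogen]
stdev_fit = [s * math.sqrt(h) for h in leverage]
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Flag large standardized residuals (assumed cutoff of 2)
flagged = [i + 1 for i, z in enumerate(std_resid) if abs(z) > 2]
for obs in flagged:
    i = obs - 1
    print(f"Obs {obs}: x={hydrogen[i]}, y={carbon[i]}, fit={fitted[i]:.2f}, "
          f"stdev.fit={stdev_fit[i]:.2f}, resid={residuals[i]:.2f}, "
          f"st.resid={std_resid[i]:.2f}")
```

Only observation 11 exceeds the cutoff, matching the single row in the Unusual Observations table.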

[Character multiple plot against Hydrogen: A = Carbon vs. Hydrogen (observed), B = C5 (predict) vs. Hydrogen (predicted)]

This gives an idea of how good the fit is.

Recorded here are the scores of 16 students on a midterm and final exam in statistics.

Data Display

Row   midterm   final
  1        81      80
  2        75      82
  3        71      83
  4        61      57
  5        96     100
  6        56      30
  7        85      68
  8        18      56
  9        70      40
 10        77      87
 11        71      65
 12        91      86
 13        88      82
 14        79      57
 15        77      75
 16        68      47

Again, let's make a scatter plot.

MTB > name c3='midterm'
MTB > name c4='final'
MTB > set c3
DATA> 81 75 71 61 96 56 85 18 70 77 71 91 88 79 77 68
DATA> end
MTB > set c4

DATA> 80 82 83 57 100 30 68 56 40 87 65 86 82 57 75 47
DATA> end
MTB > plot c4*c3

[Character plot of final (vertical axis, 25 to 100) against midterm (horizontal axis, 15 to 90)]

MTB > gpro

Observe that one of the observations, (18, 56), is way out.

MTB > corr c4 c3

Correlations (Pearson)

Correlation of final and midterm = 0.583

MTB > regress c4 on 1,c3

Regression Analysis

The regression equation is
final = 23.0 + 0.625 midterm

Predictor       Coef     Stdev   t-ratio       p
Constant       22.95     17.41      1.32   0.208
midterm       0.6252    0.2327      2.69   0.018

s = 16.22     R-sq = 34.0%     R-sq(adj) = 29.3%

Analysis of Variance

SOURCE        DF       SS       MS       F       p
Regression     1   1898.7   1898.7    7.22   0.018
Error         14   3681.3    262.9
Total         15   5579.9

Unusual Observations
Obs.  midterm    final     Fit  Stdev.Fit  Residual  St.Resid
  8      18.0    56.00   34.21      13.37     21.79     2.37RX

Only 34% of the variation is accounted for. Redoing the regression without the unusual observation improves the fit considerably, but not enough to make it worthwhile.

MTB > let c6=c3
MTB > let c7=c4

The unusual observation was deleted from the columns on the worksheet.

MTB > regress c7 on 1,c6

Regression Analysis

The regression equation is
C7 = -37.1 + 1.39 C6

15 cases used 1 cases contain missing values

Predictor       Coef     Stdev   t-ratio       p
Constant      -37.09     24.62     -1.51   0.156
C6            1.3921    0.3192      4.36   0.001

s = 13.00     R-sq = 59.4%     R-sq(adj) = 56.3%

Analysis of Variance

SOURCE        DF       SS       MS       F       p
Regression     1   3216.4   3216.4   19.02   0.001
Error         13   2198.5    169.1
Total         14   5414.9

Theoretical Development

Given a set of data points $(x_i, y_i)$, $i = 1, \ldots, n$, the objective is to find the straight line such that the sum of the squares of the differences between the observed values and those that would be predicted by the regression equation is a minimum. This amounts to finding the values of the slope $b$ and the y-intercept $a$ such that

$$F(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

is minimized. A non-calculus derivation of the least squares equation is given below.
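To see the criterion F(a, b) at work on the exam data, the sketch below (plain Python; the function names `fit_line` and `objective` are hypothetical) fits the line with and without observation 8 and confirms that nudging (a, b) away from the least squares solution does not lower F. The score lists are transcribed from the data display of the exam example.

```python
midterm = [81, 75, 71, 61, 96, 56, 85, 18, 70, 77, 71, 91, 88, 79, 77, 68]
final = [80, 82, 83, 57, 100, 30, 68, 56, 40, 87, 65, 86, 82, 57, 75, 47]


def fit_line(x, y):
    """Least squares (a, b) and R^2 for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mx) ** 2 for xi in x)
    s_yy = sum((yi - my) ** 2 for yi in y)
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = my - b * mx
    return a, b, b * s_xy / s_yy


def objective(a, b, x, y):
    """F(a, b): sum of squared differences between observed and predicted y."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


a_all, b_all, r2_all = fit_line(midterm, final)

# Refit without observation 8 (index 7), the point (18, 56)
x_red = [x for i, x in enumerate(midterm) if i != 7]
y_red = [y for i, y in enumerate(final) if i != 7]
a_red, b_red, r2_red = fit_line(x_red, y_red)

# Moving (a, b) away from the least squares solution never lowers F(a, b)
best = objective(a_all, b_all, midterm, final)
nearby = [objective(a_all + da, b_all + db, midterm, final)
          for da in (-1.0, 0.0, 1.0) for db in (-0.01, 0.0, 0.01)]

print(f"all 16: final = {a_all:.2f} + {b_all:.4f} midterm, R^2 = {r2_all:.3f}")
print(f"without obs 8: final = {a_red:.2f} + {b_red:.4f} midterm, R^2 = {r2_red:.3f}")
print(f"minimum F = {best:.1f}, every nearby F >= minimum: "
      f"{all(v >= best - 1e-9 for v in nearby)}")
```

The minimum of F for all 16 students is the error sum of squares in the first exam ANOVA table.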

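Derivation of the Least Squares Formula without Calculus

Notation:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

The goal is to find $a$ and $b$ so that $F(a,b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2$ is minimized. This is the sum of the squared differences between the observed values and those predicted by the best-fitting equation. Now, adding and subtracting $\bar{y}$ and $b\bar{x}$,

$$\begin{aligned}
\sum_{i=1}^{n} (y_i - a - b x_i)^2 &= \sum_{i=1}^{n} \left[(y_i - \bar{y}) + (\bar{y} - a - b\bar{x}) - b(x_i - \bar{x})\right]^2 \\
&= \sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - a - b\bar{x})^2 + b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 - 2b \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
&= S_{yy} + n(\bar{y} - a - b\bar{x})^2 + b^2 S_{xx} - 2b S_{xy} \\
&= S_{yy} + n(\bar{y} - a - b\bar{x})^2 + S_{xx}\left(b^2 - 2b\frac{S_{xy}}{S_{xx}}\right) \\
&= S_{yy} - \frac{S_{xy}^2}{S_{xx}} + n(\bar{y} - a - b\bar{x})^2 + S_{xx}\left(b - \frac{S_{xy}}{S_{xx}}\right)^2
\end{aligned}$$

The cross terms involving $\bar{y} - a - b\bar{x}$ vanish because $\sum (y_i - \bar{y}) = 0$ and $\sum (x_i - \bar{x}) = 0$. In the last line only the two squared terms depend on $a$ and $b$, and both are nonnegative, so the expression is minimized when both are zero:

$$b = \frac{S_{xy}}{S_{xx}} \qquad \text{and} \qquad a = \bar{y} - b\bar{x}.$$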
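The completed-square form in the last line of the derivation is an algebraic identity in a and b, so it can be checked numerically at arbitrary trial points. This sketch does so with the moon-rock data; the trial points and the helper names `F` and `completed_square` are arbitrary choices made for illustration.

```python
import math

hydrogen = [120.0, 82.0, 90.0, 8.0, 38.0, 20.0, 2.8, 66.0, 2.0, 20.0, 85.0]
carbon = [105.0, 110.0, 99.0, 22.0, 50.0, 50.0, 7.3, 74.0, 7.7, 45.0, 51.0]
n = len(hydrogen)
mean_x = sum(hydrogen) / n
mean_y = sum(carbon) / n
s_xx = sum((x - mean_x) ** 2 for x in hydrogen)
s_yy = sum((y - mean_y) ** 2 for y in carbon)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hydrogen, carbon))


def F(a, b):
    """Sum of squared deviations of the data from the line a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(hydrogen, carbon))


def completed_square(a, b):
    """Right-hand side of the last line of the derivation."""
    return (s_yy - s_xy ** 2 / s_xx
            + n * (mean_y - a - b * mean_x) ** 2
            + s_xx * (b - s_xy / s_xx) ** 2)


trial_points = [(0.0, 0.0), (18.0598, 0.7912), (-5.0, 1.5), (30.0, 0.2)]
for a, b in trial_points:
    same = math.isclose(F(a, b), completed_square(a, b), rel_tol=1e-9)
    print(f"a={a:8.4f} b={b:6.4f}  F={F(a, b):12.4f}  identity holds: {same}")

# Both squared terms vanish at the least squares solution
b_hat = s_xy / s_xx
a_hat = mean_y - b_hat * mean_x
print(f"minimum F = {F(a_hat, b_hat):.2f} at a = {a_hat:.4f}, b = {b_hat:.5f}")
```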

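Once the regression equation is derived, the corrected sum of squares can be broken up into two parts: a sum of squares due to regression and a sum of squares due to error.

$$\begin{aligned}
\text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 &= \sum_{i=1}^{n} (y_i - a - b x_i + a + b x_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} (y_i - a - b x_i)^2 + \sum_{i=1}^{n} (a + b x_i - \bar{y})^2 + 2\sum_{i=1}^{n} (y_i - a - b x_i)(a + b x_i - \bar{y})
\end{aligned}$$

Since $a = \bar{y} - b\bar{x}$, we have $y_i - a - b x_i = (y_i - \bar{y}) - b(x_i - \bar{x})$ and $a + b x_i - \bar{y} = b(x_i - \bar{x})$, so

$$\sum_{i=1}^{n} (y_i - a - b x_i)(a + b x_i - \bar{y}) = \sum_{i=1}^{n} \left[(y_i - \bar{y}) - b(x_i - \bar{x})\right] b(x_i - \bar{x}) = bS_{xy} - b^2 S_{xx} = \frac{S_{xy}}{S_{xx}} S_{xy} - \frac{S_{xy}^2}{S_{xx}^2} S_{xx} = 0.$$

The cross term is therefore zero and $\text{SST} = \text{SSR} + \text{SSE}$, where

$$\text{SSE} = \sum_{i=1}^{n} (y_i - a - b x_i)^2, \qquad \text{SSR} = \sum_{i=1}^{n} (a + b x_i - \bar{y})^2 = b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 = bS_{xy}.$$

The quantity $R^2 = \text{SSR}/\text{SST}$ represents the proportion of the SST variation accounted for by the regression line.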
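A numerical check of the decomposition, using the exam data as an assumed test case: the cross term vanishes up to floating-point error, SST = SSR + SSE, and SSR agrees with bS_xy. This is a minimal plain-Python sketch; the variable names are illustrative.

```python
midterm = [81, 75, 71, 61, 96, 56, 85, 18, 70, 77, 71, 91, 88, 79, 77, 68]
final = [80, 82, 83, 57, 100, 30, 68, 56, 40, 87, 65, 86, 82, 57, 75, 47]

n = len(midterm)
mean_x = sum(midterm) / n
mean_y = sum(final) / n
s_xx = sum((x - mean_x) ** 2 for x in midterm)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))
b = s_xy / s_xx
a = mean_y - b * mean_x  # chosen so that a + b*mean_x = mean_y

fitted = [a + b * x for x in midterm]
sst = sum((y - mean_y) ** 2 for y in final)
sse = sum((y - f) ** 2 for y, f in zip(final, fitted))
ssr = sum((f - mean_y) ** 2 for f in fitted)
cross = sum((y - f) * (f - mean_y) for y, f in zip(final, fitted))

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, cross term = {cross:.2e}")
print(f"SSR via b*S_xy = {b * s_xy:.2f}, R^2 = {ssr / sst:.3f}")
```

These sums of squares match the first exam ANOVA table (1898.7, 3681.3, 5579.9) up to rounding.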