Probability and Statistics for Software and Knowledge Engineers. Lecture 13: Simple Linear Regression and Correlation

Transcription:

933 Probability and Statistics for Software and Knowledge Engineers
Lecture 13: Simple Linear Regression and Correlation
Monchai Sopitkamon, Ph.D.

Outline
- The Simple Linear Regression Model (12.1)
- Fitting the Regression Line (12.2)
- The Analysis of Variance Table (12.6)
- Residual Analysis (12.7)
- Correlation Analysis (12.9)

The Simple Linear Regression Model I (12.1)
Purpose of regression analysis: predict the value of a dependent (response) variable from the values of at least one explanatory or independent variable (also called predictors or factors).
Purpose of correlation analysis: measure the strength of the correlation between two variables.

The Simple Linear Regression Model II (12.1)
Simple linear regression model:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, so that $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$
where $\beta_0$ is the intercept parameter and $\beta_1$ is the slope parameter.
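To make the model statement concrete, the following is a minimal sketch that draws responses from $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$. The function name `simulate_slr`, the parameter values, and the x values are all made up for illustration; they do not come from the lecture.

```python
import random


def simulate_slr(xs, beta0, beta1, sigma, seed=0):
    """Draw one response per x from Y_i ~ N(beta0 + beta1 * x_i, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Each response is the line value plus an independent N(0, sigma^2) error.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical explanatory values
ys = simulate_slr(xs, beta0=2.0, beta1=0.5, sigma=0.3)
for x, y in zip(xs, ys):
    print(f"x = {x:.1f}  y = {y:.3f}")
```

With `sigma=0.0` the simulated points fall exactly on the line, which is a quick way to see that all of the scatter in the model comes from the error term.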

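The Simple Linear Regression Model III (12.1)
Interpretation of the error variance $\sigma^2$: the larger $\sigma^2$ is, the more widely the observations scatter around the regression line.

The Simple Linear Regression Model IV (12.1)
- $\beta_1 > 0$: positive relationship
- $\beta_1 = 0$: no relationship
- $\beta_1 < 0$: negative relationship
- The SLR model is not appropriate for a nonlinear relationship.
[Figure: example scatter plot]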

The Simple Linear Regression Model V (12.1)
Ex.67 pg.536: Car Plant Electricity Usage
[Figure: scatter plot of electricity usage against production] (Excel sheet)

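Fitting the Regression Line I (12.2)
Selecting the best line: the least squares fit.
[Figure: the errors are the vertical distances between each observed $y_i$ and the estimated line]

Fitting the Regression Line II (12.2)
$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$: predicted value of y for observation i.
$x_i$: value of observation i.
$\hat\beta_0$ and $\hat\beta_1$ are chosen to minimize
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2$
Subject to: $\sum_{i=1}^{n} e_i = 0$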

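Fitting the Regression Line III (12.2)
Method of Least Squares:
$\hat\beta_1 = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}$
$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
Variance of errors: $\hat\sigma^2 = \frac{SSE}{n - 2}$
(n - 2 since two regression parameters need to be computed first)

Fitting the Regression Line IV (12.2)
Ex.67 pg.545: Car Plant Electricity Usage
n = 12, $\bar{x} = 4.885$, $\bar{y} = 2.846$, $\sum x_i^2 = 291.231$, $\sum x_i y_i = 169.253$
$\hat\beta_1 = \frac{169.253 - 12 \times 4.885 \times 2.846}{291.231 - 12 \times 4.885^2} = 0.4988$
$\hat\beta_0 = 2.846 - 0.4988 \times 4.885 = 0.409$
$\hat{y} = 0.409 + 0.499x$
(Excel sheet)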
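The least squares formulas can be sketched in a few lines of Python. This is an illustration rather than the lecture's Excel sheet: the function name `fit_line` and the five data points are made up, chosen so that the arithmetic is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (b0, b1, sigma2_hat) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: (sum x_i y_i - n x_bar y_bar) / (sum x_i^2 - n x_bar^2)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    b1 = s_xy / s_xx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    # Error variance estimate: SSE / (n - 2), two parameters were estimated.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, sse / (n - 2)


xs = [1, 2, 3, 4, 5]  # hypothetical data, not the car plant data
ys = [2, 4, 5, 4, 5]
b0, b1, sigma2_hat = fit_line(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, sigma2_hat = {sigma2_hat:.3f}")
# prints: b0 = 2.200, b1 = 0.600, sigma2_hat = 0.800
```

Here $\sum x_i y_i = 66$, $n \bar{x} \bar{y} = 60$ and $\sum x_i^2 - n \bar{x}^2 = 10$, so the slope is 6/10 = 0.6 and the intercept is 4 - 0.6 × 3 = 2.2.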

Fitting the Regression Line V (12.2)
Ex.67 pg.545: Car Plant Electricity Usage
[Figure: scatter plot of electricity usage against production with the fitted trend line y = 0.498x + 0.409, R² = 0.802]


The Analysis of Variance Table: Sum of Squares Decomposition I (12.6.1)
Apply a similar ANOVA approach to the one-factor layout in Chapter 11.
Consider the variability in the dependent variable y.
Hypothesis test: $H_0: \beta_1 = 0$

The Analysis of Variance Table: Sum of Squares Decomposition II (12.6.1)
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = SST - SSE$
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

The Analysis of Variance Table: Sum of Squares Decomposition III (12.6.1)
[Figure: the sum of squares for a simple linear regression]

The Analysis of Variance Table: Sum of Squares Decomposition IV (12.6.1)
[Figure: the analysis of variance table for a simple linear regression analysis]
Hypothesis test: $H_0: \beta_1 = 0$
The two-sided p-value is
p-value = P(X > F)
where X is a RV that has an $F_{1,n-2}$ distribution.
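As a rough illustration of how the entries of the analysis of variance table fit together, the sketch below computes the sums of squares, the mean squares (SSR on 1 degree of freedom, SSE on n - 2) and F = MSR/MSE. The function name `anova_table` and the data are made up for this example.

```python
def anova_table(xs, ys):
    """Return the ANOVA quantities for a simple linear regression fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variability
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left in the residuals
    msr = ssr / 1          # regression has 1 degree of freedom
    mse = sse / (n - 2)    # error has n - 2 degrees of freedom
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "F": msr / mse}


table = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
for name, value in table.items():
    print(f"{name} = {value:.3f}")
```

Turning F into a p-value needs the upper tail of the $F_{1,n-2}$ distribution, which the Python standard library does not provide; if SciPy is available, `scipy.stats.f.sf(F, 1, n - 2)` gives P(X > F).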

The Analysis of Variance Table: Sum of Squares Decomposition V (12.6.1)
Coefficient of determination ($R^2$): fraction of variation explained by the regression
$R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}$, $(0 \le R^2 \le 1)$
The closer $R^2$ is to one, the better is the regression model.

The Analysis of Variance Table: Sum of Squares Decomposition VI (12.6.1)
[Figure: the coefficient of determination $R^2$ is larger in scenario II than in scenario I]
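A short sketch of the coefficient of determination, computed as 1 - SSE/SST. The function name `r_squared` and the data are made up for illustration.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least squares line, 1 - SSE/SST."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - sse / sst


print(f"{r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]):.3f}")  # scattered points
print(f"{r_squared([0, 1, 2, 3], [1, 3, 5, 7]):.3f}")        # points exactly on a line
```

For the first data set SST = 6 and SSE = 2.4, so the line explains 60% of the variation; for the second SSE = 0 and $R^2$ reaches its upper bound of 1.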

The Analysis of Variance Table: Sum of Squares Decomposition VII (12.6.1)
Ex.67 pg.57: Car Plant Electricity Usage
$F = \frac{MSR}{MSE} = \frac{1.2124}{0.0299} = 40.53$
$R^2 = \frac{SSR}{SST} = \frac{1.2124}{1.5115} = 0.802$
The higher the value of $R^2$, the better the regression.
(Excel sheet)

Residual Analysis Methods I (12.7.1)
Residuals: differences between the observed values of the dependent variable and the corresponding predicted (fitted) values
$e_i = y_i - \hat{y}_i$
Residual analysis can be used to
- Identify outliers
- Check if the fitted model is good
- Check if the variance of error is constant
- Check if the error terms are normally distributed
(Excel sheet)

Residual Analysis Methods II (12.7.1)
Plot the residuals $e_i$ against the values of the explanatory variable $x_i$.
A random scatter plot indicates no problem with the obtained regression model.
If $|e_i / \hat\sigma|$ (standardized residual) is > 3, the data point is an outlier.
If there are outliers, they should be removed and the regression line should be fitted again.
(Excel sheet)
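The outlier rule can be sketched as follows. The function names and the data are hypothetical: twenty made-up points lie on y = 0 except for one that is far off, and the rule flags observations whose standardized residual exceeds 3 in absolute value.

```python
import math


def standardized_residuals(xs, ys):
    """Return e_i / sigma_hat for the least squares fit of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # sigma_hat^2 = SSE / (n - 2); this sketch assumes the fit is not perfect.
    sigma_hat = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return [e / sigma_hat for e in residuals]


def flag_outliers(xs, ys, threshold=3.0):
    """Indices of observations whose |standardized residual| exceeds threshold."""
    return [i for i, z in enumerate(standardized_residuals(xs, ys))
            if abs(z) > threshold]


xs = list(range(20))   # hypothetical x values 0..19
ys = [0.0] * 20
ys[10] = 100.0         # one made-up observation far from the rest
print(flag_outliers(xs, ys))
```

After removing the flagged points, the line would be refitted on the remaining data, as the slide describes. Note that a large outlier also inflates $\hat\sigma$, so in small samples a single unusual point may not reach the threshold of 3.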

Residual Analysis Methods III (12.7.1)
[Figure: residual plot indicating points that may be outliers]

Residual Analysis Methods IV (12.7.1)
If residual plots show positive and negative residuals grouped together, a linear model is not suitable.
[Figure: a grouping of positive and negative residuals indicates that the linear model is inappropriate]

Residual Analysis Methods V (12.7.1)
If the residual plot shows a funnel shape, the variance of error ($\sigma^2$) is not constant, conflicting with the assumption.
[Figure: a funnel shape in the residual plot indicates a non-constant error variance]

Residual Analysis Methods VI (12.7.1)
A normal probability plot (normal scores plot) of residuals can be used to check if the error terms $\epsilon_i$ are normally distributed.
[Figure: a normal scores plot of a simulated sample from a normal distribution, which shows the points lying approximately on a straight line]
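A normal scores plot pairs each sorted residual with the standard normal quantile expected at its rank. The sketch below computes those pairs. The plotting positions (i - 3/8)/(n + 1/4) are one common convention and are an assumption here, since the slides do not say which convention they use; the function name and the residual values are made up.

```python
from statistics import NormalDist


def normal_scores(residuals):
    """Pair each sorted residual with its approximate expected normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions: (i - 3/8) / (n + 1/4) for rank i = 1..n.
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]
    return list(zip(ordered, scores))


for residual, score in normal_scores([0.3, -1.2, 0.8, -0.1, 0.2]):
    print(f"residual = {residual:5.2f}  normal score = {score:6.3f}")
```

Plotting the normal scores against the sorted residuals and looking for an approximately straight line is the check described on the slide.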

Residual Analysis Methods VII (12.7.1)
[Figure: normal scores plots of simulated samples from non-normal distributions, which show nonlinear patterns]
These plots exhibit a non-normal distribution of residuals; the linear modeling approach may not be used.

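The Sample Correlation Coefficient I (12.9.1)
From the correlation equation in Section 2.5.4,
$\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}}$
which measures the strength of linear association between two jointly distributed RVs X and Y.
The sample correlation coefficient r for a set of paired data observations $(x_i, y_i)$ is
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{\sum x_i y_i - n \bar{x} \bar{y}}{\sqrt{(\sum x_i^2 - n \bar{x}^2)(\sum y_i^2 - n \bar{y}^2)}}$, $(-1 \le r \le 1)$

The Sample Correlation Coefficient II (12.9.1)
- r = 0: no linear association
- r < 0: negative linear association
- r > 0: positive linear association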
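A minimal sketch of the sample correlation coefficient, using the computational form with $\sum x_i y_i - n \bar{x} \bar{y}$ in the numerator. The function name `sample_correlation` and the data are made up for illustration.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r for paired observations (x_i, y_i)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    return s_xy / math.sqrt(s_xx * s_yy)


print(f"{sample_correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]):.4f}")  # positive association
print(f"{sample_correlation([1, 2, 3, 4], [8, 6, 4, 2]):.4f}")        # exact negative line
```

For the first data set $S_{XY} = 6$, $S_{XX} = 10$ and $S_{YY} = 6$, so r = 6/√60 ≈ 0.7746; the second data set lies exactly on a decreasing line, so r = -1.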

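The Sample Correlation Coefficient III (12.9.1)
$r^2 = R^2$ (the square of the sample correlation coefficient equals the coefficient of determination).
r is unchanged if x and y are swapped, in contrast to regression analysis, which requires that one variable be dependent and the other explanatory.
r is also not affected by linear transformations of the variables, e.g., $x \to ax + b$ and $y \to cy + d$, as long as a and c have the same sign (if the signs differ, only the sign of r changes).

The Sample Correlation Coefficient IV (12.9.1)
The hypothesis test $H_0: \rho = 0$ (no correlation between the RVs) can be performed by computing the t-statistic
$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$
which is compared with a t-distribution with n - 2 degrees of freedom.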
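The symmetry and invariance properties of r, and the t-statistic for testing $\rho = 0$, can be checked numerically. The sketch below uses made-up data and hypothetical function names; the transformation constants are arbitrary positive values.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
r = sample_correlation(xs, ys)
r_swapped = sample_correlation(ys, xs)                       # x and y exchanged
r_scaled = sample_correlation([2 * x + 1 for x in xs],       # x -> 2x + 1
                              [3 * y - 4 for y in ys])       # y -> 3y - 4
t = correlation_t_statistic(r, len(xs))
print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, scaled = {r_scaled:.4f}")
print(f"t = {t:.4f} on {len(xs) - 2} degrees of freedom")
```

For these data $r^2 = 0.6$, so $t^2 = 0.6 \times 3 / 0.4 = 4.5$, which equals the F statistic of the regression of y on x for the same points.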

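The Sample Correlation Coefficient V (12.9.1)
Ex.69 pg.588: Cranial Circumferences
$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = 0.255$, ($r^2 = R^2 = 0.255^2 = 0.065$)
[Values of $S_{XY}$, $S_{XX}$ and $S_{YY}$ as transcribed: 3.0745, .489, 99.457]
Null hypothesis $H_0: \rho = 0$ (no correlation); compute the t-statistic
$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.255 \sqrt{18}}{\sqrt{1 - 0.255^2}} = 1.12$
p-value = 2 × P(X > |t|) = 2 × P(X > 1.12) = 0.277
Since p-value > α, we do not reject $H_0$: there is no evidence that finger length and cranial circumference are correlated.
(Excel sheet)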