Regression, Inference, and Model Building

Scatter Plots and Correlation

Correlation coefficient, r:  -1 ≤ r ≤ 1
If r is positive, then the scatter plot has a positive slope and the variables are said to have a positive relationship.
If r is negative, then the scatter plot has a negative slope and the variables have a negative relationship.
If r = 0, then no linear relationship exists between the two variables.
If r = 1, then the data fall on a perfect line with a positive slope.
If r = -1, the data fall on a perfect line with a negative slope.
The strength of the correlation is |r|: the larger |r| is, the stronger the correlation.

r = (nΣxy - (Σx)(Σy)) / √( (nΣx² - (Σx)²)(nΣy² - (Σy)²) )

Significant Linear Relationship (two-tailed test):
H0: ρ = 0 (implies there is no significant linear relationship)
Ha: ρ ≠ 0 (implies there is a significant linear relationship)

Negative Linear Relationship (left-tailed test):
H0: ρ ≥ 0
Ha: ρ < 0

Spring 2017
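The raw-sum formula for r can be checked with a short Python sketch (the function name and the sample data are illustrative, not from the notes):

```python
import math

def correlation_coefficient(x, y):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)),
    using the raw sums exactly as in the formula above."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# A perfect positive line gives r = 1; reversing y gives r = -1.
print(correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # → 1.0
```

A result of exactly ±1 occurs only when every point lies on the fitted line.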

Positive Linear Relationship (right-tailed test):
H0: ρ ≤ 0
Ha: ρ > 0

Test statistic: t = r / √( (1 - r²) / (n - 2) )
Degrees of freedom = n - 2

Significant Linear Relationship (two-tailed test): Reject H0 if |t| ≥ t(α/2)
Negative Linear Relationship (left-tailed test): Reject H0 if t ≤ -t(α)
Positive Linear Relationship (right-tailed test): Reject H0 if t ≥ t(α)

Using critical values to determine statistical significance: the correlation coefficient, r, is statistically significant if the absolute value of the correlation is greater than the critical value in the table: |r| > r(α).

The coefficient of determination, r², is the measure of the amount of variation in y explained by the variation in x.
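A minimal sketch of the t test for r, assuming the test statistic above; the function name and the inputs (r = 0.8, n = 12) are hypothetical examples:

```python
import math

def correlation_t_stat(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    t = r / math.sqrt((1 - r ** 2) / (n - 2))
    return t, n - 2

t, df = correlation_t_stat(0.8, 12)   # t ≈ 4.22, df = 10
# Two-tailed at alpha = 0.05: t(0.025) with df = 10 is about 2.228 from a
# t table, so |t| ≥ 2.228 and H0: rho = 0 would be rejected here.
```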

Fitting a Linear Model

A linear relationship is graphically described as a line: y = b0 + b1x

Error = observed y - predicted y
Error = y - ŷ
y = b0 + b1x + error
ŷ = b0 + b1x

b0 = y-intercept
b1 = slope of the line
ŷ = predicted y
n = number of data values

Sum of Squared Errors (SSE):
SSE = Σ(error_i)² = Σ(y_i - ŷ_i)² = Σ(y_i - (b0 + b1x_i))²

Defining the least squares line:
Slope = b1 = (nΣxy - (Σx)(Σy)) / (nΣx² - (Σx)²)
y-intercept = b0 = (1/n)(Σy - b1Σx)

r² = (nΣxy - (Σx)(Σy))² / ( (nΣx² - (Σx)²)(nΣy² - (Σy)²) )

The parameter r² is defined as the fraction of the total variation explained by the least squares line. In other words, r² measures how well the least squares line fits the sample data. If the total variation is explained completely, then we have r² = 1 and we say that there is a perfect linear correlation. On the other hand, if the total variation is entirely unexplained, then r² = 0.

Regression Analysis

Variance of Errors: (S_e)² = SSE / (n - 2)
Variance of Slope: (S_b1)² = n(S_e)² / (nΣx² - (Σx)²)
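The least squares formulas above can be combined into one sketch (the function name and the four-point data set are my own, chosen only to exercise the formulas):

```python
def least_squares(x, y):
    """Slope, intercept, SSE, and the error/slope variances,
    computed from the raw-sum formulas in the notes."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    b0 = (sy - b1 * sx) / n                          # y-intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se2 = sse / (n - 2)                              # variance of errors
    sb1_2 = n * se2 / (n * sxx - sx ** 2)            # variance of slope
    return b1, b0, sse, se2, sb1_2

b1, b0, sse, se2, sb1_2 = least_squares([1, 2, 3, 4], [2, 3, 5, 6])
# b1 = 1.4, b0 = 0.5, SSE = 0.2, S_e^2 = 0.1, S_b1^2 = 0.02 (up to float rounding)
```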

Bounds of confidence interval: b1 ± t(α/2) · S_b1
where b1 = slope, S_b1 = standard deviation of the slope, and t(α/2) is looked up in the table for α/2 and df = n - 2.

Multiple Regression

Multiple regression model: ŷ = b0 + b1x1 + b2x2 + … + bkxk

where x1, x2, …, xk are the k independent variables in the model and b1, b2, …, bk are the corresponding coefficients of the independent variables. These values, b1, b2, …, bk, are the sample estimates of the corresponding population parameters, β1, β2, …, βk. The y-intercept of the multiple regression equation is b0, which is the sample estimate of the population parameter β0.

The multiple coefficient of determination is R².

Test the claim that at least one of the independent variables' coefficients is not equal to 0:
H0: β1 = β2 = … = βk = 0
Ha: At least one coefficient does not equal 0
p-value = Significance F (from running the ANOVA test in Excel)
p-value < α: reject the null
p-value ≥ α: fail to reject the null

Test if specific variables are significant:
H0: β1 = 0
Ha: β1 ≠ 0
*If we reject the null, then the variable is significant.

ANOVA

Regression Analysis of Variance provides information about how well an estimated regression model fits the data. The ANOVA table divides the total variation in Y into variation that can be explained by the model and variation that cannot be explained by the model. The division of variation takes place in the column labeled SS.

Source      SS       DF         MS                       F
Regression  SSR      k          MSR = SSR / k            F = MSR / MSE
Error       SSE      n - (k+1)  MSE = SSE / (n - (k+1))
Total       TotalSS  n - 1
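A sketch of the slope confidence interval, assuming the slope, its standard deviation, and the critical value are already in hand (the numeric inputs below are hypothetical; t(0.025) with df = 2 is about 4.303 from a t table):

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Bounds b1 ± t_crit * S_b1, per the formula above."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin

# e.g. b1 = 1.4, S_b1 = sqrt(0.02), and t(0.025) with df = 2 ≈ 4.303
lo, hi = slope_confidence_interval(1.4, 0.02 ** 0.5, 4.303)
```

An interval that excludes 0 corresponds to rejecting H0: β1 = 0 at that α.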

Total variation in Y = variation in Y explained by the model + variation in Y not explained by the model
TotalSS = SSR + SSE

SSR = Sum of Squares due to Regression = amount of variation in Y explained by the model
SSR = TotalSS - SSE

SSE = Sum of Squared Errors = variation in Y that the model could not explain
SSE = Σ_{i=1}^{n} (y_i - ŷ_i)²

TotalSS = Total Sum of Squared Deviations about the mean of the dependent variable Y
TotalSS = Σ_{i=1}^{n} (y_i - ȳ)²
*Note: if this expression were divided by (n - 1), it would be the sample variance of y.

k = number of independent variables in the model, called DFR, Degrees of Freedom for Regression
n - (k+1) = degrees of freedom associated with unexplained variation in the model, called DFE, Degrees of Freedom for Error
n - 1 = total degrees of freedom, where n is the number of observations, called DFT, Total Degrees of Freedom

MSR = SSR / k, the average amount of variation explained per independent variable
MSE = SSE / (n - (k+1)), the variance of the error terms
F statistic = MSR / MSE; large F values are desirable since they indicate that the explained variation is relatively larger than the unexplained.
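The decomposition TotalSS = SSR + SSE and the F statistic can be sketched as follows (the function name and the fitted values, taken from a hypothetical k = 1 fit, are my own):

```python
def anova_table(y, y_hat, k):
    """TotalSS, its split into SSR and SSE, and F = MSR / MSE,
    following the formulas above; k = number of independent variables."""
    n = len(y)
    ybar = sum(y) / n
    total_ss = sum((yi - ybar) ** 2 for yi in y)            # Σ(y_i - ȳ)²
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) # Σ(y_i - ŷ_i)²
    ssr = total_ss - sse                                    # SSR = TotalSS - SSE
    msr = ssr / k
    mse = sse / (n - (k + 1))
    return ssr, sse, total_ss, msr / mse

# Hypothetical fitted values from a simple (k = 1) regression:
ssr, sse, total_ss, f = anova_table([2, 3, 5, 6], [1.9, 3.3, 4.7, 6.1], k=1)
# TotalSS = 10, SSE ≈ 0.2, so F ≈ 98 here: most variation is explained.
```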