Chapters 5 and 13: REGRESSION AND CORRELATION. Univariate data: x, Bivariate data (x,y).


(Sections 5.5 and 13.5 are omitted.)

Example: x: number of years students studied Spanish; y: score on a proficiency test. For each student one (x, y) observation:

    x (years):   3   4   4   2   5   3   4   5   3   2
    y (score):  57  78  72  58  89  63  73  84  75  48

We are interested in investigating the relationship between the x and y variables.

x: independent variable / predictor / explanatory variable
y: dependent variable / response variable

Scatterplot:

[Scatterplot: score (y, 0 to 100) vs. years (x, 0 to 6)]

The best fitting line: ŷ = a + bx (meaning of a, b below). Minimize the sum of squared deviations

    Σ [y − (a + bx)]²

over all observations (x, y). Calculation of the slope:

    b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

The y-intercept:

    a = ȳ − b·x̄

Predicted values: ŷ = a + bx

Notation:

    Sxx = Σ(x − x̄)² = Σx² − (Σx)²/n
    Syy = Σ(y − ȳ)² = Σy² − (Σy)²/n
    Sxy = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy)/n

So we have

    b = Sxy / Sxx

Calculations:

      x     y     x²     y²     xy
      3    57      9   3249    171
      4    78     16   6084    312
      4    72     16   5184    288
      2    58      4   3364    116
      5    89     25   7921    445
      3    63      9   3969    189
      4    73     16   5329    292
      5    84     25   7056    420
      3    75      9   5625    225
      2    48      4   2304     96
     35   697    133  50085   2554

    Sxx = 133 − 35²/10 = 10.5
    Syy = 50085 − 697²/10 = 1504.1
    Sxy = 2554 − (35)(697)/10 = 114.5
    b = 114.5 / 10.5 ≈ 10.905
    a = 69.7 − 10.905 · 3.5 ≈ 31.533

Best fitting line: ŷ = 31.533 + 10.905x
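The whole calculation can be reproduced with a short script (a sketch; the variable names are my own):

```python
# Least-squares fit for the years-studied (x) vs. proficiency-score (y) data
# using the shortcut formulas for Sxx, Syy, Sxy from the notes.
x = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
y = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]
n = len(x)

Sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
Syy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

b = Sxy / Sxx                       # slope
a = sum(y) / n - b * sum(x) / n     # intercept: ybar - b*xbar

print(f"Sxx = {Sxx}, Syy = {Syy:.1f}, Sxy = {Sxy}")
print(f"best fitting line: yhat = {a:.3f} + {b:.3f}x")
```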

Pearson's Sample Correlation r:

    r = (1/(n − 1)) Σ z_x z_y = Sxy / √(Sxx · Syy)

Properties of r:
1. r does not depend on the unit of measurement.
2. r is symmetric in the x, y variables.
3. −1 ≤ r ≤ 1.
4. r describes the strength of the relationship:
       0 ≤ |r| ≤ .5   weak
      .5 < |r| ≤ .8   moderate
      .8 < |r| ≤ 1    strong
5. r = +1: all points on a line with positive slope.
   r = −1: all points on a line with negative slope.
   r ≈ 0: the relationship is not linear, or there is no relationship.

Coefficient of determination: r² gives the proportion of variation in y that can be attributed to the linear relationship between x and y.

(Spearman's rank correlation coefficient: omitted.)

Calculation, example continued:

    r = 114.5 / √(10.5 · 1504.1) ≈ 0.911,   r² ≈ 0.830
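The correlation for the example can be checked the same way (a sketch reusing the shortcut formulas):

```python
import math

# Pearson's r for the years/score data via r = Sxy / sqrt(Sxx * Syy).
x = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
y = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]
n = len(x)

Sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
Syy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

r = Sxy / math.sqrt(Sxx * Syy)
print(f"r = {r:.4f} (a strong positive relationship), r^2 = {r ** 2:.4f}")
```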

Assessing the Fit of the Line:

Predicted values: ŷ = a + bx
Residuals (errors): y₁ − ŷ₁, ..., yₙ − ŷₙ

Excel regression output for the example:

Regression Statistics
    Multiple R            0.911113
    R Square              0.830128
    Adjusted R Square     0.808894
    Standard Error        5.651380
    Observations          10

ANOVA
                 df    SS           MS           F         Significance F
    Regression    1    1248.5952    1248.5952    39.094    0.000245
    Residual      8     255.5048      31.9381
    Total         9    1504.1

                   Coefficients   Standard Error   t Stat
    Intercept      31.533333      6.360242         4.958
    X Variable 1   10.904762      1.744054         6.253

RESIDUAL OUTPUT
    Observation   Predicted Y   Residuals
     1            64.247619     −7.247619
     2            75.152381      2.847619
     3            75.152381     −3.152381
     4            53.342857      4.657143
     5            86.057143      2.942857
     6            64.247619     −1.247619
     7            75.152381     −2.152381
     8            86.057143     −2.057143
     9            64.247619     10.752381
    10            53.342857     −5.342857

[Residual plot: residuals vs. x]

Definitions:

Total Sum of Squares:

    SSTo = (y₁ − ȳ)² + (y₂ − ȳ)² + ... + (yₙ − ȳ)² = Σ(yᵢ − ȳ)² = Σyᵢ² − (Σyᵢ)²/n

Residual Sum of Squares:

    SSResid = (y₁ − ŷ₁)² + (y₂ − ŷ₂)² + ... + (yₙ − ŷₙ)² = Σ(y − ŷ)² = Σy² − aΣy − bΣxy

An alternative way of calculating r²:

    r² = 1 − SSResid / SSTo
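These definitions can be verified numerically for the example (a sketch; it recomputes a and b rather than assuming them, and checks the shortcut formula for SSResid):

```python
# Residuals, SSResid, SSTo, and the identity r^2 = 1 - SSResid/SSTo
# for the years/score data.
x = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
y = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = Sxy / Sxx
a = ybar - b * xbar

yhat = [a + b * xi for xi in x]
SSResid = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
SSTo = sum((yi - ybar) ** 2 for yi in y)

# Shortcut: SSResid = sum(y^2) - a*sum(y) - b*sum(x*y)
shortcut = sum(yi ** 2 for yi in y) - a * sum(y) - b * sum(xi * yi for xi, yi in zip(x, y))

print(f"SSResid = {SSResid:.4f} (shortcut: {shortcut:.4f})")
print(f"SSTo = {SSTo:.1f}, r^2 = {1 - SSResid / SSTo:.4f}")
```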

The Model of Simple Linear Regression:

    y = α + βx + e

α: y-intercept, β: slope, e: random error (deviation).

We assume that
1. the mean of e is zero,
2. its standard deviation is σ, which does not depend on x,
3. e is a normal random variable,
4. random errors at different x values are independent.

The standard deviation σ is estimated by s_e:
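The four assumptions can be illustrated by simulation (a sketch; the values of α, β, σ here are my own, chosen to mimic the example). It also illustrates a fact used below: the slope b computed from repeated samples averages out to β.

```python
import random

# Simulate y = alpha + beta*x + e with independent normal errors whose
# mean is 0 and whose SD sigma does not depend on x.
random.seed(0)
alpha, beta, sigma = 31.5, 10.9, 5.65
x = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
n = len(x)
xbar = sum(x) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)

# Across many simulated samples, the least-squares slope centers at beta.
slopes = []
for _ in range(2000):
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    ybar = sum(y) / n
    Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slopes.append(Sxy / Sxx)

mean_b = sum(slopes) / len(slopes)
print(f"average of simulated slopes: {mean_b:.2f} (true beta = {beta})")
```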

    s_e = √( SSResid / (n − 2) )

Example: s_e = √(255.505/8) ≈ 5.651 (the "Standard Error" in the Excel output).

Point estimators:

    Unknown        Estimator
    α              a
    β              b
    σ              s_e
    α + βx*        a + bx*
    α + βx* + e    a + bx*

The four assumptions above make inference possible.

Inference about the parameter β:
1. The mean of the distribution of b is β.
2. The standard deviation of the distribution of b is

       σ_b = σ / √Sxx

3. The estimator b has a normal distribution.

We estimate σ_b by

    s_b = s_e / √Sxx
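Numerically for the example (a sketch; SSResid and Sxx are the values computed earlier in the notes):

```python
import math

# s_e = sqrt(SSResid/(n-2)) estimates sigma; s_b = s_e/sqrt(Sxx) estimates
# the standard deviation of the slope b.
n, SSResid, Sxx = 10, 255.504762, 10.5

s_e = math.sqrt(SSResid / (n - 2))
s_b = s_e / math.sqrt(Sxx)
print(f"s_e = {s_e:.4f}, s_b = {s_b:.4f}")
```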

    t = (b − β) / s_b   has a t-distribution with n − 2 d.f.

(1 − α)100% Confidence Interval for β:

    b ± t_{α/2} · s_b

Example: with 8 d.f., t_{.025} = 2.306, so a 95% CI for β is 10.905 ± 2.306 · 1.744, i.e. (6.88, 14.93).

Hypothesis Test:
1. Null hypothesis: H₀: β = β₀
2. Alternative hypothesis: Hₐ: β > β₀, or β < β₀, or β ≠ β₀
3. Test statistic:

       t = (b − β₀) / s_b   (d.f. = n − 2)

4. P-value: P(T > observed t), or P(T < observed t), or 2·P(T > |observed t|), respectively, for the three alternatives.

Example: testing H₀: β = 0 against Hₐ: β ≠ 0 in our example gives t = 10.905/1.744 ≈ 6.25 (the "t Stat" in the Excel output), P-value ≈ 0.0002, so H₀ is rejected.
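The interval and test for the example can be computed as follows (a sketch; 2.306 is the t-table value for α/2 = .025 and 8 d.f.):

```python
# 95% CI and t test for the slope beta in the years/score example.
b, s_b, n = 10.904762, 1.744054, 10
t_crit = 2.306   # t table, alpha/2 = .025, n - 2 = 8 d.f.

ci_low, ci_high = b - t_crit * s_b, b + t_crit * s_b
t_stat = (b - 0) / s_b   # test statistic for H0: beta = 0

print(f"95% CI for beta: ({ci_low:.2f}, {ci_high:.2f})")
print(f"t = {t_stat:.2f} with {n - 2} d.f.")
```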

Inferences Based on the Estimated Regression Line

Recall (x* is a selected value of the predictor):

    Unknown        Estimator
    α              a
    β              b
    σ              s_e
    α + βx*        a + bx*
    α + βx* + e    a + bx*

Now we shall review inference for the last two estimators.

α + βx* is the expected value of y at x = x*. Its estimator a + bx*
1. is unbiased,
2. has a normal sampling distribution,
3. has standard deviation

       σ_{a+bx*} = σ · √( 1/n + (x* − x̄)² / Sxx )

When σ is not known it is estimated by

    s_{a+bx*} = s_e · √( 1/n + (x* − x̄)² / Sxx )

and inference is based on

    T = ( a + bx* − (α + βx*) ) / s_{a+bx*}

which has n − 2 d.f. So a (1 − α)100% Confidence Interval for α + βx* is:

    a + bx* ± t_{α/2} · s_{a+bx*}

α + βx* + e is the value of y itself at x = x*. A (1 − α)100% interval for α + βx* + e (a prediction interval) is:

    a + bx* ± t_{α/2} · √( s_e² + s_{a+bx*}² )

Example: at x* = 4 years, a + bx* = 31.533 + 10.905·4 ≈ 75.15 and s_{a+bx*} = 5.651·√(1/10 + 0.25/10.5) ≈ 1.989, so the 95% CI for the mean score is 75.15 ± 2.306·1.989 ≈ (70.57, 79.74), while the 95% interval for an individual score is 75.15 ± 2.306·√(5.651² + 1.989²) ≈ (61.34, 88.97).
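A sketch of both intervals at x* = 4 years (2.306 is again the t-table value for 8 d.f.):

```python
import math

# 95% intervals at x* = 4 for the years/score example.
a, b = 31.533333, 10.904762
s_e, Sxx, n, xbar = 5.651380, 10.5, 10, 3.5
x_star = 4.0
t_crit = 2.306

fit = a + b * x_star                                          # estimates alpha + beta*x*
s_fit = s_e * math.sqrt(1 / n + (x_star - xbar) ** 2 / Sxx)   # SD of a + b*x*

mean_lo = fit - t_crit * s_fit             # CI for the mean response alpha + beta*x*
mean_hi = fit + t_crit * s_fit
half = t_crit * math.sqrt(s_e ** 2 + s_fit ** 2)
pred_lo, pred_hi = fit - half, fit + half  # interval for a single y = alpha + beta*x* + e

print(f"fit = {fit:.2f}")
print(f"mean CI: ({mean_lo:.2f}, {mean_hi:.2f}); prediction: ({pred_lo:.2f}, {pred_hi:.2f})")
```

The prediction interval is wider because it must cover the random error e of a single new observation, not just the uncertainty in the estimated line.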

Example: Ten father-son pairs of mature men were selected at random and their heights recorded. Let x refer to the father's height and y to the son's height (both in inches).

Data (n = 10):

    Pair    1    2    3    4    5    6    7    8    9   10
    x      68   69   69   67   70   71   70   66   68   65
    y      69   70   72   68   72   72   69   67   66   64

    Σx = 683,  Σy = 689,  Σx² = 46,681,  Σy² = 47,539,  Σxy = 47,098
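The notes stop at the summary sums; as a sketch, applying the same Sxx/Syy/Sxy formulas to them gives the fitted line and correlation (the computed values below are mine, not from the notes):

```python
import math

# Least-squares fit and correlation for the father-son height data,
# using the summary sums given above.
n = 10
sx, sy = 683, 689
sx2, sy2, sxy = 46681, 47539, 47098

Sxx = sx2 - sx ** 2 / n        # 46681 - 683^2/10
Syy = sy2 - sy ** 2 / n        # 47539 - 689^2/10
Sxy = sxy - sx * sy / n        # 47098 - (683)(689)/10

b = Sxy / Sxx
a = sy / n - b * sx / n
r = Sxy / math.sqrt(Sxx * Syy)

print(f"Sxx = {Sxx:.1f}, Syy = {Syy:.1f}, Sxy = {Sxy:.1f}")
print(f"yhat = {a:.2f} + {b:.3f}x, r = {r:.3f}")
```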