Worksheet 23 (Sections 11.5-11.8) Introduction to Simple Linear Regression (continued)

This worksheet is a continuation of Discussion Sheet 3; please complete that discussion sheet first if you have not already done so. This worksheet contains material from Sections 11.5 through 11.8 of the textbook. Please read those sections.

Consider two variables: x, the temperature in 10's of degrees Fahrenheit for a manufacturing process, and y, a measure of the yield of the process. Suppose that the following sample of data for x and y is provided:

   x     y
  -5     1
  -4     5
  -3     4
  -2     7
  -1    10
   0     8
   1     9
   2    13
   3    14
   4    13
   5    18

A scatter plot of these data with the SLR regression line looks like the following:

[Figure: scatter plot of Y (axis from 2 to 20) against X (axis from -4.0 to 4.0) with the fitted SLR line.]

Confidence Interval for $\beta_1$

A $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is of the form

\[ b_1 \pm t_{\nu,\,\alpha/2} \, \frac{s}{\sqrt{SS_{xx}}} \]

where $\nu = n - 2$, $s^2$ is the residual mean square error from the ANOVA table, and $SS_{xx}$ is what has already been computed in finding $b_1$.

1. Find a 95 percent confidence interval for the $\beta_1$ of the data with which we have been working.
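The interval formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the x and y lists are taken as read from the worksheet's data table and should be checked against the handout, the function name `slope_confidence_interval` is hypothetical, and the critical value 2.262 is the standard t-table entry for 9 degrees of freedom with 0.025 in the upper tail.

```python
import math

# Data as read from the worksheet's table (assumption: verify against the handout).
x = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
y = [1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18]


def slope_confidence_interval(x, y, t_crit):
    """Return (b1, lower, upper) for the slope of a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx                # least-squares slope
    sse = ss_yy - b1 * ss_xy          # residual sum of squares
    s = math.sqrt(sse / (n - 2))      # square root of the residual mean square
    half_width = t_crit * s / math.sqrt(ss_xx)
    return b1, b1 - half_width, b1 + half_width


# t_{9, 0.025} from a standard t table (nu = n - 2 = 9 for these 11 points).
T_CRIT = 2.262

b1, lower, upper = slope_confidence_interval(x, y, T_CRIT)
print(f"b1 = {b1:.4f}, 95% CI for beta_1 = ({lower:.4f}, {upper:.4f})")
```

The critical value is passed in as an argument so that the same function can be reused with a different table entry for another confidence level or sample size.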

The F-distribution Hypothesis Test (F-Test) for the Significance of a SLR Model

The following is a test at the $100(1 - \alpha)\%$ confidence level (significance level $\alpha$) for whether or not a SLR model is statistically significant, that is, whether or not the apparent relationship arises only by chance from the particular sample that is chosen:

$H_0$: The model is not significant.
$H_1$: The model is significant.
Test statistic: $F = \dfrac{MS_{reg}}{s^2}$
Rejection region: $F > F_{\nu_n,\,\nu_d,\,\alpha}$, where $\nu_n = df_{reg}$ = number of predictor variables = 1 and $\nu_d = df_{residual} = n - 2$.

2. Perform the F-test at the 95% confidence level ($\alpha = 0.05$) to see if the SLR model for the data that we have been using is statistically significant or not.

3. What does this hypothesis test tell us?

Two-sided Hypothesis Test for $\beta_1$

A $100(1 - \alpha)\%$ hypothesis test of whether a parameter $\beta_1$ equals a certain numerical value, call it $\beta_{10}$, has the form:

$H_0$: $\beta_1 = \beta_{10}$
$H_1$: $\beta_1 \neq \beta_{10}$
Test statistic: $t = \dfrac{b_1 - \beta_{10}}{s / \sqrt{SS_{xx}}}$
Rejection region: $|t| > t_{\nu,\,\alpha/2}$ (where $\nu = n - 2$)
Conclusion: Do not reject $H_0$, or reject $H_0$ (accept $H_1$ as the new working hypothesis).
Probability that the conclusion is correct when $H_0$ is true: $1 - \alpha$

4. Perform a hypothesis test at the 95% confidence level to see whether or not $\beta_1$ equals 0.
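Both test statistics can be computed from the same sums of squares. The sketch below is illustrative only: the data are taken as read from the worksheet's table and should be checked against the handout, the variable names are hypothetical, and the critical values 5.12 and 2.262 are standard table entries for $F_{1,\,9,\,0.05}$ and $t_{9,\,0.025}$.

```python
import math

# Data as read from the worksheet's table (assumption: verify against the handout).
x = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
y = [1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx            # least-squares slope
ss_reg = b1 * ss_xy           # regression sum of squares
sse = ss_yy - ss_reg          # residual sum of squares
ms_reg = ss_reg / 1           # df_reg = number of predictors = 1
s2 = sse / (n - 2)            # residual mean square, df_residual = n - 2

# F-test for significance of the model.
f_stat = ms_reg / s2

# Two-sided t-test of H0: beta_1 = beta_10, here with beta_10 = 0.
beta_10 = 0.0
t_stat = (b1 - beta_10) / math.sqrt(s2 / ss_xx)

# Critical values from standard tables for alpha = 0.05 and 9 residual df.
F_CRIT = 5.12     # F_{1, 9, 0.05}
T_CRIT = 2.262    # t_{9, 0.025}

reject_f = f_stat > F_CRIT
reject_t = abs(t_stat) > T_CRIT

print(f"F = {f_stat:.2f}, reject H0: {reject_f}")
print(f"t = {t_stat:.3f}, reject H0: {reject_t}")
```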

5. What would it mean in terms of the SLR model if $\beta_1$ equals 0? What would this tell you about whether or not x helps to reduce the uncertainty in predicting y?

6. How can this be used as a test of whether the SLR model is statistically significant or not?

Relation of the F-test for the SLR Model and the t-test for $\beta_1 = 0$

The hypotheses for the two tests for the SLR model are equivalent. Further, $F = t^2$, and the F or t values for the rejection regions are similarly related.

7. Verify that the relation between F and t given in the box above is true using the F from #2 and the t from #4.
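A short derivation shows why the relation holds. It is a sketch that assumes the standard identity $SS_{reg} = b_1^2 \, SS_{xx}$, which this worksheet does not state explicitly, and it uses $\beta_{10} = 0$ in the t statistic.

```latex
\begin{align*}
t &= \frac{b_1 - 0}{s / \sqrt{SS_{xx}}}
  \quad\Longrightarrow\quad
  t^2 = \frac{b_1^2 \, SS_{xx}}{s^2}, \\[4pt]
F &= \frac{MS_{reg}}{s^2}
   = \frac{SS_{reg} / 1}{s^2}
   = \frac{b_1^2 \, SS_{xx}}{s^2}
   = t^2 .
\end{align*}
% The critical values are related in the same way:
% t_{\nu,\,\alpha/2}^2 = F_{1,\,\nu,\,\alpha}, with \nu = n - 2.
```

For example, with $\nu = 9$ and $\alpha = 0.05$ the tabled values are $t_{9,\,0.025} = 2.262$ and $F_{1,\,9,\,0.05} = 5.12$, and $2.262^2 \approx 5.12$.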

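Correlation of Two Variables

Recall that the definition of correlation for a population is:

\[ \rho_{XY} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \, \sigma_y} \]

For a sample the two variables would have a correlation defined as:

\[ r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x \, s_y}
= \frac{\dfrac{1}{n-1} \sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\dfrac{1}{n-1} \sum (x_i - \bar{x})^2} \, \sqrt{\dfrac{1}{n-1} \sum (y_i - \bar{y})^2}}
= \frac{SS_{xy}}{\sqrt{SS_{xx} \, SS_{yy}}} \]

8. Calculate the sample correlation, $r_{xy}$, for the data we have been using.

9. What does this correlation coefficient, $r_{xy}$, signify?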
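The two forms of the sample correlation can be checked against each other numerically. This is a minimal sketch, assuming the data as read from the worksheet's table (to be checked against the handout). The variable names are hypothetical, and `statistics.stdev` from the Python standard library supplies the sample standard deviations with the $n - 1$ divisor.

```python
import math
import statistics

# Data as read from the worksheet's table (assumption: verify against the handout).
x = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
y = [1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Form 1: sample covariance divided by the product of sample standard deviations.
cov_xy = ss_xy / (n - 1)
r_from_cov = cov_xy / (statistics.stdev(x) * statistics.stdev(y))

# Form 2: the sums-of-squares shortcut, where the 1/(n - 1) factors cancel.
r_from_ss = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"r_xy (covariance form)      = {r_from_cov:.4f}")
print(f"r_xy (sums-of-squares form) = {r_from_ss:.4f}")
```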

Relation of the Correlation Coefficient and the Coefficient of Determination for SLR Models

For SLR models in which $r_{xy}$ is the sample correlation coefficient of the two variables involved and $R^2$ is the coefficient of determination for the SLR model, we have

\[ (r_{xy})^2 = R^2 \]

Note: This is true only for SLR models. With more than one predictor variable (more than one X), as in multiple regression, the coefficient of determination, $R^2$, is calculated as before, but there is no single $r_{xy}$ to be related to it.

10. You have calculated $r_{xy}$ in #8 above and you have calculated $R^2$ in a previous worksheet for the data that we have been using. Verify that $(r_{xy})^2 = R^2$ within round-off error for these data.

Relation of Slope Coefficient $b_1$ and Correlation Coefficient $r_{xy}$

There is a systematic relationship between the slope coefficient $b_1$ for a SLR model and the sample correlation coefficient $r_{xy}$ for the two variables involved. It can be shown that:

\[ b_1 = \frac{s_y}{s_x} \, r_{xy} \]

where $s_y$ is the sample standard deviation of y and $s_x$ is the sample standard deviation of x.

11. You have found $b_1$ for the data with which we have been working and you found $r_{xy}$. Verify that the relation in the box above holds for these data.

Suggested Homework: 11.18abcd, 11.19, 11.5, 11.6, 11.8, 11.35abcd, 11.37, 11.38

Solutions to be Posted: 11.18ac, 11.5, 11.8, 11.35ac, 11.37
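Both boxed relations can be checked numerically with a short script. This is a sketch under stated assumptions: the data are taken as read from the worksheet's table and should be checked against the handout, the variable names are hypothetical, and $R^2$ is computed as $1 - SSE / SS_{yy}$, the usual definition, which is assumed to match the one used in the previous worksheet.

```python
import math
import statistics

# Data as read from the worksheet's table (assumption: verify against the handout).
x = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
y = [1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Fit the SLR model y = b0 + b1 * x by least squares.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Coefficient of determination from the residuals (assumed definition).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / ss_yy

# Sample correlation and sample standard deviations (n - 1 divisor).
r_xy = ss_xy / math.sqrt(ss_xx * ss_yy)
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)

print(f"r_xy^2 = {r_xy ** 2:.6f}, R^2 = {r_squared:.6f}")
print(f"b1 = {b1:.6f}, (s_y / s_x) * r_xy = {(s_y / s_x) * r_xy:.6f}")
```

Computing $R^2$ from the residuals rather than from $r_{xy}$ keeps the two sides of the first relation independent, so the agreement is a genuine check rather than a tautology.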