Psychology 282 Lecture #4 Outline Inferences in SLR


Assumptions

To this point we have not had to make any distributional assumptions. The principle of least squares requires no assumptions, and correlations and SLR can be used in a descriptive manner with no assumptions.

Now consider using correlations and SLR in an inferential manner: making inferences about the population from which the sample was drawn, via hypothesis tests and confidence intervals. Such analyses and inferences require the use of sampling distributions and knowledge of their properties. To define the properties of sampling distributions, we must make distributional assumptions.

Assumptions of the fixed-X linear regression model:
X is fixed; the X levels are selected.
Y is randomly sampled at each level of X.

Regression residuals are normal, and the variance of the residuals is the same at each level of X (homogeneity of variance). No assumption is made about the distribution of X or about the full distribution of Y.

Alternative assumption when X is not fixed: the joint distribution of X and Y in the population is bivariate normal. This means that X is normal, Y is normal, and there can be no nonlinear relationship.

Later we will study these assumptions further, along with the consequences of violating them. For now: results of inferential statistical methods are fairly robust against violations.

Inferences in SLR. Recall correlation and SLR in a sample of n observations:
Pearson correlation: r_XY
SLR: Ŷ = b0 + b1·X

Sample statistics of interest: r_XY, b0, b1. Corresponding population parameters:
Population correlation coefficient: ρ_XY, or just ρ
Population regression intercept: β0
Population regression coefficient: β1

We wish to make inferences about the population parameters based on the sample statistics. Two types of inferences: confidence intervals and hypothesis tests.

Consider the general approach.
Population parameter: θ
Sample statistic (point estimate): θ̂
Suppose the sampling distribution of θ̂ is approximately normal when n is large, with mean θ and standard error s_θ̂.

To construct a confidence interval for θ: choose α for a CI of width 100(1 − α)%. (E.g., for a 95% CI, α = .05.)

Given the degrees of freedom (df), determine the value of t that cuts off an area of α/2 in each tail of the t distribution. Call this value t_α/2. Then the CI is defined as:

θ̂ − t_α/2 · s_θ̂  ≤  θ  ≤  θ̂ + t_α/2 · s_θ̂

For intervals computed in this fashion, the specified percentage of such intervals will include the value of the parameter, under the stated assumptions.

To conduct a hypothesis test about the parameter θ:
State the null hypothesis: H0: θ = θ0 (often H0: θ = 0).
State the alternative hypothesis:
Non-directional (implies a two-tailed test): H1: θ ≠ θ0
Directional (implies a one-tailed test): H1: θ < θ0 or H1: θ > θ0
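The CI recipe above can be sketched in a few lines of Python. The point estimate, standard error, and df here are made-up numbers purely for illustration, and scipy is assumed to be available:

```python
from scipy import stats

# Hypothetical values for illustration only
theta_hat = 0.45   # sample statistic (point estimate of theta)
se_theta = 0.12    # estimated standard error of theta_hat
df = 28            # degrees of freedom
alpha = 0.05       # gives a 100*(1 - alpha)% = 95% CI

# t value cutting off an area of alpha/2 in each tail of the t distribution
t_crit = stats.t.ppf(1 - alpha / 2, df)

lower = theta_hat - t_crit * se_theta
upper = theta_hat + t_crit * se_theta
print(f"95% CI for theta: ({lower:.3f}, {upper:.3f})")
```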

Under the null hypothesis and the stated assumptions, the sampling distribution of θ̂ will be approximately normal when n is large, with mean θ0 and standard error s_θ̂. To test the null hypothesis, obtain the test statistic:

t = (θ̂ − θ0) / s_θ̂

Choose α to represent the desired Type I error rate and determine the critical value of t that cuts off the appropriate area in the tail(s) of the distribution (α for a one-tailed test, α/2 for a two-tailed test). Call this value t_c. Compare the observed value of the test statistic, t, to the critical value, t_c. If the observed value is more extreme than the critical value, reject H0; this means that H0 is highly unlikely given the observed data. If not, fail to reject H0, meaning that H0 is not highly unlikely given the observed data.
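This decision rule can be sketched in Python as well, again with made-up numbers and scipy assumed:

```python
from scipy import stats

# Hypothetical values for illustration only
theta_hat = 0.45   # observed sample statistic
theta_0 = 0.0      # value of theta under H0
se_theta = 0.12    # estimated standard error of theta_hat
df = 28
alpha = 0.05

t_obs = (theta_hat - theta_0) / se_theta   # test statistic
t_c = stats.t.ppf(1 - alpha / 2, df)       # two-tailed critical value

# Reject H0 when the observed t is more extreme than the critical value
reject = abs(t_obs) > t_c
print(f"t = {t_obs:.2f}, t_c = {t_c:.2f}, reject H0: {reject}")
```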

Confidence intervals vs. hypothesis tests: the CI implies the result of the corresponding hypothesis test; if the CI does not contain θ0, then H0 will be rejected. The width of the CI provides information about the precision of the point estimate, which is not provided by a hypothesis test. CIs therefore provide all the information that hypothesis tests provide, and more.

Confidence intervals and hypothesis tests for specific parameters in correlation and SLR analyses: we apply the general framework just described to make inferences about the parameters in correlation and regression.

Inferences about the regression coefficient in SLR:
Population parameter: β1
Sample statistic: b1
Standard error: s_b1 = (sd_Y / sd_X) · sqrt((1 − r²) / (n − 2))
Confidence interval: b1 − t_α/2 · s_b1  ≤  β1  ≤  b1 + t_α/2 · s_b1

Hypothesis test: specify the null and alternative hypotheses about β1. Most common and interesting:

H0: β1 = 0
H1: β1 ≠ 0

This H0 implies no effect of X on Y, i.e., no linear association. Compute the test statistic:

t = b1 / s_b1

Determine t_c, where df = n − 2, and make a decision about H0. Interpret with respect to H0: the likelihood that the relationship in the population is zero.

Inferences about the regression intercept in SLR:
Population parameter: β0
Sample statistic: b0
Standard error: s_b0 = s_(Y−Ŷ) · sqrt(1/n + X̄² / ((n − 1) · sd_X²))

where s_(Y−Ŷ) = sqrt(Σ(Y − Ŷ)² / (n − 2)) is the standard error of estimate.
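The slope formulas above can be sketched on a small made-up sample; as a check, the hand-computed slope and standard error are compared against scipy.stats.linregress, which fits the same least-squares line:

```python
import numpy as np
from scipy import stats

# Small made-up sample for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)

b1 = r * sd_y / sd_x                                   # sample slope
se_b1 = (sd_y / sd_x) * np.sqrt((1 - r**2) / (n - 2))  # standard error of b1

t_obs = b1 / se_b1                                     # test of H0: beta1 = 0
t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)        # 95% CI for beta1
print(f"b1 = {b1:.3f}, t = {t_obs:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```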

Confidence interval: b0 − t_α/2 · s_b0  ≤  β0  ≤  b0 + t_α/2 · s_b0

Hypothesis test: specify the null and alternative hypotheses about β0. Most common:

H0: β0 = 0
H1: β0 ≠ 0

This H0 implies that in the population, when X = 0, the predicted value of Y is zero. Compute the test statistic:

t = b0 / s_b0

Determine t_c, where df = n − 2, and make a decision about H0. Interpret with respect to H0: the likelihood that the intercept in the population is zero.

Inferences about the Pearson correlation coefficient:
Population parameter: ρ
Sample statistic: r
The general approach described above must be modified slightly because the sampling distribution
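The intercept formulas can be sketched the same way on a made-up sample; the result is checked against scipy.stats.linregress (whose intercept_stderr attribute assumes scipy >= 1.7):

```python
import numpy as np
from scipy import stats

# Small made-up sample for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
sd_x, sd_y = x.std(ddof=1), y.std(ddof=1)
b1 = r * sd_y / sd_x
b0 = y.mean() - b1 * x.mean()               # sample intercept

# Standard error of estimate: sqrt(sum of squared residuals / (n - 2))
resid = y - (b0 + b1 * x)
s_yx = np.sqrt(np.sum(resid**2) / (n - 2))

se_b0 = s_yx * np.sqrt(1/n + x.mean()**2 / ((n - 1) * sd_x**2))
t_obs = b0 / se_b0                          # test of H0: beta0 = 0
print(f"b0 = {b0:.3f}, SE = {se_b0:.3f}, t = {t_obs:.2f}")
```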

of r is not normal, but skewed, with skewness increasing as ρ increases. To overcome this problem, Fisher developed a transformation of r into a statistic that more closely follows a normal distribution. This is known as Fisher's r-to-z transformation; to reduce confusion, this z will be designated z′:

z′ = ½ [ln(1 + r) − ln(1 − r)]

Values of z′ corresponding to any given value of r are routinely provided in tables (see Table in text). The standard error of Fisher's z′ is given by:

s_z′ = 1 / sqrt(n − 3)

Confidence interval for ρ: first establish a CI on the z′ scale according to:

z′ − t_α/2 · s_z′  ≤  z′_ρ  ≤  z′ + t_α/2 · s_z′

where z′_ρ denotes the Fisher transform of ρ. Then transform the confidence limits back to correlations by using the table.

Hypothesis test for the correlation coefficient: specify the null and alternative hypotheses.

H0: ρ = ρ0
H1: ρ ≠ ρ0

Convert the observed and hypothesized values of the correlation into Fisher's z′ and conduct the test by conventional methods:
Convert r into z′_s, the sample value of z′.
Convert ρ0 into z′_0, the hypothesized value of z′.
Compute the test statistic:

t = (z′_s − z′_0) / s_z′

Compare the observed t to the critical t_c and make a decision about H0. Interpret the result in terms of the likelihood that the population correlation is equal to the hypothesized value.

Note that the most common null hypothesis about ρ is H0: ρ = 0, i.e., that there is no linear relationship between X and Y. In this case the test statistic reduces to

t = z′_s / s_z′
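A sketch of the Fisher r-to-z procedure, with a made-up r and n. Two assumptions worth flagging: a standard normal critical value is used here in place of the tabled t value the notes describe, and np.tanh inverts the transformation in place of the table lookup:

```python
import numpy as np
from scipy import stats

# Hypothetical sample correlation and sample size
r, n = 0.52, 30
rho_0 = 0.0              # hypothesized population correlation under H0
alpha = 0.05

# Fisher's r-to-z transformation (equivalently, np.arctanh)
z_s = 0.5 * (np.log(1 + r) - np.log(1 - r))
z_0 = 0.5 * (np.log(1 + rho_0) - np.log(1 - rho_0))
se_z = 1 / np.sqrt(n - 3)

stat = (z_s - z_0) / se_z                   # test statistic

# CI on the z' scale, then back-transformed to the correlation scale
crit = stats.norm.ppf(1 - alpha / 2)        # normal critical value (assumption)
lo, hi = z_s - crit * se_z, z_s + crit * se_z
ci_rho = (np.tanh(lo), np.tanh(hi))         # tanh inverts the r-to-z transform
print(f"z' = {z_s:.3f}, test statistic = {stat:.2f}, "
      f"95% CI for rho = ({ci_rho[0]:.3f}, {ci_rho[1]:.3f})")
```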

Testing other hypotheses about correlation coefficients. It is common to wish to test a variety of other hypotheses about correlations, or to set up other types of confidence intervals. Other common situations include:
Making inferences about the difference between two (or more) independent correlations; that is, values of r_XY obtained from two (or more) different samples.
Making inferences about the difference between two dependent correlations; that is, values of r_XY and r_XW obtained from the same sample.
Testing the null hypothesis that a set of k measured variables are all uncorrelated with each other in the population.

It is possible to perform virtually any test about patterns of correlation coefficients. For a general reference and software, see: Steiger, J. H. (2003). Comparing correlations. In A. Maydeu-Olivares (Ed.), Psychometrics: A festschrift to Roderick P. McDonald. Mahwah, NJ: Lawrence Erlbaum Associates, in press. (Available at Steiger's website: http://www.statpower.net/)

Inferences about predicted scores on Y. The SLR approach provides a regression equation of the form Ŷ = b0 + b1·X that can be used to produce a predicted value of Y given any value of X. Any such predicted score is likely to be in error, i.e., different from the actual value of Y for the individual. We can assess the likely degree of error by establishing a confidence interval around a predicted Y, and defining our confidence that the given interval will contain the observed Y.

Under the regression assumptions, for any given X, predicted scores are normally distributed. Their standard error can be shown to be:

s_Ŷi = s_(Y−Ŷ) · sqrt(1 + 1/n + (X_i − X̄)² / ((n − 1) · sd_X²))

where s_(Y−Ŷ) is the standard error of estimate. A very important feature of this expression is that the standard error depends on the deviation of X_i from the mean of X. In other words, predictions of Y become less precise for individuals farther from the mean of X.

Given this standard error, a CI for a particular predicted score can then be obtained by:

Ŷ_i − t_α/2 · s_Ŷi  ≤  Y_i  ≤  Ŷ_i + t_α/2 · s_Ŷi

These confidence intervals can be obtained from most regression software for each individual in the sample, or they can be computed easily for new individuals for whom predictions are made.

Experience with these CIs shows that they are typically quite wide, indicating that individual predictions are often subject to much error. These CIs tend to become wider as:
An individual deviates further from the mean of X.
Sample size becomes smaller.
The correlation between X and Y becomes smaller.
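These intervals, and the way they widen away from the mean of X, can be sketched on a made-up sample (the data and the 4-unit offset below are invented purely for illustration):

```python
import numpy as np
from scipy import stats

# Small made-up sample for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])
n = len(x)

b1 = np.corrcoef(x, y)[0, 1] * y.std(ddof=1) / x.std(ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s_yx = np.sqrt(np.sum(resid**2) / (n - 2))   # standard error of estimate
t_crit = stats.t.ppf(0.975, df=n - 2)

def interval(x_i):
    """95% interval for an individual's Y score at a given X."""
    y_hat = b0 + b1 * x_i
    se = s_yx * np.sqrt(1 + 1/n + (x_i - x.mean())**2 / ((n - 1) * x.var(ddof=1)))
    return y_hat - t_crit * se, y_hat + t_crit * se

# Intervals widen as x_i moves away from the mean of X
near = interval(x.mean())
far = interval(x.mean() + 4.0)
print(f"width at the mean of X: {near[1] - near[0]:.3f}, "
      f"width 4 units above the mean: {far[1] - far[0]:.3f}")
```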