t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression

Similar documents
MALLOY PSYCH 3000 Hypothesis Testing PAGE 1. HYPOTHESIS TESTING Psychology 3000 Tom Malloy

t test for independent means

determine whether or not this relationship is.

AMS 7 Correlation and Regression Lecture 8

where Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc.

MITOCW ocw f99-lec01_300k

MITOCW ocw f99-lec09_300k

Black White Total Observed Expected χ 2 = (f observed f expected ) 2 f expected (83 126) 2 ( )2 126

appstats27.notebook April 06, 2017

MITOCW ocw f99-lec30_300k

Mathematical Notation Math Introduction to Applied Statistics

PROFESSOR: WELCOME BACK TO THE LAST LECTURE OF THE SEMESTER. PLANNING TO DO TODAY WAS FINISH THE BOOK. FINISH SECTION 6.5

Note: Please use the actual date you accessed this material in your citation.

Can you tell the relationship between students SAT scores and their college grades?

MITOCW ocw f99-lec17_300k

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

MITOCW ocw f99-lec05_300k

Psych 230. Psychological Measurement and Statistics

Final Exam - Solutions

Simple Linear Regression

Biostatistics: Correlations

Chapter 27 Summary Inferences for Regression

MITOCW ocw f99-lec16_300k

Business Statistics. Lecture 9: Simple Regression

Statistics and Quantitative Analysis U4320

The problem of base rates

Physics 509: Non-Parametric Statistics and Correlation Testing

Do students sleep the recommended 8 hours a night on average?

AP Statistics. Chapter 6 Scatterplots, Association, and Correlation

REVIEW 8/2/2017 陈芳华东师大英语系

Linear Regression and Correlation

Overview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation

Business Statistics 41000: Homework # 5

LECTURE 15: SIMPLE LINEAR REGRESSION I

Warm-up Using the given data Create a scatterplot Find the regression line

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

Chapter 3. Introduction to Linear Correlation and Regression Part 3

Instructor (Brad Osgood)

COMP6053 lecture: Sampling and the central limit theorem. Jason Noble,

MITOCW R11. Double Pendulum System

Experimental Design and Graphical Analysis of Data

Elementary Statistics Triola, Elementary Statistics 11/e Unit 17 The Basics of Hypotheses Testing

11 Correlation and Regression

Advanced Experimental Design

Midterm 2 - Solutions

PSY 305. Module 3. Page Title. Introduction to Hypothesis Testing Z-tests. Five steps in hypothesis testing

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

Harvard University. Rigorous Research in Engineering Education

Sampling Distributions: Central Limit Theorem

Review. Midterm Exam. Midterm Review. May 6th, 2015 AMS-UCSC. Spring Session 1 (Midterm Review) AMS-5 May 6th, / 24

Lecture (chapter 13): Association between variables measured at the interval-ratio level

MITOCW MIT18_01SCF10Rec_24_300k

Unit 6 - Introduction to linear regression

Math 10 - Compilation of Sample Exam Questions + Answers

Business Statistics. Lecture 10: Correlation and Linear Regression

Big Data Analysis with Apache Spark UC#BERKELEY

But, there is always a certain amount of mystery that hangs around it. People scratch their heads and can't figure

MITOCW watch?v=pqkyqu11eta

Lesson 6: Algebra. Chapter 2, Video 1: "Variables"

COMP6053 lecture: Sampling and the central limit theorem. Markus Brede,

Correlation and regression

Probability and Statistics

Algebra II Notes Quadratic Functions Unit Applying Quadratic Functions. Math Background

WSMA Algebra - Expressions Lesson 14

Chapter 3: Examining Relationships

Module 03 Lecture 14 Inferential Statistics ANOVA and TOI

ONE FACTOR COMPLETELY RANDOMIZED ANOVA

Sampling Distributions

Linear Regression. Linear Regression. Linear Regression. Did You Mean Association Or Correlation?

Math 111, Introduction to the Calculus, Fall 2011 Midterm I Practice Exam 1 Solutions

Chapter 12 : Linear Correlation and Linear Regression

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras. Lecture 11 t- Tests

Multiple Regression Theory 2006 Samuel L. Baker

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

MITOCW watch?v=ed_xr1bzuqs

Area1 Scaled Score (NAPLEX) .535 ** **.000 N. Sig. (2-tailed)

Ch. 16: Correlation and Regression

Chapter 8. Linear Regression /71

1 Correlation and Inference from Regression

ABE Math Review Package

Dialog on Simple Derivatives

Chapter 6 The Standard Deviation as a Ruler and the Normal Model

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector

Violating the normal distribution assumption. So what do you do if the data are not normal and you still need to perform a test?

Introduction to Statistics for the Social Sciences Review for Exam 4 Homework Assignment 27

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

Last few slides from last time

The First Derivative Test

Algebra 2 C Midterm Exam Review Topics Exam Date: A2: Wednesday, January 21 st A4: Friday, January 23 rd

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations

Inferences for Regression

Lesson 7: Slopes and Functions: Speed and Velocity

MATH 1150 Chapter 2 Notation and Terminology

( )( b + c) = ab + ac, but it can also be ( )( a) = ba + ca. Let s use the distributive property on a couple of

Chapter 11. Correlation and Regression

MITOCW 2. Harmonic Oscillators with Damping

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Wed, June 26, (Lecture 8-2). Nonlinearity. Significance test for correlation R-squared, SSE, and SST. Correlation in SPSS.

Chapter 8. Linear Regression. The Linear Model. Fat Versus Protein: An Example. The Linear Model (cont.) Residuals

Transcription:

t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression Recall, back some time ago, we used a descriptive statistic which allowed us to draw the best fit line through a scatter plot. We called this linear regression. In regression every participant is measured on two variables, like height and weight, or GPA in college and ACT score at the end of high school. After you measure two variables of interest to you, you draw a scatterplot which represents each research participant as a point on the graph. Then you can use some rather complicated formulas, especially if you include calculations for the Pearson r, to calculate the regression line. Recall that the regression line formula is the predicted score Y', was equal to a plus bx, (Y' = a + bx) where a is the intercept and b is the slope of the regression line. So t for b, then, is a t test to determine if the slope of the regression line, b, is significantly greater than zero. Example The example which we will use is an example which you may have already worked with on your practice homework for regression. In that example a teacher thought she could predict final exam scores from midterm exam scores. She also thought there would be a postive relationship between midterm and final scores so that the slope of the regression line should be positive. She had ten students who took a midterm and final. She used the regression formulas to predict the final scores from the midterm scores. In other words, she calculated a regression line between those two variables, X and Y. Let's call the Final Scores Y and the Midterm Scores X. 1 of 9 07/12/2000 4:39 PM

Scatterplot Here is how the data looks. This is actually just a screen capture from StatCenter's Analysis Tool which you've used several times by now. This tool has a graphing function and it can draw the scatter plot of appropriate data and then draw the regression line on the graph so that you don't have to do that. All I did was use the data that is already available in the Analysis Tool for the practice regression homework. Then I used Analysis Tool to graph it. Each dot is an individual participant with final exam scores on the vertical axis and their midterm exam scores on the horizontal. The best fit regression line is the red line shown here and it was calculated by the formulas that we've already considered in the lecture on regression. The Question. In this lecture we are asking: Is b, the slope of that regression line, significantly different from zero or not? 2 of 9 07/12/2000 4:39 PM

b is the Slope of a Line First, recall that b is the slope of the regression line. Second recall that if b equals zero then there is no slope. If b = 0 then you would have a perfectly horizontal line and this indicates no relationship between the two variables X and Y. Think about it. If the line were horizontal, then no matter what value of X you put into the equation, you would get the same value of Y as a prediction. So in the midterm and final scores case, if the red line was horizontal (obviously it's not horizontal) and it ran straight across the graph, that would indicate there was no relationship between midterm and final exam scores. Such an outcome would be quite counterintuitive from our knowledge of having taken a lot of exams. This t test then, is for determining if the slope, b, is significantly different from zero. Note: As a side note, t for b is completely redundant with t for r. If you took the exact same data that you have here for midterms and finals and calculated little r, the correlation coefficient, and then did t for r, you would get th same result as if you did the t for b. The calculated t value for the two tests would be identical, except for rounding error. 3 of 9 07/12/2000 4:39 PM

Directional Scientific Hypothesis The scientific hypothesis is that there is a predictable positive relationship between midterm scores and final exam scores. This would be a pretty common sense scientific hypothesis. This particular hypothesis is directional. We are saying there is a positive relationship and that we expect a positive slope. Another way of saying the same thing is that we expect b to be greater than zero. Remember that, in general, b could be less than zero. In that case the line would slope in the opposite direction; it would have high scores on the midterm predict low scores on the final. That would be a negative relationship. In this example though we are expecting a positive relationship between midterm and final exams. That is, we're expecting people with higher scores on the midterm exam will have higher final exam scores as well. Null and Alternative Hypotheses Because the scientific hypothesis is directional H1 leads to a one-tailed test. The skeptic would think that there's no relationship between mid term and final scores. Therefore the skeptic would expect that the slope of the regression line would be zero. So H0 is that we expect b to be equal to zero. The scientist as we have said is expecting a postive slope. The alternative hypothesis, H1, is that we expect a positive slope, the expected value of b is greater than zero. If the sketpic were presented with the scatterplot and regression line we looked at above, the skeptic would say that the scatterplot is completely random, it happened by chance alone, it is like shotgun pellets hitting the barn door. By luck alone there appears to be a positive relationship between midterms scores and final scores. So we do a t test for the significance of b to evaluate the PCH of Chance. 4 of 9 07/12/2000 4:39 PM

Expected Value of t On the bottom of the graphic you can see part of the formula for the t test. It is small and you don't even have to write it out yet, but the main idea I want you to comprehend here is that this t is like all other t's should be zero if H is true. The top of the t for b formula is going to be b times some standard deviation times some square root. It doesn't matter what the actual values will be, because if b equals 0, then the whole top must be equal to 0 (because anything times zero is zero). And so the whole value of t mus be zero because anything divided into zero must be zero. The null hypothesis predicts a value of zero for t. The t for b Formula 5 of 9 07/12/2000 4:39 PM

Here on the above graphics are the data and the entire formula for the t test. Also shown is the output of Analysis Tool from StatCenter. The Analysis Tool outputs are summarized below the data on the second graphic. The main thing is to understand X is the midterm and Y is the final and that we have already done the summary statistics. As you write the t formula and the summary statistics down in your notes, please substitute the summary statistics into the t formula. Just make sure you can do that now because you will have to on homework and exams. Calculated Value of t for b Here's my substitution which you can use to check your work. Again, my recommendation is to substitute first and then look at my numbers. Once you have substituted the correct values into the formula, then it is just a matter of arithmetic. In this case, I skipped all the steps in the arithmetic. The calculated t is equal to 5.6483. Statistical Conclusion Validity 6 of 9 07/12/2000 4:39 PM

We choose alpha equal to.05. The degrees of freedom are equal to N minus 2. Therefore if we had 10 participan minus 2, the degrees of freedom are equal to 8. Is it one or two-tailed test? One-tailed. The t-test table tells us tha for a one-tailed test with 8 degrees of freedom and alpha.05, t critical is 1.860. Sampling Distribution of t The above graphics show our sampling distribution of t. Notice that t is centered at zero, which is what H0 is expecting. If b is zero, t ought to be zero. However due to chance, or some random error in our data, we wouldn't say t has to be exactly zero, but rather it should be near zero. Look at the sampling distribution shown here. If H0 is true, the highest probabilities are for the values of t are very close to zero. The big bulge in the probability curve is right above zero and near zero. Out in the tails of the distribution the values of t are so far from zero that they have very smal probabilities. Using our familiar logic then, we're going to place our critical value on the t number line far from zero and in the direction the scientist predicted. You can see that the value 1.86 is far out on the upper tail of the distribution and there's only a.05 probability of being at 1.86 or above. If H0 is true, the probability of t being 1.86 or larger, is quite small, about a 1 in 20 chance. Continuing with the same logic, we put our calculated value of t = 5.6483 on the number line. The calculated value of t is out in the reject H0 region, so we reject the null hypothesis. 7 of 9 07/12/2000 4:39 PM

The probability that H0 is True Just to repeat (because it's important to understand the rationale underlying this), if H0 is true and t ought to be around zero, and if we've got just chance data, the probability we would get such chance data that would lead to a t this far from zero, is less than 1 in 20, or.05 or 5 %. Alpha is always the probability that we are wrong when we reject H0. Plausibility of Chance Back to the realm of science. Since H0 has been tested in the realm of statistics and shown to be very improbable, (less than 1 chance in 20) then it seems that in the realm of science chance is pretty improbable, in fact chance is improbable enough to make it an implausible competing hypothesis. 8 of 9 07/12/2000 4:39 PM

The 4-Step Process By now you should be familiar with the process that we follow in hypothesis testing. However, I will still round our our discussion of the t for b as a way of reviewing the big concepts involved. In step 1 we assume that the null hypothesis is true. Recall that the null hypothesis for this test is that b = 0 therefore our calculated t will be equal to 0. We also assume that the data we collected is a normal population. In step 2, we construct a random sample by collecting data on N number of participants. This random sampling helps assure that the data will represent the entire population. Step 3 - We define the statistical formula for t. Here I have shown the general formula for t but it is analogous to the t for b formula. In Step 4, we determine the sampling distribution of t. That means we can determine the probability of obtaining the value of t we calculated. Is it a highly likely or highly unlikely value if H0 is assumed to be true. Base on this probability we determine if we can rule out chance as a plausible competing hypothesis. 9 of 9 07/12/2000 4:39 PM