Statistical Foundations:

Statistical Foundations: t-distributions, t-tests. Psychology 790 Lecture #12 10/03/2006

Today's Class The t-distribution in its full glory. Why we use it for nearly everything. Confidence intervals with t. How to conduct one-sample hypothesis tests with t.

Upcoming Schedule Tuesday 10/3: t and two-sample tests. Thursday 10/5: Midterm review. Tuesday 10/10: Midterm (20 item multiple choice). Thursday 10/12: Fall break. Tuesday 10/17: Q and A about the Midterm. Thursday 10/19: No class. Tuesday 10/24: Linear regression with one predictor (Kutner Ch. 1).

The t-distribution

t Overview In all of our examples up to this point, we have always known the population variance. In practice, we will almost never know this quantity. In these cases, we use the t-distribution as the approximate distribution of many of our sampling distributions. What is convenient is that the t-distribution converges to a normal distribution as N goes to infinity (or, in practice, once the sample is large).

t Distribution History The t-distribution was first published in the March 1908 issue of Biometrika. The author was listed simply as "Student". In the early 20th century, work done by employees was owned by their company, so pseudonyms were used for publishing. This is similar to bloggers in our day. Note: Wikipedia has a nice section about the t-distribution, on which some of this information is based.

More History Eventually, Student was revealed to be William Gosset. In 1908, Gosset worked for the Guinness brewery in Dublin, Ireland. He needed statistical methods to develop cheap quality control methods for the products produced at the brewery. This meant he had small samples of beer to work with.

t Distribution Note the title of Gosset's paper: The Probable Error of a Mean. Imagine a sample of variables $X_1, X_2, \ldots, X_N$ that are normally distributed (that is, iid from $N(\mu, \sigma)$). The t-distribution is a distribution for the deviation of a sample mean from its population value: $T = \frac{\bar{X} - \mu}{S / \sqrt{N}}$ The value T is said to be from a t-distribution with N-1 degrees of freedom.

t Versus Z The quantity T looks eerily similar to what we found in previous classes, but notice the key difference: $T = \frac{\bar{X} - \mu}{S / \sqrt{N}}$ versus $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}}$ Now we only have an estimate of the population variance. Before, we knew what our population standard deviation was.

What Does Substitution of S for σ Lead To? It may look very simple, but the substitution of S for σ leads to a slightly different distribution for T. This distribution is based on the sampling distribution of S. Recall from previous classes that we said that $S^2$ has a scaled $\chi^2$ distribution. The t-distribution comes from dividing a normally distributed random variable by the square root of an independent scaled $\chi^2$ variable.
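To make the link between T, the normal distribution, and the $\chi^2$ distribution explicit, the statistic can be rewritten as below. This is a sketch of the standard argument for iid normal data; the symbols Z and V are introduced here for illustration and do not appear on the original slides.

```latex
\[
T \;=\; \frac{\bar{X} - \mu}{S / \sqrt{N}}
  \;=\; \frac{(\bar{X} - \mu) / (\sigma / \sqrt{N})}{\sqrt{S^2 / \sigma^2}}
  \;=\; \frac{Z}{\sqrt{V / (N - 1)}},
\qquad
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}} \sim N(0, 1),
\quad
V = \frac{(N - 1)\, S^2}{\sigma^2} \sim \chi^2_{N-1}.
\]
```

For normally distributed samples, Z and V are independent, which is what gives T a t-distribution with N-1 degrees of freedom.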

Distributional Parameters Recall that the Normal distribution had two parameters that characterized its shape: μ and σ. The t-distribution only has one parameter: the number of degrees of freedom. In our case, that would be N-1.

Shape of T. Because T is a standardized score, we know that the mean of the t-distribution will be zero. The variance of the t-distribution is a function of the degrees of freedom: df/(df-2) for df > 2. As the degrees of freedom approach infinity, the t-distribution converges to a standard normal distribution.
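The convergence to the standard normal can be checked numerically. The sketch below is not from the lecture: it uses only the Python standard library, and the function names (`t_pdf`, `normal_pdf`, `max_gap`) and the grid on [-6, 6] are illustrative choices. It reports the largest gap between the two density curves for a few degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def max_gap(df, lo=-6.0, hi=6.0, steps=1200):
    """Largest absolute difference between the two densities on a grid."""
    xs = (lo + i * (hi - lo) / steps for i in range(steps + 1))
    return max(abs(t_pdf(x, df) - normal_pdf(x)) for x in xs)

# Hypothetical choice of degrees of freedom; 19 matches the N = 20 example.
gaps = {df: max_gap(df) for df in (2, 19, 100, 1000)}
for df, gap in gaps.items():
    print(f"df = {df:4d}: max |t pdf - normal pdf| = {gap:.5f}")
```

The gap should shrink steadily as the degrees of freedom grow, which is the convergence the slide describes.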

So, What Does This Mean To You? All of these points lead to one big thing to take from this lecture: If you do not know the population variance, you should be using the t-distribution for: Confidence intervals around the mean. One-sample tests for the hypothesis of the mean. Two-sample tests comparing group means. We will talk about each of these in the remainder of this lecture.

An Old Example

Beating a Dead Horse Recall from last week we talked a bit about the Wechsler Adult Intelligence Scale. In the general population, the test has an average of 100 and a standard deviation of 15. Let's try a hypothesis test to see if KU students have a similar mean WAIS score. Before, we sampled 100 KU students at random and administered the WAIS. Now, let's imagine that because the WAIS costs $25 plus $10 in shipping on eBay, we can only afford to sample 20 KU students.

Buy the WAIS on ebay!!

Confidence Intervals

Confidence Intervals from Samples Now, we have 20 students to work with. Let's construct a confidence interval using the estimated standard error of the mean and the t-distribution. For fun, let's use a 95% CI. Our sample:

Sample Statistics $\bar{x} = 105.094$, $s = 14.697$, $N = 20$

CI Formula The limits of a (1-α)*100% CI for a population mean are given by: $\bar{x} \pm t_{(N-1),\alpha} \frac{s}{\sqrt{N}}$ Here, the notation $t_{(N-1),\alpha}$ refers to the value from a t-distribution with N-1 degrees of freedom that leaves a total proportion α of the distribution in the two tails (α/2 in each tail).

Obtaining t-values Just as with the Z-tests we ran previously, the t-values can be found: From tables in the back of the book (p. 1013). The table for 2Q = 0.05 and 19 DF gives $t_{19,0.05} = 2.093$. From Excel (using the =tinv function). Typing in =tinv(0.05,19) returns $t_{19,0.05} = 2.093$. Because not all degrees of freedom are listed in the table, I encourage you to use Excel to find the nearly exact t-value when the table does not list your degrees of freedom.
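For readers without Excel or the table at hand, the same critical value can be approximated numerically. The following is a minimal sketch, not the lecture's method: it uses only the Python standard library, integrates the t density with Simpson's rule, and finds the cutoff by bisection. The function names and the search bracket [0, 100] are assumptions made for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical_two_tailed(alpha, df):
    """Positive cutoff with total two-tail area alpha, like =tinv(alpha, df)."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0  # assumed bracket; wide enough for common alpha and df
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_crit = t_critical_two_tailed(0.05, 19)
print(f"t critical value for alpha = 0.05, df = 19: {t_crit:.3f}")
```

The printed value should agree with the table and Excel value of 2.093 to three decimals. In practice a statistics library would be the usual choice; the hand-rolled integration is only there to keep the sketch dependency-free.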

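Our CI Value is $\bar{x} \pm t_{(N-1),\alpha} \frac{s}{\sqrt{N}} = 105.094 \pm 2.093 \times \frac{14.697}{\sqrt{20}} = 105.094 \pm 6.878$, which gives $(98.216, 111.972)$. This interval means that: $P(98.216 \le \mu \le 111.972) = 0.95$, in the sense that intervals constructed this way cover μ in 95% of repeated samples.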
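The arithmetic behind this interval can be reproduced in a few lines. This is a small sketch using the sample statistics and the critical value 2.093 quoted in the lecture; the variable names are illustrative.

```python
import math

# Sample statistics from the lecture's KU example.
x_bar, s, n = 105.094, 14.697, 20
t_crit = 2.093  # table / =tinv(0.05, 19) value quoted in the lecture

std_err = s / math.sqrt(n)   # estimated standard error of the mean
margin = t_crit * std_err    # half-width of the 95% CI
lower, upper = x_bar - margin, x_bar + margin

print(f"standard error = {std_err:.3f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```

Note that the population mean of 100 lies inside the resulting interval, which anticipates the outcome of the one-sample test that follows.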

One-Sample Hypothesis Tests

Hypothesis Tests with the T As we did previously, we can run a one-sample hypothesis test to determine if KU students have the same mean WAIS score as the population. Previously, we knew what the population variance was, so we could use the Z test. Now, let's imagine that we have no idea what the population variance is, but: Our sample comes from a normally distributed population. We substitute our sample variance for the population variance.

Example Setup What is the null hypothesis? $H_0: \mu_{KU} = 100$ What is the alternative hypothesis? $H_A: \mu_{KU} \neq 100$

Distributional Setup The key element in our example is to find the assumed distribution of the test statistic under $H_0$: $T = \frac{\bar{X} - \mu}{S / \sqrt{N}}$ In our case, we will be sampling 20 subjects and creating a t-statistic. For our t-statistic we have the assumption that the sample of scores is from a population that is normally distributed.

Distribution of Test Statistic Under Null Hypothesis The plot on the original slide, drawn in R, shows the distribution of the test statistic (T) under the null hypothesis: a t-distribution with 19 degrees of freedom. Similar to our Z test, we need to find critical values where we would reject $H_0$.

Step 1: Set the Type I Error Rate Before we collect our sample, we must first set the Type I error rate for our experiment. Recall the Type I error rate (or α) is the maximum probability we will allow for rejecting the null hypothesis when the null hypothesis is true. This sets up the decision rule for our test. From this, we can obtain a critical value to which we can compare our test statistic. What rate do you want to set? Let's set α = 0.05, for tradition's sake.

Decision Rule Using α = 0.05, we can then assign a region of our null distribution where we will reject the null hypothesis. Because we have no idea which direction KU's sample mean will fall, we will split our region into two halves: an upper tail and a lower tail. We then want to find the following points: the upper T such that $P(t \ge T) = \alpha/2 = 0.025$, and the lower T such that $P(t \le T) = \alpha/2 = 0.025$. Find these two points. We use =tinv(0.05,19) and come up with 2.093 and -2.093.

The Test Now, we need to form the test statistic and compare it with the critical value. $T = \frac{\bar{X} - \mu}{S / \sqrt{N}} = \frac{105.094 - 100}{14.697 / \sqrt{20}} = 1.55$ Because 1.55 is less than our critical value of 2.093, we fail to reject $H_0$. Alternatively, we can find the p-value for the test statistic. Here we can use the Excel function =tdist(1.55,19,2)
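The test statistic and a two-tailed p-value can also be computed outside Excel. The sketch below is one possible approach, not the lecture's: it relies only on the Python standard library and approximates the tail area with Simpson's rule, so the helper names and step count are assumptions of this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|): one minus the central area, via Simpson's rule."""
    a = abs(t_stat)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (total * h / 3)
    return 1 - central_area

# Sample statistics and null value from the lecture's KU example.
x_bar, mu_0, s, n = 105.094, 100.0, 14.697, 20
df = n - 1
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = two_tailed_p(t_stat, df)

t_crit = 2.093  # two-tailed critical value for alpha = 0.05, df = 19
reject = abs(t_stat) > t_crit

print(f"T = {t_stat:.2f}, df = {df}, two-tailed p = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

Both routes lead to the same decision: the statistic falls short of the critical value, and the p-value is larger than α = 0.05.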

Two-Sample Hypothesis Tests

Two Sample Tests We can also use the t-test to test a hypothesis comparing two populations. Imagine, for instance, we wanted to test whether or not KU and K-State students have the same level of intelligence (as measured by the WAIS scores). Our hypothesis test would then be: $H_0: \mu_{KU} = \mu_{K\text{-}State}$ $H_A: \mu_{KU} \neq \mu_{K\text{-}State}$

Our Samples We take another sample of 20 KU students, and we take a sample of 20 K-State students.

Our Sample Statistics
KU: $\bar{x} = 110.479$, $s = 14.350$, $N = 20$
K-State: $\bar{x} = 65.736$, $s = 13.897$, $N = 20$

Our Test Statistic The two-sample t-test takes the difference between two means, assuming: Both samples come from normal distributions. Both distributions have the same variance. The test statistic is distributed as t with $N_1 + N_2 - 2$ degrees of freedom. $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$

Our Test $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}} = \frac{110.479 - 65.736}{\sqrt{\frac{14.350^2}{20} + \frac{13.897^2}{20}}} = 10.02$ Using =tdist(10.02,38,2), we find that our p-value is 0.00000000000323, so we will reject $H_0$.
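The two-sample statistic can be reproduced as follows. This is a sketch with illustrative function names, using the summary statistics from the slides. It also computes the pooled-variance form of the statistic, which is not shown on the slides but is the usual form under the equal-variance assumption; with equal group sizes the two forms coincide.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """T as written on the slide, with N1 + N2 - 2 degrees of freedom."""
    std_err = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / std_err, n1 + n2 - 2

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance T (an addition here, not taken from the slides)."""
    pooled_var = (((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                  / (n1 + n2 - 2))
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err

ku = (110.479, 14.350, 20)       # mean, standard deviation, N
k_state = (65.736, 13.897, 20)

t_stat, df = two_sample_t(*ku, *k_state)
t_pooled = pooled_t(*ku, *k_state)

print(f"T = {t_stat:.2f} with {df} degrees of freedom")
print(f"pooled-variance T = {t_pooled:.2f}")
```

A statistic this far beyond the two-tailed critical value for 38 degrees of freedom is consistent with the very small p-value reported by =tdist.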

Wrapping Up Today we did confidence intervals and t-tests. Confidence intervals and hypothesis tests are mostly interchangeable. Once we know about a sampling distribution, we can form a CI for the true parameter. We will encounter these as we go through linear models.

Next Time A review before the mid-term.