Statistical Foundations: t-distributions and t-tests. Psychology 790, Lecture #12, 10/03/2006

Today's Class The t-distribution in its full glory. Why we use it for nearly everything. Confidence intervals with t. How to conduct one-sample hypothesis tests with t.

Upcoming Schedule Tuesday 10/3: t and two-sample tests. Thursday 10/5: Midterm review. Tuesday 10/10: Midterm (20-item multiple choice). Thursday 10/12: Fall break. Tuesday 10/17: Q and A about the Midterm. Thursday 10/19: No class. Tuesday 10/24: Linear regression with one predictor (Kutner Ch. 1).

The t-distribution

Overview In all of our examples up to this point, we have always known the population variance. In practice, we will almost never know this quantity. In these cases, we use the t-distribution as the (approximate) sampling distribution of many of our statistics. Conveniently, the t-distribution converges to a normal distribution as N goes to infinity, and is practically indistinguishable from it in large samples.

t distribution History The t-distribution was first published in the March 1908 issue of Biometrika. The author was listed simply as Student. In the early 20th century, companies treated their workers' output as company property, so employees published under pseudonyms. This is similar to bloggers in our day. Note: Wikipedia has a nice section about the t-distribution, on which some of this information is based.

More History Eventually, Student was revealed to be William Gosset. In 1908, Gosset worked for the Guinness brewery in Dublin, Ireland. He needed cheap statistical methods for quality control of the products made at the brewery. This meant he had small samples of beer to work with.

t Distribution Note the title of Gosset's paper: The Probable Error of a Mean. Imagine a sample of variables $X_1, X_2, \ldots, X_N$ that are normally distributed (that is, iid from $N(\mu, \sigma)$). The t-distribution is a distribution for the deviation of a sample mean from its population value: $T = \frac{\bar{X} - \mu}{S / \sqrt{N}}$. The value T is said to be from a t distribution with N-1 degrees of freedom.

t Versus Z The quantity T looks eerily similar to what we found in previous classes, but notice the key difference: $T = \frac{\bar{X} - \mu}{S / \sqrt{N}}$ versus $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}}$. Now we only have an estimate of the population variance. Before, we knew what our population standard deviation was.

What Does Substitution of S for σ Lead To? It may look very simple, but the substitution of S for σ leads to a slightly different distribution for T. This distribution is based on the sampling distribution of S. Recall from previous classes that we said that $S^2$ has a scaled $\chi^2$ distribution. The t-distribution comes from dividing a standard normal random variable by the square root of an independent $\chi^2$ variable that has been divided by its degrees of freedom.
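The construction can be written out in one line. This is a sketch under the slide's assumption of iid normal observations; the symbols Z and V are introduced here only for illustration.

```latex
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}} \sim N(0, 1),
\qquad
V = \frac{(N-1)\, S^2}{\sigma^2} \sim \chi^2_{N-1},
\qquad
T = \frac{Z}{\sqrt{V / (N-1)}}
  = \frac{(\bar{X} - \mu) / (\sigma / \sqrt{N})}{S / \sigma}
  = \frac{\bar{X} - \mu}{S / \sqrt{N}} .
```

The unknown σ cancels, which is why T can be computed from the sample alone. For normal samples, Z and V are independent, which is what the t-distribution requires.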

Distributional Parameters Recall that the Normal distribution had two parameters that characterized its shape: μ and σ. The t-distribution has only one parameter: the number of degrees of freedom. In our case, that would be N-1.

Shape of T. Because T is a standardized score, we know that the mean of the t distribution will be zero. The variance of the t distribution is a function of the degrees of freedom: it equals df/(df-2) for df > 2. As the degrees of freedom approach infinity, the t-distribution converges to a standard normal distribution.
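To see the convergence numerically, the short sketch below evaluates the variance of a t distribution, df/(df-2), for a few degrees of freedom. The function name `t_variance` is made up for this illustration.

```python
def t_variance(df: float) -> float:
    """Variance of a t distribution with df degrees of freedom (finite only for df > 2)."""
    if df <= 2:
        raise ValueError("the variance is finite only for df > 2")
    return df / (df - 2)


# The variance shrinks toward 1, the variance of the standard normal.
for df in (3, 5, 19, 100, 1000):
    print(f"df = {df:>4}: variance = {t_variance(df):.4f}")
```

With 19 degrees of freedom, as in the N = 20 example used in this lecture, the variance is still noticeably above 1, which is why the t critical values are wider than the corresponding Z values.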

So, What Does This Mean To You? All of these points lead to one big thing to take from this lecture: If you do not know the population variance, you should be using the t-distribution for: Confidence intervals around the mean. One-sample tests for the hypothesis of the mean. Two-sample tests comparing group means. We will talk about each of these in the remainder of this lecture.

An Old Example

Beating a Dead Horse Recall from last week we talked a bit about the Wechsler Adult Intelligence Scale. In the general population, the test has an average of 100 and a standard deviation of 15. Let's try a hypothesis test to see if KU students have a similar mean WAIS score. Before, we sampled 100 KU students at random and administered the WAIS. Now, let's imagine that because the WAIS costs $25 plus $10 in shipping on eBay, we can only afford to sample 20 KU students.

Buy the WAIS on ebay!!

Confidence Intervals

Confidence Intervals from Samples Now, we have 20 students to work with. Let's construct a sample confidence interval using the estimated sample standard error of the mean and the t-distribution. For fun, let's use a 95% CI. Our sample:

Sample Statistics $\bar{x}$ = 105.094, s = 14.697, N = 20

CI Formula The limits of a (1-α)*100% CI for a population mean are given by: $\bar{x} \pm t_{(N-1),\alpha} \frac{s}{\sqrt{N}}$. Here, the notation $t_{(N-1),\alpha}$ refers to the value from a t distribution with N-1 degrees of freedom that cuts off the two tails, where α is the total proportion of the distribution lying in both tails combined.

Obtaining t-values Just as with the Z-tests we ran previously, the t-values can be found: From tables in the back of the book (p. 1013). The table for 2Q=0.05 and 19 DF gives $t_{19,0.05}$ = 2.093. From Excel (using the =tinv function). Typing in =tinv(0.05,19) returns $t_{19,0.05}$ = 2.093. Because not all degrees of freedom are listed in the table, I encourage you to use Excel to find the nearly exact t-value whenever the table does not list your degrees of freedom.
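For readers without the table or Excel at hand, the same two-tailed value can be approximated in plain Python. This is only a sketch, not how Excel computes it: it integrates the t density with Simpson's rule and then searches for the critical value by bisection. The helper names `t_pdf`, `t_cdf`, and `t_critical` are made up for this illustration.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(t <= x), using Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3  # area between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def t_critical(alpha: float, df: int) -> float:
    """Two-tailed critical value: the t with P(|T| > t) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Same arguments as =tinv(0.05,19): total tail area 0.05, 19 degrees of freedom.
print(round(t_critical(0.05, 19), 3))
```

The printed value should agree with the 2.093 from the table and from =tinv(0.05,19) to three decimals.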

Our CI The interval is: $\bar{x} \pm t_{(N-1),\alpha} \frac{s}{\sqrt{N}} = 105.094 \pm 2.093 \cdot \frac{14.697}{\sqrt{20}} = 105.094 \pm 6.878$, which gives (98.216, 111.972). This interval means that: $P(98.216 \le \mu \le 111.972) = 0.95$
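The arithmetic behind the interval can be checked with a few lines of Python. This sketch takes the sample statistics and the critical value 2.093 directly from the lecture; the variable names are illustrative.

```python
import math

x_bar, s, n = 105.094, 14.697, 20   # sample statistics from the lecture
t_crit = 2.093                      # t for 19 df and alpha = 0.05, from the table or =tinv(0.05,19)

std_error = s / math.sqrt(n)        # estimated standard error of the mean
margin = t_crit * std_error         # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(f"SE = {std_error:.3f}, margin = {margin:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Note that the only change from the Z-based interval is that s replaces σ and the t critical value replaces 1.96.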

One-Sample Hypothesis Tests

Hypothesis Tests with the T As we did previously, we can run a one-sample hypothesis test to determine if KU students had the same WAIS score as the population. Previously, we knew what the population variance was, so we could use the Z test. Now, let's imagine that we have no idea what the population variance is, but: Our sample is normally distributed. We substitute our sample variance for the population variance.

Example Setup What is the null hypothesis? $H_0: \mu_{KU} = 100$. What is the alternative hypothesis? $H_A: \mu_{KU} \ne 100$.

Distributional Setup The key element in our example is to find out what the assumed distribution of the test statistic is under $H_0$: $T = \frac{\bar{X} - \mu}{S / \sqrt{N}}$. In our case, we will be sampling 20 subjects and creating a t-statistic. For our t-statistic we have the assumption that the sample of scores is from a population that is normally distributed.

Distribution of Test Statistic Under Null Hypothesis Using R, the plot to the right is a picture of the distribution of the test statistic (T) under the null hypothesis. Similar to our Z test, we need to find critical values where we would reject $H_0$.

Step 1: Set the Type I Error Rate Before we collect our sample, we must first set the Type I error rate for our experiment. Recall the Type I error rate (or α) is the maximum probability we will allow for rejecting the null hypothesis when the null hypothesis is true. This sets up the decision rule for our test. From this, we can obtain a critical value to which we can compare our test statistic. What rate do you want to set? Let's set α = 0.05, for tradition's sake.

Decision Rule Using α = 0.05, we can then assign a region of our null distribution where we will reject the null hypothesis. Because we have no idea in which direction KU's sample mean will fall, we will split our region into two halves: an upper tail and a lower tail. We then want to find the following points: T such that $P(t \le T) = \alpha/2 = 0.025$ (lower tail), and T such that $P(t \ge T) = \alpha/2 = 0.025$ (upper tail). Find these two points. We use =tinv(0.05,19) and come up with 2.093 and -2.093.

The Test Now, we need to form the test statistic and compare it with the critical value. $T = \frac{\bar{X} - \mu}{S / \sqrt{N}} = \frac{105.094 - 100}{14.697 / \sqrt{20}} = 1.55$. Because 1.55 lies between -2.093 and 2.093, it does not fall in the rejection region, so we fail to reject $H_0$. Alternatively, we can find the p-value for the test statistic. Here we can use the Excel function =tdist(1.55,19,2)
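The whole one-sample test can be sketched in plain Python using the numbers from the lecture. The two-tailed p-value is approximated by integrating the t density with Simpson's rule, which is an assumption of this sketch rather than how =tdist works internally; `t_pdf` and `two_sided_p` are made-up helper names.

```python
import math

x_bar, mu_0, s, n = 105.094, 100.0, 14.697, 20   # values from the lecture
df = n - 1
t_crit = 2.093                                    # from =tinv(0.05,19)

# Test statistic: how many estimated standard errors x_bar lies from mu_0.
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
reject = abs(t_stat) > t_crit


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|): one minus the central area, via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


p_value = two_sided_p(t_stat, df)
print(f"T = {t_stat:.2f}, reject H0: {reject}, two-tailed p = {p_value:.3f}")
```

The critical-value rule and the p-value rule always agree: the p-value is above 0.05 exactly when |T| is below the critical value.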

Two-Sample Hypothesis Tests

Two Sample Tests We can also use the t-test to test a hypothesis comparing two populations. Imagine, for instance, we wanted to test whether or not KU and K-State students have the same level of intelligence (as measured by the WAIS scores). Our hypothesis test would then be: $H_0: \mu_{KU} = \mu_{K\text{-}State}$ versus $H_A: \mu_{KU} \ne \mu_{K\text{-}State}$.

Our Samples We take another sample of 20 KU students, and we take a sample of 20 K-State students.

Our Sample Statistics
KU: $\bar{x}$ = 110.479, s = 14.350, N = 20
K-State: $\bar{x}$ = 65.736, s = 13.897, N = 20

Our Test Statistic The two-sample t-test takes the difference between two means, assuming: Both samples come from normal distributions. Both distributions have the same variance. The test statistic is distributed as t with $N_1 + N_2 - 2$ degrees of freedom. $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$

Our Test $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}} = \frac{110.479 - 65.736}{\sqrt{\frac{14.350^2}{20} + \frac{13.897^2}{20}}} = 10.02$. Using =tdist(10.02,38,2), we find that our p-value is 0.00000000000323, so we reject $H_0$.
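The two-sample statistic can be recomputed from the summary statistics in a few lines. The sketch below also computes the pooled-variance version of the standard error, which is the form usually paired with the equal-variance assumption and $N_1 + N_2 - 2$ degrees of freedom; that comparison is an addition for illustration, and the variable names are made up.

```python
import math

x1, s1, n1 = 110.479, 14.350, 20   # KU sample statistics from the lecture
x2, s2, n2 = 65.736, 13.897, 20    # K-State sample statistics from the lecture

# Standard error exactly as written in the lecture's formula.
se_slide = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_slide = (x1 - x2) / se_slide

# Pooled-variance standard error (equal-variance form), shown for comparison.
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = (x1 - x2) / se_pooled

df = n1 + n2 - 2
print(f"T = {t_slide:.2f} (pooled form: {t_pooled:.2f}), df = {df}")
```

Because the two samples have the same size here, the two standard errors are algebraically identical, so both forms give the same T. With unequal sample sizes they would generally differ.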

Wrapping Up Today we covered the t-distribution, confidence intervals, and one- and two-sample t-tests. Confidence intervals and hypothesis tests are mostly interchangeable. Once we know about a sampling distribution, we can form a CI for the true parameter. We will encounter these as we go through linear models.

Next Time A review before the mid-term.