Single Sample Means SOCY601 Alan Neustadtl
The Central Limit Theorem If we have a population measured by a variable with a mean µ and a standard deviation σ, and if all possible random samples of size n are drawn from this population, then regardless of the shape of the distribution of the population, as n becomes large the distribution of sample means will be approximately normally distributed with a mean equal to the population mean and a standard error equal to the population standard deviation divided by the square root of n: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Important Points We repeatedly take random samples and calculate means. Then we use these means as a variable and create a frequency distribution. This distribution represents the mean of every sample that possibly could be selected. It is a sampling distribution. The distribution of sample means will be normally distributed (particularly if n is large), regardless of the shape of the population. The mean of the sampling distribution is equal to the mean of the population. As sample size increases, the standard error (read standard deviation) of the sampling distribution will decrease.
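The repeated-sampling idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential population, the seeds, the number of draws, and the function name `simulate_sample_means` are all hypothetical choices made for illustration, and sampling is done with replacement.

```python
import random
import statistics


def simulate_sample_means(population, n, draws, seed=0):
    """Draw `draws` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


# Hypothetical, deliberately skewed population (exponential), not from the slides.
_rng = random.Random(1)
population = [_rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

for n in (4, 25, 100):
    means = simulate_sample_means(population, n, draws=2_000)
    # Compare the simulated sampling distribution with the theoretical values.
    print(
        f"n={n}: mean of sample means={statistics.fmean(means):.3f} (mu={mu:.3f}), "
        f"sd of sample means={statistics.pstdev(means):.3f} "
        f"(sigma/sqrt(n)={sigma / n ** 0.5:.3f})"
    )
```

If the theorem holds, the mean of the sample means should stay close to µ for every n, while their standard deviation should shrink roughly in line with σ/√n.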
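Three Distributions Three Different Types of Distributions
Population: central tendency $\mu = \frac{\sum X}{N}$; dispersion $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Sample: central tendency $\bar{x} = \frac{\sum x}{n}$; dispersion $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Sampling distribution: central tendency $\mu_{\bar{x}} = \mu$; dispersion $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$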
What Does This Mean? Suppose that we have a population with a mean equal to 100 (µ=100) and a standard deviation equal to 15 (σ=15). Assuming that we take a simple random sample of 400 cases (n=400) from this population, we can immediately calculate the standard error of the sampling distribution and the 95% confidence limits: $\sigma_{\bar{x}} = \frac{15}{\sqrt{400}} = 0.750$ and $CL_{95} = \mu \pm (1.96)(0.750) = 100 \pm 1.47$, so that $98.53 < \bar{x} < 101.47$. About 95% of sample means fall within this range.
The Effect of Sample Size If the sample size were increased to 1,600, the standard error would be smaller and the confidence interval narrower. For example, the standard error would be equal to: $\sigma_{\bar{x}} = \frac{15}{\sqrt{1{,}600}} = 0.375$ and $CL_{95} = \mu \pm (1.96)(0.375) = 100 \pm 0.735$, so that $99.27 < \bar{x} < 100.74$
The Effect of Confidence Level If the sample size is held constant at 1,600, but we use a higher confidence level, 99% for example, we see an increase in the range of possible sample means: $\sigma_{\bar{x}} = \frac{15}{\sqrt{1{,}600}} = 0.375$ and $CL_{99} = \mu \pm (2.58)(0.375) = 100 \pm 0.97$, so that $99.03 < \bar{x} < 100.97$
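The three calculations above (n = 400 at 95%, n = 1,600 at 95%, and n = 1,600 at 99%) can be reproduced with a few lines of Python. This is a minimal sketch; the function name `sampling_interval` is a hypothetical helper, and the z values 1.96 and 2.58 are the ones used in the slides.

```python
import math


def sampling_interval(mu, sigma, n, z):
    """Return (standard error, lower limit, upper limit) for mu +/- z * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    margin = z * se
    return se, mu - margin, mu + margin


# Population values from the example: mu = 100, sigma = 15.
for n, z in [(400, 1.96), (1600, 1.96), (1600, 2.58)]:
    se, low, high = sampling_interval(100, 15, n, z)
    print(f"n={n}, z={z}: SE={se:.3f}, interval=({low:.3f}, {high:.3f})")
```

Quadrupling the sample size halves the standard error, while raising the confidence level widens the interval for a fixed standard error.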
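Confidence Intervals $\bar{x} \pm (z)(\sigma_{\bar{x}}) = \bar{x} \pm (z)\left(\frac{\sigma}{\sqrt{n}}\right)$ [Figure: sampling distribution of $\bar{x}$ centered on µ. The range $\mu \pm 1.645\sigma_{\bar{x}}$ contains 90% of samples, $\mu \pm 1.96\sigma_{\bar{x}}$ contains 95% of samples, and $\mu \pm 2.58\sigma_{\bar{x}}$ contains 99% of samples.]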
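Intervals & Level of Confidence [Figure: sampling distribution of the mean, with $\mu_{\bar{x}} = \mu$, a central area of $1 - \alpha$, and an area of $\alpha/2$ in each tail. Confidence intervals extend from $\bar{x} - z\sigma_{\bar{x}}$ to $\bar{x} + z\sigma_{\bar{x}}$. $100(1 - \alpha)\%$ of intervals contain µ; $100\alpha\%$ do not.]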
Important Points All else being equal: As sample size increases, the standard error decreases. As the standard error decreases, the confidence interval narrows. Conversely, small sample sizes are associated with larger standard errors, which in turn are associated with wider confidence intervals. Moving from a lower to a higher confidence level (e.g. 0.95 to 0.99), the confidence interval widens; it is more inclusive. Conversely, lower confidence levels (e.g. 0.95 versus 0.99) are associated with narrower confidence intervals; they are more exclusive. The smaller the population standard deviation (σ), the smaller the standard error and, in turn, the narrower the confidence interval. Conversely, the larger the population standard deviation, the larger the standard error and the wider the confidence interval.
Sample Point Estimates and Confidence Intervals Symbolically, a point estimate of a mean is given as $\bar{x}$. We can place a confidence interval around this value. For example, using a 95% confidence interval (α=0.05) we define boundaries approximately two standard errors below and above the point estimate: $\bar{x} \pm (1.96)\sigma_{\bar{x}}$
Sample Point Estimates and Confidence Intervals Similarly, we can construct a 68% confidence interval: $\bar{x} \pm \sigma_{\bar{x}}$ Or a 99% confidence interval: $\bar{x} \pm (2.58)\sigma_{\bar{x}}$
Sample Point Estimates and Confidence Intervals In general, confidence intervals can be constructed for any desired level of confidence, 1-α, using this formula: $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$
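As a rough illustration of the general formula, the critical value for any confidence level can be looked up from the standard normal distribution rather than a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `z_critical` and `confidence_interval`, and the point estimate and standard error passed in the usage line, are hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_(alpha/2) for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def confidence_interval(point_estimate, std_error, confidence):
    """Return (lower, upper) limits: point_estimate +/- z_(alpha/2) * std_error."""
    z = z_critical(confidence)
    return point_estimate - z * std_error, point_estimate + z * std_error


for level in (0.68, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")

# Hypothetical point estimate (100) and standard error (0.75), for illustration only.
print(confidence_interval(100, 0.75, 0.95))
```

The computed values are close to the rounded multipliers used in the slides (about 1 for 68%, 1.96 for 95%, and 2.58 for 99%).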
Summary of Assumptions We assume that: 1. the sample for estimating µ is drawn randomly; 2. the sample size n is equal to or greater than 50; 3. we know σ.
Confidence Intervals when the Standard Error is Unknown Typically, we will not know the population parameters. We may be in a position to make assumptions about the mean, but rarely about the standard deviation. We can usually make an estimate of the standard error using the following formula: $\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n - 1}}$
Confidence Intervals when the Standard Error is Unknown When we use this formula, we have to use the t-distribution, not the z-distribution. In general, they are similar. For example, the general formula for confidence intervals becomes: $\bar{x} \pm t_{\alpha/2}\,\hat{\sigma}_{\bar{x}} = \bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n - 1}}$
z- and t-distributions There are many t-distributions; their shape varies with the degrees of freedom (n − 1), and so with the sample size. Similarities to z: The t-distribution is bell shaped and has a mean of zero. With large sample sizes (n ≥ 150) the t- and z-distributions converge. Differences from z: The use of the t-distribution to test hypotheses assumes that the sample was drawn from a normally distributed population. The use of t is generally robust against the violation of this assumption. A t-distribution for a given sample size has a larger variance than the z-distribution. Therefore, critical values of t are larger than the corresponding critical values of z, and intervals based on t are wider.
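Student's t Distribution [Figure: the standard normal (Z) curve compared with t-distributions for df = 13 and df = 5. All three are bell-shaped and symmetric about 0; the t-distributions have fatter tails.]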
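The "fatter tails" of the t-distribution can be checked numerically by comparing density heights. The sketch below writes out the standard Student's t density formula using only `math.gamma`; the function names `t_pdf` and `normal_pdf` are hypothetical, and the evaluation points (0 and 3) are chosen only for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom at x."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


for x in (0.0, 3.0):
    print(
        f"x={x}: t(df=5)={t_pdf(x, 5):.4f}, "
        f"t(df=13)={t_pdf(x, 13):.4f}, z={normal_pdf(x):.4f}"
    )
```

In the tail (x = 3) the t densities sit above the normal density, most of all for the smaller df, while at the center (x = 0) the ordering is reversed.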
An Example Using t to Construct Confidence Intervals Research in the 1970s indicated that city size had increased since World War I, but that this trend had reversed by 1970. Using data measuring the percentage change in city populations in 63 American cities, we find that the mean percentage change is -1.26 with a standard deviation of 6.32. That is, the point estimate indicates that between 1960 and 1970 there was a decrease in average city size of 1.26%.
An Example Using t to Construct Confidence Intervals Using an alpha level of 0.05 with 62 degrees of freedom (n − 1), the tabled value of t is approximately equal to 2.00. It is approximate because 62 df is not in the table; however, we can use 60 df instead. The 95% confidence interval, then, is equal to: $CL_{95} = \bar{x} \pm (t_{0.025})(\hat{\sigma}_{\bar{x}}) = -1.26 \pm (2.00)\left(\frac{6.32}{\sqrt{63 - 1}}\right) = -1.26 \pm (2.00)(0.8026) = -1.26 \pm 1.605$, or: $-2.87 < \mu < 0.35$
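The city-size interval can be recomputed as a check. This sketch follows the estimated standard error formula given earlier, s/√(n − 1), and takes the tabled t of 2.00 as an input rather than computing it; the function name `t_confidence_interval` is a hypothetical helper.

```python
import math


def t_confidence_interval(mean, s, n, t_crit):
    """Return (estimated SE, lower, upper) using SE = s / sqrt(n - 1)."""
    se_hat = s / math.sqrt(n - 1)
    margin = t_crit * se_hat
    return se_hat, mean - margin, mean + margin


# Values from the example: mean = -1.26, s = 6.32, n = 63, tabled t (60 df) = 2.00.
se_hat, low, high = t_confidence_interval(-1.26, 6.32, 63, 2.00)
print(f"SE = {se_hat:.4f}, interval = ({low:.3f}, {high:.3f})")
```

Because the interval spans zero, the data do not rule out the possibility of no change in average city size.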
z- and t-tests Besides placing confidence intervals around point estimates of the mean, we can also calculate standard z-tests and t-tests: $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ and $t = \frac{\bar{x} - \mu}{\hat{\sigma}_{\bar{x}}}$
An Example of Hypothesis Testing Using Point Estimates If the difference is not equal to zero, do we reject the null hypothesis? To answer that question we need to know what chance or random error can do: what kind of differences is chance likely to produce? The central limit theorem provides a distribution based on chance. This allows us to see how chance operates on means.
An Example of Hypothesis Testing Using Point Estimates We know that the mean score on an intelligence test in the general population is 100 with a standard deviation of 15. The mean based on a sample of size 100 from a program for accelerated students is 108. Clearly, there is a difference between the population and sample means. What could produce this difference? 1. The program is successful, or 2. random error, sampling error, or chance. The real question we need to answer is how likely it is that chance produced this difference. Typically, we choose to assume #2 and call it the null hypothesis (H0). In other words, it is not likely that the difference between the sample mean and the population mean is exactly equal to zero; there will generally be some difference. The null hypothesis is the assumption that this difference is due to random error.
Hypothesis Testing What could produce differences between observed and expected values? Either there actually is a difference, or there is random error, sampling error, or chance. There are five basic steps in hypothesis testing:
1. Assume the null hypothesis of no difference.
2. We have to have an idea about the range of outcomes if the null hypothesis is true. We obtain this from an appropriate sampling distribution.
3. We have to decide or set a criterion for enough evidence to be convinced that the null hypothesis is false. This is a significance level called alpha or α.
4. We have to go to the real world and collect data. That is, determine some sample statistic.
5. We compare 4 with 3 and reject or fail to reject the null hypothesis. If the value we calculate falls in the critical region or exceeds the critical value associated with α, we must reject the null hypothesis; otherwise we fail to reject it.
Null and Alternative Hypotheses First we posit the null hypothesis: $H_0: \mu = 0$ Next, we choose one of three different alternative hypotheses, depending on a priori expectations:
2-tailed: $H_1: \mu \neq 0$
1-tailed: $H_1: \mu > 0$ or $H_1: \mu < 0$
Hypothesis Testing 1. Assume the null hypothesis of no difference. 2. We have to have an idea about the range of outcomes if the null hypothesis is true. We obtain this from an appropriate sampling distribution. 3. We have to decide or set a criterion for enough evidence to be convinced that the null hypothesis is false. This is a significance level called alpha or α.
$H_0$: no IQ difference between population and sample
$H_1$: there is a statistically significant difference in IQ between the population and the sample
In this problem, we have a large sample size and we know the population standard deviation. We can safely use the z-distribution to answer this question. It is reasonable to assume that students in an accelerated program should have higher average I.Q. scores. Therefore, we choose to use a one-tailed test. Furthermore, since implementing a program like this universally would be expensive, we wish to minimize the probability of a Type I error. So, we select α=0.01. In this case, z-critical is equal to 2.327.
Hypothesis Testing 4. We have to go to the real world and collect data. That is, determine some sample statistic. 5. We compare 4 with 3 and reject or fail to reject the null hypothesis. If the value we calculate falls in the critical region or exceeds the critical value associated with α, we must reject the null hypothesis; otherwise we fail to reject it. $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{108 - 100}{15 / \sqrt{100}} = 5.33$ The calculated z of 5.33 exceeds the z critical of 2.327. We reject the null hypothesis in favor of the alternative hypothesis, knowing that the probability of a Type I error has been set at 1%.
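Steps 4 and 5 for the accelerated-program example can be sketched in code. The function name `one_sample_z` is a hypothetical helper, and the critical value is computed from the standard normal distribution instead of being read from a table, so it may differ from a tabled value in the third decimal place.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu, sigma, n):
    """z statistic for a sample mean when the population sigma is known."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


# Values from the example: sample mean 108, mu = 100, sigma = 15, n = 100.
z = one_sample_z(108, 100, 15, 100)

# One-tailed test at alpha = 0.01: reject H0 if z exceeds the critical value.
alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_null = z > z_crit

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, reject H0: {reject_null}")
```

The decision rule is the comparison in the last step: the null hypothesis is rejected only when the calculated z falls beyond the critical value.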
Determining How Big A Sample You Need You know that sample size affects the amount of error in parameter estimates: ceteris paribus, larger samples have less error. This is bound up in the following formula: $\text{error} = t_{\alpha/2}(\hat{\sigma}_{\bar{x}}) = t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
Determining How Big A Sample You Need So, knowing α and either knowing the population standard deviation or making an estimate of it ($\hat{\sigma}$), you can solve this formula for n, the sample size. Consider the following: $n = \left(\frac{t_{\alpha/2}\,\hat{\sigma}}{\text{error}}\right)^2$
An Example We know that the population mean and standard deviation for the Stanford-Binet intelligence test are 100 and 15, respectively. How large a sample do we need to produce an estimate of the mean within three points of the parameter, with 95% confidence? Since we know the actual parameters, we can use z.
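An Example $n = \left(\frac{(1.96)(15)}{3}\right)^2 = 96.04$, so a sample of 97 cases (roughly 100) is needed. What if we wanted to reduce the margin of error to one point? How big a sample do we need to draw? $n = \left(\frac{(1.96)(15)}{1}\right)^2 = 864.36$, so a sample of 865 cases is needed.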
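The sample-size formula is easy to wrap in a small function. This is a minimal sketch; the name `required_sample_size` is hypothetical, and the result is rounded up because a fractional case cannot be sampled.

```python
import math


def required_sample_size(critical_value, sigma, error):
    """Smallest n such that critical_value * sigma / sqrt(n) <= error."""
    n_exact = (critical_value * sigma / error) ** 2
    return math.ceil(n_exact)


# Stanford-Binet example: sigma = 15, z = 1.96 for 95% confidence.
for error in (3, 1):
    n = required_sample_size(1.96, 15, error)
    print(f"margin of error {error}: n = {n}")
```

Because n grows with the square of 1/error, cutting the margin of error to a third requires roughly nine times as many cases.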
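Tests Involving Proportions $\hat{p} \pm z_{\alpha/2}(\sigma_{\hat{p}}) = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{(p)(1 - p)}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{(\hat{p})(1 - \hat{p})}{n}}$ where $\hat{p} = \frac{x}{n}$ is the sample proportion (x cases with the characteristic out of n). Note: When n is large, $\hat{p}$ can approximate the value of p in the formula for $\sigma_{\hat{p}}$.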
An Example In a sample of 1,000 American citizens, 637 respond that they trust the president. Using a 95% confidence interval, estimate the proportion of the population that trusts the president. $CL_{95} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{(\hat{p})(1 - \hat{p})}{n}} = 0.637 \pm 1.96\sqrt{\frac{(0.637)(0.363)}{1{,}000}} = 0.637 \pm 0.030$, so that $0.607 < p < 0.667$
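The same interval can be computed directly from the counts. The sketch below assumes the large-sample approximation in which the sample proportion stands in for p in the standard error; `proportion_interval` is a hypothetical helper name.

```python
import math


def proportion_interval(successes, n, z):
    """Return (p_hat, lower, upper) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat, p_hat - margin, p_hat + margin


# Values from the example: 637 of 1,000 respondents, z = 1.96 for 95% confidence.
p_hat, low, high = proportion_interval(637, 1000, 1.96)
print(f"p_hat = {p_hat:.3f}, interval = ({low:.3f}, {high:.3f})")
```

The margin of error here is about three percentage points, the figure commonly reported for surveys of roughly 1,000 respondents.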
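Tests Involving Proportions $z = \frac{p_s - p_u}{\sqrt{\frac{(p_u)(1 - p_u)}{n}}}$ where $p_s$ is the sample proportion and $p_u$ is the population proportion under the null hypothesis.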
An Example In a sample of 40 students taking an examination, 70% earned a score of 80% or greater. The professor claims success if 80% meet or exceed the goal of mastering 80% of the examination material. Evaluate this examination using a 99% confidence level. $z = \frac{0.70 - 0.80}{\sqrt{\frac{(0.80)(0.20)}{40}}} = -1.58$ The two-tailed z critical is equal to ±2.575, and the calculated z does not fall beyond it, so we fail to reject the null hypothesis.
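The examination test statistic can be verified with a short calculation. This is a sketch only; `proportion_z` is a hypothetical helper name, and 2.575 is the tabled critical value quoted in the example.

```python
import math


def proportion_z(p_sample, p_null, n):
    """z statistic for a sample proportion against a hypothesized proportion."""
    se = math.sqrt(p_null * (1 - p_null) / n)
    return (p_sample - p_null) / se


# Values from the example: 70% of 40 students, hypothesized proportion 80%.
z = proportion_z(0.70, 0.80, 40)
z_crit = 2.575  # tabled two-tailed critical value for alpha = 0.01
reject_null = abs(z) > z_crit

print(f"z = {z:.2f}, reject H0: {reject_null}")
```

Comparing the absolute value of z with the critical value implements the two-tailed decision rule.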
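An Example Using a confidence interval, with the hypothesized proportion 0.80 used in the standard error, we get: $CL_{99} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{(p)(1 - p)}{n}} = 0.70 \pm 2.575\sqrt{\frac{(0.8)(0.2)}{40}} = 0.70 \pm 0.16$, so that $0.54 < p < 0.86$. The interval contains 0.80, which agrees with failing to reject the null hypothesis.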