Central Limit Theorem Confidence Intervals Worked example #6. July 24, 2017


[Slide: histograms of raw and scaled exam scores. The raw scores have a mean of 71.4%; adding 3.6% to every score scales the mean up to 75%.]

[Slide: posted grade listing, anonymized ID numbers with letter grades (A through F) plotted against scaled scores; raw mean = 71.4%, scaled mean = 75% after adding 3.6%.]

Let's return to the Normal distribution, f(x).

The area under the curve between two x values represents how much of the data lies in that range. [Slide: shaded area under the curve between points A and B.] The integral of the Normal density has no closed-form solution, but numerical methods were used to create tables of areas under the standard Normal from A to B.
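As a quick sketch of what those tables tabulate (the endpoints A and B below are arbitrary illustrative values, not from the slides), the area between two points is just the difference of two cumulative probabilities, which scipy computes numerically:

```python
# Hypothetical endpoints A and B in standard-normal (z) units.
from scipy.stats import norm

A, B = -1.0, 1.5
area = norm.cdf(B) - norm.cdf(A)   # P(A < Z < B) for Z ~ N(0, 1)
print(round(area, 4))              # ≈ 0.7745
```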

Values in the table are α values. Z_α is the value z such that there is α area to the left. Z_{1−α} is the value z such that there is α area to the right. Z_{1−α/2} is the value z such that there is α/2 area to the right. Ex: Z_{1−0.05/2} = Z_{0.975} = 1.96. Beware how different texts and tables use α, 1−α, α/2 and 1−α/2.
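A hedged sketch of the same lookup in code: scipy's quantile function (ppf) inverts the Z table, so the critical values above can be recovered directly (α = 0.05 is just the example from the slide):

```python
from scipy.stats import norm

alpha = 0.05
z_lower = norm.ppf(alpha)          # Z_alpha: alpha area to the left,   ≈ -1.645
z_upper = norm.ppf(1 - alpha)      # Z_(1-alpha): alpha area to the right, ≈ 1.645
z_two   = norm.ppf(1 - alpha / 2)  # Z_(1-alpha/2) = Z_0.975 ≈ 1.96
print(z_lower, z_upper, z_two)
```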

Why do we care about areas under the Normal distribution? (1) Many populations exhibit a normal distribution. Calculating areas allows us to predict how much of the population data is in a region if we have a sample mean and variance. (Lab #8)

Recall: we transformed any normal distribution into the standard normal with 2 steps: (1) subtract the mean, (2) divide by the standard deviation. For individual data points, the Z-score of x is Z = (x − μ)/σ. The Z-score allows comparison or standardization of data sets with different means and standard deviations.
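A minimal sketch of the standardization step, using made-up values for the mean and standard deviation:

```python
import numpy as np

mu, sigma = 100.0, 15.0            # hypothetical population mean and SD
x = np.array([85.0, 100.0, 130.0]) # made-up data points
z = (x - mu) / sigma               # (1) subtract the mean, (2) divide by the SD
print(z)                           # [-1.  0.  2.]
```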

Example, IQ: how do we compare IQ test results from 2008 with 1996? Get the test scores, compute Z-scores, then compute an IQ score: IQ = 100 + 15Z. Mean IQ = 100, standard deviation = 15. About 68% of the population falls in 85–115 (within 1 SD), about 13.5% in 115–130, and about 2.5% above 130.
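As an illustrative sketch (the raw-score mean and SD below are hypothetical; the IQ scale itself is mean 100, SD 15), the rescaling and the quoted population fractions look like this in code:

```python
from scipy.stats import norm

raw, raw_mean, raw_sd = 72.0, 65.0, 10.0   # hypothetical raw-score statistics
z = (raw - raw_mean) / raw_sd              # standardize the raw score
iq = 100 + 15 * z                          # rescale onto the IQ scale
print(round(iq, 1))                        # 110.5

# Fractions of a normal population in the quoted IQ bands
print(norm.cdf(1) - norm.cdf(-1))          # 85-115  (within 1 SD)  ≈ 0.68
print(norm.cdf(2) - norm.cdf(1))           # 115-130 (+1 to +2 SD)  ≈ 0.14
print(1 - norm.cdf(2))                     # >130    (beyond +2 SD) ≈ 0.023
```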

Why do we care about areas under the Normal distribution? (2) The Central Limit Theorem: for large sample size the distribution of sample means will be normal, no matter what the actual population distribution is. (Lab #8) [Slide: distribution of the data vs. distribution of the means.] If we study means, we can use the normal distribution.

Central limit theorem. For large sample size the distribution of sample means will be normal, no matter what the actual population distribution is: sample means ~ N(μ, σ²/n). Notice: the standard deviation of the sample means is σ/√n. Note: the variance of the distribution of sample means depends on the size of the sample. Note: the ranges around the mean are confidence intervals; we are confident the true population mean lies within a region around our sample mean. A bigger region means more confidence but less utility.
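A small simulation sketch of this claim: draw many samples from a clearly non-normal (exponential) population and check that the sample means have standard deviation close to σ/√n. The sample size, number of replicates, and the exponential population are all arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 10_000                 # arbitrary sample size and replicate count
pop_mean, pop_sd = 1.0, 1.0          # exponential(scale=1) has mu = sigma = 1

samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)         # one sample mean per replicate

print(means.mean())                  # ≈ pop_mean (1.0)
print(means.std(ddof=1))             # ≈ pop_sd / sqrt(n) ≈ 0.183
print(pop_sd / np.sqrt(n))           # theoretical standard error
```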

Central limit theorem. For large sample size the distribution of sample means will be normal, no matter what the actual population distribution is. [Slide: distribution of the sample means centered on the population mean.]

Central limit theorem. The variance of the distribution of sample means depends on the size of the sample and the population variance: Var(sample mean) = σ²/n, which means that the standard error of the sample mean is SE = σ/√n. [Slide: distributions of sample means for samples with many values (narrow) vs. few values (wide).]

Central limit theorem. The range around the sample mean is a confidence interval. We are x% confident the true population mean lies within a region around our sample mean that includes x% of the distribution. [Slide: two distributions of sample means, one with a wide range ("very confident the pop. mean is somewhere in this range") and one with a narrow range ("less confident the pop. mean is somewhere in this range").] There is a tradeoff between certainty and utility. Standard deviation: measures the spread of the sample or population data values. Standard error: measures the spread of our estimate of the population mean (the spread of potential sample means).

Central limit theorem. We are x% confident the true population mean lies within a region around our sample mean that includes x% of the distribution. That distribution is a normal distribution, and we know how to calculate what % of the area lies within a range defined by Z scores. Using the Normal distribution therefore allows calculation of confidence intervals and quantitative statements about the population mean from sample data. Since we usually don't know σ, we have to estimate it with s. This means we can't just use Z scores; we have to use the t distribution (which includes the uncertainty in our estimation of σ via s). [Slide: if we knew σ we could use the Z distribution; since σ is unknown, we have to use the wider t distribution.]

Central limit theorem. We typically don't know μ or σ (if we did, we wouldn't need to get a confidence interval because we would know μ), so we use x̄ to estimate μ and s to estimate σ. Sample means ~ N(x̄, σ²/n) (if σ is known, μ unknown):

x̄ ± Z_{1−0.32/2}·(σ²/n)^(1/2) → 68% confident the pop. mean is in this region (Z_{1−0.32/2} = 1)
x̄ ± Z_{1−0.05/2}·(σ²/n)^(1/2) → 95% (Z_{1−0.05/2} = 1.96)
x̄ ± Z_{1−0.01/2}·(σ²/n)^(1/2) → 99% (Z_{1−0.01/2} = 2.575)

Note: we use values from a Z distribution since σ is known.
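A sketch of the σ-known intervals in the table above, using a hypothetical sample mean, known σ, and n:

```python
import numpy as np
from scipy.stats import norm

xbar, sigma, n = 50.0, 8.0, 25       # hypothetical sample mean, known sigma, n

for conf in (0.68, 0.95, 0.99):
    z = norm.ppf(1 - (1 - conf) / 2) # ≈ 0.99, 1.96, 2.58
    half = z * sigma / np.sqrt(n)
    print(f"{conf:.0%} CI: {xbar - half:.2f} to {xbar + half:.2f}")
```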

Central limit theorem. We typically don't know μ or σ (if we did, we wouldn't need to get a confidence interval because we would know μ), so we use x̄ to estimate μ and s to estimate σ. Sample means ~ t(x̄, s²/n) (if σ is unknown, μ unknown):

x̄ ± t_{1−0.32/2, df}·(s²/n)^(1/2) → 68% confident the pop. mean is in this region (t_{1−0.32/2, df} varies with df)
x̄ ± t_{1−0.05/2, df}·(s²/n)^(1/2) → 95% (t_{1−0.05/2, df} varies with df)
x̄ ± t_{1−0.01/2, df}·(s²/n)^(1/2) → 99% (t_{1−0.01/2, df} varies with df)

Note: we have to use values from a t distribution since σ is unknown. The t distribution is a bit wider than the Z to include our uncertainty in estimating σ with s.
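The same sketch for the σ-unknown case, swapping the Z quantile for a t quantile with df = n − 1 (again with hypothetical numbers):

```python
import numpy as np
from scipy.stats import t

xbar, s, n = 50.0, 8.0, 25           # hypothetical sample mean, sample SD, n

for conf in (0.68, 0.95, 0.99):
    tcrit = t.ppf(1 - (1 - conf) / 2, df=n - 1)  # wider than the Z quantile
    half = tcrit * s / np.sqrt(n)
    print(f"{conf:.0%} CI: {xbar - half:.2f} to {xbar + half:.2f}")
```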

Central limit theorem.

μ known, σ known → no sampling needed
μ unknown, σ known → use the Z distribution
μ unknown, σ unknown → use the t distribution

Note: as the sample size increases, the t distribution becomes the Z distribution. Some naughty people forget the t distribution even exists... Caution: don't lose sight of the goal, which is to describe a region within which we are confident the population mean lies.

Central limit theorem. The t distribution includes the uncertainty in the estimate of σ and is wider. Using the t distribution also requires us to specify the degrees of freedom (df) of the data, in this case df = n − 1. You may see t_{α, df}, t_{α/2, df}, and t_{1−α/2, df} in texts and tables. For our t table, α refers to the area to the right (our Z table shows the area to the left). Tables often require interpolation if the sample size is not listed. As df increases, the t distribution becomes the Z distribution.

Central limit theorem. Example, 95% confidence intervals:

(1) σ known = 10 → use the Z distribution; sample size = 20, df = n/a. We are 95% confident the pop. mean is somewhere in the range x̄ ± 1.96 SE (the 95% CI, confidence interval).
(2) σ unknown → use the t distribution; s = 10, sample size = 20, df = 20 − 1 = 19. The 95% CI is x̄ ± 2.093 SE.
(3) σ unknown → use the t distribution; s = 10, sample size = 101, df = 101 − 1 = 100. The 95% CI is x̄ ± 1.984 SE.
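A quick check of the three critical values quoted above using scipy (plus one very large df to show the t converging to Z):

```python
from scipy.stats import norm, t

print(norm.ppf(0.975))            # sigma known:  Z      ≈ 1.960
print(t.ppf(0.975, df=19))        # n = 20:       t(19)  ≈ 2.093
print(t.ppf(0.975, df=100))       # n = 101:      t(100) ≈ 1.984
print(t.ppf(0.975, df=100_000))   # very large n: t -> Z ≈ 1.960
```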

HANDOUT #6. Consider a medication that may increase the clotting time of patients taking it. We need reliable data on the usual clotting time (CT) of individuals not taking the medication in order to determine whether the medication is effective. We are unable to measure the CT of every individual in the population of people not taking the medication, so we will estimate the population parameter, mean clotting time (CT), with a sample from the general population. For the first set of questions we have access to detailed physiological data from the NIH, and we know that the population standard deviation of CT values in humans is 3.2 seconds.

Sample data: 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19.
- What is the region in which there is a 95% chance that the true population mean CT lies?
- What is the region in which there is a 99% chance that the true population mean CT lies?
- If we have a larger sample: 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19, 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19, 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19 — what is the region in which there is a 95% chance that the true population mean CT lies?

HANDOUT #6 (solutions). Note: we are interested in α/2 area to the right, but our Z table shows 1 − α/2 area to the left. Same setup as above: the population standard deviation of CT values is known to be 3.2 seconds.

Sample data: 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19; sample mean = 20, population σ = 3.2.

95% region: Z_{1−α/2} = Z_{1−0.05/2} = Z_{0.975} = 1.96 from our Z table.
x̄ ± 1.96(σ/√n) = 20 ± 1.96(3.2/√12) = 20 ± 1.81 → {18.19, 21.81}

99% region: Z_{1−α/2} = Z_{1−0.01/2} = Z_{0.995} = 2.575 from our Z table.
x̄ ± 2.575(σ/√n) = 20 ± 2.575(3.2/√12) = 20 ± 2.38 → {17.62, 22.38}

Larger sample (n = 36): Z_{1−α/2} = Z_{1−0.05/2} = Z_{0.975} = 1.96 from our Z table.
x̄ ± 1.96(σ/√n) = 20 ± 1.96(3.2/√36) = 20 ± 1.05 → {18.95, 21.05}

Note: with a larger sample size, the confidence interval for the same degree of confidence is narrower because the SE is smaller.
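A sketch reproducing these three Z-based intervals with the handout's data and known σ = 3.2; the loop covers the 95% and 99% intervals for n = 12 and the 95% interval for the tripled sample (n = 36):

```python
import numpy as np
from scipy.stats import norm

data = [18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19]
sigma = 3.2                                    # known population SD (seconds)

for sample, conf in [(data, 0.95), (data, 0.99), (data * 3, 0.95)]:
    n, xbar = len(sample), np.mean(sample)
    z = norm.ppf(1 - (1 - conf) / 2)           # 1.96 or 2.576
    half = z * sigma / np.sqrt(n)
    print(f"n={n}, {conf:.0%}: {xbar:.1f} ± {half:.2f}")
# n=12, 95%: 20.0 ± 1.81   n=12, 99%: 20.0 ± 2.38   n=36, 95%: 20.0 ± 1.05
```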

HANDOUT #6. For the second set of questions we consider the more realistic situation in which we don't know the population standard deviation and have to estimate it from the sample data.

Sample data: 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19.
- What is the region in which there is a 95% chance that the true population mean CT lies?
- What is the region in which there is a 99% chance that the true population mean CT lies?
- If we have a larger sample: 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19, 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19, 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19 — what is the region in which there is a 95% chance that the true population mean CT lies?

HANDOUT #6 (solutions). Note: instead of showing 1 − α area to the left like the Z table does, our t table shows values for α area to the right. For this second set of questions we don't know the population standard deviation and have to estimate it from the sample data.

Sample data: 18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19; sample mean = 20, sample s = 3.133.

95% region: t_{α/2, df} = t_{0.05/2, 11} = t_{0.025, 11} = 2.201 from our t table.
x̄ ± 2.201(s/√n) = 20 ± 2.201(3.133/√12) = 20 ± 1.99 → {18.01, 21.99}

99% region: t_{α/2, df} = t_{0.01/2, 11} = t_{0.005, 11} = 3.106 from our t table.
x̄ ± 3.106(s/√n) = 20 ± 3.106(3.133/√12) = 20 ± 2.81 → {17.19, 22.81}

Larger sample (n = 36, sample s = 3.04): our t table does not have an entry for df = 35, so we round down and use the df = 30 value: t_{α/2, df} = t_{0.05/2, 35} ≈ t_{0.025, 30} = 2.042 from our t table.
x̄ ± 2.042(s/√n) = 20 ± 2.042(3.04/√36) = 20 ± 1.04 → {18.96, 21.04}

Note: the sample SD is also slightly smaller for the larger sample because of the relatively larger denominator; as n − 1 increases it approaches n.
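A sketch reproducing the t-based intervals with the same data, this time estimating σ with the sample s. Note it uses the exact t quantile for df = 35 rather than the table's df = 30 row, so the last interval comes out at ±1.03 instead of the handout's ±1.04:

```python
import numpy as np
from scipy.stats import t

data = [18, 20, 22, 23, 26, 17, 14, 22, 18, 21, 20, 19]

for sample, conf in [(data, 0.95), (data, 0.99), (data * 3, 0.95)]:
    n, xbar = len(sample), np.mean(sample)
    s = np.std(sample, ddof=1)                   # sample SD (n - 1 denominator)
    tcrit = t.ppf(1 - (1 - conf) / 2, df=n - 1)  # exact df, no table rounding
    half = tcrit * s / np.sqrt(n)
    print(f"n={n}, s={s:.3f}, {conf:.0%}: {xbar:.1f} ± {half:.2f}")
# n=12: 20.0 ± 1.99 (95%) and 20.0 ± 2.81 (99%)
# n=36: 20.0 ± 1.03 (95%), using exact t(35) rather than the table's t(30)
```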