Stat 400, section 5.4 supplement: The Central Limit Theorem

Similar documents
Expectation and Variance of a random variable

Chapter 6 Sampling Distributions

PRACTICE PROBLEMS FOR THE FINAL

Random Variables, Sampling and Estimation

Understanding Samples

Homework 5 Solutions

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population

Statisticians use the word population to refer the total number of (potential) observations under consideration

Estimation of a population proportion March 23,

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

Final Review for MATH 3510

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p

Chapter 8: Estimating with Confidence

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

Introduction There are two really interesting things to do in statistics.

Statistics 511 Additional Materials

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Frequentist Inference

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

STAT 515 fa 2016 Lec Sampling distribution of the mean, part 2 (central limit theorem)

Confidence Intervals for the Population Proportion p

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Topic 9: Sampling Distributions of Estimators

Central Limit Theorem the Meaning and the Usage

Data Analysis and Statistical Methods Statistics 651

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times

PRACTICE PROBLEMS FOR THE FINAL

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

Read through these prior to coming to the test and follow them when you take your test.

STATS 200: Introduction to Statistical Inference. Lecture 1: Course introduction and polling

Chapter 23: Inferences About Means

(7 One- and Two-Sample Estimation Problem )

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Module 1 Fundamentals in statistics

Math 140 Introductory Statistics

Topic 6 Sampling, hypothesis testing, and the central limit theorem

(6) Fundamental Sampling Distribution and Data Discription

Chapter 11: Asking and Answering Questions About the Difference of Two Proportions

Eco411 Lab: Central Limit Theorem, Normal Distribution, and Journey to Girl State

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Economics Spring 2015

Inferential Statistics. Inference Process. Inferential Statistics and Probability a Holistic Approach. Inference Process.

Statistics Lecture 27. Final review. Administrative Notes. Outline. Experiments. Sampling and Surveys. Administrative Notes

STAT 203 Chapter 18 Sampling Distribution Models

5. A formulae page and two tables are provided at the end of Part A of the examination PART A

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

Exam II Covers. STA 291 Lecture 19. Exam II Next Tuesday 5-7pm Memorial Hall (Same place as exam I) Makeup Exam 7:15pm 9:15pm Location CB 234

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

AP Statistics Review Ch. 8

Binomial Distribution

Comparing your lab results with the others by one-way ANOVA

Computing Confidence Intervals for Sample Data

Exam 2 Instructions not multiple versions

CONFIDENCE INTERVALS STUDY GUIDE

Topic 9: Sampling Distributions of Estimators

Stat 421-SP2012 Interval Estimation Section

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis

Parameter, Statistic and Random Samples

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

Simulation. Two Rule For Inverting A Distribution Function

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1

Topic 10: Introduction to Estimation

6.3 Testing Series With Positive Terms

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Confidence intervals summary Conservative and approximate confidence intervals for a binomial p Examples. MATH1005 Statistics. Lecture 24. M.

Confidence Intervals

Chapter 1 (Definitions)

Introduction to Probability and Statistics Twelfth Edition

Chapter 6 Part 5. Confidence Intervals t distribution chi square distribution. October 23, 2008

Topic 9: Sampling Distributions of Estimators

INSTRUCTIONS (A) 1.22 (B) 0.74 (C) 4.93 (D) 1.18 (E) 2.43

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

The standard deviation of the mean

HYPOTHESIS TESTS FOR ONE POPULATION MEAN WORKSHEET MTH 1210, FALL 2018

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Infinite Sequences and Series

Mathacle. PSet Stats, Concepts In Statistics Level Number Name: Date: Confidence Interval Guesswork with Confidence

MATH/STAT 352: Lecture 15

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

Lecture 2: Monte Carlo Simulation

Sample Size Determination (Two or More Samples)

Understanding Dissimilarity Among Samples

Class 27. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.

Lesson 2. Projects and Hand-ins. Hypothesis testing Chaptre 3. { } x=172.0 = 3.67

1 Inferential Methods for Correlation and Regression Analysis

( ) = p and P( i = b) = q.

Confidence Intervals QMET103

MBACATÓLICA. Quantitative Methods. Faculdade de Ciências Económicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS

Transcription:

Stat, sectio 5. supplemet: The Cetral Limit Theorem otes by Tim Pilachowski Table of Cotets 1. Backgroud 1. Theoretical. Practical. The Cetral Limit Theorem 5. Homework Exercises 7 1. Backgroud Gatherig Data. Iformatio is beig collected ad aalyzed all the time by various groups for a vast variety of purposes. For example, maufacturers test products for reliability ad safety, ews orgaizatios wat to kow which cadidate is ahead i a political campaig, advertisers wat to kow whether their wares appeal to potetial customers, ad the govermet eeds to kow the rate of iflatio ad the percet of uemploymet to predict expeditures. The items icluded i the aalysis ad the people actually aswerig a survey are the sample, while the whole group of thigs or people we wat iformatio about is the populatio. How well does a sample reflect the status or opiios of the populatio? The aswer depeds o may variables, some easier to cotrol tha others. Sources of Bias. Oe source of bias is carelessly prepared questios. For example, if a pollster asks, Do you thik our overworked, uderpaid teachers deserve a raise? the aswers received are likely to reflect the slated way i which the questio was asked! Aother dager is a poorly selected sample. If the perso coductig the survey stads outside a shoppig ceter at pm o a weekday, the sample may ot be represetative of the etire populatio because people at work or people who shop elsewhere have o chace of beig selected. Radom Samplig. To elimiate bias due to poor sample selectio, statisticias isist o radom samplig, i which every member of the populatio is equally likely to be icluded i the sample. For a small populatio, simple radom samplig ca be achieved by puttig everyoe s ame o a slip of paper, placig all the slips i a large bi, mixig well, ad selectig the sample. Techically, each slip should be replaced before the ext slip is draw, so the probability does t chage for subsequet draws. That is, the evets should be idepedet, ad the probability ot coditioal. Of course, the slips should be mixed before each draw, as well. I practice, most populatios beig tested are large eough that the coditioal probabilities do t sigificatly alter the results. Represetative Samples. It is importat to ote that sayig the sample is free of bias does ot guaratee that it will perfectly represet the populatio. A radom sample may, by chace variatio, be very differet from the populatio. However, the mathematics uderlyig statistics allows us to specify how much a radom sample is likely to vary ad how ofte it will be extremely o-represetative. Bias, o the other had, affects the results i ukow ways that caot be quatified or specified i ay such maer. page 1.

Example 1 Suppose that a club has members, ad five are to be radomly selected to represet the club at a regioal coferece. Names of female members are i italics. No-italicized ames belog to male members. Abby Greg Mark Sam Zelda Be Hele Norris Tom Art Charlie Iva Oscar Ursula Bob David Jamal Patricia Vic Curt Eric Kevi Queti Walt Da Frak Larry Rob Yvette Eva Suppose you were to select te differet radom samples of five club members from this populatio of. Eve though exactly 1/5 of the populatio is wome, it is very likely your samples would ot all iclude exactly 1/5 female. Some samples might have o wome, two wome, or maybe eve all five picks would be wome. This does't mea the samplig process was biased it simply illustrates samplig variability. Example Similarly, if you flip a coi te times, you would ot be surprised if it came up heads oly four times or six times rather tha exactly 1/, or five, of the times. If, o the other had, you flipped a coi te times ad got te heads, you probably would be surprised. We kow ituitively that it is ulikely we will get such a extremely urepresetative sample. It is ot impossible, just very rare. We may eve decide to test the coi to determie whether it has bee weighted i some way so that heads are more likely to appear. Samplig variability is also affected by the umber of observatios we iclude. If we flip a coi te times ad get oly heads, or %, we may ot be very surprised. If we flip the same coi 1 times ad oly get heads, however, this is more surprisig. Agai, we may begi to woder whether the coi is fair. Over a larger umber of trials, the percet of heads should be closer to the 5% heads we expect.. Theoretical If you were able to costruct every possible sample of a specified size from the same populatio, you would create what statisticias call a samplig distributio. The Cetral Limit Theorem tells us, i short, that a samplig distributio (based o the theoretical possibility of creatig every possible sample of size ) is close to a ormal distributio. Furthermore, the Cetral Limit Theorem tells us that the peak of a samplig distributio is the populatio mea (symbolized by E() or µ ). The techical statistical term for this true mea is the populatio parameter or just parameter. I cotrast, the mea calculated from a give sample (radom variable, calculated value x ) is a sample statistic or just statistic. The mea of the samplig distributio is the mea of the populatio [that is, E ( ) µ ]. Calculated sample meas (statistics) will vary while they are most likely to occur ear the true populatio mea, they may also be far away from the populatio mea, depedig o the particular characteristics of the particular sample selected. Also, we kow some facts about ormal distributios that will be helpful i determiig how likely it is that a sample statistic occurs ear the populatio parameter. Oe characteristic of ormal probability distributios: 6% of the outcomes are withi 1 stadard deviatio of the mea 95% of the outcomes are withi stadard deviatios of the mea 99.7% of the outcomes are withi stadard deviatios of the mea. page.

What does this mea for radom samplig? It tells us that 6% of the time, a radom sample will give us a result a statistic withi 1 stadard deviatio of the true parameter. We would expect that 95% of the time, a radom sample will give a statistic withi stadard deviatios of the populatio parameter, ad 99.7% of the time, a radom sample will give a statistic withi stadard deviatios of the populatio parameter. I a begiig level statistics course, you would be itroduced to cofidece itervals ad hypothesis tests, each of which makes use of the percets listed above.. Practical Costructig every possible sample of a specified size from the populatio is a theoretical costruct. I practical terms, it s ever goig to happe. Time ad cost would prevet researchers from eve tryig. Likewise, a ifiite umber of trials is a impossibility. However, the Cetral Limit Theorem also helps us i this regard. A sample of size 1 is ot likely to be represetative of the etire populatio. Neither is a sample of size. It makes sese ituitively that a larger sample size would be more likely to be represetative of the populatio. (See Example above). The Cetral Limit Theorem ot oly verifies this ituitio, but also tells us that, as sample size icreases, the shape of the samplig distributio grows closer to the shape of a ormal distributio. The followig series of graphics illustrate this. They were draw by a appelet foud o the website http://www.chem.uoa.gr/applets/appletcetrallimit/appl_cetrallimit.html. I each case, the graph o the left illustrates a samplig distributio for a set of samples of size 1, ad gives a rough idea of the shape of the populatio distributio. For the middle graph the sample size has bee icreased to 1. I the graph o the right, the sample size is. Notice that, o matter what the shape of the populatio distributio, as the sample size icreases, the shape of the samplig distributio becomes very much like the shape of a ormal distributio. Close eough, i fact, that we ca use the ormal distributio table to determie probabilities. page.

. The Cetral Limit Theorem Give a populatio with mea µ ad stadard deviatio : 1) As the sample size icreases, or as the umber of trials approaches ifiite, the shape of a samplig distributio becomes icreasigly like a ormal distributio. ) The mea of samplig distributio is the populatio mea, E ( ) µ. ) The stadard deviatio of samplig distributio, called the stadard error, is. For statistics, a sample size of is usually large eough to use the ormal distributio probability table for hypothesis tests ad cofidece itervals. For the homework exercises below, you ll use the ormal distributio table to fid various probabilities ivolvig sample meas (radom variable ). Example The populatio of fish i a particular pod is kow to have a mea legth µ 15 cm with stadard deviatio 5. a) You catch oe fish from the pod. What is its expected legth? The legths of the fish i the pod exhibit some variability we kow this because the stadard deviatio does ot equal. So, while we caot say with absolute certaity that the fish we catch will be ay specific legth, our best (educated) guess is that we expect the fish to be 15 cm i legth. (This is why the mea is also called expected value.) b) You catch oe fish from the pod. What is the probability that its legth is less tha 1 cm? We caot aswer this questio because we do ot kow the shape of the probability distributio. Specifically, sice it is ot explicitly stated that the distributio is ormal, we caot use the ormal distributio table. c) You catch fifty fish from the pod ad measure their legths. What are the expected value of the sample mea [i.e. E ( )] ad stadard error [i.e. ] for the samplig distributio? While the legth of ay sigle fish from the pod is ot likely to be represetative of the whole populatio, a sample of size 5 is large eough for the Cetral Limit Theorem to apply, ad we ca say how likely it is that this sample will be represetative of the populatio. By the Cetral Limit Theorem, for sample size 5, 5 expected value E ( ) µ 15 ad stadard error.771. 5 d) You catch fifty fish from the pod ad measure their legths. What is the probability that the average legth of those fish (i.e. sample mea ) is less tha 1 cm? µ 1 15 Z 1.1, P( < 1) P(Z < 1.1).79 5 5 e) You catch fifty fish from the pod ad measure their legths. What is the probability that the average legth of those fish (i.e. sample mea ) is betwee 1 cm ad 16.5 cm? µ 16.5 15 Z.1, P( < 16.5) P(Z <.1).9 5 5 P(1 < < 16.5) P( 1.1 < Z <.1) P( < Z <.1) P( < Z < 1.1).9.79.97 page.

Example A populatio exhibits the followig probability distributio: 1 5 P( x)..... a) What are the expected value ad stadard deviatio for a sigle radomly-chose subject from this populatio? E µ 1. +. +. +. + 5. + + 6 + + 1 Var ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( 1 ) (.) + ( ) (.) + ( ) (.) + ( ) (.) + ( 5 ) (.) 1 1.9666 b) What is the probability that a sigle radomly-chose subject from this populatio will exhibit a value of at most? I cotrast to Example, we do kow the shape of this probability distributio, ad ca calculate a aswer directly from the probability distributio. P P 1 + P + P. +. +.. ( ) ( ) ( ) ( ) c) You select a sample of 5 from this populatio. (sample size 5). What are the expected value ad stadard error for the samplig distributio? By the Cetral Limit Theorem, for 5, 1 expected value E ( ) µ ad stadard error.11 5 d) You select a sample of 5 from this populatio. (sample size 5). What is the probability that the sample mea is at most? µ Z.69, P( ) P(Z.69).996 1 5 e) You select a sample of 5 from this populatio. (sample size 5). What is the probability that the sample mea is at least? This is the complemet of the aswer to part d). P( ) P(Z.69) 1.996.6 Note the differece betwee the questios asked i parts a) ad b) as opposed to the questios asked i parts c), d) ad e). Parts a) ad b) asked about oe sigle subject, radomly chose from withi the populatio. The probabilities give i the probability distributio applied. Parts c), d) ad e) asked about 5 subjects, radomly chose from withi the populatio. The sample thus created ca be viewed as oe of the may possible samples of size 5 that could have bee created from the populatio. That is, the sample we eded up with is part of the samplig distributio, ad we were able to make use of the Cetral Limit Theorem. The calculatios i parts a) ad b) ivolved radom variable ad P( x), that is, they focused o oe specific value. The calculatios i parts c), d) ad e) ivolved radom variable ad stadard error, that is, they focused o the mea of a sample of size. page 5.

Example 1 revisited A club has 6 female members ad male members. Five club members are to be radomly selected to represet the club at a regioal coferece. Defie a radom variable umber of wome i the delegatio. The probability distributio (biomial probability) looks like this: 1 5 P( x).77.96..51.6. a) What is the expected umber of wome i the delegatio? E µ (.77) + 1(.96) + (.) + (.51) + (.6) + 5(.) 1 ( ) b) What are the variace ad stadard deviatio for this probability distributio? Var() ( 1) (.77) + (1 1) (.96) + ( 1) (.) + ( 1) (.51) + ( 1) (.6) + (5 1) (.)...9 c) Why would t we apply the Cetral Limit Theorem to Example 1? The sample size is ot large eough. We eed >. The shape of the samplig distributio is ot close eough to ormal to justify usig the ormal distributio table. Rather, we would eed to recogize this as a hypergeometric distributio ad use the appropriate formula. Example revisited You flip a fair coi. Defie a radom variable for tails ad 1 for heads. a) What is the expected value for a sigle toss (sample size 1)? E µ (.5) + 1(.5).5 5% ( ) b) What are the variace ad stadard deviatio of for a sigle toss? V() (.5) (.5) + (1.5) (.5).5.5.5 c) You flip a coi forty times (sample size ). What are the expected value ad stadard error for the samplig distributio? By the Cetral Limit Theorem, for,.5 expected value E ( ) µ. 5 ad stadard error.791 d) What is the probability that the average value for is less tha.? µ..5 Z 1.6, P( <.) P(Z < 1.6).1 Example 5.5 A cotiuous radom variable has probability desity fuctio ( x ) f x o the iterval [, ]. a) What is the probability that a sigle radomly-chose value of will be greater tha 1.? (You ca thik of this as a sample size 1.) P x 1. ( 1.) x dx.71 > 1. 1. page 6.

b) What are the mea ad stadard deviatio for this probability distributio? µ V x x dx x x dx page 7. 1.5 ( ) x x dx. 75.66 5 x 5 9 c) Give a sample of size, what are the expected value ad stadard error for the samplig distributio? By the Cetral Limit Theorem, for, expected value E ( ) µ 1. 5 ad stadard error.169 16 d) Give a sample of size, what is the probability that the sample mea is greater tha 1.? µ 1. 1.5 Z.19 ( > 1.) 16 1 5 P P(Z >.19) 1 P(Z <.19) 1.957.1 e) There is a % probability that the sample mea will be below what value? From the ormal distributio table, the closest we ca get to. is.7995 P(Z <.). µ x 1.5 Z. x. + 1.5 5. Homework Exercises 16 16 9 ( ) 1.615 1. I 1, the mea score for the three sectios of the SAT was µ 159, with a stadard deviatio 9. (source: College Etrace Examiatio Board, College-Boud Seiors: Total Group Profile (Natioal) Report, 1966-67 through 9-1) a) Oe studet takes the SAT. What is her expected score (o the three parts)? b) 1 studets take the SAT. What is the expected value of their average score (i.e. expected value of the sample mea)? What is the stadard error for the samplig distributio? c) 1 studets take the SAT. What is the probability that their average score is less tha 15? d) 1 studets take the SAT. What is the probability that their average score is betwee 15 ad 16? e) 1 hudred studets take the SAT. What is the probability that their average score is above 16?. A experimet has the followig probability distributio: 1 P()..1.. a) What are the expected value ad stadard deviatio for a sigle radomly-chose value of? b) You radomly select a sample of size 5. What is the expected value for the sample mea? What is the stadard error for the samplig distributio? c) You radomly select a sample of size 5. What is the probability that the sample mea is less tha.? d) You radomly select a sample of size 5. What is the probability that the sample mea is betwee. ad.5?

. You flip a coi 1 times. Defie a radom variable for tails ad 1 for heads. a) What are the expected value ad stadard deviatio of for a sigle toss? b) What are the expected value ad stadard error for the samplig distributio? c) What is the probability that the average (mea) value for is betwee.5 ad.6? d) What is the probability that the average (mea) value for is greater tha.6?. You pay $1 to play a game i which you roll oe stadard six-sided die. You lose your dollar if the die is 1,, or. You get your dollar back if the die is a 5, ad if the die is a 6 you get your dollar back plus $ more (total of $). a) Calculate expected value ad stadard deviatio for a sigle toss. (Be sure to iclude the dollar you pay to play the game.) b) If you play the game 1 times, what are the expected value ad stadard error for the samplig distributio? c) If you play the game 1 times, what is the probability that your average outcome will be positive? (That is, you walk away with more moey tha what you had before the game.) d) If you play the game 1 times, your average wiigs have a 9% probability of beig below what value? 1. x a) Fid the expected value ad stadard deviatio for. b) What is the probability that a sigle radomly-chose value of is less tha 1.7? c) Give a sample size of 9, calculate the expected value ad stadard error for the samplig distributio. d) Give a sample size of 9, what is the probability that the sample mea is less tha 1.7? e) Give a sample size of 9, there is a 5% probability that the sample mea is above what value? 5.* A radom variable has probability desity fuctio f ( x), 1 x e 6.* A certai drug is to be rated either effective or ieffective. Suppose lab results idicate that 75% of the time the drug icreases the lifespa of a patiet by 5 years (effective) ad 5% of the time the drug causes a complicatio which decreases the lifespa of a patiet by 1 year (ieffective). As part of a study you admiister the drug to 1 patiets. a) Fid E() ad for a sigle patiet. b) Fid the expected value ad stadard error for the samplig distributio. c) What is the probability that the lifespas of those i the study will be icreased by a average of.55 years or more? 7.* A hoeybee droe has a expected lifespa which is expoetially distributed with mea 5 days. You collect oe thousad hoeybee droes. a) What is the stadard deviatio for the lifespa of a hoeybee droe? b) For a idividual bee what is the probability that its lifespa will be above 7 days? c) What is the probability that the average lifespa for all 1 hoeybee droes will be above 7 days? d) There is a 1% probability that the average lifespa for all 1 hoeybee droes will be less tha what age? *Exercises borrowed (ad i some cases slightly revised) from Justi Wyss-Gallifet. page.