STAT 203 Chapter 18 Sampling Distribution Models


Population vs. sample, parameter vs. statistic

Recall that a population contains the entire collection of individuals that one wants to study, and a sample is a subset of individuals selected from a population. A parameter is a numerical summary of a population. Its counterpart for a sample is a statistic. The value of a parameter is fixed yet unknown in practice. We use statistics to estimate population parameters. Due to sampling variability (variation from sample to sample), a statistic takes on different values for different samples.

Sampling Distribution of Proportions (Percentages)

A local burger store is interested in finding out the proportion of vegetarian customers (who most likely purchase veggie burgers). It randomly samples 500 customers over a month and asks each whether he/she is vegetarian. Here, the population of interest is all customers visiting the burger store, and the 500 randomly chosen customers make up the sample. The parameter (a numerical summary of a population) is the proportion of all customers who are vegetarian, and the statistic (a numerical summary of a sample) is the sample proportion of customers who are vegetarian.

Sample data (of the burger store's sample):

Customer   Vegetarian?
1          No
2          No
3          No
4          No
5          Yes
...        ...
499        Yes
500        No

Suppose there are 32 vegetarian customers in the sample. The sample proportion of vegetarian customers is 32/500 = 0.064. How reliable is this sample proportion as an estimate of the true proportion of vegetarian customers?

The burger store has drawn one random sample of size 500. Imagine the sampling procedure is repeated, so that many more random samples of size 500 are drawn. For each of these samples, we have a sample proportion whose value will differ from sample to sample.

Repeated samples data (of the many more samples):

Sample   Sample proportion
1        0.064
2        0.048
3        0.050
4        0.070
5        0.068

Things to think about...
1. Where are the sample proportion values centered?
2. How spread out are the sample proportion values?
3. What is the shape of the distribution of the sample proportion values?

The true proportion of individuals sharing a certain characteristic in a population is the population proportion $p$ (which is a parameter), $0 < p < 1$. For a sample of individuals randomly selected from the population, the sample proportion (which is a statistic) is given by

$$\hat{p} = \frac{\text{number of individuals sampled who have the characteristic}}{\text{sample size}}$$

The value of the population proportion $p$ is fixed but is usually unknown. The sample proportion $\hat{p}$ is used to estimate the true population proportion. Due to sampling variation, $\hat{p}$ varies across samples, and is unlikely to be exactly equal to $p$. How close will $\hat{p}$ be to $p$? It would be useful if we could make probability statements about the proximity of $\hat{p}$ to $p$.
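The repeated-sampling thought experiment can be mimicked on a computer. The sketch below is only an illustration: it assumes a true proportion of p = 0.06, a value chosen for the simulation (the burger store would not know it), and the function name simulate_sample_proportions is hypothetical. It draws 2000 random samples of size 500 and reports where the sample proportions center and how spread out they are.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    proportions = []
    for _ in range(num_samples):
        # Each customer is vegetarian with probability p, independently.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Assumed true proportion (hypothetical; unknown in practice).
p_hats = simulate_sample_proportions(p=0.06, n=500, num_samples=2000)
center = statistics.mean(p_hats)
spread = statistics.stdev(p_hats)
print(f"center of the sample proportions: {center:.4f}")
print(f"spread (SD) of the sample proportions: {spread:.4f}")
```

A histogram of p_hats would address the third question, the shape of the distribution.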

Sampling distribution of $\hat{p}$:

The sampling distribution of proportions is the distribution of the sample proportions of all possible random samples of size $n$ that can be obtained from a population.

The mean $\mu(\hat{p})$ of the sampling distribution of $\hat{p}$ is equal to $p$. In other words, the sample proportions from repeated random samples of size $n$ have a mean equal to the population proportion $p$.

The standard deviation $\sigma(\hat{p})$ of the sampling distribution of $\hat{p}$ is equal to

$$\sigma(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}, \quad \text{or} \quad \sqrt{\frac{pq}{n}} \ \text{ where } q = 1 - p.$$

When $p$ is unknown, $\sigma(\hat{p})$ is estimated by replacing $p$ with $\hat{p}$. We call this estimated $\sigma(\hat{p})$ the standard error of $\hat{p}$:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \quad \text{or} \quad \sqrt{\frac{\hat{p}\hat{q}}{n}} \ \text{ where } \hat{q} = 1 - \hat{p}.$$

For sufficiently large samples, the sampling distribution of $\hat{p}$ is approximately normal. The larger the sample size, the better the normal approximation.

Assumptions and conditions for the validity of the normal approximation are:
1. the sample is randomly drawn from the population.
2. the individual values in the sample are independent. (Individuals are drawn without replacement from the population, so independence can never be achieved exactly. But this assumption is reasonable as long as the sample size is no greater than 10% of the population size.)
3. the sample size has to be large. (It is sufficient to check the conditions $np > 10$ and $n(1-p) > 10$.)

Sampling Distribution of Means

An Enrolment Services staff member at an institution is interested in finding the mean GPA of students of the institution. A sample of 100 students is randomly drawn from all students, and their academic records are retrieved. All students of the institution comprise the population, and the 100 students selected comprise the sample. The parameter is the mean GPA of all students of the institution, and the statistic is the sample mean GPA.

Sample data (obtained by the staff):

Student   GPA
1         3.0
2         3.3
3         2.7
4         4.0
5         2.0
...       ...
99        1.7
100       3.7

Suppose the staff's sample gives a mean of 2.4. How reliable is this sample mean as an estimate of the true mean GPA of all students of the institution?

The staff has drawn one random sample of size 100. Imagine the sampling procedure is repeated, so that many more random samples of size 100 are drawn. For each of these samples, we have a sample mean whose value will differ from sample to sample.

Repeated samples data (of the many more samples):

Sample   Sample mean
1        2.4
2        2.6
3        2.0
4        2.3
5        3.6

Things to think about...
1. Where are the sample mean values centered?
2. How spread out are the sample mean values?
3. What is the shape of the distribution of the sample mean values?

The population mean $\mu$ is a parameter, which is fixed but usually unknown. The sample mean $\bar{y}$ is a statistic and is used to estimate the true population mean $\mu$. Due to sampling variation, $\bar{y}$ varies across samples, and is unlikely to be exactly equal to $\mu$. How close will $\bar{y}$ be to $\mu$?

Sampling Distribution of Means:

The sampling distribution of means is the distribution of the means of all the possible random samples of size $n$ that could be selected from a population.

Suppose a random sample of $n$ subjects is to be drawn from a population, and the observation on a subject ($y$) in the population follows a distribution with mean $\mu$ and standard deviation $\sigma$.

The mean of the sampling distribution of means is represented by $\mu(\bar{y})$, and is equal to $\mu$. Equivalently, let $y_1, y_2, \ldots, y_n$ be a random sample from some population with mean $\mu$. The sample means from repeated random samples of size $n$ drawn from this population have a mean equal to the population mean $\mu$.

The standard deviation of the sampling distribution of means is represented by $\sigma(\bar{y})$. It is given by

$$\sigma(\bar{y}) = \frac{\sigma}{\sqrt{n}}.$$

Equivalently, let $y_1, y_2, \ldots, y_n$ be a random sample from a population with mean $\mu$ and standard deviation $\sigma$. The sample means from repeated random samples of size $n$ drawn from this population have a standard deviation equal to $\sigma/\sqrt{n}$.

When $\sigma$ is unknown, $\sigma(\bar{y})$ is estimated by replacing $\sigma$ with the sample SD $s$. We call this estimated $\sigma(\bar{y})$ the standard error of $\bar{y}$:

$$SE(\bar{y}) = \frac{s}{\sqrt{n}}.$$

The larger the sample size, the smaller the standard deviation of the sample means, and the better the approximation of the normal model to the sampling distribution of $\bar{y}$.

The Central Limit Theorem (CLT):

Let $y_1, y_2, \ldots, y_n$ be independent values of a random sample from some population with mean $\mu$ and standard deviation $\sigma$. For sufficiently large samples, the sample mean $\bar{y}$ follows approximately the normal model with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, even if the underlying distribution of the individual observations ($y$'s) in the population is not normal.

Assumptions and conditions for the validity of the CLT are:
1. the sample is randomly drawn from the population.
2. the individual values in the sample are independent. (The sample size should be no greater than 10% of the population size.)
3. the sample size has to be sufficiently large.

If the underlying distribution is normal, the sample mean $\bar{y}$ follows the normal model with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$ regardless of the sample size. The normality of the sample mean in this case is not a result of the CLT.
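A small simulation can make the CLT concrete. The sketch below assumes, purely for illustration, a right-skewed population (exponential with mean 1 and standard deviation 1). Neither that population nor the function name simulate_sample_means comes from these notes. It draws 2000 samples of size n = 50 and compares the mean and SD of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import random
import statistics
from math import sqrt


def simulate_sample_means(n, num_samples, seed=0):
    """Return the means of num_samples random samples of size n drawn
    from an exponential population with mean 1 and SD 1 (an assumption)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


MU, SIGMA = 1.0, 1.0  # parameters of the assumed exponential population
n = 50
sample_means = simulate_sample_means(n, num_samples=2000)
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
print(f"mean of sample means: {mean_of_means:.3f} (mu = {MU})")
print(f"SD of sample means:   {sd_of_means:.3f} "
      f"(sigma/sqrt(n) = {SIGMA / sqrt(n):.3f})")
```

Although the individual observations are strongly skewed, a histogram of sample_means should look roughly bell-shaped, and more so as n grows.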

Examples

1. It is generally believed that nearsightedness affects about 12% of children. A school district gives vision tests to 144 incoming kindergarten children.
(a) Describe the sampling distribution model for the sample proportion by naming the model and telling its mean and standard deviation. Justify your answer.
(b) Sketch and clearly label the model.
(c) What is the probability that in this group over 15% of the children will be found to be nearsighted?

2. A recent study involving attrition rates at a major university has shown that 43% of all incoming freshmen do not graduate within 4 years of entrance.
(a) Describe the sampling distribution of the sample proportion of 200 randomly selected freshmen who will graduate within the next 4 years. State any assumption(s) made to reach your answer.
(b) What is the approximate probability that the percentage of sampled freshmen graduating within 4 years will be between 50% and 64%?

3. Your mail-order company advertises that it ships 90% of its orders within three working days. You select a random sample of 120 orders for an audit. The audit reveals that 98 out of the 120 were shipped on time.
(a) Find the sample proportion of orders that were shipped on time.
(b) If the company really ships 90% of its orders on time, what is the probability that the proportion in a random sample of 120 orders is smaller than or equal to the proportion in your audit sample? Do you think the company's claim is trustworthy?

4. A random sample of $n = 100$ observations is selected from a population with $\mu = 30$ and $\sigma = 16$.
(a) Describe the sampling distribution of the mean $\bar{y}$.
(b) Approximate the following probabilities:
i. $\bar{y}$ is greater than 28
ii. $\bar{y}$ is between 22.1 and 26.8

5. The ages of U.S. commercial aircraft have a mean of 13.0 years and a standard deviation of 7.9 years (based on data from Aviation Data Services). The Federal Aviation Administration randomly selects 36 commercial aircraft for special stress tests.
(a) Describe the sampling distribution of the mean age of a sample of 36 aircraft.
(b) Find the probability that the mean age of this sample group is greater than 15.0 years.
(c) Is the probability calculated in part (b) an exact or an approximate probability? Justify your answer.

6. A bottling company uses a filling machine to fill plastic bottles with cola. A bottle should contain 300 ml. In fact, the contents vary according to the normal model with mean 298 ml and standard deviation 3 ml.
(a) What is the probability that an individual bottle contains less than 295 ml?
(b) What is the probability that the mean contents of the bottles in a six-pack is less than 295 ml?
(c) Within what range of values does the mean contents of the bottles in a 12-pack have a 95% chance of falling?
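Probabilities like those asked for in the examples can be computed from the normal model once its mean and standard deviation are known. The following is a sketch, not an official solution. It works Example 1(c) and Example 6 with Python's standard-library statistics.NormalDist. The helper names proportion_model and mean_model are hypothetical, and Example 6(c) is read as asking for the central 95% range, which is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def proportion_model(p, n):
    """Approximate normal model for a sample proportion (large-sample case)."""
    if n * p <= 10 or n * (1 - p) <= 10:
        raise ValueError("condition np > 10 and n(1 - p) > 10 is not met")
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


def mean_model(mu, sigma, n):
    """Normal model for a sample mean: mean mu, standard deviation sigma/sqrt(n)."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))


# Example 1(c): p = 0.12, n = 144, probability that the sample proportion exceeds 0.15.
nearsighted = proportion_model(p=0.12, n=144)
prob_over_15 = 1 - nearsighted.cdf(0.15)
print(f"Example 1: SD = {nearsighted.stdev:.4f}, P(p-hat > 0.15) = {prob_over_15:.3f}")

# Example 6: bottle contents follow a normal model with mean 298 ml and SD 3 ml.
p_single = mean_model(298, 3, n=1).cdf(295)    # (a) one bottle
p_six_pack = mean_model(298, 3, n=6).cdf(295)  # (b) mean of six bottles
twelve_pack = mean_model(298, 3, n=12)
# (c) central 95% range for the 12-pack mean (assumed interpretation).
low, high = twelve_pack.inv_cdf(0.025), twelve_pack.inv_cdf(0.975)
print(f"Example 6(a): {p_single:.4f}")
print(f"Example 6(b): {p_six_pack:.4f}")
print(f"Example 6(c): {low:.2f} ml to {high:.2f} ml")
```

Because the bottle contents are themselves normally distributed, the Example 6 probabilities follow from the model directly and do not rely on the CLT. The Example 1 probability is a large-sample approximation.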