Chapter 18 Summary Sampling Distribution Models

Unit 5 Introduction to Inference

Chapter 18 Summary Sampling Distribution Models

What have we learned?

Sample proportions and means will vary from sample to sample; that's sampling error (sampling variability). Sampling variability may be unavoidable, but it is also predictable! We've learned to describe the behavior of sample proportions when our sample is random and large enough to expect at least 10 successes and failures. We've also learned to describe the behavior of sample means (thanks to the CLT!) when our sample is random (and larger if our data come from a population that's not roughly unimodal and symmetric).

Modeling the Distribution of Sample Proportions

Rather than showing real repeated samples, imagine what would happen if we were to actually draw many samples. Now imagine what would happen if we looked at the sample proportions for these samples. What would the histogram of all the sample proportions look like? We would expect the histogram of the sample proportions to center at the true proportion, p, in the population. As far as the shape of the histogram goes, we can simulate a bunch of random samples that we didn't really draw. It turns out that the histogram is unimodal, symmetric, and centered at p. More specifically, it's an amazing and fortunate fact that a Normal model is just the right one for the histogram of sample proportions.

To use a Normal model, we need to specify its mean and standard deviation. The mean of this particular Normal is at p. When working with proportions, knowing the mean automatically gives us the standard deviation as well: the standard deviation we will use is $\sqrt{\frac{pq}{n}}$, where $q = 1 - p$. So, the distribution of the sample proportions is modeled with a probability model that is $N\left(p, \sqrt{\frac{pq}{n}}\right)$.

(Figure: the Normal model for the distribution of sample proportions, centered at p. Image not available.)

How Good Is the Normal Model?

The Normal model becomes a better model for the distribution of sample proportions as the sample size gets bigger. Just how big of a sample do we need? This will soon be revealed.

AP Statistics Page 1 2007
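The idea of imagining many repeated samples can be tried out with a short simulation. The sketch below is only an illustration: the true proportion 0.3, the sample size 200, and the 5,000 repetitions are made-up values, and the function name is hypothetical. It draws many random samples, records each sample proportion, and compares the center and spread of those proportions with the mean p and standard deviation sqrt(pq/n) that the Normal model predicts.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each individual is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Assumed values, chosen only for illustration.
p = 0.3
n = 200
props = simulate_sample_proportions(p, n, num_samples=5000)

model_sd = math.sqrt(p * (1 - p) / n)  # SD predicted by the Normal model

print("mean of simulated proportions:", round(statistics.mean(props), 4))
print("SD of simulated proportions:  ", round(statistics.stdev(props), 4))
print("model mean p:                 ", p)
print("model SD sqrt(pq/n):          ", round(model_sd, 4))
```

If the model is reasonable, the simulated mean should land close to p and the simulated standard deviation close to sqrt(pq/n). A histogram of `props` would show the unimodal, symmetric shape described above.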

Assumptions and Conditions

Most models are useful only when specific assumptions are true. There are two assumptions in the case of the model for the distribution of sample proportions:
1. The sampled values must be independent of each other.
2. The sample size, n, must be large enough.

Assumptions are hard, often impossible, to check. That's why we assume them. Still, we need to check whether the assumptions are reasonable by checking conditions that provide information about the assumptions. The corresponding conditions to check before using the Normal to model the distribution of sample proportions are the 10% Condition and the Success/Failure Condition.
1. 10% condition: If sampling has not been made with replacement, then the sample size, n, must be no larger than 10% of the population.
2. Success/failure condition: The sample size has to be big enough so that both $n\hat{p}$ and $n\hat{q}$ are at least 10.

So, we need a large enough sample that is not too large.

A Sampling Distribution Model for a Proportion

A proportion is no longer just a computation from a set of data.
  o It is now a random quantity that has a distribution.
  o This distribution is called the sampling distribution model for proportions.
Even though we depend on sampling distribution models, we never actually get to see them.
  o We never actually take repeated samples from the same population and make a histogram. We only imagine or simulate them.
Still, sampling distribution models are important because
  o they act as a bridge from the real world of data to the imaginary world of the statistic and
  o enable us to say something about the population when all we have is data from the real world.
Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with
  o Mean: $\mu(\hat{p}) = p$
  o Standard deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$

What About Quantitative Data?

Proportions summarize categorical variables. The Normal sampling distribution model looks like it will be very useful. Can we do something similar with quantitative data? We can indeed. Even more remarkably, we can use not only all of the same concepts but almost the same model.

Simulating the Sampling Distribution of a Mean

Like any statistic computed from a random sample, a sample mean also has a sampling distribution. We can use simulation to get a sense of what the sampling distribution of the sample mean might look like.
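Before turning to means, the two conditions for proportions listed earlier (the 10% Condition and the Success/Failure Condition) can be written as a small check. This is a minimal sketch, not a standard routine: the function name, its arguments, and the example counts are all hypothetical. It works with the count of successes directly, since the number of successes equals n times the sample proportion.

```python
def check_proportion_conditions(successes, n, population_size=None):
    """Return which conditions for the Normal model of a proportion appear to hold.

    population_size=None is taken to mean the population is effectively
    unlimited (or sampling was done with replacement), so the 10% Condition
    is treated as satisfied.
    """
    failures = n - successes
    # 10% Condition: the sample is no larger than 10% of the population.
    ten_percent_ok = population_size is None or 10 * n <= population_size
    # Success/Failure Condition: at least 10 successes and at least 10 failures.
    success_failure_ok = successes >= 10 and failures >= 10
    return {
        "ten_percent": ten_percent_ok,
        "success_failure": success_failure_ok,
        "normal_model_reasonable": ten_percent_ok and success_failure_ok,
    }


# Made-up example: 45 successes in a sample of 150 from a population of 5,000.
print(check_proportion_conditions(successes=45, n=150, population_size=5000))
```

The independence assumption itself cannot be verified by code like this; it depends on how the data were gathered.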

Means: The Average of One Die

Let's start with a simulation of 10,000 tosses of a die and look at a histogram of the results. We then look at the average of two dice, three dice, 5 dice, and 20 dice, each after a simulation of 10,000 tosses.

(Figures: histograms of the simulated averages for 1, 2, 3, 5, and 20 dice. Images not available.)

Means: What the Simulations Show

As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean.
  o So, we see the shape continuing to tighten around 3.5.
And, it probably does not shock you that the sampling distribution of a mean becomes Normal.

The Fundamental Theorem of Statistics

The sampling distribution of any mean becomes more nearly Normal as the sample size grows.
  o All we need is for the observations to be independent and collected with randomization.
  o We don't even care about the shape of the population distribution!
The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).
The CLT is surprising and a bit weird:
  o Not only does the histogram of the sample means get closer and closer to the Normal model as the sample size grows, but this is true regardless of the shape of the population distribution.
The CLT works better (and faster) the closer the population model is to a Normal itself. It also works better for larger samples.

The Central Limit Theorem (CLT): The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.
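The dice simulation described above can be reproduced in a few lines. The sketch below assumes a fair six-sided die (population mean 3.5, population standard deviation sqrt(35/12), about 1.71); the function name and the fixed seed are illustrative choices. For each number of dice it averages the faces over 10,000 simulated tosses and reports the center and spread of those averages.

```python
import math
import random
import statistics

DIE_MEAN = 3.5                 # population mean of a fair die
DIE_SD = math.sqrt(35 / 12)    # population SD of a fair die


def simulate_dice_averages(num_dice, num_trials=10_000, seed=0):
    """Toss num_dice fair dice num_trials times; return the average of each toss."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_trials)
    ]


results = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    center = statistics.mean(averages)
    spread = statistics.stdev(averages)
    results[num_dice] = (center, spread)
    predicted = DIE_SD / math.sqrt(num_dice)  # sigma / sqrt(n)
    print(f"{num_dice:>2} dice: mean={center:.3f}  SD={spread:.3f}  sigma/sqrt(n)={predicted:.3f}")
```

The centers should stay near 3.5 while the spread shrinks roughly like the population standard deviation divided by the square root of the number of dice, which is the tightening the histograms show.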

But Which Normal?

The CLT says that the sampling distribution of any mean or proportion is approximately Normal. But which Normal model?
  o For proportions, the sampling distribution is centered at the population proportion.
  o For means, it's centered at the population mean.
But what about the standard deviations?
The Normal model for the sampling distribution of the mean has a standard deviation equal to $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$, where σ is the population standard deviation.
The Normal model for the sampling distribution of the proportion has a standard deviation equal to $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$.

Assumptions and Conditions

The CLT requires remarkably few assumptions, so there are few conditions to check:
1. Random Sampling Condition: The data values must be sampled randomly or the concept of a sampling distribution makes no sense.
2. Independence Assumption: The sample values must be mutually independent. (When the sample is drawn without replacement, check the 10% condition.)
3. Large Enough Sample Condition: There is no one-size-fits-all rule.

Diminishing Returns

The standard deviation of the sampling distribution declines only with the square root of the sample size. While we'd always like a larger sample, the square root limits how much we can make a sample tell about the population. (This is an example of the Law of Diminishing Returns.)

Standard Error

Both of the sampling distributions we've looked at are Normal.
  o For proportions, $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$
  o For means, $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$
When we don't know p or σ, we're stuck, right?
  o Nope. We will use sample statistics to estimate these population parameters.
  o Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.
For a sample proportion, the standard error is $SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$.
For the sample mean, the standard error is $SE(\bar{y}) = \frac{s}{\sqrt{n}}$, where s is the sample standard deviation.

Sampling Distribution Models

Always remember that the statistic itself is a random quantity.
  o We can't know what our statistic will be because it comes from a random sample.
Fortunately, for the mean and proportion, the CLT tells us that we can model their sampling distribution directly with a Normal model.
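The two standard errors, and the diminishing returns from larger samples, can be computed directly from sample data. The following is a sketch under assumed inputs: the function names, the counts, and the small data set are hypothetical and serve only to show the arithmetic.

```python
import math
import statistics


def se_proportion(successes, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)


def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(values))


# Made-up example: 50 successes out of 100, then the same proportion with 4x the data.
se_small = se_proportion(50, 100)
se_large = se_proportion(200, 400)
print("SE with n = 100:", se_small)
print("SE with n = 400:", se_large)  # quadrupling n only halves the SE

# Made-up quantitative sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("SE of the sample mean:", se_mean(data))
```

Because the standard error shrinks with the square root of n, cutting it in half takes four times as much data.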

There are two basic truths about sampling distributions:
1. Sampling distributions arise because samples vary. Each random sample will have different cases and, so, a different value of the statistic.
2. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.

The Process Going Into the Sampling Distribution Model

(Figure: diagram of the process going into the sampling distribution model. Image not available.)

What Can Go Wrong?

Don't confuse the sampling distribution with the distribution of the sample.
  o When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.
  o The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn't get.
Beware of observations that are not independent.
  o The CLT depends crucially on the assumption of independence.
  o You can't check this with your data; you have to think about how the data were gathered.
Watch out for small samples from skewed populations.
  o The more skewed the distribution, the larger the sample size we need for the CLT to work.
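The warning about small samples from skewed populations can be illustrated with a simulation. This sketch assumes an exponential population (strongly right-skewed) purely as an example; the function names, the sample sizes 2 and 50, and the 5,000 repetitions are hypothetical choices. It measures how skewed the simulated sampling distribution of the mean still is at each sample size.

```python
import random
import statistics


def skewness(values):
    """Average cubed standardized deviation; near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


def simulate_sample_means(n, num_samples=5000, seed=0):
    """Means of num_samples random samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


skew_small = skewness(simulate_sample_means(n=2))
skew_large = skewness(simulate_sample_means(n=50))
print("skewness of sample means, n = 2: ", round(skew_small, 2))
print("skewness of sample means, n = 50:", round(skew_large, 2))
```

With samples of size 2 the sampling distribution is still clearly skewed, so a Normal model would fit poorly; with samples of size 50 the skewness is much closer to zero.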