Lab 5 for Math 17: Sampling Distributions and Applications

Recall: The distribution formed by considering the value of a statistic for every possible sample of a given size n from the population is called the sampling distribution of the statistic. It is usually too difficult to enumerate all possible samples and compute all possible values of the statistic by hand, but we can approximate the distribution by taking a large number of samples (via simulation) to help visualize it. Statistical theory tells us the sampling distributions of some common statistics.

1 Coin Activity

Suppose we want to understand how the sample mean year on pennies behaves. The population of pennies we have available for investigation is a collection of 1002 pennies which were obtained from the UMass Five College Credit Union on August 25, 2010 ($10 in pennies was asked for).

What do you think the distribution of year looks like for the population of pennies? Explain.

Obtain a sample of 30 pennies, and compute the sample mean year. What value do you get? We note that due to time constraints, we are not sampling with replacement.

Compare your mean value with the class (class graph). Are the values very different? What does the distribution of sample mean year look like based on the graph?

Do you think looking at roughly 30 samples of size 30 is good enough to tell us about the distribution of sample mean year when n is 30?

2 Sampling Distribution of the Sample Proportion

For the purposes of this example, the bin filled with balls represents the population of all possible birds that could be captured as part of an upcoming study looking for a genetic trait which is known to be harmful to carriers and sometimes fatal to those which exhibit the trait (think of sickle cell anemia, but for birds). Let white balls denote birds that do not have the trait and are also not carriers. Let red balls denote birds that are carriers but do not themselves exhibit the trait, and let green balls denote birds that do exhibit the trait (and so are also carriers).

Looking at the bin, what are your initial guesses as to the composition of this population? ____ % white, ____ % red, ____ % green, with ____ total balls.

With the understanding that you could choose a combination of colors (i.e. red + green = all carriers) and estimate the population proportion for that combination, what combination (or single color) do you want to investigate? (You cannot choose single green vs. white+red.)

What color/combination did the class decide on?

Working in groups of 2, taking turns as appropriate, every group should come get a sample of size 25, 50, and 100 from the bin and count the number of balls meeting the criteria above (class color/combination selected). Both members need to count the number of balls meeting the criteria chosen and agree on the count before you can record your counts for the class. Be sure you take a sample and then return it to the population without losing members! (This also means don't take all three samples at once; do one, then return the balls, then take the second, etc.)

Small (n = 25): ____    Medium (n = 50): ____    Large (n = 100): ____

Explain why this is NOT equivalent to capture-recapture sampling.

What does it look like the counts are close to for each sample size? What proportion is that (roughly)?

(Class values will be entered into R/Rcmdr for analysis.)

What values does the class get as the average of the sample proportions for each sample size?

What values does the class get as the standard deviation of the sample proportions for each sample size?

The population proportion corresponding to the class color/combination is ____ %. Does the sample proportion appear to be an unbiased statistic?

What does the effect of sample size on standard deviation for the sampling distribution of $\hat{p}$ appear to be?
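The bin activity above can also be imitated on a computer. The following is only a sketch under stated assumptions: the bin size (1000 balls) and the number of balls meeting the criteria (300) are made-up values, not the real composition of the class bin, and 5000 repetitions per sample size stand in for the much smaller number of class samples. Each sample is drawn without replacement and the full bin is restored before the next sample, as in the activity.

```python
import random
import statistics

random.seed(17)  # fixed seed so the run is repeatable

# ASSUMPTION: hypothetical bin of 1000 balls, 300 of which meet the chosen criteria.
POPULATION_SIZE = 1000
NUM_MATCHING = 300
bin_of_balls = [1] * NUM_MATCHING + [0] * (POPULATION_SIZE - NUM_MATCHING)
true_p = NUM_MATCHING / POPULATION_SIZE


def simulate_sample_proportions(n, repetitions):
    """Draw `repetitions` samples of size n and return the sample proportions."""
    proportions = []
    for _ in range(repetitions):
        # random.sample draws without replacement and leaves the bin unchanged,
        # so every sample starts from the full population.
        sample = random.sample(bin_of_balls, n)
        proportions.append(sum(sample) / n)
    return proportions


results = {}
for n in (25, 50, 100):
    props = simulate_sample_proportions(n, repetitions=5000)
    results[n] = (statistics.mean(props), statistics.stdev(props))
    print(f"n = {n:3d}: mean of proportions = {results[n][0]:.3f}, "
          f"sd of proportions = {results[n][1]:.3f}")
```

Comparing the printed rows is the computer version of the class questions: is the average of the sample proportions close to the bin's true proportion, and how does the standard deviation change as n grows?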

What shapes do the histograms for each sample size have (this will be hard to tell with our small number of repetitions)?

The Sampling Distribution for $\hat{p}$ can be described as: approximately normal for large sample sizes where p is not too near 0 or 1, with a mean denoted $\mu_{\hat{p}} = p$, the population proportion, and a standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$ (assuming that not more than 10% of the population is used as a sample).

For n = 25, 50, 100, compute the standard deviations for $\hat{p}$ based on the now known population proportion. Do the observed standard deviations for the sample proportions match up?

3 Sampling Distribution of the Sample Mean

For sample means, we will learn about the sampling distribution via an applet (link online). Steer your (Java-enabled) browsers to http://onlinestatbook.com/stat_sim/sampling_dist/index.html

In this applet, when you first hit Begin, a histogram of a normal distribution is displayed at the top of the screen. This is the parent population from which samples are taken (think of it as the bin of balls), except that it shows the distribution. The mean of that distribution is indicated by a small blue line and the median is indicated by a small purple line. Since the mean and median are the same for a normal distribution, the two lines overlap. The red line extends from the mean one standard deviation in each direction. The second histogram displays the sample data. This histogram is initially blank. The third and fourth histograms show the distribution of statistics computed from the sample data. The option N in those histograms is the sample size you are drawing from the population.

We will be exploring the distribution of the sample mean by drawing many samples from the parent distribution and examining the distribution of the sample means we get.

Step 1. Describe the parent population. What distribution is it and what is its mean and standard deviation?

Step 2. You can see the third histogram is already set to Mean, with a sample size of N = 5. Click Animated sample once. The animation shows five observations being drawn from the parent distribution. Their mean is computed and dropped down onto the third histogram. For your sample, what was the sample mean?

Step 3. Click Animated sample again. A new set of five observations is drawn, and their mean is computed and dropped as the second sample mean onto the third histogram. What did the mean of the sample means (yes, we are interested in the mean of sample means as part of the sampling distribution) change to?

Step 4. Click Animated sample one more time. What did the mean of the sample means update to now?

Step 5. Click 10,000. This takes 10,000 samples at once (no more animation) and will place those 10,000 sample means on the third histogram and update the mean and standard deviation of the sample means. Record the mean and standard deviation of the sample means. What shape does this third histogram have? How do these findings compare to the parent distribution?

Step 6. Hit Clear Lower 3 in the upper right corner. Change N = 5 to N = 25 for the third histogram. Do animated sample at least once (convince yourself it is actually drawing samples of 25 now). Then take 10,000 at once. Record the mean and standard deviation of the sample means. What shape does the third histogram have? How do these findings compare to the parent distribution?

Step 7. Compare the different standard deviations from Steps 5 and 6. What effect does sample size appear to have on the standard deviation of the sample means?

Step 8. Hit Clear Lower 3. Change the parent distribution to Skewed. What are the new mean and standard deviation of the parent distribution? In which direction is this distribution skewed?

Step 9. Set N back to 5 for the third histogram. Set Mean and N = 25 for the fourth histogram. Hit 10,000 at once. (This will take 10,000 samples of size 5, compute the sample means and put those means in the third histogram, as well as take 10,000 samples of size 25, compute the sample means and put those means in the fourth histogram.) What do the distributions look like for the third and fourth histograms? Are they skewed like the parent population? What are the means and standard deviations for each histogram?

Step 10. Hit Clear Lower 3. Change the parent distribution to Custom. Draw in a custom distribution (left click and drag the mouse over the top histogram). Sketch your custom distribution below. What are its mean and standard deviation?

Step 11. Hit 10,000 at once (leave the settings on the third and fourth histograms alone). (You could take animated once to convince yourself it was really drawing from your new distribution.) What do the third and fourth histograms look like? Anything like the parent distribution? What are their means and standard deviations?

The Sampling Distribution for the sample mean, $\bar{X}$, can be described as having a mean $\mu_{\bar{X}} = \mu$, the population mean, and a standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. The distribution is exactly normal if the parent population is normal. Finally, the Central Limit Theorem tells us the distribution will be approximately normal with the mean and standard deviation stated above if n is sufficiently large, even if the population distribution is not normal.

4 Application Example

A rental car company is interested in the number of miles put on their rental cars by their clients as part of a project where they may trade in some cars in the Cash for Clunkers program. From past experience, they believe the population distribution of mileage has a mean of 60 miles and a standard deviation of 60 miles. They obtain a random sample of 50 mileages from their rental car fleet and obtain a sample mean of 73.31 miles. The company executives are worried: has the average number of miles put on the cars gone up? Your job is to help them figure out if the data suggest an increase in the average number of miles put on the cars.

a. What is the sampling distribution of the sample mean mileage put on rental cars? (Give distribution type, mean, and standard deviation.) What result allows you to provide this distribution?

b. What is the probability you would see a sample mean of 73.31 or greater if the population mean and standard deviation were both really 60?

c. Would you tell the executives that the average number of miles put on the rental cars has increased? (How unusual is 73.31 if the mean is really 60, assuming the standard deviation is correct?)

d. In practice, do you think the standard deviation of the parent distribution would be known? How would you get around it being unknown? What value could you substitute for σ in our calculations relating to the CLT? This swap and its consequences will be a focus of our discussions next week as we start developing confidence intervals.
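The claims about the mean and standard deviation of the sample mean can be checked by simulation, much as the applet does. The sketch below borrows the mean (60), standard deviation (60), and sample size (50) from the rental car example, but the shape of the parent population is an assumption: an exponential distribution is used purely for illustration because it is skewed and its mean equals its standard deviation. The handout does not say that mileage follows this distribution.

```python
import math
import random
import statistics

random.seed(5)  # fixed seed so the run is repeatable

MU = 60.0             # population mean from the example
SIGMA = 60.0          # population standard deviation from the example
N = 50                # sample size from the example
REPETITIONS = 10_000  # same count as the applet's 10,000 button


def sample_mean(n):
    """Mean of n draws from the assumed parent population."""
    # ASSUMPTION: exponential parent with mean MU (its sd is then also MU).
    draws = [random.expovariate(1 / MU) for _ in range(n)]
    return statistics.fmean(draws)


sample_means = [sample_mean(N) for _ in range(REPETITIONS)]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
theoretical_sd = SIGMA / math.sqrt(N)  # sigma / sqrt(n)

print(f"mean of sample means: {observed_mean:.2f} (theory: {MU:.2f})")
print(f"sd of sample means:   {observed_sd:.2f} (theory: {theoretical_sd:.2f})")
```

A histogram of `sample_means` (not drawn here) would be the analogue of the applet's third histogram; it should look far less skewed than the parent population.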

5 More Applications

1. Suppose 40 percent of the voters in a large city prefer candidate Q for mayor. A random sample of 2400 city voters is taken.

a. What is the sampling distribution of the sample proportion of city voters who prefer candidate Q for mayor? Check that this distribution is valid.

b. What is the probability that the sample taken results in a sample proportion of 0.426 or higher?

2. A researcher is investigating deaths among a new invasive species of beetles treated with various insecticides. Age at death is recorded for fully matured adult beetles at various doses of insecticides. Since only fully matured adult beetles are included, and because ages at death are usually not bell-shaped, the researcher records age at death for 50 beetles at each insecticide/dosage level to help study average age at death.

a. What is the significance of studying 50 beetles at each treatment level if you know you want to examine the sample mean?

b. Suppose the population mean age at death for the beetle population at a specific treatment level is 20 days with a population standard deviation of 3 days. What is the sampling distribution of the sample mean for that treatment level for the sample of size 50 taken?

c. What is the probability a sample of size 50 results in a sample mean between 19 and 21?

6 To Turn In

In a recent election, 62 percent of voters voted in favor of a new law. A related law is coming up for a vote in a neighboring state. A random sample of 80 voters in the neighboring state reveals that 43 of the 80 are in favor of the related law. If the percent in favor is really the same in both states, how unusual is the result of the sample poll or something more extreme (for direction of extreme use smaller values)?
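The sample proportion questions in this lab all follow the same pattern: find the mean and standard deviation of the sampling distribution, check that a normal model is reasonable, then find a tail probability. A minimal sketch of that pattern follows. The numbers (p = 0.30, n = 400, observed proportion 0.33) are made up and are not taken from any problem above, and the "at least 10 expected successes and 10 expected failures" check is one common rule of thumb, assumed here because the lab only asks for a large n with p not too near 0 or 1.

```python
import math
from statistics import NormalDist


def proportion_sampling_distribution(p, n):
    """Return (mean, sd, conditions_ok) for the sample proportion."""
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    # ASSUMPTION: rule-of-thumb check for the normal approximation.
    conditions_ok = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, conditions_ok


def upper_tail_probability(p, n, observed):
    """Approximate P(sample proportion >= observed) with a normal model."""
    mean, sd, conditions_ok = proportion_sampling_distribution(p, n)
    if not conditions_ok:
        raise ValueError("normal approximation is questionable for this p and n")
    return 1 - NormalDist(mean, sd).cdf(observed)


# Hypothetical values, for illustration only.
p, n, observed = 0.30, 400, 0.33
mean, sd, ok = proportion_sampling_distribution(p, n)
z = (observed - mean) / sd
prob = upper_tail_probability(p, n, observed)
print(f"mean = {mean:.3f}, sd = {sd:.4f}, z = {z:.2f}, upper tail = {prob:.4f}")
```

For a lower-tail question, `NormalDist(mean, sd).cdf(observed)` gives the probability of a sample proportion at or below the observed value.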