ST 371 (IX): Theories of Sampling Distributions


1 Sample, Population, Parameter and Statistic

The major use of inferential statistics is to use information from a sample to infer characteristics about a population. A population is the complete collection of subjects to be studied; it contains all subjects of interest. A sample is a part of the population of interest, a sub-collection selected from the population. A parameter describes a characteristic of a population, while a statistic describes a characteristic of a sample. In general, we will use a statistic to infer the value of a parameter.

Unbiased sample: a sample is unbiased if every individual or element in the population has an equal chance of being selected.

Next we discuss several examples that arise in survey sampling.

1. Survey in a presidential election.

   (a) Option I: Call all registered voters on the phone and ask them who they will vote for. Although this would provide a very accurate result, it would be a very tedious and time-consuming project.

   (b) Option II: Call 4 registered voters, 1 in each time zone, and ask them who they will vote for. Although this is a very easy task, the results would not be very reliable.

   (c) Option III: Randomly select 20,000 registered voters and poll them. The population of interest here is all registered voters, and the parameter is the percentage of them that will vote for a candidate. The sample is the 20,000 registered voters that were polled, and the statistic is the percentage of them that will vote for a candidate.

2. Kathy wants to know how many students in her city use the internet for learning purposes. She used an email poll.

Based on the replies to her poll, she found that 83% of those surveyed used the internet. Kathy's sample is biased because she surveyed only students who already use the internet. She should have randomly selected a few schools and colleges in the city to conduct the survey.

3. Another classic example of a biased sample, and of the misleading results it can produce, occurred in 1936. In the early days of opinion polling, the American Literary Digest magazine collected over two million postal surveys and predicted that the Republican candidate in the U.S. presidential election, Alf Landon, would beat the incumbent president, Franklin Roosevelt, by a large margin. The result was the exact opposite. The Literary Digest sample was drawn from readers of the magazine, supplemented by records of registered automobile owners and telephone users. This sample overrepresented the wealthy, who, as a group, were more likely to vote for the Republican candidate. In contrast, a poll of only 50 thousand citizens selected by George Gallup's organization successfully predicted the result, leading to the popularity of the Gallup poll.

Conclusion: To use a sample to make inferences about a population, the sample should be representative of the population (unbiased).

2 Statistics and their Distributions

A statistic is a random variable, denoted by an upper-case letter, whose value can be computed from sample data. We often use a statistic to infer the value of a parameter. Examples include:

Measures of location: Suppose we observe n realizations x_1, ..., x_n of a random variable X. The sample mean is

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.

In contrast, the population mean is E(X) = \mu.

The sample median: let x_{(1)}, ..., x_{(n)} denote the ordered values. If n is odd, then

    \tilde{x} = x_{((n+1)/2)}.

If n is even,

    \tilde{x} = \frac{1}{2} \left[ x_{(n/2)} + x_{(n/2+1)} \right].

In contrast, the population median is \tilde{\mu} = F_X^{-1}(0.5).

Measure of variability: the sample variance is

    S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.

Note that the population variance is

    \sigma^2 = V(X) = E(X - \mu)^2.

Measure of contrasts: Consider random samples {x_1, ..., x_n} and {y_1, ..., y_m} from two populations; for example, in a randomized clinical trial we may compare the quality of life (QOL) of the patients (or their survival time, or cure rate) on two treatment arms via

    T = \bar{x} - \bar{y}.

The corresponding contrast between the two populations is \mu_X - \mu_Y = E(X) - E(Y).

Each statistic is a random variable and has a probability distribution. The probability distribution of a statistic is referred to as its sampling distribution. The sampling distribution depends not only on the population distribution but also on the method of sampling. The most widely used sampling method is random sampling with replacement.

The random variables X_1, ..., X_n are said to form a random sample of size n, or to be independently and identically distributed (i.i.d.), if

1. The X_i's are independent random variables.

2. Every X_i has the same probability distribution.

Denote by \mu and \sigma^2 the mean and variance of the random variable X. The next theorem follows from the results on the distribution of a linear combination that we shall discuss in Section 4.
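These sample statistics are exactly what standard numerical libraries compute. A minimal Python sketch on arbitrary illustrative data; the one point worth flagging is the 1/(n-1) divisor for S^2:

    # Sample statistics for a small illustrative data set.
    import numpy as np

    x = np.array([2.3, 1.9, 3.1, 2.8, 2.0])   # arbitrary illustrative data
    print(x.mean())                            # sample mean x-bar
    print(np.median(x))                        # sample median (averages the middle two values if n is even)
    print(x.var(ddof=1))                       # sample variance S^2: ddof=1 gives the 1/(n-1) divisor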

Theorem (distribution of the sample mean \bar{X}).

1. E(\bar{X}) = \mu_{\bar{X}} = \mu.

2. V(\bar{X}) = \sigma^2_{\bar{X}} = \sigma^2 / n.

3. \sigma_{\bar{X}} = \sigma / \sqrt{n}.

Example 1. Let X_1, ..., X_5 be a random sample from a normal distribution with \mu = 1.5 and \sigma = 0.35. Find P(\bar{X} \le 2.0). Find the variance of \sum_{i=1}^{5} X_i.
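By the theorem, \bar{X} ~ N(1.5, 0.35^2/5) and Var(\sum X_i) = 5\sigma^2. A minimal numerical sketch, taking the event to be {\bar{X} \le 2.0} (the opposite tail is one minus this):

    # Example 1: X-bar ~ N(mu, sigma^2/n) by the theorem above.
    from math import sqrt
    from scipy.stats import norm

    mu, sigma, n = 1.5, 0.35, 5
    se = sigma / sqrt(n)                      # standard deviation of the sample mean

    p_le = norm.cdf(2.0, loc=mu, scale=se)    # P(X-bar <= 2.0), about 0.9993
    var_sum = n * sigma**2                    # Var(sum X_i) = 5 * 0.35^2 = 0.6125
    print(p_le, var_sum)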

Example 2. Service time for a certain bank transaction is a random variable having an exponential distribution with parameter \lambda. Suppose X_1 and X_2 are the service times for two independent customers. Consider the average service time \bar{X} = (X_1 + X_2)/2. Find the cdf of \bar{X}. Find the pdf of \bar{X}. Find the mean and variance of \bar{X}.
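One way to check the answers: the sum of two independent Exp(\lambda) times is Gamma(2, \lambda), so \bar{X} has cdf F(x) = 1 - e^{-2\lambda x}(1 + 2\lambda x) and pdf f(x) = 4\lambda^2 x e^{-2\lambda x} for x \ge 0, with mean 1/\lambda and variance 1/(2\lambda^2). A simulation sketch; the rate \lambda = 0.5 is an arbitrary choice for illustration:

    # Simulation check of Example 2 at an arbitrary illustrative rate lambda = 0.5.
    import numpy as np

    lam = 0.5
    rng = np.random.default_rng(0)
    x1 = rng.exponential(scale=1/lam, size=100_000)   # numpy parameterizes by scale = 1/lambda
    x2 = rng.exponential(scale=1/lam, size=100_000)
    xbar = (x1 + x2) / 2

    t = 3.0
    emp_cdf = np.mean(xbar <= t)                      # empirical P(X-bar <= t)
    thy_cdf = 1 - np.exp(-2*lam*t) * (1 + 2*lam*t)    # derived cdf
    print(emp_cdf, thy_cdf)                           # should agree to ~2 decimals
    print(xbar.mean(), 1/lam)                         # mean 1/lambda
    print(xbar.var(), 1/(2*lam**2))                   # variance 1/(2*lambda^2)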

3 Limit Theorems

3.1 Weak law of large numbers

Consider a sample of independent and identically distributed random variables X_1, ..., X_n. The relationship between the sample mean

    \bar{X}_n = \frac{X_1 + \cdots + X_n}{n}

and the true mean of the X_i's, E(X_i) = \mu, is a problem of pivotal importance in statistics. Typically, \mu is unknown and we would like to estimate \mu based on \bar{X}_n. The weak law of large numbers says that the sample mean converges in probability to \mu. This means that for a large enough sample size n, \bar{X}_n will be close to \mu with high probability.

The weak law of large numbers. Let X_1, X_2, ... be a sequence of independent and identically distributed random variables, each having finite mean E(X_i) = \mu. Then, for any \epsilon > 0,

    P\{ |\bar{X}_n - \mu| \ge \epsilon \} \to 0  as  n \to \infty.    (3.1)

Example 3. A numerical study of the law of large numbers. We first simulate normal random variables from N(5, 1) with different sample sizes, then calculate the difference between the sample mean and the population mean:

    n                        5        20        500      10000     50000
    Bias: \bar{X}_n - \mu    0.8323   -0.1339   0.0368   0.0069    -0.0092

We can see that \bar{X}_n based on a large n tends to be closer to \mu than does \bar{X}_n based on a small n.

Example 4 (optional). Application of the weak law of large numbers: Monte Carlo integration. Suppose that we wish to calculate

    I(f) = \int_0^1 f(x) \, dx,

where the integration cannot be done by elementary means or evaluated using tables of integrals. The most common approach is to use a numerical method in which the integral is approximated by a sum; various schemes and computer packages exist for doing this. Another method, called the Monte Carlo method, works in the following way. Generate independent uniform random variables X_1, ..., X_n on (0, 1) and compute

    \hat{I}(f) = \frac{1}{n} \sum_{i=1}^{n} f(X_i).

By the law of large numbers, for large n this should be close to E[f(X)], which is simply

    E[f(X)] = \int_0^1 f(x) \, dx = I(f).

This simple scheme can easily be modified to change the range of integration, and in other ways. Compared to standard numerical methods, Monte Carlo integration is not especially efficient in one dimension, but it becomes increasingly efficient as the dimensionality of the integral grows.
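A minimal sketch of the scheme in Example 4; the integrand f(x) = e^{-x^2} is an arbitrary illustrative choice (any integrable f on (0, 1) would do):

    # Monte Carlo integration of I(f) = integral of f over (0,1), as in Example 4.
    import numpy as np

    def f(x):
        return np.exp(-x**2)        # illustrative integrand; I(f) ~ 0.7468

    rng = np.random.default_rng(1)
    for n in (10, 1_000, 100_000):
        x = rng.uniform(0.0, 1.0, size=n)   # X_1, ..., X_n iid Uniform(0,1)
        print(n, f(x).mean())               # I-hat(f) approaches I(f) as n grows

The estimate at n = 100,000 typically agrees with the true value to two or three decimals, illustrating the law-of-large-numbers convergence.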

3.2 Strong law of large numbers (optional)

The strong law of large numbers states that for a sequence of independent and identically distributed random variables X_1, X_2, ..., the sample mean converges almost surely to the mean of the random variables, E(X_i) = \mu.

The strong law of large numbers. Let X_1, X_2, ... be a sequence of independent and identically distributed random variables, each having a finite mean \mu = E(X_i). Then, with probability 1,

    \bar{X}_n \to \mu  as  n \to \infty.

The weak law of large numbers states that for any specified large value n*, \bar{X}_{n*} is likely to be near \mu. However, it does not say that \bar{X}_n is bound to stay near \mu for all values of n larger than n*. Thus, it leaves open the possibility that large values of |\bar{X}_n - \mu| can occur infinitely often (though at infrequent intervals). The strong law shows that this cannot occur. In particular, it implies that, with probability 1, for any positive value \epsilon, |\bar{X}_n - \mu| will be greater than \epsilon only a finite number of times.

The strong law of large numbers is of enormous importance, because it provides a direct link between the axioms of probability and the frequency interpretation of probability. If we accept the interpretation that "with probability 1" means "with certainty," then we can say that P(E) is the limit of the long-run relative frequency of times E would occur in repeated, independent trials of the experiment.

3.3 Central limit theorem

The weak law of large numbers says that for X_1, ..., X_n iid, the sample mean \bar{X}_n is close to E(X_i) = \mu when n is large. The central limit theorem provides a more precise approximation by showing that a magnification of the distribution of \bar{X}_n around \mu is approximately standard normal.

The Central Limit Theorem (CLT). Let X_1, ..., X_n be a sequence of independent and identically distributed random variables, each having finite mean E(X_i) = \mu and finite variance Var(X_i) = \sigma^2. Then the distribution of (\bar{X}_n - \mu) / (\sigma / \sqrt{n}) tends to the standard normal distribution as n \to \infty. That is, for any -\infty < a < \infty,

    P\left( \frac{X_1 + \cdots + X_n - n\mu}{\sigma \sqrt{n}} \le a \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2} \, dx

as n \to \infty.

The theorem can be thought of as roughly saying that the sum of a large number of iid random variables has a distribution that is approximately normal. By writing

    \frac{X_1 + \cdots + X_n - n\mu}{\sigma \sqrt{n}} = \frac{n (\bar{X}_n - \mu)}{\sigma \sqrt{n}} = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}},

we see that the CLT says that the sample mean \bar{X}_n has approximately a normal distribution with mean \mu and standard deviation \sigma / \sqrt{n} (variance \sigma^2 / n). The CLT is a remarkable result: assuming only that a sequence of iid random variables has a finite mean and variance, it shows that the sample mean, suitably standardized, always converges to a standard normal distribution.

The normal approximation to the binomial distribution is a special case of the central limit theorem.

Consider a skewed distribution (lognormal), and consider the histogram of the sample mean \bar{X}_n for n = 1, 5, 10, 30.

[Figure: four histograms of the sample mean of lognormal data, in panels n = 1, n = 5, n = 10, and n = 30 (frequency versus x1.bar, x2.bar, x3.bar, x4.bar).]

We can see from the histograms that the sampling distribution becomes progressively less skewed as the sample size n increases, and can therefore be better approximated by a normal distribution. This result shows that the central limit theorem can be successfully applied when n is large. In general, the rule of thumb is n > 30.
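A sketch that reproduces this experiment; the lognormal parameters (mean 0, sigma 1 on the log scale) and the number of replications are assumed choices, since the notes do not state them:

    # CLT demonstration: histograms of the sample mean of skewed (lognormal) data.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    fig, axes = plt.subplots(2, 2)
    for ax, n in zip(axes.flat, (1, 5, 10, 30)):
        # 5000 replications of the mean of n lognormal observations
        means = rng.lognormal(mean=0.0, sigma=1.0, size=(5000, n)).mean(axis=1)
        ax.hist(means, bins=40)            # skewness fades as n grows
        ax.set_title(f"n={n}")
    plt.tight_layout()
    plt.show()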

Example 5. An airline overbooks a flight because it expects that there will be no-shows. Assume that:

(i) There are 200 seats available on the flight.

(ii) Seats are occupied only by individuals who made reservations (no standbys).

(iii) The probability that a person who made a reservation shows up for the flight is 0.95.

(iv) Reservations show up for the flight independently of each other.

1. If the airline accepts 220 reservations, write an expression for the exact probability that the plane will be full (i.e., at least 200 reservations show up). Use the central limit theorem to approximate this probability.

2. Suppose the airline wants to choose the number n of reservations so that the probability that at least 200 of the n reservations show up is 0.75. Find the (approximate) minimum value of n.
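The number of show-ups is Bin(220, 0.95), with mean 209 and standard deviation sqrt(220 * 0.95 * 0.05) ~ 3.23. A sketch comparing the exact tail with the continuity-corrected CLT approximation, and searching for the minimum n in part 2 (scipy supplies the binomial and normal tails):

    # Example 5: exact binomial probability vs. CLT approximation.
    from math import sqrt
    from scipy.stats import binom, norm

    n, p = 220, 0.95
    exact = binom.sf(199, n, p)               # P(X >= 200) for X ~ Bin(220, 0.95)
    mu, sd = n*p, sqrt(n*p*(1-p))
    approx = norm.sf((199.5 - mu) / sd)       # CLT with continuity correction
    print(exact, approx)                      # both roughly 0.998

    # Part 2: smallest n with P(at least 200 of n show up) >= 0.75.
    m = 200
    while binom.sf(199, m, p) < 0.75:
        m += 1
    print(m)                                  # minimum number of reservations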

Example 6. The number of parking tickets issued in Raleigh on any given weekday has a Poisson distribution with parameter \lambda = 50. What is the approximate probability that

(a) between 35 and 70 tickets are given out on a particular day?

(b) the total number of tickets given out during a 5-day week is between 225 and 275?
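A Poisson(\lambda) count has mean and variance \lambda, and a sum of independent Poissons is Poisson, so the weekly total is Poisson(250). A sketch of the normal approximation, taking "between" as inclusive and using a continuity correction:

    # Example 6: normal approximation to Poisson tail probabilities.
    from math import sqrt
    from scipy.stats import norm, poisson

    # (a) X ~ Poisson(50): P(35 <= X <= 70) with continuity correction.
    a = norm.cdf((70.5 - 50)/sqrt(50)) - norm.cdf((34.5 - 50)/sqrt(50))
    # (b) Weekly total T ~ Poisson(250): P(225 <= T <= 275).
    b = norm.cdf((275.5 - 250)/sqrt(250)) - norm.cdf((224.5 - 250)/sqrt(250))
    print(a, b)   # roughly 0.98 and 0.89

    # Exact values for comparison:
    print(poisson.cdf(70, 50) - poisson.cdf(34, 50))
    print(poisson.cdf(275, 250) - poisson.cdf(224, 250))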

4 Distribution of a Linear Combination

Given a collection of n random variables X_1, ..., X_n and n numerical constants a_1, ..., a_n, the random variable

    Y = a_1 X_1 + \cdots + a_n X_n = \sum_{i=1}^{n} a_i X_i

is called a linear combination of the X_i's.

Let X_1, ..., X_n have means \mu_1, ..., \mu_n and variances \sigma_1^2, ..., \sigma_n^2, respectively. Then:

1. E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n) = a_1 \mu_1 + \cdots + a_n \mu_n.

2. If X_1, X_2, ..., X_n are independent, then

    Var(a_1 X_1 + \cdots + a_n X_n) = a_1^2 Var(X_1) + \cdots + a_n^2 Var(X_n) = a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2.

3. For any (possibly dependent) random variables X_1, ..., X_n,

    Var(a_1 X_1 + \cdots + a_n X_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j Cov(X_i, X_j).

The case of normal random variables: if X_1, ..., X_n are independent, normally distributed random variables, then any linear combination of the X_i's is also normally distributed.

Special cases:

1. E(\bar{X}) = \mu_{\bar{X}} = \mu.

2. If all X_i are independent, V(\bar{X}) = \sigma^2_{\bar{X}} = \sigma^2 / n.

3. E(X_1 - X_2) = E(X_1) - E(X_2).

4. If X_1 and X_2 are independent, then V(X_1 - X_2) = V(X_1) + V(X_2). Otherwise, V(X_1 - X_2) = V(X_1) + V(X_2) - 2 Cov(X_1, X_2).
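A quick simulation check of properties 1 and 3 for two correlated normals; the coefficients, means, and covariance matrix below are arbitrary illustrative values:

    # Checking E(aX1 + bX2) and Var(aX1 + bX2) = a^2 V(X1) + b^2 V(X2) + 2ab Cov(X1, X2).
    import numpy as np

    a, b = 2.0, -3.0
    mean = [1.0, 4.0]
    cov = [[2.0, 0.5],
           [0.5, 1.0]]                     # Cov(X1, X2) = 0.5

    rng = np.random.default_rng(3)
    x = rng.multivariate_normal(mean, cov, size=200_000)
    y = a*x[:, 0] + b*x[:, 1]

    print(y.mean(), a*mean[0] + b*mean[1])             # property 1: both ~ -10
    print(y.var(), a*a*2.0 + b*b*1.0 + 2*a*b*0.5)      # property 3: both ~ 11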

Example 7. The total revenue from the sale of the three grades of gasoline on a particular day was Y = 21.2 X_1 + 21.35 X_2 + 21.5 X_3. Assume that X_1, X_2 and X_3 are independent with \mu_1 = 1000, \mu_2 = 500, \mu_3 = 300, \sigma_1 = 100, \sigma_2 = 80 and \sigma_3 = 50. What is the probability that the revenue exceeds 45,000?
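Treating the X_i as normal (an assumption the probability question implicitly requires; the example states only means and standard deviations), the linear-combination results give E(Y) = 38,325 and SD(Y) ~ 2,927, so P(Y > 45,000) is a normal upper tail:

    # Example 7: P(Y > 45000) for Y = 21.2 X1 + 21.35 X2 + 21.5 X3,
    # assuming the X_i are independent normals.
    from math import sqrt
    from scipy.stats import norm

    a = [21.2, 21.35, 21.5]
    mu = [1000, 500, 300]
    sigma = [100, 80, 50]

    mean_y = sum(ai*mi for ai, mi in zip(a, mu))                # 38325
    sd_y = sqrt(sum((ai*si)**2 for ai, si in zip(a, sigma)))    # ~2927
    print(norm.sf(45000, loc=mean_y, scale=sd_y))               # ~0.011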

Example 8. A student has a class that is supposed to end at 9:00 am and another class that is supposed to begin at 9:10 am. Suppose that the actual ending time of the first class (in minutes after 9:00) is X_1 ~ N(2, 1.5^2), and the starting time of the next class (in minutes after 9:00) is X_2 ~ N(10, 1^2). Suppose also that the time to get from one location to the other is X_3 ~ N(6, 1^2). What is the probability that the student makes it to the second class before the lecture starts?
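The student makes it if X_1 + X_3 < X_2, i.e., if W = X_2 - X_1 - X_3 > 0. Assuming the three times are independent (as such examples typically intend), W is normal with mean 10 - 2 - 6 = 2 and variance 1 + 2.25 + 1 = 4.25. A sketch of the calculation:

    # Example 8: P(W > 0) for W = X2 - X1 - X3, a normal linear combination
    # (independence of the three times is assumed).
    from math import sqrt
    from scipy.stats import norm

    mean_w = 10 - 2 - 6                       # E(W) = 2
    sd_w = sqrt(1**2 + 1.5**2 + 1**2)         # Var(W) = 4.25
    print(norm.sf(0, loc=mean_w, scale=sd_w)) # ~0.83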

Example 9. Three different roads feed into a particular freeway entrance. Suppose that during a fixed time period the number of cars coming from each road onto the freeway is normally distributed, with X_1 ~ N(750, 16^2), X_2 ~ N(1000, 24^2) and X_3 ~ N(550, 18^2).

(a) What is the expected total number of cars entering the freeway at this point during the period?

(b) Suppose X_1, X_2 and X_3 are independent. Find the probability P(X_1 + X_2 + X_3 > 2500).

(c) Now suppose that the three streams of traffic are not independent, with Cov(X_1, X_2) = 80, Cov(X_1, X_3) = 90 and Cov(X_2, X_3) = 100. Compute the expected value and variance of the total number of entering cars.
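A sketch of all three parts. The expectation is the same with or without independence; only the variance changes, picking up twice the sum of the pairwise covariances in part (c):

    # Example 9: totals of (possibly dependent) normal traffic counts.
    from math import sqrt
    from scipy.stats import norm

    mean_t = 750 + 1000 + 550                 # (a) E(total) = 2300
    var_indep = 16**2 + 24**2 + 18**2         # (b) independent case: 1156, sd = 34
    print(mean_t)
    print(norm.sf(2500, loc=mean_t, scale=sqrt(var_indep)))   # ~2e-9, essentially 0

    # (c) Dependent case: add 2 * (sum of pairwise covariances) to the variance.
    var_dep = var_indep + 2*(80 + 90 + 100)   # 1696; the mean is unchanged
    print(mean_t, var_dep)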