Lecture 8 Sampling Theory

Thais Paiva
STA 111 - Summer 2013 Term II
July 11, 2013

Lecture Plan
1. Sampling Distributions
2. Law of Large Numbers
3. Central Limit Theorem

Statistical Inference
We want to study some quantity of interest (a parameter) in a large population. Example: Obama's approval rating.
But we cannot observe the whole population. What do we do?
- Design a study to sample individuals from the population. Example: eligible voters.
- Study the quantity of interest on your sample.
- Infer (conclude) about the unknown parameter. Example:
  1. Determine a range that will include the parameter of interest: 0.45 < approval rating < 0.55.
  2. Test a hypothesis: is the approval rating > 0.5?

Statistical Inference
- A statistic refers to a characteristic of the sample (e.g., sample mean, sample standard deviation, sample maximum).
- A parameter refers to a characteristic of the population (e.g., population mean, population standard deviation, population proportion that votes for Republicans).
- Our goal is to use statistics to infer the parameter in the population (e.g., what is the relation between the sample mean and the population mean?).
- The sampling distribution is the bridge!

Example Population: STA 111 Heights
[Figure: density histogram of the students' heights; Height (in) on the x-axis, roughly 50 to 80.]
Let's assume this is the true population, with parameters μ = 68.4 and σ² = 18.6.
We wish to take a sample to estimate μ and σ².

Samples of Size n = 4
Let's say we take a sample of size 4 and repeat it 5 times. For each sample, we calculate the sample mean x̄ and the sample variance s².

Sample #   x_1   x_2   x_3   x_4    x̄       s²
1          63    72    73    71     69.80   20.90
2          72    70    73    73     72.00    2.00
3          70    71    63    60     66.00   28.70
4          62    76    74    72     71.00   38.70
5          73    74    71    75     73.20    2.90

We see that the x̄'s are fairly close to μ = 68.4. There is quite a lot of variability in s² across samples.
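Below is a minimal sketch of this repeated-sampling exercise in Python. The true shape of the height distribution is not given on the slide, so a normal population with μ = 68.4 and σ² = 18.6 is assumed purely for illustration.

import numpy as np

rng = np.random.default_rng(111)
mu, sigma2 = 68.4, 18.6                            # assumed population parameters

for i in range(1, 6):
    sample = rng.normal(mu, sigma2 ** 0.5, size=4) # one sample of size n = 4
    xbar = sample.mean()                           # sample mean
    s2 = sample.var(ddof=1)                        # sample variance s^2
    print(f"Sample {i}: x-bar = {xbar:.2f}, s^2 = {s2:.2f}")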

Sampling Distribution (n = 4)
What if I carry on and repeat it 1,000 times?
[Figure: frequency histogram of the 1,000 sample means (n = 4), ranging from about 60 to 75.]
Some x̄'s are quite extreme! But most of them seem to hover around the population mean (red vertical line).

Sampling Distribution
What if we change the sample size?
[Figure: four frequency histograms of sample means, for n = 4, 15, 50, and 100; the histograms become narrower around the population mean as n grows.]
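A sketch of how those four histograms could be regenerated, again under the assumption of a normal population with μ = 68.4 and σ² = 18.6:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 68.4, 18.6 ** 0.5                      # assumed population parameters

for n in (4, 15, 50, 100):
    # 1,000 repeated samples of size n, one sample mean per repetition
    means = rng.normal(mu, sigma, size=(1000, n)).mean(axis=1)
    print(f"n = {n:3d}: mean of x-bars = {means.mean():.2f}, "
          f"SD of x-bars = {means.std(ddof=1):.2f} (sigma/sqrt(n) = {sigma / n ** 0.5:.2f})")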

Sampling Distribution
The previous histograms are examples of sampling distributions:
- Distributions of a statistic calculated from a random sample.
- Each individual in the population is equally likely to be chosen every time we draw an observation.
- A statistic is random because each sample is different: if the data have not been recorded yet, the statistic is simply a function of some random elements.
- Viewing a statistic as a random variable, we can define its mean and variance. For example, E(X̄) = μ_X̄ and V(X̄) = σ²_X̄.
(Tricky notation: the population mean μ_X̄ of a statistic X̄!)

Estimator
We saw that:
- The sampling distribution of x̄ is centered around μ.
- The variability of x̄ becomes smaller with larger sample size.
If we use x̄ to infer about μ, we call x̄ an estimator of μ.
There are many other potential estimators for μ. For example, if the underlying population is normal, we can use the sample median.
In the next lecture, we will discuss ways to evaluate and compare estimators.

Combination of Random Variables
There are two important properties of random variables that are useful in studying estimators. If we let X and Y be two independent random variables, then
E(X + Y) = E(X) + E(Y)
Var(X + Y) = Var(X) + Var(Y)
(The first property holds for any two random variables; independence is what makes the variances add.)
We will discuss these properties later in the class. A quick simulation check is sketched below.
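A quick Monte Carlo check of both properties, using (as a hypothetical choice) an exponential X and a uniform Y generated independently:

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)   # E(X) = 2,  Var(X) = 4
y = rng.uniform(0.0, 6.0, size=1_000_000)        # E(Y) = 3,  Var(Y) = 3

print("E(X + Y)   =", round((x + y).mean(), 3), "  vs  E(X) + E(Y)     =", round(x.mean() + y.mean(), 3))
print("Var(X + Y) =", round((x + y).var(), 3), "  vs  Var(X) + Var(Y) =", round(x.var() + y.var(), 3))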

Mean and Variance of the Sample Mean
Let X_1, ..., X_n be independent and identically distributed random variables. This assumption says X_1, ..., X_n are randomly sampled from the same distribution (a random sample). Then

E(X̄) = E((X_1 + ... + X_n)/n) = (E(X_1) + ... + E(X_n))/n = nμ/n = μ

Var(X̄) = Var((X_1 + ... + X_n)/n) = (Var(X_1) + ... + Var(X_n))/n² = nσ²/n² = σ²/n

Mean and Variance of the Sample Mean
E(X̄) = μ says:
- If I repeatedly collect my sample, the overall average of the X̄'s is μ, the true population mean.
- In reality, we usually only collect the sample once.
- This holds for any sample size!
V(X̄) = σ²/n says:
- The variability in X̄ decreases as the sample size increases. Specifically, it goes down at a rate of 1/n.
- The variability also depends on the underlying population's variability!

Mean and Variance of the Sample Mean
However, E(X̄) = μ by itself does not guarantee that X̄ = μ!
Luckily, Var(X̄) = σ²/n says that the variance of X̄ decreases toward zero as the sample size increases. So, when the sample is large, the uncertainty goes to zero, and therefore

lim_{n→∞} Var(X̄) = 0  and  lim_{n→∞} X̄ = μ.

This is the Law of Large Numbers.

Mean and Variance of the Sample Mean
Recall our first example:
[Figure: the four histograms of sample means again, for n = 4, 15, 50, and 100.]

Law of Large Numbers: Interpretation
- Suppose you want to estimate μ for a specific population.
- What you can do is extract a sample from the population and compute the sample mean x̄.
- If the sample is big enough, x̄ will be close to μ.
- If you increase the sample size, x̄ should get closer to μ.
- The more you increase the sample size, the closer x̄ gets to μ.
A running-mean simulation illustrating this is sketched below.
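A minimal Law of Large Numbers sketch: the running sample mean of i.i.d. draws settles down to μ as n grows. The normal population with μ = 68.4 and σ² = 18.6 is, as before, an assumed stand-in for the class heights.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 68.4, 18.6 ** 0.5

draws = rng.normal(mu, sigma, size=100_000)
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)   # x-bar after each new draw

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:6d}: x-bar = {running_mean[n - 1]:.3f}   (mu = {mu})")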

Central Limit Theorem
The Law of Large Numbers tells me how X̄ behaves in terms of central tendency and variability. That is useful information, but it does not tell me its actual distribution!
The Central Limit Theorem says: when n is large, X̄ is approximately normally distributed,

X̄ ≈ N(μ, σ²/n).

Important: the CLT holds regardless of the underlying distribution of X! No matter what the shape of the original distribution is, the sampling distribution of the mean approaches a normal distribution.

Central Limit Theorem
[Figure: density of a "weird" (clearly non-normal) distribution of X, on roughly 0 to 15.]
Here is a weird distribution with parameters μ = 6.5 and σ = 2.9. By the CLT:
- if n = 10: X̄ ≈ N(6.5, 2.9²/10)
- if n = 50: X̄ ≈ N(6.5, 2.9²/50)

Central Limit Theorem
[Figure: densities of the sample mean for n = 10 and n = 50; both look approximately normal, and the n = 50 one is much more concentrated around 6.5.]
Amazing.
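A sketch of this experiment in Python. The slide does not say what the "weird" distribution is, so a right-skewed Gamma distribution is substituted here, with its shape and scale chosen to match the stated mean 6.5 and SD 2.9.

import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 6.5, 2.9
shape, scale = (mu / sigma) ** 2, sigma ** 2 / mu   # Gamma with mean mu and SD sigma

for n in (10, 50):
    # 10,000 sample means, each from a sample of size n
    means = rng.gamma(shape, scale, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:2d}: mean of x-bars = {means.mean():.2f}, "
          f"SD of x-bars = {means.std():.3f}   (CLT predicts sigma/sqrt(n) = {sigma / n ** 0.5:.3f})")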

Using CLT: Height Example
Assume the distribution of height in our class has a mean of 70 (inches) and a variance of 100 (inches²). In my study, I will obtain measurements of 20 individuals. What is the probability that X̄ will be between 65 and 75?
By the CLT, X̄ ≈ N(μ, σ²/n), where μ = 70 and σ²/n = 100/20 = 5.

P(65 < X̄ < 75) = P((65 - 70)/√5 < Z < (75 - 70)/√5) = P(-√5 < Z < √5) = 0.974

Using CLT: Height Example
Note: in the previous example we calculated

P(65 < X̄ < 75) = 0.974

NOT

P(65 < X < 75) = P((65 - 70)/√100 < Z < (75 - 70)/√100) = P(-0.5 < Z < 0.5) = 0.38

The first one is about the sample average; the second one is about just one actual height!
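Both probabilities computed directly with scipy's normal CDF, using the parameters from the slide (μ = 70, σ² = 100, n = 20) and treating a single height as normal as well:

from math import sqrt
from scipy.stats import norm

mu, sigma2, n = 70, 100, 20

# Sample mean: X-bar is approximately N(mu, sigma^2 / n)
p_xbar = norm.cdf(75, mu, sqrt(sigma2 / n)) - norm.cdf(65, mu, sqrt(sigma2 / n))
# A single height: X ~ N(mu, sigma^2)
p_x = norm.cdf(75, mu, sqrt(sigma2)) - norm.cdf(65, mu, sqrt(sigma2))

print(f"P(65 < X-bar < 75) = {p_xbar:.3f}")   # about 0.97
print(f"P(65 < X     < 75) = {p_x:.3f}")      # about 0.38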

Using CLT: Sample Size
Assume the distribution of height in our class has a mean of 70 (inches) and a variance of 100 (inches²). In designing my study, what sample size should I use so that the probability that my sample average X̄ is between 69 and 71 is equal to 90%?

P(69 < X̄ < 71) = P((69 - 70)/√(100/n) < Z < (71 - 70)/√(100/n)) = P(-√n/10 < Z < √n/10) = 0.90

Because Z is symmetric, this means P(Z > √n/10) = 0.05, so

√n/10 = 1.645, which gives n ≈ 270.6; rounding up, we take n = 271.
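The same calculation done numerically, so the critical value does not have to be read off a table (parameters as on the slide):

import math
from scipy.stats import norm

z = norm.ppf(0.95)             # P(Z > z) = 0.05, so z is about 1.645
n = (10 * z / 1) ** 2          # from sqrt(n)/10 = z, with half-width 1 inch and sigma = 10
print(f"z = {z:.3f}, n = {n:.1f}, so take n = {math.ceil(n)}")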

Sample Percentage
The sample percentage is defined as the ratio of the number of successes to the number of trials:

P = (Σ_{i=1}^n X_i)/n

For example, batting averages (P) are estimates of the unknown proportion of successful batting over the whole career (π).
If we could observe the data for the whole career, then we would know the true value.

Sample Percentage
P = (Σ_{i=1}^n X_i)/n

E(P) = (E[X_1] + ... + E[X_n])/n = (π + ... + π)/n = π

Var(P) = (Var[X_1] + ... + Var[X_n])/n² = (π(1 - π) + ... + π(1 - π))/n² = π(1 - π)/n

Law of Large Numbers: lim_{n→∞} P = π
Central Limit Theorem: P ≈ N(π, π(1 - π)/n)
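A brief simulation check of E(P) = π and Var(P) = π(1 - π)/n, using hypothetical values π = 0.3 and n = 50:

import numpy as np

rng = np.random.default_rng(4)
pi, n = 0.3, 50

props = rng.binomial(n, pi, size=100_000) / n     # 100,000 simulated sample percentages
print(f"mean of P: {props.mean():.4f}   (pi          = {pi})")
print(f"var  of P: {props.var():.5f}   (pi(1-pi)/n = {pi * (1 - pi) / n:.5f})")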

Sample Percentage
Suppose we toss a fair coin 1,000 times. What is the probability of observing heads less than half of the time?
A fair coin means that π = 0.5, so

P(P < 500/1000) = P(Z < (0.5 - 0.5)/√(0.5(1 - 0.5)/1000)) = P(Z < 0) = 0.5

We could also work with Binomial distribution probabilities directly, but n is very large here.
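For comparison, the normal approximation above next to the exact Binomial(1000, 0.5) answer, computed with scipy; the small gap reflects the discreteness of the binomial (no continuity correction is applied here).

from math import sqrt
from scipy.stats import binom, norm

n, pi = 1000, 0.5
approx = norm.cdf(0.5, loc=pi, scale=sqrt(pi * (1 - pi) / n))   # CLT approximation of P(P < 0.5)
exact = binom.cdf(499, n, pi)                                   # exact P(fewer than 500 heads)

print(f"normal approximation: {approx:.3f}")   # 0.500
print(f"exact binomial:       {exact:.3f}")    # about 0.487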