Large n normal approximations (Central Limit Theorem). xbar ~ N[mu, sigma^2 / n] (sketch a normal with mean mu and sd = sigma / root(n)).


1. 5-1, 5-2, 5-3 Large n normal approximations (Central Limit Theorem).

xbar ~ N[mu, sigma^2 / n] (sketch a normal with mean mu and sd = sigma / root(n)).
phat ~ N[p, pq / n] (sketch a normal with mean p and sd = root(pq) / root(n)).
Use FPC = root((N - n) / (N - 1)) as appropriate.

ME:
xbar +/- 1.96 s / root(n)
phat +/- 1.96 root(phat qhat) / root(n)

2. 6-3, 6-4, 6-6 Large n confidence intervals.

xbar +/- z s / root(n)
phat +/- z root(phat qhat) / root(n)
Use FPC = root((N - n) / (N - 1)) as appropriate.

Claim: for n large,
P(mu in z-CI) ~ P(Z in [-z, z])
P(p in z-CI) ~ P(Z in [-z, z])

3. Any n > 1 confidence intervals if population is normal.

xbar +/- t s / root(n)
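The interval recipes in items 1 to 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`fpc`, `mean_interval`, `proportion_interval`) and the input numbers are hypothetical and do not come from the notes. Passing a t critical value as `crit` gives the item 3 interval.

```python
import math

def fpc(n, N):
    """Finite population correction root((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def mean_interval(xbar, s, n, crit=1.96, N=None):
    """xbar +/- crit * s / root(n); multiplied by the FPC when N is given."""
    me = crit * s / math.sqrt(n)
    if N is not None:
        me *= fpc(n, N)
    return xbar - me, xbar + me

def proportion_interval(phat, n, crit=1.96, N=None):
    """phat +/- crit * root(phat * qhat) / root(n); FPC applied when N is given."""
    me = crit * math.sqrt(phat * (1 - phat) / n)
    if N is not None:
        me *= fpc(n, N)
    return phat - me, phat + me

# Hypothetical inputs, chosen only to exercise the formulas.
lo, hi = mean_interval(xbar=50.0, s=10.0, n=100)
lo_fpc, hi_fpc = mean_interval(xbar=50.0, s=10.0, n=100, N=1000)
p_lo, p_hi = proportion_interval(phat=0.4, n=100)

print(f"mean CI:            ({lo:.2f}, {hi:.2f})")
print(f"mean CI with FPC:   ({lo_fpc:.2f}, {hi_fpc:.2f})")
print(f"proportion CI:      ({p_lo:.3f}, {p_hi:.3f})")
```

The FPC is always at most 1, so applying it can only shorten the interval.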

Example problem. A population distribution is known to be in control, that is, approximately normal. A sample of 12 finds xbar = 16.3 with s = 7.1. Give a 95% confidence interval for the population mean mu.

ans. We need the t-score (which replaces the z-score) for n - 1 = 12 - 1 = 11 degrees of freedom. Consult the t-table (inside front cover, to the right) at degrees of freedom = n - 1. Here is how it looks:

Critical Values of the t Distribution
Degrees of Freedom    C.I. 95%
11                    2.201
infinity              1.96

So the desired t-score for 11 degrees of freedom and confidence level 0.95 is t = 2.201. The 95% CI is therefore

xbar +/- t s / root(n) = 16.3 +/- 2.201 * 7.1 / root(12)

Claim: For a sample of any size n > 1 from a normal population: P(mu in xbar +/- t s / root(n)) = 0.95 (not just ~ 0.95).

4. Supplement to the readings: Stratified sampling produces a better estimate for mu (general x scores).

Example question. A population of accounts is divided into two types, those having good credit and those not. We are interested in estimating the population average balance mu of all accounts. However, we will need to audit each account we include in our sample, and this is costly, so we need to do things as efficiently as possible on a per-sample basis. Suppose it is known that 22% of accounts do not have good credit. We decide to invest in 100 samples. A statistically trained employee suggests that we draw a random sample of 22 accounts from those that do not have good credit and 78 from those that do. This is done, from which we find

xbar from 22 = 147.21    s from 22 = 75.38
xbar from 78 = 194.49    s from 78 = 110.33

What is the stratified estimator of the overall population mean mu based upon this stratified sample, and what is a 95% confidence interval for mu based on this stratified estimator?

ans. The stratified estimate of mu is

xbar_STRAT = 0.22 * 147.21 + 0.78 * 194.49

and the margin of error is

+/- 1.96 root(0.22^2 * 75.38^2 / 22 + 0.78^2 * 110.33^2 / 78)

The sample of 22 would be regarded as rather too small for application of this method, which is based on large n z-approximations.

Defining the stratified estimator.
a. Divide the population into two or more disjoint subpopulations (strata).
b. From each stratum i select a with-replacement sample of size n_i. The samples are also to be independent between strata.
c. Form the stratified estimate of the population mean mu:
xbar_STRAT = sum, over all strata i, of (W_i xbar_i)
where W_i = N_i / N is the relative size of stratum i in the population and xbar_i is the mean of the stratum i sample scores.

Using E and Var rules to evaluate the expectation and variance of the stratified estimator.
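The arithmetic left unevaluated in the accounts example can be carried out with a short Python sketch. The helper names (`stratified_mean`, `stratified_me`) are hypothetical; the numbers are the ones quoted in the example.

```python
import math

def stratified_mean(weights, means):
    """Sum over strata of W_i * xbar_i."""
    return sum(w * m for w, m in zip(weights, means))

def stratified_me(weights, sds, sizes, z=1.96):
    """z * root(sum over strata of W_i^2 * s_i^2 / n_i)."""
    var = sum(w**2 * s**2 / n for w, s, n in zip(weights, sds, sizes))
    return z * math.sqrt(var)

# Values from the example: stratum 1 = not good credit, stratum 2 = good credit.
W = [0.22, 0.78]
xbars = [147.21, 194.49]
s = [75.38, 110.33]
n = [22, 78]

est = stratified_mean(W, xbars)
me = stratified_me(W, s, n)

print(f"stratified estimate of mu: {est:.2f}")
print(f"margin of error:           {me:.2f}")
print(f"95% CI:                    ({est - me:.2f}, {est + me:.2f})")
```

As the example itself cautions, the stratum of size 22 is small for a large n z-approximation, so the resulting interval should be read with that in mind.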

E xbar_STRAT = sum of W_i E xbar_i (linearity of E)
 = sum of W_i mu_i (each xbar_i is unbiased for mu_i)
 = mu (mu is the wtd avg of subpopulation means)

That is, the stratified estimator xbar_STRAT is unbiased as an estimator of the population mean mu.

Var xbar_STRAT = sum of Var(W_i xbar_i) (xbar_i are indep)
 = sum of (W_i^2 Var xbar_i)
 = sum of (W_i^2 sigma_i^2 / n_i)

ME. This leads to the following definition of ME for the stratified estimator of mu:

ME for xbar_STRAT = +/- 1.96 root(sum of (W_i^2 s_i^2 / n_i))

where s_i denotes the sample standard deviation obtained from the sample of stratum i.

Important fact: In the special case n_i = W_i n (called proportionally stratified sampling and used in the example problem above) the theoretical sd of xbar_STRAT (obtained just above) is never larger than the sd of xbar.

In practice it is usually the case that the estimator xbar (without stratification) and its competitor xbar_STRAT (with stratification) are each approximately normally distributed. This makes it easy to compare their performances. Since they have the same expectation mu (each is unbiased for the population mean mu), and the proportionally stratified estimator xbar_STRAT has the smaller sd, it is the better estimator. To see it, just think of comparing two normal curves, both having the same mean mu, but one having a smaller sd.
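The "important fact" can be checked numerically. The sketch below relies on a standard identity that the notes do not state: the population variance splits into a within-strata part and a between-strata part, sigma^2 = sum of W_i (sigma_i^2 + (mu_i - mu)^2). The function names are hypothetical, and the sample figures from the accounts example are used as stand-ins for the unknown stratum means and sds purely for illustration.

```python
def var_unstratified(weights, mus, sigmas, n):
    """Var xbar = sigma^2 / n, with sigma^2 = within + between strata variance."""
    mu = sum(w * m for w, m in zip(weights, mus))
    sigma2 = sum(w * (s**2 + (m - mu) ** 2)
                 for w, m, s in zip(weights, mus, sigmas))
    return sigma2 / n

def var_proportional(weights, sigmas, n):
    """With n_i = W_i * n, W_i^2 sigma_i^2 / n_i reduces to W_i sigma_i^2 / n."""
    return sum(w * s**2 for w, s in zip(weights, sigmas)) / n

# ASSUMPTION: sample statistics stand in for the true stratum parameters.
W = [0.22, 0.78]
mus = [147.21, 194.49]
sigmas = [75.38, 110.33]
n = 100

v_srs = var_unstratified(W, mus, sigmas, n)
v_strat = var_proportional(W, sigmas, n)

print(f"Var xbar       (no stratification): {v_srs:.3f}")
print(f"Var xbar_STRAT (proportional):      {v_strat:.3f}")
```

The two variances differ only by the between-strata term, sum of W_i (mu_i - mu)^2 / n, which is never negative. This is why the proportionally stratified sd cannot exceed the unstratified one, and why the gain is largest when the stratum means differ a lot.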

5. Supplement to the readings: Outline of a ME applicable to stratified sampling for p (0-1 scores).

Example question. A population of accounts is divided into two types, those held by women and those held by men. We are interested in estimating the population fraction p of all accounts that will respond to an offer we plan to make them. However, it is costly to make this offer properly, so we need to do things as efficiently as possible on a per-sample basis. Suppose it is known that 30% of accounts are held by women. We decide to invest in 50 samples. A statistically trained employee suggests that we draw a random sample of 50 * 0.3 = 15 accounts from those held by women and 35 from those held by men. This is done, from which we find

phat from 15 = 6/15
phat from 35 = 20/35

What is the stratified estimator of the overall population proportion p based upon this stratified sample, and what is a 95% confidence interval for p based on this stratified estimator?

ans. The stratified estimate of p is

phat_STRAT = 0.3 * 6/15 + 0.7 * 20/35

and the margin of error is

+/- 1.96 root(0.3^2 * (6/15 * 9/15) / 15 + 0.7^2 * (20/35 * 15/35) / 35)

The sample size of 15 would be regarded as rather too small to support the use of this large n method.

Defining the stratified estimator.
a. Divide the population into two or more disjoint subpopulations (strata).
b. From each stratum i select a with-replacement sample of size n_i. The samples are also to be independent between strata.

c. Form the stratified estimate of the population proportion p having some particular characteristic of interest:
phat_STRAT = sum, over all strata i, of (W_i phat_i)
where W_i = N_i / N is the relative size of stratum i in the population and phat_i is the proportion among the stratum i sample scores.

Then

E phat_STRAT = sum of W_i E phat_i (linearity of E)
 = sum of W_i p_i (each phat_i is unbiased for p_i)
 = p (p is the wtd avg of subpopulation proportions)

That is, the stratified estimator phat_STRAT is also unbiased as an estimator of the population proportion p.

Var phat_STRAT = sum of Var(W_i phat_i) (phat_i are indep)
 = sum of (W_i^2 Var phat_i)
 = sum of (W_i^2 p_i q_i / n_i)

ME. This leads to the following definition of ME for the stratified estimator of p:

ME for phat_STRAT = +/- 1.96 root(sum of (W_i^2 phat_i qhat_i / n_i))

where root(phat_i qhat_i) is the estimated standard deviation of the i-th stratum (0-1 scores), obtained from the sample of stratum i.

Important fact: In the special case n_i = W_i n (called proportionally stratified sampling and used in the example problem above) the theoretical sd of phat_STRAT (obtained just above) is never larger than the sd of phat.

In practice it is usually the case that the estimator phat (without stratification) and its competitor phat_STRAT (with stratification) are each approximately normally distributed. This makes it easy to compare their performances. Since they have the same expectation p, and the proportionally stratified estimator phat_STRAT has the smaller sd, we prefer to use the stratified estimator. To see it, just think of comparing two normal curves, both having the same mean p, but one having a smaller sd.
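For the proportion example in section 5, the same calculation can be sketched in Python. The helper names are hypothetical; the stratum weights, counts and sample sizes are those quoted in the example.

```python
import math

def stratified_proportion(weights, phats):
    """Sum over strata of W_i * phat_i."""
    return sum(w * p for w, p in zip(weights, phats))

def stratified_proportion_me(weights, phats, sizes, z=1.96):
    """z * root(sum over strata of W_i^2 * phat_i * qhat_i / n_i)."""
    var = sum(w**2 * p * (1 - p) / n
              for w, p, n in zip(weights, phats, sizes))
    return z * math.sqrt(var)

# Values from the example: stratum 1 = accounts held by women, stratum 2 = by men.
W = [0.3, 0.7]
phats = [6 / 15, 20 / 35]
n = [15, 35]

est = stratified_proportion(W, phats)
me = stratified_proportion_me(W, phats, n)

print(f"stratified estimate of p: {est:.3f}")
print(f"margin of error:          {me:.3f}")
print(f"95% CI:                   ({est - me:.3f}, {est + me:.3f})")
```

With only 15 accounts in the smaller stratum, the example's own caveat applies: the large n normal approximation behind the 1.96 multiplier is weakly supported here.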