Section 8.1: Interval Estimation


Discrete-Event Simulation: A First Course, © 2006 Pearson Ed., Inc., 0-13-142917-5

Theorem (Central Limit Theorem). If X_1, X_2, ..., X_n is an iid sequence of RVs with common mean µ and common standard deviation σ, and if X̄ is the (sample) mean of these RVs,
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
then X̄ approaches a Normal(µ, σ/√n) RV as n → ∞.

Sample Mean Distribution. Choose one of the random variate generators in rvgs to generate a sequence of random variable samples with fixed sample size n > 1. With the n-point samples indexed j = 1, 2, ..., the corresponding sample mean x̄_j and sample standard deviation s_j can be calculated using Algorithm 4.1.1:
$$\underbrace{x_1, x_2, \ldots, x_n}_{\bar{x}_1,\; s_1},\quad \underbrace{x_{n+1}, x_{n+2}, \ldots, x_{2n}}_{\bar{x}_2,\; s_2},\quad \underbrace{x_{2n+1}, x_{2n+2}, \ldots, x_{3n}}_{\bar{x}_3,\; s_3},\quad \ldots$$
A continuous-data histogram of the sample means x̄_1, x̄_2, x̄_3, ... can then be created using program cdh, which reports the histogram mean, histogram standard deviation, and histogram density.
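As a concrete illustration of this batching procedure (a minimal sketch, not the book's rvgs or cdh programs), the C program below generates Exponential(µ) variates by inversion, groups them into n-point batches, and computes each batch's mean and standard deviation with a one-pass update in the spirit of Algorithm 4.1.1; the printed pairs (x̄_j, s_j) could then be fed to a histogram program such as cdh. The constants MU, N, and BATCHES and the generator itself are illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MU      1.0      /* population mean of the Exponential(mu) source (assumed) */
#define N       36       /* fixed sample (batch) size n > 1                         */
#define BATCHES 10000    /* number of n-point samples, as in Example 8.1.2          */

/* Exponential(mu) variate by inversion -- a stand-in for the book's rvgs generators */
static double Exponential(double mu) {
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);   /* u in (0,1) */
    return -mu * log(1.0 - u);
}

int main(void) {
    srand(12345);
    for (long j = 1; j <= BATCHES; j++) {
        double xbar = 0.0, v = 0.0;        /* running mean and sum of squared deviations */
        for (long i = 1; i <= N; i++) {    /* one-pass update, cf. Algorithm 4.1.1       */
            double x = Exponential(MU);
            double d = x - xbar;
            v    += d * d * (i - 1) / i;
            xbar += d / i;
        }
        double s = sqrt(v / N);            /* divisor-n sample standard deviation, as in the text */
        printf("%ld %.6f %.6f\n", j, xbar, s);   /* batch index, x-bar_j, s_j */
    }
    return 0;
}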

Properties of Sample Mean Histogram. Independent of n, the histogram mean is approximately µ and the histogram standard deviation is approximately σ/√n. If n is sufficiently large, the histogram density approximates the Normal(µ, σ/√n) pdf.

Example 8.1.2: 10000 n-point Exponential(µ) samples. [Figure: histogram densities of the sample means x̄ = (1/n) Σ x_i for n = 9 and n = 36, each plotted over an interval of width 4σ/√n centered at µ.]

Example 8.1.2. The histogram mean and standard deviation are approximately µ and σ/√n. The histogram density corresponding to the 36-point sample means is closely matched by the pdf of a Normal(µ, σ/√n) RV; for Exponential(µ) samples, n = 36 is large enough for the sample mean to be approximately Normal(µ, σ/√n). The histogram density corresponding to the 9-point sample means matches relatively well, but with a skew to the left; n = 9 is not large enough.

More on Example 8.1.2. Essentially all of the sample means are within an interval of width 4σ/√n centered about µ. Because σ/√n → 0 as n → ∞, if n is large, all the sample means will be close to µ. In general, the accuracy of the Normal(µ, σ/√n) pdf approximation depends on the shape of the population pdf. If the samples are drawn from a population with a highly asymmetric pdf (like the Exponential(µ) pdf), n may need to be as large as 30 or more for a good fit; for a pdf symmetric about the mean (like the Uniform(a, b) pdf), n as small as 10 or less may produce a good fit.

Standardized Sample Mean Distribution. We can standardize the sample means x̄_1, x̄_2, x̄_3, ... by subtracting µ and dividing the result by σ/√n to form the standardized sample means z_1, z_2, z_3, ... defined by
$$z_j = \frac{\bar{x}_j - \mu}{\sigma/\sqrt{n}}, \qquad j = 1, 2, 3, \ldots$$
A continuous-data histogram for the standardized sample means can be generated with program cdh, which reports the histogram mean, histogram standard deviation, and histogram density.

Properties of Standardized Sample Mean Histogram. Independent of n, the histogram mean is approximately 0 and the histogram standard deviation is approximately 1. If n is sufficiently large, the histogram density approximates the Normal(0, 1) pdf.

Example 8.1.4: the sample means from Example 8.1.2 were standardized. [Figure: histogram densities of the standardized sample means z = (x̄ - µ)/(σ/√n) for n = 9 and n = 36, plotted against the Normal(0, 1) pdf.]

Properties of the Histogram in Example 8.1.4. The histogram mean and standard deviation are approximately 0.0 and 1.0 respectively. The histogram density corresponding to the 36-point sample means matches the pdf of a Normal(0, 1) random variable almost exactly; the histogram density corresponding to the 9-point sample means matches the pdf of a Normal(0, 1) random variable, but not as well.

t-statistic Distribution. We want to replace the population standard deviation σ with the sample standard deviation s_j in the standardization equation
$$z_j = \frac{\bar{x}_j - \mu}{\sigma/\sqrt{n}}, \qquad j = 1, 2, 3, \ldots$$
Definition 8.1.1. Each sample mean x̄_j is a point estimate of µ, each sample variance s_j² is a point estimate of σ², and each sample standard deviation s_j is a point estimate of σ.

Removing Bias. The sample mean is an unbiased point estimate of µ: the mean of x̄_1, x̄_2, x̄_3, ... will converge to µ. The sample variance is a biased point estimate of σ²: the mean of s_1², s_2², s_3², ... will converge to (n - 1)σ²/n, not σ². To remove this (n - 1)/n bias, it is conventional to multiply the sample variance by the bias correction n/(n - 1). The resulting point estimate of σ/√n is
$$\sqrt{\frac{n}{n-1}}\cdot\frac{s_j}{\sqrt{n}} = \frac{s_j}{\sqrt{n-1}}$$
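As a short check (not from the slides), the (n - 1)/n factor follows directly from the divisor-n definition of the sample variance:
$$E[s^2] = E\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = E\!\left[\frac{1}{n}\sum_{i=1}^{n}x_i^2\right] - E[\bar{x}^2] = (\sigma^2 + \mu^2) - \left(\frac{\sigma^2}{n} + \mu^2\right) = \frac{n-1}{n}\,\sigma^2$$
since E[x_i²] = σ² + µ² and, for independent samples, E[x̄²] = Var(x̄) + µ² = σ²/n + µ². Multiplying s² by n/(n - 1) therefore removes the bias, which is exactly the correction used above.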

Example 8.1.5. Calculate the t-statistic
$$t_j = \frac{\bar{x}_j - \mu}{s_j/\sqrt{n-1}}, \qquad j = 1, 2, 3, \ldots$$
and generate a continuous-data histogram of t_1, t_2, t_3, ... using program cdh, which reports the histogram mean, histogram standard deviation, and histogram density.

Properties of t-statistic Histogram. If n > 2, the histogram mean is approximately 0. If n > 3, the histogram standard deviation is approximately √((n - 1)/(n - 3)). If n is sufficiently large, the histogram density approximates the pdf of a Student(n - 1) random variable.
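For the two sample sizes used in Example 8.1.2, this standard deviation works out to √(8/6) ≈ 1.15 for n = 9 and √(35/33) ≈ 1.03 for n = 36, both modestly larger than 1, consistent with the heavier tails of the Student distribution.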

Example 8.1.6: t-statistics generated from Example 8.1.2. [Figure: histogram densities of the t-statistics t = (x̄ - µ)/(s/√(n - 1)) for n = 9 and n = 36.]

Properties of the Histogram in Example 8.1.6. The histogram mean and standard deviation are approximately 0.0 and √((n - 1)/(n - 3)) ≈ 1.0 respectively. The histogram density corresponding to the 36-point sample means matches the pdf of a Student(35) RV relatively well; the histogram density corresponding to the 9-point sample means matches the pdf of a Student(8) RV, but not as well.

Interval Estimation. Theorem 8.1.2. If x_1, x_2, ..., x_n is an (independent) random sample from a source of data with unknown mean µ, if x̄ and s are the mean and standard deviation of this sample, and if n is large, it is approximately true that
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}}$$
is a Student(n - 1) random variate. Theorem 8.1.2 provides the justification for estimating an interval that is likely to contain the mean µ. As n → ∞, the Student(n - 1) distribution becomes indistinguishable from Normal(0, 1).

Interval Estimation (2). Suppose T is a Student(n - 1) random variable and α is a confidence parameter with 0.0 < α < 1.0. Then there exists a corresponding positive real number t* such that
$$\Pr(-t^* \le T \le t^*) = 1 - \alpha$$
[Figure: Student(n - 1) pdf with central area 1 - α between -t* and t* and tail area α/2 on each side.]

Interval Estimation (3). Suppose µ is unknown. Since t ≈ Student(n - 1),
$$-t^* \;\le\; \frac{\bar{x} - \mu}{s/\sqrt{n-1}} \;\le\; t^*$$
will be approximately true with probability 1 - α. The right inequality gives
$$\frac{\bar{x} - \mu}{s/\sqrt{n-1}} \le t^* \iff \bar{x} - \frac{t^* s}{\sqrt{n-1}} \le \mu$$
and the left inequality gives
$$-t^* \le \frac{\bar{x} - \mu}{s/\sqrt{n-1}} \iff \mu \le \bar{x} + \frac{t^* s}{\sqrt{n-1}}$$
So, with probability 1 - α (approximately),
$$\bar{x} - \frac{t^* s}{\sqrt{n-1}} \;\le\; \mu \;\le\; \bar{x} + \frac{t^* s}{\sqrt{n-1}}$$

Theorem 8.1.3. If x_1, x_2, ..., x_n is an independent random sample from a source of data with unknown mean µ, x̄ and s are the sample mean and sample standard deviation, and n is large, then, given a confidence parameter α with 0.0 < α < 1.0, there exists an associated positive real number t* such that
$$\Pr\!\left(\bar{x} - \frac{t^* s}{\sqrt{n-1}} \le \mu \le \bar{x} + \frac{t^* s}{\sqrt{n-1}}\right) = 1 - \alpha$$

Example 8.1.7. If α = 0.05, we are 95% confident that µ lies somewhere between x̄ - t* s/√(n - 1) and x̄ + t* s/√(n - 1). For a fixed sample size n and level of confidence 1 - α, use rvms to determine t* = idfstudent(n - 1, 1 - α/2). For example, if n = 30 and α = 0.05, then t* = idfstudent(29, 0.975) ≈ 2.045.

Definition 8.1.2. The interval defined by the two endpoints x̄ ± t* s/√(n - 1) is a (1 - α) × 100% confidence interval estimate for µ. (1 - α) is the level of confidence associated with this interval estimate, and t* is the critical value of t.

Algorithm 8.1.1. To calculate an interval estimate for the unknown mean µ of the population from which a random sample x_1, x_2, ..., x_n was drawn:
1. Pick a level of confidence 1 - α (typically α = 0.05).
2. Calculate the sample mean x̄ and standard deviation s (use Algorithm 4.1.1).
3. Calculate the critical value t* = idfstudent(n - 1, 1 - α/2).
4. Calculate the interval endpoints x̄ ± t* s/√(n - 1).
If n is sufficiently large, then you are (1 - α) × 100% confident that the mean µ lies within the interval. The midpoint of the interval is x̄.
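As a minimal illustration of Algorithm 8.1.1 (a sketch, not the book's estimate program), the C function below computes the interval midpoint and half-width for a sample already stored in an array. The critical value t* is passed in, e.g. obtained from idfstudent as in Example 8.1.7 or from a t table, since the rvms library is not assumed here; the function name and interface are illustrative.

#include <stdio.h>
#include <math.h>

/* Compute the (1 - alpha) * 100% confidence interval  x-bar +/- tstar * s / sqrt(n - 1),
 * where tstar = idfstudent(n - 1, 1 - alpha/2) is supplied by the caller.               */
void interval_estimate(const double x[], long n, double tstar,
                       double *mid, double *halfwidth) {
    double xbar = 0.0, v = 0.0;
    for (long i = 1; i <= n; i++) {        /* one-pass mean/variance, cf. Algorithm 4.1.1 */
        double d = x[i - 1] - xbar;
        v    += d * d * (i - 1) / i;
        xbar += d / i;
    }
    double s = sqrt(v / n);                /* divisor-n sample standard deviation */
    *mid       = xbar;
    *halfwidth = tstar * s / sqrt((double)(n - 1));
}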

Example 8.1.8. The random sample of size n = 10: 1.051, 6.438, 2.646, 0.805, 1.505, 0.546, 2.281, 2.822, 0.414, 1.307 is drawn from a population with unknown mean µ; x̄ = 1.982 and s = 1.690. To calculate a 90% confidence interval estimate: determine t* = idfstudent(9, 0.95) ≈ 1.833; the interval is 1.982 ± (1.833)(1.690/√9) = 1.982 ± 1.032. We are approximately 90% confident that µ is between 0.950 and 3.014.

Example 8.1.8, continued. To calculate a 95% confidence interval estimate: determine t* = idfstudent(9, 0.975) ≈ 2.262; the interval is 1.982 ± (2.262)(1.690/√9) = 1.982 ± 1.274; we are approximately 95% confident that µ is between 0.708 and 3.256. To calculate a 99% confidence interval estimate: determine t* = idfstudent(9, 0.995) ≈ 3.250; the interval is 1.982 ± (3.250)(1.690/√9) = 1.982 ± 1.832; we are approximately 99% confident that µ is between 0.150 and 3.814. Note: n = 10 is not large.
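Compiled together with the interval_estimate sketch above, the driver below reproduces the three intervals of Example 8.1.8 (to rounding), using the critical values 1.833, 2.262, and 3.250 quoted on these slides:

int main(void) {
    double data[10]  = { 1.051, 6.438, 2.646, 0.805, 1.505,
                         0.546, 2.281, 2.822, 0.414, 1.307 };
    double tstar[3]  = { 1.833, 2.262, 3.250 };   /* 90%, 95%, 99% critical values for n = 10 */
    const char *lvl[3] = { "90%", "95%", "99%" };
    for (int k = 0; k < 3; k++) {
        double mid, hw;
        interval_estimate(data, 10, tstar[k], &mid, &hw);
        printf("%s interval: %.3f +/- %.3f\n", lvl[k], mid, hw);   /* e.g. 1.982 +/- 1.032 */
    }
    return 0;
}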

Tradeoff - Confidence Versus Sample Size. For a fixed sample size, more confidence can be achieved only at the expense of a larger interval, and a smaller interval can be achieved only at the expense of less confidence. The only way to make the interval smaller without lessening the level of confidence is to increase the sample size. Good news: with simulation, we can collect more data. Bad news: the interval size decreases with √n, not n.

How Much More Data Is Enough? How large should n be to achieve an interval estimate x̄ ± w, where w is user-specified? Answer: use Algorithm 4.1.1 with Algorithm 8.1.1 to iteratively collect data until the specified interval width is achieved. Note: if n is large, then t* is essentially independent of n. [Figure: t* = idfstudent(n - 1, 1 - α/2) plotted versus n for 3 ≤ n ≤ 40, approaching the asymptotes 2.576 (α = 0.01), 1.960 (α = 0.05), and 1.645 (α = 0.10).]

Asymptotic Value of t*. The asymptotic (large n) value of t* is
$$\lim_{n \to \infty} \mathrm{idfstudent}(n-1,\, 1-\alpha/2) = \mathrm{idfnormal}(0.0,\, 1.0,\, 1-\alpha/2)$$
Unless α is very close to 0.0, this asymptotic value can be used whenever n > 40. For example, if n > 40 and we wish to construct a 95% confidence interval estimate, t* = 1.960 can be used in Algorithm 8.1.1.

Example 8.1.9. Given a reasonable guess for s and a user-specified half-width parameter w, if the asymptotic critical value is used for t*, then n can be determined by solving w = t* s/√(n - 1) for n (valid provided the resulting n > 40):
$$n = \left(\frac{t^* s}{w}\right)^2 + 1$$
For example, if s = 3.0 and we want to estimate µ with 95% confidence to within ±0.5, a value of n = 139 should be used.
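As a quick arithmetic check of the quoted sample size: n = (1.960 × 3.0 / 0.5)² + 1 ≈ 139.3, which rounds to the n = 139 given above.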

Example 8.1.10. If a reasonable guess for s is not available, w can be specified as a proportion of s, thereby eliminating s from the previous equation. For example, if w is 10% of s and 95% confidence is desired, n = 385 should be used to estimate µ to within ±w.
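Similarly, with w specified as 10% of s the guess for s cancels: n = (1.960 / 0.10)² + 1 ≈ 385.2, matching the n = 385 quoted above.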

Program estimate. Program estimate automates the interval estimation process. A typical application: estimate the value of an unknown population mean µ by using n replications to generate an independent random variate sample x_1, x_2, ..., x_n. Function Generate() represents a discrete-event or Monte Carlo simulation program that returns a random variate output x. Using the Generate method:
for (i = 1; i <= n; i++)
    x_i = Generate();
return x_1, x_2, ..., x_n;
Given a level of confidence 1 - α, program estimate can be used with x_1, x_2, ..., x_n to compute an interval estimate for µ.

Algorithm 8.1.2. Given an interval half-width w and level of confidence 1 - α, the algorithm computes the interval estimate x̄ ± w:
tstar = idfnormal(0.0, 1.0, 1 - α/2);       /* asymptotic critical value t*              */
x = Generate();
n = 1;
v = 0.0;
xbar = x;
while ((n < 40) or (tstar * sqrt(v / n) > w * sqrt(n - 1))) {
    x = Generate();
    n++;
    d = x - xbar;                           /* one-pass update of the running mean xbar  */
    v = v + d * d * (n - 1) / n;            /* and of v, where s = sqrt(v / n)           */
    xbar = xbar + d / n;
}
return n, xbar;
It is important to appreciate the need for sample independence in Algorithms 8.1.1 and 8.1.2.
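Below is a self-contained C rendering of Algorithm 8.1.2 under stated assumptions: Generate() is a stand-in Exponential(MU) generator (in a real study it would invoke a discrete-event or Monte Carlo simulation), the half-width w is illustrative, and the asymptotic critical value 1.960 for α = 0.05 replaces the idfnormal call since the rvms library is not assumed here.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MU 1.0                       /* mean of the stand-in data source (assumed) */

/* Stand-in for a simulation run that returns one random variate output */
static double Generate(void) {
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return -MU * log(1.0 - u);       /* Exponential(MU) by inversion */
}

int main(void) {
    srand(12345);
    double w     = 0.05;             /* user-specified interval half-width          */
    double tstar = 1.960;            /* asymptotic critical value for alpha = 0.05  */
    double x     = Generate();
    long   n     = 1;
    double v     = 0.0;
    double xbar  = x;

    /* Sample until n >= 40 and tstar * s / sqrt(n-1) <= w, where s = sqrt(v/n) */
    while ((n < 40) || (tstar * sqrt(v / n) > w * sqrt((double)(n - 1)))) {
        x = Generate();
        n++;
        double d = x - xbar;         /* one-pass update of the mean and variance */
        v    += d * d * (n - 1) / n;
        xbar += d / n;
    }
    printf("n = %ld, interval estimate %.4f +/- %.4f\n",
           n, xbar, tstar * sqrt(v / n) / sqrt((double)(n - 1)));
    return 0;
}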

The meaning of confidence. Incorrect: "For this 95% confidence interval, the probability that µ is within this interval is 0.95." Why incorrect? µ is not a random variable; it is a constant (but unknown). The interval endpoints are random. Correct: "If I create many 95% confidence intervals, approximately 95% of them should contain µ."

Example 8.1.11. 100 samples of size n = 9 were drawn from a Normal(6, 3) population, and a 95% confidence interval was constructed for each sample. 95 of the intervals contain µ = 6; three intervals are too low and two are too high. [Figure: the 100 interval estimates plotted on a vertical axis running from 0 to 12.]
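A sketch reproducing the experiment of Example 8.1.11 under stated assumptions: Normal(6, 3) variates are generated with the Box-Muller transform rather than the book's generators, and the 95% critical value for n = 9 is taken as idfstudent(8, 0.975) ≈ 2.306, a standard t-table value. The exact count of covering intervals varies with the seed but should be close to 95.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MU     6.0
#define SIGMA  3.0
#define N      9                    /* sample size per interval                      */
#define REPS   100                  /* number of confidence intervals                */
#define TSTAR  2.306                /* idfstudent(8, 0.975), standard t-table value  */
#define PI     3.14159265358979323846

static double Uniform01(void) {
    return (rand() + 1.0) / ((double)RAND_MAX + 2.0);
}

/* Normal(mu, sigma) variate via the Box-Muller transform */
static double Normal(double mu, double sigma) {
    double u1 = Uniform01(), u2 = Uniform01();
    return mu + sigma * sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

int main(void) {
    srand(54321);
    int cover = 0;
    for (int r = 0; r < REPS; r++) {
        double xbar = 0.0, v = 0.0;
        for (int i = 1; i <= N; i++) {           /* one-pass mean/variance */
            double d = Normal(MU, SIGMA) - xbar;
            v    += d * d * (i - 1) / i;
            xbar += d / i;
        }
        double hw = TSTAR * sqrt(v / N) / sqrt((double)(N - 1));
        if (xbar - hw <= MU && MU <= xbar + hw)  /* does this interval contain mu? */
            cover++;
    }
    printf("%d of %d intervals contain mu = %.1f (expect about 95)\n", cover, REPS, MU);
    return 0;
}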