Comparing Systems Using Sample Data

Similar documents
Summarizing Measured Data

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer.

Two-Factor Full Factorial Design with Replications

Null Hypothesis Significance Testing p-values, significance level, power, t-tests Spring 2017

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Chapter 5: HYPOTHESIS TESTING

1 Statistical inference for a population mean

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

Sociology 6Z03 Review II

Chapter 9: Hypothesis Testing Sections

One Factor Experiments

The t-test Pivots Summary. Pivots and t-tests. Patrick Breheny. October 15. Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/18

CPSC 531: System Modeling and Simulation. Carey Williamson Department of Computer Science University of Calgary Fall 2017

Pivotal Quantities. Mathematics 47: Lecture 16. Dan Sloughter. Furman University. March 30, 2006

Lecture #16 Thursday, October 13, 2016 Textbook: Sections 9.3, 9.4, 10.1, 10.2

Chapter 4 - Lecture 3 The Normal Distribution

Null Hypothesis Significance Testing p-values, significance level, power, t-tests

Summarizing Measured Data

Relating Graph to Matlab

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

This does not cover everything on the final. Look at the posted practice problems for other topics.

Introduction to Probability

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Business Statistics. Lecture 10: Course Review

Lecture on Null Hypothesis Testing & Temporal Correlation

16.400/453J Human Factors Engineering. Design of Experiments II

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

Introduction to statistics

Mathematical Statistics

Lecture 3. Biostatistics in Veterinary Science. Feb 2, Jung-Jin Lee Drexel University. Biostatistics in Veterinary Science Lecture 3

Business Statistics. Lecture 5: Confidence Intervals

Two sample Test. Paired Data : Δ = 0. Lecture 3: Comparison of Means. d s d where is the sample average of the differences and is the

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<=

Statistical Inference

Frequentist Statistics and Hypothesis Testing Spring

Exam details. Final Review Session. Things to Review

Statistics for IT Managers

MVE055/MSG Lecture 8

Limiting Distributions

Simple and Multiple Linear Regression

Space Telescope Science Institute statistics mini-course. October Inference I: Estimation, Confidence Intervals, and Tests of Hypotheses

Sleep data, two drugs Ch13.xls

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Confidence Intervals for Normal Data Spring 2014

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

Chapter 3. Chapter 3 sections

BMIR Lecture Series on Probability and Statistics Fall, 2015 Uniform Distribution

MTMS Mathematical Statistics

Summary: the confidence interval for the mean (σ 2 known) with gaussian assumption

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Chapter 4: Continuous Random Variables and Probability Distributions

Inference for Distributions Inference for the Mean of a Population

Stochastic Convergence, Delta Method & Moment Estimators

Confidence intervals CE 311S

Introduction to bivariate analysis

IV. The Normal Distribution

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Lecture 12: Small Sample Intervals Based on a Normal Population Distribution

STAT 430/510: Lecture 15

Week 1 Quantitative Analysis of Financial Markets Distributions A

Introduction to bivariate analysis

Categorical Data Analysis 1

Inference for Binomial Parameters

Introduction to Statistical Inference

Statistics. Statistics

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

STAT Section 2.1: Basic Inference. Basic Definitions

LECTURE 12 CONFIDENCE INTERVAL AND HYPOTHESIS TESTING

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 19

Problem 1 (20) Log-normal. f(x) Cauchy

Statistics for Managers Using Microsoft Excel 7 th Edition

0.27 Sample Preparation

Monte Carlo Integration II & Sampling from PDFs

The Binomial distribution. Probability theory 2. Example. The Binomial distribution

II. The Normal Distribution

Two Factor Full Factorial Design with Replications

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Outline. Simulation of a Single-Server Queueing System. EEC 686/785 Modeling & Performance Evaluation of Computer Systems.

Lecture 8 Hypothesis Testing

Summarizing Measured Data

How to Describe Accuracy

Recall the Basics of Hypothesis Testing

Confidence Intervals for Normal Data Spring 2018

Math 494: Mathematical Statistics

CS5314 Randomized Algorithms. Lecture 18: Probabilistic Method (De-randomization, Sample-and-Modify)

Practice Questions: Statistics W1111, Fall Solutions

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

PubH 5450 Biostatistics I Prof. Carlin. Lecture 13

Lecture 4: September Reminder: convergence of sequences

STA1000F Summary. Mitch Myburgh MYBMIT001 May 28, Work Unit 1: Introducing Probability

BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings

BNG 495 Capstone Design. Descriptive Statistics

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Chapter 8 of Devore , H 1 :

Sampling Distributions of Statistics Corresponds to Chapter 5 of Tamhane and Dunlop

Chapter 6: Large Random Samples Sections

Ch. 7: Estimates and Sample Sizes

Confidence Intervals with σ unknown

Data Science for Engineers Department of Computer Science and Engineering Indian Institute of Technology, Madras

Transcription:

Comparing Systems Using Sample Data Dr. John Mellor-Crummey Department of Computer Science Rice University johnmc@cs.rice.edu COMP 528 Lecture 8 10 February 2005

Goals for Today Understand Population and samples Confidence intervals How to compute confidence intervals for sample mean proportions How to test a hypothesis using a test for zero mean How to compare two alternatives How to determine minimum sample size for estimating sample mean with a given accuracy proportion with a given accuracy 2

Sample vs. Population Suppose we generate a set S containing several million random numbers. We will call this set the population. denote population mean with µ denote population std deviation with σ Draw a sample of n numbers {x 1, x 2,, x n } from S denote sample mean with denote the std deviation of the sample with s No guarantee that x = µ s = σ ( x,s) of sample are estimates of the population parameters (µ, σ) Conventions population characteristics: parameters (µ, σ) (Greek alphabet) sample estimates: statistics (,s) (Roman alphabet) x x 3

Confidence Interval for the Mean k samples of a population may yield k different sample means No sample or finite set of samples will necessarily give a perfect estimate for the population mean µ Instead, we use probability bounds for an estimate of µ, the population mean P(c 1 " µ " c 2 ) =1#$ Confidence interval (c 1,c 2 ) Significance level Confidence coefficient Confidence level (a percentage): 100(1- α) 4

Understanding Confidence Intervals Why use them? provide a way to decide if measurements are meaningful characterize potential error in sample mean enable comparisons in the presence of experimental error Understand their limitations! at 95% confidence, confidence intervals for 5% of sample means do not include the population mean µ 5

Computing (c 1,c 2 ) for Population Mean µ The hard way To compute a 90% confidence interval for a population mean µ Take k samples of the population (each sample is a set) Compute the set of sample means (one for each sample) Sort the set of sample means Select the [1 +.05(k-1)] th element as c 1 Select the [1 +.95(k-1)] th element as c 2 90% confidence interval for µ is (c 1, c 2 ) 90% = 100(1- α.05 = α/2.95 = 1- α/2 6

The Central Limit Theorem If observations {x 1, x 2,, x n } are independent from the same population the population has mean µ the population has std deviation σ Then sample mean normally distributed x for large samples is approximately x ~ N(µ," / n) Std error = std deviation of sample mean If population std deviation is σ, std error is " / n 7

Confidence Interval of a Normal Distribution Example: 90% confidence interval. α =.10 PDF of N(0,1) 90% z 0.05 z 0.95 CDF of N(0,1) 8

Meaning of a Confidence Interval Example: 90% confidence interval. α =.10 PDF of N(0,1) 90% sample 1 z 0.05 z 0.95 100 total yes = 90 = 100(1-.10) total no = 10 = 100 (.10) 9

Computing (c 1,c 2 ) for a Population Mean µ The easy way (for a large sample, n > 30) By the central limit theorem, a 100(1- α)% confidence interval for µ (x " z 1"# / 2 s/ n, x + z 1"# / 2 s/ n ) Where x is the sample mean scaled by std error s is the sample std deviation n is the sample size α is the significance level, 100(1- α)% is the confidence level z 1- α/2 is the (1- α/2) quantile of the unit normal variate 10

Confidence Interval Example Given a (large) sample with the following characteristics 32 elements (n = 32) x sample mean = 3.90 sample std deviation s =.71 A 90% confidence interval for the mean can be computed as (x " z 1"# / 2 s/ n, x + z 1"# / 2 s/ n ) (x " (z 0.95 )s/ n,x + (z 0.95 )s/ n) Recall z 1-α/2 is approximately 4.91[(1-α/2) 0.14 -(α/2) 0.14 ] (3.90 " (1.645)(0.71) / 32,3.90 + (1.645)(0.71) / 32) = (3.69,4.11) 11

Computing (c 1,c 2 ) for Population Mean µ The easy way (for a small sample, n 30) For smaller samples, confidence intervals can only be constructed if samples come from a normally distributed population Ratio of (x " µ) /(s/ n) follows a t(n-1) distribution For small samples, 100(1- α)% confidence interval for µ is (x " t [1"# / 2;n"1] s/ n,x + t [1"# / 2;n"1] s/ n) Where x is the sample mean s is the sample std deviation n is the sample size α is the significance level, 100(1- α)% is the confidence level t [1- α/2; n-1] : (1- α/2) quantile of t distribution with n-1 degrees of freedom 12

Confidence Interval of a t(n-1) Distribution Example: 90% confidence interval PDF of t(n-1) P(x < -t [1-.05; n-1] ) = α/2 P(x > t [1-.05; n-1] ) = α/2 90% t 0.05 t 0.95 CDF of t(n-1) 13

Confidence Interval Example Given a (small) sample with the following characteristics modeling error shown to be normally distributed from quantile/quantile plot {-.04,-.19,.14,-.09,-.14,.19,.04,.09} 8 elements (n = 8) sample mean = 0 sample std deviation s =.138 x A 90% confidence interval for the mean can be computed as (x " t [1"# / 2;n"1] s/ n,x + t [1"# / 2;n"1] s/ n) (0 " (t [0.95;7] )(.138) / 8,0 + (t [0.95;7] )(.138) / 8) Look up t [1-.05; n-1] in Jain Table A.4 = 1.895 (0 " (1.895)(.138) / 8,0 + (1.895)(.138) / 8) = (-.0926,+.0926) 14

Small vs. Large Samples Why the difference when computing confidence for small vs. large samples? As n increases, t-distribution approaches normal distribution 15

Testing for a Zero Mean Is a measured value significantly different from zero? common use of confidence intervals When comparing random measurement with zero, must do so probabilistically If value different from zero with probability 100(1-α)%, then value is significantly different from zero mean 0 CI includes 0 CI does not include 0 16

Example: Testing for a Zero Mean Difference in running time of two sorting algorithms A and B was measured on several different input sequences Differences are {1.5, 2.6, -1.8, 1.3, -.5, 1.7, 2.4} Can we say with 99% confidence that 1 algorithm is superior? Example properties x n = 7, = 1.03, std deviation = 1.6 α =.01, α/2 =.005 confidence interval (1.03 " t [1".005;6] *1.60 / 7,1.03+ t [1".005;6] *1.60/ 7) Look up t [1-.005; 6] = t [.995; 6] in Jain Table A.4 = 3.707 (1.03" (3.707) *1.60 / 7,1.03 + (3.707) *1.60 / 7) = ("1.21,3.27) Confidence interval includes 0; thus, cannot say with 99% confidence that the mean difference between A & B is significantly different from 0 17

Comparing Two Alternatives Most performance analysis projects require determining the best of two or more systems Here we compare 2 systems with very similar workloads more than 2 systems or substantially different workloads use techniques for experimental design (covered later in course) 18

Paired Observations Conduct n experiments on each of 2 systems system a: {a 1, a 2,, a n } system b: {b 1, b 2,, b n } If one-one correspondence between tests on both systems observations are said to be paired Treat the samples for 2 systems as one sample of n pairs For each pair, compute difference in performance {a 1 -b 1, a 2 -b 2,, a n -b n } Construct a confidence interval for the mean difference Is the confidence interval includes 0, systems not significantly different 19

Unpaired Observations (t-test) Two samples, one size n a, the other size n b Compute mean of each sample: x a, x b Compute std deviation of each sample: s a, s b Compute mean difference Compute std deviation of mean difference Effective number of degrees of freedom x a " x b s a 2 (s 2 " = a /n a + s 2 b /n b ) 2 2 2 1 # s & a 1 # % ( + % n a +1$ ' n b +1$ n a s b 2 n b 2 n a + s b & ) 2 2 ( ' n b Confidence interval for mean difference (x a " x b ) ± t [1"# / 2;$ ] s 20

Notes on Unpaired Observations Preceding slide made following assumptions two samples of unequal size standard deviations not assumed equal small sample sizes normal populations 21

Approximate Visual Test Simple visual test to compare unpaired samples CI no overlap A > B CI overlap; means in CI of other; alternatives not different mean 0 A B mean 0 A B CI overlap; mean A not in CI B; need t-test mean A 0 B 22

What Confidence Level to Use? Typically use confidence of 90% or 95% Need not always be that high Choice of confidence level is based on cost of loss if wrong! If loss is high compared to gain, use high confidence If loss is negligible compared to gain, low confidence OK 23

One-sided Confidence Intervals Sometimes only a one-sided confidence interval is needed Example: want to test if mean > µ 0 In this case, one-sided lower confidence interval for µ needed (x " t [1"#;n"1] s/ n,x ) For large samples, use z-values (unit normal distribution) rather than t-distribution 24

Confidence Intervals for Proportions For categorical variables, data often associated with probabilities of various categories Example: want confidence interval for n 1 of n of type 1 Proportion = p = n 1 /n Confidence interval for proportion p ± z 1"# / 2 p(1" p) n Confidence interval based on approximating the binomial distribution valid only if np 10 If np 10, can t use t-values; must use binomial tables 25

Example: Confidence for Proportions If 10 out of 1000 pages from a laser printer are illegible Proportion of illegible pages is 10/1000 =.01 Since np 10, we can use the aforementioned confidence interval p ± z 1"# / 2 p(1" p) n.01(.99).01± z 1"# / 2 1000 Recall z 1-α/2 is approximately 4.91[(1-α/2) 0.14 -(α/2) 0.14 ].01± z 1"# / 2.0000099 =.01± z 1"# / 2 (.0031) 90% confidence = 95% confidence =.01± (1.645)(.0031) = (.0049,.0151).01± (1.96)(.0031) = (.0039,.0161) 26

Determining Sample Size Confidence level from a sample depends on sample size the larger the sample, the higher the confidence Goal: determine smallest sample yielding desired accuracy 27

Sample Size for Determining Mean For a sample size n, the 100(1-α)% confidence interval of µ is s x ± z 1"# / 2 n For a desired accuracy of r%, the confidence interval must be r x ± x 100 s Thus, z 1"# / 2 and n = x r * n = 100z s 2 $ ', 1"# / 2 + & ) - 100 + % rx ( - In a preliminary test, sample mean of response time is 20 seconds and std dev. = 5 seconds. How many repetitions are needed to estimate the mean response time within 2s at 95% confidence? Required accuracy r = 2 in 20 = 10% * n = 100z 2 $ 1"# / & 2s', * + ) - = 100(1.96)(5) 2 $ ', + & ) - + % rx ( - + % (10)(20) ( - = * 24.01, = 25 28

Sample Size for Determining Proportions Recall confidence interval for proportion To get an accuracy of r, Thus, r = z 1"# / 2 p(1" p) n p ± r = p ± z 1"# / 2 p(1" p) n and $ n = % z 1"# / 2 ( ) p ± z 1"# / 2 p(1" p) n 2 p(1" p) & ' r 2 A preliminary measurement of a laser printer showed an illegible print rate of 1 in 10000. How many pages must be observed to get an accuracy of 1 per million at 95% confidence? p =1/10,000 =10 "4 ;r =10 "6 ;z.975 =1.96 # % n = $ ( 1.96) 2 10"4 (1"10 "4 )& $ ( 10 "6 ) 2 & = 384,160,000 29