Computing Confidence Intervals for Sample Data

Similar documents
7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

Topic 9: Sampling Distributions of Estimators

Chapter 6 Sampling Distributions

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators

Linear Regression Models

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Random Variables, Sampling and Estimation

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.

Module 1 Fundamentals in statistics

Lecture 5. Materials Covered: Chapter 6 Suggested Exercises: 6.7, 6.9, 6.17, 6.20, 6.21, 6.41, 6.49, 6.52, 6.53, 6.62, 6.63.

1 Inferential Methods for Correlation and Regression Analysis

Statistical inference: example 1. Inferential Statistics

MBACATÓLICA. Quantitative Methods. Faculdade de Ciências Económicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

Sampling Distributions, Z-Tests, Power

Interval Estimation (Confidence Interval = C.I.): An interval estimate of some population parameter is an interval of the form (, ),

(7 One- and Two-Sample Estimation Problem )

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Topic 10: Introduction to Estimation

MATH/STAT 352: Lecture 15

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Confidence Level We want to estimate the true mean of a random variable X economically and with confidence.

Statistics 511 Additional Materials

Stat 421-SP2012 Interval Estimation Section

Analysis of Experimental Data

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

Direction: This test is worth 250 points. You are required to complete this test within 50 minutes.

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Chapter 6. Sampling and Estimation

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

- E < p. ˆ p q ˆ E = q ˆ = 1 - p ˆ = sample proportion of x failures in a sample size of n. where. x n sample proportion. population proportion

Chapter 8: Estimating with Confidence

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics

Statisticians use the word population to refer the total number of (potential) observations under consideration

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples

Inferential Statistics. Inference Process. Inferential Statistics and Probability a Holistic Approach. Inference Process.

AP Statistics Review Ch. 8

UNIT 8: INTRODUCTION TO INTERVAL ESTIMATION

Expectation and Variance of a random variable

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

Simulation. Two Rule For Inverting A Distribution Function

Direction: This test is worth 150 points. You are required to complete this test within 55 minutes.

Properties and Hypothesis Testing

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Sample Size Determination (Two or More Samples)

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Class 23. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

BIOSTATISTICS. Lecture 5 Interval Estimations for Mean and Proportion. dr. Petr Nazarov

Bayesian Methods: Introduction to Multi-parameter Models

NCSS Statistical Software. Tolerance Intervals

Statistical Intervals for a Single Sample

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

Lecture 2: Monte Carlo Simulation

Activity 3: Length Measurements with the Four-Sided Meter Stick

Error & Uncertainty. Error. More on errors. Uncertainty. Page # The error is the difference between a TRUE value, x, and a MEASURED value, x i :

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Chapter 7 Student Lecture Notes 7-1

Class 27. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Confidence Intervals รศ.ดร. อน นต ผลเพ ม Assoc.Prof. Anan Phonphoem, Ph.D. Intelligent Wireless Network Group (IWING Lab)

Instructor: Judith Canner Spring 2010 CONFIDENCE INTERVALS How do we make inferences about the population parameters?

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population

STATS 200: Introduction to Statistical Inference. Lecture 1: Course introduction and polling

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times

Confidence Intervals QMET103

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Exam 2 Instructions not multiple versions

Lecture 7: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS

Binomial Distribution

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1

Lecture 5. Random variable and distribution of probability

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Common Large/Small Sample Tests 1/55

Read through these prior to coming to the test and follow them when you take your test.

Lecture 1. Statistics: A science of information. Population: The population is the collection of all subjects we re interested in studying.

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

Formulas and Tables for Gerstman

Statistics Lecture 27. Final review. Administrative Notes. Outline. Experiments. Sampling and Surveys. Administrative Notes

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals

Chapter 1 (Definitions)

Basis for simulation techniques

Power and Type II Error

UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)

Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p

Exam II Covers. STA 291 Lecture 19. Exam II Next Tuesday 5-7pm Memorial Hall (Same place as exam I) Makeup Exam 7:15pm 9:15pm Location CB 234

Biostatistics for Med Students. Lecture 2

µ and π p i.e. Point Estimation x And, more generally, the population proportion is approximately equal to a sample proportion

Lecture 4. Random variable and distribution of probability

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

Economics Spring 2015

1.010 Uncertainty in Engineering Fall 2008

Probability and statistics: basic terms

Transcription:

Computig Cofidece Itervals for Sample Data Topics Use of Statistics Sources of errors Accuracy, precisio, resolutio A mathematical model of errors Cofidece itervals For meas For variaces For proportios How may measuremets are eeded for desired error? 1

What are statistics? A brach of mathematics dealig with the collectio, aalysis, iterpretatio, ad presetatio of masses of umerical data. Merriam-Webster We are most iterested i aalysis ad iterpretatio here. Lies, dam lies, ad statistics! 3 What is a statistic? A quatity that is computed from a sample [of data]. Merriam-Webster A estimate of a populatio parameter 4

Statistical Iferece populatio statistics iferece parameters sample 5 Why do we eed statistics? A set of experimetal measuremets costitute a sample of the uderlyig process/system beig measured Use statistical techiques to ifer the true value of the metric Use statistical techiques to quatify the amout of imprecisio due to radom experimetal errors 6 3

Experimetal errors Errors oise i measured values Systematic errors Result of a experimetal mistake Typically produce costat or slowly varyig bias Cotrolled through skill of experimeter Examples Temperature chage causes clock drift Forget to clear cache before timig ru 7 Experimetal errors Radom errors Upredictable, o-determiistic Ubiased equal probability of icreasig or decreasig measured value Result of Limitatios of measurig tool Observer readig output of tool Radom processes withi system Typically caot be cotrolled Use statistical tools to characterize ad quatify 8 4

Example: Quatizatio Radom error 9 Quatizatio error Timer resolutio quatizatio error Repeated measuremets X ± Δ Completely upredictable 10 5

A Model of Errors Error Measured value Probability -E x E ½ +E x + E ½ 11 A Model of Errors Error 1 Error Measured value Probability -E -E x E ¼ -E +E x ¼ +E -E x ¼ +E +E x + E ¼ 1 6

A Model of Errors 13 Probability of Obtaiig a Specific Measured Value 14 7

A Model of Errors Pr(X=x i ) = Pr(measure x i ) = umber of paths from real value to x i Pr(X=x i ) ~ biomial distributio As umber of error sources becomes large, Biomial Gaussia (Normal) Thus, the bell curve 15 Frequecy of Measurig Specific Values Accuracy Precisio Mea of measured values Resolutio True value 16 8

Accuracy, Precisio, Resolutio Systematic errors accuracy How close mea of measured values is to true value Radom errors precisio Repeatability of measuremets Characteristics of tools resolutio Smallest icremet betwee measured values 17 Quatifyig Accuracy, Precisio, Resolutio Accuracy Hard to determie true accuracy Relative to a predefied stadard o E.g. defiitio of a secod Resolutio Depedet o tools Precisio Quatify amout of imprecisio usig statistical tools 18 9

Cofidece Iterval for the Mea 1-α α/ c1 c α/ 19 Statistical Iferece populatio statistics iferece parameters sample 0 10

Why do we eed statistics? A set of experimetal measuremets costitute a sample of the uderlyig process/system beig measured Use statistical techiques to ifer the true value of the metric Use statistical techiques to quatify the amout of imprecisio due to radom experimetal errors Assumptio: radom errors ormally distributed 1 Iterval Estimate a b The iterval estimate of the populatio parameter will have a specified cofidece or probability of correctly estimatig the populatio parameter. 11

Properties of Poit Estimators I statistics, poit estimatio ivolves the use of sample data to calculate a sigle value which is to serve as a "best guess" for a ukow (fixed or radom) populatio parameter. Example of poit estimator: sample mea. Properties: Ubiasedess: the expected value of all possible sample statistics (of give size ) is equal to the populatio parameter. E[ X ] = µ E [ s ] =! Efficiecy: precisio as estimator of the populatio parameter. Cosistecy: as the sample size icreases the sample statistic becomes a better estimator of the populatio parameter. 3 Ubiasedess of the Mea ' i= 1 X i X = & E$ $ E[ X ] = % ' i= # Xi!!" = ' i= 1 ' i= 1 µ 1 µ = = µ E[ X ] i = 4 1

Sample size= 15 1.7% of populatio Sample 1 Sample Sample 3 0.0739 0.00 0.918 0.1407 0.1089 0.4696 0.157 0.04 0.8644 0.043 0.453 0.1494 0.1784 0.1584 0.44 0.4106 0.8948 0.0051 0.1514 0.035 1.1706 0.454 0.175 0.0084 0.0485 0.387 0.0600 0.1705 0.1697 0.780 0.3335 0.090 0.4985 0.177 0.1488 0.0988 0.04 0.486 0.4896 0.183 0.467 0.189 0.074 0.4079 0.114 E[sample] Populatio Error Sample Average 0.1718 0.467 0.3744 0.643 0.083 6.9% Sample Variace 0.0180 0.0534 0.104 0.0639 0.0440 45.3% Efficiecy (average) 18% 18% 80% Efficiecy (variace) 59% 1% 173% 5 Sample size = 87 10% of populatio Sample 1 Sample Sample 3 0.575 0.3864 0.467 0.0701 0.0488 0.317 0.165 0.0611 0.1138 0.6581 0.0881 0.0047 0.0440 0.5866 0.438 0.1777 0.3419 0.0819 0.380 0.193 0.6581 0.010 0.9460 0.0714 0.435 0.0445 0.959 Populatio % Rel. Error Sample Average 0.39 0.03 0.178 0.06 0.083 5.9% Sample Variace 0.045688 0.0484057 0.0440444 0.0459 0.0440 4.3% Efficiecy (average) 7.5% 5.7% 4.5% Efficiecy (variace).9% 10.0% 0.1% 6 13

Cofidece Iterval Estimatio of the Mea Kow populatio stadard deviatio. Ukow populatio stadard deviatio: Large samples: sample stadard deviatio is a good estimate for populatio stadard deviatio. OK to use ormal distributio. Small samples ad origial variable is ormally distributed: use t distributio with -1 degrees of freedom. 7 Cetral Limit Theorem If the observatios i a sample are idepedet ad come from the same populatio that has mea µ ad stadard deviatio σ the the sample mea for large samples has a ormal distributio with mea µ ad stadard deviatio σ/ x ~ N( µ,! / ) The stadard deviatio of the sample mea is called the stadard error. 8 14

c =Q(1-α/) c 1 =Q(α/) µ N ( µ,! / ) 0 1-α α/ α/ c 1 c 1-α α/ α/ x 1 0 x c N(0,1) = c1 = µ + µ + " " z1#! / z! / = x =z 1-α/ x 1 =z α/ = - z 1-α/ µ # " z1#! / 9 Cofidece Iterval - large (>30) samples 100 (1-α)% cofidece iterval for the populatio mea: s ( x " z1 "! /, x + z1 "! / s ) x : sample mea s: sample stadard deviatio : sample size z : (1-α/)-quatile of a uit ormal variate ( N(0,1)). 1 "! / 30 15

0.435 0.0445 0.959 Populatio Sample Average 0.39 0.03 0.178 0.06 0.083 Sample Variace 0.045688 0.0484057 0.0440444 0.0459 0.0440 Efficiecy (average) 7.5% 5.7% 4.5% Efficiecy (variace).9% 10.0% 0.1% 95% iterval lower 0.179 0.1740 0.1737 95% iterval upper 0.686 0.665 0.619 0.0894 Mea i iterval YES YES YES 99% iterval lower 0.1651 0.1595 0.1598 99% iterval upper 0.86 0.810 0.757 0.1175 Mea i iterval YES YES YES 90% iterval lower 0.1864 0.1815 0.1807 90% iterval upper 0.614 0.591 0.548 0.0750 Mea i iterval YES YES YES I Excel: ½ iterval = CONFIDENCE(1-0.95,s,) iterval size α Note that the higher the cofidece level the larger the iterval 31 µ Sample Iterval iclude µ? 1 YES YES 3 NO 100 YES 100 (1 α) of the 100 samples iclude the populatio mea µ. 3 16

Cofidece Iterval Estimatio of the Mea Kow populatio stadard deviatio. Ukow populatio stadard deviatio: Large samples: sample stadard deviatio is a good estimate for populatio stadard deviatio. OK to use ormal distributio. Small samples ad origial variable is ormally distributed: use t distributio with -1 degrees of freedom. 33 Studet s t distributio t( v) ~ N(0,1)! ( v) / v v: umber of degrees of freedom. v! ( ) : chi-square distributio with v degrees of freedom. Equal to the sum of squares of v uit ormal variates. the pdf of a t-variate is similar to that of a N(0,1). for v > 30 a t distributio ca be approximated by N(0,1). 34 17

Cofidece Iterval (small samples) For samples from a ormal distributio N(µ,σ ), (X " µ) /(# / ) has a N(0,1) distributio ad ( "1)s /# has a chi-square distributio with -1 degrees of freedom Thus, (X " µ) / s / has a t distributio with -1 degrees of freedom 35 Cofidece Iterval (small samples, ormally distributed populatio) 100 (1-α)% cofidece iterval for the populatio mea: x s ( x! t[1!" / ;! 1], x + t[1!" / ;! 1] s ) : sample mea s: sample stadard deviatio : sample size t [ 1! " / ;! 1] : critical value of the t distributio with -1 degrees of freedom for a area of α/ for the upper tail. 36 18

Usig the t Distributio. Sample size= 15. 0.074 0.4079 0.114 E[sample] Populatio Error Sample Average 0.1718 0.467 0.3744 0.643 0.083 6.9% Sample Variace 0.0180 0.0534 0.104 0.0639 0.0440 45.3% Efficiecy (average) 18% 18% 80% Efficiecy (variace) 59% 1% 173% 95% iterval lower 0.0975 0.1187 0.183 95%,-1 critical value.145 95% iterval upper 0.46 0.3747 0.5665 Mea i iteval YES YES YES I Excel: TINV(1-0.95,15-1) α 37 How may measuremets do we eed for a desired iterval width? Width of iterval iversely proportioal to Wat to miimize umber of measuremets Fid cofidece iterval for mea, such that: Pr(actual mea i iterval) = (1 α) [(1! e) x,(1 e x] ( 1 + c, c ) = ) 38 19

How may measuremets? z ( c 1' ( / 1, c) = (1 m e) = x m z s = xe & = $ % z 1' ( / 1' ( / xe x s #! " s 39 How may measuremets? But depeds o kowig mea ad stadard deviatio! Estimate s with small umber of measuremets Use this s to fid eeded for desired iterval width 40 0

How may measuremets? Mea = 7.94 s Stadard deviatio =.14 s Wat 90% cofidece mea is withi 7% of actual mea. 41 How may measuremets? Mea = 7.94 s Stadard deviatio =.14 s Wat 90% cofidece mea is withi 7% of actual mea. α = 0.90 (1-α/) = 0.95 Error = ± 3.5% e = 0.035 4 1

How may measuremets? & z1 '( / s # = = $ % xe! " & 1.895(.14) # = $! % 0.035(7.94) " 1.9 13 measuremets 90% chace true mea is withi ± 3.5% iterval 43 Cofidece Iterval Estimates for Proportios

Cofidece Iterval for Proportios For categorical data: E.g. file types {html, html, gif, jpg, html, pdf, ps, html, pdf } If 1 of observatios are of type html, the the sample proportio of html files is p = 1 /. The populatio proportio is π. Goal: provide cofidece iterval for the populatio proportio π. 45 Cofidece Iterval for Proportios The samplig distributio of the proportio formed by computig p from all possible samples of size from a populatio of size N with replacemet teds to a ormal with mea π ad stadard error! ( 1#! ). " p = The ormal distributio is beig used to approximate the biomial. So, "!10 46 3

Cofidece Iterval for Proportios The (1-α)% cofidece iterval for π is p(1! p), p + z ( p! z1! " / 1! " / p(1! p) ) p: sample proportio. : sample size z : (1-α/)-quatile of a uit ormal variate ( N(0,1)). 1 "! / 47 Example 1 Oe thousad etries are selected from a Web log. Six hudred ad fifty correspod to gif files. Fid 90% ad 95% cofidece itervals for the proportio of files that are gif files. Sample size () 1000 No. gif files i sample 650 Sample proportio (p) 0.65 *p 650 > 10 OK 90% cofidece iterval alpha 0.1 1-alpha/ 0.95 z0.95 1.645 Lower boud 0.65 Upper boud 0.675 95% cofidece iterval alpha 0.05 1-alpha/ 0.975 z0.975 1.960 Lower boud 0.60 Upper boud 0.680 I Excel: NORMSINV(1-0.1/) NORMSINV(1-0.05/) 48 4

Example How much time does processor sped i OS? Iterrupt every 10 ms Icremet couters = umber of iterrupts m = umber of iterrupts whe PC withi OS 49 Proportios How much time does processor sped i OS? Iterrupt every 10 ms Icremet couters = umber of iterrupts m = umber of iterrupts whe PC withi OS Ru for 1 miute = 6000 m = 658 50 5

Proportios p(1! p) ( c 1, c) = p m z1!" / 0.1097(1! 0.1097) = 0.1097 m1.96 6000 = (0.1018,0.1176) 95% cofidece iterval for proportio So 95% certai processor speds 10.-11.8% of its time i OS 51 Number of measuremets for proportios (1! e) p = ep = = p! z z 1! " / z 1! " / 1! " / p(1! p(1! p) p(1! ( ep) p) p) 5 6

Number of measuremets for proportios How log to ru OS experimet? Wat 95% cofidece ± 0.5% 53 Number of measuremets for proportios How log to ru OS experimet? Wat 95% cofidece ± 0.5% e = 0.005 p = 0.1097 54 7

Number of measuremets for proportios = z 1"# / p (1" p ) (ep) = (1.960) (0.1097)(1" 0.1097) 0.005(0.1097) [ ] =1,47,10 10 ms iterrupts 3.46 hours 55 Cofidece Iterval Estimatio for Variaces 8

Cofidece Iterval for the Variace If the origial variable is ormally distributed the the chi-square distributio ca be used to develop a cofidece iterval estimate of the populatio variace. The (1-α)% cofidece iterval for! is ( # 1) s! U $ " $ ( # 1) s! L! L! U : lower critical value of : upper critical value of!! 57 Chi-square distributio Not symmetric! α/ Q(α/) 1-α α/ Q(1-α/) 58 9

95% cofidece iterval for the populatio variace for a sample of size 100 for a N(3,) populatio. 1-α/.91903 average 4.71435 variace.1716 std deviatio 73.36110 lower critical value of chi-square for 95% 18.4193 upper critical value of chi-square for 95% lower boud for cofidece iterval for the variace 3.63477 upper boud for cofidece iterval for the variace 6.361966 I Excel: CHIINV (0.975, 99) CHIINV (0.05, 99) α/ The populatio variace (4 i this case) is i the iterval (3.6343, 6.36) with 95% cofidece. 59 Cofidece Iterval for the Variace If the populatio is ot ormally distributed, the cofidece iterval, especially for small samples, is ot very accurate. 60 30

Key Assumptio Measuremet errors are Normally distributed. Is this true for most measuremets o real computer systems? 1-α α/ c1 c α/ 61 Key Assumptio Saved by the Cetral Limit Theorem Sum of a large umber of values from ay distributio will be Normally (Gaussia) distributed. What is a large umber? Typically assumed to be > 6 or 7. 6 31

Normalizig data for cofidece itervals If the uderlyig distributio of the data beig measured is ot ormal, the the data must be ormalized Fid the arithmetic mea of four or more radomly selected measuremets Fid cofidece itervals for the meas of these average values o We ca o loger obtai a cofidece iterval for the idividual values o Variace for the aggregated evets teds to be smaller tha the variace of the idividual evets 63 Summary Use statistics to Deal with oisy measuremets Estimate the true value from sample data Errors i measuremets are due to: Accuracy, precisio, resolutio of tools Other sources of oise Systematic, radom errors 64 3

Summary (cot d): Model errors with bell curve Accuracy Precisio Mea of measured values Resolutio True value 65 Summary (cot d) Use cofidece itervals to quatify precisio Cofidece itervals for Mea of samples Proportios Variace Cofidece level Pr(populatio parameter withi computed iterval) Compute umber of measuremets eeded for desired iterval width 66 33