Statistics 511 Additional Materials

Confidence Intervals on mu

This topic officially moves us from probability to statistics. We begin to discuss making inferences about the population. One way to differentiate probability from statistics is to think of probability as the process of making an inference about a subset (a sample) of the population when we know the attributes of the entire population. This was the case in the previous sections. We knew (or pretended that we knew) the entire population or the entire distribution. We were then able to discuss probabilities for how a single observation might behave, as well as how the average of several observations might behave. Thus in probability we have knowledge of the whole population (via the values of its parameters) and we want to make statements about an observation or a group of observations.

In statistics this process reverses. We have a sample, a collection of observations, from the population. From this subset, the sample, we want to be able to make statements about the population.

Background

The ideas presented in this topic depend heavily on concepts from the previous sections. Specifically, the idea of variability from sample to sample is crucial. As we saw previously, each time we take a sample we get a different value for the sample mean, and these values differ from the population mean. One consequence is that the sample mean alone is not the best estimate of the population mean, because its value changes from sample to sample. The idea of a confidence interval is that instead of simply using a single number, we use an interval, a range of numbers, to estimate the population mean.

Definitions and preliminaries

Definition: A parameter is a numerical quantity that summarizes a characteristic of the entire population.

Definition: A statistic is a numerical quantity that summarizes a characteristic of a subset of the population (a sample).

We differentiate here between µ and σ, which are parameters, and $\bar{x}$ and $s_x$, which are statistics. Recall that µ is the population mean and σ is the population standard deviation, while $\bar{x}$ is the sample mean and $s_x$ is the sample standard deviation. µ and σ are numerical summaries for the entire population. $\bar{x}$ and $s_x$ are calculated from a subset (sample) of the population.

Definition: A point estimate of a population parameter is a single number used to estimate the unknown value of a population parameter.
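The sample-to-sample variability described in the Background can be illustrated with a short simulation. This is only a sketch under assumed values: the population (Normal with mean 100 and standard deviation 15), the sample size, and the seed are hypothetical choices made for illustration and do not come from the course materials.

```python
import random
import statistics

# Hypothetical population parameters (assumed for illustration only).
MU = 100.0      # population mean, a parameter
SIGMA = 15.0    # population standard deviation, a parameter


def draw_sample(n, rng):
    """Draw n observations from the assumed Normal(MU, SIGMA) population."""
    return [rng.gauss(MU, SIGMA) for _ in range(n)]


def summarize(sample):
    """Return the statistics (sample mean, sample standard deviation)."""
    return statistics.mean(sample), statistics.stdev(sample)


rng = random.Random(511)  # fixed seed so the run is repeatable
sample_means = []
for i in range(5):
    x_bar, s_x = summarize(draw_sample(38, rng))
    sample_means.append(x_bar)
    print(f"sample {i + 1}: x_bar = {x_bar:.2f}, s_x = {s_x:.2f}")
```

Each of the five samples gives its own point estimate of the same fixed parameter, which is the motivation for reporting a range of values rather than a single number.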

Definition: An interval estimate of a population parameter is a range of numbers used to estimate the unknown value of a population parameter.

Suppose that we are interested in estimating the mean weight of all black bears in West Virginia. We are able to weigh 38 black bears. From the random sample of 38 observations, we want to make statements about the entire population of black bears. That is, we want to estimate the average weight of the entire population of black bears in West Virginia. A point estimate for this parameter, the mean weight of all black bears in WV, is 457 pounds. An interval estimate would be that the mean weight of all black bears is between 428 and 497 pounds.

It is important to note that the actual mean weight of the population of all black bears in West Virginia does not change. Rather, it is our knowledge that is imperfect. We have only 38 bears from the entire population, and as a consequence we do not know the weight of all bears.

That idea is worth reiterating. The need for interval estimates comes from the fact that we do not have all of the information that we want about the parameter (in the previous example, the population mean). That is, we have a subset of the population, a sample, but not the entire population. We know that the sample mean is a random variable and that its value is likely not the same as the population mean. Thus what we will do is specify a range of values that are plausible based on our sample. Unless we specify a range of values that goes from negative infinity to positive infinity, we can never guarantee that the population mean will be in the interval estimate for that mean. No method will contain the population mean inside the interval for every sample; however, statistical methods can specify the percentage of the time that our intervals will miss the parameter of interest.

When we use the word "contain" in this topic it has a special meaning. It is important to remember that the value of the population mean, or of any parameter, is constant. Consequently, when we are considering an interval estimate, if the parameter is inside the interval we will consider the parameter's value to be contained in the interval. For the parameter to be contained in the interval, it must fall between the endpoints of the interval; an interval has a lower endpoint and an upper endpoint.

Definition: A (1-α)*100% confidence interval for a parameter is an interval estimate produced by a procedure that, through repeating the process of taking a sample and making a confidence interval from that sample, will contain the parameter (1-α)*100% of the time.

Definition: The confidence level for a confidence interval is the percentage of times that a collection of confidence intervals will contain the parameter of interest in repeated sampling.

The confidence level for a 95% confidence interval is 0.95. The confidence level for an 84% confidence interval is 0.84. The confidence level is often denoted by (1-α)*100%. The reason for this is to allow each researcher to specify his or her level of confidence:

α = 0.05 yields a confidence level of 0.95, while α = 0.10 yields a confidence level of 0.90.

The definition of a confidence interval needs some explanation (maybe plenty of explanation). Each confidence interval is calculated from a sample. This sample is a subset of the population. Previously, we saw that each sample of observations was different. The idea of repeating the process of taking a sample, described in the definition above, is just that: each sample will be different. As we will see shortly (when we talk about calculations), since each sample is different, each confidence interval will be different. Because of the variability from sample to sample, some of the confidence intervals that we construct will not contain the parameter they are trying to estimate.

The difficulty with confidence intervals is this: we DON'T get to know the value of the parameter we are trying to estimate, so we don't know which intervals capture the parameter and which don't. Remember that the parameter is a quantity calculated from the population. In statistics, we only see a subset of the population, so we cannot know the value of the parameter. Thus, we will construct a confidence interval and we will not know whether the parameter is inside the confidence interval. Instead we must be content to know that the procedure works a certain percentage of the time, specifically (1-α)*100%. The procedure begins with selecting objects for the random sample, getting data from those units, and then making our calculations. However, we don't know if the confidence interval that we have created is one of the (1-α)*100% of intervals that contain the unknown value of the parameter or one of the α*100% of intervals that do not.

Consider this more concrete example. A 95% confidence interval is made for the mean height of Yellow Poplars in West Virginia. This interval goes from 85.68 feet to 94.39 feet. This is based upon a random sample of 56 trees taken from around the state. We say that we are 95% confident that the mean height of the population of all Yellow Poplars in WV is between 85.68 and 94.39 feet. However, since we don't know the actual value of the population mean, we cannot say with absolute certainty that it is between these numbers.

So why, if I can't say anything with certainty using a confidence interval, would I use it at all? The answer is simply that by using statistics we can accurately tell the percentage of times the process will fail. No other methodology allows you to specify that percentage. Statistics allows for this, but forces a layer of uncertainty into the discourse.

Confidence Intervals on mu (Small Sample Size)

When the number of observations in the sample is large (at least 30 observations), we can use the Central Limit Theorem to help us construct a confidence interval on the unknown value of the population mean mu. For large samples the CLT tells us that the distribution of the possible values of the sample mean is approximately normal. We can use the standard normal distribution to find the critical value (z) that is part of the (margin of) error term.
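As a small illustration of how the critical value (z) relates to the confidence level, the sketch below computes it from the standard normal distribution using Python's standard library. The function name `z_critical` is a hypothetical helper introduced here, not something defined in the course materials.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Return z such that P(-z < Z < z) equals the confidence level."""
    alpha = 1.0 - confidence_level
    # The upper-tail area is alpha / 2, so z is the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For a 95% confidence level the function returns approximately 1.96, the value a standard normal table gives for an upper-tail area of 0.025.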

Let's start with a 95% confidence interval for the population mean. 95% is a common choice for the confidence level. The critical value corresponding to a 95% confidence level is 1.96. Note that

$$0.95 = P(-1.96 < Z < 1.96) = P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma_x / \sqrt{n}} < 1.96\right)$$

Some algebra later:

$$0.95 = P\left(\mu - 1.96 \cdot \frac{\sigma_x}{\sqrt{n}} < \bar{X} < \mu + 1.96 \cdot \frac{\sigma_x}{\sqrt{n}}\right)$$

The above expression is a statement of probability about $\bar{X}$, if we know the values of µ and $\sigma_x$. This follows what we did in the previous chapters when we pretended that we knew the value of µ or $\sigma_x$ or m or p. Some more algebra later, we can turn this into an interval for µ:

$$P\left(\bar{X} - 1.96 \cdot \frac{\sigma_x}{\sqrt{n}} < \mu < \bar{X} + 1.96 \cdot \frac{\sigma_x}{\sqrt{n}}\right) = 0.95$$

This expression provides the endpoints of the CI:

$$\bar{X} - 1.96 \cdot \frac{\sigma_x}{\sqrt{n}} < \mu < \bar{X} + 1.96 \cdot \frac{\sigma_x}{\sqrt{n}}$$

which we can write more succinctly (and in a more general fashion) as

$$\bar{X} \pm z_{(\alpha/2)} \cdot \frac{\sigma_x}{\sqrt{n}}$$

where $z_{(\alpha/2)}$ is the standard normal value with area α/2 to its right (1.96 when α = 0.05).

But we don't know the value of $\sigma_x$. When the value of $\sigma_x$ is unknown (always the case in the real world) we can substitute our estimate $s_x$ for $\sigma_x$. Our CI is computed as

$$\bar{X} \pm t_{(n-1,\ \alpha/2)} \cdot \frac{s_x}{\sqrt{n}}$$

After we have computed $\bar{x}$ and $s_x$ (and not µ and $\sigma_x$), we can construct a confidence interval on the unknown value of mu; however, there are two direct results of this.

1. The critical value in the error term no longer comes from a normal distribution; it comes from something called a t-distribution.

2. We no longer have a statement of probability; we have a statement of confidence.

The t-distribution

The t-distribution is similar to the z distribution, or standard normal distribution. It is based upon taking a sample of n observations from a Normal distribution with mean µ and standard deviation σ. The random variable T will possess a t-distribution with n-1 degrees of freedom, where T is computed as

$$T = \frac{\bar{X} - \mu}{s_x / \sqrt{n}}$$

A particular member of the family of t-distributions is defined by its number of degrees of freedom, much like the Poisson was indexed by µ. Degrees of freedom is a parameter for the family of t-distributions just as µ was a parameter for the Poisson family. The mean of a t random variable is 0. This distribution is symmetric and unimodal, but it has slightly more variability than the standard Normal distribution.

We will use the percentiles of the t-distribution quite frequently throughout the rest of the course. Consequently, we have specific notation for them. The percentile of a t-distribution with df degrees of freedom that has area P to its right will be denoted by t(df, P). Thus t(25, 0.05) is the 95th percentile of a t R.V. with 25 degrees of freedom, and t(38, 0.10) is the 90th percentile of a t R.V. with 38 degrees of freedom.

For calculating these percentiles we use Table A.2. This table has the degrees of freedom in the first column and percentiles in the other columns. This book uses P to represent the area to the right of the percentile; thus if we want the 90th percentile from the table, we need to look in the column with P = 0.10. Likewise the 99th percentile can be found in the column that is designated by P = 0.01.

t(14, 0.05) = 1.7613
t(25, 0.01) = 2.4851
t(30, 0.10) = 1.3104

Table A.2 does not contain all possible values for degrees of freedom. For example, if the degrees of freedom is 30 or more, then you would use the Standard Normal table (Table A.1) to approximate the corresponding t-value.
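When a t-table is not at hand, the percentiles t(df, P) can be approximated numerically. The sketch below is one possible approach, not the method used to build Table A.2: it integrates the t density with Simpson's rule and then searches by bisection for the value that leaves area P to its right. The function names `t_pdf`, `t_cdf`, and `t_upper` are hypothetical helpers introduced for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] plus symmetry."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_upper(df, p):
    """Return t(df, p): the value with area p to its right (0 < p < 0.5)."""
    lo, hi = 0.0, 1.0
    while 1.0 - t_cdf(hi, df) > p:   # widen the bracket until it holds the answer
        hi *= 2.0
    for _ in range(60):              # bisection on the upper-tail area
        mid = (lo + hi) / 2.0
        if 1.0 - t_cdf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for df, p in [(14, 0.05), (25, 0.01), (30, 0.10)]:
    print(f"t({df}, {p}) = {t_upper(df, p):.4f}")
```

The three calls use the same degrees of freedom and tail areas as the table lookups listed above, so the printed values can be compared with the table entries.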

Given a sample of 14 observations from a distribution that is known to be Normally distributed, construct a 99% confidence interval on the unknown value of the population parameter mu. From the data we have calculated $\bar{X} = 4.127$ and $s_x = 0.358$. We can use the formula

$$\bar{X} \pm t_{(n-1,\ \alpha/2)} \cdot \frac{s_x}{\sqrt{n}}$$

since there are more than 2 observations and the data come from a Normal distribution.

First, df = n - 1 = 14 - 1 = 13 and α = 1 - C.L. = 1 - 0.99 = 0.01. So α/2 = 0.01/2 = 0.005 (here P = 0.005) and t = 3.0123. Then

$$
\begin{aligned}
\bar{X} \pm t_{(n-1,\ \alpha/2)} \cdot \frac{s_x}{\sqrt{n}}
&= 4.127 \pm t_{(13,\ 0.005)} \cdot \frac{0.358}{\sqrt{14}} \\
&= 4.127 \pm 3.0123 \cdot \frac{0.358}{\sqrt{14}} \\
&\approx 4.127 \pm 3.012 \cdot 0.096 \\
&\approx 4.127 \pm 0.28915
\end{aligned}
$$

The endpoints of our confidence interval on mu are (3.838, 4.416). (3.838, 4.416) is mathematical notation for an interval that goes from 3.838 to 4.416.
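The arithmetic in this example can be checked with a few lines of code. This is a minimal sketch: `mean_ci` is a hypothetical helper name, and the critical value 3.0123 is passed in as read from the t-table rather than computed.

```python
import math


def mean_ci(x_bar, s_x, n, critical_value):
    """Return (lower, upper) for x_bar +/- critical_value * s_x / sqrt(n)."""
    margin = critical_value * s_x / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the worked example: n = 14, x_bar = 4.127, s_x = 0.358.
# 3.0123 is t(13, 0.005) as read from the t-table.
lower, upper = mean_ci(4.127, 0.358, 14, 3.0123)
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```

Because the code does not round the standard error to 0.096 before multiplying, its endpoints can differ from the hand calculation in the third decimal place.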

So a 99% confidence interval for the population mean goes from 3.838 to 4.416. We interpret this by saying that we are 99% confident that the population mean's value falls within the interval 3.838 to 4.416.

Confidence Intervals on mu (Large Sample Size)

When the sample size is large (at least 30 observations) a t-distribution with n-1 d.f. is virtually identical to the Standard Normal distribution. So we can obtain our critical value from the Standard Normal distribution table instead of the t-table. In the large sample situation, the formula for a (1-α)*100% confidence interval on mu becomes

$$\bar{X} \pm z_{(\alpha/2)} \cdot \frac{s_x}{\sqrt{n}}$$

The t-value in the error term is replaced by a z-value from a Standard Normal distribution. All else remains the same.

The formulas above can be used to construct a (1-α)*100% confidence interval (CI) for the population mean when

1. n (the number of observations) is more than 2 and the original data (the values of the variable X) are approximately Normal, or
2. n is at least 30 (n ≥ 30) (and we don't know what distribution the data came from).

Suppose that we want to estimate the mean of a population. We have a sample of 48 observations from the population. The mean of these observations is 290.34 and the standard deviation of these observations is 41.22. Then a 95% confidence interval for the population mean would be as follows. We can use the formula below since n ≥ 30.

$$
\begin{aligned}
\bar{X} \pm z_{(\alpha/2)} \cdot \frac{s_x}{\sqrt{n}}
&= 290.34 \pm 1.96 \cdot \frac{41.22}{\sqrt{48}} \\
&= 290.34 \pm 11.66 \\
&= (278.68,\ 302.00)
\end{aligned}
$$

(278.68, 302.00) is mathematical notation for an interval that goes from 278.68 to 302.00. Thus a 95% confidence interval for the population mean goes from 278.68 to 302.00. We interpret this by concluding that we are 95% confident the unknown value of the population mean is between 278.68 and 302.00.

Confidence instead of probability

When we are dealing with parameters such as µ or σ, we are dealing with fixed quantities. As a consequence, if we make a statement such as "the value of the population mean is between 8.5 and 19.4", that statement is either true or false. The population mean is either in the confidence interval or outside of it. This has implications for our interpretation of a confidence interval. After we create a 95% confidence interval for µ from, say, 175.46 to 176.32, then P(175.46 < µ < 176.32) ≠ 0.95. This probability, P(175.46 < µ < 176.32), is either 0 or 1. The value of the population mean is either inside the interval or it is not inside the interval.

The confidence that we assert comes from repetitions of the process of taking many samples and calculating the confidence interval for each sample. However, for any individual interval we do not know whether the mean is inside the interval or outside the interval. What we do know is that if we repeated the process of collecting samples and making 95% confidence intervals for the population mean from each sample, then approximately 95% of those confidence intervals would contain the population mean.

Factors Affecting the Width of a (1-α)*100% Confidence Interval

There are three factors that influence the size or width of a confidence interval.

1. The sample size. As n increases, the width of the CI decreases.
2. The confidence level (1-α). As the confidence level increases, the width of the CI increases.
3. The sample standard deviation s. The bigger s is, the wider the CI is.

Summary:

The basic form of a confidence interval for a population parameter is as follows:

Point estimate ± critical value from a sampling distribution * standard error
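The basic form can be written directly as code. The sketch below is an illustration, not part of the course materials: it applies the form to the large-sample example (n = 48, mean 290.34, standard deviation 41.22) and then changes one factor at a time to show the effect on the width. The helper names `large_sample_ci` and `width`, and the alternative values 192, 0.99, and 82.44, are hypothetical choices.

```python
import math
from statistics import NormalDist


def large_sample_ci(x_bar, s_x, n, confidence_level=0.95):
    """Point estimate +/- z critical value * standard error."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    standard_error = s_x / math.sqrt(n)
    margin = z * standard_error
    return x_bar - margin, x_bar + margin


def width(interval):
    """Distance between the upper and lower endpoints."""
    lower, upper = interval
    return upper - lower


# Large-sample example from the text: n = 48, mean 290.34, s = 41.22.
base = large_sample_ci(290.34, 41.22, 48, 0.95)
print(f"95% CI: ({base[0]:.2f}, {base[1]:.2f}), width {width(base):.2f}")

# Change one factor at a time (the alternative values are hypothetical).
more_data = large_sample_ci(290.34, 41.22, 192, 0.95)
more_confidence = large_sample_ci(290.34, 41.22, 48, 0.99)
more_spread = large_sample_ci(290.34, 82.44, 48, 0.95)
print(f"larger n:     width {width(more_data):.2f}")
print(f"higher level: width {width(more_confidence):.2f}")
print(f"larger s:     width {width(more_spread):.2f}")
```

The first call reproduces the interval from 278.68 to 302.00. Increasing n narrows the interval, while raising the confidence level or the sample standard deviation widens it.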

The point estimate is the best single number estimate for the parameter. The standard error is an estimate of the variability from sample to sample for the point estimate. The critical value that is used is based upon the confidence level that we want to use, and the sampling distribution is determined by the type of parameter that we are estimating.

(1-α)*100% Large Sample CI on mu:

$$\bar{X} \pm z_{(\alpha/2)} \cdot \frac{s_x}{\sqrt{n}}$$

(1-α)*100% Small Sample CI on mu:

$$\bar{X} \pm t_{(n-1,\ \alpha/2)} \cdot \frac{s_x}{\sqrt{n}}$$

(1-α)*100% CI on mu when our data comes from a normal distribution (regardless of the sample size):

$$\bar{X} \pm t_{(n-1,\ \alpha/2)} \cdot \frac{s_x}{\sqrt{n}}$$
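The repeated-sampling interpretation of confidence can be checked by simulation. The sketch below assumes a hypothetical Normal population (mean 50, standard deviation 10) whose mean is known to the simulation, draws many samples of size 48, builds a large-sample 95% interval from each, and counts how often the interval contains the population mean. All parameter values and the function name `coverage` are assumptions made for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def coverage(mu, sigma, n, confidence_level, repetitions, seed):
    """Fraction of simulated large-sample intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        margin = z * statistics.stdev(sample) / math.sqrt(n)
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / repetitions


rate = coverage(mu=50.0, sigma=10.0, n=48, confidence_level=0.95,
                repetitions=2000, seed=511)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

Any single interval either contains the population mean or does not; it is the long-run fraction across repeated samples that should be close to the confidence level.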