Chapter 6 Sampling Distributions

In most experiments, we have more than one measurement for any given variable, each measurement being associated with one randomly selected member of a population. Hence we need to examine probabilities associated with events that specify conditions on two or more random variables.

Def: A set of $n$ random variables constitutes a random sample of size $n$ from a finite population of size $N$ if each member of the sample, $X_i$, is chosen in such a way that every sample of size $n$ has the same probability of being chosen.

Def: A set of (continuous or discrete) random variables $X_1, X_2, \ldots, X_n$ is called a random sample of size $n$ if the r.v.'s have the same distribution and are independent. We say that $X_1, X_2, \ldots, X_n$ are independent and identically distributed (i.i.d.).

Note: We will also use the term random sample for the set of observed values $x_1, x_2, \ldots, x_n$ of the random variables. Prior to selecting the sample and making the measurements, we have $X_1, X_2, \ldots, X_n$, with each $X_i$ being an (unknown) random quantity having associated probability distribution $f(x)$. After selecting the sample and making the measurements, we have the observed values $x_1, x_2, \ldots, x_n$.

Note: In practice, it is often difficult to do random sampling. However, random sampling is basic to the use of the statistical inferential procedures that we will discuss later. These procedures are used for analyzing experimental data, for testing hypotheses, for estimating parameters (numerical characteristics of populations), and for performing quality control in manufacturing. In each situation, we must somehow obtain convincing evidence that the data collected do approximate the conditions of randomness.

Example: In a manufacturing situation, we have manufactured items coming off an assembly line. Assume that the population of items that have been completed is relatively large. We want to check the quality of these items by selecting a random sample of $n$ of them and making measurements on each item in the sample. If the sample is random, then it has a good chance of being representative of the population, and we can obtain useful information about the quality of the entire population. For example, we are interested in knowing whether the average value of a certain measurement is close to the specified target value. It is very unlikely that the sample average will be exactly equal to the population average, but it is likely to be close.

The Sampling Distribution of the Sample Mean

Def: A statistic is a random variable which is a function of a random sample. The probability distribution associated with a statistic is called its sampling distribution.

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from a population (probability distribution). The statistic $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is called the sample mean. Since the $X_i$'s are random variables, $\bar{X}$ is also a random variable, with a sampling distribution. Some other examples of statistics are:

1) The sample variance, $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$,

2) The sample median, $\tilde{X}$.

Theorem 6.1: Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having mean $\mu$ and standard deviation $\sigma$. Then the mean of the sampling distribution of $\bar{X}$ is:

$\mu_{\bar{X}} = E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$

The variance of the sampling distribution depends on the size of the population from which the sample is drawn. If the population is of infinite size, then

$\sigma_{\bar{X}}^2 = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 = \frac{\sigma^2}{n}$.

Note: The quantity $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ (the standard deviation of the sampling distribution of the sample mean) is also called the standard error of the mean. It provides us with a measure of the reliability of the sample mean as an estimate of the population mean. This term will be important when we discuss statistical inference.

Note: If the random sample was selected from a normal distribution (we write $X_1, X_2, \ldots, X_n \sim \text{Normal}(\mu, \sigma)$), then it can be shown that $\bar{X} \sim \text{Normal}(\mu, \sigma/\sqrt{n})$.

Example: On page 134, Exercise 5.27. If I randomly select a single assembled piece of machinery from the population of assembled pieces, the time for assembly will be a random variable $X$ having a Normal(µ = 12.9 min., σ = 2.0 min.) distribution. On the other hand, if I select a random sample of size $n = 64$ from the population, the distribution of $\bar{X}$, the average assembly time for the sample of pieces, will be Normal(µ = 12.9 min., $\sigma/\sqrt{n} = 2.0/\sqrt{64} = 0.25$ min.). Note that the variability in the distribution of $\bar{X}$ is only one-eighth the variability in the distribution of $X$. This is an important concept.

The following theorem is EXTREMELY important (as well as astonishing). This theorem provides the basis for our procedures for doing statistical inference.

Theorem 6.3: (Central Limit Theorem) If $X_1, X_2, \ldots, X_n$ are a random sample from any distribution with mean $\mu$ and standard deviation $\sigma < +\infty$, then the limiting distribution of

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

as $n \to +\infty$ is standard normal.

Note: Nothing was said about the distribution from which the sample was selected except that it has finite standard deviation. The sample could be selected from a normal distribution, or from an exponential distribution, or from a Weibull distribution, or from a Bernoulli distribution, or from a Poisson distribution, or from any other distribution with finite standard deviation. See, e.g., the example on pages 179-180. See also the illustration on page 184.

Note: For what $n$ will the normal approximation be good? For most purposes, if $n \geq 30$, we will say that the approximation given by the Central Limit Theorem (CLT) works well.
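The standard error formula and the Central Limit Theorem can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes an exponential population with mean 1 (so σ = 1), a sample size of n = 30, and 20,000 repeated samples. The function name `simulate_sample_means` and the seed are illustrative choices only.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from a random sample of size n.

    Assumption for illustration: the population is Exponential with
    mean 1 (hence sigma = 1), a clearly non-normal distribution.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


n, reps = 30, 20_000
means = simulate_sample_means(n, reps)

# Theorem 6.1: the sampling distribution of the mean is centred at mu
# and has standard deviation sigma / sqrt(n), the standard error.
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
standard_error = 1.0 / math.sqrt(n)

# CLT check: share of standardized means inside +/- 1.96, which should be
# near 0.95 if the normal approximation is reasonable at this n.
inside = sum(abs((m - 1.0) / standard_error) <= 1.96 for m in means) / reps

# Standard error for the assembly-time example: sigma = 2.0 min, n = 64.
assembly_se = 2.0 / math.sqrt(64)

print(mean_of_means, sd_of_means, standard_error, inside, assembly_se)
```

Changing `n` in this sketch gives a rough feel for how quickly the approximation improves for a skewed population.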

Example: p. 187, Exercise 6.15.

Example: The fracture strength of tempered glass averages 14 (measured in thousands of p.s.i.) and has a standard deviation of 2. What is the probability that the average fracture strength of 100 randomly selected pieces of tempered glass will exceed 14,500 p.s.i.?

Example: Shear strength measurements for spot welds have been found to have a standard deviation of 10 p.s.i. If 100 test welds are to be measured, what is the approximate probability that the sample mean will be within 1 p.s.i. of the true population mean?

The T Distribution

Use of the above discussion (Central Limit Theorem, etc.) to draw conclusions about the value of the population mean, µ, from a measured value of the sample mean, $\bar{x}$, has a flaw. If we have to depend on sample data for information about the population mean, then we would tend not to know the value of the population standard deviation, either. We would also have to estimate σ. We need to modify our theory somewhat to take this complication into account. We introduce another probability distribution that allows us to use sample data alone to make inferences about the population mean.

Theorem 6.4: If $\bar{X}$ is the mean of a random sample of size $n$ taken from a normal distribution having mean µ and standard deviation σ, and if $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ is the sample variance, then the random variable

$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$

has a t-distribution with degrees of freedom $\nu = n - 1$.

The t-distribution (which is actually a family of distributions, characterized by the degrees of freedom) has characteristics similar to those of the standard normal distribution, as we can see from the figure on page 187. Note that for large d.f., the $t(n-1)$ distribution is very close to the standard normal distribution. In fact, the standard normal distribution provides a good approximation to the $t(n-1)$ distribution for $n$ of size 30 or more.

Note: Cut-off values and various tail probabilities for the t-distribution, with various values for ν, may be found in Table 4 on page 516. Note that in order to use this table, we must know the degrees of freedom in the particular exercise. However, we will find these values using Excel. The Excel functions to be used would be ( ), ( ), and ( ).

Example: page 188.

The Sampling Distribution of the Variance

The above discussion provides us with the tools to do inference about the value of a population mean. If we want to do inference about the value of a population variance, $\sigma^2$, then we need to discuss the sampling distribution for the sample statistic, $S^2$, that we use to estimate the population variance. For this, we need to introduce another family of probability distributions, the chi-square family.
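The effect of replacing σ by S in Theorem 6.4 can be seen in a small simulation. This is a sketch under stated assumptions rather than anything from the notes: the samples are drawn from a standard normal population, the cutoff 1.96 is the familiar two-sided 5% point of the standard normal, and the helper names `simulate_t_statistics` and `tail_fraction` are hypothetical.

```python
import math
import random


def simulate_t_statistics(n, reps, mu=0.0, sigma=1.0, seed=0):
    """Return `reps` values of T = (xbar - mu) / (s / sqrt(n)) computed
    from normal samples of size n (assumed population for illustration)."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the n - 1 divisor.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        t_values.append((xbar - mu) / (s / math.sqrt(n)))
    return t_values


def tail_fraction(values, cutoff=1.96):
    """Fraction of values whose absolute value exceeds the cutoff."""
    return sum(abs(v) > cutoff for v in values) / len(values)


# With few degrees of freedom the tails are noticeably heavier than the
# 5% a standard normal would give; with n = 30 the gap is much smaller.
small_n_tail = tail_fraction(simulate_t_statistics(n=5, reps=20_000))
large_n_tail = tail_fraction(simulate_t_statistics(n=30, reps=20_000))

print(small_n_tail, large_n_tail)
```

The first fraction should come out well above 0.05 and the second only slightly above it, which is consistent with the remark that the normal approximation is adequate for n of about 30 or more.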

Theorem 6.5: If $S^2$ is the variance of a random sample of size $n$ taken from a normal distribution with variance $\sigma^2$, then the random variable

$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{\sigma^2}$

has a chi-square distribution with degrees of freedom $\nu = n - 1$.

Note: Cut-off values and various tail probabilities for the chi-square distribution, with various values for ν, may be found in Table 5 on page 517. Note that in order to use this table, we must know the degrees of freedom in the particular exercise. However, we will find these values using Excel. The Excel functions to be used are ( ), ( ), and ( ).

Example: p. 190.

The F-Distribution

When we do analysis of experimental data, our conclusions about whether the experimental treatments had an effect will be based on a statistic which may be imagined as a signal-to-noise ratio, with the signal being the treatment effect (differences among the treatment groups) and the noise being the variability of the data within treatment groups. The sampling distribution of this statistic is given in the following theorem. This statistic may also be used to do inference about the differences between two population variances.
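Theorem 6.5 can be checked against a simulation. The sketch below is illustrative only: it assumes a normal population with σ = 2 and samples of size n = 10, and it relies on the standard facts (not stated in the notes) that a chi-square distribution with ν degrees of freedom has mean ν and variance 2ν. The function name `simulate_scaled_variances` is hypothetical.

```python
import random
import statistics


def simulate_scaled_variances(n, reps, sigma=2.0, seed=0):
    """Return `reps` values of (n - 1) * S^2 / sigma^2 from normal samples
    of size n (assumed population: mean 0, standard deviation sigma)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sum of squared deviations, which equals (n - 1) * S^2.
        sum_sq = sum((x - xbar) ** 2 for x in sample)
        values.append(sum_sq / sigma ** 2)
    return values


n = 10
nu = n - 1
scaled = simulate_scaled_variances(n, reps=20_000)

# For a chi-square distribution with nu degrees of freedom the mean is nu
# and the variance is 2 * nu, so these should be close to 9 and 18.
sim_mean = statistics.fmean(scaled)
sim_var = statistics.variance(scaled)

print(sim_mean, sim_var)
```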

Theorem 6.6: If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$, respectively, taken from two normal distributions having the same variance, then the random variable

$F = \frac{S_1^2}{S_2^2}$

has an F distribution with parameters $\nu_1 = n_1 - 1$ (the numerator degrees of freedom) and $\nu_2 = n_2 - 1$ (the denominator degrees of freedom).

Note: Cut-off values and various tail probabilities for the F distribution, with various values for $\nu_1$ and $\nu_2$, may be found in Table 6 on pages 518-519 (note that this table is an abbreviated version of an F-table that would be used in practical situations). Note that in order to use this table, we must know the values of the two degrees-of-freedom parameters in the particular exercise. We may also find probabilities and quantiles using Excel. We will come back to the F distribution later in the course.
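The variance ratio in Theorem 6.6 can also be explored by simulation. This is a rough sketch under assumptions of my own: two normal populations with the same standard deviation (σ = 1) but different means, sample sizes n1 = 6 and n2 = 21, and the standard fact that an F distribution with ν2 > 2 has mean ν2 / (ν2 - 2). The helper names are hypothetical.

```python
import random
import statistics


def sample_variance(sample):
    """Sample variance S^2 with the n - 1 divisor."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)


def simulate_variance_ratios(n1, n2, reps, sigma=1.0, seed=0):
    """Return `reps` values of S1^2 / S2^2 from independent normal samples
    that share the same sigma; the population means (0 and 5) are
    arbitrary because the ratio does not depend on them."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        s1_sq = sample_variance([rng.gauss(0.0, sigma) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(5.0, sigma) for _ in range(n2)])
        ratios.append(s1_sq / s2_sq)
    return ratios


n1, n2 = 6, 21
nu1, nu2 = n1 - 1, n2 - 1
ratios = simulate_variance_ratios(n1, n2, reps=20_000)

# Theoretical mean of F(nu1, nu2), valid for nu2 > 2.
expected_mean = nu2 / (nu2 - 2)
simulated_mean = statistics.fmean(ratios)

print(simulated_mean, expected_mean)
```

The simulated mean sits a little above 1 even though the two population variances are equal, which is one reason tabled F cut-offs are not symmetric about 1.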