KLMED8004 Medical statistics. Part I, autumn. Estimation


Estimation. Harald Johnsen, September.

We have previously learned how known probability distributions (e.g. the binomial distribution and the normal distribution) with known population parameters (mean, variance) can answer questions such as the following.

Discrete distribution: Given $n$ childbirths and probability $p$ of low birth weight, what is the probability of observing at least 3 children with low birth weight? Model: $X \sim \mathrm{bin}(n, p)$, and we want $P(X \ge 3 \mid n, p)$.

Continuous distribution: Given that the birth weight $X$ is normally distributed with mean $\mu = 3750$ g and standard deviation $\sigma = 500$ g, what is the probability that a newborn weighs at least $x_0$ grams? Model: $X \sim N(\mu, \sigma^2) = N(3750, 500^2)$, so
$$P(X \ge x_0 \mid \mu = 3750, \sigma = 500) = P\left(Z \ge \frac{x_0 - 3750}{500}\right).$$

Population and sample

New questions: How can a sample (Norwegian: utvalg) be used to estimate unknown parameters in a probability distribution? How do we estimate the central tendency of a distribution? Which is the best measure of central tendency? How do we estimate variability (variance)? How do we place an interval around a point estimate to indicate how certain the estimate is?

A population (statistical population, target population) is the complete set of possible measurements, or records of some qualitative trait, corresponding to the entire collection of units about which inferences can be made. A sample is a limited subset of a population that is actually collected in the course of an investigation. The objective of data collection (sampling) is to draw conclusions about the population. Essential questions: Which population? How is the sampling done? Given a sample, to which population are the conclusions valid?

Some common sampling procedures

A random sample is a selection of some members of a population such that members are chosen independently and each member has a known nonzero probability of being selected. A simple random sample is a random sample in which each member has the same probability of being selected. Stratified sampling: the population is divided into homogeneous subsets (strata) based on specified traits of the members (sex, age, ...), followed by random sampling within each stratum. And there are more.

Point estimation

Given a population and a representative sample, the challenge is to estimate, from the sample, unknown quantities (parameters) in the population distribution. Recall that a parameter is a fixed, usually unknown numeric quantity and is accordingly non-random. Examples: in the binomial distribution $X \sim \mathrm{bin}(n, p)$ the parameter is $p$; in a normal distribution $X \sim N(\mu, \sigma^2)$ the parameters are $\mu$ and $\sigma^2$; in the Poisson distribution $X \sim \mathrm{Po}(\lambda)$ the parameter is $\lambda$. In the general case the Greek letter $\theta$ (theta) is used as the symbol for a parameter.

Point estimation, cont.

Suppose simple random sampling of size $n$ from an effectively infinite population with population mean $\mu$ and variance $\sigma^2$. This gives $n$ identically distributed random variables $X_1, X_2, \dots, X_n$ (not necessarily normally distributed). An estimator $\hat\theta$ (theta hat) is a mathematical function of the random variables and is used to estimate the unknown value of the parameter $\theta$. The estimator $\hat\theta$ is a random variable with a probability distribution. When a random sample from the population becomes available and $\hat\theta$ is computed from the data set, the numeric value obtained is called an estimate of $\theta$ from that particular sample. Unlike the estimator, an estimate is non-random!

The sample arithmetic mean is an intuitive, natural estimator of the population mean $\mu$:
$$\hat\mu = \frac{X_1 + X_2 + \dots + X_n}{n} = \bar X.$$
Properties of the estimator $\hat\mu = \bar X$:

Expected value (mean; Norwegian: forventning):
$$E(\hat\mu) = E(\bar X) = E\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{n\mu}{n} = \mu.$$
Hence the estimator is unbiased (Norwegian: forventningsrett), a good property.

Variance (provided the observations are independent):
$$\mathrm{Var}(\hat\mu) = \mathrm{Var}(\bar X) = \mathrm{Var}\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
It follows that $\mathrm{Var}(\hat\mu) \to 0$ when $n \to \infty$, which is also a good property.
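Both properties can be checked by simulation. The following is a minimal sketch under stated assumptions. It draws repeated samples from an exponential distribution, which is used only because it is clearly non-normal. The mean $\mu = 2$ and the sample size $n = 10$ are hypothetical values, and the helper name `sample_mean_distribution` is our own.

```python
import random
import statistics

def sample_mean_distribution(mu, n, reps=20000, seed=1):
    """Hypothetical helper: sample means of `reps` samples of size n drawn from
    an exponential distribution with mean mu (variance mu**2), i.e. non-normal data."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1 / mu) for _ in range(n))
            for _ in range(reps)]

mu, n = 2.0, 10                      # hypothetical illustrative values
means = sample_mean_distribution(mu, n)
print(statistics.fmean(means))       # close to mu = 2.0: the estimator is unbiased
print(statistics.variance(means))    # close to sigma^2 / n = 4 / 10 = 0.4
```

With a larger `n`, the variance of the simulated means should shrink in proportion to $1/n$, in line with $\mathrm{Var}(\hat\mu) \to 0$.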

Estimators, cont.

For one and the same parameter there may exist several unbiased estimators. In symmetric distributions with only one mode (Norwegian: modalverdi), the sample mean, the sample median and the sample mode are all unbiased estimators of the population mean, but their variances may differ. Usually the estimator with the smallest variance is chosen; for normally distributed data that estimator is the sample mean. (Sometimes an unbiased estimator is chosen at the expense of an estimator with smaller variance.)

The distribution of the mean

We have shown that if $X_1, X_2, \dots, X_n$ are independent and normally distributed with mean $\mu$ and variance $\sigma^2$, then $\bar X$ is normally distributed with mean $\mu$ and variance $\sigma^2/n$. It can be shown that even if $X_1, X_2, \dots, X_n$ are not normally distributed, $\bar X$ will be approximately normally distributed when $n$ is sufficiently large (the central limit theorem). If the distribution of $X$ is reasonably symmetric, without too many modes and not too peculiar, this holds in practice already for quite small $n$.

The variance of the arithmetic mean

Fig. 3: The lower curve shows a normal distribution with mean (expectation) 3 and variance 1.44 (SD = 1.2). The arithmetic mean of a random sample of size 16 is normally distributed with mean 3 and variance 1.44/16 = 0.09 (SD = 0.3); this is the peaked, narrow distribution.

Mean of non-normally distributed data

Fig. 3: Travelling times from home to campus, 300 random samples of size 1, 4, 9 and 16 (Aalen 1998).
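For normal data the sample mean is claimed to beat the sample median. This can be illustrated by comparing their sampling variances over repeated samples. The sketch below assumes standard normal data and a hypothetical sample size $n = 15$, and the function name is illustrative only.

```python
import random
import statistics

def compare_mean_median(n=15, reps=20000, seed=3):
    """Hypothetical helper: sampling variance of the sample mean and of the
    sample median over `reps` samples of size n from N(0, 1)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = compare_mean_median()
print(var_mean)    # close to sigma^2 / n = 1/15, about 0.067
print(var_median)  # larger; for large n roughly pi * sigma^2 / (2n)
```

Both estimators are centred on the true mean 0. The median simply scatters more, which is why the mean is preferred for normally distributed data.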

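Interval estimation

A point estimate (of an unknown parameter) is a numeric value obtained by putting observed sample values into the mathematical formula for the estimator. Questions of interest: How precise is the estimate? Is it possible to calculate an interval covering the estimated parameter with a specified probability? The answer is NO! But there is a recipe for constructing such a probability interval, a confidence interval. However, as soon as observed values are used and a numeric interval is calculated, that interval can no longer be interpreted within a probability framework.

Construction of confidence intervals

Suppose $X_1, X_2, \dots, X_n$ are independent and normally distributed with mean $\mu$ and variance $\sigma^2$. This gives
$$Z = \frac{\hat\mu - \mu}{\sigma/\sqrt n} \sim N(0, 1)$$
and
$$P(z_{\alpha/2} \le Z \le z_{1-\alpha/2}) = P\left(z_{\alpha/2} \le \frac{\hat\mu - \mu}{\sigma/\sqrt n} \le z_{1-\alpha/2}\right) = 1 - \alpha.$$
Because $z_{\alpha/2} = -z_{1-\alpha/2}$,
$$P\left(\hat\mu - z_{1-\alpha/2}\frac{\sigma}{\sqrt n} \le \mu \le \hat\mu + z_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right) = 1 - \alpha,$$
which is a probability statement and the recipe for constructing a $(1-\alpha)$-confidence interval for the parameter $\mu$. The interval is
$$\left(\hat\mu - z_{1-\alpha/2}\frac{\sigma}{\sqrt n},\ \hat\mu + z_{1-\alpha/2}\frac{\sigma}{\sqrt n}\right).$$
Suppose repeated samples of size $n$. Each time we obtain a new estimate $\hat\mu = \bar x$ and a new confidence interval. Choosing $\alpha = 0.05$ and replacing $\hat\mu$ with $\bar X$, we have
$$P\left(\bar X - 1.96\frac{\sigma}{\sqrt n} \le \mu \le \bar X + 1.96\frac{\sigma}{\sqrt n}\right) = 1 - 0.05 = 0.95.$$
We arrive at the random interval
$$\left(\bar X - 1.96\frac{\sigma}{\sqrt n},\ \bar X + 1.96\frac{\sigma}{\sqrt n}\right)$$
with fixed length $2 \cdot 1.96\,\sigma/\sqrt n$. Which factors affect the length? Since $\mu$ is and remains unknown, we will never know which intervals in fact contain the parameter!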
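The repeated-sampling interpretation can be made concrete by simulation. The interval is random and $\mu$ is fixed, and about 95% of the intervals should contain $\mu$. This is a minimal sketch, assuming normal data with known $\sigma$. The values $\mu = 3750$, $\sigma = 500$ and $n = 20$ are hypothetical, borrowed from the birth-weight scale, and the function names are our own.

```python
import math
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, alpha=0.05):
    """(1 - alpha) CI for mu with known sigma: xbar -/+ z_{1-alpha/2} * sigma / sqrt(n)."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    xbar = fmean(sample)
    half = z * sigma / math.sqrt(n)           # fixed half-length: does not depend on the data
    return xbar - half, xbar + half

def coverage(mu, sigma, n, reps=10000, seed=2):
    """Fraction of repeated-sample intervals that contain the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = z_interval([rng.gauss(mu, sigma) for _ in range(n)], sigma)
        hits += lo <= mu <= hi
    return hits / reps

cov = coverage(mu=3750, sigma=500, n=20)
print(cov)   # close to 0.95
```

In a real study only one interval is observed, and whether it is one of the roughly 5% that miss $\mu$ cannot be known.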

Confidence interval, normal distribution with unknown variance

The argument is the same as above, but the population variance is estimated by the sample variance
$$\hat\sigma^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2,$$
and we arrive at the t-distribution with $n - 1$ degrees of freedom, giving
$$P\left(\bar X - t_{n-1,1-\alpha/2}\frac{s}{\sqrt n} \le \mu \le \bar X + t_{n-1,1-\alpha/2}\frac{s}{\sqrt n}\right) = 1 - \alpha.$$
This gives the random interval
$$\left(\bar X - t_{n-1,1-\alpha/2}\frac{s}{\sqrt n},\ \bar X + t_{n-1,1-\alpha/2}\frac{s}{\sqrt n}\right)$$
with random length $2\,t_{n-1,1-\alpha/2}\,s/\sqrt n$. (What is random here?)

Approximation to the normal distribution

Binomial series of trials (discrete distribution):
i) Each trial yields one of two outcomes, technically called success (A) and failure (A*).
ii) For each trial, the probability of success is the same and is denoted $p = P(A)$. The probability of failure is $P(A^*) = 1 - p$ and is denoted $q$, so that $p + q = 1$.
iii) Trials are independent: the probability of success in a trial does not change given any amount of information about the outcomes of other trials.
iv) The number of successes, $X$, is observed in $n$ trials.
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Confidence interval for p

$$\hat p = \frac{X}{n}, \quad E(\hat p) = p, \quad \mathrm{Var}(\hat p) = \frac{p(1-p)}{n}$$
(a consistent estimator). Standardising $\hat p$ by subtracting its mean and dividing by its standard deviation gives
$$Z = \frac{\hat p - p}{SD(\hat p)} = \frac{\hat p - p}{\sqrt{p(1-p)/n}}.$$
Define
$$I_i = \begin{cases} 1 & \text{at outcome } A,\ P(A) = p \\ 0 & \text{at outcome } A^*,\ P(A^*) = 1 - p = q \end{cases}$$
Then $X = \sum_{i=1}^{n} I_i = I_1 + I_2 + \dots + I_n$ is the number of outcomes A in $n$ trials. Note that $\hat p = X/n = (I_1 + I_2 + \dots + I_n)/n$ is an average of $n$ independent indicator variables, none of which dominates the sum. The central limit theorem therefore implies that as $n$ increases, $Z$ converges in distribution to the standard normal distribution. The approximation works especially well if $n\hat p(1 - \hat p) > 5$. When this condition is met, the $(1-\alpha)$ confidence interval for $p$ is approximately
$$\hat p \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}.$$
A numerical result is obtained by replacing $p$ with the estimate $\hat p = x/n$ (note the small $x$: the observed value of $X$).
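The two interval recipes above can be written as small functions. This is a sketch with some assumptions. Python's standard library has no t quantile function, so `t_interval` takes the critical value $t_{n-1,1-\alpha/2}$ as an argument, and the usage line passes $t_{9,0.975} \approx 2.262$ for $n = 10$. The function names and the simulated birth weights are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def t_interval(data, t_crit):
    """CI for mu with unknown variance: xbar -/+ t_crit * s / sqrt(n).

    t_crit must be the quantile t_{n-1, 1-alpha/2}, supplied by the caller
    (the standard library has no t distribution; assumption of this sketch).
    """
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)          # sample SD with divisor n - 1
    half = t_crit * s / math.sqrt(n)    # half of the random length 2 * t * s / sqrt(n)
    return xbar - half, xbar + half

def wald_interval(x, n, alpha=0.05):
    """Approximate (1 - alpha) CI for a binomial p, with p replaced by p_hat = x / n.

    Returns None when n * p_hat * (1 - p_hat) <= 5 (the rule of thumb in the text).
    """
    p_hat = x / n
    if n * p_hat * (1 - p_hat) <= 5:
        return None
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Usage with simulated (hypothetical) birth weights: n = 10, so df = 9.
rng = random.Random(4)
weights = [rng.gauss(3750, 500) for _ in range(10)]
print(t_interval(weights, t_crit=2.262))   # t_{9, 0.975} is about 2.262
print(wald_interval(2, 20))                # None: 20 * 0.1 * 0.9 = 1.8 <= 5
```

Unlike the known-$\sigma$ interval, the length of the t-interval changes from sample to sample because $s$ is random.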

Example 6.44, prevalence of breast cancer among women aged 50-54: random sample of $n = 10000$, observed number of cancers $x = 400$.

Point estimate of the prevalence (estimated prevalence in the target population): $\hat p = 400/10000 = 0.040$.

Check: $n\hat p(1 - \hat p) = 10000 \cdot 0.04 \cdot 0.96 = 384 > 5$, so the approximation to the normal distribution applies.

0.95-confidence interval estimate:
$$\left(\hat p - z_{0.975}\sqrt{\hat p(1-\hat p)/n},\ \hat p + z_{0.975}\sqrt{\hat p(1-\hat p)/n}\right) = \left(0.040 - 1.96\sqrt{0.04 \cdot 0.96/10000},\ 0.040 + 1.96\sqrt{0.04 \cdot 0.96/10000}\right) = (0.040 - 0.004,\ 0.040 + 0.004) = (0.036,\ 0.044).$$
Suppose we know that the prevalence in the population is 0.02. How should the findings above be interpreted?

Example, exercise 4.40: sample size $n = 100$. Primary event A: bacteriuria, $P(A) = p = 0.05$. $X$: number of women with bacteriuria. Question: what is $P(X \ge 3)$?

Model: $X \sim \mathrm{bin}(n, p)$, so
$$P(X \ge 3) = 1 - P(X < 3) = 1 - \big(P(X = 0) + P(X = 1) + P(X = 2)\big)$$
$$P(X = 0) = \binom{100}{0} 0.05^0 \, 0.95^{100} = 0.006$$
$$P(X = 1) = \binom{100}{1} 0.05^1 \, 0.95^{99} = 0.031$$
$$P(X = 2) = \binom{100}{2} 0.05^2 \, 0.95^{98} = 0.081$$
$$P(X < 3) = 0.006 + 0.031 + 0.081 = 0.118, \qquad P(X \ge 3) = 1 - 0.118 = 0.882$$

Approximation to the normal distribution: $np(1-p) = 100 \cdot 0.05 \cdot (1 - 0.05) = 4.75$ (borderline for the normal approximation), $E[X] = np = 100 \cdot 0.05 = 5$, $\mathrm{Var}[X] = np(1-p) = 4.75$. With a continuity correction,
$$P(X \ge 3) \approx P\left(Z \ge \frac{3 - 5 - 0.5}{\sqrt{4.75}}\right) = P(Z \ge -1.15) = P(Z \le 1.15) = 0.875,$$
which is reasonably close to 0.882.

Final comment, small samples: if $n$ is not large enough for the central limit theorem to apply, or the approximation to the normal distribution does not work, exact methods have to be used.
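The arithmetic in the two examples can be reproduced with the Python standard library. This is a minimal sketch. `binom_pmf` is our own helper built on `math.comb`, and the normal approximation uses `statistics.NormalDist` with the continuity correction from the text. Without rounding $z$ to $-1.15$, the approximation comes out near 0.874 rather than 0.875.

```python
import math
from statistics import NormalDist

# Example 6.44: breast cancer prevalence, n = 10000, x = 400
n, x = 10000, 400
p_hat = x / n                                  # 0.04
check = n * p_hat * (1 - p_hat)                # 384 > 5: normal approximation applies
z = NormalDist().inv_cdf(0.975)                # z_{0.975}, about 1.96
half = z * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half, p_hat + half)              # about (0.036, 0.044)

# Exercise 4.40: X ~ bin(100, 0.05), P(X >= 3)
def binom_pmf(k, n, p):
    """Binomial probability P(X = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n2, p2 = 100, 0.05
exact = 1 - sum(binom_pmf(k, n2, p2) for k in range(3))   # about 0.882
mean, var = n2 * p2, n2 * p2 * (1 - p2)                    # 5 and 4.75
# Continuity correction: P(X >= 3) is approximated by P(Y > 2.5), Y ~ N(5, 4.75)
approx = 1 - NormalDist(mean, math.sqrt(var)).cdf(2.5)    # about 0.874

print(check, ci)
print(exact, approx)
```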