Biostatistics for Med Students. Lecture 2


John J. Chen, Ph.D., Professor & Director of Biostatistics Core, UH JABSOM
JABSOM MD7, February 22, 2017

Lecture Objectives
- To understand basic research design principles and data presentation approaches
- To build a foundation which will facilitate active participation in clinical research
- To fully grasp descriptive statistics
- To introduce key concepts of inferential statistics
- To survey some commonly used statistical approaches
- To be prepared for the USMLE Step 1 biostat/epi questions

Outline
Lecture 1 (02/15/2017)
- The goal of statistics
- Introduction to descriptive biostatistics
- Basic research design principles and data presentation approaches
Lecture 2 (02/22/2017)
- Introduction to inferential statistics
- Commonly used statistical approaches

The Goal of Statistics
[Diagram: a sample, summarized by statistics ($\bar{X}$, $S$), is drawn from a population, described by parameters ($\mu$, $\sigma$), through sampling. Descriptive statistics summarize the sample; probability and inference (inferential statistics) lead from the sample back to the population.]

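Effects and Variability
[Figure: two panels plotting birth weight for Group A, Group B, and Group C, with levels labeled Y1, Y2, Y3, and Y.]
Note: Biological/clinical significance vs. statistical significance

The Normal Distribution
$X \sim N(\mu, \sigma^2)$
[Figure: normal curve centered at $\mu$, with 50% of the area on each side and the X axis marked at $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$.]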

The Normal Distribution
Standard normal distribution: $Z \sim N(\mu = 0, \sigma^2 = 1)$
[Figure: standard normal curve with 50% of the area on each side of 0 and the Z axis marked at -2, -1, 0, +1, +2.]
Given $X \sim N(\mu, \sigma^2)$, we have $Z = (X - \mu)/\sigma$.

Standard Normal Table (From Zero to Z Under the Normal Curve)
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986

AUC for Normal Distribution
The Rule of Thumb:
- Within one s.d.: 68.27% (about 2/3)
- Within two s.d.: 95.45% (about 95%)
- Within three s.d.: 99.73% (about 99%)

Sampling Distribution
The distribution of individual observations versus the distribution of sample means.
[Figure: individual observations X in the population compared with the sample means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$, ... of repeated samples of size n.]
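The "zero to z" areas in the standard normal table and the rule-of-thumb percentages can be reproduced numerically. The following is a minimal sketch using Python's standard library; the function names are illustrative and not part of the lecture.

```python
from statistics import NormalDist

# Standard normal distribution: Z ~ N(0, 1)
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return STD_NORMAL.cdf(z) - 0.5

def area_within_k_sd(k):
    """Area within k standard deviations of the mean."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)

for z in (1.00, 1.96, 2.58):
    print(f"z = {z:.2f}: area from 0 to z = {area_zero_to_z(z):.4f}")

for k in (1, 2, 3):
    print(f"within {k} s.d.: {100 * area_within_k_sd(k):.2f}%")
```

Because any $X \sim N(\mu, \sigma^2)$ can be standardized with $Z = (X - \mu)/\sigma$, the same two functions cover every normal distribution.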

Central Limit Theorem
The distribution of sample means (sampling distribution) from a population is approximately normal as long as the sample size is large, i.e.,
$\bar{X} \sim N(\mu_{\bar{X}}, \sigma_{\bar{X}}^2)$, so that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
1. The population distribution can be non-normal.
2. Given the population has mean $\mu$, then the mean of the sampling distribution is $\mu_{\bar{X}} = \mu$.
3. If the population has variance $\sigma^2$, then the standard deviation of the sampling distribution, or the standard error (a measure of the amount of sampling error), is $\text{s.e.}(\bar{X}) = \sigma_{\bar{X}} = \sigma/\sqrt{n}$.

Hypothesis Testing
An Example: Normal serum creatinine level depends on the population studied. From the literature, a 4th year JABSOM med student found that one well-established study showed an average scr of 0.56 mg/dl (with a standard deviation of 0.15 mg/dl) for 2nd trimester Caucasian pregnant women living on the east coast. But based on her knowledge and experience, she believed that the µ of scr among Japanese pregnant women in Hawaii seemed different. She decided to test this by measuring scr of 49 local Japanese 2nd trimester pregnant women.
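The Central Limit Theorem statements above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a clearly non-normal choice made here for illustration), and the sample size of 49 simply mirrors the example.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(n, n_samples, seed=0):
    """Draw n_samples samples of size n from an exponential population
    (mean 1, s.d. 1; an assumed population) and return their sample means."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(n_samples)]

n = 49
sample_means = simulate_sample_means(n=n, n_samples=2000)

# Mean of the sampling distribution should be close to the population mean (1.0)
print(f"mean of sample means: {mean(sample_means):.3f}")
# S.d. of the sampling distribution should be close to sigma / sqrt(n)
print(f"s.d. of sample means: {stdev(sample_means):.3f}")
print(f"theoretical s.e.:     {1.0 / sqrt(n):.3f}")
```

Even though individual exponential observations are strongly skewed, the simulated sample means cluster around the population mean with a spread close to $\sigma/\sqrt{n}$.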

Hypothesis Testing (cont.)
[Figure: sampling distribution of $\bar{X}$ centered at $\mu_0$, showing the non-rejection region with area $(1 - \alpha)$, the critical value, and the rejection region with area $\alpha$.]

Hypothesis Testing
Basic steps of hypothesis testing:
1. State the null ($H_0$) and alternative ($H_1$) hypotheses
2. Choose a significance level, α (usually 0.05 or 0.01)
3. Determine the critical (or rejection) region and the non-rejection region, based on the sampling distribution and under the null hypothesis
4. Based on the sample, calculate the test statistic and compare it with the critical values
5. Make a decision, and state the conclusion

One-tailed vs. Two-tailed Test
[Figure: three panels showing the rejection regions for a right-sided test, a left-sided test, and a two-sided test.]

Statistical Decision: Errors & Power
                   Truth
Decision           H0 true             H0 false
Reject H0          Type I error (α)    Correct (1 - β)
Not reject H0      Correct (1 - α)     Type II error (β)

- Type I error (α): false positives, errors due to chance; reject H0 when H0 is true
- Type II error (β): false negatives; don't reject H0 when H1 is true
- Power: (1 - β) = 1 - P(Type II error)

Statistical Decision
Design factors:
- effect difference
- power
- alpha level
- std. dev.
- sample size
[Figure: sampling distributions of $\bar{X}$ centered at $\mu_0$ and at $\mu_1$.]
Now, what is a p-value?

Interpretation: p-values
The p-value is the probability of obtaining a result as extreme or more extreme than the one observed based on the current sample, given the null hypothesis is true.
Note: Statistically significant does not necessarily mean biologically (or clinically) significant!
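How the design factors listed above interact can be sketched for a two-sided one-sample z-test. The function below is an illustration, not a formula from the slides; the true mean of 0.60 mg/dl is a hypothetical value chosen only to show the calculation, while 0.56, 0.15, and 49 come from the serum creatinine example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test of H0: mu = mu0
    when the true mean is mu1 (sigma known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    shift = (mu1 - mu0) / (sigma / sqrt(n))      # effect size in s.e. units
    # Probability that Z falls in either rejection region under mu1
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical true mean of 0.60 mg/dl (an assumption for illustration)
print(f"n = 49,  alpha = 0.05: power = {z_test_power(0.56, 0.60, 0.15, 49):.3f}")
print(f"n = 100, alpha = 0.05: power = {z_test_power(0.56, 0.60, 0.15, 100):.3f}")
print(f"n = 49,  alpha = 0.10: power = {z_test_power(0.56, 0.60, 0.15, 49, alpha=0.10):.3f}")
```

Under these assumptions, power rises with a larger sample size or a larger alpha level, and it equals alpha when the true mean coincides with the null value.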

Hypothesis Testing (cont.)
Example (cont.): Say the average scr of the sample of 49 locals is 0.60 mg/dl and the population standard deviation is 0.15 mg/dl (based on the literature).
Step 1. State $H_0$ and $H_1$: $H_0: \mu_{scr} = 0.56$ vs. $H_1: \mu_{scr} \neq 0.56$
Step 2. Choose a significance level, say, α = 0.05.
Step 3. Calculate the test statistic:
$Z = \frac{\bar{X} - \mu_{scr}}{\sigma/\sqrt{n}} = \frac{0.60 - 0.56}{0.15/\sqrt{49}} = 1.87$

Hypothesis Testing (cont.)
Step 4. Determine the critical region and the non-rejection region: The critical value: ±1.96. The rejection region: $|Z| \geq 1.96$. The non-rejection region: $|Z| < 1.96$.
Step 5. Make a decision, based on the sample, and state the conclusion: As the test statistic Z = 1.87 < 1.96, it is within the non-rejection region. Therefore, we do not reject the null hypothesis. We conclude that, at the 5% level, there is not sufficient evidence that the average scr among local Japanese 2nd trimester women is different from 0.56 mg/dl.
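The five steps of this example can be retraced in a few lines. This is a minimal sketch with an illustrative function name; it also reports the two-sided p-value, which the slides do not compute for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n):
    """Two-sided one-sample z-test with known population s.d.
    Returns the test statistic and the two-sided p-value."""
    se = sigma / sqrt(n)                     # standard error of the mean
    z = (xbar - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Values from the serum creatinine example
z, p_value = one_sample_z_test(xbar=0.60, mu0=0.56, sigma=0.15, n=49)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided critical value, alpha = 0.05

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.3f}")
print("reject H0" if abs(z) >= z_crit else "do not reject H0")
```

The p-value is slightly above 0.05, which agrees with Z falling just inside the non-rejection region.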

Confidence Intervals
Example (cont.): A med student wanted to determine the average serum creatinine level among second trimester pregnant Japanese women in Honolulu. From the literature she found that σ for similar populations is about 0.15 mg/dl, but she could not find any information on µ of scr among local pregnant women. She measured 49 Japanese 2nd trimester women and the sample mean scr was 0.60 mg/dl. What should be the 95% CI for µ?

Confidence Intervals
CIs for µ:
90% CI: $\bar{X} \pm 1.645\,\sigma/\sqrt{n}$
95% CI: $\bar{X} \pm 1.960\,\sigma/\sqrt{n}$
99% CI: $\bar{X} \pm 2.575\,\sigma/\sqrt{n}$
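The question posed in the example can be answered directly from the 95% formula. The sketch below uses an illustrative helper name and takes the z multiplier from the standard normal distribution rather than from the rounded constants on the slide.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Confidence interval for the mean when the population s.d. is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # 1.96 for a 95% CI
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Values from the serum creatinine example
lower, upper = z_confidence_interval(xbar=0.60, sigma=0.15, n=49, level=0.95)
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f}) mg/dl")
```

The interval is roughly 0.558 to 0.642 mg/dl. It contains 0.56, which agrees with the earlier decision not to reject $H_0: \mu_{scr} = 0.56$ at the 5% level.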

The t-Distribution
- A small sample from a normal distribution
- Unknown population standard deviation, σ
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ with $n - 1$ degrees of freedom (d.f.).
The (Student's) t-distribution is very similar to the normal distribution, with heavier tails.

t Table (tail probabilities of the t-distributions)
Column headings give 2Q, with Q in parentheses.
d.f.   0.10 (0.05)   0.05 (0.025)   0.01 (0.005)   0.005 (0.0025)   0.001 (0.0005)
1      6.3138        12.706         63.657         127.32           636.62
2      2.9200        4.3026         9.9251         14.0911          31.6075
3      2.3534        3.1825         5.8408         7.4533           12.9258
4      2.1318        2.7764         4.6040         5.5980           8.6087
5      2.0151        2.5706         4.0323         4.7734           6.8701
6      1.9432        2.4469         3.7075         4.3169           5.9590
7      1.8946        2.3646         3.4995         4.0293           5.4088
8      1.8595        2.3060         3.3555         3.8326           5.0421
9      1.8331        2.2621         3.2498         3.6895           4.7805
10     1.8125        2.2281         3.1693         3.5814           4.5871
11     1.7959        2.2010         3.1057         3.4967           4.4374
12     1.7823        2.1788         3.0545         3.4285           4.3184
13     1.7709        2.1604         3.0122         3.3726           4.2215
14     1.7613        2.1448         2.9768         3.3258           4.1412
15     1.7530        2.1314         2.9467         3.2862           4.0735
16     1.7459        2.1199         2.9207         3.2521           4.0157
17     1.7396        2.1098         2.8982         3.2226           3.9659
18     1.7341        2.1009         2.8784         3.1967           3.9224
19     1.7291        2.0930         2.8609         3.1738           3.8841
20     1.7247        2.0860         2.8453         3.1535           3.8502
21     1.7207        2.0796         2.8313         3.1353           3.8200
22     1.7171        2.0739         2.8187         3.1189           3.7928
23     1.7139        2.0687         2.8073         3.1041           3.7683
24     1.7109        2.0639         2.7969         3.0906           3.7461
25     1.7081        2.0595         2.7874         3.0783           3.7258
26     1.7056        2.0555         2.7787         3.0670           3.7073
27     1.7033        2.0518         2.7707         3.0566           3.6903
28     1.7011        2.0484         2.7632         3.0470           3.6746
29     1.6991        2.0452         2.7564         3.0382           3.6601
30     1.6973        2.0423         2.7500         3.0299           3.6466

One Sample t-Test
Problem: Neonates gain (on average) 100 grams/wk in the first 4 weeks. A sample of 25 infants given a new nutrition formula gained 112 grams/wk (on average) with standard deviation = 30 grams. Is this statistically significant?
Test: $H_0: \mu = 100$ vs. $H_1: \mu \neq 100$

One Sample t-Test
Solution:
$t = \frac{112 - 100}{30/\sqrt{25}} = 2.0 < t_{24,\,0.025} = 2.06$
[Figure: t-distribution with the non-rejection region between the critical values -2.06 and 2.06; the observed t = 2.0 (a gain of 112 on a scale marked 88, 100, 112) falls inside it.]
Therefore, do not reject $H_0$ (p-value = 0.057).
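The arithmetic of this solution can be checked with a short sketch. The function name is illustrative, and the critical value is copied from the t table (24 d.f., 2Q = 0.05) rather than computed, since the Python standard library has no t-distribution.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """One-sample t statistic and its degrees of freedom."""
    se = s / sqrt(n)              # estimated standard error of the mean
    return (xbar - mu0) / se, n - 1

# Values from the neonatal weight gain problem
t_stat, df = one_sample_t(xbar=112, mu0=100, s=30, n=25)

# Two-sided 5% critical value for 24 d.f., read from the t table
T_CRIT_24_DF = 2.0639

print(f"t = {t_stat:.2f} with {df} d.f.; critical value = {T_CRIT_24_DF}")
print("reject H0" if abs(t_stat) >= T_CRIT_24_DF else "do not reject H0")
```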

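Two Independent Sample t-Test
Population 1 ($\mu_1$, $\sigma_1$): Sample 1 ($\bar{X}_1$, $s_1$)
Population 2 ($\mu_2$, $\sigma_2$): Sample 2 ($\bar{X}_2$, $s_2$)
Test: $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$, assuming $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

Two Independent Sample t-Test
Test statistic:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{S_{\bar{x}_1 - \bar{x}_2}}$ with $(n_1 + n_2 - 2)$ d.f.,
where
$S_p^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}$ and $S_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.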

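Two Independent Sample t-Test
Problem: Two headache remedies
Brand A: $\bar{X}_1 = 20.1$, $s_1 = 8.7$, $n_1 = 12$
Brand B: $\bar{X}_2 = 18.9$, $s_2 = 7.5$, $n_2 = 12$
Test: $H_0: \mu_1 = \mu_2$ (i.e., $\mu_1 - \mu_2 = 0$) vs. $H_1: \mu_1 \neq \mu_2$

Two Independent Sample t-Test
Solution:
$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} = \frac{20.1 - 18.9}{S_{\bar{x}_1 - \bar{x}_2}} = \frac{1.2}{S_{\bar{x}_1 - \bar{x}_2}}$
$S_p^2 = \frac{(12 - 1)\,8.7^2 + (12 - 1)\,7.5^2}{12 + 12 - 2} = \frac{832.59 + 618.75}{22} = 65.97$
$S_p = \sqrt{65.97} = 8.12$
$S_{\bar{x}_1 - \bar{x}_2} = 8.12 \sqrt{\frac{1}{12} + \frac{1}{12}} = 3.3$
$t = \frac{1.2}{3.3} = 0.36$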
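The pooled-variance calculation above can be retraced from the summary statistics alone. This is a minimal sketch with an illustrative function name; it assumes equal population variances, as the test does.

```python
from math import sqrt

def pooled_two_sample_t(xbar1, s1, n1, xbar2, s2, n2):
    """Two independent sample t statistic under the equal-variance assumption.
    Returns (t, degrees of freedom, pooled s.d., s.e. of the difference)."""
    df = n1 + n2 - 2
    sp2 = (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / df   # pooled variance
    sp = sqrt(sp2)
    se_diff = sp * sqrt(1 / n1 + 1 / n2)               # s.e. of xbar1 - xbar2
    return (xbar1 - xbar2) / se_diff, df, sp, se_diff

# Summary statistics for Brand A and Brand B
t_stat, df, sp, se_diff = pooled_two_sample_t(20.1, 8.7, 12, 18.9, 7.5, 12)
print(f"S_p = {sp:.2f}, s.e. = {se_diff:.2f}, t = {t_stat:.2f} with {df} d.f.")
```

With 22 d.f., the two-sided 5% critical value in the t table is 2.0739, so a t of about 0.36 is well inside the non-rejection region.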

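Two Independent Sample t-Test
Solution (cont.): $t = \frac{1.2}{3.3} = 0.36$, and the two-sided 5% critical value is $t_{0.05,\,22} = 2.07$.
[Figure: t-distribution with critical values -2.07 and 2.07; the observed t = 0.36 falls in the non-rejection region.]
Therefore, do not reject the null, i.e., no statistically significant difference was found between the two remedies.

Two Paired Sample t-Test
Population 1: $\mu_1$, Sample 1
Population 2: $\mu_2$, Sample 2
Test: $H_0: \mu_2 - \mu_1 = 0$ versus $H_1: \mu_2 - \mu_1 \neq 0$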

Two Paired Sample t-Test
- Same subject for both treatments:
  -- placebo ($X_1$) versus active ($X_2$)
  -- before ($X_1$) versus after ($X_2$)
- Intra-individual comparison, e.g., left ($X_1$) versus right ($X_2$)
Approach: reduce the data to a one sample t-test problem. First, calculate the difference, $d = X_2 - X_1$, for each subject; then, perform a one sample t-test on the d scores, with d.f. = n - 1.

Two Paired Sample t-Test
Test: $H_0$: average difference is zero
Test statistic: $t = \frac{\bar{d} - 0}{S_{\bar{d}}}$, where $\bar{d} = \frac{\sum_i d_i}{n}$, $S_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}}$, and $S_{\bar{d}} = S_d/\sqrt{n}$.

Two Paired Sample t-Test
Problem: Does the medication significantly lower blood pressure?
In the table, d is the reaction to placebo minus the reaction to medication.
Subject   Reaction to Placebo   Reaction to Medication   d
1         150                   130                      20
2         180                   148                      32
3         148                   126                      22
4         172                   150                      22
5         160                   136                      24
Total                                                    120

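Two Paired Sample t-Test
Solution:
$\bar{d} = \frac{\sum_i d_i}{n} = \frac{120}{5} = 24$
$S_d^2 = \frac{\sum_i (d_i - \bar{d})^2}{n - 1} = \frac{88}{4} = 22$, so $S_d = \sqrt{22} = 4.69$
$S_{\bar{d}} = S_d/\sqrt{n} = \sqrt{22}/\sqrt{5} = 2.1$
$t = \frac{\bar{d} - 0}{S_{\bar{d}}} = \frac{24}{2.1} = 11.4$

Two Paired Sample t-Test
[Figure: t-distribution with critical values -2.78 and 2.78; the observed t = 11.4 lies far in the rejection region.]
At the 5% level, the t-table value (4 d.f.) is 2.78. For a two-sided test: reject! There is a statistically significant effect!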
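The paired calculation can be reproduced from the raw blood pressure readings. The sketch below uses an illustrative function name and takes d as placebo minus medication, matching the differences tabulated in the problem.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t statistic computed on the differences first - second.
    Returns (t, degrees of freedom, mean difference, s.e. of the mean difference)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    se_d_bar = stdev(diffs) / sqrt(n)   # stdev uses the n - 1 denominator
    return d_bar / se_d_bar, n - 1, d_bar, se_d_bar

placebo = [150, 180, 148, 172, 160]
medication = [130, 148, 126, 150, 136]

t_stat, df, d_bar, se_d_bar = paired_t(placebo, medication)
print(f"mean difference = {d_bar}, s.e. = {se_d_bar:.2f}, "
      f"t = {t_stat:.2f} with {df} d.f.")
```

Without rounding the standard error to 2.1, the statistic comes out at about 11.44, which rounds to the 11.4 shown above.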

Contingency Tables
The most common two-way tables: 2-by-2 tables. For example, smoking and lung cancer status.

                                  Response variable: Lung cancer
Explanatory variable: Smoking     Yes      No       Total
Yes                               n11      n12      n1.
No                                n21      n22      n2.
Total                             n.1      n.2      n..

Measures of association:
- RD: risk difference (prospective studies), $RD = \frac{n_{11}}{n_{1\cdot}} - \frac{n_{21}}{n_{2\cdot}}$
- RR: relative risk (prospective studies), $RR = \frac{n_{11}/n_{1\cdot}}{n_{21}/n_{2\cdot}}$
- OR: odds ratio (prospective or retrospective), $OR = \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$

Interpretations
Interpretation of estimates of RR/OR:
- If RR/OR is above 1, we say there is a positive relationship/association between risk factor and outcome.
- If RR/OR is below 1, we say there is a negative relationship/association between risk factor and outcome.
Interpretation of CI of population RR/OR:
- If the entire interval is above 1, we conclude that the probability of having the outcome of interest is higher in those with the risk factor.
- If the entire interval is below 1, we conclude that the probability is lower in those with the risk factor.
- If the interval contains 1, we conclude that there is no evidence that the probabilities of the outcome of interest are different between those with vs. without the risk factor.
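The three measures of association can be computed from the four cells of a 2-by-2 table. The sketch below is illustrative only: the function name is not from the lecture, and the counts are hypothetical numbers invented to show the arithmetic, not data from any study.

```python
def association_measures(n11, n12, n21, n22):
    """Risk difference, relative risk, and odds ratio for a 2-by-2 table.
    Rows: exposed (n11, n12) and unexposed (n21, n22);
    columns: outcome yes, outcome no."""
    risk_exposed = n11 / (n11 + n12)       # n11 / n1.
    risk_unexposed = n21 / (n21 + n22)     # n21 / n2.
    rd = risk_exposed - risk_unexposed
    rr = risk_exposed / risk_unexposed
    odds_ratio = (n11 * n22) / (n12 * n21)
    return rd, rr, odds_ratio

# Hypothetical counts (assumed for illustration only)
rd, rr, odds_ratio = association_measures(n11=30, n12=70, n21=10, n22=90)
print(f"RD = {rd:.2f}, RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

In this made-up table both RR and OR are above 1, which would be read as a positive association between the risk factor and the outcome.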

$\chi^2$ Tests
Expected and observed frequencies are compared:
- Goodness of fit of a single variable
- Test of independence of two variables

Percentiles of the $\chi^2$ Distributions Table
Column headings are upper-tail probabilities.
d.f.   0.100     0.050     0.010     0.005     0.001
1      2.7055    3.8415    6.6349    7.8944    10.828
2      4.6052    5.9915    9.2102    10.5963   13.8173
3      6.2514    7.8147    11.3447   12.8383   16.2672
4      7.7795    9.4877    13.2768   14.8605   18.4667
5      9.2363    11.0705   15.0864   16.7495   20.5165
6      10.6447   12.5916   16.8118   18.5479   22.4599
7      12.0171   14.0671   18.4751   20.2776   24.3219
8      13.3616   15.5073   20.0900   21.9549   26.1237
9      14.6836   16.9190   21.6658   23.5891   27.8768
10     15.9872   18.3071   23.2095   25.1886   29.5871
11     17.2750   19.6751   24.7250   26.7569   31.2628
12     18.5493   21.0261   26.2170   28.2999   32.9099
13     19.8120   22.3621   27.6882   29.8195   34.5283
14     21.0641   23.6848   29.1409   31.3198   36.1258
15     22.3071   24.9958   30.5778   32.8014   37.6973
16     23.5418   26.2962   31.9998   34.2675   39.2520
17     24.7690   27.5871   33.4086   35.7186   40.7908
18     25.9894   28.8693   34.8052   37.1562   42.3131
19     27.2036   30.1435   36.1912   38.5823   43.8206
20     28.4120   31.4104   37.5660   39.9970   45.3141
21     29.6151   32.6706   38.9321   41.4017   46.7982
22     30.8133   33.9244   40.2893   42.7955   48.2678
23     32.0069   35.1725   41.6383   44.1808   49.7262
24     33.1962   36.4151   42.9797   45.2291   51.1831
25     34.3816   37.6525   44.3144   46.9280   52.6165

$\chi^2$ Tests of Independence
Observed:
         C1      C2      Total
R1       A       C       A+C
R2       B       D       B+D
Total    A+B     C+D     A+B+C+D
Expected: E = (row total) × (column total) / (grand total)
d.f. = (r - 1) × (c - 1)

$\chi^2$ Tests of Independence
Problem: Is the NQO1 gene associated with endemic nephropathy (EN)?
Observed:
               EN cases   EN controls   Total
NQO1 Mutant    50         20            70
NQO1 WT        10         20            30
Total          60         40            100

$\chi^2$ Tests of Independence
Solution: For each observed count O (50, 10, 20, 20), compute the expected count E from the table margins. For example, for O = 50: E = (60 × 70)/100 = 42.

$\chi^2$ Tests of Independence
Solution:
O     E     (O-E)   (O-E)^2   (O-E)^2/E
50    42     8      64        1.52
10    18    -8      64        3.56
20    28    -8      64        2.29
20    12     8      64        5.33
$\chi^2 = \sum (O - E)^2/E = 12.70$, with (R-1) × (C-1) d.f. = (2-1) × (2-1) d.f. = 1 d.f.
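The expected counts and the test statistic in this solution can be reproduced with a short sketch. The function name is illustrative; the observed counts are those of the NQO1 example.

```python
def chi_square_independence(observed):
    """Pearson chi-square statistic for an r-by-c table of observed counts.
    Returns (chi-square, degrees of freedom, table of expected counts)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # E = (row total) * (column total) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected

# Rows: NQO1 mutant, NQO1 WT; columns: EN cases, EN controls
observed = [[50, 20],
            [10, 20]]
chi2, df, expected = chi_square_independence(observed)
print(f"expected counts = {expected}")
print(f"chi-square = {chi2:.2f} with {df} d.f.")
```

Comparing the result with the 1 d.f. row of the percentile table shows that 12.70 exceeds even the 0.001 critical value of 10.828.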

$\chi^2$ Tests of Independence
Solution: $\chi^2 = 12.70 > \chi^2_{1,\,0.001} = 10.83$. NQO1 is not independent of EN (an association exists), p < 0.001.

Fisher's Exact Tests
Fisher's Exact Test for small expected frequencies: If any expected frequencies are < 2 or if half of the expected frequencies are < 5, you should use Fisher's Exact Test instead of $\chi^2$.

Fisher's Exact Tests
Fisher's Tea Tasting Experiment
                         Poured First (Guessed)
Poured First (Actual)    Milk    Tea    Total
Milk                     3       1      4
Tea                      1       3      4
Total                    4       4      8

Fisher's Exact Tests
Fisher's Tea Tasting Experiment (cont.): Based on the hypergeometric distribution, the p-value is the sum of the probabilities of the observed table and of all tables that give even more evidence in favor of the lady's claim.
Tables with the same margins are indexed by the (1,1) cell, the number of milk-first cups guessed as milk-first:
- (1,1) = 0: Milk row 0, 4; Tea row 4, 0
- (1,1) = 3 (observed): Milk row 3, 1; Tea row 1, 3
- (1,1) = 4: Milk row 4, 0; Tea row 0, 4
p-value = $P_{(1,1)}(3) + P_{(1,1)}(4)$ = 0.229 + 0.014 = 0.243
Therefore, the experiment did not establish a significant association between the actual order of pouring and the woman's guess.
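The two hypergeometric probabilities in this p-value can be computed exactly with binomial coefficients. The following is a minimal sketch with illustrative function and variable names.

```python
from math import comb

def hypergeom_prob(k, row1_total, col1_total, grand_total):
    """Probability that the (1,1) cell of a 2-by-2 table equals k
    when all row and column totals are fixed."""
    return (comb(col1_total, k)
            * comb(grand_total - col1_total, row1_total - k)
            / comb(grand_total, row1_total))

# Tea tasting: 4 milk-first cups, 4 cups guessed as milk-first, 8 cups in total
p_three = hypergeom_prob(3, 4, 4, 8)   # observed table
p_four = hypergeom_prob(4, 4, 4, 8)    # the only more extreme table
p_value = p_three + p_four

print(f"P(3) = {p_three:.3f}, P(4) = {p_four:.3f}, one-sided p-value = {p_value:.3f}")
```

The exact values are 16/70 and 1/70, so the p-value is 17/70, or about 0.243, matching the slide.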

A Roadmap of Common Statistical Tools
- Descriptive statistics: population parameters ($\mu$, $\sigma$); sampling; sample statistics ($\bar{X}$, $S$)
- Probability and inference (inferential statistics): normal, t, and chi-squared distributions; sampling distribution; Central Limit Theorem; CI; hypothesis testing; p-value; Type I & II error
- Outcome: continuous variable
  -- One sample t-test
  -- Two samples t-test (paired t-test if paired)
  -- Categorical independent variables: analysis of variance (ANOVA)
  -- Continuous independent variables: linear regression
  -- If a mixture of continuous and categorical independent variables: general linear models
  -- If no distribution assumptions: non-parametrics
- Outcome: time to event data
  -- Survival analysis
- Outcome: categorical variable
  -- One categorical variable: chi-sq GOF test
  -- Two categorical variables: chi-sq independence test, Fisher's exact test (small expected frequencies)
  -- Mixture of independent variables: logistic regression (binary outcome)
- Others: longitudinal analysis, factor analysis, cluster analysis, principal component analysis, etc.

Collaboration with a Biostatistician
1. Early and often
2. Start the discussion when you have the initial idea
3. It is an iterative process
4. A collaborative effort: equal and fair
5. Ask questions so you can discuss the general statistical approach without the statistician
6. Education and training in research design and biostatistics
http://biostat.jabsom.hawaii.edu
NIH U54MD007584, NIH G12MD007601, NIH P20GM103466

Sample USMLE Step 1 Questions:
Question 4. The standard error of the mean:
a. is less than the standard deviation of the population.
b. decreases as the sample size increases.
c. measures the variability of the mean from sample to sample.
d. all of the above (a, b, and c)
e. none of the above (a, b, or c)

Question 5. In a hypothesis test, the probability of obtaining a value of the test statistic equal to or more extreme than the value observed, given that the null hypothesis is true, is referred to as:
a. Type I error.
b. The p-value.
c. Statistical power.
d. Type II error.
e. Critical value.

Question 6. The power of a statistical test is the probability of rejecting the null hypothesis when it is ____. When you increase alpha, the power of the test will ____.
a. true / decrease
b. false / increase
c. true / increase
d. false / decrease
e. true / stay constant