Stat 642, Lecture notes for 01/27/05


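Rate Standardization Continued:

Note that if $T_i = n_i t$, where $T_i$ is the cumulative follow-up time, $n_i$ is the number of subjects at risk at the midpoint of interval $i$, and $d_i$ is the observed number of cases during interval $i$, so that $\hat\lambda_i = d_i/T_i$, then letting $w_i = n_i/\sum_j n_j$,

$$\sum_i \hat\lambda_i w_i = \sum_i \frac{d_i}{n_i t}\,\frac{n_i}{\sum_j n_j} = \frac{1}{t \sum_j n_j} \sum_i d_i,$$

which is the number of cases per person year of exposure for the entire cohort. If $w_i$ represents the proportion of the standard population in each age category, $\sum_i \hat\lambda_i w_i$ would represent the number of cases per person year of exposure if the cohort had the same distribution as the standard population.

The primary difficulty with the CMF is that if, for some $i$, $\operatorname{var}(\hat\lambda_i)$ is large (e.g. $T_i$ is small), then $\operatorname{var}(\mathrm{CMF})$ can be large. Writing $\lambda_i^*$ for the standard rate in category $i$,

$$\operatorname{var}(\mathrm{CMF}) = \frac{\sum_i w_i^2 \operatorname{var}(\hat\lambda_i)}{\left(\sum_i w_i \lambda_i^*\right)^2} \approx \frac{\sum_i w_i^2\, d_i/T_i^2}{\left(\sum_i w_i \lambda_i^*\right)^2}.$$

Standardized Mortality Ratio:

The CMF imposes the standard population distribution on the observed hazard rates. Conversely, one could impose the standard rates on the observed population.

$$\text{Standardized Mortality Ratio (SMR)} = \frac{\sum_i d_i}{\sum_i T_i \lambda_i^*} = \frac{\text{Observed Events}}{\text{Expected Events under Standardized rates}}$$

The variance of the SMR is stable. If the rate ratios $\lambda_i/\lambda_i^*$ are independent of $i$, then the SMR is the minimum variance estimate of the common ratio.

$$\operatorname{var}(\mathrm{SMR}) = \frac{\sum_i d_i}{\left(\sum_i T_i \lambda_i^*\right)^2}$$

and is usually smaller than $\operatorname{var}(\mathrm{CMF})$. In the above example, SMR $= 5.4/3.9 = 1.38$ (assuming that the $w_i$ represent observed proportions in group B). While the SMR has smaller variance than the CMF, the SMR is potentially biased.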
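The CMF and SMR formulas above can be sketched in a few lines of Python. This is only an illustration: the function names, the argument `lam_std` (standing for the standard rates), and the two-stratum numbers are hypothetical and are not the example referred to in the notes. The variance functions use the plug-in estimate $d_i/T_i^2$ for $\operatorname{var}(\hat\lambda_i)$.

```python
def cmf(d, T, w, lam_std):
    """Standard-population weights applied to observed rates d_i / T_i,
    divided by the same weights applied to the standard rates."""
    observed = sum(wi * di / ti for wi, di, ti in zip(w, d, T))
    standard = sum(wi * li for wi, li in zip(w, lam_std))
    return observed / standard


def var_cmf(d, T, w, lam_std):
    """Plug-in variance: sum w_i^2 d_i / T_i^2 over (sum w_i lam_std_i)^2."""
    numerator = sum(wi ** 2 * di / ti ** 2 for wi, di, ti in zip(w, d, T))
    standard = sum(wi * li for wi, li in zip(w, lam_std))
    return numerator / standard ** 2


def smr(d, T, lam_std):
    """Observed events over events expected under the standard rates."""
    expected = sum(ti * li for ti, li in zip(T, lam_std))
    return sum(d) / expected


def var_smr(d, T, lam_std):
    """Plug-in variance: sum d_i over (sum T_i lam_std_i)^2."""
    expected = sum(ti * li for ti, li in zip(T, lam_std))
    return sum(d) / expected ** 2


# Hypothetical two-stratum cohort (illustrative numbers only).
d = [5, 20]                # observed cases per stratum
T = [1000.0, 2000.0]       # person-years per stratum
lam_std = [0.004, 0.008]   # assumed standard rates
w = [0.5, 0.5]             # assumed standard-population proportions

print(cmf(d, T, w, lam_std), var_cmf(d, T, w, lam_std))
print(smr(d, T, lam_std), var_smr(d, T, lam_std))
```

With these made-up inputs the two summaries coincide, but the CMF variance is the larger of the two because the stratum with little follow-up receives a large weight.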

Example: (Breslow and Day II, Table 2.13) (An example of Simpson's Paradox)

Suppose that we have two subject cohorts.

                      Age Groups
                      20-44    45-64    total
Cohort I
  d (observed)          100     1600     1700
  expected              200      800     1000
  SMR (%)               50%     200%     170%
Cohort II
  d (observed)           80      180      260
  expected              120       60      180
  SMR (%)               67%     300%     144%
SMR I / SMR II          .75      .67     1.18

Even though, in each age group, the SMR for cohort I is less than the SMR for cohort II, the relationship for the aggregate SMR is reversed. Since the SMRs are quite different in older and younger subjects, the overall SMR, which is an average of the two, is not a very meaningful summary measure. Note that in Cohort I, 200/1000 = 20% of expected events are in the low SMR group. In Cohort II, 120/180 = 67% of expected events are in the low SMR group.

We will discuss SMR and CMF in more detail in the second half of the course.

Measures of Association.

Given two groups, exposed and not exposed, and two hazard rates $\lambda_1$ and $\lambda_0$ for the two groups, respectively, we want to summarize the difference in risk. There are two immediate ways to do this.

Excess Risk: $b = \lambda_1 - \lambda_0$ (additive model)
Risk Ratio: $r = \lambda_1/\lambda_0$ (multiplicative model)

With only two groups, it makes very little difference which of these measures is used for modeling purposes (one is just a re-parameterization of the other). As a descriptive measure, excess risk is only useful if the baseline (unexposed) rate is also given. Risk ratio is typically a more easily interpreted measure. In most cases, the risk ratio makes more biological sense, although if, for example, the mechanism by which exposure causes disease is independent of that for non-exposed cases, then the additive model might make more sense.
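The reversal in the two-cohort SMR example can be checked numerically. The sketch below uses only the observed and expected counts from the table; the variable and function names are illustrative choices. It also shows that each overall SMR is the average of the stratum SMRs weighted by each stratum's share of expected events, which is why the different 20% and 67% shares drive the reversal.

```python
# Observed and expected events by age group (20-44, 45-64), from the table.
cohorts = {
    "I": {"observed": [100, 1600], "expected": [200, 800]},
    "II": {"observed": [80, 180], "expected": [120, 60]},
}


def stratum_smrs(cohort):
    """SMR within each age group: observed / expected."""
    return [o / e for o, e in zip(cohort["observed"], cohort["expected"])]


def overall_smr(cohort):
    """Aggregate SMR: total observed / total expected."""
    return sum(cohort["observed"]) / sum(cohort["expected"])


def weighted_average_smr(cohort):
    """Stratum SMRs averaged with weights equal to the share of expected events."""
    total = sum(cohort["expected"])
    return sum(e / total * s for e, s in zip(cohort["expected"], stratum_smrs(cohort)))


for name, cohort in cohorts.items():
    print(name, stratum_smrs(cohort), overall_smr(cohort), weighted_average_smr(cohort))
```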

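Attributable Risk:

It is sometimes useful to consider the proportion of risk which can be attributed to exposure.

$$\mathrm{AR} = \text{attributable risk for an exposed person} = \frac{\text{excess risk}}{\text{total risk given exposure}} = \frac{\lambda_1 - \lambda_0}{\lambda_1} = \frac{r\lambda_0 - \lambda_0}{r\lambda_0} = \frac{r - 1}{r}$$

If the proportion of subjects in the population with exposure is $p$, then

$$\text{population average risk} = p\lambda_1 + (1 - p)\lambda_0 = \lambda_0 + \underbrace{p(\lambda_1 - \lambda_0)}_{\text{population excess risk}}$$

$$\mathrm{PAR} = \text{population attributable risk} = \frac{p(\lambda_1 - \lambda_0)}{\lambda_0 + p(\lambda_1 - \lambda_0)} = \frac{p(r - 1)}{1 + p(r - 1)}$$

In the accompanying figure, the area of the L-shaped region is the population average hazard rate. The area of the rectangle on the right-hand side which lies between $\lambda_0$ and $\lambda_1$ represents the population average excess risk. The ratio of this area to the total area is the population attributable risk.

[Figure: hazard rate (vertical axis, marked at $\lambda_0$ and $\lambda_1$) against proportion of the population (horizontal axis, marked at 0, $1-p$, and 1). The lower rectangle has area $\lambda_0$ (risk with no exposure); the upper right rectangle has area $p(\lambda_1 - \lambda_0)$ (excess risk).]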
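As a quick numerical check of the attributable-risk identities, the following sketch computes AR and PAR both from the hazard rates and from the risk ratio $r$. The function names and the rates `lam0 = 0.01`, `lam1 = 0.03` and exposure proportion `p = 0.25` are hypothetical values chosen only for illustration.

```python
def attributable_risk(lam1, lam0):
    """Excess risk as a fraction of the risk among the exposed."""
    return (lam1 - lam0) / lam1


def population_attributable_risk(lam1, lam0, p):
    """Population excess risk as a fraction of the population average risk."""
    excess = p * (lam1 - lam0)
    return excess / (lam0 + excess)


# Hypothetical hazard rates and exposure prevalence.
lam0, lam1, p = 0.01, 0.03, 0.25
r = lam1 / lam0  # risk ratio

ar = attributable_risk(lam1, lam0)
par = population_attributable_risk(lam1, lam0, p)

# The same quantities written in terms of the risk ratio alone.
ar_from_r = (r - 1) / r
par_from_r = p * (r - 1) / (1 + p * (r - 1))

print(ar, ar_from_r)
print(par, par_from_r)
```

Both forms agree, which illustrates that AR depends on the rates only through $r$, and PAR only through $r$ and $p$.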

Analysis of crude data: $2 \times 2$ Tables

Suppose that we have the following probabilities for exposure and disease in a population:

E+    $p_{11}$    $p_{10}$
E-    $p_{01}$    $p_{00}$

where $\sum_{ij} p_{ij} = 1$, and let

$$\psi = \frac{p_{11}\, p_{00}}{p_{10}\, p_{01}}.$$

If we take a random sample from the population of size $N$, we have expected values

E+    $N p_{11}$    $N p_{10}$
E-    $N p_{01}$    $N p_{00}$

and the OR is

$$\frac{N p_{11}\, N p_{00}}{N p_{10}\, N p_{01}} = \psi.$$

On the other hand, if we sample $n_1$ and $n_0$ from the rows, we have expected values

E+    $\dfrac{n_1 p_{11}}{p_{11} + p_{10}}$    $\dfrac{n_1 p_{10}}{p_{11} + p_{10}}$
E-    $\dfrac{n_0 p_{01}}{p_{01} + p_{00}}$    $\dfrac{n_0 p_{00}}{p_{01} + p_{00}}$

and the OR is

$$\frac{n_1 p_{11}\, n_0 p_{00} \,/\, \big[(p_{11} + p_{10})(p_{01} + p_{00})\big]}{n_1 p_{10}\, n_0 p_{01} \,/\, \big[(p_{11} + p_{10})(p_{01} + p_{00})\big]} = \psi.$$

Clearly, if we sample from the columns, we obtain the same odds ratio. The only sampling model we have not considered is one in which we fix both row and column totals, but it is difficult to construct an observational setting in which we fix both margins and sample from the interior of the table. (If we choose a prescribed number of diseased and non-diseased subjects, and randomly assign a prescribed number of labels of exposed and not exposed to these subjects, we can generate a random $2 \times 2$ table with prescribed margins and $\psi = 1$. Also see Fisher's tea tasting experiment.)

Now suppose that we have the following population distribution:

E+    $pP_1$    $pQ_1$    $p$
E-    $qP_0$    $qQ_0$    $q$

where

$p$ = Probability of Exposure, $q = 1 - p$
$P_1$ = Probability of Disease Given Exposure, $Q_1 = 1 - P_1$
$P_0$ = Probability of Disease Given No Exposure, $Q_0 = 1 - P_0$

and suppose that we observe

E+    $a$      $b$      $n_1$
E-    $c$      $d$      $n_0$
      $m_1$    $m_0$    $N$

We have two goals:

- estimate $\psi$, and a confidence interval or standard error.
- test hypotheses, say, $H_0: \psi = \psi_0$ (usually $\psi_0 = 1$).

For this problem, $P_1$, $P_0$ and $p$ are nuisance parameters. There are three primary approaches to this problem.

1. Exact conditional: test (or estimate) $\psi$ conditional on the sufficient statistics for the nuisance parameters, using the exact distribution.
2. Exact unconditional: consider worst case values of the nuisance parameters, using the exact distribution.
3. Asymptotic: estimate the nuisance parameters, and appeal to asymptotic theory for the sampling distribution.

We will also consider the sampling distribution of $X = (a, b, c, d)$ under three sampling models:

(I) random sample from population, $N$ fixed
(II) cohort study, $n_1$, $n_0$ fixed
(III) case-control study, $m_1$, $m_0$ fixed

1. Exact conditional

I - random sample.

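Let $X = (a, b, c, d)$; then $X$ is multinomial with probabilities $(pP_1, pQ_1, qP_0, qQ_0)$. The distribution function for $X$ is

$$\Pr\{X; P_0, P_1, p\} = \binom{N}{a\; b\; c\; d} (pP_1)^a (pQ_1)^b (qP_0)^c (qQ_0)^d$$

$$= \binom{N}{n_1} \binom{n_1}{a} \binom{n_0}{c} (qQ_0)^N \left(\frac{pQ_1}{qQ_0}\right)^{n_1} \left(\frac{P_0}{Q_0}\right)^{m_1} \left(\frac{P_1 Q_0}{Q_1 P_0}\right)^a$$

$$= \binom{n_1}{a} \binom{n_0}{c}\, h_I(n_0, m_1, P_0, P_1, p)\, \psi^a$$

Therefore, conditional on all margins, the distribution for $a$ becomes

$$\Pr\{a \mid \psi, n_1, m_1\} = \frac{\binom{n_1}{a} \binom{n_0}{m_1 - a} \psi^a}{\sum_u \binom{n_1}{u} \binom{n_0}{m_1 - u} \psi^u}$$

This is the (noncentral) hypergeometric distribution.

II - $n_0$, $n_1$ fixed.

$a$ and $c$ are independent binomial RVs with probabilities $P_1$ and $P_0$, and sample sizes $n_1$ and $n_0$. The distribution for $X = (a, c)$ is

$$\Pr\{X; P_0, P_1\} = \binom{n_1}{a} \binom{n_0}{c} P_1^a\, Q_1^b\, P_0^c\, Q_0^d$$

$$= \binom{n_1}{a} \binom{n_0}{c} Q_0^{n_0} Q_1^{n_1} \left(\frac{P_0}{Q_0}\right)^{m_1} \left(\frac{P_1 Q_0}{Q_1 P_0}\right)^a$$

$$= \binom{n_1}{a} \binom{n_0}{c}\, h_{II}(n_0, m_1, P_0, P_1, p)\, \psi^a$$

The factor which depends on $a$ and $\psi$ is precisely the same as in case I.

III - $m_0$, $m_1$ fixed.

By interchanging the roles of the rows and columns and using the fact that

$$\binom{n_1}{a} \binom{n_0}{c} = \frac{n_1!\, n_0!}{m_1!\, m_0!} \binom{m_1}{a} \binom{m_0}{b}$$

we have that

$$\Pr\{X; P_0, P_1\} = \binom{n_1}{a} \binom{n_0}{c}\, h_{III}(n_0, m_1, P_0, P_1, p)\, \psi^a$$

Again, the factor which depends on $a$ and $\psi$ is precisely the same as in case I. In all cases, the conditional distribution of $a$ is the same noncentral hypergeometric distribution.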
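The conditional distribution of $a$ given the margins can be evaluated directly by normalizing the weights $\binom{n_1}{u}\binom{n_0}{m_1-u}\psi^u$ over the values of $u$ that the margins allow. The sketch below is a minimal illustration using only the Python standard library; the function name and the margins $n_1 = n_0 = m_1 = 4$ (those of the classic eight-cup tea-tasting design) are choices made here for the example, not part of the notes.

```python
from math import comb


def noncentral_hypergeom_pmf(a, n1, n0, m1, psi):
    """Pr{a | psi, margins} for a 2x2 table with row totals n1, n0
    and first-column total m1."""
    lo = max(0, m1 - n0)   # smallest a compatible with the margins
    hi = min(n1, m1)       # largest a compatible with the margins
    if a < lo or a > hi:
        return 0.0

    def weight(u):
        return comb(n1, u) * comb(n0, m1 - u) * psi ** u

    return weight(a) / sum(weight(u) for u in range(lo, hi + 1))


# With psi = 1 the distribution reduces to the ordinary (central)
# hypergeometric: here the chance of a = 4 is 1 / C(8, 4) = 1/70.
n1, n0, m1 = 4, 4, 4
print(noncentral_hypergeom_pmf(4, n1, n0, m1, psi=1.0))

# Larger psi shifts probability toward larger values of a.
print([round(noncentral_hypergeom_pmf(a, n1, n0, m1, psi=3.0), 4) for a in range(5)])
```

Because only the margins and $\psi$ enter, the same function applies under any of the three sampling models.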