Case-control studies C&H 16

Case-control studies
C&H 16

Bendix Carstensen
Steno Diabetes Center & Department of Biostatistics, University of Copenhagen
bxc@steno.dk
http://bendixcarstensen.com

PhD-course in Epidemiology, Department of Biostatistics, Tuesday 3 January 2017

Relationship between follow-up studies and case-control studies

In a cohort study, the relationship between exposure and disease incidence is investigated by following the entire cohort and measuring the rate of occurrence of new cases in the different exposure groups.

The follow-up allows the investigator to register those subjects who develop the disease during the study period and to identify those who remain free of the disease.

Case-control study

In a case-control study the subjects who develop the disease (the cases) are registered by some mechanism other than follow-up, and a group of healthy subjects (the controls) is used to represent the subjects who do not develop the disease.

Rationale behind case-control studies

In a follow-up study, rates among exposed and non-exposed are estimated by:

$$\frac{D_1}{Y_1} \quad\text{and}\quad \frac{D_0}{Y_0}$$

and hence the rate ratio by:

$$\frac{D_1}{Y_1} \Big/ \frac{D_0}{Y_0} = \frac{D_1}{D_0} \Big/ \frac{Y_1}{Y_0}$$

In a case-control study we use the same cases, but select controls to represent the distribution of risk time between exposed and unexposed:

$$\frac{H_1}{H_0} \approx \frac{Y_1}{Y_0}$$

Therefore the rate ratio is estimated by:

$$\frac{D_1}{D_0} \Big/ \frac{H_1}{H_0}$$

Controls represent risk time, not disease-free persons.

Choice of controls (I)

[Figure: follow-up lines for subjects who fail and subjects who stay healthy during the study period]

The period over which failures are registered as cases is called the study period. A group of subjects who remain healthy over the study period is chosen to represent the healthy part of the source population.

But this is an oversimplification...

What about censoring and late entry?

[Figure: follow-up lines for failures, healthy, censored and late-entry subjects during the study period]

Choosing controls who remain healthy throughout takes no account of censoring or late entry. Instead, choose controls who are in the study and healthy at the times the cases are registered.

Choice of controls (II)

[Figure: the same follow-up lines, with vertical bands drawn to enclose each case]

This is called incidence density sampling. It is the most common way of choosing controls.

Subjects can be chosen as controls more than once, and a subject who is chosen as a control can later become a case.

It is equivalent to sampling observation time from vertical bands drawn to enclose each case.

Case-control probability tree

The tree has three levels: exposure ($E_1$ with probability $p$, $E_0$ with probability $1-p$), failure ($F$ with probability $\pi_1$ or $\pi_0$, survival $S$ otherwise) and selection into the study (0.97 for failures, 0.01 for survivors).

    Exposure       Failure           Selection   Outcome          Probability
    E1 (p)         F (π1)            0.97        Case (D1)        p π1 × 0.97
                                     0.03        not selected
                   S (1 − π1)        0.01        Control (H1)     p (1 − π1) × 0.01
                                     0.99        not selected
    E0 (1 − p)     F (π0)            0.97        Case (D0)        (1 − p) π0 × 0.97
                                     0.03        not selected
                   S (1 − π0)        0.01        Control (H0)     (1 − p)(1 − π0) × 0.01
                                     0.99        not selected

Retrospective analysis of case-control studies

Retrospective: compare the distribution of exposure between cases and controls, for example:

- the proportion of cases who smoke compared to controls
- the mean age of cases compared to controls

This looks at the study backwards. It only works properly for binary explanatory variables.

The retrospective argument

The tree is now ordered selection, failure, exposure:

    Selection      Failure        Exposure   Probability
    In study       F (cases)      E1         p π1 × 0.97
                                  E0         (1 − p) π0 × 0.97
                   S (controls)   E1         p (1 − π1) × 0.01
                                  E0         (1 − p)(1 − π0) × 0.01
    Not in study

Note: the parameters of the previous tree do not sit on these branches.

Odds of exposure for cases and controls:

$$\Omega_{\text{cas}} = \frac{p\,\pi_1 \times 0.97}{(1-p)\,\pi_0 \times 0.97} = \frac{p}{1-p} \times \frac{\pi_1}{\pi_0}$$

$$\Omega_{\text{ctr}} = \frac{p\,(1-\pi_1) \times 0.01}{(1-p)\,(1-\pi_0) \times 0.01} = \frac{p}{1-p} \times \frac{1-\pi_1}{1-\pi_0}$$

Odds-ratio for exposure between cases and controls:

$$\frac{\Omega_{\text{cas}}}{\Omega_{\text{ctr}}} = \frac{\pi_1}{1-\pi_1} \Big/ \frac{\pi_0}{1-\pi_0} = \text{OR(disease)}_{\text{population}}$$

Prospective analysis of case-control studies

Compare the case/control ratio between exposed and non-exposed subjects, or, more generally: how does the case-control ratio vary with exposure?

The point is that in the study it varies in the same way as in the population.

The prospective argument

The tree is now ordered selection, exposure, failure:

    Selection      Exposure   Failure   Probability
    In study       E1         F         p π1 × 0.97
                              S         p (1 − π1) × 0.01
                   E0         F         (1 − p) π0 × 0.97
                              S         (1 − p)(1 − π0) × 0.01
    Not in study

Odds of disease = P{Case given inclusion} / P{Control given inclusion}:

$$\omega_1 = \frac{p\,\pi_1 \times 0.97}{p\,(1-\pi_1) \times 0.01} = \frac{0.97}{0.01} \times \frac{\pi_1}{1-\pi_1}$$

$$\omega_0 = \frac{(1-p)\,\pi_0 \times 0.97}{(1-p)\,(1-\pi_0) \times 0.01} = \frac{0.97}{0.01} \times \frac{\pi_0}{1-\pi_0}$$

$$\text{OR} = \frac{\omega_1}{\omega_0} = \frac{\pi_1}{1-\pi_1} \Big/ \frac{\pi_0}{1-\pi_0} = \text{OR(disease)}_{\text{population}}$$
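The two arguments can be checked numerically. The sketch below is not part of the original slides: it builds the four inclusion probabilities of the tree from the selection probabilities 0.97 and 0.01, with hypothetical values for p, π1 and π0, and confirms that the retrospective and the prospective odds ratios both equal the population odds ratio. The function and variable names are illustrative.

```python
import math

def inclusion_probabilities(p, pi1, pi0, s_case=0.97, s_ctrl=0.01):
    """Probability of ending up in each cell of the case-control study."""
    return {
        "D1": p * pi1 * s_case,              # exposed cases
        "H1": p * (1 - pi1) * s_ctrl,        # exposed controls
        "D0": (1 - p) * pi0 * s_case,        # unexposed cases
        "H0": (1 - p) * (1 - pi0) * s_ctrl,  # unexposed controls
    }

def odds(prob):
    """Odds corresponding to a probability."""
    return prob / (1 - prob)

# Hypothetical population values, chosen only for illustration.
p, pi1, pi0 = 0.3, 0.02, 0.01
cell = inclusion_probabilities(p, pi1, pi0)

or_population = odds(pi1) / odds(pi0)
# Retrospective: odds of exposure among cases over odds of exposure among controls.
or_retrospective = (cell["D1"] / cell["D0"]) / (cell["H1"] / cell["H0"])
# Prospective: case/control ratio among exposed over that among unexposed.
or_prospective = (cell["D1"] / cell["H1"]) / (cell["D0"] / cell["H0"])

print(or_population, or_retrospective, or_prospective)
```

Changing `s_case` or `s_ctrl` leaves both ratios unchanged, as long as the same selection probabilities apply to exposed and unexposed subjects.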

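What is the case-control ratio?

$$\frac{D_1}{H_1} = \frac{0.97}{0.01} \times \frac{\pi_1}{1-\pi_1} \quad \left( = \frac{s_{1,\text{cas}}}{s_{1,\text{ctr}}} \times \frac{\pi_1}{1-\pi_1} \right)$$

$$\frac{D_0}{H_0} = \frac{0.97}{0.01} \times \frac{\pi_0}{1-\pi_0} \quad \left( = \frac{s_{0,\text{cas}}}{s_{0,\text{ctr}}} \times \frac{\pi_0}{1-\pi_0} \right)$$

$$\frac{D_1/H_1}{D_0/H_0} = \frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} = \text{OR}_{\text{population}}$$

but only if the sampling fractions are identical: $s_{1,\text{cas}} = s_{0,\text{cas}}$ and $s_{1,\text{ctr}} = s_{0,\text{ctr}}$.

Log-likelihood for case-control studies

The log-likelihood (conditional on being included) is a binomial likelihood with odds-parameters $\omega_0$ and $\omega_1$:

$$D_0 \log(\omega_0) - N_0 \log(1+\omega_0) + D_1 \log(\omega_1) - N_1 \log(1+\omega_1)$$

where $N_0 = D_0 + H_0$ and $N_1 = D_1 + H_1$.

Exposed: $D_1$ cases, $H_1$ controls
Unexposed: $D_0$ cases, $H_0$ controls

The odds-ratio ($\theta$) is the ratio of the odds $\omega_1$ to $\omega_0$, so:

$$\log(\theta) = \log\left(\frac{\omega_1}{\omega_0}\right) = \log(\omega_1) - \log(\omega_0)$$

The estimates of $\log(\omega_1)$ and $\log(\omega_0)$ are just the logs of the empirical odds:

$$\log\left(\frac{D_1}{H_1}\right) \quad\text{and}\quad \log\left(\frac{D_0}{H_0}\right)$$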

The standard errors of the log-odds are estimated by:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1}} \quad\text{and}\quad \sqrt{\frac{1}{D_0} + \frac{1}{H_0}}$$

Exposed and unexposed form two independent bodies of data (they are sampled independently), so the estimate of $\log(\theta)$ [$= \log(\text{OR})$] is:

$$\log\left(\frac{D_1}{H_1}\right) - \log\left(\frac{D_0}{H_0}\right), \quad\text{with}\quad \text{s.e.}\big(\log(\text{OR})\big) = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}$$

Confidence interval for OR

First a confidence interval for $\log(\text{OR})$:

$$\log(\text{OR}) \pm 1.96 \times \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}$$

Take the exponential:

$$\text{OR} \;\overset{\times}{\div}\; \underbrace{\exp\left(1.96 \times \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}\right)}_{\text{error factor}}$$

BCG vaccination and leprosy

Does BCG vaccination in early childhood protect against leprosy?

New cases of leprosy were examined for presence or absence of the BCG scar. During the same period, a 100% survey of the population of this area, which included examination for BCG scar, had been carried out.

The tabulated data refer only to subjects under 35, because vaccination was not widely available when older persons were children.

Exercise I

    BCG scar   Leprosy cases   Population survey
    Present              101              46 028
    Absent               159              34 594

Estimate the odds of BCG vaccination for leprosy cases and for the controls.

Estimate the odds ratio and hence the extent of protection against leprosy afforded by vaccination.

Give a 95% c.i. for the OR.

Use SAS for this: Exercise from the notes.

Solution to I

$$\text{OR} = \frac{D_1/H_1}{D_0/H_0} = \frac{101/46028}{159/34594} = \frac{0.002194}{0.004596} = 0.48$$

$$\text{s.e.}(\log[\text{OR}]) = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{101} + \frac{1}{46028} + \frac{1}{159} + \frac{1}{34594}} = 0.127$$

The 95% limits for the odds-ratio are:

$$\text{OR} \;\overset{\times}{\div}\; \exp(1.96 \times 0.127) = 0.48 \;\overset{\times}{\div}\; 1.28 = (0.37, 0.61)$$

Exercise II

    BCG scar   Leprosy cases   Population controls
    Present              101                   554
    Absent               159                   446

The table shows the results of a computer-simulated study which picked 1000 controls at random.

What is the odds ratio estimate in this study?

Give a 95% c.i. for the OR.

Use SAS for this: Exercise from the notes.

Solution to II

$$\text{OR} = \frac{D_1/H_1}{D_0/H_0} = \frac{101/554}{159/446} = \frac{0.1823}{0.3565} = 0.51$$

$$\text{s.e.}(\log[\text{OR}]) = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{101} + \frac{1}{554} + \frac{1}{159} + \frac{1}{446}} = 0.142$$

The 95% limits for the odds-ratio are:

$$\text{OR} \;\overset{\times}{\div}\; \exp(1.96 \times 0.142) = 0.51 \;\overset{\times}{\div}\; 1.32 = (0.39, 0.68)$$

More levels of exposure (William Guy)

Physical exertion at work of 1659 outpatients: 341 pulmonary consumption, 1318 other diseases.

    Level of exertion   Pulmonary consumption   Other diseases   Case/control   OR relative
    in occupation       (Cases)                 (Controls)       ratio          to (3)
    Little (0)                            125              385          0.325         1.643
    Varied (1)                             41              136          0.301         1.526
    More   (2)                            142              630          0.225         1.141
    Great  (3)                             33              167          0.198         1.000

The relationship of case-control ratios is what matters.

The retro/prospective argument

Retrospective: four possible outcomes (little/varied/more/great).

Prospective: two possible outcomes (case/control), but a large number of comparisons (between any two exposure levels).

The probability model is still a binary model, and the argument for the analysis is still the same as before.

The prospective argument is the one that applies when deriving a logistic regression model for case-control studies.
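The calculations in the two exercises and in Guy's table can be checked with a few lines of code. The sketch below is not from the slides; it implements the estimate, the standard error of log(OR) and the error-factor limits as defined above, with the counts read from the tables. The function name `odds_ratio_ci` is illustrative.

```python
import math

def odds_ratio_ci(d1, h1, d0, h0, z=1.96):
    """Odds ratio, s.e. of log(OR) and confidence limits via the error factor."""
    or_hat = (d1 / h1) / (d0 / h0)
    se_log_or = math.sqrt(1 / d1 + 1 / h1 + 1 / d0 + 1 / h0)
    error_factor = math.exp(z * se_log_or)
    return or_hat, se_log_or, or_hat / error_factor, or_hat * error_factor

# Exercise I: BCG scar present/absent, leprosy cases vs. population survey.
ex1 = odds_ratio_ci(101, 46028, 159, 34594)
# Exercise II: the same cases with 1000 simulated controls.
ex2 = odds_ratio_ci(101, 554, 159, 446)

# Guy's data: (cases, controls) by level of exertion; level 3 is the reference.
guy = {0: (125, 385), 1: (41, 136), 2: (142, 630), 3: (33, 167)}
ref_ratio = guy[3][0] / guy[3][1]
guy_or = {level: (d / h) / ref_ratio for level, (d, h) in guy.items()}

for name, (or_hat, se, lo, hi) in (("I", ex1), ("II", ex2)):
    print(f"Exercise {name}: OR={or_hat:.2f} se={se:.3f} CI=({lo:.2f}, {hi:.2f})")
print({level: round(value, 3) for level, value in guy_or.items()})
```

In Guy's table only the case/control ratios enter the odds ratios, which is why the size of the control group relative to the source population does not need to be known.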

Odds-ratio and rate ratio

If the disease probability, $\pi$, in the study period is small:

$$\pi = \text{cumulative risk} \approx \text{cumulative rate} = \lambda T$$

For small $\pi$, $1 - \pi \approx 1$, so:

$$\text{OR} = \frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} \approx \frac{\pi_1}{\pi_0} \approx \frac{\lambda_1}{\lambda_0} = \text{RR}$$

$\pi$ small $\Rightarrow$ the OR is an estimate of the RR.

Important assumption behind rate ratio interpretation

The entire study base must have been available throughout:

- no censorings
- no delayed entries

This will clearly not always be the case, but it may be achieved in carefully designed studies.

Avoiding censoring and delayed entry

This can be achieved simultaneously with small $\pi$ by incidence density sampling:

- Subdivide calendar time into small time bands.
- Make a new case-control study in each time band.
- There is only one case in each time band.
- There is no delayed entry or censoring.

If the fraction of exposed does not vary much over time, all the small studies can be analysed together as one. This is effectively matching on calendar time.
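Incidence density sampling can be sketched in a few lines. The example below is an illustration and not from the slides: the cohort, the field names (`entry`, `exit`, `failed`) and the function names are hypothetical. For each case it draws controls from the subjects who are under observation and still healthy at the time the case fails.

```python
import random

def risk_set(cohort, case):
    """Subjects under observation and disease-free when the case fails."""
    t = case["exit"]
    return [s for s in cohort
            if s["id"] != case["id"] and s["entry"] <= t < s["exit"]]

def incidence_density_sample(cohort, n_controls=1, seed=1):
    """Draw up to n_controls controls per case from that case's risk set."""
    rng = random.Random(seed)
    cases = sorted((s for s in cohort if s["failed"]), key=lambda s: s["exit"])
    sampled = []
    for case in cases:
        candidates = risk_set(cohort, case)
        k = min(n_controls, len(candidates))
        controls = rng.sample(candidates, k)
        sampled.append({"time": case["exit"],
                        "case": case["id"],
                        "controls": [c["id"] for c in controls]})
    return sampled

# Hypothetical cohort: times in years since the start of the study period.
cohort = [
    {"id": 1, "entry": 0, "exit": 5, "failed": True},    # case at t = 5
    {"id": 2, "entry": 0, "exit": 10, "failed": False},  # healthy throughout
    {"id": 3, "entry": 6, "exit": 10, "failed": False},  # late entry at t = 6
    {"id": 4, "entry": 0, "exit": 3, "failed": False},   # censored at t = 3
    {"id": 5, "entry": 0, "exit": 8, "failed": True},    # case at t = 8
]

matched_sets = incidence_density_sample(cohort)
for matched in matched_sets:
    print(matched)
```

In this toy cohort subject 5 is eligible as a control for the case at t = 5 and becomes a case at t = 8, while the censored subject 4 and the late entrant 3 are eligible only while they are under observation.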

The rare disease assumption

Necessary to make the approximation:

$$\frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} \approx \frac{\pi_1}{\pi_0}$$

This is more appropriately termed the short study duration assumption: each of the small studies we imagine as components of the entire study should be sufficiently short in relation to disease occurrence that $\pi$ (the disease probability) is small.

Nested case-control studies

Study base = large cohort.

It is expensive to get covariate information for all persons (expensive analyses, tracing of histories, ...).

Covariate information is collected only for cases and time-matched controls: to each case, choose one or more (usually 5) controls from the risk set.

How many controls per case?

The standard deviation of $\log(\text{OR})$ with an equal number of cases and controls:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{D_1} + \frac{1}{D_1} + \frac{1}{D_0} + \frac{1}{D_0}} = \sqrt{\left(\frac{1}{D_1} + \frac{1}{D_0}\right) \times (1 + 1)}$$
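The same calculation can be made for any number m of controls per case. The sketch below is not from the slides: it assumes H1 = m·D1 and H0 = m·D0, uses hypothetical case counts, and shows by how much the standard deviation of log(OR) exceeds sqrt(1/D1 + 1/D0), the value that would be reached with an unlimited number of controls.

```python
import math

def sd_log_or(d1, h1, d0, h0):
    """Standard deviation of log(OR) from the four cell counts."""
    return math.sqrt(1 / d1 + 1 / h1 + 1 / d0 + 1 / h0)

def inflation(m):
    """Factor by which m controls per case inflate the s.d. of log(OR)."""
    return math.sqrt(1 + 1 / m)

d1, d0 = 40, 60  # hypothetical numbers of exposed and unexposed cases
limit = math.sqrt(1 / d1 + 1 / d0)  # s.d. with unlimited controls

for m in (1, 2, 5, 10):
    sd = sd_log_or(d1, m * d1, d0, m * d0)
    print(f"m={m:2d}  sd={sd:.4f}  sd/limit={sd / limit:.4f}  "
          f"sqrt(1+1/m)={inflation(m):.4f}")
```

The ratio depends only on m, not on the case counts, so going from 5 to 10 controls per case buys very little extra precision.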

Twice as many controls as cases:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{D_1} + \frac{1}{2D_1} + \frac{1}{D_0} + \frac{1}{2D_0}} = \sqrt{\left(\frac{1}{D_1} + \frac{1}{D_0}\right) \times (1 + 1/2)}$$

$m$ times as many controls as cases:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\left(\frac{1}{D_1} + \frac{1}{D_0}\right) \times (1 + 1/m)}$$

How many controls per case?

The standard deviation of $\log[\text{OR}]$ is $\sqrt{1 + 1/m}$ times larger in a case-control study than in the corresponding cohort study.

Therefore, 5 controls per case is normally sufficient. (This is only relevant if controls are cheap compared to cases.)

But if cases and controls cost the same and are available, the most efficient design is to have the same number of cases and controls.

SAS-intro

SAS

Display manager (programming):

- program, log and output windows
- reproducible
- easy to document

SAS ANALYST:

- menu-oriented interface
- writes and runs programs for you
- no learning by heart, no syntax errors
- not everything is included
- it is heavy to use in the long run

Data set example: Blood pressure and obesity

OBESE: weight/ideal weight
BP: systolic blood pressure

    OBS   SEX      OBESE    BP
      1   male      1.31   130
      2   male      1.31   148
      3   male      1.19   146
      4   male      1.11   122
    ...   ...        ...   ...
    101   female    1.64   136
    102   female    1.73   208

Data

The data are in the text file BP.TXT located at www.biostat.ku.dk/~pka/epidata and contain the following variables:

SEX: character variable ($)
OBESE: weight/ideal weight
BP: systolic blood pressure

3 variables and 102 observations.

Printing in SAS

We read the file bp.txt directly from the web and skip the first line, which contains the variable names (firstobs=2).

    data bp;
      filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
      infile bpfile firstobs=2;
      input sex $ obese bp;
    run;

    proc print data=bp;
      var sex obese bp;
    run;

This creates a temporary data set bp, which exists only within the current program. (Permanent data sets may be saved, but we will not use this feature in this course.)

SAS programming

data-step:

    data bp;
      ( reading ) ;
      ( data manipulations ) ;
    run;

proc-step:

    proc xx data=bp;
      ( procedure statements ) ;
    run;

NB: No data manipulations are possible after run; unless we make a new data-step. It is better to revise the first data-step.

Example

    data bp;
      filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
      infile bpfile firstobs=2;
      input sex $ obese bp;   /* sex is a character variable, hence the $ */
    run;

    data bp;
      set bp;
      if bp < 125 then highbp = 0;
      if bp >= 125 then highbp = 1;
      /* an alternative way of creating the new variable highbp is:
         highbp = (bp >= 125); */
    run;

    proc freq data=bp;
      tables sex * highbp;
    run;

Example, simplified

    data bp;
      filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
      infile bpfile firstobs=2;
      input sex $ obese bp;   /* sex is a character variable, hence the $ */
      if bp < 125 then highbp = 0;
      if bp >= 125 then highbp = 1;
      /* an alternative way of creating the new variable highbp is:
         highbp = (bp >= 125); */
    run;

    proc freq data=bp;
      tables sex * highbp;
    run;

Typing of programs is done in the Program Editor window. It works like all other text editors: arrow keys, backspace, delete etc.

When the program is submitted (click on Submit or press F3), the results appear in the Log window. Here you can see how things went:

- how many observations you have
- how many variables you have
- if there were any errors
- which pages were written by which procedures

Output window (perhaps): in this window you will find the results (if there are any).

Graph window (which we won't use on this course): here plots are stored in order.

Making life simpler

You can move between the windows by clicking Windows in the command bar, or by using the function keys:

- F5 is the editor window
- F6 is the log window
- F7 is the output window

Modifications in the program

When the program has been executed and you want to make changes:

- Go back to the Program window.
- The Log, Output and Graph windows accumulate, that is, output is stored consecutively.
- Clear a window by choosing Clear under Edit (or press Ctrl-E, for "erase").
- Don't print!
- Remember to save the program from time to time, before SAS crashes!

Simple statistical models
Proportions and rates

A single proportion

The log-likelihood for $\pi$, the proportion dead, if we observe 4 deaths out of 10:

$$\ell(\pi) = 4\log(\pi) + 6\log(1-\pi)$$

The log-likelihood for $\omega$, the odds of dying, if we observe 4 deaths and 6 non-deaths:

$$\ell(\omega) = 4\log(\omega) - 10\log(1+\omega)$$

Programs

General purpose programs for estimating in the binomial and Poisson distributions:

SAS: proc genmod
R: glm
Stata: glm

Here we primarily look at SAS.

Estimating odds: genmod

    data p;
      input x n;
      datalines;
    4 10
    ;
    run;

    proc genmod data=p;
      model x/n = / dist=bin link=logit;
      estimate "4 versus 6" intercept 1 / exp;
    run;

                               Standard    Wald 95% Confidence       Wald
    Parameter   DF   Estimate     Error          Limits           Chi-Square   Pr > ChiSq
    Intercept    1    -0.4055    0.6455    -1.6706     0.8597           0.39       0.5299
    Scale        0     1.0000    0.0000     1.0000     1.0000

    Contrast Estimate Results

                         L'Beta   Standard        L'Beta              Chi-
    Label              Estimate      Error   Confidence Limits      Square   Pr > ChiSq
    4 versus 6          -0.4055     0.6455   -1.6706     0.8597       0.39       0.5299
    Exp(4 versus 6)      0.6667     0.4303    0.1881     2.3624
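The odds estimates for 4 deaths out of 10 can be reproduced by hand, which may help when reading the genmod output. The sketch below is not from the slides; it uses the closed-form maximum likelihood estimate of the odds and the standard error of the log-odds, sqrt(1/4 + 1/6), and takes the normal quantile from Python's `statistics` module rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

x, n = 4, 10  # 4 deaths out of 10

def loglik_odds(omega):
    """Binomial log-likelihood as a function of the odds omega."""
    return x * math.log(omega) - n * math.log(1 + omega)

odds_hat = x / (n - x)                # maximum likelihood estimate of the odds
log_odds = math.log(odds_hat)         # the intercept on the logit scale
se = math.sqrt(1 / x + 1 / (n - x))   # s.e. of the log-odds
z = NormalDist().inv_cdf(0.975)       # 97.5% normal quantile, about 1.96
lower, upper = log_odds - z * se, log_odds + z * se

print(f"log-odds {log_odds:.4f}  s.e. {se:.4f}  limits {lower:.4f} {upper:.4f}")
print(f"odds {odds_hat:.4f}  limits {math.exp(lower):.4f} {math.exp(upper):.4f}")
```

The exponentiated limits are simply the limits on the log scale transformed back, which is what the `exp` option of the `estimate` statement reports.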

Estimating a proportion: genmod

The only difference from estimation of odds is the link= argument, which is changed to log (instead of logit):

    proc genmod data=p;
      model x/n = / dist=bin link=log;
      estimate "4 out of 10" intercept 1 / exp;
    run;

                               Standard    Wald 95% Confidence       Wald
    Parameter   DF   Estimate     Error          Limits           Chi-Square   Pr > ChiSq
    Intercept    1    -0.9163    0.3873    -1.6754    -0.1572           5.60       0.0180
    Scale        0     1.0000    0.0000     1.0000     1.0000

    Contrast Estimate Results

                          L'Beta   Standard        L'Beta              Chi-
    Label               Estimate      Error   Confidence Limits      Square   Pr > ChiSq
    4 out of 10          -0.9163     0.3873   -1.6754    -0.1572       5.60       0.0180
    Exp(4 out of 10)      0.4000     0.1549    0.1872     0.8545

A single proportion: individual records

    data bissau;
      filename bisfile url "http://www.biostat.ku.dk/~pka/epidata/bissau.txt";
      infile bisfile firstobs=2;
      input id fuptime dead bcg dtp age agem;
    run;

    title "Estimate odds - Bissau";
    proc genmod data=bissau descending;
      model dead = / dist=bin link=logit;
      estimate "odds of dying" intercept 1 / exp;
    run;

    Contrast Estimate Results

                            L'Beta   Standard        L'Beta              Chi-
    Label                 Estimate      Error   Confidence Limits      Square   Pr > ChiSq
    odds of dying          -3.1249     0.0686   -3.2593    -2.9905     2076.5       <.0001
    Exp(odds of dying)      0.0439     0.0030    0.0384     0.0503

A single proportion: individual records

    title "Estimate proportion - Bissau";
    proc genmod data=bissau descending;
      model dead = / dist=bin link=log;
      estimate "prob of dying" intercept 1 / exp;
    run;

    Contrast Estimate Results

                            L'Beta   Standard        L'Beta              Chi-
    Label                 Estimate      Error   Confidence Limits      Square   Pr > ChiSq
    prob of dying          -3.1679     0.0657   -3.2966    -3.0391     2325.8       <.0001
    Exp(prob of dying)      0.0421     0.0028    0.0370     0.0479

Likelihood for a single rate

Recall the log-likelihood for a single rate, $\lambda$, based on $D$ events during $Y$ person-years:

$$D\log(\lambda) - \lambda Y$$

This is also the log-likelihood for a Poisson variate $D$ with mean $\mu = \lambda Y$.

Therefore we can use a program for the Poisson distribution to estimate rates, except that we must remove the $Y$ from the mean.

Poisson models usually work with the log-mean:

$$\log(\mu) = \log(\lambda) + \log(Y)$$

$\log(Y)$ is extracted via the offset argument.

A single rate

    data r;
      input d y;
      ly = log(y);
      my = log(y/1000);
      datalines;
    30 261.9
    ;
    run;

    title "Estimate a rate per year";
    proc genmod data=r;
      model d = / dist=poisson link=log offset=ly;
      estimate "30 during 261.9 - per year" intercept 1 / exp;
    run;

    Contrast Estimate Results

                                         L'Beta   Standard        L'Beta              Chi-
    Label                              Estimate      Error   Confidence Limits      Square
    30 during 261.9 - per year          -2.1668     0.1826   -2.5246    -1.8089     140.85
    Exp(30 during 261.9 - per year)      0.1145     0.0209    0.0801     0.1638

A single rate: Scaling

Remember the data step statement: my = log(y/1000);

    title "Estimate a rate per 1000 years";
    proc genmod data=r;
      model d = / dist=poisson link=log offset=my;
      estimate "30 during 261.9 - per 1000 years" intercept 1 / exp;
    run;

    Contrast Estimate Results

                                               L'Beta   Standard                 L'Beta
    Label                                    Estimate      Error   Alpha    Confidence Limits
    30 during 261.9 - per 1000 years           4.7410     0.1826    0.05     4.3832     5.0988
    Exp(30 during 261.9 - per 1000 years)    114.5475    20.9134    0.05    80.0900   163.8299
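The role of the offset can be seen by computing the rate estimates directly. The sketch below is not from the slides: it uses the maximum likelihood estimate D/Y, the standard error 1/sqrt(D) of the log-rate, and the rounded multiplier 1.96. The person-years figure 261.9 is an assumption: it is the value consistent with the printed estimates.

```python
import math

d, y = 30, 261.9  # events and person-years (y as implied by the printed output)

log_rate = math.log(d / y)   # intercept when the offset is log(y)
se = 1 / math.sqrt(d)        # s.e. of the log-rate
ef = math.exp(1.96 * se)     # error factor for 95% limits
rate = math.exp(log_rate)

scale = 1000
# Intercept when the offset is log(y/1000): the rate per 1000 person-years.
log_rate_scaled = math.log(d / (y / scale))

print(f"per year:       {log_rate:.4f}  rate {rate:.4f}  "
      f"({rate / ef:.4f}, {rate * ef:.4f})")
print(f"per 1000 years: {log_rate_scaled:.4f}  rate {math.exp(log_rate_scaled):.4f}")
```

Rescaling the offset shifts the intercept by log(1000) and leaves the standard error unchanged, so only the units of the reported rate change.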

A single rate: individual records

    data bissau;
      set bissau;
      ld = log(fuptime);
      ly = log(fuptime/36525);
    run;

    title "Estimate a rate per day";
    proc genmod data=bissau;
      model dead = / dist=poisson link=log offset=ld;
      estimate "mortality rate - per day" intercept 1 / exp;
    run;

    Contrast Estimate Results

                                       L'Beta   Standard                 L'Beta             Chi-
    Label                            Estimate      Error   Alpha    Confidence Limits     Square
    mortality rate - per day          -8.2852     0.0671    0.05    -8.4168    -8.1537     15239
    Exp(mortality rate - per day)      0.0003     0.0000    0.05     0.0002     0.0003

Single rate individual records, scaling

Remember the data step statement: ly = log(fuptime/36525);

Follow-up time is in days, so dividing by 36525 gives the rate per 100 years.

    title "Estimate a rate per 100 years";
    proc genmod data=bissau;
      model dead = / dist=poisson link=log offset=ly;
      estimate "mortality rate - per 100 years" intercept 1 / exp;
    run;

    Contrast Estimate Results

                                             L'Beta   Standard                 L'Beta
    Label                                  Estimate      Error   Alpha    Confidence Limits
    mortality rate - per 100 years           2.2205     0.0671    0.05     2.0890     2.3521
    Exp(mortality rate - per 100 years)      9.2123     0.6183    0.05     8.0768    10.5074