Case-control studies (C&H 16)

Bendix Carstensen
Steno Diabetes Center & Department of Biostatistics, University of Copenhagen
bxc@steno.dk
http://bendixcarstensen.com

PhD-course in Epidemiology, Department of Biostatistics, Tuesday 3 January 2017


Relationship between follow-up studies and case-control studies

In a cohort study, the relationship between exposure and disease incidence is investigated by following the entire cohort and measuring the rate of occurrence of new cases in the different exposure groups.

The follow-up allows the investigator to register those subjects who develop the disease during the study period and to identify those who remain free of the disease.


Case-control study

In a case-control study the subjects who develop the disease (the cases) are registered by some mechanism other than follow-up, and a group of healthy subjects (the controls) is used to represent the subjects who do not develop the disease.

Rationale behind case-control studies

In a follow-up study, the rates among exposed and non-exposed are estimated by

$$\frac{D_1}{Y_1} \quad\text{and}\quad \frac{D_0}{Y_0}$$

and hence the rate ratio by

$$\frac{D_1}{Y_1} \Big/ \frac{D_0}{Y_0} = \frac{D_1}{D_0} \Big/ \frac{Y_1}{Y_0}$$

In a case-control study we use the same cases, but select controls to represent the distribution of risk time between exposed and unexposed:

$$\frac{H_1}{H_0} \approx \frac{Y_1}{Y_0}$$

Therefore the rate ratio is estimated by

$$\frac{D_1}{D_0} \Big/ \frac{H_1}{H_0}$$

Controls represent risk time, not disease-free persons.


Choice of controls (I)

[Figure: follow-up lines for subjects over the study period; legend: Failures, Healthy]

The period over which failures are registered as cases is called the study period.

A group of subjects who remain healthy over the study period is chosen to represent the healthy part of the source population. But this is an oversimplification...

What about censoring and late entry?

[Figure: follow-up lines for subjects over the study period; legend: Failures, Healthy, Censored, Late entry]

Choosing controls who remain healthy throughout takes no account of censoring or late entry.

Instead, choose controls who are in the study and healthy at the times the cases are registered.


Choice of controls (II)

[Figure: follow-up lines for subjects over the study period; legend: Failures, Healthy, Censored, Late entry]

This is called incidence density sampling. It is the most common way of choosing controls.

- Subjects can be chosen as controls more than once, and a subject who is chosen as a control can later become a case.
- It is equivalent to sampling observation time from vertical bands drawn to enclose each case.


Case-control probability tree

    Exposure          Failure             Selection   Outcome          Probability
    $E_1$ ($p$)       F ($\pi_1$)         0.97        Case ($D_1$)     $p\,\pi_1 \times 0.97$
                      S ($1-\pi_1$)       0.01        Control ($H_1$)  $p\,(1-\pi_1) \times 0.01$
    $E_0$ ($1-p$)     F ($\pi_0$)         0.97        Case ($D_0$)     $(1-p)\,\pi_0 \times 0.97$
                      S ($1-\pi_0$)       0.01        Control ($H_0$)  $(1-p)\,(1-\pi_0) \times 0.01$

Failures are left out of the study with probability 0.03, and survivors with probability 0.99.
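Incidence density sampling can be illustrated with a small sketch. The code below is not from the course material: the function name, the five subjects and their entry and exit times are all hypothetical, and a subject is assumed to be at risk at time t when entry < t <= exit. For each case, controls are drawn from the subjects in the study and still healthy at the time the case is registered.

```python
import random


def incidence_density_sample(subjects, n_controls, rng):
    """For each case, draw controls from the subjects at risk at the failure time.

    subjects maps an id to (entry, exit, failed); all of these are hypothetical.
    A subject is treated as at risk at time t when entry < t <= exit.
    """
    sampled = {}
    for sid, (entry, exit_time, failed) in subjects.items():
        if not failed:
            continue
        t = exit_time  # the failure time of this case
        risk_set = [
            other
            for other, (e, x, _) in subjects.items()
            if other != sid and e < t <= x
        ]
        k = min(n_controls, len(risk_set))
        sampled[sid] = (t, rng.sample(risk_set, k))
    return sampled


# Hypothetical follow-up data: (entry time, exit time, failed?)
subjects = {
    "a": (0.0, 2.0, True),   # case at time 2
    "b": (0.0, 5.0, False),  # healthy throughout
    "c": (1.0, 4.0, True),   # late entry, case at time 4
    "d": (3.0, 5.0, False),  # late entry
    "e": (0.0, 1.5, False),  # censored at time 1.5
}

sampled = incidence_density_sample(subjects, n_controls=2, rng=random.Random(1))
for case, (t, ctrls) in sampled.items():
    print(case, t, sorted(ctrls))
```

In this toy data, subject c serves as a control for case a before becoming a case, and subject b is drawn as a control twice, which are the two features of the design noted in the text.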

Retrospective analysis of case-control studies

Retrospective: compare the distribution of exposure between cases and controls.

- The proportion of cases who smoke compared to controls.
- The mean age of cases compared to controls.

This looks at the study backwards. It only works properly for binary explanatory variables.


The retrospective argument

    Selection / Failure   Exposure   Probability
    F (Cases)             $E_1$      $p\,\pi_1 \times 0.97$
                          $E_0$      $(1-p)\,\pi_0 \times 0.97$
    S (Controls)          $E_1$      $p\,(1-\pi_1) \times 0.01$
                          $E_0$      $(1-p)\,(1-\pi_0) \times 0.01$
    Not in study

Note: the parameters of the previous tree are not found on the branches of this tree.

Odds of exposure for cases and controls:

$$\Omega_{\text{cas}} = \frac{p\,\pi_1 \times 0.97}{(1-p)\,\pi_0 \times 0.97} = \frac{p}{1-p} \times \frac{\pi_1}{\pi_0}$$

$$\Omega_{\text{ctr}} = \frac{p\,(1-\pi_1) \times 0.01}{(1-p)\,(1-\pi_0) \times 0.01} = \frac{p}{1-p} \times \frac{1-\pi_1}{1-\pi_0}$$

Odds ratio for exposure between cases and controls:

$$\frac{\Omega_{\text{cas}}}{\Omega_{\text{ctr}}} = \frac{\pi_1}{1-\pi_1} \Big/ \frac{\pi_0}{1-\pi_0} = \text{OR(disease)}_{\text{population}}$$

Prospective analysis of case-control studies

Compare the case/control ratio between exposed and non-exposed subjects, or more generally: how does the case/control ratio vary with exposure?

The point is that in the study it varies in the same way as in the population.


The prospective argument

    Selection      Exposure   Failure   Probability
    In study       $E_1$      F         $p\,\pi_1 \times 0.97$
                              S         $p\,(1-\pi_1) \times 0.01$
                   $E_0$      F         $(1-p)\,\pi_0 \times 0.97$
                              S         $(1-p)\,(1-\pi_0) \times 0.01$
    Not in study

Odds of disease = P{Case given inclusion} / P{Control given inclusion}:

$$\omega_1 = \frac{p\,\pi_1 \times 0.97}{p\,(1-\pi_1) \times 0.01} = \frac{0.97}{0.01} \times \frac{\pi_1}{1-\pi_1}$$

$$\omega_0 = \frac{(1-p)\,\pi_0 \times 0.97}{(1-p)\,(1-\pi_0) \times 0.01} = \frac{0.97}{0.01} \times \frac{\pi_0}{1-\pi_0}$$

$$\text{OR} = \frac{\omega_1}{\omega_0} = \frac{\pi_1}{1-\pi_1} \Big/ \frac{\pi_0}{1-\pi_0} = \text{OR(disease)}_{\text{population}}$$
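Both the retrospective and the prospective argument can be checked numerically. The sketch below uses the selection probabilities 0.97 and 0.01 from the probability tree; the values chosen for p, pi1 and pi0 and the function name are hypothetical and only serve as an illustration.

```python
def selection_tree(p, pi1, pi0, s_case=0.97, s_ctrl=0.01):
    """Probabilities of the four in-study outcomes of the case-control tree."""
    return {
        "D1": p * pi1 * s_case,              # exposed case
        "H1": p * (1 - pi1) * s_ctrl,        # exposed control
        "D0": (1 - p) * pi0 * s_case,        # unexposed case
        "H0": (1 - p) * (1 - pi0) * s_ctrl,  # unexposed control
    }


# Hypothetical population values: exposure prevalence and disease probabilities
p, pi1, pi0 = 0.3, 0.10, 0.04
pr = selection_tree(p, pi1, pi0)

# Retrospective: odds of exposure among cases relative to controls
or_retro = (pr["D1"] / pr["D0"]) / (pr["H1"] / pr["H0"])
# Prospective: odds of being a case among exposed relative to unexposed
or_prosp = (pr["D1"] / pr["H1"]) / (pr["D0"] / pr["H0"])
# Disease odds ratio in the population
or_pop = (pi1 / (1 - pi1)) / (pi0 / (1 - pi0))

print(or_retro, or_prosp, or_pop)
```

The three quantities agree up to rounding error whatever the exposure prevalence, because the selection probabilities depend only on case or control status and cancel in the ratio.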

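What is the case-control ratio?

$$\frac{D_1}{H_1} = \frac{0.97}{0.01} \times \frac{\pi_1}{1-\pi_1} \left( = \frac{s_{1,\text{cas}}}{s_{1,\text{ctr}}} \times \frac{\pi_1}{1-\pi_1} \right)$$

$$\frac{D_0}{H_0} = \frac{0.97}{0.01} \times \frac{\pi_0}{1-\pi_0} \left( = \frac{s_{0,\text{cas}}}{s_{0,\text{ctr}}} \times \frac{\pi_0}{1-\pi_0} \right)$$

$$\frac{D_1/H_1}{D_0/H_0} = \frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} = \text{OR}_{\text{population}}$$

but only if the sampling fractions are identical: $s_{1,\text{cas}} = s_{0,\text{cas}}$ and $s_{1,\text{ctr}} = s_{0,\text{ctr}}$.


Log-likelihood for case-control studies

The log-likelihood (conditional on being included) is a binomial likelihood with odds parameters $\omega_0$ and $\omega_1$:

$$D_0 \log(\omega_0) - N_0 \log(1+\omega_0) + D_1 \log(\omega_1) - N_1 \log(1+\omega_1)$$

where $N_0 = D_0 + H_0$ and $N_1 = D_1 + H_1$.

- Exposed: $D_1$ cases, $H_1$ controls
- Unexposed: $D_0$ cases, $H_0$ controls

The odds ratio ($\theta$) is the ratio of the odds $\omega_1$ to $\omega_0$, so:

$$\log(\theta) = \log\left(\frac{\omega_1}{\omega_0}\right) = \log(\omega_1) - \log(\omega_0)$$

The estimates of $\log(\omega_1)$ and $\log(\omega_0)$ are just the logs of the empirical odds:

$$\log\left(\frac{D_1}{H_1}\right) \quad\text{and}\quad \log\left(\frac{D_0}{H_0}\right)$$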
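That the empirical odds maximise the binomial log-likelihood can be seen from a crude grid search. This is a minimal sketch with hypothetical counts (40 cases and 160 controls in one exposure group); the function name and the grid are illustrative choices.

```python
import math


def loglik(omega, d, h):
    """Binomial log-likelihood for the odds omega, given d cases and h controls."""
    return d * math.log(omega) - (d + h) * math.log(1 + omega)


# Hypothetical counts for one exposure group
d1, h1 = 40, 160

# Evaluate the log-likelihood on a grid of odds values 0.001, 0.002, ..., 2.000
grid = [i / 1000 for i in range(1, 2001)]
best = max(grid, key=lambda w: loglik(w, d1, h1))

print(best, d1 / h1)
```

The grid maximum coincides with the empirical odds d1 / h1, so the log of the empirical odds is the maximum likelihood estimate of the log-odds.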

The standard errors of the log-odds are estimated by:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1}} \quad\text{and}\quad \sqrt{\frac{1}{D_0} + \frac{1}{H_0}}$$

Exposed and unexposed form two independent bodies of data (they are sampled independently), so the estimate of $\log(\theta)$ [= log(OR)] is:

$$\log\left(\frac{D_1}{H_1}\right) - \log\left(\frac{D_0}{H_0}\right), \quad\text{with}\quad \text{s.e.}\big(\log(\text{OR})\big) = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}$$


Confidence interval for OR

First a confidence interval for log(OR):

$$\log(\text{OR}) \pm 1.96 \times \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}$$

Take the exponential:

$$\text{OR} \;\overset{\times}{\div}\; \underbrace{\exp\left(1.96 \times \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}}\right)}_{\text{error factor}}$$


BCG vaccination and leprosy

Does BCG vaccination in early childhood protect against leprosy?

New cases of leprosy were examined for presence or absence of the BCG scar. During the same period, a 100% survey of the population of this area, which included examination for BCG scar, had been carried out.

The tabulated data refer only to subjects under 35, because vaccination was not widely available when older persons were children.

Exercise I

    BCG scar   Leprosy cases   Population survey
    Present              101              46 028
    Absent               159              34 594

Estimate the odds of BCG vaccination for leprosy cases and for the controls.

Estimate the odds ratio and hence the extent of protection against leprosy afforded by vaccination.

Give a 95% c.i. for the OR.

Use SAS for this: Exercise from the notes.


Solution to I

$$\text{OR} = \frac{D_1/H_1}{D_0/H_0} = \frac{101/46028}{159/34594} = \frac{0.002194}{0.004596} = 0.48$$

$$\text{s.e.}(\log[\text{OR}]) = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{101} + \frac{1}{46028} + \frac{1}{159} + \frac{1}{34594}} = 0.127$$

The 95% limits for the odds ratio are:

$$\text{OR} \;\overset{\times}{\div}\; \exp(1.96 \times 0.127) = 0.48 \;\overset{\times}{\div}\; 1.28 = (0.37, 0.61)$$


Exercise II

    BCG scar   Leprosy cases   Population controls
    Present              101                   554
    Absent               159                   446

The table shows the results of a computer-simulated study which picked 1000 controls at random.

What is the odds ratio estimate in this study?

Give a 95% c.i. for the OR.

Use SAS for this: Exercise from the notes.
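The hand calculation in the solution to Exercise I can also be reproduced in a few lines of Python, as an alternative to the SAS program the exercise asks for. This is a sketch, not part of the course notes: the function name is hypothetical, and the counts are the Exercise I data (101 and 159 cases, 46 028 and 34 594 surveyed persons).

```python
import math


def odds_ratio_ci(d1, h1, d0, h0, z=1.96):
    """Odds ratio, s.e. of log(OR) and limits OR / error factor, OR * error factor."""
    or_hat = (d1 / h1) / (d0 / h0)
    se_log_or = math.sqrt(1 / d1 + 1 / h1 + 1 / d0 + 1 / h0)
    error_factor = math.exp(z * se_log_or)
    return or_hat, se_log_or, or_hat / error_factor, or_hat * error_factor


# Exercise I: BCG scar present (d1, h1) and absent (d0, h0)
or_hat, se_log_or, lower, upper = odds_ratio_ci(101, 46028, 159, 34594)
print(round(or_hat, 2), round(se_log_or, 3), round(lower, 2), round(upper, 2))
# prints 0.48 0.127 0.37 0.61
```

The same function can be applied to the table in Exercise II by swapping in the counts of sampled controls.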

Solution to II

$$\text{OR} = \frac{D_1/H_1}{D_0/H_0} = \frac{101/554}{159/446} = \frac{0.1823}{0.3565} = 0.51$$

$$\text{s.e.}(\log[\text{OR}]) = \sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{101} + \frac{1}{554} + \frac{1}{159} + \frac{1}{446}} = 0.142$$

The 95% limits for the odds ratio are:

$$\text{OR} \;\overset{\times}{\div}\; \exp(1.96 \times 0.142) = 0.51 \;\overset{\times}{\div}\; 1.32 = (0.39, 0.68)$$


More levels of exposure (William Guy)

Physical exertion at work of 1659 outpatients: 341 with pulmonary consumption, 1318 with other diseases.

    Level of exertion   Pulmonary consumption   Other diseases   Case/control   OR relative
    in occupation       (Cases)                 (Controls)       ratio          to (3)
    Little (0)                            125              385          0.325         1.643
    Varied (1)                             41              136          0.301         1.526
    More (2)                              142              630          0.225         1.141
    Great (3)                              33              167          0.198         1.000

The relationship between the case/control ratios is what matters.


The retro/prospective argument

- Retrospective: four possible outcomes (little/varied/more/great).
- Prospective: two possible outcomes (case/control), but a large number of comparisons (between any two exposure levels).

The probability model is still a binary model, and the argument for the analysis is still the same as before.

The prospective argument is the one applicable in deriving a logistic regression model for case-control studies.
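The case/control ratios and odds ratios in Guy's table follow directly from the counts. A minimal sketch, with the counts taken from the table (341 cases and 1318 controls in total) and hypothetical variable names:

```python
# Counts by level of exertion: (cases, controls)
guy = {
    "Little (0)": (125, 385),
    "Varied (1)": (41, 136),
    "More (2)": (142, 630),
    "Great (3)": (33, 167),
}

# Case/control ratio at each exposure level
ratio = {level: d / h for level, (d, h) in guy.items()}

# Odds ratio relative to the reference level "Great (3)"
reference = ratio["Great (3)"]
odds_ratio = {level: r / reference for level, r in ratio.items()}

for level in guy:
    print(f"{level:12s} {ratio[level]:.3f} {odds_ratio[level]:.3f}")
```

Only the ratios between the case/control ratios are interpretable; their absolute size depends on how many controls were sampled.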

Odds-ratio and rate ratio

If the disease probability, $\pi$, in the study period is small:

$$\pi = \text{cumulative risk} \approx \text{cumulative rate} = \lambda T$$

For small $\pi$, $1 - \pi \approx 1$, so:

$$\text{OR} = \frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} \approx \frac{\pi_1}{\pi_0} \approx \frac{\lambda_1}{\lambda_0} = \text{RR}$$

$\pi$ small $\Rightarrow$ the OR is an estimate of the RR.


Important assumption behind rate ratio interpretation

The entire study base must have been available throughout:

- no censorings,
- no delayed entries.

This will clearly not always be the case, but it may be achieved in carefully designed studies.


Avoiding censoring and delayed entry

This can be achieved simultaneously with small $\pi$ by incidence density sampling:

- Subdivide calendar time into small time bands.
- Conduct a new case-control study in each time band.
- Only one case in each time band.
- No delayed entry or censoring.

If the fraction of exposed does not vary much over time, all the small studies can be analysed together as one.

This is effectively matching on calendar time.
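How quickly the odds ratio drifts away from the rate ratio as the study period grows can be seen numerically. The sketch below assumes constant rates, so that the cumulative risk over a period T is 1 - exp(-lambda T); this exact relation, the two rates and the function names are assumptions made for the illustration and are not stated on the slides.

```python
import math


def odds(prob):
    """Odds corresponding to a probability."""
    return prob / (1 - prob)


def or_for_duration(lam1, lam0, duration):
    """Disease odds ratio over a study period, assuming constant rates."""
    pi1 = 1 - math.exp(-lam1 * duration)
    pi0 = 1 - math.exp(-lam0 * duration)
    return odds(pi1) / odds(pi0)


# Hypothetical rates per year among exposed and unexposed: rate ratio 2
lam1, lam0 = 0.02, 0.01

for duration in (0.1, 1, 10, 50):
    print(duration, round(or_for_duration(lam1, lam0, duration), 3), lam1 / lam0)
```

For short study periods the odds ratio is close to the rate ratio of 2; over long periods, where the disease probability is no longer small, it moves clearly away from it.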

The rare disease assumption

Necessary to make the approximation:

$$\frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} \approx \frac{\pi_1}{\pi_0}$$

This is more appropriately termed the short study duration assumption: each of the small studies we imagine as components of the entire study should be sufficiently short in relation to disease occurrence that $\pi$ (the disease probability) is small.


Nested case-control studies

- Study base = large cohort.
- Expensive to get covariate information for all persons (expensive analyses, tracing of histories, ...).
- Covariate information is collected only for cases and time-matched controls: to each case, choose one or more (usually 5) controls from the risk set.


How many controls per case?

The standard deviation of log(OR), with an equal number of cases and controls:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{D_1} + \frac{1}{D_1} + \frac{1}{D_0} + \frac{1}{D_0}} = \sqrt{\left(\frac{1}{D_1} + \frac{1}{D_0}\right)(1 + 1)}$$

Twice as many controls as cases:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\frac{1}{D_1} + \frac{1}{2D_1} + \frac{1}{D_0} + \frac{1}{2D_0}} = \sqrt{\left(\frac{1}{D_1} + \frac{1}{D_0}\right)(1 + 1/2)}$$

$m$ times as many controls as cases:

$$\sqrt{\frac{1}{D_1} + \frac{1}{H_1} + \frac{1}{D_0} + \frac{1}{H_0}} = \sqrt{\left(\frac{1}{D_1} + \frac{1}{D_0}\right)(1 + 1/m)}$$


How many controls per case?

The standard deviation of log(OR) is $\sqrt{1 + 1/m}$ times larger in a case-control study than in the corresponding cohort study.

Therefore, 5 controls per case is normally sufficient. (This is only relevant if controls are cheap compared to cases.)

But if cases and controls cost the same and both are available, the most efficient design is to have the same number of cases and controls.


SAS-intro

SAS

Display manager (programming):

- program, log and output windows
- reproducible
- easy to document

SAS ANALYST:

- menu-oriented interface
- writes and runs programs for you
- no learning by heart, no syntax errors
- not everything is included
- it is heavy to use in the long run


Data set example: Blood pressure and obesity

OBESE: weight/ideal weight
BP: systolic blood pressure

    OBS   SEX      OBESE    BP
      1   male      1.31   130
      2   male      1.31   148
      3   male      1.19   146
      4   male      1.11   122
    ...   ...        ...   ...
    101   female    1.64   136
    102   female    1.73   208


Data

The data are in the text file BP.TXT located at www.biostat.ku.dk/~pka/epidata and contain the following variables:

- SEX: character variable ($)
- OBESE: weight/ideal weight
- BP: systolic blood pressure

3 variables and 102 observations.

Printing in SAS

We read the file bp.txt directly from the web and skip the first line, which contains the variable names (firstobs=2).

    data bp;
      filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
      infile bpfile firstobs=2;
      input sex $ obese bp;
    run;

    proc print data=bp;
      var sex obese bp;
    run;

This creates a temporary data set bp, which only exists within the current program. (Permanent data sets may be saved, but we will not use this feature in this course.)


SAS programming

data-step:

    data bp;
      ( reading ) ;
      ( data manipulations ) ;
    run;

proc-step:

    proc xx data=bp ;
      ( procedure statements ) ;
    run;

NB: No data manipulations are possible after run; unless we make a new data-step. It is better to revise the first data-step.


Example

    data bp;
      filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
      infile bpfile firstobs=2;
      input sex $ obese bp;
    run;

    data bp;
      set bp;
      if bp<125 then highbp=0;
      if bp>=125 then highbp=1;
      /* an alternative way of creating the new variable highbp is:
         highbp = (bp>=125); */
    run;

    proc freq data=bp;
      tables sex * highbp ;
    run;

Example, simplified

    data bp;
      filename bpfile url "http://www.biostat.ku.dk/~pka/epidata/bp.txt";
      infile bpfile firstobs=2;
      input sex $ obese bp;
      if bp < 125 then highbp=0;
      if bp >= 125 then highbp=1;
      /* an alternative way of creating the new variable highbp is:
         highbp = (bp>=125); */
    run;

    proc freq data=bp;
      tables sex * highbp ;
    run;


Typing of programs is done in the Program Editor window:

- It works like all other text editors: arrow keys, backspace, delete etc.

When the program is submitted (click on Submit or press F3), the results are in the Log window. Here you can see how things went:

- how many observations you have,
- how many variables you have,
- if there were any errors,
- which pages were written by which procedures.

Output window (perhaps):

- In this window you will find the results (if there are any).

Graph window (which we won't use in this course):

- Here plots are stored in order.

Making life simpler

You can move between the windows by clicking Windows in the command bar, or by using the function keys:

- F5 is the editor window,
- F6 is the log window,
- F7 is the output window.


Modifications in the program

When the program has been executed and you want to make changes:

- Go back to the Program window.
- The Log, Output and Graph windows accumulate, that is, output is stored consecutively.
- Clear a window by choosing Clear under Edit (or press Ctrl-E, for "erase").
- Don't print!
- Remember to save the program from time to time, before SAS crashes!


Simple statistical models: Proportions and rates

A single proportion

The log-likelihood for $\pi$, the proportion dead, if we observe 4 deaths out of 10:

$$\ell(\pi) = 4\log(\pi) + 6\log(1-\pi)$$

The log-likelihood for $\omega$, the odds of dying, if we observe 4 deaths and 6 non-deaths:

$$\ell(\omega) = 4\log(\omega) - 10\log(1+\omega)$$


Programs

General purpose programs for estimation in the binomial and Poisson distributions:

- SAS: proc genmod
- R: glm
- Stata: glm

Here we primarily look at SAS.


Estimating odds: genmod

    data p ;
      input x n ;
    datalines ;
    4 10
    ;
    proc genmod data = p ;
      model x/n = / dist=bin link=logit ;
      estimate "4 versus 6" intercept 1 / exp ;
    run ;

                             Standard    Wald 95% Confidence       Wald
    Parameter  DF  Estimate     Error          Limits          Chi-Square  Pr > ChiSq
    Intercept   1   -0.4055    0.6455    -1.6706      0.8597         0.39      0.5299
    Scale       0    1.0000    0.0000     1.0000      1.0000

                        Contrast Estimate Results
                      L'Beta   Standard        L'Beta            Chi-
    Label            Estimate     Error   Confidence Limits    Square  Pr > ChiSq
    4 versus 6        -0.4055    0.6455   -1.6706    0.8597      0.39      0.5299
    Exp(4 versus 6)    0.6667    0.4303    0.1881    2.3624
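The intercept, standard error and Wald limits in the genmod output for the odds can be verified by hand. The sketch below assumes the usual large-sample formulas for a binomial log-odds, log(x/(n-x)) with standard error sqrt(1/x + 1/(n-x)), and uses 1.96 as the normal quantile; the variable names are hypothetical.

```python
import math

x, n = 4, 10  # 4 deaths out of 10

# Log-odds of dying and its standard error
log_odds = math.log(x / (n - x))
se = math.sqrt(1 / x + 1 / (n - x))

# Wald 95% limits on the log-odds scale, then transformed to the odds scale
lower, upper = log_odds - 1.96 * se, log_odds + 1.96 * se

print(round(log_odds, 4), round(se, 4), round(lower, 4), round(upper, 4))
print(round(math.exp(log_odds), 4), round(math.exp(lower), 3), round(math.exp(upper), 3))
```

The values on the log-odds scale agree with the Intercept row, and the exponentiated values with the Exp(4 versus 6) row, up to the rounding of the normal quantile.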

Estimating a proportion: genmod

The only difference from the estimation of odds is the link= argument, which is changed to log (instead of logit):

    proc genmod data = p ;
      model x/n = / dist=bin link=log ;
      estimate "4 out of 10" intercept 1 / exp ;
    run ;

                             Standard    Wald 95% Confidence       Wald
    Parameter  DF  Estimate     Error          Limits          Chi-Square  Pr > ChiSq
    Intercept   1   -0.9163    0.3873    -1.6754     -0.1572         5.60      0.0180
    Scale       0    1.0000    0.0000     1.0000      1.0000

                        Contrast Estimate Results
                       L'Beta   Standard        L'Beta            Chi-
    Label             Estimate     Error   Confidence Limits    Square  Pr > ChiSq
    4 out of 10        -0.9163    0.3873   -1.6754   -0.1572      5.60      0.0180
    Exp(4 out of 10)    0.4000    0.1549    0.1872    0.8545


A single proportion: individual records

    data bissau;
      filename bisfile url "http://www.biostat.ku.dk/~pka/epidata/bissau.txt";
      infile bisfile firstobs=2;
      input id fuptime dead bcg dtp age agem;
    run;

    title "Estimate odds - Bissau" ;
    proc genmod data=bissau descending ;
      model dead = / dist=bin link=logit ;
      estimate "odds of dying" intercept 1 / exp ;
    run ;

                          Contrast Estimate Results
                         L'Beta   Standard        L'Beta            Chi-
    Label               Estimate     Error   Confidence Limits    Square  Pr > ChiSq
    odds of dying        -3.1249    0.0686   -3.2593   -2.9905    2076.5      <.0001
    Exp(odds of dying)    0.0439    0.0030    0.0384    0.0503


A single proportion: individual records

    title "Estimate proportion - Bissau" ;
    proc genmod data=bissau descending ;
      model dead = / dist=bin link=log ;
      estimate "prob of dying" intercept 1 / exp ;
    run ;

                          Contrast Estimate Results
                         L'Beta   Standard        L'Beta            Chi-
    Label               Estimate     Error   Confidence Limits    Square  Pr > ChiSq
    prob of dying        -3.1679    0.0657   -3.2966   -3.0391    2325.8      <.0001
    Exp(prob of dying)    0.0421    0.0028    0.0370    0.0479

Likelihood for a single rate

Recall the log-likelihood for a single rate, $\lambda$, based on $D$ events during $Y$ person-years:

$$D\log(\lambda) - \lambda Y$$

This is also the log-likelihood for a Poisson variate $D$ with mean $\mu = \lambda Y$.

Therefore we can use a program for the Poisson distribution to estimate rates, except that we must remove the $Y$ from the mean.

Poisson models usually work with the log of the mean:

$$\log(\mu) = \log(\lambda) + \log(Y)$$

so $\log(Y)$ is taken out via the offset argument.


A single rate

    data r ;
      input d y ;
      ly = log(y) ;
      my = log(y/1000) ;
    datalines ;
    30 261.9
    ;
    title "Estimate a rate per year" ;
    proc genmod data = r ;
      model d = / dist=poisson link=log offset=ly ;
      estimate "30 during 261.9 - per year" intercept 1 / exp ;
    run ;

                                 Contrast Estimate Results
                                      L'Beta   Standard        L'Beta            Chi-
    Label                            Estimate     Error   Confidence Limits    Square
    30 during 261.9 - per year        -2.1668    0.1826   -2.5246   -1.8089    140.85
    Exp(30 during 261.9 - per year)    0.1145    0.0209    0.0801    0.1638


A single rate: Scaling

Remember the data step statement: my = log(y/1000) ;

    title "Estimate a rate per 1000 year" ;
    proc genmod data = r ;
      model d = / dist=poisson link=log offset=my ;
      estimate "30 during 261.9 - per 1000 years" intercept 1 / exp ;
    run ;

                                     Contrast Estimate Results
                                            L'Beta   Standard               L'Beta
    Label                                  Estimate     Error   Alpha   Confidence Limits
    30 during 261.9 - per 1000 years         4.7410    0.1826    0.05     4.3832     5.0988
    Exp(30 during 261.9 - per 1000 years)  114.5475   20.9134    0.05    80.0900   163.8299
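The estimates in the two rate tables can be reproduced by hand. The sketch assumes the standard results for a Poisson rate, namely that the estimate of log(lambda) is log(D/Y) with standard error 1/sqrt(D), and that changing the scale of the person-time only shifts the log-rate by a constant; the variable names are hypothetical.

```python
import math

d, y = 30, 261.9  # 30 events during 261.9 person-years

# Log-rate per year and its standard error
log_rate = math.log(d / y)
se = 1 / math.sqrt(d)
lower, upper = log_rate - 1.96 * se, log_rate + 1.96 * se

# Rescaling to "per 1000 years" adds log(1000) and leaves the standard error unchanged
log_rate_1000 = log_rate + math.log(1000)

print(round(log_rate, 4), round(se, 4), round(lower, 4), round(upper, 4))
print(round(math.exp(log_rate), 4), round(log_rate_1000, 4), round(math.exp(log_rate_1000), 4))
```

Because the offset only moves the intercept, the standard error on the log scale (0.1826) is the same in both tables, while the rate and its limits are simply multiplied by 1000.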

A single rate: individual records

    data bissau ;
      set bissau ;
      ld = log(fuptime) ;
      ly = log(fuptime/36525) ;
    run ;

    title "Estimate a rate per day" ;
    proc genmod data=bissau ;
      model dead = / dist=poisson link=log offset=ld ;
      estimate "mortality rate - per day" intercept 1 / exp ;
    run ;

                                     Contrast Estimate Results
                                    L'Beta   Standard               L'Beta            Chi-
    Label                          Estimate     Error   Alpha   Confidence Limits    Square
    mortality rate - per day        -8.2852    0.0671    0.05   -8.4168   -8.1537     15239
    Exp(mortality rate - per day)    0.0003    0.0000    0.05    0.0002    0.0003


Single rate individual records, scaling

Remember the data step statement: ly = log(fuptime/36525) ;

Follow-up time is in days, so dividing by 36525 gives time in units of 100 years.

    title "Estimate a rate per 100 years" ;
    proc genmod data=bissau ;
      model dead = / dist=poisson link=log offset=ly ;
      estimate "mortality rate - per 100 years" intercept 1 / exp ;
    run ;

                                     Contrast Estimate Results
                                          L'Beta   Standard               L'Beta
    Label                                Estimate     Error   Alpha   Confidence Limits
    mortality rate - per 100 years         2.2205    0.0671    0.05    2.0890    2.3521
    Exp(mortality rate - per 100 years)    9.2123    0.6183    0.05    8.0768   10.5074