STA6938-Logistic Regression Model

Size: px
Start display at page:

Download "STA6938-Logistic Regression Model"

Transcription

1 Dr. Ying Zhang STA6938-Logistic Regression Model Topic 6-Logistic Regression for Case-Control Studies Outlines: 1. Biomedical Designs 2. Logistic Regression Models for Case-Control Studies 3. Logistic Regression Model for Matched Case-Control Studies 1

2 1. Biomedical Designs A. Types of Studies 1. Observational Study: collect data from an existing situation. The data collection does not intentionally interfere with the running of the system. Example: Sample Survey - sending out questionnaires about health care. 2. Experiment Study: investigator deliberately sets one or more factors to a specific level. Laboratory experiment: it takes place in an environment where experiment manipulation is facilitated. Comparative experiment: it compares two or more techniques, treatments or levels of some variables. Experimental unit (study unit): the smallest unit upon which the experiment or study is performed. Example: In a clinical study, the experimental units are usually humans. Cross-over experiment: the same experimental unit receives more than one treatment, or is investigated under more than one condition of the experiment. The different treatments are given during non-overlapping time periods. 2

3 Example: Laboratory animals are treated sequentially with more than one drug, and blood levels of certain metabolites are measured for each of the drugs. 3. Clinical Study: it takes place in the setting of clinical medicine. 4. Prospective Study: a cohort of people is followed for the occurrence or nonoccurrence of specified endpoints or events or measurements. Cohort: a group of people whose membership is clearly defined. Example: All individuals enrolling in the Graduate School at UCF. Endpoint: clearly defined outcome or event associated with an experimental or study unit. Baseline characteristics or baseline variables: values collected at the time of entry into the study. Example: (Cardiovascular Disease) A study was conducted to look at the effects of oral contraceptives (OC) on heart disease in women years of age. It is found that among 5,000 current OC users at baseline, 13 women develop a myocardial infarction (MI) over a 3- year period, while among 10,000 non-oc users, 7 develop an MI over a 3-year period. 3

4 OC Use Group MI Status Over 3 Years OC Users Non-OC Users This is a typical example of a prospective design. Cohort: Women in age years were disease free at baseline and had their exposure (OC use) measured at time of entry. They were followed for three years, during which some developed disease, while others remained disease free. Drawback: The endpoint of interest may occur infrequently. If that happens, extremely large numbers of people need to be followed. 5. Retrospective Study: individuals having a particular outcome or endpoint are identified and studied. The individuals with endpoint are usually compared to other individuals without the endpoint to investigate the possible causal factors of the endpoint. Subjects with particular characteristics of interest are often collected into registries. In the United States there are cancer registries, often defined for a specified metropolitan area. A cancer registry can be used retrospectively to compare presence or absence of possible causal factors of cancer after generating appropriate controls either randomly from the same population or by some matching mechanism. Alternatively, a cancer registry can be used prospectively by comparing survival times of cancer patients having various therapies. 4

5 Example: A hypothesis has been proposed that breast cancer in women is caused in part by events that occur between the age at menarche (i.e., the age when menstruation begins) and at first childbirth. In particular, the hypothesis states that the risk of breast cancer increases as the length of this time interval increases. If the theory is correct, then an important risk factor for breast cancer is age at first birth. This theory would explain in part why breast cancer incidence seems to be higher for women in the upper socioeconomic groups, since they tend to have their children relatively late. This is a typical example of a retrospective study. Breast cancer cases were identified together with controls who were in the hospital at the same time as the cases but did not have breast cancer and were of comparable age to the controls. Pregnancy history (age at first birth) was compared between cases and controls. Comparison between the two types of studies: Less bias in a prospective study than in a retrospective study, because (i) Patient s knowledge of their current health situation will be more precise than their recall of their past health situation. (ii) It is much more difficult to obtain a representative sample of people who already have the disease in question since, for example, some of the severe diseased individuals may have already died. (iii) The diseased individuals, if still alive, may tend to give biased answers about prior 5

6 health situation if they believe there is a relationship between their prior health situation and the disease. Retrospective study is much less expensive to perform and can be completed in much less time than a prospective study. 6. Case-control Study: select all cases, usually a disease, meeting fixed criteria. A comparison group for the cases, called controls is also selected. The cases and controls are compared with respect to various characteristics. Usually, the case-control study is applied to some rare disease in which the cases are hard to get. 7. Longitudinal Study: collect information on study units over a specified time interval. 8. Cross-sectional Study: collect information on study units at some fixed time. Longitudinal Study Time Cross- Sectional Study 9. Placebo: the treatment is designed to appear exactly like a comparison treatment, but to be devoid of the active part of the treatment. 6

7 10. Blind Study: B. Ethics Single blind: treated subjects are unaware of which treatment (including any control) they are receiving. Double blind: both patients and the individuals evaluating the outcome variables are unaware of which treatment the subjects are receiving. In the biomedical field, many researches and experiments involve both animal and/or human participants. Moral and legal issues are involved in both areas. 1. All investigators involved in a study are responsible for the conduct of an ethical study to the extent that they may be expected to know what is involved in the study. 2. Proposed studies involving humans (or animals) should be reviewed by individuals not concerned or connected with the study or the investigators. 3. Individuals participating in an experiment should understand and sign an informed consent form. 4. Participants should be free to withdraw at any time, or to refuse initial participation, without being penalized or jeopardized with respect to current and future care and activities. 5. Animal studies should be done prior to human experimentation. C. Steps to perform a study 7

8 Step 1: A question or problem area of interest is considered. Scientific question, not involve biostatistics Step 2: A study is to be designed to tackle the question (i) Identify the data to be collected Variable Experiment units Sample size (ii) An appropriate analytical model needs to be developed for describing and processing the data. (iii) What are the inferences one hopes to make from the particular study? What conclusions might one draw from the study? To what population(s) is the conclusion applicable? Step 3: The study is carried out and the data are collected Step 4: The data are analyzed and conclusions and inferences are drawn. Step 5: The results are used (i) changing operating procedures (ii) publishing results (iii) planning a subsequent study 8

9 2. Logistic Regression Models for Case-Control Studies a. Prospective Cohort studies In prospective cohort studies, a subject selected randomly from corresponding population is followed for a fixed period of time and the outcome variable is measured. A. Random sampling (including randomized trial according to treatments) Likelihood: L( β ) = PY ( i = yi xi) n i= 1 B. Stratified randomization: location, clinic, age. k Likelihood: L( β ) = PY ( ij = yij xij ) n i i= 1 j= 1 1. Model account for stratum effect π ( xij ) P( Yij 1 xij ) ' ( β0, i + β xij) ' ( β0, i β xi ) exp = = = 1 + exp + j 2. Model account for stratum-specific response π ( xij ) P( Yij 1 xij ) ' ( β0, i + βixij) ' ( β0, i βixij) exp = = = 1 + exp + To estimate these parameters, one can just use regular logistic regression model by adding dummy variable for 9

10 stratum and also consider the interaction between each covariate and stratum variables. b. Retrospective Case-Control Studies In retrospective case-control studies, we actually have two strata defined by the outcome variable (Y=1 or 0). The characteristics of subjects in each stratum are collected retrospectively. Let s be the selection variable, recording the selection status for each subject in the population. Suppose we have a case-control study with and n 0 controls (Y=0). n 1 cases (Y=1) n 1 0 Full Likelihood = Px ( y= 1, s= 1) Px ( y= 0, s= 1) (1) i i i i i i i= 1 i= 1 n Question: How do we incorporate the logistic regression model, ' exp( β0 + β x) π ( x) = P( Y = 1 x) = ' 1 + exp β + β x into the full likelihood (1)? ( 0 ) 10

11 Note: ( ) ( ) ( ) ( ) P( y s ) P x, y s = 1 P y x, s = 1 P x s = 1 P( x y, s = 1 ) = = (2) P y s = 1 = 1 ( 1 x, s 1) P y = = = = ( = 1, = 1 ) P( s = 1 x) P( y = 1 x) P( s = 1 y = 1, x) ( = 0 ) ( = 1 = 0, ) + ( = 1 ) ( = 1 = 1, ) P y s x P y x P s y x P y x P s y x Assume that the selection criterion is covariate independent, then 1 and 0 ( 1 1, ) ( 1 1) τ = P s = y = x = P s= y = τ = P( s = 1 y = 0, x) = P( s= 1 y = 0) and thus ( 1 x, s 1) P y = = = τπ 1 ( x) ( x ) + τ 1 π( ) τ π( x) 0 1 τ1 π ( x) τ 1 π( x) ' ( β + β x) ' ( x) 0 0 = = = π τ1 π( x) 1+ exp β0 + β 1+ τ 1 π( x) 0 exp ( x) with ( ) β = τ τ + β. 0 ln 1/

12 Also, we can assume that ( ) Px ( s= 1) = P x: the probability distribution of the covariates, because usually sampling is carried out independent of covariate values. Thus (2) can be written as ( 1, s 1) P x y = = = ( x) P( x) ( = 1 s= 1) π P y. We can do the same for P( x y 0, s 1) likelihood (1) can be reformulated as = =. Thus the full P( xi ) ( s = 1) n = L ( β ) i= 1 P yi i Full Likelihood, where L n y ( ) ( ) 1 ( ) 1 i β = π x i π x i yi. i= 1 -This is the likelihood obtained when we pretend the casecontrol data were collected in a cohort study. 12

13 What does this likelihood imply? Note that ( ) ( ) P y = 1 s = 1 = n / n and P y = 0 s = 1 = n / n. i i 1 i i 0 If we assume that the distribution of covariates in non informative about the regression coefficient, then maximizing the full likelihood with respect to β is equivalent to maximizing ( ) L β. Thus, maximization of the full likelihood with respect to the parameters in π ( x) need only considers that portion of the likelihood which looks like a cohort study. Implication: analysis of data from case-control studies via logistic regression may proceed in the same way and using the same computer programs as cohort study. But inference about β 0 is not available. Theoretically, the maximum likelihood estimators obtained by pretending that case-control data resulted from a cohort sample have the usual asymptotic properties associated with maximum likelihood estimators. Prentice and Pyke (1979) Biometrika, 66,

14 3. Logistic Regression Model for Matched Case-Control Studies Matched Case-Control Study: Subjects are stratified on the basis of variables belied to be associated with the outcome. Variables like age, sex are commonly used as stratification variables. Within each stratum, samples of cases (y=1) and controls (y=0) are chosen 1-1 matched study: one case and one control selected from each stratum 1-M matched study: one case and M controls selected from each stratum. A. Theoretical aspects of matched study Suppose we have total p strata, which amounts of total (n1+ n0) pobservations, which contains n1 cases and n0 controls for each stratum. Since we assume that the stratified variable is related to outcome, we should consider build a stratum-specific logistic regression model, i.e. p n + n 0 1 yki L ( β) = πk ( xk, i) 1 πk ( xk, i) where k= 1 i= 1 1 y, ki, 14

15 π k ( x) β k,0 e = 1 + e β + β' x k,0 + β' x (3) Note that the number of parameters increases in the rate of sample size. Consequence: failure to use the regular large sample theorem for making statistical inference. Remedy: Consider the conditional likelihood of observed data given the stratum total and the total number of cases observed. For the kth stratum, assume we have total n k,1 n k,0 which contain cases and controls. n k observations The conditional probability of the observed data given the total observations and cases is L ( β ) = n k,1 k k,1 n k,1 k ( = 0 ) ( = 0) P x y P x y i i i i i= 1 i= n + 1 k,1 k c n n where k ( ji, 0 ) (, 0) j i = j jij i = j P x y P x y j= 1 i = 1 i = n + 1 j j k,1 n c k nk nk! = = n n! n n n k subjects. cases to ( ) k,1 k,1 k k,1! : total possible ways to assign n k,1 15

16 If we assume the stratum-specific logistic regression model (3), then applying Bayes theorem to P( x y ) leads that L k ( ) n k,1 i= 1 β = c n k k,1 j= 1 i = 1 j e β ' x e i β ' x ji, j (4) Then the likelihood function for the K strata data is L ( ) n k,1 β ' xi K e i 1 c n k k,1 k = 1 β ' x ji, j. = e j= 1 i j = 1 β = Note that the number of unknown parameters in this likelihood function is fixed as the number of strata increases. The regular statistical inference procedures based on likelihood analysis apply. How to solve the problem? (Estimate the regression parameters) Modify the proportional hazards regression Some tricks in logistic regression model 16

17 B. 1-1 matched study Let x 1,k and x 0,k denote the covariate vectors for the case and control, respectively, in the kth stratum. The likelihood function (4) for the kth stratum is simply equal to L ( ) β = e β ' x k β x x 1, k e (5) ' 1, k β' 0, k + e L β = if x 1, k = x 0, k. Note that ( ) 0.5 k Implication: the kth stratum or kth pair is uninformative in estimating the coefficients-the value of covariate does not help distinguish which subject is more likely to be the case. So the estimator may be only based on small fraction of the total number of possible pairs (discordant pairs) if the covariate is a binary variable itself. Trick: We rewrite likelihood (5) in to L ( β ) β '( 1, 0, ) β ' = e e '( 1, 0, ) = β 1+ e 1 + e x x x k k k k x x x k k β ' k and then the full relevant conditional likelihood function can be written as L ( ) yk 1 yk k k k K K β' x K β' x β' x β e e e L ( β = k ) = = 1 β' x k β' xk β' xk k 1 k 11 e k 1 1 e 1 e = = + =

18 for all y k = 1, k = 1, 2,, K This implies that we can just think of the scenario that we have K observations with covariates given by xk = x1, k x0, k and all response being the case. Then the regular logistic regression package can be applied to find the estimate. But be aware that In making use of any statistical package for logistic regression model, make sure that the procedure allowing you to suppress the intercept term. Before making any adjustment, make sure that you have already had the design variables for the nominal covariates. Don t try to recode the categorical-like covariates in x k, just pretend that all covariates in that vector are continuous. Fortunately, SAS can do it! 18

19 Example 6.1 We fit the low birth-weight data LOWBWT11.DAT by taking care of the nature of 1-1 matched case-control experiment for this particular study. The data consists of 56 age matched case-control pairs. The description of the data set is given as follows Table: Code Sheet for the Variables in the Paired Low Birth Weight Data Set. Columns Variable Abbreviation Pair Number PAIR# 19 Low Birth Weight (0 = Birth Weight ge 2500g, LOW l = Birth Weight < 2500g) Age of the Mother in Years AGE Weight in Pounds at the Last Menstrual Period LWT 41 Race (1 = White, 2 = Black, 3 = Other) RACE 50 Smoking Status During Pregnancy (1 = Yes, 0 = No) SMOKE 57 History of Premature Labor (0 = None, 1 = Yes) PTD 63 History of Hypertension (1 = Yes, 0 = No) HT 69 Presence of Uterine Irritability (1 = Yes, 0 = No) UI We want to fit a logistic regression model to explain the risk factors for the low birth weight. 19

20 SAS code a. Data preparation data diffs; set Logistic.Lowbwtm11; / Initialize temporary variables to hold covariate values for the control of each pair. / retain pair1 lwt1 race3 race4 smoke1 ptd1 ht1 ui1 0; drop pair1 lwt1 race race3 race4 smoke1 ptd1 ht1 ui1; / Create design variables for race. / select(race); when (1) do; race1=0; race2=0; end; when (2) do; race1=1; race2=0; end; when (3) do; race1=0; race2=1; end; end; / Copy the values of the covariates of each control into temporary variables. / if(pair ne pair1) then do; pair1=pair; lwt1=lwt; race3=race1; race4=race2; smoke1=smoke; ptd1=ptd; ht1=ht; ui1=ui; end; / Calculate the differences (case-control) between the covariates of each pair and output the observation. / else do; lwt=lwt-lwt1; race1=race1-race3; race2=race2-race4; smoke=smoke-smoke1; ptd=ptd-ptd1; ht=ht-ht1; 20

21 end; run; ui=ui-ui1; output; b. Univariate Analysis / display the number of discordant pairs for the covariates. / proc freq data=diffs; tables lwt smoke race1 race2 ptd ht ui; run; / Univariate analysis. / proc logistic data=diffs; model low=lwt /noint; run; proc logistic data=diffs; model low=smoke /noint; run; proc logistic data=diffs; model low=race1 race2 /noint; run; proc logistic data=diffs; model low=ptd /noint; run; proc logistic data=diffs; model low=ht /noint; run; proc logistic data=diffs; model low=ui /noint; run; Table 1. Results from Univariate Analysis Variable Coeff. S.D. OR 95% Discordant Pairs ( n, n ) LWT (0.807,1.028) NA SMOKE (1.224,6.177) (22,8) RACE_ (0.391,3.042) NA RACE_ (0.446,2.114) NA PTD (1.245,11.299) (15,4) HT (0.603,9.023) (7,3) UI (0.968,9.302) (12,4) 21

22 Remarks: Odds ratio for LWT is based on the increment of 10 pounds For a binary variable, the coefficient is simply ˆ 10 β = ln n n 01 The univariate analysis shows that only SMOKE and PTD appear to be a significant factor for the low birth-weight. We have to be aware that the statistical inference we are making here is based on large sample theory. However, in the univariate analysis, the data are quite thin for many binary variables. So we shouldn t trust the results in Table 1 too much. c. Multivariate Analysis title1 'Multivariate Analysis of 1:1 Matched Case-Control Data'; proc logistic data=diffs; model low=lwt smoke race1 race2 ptd ht ui /noint; race: test race1=0, race2=0; run; 22

23 Model Fit Statistics Without With Criterion Covariates Covariates AIC SC Log L Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio Score Wald The LOGISTIC Procedure Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq LWT SMOKE race race PTD HT UI Linear Hypotheses Testing Results Wald Label Chi-Square DF Pr > ChiSq race Comments: By Wald-test, only RACE is apparently not a significant risk factor for the low birth-weight. However, the race is always considered as a potential confounding variable. Let examine the model without RACE. 23

24 proc logistic data=diffs; model low=lwt smoke ptd ht ui /noint; run; Convergence criterion (GCONV=1E-8) satisfied. Model Fit Statistics Without With Criterion Covariates Covariates AIC SC Log L Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio Score Wald The LOGISTIC Procedure Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq LWT SMOKE PTD HT UI Likelihood ratio test for RACE effect: G = ( ) = ( χ2 ) ( χ2 ) p value = Pr > G = Pr > = Adding RACE into the model does not seem to alter the regression coefficients too much. (22% is the maximum change happened for covariate LWT) 24

25 It is up to domain expert to decide if it is any advantage to keep this variable in the model. Here, we pretend that this variable is not practically important, so we will not consider including this variable in the model. d. Model diagnosis / Create an output data set of diagnostic statistics. / proc logistic data=diffs; model low=lwt smoke ptd ht ui /noint; output out=stats difchisq=d_chi difdev=d_dev h=hat predicted=pred; run; title1; options nolable; / Create some diagnostic plots. / proc gplot data=stats; axis1 label=(angle=-90 rotate=90 'Deviance Change'); plot d_devpred /frame vaxis=axis1; title2 'Change in Deviance Versus Predicted Probability'; run; axis2 label=(angle=-90 rotate=90 'Chi-Square Change'); bubble d_chipred=hat /frame vaxis=axis2; title2 'Change in Chi-Square Versus Predicted Probability'; run; 25

26 We plot the change of deviance and Pearson Chi-square statistics. 26

27 We found that there are 5 pairs that had large deviance and Pearson Chi-square changes if deleted. Among these five, two have the largest leverage values. These pairs should be closely examined. 27

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Logistic Regression. Fitting the Logistic Regression Model BAL040-A.A.-10-MAJ

Logistic Regression. Fitting the Logistic Regression Model BAL040-A.A.-10-MAJ Logistic Regression The goal of a logistic regression analysis is to find the best fitting and most parsimonious, yet biologically reasonable, model to describe the relationship between an outcome (dependent

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 7 Time-dependent Covariates in Cox Regression Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the

More information

More Statistics tutorial at Logistic Regression and the new:

More Statistics tutorial at  Logistic Regression and the new: Logistic Regression and the new: Residual Logistic Regression 1 Outline 1. Logistic Regression 2. Confounding Variables 3. Controlling for Confounding Variables 4. Residual Linear Regression 5. Residual

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data

More information

Tests for Two Correlated Proportions in a Matched Case- Control Design

Tests for Two Correlated Proportions in a Matched Case- Control Design Chapter 155 Tests for Two Correlated Proportions in a Matched Case- Control Design Introduction A 2-by-M case-control study investigates a risk factor relevant to the development of a disease. A population

More information

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data Person-Time Data CF Jeff Lin, MD., PhD. Incidence 1. Cumulative incidence (incidence proportion) 2. Incidence density (incidence rate) December 14, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Person-Time

More information

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies Section 9c Propensity scores Controlling for bias & confounding in observational studies 1 Logistic regression and propensity scores Consider comparing an outcome in two treatment groups: A vs B. In a

More information

Lecture 8: Summary Measures

Lecture 8: Summary Measures Lecture 8: Summary Measures Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 8:

More information

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies.

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies. Matched and nested case-control studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark http://staff.pubhealth.ku.dk/~bxc/ Department of Biostatistics, University of Copengen 11 November 2011

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

More information

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist

More information

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014 Nemours Biomedical Research Statistics Course Li Xie Nemours Biostatistics Core October 14, 2014 Outline Recap Introduction to Logistic Regression Recap Descriptive statistics Variable type Example of

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs STAT 5500/6500 Conditional Logistic Regression for Matched Pairs The data for the tutorial came from support.sas.com, The LOGISTIC Procedure: Conditional Logistic Regression for Matched Pairs Data :: SAS/STAT(R)

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Beyond GLM and likelihood

Beyond GLM and likelihood Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence

More information

STAT 705: Analysis of Contingency Tables

STAT 705: Analysis of Contingency Tables STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic

More information

2 Describing Contingency Tables

2 Describing Contingency Tables 2 Describing Contingency Tables I. Probability structure of a 2-way contingency table I.1 Contingency Tables X, Y : cat. var. Y usually random (except in a case-control study), response; X can be random

More information

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect

More information

Survival Regression Models

Survival Regression Models Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

More information

COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 Department of Statistics North Carolina State University Presented by: Butch Tsiatis, Department of Statistics, NCSU

More information

Asymptotic equivalence of paired Hotelling test and conditional logistic regression

Asymptotic equivalence of paired Hotelling test and conditional logistic regression Asymptotic equivalence of paired Hotelling test and conditional logistic regression Félix Balazard 1,2 arxiv:1610.06774v1 [math.st] 21 Oct 2016 Abstract 1 Sorbonne Universités, UPMC Univ Paris 06, CNRS

More information

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S Logistic regression analysis Birthe Lykke Thomsen H. Lundbeck A/S 1 Response with only two categories Example Odds ratio and risk ratio Quantitative explanatory variable More than one variable Logistic

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Confounding, mediation and colliding

Confounding, mediation and colliding Confounding, mediation and colliding What types of shared covariates does the sibling comparison design control for? Arvid Sjölander and Johan Zetterqvist Causal effects and confounding A common aim of

More information

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or

More information

Measures of Association and Variance Estimation

Measures of Association and Variance Estimation Measures of Association and Variance Estimation Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 35

More information

β j = coefficient of x j in the model; β = ( β1, β2,

β j = coefficient of x j in the model; β = ( β1, β2, Regression Modeling of Survival Time Data Why regression models? Groups similar except for the treatment under study use the nonparametric methods discussed earlier. Groups differ in variables (covariates)

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials. The GENMOD Procedure MODEL Statement MODEL response = < effects > < /options > ; MODEL events/trials = < effects > < /options > ; You can specify the response in the form of a single variable or in the

More information

Chapter 5: Logistic Regression-I

Chapter 5: Logistic Regression-I : Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X

Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X Chapter 157 Tests for the Odds Ratio in a Matched Case-Control Design with a Quantitative X Introduction This procedure calculates the power and sample size necessary in a matched case-control study designed

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2008 Paper 241 A Note on Risk Prediction for Case-Control Studies Sherri Rose Mark J. van der Laan Division

More information

multilevel modeling: concepts, applications and interpretations

multilevel modeling: concepts, applications and interpretations multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev Variable selection: Suppose for the i-th observational unit (case) you record ( failure Y i = 1 success and explanatory variabales Z 1i Z 2i Z ri Variable (or model) selection: subject matter theory and

More information

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT)

Welcome! Webinar Biostatistics: sample size & power. Thursday, April 26, 12:30 1:30 pm (NDT) . Welcome! Webinar Biostatistics: sample size & power Thursday, April 26, 12:30 1:30 pm (NDT) Get started now: Please check if your speakers are working and mute your audio. Please use the chat box to

More information

Case-control studies

Case-control studies Matched and nested case-control studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark b@bxc.dk http://bendixcarstensen.com Department of Biostatistics, University of Copenhagen, 8 November

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Chapter 11: Analysis of matched pairs

Chapter 11: Analysis of matched pairs Chapter 11: Analysis of matched pairs Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 42 Chapter 11: Models for Matched Pairs Example: Prime

More information

SAS Analysis Examples Replication C8. * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ;

SAS Analysis Examples Replication C8. * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ; SAS Analysis Examples Replication C8 * SAS Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 8 ; libname ncsr "P:\ASDA 2\Data sets\ncsr\" ; data c8_ncsr ; set ncsr.ncsr_sub_13nov2015

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors.

EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors. EDF 7405 Advanced Quantitative Methods in Educational Research Data are available on IQ of the child and seven potential predictors. Four are medical variables available at the birth of the child: Birthweight

More information

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements:

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements: Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis Anna E. Barón, Keith E. Muller, Sarah M. Kreidler, and Deborah H. Glueck Acknowledgements: The project was supported

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

BINF 702 SPRING Chapter 8 Hypothesis Testing: Two-Sample Inference. BINF702 SPRING 2014 Chapter 8 Hypothesis Testing: Two- Sample Inference 1

BINF 702 SPRING Chapter 8 Hypothesis Testing: Two-Sample Inference. BINF702 SPRING 2014 Chapter 8 Hypothesis Testing: Two- Sample Inference 1 BINF 702 SPRING 2014 Chapter 8 Hypothesis Testing: Two-Sample Inference Two- Sample Inference 1 A Poster Child for two-sample hypothesis testing Ex 8.1 Obstetrics In the birthweight data in Example 7.2,

More information

Lecture 2: Poisson and logistic regression

Lecture 2: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

Case-control studies C&H 16

Case-control studies C&H 16 Case-control studies C&H 6 Bendix Carstensen Steno Diabetes Center & Department of Biostatistics, University of Copenhagen bxc@steno.dk http://bendixcarstensen.com PhD-course in Epidemiology, Department

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

a Sample By:Dr.Hoseyn Falahzadeh 1

a Sample By:Dr.Hoseyn Falahzadeh 1 In the name of God Determining ee the esize eof a Sample By:Dr.Hoseyn Falahzadeh 1 Sample Accuracy Sample accuracy: refers to how close a random sample s statistic is to the true population s value it

More information

Lecture 5: Poisson and logistic regression

Lecture 5: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes

A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes Ritesh Ramchandani Harvard School of Public Health August 5, 2014 Ritesh Ramchandani (HSPH) Global Rank Test for Multiple Outcomes

More information

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News:

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News: Today HW 1: due February 4, 11.59 pm. Aspects of Design CD Chapter 2 Continue with Chapter 2 of ELM In the News: STA 2201: Applied Statistics II January 14, 2015 1/35 Recap: data on proportions data: y

More information

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102 Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical

More information

Logistic Regression. Building, Interpreting and Assessing the Goodness-of-fit for a logistic regression model

Logistic Regression. Building, Interpreting and Assessing the Goodness-of-fit for a logistic regression model Logistic Regression In previous lectures, we have seen how to use linear regression analysis when the outcome/response/dependent variable is measured on a continuous scale. In this lecture, we will assume

More information

STA441: Spring Multiple Regression. More than one explanatory variable at the same time

STA441: Spring Multiple Regression. More than one explanatory variable at the same time STA441: Spring 2016 Multiple Regression More than one explanatory variable at the same time This slide show is a free open source document. See the last slide for copyright information. One Explanatory

More information

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Sections 3.4, 3.5. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Sections 3.4, 3.5 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 3.4 I J tables with ordinal outcomes Tests that take advantage of ordinal

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Inference for Binomial Parameters

Inference for Binomial Parameters Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for

More information

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable

More information

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN Journal of Biopharmaceutical Statistics, 15: 889 901, 2005 Copyright Taylor & Francis, Inc. ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500265561 TESTS FOR EQUIVALENCE BASED ON ODDS RATIO

More information

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

More information

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see Title stata.com logistic postestimation Postestimation tools for logistic Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu 1 / 35 Tip + Paper Tip Meet with seminar speakers. When you go on

More information

Chapter 11: Models for Matched Pairs

Chapter 11: Models for Matched Pairs : Models for Matched Pairs Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis. Estimating the Mean Response of Treatment Duration Regimes in an Observational Study Anastasios A. Tsiatis http://www.stat.ncsu.edu/ tsiatis/ Introduction to Dynamic Treatment Regimes 1 Outline Description

More information

Chapter 2: Describing Contingency Tables - I

Chapter 2: Describing Contingency Tables - I : Describing Contingency Tables - I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]

More information

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016 Statistics 255 - Survival Analysis Presented March 3, 2016 Motivating Dan Gillen Department of Statistics University of California, Irvine 11.1 First question: Are the data truly discrete? : Number of

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Review of One-way Tables and SAS

Review of One-way Tables and SAS Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409

More information

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Missing covariate data in matched case-control studies: Do the usual paradigms apply? Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the

More information

Appendix: Computer Programs for Logistic Regression

Appendix: Computer Programs for Logistic Regression Appendix: Computer Programs for Logistic Regression In this appendix, we provide examples of computer programs to carry out unconditional logistic regression, conditional logistic regression, polytomous

More information

Statistical Methods in Clinical Trials Categorical Data

Statistical Methods in Clinical Trials Categorical Data Statistical Methods in Clinical Trials Categorical Data Types of Data quantitative Continuous Blood pressure Time to event Categorical sex qualitative Discrete No of relapses Ordered Categorical Pain level

More information