Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Size: px
Start display at page:

Download "Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto."

Transcription

1 Introduction to Dalla Lana School of Public Health University of Toronto September 18,

2 : a review 38-2

3 Evidence Ideal: to advance the knowledge-base of clinical medicine, clinical research produces evidence. The totality of the available evidence eventually leads to knowledge through induction (including statistical inference). In particular, based on induction, nothing is ever concluded, let alone based on a single study, which can only contribute to the totality of evidence (Miettinen 2011, p. 36). The knowledge in turn serves as an input to decision making (Miettinen & Karp 2012, p. 131). Reality: sometimes decisions have to be made based on limited statistical evidence. 38-3

4 38-4 Object of inference Evidence can be reported for instance in the form of a p-value or a confidence interval. But let s not get ahead of ourselves. Evidence about what? Evidence about the object of inference. Terms that we will be using interchangeably with object are parameter or estimand, in the meaning (Miettinen 2011, p. 60) Parameter - A constant (of unknown magnitude) in a (statistical) model. Such parameters are not only unobserved, but unobservable, since statistical are themselves theoretical constructs. However, based on observed data, we can arrive at an estimate of the value/magnitude of the unknown parameter.

5 38-5 Rather than making conclusions, by using inferential, we aim to quantify the uncertainty about the objects of inference. By, we don t mean here the science of, but the plural of statistic, that is, (Miettinen 2011, p. 63) Statistic - A number derived from a sample. Statistics can be descriptive or inferential, the latter being (Miettinen 2011, p. 57) statistic - A statistic derived under a statistical model. Examples: the estimator ˆλ = D y for rate parameter λ, or the estimator ˆπ = D n for risk parameter π. These are inferential since they are derived as maximum likelihood estimators under Poisson and binomial, respectively.

6 Measures of imprecision We already noted that a point estimator such as ˆλ is not in itself sufficient for reporting statistical evidence; we need a measure of imprecision for the point estimate. Unlike a point estimate, a measure of imprecision does not directly characterize the object of study; instead, it characterizes the study itself, for instance, the size of the study (Miettinen 2011, p. 43). A measure of imprecision that we will be first concerned with is the standard error. To understand what a standard error is, we have to begin with the sampling distribution of a statistic. 38-6

7 Sampling distribution of the empirical risk In the absence of any other individual level information, the statistical model for the number of incident events in a population of size n within a specified time interval is D Binomial(n, π), where π is estimated by ˆπ = D n. Imagine now that a study involving the recruitment of n individuals and the observation of their event outcomes is carried out many times and ˆπ is calculated from each study. The distribution of the resulting estimates is the sampling distribution of ˆπ. Let π = 0.09 and see how this looks with different n. 38-7

8 n = 25 Frequency π^

9 n = 100 Frequency π^

10 n = 1000 Frequency π^

11 A certain shape appears Frequency π^

12 Standard error With large n, the sampling distribution for ˆπ is approximately ( ) π(1 π) ˆπ N π,, n that is, the normal distribution with the expectation π and π(1 π) standard deviation n. The standard deviation of a sampling distribution of a statistic is known as standard error. We will henceforth use this term also for an estimate of the standard error, in the present example given by ˆπ(1 ˆπ) S =. n 38-12

13 Back to sampling distributions Note that the normal distribution appeared even though the original observations were very non-normal (the binomial distribution arises as the sum of independent Bernoulli variables). However, by the central limit theorem the mean of independent outcomes is normally distributed irrespective of their original distribution. It applies to all maximum likelihood estimators ˆθ that with large n the sampling distribution is approximately ˆθ N(θ, S 2 ). The standard error S depends on various factors, in particular the sample size n

14 Estimating the log-rate ratio Even though the normal sampling distribution always appears with large enough n, for non-negative, such as empirical rate ratios, the normal approximation is better if we take an appropriate transformation of the parameter and the corresponding estimator. For non-negative this transformation is usually the logarithm. If the aggregate person-years by exposure status are known, we ( may) estimate the log rate ratio log θ = log λ1 λ 0 by ( ) D1 /y 1 log ˆθ = log. D 0 /y

15 example The data below is from a randomized trial studying the role of male circumcision for HIV prevention in Uganda. The events are incident HIV infections during the first 24 months of follow-up since the randomization and the intervention is male circumcision. Intervention group Control group Participants Incident events Person-years

16 Standard error of the log-rate ratio Now we have that log ˆθ N(log θ, S 2 ), where S = 1 D D 0. This depends only on the numbers of exposed and unexposed cases; the larger these are, the smaller the standard error. The standardized statistic is approximately distributed as Z = log ˆθ log θ 1 D1 + 1 D0 N(0, 1). This could tell us whether the observed value of log ˆθ is somehow unusual compared to the true value

17 Test statistic and p-value Under the null hypothesis H 0 : log θ = 0, the z-value is Z = log ˆθ D1 D0 We have constructed a test statistic. It will take large positive or negative values when the null hypothesis is not true. This unusualness is quantified by the p-value p = P( Z > z H 0 ), where z is the observed value of the Z-test statistic. Alternatively, the evidence may be reported in the form of a confidence interval

18 38-18 Confidence interval With the log-rate ( ) ratio parameter and the statistic log ˆθ = log D1 /y 1 D 0 /y 0 we have that ( ) P log ˆθ 1.96 S log θ log ˆθ S = 0.95, where S = 1 D D 0. This gives a 95% confidence interval for the log-rate ratio. To get a CI for the rate ratio itself, we note that ( P log ˆθ 1.96 S log θ log ˆθ ) S ) = P (e log ˆθ 1.96 S e log θ log ˆθ+1.96 S e = P (ˆθ e 1.96 S θ ˆθ e 1.96 S) = Thus, [ˆθ e 1.96 S, ˆθ ] e 1.96 S is a 95% CI for the rate ratio parameter.

19 Interpretation Clayton & Hills (1993, p. 91): 38-19

20 38-20

21 38-21 Decision making If we are only reporting evidence, our role is not to reject anything. Rejecting or not rejecting hypotheses is left for the scientific community, based on the totality of the available evidence. However, if we indeed have to make a decision (note again the difference between a decision and a conclusion), we have to weigh the costs of making the wrong decision. There are two different kinds of errors in hypothesis based decision making. We can either reject the null, when it is in fact correct (type I error, or false positive), or fail to reject the null when it in fact should be rejected (type II error, or false negative). There is no free lunch in ; there is a tradeoff between minimizing the false positive probability and minimizing the false negative probability.

22 A hypothesis problem example of a decision problem: should one treat future patients with a new procedure or drug based on limited statistical evidence on its efficacy? Consider the following example: the drug AZT was the first drug that seemed effective in delaying the onset of AIDS of HIV-positive patients. In a randomized study 435 HIV-positive subjects were assigned to take 500 milligrams of AZT and another 435 HIV-positive subjects were assigned to take a placebo. If the approval of the drug is made based on the evidence produced by this trial, we are involved in a decision problem, rather than just reporting statistical evidence. Let the risk of AIDS onset be π 1 with the treatment and π 0 without treatment. The null hypothesis could be formulated in terms of the risk ratio as H 0 : θ = π 1 π 0 =

23 Significance level For the purpose of making a decision, we choose a fixed significance level α, and reject H 0 if p < α. On the other hand, p H 0 U(0, 1). This follows because 1 p = P( Z < z), where P( Z < z) is the cumulative distribution function of Z, and the uniform distribution is preserved in a linear transformation. Thus, P(p < α H 0 ) = α, and the significance level is the probability of type I error. Note also that P(p < α H 1 ) > α (why?)

24 Power of a test Choosing a small α reduces the probability of a false positive result. However, unfortunately it also increases the probability of a false negative result, denoted as β = P(p α H 1 ). This means that avoiding a false positive result makes it more difficult to reject the null when it in fact should be rejected. In other words, a small type I error probability reduces the power of the test. Power of a test is the probability 1 β, the probability to reject the null when it should be rejected. As we will see later, in addition to the significance level, the power of a test depends on the sample size and effect size

25 Decision table The four possible decisions and the corresponding probabilities may be expressed in a 2 2-table as or Decision H 0 : θ = 1 H 1 : θ 1 p < α P(p < α H 0 ) P(p < α H 1 ) p α P(p α H 0 ) P(p α H 1 ) 1 1 Decision H 0 : θ = 1 H 1 : θ 1 p < α α 1 β p α 1 α β

26 Choosing the significance level Knowing that there is a tradeoff between the false positive and false negative probabilities, what would the appropriate α be? swer: it depends. In particular, it depends on the respective costs and benefits associated with the decisions. For instance, in a case of a false positive, the patient might be treated with a procedure or a drug that is not effective, but might be costly or have side-effects. In a case of a false negative, the patient is not treated, while in fact the treatment might have helped; possibly a fatal decision. Weighting such harms and benefits is highly subjective, and outside the realm of

27 38-27

28 2 2-table for conditional probabilities Recall the AZT randomized trial example: AIDS = 1 AIDS = 0 AZT = AZT = In terms of conditional probabilities, the 2 2-table may be presented as Y = 1 Y = 0 Z = 1 P(Y = 1 Z = 1) P(Y = 0 Z = 1) 1 Z = 0 P(Y = 1 Z = 0) P(Y = 0 Z = 0) 1 Or in terms of risk parameters as Y = 1 Y = 0 Z = 1 π 1 1 π 1 1 Z = 0 π 0 1 π

29 Reparametrization The problem could equally well be parametrized in terms of odds: Z = 1 Z = 0 Or in terms of log-odds: Y = 1 Y = 0 π 1 1 π 1 π 0 1 π 1 π 1 1 π 0 1 π 0 π 0 Y = 1 Y = 0 Z = 1 log π 1 1 π 1 log 1 π 1 π 1 Z = 0 log π 0 1 π 0 log 1 π 0 π 0 Neither are of direct interest to us, since the objective is to compare the risk of AIDS onset between the two groups

30 A more relevant reparametrization Redefine the four log-odds as: Y = 1 Y = 0 Z = 1 α + β (α + β) Z = 0 α α (These α and β are unrelated to the previous ones.) This corresponds to a regression equation log π Z 1 π Z = α + βz, where log Here π 1 1 π 1 π 0 1 π 0 π Z 1 π Z = eα+β e α is known as the logit transformation. = eα e β e α = eβ log ( π1 ) 1 π 1 π 0 = β. 1 π The regression coefficient β is a log-odds ratio.

31 Deterministic and stochastic model components The regression equation specifies the deterministic part of the model. To complete the model specification, we need to specify the stochastic component of the model, that is, a statistical distribution for the outcomes D 1 and D 0. The appropriate distribution is D Z Binomial(n Z, π Z ). Here the risk π Z is given by the regression equation as π Z = eα+βz 1 + e α+βz = e (α+βz). This inverse transformation is the so-called expit function: π Z = logit 1 (α + βz) = expit(α + βz)

32 Regression model Clayton & Hills (1993, p. 217): A common theme in all these situations is change from the original parameters to new parameters which are more relevant to the comparisons of interest. This change can be described by the equations which express the old parameters in terms of the new parameters. These equations are referred to as regression equations, and the statistical model is called a regression model. Now the old parameters are the two log-odds log π Z 1 π Z

33 Estimate the model parameters The parameters α and β can be estimated using maximum likelihood. Read in the data: d <- c(17,38) n <- c(435,435) z <- c(1,0) Fit the model: model <- glm(cbind(d,n-d) ~ z, family=binomial(link="logit")) Check the results: summary(model) 38-33

34 Logistic model results Call: glm(formula = cbind(d, n - d) ~ z, family = binomial(link = "logit")) Deviance Residuals: [1] 0 0 Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) < 2e-16 *** z ** --- Signif. codes: 0 *** ** 0.01 * (Dispersion parameter for binomial family taken to be 1) Null deviance: e+00 on 1 degrees of freedom Residual deviance: e-14 on 0 degrees of freedom AIC: Number of Fisher Scoring iterations:

35 38-35 Log-linear model for risk Is there some particular reason why we have to use the logit link when modeling risk? Why could we not just parametrize the log-risk as log(π Z ) = α + βz? We can; in this case the regression coefficient β would be interpreted as a log-risk ratio: π 1 = eα+β π 0 e α = eα e β ( ) e α = π1 eβ log = β. However, note that there is nothing here bounding the risk to values below one, which might cause numerical problems. The log-linear model does bound the risk to non-negative values, so as long as the risk is small, log-linear and logistic regression give similar results. π 0

36 Log-linear model results Call: glm(formula = cbind(d, n - d) ~ z, family = binomial(link = "log")) Deviance Residuals: [1] 0 0 Coefficients: Estimate Std. Error z value Pr(> z ) (Intercept) < 2e-16 *** z ** --- Signif. codes: 0 *** ** 0.01 * (Dispersion parameter for binomial family taken to be 1) Null deviance: e+00 on 1 degrees of freedom Residual deviance: e-14 on 0 degrees of freedom AIC: Number of Fisher Scoring iterations:

37 Disclaimer about The objective of a study is not to study something. In the same vein, modeling, including model selection, and checking, or, the correctness of a model, should not be an end in itself. In particular in clinical trials, we are only interested in a specific parameter in the model. By definition, there is no such thing as a correct model; a model is a simplification of reality, not the reality itself. Hence, a model need not capture all features of reality; if it could, it would no longer be a model. A good model is a model that is useful by serving some purpose. Some may be better than others, depending on the chosen criterion for better

38 References Clayton, D. and Hills, M. (2011). Models in Epidemiology. Oxford University Press. Miettinen, O. S. (2011). Epidemiological Research: Terms and Concepts. Springer, Dordrecht. Miettinen, O. S. and Karp, I. (2012). Epidemiological Research: Introduction. Springer, Dordrecht

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Introduction to the Analysis of Tabular Data

Introduction to the Analysis of Tabular Data Introduction to the Analysis of Tabular Data Anthropological Sciences 192/292 Data Analysis in the Anthropological Sciences James Holland Jones & Ian G. Robertson March 15, 2006 1 Tabular Data Is there

More information

Logistic Regression - problem 6.14

Logistic Regression - problem 6.14 Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

HYPOTHESIS TESTING. Hypothesis Testing

HYPOTHESIS TESTING. Hypothesis Testing MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

R Hints for Chapter 10

R Hints for Chapter 10 R Hints for Chapter 10 The multiple logistic regression model assumes that the success probability p for a binomial random variable depends on independent variables or design variables x 1, x 2,, x k.

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Modeling Overdispersion

Modeling Overdispersion James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in

More information

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102 Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical

More information

Logistic Regression 21/05

Logistic Regression 21/05 Logistic Regression 21/05 Recall that we are trying to solve a classification problem in which features x i can be continuous or discrete (coded as 0/1) and the response y is discrete (0/1). Logistic regression

More information

Logistic Regressions. Stat 430

Logistic Regressions. Stat 430 Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to

More information

Checking the Poisson assumption in the Poisson generalized linear model

Checking the Poisson assumption in the Poisson generalized linear model Checking the Poisson assumption in the Poisson generalized linear model The Poisson regression model is a generalized linear model (glm) satisfying the following assumptions: The responses y i are independent

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH The First Step: SAMPLE SIZE DETERMINATION THE ULTIMATE GOAL The most important, ultimate step of any of clinical research is to do draw inferences;

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical

More information

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between 7.2 One-Sample Correlation ( = a) Introduction Correlation analysis measures the strength and direction of association between variables. In this chapter we will test whether the population correlation

More information

STA102 Class Notes Chapter Logistic Regression

STA102 Class Notes Chapter Logistic Regression STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine.

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine. Horseshoe crab example: There are 173 female crabs for which we wish to model the presence or absence of male satellites dependant upon characteristics of the female horseshoe crabs. 1 satellite present

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 3: Bivariate association : Categorical variables Proportion in one group One group is measured one time: z test Use the z distribution as an approximation to the binomial

More information

Chapter Six: Two Independent Samples Methods 1/51

Chapter Six: Two Independent Samples Methods 1/51 Chapter Six: Two Independent Samples Methods 1/51 6.3 Methods Related To Differences Between Proportions 2/51 Test For A Difference Between Proportions:Introduction Suppose a sampling distribution were

More information

Exercise 5.4 Solution

Exercise 5.4 Solution Exercise 5.4 Solution Niels Richard Hansen University of Copenhagen May 7, 2010 1 5.4(a) > leukemia

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News:

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News: Today HW 1: due February 4, 11.59 pm. Aspects of Design CD Chapter 2 Continue with Chapter 2 of ELM In the News: STA 2201: Applied Statistics II January 14, 2015 1/35 Recap: data on proportions data: y

More information

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials Lecture : Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 27 Binomial Model n independent trials (e.g., coin tosses) p = probability of success on each trial (e.g., p =! =

More information

Generalized Linear Models. stat 557 Heike Hofmann

Generalized Linear Models. stat 557 Heike Hofmann Generalized Linear Models stat 557 Heike Hofmann Outline Intro to GLM Exponential Family Likelihood Equations GLM for Binomial Response Generalized Linear Models Three components: random, systematic, link

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Lecture 10: Introduction to Logistic Regression

Lecture 10: Introduction to Logistic Regression Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

STAT 526 Spring Final Exam. Thursday May 5, 2011

STAT 526 Spring Final Exam. Thursday May 5, 2011 STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Poisson Regression. The Training Data

Poisson Regression. The Training Data The Training Data Poisson Regression Office workers at a large insurance company are randomly assigned to one of 3 computer use training programmes, and their number of calls to IT support during the following

More information

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<=

Frequency table: Var2 (Spreadsheet1) Count Cumulative Percent Cumulative From To. Percent <x<= A frequency distribution is a kind of probability distribution. It gives the frequency or relative frequency at which given values have been observed among the data collected. For example, for age, Frequency

More information

Statistics 203 Introduction to Regression Models and ANOVA Practice Exam

Statistics 203 Introduction to Regression Models and ANOVA Practice Exam Statistics 203 Introduction to Regression Models and ANOVA Practice Exam Prof. J. Taylor You may use your 4 single-sided pages of notes This exam is 7 pages long. There are 4 questions, first 3 worth 10

More information

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)

More information

Linear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model

Linear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model Regression: Part II Linear Regression y~n X, 2 X Y Data Model β, σ 2 Process Model Β 0,V β s 1,s 2 Parameter Model Assumptions of Linear Model Homoskedasticity No error in X variables Error in Y variables

More information

Many natural processes can be fit to a Poisson distribution

Many natural processes can be fit to a Poisson distribution BE.104 Spring Biostatistics: Poisson Analyses and Power J. L. Sherley Outline 1) Poisson analyses 2) Power What is a Poisson process? Rare events Values are observational (yes or no) Random distributed

More information

Interactions in Logistic Regression

Interactions in Logistic Regression Interactions in Logistic Regression > # UCBAdmissions is a 3-D table: Gender by Dept by Admit > # Same data in another format: > # One col for Yes counts, another for No counts. > Berkeley = read.table("http://www.utstat.toronto.edu/~brunner/312f12/

More information

Week 7 Multiple factors. Ch , Some miscellaneous parts

Week 7 Multiple factors. Ch , Some miscellaneous parts Week 7 Multiple factors Ch. 18-19, Some miscellaneous parts Multiple Factors Most experiments will involve multiple factors, some of which will be nuisance variables Dealing with these factors requires

More information

Matched Pair Data. Stat 557 Heike Hofmann

Matched Pair Data. Stat 557 Heike Hofmann Matched Pair Data Stat 557 Heike Hofmann Outline Marginal Homogeneity - review Binary Response with covariates Ordinal response Symmetric Models Subject-specific vs Marginal Model conditional logistic

More information

Hypothesis testing. Data to decisions

Hypothesis testing. Data to decisions Hypothesis testing Data to decisions The idea Null hypothesis: H 0 : the DGP/population has property P Under the null, a sample statistic has a known distribution If, under that that distribution, the

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

On the Inference of the Logistic Regression Model

On the Inference of the Logistic Regression Model On the Inference of the Logistic Regression Model 1. Model ln =(; ), i.e. = representing false. The linear form of (;) is entertained, i.e. ((;)) ((;)), where ==1 ;, with 1 representing true, 0 ;= 1+ +

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

Sample Size Calculations for Group Randomized Trials with Unequal Sample Sizes through Monte Carlo Simulations

Sample Size Calculations for Group Randomized Trials with Unequal Sample Sizes through Monte Carlo Simulations Sample Size Calculations for Group Randomized Trials with Unequal Sample Sizes through Monte Carlo Simulations Ben Brewer Duke University March 10, 2017 Introduction Group randomized trials (GRTs) are

More information

Faculty of Science FINAL EXAMINATION Mathematics MATH 523 Generalized Linear Models

Faculty of Science FINAL EXAMINATION Mathematics MATH 523 Generalized Linear Models Faculty of Science FINAL EXAMINATION Mathematics MATH 523 Generalized Linear Models Examiner: Professor K.J. Worsley Associate Examiner: Professor R. Steele Date: Thursday, April 17, 2008 Time: 14:00-17:00

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests Overall Overview INFOWO Statistics lecture S3: Hypothesis testing Peter de Waal Department of Information and Computing Sciences Faculty of Science, Universiteit Utrecht 1 Descriptive statistics 2 Scores

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

PAPER 218 STATISTICAL LEARNING IN PRACTICE

PAPER 218 STATISTICAL LEARNING IN PRACTICE MATHEMATICAL TRIPOS Part III Thursday, 7 June, 2018 9:00 am to 12:00 pm PAPER 218 STATISTICAL LEARNING IN PRACTICE Attempt no more than FOUR questions. There are SIX questions in total. The questions carry

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression

1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression Logistic Regression 1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression 5. Target Marketing: Tabloid Data

More information

Practical Considerations Surrounding Normality

Practical Considerations Surrounding Normality Practical Considerations Surrounding Normality Prof. Kevin E. Thorpe Dalla Lana School of Public Health University of Toronto KE Thorpe (U of T) Normality 1 / 16 Objectives Objectives 1. Understand the

More information

Logistic & Tobit Regression

Logistic & Tobit Regression Logistic & Tobit Regression Different Types of Regression Binary Regression (D) Logistic transformation + e P( y x) = 1 + e! " x! + " x " P( y x) % ln$ ' = ( + ) x # 1! P( y x) & logit of P(y x){ P(y

More information

Data-analysis and Retrieval Ordinal Classification

Data-analysis and Retrieval Ordinal Classification Data-analysis and Retrieval Ordinal Classification Ad Feelders Universiteit Utrecht Data-analysis and Retrieval 1 / 30 Strongly disagree Ordinal Classification 1 2 3 4 5 0% (0) 10.5% (2) 21.1% (4) 42.1%

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Neutral Bayesian reference models for incidence rates of (rare) clinical events

Neutral Bayesian reference models for incidence rates of (rare) clinical events Neutral Bayesian reference models for incidence rates of (rare) clinical events Jouni Kerman Statistical Methodology, Novartis Pharma AG, Basel BAYES2012, May 10, Aachen Outline Motivation why reference

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours Instructions: STATS216v Introduction to Statistical Learning Stanford University, Summer 2017 Remember the university honor code. Midterm Exam (Solutions) Duration: 1 hours Write your name and SUNet ID

More information

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

More information

MODULE 6 LOGISTIC REGRESSION. Module Objectives:

MODULE 6 LOGISTIC REGRESSION. Module Objectives: MODULE 6 LOGISTIC REGRESSION Module Objectives: 1. 147 6.1. LOGIT TRANSFORMATION MODULE 6. LOGISTIC REGRESSION Logistic regression models are used when a researcher is investigating the relationship between

More information

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

PubHlth Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide

PubHlth Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide PubHlth 640 - Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide Unit 3 (Discrete Distributions) Take care to know how to do the following! Learning Objective See: 1. Write down

More information

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Econometrics. 4) Statistical inference

Econometrics. 4) Statistical inference 30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution

More information

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples. Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Introduction Fitting logistic regression models Results. Logistic Regression. Patrick Breheny. March 29

Introduction Fitting logistic regression models Results. Logistic Regression. Patrick Breheny. March 29 Logistic Regression March 29 Introduction Binary outcomes are quite common in medicine and public health: alive/dead, diseased/healthy, infected/not infected, case/control Assuming that these outcomes

More information

Sample solutions. Stat 8051 Homework 8

Sample solutions. Stat 8051 Homework 8 Sample solutions Stat 8051 Homework 8 Problem 1: Faraway Exercise 3.1 A plot of the time series reveals kind of a fluctuating pattern: Trying to fit poisson regression models yields a quadratic model if

More information

Logistic regression model for survival time analysis using time-varying coefficients

Logistic regression model for survival time analysis using time-varying coefficients Logistic regression model for survival time analysis using time-varying coefficients Accepted in American Journal of Mathematical and Management Sciences, 2016 Kenichi SATOH ksatoh@hiroshima-u.ac.jp Research

More information

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart

More information