Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.
2 Statistical inference: a review
3 Evidence
Ideal: to advance the knowledge base of clinical medicine, clinical research produces evidence. The totality of the available evidence eventually leads to knowledge through induction (including statistical inference). In particular, based on induction, nothing is ever concluded, let alone based on a single study, which can only contribute to the totality of evidence (Miettinen 2011, p. 36). The knowledge in turn serves as an input to decision making (Miettinen & Karp 2012, p. 131).
Reality: sometimes decisions have to be made based on limited statistical evidence.
4 Object of inference
Evidence can be reported for instance in the form of a p-value or a confidence interval. But let's not get ahead of ourselves. Evidence about what? Evidence about the object of inference. Terms that we will be using interchangeably with object are parameter or estimand, in the meaning (Miettinen 2011, p. 60):
Parameter - A constant (of unknown magnitude) in a (statistical) model.
Such parameters are not only unobserved, but unobservable, since statistical models are themselves theoretical constructs. However, based on observed data, we can arrive at an estimate of the value/magnitude of the unknown parameter.
5 Rather than making conclusions, by using inferential statistics, we aim to quantify the uncertainty about the objects of inference. By statistics, we don't mean here the science of statistics, but the plural of statistic, that is (Miettinen 2011, p. 63):
Statistic - A number derived from a sample.
Statistics can be descriptive or inferential, the latter being (Miettinen 2011, p. 57):
Inferential statistic - A statistic derived under a statistical model.
Examples: the estimator $\hat\lambda = D/y$ for the rate parameter $\lambda$, or the estimator $\hat\pi = D/n$ for the risk parameter $\pi$. These are inferential since they are derived as maximum likelihood estimators under Poisson and binomial models, respectively.
6 Measures of imprecision
We already noted that a point estimator such as $\hat\lambda$ is not in itself sufficient for reporting statistical evidence; we need a measure of imprecision for the point estimate. Unlike a point estimate, a measure of imprecision does not directly characterize the object of study; instead, it characterizes the study itself, for instance, the size of the study (Miettinen 2011, p. 43). A measure of imprecision that we will be first concerned with is the standard error. To understand what a standard error is, we have to begin with the sampling distribution of a statistic.
7 Sampling distribution of the empirical risk
In the absence of any other individual-level information, the statistical model for the number of incident events $D$ in a population of size $n$ within a specified time interval is $D \sim \mathrm{Binomial}(n, \pi)$, where $\pi$ is estimated by $\hat\pi = D/n$. Imagine now that a study involving the recruitment of $n$ individuals and the observation of their event outcomes is carried out many times and $\hat\pi$ is calculated from each study. The distribution of the resulting estimates is the sampling distribution of $\hat\pi$. Let $\pi = 0.09$ and see how this looks with different $n$.
8-11 [Figures: histograms of the sampling distribution of $\hat\pi$ (frequency against $\hat\pi$) for n = 25, n = 100 and n = 1000; the last slide is titled "A certain shape appears".]
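The repeated-study thought experiment behind these figures can be imitated by simulation. The following is only a sketch: the number of repetitions (`n_rep`), the seed and the object names are assumptions made for illustration, not taken from the slides.

```r
# Simulate the sampling distribution of the empirical risk pi_hat = D / n.
set.seed(1)                  # arbitrary seed, only for reproducibility
pi_true <- 0.09              # true risk, as on the slide
n_sizes <- c(25, 100, 1000)  # study sizes shown in the figures
n_rep   <- 10000             # number of imagined repeated studies (assumption)

# For each study size, draw n_rep event counts D and convert them to pi_hat.
sim <- lapply(n_sizes, function(n) {
  rbinom(n_rep, size = n, prob = pi_true) / n
})
names(sim) <- paste0("n=", n_sizes)

# Histograms of the simulated estimates, one panel per study size.
op <- par(mfrow = c(1, 3))
for (k in seq_along(sim)) {
  hist(sim[[k]], main = names(sim)[k], xlab = "pi_hat", ylab = "Frequency")
}
par(op)

# Centre and spread of each simulated sampling distribution.
sapply(sim, function(x) c(mean = mean(x), sd = sd(x)))
```

With growing n the histograms should become narrower and more symmetric around 0.09.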
12 Standard error
With large $n$, the sampling distribution of $\hat\pi$ is approximately
$$\hat\pi \sim N\left(\pi, \frac{\pi(1-\pi)}{n}\right),$$
that is, the normal distribution with expectation $\pi$ and standard deviation $\sqrt{\pi(1-\pi)/n}$. The standard deviation of the sampling distribution of a statistic is known as the standard error. We will henceforth use this term also for an estimate of the standard error, in the present example given by
$$S = \sqrt{\frac{\hat\pi(1-\hat\pi)}{n}}.$$
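As a rough numerical check of the standard error formula, one can compare it with the standard deviation of simulated estimates. This is a sketch under assumed settings: the study size, the number of repetitions and the single-study count `d_obs` are hypothetical.

```r
# Compare the theoretical standard error with a simulated one.
set.seed(2)                  # arbitrary seed
pi_true <- 0.09
n       <- 1000              # study size (assumption)

# Standard deviation of pi_hat over many imagined repetitions of the study.
pi_hat       <- rbinom(10000, size = n, prob = pi_true) / n
se_empirical <- sd(pi_hat)

# Standard error from the formula sqrt(pi * (1 - pi) / n).
se_theory <- sqrt(pi_true * (1 - pi_true) / n)

# In a real study pi is unknown, so the standard error is estimated by
# plugging in pi_hat; d_obs is a hypothetical observed event count.
d_obs      <- 85
pi_hat_obs <- d_obs / n
se_est     <- sqrt(pi_hat_obs * (1 - pi_hat_obs) / n)

c(empirical = se_empirical, theory = se_theory, estimated = se_est)
```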
13 Back to sampling distributions
Note that the normal distribution appeared even though the original observations were very non-normal (the binomial distribution arises as the sum of independent Bernoulli variables). However, by the central limit theorem the mean of independent outcomes is approximately normally distributed when $n$ is large, irrespective of their original distribution (provided it has finite variance). More generally, for maximum likelihood estimators $\hat\theta$ it holds under regularity conditions that with large $n$ the sampling distribution is approximately $\hat\theta \sim N(\theta, S^2)$. The standard error $S$ depends on various factors, in particular the sample size $n$.
14 Estimating the log-rate ratio
Even though the normal sampling distribution always appears with large enough $n$, for non-negative statistics, such as empirical rate ratios, the normal approximation is better if we take an appropriate transformation of the parameter and the corresponding estimator. For non-negative statistics this transformation is usually the logarithm. If the aggregate person-years by exposure status are known, we may estimate the log rate ratio $\log\theta = \log(\lambda_1/\lambda_0)$ by
$$\log\hat\theta = \log\left(\frac{D_1/y_1}{D_0/y_0}\right).$$
15 Example
The data below are from a randomized trial studying the role of male circumcision for HIV prevention in Uganda. The events are incident HIV infections during the first 24 months of follow-up since randomization, and the intervention is male circumcision.

                   Intervention group   Control group
Participants
Incident events
Person-years
16 Standard error of the log-rate ratio
Now we have that $\log\hat\theta \sim N(\log\theta, S^2)$, where
$$S = \sqrt{\frac{1}{D_1} + \frac{1}{D_0}}.$$
This depends only on the numbers of exposed and unexposed cases; the larger these are, the smaller the standard error. The standardized statistic is approximately distributed as
$$Z = \frac{\log\hat\theta - \log\theta}{\sqrt{\frac{1}{D_1} + \frac{1}{D_0}}} \sim N(0, 1).$$
This could tell us whether the observed value of $\log\hat\theta$ is somehow unusual compared to a hypothesized true value.
17 Test statistic and p-value
Under the null hypothesis $H_0: \log\theta = 0$, the z-value is
$$Z = \frac{\log\hat\theta}{\sqrt{\frac{1}{D_1} + \frac{1}{D_0}}}.$$
We have constructed a test statistic. It will tend to take large positive or negative values when the null hypothesis is not true. This unusualness is quantified by the p-value
$$p = P(|Z| > |z| \mid H_0),$$
where $z$ is the observed value of the test statistic $Z$. Alternatively, the evidence may be reported in the form of a confidence interval.
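18 Confidence interval
With the log-rate ratio parameter $\log\theta$ and the statistic $\log\hat\theta = \log\left(\frac{D_1/y_1}{D_0/y_0}\right)$ we have that
$$P\left(\log\hat\theta - 1.96\,S \le \log\theta \le \log\hat\theta + 1.96\,S\right) \approx 0.95,$$
where $S = \sqrt{\frac{1}{D_1} + \frac{1}{D_0}}$. This gives a 95% confidence interval for the log-rate ratio. To get a CI for the rate ratio itself, we note that
$$P\left(\log\hat\theta - 1.96\,S \le \log\theta \le \log\hat\theta + 1.96\,S\right) = P\left(e^{\log\hat\theta - 1.96\,S} \le e^{\log\theta} \le e^{\log\hat\theta + 1.96\,S}\right) = P\left(\hat\theta e^{-1.96\,S} \le \theta \le \hat\theta e^{1.96\,S}\right) \approx 0.95.$$
Thus, $\left[\hat\theta e^{-1.96\,S},\ \hat\theta e^{1.96\,S}\right]$ is a 95% CI for the rate ratio parameter.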
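The point estimate, standard error, z-value, p-value and confidence interval for a rate ratio can be collected into one small helper. This is a sketch only: the function name `rate_ratio_inference` and the event counts and person-years in the usage line are hypothetical, chosen for illustration rather than taken from the trial.

```r
# Normal-approximation inference for a rate ratio from aggregate data.
# d1, d0: event counts in the exposed and unexposed groups
# y1, y0: person-years in the exposed and unexposed groups
rate_ratio_inference <- function(d1, y1, d0, y0, level = 0.95) {
  log_rr <- log((d1 / y1) / (d0 / y0))   # log of the empirical rate ratio
  se     <- sqrt(1 / d1 + 1 / d0)        # standard error S of the log rate ratio
  z      <- log_rr / se                  # test statistic under H0: log(theta) = 0
  p      <- 2 * pnorm(-abs(z))           # two-sided p-value P(|Z| > |z| | H0)
  q      <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  c(rate_ratio = exp(log_rr),
    se_log     = se,
    z          = z,
    p_value    = p,
    lower      = exp(log_rr - q * se),   # theta_hat * exp(-q * S)
    upper      = exp(log_rr + q * se))   # theta_hat * exp(+q * S)
}

# Hypothetical data: 20 events in 4000 person-years versus 40 in 4000.
res <- rate_ratio_inference(d1 = 20, y1 = 4000, d0 = 40, y0 = 4000)
round(res, 4)
```

The interval is computed on the log scale and then exponentiated, so it is not symmetric around the rate ratio itself.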
19 Interpretation
Clayton & Hills (1993, p. 91):
21 Decision making
If we are only reporting evidence, our role is not to reject anything. Rejecting or not rejecting hypotheses is left for the scientific community, based on the totality of the available evidence. However, if we indeed have to make a decision (note again the difference between a decision and a conclusion), we have to weigh the costs of making the wrong decision. There are two different kinds of errors in hypothesis-based decision making. We can either reject the null when it is in fact correct (type I error, or false positive), or fail to reject the null when it in fact should be rejected (type II error, or false negative). There is no free lunch in statistics; there is a tradeoff between minimizing the false positive probability and minimizing the false negative probability.
22 A hypothesis testing problem
An example of a decision problem: should one treat future patients with a new procedure or drug based on limited statistical evidence on its efficacy? Consider the following example: the drug AZT was the first drug that seemed effective in delaying the onset of AIDS in HIV-positive patients. In a randomized study 435 HIV-positive subjects were assigned to take 500 milligrams of AZT and another 435 HIV-positive subjects were assigned to take a placebo. If the approval of the drug is made based on the evidence produced by this trial, we are involved in a decision problem, rather than just reporting statistical evidence. Let the risk of AIDS onset be $\pi_1$ with the treatment and $\pi_0$ without treatment. The null hypothesis could be formulated in terms of the risk ratio as
$$H_0: \theta = \frac{\pi_1}{\pi_0} = 1.$$
23 Significance level
For the purpose of making a decision, we choose a fixed significance level $\alpha$, and reject $H_0$ if $p < \alpha$. On the other hand, $p \mid H_0 \sim U(0, 1)$. This follows because $1 - p = P(|Z| < |z| \mid H_0)$ is the cumulative distribution function of $|Z|$ evaluated at the observed value $|z|$, which is uniformly distributed under $H_0$ (the probability integral transform), and the uniform distribution is preserved under the linear transformation $u \mapsto 1 - u$. Thus, $P(p < \alpha \mid H_0) = \alpha$, and the significance level is the probability of a type I error. Note also that $P(p < \alpha \mid H_1) > \alpha$ (why?).
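The uniformity of the p-value under the null can be checked by simulation. The sketch below draws the test statistic directly from its null distribution N(0, 1) instead of simulating trial data, which is a simplifying assumption; the seed and the number of repetitions are arbitrary.

```r
# Under H0 the two-sided p-value is uniform, so P(p < alpha | H0) = alpha.
set.seed(3)                    # arbitrary seed
n_rep <- 100000                # number of simulated studies (assumption)

z <- rnorm(n_rep)              # test statistics drawn under H0: Z ~ N(0, 1)
p <- 2 * pnorm(-abs(z))        # two-sided p-values P(|Z| > |z| | H0)

# Proportion of studies rejecting H0 at a few significance levels.
alpha <- c(0.01, 0.05, 0.10)
type1 <- sapply(alpha, function(a) mean(p < a))
rbind(alpha = alpha, rejection_rate = type1)

# A histogram of p should look flat on (0, 1).
hist(p, breaks = 20, main = "p-values under H0", xlab = "p")
```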
24 Power of a test
Choosing a small $\alpha$ reduces the probability of a false positive result. However, unfortunately it also increases the probability of a false negative result, denoted as $\beta = P(p \ge \alpha \mid H_1)$. This means that avoiding a false positive result makes it more difficult to reject the null when it in fact should be rejected. In other words, a small type I error probability reduces the power of the test. The power of a test is the probability $1 - \beta$, the probability of rejecting the null when it should be rejected. As we will see later, in addition to the significance level, the power of a test depends on the sample size and the effect size.
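The tradeoff between the significance level and power can be made concrete for a two-sided z-test. In this sketch the function name `power_z` and the standardized effect `delta` (the true value of the parameter divided by its standard error, so that Z ~ N(delta, 1) under the alternative) are assumptions introduced for illustration.

```r
# Power of a two-sided z-test when Z ~ N(delta, 1) under the alternative.
power_z <- function(delta, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)       # reject H0 when |Z| > z_crit
  # P(Z < -z_crit) + P(Z > z_crit) for Z ~ N(delta, 1)
  pnorm(-z_crit - delta) + pnorm(-z_crit + delta)
}

# With delta = 0 (H0 true) the rejection probability equals alpha.
power_z(0, alpha = 0.05)

# A smaller alpha lowers the power for the same standardized effect ...
sapply(c(0.10, 0.05, 0.01), function(a) power_z(2, alpha = a))

# ... while a larger standardized effect (larger true effect or smaller
# standard error, that is, larger study) raises it.
sapply(c(1, 2, 3), function(d) power_z(d, alpha = 0.05))
```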
25 Decision table
The four possible decisions and the corresponding probabilities may be expressed in a 2 × 2 table as

Decision    H0: θ = 1         H1: θ ≠ 1
p < α       P(p < α | H0)     P(p < α | H1)
p ≥ α       P(p ≥ α | H0)     P(p ≥ α | H1)
            1                 1

or

Decision    H0: θ = 1         H1: θ ≠ 1
p < α       α                 1 − β
p ≥ α       1 − α             β
26 Choosing the significance level
Knowing that there is a tradeoff between the false positive and false negative probabilities, what would the appropriate $\alpha$ be? Answer: it depends. In particular, it depends on the respective costs and benefits associated with the decisions. For instance, in the case of a false positive, the patient might be treated with a procedure or a drug that is not effective, but might be costly or have side effects. In the case of a false negative, the patient is not treated, while in fact the treatment might have helped; possibly a fatal decision. Weighing such harms and benefits is highly subjective, and outside the realm of statistics.
28 2 × 2 table for conditional probabilities
Recall the AZT randomized trial example:

           AIDS = 1   AIDS = 0   Total
AZT = 1       17        418       435
AZT = 0       38        397       435

In terms of conditional probabilities, the 2 × 2 table may be presented as

           Y = 1              Y = 0
Z = 1      P(Y = 1 | Z = 1)   P(Y = 0 | Z = 1)   1
Z = 0      P(Y = 1 | Z = 0)   P(Y = 0 | Z = 0)   1

Or in terms of risk parameters as

           Y = 1    Y = 0
Z = 1      π1       1 − π1     1
Z = 0      π0       1 − π0     1
29 Reparametrization
The problem could equally well be parametrized in terms of odds:

           Y = 1               Y = 0
Z = 1      π1 / (1 − π1)       (1 − π1) / π1
Z = 0      π0 / (1 − π0)       (1 − π0) / π0

Or in terms of log-odds:

           Y = 1               Y = 0
Z = 1      log(π1 / (1 − π1))  log((1 − π1) / π1)
Z = 0      log(π0 / (1 − π0))  log((1 − π0) / π0)

Neither is of direct interest to us, since the objective is to compare the risk of AIDS onset between the two groups.
30 A more relevant reparametrization
Redefine the four log-odds as:

           Y = 1      Y = 0
Z = 1      α + β      −(α + β)
Z = 0      α          −α

(These α and β are unrelated to the previous ones.) This corresponds to a regression equation
$$\log\frac{\pi_Z}{1-\pi_Z} = \alpha + \beta Z,$$
where $\log\frac{\pi_Z}{1-\pi_Z}$ is known as the logit transformation. Here
$$\frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)} = \frac{e^{\alpha+\beta}}{e^{\alpha}} = \frac{e^{\alpha}e^{\beta}}{e^{\alpha}} = e^{\beta} \iff \log\left(\frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)}\right) = \beta.$$
The regression coefficient $\beta$ is a log-odds ratio.
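The reparametrization can be followed numerically with the AZT counts that are read in later in these notes (17 events among 435 treated, 38 among 435 on placebo). This is a sketch; the object names are chosen for illustration, and the plug-in values are the empirical counterparts of the parameters.

```r
# Empirical risks, odds and log-odds in the AZT example.
d <- c(17, 38)       # AIDS onsets: AZT (z = 1), placebo (z = 0)
n <- c(435, 435)     # group sizes

risk     <- d / n                 # pi_1, pi_0
odds     <- risk / (1 - risk)     # pi_Z / (1 - pi_Z)
log_odds <- log(odds)             # logit of the risks

# New parameters: alpha is the log-odds in the placebo group,
# beta is the difference in log-odds, that is, the log-odds ratio.
alpha_hat  <- log_odds[2]
beta_hat   <- log_odds[1] - log_odds[2]
odds_ratio <- exp(beta_hat)

round(c(alpha = alpha_hat, beta = beta_hat, odds_ratio = odds_ratio), 4)
```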
31 Deterministic and stochastic model components
The regression equation specifies the deterministic part of the model. To complete the model specification, we need to specify the stochastic component of the model, that is, a statistical distribution for the outcomes $D_1$ and $D_0$. The appropriate distribution is
$$D_Z \sim \mathrm{Binomial}(n_Z, \pi_Z).$$
Here the risk $\pi_Z$ is given by the regression equation as
$$\pi_Z = \frac{e^{\alpha+\beta Z}}{1 + e^{\alpha+\beta Z}} = \frac{1}{1 + e^{-(\alpha+\beta Z)}}.$$
This inverse transformation is the so-called expit function:
$$\pi_Z = \mathrm{logit}^{-1}(\alpha + \beta Z) = \mathrm{expit}(\alpha + \beta Z).$$
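The logit and expit transformations are easy to write out and check against each other. In this sketch the functions `logit` and `expit` are defined by hand for illustration; base R provides the same transformations as `qlogis` and `plogis`.

```r
# The logit transformation and its inverse, the expit function.
logit <- function(p) log(p / (1 - p))
expit <- function(x) 1 / (1 + exp(-x))

p <- c(0.01, 0.09, 0.5, 0.9)
x <- logit(p)

# expit undoes logit, and both forms of the expit formula agree.
all.equal(expit(x), p)
all.equal(exp(x) / (1 + exp(x)), expit(x))

# The hand-written versions match the built-in logistic quantile and
# distribution functions.
all.equal(x, qlogis(p))
all.equal(expit(x), plogis(x))
```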
32 Regression model
Clayton & Hills (1993, p. 217): "A common theme in all these situations is change from the original parameters to new parameters which are more relevant to the comparisons of interest. This change can be described by the equations which express the old parameters in terms of the new parameters. These equations are referred to as regression equations, and the statistical model is called a regression model."
Now the old parameters are the two log-odds $\log\frac{\pi_Z}{1-\pi_Z}$
33 Estimate the model parameters
The parameters $\alpha$ and $\beta$ can be estimated using maximum likelihood.

Read in the data:

    d <- c(17, 38)    # AIDS onsets: AZT group, placebo group
    n <- c(435, 435)  # group sizes
    z <- c(1, 0)      # treatment indicator: 1 = AZT, 0 = placebo

Fit the model:

    model <- glm(cbind(d, n - d) ~ z, family = binomial(link = "logit"))

Check the results:

    summary(model)
34 Logistic model results

    Call:
    glm(formula = cbind(d, n - d) ~ z, family = binomial(link = "logit"))

    Deviance Residuals:
    [1] 0 0

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                              < 2e-16 ***
    z                                                **
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: e+00 on 1 degrees of freedom
    Residual deviance: e-14 on 0 degrees of freedom
    AIC:

    Number of Fisher Scoring iterations:
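Because the model has as many parameters as there are groups, the fitted coefficient for z should agree with the log-odds ratio computed directly from the 2 × 2 table, and its standard error with the usual square-root-of-reciprocal-counts formula. The following sketch checks this; the closed-form expressions are standard results rather than something stated on the slides, and the object names are illustrative.

```r
# Compare the glm fit with closed-form estimates from the 2 x 2 table.
d <- c(17, 38)      # AIDS onsets: AZT, placebo
n <- c(435, 435)    # group sizes
z <- c(1, 0)        # treatment indicator

model <- glm(cbind(d, n - d) ~ z, family = binomial(link = "logit"))

# Log-odds ratio and its standard error directly from the cell counts.
beta_closed <- log((d[1] / (n[1] - d[1])) / (d[2] / (n[2] - d[2])))
se_closed   <- sqrt(sum(1 / d) + sum(1 / (n - d)))

# Estimate and standard error for z as reported by summary(model).
est <- coef(summary(model))["z", c("Estimate", "Std. Error")]
rbind(glm = est, closed_form = c(beta_closed, se_closed))

# 95% confidence interval for the odds ratio exp(beta).
ci_or <- exp(beta_closed + c(-1, 1) * qnorm(0.975) * se_closed)
c(odds_ratio = exp(beta_closed), lower = ci_or[1], upper = ci_or[2])
```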
35 Log-linear model for risk
Is there some particular reason why we have to use the logit link when modeling risk? Why could we not just parametrize the log-risk as
$$\log(\pi_Z) = \alpha + \beta Z?$$
We can; in this case the regression coefficient $\beta$ would be interpreted as a log-risk ratio:
$$\frac{\pi_1}{\pi_0} = \frac{e^{\alpha+\beta}}{e^{\alpha}} = \frac{e^{\alpha}e^{\beta}}{e^{\alpha}} = e^{\beta} \iff \log\left(\frac{\pi_1}{\pi_0}\right) = \beta.$$
However, note that there is nothing here bounding the risk to values below one, which might cause numerical problems. The log-linear model does bound the risk to non-negative values, so as long as the risk is small, log-linear and logistic regression give similar results.
36 Log-linear model results

    Call:
    glm(formula = cbind(d, n - d) ~ z, family = binomial(link = "log"))

    Deviance Residuals:
    [1] 0 0

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                              < 2e-16 ***
    z                                                **
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: e+00 on 1 degrees of freedom
    Residual deviance: e-14 on 0 degrees of freedom
    AIC:

    Number of Fisher Scoring iterations:
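With the log link, the coefficient for z should reproduce the log-risk ratio computed directly from the group risks. The sketch below fits the model to the AZT data and compares it with the closed-form values; the standard error formula is the usual delta-method expression, stated here as an assumption rather than taken from the slides.

```r
# Log-link binomial model: the coefficient of z is a log-risk ratio.
d <- c(17, 38)      # AIDS onsets: AZT, placebo
n <- c(435, 435)    # group sizes
z <- c(1, 0)        # treatment indicator

model_log <- glm(cbind(d, n - d) ~ z, family = binomial(link = "log"))

# Closed-form log-risk ratio and its delta-method standard error.
log_rr_closed <- log((d[1] / n[1]) / (d[2] / n[2]))
se_closed     <- sqrt(sum(1 / d) - sum(1 / n))

est <- coef(summary(model_log))["z", c("Estimate", "Std. Error")]
rbind(glm = est, closed_form = c(log_rr_closed, se_closed))

# The risks here are below 0.1, so the risk ratio and the odds ratio
# from the logistic model are of similar size.
risk_ratio <- exp(log_rr_closed)
odds_ratio <- (d[1] / (n[1] - d[1])) / (d[2] / (n[2] - d[2]))
c(risk_ratio = risk_ratio, odds_ratio = odds_ratio)
```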
37 Disclaimer about models
The objective of a study is not to study something. In the same vein, modeling, including model selection, and checking, or testing, the correctness of a model, should not be an end in itself. In clinical trials in particular, we are only interested in a specific parameter in the model. By definition, there is no such thing as a correct model; a model is a simplification of reality, not the reality itself. Hence, a model need not capture all features of reality; if it could, it would no longer be a model. A good model is a model that is useful by serving some purpose. Some models may be better than others, depending on the chosen criterion for "better".
38 References
Clayton, D. and Hills, M. (1993). Statistical Models in Epidemiology. Oxford University Press.
Miettinen, O. S. (2011). Epidemiological Research: Terms and Concepts. Springer, Dordrecht.
Miettinen, O. S. and Karp, I. (2012). Epidemiological Research: An Introduction. Springer, Dordrecht.
More informationSample Size Calculations for Group Randomized Trials with Unequal Sample Sizes through Monte Carlo Simulations
Sample Size Calculations for Group Randomized Trials with Unequal Sample Sizes through Monte Carlo Simulations Ben Brewer Duke University March 10, 2017 Introduction Group randomized trials (GRTs) are
More informationFaculty of Science FINAL EXAMINATION Mathematics MATH 523 Generalized Linear Models
Faculty of Science FINAL EXAMINATION Mathematics MATH 523 Generalized Linear Models Examiner: Professor K.J. Worsley Associate Examiner: Professor R. Steele Date: Thursday, April 17, 2008 Time: 14:00-17:00
More informationLog-linear Models for Contingency Tables
Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More information1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests
Overall Overview INFOWO Statistics lecture S3: Hypothesis testing Peter de Waal Department of Information and Computing Sciences Faculty of Science, Universiteit Utrecht 1 Descriptive statistics 2 Scores
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationPAPER 218 STATISTICAL LEARNING IN PRACTICE
MATHEMATICAL TRIPOS Part III Thursday, 7 June, 2018 9:00 am to 12:00 pm PAPER 218 STATISTICAL LEARNING IN PRACTICE Attempt no more than FOUR questions. There are SIX questions in total. The questions carry
More informationIntroduction to Statistical Analysis
Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive
More information1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression
Logistic Regression 1. Logistic Regression, One Predictor 2. Inference: Estimating the Parameters 3. Multiple Logistic Regression 4. AIC and BIC in Logistic Regression 5. Target Marketing: Tabloid Data
More informationPractical Considerations Surrounding Normality
Practical Considerations Surrounding Normality Prof. Kevin E. Thorpe Dalla Lana School of Public Health University of Toronto KE Thorpe (U of T) Normality 1 / 16 Objectives Objectives 1. Understand the
More informationLogistic & Tobit Regression
Logistic & Tobit Regression Different Types of Regression Binary Regression (D) Logistic transformation + e P( y x) = 1 + e! " x! + " x " P( y x) % ln$ ' = ( + ) x # 1! P( y x) & logit of P(y x){ P(y
More informationData-analysis and Retrieval Ordinal Classification
Data-analysis and Retrieval Ordinal Classification Ad Feelders Universiteit Utrecht Data-analysis and Retrieval 1 / 30 Strongly disagree Ordinal Classification 1 2 3 4 5 0% (0) 10.5% (2) 21.1% (4) 42.1%
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationSTAT 525 Fall Final exam. Tuesday December 14, 2010
STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
More informationNeutral Bayesian reference models for incidence rates of (rare) clinical events
Neutral Bayesian reference models for incidence rates of (rare) clinical events Jouni Kerman Statistical Methodology, Novartis Pharma AG, Basel BAYES2012, May 10, Aachen Outline Motivation why reference
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationSTATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours
Instructions: STATS216v Introduction to Statistical Learning Stanford University, Summer 2017 Remember the university honor code. Midterm Exam (Solutions) Duration: 1 hours Write your name and SUNet ID
More informationChapter 6. Logistic Regression. 6.1 A linear model for the log odds
Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,
More informationMODULE 6 LOGISTIC REGRESSION. Module Objectives:
MODULE 6 LOGISTIC REGRESSION Module Objectives: 1. 147 6.1. LOGIT TRANSFORMATION MODULE 6. LOGISTIC REGRESSION Logistic regression models are used when a researcher is investigating the relationship between
More informationA Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationPubHlth Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide
PubHlth 640 - Intermediate Biostatistics Spring 2015 Exam 2 (Units 3, 4 & 5) Study Guide Unit 3 (Discrete Distributions) Take care to know how to do the following! Learning Objective See: 1. Write down
More informationCHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials
CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or
More informationStatistical Analysis of List Experiments
Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys
More informationINTERVAL ESTIMATION AND HYPOTHESES TESTING
INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,
More informationMath 494: Mathematical Statistics
Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/
More informationEconometrics. 4) Statistical inference
30C00200 Econometrics 4) Statistical inference Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Confidence intervals of parameter estimates Student s t-distribution
More informationRegression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.
Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More informationIntroduction Fitting logistic regression models Results. Logistic Regression. Patrick Breheny. March 29
Logistic Regression March 29 Introduction Binary outcomes are quite common in medicine and public health: alive/dead, diseased/healthy, infected/not infected, case/control Assuming that these outcomes
More informationSample solutions. Stat 8051 Homework 8
Sample solutions Stat 8051 Homework 8 Problem 1: Faraway Exercise 3.1 A plot of the time series reveals kind of a fluctuating pattern: Trying to fit poisson regression models yields a quadratic model if
More informationLogistic regression model for survival time analysis using time-varying coefficients
Logistic regression model for survival time analysis using time-varying coefficients Accepted in American Journal of Mathematical and Management Sciences, 2016 Kenichi SATOH ksatoh@hiroshima-u.ac.jp Research
More informationBIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke
BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart
More information