Test of Association between Two Ordinal Variables while Adjusting for Covariates

1 Test of Association between Two Ordinal Variables while Adjusting for Covariates
Chun Li, Bryan Shepherd
Department of Biostatistics, Vanderbilt University
May 13, 2009

2 Examples Amblyopia

3 Examples: Anisometropic Amblyopia
Anisometropia: unequal refractive error between the two eyes. Anisometropia was detected in 974 preschool children in a photoscreening program (Leon et al. 2008).
Variables collected:
- anisometropia magnitude (X): levels 0, A, B, C
- visual acuity, used to define amblyopia level (Y): levels 0-3
- age (Z): range 0-6
Goal: test the association between anisometropia and amblyopia while adjusting for the effect of age.

4 Examples: Anisometropic Amblyopia
[Figure: proportion of amblyopia levels by anisometropia level (A, B, C; ABS SE), with the number of subjects at each level.]

5 Examples: CIN Stage
Cervical specimens were collected from 303 non-pregnant HIV-infected women in Lusaka, Zambia (Parham et al. 2006).
Variables collected:
- condom use (X): never, rarely, almost always, always
- stage of cervical intraepithelial neoplastic (CIN) lesions (Y): 5 levels
- other factors such as age, education, and number of sexual partners (Z)
Goal: test for association between condom use and CIN stage, controlling for other factors that may be associated with both.

7 Problems in regression: Regression
Regression connects an outcome variable with input variable(s). The type of regression depends on the outcome:
- continuous: linear regression
- binary: logistic regression
- count: Poisson regression (log-linear model)
- ordinal: ordinal logistic regression (proportional odds model)
- survival: Cox regression (proportional hazards model)
For all these regression analyses, the right-hand side is $\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$.
What do we do when $X_1$ is ordered categorical?

9 Problems in regression: When Ordinal $X_1$ is Treated as Continuous
When ordinal $X_1$ is treated as continuous, we assume the effect of moving from level 1 to level 2 is the same as that of moving from level 2 to level 3. Often this is unreasonable.
One could assign numbers to the categories so that the assigned values reflect a linear relationship with the outcome. Such a transformation is difficult to choose and may lead to data dredging.
Splines also have drawbacks:
- uncertainty in the number and locations of knots
- dependence on how the categories are coded
- non-monotonic results when monotonicity is expected
- difficulty when there are only three categories

11 Problems in regression: When Ordinal $X_1$ is Treated as Categorical
When ordinal $X_1$ is treated as categorical, the order information is ignored.
- The test may have low power due to high degrees of freedom.
- The effect estimates may be non-monotonic.
Isotonic regression: fit the regression treating $X_1$ as categorical. If the coefficients are not monotonic, combine adjacent categories that run in reverse of the general trend and re-fit the regression.
- The grouping is data driven, and the results need adjustment for this source of model selection variability.
- The degrees of freedom may still be high.

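14 tau and gamma: Kendall's tau and Goodman and Kruskal's gamma
Cross-tabulation of counts $n_{jl}$ (rows $Y = 1, \dots, 4$; columns $X = 1, 2, 3$):
$$\begin{array}{c|ccc} & X=1 & X=2 & X=3 \\ \hline Y=1 & n_{11} & n_{12} & n_{13} \\ Y=2 & n_{21} & n_{22} & n_{23} \\ Y=3 & n_{31} & n_{32} & n_{33} \\ Y=4 & n_{41} & n_{42} & n_{43} \end{array}$$
Number of concordant pairs: $C = \sum_{j_1 < j_2,\ l_1 < l_2} n_{j_1 l_1} n_{j_2 l_2}$.
Number of discordant pairs: $D = \sum_{j_1 < j_2,\ l_1 > l_2} n_{j_1 l_1} n_{j_2 l_2}$.
Kendall's tau: $\tau = \dfrac{C - D}{n(n-1)/2}$.
Goodman and Kruskal's gamma: $\gamma = \dfrac{C - D}{C + D}$. This is the same as Somers' d for two groups with no ties in predicted probabilities.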
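The pair counts above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names and the example table are hypothetical, and the table is assumed to have rows and columns already sorted in their natural order.

```python
import numpy as np

def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in a two-way table of counts."""
    n = np.asarray(table, dtype=float)
    C = D = 0.0
    for j1 in range(n.shape[0]):
        for l1 in range(n.shape[1]):
            # Cells in a later row and a later column: concordant with (j1, l1).
            C += n[j1, l1] * n[j1 + 1:, l1 + 1:].sum()
            # Cells in a later row but an earlier column: discordant with (j1, l1).
            D += n[j1, l1] * n[j1 + 1:, :l1].sum()
    return C, D

def kendall_tau(table):
    """tau = (C - D) / (n (n - 1) / 2), with n the total number of subjects."""
    C, D = concordant_discordant(table)
    total = float(np.sum(table))
    return (C - D) / (total * (total - 1) / 2)

def gk_gamma(table):
    """Goodman and Kruskal's gamma = (C - D) / (C + D)."""
    C, D = concordant_discordant(table)
    return (C - D) / (C + D)

# Hypothetical 4 x 3 table of counts, for illustration only.
counts = np.array([[10, 5, 2],
                   [6, 8, 4],
                   [3, 7, 9],
                   [1, 4, 12]])
tau, gamma = kendall_tau(counts), gk_gamma(counts)
```

Because ties are excluded from the denominator of gamma but not of tau, gamma is never smaller in absolute value than tau on the same table.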

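15 tau and gamma: Kendall's tau and Goodman and Kruskal's gamma
Population version, with cell probabilities $\pi_{jl}$ (rows $Y = 1, \dots, 4$; columns $X = 1, 2, 3$):
$$\begin{array}{c|ccc} & X=1 & X=2 & X=3 \\ \hline Y=1 & \pi_{11} & \pi_{12} & \pi_{13} \\ Y=2 & \pi_{21} & \pi_{22} & \pi_{23} \\ Y=3 & \pi_{31} & \pi_{32} & \pi_{33} \\ Y=4 & \pi_{41} & \pi_{42} & \pi_{43} \end{array}$$
Concordance probability: $\pi_C = \sum_{j_1 < j_2,\ l_1 < l_2} \pi_{j_1 l_1} \pi_{j_2 l_2}$.
Discordance probability: $\pi_D = \sum_{j_1 < j_2,\ l_1 > l_2} \pi_{j_1 l_1} \pi_{j_2 l_2}$.
Goodman and Kruskal's gamma: $\gamma = \dfrac{\pi_C - \pi_D}{\pi_C + \pi_D}$.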

17 tau and gamma: Kendall's Partial tau
$$\tau_{xy,z} = \frac{\tau_{xy} - \tau_{xz}\tau_{yz}}{\sqrt{(1 - \tau_{xz}^2)(1 - \tau_{yz}^2)}}.$$
But $\tau_{xy,z} \neq 0$ under the null of $H_0: Y \perp X \mid Z$.
In general, for $(X, Y, Z) \sim N(0, \Sigma)$, we can similarly define
$$r_{xy,z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}.$$
When $X = Z + e_X$ and $Y = Z + e_Y$, we have $0 < r_{xz} = r_{yz} = r_{xy} < 1$ and $r_{xy,z} \neq 0$.

19 tau and gamma: Davis's Partial gamma
For each stratum $Z = i$, $i = 1, \dots, k$, there is a table of cell probabilities $\pi_{jl,i}$ ($j = 1, \dots, 4$; $l = 1, 2, 3$), with concordance and discordance probabilities $\pi_{C,i}$ and $\pi_{D,i}$.
Stratum-specific $\gamma$:
$$\gamma_1 = \frac{\pi_{C,1} - \pi_{D,1}}{\pi_{C,1} + \pi_{D,1}}, \quad \dots, \quad \gamma_k = \frac{\pi_{C,k} - \pi_{D,k}}{\pi_{C,k} + \pi_{D,k}}.$$
Let $\pi_C = \sum_i \pi_{C,i}$ and $\pi_D = \sum_i \pi_{D,i}$.
Davis's partial gamma:
$$\gamma = \frac{\pi_C - \pi_D}{\pi_C + \pi_D} = \sum_{i=1}^{k} \left( \frac{\pi_{C,i} + \pi_{D,i}}{\pi_C + \pi_D} \right) \gamma_i.$$
Requires stratification on continuous or multivariable $Z$.
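As a numerical check of the weighted-average identity, the following Python sketch computes Davis's partial gamma both ways. The function names and the two example strata are hypothetical; the sketch assumes each stratum is supplied as a table of cell probabilities with rows and columns in their natural order.

```python
import numpy as np

def conc_disc_prob(p):
    """Concordance and discordance probabilities of one table of cell probabilities."""
    p = np.asarray(p, dtype=float)
    pc = pd = 0.0
    for j in range(p.shape[0]):
        for l in range(p.shape[1]):
            pc += p[j, l] * p[j + 1:, l + 1:].sum()
            pd += p[j, l] * p[j + 1:, :l].sum()
    return pc, pd

def davis_partial_gamma(strata):
    """Return the partial gamma, the stratum weights, and the stratum-specific gammas."""
    pairs = [conc_disc_prob(p) for p in strata]
    pc = sum(c for c, _ in pairs)
    pd = sum(d for _, d in pairs)
    gamma = (pc - pd) / (pc + pd)
    weights = np.array([(c + d) / (pc + pd) for c, d in pairs])
    gammas = np.array([(c - d) / (c + d) for c, d in pairs])
    return gamma, weights, gammas

# Two hypothetical strata, each carrying half of the total probability.
strata = [0.5 * np.array([[0.4, 0.1], [0.1, 0.4]]),
          0.5 * np.array([[0.25, 0.25], [0.25, 0.25]])]
gamma, weights, gammas = davis_partial_gamma(strata)
```

The weights are proportional to the probability of an untied pair within each stratum, so strata with many ties contribute little.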

20 Motivation: Motivation of Our Approach
In linear regression, $Y = \beta_0 + \beta_1 X + \gamma_1 Z_1 + \cdots + \gamma_k Z_k$, we test $H_0: \beta_1 = 0$.
Alternatively, we could:
1. fit a linear regression of $Y$ on $Z$ and obtain $Y_{res}$,
2. fit a linear regression of $X$ on $Z$ and obtain $X_{res}$,
3. fit $Y_{res} = \alpha_0 + \alpha_1 X_{res}$ and test $H_0: \alpha_1 = 0$.
Then $\hat\beta_1 = \hat\alpha_1$, and their significance levels are similar if $n \gg k$.
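The equality of the two slope estimates can be checked numerically. The sketch below uses simulated data with arbitrary, hypothetical coefficients and plain least squares from NumPy; it only illustrates the three steps above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3

# Hypothetical data: X and Y both depend on Z, and Y also depends on X.
Z = rng.normal(size=(n, k))
X = Z @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
Y = 1.0 + 0.4 * X + Z @ np.array([0.3, 0.1, -0.2]) + rng.normal(size=n)

def ols(design, response):
    """Least-squares coefficients of response on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

ones = np.ones((n, 1))
Z_design = np.hstack([ones, Z])

# Full model: Y on (1, X, Z); the coefficient of X is beta_1.
beta1_hat = ols(np.hstack([ones, X[:, None], Z]), Y)[1]

# Steps 1 and 2: residuals of Y and of X after regressing each on Z.
Y_res = Y - Z_design @ ols(Z_design, Y)
X_res = X - Z_design @ ols(Z_design, X)

# Step 3: regress Y_res on X_res; the slope is alpha_1.
alpha1_hat = ols(np.hstack([ones, X_res[:, None]]), Y_res)[1]
```

The two slopes agree up to floating-point error. The standard errors differ only through the residual degrees of freedom, which is why the significance levels are close when n is much larger than k.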

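21 Motivation: Correlation of Residuals in Linear Regression
[Figure: panels showing $y$ with $E(y \mid z)$, the residual $y - E(y \mid z)$, $x$ with $E(x \mid z)$, and the residual $x - E(x \mid z)$.]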

23 Motivation: Our Approach
1. Fit a proportional odds model of $Y$ on $Z$.
2. Fit a proportional odds model of $X$ on $Z$.
3. Construct test statistics.
Question: What is a residual for proportional odds models?

24 Proportional Odds Models
Suppose $Y$ has ordered levels $1 < \cdots < s$. For $j = 1, \dots, s - 1$,
$$\operatorname{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y.$$
1. For each level $j$, it is a logistic regression model with intercept $\alpha^Y_j$ and coefficient $\beta^Y$.
2. For any level $j$, the log odds ratio between two subjects with covariates $z_2$ and $z_1$ is
$$\operatorname{logit}[P(Y \le j \mid z_2)] - \operatorname{logit}[P(Y \le j \mid z_1)] = (z_2 - z_1)\beta^Y.$$
That is, $\text{odds}_2/\text{odds}_1 = e^{z_2\beta^Y}/e^{z_1\beta^Y}$ for ANY $j$.
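To make the "same odds ratio for any j" property concrete, the following Python sketch turns the cumulative logits into category probabilities and compares the cumulative odds at two covariate values. It assumes a single scalar covariate and the cumulative form $P(Y \le j \mid Z)$; the function names and parameter values are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def po_probs(alpha, beta, z):
    """Category probabilities P(Y = j | z), j = 1..s, under
    logit P(Y <= j | z) = alpha_j + z * beta (alpha must be increasing)."""
    alpha = np.asarray(alpha, dtype=float)
    z = np.atleast_1d(np.asarray(z, dtype=float))
    cum = expit(alpha[None, :] + (z * beta)[:, None])      # P(Y <= j | z), j < s
    cum = np.hstack([np.zeros((z.size, 1)), cum, np.ones((z.size, 1))])
    return np.diff(cum, axis=1)                            # one row per z value

# Hypothetical intercepts and slope, for illustration only.
alpha, beta = np.array([-1.0, 0.0, 1.0]), 0.5
z1, z2 = 0.2, 1.5

p = po_probs(alpha, beta, [z1, z2])
cum = np.cumsum(p, axis=1)[:, :-1]          # cumulative probabilities for j = 1..s-1
odds = cum / (1.0 - cum)
odds_ratio = odds[1] / odds[0]              # one ratio per level j
```

Every entry of `odds_ratio` equals $e^{(z_2 - z_1)\beta}$, regardless of the level $j$.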

26 Notation and model fitting: Notation
$Y$ has ordered levels $1 < \cdots < s$; $X$ has ordered levels $1 < \cdots < t$.
In general, their joint distribution is $P = P(Y, X) = \{\pi_{jl}\}$.
Under the null, $P(Y, X \mid Z) = P(Y \mid Z) P(X \mid Z)$, and
$$P_0 = P_0(Y, X) = \int_z P(Y, X \mid Z)\, dZ = \int_z P(Y \mid Z) P(X \mid Z)\, dZ.$$
Suppose $(X_i, Y_i, Z_i) \overset{\text{i.i.d.}}{\sim} (X, Y, Z)$, $i = 1, \dots, n$. For subject $i$, let
$$p^j_i = P(Y_i = j \mid Z_i = z_i), \quad j = 1, \dots, s,$$
$$q^l_i = P(X_i = l \mid Z_i = z_i), \quad l = 1, \dots, t.$$

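27 Notation and model fitting: Estimation of $P$ and $P_0$
Under the null, we model $P(Y \mid Z)$ and $P(X \mid Z)$ separately:
$$\operatorname{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y, \quad (1)$$
$$\operatorname{logit}[P(X \le l \mid Z)] = \alpha^X_l + Z\beta^X, \quad (2)$$
and obtain estimates $\hat p^j_i$ and $\hat q^l_i$.
$\hat P_0 = \{\hat\pi^0_{jl}\}$, where $\hat\pi^0_{jl} = \frac{1}{n} \sum_i \hat p^j_i \hat q^l_i$.
$\hat P = \{\hat\pi_{jl}\}$, where $\hat\pi_{jl} = n_{jl}/n$ and $n_{jl} = \#(Y = j, X = l)$.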
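Given fitted probabilities from models (1) and (2), both tables reduce to a few array operations. The Python sketch below assumes the fitted probabilities are already available as arrays with one row per subject (fitting the proportional odds models themselves is not shown), and that levels are coded 1, 2, ...; the function names and the toy inputs are hypothetical.

```python
import numpy as np

def expected_table(p_hat, q_hat):
    """P0_hat[j, l] = (1/n) * sum_i p_hat[i, j] * q_hat[i, l]."""
    p_hat = np.asarray(p_hat, dtype=float)   # shape (n, s)
    q_hat = np.asarray(q_hat, dtype=float)   # shape (n, t)
    return p_hat.T @ q_hat / p_hat.shape[0]

def observed_table(y, x, s, t):
    """P_hat[j, l] = n_jl / n, with levels coded 1..s for y and 1..t for x."""
    counts = np.zeros((s, t))
    np.add.at(counts, (np.asarray(y) - 1, np.asarray(x) - 1), 1)
    return counts / len(y)

# Toy inputs for two subjects, two levels each (illustration only).
p_hat = np.array([[0.5, 0.5], [0.2, 0.8]])
q_hat = np.array([[0.1, 0.9], [0.6, 0.4]])
P0_hat = expected_table(p_hat, q_hat)
P_hat = observed_table([1, 2, 2], [1, 1, 2], s=2, t=2)
```

Note that `P0_hat` averages subject-specific product tables, so it is generally not the product of its own margins.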

28 Method 1: Observed versus expected (Estimation of $P$ and $P_0$)
For subject $i$, the null joint distribution of $(Y, X)$ given $Z_i$ is the outer product of the fitted marginals, with $(j, l)$ cell $\hat p^j_i \hat q^l_i$ (here $j = 1, \dots, 4$ and $l = 1, 2, 3$).
Summing these tables over subjects gives the expected counts, which are compared with the observed counts:
$$n\hat P_0 = \Big[ \sum_i \hat p^j_i \hat q^l_i \Big]_{j,l}, \qquad n\hat P = \big[ n_{jl} \big]_{j,l}.$$

29 Method 1: Observed versus Expected
We compare the observed joint distribution $P$ with the expected distribution $P_0$ under the null.
1. Summarize the two distributions separately by calculating Goodman and Kruskal's gamma, $\Gamma_1 = \Gamma(\hat P)$ and $\Gamma_0 = \Gamma(\hat P_0)$.
2. Test 1: $T_1 = \Gamma_1 - \Gamma_0$.
Note: A direct goodness-of-fit approach that involves calculating a statistic of the form $\sum_{j,l} (O - E)^2/E$ ignores the order information.
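A minimal Python sketch of Test 1 follows, assuming the two tables $\hat P$ and $\hat P_0$ have already been estimated. The function names and the example tables are hypothetical and chosen so that the result is easy to verify by hand.

```python
import numpy as np

def gk_gamma(p):
    """Goodman and Kruskal's gamma of a table of probabilities (or counts)."""
    p = np.asarray(p, dtype=float)
    conc = disc = 0.0
    for j in range(p.shape[0]):
        for l in range(p.shape[1]):
            conc += p[j, l] * p[j + 1:, l + 1:].sum()
            disc += p[j, l] * p[j + 1:, :l].sum()
    return (conc - disc) / (conc + disc)

def t1_statistic(p_obs, p_null):
    """T1 = Gamma(P_hat) - Gamma(P0_hat)."""
    return gk_gamma(p_obs) - gk_gamma(p_null)

# Hypothetical tables: a perfectly concordant observed table and a product
# (independence) table standing in for the estimated null table.
p_obs = np.diag([0.3, 0.3, 0.4])
p_null = np.outer([0.3, 0.3, 0.4], [0.2, 0.5, 0.3])
t1 = t1_statistic(p_obs, p_null)
```

A product table has gamma equal to zero, but the estimated null table is an average of subject-specific products and can have non-zero gamma when both variables depend on $Z$. Subtracting $\Gamma_0$ accounts for that.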

31 Method 2: Residual-based Test Statistics
Residual in linear regression: $Y = \beta_0 + \gamma_1 Z_1 + \cdots + \gamma_k Z_k$.
For a subject with $Y = y$ and $Z = z$, we first obtain the fitted value, $\hat y = E(Y \mid z)$, and then calculate the residual as $y - \hat y$.
In addition to the fitted value, a linear regression model also yields a distribution of possible outcome values, say $Y_{fit} \sim Y \mid z$.
The residual is $y - \hat y = y - E(Y_{fit}) = E(y - Y_{fit})$.
This is easy to understand from a Bayesian perspective.

32 Method 2: Residual-based test statistics (Residual as a Distribution)
[Figure: two density plots relating the observed $y$ to the fitted distribution $y_{fit}$.]

33 Method 2: Residual-based test statistics (Residuals for Proportional Odds Models)
In the model for $P(Y \mid Z)$, $\operatorname{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y$, the fitted distribution $Y_{i,fit} \sim Y \mid z_i \sim \{\hat p^j_i\}$ is multinomial.
We cannot calculate $y_i - Y_{i,fit}$, but we can evaluate whether $y_i$ is at a higher or lower level than $Y_{i,fit}$:
1. $p_{i,high} = P(y_i > Y_{i,fit}) = \sum_{j < y_i} \hat p^j_i$ (scored as $+1$)
2. $p_{i,low} = P(y_i < Y_{i,fit}) = \sum_{j > y_i} \hat p^j_i$ (scored as $-1$)
3. $P(y_i = Y_{i,fit}) = \hat p^{y_i}_i$ (scored as $0$)
The residual is defined as the expected score, $Y_{i,res} = p_{i,high} - p_{i,low}$.
For model (2), we have $q_{i,high}$, $q_{i,low}$, and $X_{i,res} = q_{i,high} - q_{i,low}$.
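The residual definition translates directly into code. The Python sketch below assumes the fitted category probabilities are given as an array with one row per subject and that levels are coded 1, 2, ...; the function name and the toy probabilities are hypothetical.

```python
import numpy as np

def po_residuals(y, p_hat):
    """Residual_i = P(y_i > Y_fit) - P(y_i < Y_fit)
                  = sum_{j < y_i} p_hat[i, j] - sum_{j > y_i} p_hat[i, j]."""
    y = np.asarray(y)
    p_hat = np.asarray(p_hat, dtype=float)
    levels = np.arange(1, p_hat.shape[1] + 1)
    p_high = (p_hat * (levels[None, :] < y[:, None])).sum(axis=1)
    p_low = (p_hat * (levels[None, :] > y[:, None])).sum(axis=1)
    return p_high - p_low

# One set of fitted probabilities, evaluated at each possible observed level.
probs = np.array([0.2, 0.3, 0.5])
res = po_residuals([1, 2, 3], np.tile(probs, (3, 1)))
expected_residual = np.sum(probs * res)   # average over Y drawn from the fitted model
```

The residual lies between $-1$ and $1$, and its expectation under the fitted distribution is zero, mirroring the mean-zero property of linear regression residuals. Test 2 then needs only the correlation of the two residual vectors, for example `np.corrcoef(y_res, x_res)[0, 1]`.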

34 Method 2: Residual-based test statistics (Residuals for Proportional Odds Models)
[Figure: fitted distribution split into regions scored lower ($-1$), tie ($0$), and higher ($+1$), with the resulting residual.]

35 Method 2: Residual-based test statistics (Residual-based Statistics)
Test 2: $T_2 = \operatorname{corr}(Y_{res}, X_{res})$.

36 Method 3: $(Y_i, X_i)$ versus $(Y, X) \mid Z_i$
For each subject, compare the observed value of $(Y_i, X_i)$ with all possible values of $(Y, X) \mid Z_i \sim \{p^j_i q^l_i\}$ under the null.
Consider drawing $(Y^*_i, X^*_i)$ randomly from $(Y, X) \mid Z_i$, and scoring:
- concordance, if $Y_i > Y^*_i$ and $X_i > X^*_i$ (or $Y_i < Y^*_i$ and $X_i < X^*_i$)
- discordance, if $Y_i > Y^*_i$ and $X_i < X^*_i$ (or $Y_i < Y^*_i$ and $X_i > X^*_i$)
- tie, otherwise

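37 Method 3: Compare $(Y_i, X_i)$ with $(Y, X) \mid Z_i$
Illustration with observed $Y_i = 3$ and $X_i = 2$. The cells $\hat p^j_i \hat q^l_i$ ($j = 1, \dots, 4$; $l = 1, 2, 3$) are grouped by how the observed pair compares with them (C = concordance, D = discordance):
$$\begin{array}{l|ccc} & q_{i,high}\ (\hat q^1_i) & \text{tie}\ (\hat q^2_i) & q_{i,low}\ (\hat q^3_i) \\ \hline p_{i,high}\ (\hat p^1_i + \hat p^2_i) & C & \text{tie} & D \\ \text{tie}\ (\hat p^3_i) & \text{tie} & \text{tie} & \text{tie} \\ p_{i,low}\ (\hat p^4_i) & D & \text{tie} & C \end{array}$$
Pr(concordance) is estimated as $\hat C_i = p_{i,high}\, q_{i,high} + p_{i,low}\, q_{i,low}$.
Pr(discordance) is estimated as $\hat D_i = p_{i,high}\, q_{i,low} + p_{i,low}\, q_{i,high}$.
Under the null, $E(\hat C_i) = E(\hat D_i)$.
Test 3: $T_3 = \frac{1}{n} \sum_i (\hat C_i - \hat D_i)$.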
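Test 3 can be sketched in Python as follows, again assuming fitted probabilities are supplied as arrays with one row per subject and levels are coded 1, 2, .... The function names and the single-subject example are hypothetical; the example mirrors the illustration with $Y_i = 3$ and $X_i = 2$.

```python
import numpy as np

def high_low(obs, prob):
    """P(obs > fitted) and P(obs < fitted) for each subject."""
    obs = np.asarray(obs)
    prob = np.asarray(prob, dtype=float)
    levels = np.arange(1, prob.shape[1] + 1)
    high = (prob * (levels[None, :] < obs[:, None])).sum(axis=1)
    low = (prob * (levels[None, :] > obs[:, None])).sum(axis=1)
    return high, low

def t3_statistic(y, x, p_hat, q_hat):
    """T3 = mean over subjects of (C_hat_i - D_hat_i)."""
    p_high, p_low = high_low(y, p_hat)
    q_high, q_low = high_low(x, q_hat)
    c_hat = p_high * q_high + p_low * q_low
    d_hat = p_high * q_low + p_low * q_high
    return np.mean(c_hat - d_hat)

# One subject with Y_i = 3 (of 4 levels) and X_i = 2 (of 3 levels).
p_hat = np.array([[0.1, 0.2, 0.3, 0.4]])
q_hat = np.array([[0.2, 0.5, 0.3]])
t3 = t3_statistic([3], [2], p_hat, q_hat)
```

Expanding the products shows that $\hat C_i - \hat D_i = (p_{i,high} - p_{i,low})(q_{i,high} - q_{i,low})$, so $T_3$ is the average product of the two residuals from Method 2.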

38 Simulations: Methods Compared
We compared our approach with proportional odds models with:
- $X$ continuous, and $H_0: \eta = 0$:
$$\operatorname{logit}[P(Y \le j \mid X, Z)] = \alpha^Y_j + Z\beta^Y + \eta X$$
- $X$ categorical, and $H_0: \eta_2 = \cdots = \eta_t = 0$:
$$\operatorname{logit}[P(Y \le j \mid X, Z)] = \alpha^Y_j + Z\beta^Y + \eta_2 I_{\{X=2\}} + \cdots + \eta_t I_{\{X=t\}}$$
- isotonic proportional odds regression
- $X$ transformed using restricted cubic splines with three pre-selected knots

39 Simulation Setup
Data simulation steps:
1. Generate $Z \sim N(0, 1)$.
2. Generate $X$ (5 levels) using model (2) with $\alpha^X = (-1, 0, 1, 2)$ and $\beta^X = 1$.
3. Generate $Y$ (4 levels) using the proportional odds model
$$\operatorname{logit}[P(Y \le j \mid X, Z)] = \alpha^Y_j + Z\beta^Y + \eta_1 I_{\{X=1\}} + \cdots + \eta_t I_{\{X=t\}}$$
with $\alpha^Y = (-1, 0, 1)$ and $\beta^Y = 0.5$.
We considered 4 scenarios for $\eta = (\eta_1, \dots, \eta_t)$:
1. $\eta = (0, 0, 0, 0, 0)$ (the null)
2. $\eta = (-0.4, -0.2, 0, 0.2, 0.4)$ (linear)
3. $\eta = (-0.30, 0.18, 0.20, 0.22, 0.24)$ (monotonic non-linear)
4. $\eta = (-0.2, 0, 0.2, 0, -0.2)$ (non-monotonic)
For each scenario, we generated 10,000 data sets, each with 500 subjects.
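The three data-generating steps can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, the cumulative form $P(\cdot \le j)$ is assumed for both models, and the default parameter values are read from the slide (where some minus signs may have been lost in transcription), so they should be checked against the original source.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def draw_ordinal(rng, alpha, linpred):
    """Draw levels 1..len(alpha)+1 with logit P(level <= j) = alpha_j + linpred."""
    cum = expit(np.asarray(alpha, dtype=float)[None, :] + linpred[:, None])
    u = rng.uniform(size=linpred.size)
    # The level is 1 plus the number of cumulative probabilities below u.
    return 1 + (u[:, None] > cum).sum(axis=1)

def simulate(rng, n, eta, alpha_x=(-1, 0, 1, 2), beta_x=1.0,
             alpha_y=(-1, 0, 1), beta_y=0.5):
    """One simulated data set (z, x, y); parameter defaults are assumptions."""
    z = rng.normal(size=n)                                   # step 1
    x = draw_ordinal(rng, alpha_x, beta_x * z)               # step 2: 5 levels
    eta = np.asarray(eta, dtype=float)
    y = draw_ordinal(rng, alpha_y, beta_y * z + eta[x - 1])  # step 3: 4 levels
    return z, x, y

rng = np.random.default_rng(2009)
z, x, y = simulate(rng, n=500, eta=(0, 0, 0, 0, 0))   # the null scenario
```

Repeating the call 10,000 times per scenario and applying each test to every data set would give the rejection rates reported in the simulation study.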

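40 Simulation Results: Type I Error and Power (%)
[Table: rows are analysis methods (our method $T_1$ with empirical and asymptotic p-values; $X$ linear; $X$ categorical; isotonic; splines); columns are simulation scenarios (null, linear, non-linear, non-monotonic). Numerical values are not recoverable.]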

41 Anisometropic Amblyopia Data Analysis
[Figure with axes: our method, log10(p); OLS, log10(p).]

42 Limitations
1. We focus on hypothesis testing, not estimation.
2. We make an assumption of no interaction between $X$ and $Z$.
3. We make a proportional odds assumption on $X$ over $Z$.
4. Our method doesn't help if one is interested in the effect of $Z$ on ordinal $Y$ after adjusting for ordinal $X$.

43 P-value via Empirical Distribution
Let $T$ be one of the three test statistics. We simulate replicate data sets under the null. Repeat the following $N_{emp}$ times:
1. Generate one observation from $\{\hat p^j_i \hat q^l_i\}$ for each subject ($i = 1, \dots, n$).
2. Carry out the entire estimating procedure using the newly generated replicate data set to obtain $T^*$.
The two-sided p-value is then computed as either $\#(|T^*| \ge |T|)/N_{emp}$ or $2 \min\{\#(T^* \le T), \#(T^* \ge T)\}/N_{emp}$.
This procedure is essentially a parametric bootstrap procedure.
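The two pieces of this procedure that do not depend on model fitting can be sketched in Python: drawing one replicate from the fitted null distributions, and turning a set of replicate statistics into the two p-values. The function names are hypothetical, the replicate statistics in the example are made-up numbers, and the re-estimation in step 2 (refitting both proportional odds models on each replicate) is not shown.

```python
import numpy as np

def draw_replicate(rng, p_hat, q_hat):
    """Step 1: draw (Y*_i, X*_i) independently from the fitted null distributions."""
    def draw(prob):
        cum = np.cumsum(np.asarray(prob, dtype=float), axis=1)
        u = rng.uniform(size=cum.shape[0])
        return 1 + (u[:, None] > cum[:, :-1]).sum(axis=1)   # levels coded 1..s
    return draw(p_hat), draw(q_hat)

def empirical_p_values(t_obs, t_rep):
    """Both two-sided p-values from replicate statistics t_rep."""
    t_rep = np.asarray(t_rep, dtype=float)
    n_emp = t_rep.size
    p_abs = np.sum(np.abs(t_rep) >= abs(t_obs)) / n_emp
    p_tail = 2 * min(np.sum(t_rep <= t_obs), np.sum(t_rep >= t_obs)) / n_emp
    return p_abs, p_tail

# Made-up replicate statistics, for illustration only.
p_abs, p_tail = empirical_p_values(2.0, [-3.0, -1.0, 0.0, 1.0, 2.5])

# Degenerate fitted probabilities make the draw predictable.
rng = np.random.default_rng(1)
y_star, x_star = draw_replicate(rng, [[0.0, 1.0, 0.0]], [[1.0, 0.0]])
```

The first p-value assumes the null distribution of $T$ is roughly symmetric about zero; the second does not, but can exceed 1 when the observed statistic sits near the middle of the replicates.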

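44 M-Estimation Theory
Consider a parameter vector $\theta$ of length $p$, whose estimate $\hat\theta$ is obtained by solving $\sum_i \Psi_i(\theta) = 0$, where $\Psi_i(\theta) = \Psi(Y_i, X_i, Z_i; \theta)$ is a $p$-variate function that satisfies $E_\theta[\Psi_i(\theta)] = 0$.
From M-estimation theory, if $\Psi$ is suitably smooth, then
$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, V(\theta)),$$
where $V(\theta) = A(\theta)^{-1} B(\theta) [A(\theta)^{-1}]^\top$, $A(\theta) = E\left[-\frac{\partial}{\partial\theta} \Psi_i(\theta)\right]$, and $B(\theta) = E[\Psi_i(\theta) \Psi_i(\theta)^\top]$.
If $T = g(\hat\theta)$ is a smooth function of $\hat\theta$, then
$$\sqrt{n}[g(\hat\theta) - g(\theta)] \xrightarrow{d} N(0, \sigma^2),$$
where $\sigma^2 = \left[\frac{\partial}{\partial\theta} g(\theta)\right] V(\theta) \left[\frac{\partial}{\partial\theta} g(\theta)\right]^\top$.
If $g(\theta) = 0$ under the null, then the p-value is $2\Phi\left(-\dfrac{|T|}{\sigma/\sqrt{n}}\right)$.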

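45 P-value via Estimating Equations
For all three statistics, $\theta = (\theta_1, \theta_2, \theta_3)$, where $\theta_1 = (\alpha^Y, \beta^Y)$, $\theta_2 = (\alpha^X, \beta^X)$, and $\theta_3$ is different for each statistic. The corresponding estimating function $\Psi_i(\theta)$ will have the form
$$\Psi_i(\theta) = \begin{pmatrix} \frac{\partial}{\partial\theta_1} l_1(Y_i, Z_i; \theta_1) \\ \frac{\partial}{\partial\theta_2} l_2(X_i, Z_i; \theta_2) \\ \psi(Y_i, X_i, Z_i; \theta_3) \end{pmatrix},$$
where $l_1$ and $l_2$ are the log-likelihood functions of the proportional odds models (1) and (2), respectively. The first two components are score functions, and thus $E_\theta\left[\frac{\partial}{\partial\theta_1} l_1(Y_i, Z_i; \theta_1)\right] = 0$ and $E_\theta\left[\frac{\partial}{\partial\theta_2} l_2(X_i, Z_i; \theta_2)\right] = 0$.
The function $\psi(Y_i, X_i, Z_i; \theta_3)$ will be different for each statistic.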

46 Outline
1. Introduction
   - Examples
   - Problems in regression
   - tau and gamma
2. Our Approach
   - Motivation
   - Proportional odds models
   - Notation and model fitting
   - Method 1: Observed versus expected
   - Method 2: Residual-based test statistics
   - Method 3: $(Y_i, X_i)$ versus $(Y, X) \mid Z_i$
3. Simulations
4. Example Data Analysis


More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

arxiv: v1 [stat.ap] 1 Mar 2018

arxiv: v1 [stat.ap] 1 Mar 2018 Probability-Scale Residuals in HIV/AIDS Research: Diagnostics and Inference arxiv:1803.00200v1 [stat.ap] 1 Mar 2018 Bryan E. Shepherd 1, Qi Liu 2, Valentine Wanga 3, Chun Li 4 1 Department of Biostatistics,

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Generalized Linear Models and Exponential Families

Generalized Linear Models and Exponential Families Generalized Linear Models and Exponential Families David M. Blei COS424 Princeton University April 12, 2012 Generalized Linear Models x n y n β Linear regression and logistic regression are both linear

More information

Analysing geoadditive regression data: a mixed model approach

Analysing geoadditive regression data: a mixed model approach Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005 Spatio-temporal regression

More information

Homework 1 Solutions

Homework 1 Solutions 36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Modelling geoadditive survival data

Modelling geoadditive survival data Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

More information

Logistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014

Logistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014 Logistic Regression Advanced Methods for Data Analysis (36-402/36-608 Spring 204 Classification. Introduction to classification Classification, like regression, is a predictive task, but one in which the

More information

Exercises. (a) Prove that m(t) =

Exercises. (a) Prove that m(t) = Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for

More information

Recitation 5. Inference and Power Calculations. Yiqing Xu. March 7, 2014 MIT

Recitation 5. Inference and Power Calculations. Yiqing Xu. March 7, 2014 MIT 17.802 Recitation 5 Inference and Power Calculations Yiqing Xu MIT March 7, 2014 1 Inference of Frequentists 2 Power Calculations Inference (mostly MHE Ch8) Inference in Asymptopia (and with Weak Null)

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

General Regression Model

General Regression Model Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical

More information

Data-analysis and Retrieval Ordinal Classification

Data-analysis and Retrieval Ordinal Classification Data-analysis and Retrieval Ordinal Classification Ad Feelders Universiteit Utrecht Data-analysis and Retrieval 1 / 30 Strongly disagree Ordinal Classification 1 2 3 4 5 0% (0) 10.5% (2) 21.1% (4) 42.1%

More information

Generalized Additive Models

Generalized Additive Models Generalized Additive Models The Model The GLM is: g( µ) = ß 0 + ß 1 x 1 + ß 2 x 2 +... + ß k x k The generalization to the GAM is: g(µ) = ß 0 + f 1 (x 1 ) + f 2 (x 2 ) +... + f k (x k ) where the functions

More information

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Chapter 14 Logistic and Poisson Regressions

Chapter 14 Logistic and Poisson Regressions STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y =

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

A class of latent marginal models for capture-recapture data with continuous covariates

A class of latent marginal models for capture-recapture data with continuous covariates A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Statistics: A review. Why statistics?

Statistics: A review. Why statistics? Statistics: A review Why statistics? What statistical concepts should we know? Why statistics? To summarize, to explore, to look for relations, to predict What kinds of data exist? Nominal, Ordinal, Interval

More information

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014

Nemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014 Nemours Biomedical Research Statistics Course Li Xie Nemours Biostatistics Core October 14, 2014 Outline Recap Introduction to Logistic Regression Recap Descriptive statistics Variable type Example of

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

Statistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann

Statistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann Statistics of Contingency Tables - Extension to I x J stat 557 Heike Hofmann Outline Testing Independence Local Odds Ratios Concordance & Discordance Intro to GLMs Simpson s paradox Simpson s paradox:

More information

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Part III Measures of Classification Accuracy for the Prediction of Survival Times Part III Measures of Classification Accuracy for the Prediction of Survival Times Patrick J Heagerty PhD Department of Biostatistics University of Washington 102 ISCB 2010 Session Three Outline Examples

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Robust negative binomial regression

Robust negative binomial regression Robust negative binomial regression Michel Amiguet, Alfio Marazzi, Marina S. Valdora, Víctor J. Yohai september 2016 Negative Binomial Regression Poisson regression is the standard method used to model

More information