Test of Association between Two Ordinal Variables while Adjusting for Covariates
1 Test of Association between Two Ordinal Variables while Adjusting for Covariates Chun Li, Bryan Shepherd Department of Biostatistics Vanderbilt University May 13, 2009
2 Examples Amblyopia
3 Examples Anisometropic Amblyopia Anisometropia: unequal refractive error between the two eyes. Anisometropia was detected in 974 preschool children in a photoscreening program (Leon et al. 2008). Variables collected: anisometropia magnitude (X) (0, A, B, C); visual acuity, used to define amblyopia level (Y) (0-3); age (Z) (range 0-6). Goal: Test the association between anisometropia and amblyopia while adjusting for the effect of age.
4 Examples Anisometropic Amblyopia [Figure: proportion of amblyopia levels by anisometropia level (ABS SE), with categories A, B, C marked and the number of subjects shown.]
5 Examples CIN Stage Cervical specimens were collected for 303 non-pregnant HIV-infected women in Lusaka, Zambia (Parham et al. 2006). Variables collected: condom use (X) (never, rarely, almost always, and always); stage of cervical intraepithelial neoplastic (CIN) lesions (Y) (5 levels); other factors such as age, education, and number of sexual partners (Z). Goal: Test for association between condom use and CIN stage, controlling for other factors that may be associated with both.
7 Problems in regression Regression: connect an outcome variable with input variable(s).
Outcome | Type of regression
continuous | linear regression
binary | logistic regression
count | Poisson regression (log-linear model)
ordinal | ordinal logistic regression (proportional odds model)
survival | Cox regression (proportional hazards model)
For all these regression analyses, the right-hand side is $\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$. What do we do when $X_1$ is ordered categorical?
9 Problems in regression When Ordinal $X_1$ is Treated as Continuous, we assume the effect of moving from level 1 to level 2 is the same as that from level 2 to level 3. Often this is unreasonable. One could assign numbers to the categories so that the assigned values reflect a linear relationship with the outcome. Such a transformation is difficult to choose and may lead to data dredging. Splines also have drawbacks: uncertainty in the number and locations of knots; dependence on how the categories are coded; non-monotonic results when monotonicity is expected; difficulty when there are only three categories.
11 Problems in regression When Ordinal $X_1$ is Treated as Categorical, the order information is ignored. The test may have low power due to high degrees of freedom, and the effect estimates may be non-monotonic. Isotonic regression: Fit the regression treating $X_1$ as categorical. If the coefficients are not monotonic, combine adjacent categories that run against the general trend and re-fit the regression. The grouping is data driven, and the results need adjustment for this source of model selection variability. The degrees of freedom may still be high.
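14 tau and gamma Kendall's tau and Goodman and Kruskal's gamma
Counts $n_{jl}$ for Y = j (rows) and X = l (columns):
$$\begin{array}{c|ccc} & X=1 & X=2 & X=3 \\ \hline Y=1 & n_{11} & n_{12} & n_{13} \\ Y=2 & n_{21} & n_{22} & n_{23} \\ Y=3 & n_{31} & n_{32} & n_{33} \\ Y=4 & n_{41} & n_{42} & n_{43} \end{array}$$
Number of concordant pairs: $C = \sum_{j_1<j_2,\; l_1<l_2} n_{j_1 l_1} n_{j_2 l_2}$. Number of discordant pairs: $D = \sum_{j_1<j_2,\; l_1>l_2} n_{j_1 l_1} n_{j_2 l_2}$.
Kendall's tau: $\tau = \dfrac{C - D}{n(n-1)/2}$. Goodman and Kruskal's gamma: $\gamma = \dfrac{C - D}{C + D}$. This is the same as Somers' d for two groups with no ties in predicted probabilities.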
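The pair counts above can be sketched in a few lines. This is a minimal illustration, assuming the table is stored as a NumPy array with rows indexed by Y and columns by X; the function names and the example counts are hypothetical and not from the slides.

```python
import numpy as np

def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in a two-way table."""
    n = np.asarray(table, dtype=float)
    C = D = 0.0
    for j in range(n.shape[0]):
        for l in range(n.shape[1]):
            # cells with a larger row index AND a larger column index
            C += n[j, l] * n[j + 1:, l + 1:].sum()
            # cells with a larger row index but a smaller column index
            D += n[j, l] * n[j + 1:, :l].sum()
    return C, D

def kendall_tau(table):
    """tau = (C - D) / (n(n-1)/2), the form used on the slide."""
    C, D = concordant_discordant(table)
    total = float(np.sum(table))
    return (C - D) / (total * (total - 1) / 2)

def gk_gamma(table):
    """Goodman and Kruskal's gamma = (C - D) / (C + D)."""
    C, D = concordant_discordant(table)
    return (C - D) / (C + D)

# Made-up 4 x 3 table of counts, for illustration only
counts = np.array([[10, 5, 2],
                   [6, 8, 4],
                   [3, 7, 9],
                   [1, 4, 12]])
print(kendall_tau(counts), gk_gamma(counts))
```

Because gamma divides by C + D rather than by all pairs, it ignores tied pairs, which is why it is at least as large in magnitude as tau on the same table.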
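15 tau and gamma Kendall's tau and Goodman and Kruskal's gamma
Cell probabilities $\pi_{jl}$ for Y = j (rows) and X = l (columns):
$$\begin{array}{c|ccc} & X=1 & X=2 & X=3 \\ \hline Y=1 & \pi_{11} & \pi_{12} & \pi_{13} \\ Y=2 & \pi_{21} & \pi_{22} & \pi_{23} \\ Y=3 & \pi_{31} & \pi_{32} & \pi_{33} \\ Y=4 & \pi_{41} & \pi_{42} & \pi_{43} \end{array}$$
Goodman and Kruskal's gamma: $\gamma = \dfrac{\pi_C - \pi_D}{\pi_C + \pi_D}$. Concordance probability: $\pi_C = \sum_{j_1<j_2,\; l_1<l_2} \pi_{j_1 l_1} \pi_{j_2 l_2}$. Discordance probability: $\pi_D = \sum_{j_1<j_2,\; l_1>l_2} \pi_{j_1 l_1} \pi_{j_2 l_2}$.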
17 tau and gamma Kendall's Partial tau
$$\tau_{xy,z} = \frac{\tau_{xy} - \tau_{xz}\tau_{yz}}{\sqrt{(1-\tau_{xz}^2)(1-\tau_{yz}^2)}}.$$
But $\tau_{xy,z} \neq 0$ under the null of $H_0: Y \perp X \mid Z$. In general, for $(X, Y, Z) \sim N(0, \Sigma)$, we can similarly define
$$r_{xy,z} = \frac{r_{xy} - r_{xz} r_{yz}}{\sqrt{(1-r_{xz}^2)(1-r_{yz}^2)}}.$$
When $X = Z + e_X$, $Y = Z + e_Y$, we have $0 < r_{xz} = r_{yz} = \sqrt{r_{xy}} < 1$ and $r_{xy,z} = 0$.
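19 tau and gamma Davis's Partial gamma
For each stratum $Z = i$ ($i = 1, \ldots, k$) there is a table of cell probabilities:
$$\begin{pmatrix} \pi_{11,i} & \pi_{12,i} & \pi_{13,i} \\ \pi_{21,i} & \pi_{22,i} & \pi_{23,i} \\ \pi_{31,i} & \pi_{32,i} & \pi_{33,i} \\ \pi_{41,i} & \pi_{42,i} & \pi_{43,i} \end{pmatrix}, \quad i = 1, \ldots, k.$$
Stratum-specific γ: $\gamma_1 = \dfrac{\pi_{C,1} - \pi_{D,1}}{\pi_{C,1} + \pi_{D,1}}, \;\ldots,\; \gamma_k = \dfrac{\pi_{C,k} - \pi_{D,k}}{\pi_{C,k} + \pi_{D,k}}$.
Let $\pi_C = \sum_i \pi_{C,i}$ and $\pi_D = \sum_i \pi_{D,i}$. Davis's partial gamma:
$$\gamma = \frac{\pi_C - \pi_D}{\pi_C + \pi_D} = \sum_{i=1}^{k} \left( \frac{\pi_{C,i} + \pi_{D,i}}{\pi_C + \pi_D} \right) \gamma_i.$$
Requires stratification on continuous or multivariable Z.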
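As a check on the weighted-average identity above, here is a small sketch. It assumes each stratum is given as its own table (counts work as well as joint probabilities, since the common factor cancels in the ratio); the function names and the two example strata are hypothetical.

```python
import numpy as np

def conc_disc(pi):
    """Concordance and discordance totals for one stratum's table."""
    pi = np.asarray(pi, dtype=float)
    C = D = 0.0
    for j in range(pi.shape[0]):
        for l in range(pi.shape[1]):
            C += pi[j, l] * pi[j + 1:, l + 1:].sum()
            D += pi[j, l] * pi[j + 1:, :l].sum()
    return C, D

def partial_gamma(strata):
    """Davis's partial gamma: pool C and D over strata, then form the ratio."""
    pairs = [conc_disc(pi) for pi in strata]
    pC = sum(c for c, _ in pairs)
    pD = sum(d for _, d in pairs)
    return (pC - pD) / (pC + pD)

def weighted_average_form(strata):
    """Same quantity written as a weighted average of stratum-specific gammas."""
    pairs = [conc_disc(pi) for pi in strata]
    total = sum(c + d for c, d in pairs)
    return sum((c + d) / total * (c - d) / (c + d) for c, d in pairs)

# Two made-up 2 x 2 strata, for illustration only
strata = [[[3, 1], [2, 4]], [[1, 2], [2, 1]]]
print(partial_gamma(strata), weighted_average_form(strata))
```

The two functions agree by construction, which makes the drawback on the slide concrete: the statistic is only defined once Z has been cut into strata.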
20 Motivation Motivation of Our Approach In linear regression, $Y = \beta_0 + \beta_1 X + \gamma_1 Z_1 + \cdots + \gamma_k Z_k$, we test for $H_0: \beta_1 = 0$. Alternatively, we could: 1 fit a linear regression of Y on Z and obtain $Y_{res}$, 2 fit a linear regression of X on Z and obtain $X_{res}$, 3 fit $Y_{res} = \alpha_0 + \alpha_1 X_{res}$ and test for $H_0: \alpha_1 = 0$. Then $\hat\beta_1 = \hat\alpha_1$, and their significance levels are similar if $n \gg k$.
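The three-step alternative can be checked numerically. The sketch below uses simulated data with made-up coefficients (none of the numbers come from the slides) and ordinary least squares via NumPy; the helper `ols` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3

# Simulated data with arbitrary coefficients, for illustration only
Z = rng.normal(size=(n, k))
X = Z @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
Y = 1.0 + 0.4 * X + Z @ np.array([0.3, 0.1, -0.2]) + rng.normal(size=n)

def ols(design, response):
    """Least-squares coefficients for response ~ design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

ones = np.ones((n, 1))
ZI = np.hstack([ones, Z])                      # intercept + covariates

# Full model: coefficient on X in Y ~ 1 + X + Z
beta_hat = ols(np.hstack([ones, X[:, None], Z]), Y)[1]

# Steps 1-3: residualize Y and X on Z, then regress residual on residual
y_res = Y - ZI @ ols(ZI, Y)
x_res = X - ZI @ ols(ZI, X)
alpha_hat = ols(np.hstack([ones, x_res[:, None]]), y_res)[1]

print(beta_hat, alpha_hat)
```

The two slope estimates coincide up to floating-point error; only the residual degrees of freedom differ between the two fits, which is why the significance levels are close when n is much larger than k.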
21 Motivation Correlation of Residuals in Linear Regression [Figure: panels showing y with E(y | z), the residual y − E(y | z), x with E(x | z), and the residual x − E(x | z).]
23 Motivation Our Approach 1 Fit a proportional odds model of Y on Z, 2 Fit a proportional odds model of X on Z, 3 Construct test statistics. Question: What is a residual for proportional odds models?
24 Proportional odds models Proportional Odds Models Suppose Y has levels $1 <_Y \cdots <_Y s$. For $j = 1, \ldots, s-1$,
$$\mathrm{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y.$$
1 For each level j, it is a logistic regression model with intercept $\alpha^Y_j$ and coefficient $\beta^Y$. 2 For any level j, the log odds ratio between two subjects with covariates $z_2$ and $z_1$ is $\mathrm{logit}[P(Y \le j \mid z_2)] - \mathrm{logit}[P(Y \le j \mid z_1)] = (z_2 - z_1)\beta^Y$. That is, $\mathrm{odds}_2/\mathrm{odds}_1 = e^{z_2\beta^Y}/e^{z_1\beta^Y}$ for ANY j.
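To make the model concrete, the following sketch turns intercepts and a coefficient into category probabilities P(Y = j | z) by differencing the cumulative probabilities. The function name `po_probs` and the example parameter values are assumptions chosen for illustration.

```python
import numpy as np

def po_probs(alpha, beta, z):
    """Category probabilities under logit P(Y <= j | z) = alpha_j + z @ beta.

    alpha: s-1 increasing intercepts; beta: p coefficients;
    z: n subjects (n scalars, or an n x p array).
    Returns an n x s array whose rows sum to 1.
    """
    alpha = np.asarray(alpha, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    lin = z @ np.atleast_1d(np.asarray(beta, dtype=float))
    cum = 1.0 / (1.0 + np.exp(-(alpha[None, :] + lin[:, None])))  # P(Y <= j | z)
    cum = np.hstack([np.zeros((len(z), 1)), cum, np.ones((len(z), 1))])
    return np.diff(cum, axis=1)                                   # P(Y = j | z)

# Two subjects with z = 0 and z = 1; parameter values are illustrative
probs = po_probs(alpha=[-1.0, 0.0, 1.0], beta=0.5, z=[0.0, 1.0])

# Log cumulative odds for each subject at each cut point j
cum = np.cumsum(probs, axis=1)[:, :-1]
log_odds = np.log(cum / (1.0 - cum))
print(log_odds[1] - log_odds[0])   # the same difference at every j
```

The printed differences are all equal to (z2 - z1) times beta, which is the "for ANY j" property stated on the slide.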
26 Notation and model fitting Notation Y has levels $1 <_Y \cdots <_Y s$; X has levels $1 <_X \cdots <_X t$. In general, their joint distribution is $P = P(Y, X) = \{\pi_{jl}\}$. Under the null, $P(Y, X \mid Z) = P(Y \mid Z)\,P(X \mid Z)$, and
$$P_0 = P_0(Y, X) = \int_z P(Y, X \mid Z)\,dZ = \int_z P(Y \mid Z)\,P(X \mid Z)\,dZ.$$
Suppose $(X_i, Y_i, Z_i) \overset{\text{i.i.d.}}{\sim} (X, Y, Z)$, $i = 1, \ldots, n$. For subject i, let $p_i^j = P(Y_i = j \mid Z_i = z_i)$, $(j = 1, \ldots, s)$, and $q_i^l = P(X_i = l \mid Z_i = z_i)$, $(l = 1, \ldots, t)$.
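27 Notation and model fitting Estimation of P and $P_0$ Under the null, we model $P(Y \mid Z)$ and $P(X \mid Z)$ separately:
$$\mathrm{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y, \quad (1)$$
$$\mathrm{logit}[P(X \le l \mid Z)] = \alpha^X_l + Z\beta^X, \quad (2)$$
and obtain estimates $\hat p_i^j$ and $\hat q_i^l$. $\hat P_0 = \{\hat\pi^0_{jl}\}$, where $\hat\pi^0_{jl} = \frac{1}{n}\sum_i \hat p_i^j \hat q_i^l$. $\hat P = \{\hat\pi_{jl}\}$, where $\hat\pi_{jl} = n_{jl}/n$ and $n_{jl} = \#(Y = j, X = l)$.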
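28 Method 1: Observed versus expected Estimation of P and $P_0$
For subject i, the fitted probabilities $\hat p_i^1, \ldots, \hat p_i^4$ and $\hat q_i^1, \hat q_i^2, \hat q_i^3$ give the table of products
$$\begin{pmatrix} \hat p_i^1 \hat q_i^1 & \hat p_i^1 \hat q_i^2 & \hat p_i^1 \hat q_i^3 \\ \hat p_i^2 \hat q_i^1 & \hat p_i^2 \hat q_i^2 & \hat p_i^2 \hat q_i^3 \\ \hat p_i^3 \hat q_i^1 & \hat p_i^3 \hat q_i^2 & \hat p_i^3 \hat q_i^3 \\ \hat p_i^4 \hat q_i^1 & \hat p_i^4 \hat q_i^2 & \hat p_i^4 \hat q_i^3 \end{pmatrix}.$$
Summing over subjects,
$$n\hat P_0 = \begin{pmatrix} \sum_i \hat p_i^1 \hat q_i^1 & \sum_i \hat p_i^1 \hat q_i^2 & \sum_i \hat p_i^1 \hat q_i^3 \\ \sum_i \hat p_i^2 \hat q_i^1 & \sum_i \hat p_i^2 \hat q_i^2 & \sum_i \hat p_i^2 \hat q_i^3 \\ \sum_i \hat p_i^3 \hat q_i^1 & \sum_i \hat p_i^3 \hat q_i^2 & \sum_i \hat p_i^3 \hat q_i^3 \\ \sum_i \hat p_i^4 \hat q_i^1 & \sum_i \hat p_i^4 \hat q_i^2 & \sum_i \hat p_i^4 \hat q_i^3 \end{pmatrix}, \qquad n\hat P = \begin{pmatrix} n_{11} & n_{12} & n_{13} \\ n_{21} & n_{22} & n_{23} \\ n_{31} & n_{32} & n_{33} \\ n_{41} & n_{42} & n_{43} \end{pmatrix}.$$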
29 Method 1: Observed versus expected Method 1: Observed versus Expected We compare the observed joint distribution $\hat P$ with the expected distribution $\hat P_0$ under the null. 1 Summarize the two distributions separately by calculating Goodman and Kruskal's gamma, $\Gamma_1 = \Gamma(\hat P)$ and $\Gamma_0 = \Gamma(\hat P_0)$. 2 Test 1: $T_1 = \Gamma_1 - \Gamma_0$. Note: A direct goodness-of-fit approach that calculates a statistic of the form $\sum_{j,l} (O - E)^2/E$ ignores the order information.
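A minimal sketch of Test 1 follows. It assumes the fitted probabilities from models (1) and (2) are already available as arrays `p_hat` (n x s) and `q_hat` (n x t), and that `y` and `x` hold observed levels coded 1, 2, ...; fitting the proportional odds models themselves is outside this sketch, and all names are hypothetical.

```python
import numpy as np

def gk_gamma(P):
    """Goodman and Kruskal's gamma of a two-way table of probabilities."""
    P = np.asarray(P, dtype=float)
    C = D = 0.0
    for j in range(P.shape[0]):
        for l in range(P.shape[1]):
            C += P[j, l] * P[j + 1:, l + 1:].sum()
            D += P[j, l] * P[j + 1:, :l].sum()
    return (C - D) / (C + D)

def t1_statistic(y, x, p_hat, q_hat):
    """T1 = Gamma(P_hat) - Gamma(P0_hat)."""
    y = np.asarray(y)
    x = np.asarray(x)
    n, s = p_hat.shape
    t = q_hat.shape[1]
    # Observed joint distribution: pi_hat_jl = n_jl / n
    P_obs = np.zeros((s, t))
    np.add.at(P_obs, (y - 1, x - 1), 1.0)
    P_obs /= n
    # Expected under the null: pi0_hat_jl = (1/n) sum_i p_hat_i^j q_hat_i^l
    P_null = p_hat.T @ q_hat / n
    return gk_gamma(P_obs) - gk_gamma(P_null)

# Toy input: no covariate effect, so every subject has the same fitted probabilities
y = np.array([1, 2, 3, 1, 2, 3])
x = np.array([1, 2, 3, 1, 2, 3])
p_hat = np.full((6, 3), 1 / 3)
q_hat = np.full((6, 3), 1 / 3)
print(t1_statistic(y, x, p_hat, q_hat))
```

When the fitted probabilities do not vary with Z, the null table is an outer product with gamma equal to zero, so T1 reduces to the ordinary gamma of the observed table.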
31 Method 2: Residual-based test statistics Method 2: Residual-based Test Statistics Residual in linear regression: $Y = \beta_0 + \gamma_1 Z_1 + \cdots + \gamma_k Z_k$. For a subject with $Y = y$ and $Z = z$, we first obtain the fitted value, $\hat y = E(Y \mid z)$, and then calculate the residual as $y - \hat y$. In addition to the fitted value, a linear regression model also yields a distribution of possible outcome values, say $Y_{fit} \sim Y \mid z$. The residual is $y - \hat y = y - E(Y_{fit}) = E(y - Y_{fit})$. This is easy to understand from a Bayesian perspective.
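32 Method 2: Residual-based test statistics Residual as A Distribution [Figure: two density panels, one for the fitted distribution of Y given z with the observed y marked, and one for the distribution of y − y_fit.]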
33 Method 2: Residual-based test statistics Residuals for Proportional Odds Models In the model for $P(Y \mid Z)$, $\mathrm{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y$, $Y_{i,fit} \sim Y \mid z_i \sim \{\hat p_i^j\}$ is multinomial. We cannot calculate $y_i - Y_{i,fit}$. We can evaluate whether $y_i$ is at a higher or lower level than $Y_{i,fit}$. 1 $p_{i,high} = P(y_i > Y_{i,fit}) = \sum_{j<y_i} \hat p_i^j$ (scored as +1) 2 $p_{i,low} = P(y_i < Y_{i,fit}) = \sum_{j>y_i} \hat p_i^j$ (scored as −1) 3 $P(y_i = Y_{i,fit}) = \hat p_i^{y_i}$ (scored as 0) The residual is defined as the expected score, $Y_{i,res} = p_{i,high} - p_{i,low}$. For model (2), we have $q_{i,high}$, $q_{i,low}$, and $X_{i,res} = q_{i,high} - q_{i,low}$.
34 Method 2: Residual-based test statistics Residuals for Proportional Odds Models [Figure: fitted distribution split into regions scored lower (−1), tie (0), and higher (+1), with the resulting residual.]
35 Method 2: Residual-based test statistics Residual-based Statistics Test 2: $T_2 = \mathrm{corr}(Y_{res}, X_{res})$.
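The residual definition and Test 2 can be sketched as follows, again assuming fitted category probabilities are supplied as arrays and observed levels are coded 1, 2, .... The function names and the toy numbers are hypothetical.

```python
import numpy as np

def po_residuals(obs, probs):
    """Residual = P(obs > fit) - P(obs < fit) for each subject.

    obs: observed levels coded 1..s; probs: n x s fitted category probabilities.
    """
    obs = np.asarray(obs)
    probs = np.asarray(probs, dtype=float)
    idx = np.arange(len(obs))
    cum = np.cumsum(probs, axis=1)          # P(fit <= j)
    p_le = cum[idx, obs - 1]                # P(fit <= observed level)
    p_high = p_le - probs[idx, obs - 1]     # P(fit <  observed level)
    p_low = 1.0 - p_le                      # P(fit >  observed level)
    return p_high - p_low

def t2_statistic(y, x, p_hat, q_hat):
    """T2 = sample correlation between the two sets of residuals."""
    return np.corrcoef(po_residuals(y, p_hat), po_residuals(x, q_hat))[0, 1]

# One fitted distribution, evaluated at each possible observed level
fit = np.array([[0.2, 0.3, 0.5]] * 3)
res = po_residuals([1, 2, 3], fit)
print(res)
```

Averaging the three residuals with weights 0.2, 0.3, 0.5 gives zero: under the fitted model the residual has mean zero, as in linear regression.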
36 Method 3: $(Y_i, X_i)$ versus $(Y, X) \mid Z_i$ For each subject, compare the observed value of $(Y_i, X_i)$ with all possible values of $(Y, X) \mid Z_i \sim \{p_i^j q_i^l\}$ under the null. Consider drawing $(Y_i^*, X_i^*)$ randomly from $(Y, X) \mid Z_i$, and scoring: concordance, if $Y_i > Y_i^*$ & $X_i > X_i^*$ (or $Y_i < Y_i^*$ & $X_i < X_i^*$); discordance, if $Y_i > Y_i^*$ & $X_i < X_i^*$ (or $Y_i < Y_i^*$ & $X_i > X_i^*$); tie, otherwise.
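37 Method 3: $(Y_i, X_i)$ versus $(Y, X) \mid Z_i$ Compare $(Y_i, X_i)$ with $(Y, X) \mid Z_i$
The null table for subject i has cells $\hat p_i^j \hat q_i^l$ ($j = 1, \ldots, 4$; $l = 1, 2, 3$). For an observed value $(Y_i, X_i) = (3, 2)$, the rows collapse to $p_{i,high} = \hat p_i^1 + \hat p_i^2$, $\hat p_i^3$, $p_{i,low} = \hat p_i^4$ and the columns to $q_{i,high} = \hat q_i^1$, $\hat q_i^2$, $q_{i,low} = \hat q_i^3$:
$$\begin{array}{c|ccc} & q_{i,high} & \hat q_i^2 & q_{i,low} \\ \hline p_{i,high} & C & \text{tie} & D \\ \hat p_i^3 & \text{tie} & \text{tie} & \text{tie} \\ p_{i,low} & D & \text{tie} & C \end{array}$$
Pr(concordance) is estimated as $\hat C_i = p_{i,high}\, q_{i,high} + p_{i,low}\, q_{i,low}$. Pr(discordance) is estimated as $\hat D_i = p_{i,high}\, q_{i,low} + p_{i,low}\, q_{i,high}$. Under the null, $E(\hat C_i) = E(\hat D_i)$. Test 3: $T_3 = \frac{1}{n} \sum_i (\hat C_i - \hat D_i)$.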
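A short sketch of Test 3, under the same assumptions as before (fitted probabilities supplied as arrays, levels coded from 1; all names hypothetical). The toy subject mirrors the table above, with observed value (3, 2) and made-up probabilities.

```python
import numpy as np

def high_low(obs, probs):
    """Return (P(obs > fit), P(obs < fit)) for each subject."""
    obs = np.asarray(obs)
    probs = np.asarray(probs, dtype=float)
    idx = np.arange(len(obs))
    le = np.cumsum(probs, axis=1)[idx, obs - 1]
    return le - probs[idx, obs - 1], 1.0 - le

def t3_statistic(y, x, p_hat, q_hat):
    """T3 = mean over subjects of (C_hat_i - D_hat_i)."""
    p_high, p_low = high_low(y, p_hat)
    q_high, q_low = high_low(x, q_hat)
    C = p_high * q_high + p_low * q_low      # estimated Pr(concordance)
    D = p_high * q_low + p_low * q_high      # estimated Pr(discordance)
    return np.mean(C - D)

# One subject with (Y_i, X_i) = (3, 2); the probabilities are illustrative
p_hat = np.array([[0.1, 0.2, 0.3, 0.4]])
q_hat = np.array([[0.2, 0.5, 0.3]])
t3 = t3_statistic([3], [2], p_hat, q_hat)
print(t3)
```

Algebraically, the difference between the concordance and discordance estimates factors as (p_high - p_low)(q_high - q_low), so T3 is the average product of the two residuals used in Method 2.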
38 Simulations: Methods to be Compared with We compared our approach with proportional odds models with: X continuous, $\mathrm{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y + \eta X$, and $H_0: \eta = 0$; X categorical, $\mathrm{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y + \eta_2 I_{\{X=2\}} + \cdots + \eta_t I_{\{X=t\}}$, and $H_0: \eta_2 = \cdots = \eta_t = 0$; isotonic proportional odds regression; X transformed using restricted cubic splines with three pre-selected knots.
39 Simulation Setup Data simulation steps: 1 Generate $Z \sim N(0, 1)$. 2 Generate X (5 levels) using model (2) with $\alpha^X = (-1, 0, 1, 2)$ and $\beta^X = 1$. 3 Generate Y (4 levels) using the proportional odds model $\mathrm{logit}[P(Y \le j \mid Z)] = \alpha^Y_j + Z\beta^Y + \eta_1 I_{\{X=1\}} + \cdots + \eta_t I_{\{X=t\}}$ with $\alpha^Y = (-1, 0, 1)$ and $\beta^Y = 0.5$. We considered 4 scenarios for $\eta = (\eta_1, \ldots, \eta_t)$: 1 $\eta = (0, 0, 0, 0, 0)$ (the null) 2 $\eta = (-0.4, -0.2, 0, 0.2, 0.4)$ (linear) 3 $\eta = (-0.30, 0.18, 0.20, 0.22, 0.24)$ (monotonic non-linear) 4 $\eta = (-0.2, 0, 0.2, 0, -0.2)$ (non-monotonic) For each scenario, 10,000 data sets, each with 500 subjects.
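The data-generating steps can be sketched as below for the linear scenario. This is an illustration only: the helper names are hypothetical, the seed is arbitrary, and the parameter values are copied from the setup as read from the slide.

```python
import numpy as np

def cum_logit_probs(alpha, lin):
    """Category probabilities when logit P(level <= j) = alpha_j + lin_i."""
    lin = np.asarray(lin, dtype=float)
    eta = np.asarray(alpha, dtype=float)[None, :] + lin[:, None]
    cum = 1.0 / (1.0 + np.exp(-eta))
    cum = np.hstack([np.zeros((len(lin), 1)), cum, np.ones((len(lin), 1))])
    return np.diff(cum, axis=1)

def draw_levels(probs, rng):
    """Draw one level (coded from 1) per row by inverting the cumulative distribution."""
    u = rng.random(len(probs))
    cum = np.cumsum(probs, axis=1)
    # Count how many cumulative thresholds lie below u; guard against rounding
    return np.minimum((u[:, None] > cum).sum(axis=1), probs.shape[1] - 1) + 1

def simulate(n, eta, rng):
    z = rng.normal(size=n)                                      # step 1
    x = draw_levels(cum_logit_probs([-1, 0, 1, 2], 1.0 * z), rng)  # step 2
    lin_y = 0.5 * z + np.asarray(eta, dtype=float)[x - 1]       # step 3
    y = draw_levels(cum_logit_probs([-1, 0, 1], lin_y), rng)
    return z, x, y

rng = np.random.default_rng(2009)
z, x, y = simulate(500, eta=(-0.4, -0.2, 0.0, 0.2, 0.4), rng=rng)
print(np.bincount(x)[1:], np.bincount(y)[1:])
```

Repeating `simulate` many times per scenario and recording how often each test rejects gives the type I error (null scenario) or power (the other scenarios).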
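40 Simulation Results: Type I Error and Power (%) [Table: rows are analysis methods (our method $T_1$ with empirical and asymptotic p-values; X linear; X categorical; isotonic; splines); columns are simulation scenarios (null, linear, non-linear, non-monotonic). The numeric entries are not present in this transcription.]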
41 Anisometropic Amblyopia Data Analysis [Figure: axes labeled "Our method, log10(p)" and "OLS, log10(p)".]
42 Limitations 1 We focus on hypothesis testing, not estimation. 2 We make an assumption of no interaction between X and Z. 3 We make a proportional odds assumption on X over Z. 4 Our method doesn't help if one is interested in the effect of Z on ordinal Y after adjusting for ordinal X.
43 P-value via Empirical Distribution Let T be one of the three test statistics. We simulate replicate data sets under the null. Repeat the following $N_{emp}$ times: 1 Generate one observation from $\{\hat p_i^j \hat q_i^l\}$, $(i = 1, \ldots, n)$. 2 Carry out the entire estimating procedure using the newly generated replicate data set to obtain $T^*$. The two-sided p-value is then computed as either $\#(|T^*| \ge |T|)/N_{emp}$ or $2\min\{\#(T^* \le T), \#(T^* \ge T)\}/N_{emp}$. This procedure is essentially a parametric bootstrap procedure.
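The two pieces of this procedure that do not depend on model fitting can be sketched directly: drawing a replicate under the null, and turning replicate statistics into a p-value. Re-fitting models (1) and (2) on each replicate is left to the caller and is not shown; the function names are hypothetical.

```python
import numpy as np

def draw_replicate(p_hat, q_hat, rng):
    """Draw one (Y, X) per subject from the null product distribution.

    Under the null the joint cell probabilities are p_hat_i^j * q_hat_i^l,
    so Y and X can be drawn independently from their fitted marginals.
    """
    def draw(probs):
        u = rng.random(len(probs))
        cum = np.cumsum(probs, axis=1)
        return np.minimum((u[:, None] > cum).sum(axis=1), probs.shape[1] - 1) + 1
    return draw(np.asarray(p_hat, dtype=float)), draw(np.asarray(q_hat, dtype=float))

def empirical_p_values(t_obs, t_rep):
    """Both two-sided p-value formulas from the slide, given replicate statistics."""
    t_rep = np.asarray(t_rep, dtype=float)
    n_emp = len(t_rep)
    p_abs = np.sum(np.abs(t_rep) >= abs(t_obs)) / n_emp
    p_tail = 2 * min(np.sum(t_rep <= t_obs), np.sum(t_rep >= t_obs)) / n_emp
    return p_abs, p_tail

# Toy replicate statistics, for illustration only
p_abs, p_tail = empirical_p_values(1.5, [-2.0, -1.0, 0.0, 1.0, 2.0])
print(p_abs, p_tail)
```

The first formula assumes the null distribution of T is roughly symmetric about zero; the second does not, at the cost of being able to exceed 1 when T sits near the middle of the replicates.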
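44 M-Estimation Theory Consider a parameter vector θ of length p, whose estimate $\hat\theta$ is obtained by solving $\sum_i \Psi_i(\theta) = 0$, where $\Psi_i(\theta) = \Psi(Y_i, X_i, Z_i; \theta)$ is a p-variate function that satisfies $E_\theta[\Psi_i(\theta)] = 0$. From M-estimation theory, if Ψ is suitably smooth, then
$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, V(\theta)),$$
where $V(\theta) = A(\theta)^{-1} B(\theta) [A(\theta)^{-1}]'$, $A(\theta) = E\left[-\frac{\partial}{\partial\theta} \Psi_i(\theta)\right]$, and $B(\theta) = E[\Psi_i(\theta) \Psi_i(\theta)']$. If $T = g(\hat\theta)$ is a smooth function of $\hat\theta$, then
$$\sqrt{n}[g(\hat\theta) - g(\theta)] \xrightarrow{d} N(0, \sigma^2),$$
where $\sigma^2 = \left[\frac{\partial}{\partial\theta} g(\theta)\right] V(\theta) \left[\frac{\partial}{\partial\theta} g(\theta)\right]'$. If $g(\theta) = 0$ under the null, then the p-value is $2\Phi\left(-\dfrac{|T|}{\sigma/\sqrt{n}}\right)$.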
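The sandwich variance and the resulting p-value can be sketched generically. This assumes the per-subject estimating functions and their derivatives have already been evaluated at the estimate and are passed in as arrays; the function names are hypothetical, and the toy example (a sample mean) is only there to show the shapes involved.

```python
import math
import numpy as np

def sandwich_variance(psi, jac):
    """V = A^{-1} B (A^{-1})' with empirical A and B.

    psi: n x p array of Psi_i evaluated at the estimate.
    jac: n x p x p array of dPsi_i/dtheta evaluated at the estimate.
    """
    n = psi.shape[0]
    A = -jac.mean(axis=0)
    B = psi.T @ psi / n
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T

def delta_method_p_value(t_stat, grad, V, n):
    """Two-sided p-value 2 * Phi(-|T| / (sigma / sqrt(n)))."""
    sigma2 = float(grad @ V @ grad)
    se = math.sqrt(sigma2 / n)
    # 2 * Phi(-a) equals erfc(a / sqrt(2)) for a >= 0
    return math.erfc(abs(t_stat) / se / math.sqrt(2.0))

# Toy case: theta is a mean, Psi_i = Y_i - theta, so dPsi_i/dtheta = -1
y = np.array([1.0, 2.0, 3.0, 4.0])
psi = (y - y.mean())[:, None]
jac = -np.ones((len(y), 1, 1))
V = sandwich_variance(psi, jac)
print(V, delta_method_p_value(0.5, np.array([1.0]), V, len(y)))
```

For the three test statistics, `psi` would stack the two proportional odds score vectors with the statistic-specific component, and `grad` would pick out how T depends on the stacked parameter.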
45 P-value via Estimating Equations For all three statistics, $\theta = (\theta_1, \theta_2, \theta_3)$, where $\theta_1 = (\alpha^Y, \beta^Y)$, $\theta_2 = (\alpha^X, \beta^X)$, and $\theta_3$ is different for each statistic. The corresponding estimating function $\Psi_i(\theta)$ will have the form
$$\Psi_i(\theta) = \begin{pmatrix} \frac{\partial}{\partial\theta_1} l_1(Y_i, Z_i; \theta_1) \\ \frac{\partial}{\partial\theta_2} l_2(X_i, Z_i; \theta_2) \\ \psi(Y_i, X_i, Z_i; \theta_3) \end{pmatrix},$$
where $l_1$ and $l_2$ are the log-likelihood functions of the proportional odds models (1) and (2), respectively. They are score functions and thus $E_\theta\left[\frac{\partial}{\partial\theta_1} l_1(Y_i, Z_i; \theta_1)\right] = 0$ and $E_\theta\left[\frac{\partial}{\partial\theta_2} l_2(X_i, Z_i; \theta_2)\right] = 0$. The function $\psi(Y_i, X_i, Z_i; \theta_3)$ will be different for each statistic.
46 Outline 1 Introduction: Examples; Problems in regression; tau and gamma. 2 Our Approach: Motivation; Proportional odds models; Notation and model fitting; Method 1: Observed versus expected; Method 2: Residual-based test statistics; Method 3: $(Y_i, X_i)$ versus $(Y, X) \mid Z_i$. 3 Simulations. 4 Example Data Analysis.
Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More informationCombining multiple observational data sources to estimate causal eects
Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationModelling geoadditive survival data
Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationModule 22: Bayesian Methods Lecture 9 A: Default prior selection
Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical
More informationLogistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014
Logistic Regression Advanced Methods for Data Analysis (36-402/36-608 Spring 204 Classification. Introduction to classification Classification, like regression, is a predictive task, but one in which the
More informationExercises. (a) Prove that m(t) =
Exercises 1. Lack of memory. Verify that the exponential distribution has the lack of memory property, that is, if T is exponentially distributed with parameter λ > then so is T t given that T > t for
More informationRecitation 5. Inference and Power Calculations. Yiqing Xu. March 7, 2014 MIT
17.802 Recitation 5 Inference and Power Calculations Yiqing Xu MIT March 7, 2014 1 Inference of Frequentists 2 Power Calculations Inference (mostly MHE Ch8) Inference in Asymptopia (and with Weak Null)
More informationFractional Imputation in Survey Sampling: A Comparative Review
Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationGeneral Regression Model
Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical
More informationData-analysis and Retrieval Ordinal Classification
Data-analysis and Retrieval Ordinal Classification Ad Feelders Universiteit Utrecht Data-analysis and Retrieval 1 / 30 Strongly disagree Ordinal Classification 1 2 3 4 5 0% (0) 10.5% (2) 21.1% (4) 42.1%
More informationGeneralized Additive Models
Generalized Additive Models The Model The GLM is: g( µ) = ß 0 + ß 1 x 1 + ß 2 x 2 +... + ß k x k The generalization to the GAM is: g(µ) = ß 0 + f 1 (x 1 ) + f 2 (x 2 ) +... + f k (x k ) where the functions
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationMaster s Written Examination - Solution
Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2
More informationFor more information about how to cite these materials visit
Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/
More informationBiost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation
Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest
More informationSTAT 525 Fall Final exam. Tuesday December 14, 2010
STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationChapter 14 Logistic and Poisson Regressions
STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y =
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationA class of latent marginal models for capture-recapture data with continuous covariates
A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationStatistics: A review. Why statistics?
Statistics: A review Why statistics? What statistical concepts should we know? Why statistics? To summarize, to explore, to look for relations, to predict What kinds of data exist? Nominal, Ordinal, Interval
More informationNemours Biomedical Research Statistics Course. Li Xie Nemours Biostatistics Core October 14, 2014
Nemours Biomedical Research Statistics Course Li Xie Nemours Biostatistics Core October 14, 2014 Outline Recap Introduction to Logistic Regression Recap Descriptive statistics Variable type Example of
More informationSurvival Analysis Math 434 Fall 2011
Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup
More informationSections 4.1, 4.2, 4.3
Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear
More informationStatistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann
Statistics of Contingency Tables - Extension to I x J stat 557 Heike Hofmann Outline Testing Independence Local Odds Ratios Concordance & Discordance Intro to GLMs Simpson s paradox Simpson s paradox:
More informationPart III Measures of Classification Accuracy for the Prediction of Survival Times
Part III Measures of Classification Accuracy for the Prediction of Survival Times Patrick J Heagerty PhD Department of Biostatistics University of Washington 102 ISCB 2010 Session Three Outline Examples
More informationAn Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data
An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)
More informationRobust negative binomial regression
Robust negative binomial regression Michel Amiguet, Alfio Marazzi, Marina S. Valdora, Víctor J. Yohai september 2016 Negative Binomial Regression Poisson regression is the standard method used to model
More information