Research Projects. Hanxiang Peng. Department of Mathematical Sciences, Indiana University-Purdue University at Indianapolis. March 4, 2009

Outline Project I: Free Knot Spline Cox Model

Project I: Free Knot Spline Cox Model

Consider a parametric Cox model:

$$h(t) = h_0(t)\exp(g_\theta(t, Z(t))), \qquad (1)$$

where $g_\theta$ is a smooth function. Our candidate for $g_\theta$ is a free-knot spline. Throughout, $(x)_+ = \max(x, 0)$.

i A quadratic free-knot polynomial spline with knots in time: $g_\theta(t, z) = (\beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 (t - \gamma)_+^2)\,z$, where $\theta = [\beta_1, \beta_2, \beta_3, \gamma]$ is the parameter of interest. The knot $\gamma$ can be a threshold value such as a changepoint.

ii A quadratic free-knot polynomial spline with knots in covariates: $g_\theta(z) = \beta_1 z + \beta_2 z^2 + \beta_3 (z - \kappa)_+^2$, where $\theta = [\beta_1, \beta_2, \beta_3, \kappa]$ is the parameter of interest. The knot $\kappa$ can be a threshold value such as a nadir (of BMI), a changepoint, etc.

iv $g_\theta$ has a continuous first derivative in $\theta$, $\dot g_\theta(z) = [z,\ z^2,\ (z - \kappa)^2 1_{\{z > \kappa\}},\ -2\beta_3 (z - \kappa) 1_{\{z > \kappa\}}]$, but the second derivative does not exist at the knot $\kappa$.

v Consistency? Asymptotic normality? Under a continuous first derivative, we have obtained both.

vi Consistency? Asymptotic normality? These remain open for the following non-smooth cases.
Knots in covariates: $g_\theta(z) = \beta_1 z + \beta_2 (z - \kappa)_+$, $\quad g_\theta(z) = \beta(1_{\{z \ge \kappa\}} - 1_{\{z < \kappa\}})$.
Knots in time: $g_\theta(t, z) = (\beta_0 + \beta_1 t + \beta_2 (t - \gamma)_+)\,z$, $\quad g_\theta(t, z) = \beta(1_{\{t \ge \gamma\}} - 1_{\{t < \gamma\}})\,z$.
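The covariate-knot spline in item ii and its gradient in item iv can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names and the parameter values in the usage note are illustrative assumptions, and the sign of the last gradient component follows from differentiating $\beta_3 (z - \kappa)_+^2$ with respect to $\kappa$.

```python
def g(z, beta1, beta2, beta3, kappa):
    """Quadratic free-knot spline in the covariate z with one knot kappa."""
    plus = max(z - kappa, 0.0)          # truncated power term (z - kappa)_+
    return beta1 * z + beta2 * z ** 2 + beta3 * plus ** 2


def grad_g(z, beta1, beta2, beta3, kappa):
    """Analytic gradient of g with respect to (beta1, beta2, beta3, kappa)."""
    ind = 1.0 if z > kappa else 0.0     # indicator 1{z > kappa}
    return [z, z ** 2, (z - kappa) ** 2 * ind, -2.0 * beta3 * (z - kappa) * ind]


def numeric_grad(z, theta, eps=1e-6):
    """Central finite-difference gradient, used only as a sanity check."""
    out = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += eps
        dn[i] -= eps
        out.append((g(z, *up) - g(z, *dn)) / (2.0 * eps))
    return out
```

For instance, with the hypothetical values `theta = (0.5, -0.2, 0.3, 25.0)` and `z = 30.0`, `grad_g(30.0, *theta)` and `numeric_grad(30.0, theta)` agree to within finite-difference error, which illustrates that the first derivative exists away from and at the knot.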

Motivations

Data consisting of groups of dependent binary random variables arise commonly in many fields, including developmental toxicity studies, longitudinal studies, studies of familial diseases, cluster sample surveys, and others. The independence assumption is not appropriate in many areas of science. For example, in a developmental toxicity study, fetuses from the same litter are correlated and may respond more similarly to a stimulus than fetuses from different litters. This intra-litter correlation causes over-dispersion (extra-binomial variation).

Disadvantages of Some Commonly Used Methods The statistical inference based on the conditional analysis does not use any information about the distribution of the latent effects since it is lost in the conditioning. The EM algorithm method is conceptually simple, but the computations may be formidable because each E-step in the computation may require a numerical integration. The empirical Bayes approach may also be computationally difficult.

The Proposed Approach

By relaxing independence to exchangeability, we introduce a rich family of parsimonious distributions derived from completely monotone functions, extending GLMs. We present a general framework that unifies existing procedures such as the beta-binomial and Williams procedures, and that yields many new distributions. Our approach uses all the distributional information, including moments of all orders, so it is a full likelihood approach.

Exchangeable Binomial

$B_1, \ldots, B_m$ are exchangeable if, for all $\{0, 1\}$-valued $b_1, \ldots, b_m$,

$$P(B_1 = b_1, \ldots, B_m = b_m) = P(B_{\pi_1} = b_1, \ldots, B_{\pi_m} = b_m), \qquad (2)$$

for every permutation $\pi_1, \ldots, \pi_m$ of $1, \ldots, m$. Let $Y = B_1 + \cdots + B_m$. Then

$$P(Y = y) = \binom{m}{y} \sum_{k=0}^{m-y} (-1)^k \binom{m-y}{k} \lambda_{y+k}, \qquad y = 0, 1, \ldots, m,$$

where $\lambda_0 = 1$ and $\lambda_k = P(B_1 = 1, \ldots, B_k = 1)$, $1 \le k \le m$, are the marginal probabilities.
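The formula for $P(Y = y)$ can be evaluated directly from the marginal probabilities. The sketch below is illustrative only (the function name is an assumption); it takes a list `lam` with `lam[0] = 1` and returns the probabilities for $y = 0, \ldots, m$.

```python
from math import comb


def exchangeable_binomial_pmf(lam):
    """Return [P(Y = y) for y = 0..m] given lam[k] = P(B_1 = 1, ..., B_k = 1).

    lam[0] must equal 1 and len(lam) == m + 1.
    """
    m = len(lam) - 1
    pmf = []
    for y in range(m + 1):
        # Alternating inclusion-exclusion sum over k = 0..m-y
        s = sum((-1) ** k * comb(m - y, k) * lam[y + k] for k in range(m - y + 1))
        pmf.append(comb(m, y) * s)
    return pmf
```

As a check, independent trials with success probability $p$ have $\lambda_k = p^k$, and the formula then reduces to the ordinary binomial probabilities $\binom{m}{y} p^y (1 - p)^{m - y}$.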

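Complete Monotonicity

The marginal probabilities $\lambda = \{\lambda_i : i = 0, 1, \ldots, m\}$ (with $\lambda_0 = 1$) are completely monotone (CM):

$$(-1)^k \Delta^k \lambda_i \ge 0, \qquad i = 0, 1, \ldots, m, \quad k + i \le m, \qquad (3)$$

where $\Delta$ is the difference operator: $\Delta \lambda_i = \lambda_{i+1} - \lambda_i$. Write $\boldsymbol{\theta} = (\theta, \vartheta)$, where $\theta$ is a parameter of interest, while $\vartheta$ is treated as a nuisance parameter. We model $\lambda_j = h_j(\beta^\top X; \vartheta)$, $j = 0, 1, 2, \ldots, M$.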
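Condition (3) can be verified for a given finite sequence by building the table of iterated forward differences. This is a small sketch under the definitions above; the function names and the tolerance `tol` are assumptions added for illustration.

```python
def forward_differences(lam):
    """Return rows d[k][i] = Delta^k lam_i for all k + i <= m."""
    table = [list(lam)]
    while len(table[-1]) > 1:
        prev = table[-1]
        # One application of Delta: lam_{i+1} - lam_i
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table


def is_completely_monotone(lam, tol=1e-12):
    """Check (-1)^k Delta^k lam_i >= 0 for every k + i <= m (up to tol)."""
    table = forward_differences(lam)
    return all((-1) ** k * d >= -tol for k, row in enumerate(table) for d in row)
```

For example, $\lambda_i = p^i$ satisfies the condition because $(-1)^k \Delta^k p^i = p^i (1 - p)^k$, whereas the sequence `[1, 0.5, 0.1, 0.09]` fails at the third difference.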

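Table: The Completely Monotone Links.

Name          Link ($\theta = (\theta_1, \theta_2)$)                 Parameters
Ind-Bin       $\theta^t$                                             $\theta \in (0, 1)$
MM-Bin        $\theta / (\theta + t)$                                $\theta \in (0, \infty)$
Beta-Bin      $B(\theta_1 + t, \theta_2) / B(\theta_1, \theta_2)$    $\theta \in (0, \infty)^2$
Gamma-Bin     $(1 + \theta_2 t)^{-\theta_1}$                         $\theta \in (0, \infty)^2$
Poisson-Bin   $\exp(\theta(e^{-t} - 1))$                             $\theta \in (0, \infty)$
Normal-Bin    $2\exp((\sigma t)^2 / 2)(1 - \Phi(\sigma t))$          $\sigma^2 \in (0, \infty)$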
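The links in the table can be evaluated numerically. The sketch below makes one explicit assumption that the slides do not state: it takes $\lambda_j = h(j)$, that is, the link evaluated at the integers $t = 0, 1, \ldots, m$ with no covariates. The names `LINKS` and `marginals` are hypothetical, and $1 - \Phi(x)$ is computed as `0.5 * erfc(x / sqrt(2))`.

```python
from math import erfc, exp, lgamma, sqrt


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


# Each link maps t >= 0 (plus its parameters) to a value in (0, 1], with h(0) = 1.
LINKS = {
    "Ind-Bin": lambda t, th: th ** t,
    "MM-Bin": lambda t, th: th / (th + t),
    "Beta-Bin": lambda t, th1, th2: exp(log_beta(th1 + t, th2) - log_beta(th1, th2)),
    "Gamma-Bin": lambda t, th1, th2: (1.0 + th2 * t) ** (-th1),
    "Poisson-Bin": lambda t, th: exp(th * (exp(-t) - 1.0)),
    "Normal-Bin": lambda t, sigma: 2.0 * exp((sigma * t) ** 2 / 2.0)
    * 0.5 * erfc(sigma * t / sqrt(2.0)),
}


def marginals(name, m, *params):
    """Assumed construction: lambda_j = h(j) for j = 0..m (no covariates)."""
    return [LINKS[name](float(j), *params) for j in range(m + 1)]
```

Under this assumption, `marginals("Beta-Bin", 2, 2.0, 3.0)` gives $\lambda_1 = 2/5$ and $\lambda_2 = 1/5$, the first two moments of a Beta(2, 3) variable, which is the familiar beta-binomial case.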

Efficient Estimation in a Semiparametric Copula Model

Sklar's Theorem. Let $H$ be a joint CDF with margins $F$ and $G$. Then there exists a copula $C$ such that

$$H(x, y) = C(F(x), G(y)), \qquad x, y \in \mathbb{R}.$$

If $F$ and $G$ are continuous, then $C$ is unique.

Parametric copulas: Archimedean copulas, Pareto copulas, Plackett copulas, etc.

Suppose $F = F_\alpha$ and $G = G_\beta$, while $C$ is completely unspecified. Suppose we have bivariate data $(X_i, Y_i)$. How can we efficiently estimate $C$? Two issues: (1) How well can we estimate? (2) How do we construct an efficient estimate?
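The decomposition in Sklar's theorem can be illustrated with a concrete case. The sketch below is not the semiparametric estimator discussed here: it fixes an assumed parametric copula (a Clayton copula, one member of the Archimedean family) and assumed exponential margins, purely to show how $H$ is assembled from $C$, $F$ and $G$ and that the margins are recovered.

```python
from math import exp, inf


def clayton(u, v, alpha=2.0):
    """Clayton copula C(u, v) with dependence parameter alpha > 0 (assumed choice)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** (-alpha) + v ** (-alpha) - 1.0) ** (-1.0 / alpha)


def F(x, rate=1.0):
    """Assumed margin for X: exponential CDF."""
    return 1.0 - exp(-rate * x) if x > 0 else 0.0


def G(y, rate=0.5):
    """Assumed margin for Y: exponential CDF."""
    return 1.0 - exp(-rate * y) if y > 0 else 0.0


def H(x, y):
    """Joint CDF built as H(x, y) = C(F(x), G(y))."""
    return clayton(F(x), G(y))
```

Letting one argument go to infinity recovers the corresponding margin: `H(1.0, inf)` equals `F(1.0)` and `H(inf, 2.0)` equals `G(2.0)` up to rounding.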