Regression models, selection effects and sampling bias

1 Regression models, selection effects and sampling bias
Peter McCullagh
Department of Statistics, University of Chicago
Hotelling Lecture I, UNC Chapel Hill, Dec (JRSSB 2008)

2 Outline
Regression models
Gaussian models
Binary regression model
Attenuation of treatment effect
Problems with regression models
A point process model
Ways of thinking
Consequences of auto-generation
Sampling bias
Volunteer samples
Randomization
All creatures great and small
Ewes and lambs
Notation
Estimating functions
Interference
Inference and prediction

3 Standard regression model: definition
Fixed index set $U$ (always infinite): $u_1, u_2, \ldots$ (subjects, plots, ...)
Covariate $x(u_1), x(u_2), \ldots$ (non-random, vector-valued)
Response $Y(u_1), Y(u_2), \ldots$ (random, real-valued)
Regression model:
  Sample = finite ordered subset $u_1, \ldots, u_n$ (distinct in $U$)
  For each sample with configuration $x = (x(u_1), \ldots, x(u_n))$, the response distribution $p_x(y)$ on $\mathbb{R}^n$ depends on $x$
  Kolmogorov consistency condition on the distributions $p_x(\cdot)$
  The covariate configuration $x$ is not a random variable
  $p_x(\cdot)$ is not the conditional distribution given $x$
Key properties: exchangeability and no interference:
  Two samples such that $x = x'$ have the same response distribution
  The distribution of $Y(u)$ does not depend on $x(u')$ for $u' \ne u$

4 Gaussian regression model
Covariate $x(u) = (x_1(u), x_2(u))$ partitioned into two components:
  $x_1(u) = (\text{variety}(u), \text{treatment}(u), \ldots)$ affecting the mean
  $x_2(u) = (\text{block}(u), \text{coordinates}(u))$ affecting covariances
Example: response distribution for a fixed sample of $n$ plots
  $p_x(y \in A; \theta) = N_n(X_1\beta,\ \sigma_0^2 I_n + \sigma_1^2 K[x_2])(A)$, where $A \subset \mathbb{R}^n$ and $K[x] = \{K(x_i, x_j)\}$
  Block-factor models: $K(i, j) = 1$ if $\text{block}(i) = \text{block}(j)$
  Spatial/temporal models: $K(i, j) = \exp(-|x_2(i) - x_2(j)|/\tau)$
  Generalized random field: $K(i, j) = -\operatorname{ave}_{x \in i,\, x' \in j} \log|x - x'|$
Equivalent to $Y(u) = \mu(u) + \epsilon(u) + \eta(x(u))$ with $\epsilon \perp \eta$, ...
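A minimal sketch of how the covariance structure on this slide could be assembled, assuming NumPy. The function names, the four-plot layout and all parameter values are hypothetical illustrations, not taken from the lecture.

```python
import numpy as np

def block_kernel(block):
    """Block-factor kernel: K(i, j) = 1 if block(i) == block(j), else 0."""
    block = np.asarray(block)
    return (block[:, None] == block[None, :]).astype(float)

def exponential_kernel(coords, tau):
    """Spatial/temporal kernel: K(i, j) = exp(-|x2(i) - x2(j)| / tau)."""
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]          # treat scalars as 1-d coordinates
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-dist / tau)

def gaussian_regression_moments(X1, beta, K, sigma0_sq, sigma1_sq):
    """Mean X1 @ beta and covariance sigma0^2 I_n + sigma1^2 K for a fixed sample."""
    n = X1.shape[0]
    mean = X1 @ beta
    cov = sigma0_sq * np.eye(n) + sigma1_sq * K
    return mean, cov

# Hypothetical sample: 4 plots in 2 blocks; columns of X1 are intercept and treatment.
X1 = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
beta = np.array([2.0, 0.5])
block = [0, 0, 1, 1]
mean, cov = gaussian_regression_moments(X1, beta, block_kernel(block), 1.0, 0.5)

rng = np.random.default_rng(0)
y = rng.multivariate_normal(mean, cov)    # one draw from p_x for this fixed sample
```

Swapping `block_kernel(block)` for `exponential_kernel(coords, tau)` gives the spatial/temporal variant with the same mean structure.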

5 Binary regression model (GLMM)
Units: $u_1, u_2, \ldots$ subjects, patients, plots (labelled)
Covariate $x(u_1), x(u_2), \ldots$ (non-random, $\mathcal{X}$-valued)
Latent process $\eta$ on $\mathcal{X}$ (Gaussian, for example)
Responses $Y(u_1), \ldots$ conditionally independent given $\eta$:
  $\operatorname{logit} \operatorname{pr}(Y(u) = 1 \mid \eta) = \alpha + \beta x(u) + \eta(x(u))$
Joint distribution for a sample having configuration $x$:
  $p_x(y) = E_\eta \prod_{i=1}^n \frac{e^{(\alpha + \beta x_i + \eta(x_i)) y_i}}{1 + e^{\alpha + \beta x_i + \eta(x_i)}}$
Parameters: $\alpha, \beta, K$, where $K(x, x') = \operatorname{cov}(\eta(x), \eta(x'))$.

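6 Attenuation of treatment effect
$\eta$ is a random effect (associated with clinical centre)
  $\operatorname{logit} \operatorname{pr}(Y_i = 1 \mid \eta; \mathrm{Trt} = 0) = \alpha + \eta(x_i)$
  $\operatorname{logit} \operatorname{pr}(Y_i = 1 \mid \eta; \mathrm{Trt} = 1) = \alpha + \eta(x_i) + \tau$
Unconditionally:
  $\operatorname{pr}(Y_i = 1; \mathrm{Trt}_i = t) = \int \frac{e^{\alpha + \eta + \tau t}}{1 + e^{\alpha + \eta + \tau t}}\, \phi(\eta; \sigma^2)\, d\eta$
  $\operatorname{logit} \operatorname{pr}(Y_i = 1; \mathrm{Trt}_i = t) \approx \alpha^* + \tau^* t$
  (correlated responses $Y_i, Y_j$ if $x(i) = x(j)$)
Attenuation: $|\tau^*| < |\tau|$, with $\tau^*/\tau$ decreasing as $\sigma^2$ increases
Treatment effect = odds(success with treatment)/odds(success without)
Treatment effect: $\tau$ or $\tau^*$?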
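The attenuation can be checked numerically. The sketch below evaluates the logistic-normal integral by Gauss-Hermite quadrature and returns the marginal log odds ratio (written $\tau^*$ in these notes). It assumes NumPy; the function names and the example values of $\alpha$, $\tau$ and $\sigma$ are hypothetical.

```python
import numpy as np

def marginal_prob(alpha, tau, t, sigma, n_nodes=80):
    """pr(Y = 1; Trt = t): average of expit(alpha + eta + tau*t) over eta ~ N(0, sigma^2)."""
    # Probabilists' Gauss-Hermite rule: nodes z and weights w for the weight exp(-z^2 / 2).
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    eta = sigma * z
    p = 1.0 / (1.0 + np.exp(-(alpha + eta + tau * t)))
    return float(np.sum(w * p) / np.sqrt(2.0 * np.pi))

def logit(p):
    return np.log(p / (1.0 - p))

def marginal_effect(alpha, tau, sigma):
    """Marginal log odds ratio: logit pr(Y=1; Trt=1) - logit pr(Y=1; Trt=0)."""
    return logit(marginal_prob(alpha, tau, 1, sigma)) - logit(marginal_prob(alpha, tau, 0, sigma))

# Hypothetical values: conditional effect tau = 1, random-effect sd sigma = 2.
tau_star = marginal_effect(alpha=0.0, tau=1.0, sigma=2.0)
```

With `sigma=0` the function returns the conditional effect `tau` itself; increasing `sigma` pulls `tau_star` toward zero.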

7 Binary regression model: computation
GLMM computational problem:
  $p_x(y) = \int_{\mathbb{R}^n} \prod_{i=1}^n \frac{e^{(\alpha + \beta x_i + \eta(x_i)) y_i}}{1 + e^{\alpha + \beta x_i + \eta(x_i)}}\, \phi(\eta; K)\, d\eta$
Options:
  Taylor approximation: Laird and Ware; Schall; Breslow and Clayton; McC and Nelder; Drum and McC; ...
  Laplace approximation: Wolfinger 1993; Shun and McC 1994
  Numerical approximation: Egret
  E.M. algorithm: McCulloch 1994 for probit models
  Monte Carlo: Z&L, ...

8 But,..., wait a minute...
$p_x(y)$ is the distribution for each fixed sample of $n$ units.
But... the sample might not be predetermined:
  volunteer samples in clinical trials;
  pre-screening of patients to increase compliance;
  behavioural studies in ecology;
  marketing studies, with purchase events as units;
  public policy: crime type, with crime events as units
Ergo, $x$ is also random, so we need a joint distribution.
Also, examples suggest that sampling may not be neutral...
Q1: For a bivariate process, what does $p_x(y)$ represent?
Q2: Is it necessarily the case that $p_x(y) = p(y \mid x)$? ... even if sample size is random?

9 Problems in the application of regression models
Clinical trials / market research / traffic studies / crime...
(i) Operational interpretation of a sample as a fixed subset, or as a random subset independent of the process
(ii) Key property of regression models: sampling neutrality
  two units $u, u'$ such that $u$ is in the sample and $u'$ is not; $x(u) = x(u') \Rightarrow Y(u) \sim Y(u')$
(iii) Sample units generated by a random process
  sequential recruitment, purchase events, traffic studies...
(iv) Population also generated by a random process in time
  animal populations, purchase events, crime events, ...
(v) Samples: random, sequential, quota, ...
(vi) Conditional distribution given observed random $x$ versus stratum distribution for fixed $x$

10 [Figure: a point process on $C \times \mathcal{X}$ for $C = \{0, 1, 2\}$, with events shown at levels $y = 0$, $y = 1$, $y = 2$ over the $\mathcal{X}$ axis, and the superposition process on $\mathcal{X}$.]
Intensity $\lambda_r(x)$ for class $r$: $r = 0, 1, 2$.
$x$-values are auto-generated by the superposition process with intensity $\lambda_\cdot(x)$.
To each auto-generated unit there corresponds an $x$-value and a $y$-value.

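11 Binary point process model
Intensity process $\lambda_0(x)$ for class 0, $\lambda_1(x)$ for class 1
Log ratio: $\eta(x) = \log \lambda_1(x) - \log \lambda_0(x)$
Events form a PP with intensity $\lambda$ on $\{0, 1\} \times \mathcal{X}$.
Conventional GLMM calculation (Bayesian and frequentist):
  $\operatorname{pr}(Y = 1 \mid x, \lambda) = \frac{\lambda_1(x)}{\lambda_\cdot(x)} = \frac{e^{\eta(x)}}{1 + e^{\eta(x)}}$
  $\operatorname{pr}(Y = 1 \mid x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)}\right) = E\left(\frac{e^{\eta(x)}}{1 + e^{\eta(x)}}\right)$
The GLMM calculation is correct in a sense, but irrelevant: there might not be an event at $x$!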

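12 Correct calculation for auto-generated units
  $\operatorname{pr}(\text{event of type } r \text{ in } dx \mid \lambda) = \lambda_r(x)\, dx + o(dx)$
  $\operatorname{pr}(\text{event of type } r \text{ in } dx) = E(\lambda_r(x))\, dx + o(dx)$
  $\operatorname{pr}(\text{event in SPP in } dx \mid \lambda) = \lambda_\cdot(x)\, dx + o(dx)$
  $\operatorname{pr}(\text{event in SPP in } dx) = E(\lambda_\cdot(x))\, dx + o(dx)$
  $\operatorname{pr}(Y(x) = r \mid \text{SPP event at } x) = \frac{E\lambda_r(x)}{E\lambda_\cdot(x)}$
  $E\left(\frac{\lambda_r(x)}{\lambda_\cdot(x)} \,\Big|\, x \in \mathrm{SPP}\right) = \frac{E\lambda_r(x)}{E\lambda_\cdot(x)} \ne E\left(\frac{\lambda_r(x)}{\lambda_\cdot(x)}\right)$
Sampling bias: distribution for fixed $x$ versus distribution for autogenerated $x$.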
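The gap between the ratio of expectations and the expectation of the ratio can be seen in a small Monte Carlo experiment. The sketch below is only illustrative and assumes NumPy. It looks at one small cell around a fixed $x$, draws independent log-normal intensities for the two classes in many replicate realisations, generates Poisson event counts in the cell, and compares the class-1 fraction among the events that actually occurred with the average of $\lambda_1/\lambda_\cdot$. The log-normal choice, the cell width `delta` and all numerical values are assumptions made for the example.

```python
import numpy as np

def simulate_cell(alpha, sigma, delta, n_rep, seed=0):
    """Compare event-weighted and unweighted class-1 proportions in one cell.

    Returns (pooled, stratum_mean):
      pooled       ~ E(lambda_1) / E(lambda_.)   (fraction of class 1 among realised events)
      stratum_mean ~ E(lambda_1 / lambda_.)      (average of the conditional probability)
    """
    rng = np.random.default_rng(seed)
    eta0 = rng.normal(0.0, sigma, n_rep)      # log intensity, class 0
    eta1 = rng.normal(alpha, sigma, n_rep)    # log intensity, class 1
    lam0, lam1 = np.exp(eta0), np.exp(eta1)
    n0 = rng.poisson(lam0 * delta)            # class-0 events in the cell
    n1 = rng.poisson(lam1 * delta)            # class-1 events in the cell
    pooled = n1.sum() / (n0.sum() + n1.sum())
    stratum_mean = float(np.mean(lam1 / (lam0 + lam1)))
    return float(pooled), stratum_mean

pooled, stratum_mean = simulate_cell(alpha=1.0, sigma=1.0, delta=0.1, n_rep=200_000)
```

Realisations with a large total intensity contribute more events, so the event-weighted proportion `pooled` differs from the unweighted average `stratum_mean`; that difference is the sampling bias described on the slide.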

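13 Two ways of thinking
First way: waiting for Godot!
  Fix $x \in \mathcal{X}$ and wait for an event to occur at $x$
  $\operatorname{pr}(Y = 1 \mid \lambda, x) = \frac{\lambda_1(x)}{\lambda_\cdot(x)}$
  $\operatorname{pr}(Y = 1; x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)}\right) = E(Y_i \mid i : X_i = x)$
  Conventional GLMM calculation... mathematically correct, but seldom relevant
Second way: come what may!
  SPP event occurs at $x$, a random point in $\mathcal{X}$
  joint density at $(y, x)$ proportional to $E(\lambda_y(x)) = m_y(x)$
  $x$ has marginal density proportional to $E(\lambda_\cdot(x)) = m_\cdot(x)$
  $\operatorname{pr}(Y = 1 \mid x \in \mathrm{SPP}) = \frac{E\lambda_1(x)}{E\lambda_\cdot(x)} \ne E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)}\right) = E(Y_i \mid i : X_i = x)$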

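14 Log Gaussian illustration of sampling bias
  $\eta_0(x) \sim GP(0, K)$, $\lambda_0(x) = \exp(\eta_0(x))$
  $\eta_1(x) \sim GP(\alpha + \beta x, K)$, $\lambda_1(x) = \exp(\eta_1(x))$
  $\eta(x) = \eta_1(x) - \eta_0(x) \sim GP(\alpha + \beta x, 2K)$, $K(x, x) = \sigma^2$
One-dimensional sampling distributions:
  $\rho(x) = p_x(Y = 1) = E\left(\frac{e^{\eta(x)}}{1 + e^{\eta(x)}}\right)$
  $\operatorname{logit}(\rho(x)) \approx \alpha^* + \beta^* x$ $\quad(|\beta^*| < |\beta|)$
  $\pi(x) = \operatorname{pr}(Y = 1 \mid x \in \mathrm{SPP}) = \frac{E\lambda_1(x)}{E\lambda_\cdot(x)} = \frac{e^{\alpha + \beta x + \sigma^2/2}}{e^{\sigma^2/2} + e^{\alpha + \beta x + \sigma^2/2}}$
  $\operatorname{logit} \operatorname{pr}(Y = 1 \mid x \in \mathrm{SPP}) = \alpha + \beta x$
No approximation; no attenuation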
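The closed form for $\pi(x)$ rests on one standard fact, the log-normal mean $E e^{Z} = e^{\mu + \sigma^2/2}$ for $Z \sim N(\mu, \sigma^2)$. Written out with the slide's own symbols (no further assumptions), the step is:

```latex
\begin{align*}
E\lambda_0(x) &= E\,e^{\eta_0(x)} = e^{\sigma^2/2}, \qquad
E\lambda_1(x) = E\,e^{\eta_1(x)} = e^{\alpha + \beta x + \sigma^2/2}, \\
\pi(x) &= \frac{E\lambda_1(x)}{E\lambda_0(x) + E\lambda_1(x)}
        = \frac{e^{\alpha + \beta x + \sigma^2/2}}{e^{\sigma^2/2} + e^{\alpha + \beta x + \sigma^2/2}}
        = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}, \\
\operatorname{logit} \pi(x) &= \alpha + \beta x .
\end{align*}
```

The common factor $e^{\sigma^2/2}$ cancels because both classes share the same variance $K(x, x) = \sigma^2$, which is why the slope $\beta$ is recovered exactly.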

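15 Randomization: Simple treatment effect
Event $x \mapsto$ pair $(t(x), y(x))$: treatment status, outcome

  $(t, y)$    $H_0$                        $H_a$                           Expected intensity
  $(t, y)$    $\lambda_y(x)\gamma_t$       $\lambda_y(x)\gamma_{ty}$       $m_y(x)\gamma_{ty}$
  $(0, 0)$    $\lambda_0(x)\gamma_0$       $\lambda_0(x)\gamma_{00}$       $m_0(x)\gamma_{00}$
  $(0, 1)$    $\lambda_1(x)\gamma_0$       $\lambda_1(x)\gamma_{01}$       $m_1(x)\gamma_{01}$
  $(1, 0)$    $\lambda_0(x)\gamma_1$       $\lambda_0(x)\gamma_{10}$       $m_0(x)\gamma_{10}$
  $(1, 1)$    $\lambda_1(x)\gamma_1$       $\lambda_1(x)\gamma_{11}$       $m_1(x)\gamma_{11}$

Conditional on $\lambda$:
  $\operatorname{odds}(Y(x) = 1 \mid t, \lambda) = \lambda_1(x)\gamma_{t1} / (\lambda_0(x)\gamma_{t0})$
  odds ratio $= \tau = \gamma_{11}\gamma_{00} / (\gamma_{01}\gamma_{10})$
Treatment effect is constant in $x$ and independent of $\lambda$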

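16 GLMM analysis for fixed x
Sample $(X, Y, t)_1, (X, Y, t)_2, \ldots$ observed sequentially
GLMM analysis for a single event at $x$:
  $\operatorname{pr}(Y(x) = 1 \mid t, \lambda) = \frac{\lambda_1(x)\gamma_{t1}}{\lambda_1(x)\gamma_{t1} + \lambda_0(x)\gamma_{t0}}$
  $\operatorname{pr}(Y(x) = 1 \mid t) = E\left(\frac{\lambda_1(x)\gamma_{t1}}{\lambda_1(x)\gamma_{t1} + \lambda_0(x)\gamma_{t0}}\right)$
  odds ratio $= \frac{\operatorname{odds}(Y(x) = 1 \mid t = 1)}{\operatorname{odds}(Y(x) = 1 \mid t = 0)} = \tau^*$
Conclusion: treatment effect is attenuated, $|\log \tau^*| < |\log \tau|$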

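17 Point process analysis for autogenerated units
Sample $(X, Y, t)_1, (X, Y, t)_2, \ldots$ observed sequentially
Correct analysis for a single event at $x$:
  $\operatorname{pr}(Y(x) = 1 \mid t, \lambda) = \frac{\lambda_1(x)\gamma_{t1}}{\lambda_1(x)\gamma_{t1} + \lambda_0(x)\gamma_{t0}}$
  $\operatorname{pr}(Y(x) = 1 \mid t, x \in \mathrm{SPP}) = \frac{E(\lambda_1(x))\gamma_{t1}}{E(\lambda_1(x))\gamma_{t1} + E(\lambda_0(x))\gamma_{t0}}$
  $\operatorname{odds}(Y(x) = 1 \mid t, x \in \mathrm{SPP}) = m_1(x)\gamma_{t1} / (m_0(x)\gamma_{t0})$
  odds ratio $= \gamma_{11}\gamma_{00} / (\gamma_{01}\gamma_{10}) = \tau$
Conclusion: no attenuation of treatment effect.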
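The cancellation behind the final odds ratio can be written out in one step. This only restates the symbols already in use, $m_y(x) = E\lambda_y(x)$ and the $\gamma_{ty}$ factors, and assumes nothing further:

```latex
\[
\frac{\operatorname{odds}(Y(x) = 1 \mid t = 1,\ x \in \mathrm{SPP})}
     {\operatorname{odds}(Y(x) = 1 \mid t = 0,\ x \in \mathrm{SPP})}
= \frac{m_1(x)\gamma_{11} / (m_0(x)\gamma_{10})}
       {m_1(x)\gamma_{01} / (m_0(x)\gamma_{00})}
= \frac{\gamma_{11}\gamma_{00}}{\gamma_{01}\gamma_{10}}
= \tau .
\]
```

The ratio $m_1(x)/m_0(x)$ drops out because the expectation is taken separately in numerator and denominator. In the fixed-$x$ GLMM calculation the expectation is of the ratio itself, which does not factor in this way.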

18 All creatures great and small...
Problem: assess the effectiveness of treatment for scour in lambs
Randomized trial: flock of 100 ewes due to produce lambs in Feb-March
  10% zero; 40% single; 50% twins $\Rightarrow$ 140 lambs
Randomize assignment of treatment ($t = 0$ or $t = 1$) to lambs
  randomization protocol...
Response $Y = 1$ if the lamb suffers from scour within 6 weeks
Lambs are the experimental units, but...
Ewe = litter = cluster (use index $i$ for ewe, $l$ for lamb)
Responses $Y_{il}, Y_{il'}$ in the same cluster are correlated

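19 Correlation in clusters
Standard GLMM model for correlation in clusters
Introduce an iid cluster-specific random effect $\{\eta_i\}$:
  $\operatorname{pr}(Y_{il} = 1 \mid \eta, t) = \frac{e^{\eta_i + \alpha + \beta t_{il}}}{1 + e^{\eta_i + \alpha + \beta t_{il}}}$
Assume $Y$-values conditionally independent given $\eta$.
Hence, the model distributions are
  $\operatorname{pr}(Y_{il} = 1; t) = E_\eta \frac{e^{\eta_i + \alpha + \beta t_{il}}}{1 + e^{\eta_i + \alpha + \beta t_{il}}} \approx \frac{e^{\alpha^* + \beta^* t_{il}}}{1 + e^{\alpha^* + \beta^* t_{il}}}$
  $p_t(y) = \prod_i E_\eta \prod_l \frac{e^{(\eta_i + \alpha + \beta t_{il}) y_{il}}}{1 + e^{\eta_i + \alpha + \beta t_{il}}}$
Use GLMM software to estimate $\beta$ or $\beta^*$...
or Bayesian software to obtain a posterior distribution
Does this make sense in the context?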

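20 Simple model: correct analysis
Response distribution without treatment for ewe $i$, given $\lambda$:
  $\emptyset$ w.p. $\lambda^{(0)}$
  $y$ w.p. $\lambda^{(1)}_y$ $\quad(y \in \{0, 1\})$
  $(y_1, y_2)$ w.p. $\lambda^{(2)}_{y_1, y_2}$ $\quad(y_1, y_2 \in \{0, 1\})$
Distribution of #lambs: iid $E(\lambda^{(0)}, \lambda^{(1)}_{\cdot}, \lambda^{(2)}_{\cdot\cdot})$
Joint response distribution including treatment:
  $(y, t)$ w.p. $\lambda^{(1)}_y \gamma_{ty}$ $\quad(y, t \in \{0, 1\})$
  $(y_1, t_1, y_2, t_2)$ w.p. $\lambda^{(2)}_{y_1, y_2} \gamma_{t_1 y_1} \gamma_{t_2 y_2}$
such that $\gamma_{\cdot y} = 1$, with $\gamma_{11}\gamma_{00} / (\gamma_{10}\gamma_{01}) = e^\beta$ constant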

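21 Simple model: correct analysis contd.
  $\operatorname{pr}(m_i = 1, Y_i = y, t_i = t \mid \lambda) = \lambda^{(1)}_y \gamma_{ty}$
  $\operatorname{pr}(m_i = 1, Y_i = y, t_i = t) = E(\lambda^{(1)}_y)\, \gamma_{ty}$, $\quad y, t \in \{0, 1\}$
  $\frac{\operatorname{odds}(Y_i = 1 \mid t_i = 1)}{\operatorname{odds}(Y_i = 1 \mid t_i = 0)} = \frac{E(\lambda^{(1)}_1) E(\lambda^{(1)}_0)\, \gamma_{11}\gamma_{00}}{E(\lambda^{(1)}_1) E(\lambda^{(1)}_0)\, \gamma_{10}\gamma_{01}} = e^\beta$
  ($t_i = 1$ implies $m_i = 1$)
Compare with the incorrect analysis:
  $\operatorname{pr}(Y_{il} = 1 \mid \lambda, t_{il} = t) = \frac{\lambda^{(1)}_1 \gamma_{t1}}{\lambda^{(1)}_1 \gamma_{t1} + \lambda^{(1)}_0 \gamma_{t0}} = \frac{e^{\eta_i + \beta t}}{1 + e^{\eta_i + \beta t}}$ (GLMM)
  $\log \operatorname{odds}(Y_{il} = 1 \mid t_{il} = 1) - \log \operatorname{odds}(Y_{il} = 1 \mid t_{il} = 0) \approx \beta^*$
with $|\beta^*| < |\beta|$, giving the illusion of attenuation.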

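22 Implications for estimating equations
Attenuation:
  $\pi(t) = \frac{e^{\alpha + \beta t}}{1 + e^{\alpha + \beta t}}$
  $\rho(t) = E_\eta \frac{e^{\eta + \alpha + \beta t}}{1 + e^{\eta + \alpha + \beta t}} \approx \frac{e^{\alpha^* + \beta^* t}}{1 + e^{\alpha^* + \beta^* t}}$
Estimating functions (sum over lambs):
  $S(t, y) = \sum_{\text{lambs}} (Y_l - \pi(t_l))$ (PP)
  $S^*(t, y) = \sum_{\text{lambs}} (Y_l - \rho(t_l))$ (GLMM, PA)
Mean and variance of $S$ and $S^*$:
  $E(S(t, y)) = 0$; $\quad E(S^*(t, y)) \ne 0$
  $\operatorname{var}(S(t, y)) = {?}$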

23 Notation: meaning of $E(Y_i \mid X_i = x)$
Exchangeable sequence $(Y_1, X_1), (Y_2, X_2), \ldots$ with binary $Y$ implies conditionally iid given $\lambda$
Stratum $x$: $U_x = \{i : X_i = x\}$, an infinite random subsequence
Stratum average: $\operatorname{ave}\{Y_i : i \in U_x\} = \lambda_1(x) / \lambda_\cdot(x)$
Stratum mean = expected value of stratum average:
  $\rho(x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)}\right) \approx \frac{e^{\alpha^* + \beta^* x}}{1 + e^{\alpha^* + \beta^* x}}$
  is the declared target in much biostatistical work (PA)
Correct calculation for a random stratum in SPP:
  $\pi(x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)} \,\Big|\, x \in \mathrm{SPP}\right) = \frac{E(\lambda_1(x))}{E(\lambda_\cdot(x))} = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
Conditional mean $\pi(x) = E(Y_i \mid X_i = x)$ versus stratum mean $\rho(x) = E(Y_i \mid i : X_i = x)$

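24 Consequences of ambiguous notation
Sample $(Y_1, X_1), (Y_2, X_2), \ldots$ observed sequentially
  $\pi(x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)} \,\Big|\, x \in \mathrm{SPP}\right) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} = E(Y_i \mid X_i = x)$
  $\rho(x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)}\right) \approx \frac{e^{\alpha^* + \beta^* x}}{1 + e^{\alpha^* + \beta^* x}} = E(Y_i \mid i : X_i = x)$
  ($\rho(x)$ computed by logistic-normal integral)
The conventional PA estimating function $Y_i - \rho(x_i)$ is such that
  $E(Y_1 - \rho(x) \mid X_1 = x) = \pi(x) - \rho(x) \ne 0$
and the same for $Y_2, \ldots$.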

25 Estimating functions done correctly
Mean intensity for class $r$: $m_r(x) = E(\lambda_r(x))$
  $\pi_r(x) = m_r(x) / m_\cdot(x)$; $\quad \rho_r(x) = E(\lambda_r(x) / \lambda_\cdot(x))$
$E(Y_i) = \rho(x)$ for each $i$ in $U_x = \{u : X_u = x\}$
For autogenerated $x$, $E(Y \mid x \in \mathrm{SPP}) = \pi(x) \ne \rho(x)$
  $T(x, y) = \sum_{x \in \mathrm{SPP}} h(x)\,(Y(x) - \pi(x))$
has zero mean for auto-generated configurations $x$.
Note: $E(T \mid x) \ne 0$; the average is also over configurations

26 Explanation of unbiasedness
$z = \{(x_1, y_1), \ldots\}$: configuration generated in $(0, t)$.
$x = \{x_1, \ldots\}$: marginal configuration (SPP)
$z$ is a random measure with mean $t\, m_r(x)\, \nu(dx)$ at $(r, x)$
$x$ is the marginal random measure with mean $t\, m_\cdot(x)\, \nu(dx)$ at $x$
  $\pi_r(x) = m_r(x) / m_\cdot(x)$
Hence $E(z(r, dx)) = \pi_r(x)\, E(x(dx))$ for all $(r, x)$, which implies that
  $T(x, y) = \sum_{x \in \mathrm{SPP}} h(x)\,(Y(x) - \pi(x)) = \int_{\mathcal{X}} h(x)\, \big(z(r, dx) - \pi_r(x)\, x(dx)\big)$
has zero expectation. But $E(T \mid x) \ne 0$.

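27 Variance calculation: binary case
$(y, x)$ generated by a point process;
  $T(x, y) = \sum_{x \in \mathrm{SPP}} h(x)\,(Y(x) - \pi(x))$
  $E(T(x, y)) = 0$; $\quad E(T \mid x) \ne 0$
  $\operatorname{var}(T) = \int_{\mathcal{X}} h^2(x)\, \pi(x)(1 - \pi(x))\, m_\cdot(x)\, dx$
  $\qquad + \int_{\mathcal{X}^2} h(x) h(x')\, V(x, x')\, m_{\cdot\cdot}(x, x')\, dx\, dx'$
  $\qquad + \int_{\mathcal{X}^2} h(x) h(x')\, \Delta^2(x, x')\, m_{\cdot\cdot}(x, x')\, dx\, dx'$
$V$: spatial or within-cluster correlation; $\Delta$: interference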

28 What is interference?
Physical/biological interference: the distribution of $Y(u)$ depends on $x(u')$
Sampling interference for autogenerated units:
  $m_r(x) = E(\lambda_r(x))$; $\quad m_{rs}(x, x') = E(\lambda_r(x)\lambda_s(x'))$
Univariate distributions:
  $\pi_r(x) = m_r(x) / m_\cdot(x) = \operatorname{pr}(Y(x) = r \mid x \in \mathrm{SPP})$
Bivariate distributions:
  $\pi_{rs}(x, x') = m_{rs}(x, x') / m_{\cdot\cdot}(x, x')$
  $\pi_{rs}(x, x') = \operatorname{pr}(Y(x) = r, Y(x') = s \mid x, x' \in \mathrm{SPP})$
Hence
  $\pi_{r\cdot}(x, x') = \operatorname{pr}(Y(x) = r \mid x, x' \in \mathrm{SPP})$
  $\Delta_r(x, x') = \pi_{r\cdot}(x, x') - \pi_r(x)$
No second-order sampling interference if $\Delta_r(x, x') = 0$

29 Inference: Conventional Gaussian model
Model: $p_x(A) = N_n(X\beta,\ \Sigma_x = \sigma_0^2 I_n + K[x])(A)$
Rationalization: $Y(i) = x_i'\beta + \epsilon_i + \eta(x(i))$
Stratum average: $\bar{Y}(U_x) = \operatorname{ave}\{Y_i \mid i : x_i = x\} = x'\beta + \eta(x)$
Conditional distribution of $Y_u$ for $u \in U_x$ given the observation $y, X$:
  $Y_u \mid \text{data} \sim N\big(x'\beta + k'\Sigma_x^{-1}(y - X\beta),\ \Sigma_{uu} - k'\Sigma_x^{-1}k\big)$
  $\bar{Y}(U_x) \mid \text{data} \sim N\big(x'\beta + k'\Sigma_x^{-1}(y - X\beta),\ K(x, x) - k'\Sigma_x^{-1}k\big)$
  $k_i = K(x, x_i)$ (such as $e^{-|x - x_i|}$ or $|x - x_i|^3$)
Stratum average is a random variable, not a parameter
Estimate is a distribution (not a function of the sufficient statistic)
Likewise for GLMMs
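The two conditional distributions on this slide share a mean and differ only in variance. The following sketch computes both, assuming NumPy, a scalar covariate, the exponential kernel $K(x, x') = e^{-|x - x'|}$ (one of the two examples named on the slide) and a known $\beta$. The function names and the one-observation example are hypothetical.

```python
import numpy as np

def exp_kernel(a, b):
    """K(x, x') = exp(-|x - x'|) for scalar covariate values."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.exp(-np.abs(a[:, None] - b[None, :]))

def predict(x_new, x_obs, y, X, x_row, beta, sigma0_sq):
    """Conditional mean and variances at covariate value x_new.

    Returns (mean, var_unit, var_stratum) where
      var_unit    = Sigma_uu - k' Sigma^{-1} k, with Sigma_uu = sigma0^2 + K(x, x)
                    (a new unit u in the stratum, assumed not among the observed units)
      var_stratum = K(x, x) - k' Sigma^{-1} k   (the stratum average)
    """
    Sigma = sigma0_sq * np.eye(len(y)) + exp_kernel(x_obs, x_obs)
    k = exp_kernel(x_obs, [x_new])[:, 0]       # k_i = K(x, x_i)
    w = np.linalg.solve(Sigma, k)              # Sigma^{-1} k
    mean = x_row @ beta + w @ (y - X @ beta)
    k_xx = 1.0                                 # K(x, x) = exp(0) for this kernel
    var_stratum = k_xx - w @ k
    var_unit = sigma0_sq + var_stratum
    return float(mean), float(var_unit), float(var_stratum)

# Hypothetical data: a single observation y = 1 at x = 0, intercept-only mean with beta = 0.
mean, var_unit, var_stratum = predict(
    x_new=0.0, x_obs=[0.0], y=np.array([1.0]),
    X=np.array([[1.0]]), x_row=np.array([1.0]),
    beta=np.array([0.0]), sigma0_sq=1.0,
)
```

The output is a pair of distributions rather than a point estimate, which matches the slide's remark that the stratum average is a random variable, not a parameter.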

30 Inference and prediction for the PP model
For a sequential sample:
  Observation $(x, y) \equiv (x^{(0)}, \ldots, x^{(k-1)})$
  Product density $m_r(x^{(r)}) = E\big(\prod_{x \in x^{(r)}} \lambda_r(x)\big)$ for class $r$
Conditional distribution as a random labelled partition of $x$:
  $p(y \mid x) \propto m_0(x^{(0)}) \cdots m_{k-1}(x^{(k-1)})$
For a subsequent autogenerated event:
  $p(Y(x') = r \mid \text{data}, x' \in \mathrm{SPP}) \propto m_r(x^{(r)}, x') / m_r(x^{(r)})$
Use likelihood or estimating function to estimate parameters
Use conditional distribution for inference/prediction

31 Brief summary of conclusions
(i) Reasonable case for fixed-population model in certain areas
  laboratory work; field trials; veterinary trials; ...
(ii) Good case for autogenerated units in other areas
  clinical trials; marketing; crime; animal behaviour
(iii) The choice matters in random-effects models
(iv) $\pi(x) = E(Y_i \mid X_i = x)$ versus $\rho(x) = E(Y_i \mid i : X_i = x)$
  conditional distribution versus stratum distribution
  attenuation or non-attenuation
(v) What is modelled and estimated by PA?
  claims to estimate $\rho(x)$ but actually estimates $\pi(x)$
(vi) Distribution for a sequential sample is different from the GLMM distribution
(vii) What does the GLMM likelihood estimate?
  Difficult to say; probably neither $\rho(x)$ nor $\pi(x)$
