Statistical models, selection effects and sampling bias


1 Statistical models, selection effects and sampling bias
Department of Statistics, University of Chicago
Statslab Cambridge, Nov

2 Outline
1 Gaussian models; Binary regression model; Attenuation of treatment effect; Problems with conventional models
2 Sampling bias; Volunteer samples; Joint distributions of Y given x; Randomization
3 Ewes and lambs; Notation; Estimating functions; Interference
4

3 Conventional regression model
Fixed index set $U$ (always infinite): $u_1, u_2, \ldots$ (subjects, plots, ...)
Covariate $x(u_1), x(u_2), \ldots$ (non-random, vector-valued)
Response $Y(u_1), Y(u_2), \ldots$ (random, real-valued)
Regression model:
  Sample = finite ordered subset $u_1, \ldots, u_n$ (distinct in $U$)
  For each sample with configuration $x = (x(u_1), \ldots, x(u_n))$, the response distribution $p_x(y)$ on $R^n$ depends on $x$
  Kolmogorov consistency condition on the distributions $p_x(\cdot)$
$x$ is not a random variable
$p_x(\cdot)$ is not the conditional distribution given $x$

4 Gaussian regression model
Covariate $x(u) = (x_1(u), x_2(u))$ partitioned into two components:
  $x_1(u) = (\mathrm{variety}(u), \mathrm{treatment}(u), \ldots)$ affecting the mean
  $x_2(u) = (\mathrm{block}(u), \mathrm{coordinates}(u))$ affecting covariances
Example: response distribution for a fixed sample of $n$ plots
  $p_x(y \in A; \theta) = N_n(X_1\beta,\ \sigma_0^2 I_n + \sigma_1^2 K[x_2])(A)$, with $A \subset R^n$ and $K[x] = \{K(x_i, x_j)\}$
block-factor models: $K(i, j) = 1$ if $\mathrm{block}(i) = \mathrm{block}(j)$ (0 otherwise)
spatial/temporal models: $K(i, j) = \exp(-|x_2(i) - x_2(j)|/\tau)$
Generalized random field: $K(i, j) = -\mathrm{ave}_{x \in i,\ x' \in j} \log|x - x'|$
Equivalent to $Y(u) = \mu(u) + \epsilon(u) + \eta(u)$ with $\epsilon \perp \eta$, ...
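The covariance structure on this slide can be sketched numerically. The snippet below is a minimal illustration, assuming one-dimensional coordinates and made-up block labels and variance components; the function names are hypothetical and not part of the talk.

```python
import numpy as np

def block_kernel(blocks):
    """Block-factor kernel: K(i, j) = 1 if block(i) == block(j), else 0."""
    b = np.asarray(blocks)
    return (b[:, None] == b[None, :]).astype(float)

def exponential_kernel(coords, tau):
    """Spatial/temporal kernel: K(i, j) = exp(-|x2(i) - x2(j)| / tau), 1-D coordinates."""
    c = np.asarray(coords, dtype=float)
    return np.exp(-np.abs(c[:, None] - c[None, :]) / tau)

def gaussian_covariance(K, sigma0_sq, sigma1_sq):
    """Covariance of the sample: sigma0^2 I_n + sigma1^2 K[x2]."""
    n = K.shape[0]
    return sigma0_sq * np.eye(n) + sigma1_sq * K

# Illustrative (assumed) design: five plots in three blocks.
blocks = [1, 1, 2, 2, 3]
coords = [0.0, 1.0, 2.5, 4.0, 4.5]

Sigma_block = gaussian_covariance(block_kernel(blocks), sigma0_sq=1.0, sigma1_sq=0.5)
Sigma_spatial = gaussian_covariance(exponential_kernel(coords, tau=2.0), 1.0, 0.5)
```

Plots in the same block share the covariance $\sigma_1^2$ under the block-factor kernel, whereas the exponential kernel lets the covariance decay with distance.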

5 Binary regression model (GLMM)
Units: $u_1, u_2, \ldots$ subjects, patients, plots (labelled)
Covariate $x(u_1), x(u_2), \ldots$ (non-random, $\mathcal{X}$-valued)
Latent process $\eta$ on $\mathcal{X}$ (Gaussian, for example)
Responses $Y(u_1), \ldots$ conditionally independent given $\eta$
$\mathrm{logit}\ \mathrm{pr}(Y(u) = 1 \mid \eta) = \alpha + \beta x(u) + \eta(x(u))$
Joint distribution for sample having configuration $x$:
$$p_x(y) = E_\eta \prod_{i=1}^n \frac{e^{(\alpha + \beta x_i + \eta(x_i)) y_i}}{1 + e^{\alpha + \beta x_i + \eta(x_i)}}$$
parameters $\alpha, \beta, K$, where $K(x, x') = \mathrm{cov}(\eta(x), \eta(x'))$.

6 Attenuation of treatment effect
$\eta$ is a random effect (associated with clinical centre)
$\mathrm{logit}\ \mathrm{pr}(Y_i = 1 \mid \eta, \mathrm{Trt} = 0) = \alpha + \eta(x_i)$
$\mathrm{logit}\ \mathrm{pr}(Y_i = 1 \mid \eta, \mathrm{Trt} = 1) = \alpha + \eta(x_i) + \tau$
Unconditionally:
$$\mathrm{pr}(Y_i = 1 \mid \mathrm{Trt} = t) = \int \frac{e^{\alpha + \eta + \tau t}}{1 + e^{\alpha + \eta + \tau t}}\, \phi(\eta; \sigma^2)\, d\eta$$
$$\mathrm{logit}\ \mathrm{pr}(Y_i = 1 \mid \mathrm{Trt} = t) = \alpha^* + \tau^* t$$
Attenuation: $\tau^*$ is approximately $\tau$ divided by a factor that grows with $\sigma^2$, so $|\tau^*| < |\tau|$
Treatment effect = odds(success w/treat)/odds(success without)
Treatment effect: $\tau$ or $\tau^*$?
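The attenuation can be checked numerically by evaluating the logistic-normal integral with Gauss-Hermite quadrature. This is a minimal sketch under assumed parameter values ($\alpha = -0.5$, $\tau = 1$); the function names and the choice of 80 quadrature nodes are illustrative, not taken from the talk.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

def marginal_prob(alpha, tau, t, sigma, n_nodes=80):
    """pr(Y = 1 | Trt = t): integral of expit(alpha + eta + tau*t) against N(0, sigma^2)."""
    nodes, weights = hermegauss(n_nodes)       # weight function exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)   # normalise to the N(0, 1) density
    return float(np.sum(weights * expit(alpha + sigma * nodes + tau * t)))

def marginal_effect(alpha, tau, sigma):
    """tau* = logit pr(Y = 1 | Trt = 1) - logit pr(Y = 1 | Trt = 0)."""
    p0 = marginal_prob(alpha, tau, 0, sigma)
    p1 = marginal_prob(alpha, tau, 1, sigma)
    return float(logit(p1) - logit(p0))

# Assumed values for illustration only.
alpha, tau = -0.5, 1.0
tau_star = {sigma: marginal_effect(alpha, tau, sigma) for sigma in (0.0, 1.0, 2.0)}
```

With $\sigma = 0$ the marginal and conditional effects coincide; as $\sigma$ increases, `tau_star` shrinks towards zero.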

7 Binary regression model: computation
GLMM computational problem:
$$p_x(y) = \int_{R^n} \prod_{i=1}^n \frac{e^{(\alpha + \beta x_i + \eta(x_i)) y_i}}{1 + e^{\alpha + \beta x_i + \eta(x_i)}}\, \phi(\eta; K)\, d\eta$$
Options:
Taylor approx: Laird and Ware; Schall; Breslow and Clayton; McC and Nelder; Drum and McC; ...
Laplace approximation: Wolfinger 1993; Shun and McC 1994
Numerical approximation: Egret
E.M. algorithm: McCulloch 1994 for probit models
Monte Carlo: Z&L, ...
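Of the options listed, plain Monte Carlo is the simplest to sketch: draw $\eta$ from $N(0, K)$ and average the conditional Bernoulli likelihood. The code below is an illustrative sketch only, not the method of any of the cited papers; the exponential kernel, parameter values, seed and function name are assumptions.

```python
import itertools
import numpy as np

def mc_joint_prob(y, x, alpha, beta, K, n_draws=2000, seed=0):
    """Monte Carlo estimate of p_x(y) = E_eta prod_i expit(lin_i)^y_i (1 - expit(lin_i))^(1 - y_i)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    eta = rng.multivariate_normal(np.zeros(len(x)), K, size=n_draws)  # shape (n_draws, n)
    lin = alpha + beta * x + eta
    # Bernoulli log-likelihood of y for each draw of eta; logaddexp(0, lin) = log(1 + e^lin).
    log_lik = np.sum(y * lin - np.logaddexp(0.0, lin), axis=1)
    return float(np.mean(np.exp(log_lik)))

# Assumed configuration of three units and an exponential covariance kernel.
x = np.array([0.0, 0.5, 1.5])
K = 1.0 * np.exp(-np.abs(x[:, None] - x[None, :]))

probs = {y: mc_joint_prob(y, x, alpha=-0.3, beta=0.8, K=K)
         for y in itertools.product([0, 1], repeat=len(x))}
total = sum(probs.values())
```

Because every call reuses the same seed, the eight estimates are based on the same draws of $\eta$ and therefore sum to one up to rounding error.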

8 But,..., wait a minute...
$p_x(y)$ is the distribution for each fixed sample of $n$ units.
But... the sample might not be predetermined:
  volunteer samples in clinical trials;
  pre-screening of patients to increase compliance;
  behavioural studies in ecology;
  marketing studies, with purchase events as units;
  public policy: crime type with crime events as units
Ergo, $x$ is also random, so we need a joint distribution.
Q1: For a bivariate process, what does $p_x(y)$ represent?
Q2: Is it necessarily the case that $p_x(y) = p(y \mid x)$? ... even if sample size is random?

10 Problems in the application of conventional models
Clinical trials / market research / traffic studies / crime...
(i) Operational interpretation of a sample as a fixed subset, or as a random subset independent of the process
(ii) Sample units generated by a random process: sequential recruitment, purchase events, traffic studies, ...
(iii) Population also generated by a random process in time: animal populations, purchase events, crime events, ...
(iv) Samples: random, sequential, quota, ...
(v) Conditional distribution given observed random $x$ versus stratum distribution for fixed $x$

11 [Figure: events of classes y = 0, y = 1 and y = 2 plotted along the X axis]
A point process on $C \times \mathcal{X}$ for $C = \{0, 1, 2\}$, and the superposition process on $\mathcal{X}$.
Intensity $\lambda_r(x)$ for class $r$: $r = 0, 1, 2$.
$x$-values auto-generated by the superposition process with intensity $\lambda_\cdot(x) = \sum_r \lambda_r(x)$.
To each auto-generated unit there corresponds an $x$-value and a $y$-value.

12 Binary point process model
Intensity process $\lambda_0(x)$ for class 0, $\lambda_1(x)$ for class 1
Log ratio: $\eta(x) = \log \lambda_1(x) - \log \lambda_0(x)$
Events form a PP with intensity $\lambda$ on $\{0, 1\} \times \mathcal{X}$.
Conventional GLMM calculation (Bayesian and frequentist):
$$\mathrm{pr}(Y = 1 \mid x, \lambda) = \frac{\lambda_1(x)}{\lambda_\cdot(x)} = \frac{e^{\eta(x)}}{1 + e^{\eta(x)}}$$
$$\mathrm{pr}(Y = 1 \mid x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)}\right) = E\left(\frac{e^{\eta(x)}}{1 + e^{\eta(x)}}\right)$$
GLMM calculation is correct in a sense, but irrelevant: there might not be an event at $x$!

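13 Correct calculation for auto-generated units
$\mathrm{pr}(\text{event of type } r \text{ in } dx \mid \lambda) = \lambda_r(x)\, dx + o(dx)$
$\mathrm{pr}(\text{event of type } r \text{ in } dx) = E(\lambda_r(x))\, dx + o(dx)$
$\mathrm{pr}(\text{event in SPP in } dx \mid \lambda) = \lambda_\cdot(x)\, dx + o(dx)$
$\mathrm{pr}(\text{event in SPP in } dx) = E(\lambda_\cdot(x))\, dx + o(dx)$
$$\mathrm{pr}(Y(x) = r \mid \text{SPP event at } x) = \frac{E\lambda_r(x)}{E\lambda_\cdot(x)}$$
$$E\left(\frac{\lambda_r(x)}{\lambda_\cdot(x)} \,\Big|\, x \in \mathrm{SPP}\right) = \frac{E\lambda_r(x)}{E\lambda_\cdot(x)} \ne E\left(\frac{\lambda_r(x)}{\lambda_\cdot(x)}\right)$$
Sampling bias: distribution for fixed $x$ versus distribution for autogenerated $x$.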

14 Two ways of thinking
First way: waiting for Godot!
  Fix $x \in \mathcal{X}$ and wait for an event to occur at $x$
  $\mathrm{pr}(Y = 1 \mid \lambda, x) = \lambda_1(x)/\lambda_\cdot(x)$
  $\mathrm{pr}(Y = 1; x) = E(\lambda_1(x)/\lambda_\cdot(x)) = E(Y_i \mid i : X_i = x)$
  Conventional, mathematically correct, but seldom relevant
Second way: come what may!
  SPP event occurs at $x$, a random point in $\mathcal{X}$
  joint density at $(y, x)$ proportional to $E(\lambda_y(x)) = m_y(x)$
  $x$ has marginal density proportional to $E(\lambda_\cdot(x)) = m_\cdot(x)$
  $\mathrm{pr}(Y = 1 \mid x \in \mathrm{SPP}) = E\lambda_1(x)/E\lambda_\cdot(x) \ne E(\lambda_1(x)/\lambda_\cdot(x)) = E(Y_i \mid i : X_i = x)$
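A small worked example makes the difference between the two calculations concrete. The two-scenario distribution for $(\lambda_0(x), \lambda_1(x))$ below is hypothetical, chosen only so that the arithmetic is exact; the function names are likewise illustrative.

```python
from fractions import Fraction as F

# Hypothetical distribution of the intensities at a single point x:
# each entry is (probability, lambda_0(x), lambda_1(x)).
scenarios = [
    (F(1, 2), F(1), F(1)),
    (F(1, 2), F(1), F(9)),
]

def stratum_mean(scenarios):
    """First way: rho(x) = E(lambda_1 / lambda_.), fix x and wait for an event."""
    return sum(p * l1 / (l0 + l1) for p, l0, l1 in scenarios)

def autogenerated_mean(scenarios):
    """Second way: pi(x) = E(lambda_1) / E(lambda_.), take events as they come."""
    m1 = sum(p * l1 for p, l0, l1 in scenarios)
    m_dot = sum(p * (l0 + l1) for p, l0, l1 in scenarios)
    return m1 / m_dot

rho = stratum_mean(scenarios)        # 1/2 * 1/2 + 1/2 * 9/10 = 7/10
pi = autogenerated_mean(scenarios)   # (1/2 + 9/2) / (1 + 5) = 5/6
```

The second scenario has five times the total intensity of the first, so autogenerated events at $x$ come disproportionately from it; this is why $\pi(x) = 5/6$ exceeds $\rho(x) = 7/10$.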

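15 Log Gaussian illustration of sampling bias
$\eta_0(x) \sim GP(0, K)$, $\lambda_0(x) = \exp(\eta_0(x))$
$\eta_1(x) \sim GP(\alpha + \beta x, K)$, $\lambda_1(x) = \exp(\eta_1(x))$
$\eta(x) = \eta_1(x) - \eta_0(x) \sim GP(\alpha + \beta x, 2K)$, $K(x, x) = \sigma^2$
One-dimensional sampling distributions:
$$\rho(x) = p_x(Y = 1) = E\left(\frac{e^{\eta(x)}}{1 + e^{\eta(x)}}\right), \qquad \mathrm{logit}(\rho(x)) \approx \alpha^* + \beta^* x \quad (|\beta^*| < |\beta|)$$
$$\pi(x) = \mathrm{pr}(Y = 1 \mid x \in \mathrm{SPP}) = \frac{E\lambda_1(x)}{E\lambda_\cdot(x)} = \frac{e^{\alpha + \beta x + \sigma^2/2}}{e^{\sigma^2/2} + e^{\alpha + \beta x + \sigma^2/2}}$$
$$\mathrm{logit}\ \mathrm{pr}(Y = 1 \mid x \in \mathrm{SPP}) = \alpha + \beta x$$
No approximation; no attenuation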
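The contrast between $\rho(x)$ and $\pi(x)$ in the log Gaussian model can be verified by quadrature. The sketch below assumes independent $\eta_0$ and $\eta_1$ (as implied by the variance $2K$ of their difference) and illustrative values $\alpha = -1$, $\beta = 2$, $\sigma^2 = 1$; the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

NODES, WEIGHTS = hermegauss(80)            # weight function exp(-z^2 / 2)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)   # normalise to the N(0, 1) density

def gauss_mean(f, mean, var):
    """E f(Z) for Z ~ N(mean, var), by Gauss-Hermite quadrature."""
    return float(np.sum(WEIGHTS * f(mean + np.sqrt(var) * NODES)))

def logit(p):
    return float(np.log(p / (1.0 - p)))

def rho_stratum(x, alpha, beta, sigma_sq):
    """rho(x) = E expit(eta(x)), with eta(x) ~ N(alpha + beta*x, 2*sigma^2)."""
    return gauss_mean(lambda e: 1.0 / (1.0 + np.exp(-e)), alpha + beta * x, 2.0 * sigma_sq)

def pi_autogen(x, alpha, beta, sigma_sq):
    """pi(x) = E lambda_1(x) / (E lambda_0(x) + E lambda_1(x))."""
    m1 = gauss_mean(np.exp, alpha + beta * x, sigma_sq)
    m0 = gauss_mean(np.exp, 0.0, sigma_sq)
    return m1 / (m0 + m1)

alpha, beta, sigma_sq = -1.0, 2.0, 1.0     # assumed values
slope_pi = logit(pi_autogen(1.0, alpha, beta, sigma_sq)) - logit(pi_autogen(0.0, alpha, beta, sigma_sq))
slope_rho = logit(rho_stratum(1.0, alpha, beta, sigma_sq)) - logit(rho_stratum(0.0, alpha, beta, sigma_sq))
```

Here `slope_pi` reproduces $\beta$, whereas `slope_rho` is the attenuated $\beta^*$ over the unit interval.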

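16 Alternative formulation involving selection
Fix the population $U$
Associate with each $u$ a random response intensity $\lambda^{(u)}_{y,t}$:
$$\begin{array}{c|cc} & t = 0 & t = 1 \\ \hline y = 0 & \lambda^{(u)}_{00} & \lambda^{(u)}_{01} \\ y = 1 & \lambda^{(u)}_{10} & \lambda^{(u)}_{11} \end{array}$$
implying a volunteer intensity $\lambda^{(u)}_{\cdot\cdot}$ (random and small)
$\mathrm{pr}(u \in \text{Sample}) = E(\lambda^{(u)}_{\cdot\cdot})$
$\mathrm{pr}(u \in S\ \&\ t_u = t) = E(\lambda^{(u)}_{\cdot t}) = m^{(u)}_{\cdot t}$
$\mathrm{pr}(Y_u = 1 \mid u \in S\ \&\ t_u = t) = m^{(u)}_{1t}/m^{(u)}_{\cdot t} \ne E(\lambda_{1t}/\lambda_{\cdot t})$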

17 Conditional joint distributions of Y given x
Sampling with fixed $x$ (quota):
$$p_x(y) = E\left(\prod_i \frac{\lambda_{y_i}(x_i)}{\lambda_\cdot(x_i)}\right) = E\left(\prod_i \frac{e^{y_i \eta(x_i)}}{1 + e^{\eta(x_i)}}\right)$$
coincides with standard GLMM model
Sequential sampling, fixed time: $\#x$ random
$$p(y \mid x) = \frac{E\left(\prod_i \lambda_{y_i}(x_i)\; e^{-\int \lambda_\cdot(x)\,\nu(dx)}\right)}{E\left(\prod_i \lambda_\cdot(x_i)\; e^{-\int \lambda_\cdot(x)\,\nu(dx)}\right)}$$

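18 Randomization: Simple treatment effect
Event $x \mapsto$ pair $(t(x), y(x))$: treatment status, outcome
$$\begin{array}{c|ccc} & H_0 & H_a & \text{Expected intensity} \\ (t, y) & \lambda_y(x)\gamma_t & \lambda_y(x)\gamma_{t,y} & m_y(x)\gamma_{t,y} \\ \hline (0, 0) & \lambda_0(x)\gamma_0 & \lambda_0(x)\gamma_{00} & m_0(x)\gamma_{00} \\ (0, 1) & \lambda_1(x)\gamma_0 & \lambda_1(x)\gamma_{01} & m_1(x)\gamma_{01} \\ (1, 0) & \lambda_0(x)\gamma_1 & \lambda_0(x)\gamma_{10} & m_0(x)\gamma_{10} \\ (1, 1) & \lambda_1(x)\gamma_1 & \lambda_1(x)\gamma_{11} & m_1(x)\gamma_{11} \end{array}$$
Conditional on $\lambda$: $\mathrm{odds}(Y(x) = 1 \mid t, \lambda) = \lambda_1(x)\gamma_{t1}/(\lambda_0(x)\gamma_{t0})$
odds ratio $= \tau = \gamma_{11}\gamma_{00}/(\gamma_{01}\gamma_{10})$
Treatment effect constant in $x$ and independent of $\lambda$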

19 GLMM analysis for fixed x
Sample $(X, Y, t)_1, (X, Y, t)_2, \ldots$ observed sequentially
GLMM analysis for a single event at $x$:
$$\mathrm{pr}(Y(x) = 1 \mid t, \lambda) = \frac{\lambda_1(x)\gamma_{t1}}{\lambda_1(x)\gamma_{t1} + \lambda_0(x)\gamma_{t0}}$$
$$\mathrm{pr}(Y(x) = 1 \mid t) = E\left(\frac{\lambda_1(x)\gamma_{t1}}{\lambda_1(x)\gamma_{t1} + \lambda_0(x)\gamma_{t0}}\right)$$
$$\text{odds ratio} = \frac{\mathrm{odds}(Y(x) = 1 \mid t = 1)}{\mathrm{odds}(Y(x) = 1 \mid t = 0)} = \tau^*$$
Conclusion: treatment effect is attenuated, $|\log \tau^*| < |\log \tau|$

20 Point process analysis for autogenerated units
Sample $(X, Y, t)_1, (X, Y, t)_2, \ldots$ observed sequentially
Correct analysis for a single event at $x$:
$$\mathrm{pr}(Y(x) = 1 \mid t, \lambda) = \frac{\lambda_1(x)\gamma_{t1}}{\lambda_1(x)\gamma_{t1} + \lambda_0(x)\gamma_{t0}}$$
$$\mathrm{pr}(Y(x) = 1 \mid t, x \in \mathrm{SPP}) = \frac{E(\lambda_1(x))\gamma_{t1}}{E(\lambda_1(x))\gamma_{t1} + E(\lambda_0(x))\gamma_{t0}}$$
$\mathrm{odds}(Y(x) = 1 \mid t, x \in \mathrm{SPP}) = m_1(x)\gamma_{t1}/(m_0(x)\gamma_{t0})$
odds ratio $= \gamma_{11}\gamma_{00}/(\gamma_{01}\gamma_{10}) = \tau$
Conclusion: No attenuation of treatment effect.
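The two analyses of the randomized model can be compared on a toy example. The sketch below assumes a hypothetical two-scenario distribution for $(\lambda_0(x), \lambda_1(x))$ and made-up multipliers $\gamma_{t,y}$ giving $\tau = 3$; none of these numbers come from the talk.

```python
# gamma[(t, y)]: assumed treatment multipliers, giving tau = 3.
gamma = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

# Hypothetical distribution of the intensities at x: (probability, lambda_0, lambda_1).
scenarios = [(0.5, 1.0, 0.2), (0.5, 1.0, 5.0)]

def odds(p):
    return p / (1.0 - p)

def glmm_prob(t):
    """Fixed-x calculation: E[ lambda_1 g_t1 / (lambda_1 g_t1 + lambda_0 g_t0) ]."""
    return sum(p * l1 * gamma[(t, 1)] / (l1 * gamma[(t, 1)] + l0 * gamma[(t, 0)])
               for p, l0, l1 in scenarios)

def pp_prob(t):
    """Autogenerated-x calculation: m_1 g_t1 / (m_1 g_t1 + m_0 g_t0), with m_r = E(lambda_r)."""
    m0 = sum(p * l0 for p, l0, l1 in scenarios)
    m1 = sum(p * l1 for p, l0, l1 in scenarios)
    return m1 * gamma[(t, 1)] / (m1 * gamma[(t, 1)] + m0 * gamma[(t, 0)])

tau = gamma[(1, 1)] * gamma[(0, 0)] / (gamma[(0, 1)] * gamma[(1, 0)])
tau_star = odds(glmm_prob(1)) / odds(glmm_prob(0))   # GLMM odds ratio, attenuated
tau_pp = odds(pp_prob(1)) / odds(pp_prob(0))         # point process odds ratio
```

In this example the point process odds ratio recovers $\tau = 3$, while the fixed-$x$ GLMM odds ratio is roughly 1.9.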

21 ...
Problem: assess the effectiveness of treatment for scour in lambs
Randomized trial: flock of 100 ewes due to produce lambs in Feb-March
10% zero; 40% single; 50% twins, giving 140 lambs
Randomize assignment of treatment ($t = 0$ or $t = 1$) to lambs; randomization protocol...
Response $Y = 1$ if lamb suffers from scour (6 weeks)
Lambs are the experimental units, but...
Ewe = litter = cluster (use index $i$ for ewe, $l$ for lamb)
Responses $Y_{il}, Y_{il'}$ in same cluster are correlated

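22 Correlation in clusters
Standard GLMM model for correlation in clusters
Introduce an iid cluster-specific random effect $\{\eta_i\}$.
$$\mathrm{pr}(Y_{il} = 1 \mid \eta, t) = \frac{e^{\eta_i + \alpha + \beta t_{il}}}{1 + e^{\eta_i + \alpha + \beta t_{il}}}$$
Assume $Y$-values conditionally independent given $\eta$. Hence, the model distributions are
$$\mathrm{pr}(Y_{il} = 1; t) = E_\eta\left(\frac{e^{\eta_i + \alpha + \beta t_{il}}}{1 + e^{\eta_i + \alpha + \beta t_{il}}}\right) \approx \frac{e^{\alpha^* + \beta^* t_{il}}}{1 + e^{\alpha^* + \beta^* t_{il}}}$$
$$p_t(y) = \prod_i E_\eta\left(\prod_l \frac{e^{(\eta_i + \alpha + \beta t_{il}) y_{il}}}{1 + e^{\eta_i + \alpha + \beta t_{il}}}\right)$$
Use GLMM software to estimate $\beta$ or $\beta^*$ ... or Bayesian software to obtain posterior distribution
Does this make sense in the context?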

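23 Simple model: correct analysis
Response distn w/o treatment for ewe $i$ given $\lambda$:
  $\emptyset$ (no lambs) w.p. $\lambda^{(0)}$
  $y$ w.p. $\lambda^{(1)}_y$ ($y \in \{0, 1\}$)
  $(y_1, y_2)$ w.p. $\lambda^{(2)}_{y_1, y_2}$ ($y_1, y_2 \in \{0, 1\}$)
Distribution of #lambs: iid $E(\lambda^{(0)}, \lambda^{(1)}_\cdot, \lambda^{(2)}_{\cdot\cdot})$
Joint response distn incl treatment:
  $(y, t)$ w.p. $\lambda^{(1)}_y \gamma_{ty}$ ($y, t \in \{0, 1\}$)
  $(y_1, y_2)$ w.p. $\lambda^{(2)}_{y_1, y_2} \gamma_{t_1 y_1} \gamma_{t_2 y_2}$
such that $\gamma_{\cdot y} = 1$. ($\gamma_{11}\gamma_{00}/(\gamma_{10}\gamma_{01}) = e^\beta$) constant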

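24 Simple model: correct analysis contd.
$\mathrm{pr}(m_i = 1, Y_i = y, t_i = t \mid \lambda) = \lambda^{(1)}_y \gamma_{ty}$
$\mathrm{pr}(m_i = 1, Y_i = y, t_i = t) = E(\lambda^{(1)}_y)\gamma_{ty}$, for $y, t \in \{0, 1\}$
$$\frac{\mathrm{odds}(Y_i = 1 \mid t_i = 1)}{\mathrm{odds}(Y_i = 1 \mid t_i = 0)} = \frac{E(\lambda^{(1)}_1)E(\lambda^{(1)}_0)}{E(\lambda^{(1)}_1)E(\lambda^{(1)}_0)} \times \frac{\gamma_{11}\gamma_{00}}{\gamma_{10}\gamma_{01}} = e^\beta$$
($t_i = 1$ implies $m_i = 1$)
Compare with incorrect analysis:
$$\mathrm{pr}(Y_{il} = 1 \mid \lambda, t_{il} = t) = \frac{\lambda^{(1)}_1\gamma_{t1}}{\lambda^{(1)}_1\gamma_{t1} + \lambda^{(1)}_0\gamma_{t0}} = \frac{e^{\eta_i + \beta t}}{1 + e^{\eta_i + \beta t}} \quad \text{(GL...)}$$
$\log \mathrm{odds}(Y_{il} = 1 \mid t_{il} = 1) - \log \mathrm{odds}(Y_{il} = 1 \mid t_{il} = 0) = \beta^*$ with $|\beta^*| < |\beta|$, giving the illusion of attenuation.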

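25 Attenuation:
$$\pi(t) = \frac{e^{\alpha + \beta t}}{1 + e^{\alpha + \beta t}}, \qquad \rho(t) = E_\eta\left(\frac{e^{\eta + \alpha + \beta t}}{1 + e^{\eta + \alpha + \beta t}}\right) \approx \frac{e^{\alpha^* + \beta^* t}}{1 + e^{\alpha^* + \beta^* t}}$$
Estimating functions (sum over lambs):
$S(t, y) = \sum_{\text{lambs}} (Y_l - \pi(t_l))$
$S^*(t, y) = \sum_{\text{lambs}} (Y_l - \rho(t_l))$ (GLMM, PA)
Mean and variance of $S$ and $S^*$:
$E(S(t, y)) = 0$; $E(S^*(t, y)) \ne 0$
$\mathrm{var}(S(t, y)) = ?$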

26 Notation: meaning of $E(Y_i \mid X_i = x)$
Exchangeable sequence $(Y_1, X_1), (Y_2, X_2), \ldots$ with binary $Y$ implies conditionally iid given $\lambda$
Stratum $x$: $U_x = \{i : X_i = x\}$, an infinite random subsequence
Stratum average: $\mathrm{ave}\{Y_i : i \in U_x\} = \lambda_1(x)/\lambda_\cdot(x)$
Stratum mean = expected value of stratum average:
$$\rho(x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)}\right) \approx \frac{e^{\alpha^* + \beta^* x}}{1 + e^{\alpha^* + \beta^* x}}$$
is declared target in much biostatistical work (PA)
Correct calculation for a random stratum in SPP:
$$\pi(x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)} \,\Big|\, x \in \mathrm{SPP}\right) = \frac{E(\lambda_1(x))}{E(\lambda_\cdot(x))} = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$
Conditional mean $\pi(x) = E(Y_i \mid X_i = x)$ versus stratum mean $\rho(x) = E(Y_i \mid i : X_i = x)$

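27 Consequences of ambiguous notation
Sample $(Y_1, X_1), (Y_2, X_2), \ldots$ observed sequentially
$$\pi(x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)} \,\Big|\, x \in \mathrm{SPP}\right) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} = E(Y_i \mid X_i = x)$$
$$\rho(x) = E\left(\frac{\lambda_1(x)}{\lambda_\cdot(x)}\right) \approx \frac{e^{\alpha^* + \beta^* x}}{1 + e^{\alpha^* + \beta^* x}} = E(Y_i \mid i : X_i = x)$$
($\rho(x)$ computed by logistic-normal integral)
Conventional PA estimating function $Y_i - \rho(x_i)$ is such that
$$E(Y_1 - \rho(x) \mid X_1 = x) = \pi(x) - \rho(x) \ne 0$$
and same for $Y_2, \ldots$.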

28 Estimating functions done correctly
Mean intensity for class $r$: $m_r(x) = E(\lambda_r(x))$
$\pi_r(x) = m_r(x)/m_\cdot(x)$; $\rho_r(x) = E(\lambda_r(x)/\lambda_\cdot(x))$
$E(Y_i) = \rho(x)$ for each $i$ in $U_x = \{u : X_u = x\}$
For autogenerated $x$, $E(Y \mid x \in \mathrm{SPP}) = \pi(x) \ne \rho(x)$
$T(x, y) = \sum_{x \in \mathrm{SPP}} h(x)(Y(x) - \pi(x))$ has zero mean for auto-generated configurations $x$.
Note: $E(T \mid x) \ne 0$; average is also over configurations
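The zero-mean property can be checked exactly on a finite set of sites, since the expected number of class-$r$ events at site $x$ is $m_r(x)$. The sketch below assumes three sites, a hypothetical two-scenario distribution for the intensities, and $h \equiv 1$; the binary case $r \in \{0, 1\}$ is used.

```python
import numpy as np

# Hypothetical scenario distribution over intensities at three sites (rows = scenarios).
probs = np.array([0.5, 0.5])
lam0 = np.array([[1.0, 1.0, 1.0],
                 [1.0, 2.0, 4.0]])
lam1 = np.array([[1.0, 2.0, 3.0],
                 [9.0, 6.0, 2.0]])
h = np.array([1.0, 1.0, 1.0])             # assumed weight function h(x)

m0 = probs @ lam0                         # m_0(x) = E lambda_0(x)
m1 = probs @ lam1                         # m_1(x) = E lambda_1(x)
pi = m1 / (m0 + m1)                       # pi(x) = m_1(x) / m_.(x)
rho = probs @ (lam1 / (lam0 + lam1))      # rho(x) = E(lambda_1(x) / lambda_.(x))

def expected_T(center):
    """E of the sum over events of h(x) (Y(x) - center(x)).

    Class-1 events contribute (1 - center) and class-0 events (-center);
    their expected counts at site x are m_1(x) and m_0(x).
    """
    return float(np.sum(h * (m1 * (1.0 - center) - m0 * center)))

mean_with_pi = expected_T(pi)     # zero: T is unbiased
mean_with_rho = expected_T(rho)   # non-zero: centring at rho is biased
```

Centring at $\pi$ gives expectation zero by construction, while centring at $\rho$ leaves a non-zero mean for the same autogenerated events.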

29 Explanation of unbiasedness
$z = \{(x_1, y_1), \ldots\}$ configuration generated in $(0, t)$.
$x = \{x_1, \ldots\}$ marginal configuration (SPP)
$z$ is a random measure with mean $t\, m_r(x)\, \nu(dx)$ at $(r, x)$
$x$ is marginal random measure with mean $t\, m_\cdot(x)\, \nu(dx)$ at $x$
$\pi_r(x) = m_r(x)/m_\cdot(x)$
Hence $E(z(r, dx)) = \pi_r(x) E(x(dx))$ for all $(r, x)$, which implies that
$$T(x, y) = \sum_{x \in \mathrm{SPP}} h(x)(Y(x) - \pi(x)) = \int_{\mathcal{X}} h(x) \left( z(r, dx) - \pi_r(x)\, x(dx) \right)$$
has zero expectation. But $E(T \mid x) \ne 0$.

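30 Variance calculation: binary case
$(y, x)$ generated by point process; $T(x, y) = \sum_{x \in \mathrm{SPP}} h(x)(Y(x) - \pi(x))$
$E(T(x, y)) = 0$; $E(T \mid x) \ne 0$
$$\mathrm{var}(T) = \int_{\mathcal{X}} h^2(x)\pi(x)(1 - \pi(x))\, m_\cdot(x)\, dx + \int_{\mathcal{X}^2} h(x)h(x')\, V(x, x')\, m_{\cdot\cdot}(x, x')\, dx\, dx' + \int_{\mathcal{X}^2} h(x)h(x')\, \Delta^2(x, x')\, m_{\cdot\cdot}(x, x')\, dx\, dx'$$
$V$: spatial or within-cluster correlation; $\Delta$: interference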

31 What is interference?
Physical/biological interference: distribution of $Y(u)$ depends on $x(u')$
Sampling interference for autogenerated units
$m_r(x) = E(\lambda_r(x))$; $m_{rs}(x, x') = E(\lambda_r(x)\lambda_s(x'))$
Univariate distributions: $\pi_r(x) = m_r(x)/m_\cdot(x) = \mathrm{pr}(Y(x) = r \mid x \in \mathrm{SPP})$
Bivariate distributions: $\pi_{rs}(x, x') = m_{rs}(x, x')/m_{\cdot\cdot}(x, x')$
$\pi_{rs}(x, x') = \mathrm{pr}(Y(x) = r, Y(x') = s \mid x, x' \in \mathrm{SPP})$
Hence $\pi_{r\cdot}(x, x') = \mathrm{pr}(Y(x) = r \mid x, x' \in \mathrm{SPP})$
$\Delta_r(x, x') = \pi_{r\cdot}(x, x') - \pi_r(x)$
No second-order sampling interference if $\Delta_r(x, x') = 0$

32 Inference: Conventional Gaussian model
Model: $p_x(A) = N_n(X\beta,\ \Sigma_x = \sigma_0^2 I_n + K[x])(A)$
Rationalization: $Y(i) = x_i'\beta + \epsilon_i + \eta(x(i))$
Stratum average: $\bar{Y}(U_x) = \mathrm{ave}\{Y_i \mid i : x_i = x\} = x'\beta + \eta(x)$
Conditional distribution of $Y_u$ for $u \in U_x$ given observation $y, X$:
$$Y_u \mid \text{data} \sim N\left(x'\beta + k'\Sigma_x^{-1}(y - X\beta),\ \Sigma_{uu} - k'\Sigma_x^{-1}k\right)$$
$$\bar{Y}(U_x) \mid \text{data} \sim N\left(x'\beta + k'\Sigma_x^{-1}(y - X\beta),\ K(x, x) - k'\Sigma_x^{-1}k\right)$$
$k_i = K(x, x_i)$ (such as $e^{-|x - x_i|}$ or $|x - x_i|^3$)
Stratum average is a random variable, not a parameter
Estimate is a distribution (not a function of sufficient statistic)
Likewise for GLMMs
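The conditional distribution of the stratum average can be computed directly from the formula above. This is a minimal sketch assuming a scalar covariate with design row $(1, x)$, the kernel $K(x, x') = e^{-|x - x'|}$, and known $\beta$ and $\sigma_0^2$; the data values and function names are made up for illustration.

```python
import numpy as np

def exp_kernel(a, b):
    """K(x, x') = exp(-|x - x'|) for 1-D inputs."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.exp(-np.abs(a[:, None] - b[None, :]))

def stratum_average_distribution(x_new, x_obs, y, beta, sigma0_sq):
    """Mean and variance of Ybar(U_x) given the data, treating beta as known (an assumption)."""
    x_obs = np.asarray(x_obs, dtype=float)
    X = np.column_stack([np.ones_like(x_obs), x_obs])               # design rows (1, x_i)
    Sigma = sigma0_sq * np.eye(len(x_obs)) + exp_kernel(x_obs, x_obs)
    k = exp_kernel([x_new], x_obs).ravel()                          # k_i = K(x, x_i)
    resid = np.asarray(y, dtype=float) - X @ beta
    mean = np.array([1.0, x_new]) @ beta + k @ np.linalg.solve(Sigma, resid)
    var = 1.0 - k @ np.linalg.solve(Sigma, k)                       # K(x, x) = 1 for this kernel
    return float(mean), float(var)

# Assumed data for illustration.
x_obs = np.array([0.0, 1.0, 2.0, 3.5])
beta = np.array([1.0, 0.5])
y = np.array([1.3, 1.2, 2.4, 2.6])
pred_mean, pred_var = stratum_average_distribution(1.5, x_obs, y, beta, sigma0_sq=0.25)
```

The output is a mean and a variance rather than a point estimate, in line with the remark that the estimate of a stratum average is a distribution.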

33 Inference and prediction for the PP model
For a sequential sample:
Observation $(x, y) \equiv (x^{(0)}, \ldots, x^{(k-1)})$
Product density $m_r(x^{(r)}) = E\left(\prod_{x \in x^{(r)}} \lambda_r(x)\right)$ for class $r$
Conditional distribution as a random labelled partition of $x$:
$$p(y \mid x) \propto m_0(x^{(0)}) \cdots m_{k-1}(x^{(k-1)})$$
For a subsequent autogenerated event:
$$p(Y(x') = r \mid \text{data}, x' \in \mathrm{SPP}) \propto m_r(x^{(r)}, x')/m_r(x^{(r)})$$
Use likelihood or estimating function to estimate parameters
Use conditional distribution for inference/prediction

34 Brief summary of conclusions
(i) Reasonable case for fixed-population model in certain areas: laboratory work; field trials; veterinary trials; ...
(ii) Good case for autogenerated units in other areas: clinical trials; marketing; crime; animal behaviour
(iii) The choice matters in random-effects models
(iv) $\pi(x) = E(Y_i \mid X_i = x)$ versus $\rho(x) = E(Y_i \mid i : X_i = x)$: conditional distribution versus stratum distribution; attenuation or non-attenuation
(v) What is modelled and estimated by PA? It claims to estimate $\rho(x)$ but actually estimates $\pi(x)$
(vi) What does the GLMM likelihood estimate? Difficult to say; probably neither

Regression models, selection effects and sampling bias Peter McCullagh Department of Statistics University of Chicago Hotelling Lecture I UNC Chapel Hill Dec 1 2008 www.stat.uchicago.edu/~pmcc/reports/bias.pdf