Prediction of New Observations


1 Statistics Seminar: 6th talk, ETHZ FS2010. Prediction of New Observations. Martina Albers, 12 April 2010. Papers: Welham (2004), Jiang (2007) 1

2 Content: Introduction; Prediction of Mixed Effects; Prediction of Future Observations; Principles of Prediction (the prediction process, fixed and random terms); Example: Split-Plot Design 2

3 Linear mixed model: y = Xβ + Zb + ε = Wγ + ε, where y is the n×1 vector of observations; X is the n×p matrix associating observations with the appropriate combination of fixed effects; β is the p×1 vector of fixed effects; Z is the n×q matrix associating observations with the appropriate combination of random effects; b is the q×1 vector of random effects; ε is the n×1 vector of residual errors; and W, γ are the combined design matrix and vector of effects, respectively. Introduction 3
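As a sketch of how X and Z associate observations with effects, the following builds both indicator matrices for a small made-up dataset (two treatment levels as the fixed factor, three blocks as the random factor; all names and values are hypothetical):

```python
# Sketch: building the design matrices X (fixed effects) and Z (random
# effects) for a toy dataset -- two treatments (fixed) and three blocks
# (random). All data below are made up for illustration.

treatments = ["A", "B", "A", "B", "A", "B"]   # fixed-effect factor
blocks     = [1, 1, 2, 2, 3, 3]               # random-effect factor

block_levels = sorted(set(blocks))            # [1, 2, 3]

# X: n x p indicator matrix (intercept + treatment-B dummy)
X = [[1, 1 if t == "B" else 0] for t in treatments]

# Z: n x q indicator matrix, one column per block
Z = [[1 if b == lvl else 0 for lvl in block_levels] for b in blocks]

print(X)  # each row picks out that observation's fixed-effect combination
print(Z)  # each row picks out that observation's block (random effect)
```

Each row of X and Z contains exactly one indicator pattern, which is all "associating observations with effects" amounts to.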

4 Linear mixed model: y = Xβ + Zb + ε = Wγ + ε. Assumptions: (Y | B = b) ~ N(Xβ + Zb, σ₁² I_n) and B ~ N(0, Σ_θ), where Σ_θ = cov(B) is q×q, symmetric, and is factored as Σ_θ = σ₁² Λ_θ Λ_θ'. Using B = Λ_θ U gives (Y | U = u) ~ N(Xβ + Z Λ_θ u, σ₁² I_n) with U ~ N(0, σ₁² I_q). Introduction 4

5 Linear mixed model: compute û and β̂ for given θ as the penalized least-squares solution
(û, β̂) = argmin_{u,β} || y − Z Λ_θ u − X β ||² + || u ||²,
with normal equations
[ Λ_θ' Z' Z Λ_θ + I_q   Λ_θ' Z' X ] [ û ]   [ Λ_θ' Z' y ]
[ X' Z Λ_θ              X' X      ] [ β̂ ] = [ X' y      ]
solved via the Cholesky factorization L_θ L_θ' = Λ_θ' Z' Z Λ_θ + I_q, L_θ R_ZX = Λ_θ' Z' X, R_X' R_X = X' X − R_ZX' R_ZX. Introduction 5

6 What do we mean by prediction? Prediction here means estimation of effects in the model: a prediction is a linear combination of estimated effects. The key question is what is needed, a marginal or a conditional prediction? In the example: a variety or a nitrogen prediction? Introduction 6

7 Is there a general strategy? Problem: each situation needs to be analyzed by hand! Questions that might arise: inclusion or exclusion of random model terms from the prediction? Different weighting schemes? etc. Introduction 7

8 Predictions 2 types of prediction: Prediction of mixed effects Prediction of a future observation Introduction 8

9 Linear mixed model (recap of slide 3): y = Xβ + Zb + ε = Wγ + ε, with y the n×1 vector of observations, X the n×p fixed-effects design matrix, β the p×1 vector of fixed effects, Z the n×q random-effects design matrix, b the q×1 vector of random effects, ε the n×1 vector of residual errors, and W, γ the combined design matrix and effect vector. Prediction of Mixed Effects 9

10 Prediction of Mixed Effects. Assume all parameters are known, i.e. fixed effects and variance components are known. Consider γ = x'β + z'b = x'β + ξ, where x, z are known vectors and β, b are the vectors of fixed and random effects. The best predictor of ξ (in the MSE sense) is ξ̂ = E[ξ | y] = z' E[b | y]. Prediction of Mixed Effects 10

11 Assumption: b and y are jointly normal with E[b] = 0, E[y] = Xβ, cov(b, y) = cov(b) Z', R = cov(ε), and V = cov(y) = Z cov(b) Z' + R. Then E[b | y] = cov(b) Z' V⁻¹ (y − Xβ), so ξ̂ = z' cov(b) Z' V⁻¹ (y − Xβ). The best linear predictor of γ is therefore γ̂ = x'β + z' cov(b) Z' V⁻¹ (y − Xβ). Prediction of Mixed Effects 11

12 Example: IQ test. Estimate the true IQ of a student scoring 130 in a test. Model: IQ ~ N(100, 15²), score | IQ ~ N(IQ, 5²); y = µ + b + ε, where y is the student's test score and b the realization of a random effect. Predict µ + b, the student's true (unobservable) IQ. Result: IQ̂ = 127. Prediction of Mixed Effects 12
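The IQ example can be reproduced in a few lines: with known variance components, the best predictor shrinks the observed score towards the population mean. A minimal sketch (values taken from the slide):

```python
# The IQ example as a one-line best predictor: IQ ~ N(100, 15^2),
# score | IQ ~ N(IQ, 5^2). E[mu + b | y] shrinks the observed score
# towards the population mean by the factor var(b) / (var(b) + var(eps)).

mu, tau2, sigma2 = 100.0, 15.0 ** 2, 5.0 ** 2   # prior mean, var(b), var(eps)
y = 130.0                                       # observed test score

shrinkage = tau2 / (tau2 + sigma2)              # 225 / 250 = 0.9
iq_hat = mu + shrinkage * (y - mu)              # 100 + 0.9 * 30 = 127

print(iq_hat)  # 127.0
```

This is exactly the scalar case of the BLP formula on the previous slide: cov(b) Z' V⁻¹ reduces to τ²/(τ² + σ²).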

13 Prediction of Mixed Effects when fixed effects and variance components are unknown: 20 students, 5 tests per student; computations as described before. See R file rf_prediction3.r. Prediction of Mixed Effects 13

14 Prediction of Future Observations. AIM: construct prediction intervals, i.e. intervals in which a future observation will fall with a certain probability, given what has been observed. Examples: 1. Longitudinal studies: prediction of a future observation from an individual not previously observed. There is less interest in predicting another observation from an already observed individual, as such studies often aim at applications to a larger population, e.g. drugs going to the market after clinical trials. 2. Surveys: in a 2-step survey, A) a number of families is randomly selected, B) some members of each selected family are interviewed; prediction is then wanted for a non-selected family. Prediction of Future Observations 14

15 Prediction Intervals. a. Distributional assumption: the future observation has a certain distribution, defined up to a finite number of unknown parameters; estimate the parameters to obtain a prediction interval. BUT: if the distributional assumption fails, the interval may be wrong. b. Distribution-free: normality is not assumed; assumption: the future observation is independent of the current ones. c. Markov chain Monte Carlo. d. Et cetera. Prediction of Future Observations 15
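The distribution-free approach (b) can be illustrated with the classical order-statistic interval: for an i.i.d. continuous sample of size n, the next observation falls between the sample minimum and maximum with probability (n − 1)/(n + 1), with no normality needed. A small sketch with made-up data:

```python
# A minimal distribution-free prediction interval (approach b above):
# if Y_1, ..., Y_n, Y_{n+1} are i.i.d. continuous, then by exchangeability
# Y_{n+1} lies between min and max of the sample with probability
# (n - 1) / (n + 1). Data below are made up for illustration.

y = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.8, 4.0, 5.2]
n = len(y)

lower, upper = min(y), max(y)
coverage = (n - 1) / (n + 1)   # 9/11 for n = 10

print(f"PI: [{lower}, {upper}] with coverage {coverage:.3f}")
```

The price of dropping the distributional assumption is that the coverage level is fixed by n rather than chosen freely.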

16 Confidence vs. Prediction Intervals. Confidence interval (CI): interval estimate of an unobservable population parameter; it describes the distribution of an estimate of the quantity of interest (e.g. the true population mean). Prediction interval (PI): interval estimate of a future observation; it describes the distribution of individual future points. Prediction of Future Observations 16

17 Confidence vs. Prediction Intervals (mathematically). Fixed-effects model: y = Xβ + ε, ε ~ N(0, σ² I), with β̂ = (X'X)⁻¹ X' y and var(β̂) = var(β̂ − β) = σ² (X'X)⁻¹. True value: y_i = x_i'β; observed: y_i = x_i'β + ε_i; modelled: ŷ_i = x_i'β̂; predicted: ŷ_{n+1} = x_{n+1}'β̂. Both intervals have the form ŷ ± 1.96 √var(·) (normal approximation): the CI for the modelled mean, the PI for the predicted observation. Prediction of Future Observations 17

18 Confidence vs. Prediction Intervals (mathematically). CI: var(ŷ_i) = σ² x_i' (X'X)⁻¹ x_i. PI: var(ŷ_{n+1} − y_{n+1}) = σ² (x_{n+1}' (X'X)⁻¹ x_{n+1} + 1). Confidence interval: ŷ_i ± 1.96 σ̂ √(x_i' (X'X)⁻¹ x_i). Prediction interval: ŷ_{n+1} ± 1.96 σ̂ √(x_{n+1}' (X'X)⁻¹ x_{n+1} + 1). Prediction of Future Observations 18
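These formulas can be checked numerically for a simple linear regression. The sketch below (made-up data, normal-approximation multiplier 1.96) computes both half-widths at a point x₀ and shows that the PI is always wider because of the extra "+1" term:

```python
# CI vs PI half-widths for simple linear regression y = b0 + b1*x + eps,
# using the formulas above with the normal approximation (1.96).
# Pure Python; the data are made up for illustration.

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
n = len(x)

xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx   # slope
b0 = sum(y) / n - b1 * xbar                                # intercept

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sigma2_hat = sum(r * r for r in resid) / (n - 2)           # estimate of sigma^2

x0 = 3.5  # point at which we predict
# leverage term x0'(X'X)^{-1} x0 for the design (1, x):
h = 1.0 / n + (x0 - xbar) ** 2 / sxx

ci_half = 1.96 * (sigma2_hat * h) ** 0.5           # CI for the mean response
pi_half = 1.96 * (sigma2_hat * (h + 1.0)) ** 0.5   # PI for a new observation

print(ci_half, pi_half)  # the PI is wider: the extra "+1" variance term
```

In large samples the leverage h shrinks towards zero, so the CI collapses while the PI width stays bounded below by roughly 1.96 σ.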

19 Confidence vs. Prediction Intervals How do we construct Prediction Intervals for a more general model? Mixed effects model: y = X β + Zb + ε Prediction of Future Observations 19

20 Prediction Intervals. Model: n observations, y = Xβ + Zb + ε, with b ~ N(0, Σ_θ) and ε ~ N(0, σ² I_n); here Σ_θ = σ₁² I_q and cov(ε) = σ² I_n, so V = cov(y) = Z Σ_θ Z' + cov(ε) = σ₁² Z Z' + σ² I_n. Estimate β and b: β̂ = (X' V̂⁻¹ X)⁻¹ X' V̂⁻¹ y and b̃ = σ̂₁² Z' V̂⁻¹ (y − X β̂), where V̂ = σ̂₁² Z Z' + σ̂² I_n. Prediction of Future Observations 20
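The estimates β̂ and b̃ can be computed directly for a toy random-intercept model with known variance components (two groups, two observations each; all values made up). This is only an illustrative pure-Python sketch, not a practical fitting routine; it checks the matrix formulas against the closed-form shrinkage solution for the balanced random-intercept case:

```python
# Sketch of the estimates on this slide for a random-intercept model with
# KNOWN variance components: beta_hat = (X'V^{-1}X)^{-1} X'V^{-1} y (GLS)
# and b_tilde = sigma1^2 Z'V^{-1}(y - X beta_hat). Toy data, pure Python.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Two groups, two observations each; intercept-only fixed part.
yv = [3.0, 5.0, 8.0, 10.0]
y = [[v] for v in yv]
X = [[1.0]] * 4
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
s1sq, ssq = 4.0, 1.0              # sigma1^2 (random effect), sigma^2 (residual)

ZZt = matmul(Z, transpose(Z))
V = [[s1sq * ZZt[i][j] + (ssq if i == j else 0.0) for j in range(4)]
     for i in range(4)]           # V = sigma1^2 ZZ' + sigma^2 I
Vi = inverse(V)

XtVi = matmul(transpose(X), Vi)
beta = matmul(inverse(matmul(XtVi, X)), matmul(XtVi, y))[0][0]   # GLS estimate

s1Zt = [[s1sq * z for z in row] for row in transpose(Z)]         # sigma1^2 Z'
resid = [[v - beta] for v in yv]                                 # y - X beta_hat
b = [row[0] for row in matmul(matmul(s1Zt, Vi), resid)]          # BLUPs

print(beta, b)
```

For this balanced design the GLS estimate equals the grand mean (6.5), and each BLUP equals the shrunken group deviation m σ₁²/(m σ₁² + σ²) · (ȳᵢ − β̂) = (8/9)(±2.5).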

21 Prediction Intervals. Marginal: cov(ŷ_i) = cov(x_i' β̂) = x_i' cov(β̂) x_i, and for a new observation ŷ_{n+1} − y_{n+1} = x_{n+1}'(β̂ − β) − z_{n+1}' b − ε_{n+1}, so cov(ŷ_{n+1} − y_{n+1}) = x_{n+1}' cov(β̂ − β) x_{n+1} + z_{n+1}' Σ_θ z_{n+1} + σ². Conditional (on all random effects): cov(ŷ_{n+1} − y_{n+1} | b = b̃) = σ² (x_{n+1}' (X'X)⁻¹ x_{n+1} + 1). Prediction of Future Observations 21

22 Prediction Intervals. Marginal confidence interval: ŷ_i ± 1.96 √(x_i' cov(β̂) x_i). Marginal prediction interval: ŷ_{n+1} ± 1.96 √(x_{n+1}' cov(β̂ − β) x_{n+1} + z_{n+1}' Σ_θ z_{n+1} + σ²). Conditional on all random effects, prediction interval: ŷ_{n+1} ± 1.96 σ̂ √(x_{n+1}' (X'X)⁻¹ x_{n+1} + 1). Prediction of Future Observations 22

23 Prediction Intervals There is a difference between marginal and conditional predictions! Which one is of interest? Prediction of Future Observations 23

24 The prediction process. A prediction: is a linear function combining the best linear unbiased estimators (BLUEs) of the fixed effects with the best linear unbiased predictors (BLUPs) of the random effects in the model; is typically associated with a combination of explanatory variables, either averaged over, ignoring, or at a specific value of the other explanatory variables in the model. Principles of Prediction 24

25 The prediction process. Partition the explanatory variables (e.v.) into 3 sets: 1. Classifying set: e.v. for which predicted values are required. 2. Averaging set: e.v. which are averaged over. 3. Rest: e.v. which are ignored. Principles of Prediction 25

26 The role of fixed and random effects with respect to prediction. Fixed terms: have an associated set of effects (parameters) which have to be estimated. Random terms: the associated effects are normally distributed with mean 0 and a covariance matrix that is a function of (usually) unknown parameters; they are error terms due to randomization or other structure of the data. Principles of Prediction 26

27 How to deal with random factor terms: 1. Evaluate at value(s) specified by the user. 2. Average over the set of random effects: the prediction is specific to, i.e. conditional on, the random effects observed (conditional prediction w.r.t. the term). 3. Omit the random term from the model: prediction at the population average (zero), substituting the assumed population mean for an unknown random effect (marginal prediction w.r.t. the term). Principles of Prediction 27

28 How to deal with fixed factors: there is no pre-defined population average and no natural interpretation for a prediction derived by omitting a fixed term from the fitted values. Either average over all levels present to give a conditional average, or let the user specify the value(s). Principles of Prediction 28

29 4 conceptual steps for the prediction process: 1. Choose the e.v. and their respective values for which predictive margins are required, i.e. determine the classifying set. 2. Determine which variables should be averaged over, i.e. determine the averaging set. 3. Determine the terms needed to compute parameter estimates. 4. Choose the weighting for taking means over margins (for the averaging set). Principles of Prediction 29

30 Experiment: 4 levels of nitrogen, 3 oat varieties, 6 replicates (i.e. 6 blocks), each block divided into 3 whole-plots with 4 subplots each; random allocation of nitrogen within a block. Split-plot design. Fixed effects: treatment combinations. Random effects: blocking factors (source of error variation). AIM: estimate the performance of each treatment combination within the experiment. Split-Plot Design 30

31 The Data-Set Split-Plot Design 31

32 The model components: fixed ~ constant + variety + nitrogen + variety:nitrogen; random ~ blocks + blocks:wplots; residual ~ blocks:wplots:splots. The random terms are the error terms used in the estimation of treatment effects and are not otherwise relevant to the prediction of treatment effects. Split-Plot Design 32

33 Conditional vs. Marginal prediction for the random effects Conditional prediction: gives a prediction specific to the blocks and plots used in the experiment appropriate to inference for the specific instance that occurred in the dataset Marginal prediction: the prediction corresponds to the yields expected from a similar experiment laid out using different blocks and plots appropriate when inference is required for members of the wider population Split-Plot Design 33

34 The model: y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk, where y_ijk is the yield on block i = 1,...,6, whole-plot j = 1,...,3, sub-plot k = 1,...,4; µ is the overall constant; b_i the effect of block i; v_r(ij) the effect of variety r, with r(ij) the randomization of varieties to whole-plots; w_ij the effect of whole-plot j in block i; n_s(ijk) the effect of nitrogen level s, with s(ijk) the randomization of nitrogen levels to sub-plots; (vn)_rs the interaction of variety r with nitrogen level s; and e_ijk the residual error for block i, whole-plot j, sub-plot k. Split-Plot Design 34

35 Assumption for y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk: for b = (b_1,...,b_6)', w = (w_11,...,w_63)' and e = (e_111,...,e_634)' we assume a normal distribution, (b, w, e)' ~ N(0, blockdiag(σ_b² I_6, σ_w² I_18, σ² I_72)). Split-Plot Design 35

36 y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk. ANOVA: no significant variety:nitrogen interaction; usually one would drop non-significant terms. Split-Plot Design 36

37 Prediction process for y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk. Prediction of the yield for each nitrogen level = general effect of different nitrogen applications: an unweighted average across all varieties, i.e. the prediction of the yield for nitrogen level l for an average block and whole-plot (marginal prediction w.r.t. block + whole-plot). Calculation, ignoring the random terms: µ̂ + n̂_l + (1/3) Σ_{j=1}^{3} (v̂_j + (vn)ˆ_jl). Split-Plot Design 37
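Numerically, this marginal nitrogen prediction is just an unweighted average of the fitted variety × nitrogen cell means over the three varieties. A sketch with hypothetical fitted values (the cell means below are made up, not the oats data):

```python
# Marginal nitrogen prediction as an unweighted average, over the three
# varieties, of the fitted cell means mu + n_l + v_j + vn_jl.
# The fitted values below are hypothetical, for illustration only.

fitted = {  # (variety, nitrogen level) -> fitted cell mean
    ("GoldenRain", 0.0): 80.0, ("GoldenRain", 0.2): 98.0,
    ("Marvellous", 0.0): 86.0, ("Marvellous", 0.2): 108.0,
    ("Victory",    0.0): 71.0, ("Victory",    0.2): 89.0,
}
varieties = ["GoldenRain", "Marvellous", "Victory"]
nitrogen = [0.0, 0.2]

# equal weights across varieties = the (1/3) sum on the slide
pred = {n: sum(fitted[(v, n)] for v in varieties) / len(varieties)
        for n in nitrogen}
print(pred)
```

The equal weights correspond to the "e" entries in the averaging-set table on slide 39.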

38 Prediction process for y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk. Prediction specific to the blocks and whole-plots in the experiment (conditional prediction). Calculation, including the random terms: µ̂ + (1/6) Σ_{i=1}^{6} b̃_i + n̂_l + (1/18) Σ_{i=1}^{6} Σ_{j=1}^{3} w̃_ij + (1/3) Σ_{j=1}^{3} (v̂_j + (vn)ˆ_jl). Split-Plot Design 38

39 Prediction Process

Explanatory variable | Set (M/C) | Levels (M/C) | Averaging (M/C)
Variety              | a / a     | all / all    | e / e
Nitrogen             | c / c     | all / all    | n / n
Blocks               | x / a     | -   / all    | - / e
Wplots               | x / a     | -   / all    | - / e
Splots               | x / x     | -   / -      | - / -

M: marginal prediction, C: conditional prediction; a: averaging set, c: classifying set, x: excluded; e: equal weights, n: none. Split-Plot Design 39

40 Prediction Process

Model term           | M | C
Constant             | + | +
Variety              | + | +
Nitrogen             | + | +
Variety:nitrogen     | + | +
Blocks               | x | +
Blocks:wplots        | x | +
Blocks:wplots:splots | x | x

+: used, x: ignored. Split-Plot Design 40

41 The resulting predictions: predictions for the nitrogen application levels (cwt/acre, starting at 0.0) with SE and SED, using marginal (M) or conditional (C) values of blocks + whole-plots (table of numeric values not reproduced here). Split-Plot Design 41

42 The resulting predictions: variety × nitrogen predictions for Golden Rain, Marvellous and Victory at each nitrogen application level (cwt/acre), with margins (table of numeric values not reproduced here). Split-Plot Design 42

43 The resulting predictions: the SE is smaller for conditional predictions because they are calculated conditional on the blocks and whole-plots observed! Using marginal values means no information on the block + whole-plot effects is used. Split-Plot Design 43

44 Special case: data missing Data for all replicates of Golden Rain with 0 cwt nitrogen are missing Split-Plot Design 44

45 Special case: data missing. In the variety × nitrogen prediction table, the Golden Rain / 0 cwt nitrogen cell is marked "???": the corresponding cell cannot be estimated without additional assumptions! Split-Plot Design 45

46 Special case: data missing. No significant variety main effect or interactions are present in the model, so the approach chosen has no great influence on the nitrogen predictions; consider the variety predictions instead. Possible approaches: set inestimable parameters to 0 and average over all cells; average over the cells with data present; average over the levels of nitrogen for which all varieties are present. Split-Plot Design 46
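The three approaches can be sketched on a hypothetical cell-mean table with the Golden Rain / 0 cwt cell inestimable (all values below are made up, not the experiment's results):

```python
# Three ways to form a variety margin when one cell is missing
# (here: GoldenRain at 0.0 cwt): (i) set the inestimable cell to 0 and
# average over all cells, (ii) average only over cells with data, and
# (iii) average only over nitrogen levels at which ALL varieties are
# present. Cell values are hypothetical, for illustration only.

cells = {  # variety -> {nitrogen level: cell mean, or None if inestimable}
    "GoldenRain": {0.0: None, 0.2: 98.0, 0.4: 115.0},
    "Marvellous": {0.0: 86.0, 0.2: 108.0, 0.4: 117.0},
    "Victory":    {0.0: 71.0, 0.2: 89.0,  0.4: 110.0},
}

def margin_zero(row):        # (i) inestimable parameters set to zero
    return sum(v if v is not None else 0.0 for v in row.values()) / len(row)

def margin_present(row):     # (ii) average over cells with data present
    vals = [v for v in row.values() if v is not None]
    return sum(vals) / len(vals)

def margin_common(cells, variety):   # (iii) levels present for all varieties
    common = [n for n in cells[variety]
              if all(r[n] is not None for r in cells.values())]
    return sum(cells[variety][n] for n in common) / len(common)

gr = cells["GoldenRain"]
print(margin_zero(gr), margin_present(gr), margin_common(cells, "GoldenRain"))
```

Approach (i) drags the margin down sharply, which is why the resulting variety ordering can change, as the following slides show.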

47 The resulting predictions: variety predictions with the Golden Rain + 0 cwt nitrogen plots set to missing, comparing the three approaches (inestimable parameters zero; averaging over data present; averaging over common nitrogen levels) against the complete-data margins (table of numeric values not reproduced here). Split-Plot Design 47

48 The resulting predictions: in the second case (averaging over cells with data present) the variety ordering has changed; the predictions are not comparable because of the large nitrogen effect. Split-Plot Design 48

49 Comparison of the variety predictions (Golden Rain, Marvellous, Victory, with margins) for the complete data and for the missing-data case under the three approaches (inestimable parameters zero; data present; common nitrogen levels) (tables of numeric values not reproduced here). Split-Plot Design 49

50 Other special cases: data missing First entry in the data is missing (i.e. Victory, 0.0 cwt/acre, Block I) Data in Block I for all replicates with 0.0 cwt nitrogen are missing Split-Plot Design 50

51 Other special cases: data missing All Data for replicates with 0.0 cwt nitrogen are missing Split-Plot Design 51

52 Making the computations R-FILE! Split-Plot Design 52


More information

Simple Linear Regression Analysis

Simple Linear Regression Analysis LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Chapter 9. Correlation and Regression

Chapter 9. Correlation and Regression Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Instrumental Variables

Instrumental Variables Instrumental Variables Department of Economics University of Wisconsin-Madison September 27, 2016 Treatment Effects Throughout the course we will focus on the Treatment Effect Model For now take that to

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Computational methods for mixed models

Computational methods for mixed models Computational methods for mixed models Douglas Bates Department of Statistics University of Wisconsin Madison March 27, 2018 Abstract The lme4 package provides R functions to fit and analyze several different

More information

QUIZ 4 (CHAPTER 7) - SOLUTIONS MATH 119 SPRING 2013 KUNIYUKI 105 POINTS TOTAL, BUT 100 POINTS = 100%

QUIZ 4 (CHAPTER 7) - SOLUTIONS MATH 119 SPRING 2013 KUNIYUKI 105 POINTS TOTAL, BUT 100 POINTS = 100% QUIZ 4 (CHAPTER 7) - SOLUTIONS MATH 119 SPRING 013 KUNIYUKI 105 POINTS TOTAL, BUT 100 POINTS = 100% 1) We want to conduct a study to estimate the mean I.Q. of a pop singer s fans. We want to have 96% confidence

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Characterizing Forecast Uncertainty Prediction Intervals. The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ

Characterizing Forecast Uncertainty Prediction Intervals. The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ Characterizing Forecast Uncertainty Prediction Intervals The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ t + s, t. Under our assumptions the point forecasts are asymtotically unbiased

More information

Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars)

Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars) STAT:5201 Applied Statistic II Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars) School math achievement scores The data file consists of 7185 students

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing

More information

Lecture 24: Weighted and Generalized Least Squares

Lecture 24: Weighted and Generalized Least Squares Lecture 24: Weighted and Generalized Least Squares 1 Weighted Least Squares When we use ordinary least squares to estimate linear regression, we minimize the mean squared error: MSE(b) = 1 n (Y i X i β)

More information

Marcel Dettling. Applied Statistical Regression AS 2012 Week 05. ETH Zürich, October 22, Institute for Data Analysis and Process Design

Marcel Dettling. Applied Statistical Regression AS 2012 Week 05. ETH Zürich, October 22, Institute for Data Analysis and Process Design Marcel Dettling Institute for Data Analysis and Process Design Zurich University of Applied Sciences marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling ETH Zürich, October 22, 2012 1 What is Regression?

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = 0 + 1 x 1 + x +... k x k + u 6. Heteroskedasticity What is Heteroskedasticity?! Recall the assumption of homoskedasticity implied that conditional on the explanatory variables,

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Topic 14: Inference in Multiple Regression

Topic 14: Inference in Multiple Regression Topic 14: Inference in Multiple Regression Outline Review multiple linear regression Inference of regression coefficients Application to book example Inference of mean Application to book example Inference

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects Topic 5 - One-Way Random Effects Models One-way Random effects Outline Model Variance component estimation - Fall 013 Confidence intervals Topic 5 Random Effects vs Fixed Effects Consider factor with numerous

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

Econometrics. 7) Endogeneity

Econometrics. 7) Endogeneity 30C00200 Econometrics 7) Endogeneity Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Common types of endogeneity Simultaneity Omitted variables Measurement errors

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information

Multiple Linear Regression CIVL 7012/8012

Multiple Linear Regression CIVL 7012/8012 Multiple Linear Regression CIVL 7012/8012 2 Multiple Regression Analysis (MLR) Allows us to explicitly control for many factors those simultaneously affect the dependent variable This is important for

More information

Assessing Model Adequacy

Assessing Model Adequacy Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1 Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs) 36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

3. Design Experiments and Variance Analysis

3. Design Experiments and Variance Analysis 3. Design Experiments and Variance Analysis Isabel M. Rodrigues 1 / 46 3.1. Completely randomized experiment. Experimentation allows an investigator to find out what happens to the output variables when

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

MSc / PhD Course Advanced Biostatistics. dr. P. Nazarov

MSc / PhD Course Advanced Biostatistics. dr. P. Nazarov MSc / PhD Course Advanced Biostatistics dr. P. Nazarov petr.nazarov@crp-sante.lu 04-1-013 L4. Linear models edu.sablab.net/abs013 1 Outline ANOVA (L3.4) 1-factor ANOVA Multifactor ANOVA Experimental design

More information

Health utilities' affect you are reported alongside underestimates of uncertainty

Health utilities' affect you are reported alongside underestimates of uncertainty Dr. Kelvin Chan, Medical Oncologist, Associate Scientist, Odette Cancer Centre, Sunnybrook Health Sciences Centre and Dr. Eleanor Pullenayegum, Senior Scientist, Hospital for Sick Children Title: Underestimation

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information