Spatio-Temporal Latent Variable Models: A Potential Waste of Space and Time?

1 Spatio-Temporal Latent Variable Models: A Potential Waste of Space and Time?
Francis K.C. Hui (Australian National University), Nicole Hill (Institute of Marine and Antarctic Studies), A.H. Welsh (Australian National University). JSM 2018.
Talk outline: SO-CPR survey; spatio-temporal LVMs; estimation under misspecification; some simulations.
Some images courtesy of Google Images.

2 Take home messages. When fitting spatio-temporal LVMs, if you misspecify the model by assuming the latent variables are independent across sites, then: (1) inference on the regression coefficients remains relatively robust, particularly for Gaussian responses; (2) inference on the loadings and latent variable predictions is badly off; (3) you save time!

3 Southern Ocean Continuous Plankton Recorder Survey. SO-CPR Survey: running annually since 1991; vessels of opportunity traversing the Southern Ocean; presence-absence of around 100 zooplankton species. The goal is to identify environmental factors driving species assemblage over time.

5 Spatio-temporal LVMs. Data and model: [equations on the slide were not recovered]

7 Spatio-temporal LVMs. Likelihood: [equation on the slide was not recovered]. In community ecology, GLLVMs and their space-time extensions are gaining traction, e.g., Warton et al. (2015), > 100 citations; Ovaskainen et al. (2017), > 50 citations.
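
The slide equations are not reproduced in this transcript. As a rough guide only, a generic spatio-temporal GLLVM of the kind discussed in the literature cited above could be written as follows. All symbols here are assumed notation and may differ from the authors': i = 1, ..., N indexes site-time combinations, j = 1, ..., m indexes species, x_i are covariates, beta_j regression coefficients, lambda_j loadings, u_i the d latent variables, Sigma their joint covariance across sites and times, and Psi the collection of all parameters.

```latex
% Assumed generic formulation; not taken from the slides.
\begin{align*}
g\{E(y_{ij} \mid u_i)\} &= \beta_{0j} + x_i^\top \beta_j + u_i^\top \lambda_j,
  \qquad u_i \in \mathbb{R}^d, \\
u = (u_1^\top, \ldots, u_N^\top)^\top &\sim N(0, \Sigma), \\
L(\Psi) &= \int \prod_{i=1}^{N} \prod_{j=1}^{m}
  f(y_{ij} \mid u_i, \Psi)\, \phi(u; 0, \Sigma)\, du .
\end{align*}
```

Under this sketch the marginal likelihood is a single integral of dimension N x d, because Sigma links the latent variables across all site-time combinations.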

9 Spatio-temporal LVMs. What if we assume independence? Pro: saves time; the integral is lower-dimensional and simpler. Con: model misspecification; two species are correlated at any particular site-time combination, but not otherwise! Can we get away with assuming independence for some forms of inference?
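
To make the pro and the con concrete, here is a sketch under assumed notation (not the authors'): u_i is the d-vector of latent variables at site-time combination i, lambda_j the loadings of species j, f the conditional response density, phi_d the d-variate standard normal density, and eta_ij the linear predictor including u_i^T lambda_j.

```latex
% Working-independence likelihood and implied covariance; assumed notation.
\begin{align*}
L_{\mathrm{ind}}(\Psi) &= \prod_{i=1}^{N} \int \prod_{j=1}^{m}
  f(y_{ij} \mid u_i, \Psi)\, \phi_d(u_i)\, du_i, \\
\operatorname{Cov}(\eta_{ij}, \eta_{i'j'}) &=
  \begin{cases}
    \lambda_j^\top \lambda_{j'}, & i = i', \\
    0, & i \neq i'.
  \end{cases}
\end{align*}
```

The likelihood becomes a product of N separate d-dimensional integrals (the time saving), while the implied covariance keeps between-species correlation within a site-time combination and sets it to zero everywhere else (the misspecification).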

11 Some Sims: Design. Interested in regression coefficients, loadings, and LV predictions. T = 1, i.e., purely spatial LVMs; results are expected to carry over to T > 1 (expanding domain?). The true model is a spatial LVM with d = 3 latent variables, each spatially correlated through an exponential correlation function. Data are generated on a square grid: n = 7 x 7; 10 x 10; 14 x 14; 22 x 22 (expanding domain). We compare the true and misspecified LVMs. Maximum likelihood estimation uses the Laplace approximation via Template Model Builder (TMB; Kristensen et al., 2015). Inference uses the standard information matrix for the true LVM and the sandwich information matrix for the misspecified LVM.
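
The data-generating step of this design can be sketched in Python as below. This is a minimal illustration, not the authors' simulation code: the function name, the number of species and covariates, the range parameter, the coefficient and loading distributions, and the residual standard deviation are all assumptions, since the slides do not state them. Only the square grid, d = 3, and the exponential correlation function come from the design above.

```python
import numpy as np


def simulate_spatial_lvm(side=7, d=3, m=5, n_cov=2, range_par=2.0,
                         sigma=1.0, seed=0):
    """Simulate normal responses from a purely spatial LVM on a square grid.

    All defaults except d = 3 are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)

    # Site coordinates on a side x side grid with unit spacing.
    gx, gy = np.meshgrid(np.arange(side), np.arange(side))
    coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
    n = coords.shape[0]

    # Exponential correlation function: corr(h) = exp(-h / range_par).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.exp(-dist / range_par)

    # Each column of u is one latent variable, spatially correlated across sites.
    chol = np.linalg.cholesky(corr)
    u = chol @ rng.standard_normal((n, d))

    # Hypothetical covariates, regression coefficients and loadings.
    x = rng.standard_normal((n, n_cov))
    beta = rng.uniform(-1.0, 1.0, size=(n_cov, m))
    loadings = rng.uniform(-1.0, 1.0, size=(d, m))

    # Normal responses: linear predictor plus independent noise.
    eta = x @ beta + u @ loadings
    y = eta + sigma * rng.standard_normal((n, m))
    return {"coords": coords, "corr": corr, "u": u, "x": x,
            "beta": beta, "loadings": loadings, "y": y}
```

Calling `simulate_spatial_lvm(side=22)` gives the largest grid in the design (484 sites). Binary responses could be obtained by thresholding or by a Bernoulli draw from a link-transformed `eta`, which is left out here.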

14 Some Sims: Normal Responses (most common response type/assumption outside of ecology).
Bias of regression coefficients: not much difference between true and misspecified LVMs; bias tends to zero irrespective of coefficient value and norm of loading.
Root MSE of regression coefficients: misspecified LVMs have slightly higher RMSE; RMSE tends to zero irrespective of coefficient value and norm of loading.
Coverage probability of regression coefficients: not much difference between true and misspecified LVMs; coverage tends to the nominal level irrespective of coefficient value and norm of loading.

17 Some Sims: Normal Responses.
Bias of loadings: bias is larger for misspecified LVMs, particularly at large n and large loading values; bias for both LVMs is smaller for loadings close to zero.
Root MSE of loadings: RMSE is larger for misspecified LVMs, particularly at large n and large loading values; RMSE for both LVMs is small for loadings close to zero.
Coverage probability of loadings: misspecified LVMs suffer major undercoverage at large n and larger loading values (but are OK for truly zero loadings); both true and misspecified LVMs undercover at small n.

18 Some Sims: Normal Responses. Prediction of latent variables: misspecified LVMs have substantially higher Procrustes error; errors for both tend to zero at large n.
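
Latent variables are only identified up to rotation, so predictions are compared with the truth after aligning them. One way to compute a Procrustes error is sketched below. The slides do not say which variant was used; this version centers both matrices, allows an orthogonal transformation and a scale factor, and normalizes by the size of the true latent variables. Those choices, and the function name, are assumptions.

```python
import numpy as np


def procrustes_error(u_true, u_pred):
    """Normalized Procrustes error between two (n x d) latent variable matrices.

    Assumed variant: centering, orthogonal transformation and scaling.
    Returns a value in [0, 1]; 0 means u_pred matches u_true after alignment.
    """
    a = u_true - u_true.mean(axis=0)
    b = u_pred - u_pred.mean(axis=0)

    # Orthogonal Procrustes: the orthogonal matrix maximizing tr(R^T b^T a)
    # is W V^T, where b^T a = W S V^T is the singular value decomposition.
    w, s, vt = np.linalg.svd(b.T @ a)
    rot = w @ vt

    # Optimal scale factor for the rotated predictions.
    scale = s.sum() / np.sum(b ** 2)

    resid = a - scale * (b @ rot)
    return np.sum(resid ** 2) / np.sum(a ** 2)
```

For example, a prediction that is an exactly rotated, rescaled and shifted copy of the truth gives an error of numerically zero, while adding noise to the prediction pushes the error toward one.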

19 Some Sims: Normal Responses. Computation times: misspecified LVMs with sandwich information are much faster to fit and scale better with n.

22 Some Sims: Binary Responses (presence-absence data in ecology; least information inherent in data).
Bias of regression coefficients: not much difference between true and misspecified LVMs; bias very close to zero for small betas; evidence of an effect of norm of loadings.
Root MSE of regression coefficients: misspecified LVMs tend to have higher RMSE at larger coefficient values and norm of loadings; RMSE very close to zero (but more variable) for small betas; clear effect of norm of loadings.
Coverage probability of regression coefficients: at small n, misspecified LVMs overcover while true LVMs undercover; at large n, misspecified LVMs undercover at larger coefficient values and norm of loadings; clear effect of norm of loadings; coverage very close to the nominal level (but more variable) for small betas.

23 Some Sims: Binary Responses. Results for loadings and latent variable predictions in binary spatio-temporal LVMs are similar to the normal response case. Under misspecification: point estimation is badly biased; sandwich CIs suffer severe undercoverage; Procrustes errors for LV predictions are much higher. Computation times are again much faster under misspecification.

24 Take home messages. When fitting spatio-temporal LVMs, if you misspecify the model by assuming the latent variables are independent across sites, then: (1) inference on the regression coefficients remains relatively robust, particularly for Gaussian responses; (2) inference on the loadings and latent variable predictions is badly off; (3) you save time!

25 Discussion. There is some theory that I didn't have time to discuss, e.g., full consistency of coefficients for normal responses; zero consistency of coefficients for non-normal responses; zero consistency of loadings. If your goal is variable selection in LVMs, then misspecifying and assuming independence does not hurt you very much... how much efficiency do you lose versus how much can you get away with? Thanks for listening!

50 Southern Ocean Continuous Plankton Recorder Survey. Every site is visited once. Expanding domain? Infill? Hybrid? Neither?

51 Estimation Under Misspecification. What asymptotic framework should we work within? A pragmatic regularity condition [condition on the slide was not recovered] implies that MLEs of the true spatio-temporal LVM are consistent. The focus is not on how to obtain consistency, but on what happens under misspecification, supposing you have consistency to start with.

54 Estimation Under Misspecification.
Full consistency for coefficients when responses are normal: the misspecified fit solves an unbiased generalized least squares equation.
Zero consistency for coefficients in general: for covariates that are uninformative for all species, the misspecified LVM will consistently estimate these zeros. This result is weak: it says nothing about fully or partly informative predictors, and it requires a very strict assumption of independence between the truly informative and uninformative covariates. The same conditions are made for studying misspecification in generalized linear mixed models, e.g., Neuhaus, McCulloch, and Boylan (2013), and are analogous to the partial orthogonality condition assumed in high-dimensional variable selection, e.g., Huang, Horowitz, and Ma (2008); Fan and Song (2010).
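
The full consistency claim for normal responses can be illustrated with a small Monte Carlo check. The sketch below is a deliberately simplified, single-response version and is not the authors' code: with one response, a working-independence fit reduces to ordinary least squares, which stays unbiased for the regression coefficients even though the errors are spatially correlated. The grid size, range parameter, noise level, coefficient values and function name are all assumptions.

```python
import numpy as np


def mean_ols_estimate(side=8, n_rep=300, range_par=3.0, seed=1):
    """Average OLS estimate over replications with spatially correlated errors.

    Hypothetical single-response setting used only to illustrate unbiasedness
    of a least squares equation under a misspecified (independence) covariance.
    """
    rng = np.random.default_rng(seed)

    # Square grid of sites and exponential spatial correlation.
    gx, gy = np.meshgrid(np.arange(side), np.arange(side))
    coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    chol = np.linalg.cholesky(np.exp(-dist / range_par))

    # Fixed design: intercept plus one covariate; assumed true coefficients.
    x = np.column_stack([np.ones(n), rng.standard_normal(n)])
    beta = np.array([0.5, -1.0])

    estimates = np.empty((n_rep, 2))
    for r in range(n_rep):
        # Error = spatially correlated latent term + independent noise.
        error = chol @ rng.standard_normal(n) + 0.5 * rng.standard_normal(n)
        y = x @ beta + error
        # Working-independence fit ignores the spatial correlation entirely.
        estimates[r] = np.linalg.lstsq(x, y, rcond=None)[0]
    return beta, estimates.mean(axis=0)
```

The averaged estimates should sit close to the assumed true coefficients. The check says nothing about standard errors: the naive ones would be wrong here, which is where the sandwich information matrix mentioned in the simulation design comes in.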

55 Estimation Under Misspecification. Zero consistency for loadings: if a species is uncorrelated with anything else, then assuming independence does no harm. This is not a very realistic assumption in ecology, but is possible in other settings, e.g., social sciences.

56 Some Sims: Normal Responses.
Bias of regression coefficients: not much difference between true and misspecified LVMs; bias tends to zero irrespective of coefficient value and norm of loading.
Root MSE of regression coefficients: misspecified LVMs have slightly higher RMSE; RMSE tends to zero irrespective of coefficient value and norm of loading.
Coverage probability of regression coefficients: not much difference between true and misspecified LVMs; coverage tends to the nominal level irrespective of coefficient value and norm of loading.
CI width of regression coefficients: sandwich CIs from misspecified LVMs are wider.

57 Some Sims: Binary Responses.
Bias of regression coefficients: not much difference between true and misspecified LVMs; biases very close to zero for small betas (zero consistency for coefficients); evidence of an effect of norm of loadings.
Root MSE of regression coefficients: misspecified LVMs tend to have higher RMSE at larger coefficient values and norm of loadings; clear effect of norm of loadings; RMSE very close to zero (but more variable) for small betas (zero consistency for coefficients).
Coverage probability of regression coefficients: misspecified LVMs overcover while true LVMs undercover at small n; at large n, misspecified LVMs undercover at larger coefficient values and norm of loadings; clear effect of norm of loadings; coverage very close to the nominal level (but more variable) for small betas.
CI width of regression coefficients: sandwich CIs from misspecified LVMs are huge at small n! Differences between true and misspecified LVMs are small at larger n.

60 Some Sims: Binary Responses.
Bias of loadings: misspecified LVMs are substantially more biased at large n and large loading values (zero consistency for loadings).
Root MSE of loadings: true and misspecified LVMs perform similarly and poorly at small n, but at large n misspecified LVMs have substantially more variability; RMSE for misspecified LVMs is close to zero (but more variable) for loadings close to zero (zero consistency for loadings).
Coverage probability of loadings: misspecified LVMs suffer major undercoverage at large n and larger loading values (but are OK for truly zero loadings); true LVMs undercover at smaller n.
Results for interval width [not presented] show that sandwich CIs for misspecified LVMs are wider than standard CIs for true LVMs, sometimes considerably so.

61 Some Sims: Binary Responses. Prediction of latent variables: misspecified LVMs have substantially higher Procrustes error than true LVMs, especially at large n.

62 Some Sims: Binary Responses. Computation times: misspecified LVMs with sandwich information are much faster to fit and scale better with n.