Assessing small sample bias in coordinate-based meta-analyses for fMRI
1 Assessing small sample bias in coordinate-based meta-analyses for fMRI. F. Acar, R. Seurinck & B. Moerkerke. IBS Channel Network Conference, Hasselt, 25 April 2017.
2 What is fMRI?
3 Outline: fMRI, Meta-analysis, Small sample bias, Discussion.
4 What is fMRI: the brain is divided into voxels (more than 100,000), and in each voxel the BOLD response is measured over time. (Figure: brain slices illustrating the voxel grid.)
7 Thresholding: in which voxels is activation larger than can be expected by chance? This requires correcting for the multiple testing problem. Different methods: uncorrected threshold, False Discovery Rate (FDR), Random Field Theory (FWE). (Figure: example of thresholded activation results.)
9 A problem of power and reproducibility. Small sample sizes: median of n = 15 (Carp, 2012); this causes more false positives (FPs) and false negatives (FNs) (Button et al., 2013); experiments are very expensive. Statistical tests in over 100,000 voxels: a multiple testing problem with an explosion of FPs. Multiple comparisons corrections (FDR, FWER, RFT, ...) increase specificity but dramatically decrease sensitivity. Studies with small sample sizes tend to employ more lenient thresholds => small sample bias.
12 Why meta-analysis? Low power? Increase N! Would it not be awesome if we could re-use existing research? Yes: yearly more than 5000 publications use fMRI. Meta-analysis is a statistical tool that combines the results of multiple studies; the aim is to derive pooled estimates that approach the truth in the population. (Figure: yearly count of publications with keyword 'fMRI' in Web of Science.)
13 Meta-analysis. Classic meta-analysis: originally a univariate approach with a focus on effect sizes, combined as a weighted average. Here: meta-analysis of fMRI studies.
15 Coordinate-based meta-analysis (ALE). The ALE value in a voxel combines the modeled activation (MA) maps of the k studies as a union:
ALE = 1 - prod_{i=1}^{k} (1 - MA_i)
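The union formula above is straightforward to compute. A minimal Python sketch, for illustration only (this is not the GingerALE implementation, and the MA values are hypothetical):

```python
import numpy as np

def ale(ma_values):
    """ALE score at one voxel: probability that at least one study
    activates there, treating per-study MA values as independent."""
    ma = np.asarray(ma_values, dtype=float)
    return 1.0 - np.prod(1.0 - ma)

# Each entry is one study's modeled-activation (MA) value at this voxel.
score = ale([0.2, 0.5, 0.1])   # 1 - 0.8 * 0.5 * 0.9 = 0.64
```

In a full analysis this is evaluated at every voxel, and the resulting ALE map is then thresholded against a null distribution.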
17 Coordinate-based meta-analysis. (Figure: reported peak coordinates of the individual studies plotted in brain space.)
19 Coordinate-based meta-analysis. For each resulting cluster we record n (participants per study) and the number of contributing peaks. (Figure: clusters annotated with per-study sample sizes and numbers of contributing peaks.)
22 Coordinate-based meta-analysis. Cluster 2: slope = 0.02, p = 0.4484. (Figure: cluster contribution (no/yes) plotted against individual study sample size.)
24 Coordinate-based meta-analysis. Cluster 3: slope = 0.01, p < 0.001. (Figure: cluster contribution (no/yes) plotted against individual study sample size.)
25 Regression. We plot individual study sample size against cluster contribution (no/yes). Possible scenarios: no publication bias, no effect, publication bias. (Figure: three example contribution patterns, one per scenario.)
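The slides summarize each cluster with a slope and p-value for the contribution trend. As an illustration of the idea only, here is a minimal Python sketch that fits a simple linear trend of cluster contribution on sample size; the data are hypothetical and the actual analysis may use a different regression model:

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical data: per-study sample size and whether the study
# contributes a peak to the cluster (1 = yes, 0 = no).
n = np.array([12, 14, 15, 16, 18, 20, 25, 30, 40, 60])
contributes = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

fit = linregress(n, contributes)
# A clearly positive slope is what one expects for a genuine effect,
# since larger studies have more power to detect it; small studies
# contributing despite low power is a warning sign of bias.
```

Here `fit.slope` and `fit.pvalue` play the role of the per-cluster slope and p-value reported on the slides.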
28 Effect of lenient thresholding: what is the effect of small sample bias on the regression?
29 Effect of lenient thresholding. Simulated 500 meta-analyses with 43 studies: approx. 39 small studies (n < 31) and 4 large studies; 1 target area with activation. Lenient (p < 0.001, uncorrected) or non-lenient (FDR whole-brain corrected, q < 0.05) threshold. Two scenarios: (1) no lenient thresholding; (2) a lenient threshold for the small studies in which no activation was found with the FDR threshold. ALE meta-analysis with statistically significant peaks.
32 Effect of lenient thresholding. Select 1 t-map from a meta-analysis (NeuroVault). Compute the average effect size in a region of interest. Compute power in the region of interest for different sample sizes: with standard thresholding (FDR, q < 0.01) and with lenient thresholding (n > 30: FDR, q < 0.01; n <= 30: uncorrected, p < 0.05). Simulate cluster contribution based on power (x 100); this depends on sample size, effect size and thresholding method. Plot cluster contribution with and without lenient thresholding.
36 Effect of lenient thresholding. (Figure: cluster contribution (no/yes) against individual study sample size; standard thresholding, slope = 6.38; lenient thresholding, slope = 4.95.)
37 Effect of lenient thresholding. Goal: assess small sample bias with as little information as possible. Remarks: little attention is paid to the validity of activated clusters (raise awareness), in particular the number of contributing peaks, the sample sizes of the contributing studies, and the robustness of activated clusters.
39 Discussion: FSN (fail-safe N).
40 Discussion: FSN. The number of null studies that can be added depends on the thresholding method, the sample size, and the number of peaks. We developed a tool to generate null studies based on the parameters of the meta-analysis.
42 Thank you!
43 Small Sample Inference for the Probabilistic Index Model: A Flexible Class of Rank Tests. Gustavo Guimarães de Castro Amorim (1), joint work with Olivier Thas (1,2), Karel Vermeulen (1), Stijn Vansteelandt (3) and Jan De Neve (4). (1) Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Belgium; (2) National Institute for Applied Statistics Research Australia (NIASRA), University of Wollongong, Wollongong, Australia; (3) Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium; (4) Department of Data Analysis, Ghent University, Belgium. gustavoguimaraesdecastroamorim@ugent.be. Small Sample Inference for the Probabilistic Index Model, 1 / 21.
44 Probabilistic Index Model (PIM). Models the conditional probability that the outcome of one randomly chosen subject exceeds the outcome of another independently chosen subject, given their respective covariates [Thas et al., 2012, JRSS-B]:
P(Y < Y* | X, X*) = g^{-1}(Z^T beta)   (the probabilistic index)
where beta is the p-dimensional parameter of interest, Z is a function of (X, X*), and g(.) is a link function (e.g., logit or probit).
47 Probabilistic Index Model. PIMs: a semiparametric regression model, available on CRAN:

> library(pim)
> pim(formula, data,
+     link = c("logit", "probit", "identity"),
+     model = c("difference", "marginal", "regular", "customized"), ...)

Can be used to generate classical (and new) rank tests, supplementing them with interpretable effect sizes.
48 Relation to rank tests. Example: the two-sample problem. Let Y^(k), with Y^(k) ~ F_k, be the outcome in the k-th group (k = 1, 2) and define a dummy variable X^(k) = k - 1. Take the identity link, g^{-1}(Z beta) = (X^(2) - X^(1)) beta, so that P(Y^(1) < Y^(2) | X^(1) = 0, X^(2) = 1) = beta.
49 Relation to rank tests. Rejecting H0: beta = 1/2 in favor of H1: beta != 1/2, i.e. H1: P(Y^(1) < Y^(2) | X^(1) = 0, X^(2) = 1) != 1/2: the WMW test procedure is embedded in PIM modelling (just like the t-test procedure is embedded in linear regression in a two-sample design).
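The probabilistic index behind the WMW connection can be estimated directly as the proportion of pairwise comparisons won (ties counted as 1/2), which equals the WMW U statistic divided by n1 * n2. A small Python sketch of this estimator (an illustration, not the pim package itself):

```python
import numpy as np

def probabilistic_index(y1, y2):
    """Empirical P(Y1 < Y2) + 0.5 * P(Y1 = Y2) over all pairs:
    the Wilcoxon-Mann-Whitney U statistic divided by n1 * n2."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    less = (y1[:, None] < y2[None, :]).sum()
    ties = (y1[:, None] == y2[None, :]).sum()
    return (less + 0.5 * ties) / (y1.size * y2.size)

pi_hat = probabilistic_index([1, 2, 3], [2, 4, 5])
```

Under H0 of no group difference the index is 1/2, which is exactly the null value of the WMW test above.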
50 Relation to rank tests. Similarly, PIM generates the Kruskal-Wallis rank test, the Friedman rank test, the Mack-Skillings test, and many others. In addition, PIM generates more flexible rank tests which allow for covariate adjustment (more in De Neve and Thas [2015, JASA]).
51 However... we rely on asymptotic normality: beta-hat is asymptotically normal (see Vermeulen et al. [2017, submitted]); see also [Little, 2006, The American Statistician].
53 Small sample inference. Example: we generate data from the normal linear model Y = X alpha + epsilon, with X ~ N(0, 1); fit a correctly specified probit-PIM for beta; and compute standard Wald confidence intervals. We are interested in the empirical coverage (over 1,000 Monte Carlo simulations). (Figure: distribution of beta-hat for n = 15.)
55 Small sample inference. (Figure: empirical coverage of the standard Wald interval based on beta-hat, across sample sizes.)
56 Small sample inference. Again, we generate data from the normal linear model Y = X^T alpha + epsilon, with X ~ N(0, I) and alpha^T = (0, ..., 0) (of length p = 2, ..., 5); fit a correctly specified probit-PIM for beta; and compute standard Wald confidence regions. We are interested in the empirical coverage (over 1,000 Monte Carlo simulations).
57 Small sample inference. (Figure: empirical coverage of the standard Wald confidence region based on beta-hat, across sample sizes and dimensions.)
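The undercoverage mechanism can be seen in a far simpler setting than the probit-PIM simulation on the slides: a 95% Wald interval for a normal mean at n = 15 already covers noticeably less than 95%, because the z critical value ignores the estimation of the standard error. A Python sketch of this analogy (not the slides' simulation):

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps, mu = 15, 2000, 0.0
covered = 0
for _ in range(reps):
    x = rng.normal(mu, 1.0, size=n)
    se = x.std(ddof=1) / np.sqrt(n)
    lo, hi = x.mean() - 1.96 * se, x.mean() + 1.96 * se
    covered += (lo <= mu <= hi)
coverage = covered / reps   # typically around 0.93, below the nominal 0.95
```

The same mechanism, amplified by the pairwise structure of the PIM estimating equations, drives the undercoverage in the figures above.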
58 Small sample inference. We need better methods for small sample inference. Alternatives: (a) bootstrapping [Jiang and Kalbfleisch, 2012, Sankhya B]: beta is estimated by solving a set of estimating equations, and since the PIM models all pairwise comparisons, the number of estimating functions is inflated and bootstrapping is computationally intensive; (b) empirical likelihood [Owen, 1990, Annals of Statistics].
61 Empirical likelihood. For the PIM,
L(beta) = max_w { prod_{i=1}^n w_i : w_i > 0, i = 1, ..., n; sum_{i=1}^n w_i = 1; sum_{i,j=1}^n w_i w_j U(X_i, X_j; beta) = 0 },
where U(.) is the PIM estimating function. Issues: (1) non-linear in the weights w_i; (2) the restriction might not have a solution; (3) computationally intensive [Chen et al., 2008, JCGS].
62 Building on empirical likelihood methods, we proposed the bias-reduced adjusted jackknife empirical likelihood method. Jackknife empirical likelihood (deals with #1, non-linearity in the weights w_i) [Jing et al., 2009, JASA]: (1) apply the jackknife method to U(X, X*; beta); (2) compute jackknife pseudo-values; (3) apply the usual empirical likelihood to them. Adjusted empirical likelihood (deals with #2, the restriction might not have a solution) [Chen et al., 2008, JCGS]. Bias reduction: minimizes the second-order bias of beta-hat.
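The jackknife step can be sketched generically: pseudo-values V_i = n T(x) - (n - 1) T(x with the i-th point removed) linearize a nonlinear estimator so that ordinary empirical likelihood applies. A minimal Python illustration (for the sample mean the pseudo-values reduce to the observations themselves, a handy sanity check):

```python
import numpy as np

def jackknife_pseudo_values(x, estimator):
    """V_i = n * T(x) - (n - 1) * T(x without the i-th observation)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    full = estimator(x)
    return np.array([n * full - (n - 1) * estimator(np.delete(x, i))
                     for i in range(n)])

x = np.array([1.0, 4.0, 2.0, 8.0])
pv_mean = jackknife_pseudo_values(x, np.mean)   # equals x itself
```

In the PIM setting the estimator is the solution of the pairwise estimating equations, so the pseudo-values are more expensive to compute, but the construction is the same.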
67 Theorem. Let beta_0 be the true value of beta. Under regularity conditions,
-2 log{R(beta_0)} = -2 log[ L(beta_0) / (n+1)^{-(n+1)} ] -> chi^2_p
in distribution as n -> infinity. If p > 1, replace chi^2_p by F_{n,p}, where F_{n,p} ~ (n - 1)p / (n - p) * F(p, n - p) [Owen, 1990, Annals of Statistics].
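The F calibration replaces the chi-square limit with a scaled F critical value, which is larger in small samples (wider, better-covering regions) and agrees with the chi-square threshold as n grows. A small Python comparison, assuming the scaled form (n - 1)p/(n - p) * F(p, n - p) stated above:

```python
from scipy.stats import chi2, f

def el_critical_values(p, n, alpha=0.05):
    """Critical values for the -2 log EL ratio statistic:
    the chi-square limit vs. the small-sample F calibration."""
    chi2_crit = chi2.ppf(1 - alpha, df=p)
    f_crit = (n - 1) * p / (n - p) * f.ppf(1 - alpha, p, n - p)
    return chi2_crit, f_crit

c_small, f_small = el_critical_values(p=3, n=20)
# f_small exceeds c_small, so the F-calibrated region is wider at n = 20.
```

For very large n the two thresholds coincide, so the calibration costs nothing asymptotically.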
69 Are there any improvements? (Comic credit: xkcd.com.)
70 Results. (Figure: empirical coverage of the standard Wald interval based on beta-hat and of the bias-reduced adjusted jackknife empirical likelihood.)
72 Results. (Figure: empirical coverage of the standard Wald interval based on beta-hat, of the bias-reduced adjusted jackknife empirical likelihood, and of the bias-reduced adjusted jackknife empirical likelihood with the F-approximation.)
75 Conclusion. Probabilistic index models: a flexible semiparametric regression model; applicable to discrete, continuous or ordinal data; gives consistent and asymptotically normal estimates. Strongly related to rank tests: PIM can also be used to generate rank tests for complex designs, with the additional advantage of supplementing them with an effect size. Currently available on CRAN.
78 Conclusion. Inference in small samples: the bias-reduced adjusted jackknife empirical likelihood gives coverage close to the nominal value for sample sizes as small as 20, and is in the process of being added to the pim R package. PIMs can therefore be used for small sample inference.
79 Bibliography.
J. Chen, A. M. Variyath, and B. Abraham. Adjusted empirical likelihood and its properties. JCGS, 17(2), 2008.
J. De Neve and O. Thas. A regression framework for rank tests based on the probabilistic index model. JASA, 110(511), 2015.
W. Jiang and J. D. Kalbfleisch. Bootstrapping U-statistics: applications in least squares and robust regression. Sankhya B, 74(1):56-76, 2012.
B.-Y. Jing, J. Yuan, and W. Zhou. Jackknife empirical likelihood. JASA, 104(487), 2009.
R. J. Little. Calibrated Bayes: a Bayes/frequentist roadmap. The American Statistician, 60(3), 2006.
A. B. Owen. Empirical likelihood for confidence regions. Annals of Statistics, 18(1):90-120, 1990.
O. Thas, J. De Neve, L. Clement, and J.-P. Ottoy. Probabilistic index models. JRSS-B, 74(4), 2012.
K. Vermeulen, G. Amorim, J. De Neve, O. Thas, and S. Vansteelandt. Semiparametric estimation of probabilistic index models: efficiency and bias. Submitted, 2017.
80 IBS Channel Network Conference 2017 Positive-definite multivariate spectral estimation: a geometric wavelet approach Joris Chau (Boursier FRIA), Rainer von Sachs April 25, 2017 Institute of Statistics, Biostatistics and Actuarial Sciences Université catholique de Louvain, Belgium
81 Background and motivation. In estimating the covariance matrix of a (non-degenerate) complex random vector, the target is Hermitian positive definite (HPD), i.e. Sigma = Sigma* and x* Sigma x > 0 for each nonzero x in C^d. Our interest is in statistical problems where the target f(omega) is a curve of HPD matrices, i.e. for omega in [0, 1): f(omega) = f(omega)* and x* f(omega) x > 0 for each nonzero x in C^d. We focus in particular on nonparametric spectral estimation of a multivariate stationary time series, where the underlying spectral matrix is a curve of HPD matrices across frequency. Classical approach: (1) compute the periodogram matrix, a noisy (asymptotically unbiased, but inconsistent) HPSD estimator of the spectral matrix; (2) smooth the periodogram to get a consistent HPD estimate of the spectral matrix (by e.g. kernel regression, projection estimators, multitaper estimators (1), etc.). (1) Thomson, D.J. (1982). Spectrum estimation and harmonic analysis. Proc. IEEE 70.
82 Background and motivation. Typically, to guarantee positive definiteness of the estimator, one applies the same smoothing parameters to each matrix component (e.g. the same bandwidth parameter for kernel regression, or the same number of tapers for multitaper estimators). Link to Shiny-app:
83 Introduction: geometry of the space of HPD matrices P_{d x d}. A (d x d) spectral matrix f(omega) at frequency omega is HPD, i.e. f(omega) in P_{d x d}. (P_{d x d}, +, scalar multiplication) is not a vector space: e.g., possibly p1 - p2 not in P_{d x d}. Also, P_{d x d} endowed with the Euclidean distance is an incomplete metric space. However, P_{d x d} is a well-studied Riemannian manifold: it locally looks like R^{d^2} and can be equipped with a Riemannian metric (a smooth family of inner products at each p in P_{d x d}). Rahman et al. (2005) (2) develop wavelet transforms for M-valued data with tractable Exp-/Log-maps (local bijective maps between M and T_p(M)) and the notion of a midpoint between p1, p2 in M. Based on these ideas, we can also construct wavelet transforms acting only on the Riemannian manifold P_{d x d}. (2) Rahman, I.U., Drori, I., Stodden, V.C., Donoho, D.L., and Schröder, P. (2005). Multiscale representations for manifold-valued data. Multiscale Modeling & Simulation, 4(4).
84 Preliminaries and tools. (1) Distance function: a specific choice of (invariant) Riemannian metric induces the following manifold distance (3), with the notation y * x := y x y* for the matrix congruence transformation:
delta(p1, p2) = ||Log(p1^{-1/2} * p2)||_F = ( sum_{i=1}^d log^2(lambda_i(p1^{-1} p2)) )^{1/2}, for p1, p2 in P_{d x d}.
In particular, delta(p1, p2) < infinity for each p1, p2 in P_{d x d}, and singular matrices are pushed to the boundary of the metric space. (2) Geodesics: the metric space (P_{d x d}, delta) is a complete metric space, and the unique geodesic joining any two points p1, p2 in P_{d x d} is given by
gamma(p1, p2, t) = p1^{1/2} * (p1^{-1/2} * p2)^t, 0 <= t <= 1.
The midpoint Mid(p1, p2) := gamma(p1, p2, 1/2) is defined as the halfway point along the (unique) geodesic connecting p1 and p2. (3) Pennec, X., Fillard, P., Ayache, N. (2006). A Riemannian framework for tensor computing. International Journal of Computer Vision, 66(1).
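The distance and geodesic above can be implemented with Hermitian eigendecompositions. A minimal numpy sketch for real SPD matrices (an illustration only, not the pdSpecEst implementation):

```python
import numpy as np

def _pow(p, a):
    """Matrix power of a symmetric/Hermitian PD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(p)
    return (v * w**a) @ v.conj().T

def dist(p1, p2):
    """Affine-invariant Riemannian distance delta(p1, p2)."""
    s = _pow(p1, -0.5)
    w = np.linalg.eigvalsh(s @ p2 @ s)
    return np.sqrt(np.sum(np.log(w) ** 2))

def geodesic(p1, p2, t):
    """Point at time t on the unique geodesic from p1 to p2."""
    r = _pow(p1, 0.5)
    s = _pow(p1, -0.5)
    return r @ _pow(s @ p2 @ s, t) @ r

p1 = np.array([[2.0, 0.5], [0.5, 1.0]])
p2 = np.array([[1.0, 0.2], [0.2, 3.0]])
mid = geodesic(p1, p2, 0.5)   # the midpoint Mid(p1, p2)
```

Because the geodesic has constant speed, the midpoint is equidistant from both endpoints, which is the property the midpoint pyramid below relies on.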
85 Preliminaries and tools. (3) Exp-/Log-maps: the Exp-maps Exp_p : T_p(P_{d x d}) -> P_{d x d} are global diffeomorphisms from the tangent space (attached at a point p) to the manifold, via Exp_p(h) = p^{1/2} * Exp(p^{-1/2} * h), where Exp(.) denotes the ordinary matrix exponential. Similarly, the Log-maps Log_p : P_{d x d} -> T_p(P_{d x d}) are defined as the (unique) inverse exponential maps. (Figure: illustration of geodesics on the manifold and the Log-/Exp-maps relating P_{d x d} to T_{p0}(P_{d x d}).)
86 Forward wavelet transform. (Figure: midpoint pyramid of the curve (P_n(omega_k))_k, built from the finest scale M_{J,k} := P_n(omega_k) upwards via M_{j-1,k} := Mid(M_{j,2k}, M_{j,2k+1}).)
87 Forward wavelet transform. (Figure: pyramid of imputed midpoints (M~_{j,k})_{j,k}.) Compute the imputed midpoints M~_{j,2k}, M~_{j,2k+1}: Step (1) collect the 2D + 1 closest neighbors (M_{j,k+l}), -D <= l <= D, of M_{j,k}, with D >= 0. Step (2) transform each neighbor to T_{M_{j,k}}(P_{d x d}), identified with H_{d x d}, by the Log-map, and decompose it in terms of an ONB of H_{d x d} (a real vector space). Step (3) impute finer-scale real-valued coefficients by (ordinary) polynomial interpolation, and jump back to P_{d x d} using the Exp-map.
90 Forward wavelet transform. (Figure: pyramid of wavelet coefficients (D_{j,k})_{j,k}.) Compute the wavelet coefficient D_{j,k}: given the true and imputed midpoints M_{j,k}, M~_{j,k}, the wavelet coefficients are defined as a difference in the tangent space,
D_{j,k} := Log(M~_{j,k}^{-1/2} * M_{j,k}) in T_Id(P_{d x d}).
Note that ||D_{j,k}||_Id = ||D_{j,k}||_F = delta(M~_{j,k}, M_{j,k}) by definition of the distance function.
92 Forward wavelet transform. (Figure: equivalent representation of (P_n(omega_k))_k in terms of the coarsest midpoints M_{1,0}, M_{1,1} and the wavelet coefficients (D_{j,k})_{j,k}.)
93 Periodogram denoising: permutation invariance of the wavelet spectral estimator. Other nonparametric multivariate spectral estimation methods (4)(5) rely on smoothing the Cholesky decomposition of the (pre-smoothed) periodogram. The Cholesky-smoothed spectral estimator is not necessarily equivariant under a permutation of the ordering of the time series. To be precise: if pi(1, ..., d) is a permutation of the ordering of the time series, then a natural requirement is that f-hat_pi(omega) = U_pi f-hat(omega) U_pi^T, with U_pi the permutation matrix corresponding to pi(1, ..., d). The wavelet-thresholded spectral estimator satisfies this permutation invariance for any permutation pi(1, ..., d), whereas the Cholesky-based spectral estimators generally do not. (4) Dai, M., and Guo, W. (2004). Multivariate spectral analysis using Cholesky decomposition. Biometrika 91(3). (5) Rosen, O., and Stoffer, D.S. (2007). Automatic estimation of multivariate spectra via smoothing splines. Biometrika 94(2).
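The equivariance requirement is easy to state concretely: relabeling the component series should conjugate the spectral matrix by the corresponding permutation matrix. A small numpy sketch, with a hypothetical 3 x 3 matrix standing in for f-hat(omega) at one frequency:

```python
import numpy as np

def permutation_matrix(perm):
    """U_pi with (U_pi)[i, perm[i]] = 1, so that U @ f @ U.T reorders
    the rows and columns of f according to perm."""
    d = len(perm)
    u = np.zeros((d, d))
    u[np.arange(d), perm] = 1.0
    return u

# Hypothetical spectral matrix estimate at one frequency.
f_hat = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])
perm = [2, 0, 1]
u = permutation_matrix(perm)
f_perm = u @ f_hat @ u.T   # the equivariance target U_pi f U_pi^T
```

An equivariant estimator computed on the reordered series must reproduce exactly this conjugated matrix; Cholesky-based smoothing breaks this because the Cholesky factor depends on the component ordering.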
94 Concluding remarks
Link to Shiny-app:
This talk is based on the paper: Chau, J., and von Sachs, R. (2017). Positive-definite multivariate spectral estimation: a geometric wavelet approach (currently under review).
An R package pdspecest containing the tools to perform wavelet-based multivariate spectral estimation and clustering is available on CRAN. Install the latest development version via devtools::install_github("jorischau/pdspecest").
Other applications exploiting the geometric properties of the space of Hermitian positive-definite matrices as a Riemannian manifold include: extending the notion of a data depth to observations in the space of Hermitian PD matrices (available soon in the R package pdspecest), as well as time-varying multivariate spectral estimation and multivariate spectral estimation of replicated time series.
95 Thank you!
IBS Channel Network Conference 2017. Positive-definite multivariate spectral estimation: a geometric wavelet approach. Joris Chau (FRIA fellow) and Rainer von Sachs, April 25, 2017. Institute of Statistics, Biostatistics and Actuarial Sciences, Université catholique de Louvain, Belgium.