BMI/STAT 768 : Lecture 13 Sparse Models in Images


Moo K. Chung mkchung@wisc.edu March 25

1 Why sparse models are needed?

If we are interested in quantifying the measurements in every voxel of an image simultaneously, the standard procedure is to set up a multivariate general linear model (MGLM), which generalizes the widely used univariate GLM by incorporating vector-valued responses and explanatory variables [1, 12, 33, 34, 30, 6]. Hotelling's T^2 statistic is a special case of MGLM and has mainly been used for inference on surface shapes and deformations [31, 19, 4, 13, 7].

Let J_{n×p} = (J_ij) be the measurement matrix, where J_ij is the measurement for subject i at voxel position j. The subscripts denote the dimension of the matrix. We can think of J_ij as a Jacobian determinant, a fractional anisotropy value or an fMRI activation. Assume there are in total n subjects and p voxels of interest. The measurement vector at the j-th voxel is denoted as x_j = (J_1j, ..., J_nj)'. The measurement vector for the i-th subject is denoted as y_i = (J_i1, ..., J_ip). The y_i are assumed to be identically and independently distributed over subjects. Note that J = (x_1, ..., x_p) = (y_1', ..., y_n')'. We may assume the covariance matrix of y_i to be V(y_1) = ... = V(y_n) = Σ_{p×p} = (σ_kl). With these notations, we now set up the following MGLM over all subjects and across different voxel positions:

J_{n×p} = X_{n×k} B_{k×p} + Z_{n×q} G_{q×p} + U_{n×p} Σ^{1/2}_{p×p},   (1)

where X is the matrix of contrasted explanatory variables and B is the matrix of unknown coefficients to be estimated. Nuisance covariates of no interest are in the matrix Z and the corresponding coefficients are in the matrix G. The components of the Gaussian random matrix U are independently distributed with zero mean and unit variance.

The symmetric matrix Σ^{1/2} is the square root of the covariance matrix and accounts for the spatial dependency across different voxels. In MGLM (1), we are interested in testing the null hypothesis H_0: B = 0. The parameter matrices in the model are estimated via the least squares method. The resulting multivariate test statistics are the Lawley-Hotelling trace and Roy's maximum root. When there is only one voxel, i.e. p = 1, these multivariate test statistics collapse to Hotelling's T^2 statistic [34, 6]. Note that MGLM (1) is equivalent to assuming that y_i follows a multivariate normal distribution with some mean µ and covariance Σ, i.e. y_i ~ N(µ, Σ). Then, neglecting constant terms, the log-likelihood function L of the y_i is given by

L(µ, Σ) = log det Σ^{-1} − (1/n) ∑_{i=1}^n (y_i − µ) Σ^{-1} (y_i − µ)'.

By maximizing the log-likelihood, the MLEs of µ and Σ are given by

µ̂ = ȳ = (1/n) ∑_{i=1}^n y_i,   Σ̂ = (1/n) ∑_{i=1}^n (y_i − ȳ)'(y_i − ȳ).   (2)

For notational convenience, we can center the measurements y_i by subtracting the group mean over subjects, i.e. y_i ← y_i − ȳ. Then the MLE (2) can be written in the more compact form

Σ̂ = (1/n) J'_{p×n} J_{n×p}.   (3)

However, there is a serious defect with MGLM (1) and its MLE (3): the estimated covariance matrix Σ̂ is positive definite only for n ≥ p, since J'J becomes rank deficient for n < p. In most imaging studies there are more voxels than subjects, i.e. n < p. When Σ̂ is singular, we cannot properly invert it, and the inverse is the precision matrix often needed in partial-correlation-based network analyses [22]. This is the main reason MGLM was rarely employed over the whole brain region and researchers still mostly use univariate approaches in imaging studies.
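The rank deficiency is easy to see numerically. Below is a minimal sketch, assuming a hypothetical measurement matrix J filled with random noise, showing that the compact MLE (3) is singular whenever there are fewer subjects than voxels, so its inverse, the precision matrix, does not exist.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                        # hypothetical study: more voxels than subjects (n < p)
J = rng.standard_normal((n, p))      # stand-in measurement matrix J_{n x p}
J = J - J.mean(axis=0)               # center each voxel over subjects

Sigma = J.T @ J / n                  # compact MLE (3): a p x p matrix
print(np.linalg.matrix_rank(Sigma))  # at most n - 1 = 9, far below p = 50
print(np.linalg.cond(Sigma))         # enormous condition number: Sigma is singular
```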

1.1 Why sparse network?

The majority of functional and structural connectivity studies in brain imaging follow a standard analysis framework [14, 15, 10, 35]. From 3D whole-brain images, p regions of interest (ROI) are identified and serve as the nodes of the brain network. Measurements at the ROIs are then correlated in a pairwise fashion to produce the connectivity matrix of size p × p. The connectivity matrix is thresholded to produce the adjacency matrix of zeros and ones that defines the links between nodes. The binarized adjacency matrix is then used to construct the brain network, various graph complexity measures such as degree, clustering coefficient, entropy, path length, hub centrality and modularity are defined on the graph, and statistical inference is performed on these measures. For a large number of nodes, simple thresholding of correlations produces a large number of links, which makes interpretation difficult: if individual voxels are used as nodes, for example, a p-node graph can have up to p(p − 1)/2 links. For this reason we use a sparse data recovery framework to obtain a far smaller number of significant links.
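For concreteness, here is a minimal sketch of the standard thresholding pipeline just described, using simulated ROI data; the data, the 0.5 threshold and the choice of degree as the graph measure are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((20, 8))        # hypothetical ROI measurements: 20 subjects x 8 ROIs
C = np.corrcoef(Y, rowvar=False)        # 8 x 8 connectivity (correlation) matrix

A = (np.abs(C) > 0.5).astype(int)       # threshold the correlations -> binary adjacency matrix
np.fill_diagonal(A, 0)                  # no self-loops
degree = A.sum(axis=0)                  # a simple graph measure defined on the network
print(degree)
```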

2 Graphical-LASSO

To remedy the small-n large-p problem, the likelihood is regularized with an L1-norm penalty. If we center the measurements y_i, then µ = 0, so the log-likelihood can be written as

L(Σ) = log det Σ^{-1} − (1/n) ∑_{i=1}^n y_i Σ^{-1} y_i'
     = log det Σ^{-1} − tr(Σ^{-1} S),

where S = (1/n) ∑_{i=1}^n y_i' y_i is the sample covariance matrix. We used the fact that the trace of a scalar is the scalar itself and that tr(AB) = tr(BA) for matrices A and B. We write the likelihood as a function of Σ^{-1} simply to emphasize that we are trying to estimate the inverse covariance matrix. To avoid the small-n large-p problem, we penalize the log-likelihood with the L1-norm penalty:

L(Σ) = log det Σ^{-1} − tr(Σ^{-1} S) − λ ||Σ^{-1}||_1,   (4)

where || · ||_1 is the sum of the absolute values of the elements. The penalized log-likelihood is maximized over the space of all symmetric positive definite matrices.

(4) is a convex problem and is usually solved using the graphical-lasso (GLASSO) algorithm [3, 2, 11, 18, 25]. The tuning parameter λ > 0 controls the sparsity of the off-diagonal elements of the inverse covariance matrix: by increasing λ, the estimated inverse covariance matrix becomes more sparse. GLASSO is, however, a fairly time-consuming algorithm [11, 18]; solving GLASSO for 548 nodes, for instance, takes about 6 minutes on a desktop computer.
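As an illustration of how λ controls sparsity, the sketch below uses scikit-learn's GraphicalLasso, one of several GLASSO implementations, on simulated data; its alpha argument plays the role of λ in (4), and the data and λ values are arbitrary.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)
Y = rng.standard_normal((50, 20))               # hypothetical data: 50 subjects, 20 nodes
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)        # standardize each node

for lam in [0.05, 0.2, 0.5]:                    # increasing penalty lambda
    gl = GraphicalLasso(alpha=lam).fit(Y)       # alpha corresponds to lambda in (4)
    K = gl.precision_                           # estimated inverse covariance matrix
    off = np.count_nonzero(np.triu(K, k=1))     # surviving off-diagonal entries
    print(f"lambda={lam}: {off} nonzero off-diagonal entries")
```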

If Σ̂_i(λ) is the estimated sparse covariance for group i at a given sparsity parameter λ, we are usually interested in testing the equivalence of covariance matrices between the two groups at fixed λ, i.e. H_0: Σ_1(λ) = Σ_2(λ).

2.1 Filtration in graphical-lasso

The solution to graphical-lasso has a peculiar topological structure. Let Σ̂^{-1}(λ) = (σ̂^{ij}(λ)) be the inverse covariance estimated from graphical-lasso. Let A(λ) = (a_ij(λ)) be the corresponding adjacency matrix given by

a_ij(λ) = 1 if σ̂^{ij}(λ) ≠ 0, and 0 otherwise.   (6)

The adjacency matrix A induces a graph G(λ) consisting of κ(λ) partitioned subgraphs

G(λ) = ⋃_{l=1}^{κ(λ)} G_l(λ) with G_l = {V_l(λ), A_l(λ)},

where V_l and A_l are the node and edge sets of subgraph G_l. Let S = (s_ij) be the sample covariance matrix and let B(λ) = (b_ij(λ)) be the adjacency matrix defined by

b_ij(λ) = 1 if |s_ij| > λ, and 0 otherwise.   (7)

Figure 1: Left: Adjacency matrices obtained through graphical-lasso with increasing λ values. The persistent homological structure is self-evident. Right: The adjacency matrices are clustered into a block diagonal matrix D by permutation.

The adjacency matrix B similarly induces a graph with τ(λ) disjoint subgraphs

H(λ) = ⋃_{l=1}^{τ(λ)} H_l(λ) with H_l = {W_l(λ), B_l(λ)},

where W_l and B_l are the node and edge sets of subgraph H_l. The partitioned graphs are then partially nested, in the sense that the node sets exhibit persistency.

Theorem 1 For any λ > 0, the adjacency matrices (6) and (7) induce the identical vertex partition, so that κ(λ) = τ(λ) and V_l(λ) = W_l(λ). Further, the node sets V_l and W_l form filtrations over the sparsity parameter:

V_l(λ_1) ⊃ V_l(λ_2) ⊃ V_l(λ_3) ⊃ ...   (8)
W_l(λ_1) ⊃ W_l(λ_2) ⊃ W_l(λ_3) ⊃ ...   (9)

for λ_1 ≤ λ_2 ≤ λ_3 ≤ ....

From (7), it is trivial to see that the filtration holds for W_l. The filtration for V_l is proved in [18]. The equivalence of the node sets V_l = W_l is proved in [25]. Note that the edge sets may not form a filtration. Constructing the filtration on the node sets V_l in (8) is very time consuming since we have to solve a sequence of graphical-lasso problems: for 548 nodes and 547 different filtration values, for instance, the whole filtration takes more than 54 hours on a desktop [5]. In Figure 1, we randomly simulated the data matrix X_{5×10} from the standard normal distribution. The sample covariance matrix is then fed into graphical-lasso at different filtration values. To identify the structure better, we transformed the adjacency matrix A by a permutation P such that D = PAP^{-1} is a block diagonal matrix. Theoretically only the partitioned node sets are expected to exhibit nestedness, but in this example the edge sets are nested as well.
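A small sketch of this experiment is given below: it simulates X_{5×10}, runs graphical-lasso at increasing λ, converts the precision matrices into adjacency matrices via (6), and reports the connected components, whose node partitions should nest as λ grows. It uses scikit-learn's graphical_lasso and SciPy's connected_components; the λ grid is arbitrary and fairly large penalties are used so that the tiny sample converges.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 10))                  # data matrix X_{5 x 10} as in Figure 1
S = np.corrcoef(X, rowvar=False)                  # 10 x 10 sample correlation matrix

for lam in [0.3, 0.5, 0.7, 0.9]:                  # increasing sparsity parameter
    _, K = graphical_lasso(S, alpha=lam)          # estimated inverse covariance
    A = (np.abs(K) > 1e-8).astype(int)            # adjacency matrix (6)
    np.fill_diagonal(A, 0)
    n_comp, labels = connected_components(A, directed=False)
    print(lam, n_comp, labels)                    # node partitions nest as lambda grows
```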

3 Sparse correlation network

The problem with graphical-lasso, or any similar L1-norm optimization, is that it becomes computationally expensive as the number of nodes p increases, so it is not really practical for large-scale brain networks. For large-scale brain networks, we simply recommend thresholding correlations. Here is the mathematical justification.

3.1 Correlations

Consider the measurement vector x_j on node j. If we center and rescale the measurement x_j such that ||x_j||^2 = x_j' x_j = 1, the sample correlation between nodes i and j is given by x_i' x_j. Since the data are normalized, the sample covariance matrix reduces to the sample correlation matrix. Consider the following linear regression between nodes j and k (k ≠ j):

x_j = γ_jk x_k + ε_j.   (10)

We are basically correlating the data at node j to the data at node k, and in this particular case γ_jk plays the role of the usual Pearson correlation.

The least squares estimation (LSE) of γ_jk is then given by

γ̂_jk = x_j' x_k,   (11)

which is the sample correlation: for normalized and centered data, the estimated regression coefficient is exactly the sample correlation. It can be shown that (11) minimizes the sum of least squares over all nodes,

∑_{j=1}^p ∑_{k≠j} ||x_j − γ_jk x_k||^2.   (12)

Note that we do not really care about correlating x_j with itself, since that correlation is trivially γ_jj = 1.

3.2 Sparse correlations

Let Γ = (γ_jk) be the correlation matrix. The sparse penalized version of (12) is given by

F(Γ) = (1/2) ∑_{j=1}^p ∑_{k≠j} ||x_j − γ_jk x_k||^2 + λ ∑_{j=1}^p ∑_{k≠j} |γ_jk|.   (13)

The sparse correlation is given by minimizing F(Γ).

When λ = 0, the sparse correlation is simply the sample correlation, i.e. γ̂_jk = x_j' x_k. As λ increases, the estimated correlation matrix Γ̂(λ) shrinks toward zero and becomes more sparse. This is a separable compressed sensing or LASSO-type problem. However, there is no need to optimize (13) numerically using coordinate descent or the active-set algorithms often used in compressed sensing [27, 11]: the minimization of (13) can be done analytically by the soft-thresholding method proposed below, exploiting the topological structure of the problem. This sparse regression is not orthogonal, i.e. x_i' x_j ≠ δ_ij (the Kronecker delta), so the existing soft-thresholding result for LASSO [32] is not directly applicable.

Theorem 2 For λ ≥ 0, the solution of the separable LASSO problem

γ̂_jk(λ) = arg min_{γ_jk} (1/2) ∑_{j=1}^p ∑_{k≠j} ||x_j − γ_jk x_k||^2 + λ ∑_{j=1}^p ∑_{k≠j} |γ_jk|

is given by the soft-thresholding

γ̂_jk(λ) = x_j' x_k − λ   if x_j' x_k > λ,
γ̂_jk(λ) = 0               if |x_j' x_k| ≤ λ,   (14)
γ̂_jk(λ) = x_j' x_k + λ   if x_j' x_k < −λ.

Proof. Write (13) as

F(Γ) = (1/2) ∑_{j=1}^p ∑_{k≠j} f(γ_jk),   (15)

where f(γ_jk) = ||x_j − γ_jk x_k||^2 + 2λ|γ_jk|. Since each f(γ_jk) is nonnegative and convex, F(Γ) is minimized if each component f(γ_jk) achieves its minimum, so we only need to minimize each component separately. This is what differentiates our sparse correlation formulation from standard compressed sensing, which cannot be optimized in such a component-wise fashion. f(γ_jk) can be rewritten as

f(γ_jk) = ||x_j||^2 − 2γ_jk x_j' x_k + γ_jk^2 ||x_k||^2 + 2λ|γ_jk|
        = (γ_jk − x_j' x_k)^2 + 2λ|γ_jk| + 1 − (x_j' x_k)^2,

where we used the fact that x_j' x_j = x_k' x_k = 1. For λ = 0, the minimum of f(γ_jk) is achieved at γ_jk = x_j' x_k, which is the usual LSE. For λ > 0, since f(γ_jk) is piecewise quadratic in γ_jk, the minimum is achieved when

∂f/∂γ_jk = 2γ_jk − 2 x_j' x_k ± 2λ = 0.   (16)

The sign in front of λ depends on the sign of γ_jk. Thus the sparse correlation γ̂_jk is given by soft-thresholding x_j' x_k:

γ̂_jk(λ) = x_j' x_k − λ   if x_j' x_k > λ,
γ̂_jk(λ) = 0               if |x_j' x_k| ≤ λ,   (17)
γ̂_jk(λ) = x_j' x_k + λ   if x_j' x_k < −λ.

The estimated sparse correlation (17) basically shrinks every sample correlation whose magnitude exceeds λ toward zero by the amount λ and sets the rest exactly to zero. Due to this simple expression, there is no need to optimize (13) numerically, as is often done in compressed sensing or LASSO [27, 11]. However, Theorem 2 is only applicable to separable cases; for nonseparable cases, numerical optimization is still needed.

Different choices of the sparsity parameter λ produce different solutions of the sparse model A(λ). Instead of analyzing each model separately, we can analyze the whole collection of sparse solutions over many different values of λ. This avoids the problem of having to identify a single optimal sparsity parameter, which may not be optimal in practice. The question is then how to use the collection of A(λ) in a coherent mathematical fashion. This can be addressed using persistent homology [9, 20, 21].
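The soft-thresholding estimator (17) is a one-line computation. Below is a minimal sketch of it as a NumPy function; the simulated input data and the choice λ = 0.3 are arbitrary.

```python
import numpy as np

def sparse_correlation(X, lam):
    """Sparse correlation matrix via the soft-thresholding formula (17)."""
    X = X - X.mean(axis=0)                        # center each node
    X = X / np.linalg.norm(X, axis=0)             # rescale so that x_j' x_j = 1
    R = X.T @ X                                   # sample correlations x_j' x_k
    G = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)   # soft-threshold by lambda
    np.fill_diagonal(G, 1.0)                      # gamma_jj is trivially 1
    return G

rng = np.random.default_rng(4)
print(np.round(sparse_correlation(rng.standard_normal((20, 6)), lam=0.3), 2))
```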

3.3 Filtration in sparse correlations

Using the sparse solution (17), we can construct a filtration. We basically build a graph G using the sparse correlations. Let γ̂_jk(λ) be the sparse correlation estimate and let A(λ) = (a_jk(λ)) be the adjacency matrix defined as

a_jk(λ) = 1 if γ̂_jk(λ) ≠ 0, and 0 otherwise.

This is equivalent to the adjacency matrix B(λ) = (b_jk(λ)) defined as

b_jk(λ) = 1 if |x_j' x_k| > λ, and 0 otherwise.   (18)

The adjacency matrix B is simply obtained by thresholding the sample correlations. The adjacency matrices A and B then induce an identical graph G(λ) consisting of κ(λ) partitioned subgraphs

G(λ) = ⋃_{l=1}^{κ(λ)} G_l(λ) with G_l = {V_l(λ), E_l(λ)},

where V_l and E_l are the node and edge sets respectively. Note that G_l ∩ G_m = ∅ for any l ≠ m, and no two nodes in different partitions are connected.

Figure 2: Jacobian determinants of the deformation field are measured at 548 nodes along the white matter boundary [5]. The β_0-number (number of connected components) of the filtrations on the sample correlations and covariances shows a huge group separation between normal controls and post-institutionalized (PI) children.

The node and edge sets of the whole graph are denoted as V(λ) = ⋃_{l=1}^κ V_l and E(λ) = ⋃_{l=1}^κ E_l respectively. Then we have the following theorem.

Theorem 3 The graph induced by the sparse correlation forms a filtration:

G(λ_1) ⊃ G(λ_2) ⊃ G(λ_3) ⊃ ...   (19)

for λ_1 ≤ λ_2 ≤ λ_3 ≤ .... Equivalently, the node and edge sets form filtrations as well:

V(λ_1) ⊃ V(λ_2) ⊃ V(λ_3) ⊃ ...,   E(λ_1) ⊃ E(λ_2) ⊃ E(λ_3) ⊃ ....

The proof follows easily from the definition of the adjacency matrix (18).
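Because the filtration is obtained by simple thresholding (18), quantities such as the β_0-number shown in Figure 2 can be computed directly from the sample correlations. The sketch below counts the connected components of the thresholded correlation graph over a grid of λ values using SciPy; the simulated data and the λ grid are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def beta0_curve(X, lambdas):
    """beta_0 (number of connected components) of the graph B(lambda) in (18)."""
    X = X - X.mean(axis=0)
    X = X / np.linalg.norm(X, axis=0)
    R = np.abs(X.T @ X)                          # |sample correlations|
    np.fill_diagonal(R, 0.0)
    return [connected_components((R > lam).astype(int), directed=False)[0]
            for lam in lambdas]

rng = np.random.default_rng(5)
X = rng.standard_normal((30, 15))                # hypothetical group data: 30 subjects, 15 nodes
print(beta0_curve(X, np.linspace(0.0, 1.0, 11))) # beta_0 is non-decreasing in lambda
```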

4 Partial correlation network

Let p be the number of nodes in the network. In most applications, the number of nodes is larger than the number of observations n, which gives an underdetermined system. Consider the measurement vector at the j-th node, x_j = (x_1j, ..., x_nj)', consisting of n measurements. The vectors x_j are assumed to be distributed with mean zero and covariance Σ = (σ_ij). The correlation γ_ij between two nodes i and j is given by

γ_ij = σ_ij / (σ_ii σ_jj)^{1/2}.

By thresholding the correlation, we can establish a link between two nodes. However, there is a problem with this simplistic approach: it fails to explicitly factor out the confounding effect of the other nodes. To remedy this, partial correlations can be used to factor out the dependency on the other nodes [16, 24, 17, 18, 27]. If we denote the inverse covariance matrix as Σ^{-1} = (σ^{ij}), the partial correlation between nodes i and j, factoring out the effect of all other nodes, is given by

ρ_ij = σ^{ij} / (σ^{ii} σ^{jj})^{1/2}.   (20)

Equivalently, we can compute the partial correlation via a linear model as follows. Consider a linear model correlating the measurement at node j to all other nodes:

x_j = ∑_{k≠j} β_jk x_k + ε_j.   (21)

The parameters β_jk are estimated by minimizing the sum of squared residuals of (21),

L(β) = ∑_{j=1}^p ||x_j − ∑_{k≠j} β_jk x_k||^2,   (22)

in a least squares fashion. If we denote the least squares estimators by β̂_jk, the residuals are given by

r_j = x_j − ∑_{k≠j} β̂_jk x_k.   (23)

The partial correlation is then obtained by computing the correlation between the residuals [16, 23, 27]: ρ_ij = corr(r_i, r_j).
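A minimal sketch of this residual-based computation is given below, assuming simulated data with n > p so that the node-wise least squares problems in (21)-(23) are well posed; the node pair (0, 1) is arbitrary.

```python
import numpy as np

def residual(X, j):
    """Residual r_j from regressing node j on all other nodes, as in (21)-(23)."""
    others = [k for k in range(X.shape[1]) if k != j]
    Z = X[:, others]
    beta = np.linalg.lstsq(Z, X[:, j], rcond=None)[0]   # least squares estimates
    return X[:, j] - Z @ beta

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 6))                # hypothetical data: n = 100 > p = 6
X = X - X.mean(axis=0)
rho_01 = np.corrcoef(residual(X, 0), residual(X, 1))[0, 1]
print(rho_01)                                    # partial correlation between nodes 0 and 1
```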

4.1 Sparse partial correlations

There is a serious problem with the least squares estimation framework discussed in the previous section. Since n < p, this is a significantly underdetermined system. This is also related to the covariance matrix Σ being singular, so we cannot simply invert the covariance matrix. For this, we need sparse network modeling. The minimization of (22) is exactly given by solving the normal equation

x_j = ∑_{k≠j} β_jk x_k,   (24)

which can be turned into the standard linear form y = Aβ [22].

Note that (24) can be written as

x_j = [x_1, ..., x_{j−1}, 0, x_{j+1}, ..., x_p] (β_j1, β_j2, ..., β_jp)' = X_j β_j,

where 0_{n×1} is a column vector of all zero entries. Stacking the p node-wise equations then gives

(x_1', x_2', ..., x_p')' = diag(X_1, X_2, ..., X_p) (β_1', β_2', ..., β_p')',   (25)

i.e. y_{np×1} = A_{np×p^2} β_{p^2×1}, where A is the block diagonal matrix with blocks X_1, ..., X_p and 0_{n×p} is a matrix of all zero entries. We regularize (25) by incorporating the l1 LASSO penalty J [32, 27, 22]:

J = ∑_{i,j} |β_ij|.

The sparse estimate of β_ij is then given by minimizing L + λJ. Since there is dependency between y and A, (25) is not exactly a standard compressed sensing problem [27, 22]. Intuitively, sparsity makes the linear equation (24) less underdetermined: the larger the value of λ, the sparser the underlying topological structure becomes.
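Because A in (25) is block diagonal, minimizing L + λJ decouples into p node-wise LASSO problems. The sketch below solves each block separately with scikit-learn's Lasso, ignoring any symmetry constraint that a joint formulation such as [27] would impose; the simulated data and λ are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
n, p = 30, 60                                    # underdetermined: fewer subjects than nodes
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)

lam = 0.1
B = np.zeros((p, p))                             # sparse coefficients beta_jk
for j in range(p):
    Xj = X.copy()
    Xj[:, j] = 0.0                               # design X_j: column j zeroed out, as in (24)
    B[j] = Lasso(alpha=lam, fit_intercept=False).fit(Xj, X[:, j]).coef_
print(np.count_nonzero(B), "nonzero beta_jk at lambda =", lam)
```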

Since ρ_ij = β_ij (σ^{ii}/σ^{jj})^{1/2}, the sparsity of β_ij directly corresponds to the sparsity of ρ_ij, which is the strength of the link between nodes i and j [27, 22]. Once the sparse partial correlation matrix ρ̂ is obtained, we can simply link nodes i and j if ρ̂_ij ≠ 0 and assign the weight ρ̂_ij to the edge. In this way we obtain a weighted graph.

4.2 Limitations

The sparse partial correlation framework, however, has a serious computational bottleneck. For n measurements over p nodes, we have to solve a linear system with an extremely large A matrix of size np × p^2, so the complexity of the problem grows on the order of p^3. Consequently, for a large number of nodes, the problem quickly becomes intractable on a small computer; for 1 million nodes, for example, there are on the order of 1 trillion possible pairwise relationships between nodes. One practical solution is to modify (21) so that the measurement at node i is represented more sparsely over some index set S_i,

x_i = ∑_{j∈S_i} β_ij x_j + ε_i,

making the problem substantially smaller. An alternative approach is to follow the homotopy path, which adds network links one at a time with a very limited increase in computational complexity, so there is no need to recompute β̂ from scratch [8, 28, 26]. The trajectory of the optimal LASSO solution β̂ follows a piecewise linear path as λ changes; by tracing this path, we can substantially reduce the computational burden of re-estimating β̂ when λ changes.
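The sketch below illustrates the homotopy idea on a single node-wise regression using scikit-learn's lars_path, which returns the breakpoints of the piecewise linear LASSO path in one call; the simulated design and response are arbitrary.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(8)
n, p = 40, 20
Z = rng.standard_normal((n, p))                  # hypothetical design (one block X_j of (25))
y = Z[:, :3] @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.standard_normal(n)

# One call traces the entire piecewise linear LASSO path: coefs[:, k] is the
# solution at the k-th breakpoint alphas[k], so links enter the model one by one.
alphas, _, coefs = lars_path(Z, y, method="lasso")
print(np.count_nonzero(coefs, axis=0))           # model size grows along the path
```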

References

[1] T.W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley, 2nd edition.

[2] O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9.

[3] O. Banerjee, L. El Ghaoui, A. d'Aspremont, and G. Natsoulis. Convex optimization techniques for fitting sparse Gaussian graphical models. In Proceedings of the 23rd International Conference on Machine Learning, page 96.

[4] J. Cao and K.J. Worsley. The detection of local shape changes via the geometry of Hotelling's T^2 fields. Annals of Statistics, 27, 1999.

[5] M.K. Chung, J.L. Hanson, J. Ye, R.J. Davidson, and S.D. Pollak. Persistent homology in sparse regression and its application to brain morphometry. IEEE Transactions on Medical Imaging, 34.

[6] M.K. Chung, K.J. Worsley, M.N. Brendon, K.M. Dalton, and R.J. Davidson. General multivariate linear modeling of surface shapes using SurfStat. NeuroImage, 53.

[7] M.K. Chung, K.J. Worsley, T. Paus, D.L. Cherif, C. Collins, J. Giedd, J.L. Rapoport, and A.C. Evans. A unified statistical approach to deformation-based morphometry. NeuroImage, 14.

[8] D.L. Donoho and Y. Tsaig. Fast solution of l1-norm minimization problems when the solution may be sparse. Citeseer.

[9] H. Edelsbrunner and J. Harer. Persistent homology - a survey. Contemporary Mathematics, 453.

[10] A. Fornito, A. Zalesky, and E.T. Bullmore. Network scaling effects in graph analytic studies of human resting-state fMRI data. Frontiers in Systems Neuroscience, 4:1-16, 2010.

[11] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9:432.

[12] K.J. Friston, A.P. Holmes, K.J. Worsley, J.-P. Poline, C.D. Frith, and R.S.J. Frackowiak. Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2.

[13] C. Gaser, H.-P. Volz, S. Kiebel, S. Riehemann, and H. Sauer. Detecting structural changes in whole brain based on nonlinear deformations - application to schizophrenia research. NeuroImage, 10.

[14] G. Gong, Y. He, L. Concha, C. Lebel, D.W. Gross, A.C. Evans, and C. Beaulieu. Mapping anatomical connectivity patterns of human cerebral cortex using in vivo diffusion tensor imaging tractography. Cerebral Cortex, 19.

[15] P. Hagmann, M. Kurant, X. Gigandet, P. Thiran, V.J. Wedeen, R. Meuli, and J.P. Thiran. Mapping human whole-brain structural networks with diffusion MRI. PLoS One, 2(7):e597.

[16] Y. He, Z.J. Chen, and A.C. Evans. Small-world anatomical networks in the human brain revealed by cortical thickness from MRI. Cerebral Cortex, 17.

[17] S. Huang, J. Li, L. Sun, J. Liu, T. Wu, K. Chen, A. Fleisher, E. Reiman, and J. Ye. Learning brain connectivity of Alzheimer's disease from neuroimaging data. In Advances in Neural Information Processing Systems.

[18] S. Huang, J. Li, L. Sun, J. Ye, A. Fleisher, T. Wu, K. Chen, and E. Reiman. Learning brain connectivity of Alzheimer's disease by sparse inverse covariance estimation. NeuroImage, 50.

[19] S.C. Joshi. Large Deformation Diffeomorphisms and Gaussian Random Fields for Statistical Characterization of Brain Sub-Manifolds. PhD thesis, Washington University, St. Louis.

[20] H. Lee, M.K. Chung, H. Kang, B.-N. Kim, and D.S. Lee. Computing the shape of brain networks using graph filtration and Gromov-Hausdorff metric. MICCAI, Lecture Notes in Computer Science, 6892.

[21] H. Lee, H. Kang, M.K. Chung, B.-N. Kim, and D.S. Lee. Persistent brain network homology from the perspective of dendrogram. IEEE Transactions on Medical Imaging, 31, 2012.

[22] H. Lee, D.S. Lee, H. Kang, B.-N. Kim, and M.K. Chung. Sparse brain network recovery under compressed sensing. IEEE Transactions on Medical Imaging, 30.

[23] J.P. Lerch, K. Worsley, W.P. Shaw, D.K. Greenstein, R.K. Lenroot, J. Giedd, and A.C. Evans. Mapping anatomical correlations across cerebral cortex (MACACC) using cortical thickness from MRI. NeuroImage, 31.

[24] G. Marrelec, A. Krainik, H. Duffau, M. Pélégrini-Issac, S. Lehéricy, J. Doyon, and H. Benali. Partial correlation for functional brain interactivity investigation in functional MRI. NeuroImage, 32.

[25] R. Mazumder and T. Hastie. Exact covariance thresholding into connected components for large-scale graphical LASSO. The Journal of Machine Learning Research, 13.

[26] M.R. Osborne, B. Presnell, and B.A. Turlach. A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20.

[27] J. Peng, P. Wang, N. Zhou, and J. Zhu. Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104.

[28] M.D. Plumbley. Geometry and homotopy for l1 sparse representations. Proceedings of SPARS, 5.

[29] J. Schäfer and K. Strimmer. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4:32.

[30] J.E. Taylor and K.J. Worsley. Random fields of multivariate test statistics, with applications to shape analysis. Annals of Statistics, 36:1-27.

[31] P.M. Thompson, D. MacDonald, M.S. Mega, C.J. Holmes, A.C. Evans, and A.W. Toga. Detection and mapping of abnormal brain structure with a probabilistic atlas of cortical surfaces. Journal of Computer Assisted Tomography, 21.

[32] R. Tibshirani. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B (Methodological), 58, 1996.

[33] K.J. Worsley, S. Marrett, P. Neelin, A.C. Vandal, K.J. Friston, and A.C. Evans. A unified statistical approach for determining significant signals in images of cerebral activation. Human Brain Mapping, 4:58-73.

[34] K.J. Worsley, J.E. Taylor, F. Tomaiuolo, and J. Lerch. Unified univariate and multivariate random field theory. NeuroImage, 23 (Supplement).

[35] A. Zalesky, A. Fornito, I.H. Harding, L. Cocchi, M. Yücel, C. Pantelis, and E.T. Bullmore. Whole-brain anatomical networks: Does the choice of nodes matter? NeuroImage, 50, 2010.
