FUSED MULTIPLE GRAPHICAL LASSO

Size: px
Start display at page:

Download "FUSED MULTIPLE GRAPHICAL LASSO"

Transcription

1 FUSED MULTIPLE GRAPHICAL LASSO SEN YANG, ZHAOSONG LU, XIAOTONG SHEN, PETER WONKA, JIEPING YE Abstract. In this paper, we consider the probem of estimating mutipe graphica modes simutaneousy using the fused asso penaty, which encourages adjacent graphs to share simiar structures. A motivating exampe is the anaysis of brain networks of Azheimer s disease using neuroimaging data. Specificay, we may wish to estimate a brain network for the norma contros (NC), a brain network for the patients with mid cognitive impairment (MCI), and a brain network for Azheimer s patients (AD). We expect the two brain networks for NC and MCI to share common structures but not to be identica to each other; simiary for the two brain networks for MCI and AD. The proposed formuation can be soved using a second-order method. Our key technica contribution is to estabish the necessary and sufficient condition for the graphs to be decomposabe. Based on this key property, a simpe screening rue is presented, which decomposes the arge graphs into sma subgraphs and aows an efficient estimation of mutipe independent (sma) subgraphs, dramaticay reducing the computationa cost. We perform experiments on both synthetic and rea data; our resuts demonstrate the effectiveness and efficiency of the proposed approach. Key words. Fused mutipe graphica asso, screening, second-order method AMS subject cassifications. 90C22, 90C25, 90C47, 65K05, 62J10 1. Introduction. Undirected graphica modes expore the reationships among a set of random variabes through their joint distribution. The estimation of undirected graphica modes has appications in many domains, such as computer vision, bioogy, and medicine [11, 17, 44]. One instance is the anaysis of gene expression data. As shown in many bioogica studies, genes tend to work in groups based on their bioogica functions, and there exist some reguatory reationships between genes [5]. Such bioogica knowedge can be represented as a graph, where nodes are the genes, and edges describe the reguatory reationships. Graphica modes provide a usefu too for modeing these reationships, and can be used to expore gene activities. One of the most widey used graphica modes is the Gaussian graphica mode (GGM), which assumes the variabes to be Gaussian distributed [2, 47]. In the framework of GGM, the probem of earning a graph is equivaent to estimating the inverse of the covariance matrix (precision matrix), since the nonzero off-diagona eements of the precision matrix represent edges in the graph [2, 47]. In recent years many research efforts have focused on estimating the precision matrix and the corresponding graphica mode (see, for exampe [2, 10, 16, 17, 22, 23, 25, 26, 29, 30, 33, 47]. Meinshausen and Bühmann [30] estimated edges for each node in the graph by fitting a asso probem [36] using the remaining variabes as predictors. Yuan and Lin [47] and Banerjee et a. [2] proposed a penaized maximum ikeihood mode using 1 reguarization to estimate the sparse precision matrix. Numerous methods have been deveoped for soving this mode. For exampe, d Aspremont et a. [8] and Lu [25, 26] studied Nesterov s smooth gradient methods [32] for soving this probem or its dua. Banerjee et a. [2] and Friedman et a. [10] proposed bock coordinate ascent methods for soving the dua probem. The atter method [10] is Schoo of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, Arizona 85287, USA (senyang@asu.edu, peter.wonka@asu.edu, jieping.ye@asu.edu). Department of Mathematics, Simon Frasor University, Burnaby, BC, V5A 156, Canada (zhaosong@sfu.ca). Schoo of Statistics, University of Minnesota, Minneapois, Minnesota 55455, USA (xshen@umn.edu). 1

2 2 FUSED MULTIPLE GRAPHICAL LASSO widey referred to as Graphica asso (GLasso). Mazumder and Hastie [29] proposed a new agorithm caed DP-GLasso, each step of which is a box-constrained QP probem. Scheinberg and Rish [35] proposed a coordinate descent method for soving this mode in a greedy approach. Yuan [48] and Scheinberg et a. [34] appied aternating direction method of mutipiers (ADMM) [4] to this probem. Li and Toh [22] and Yuan and Lin [47] proposed to sove this probem using interior point methods. Wang et a. [40], Hsieh et a. [16], Osen et a. [33], and Dinh et a. [9] studied Newton method for soving this mode. The main chaenge of estimating a sparse precision matrix for the probems with a arge number of nodes (variabes) is its intensive computation. Witten et a. [42] and Mazumder and Hastie [28] independenty derived a necessary and sufficient condition for the soution of a singe graphica asso to be bock diagona (subject to some rearrangement of variabes). This can be used as a simpe screening test to identify the associated bocks, and the origina probem can thus be decomposed into a group of smaer sized but independent probems corresponding to these bocks. When the number of bocks is arge, it can achieve massive computationa gain. However, these formuations assume that observations are independenty drawn from a singe Gaussian distribution. In many appications the observations may be drawn from mutipe Gaussian distributions; in this case, mutipe graphica modes need to be estimated. There are some recent works on the estimation of mutipe precision matrices [7, 11, 12, 13, 19, 20, 31, 49]. Guo et a. [11] proposed a method to jointy estimate mutipe graphica modes using a hierarchica penaty. However, their mode is not convex. Honorio and Samaras [13] proposed a convex formuation to estimate mutipe graphica modes using the 1, reguarizer. Hara and Washio [12] introduced a method to earn common substructures among mutipe graphica modes. Danaher et a. [7] estimated mutipe precision matrices simutaneousy using a pairwise fused penaty and grouping penaty. ADMM was used to sove the probem, but it requires computing mutipe eigen decompositions at each iteration. Mohan et a. [31] proposed to estimate mutipe precision matrices based on the assumption that the network differences are generated from node perturbations. Compared with singe graphica mode earning, earning mutipe precision matrices jointy is even more chaenging to sove. Recenty, a necessary and sufficient condition for mutipe graphs to be decomposabe was proposed in [7]. However, such necessary and sufficient condition was restricted to two graphs ony when the fused penaty is used. It is not cear whether this screening rue can be extended to the more genera case with more than two graphs, which is the case in brain network modeing. There are two types of fused penaties that can be used for estimating mutipe (more than two) graphs: (a) pairwise fused or (b) sequentia fused [37]. In this paper we set out to address the sequentia fused case first, because we work on practica appications that can be more appropriatey formuated using the sequentia formuation. Specificay, we consider the probem of estimating mutipe graphica modes by maximizing a penaized og ikeihood with 1 and sequentia fused reguarization. The 1 reguarization yieds a sparse soution, and the fused reguarization encourages adjacent graphs to be simiar. The graphs considered in this paper have a natura order, which is common in many appications. A motivating exampe is the modeing of brain networks for Azheimer s disease using neuroimaging data such as Positron emission tomography (PET). In this case, we want to estimate graphica modes for three groups: norma contros (NC), patients of mid cognitive impairment (MCI), and Azheimer s patients (AD). These networks are expected to share some common con-

3 S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 3 nections, but they are not identica. Furthermore, the networks are expected to evove over time, in the order of disease progression from NC to MCI to AD. Estimating the graphica modes separatey fais to expoit the common structures among them. It is thus desirabe to jointy estimate the three networks (graphs). Our key technica contribution is to estabish the necessary and sufficient condition for the soution of the fused mutipe graphica asso (FMGL) to be bock diagona. The duaity theory and severa other toos in inear programming are used to drive the necessary and sufficient condition. Based on this crucia property of FMGL, we deveop a screening rue which enabes the efficient estimation of arge mutipe precision matrices for FMGL. The proposed screening rue can be combined with any agorithms to reduce computationa cost. We empoy a second-order method [16, 21, 38] to sove the fused mutipe graphica asso, where each step is soved by the spectra projected gradient method [27, 43]. In addition, we propose a shrinking scheme to identify the variabes to be updated in each step of the second-order method, which reduces the computation cost of each step. We conduct experiments on both synthetic and rea data; our resuts demonstrate the effectiveness and efficiency of the proposed approach Notation. In this paper, R stands for the set of a rea numbers, R n denotes the n-dimensiona Eucidean space, and the set of a m n matrices with rea entries is denoted by R m n. A matrices are presented in bod format. The space of symmetric matrices is denoted by S n. If X S n is positive semidefinite (resp. definite), we write X 0 (resp. X 0). Aso, we write X Y to mean X Y 0. The cone of positive semidefinite matrices in S n is denoted by S+. n Given matrices X and Y in R m n, the standard inner product is defined by X, Y := tr(xy T ), where tr( ) denotes the trace of a matrix. X Y and X Y means the Hadamard and Kronecker product of X and Y, respectivey. We denote the identity matrix by I, whose dimension shoud be cear from the context. The determinant and the minima eigenvaue of a rea symmetric matrix X are denoted by det(x) and λ min (X), respectivey. Given a matrix X R n n, diag(x) denotes the vector formed by the diagona of X, that is, diag(x) i = X ii for i = 1,..., n. Diag(X) is the diagona matrix which shares the same diagona as X. vec(x) is the vectorization of X. In addition, X > 0 means that a entries of X are positive. The rest of the paper is organized as foows. We introduce the fused mutipe graphica asso formuation in Section 2. The screening rue is presented in Section 3. The proposed second-order method is presented in Section 4. The experimenta resuts are shown in Section 5. We concude the paper in Section Fused mutipe graphica asso. Assume we are given K data sets, x (k) R nk p, k = 1,..., K with K 2, where n k is the number of sampes, and p is the number of features. The p features are common for a K data sets, and a K n k sampes are independent. Furthermore, the sampes within each data set x (k) are identicay distributed with a p-variate Gaussian distribution with zero mean and positive definite covariance matrix Σ (k), and there are many conditionay independent pairs of features, i.e., the precision matrix Θ (k) = (Σ (k) ) 1 shoud be sparse. For notationa simpicity, we assume that n 1 =... = n K = n. Denote the sampe covariance matrix for each data set x (k) as S (k) with S (k) = 1 n (x(k) ) T x (k), and Θ = (Θ (1),..., Θ (K) ). Then the negative og ikeihood for the data takes the form of (2.1) K ( ) og det(θ (k) ) + tr(s (k) Θ (k) ).

4 4 FUSED MULTIPLE GRAPHICAL LASSO Ceary, minimizing (2.1) eads to the maximum ikeihood estimate (MLE) ˆΘ (k) = (S (k) ) 1. However, the MLE fais when S (k) is singuar. Furthermore, the MLE is usuay dense. The 1 reguarization has been empoyed to induce sparsity, resuting in the sparse inverse covariance estimation [2, 10, 46]. In this paper, we empoy both the 1 reguarization and the fused reguarization for simutaneousy estimating mutipe graphs. The 1 reguarization eads to a sparse soution, and the fused penaty encourages Θ (k) to be simiar to its neighbors. Mathematicay, we sove the foowing formuation: (2.2) where min K Θ (k) 0,...K P (Θ) = λ 1 K i j ( ) og det(θ (k) ) + tr(s (k) Θ (k) ) + P (Θ), Θ (k) + λ 2 K 1 i j Θ (k) Θ (k+1), λ 1 > 0 and λ 2 > 0 are positive reguarization parameters. This mode is referred to as the fused mutipe graphica asso (FMGL). To ensure the existence of a soution for probem (2.2), we assume throughout this paper that diag(s (k) ) > 0, k = 1,..., K. Reca that S (k) is a sampe covariance matrix, and hence diag(s (k) ) 0. The diagona entries may be not, however, stricty positive. But we can aways add a sma perturbation (say 10 8 ) to ensure the above assumption hods. The foowing theorem shows that under this assumption the FMGL (2.2) has a unique soution. Theorem 2.1. Under the assumption that diag(s (k) ) > 0, k = 1,..., K, probem (2.2) has a unique optima soution. To prove Theorem 2.1, we first estabish a technica emma which regards the existence of a soution for a standard graphica asso probem. Lemma 2.2. Let S S p + and Λ S p be such that Diag(S) + Λ > 0 and diag(λ) 0. Consider the probem (2.3) min og det(x) + tr(sx) + Λ X. X 0 }{{} f(x) Then the foowing statements hod: (a) Probem (2.3) has a unique optima soution; (b) The sub-eve set L = {X 0 : f(x) α} is compact for any α f, where f is the optima vaue of (2.3). Proof. (a) Let U = {U S p : U [ 1, 1], i, j}. Consider the probem (2.4) max {og det(s + Λ U) : S + Λ U 0}. U U We first caim that the feasibe region of probem (2.4) is nonempty, or equivaenty, there exists Ū U such that λ min(s + Λ Ū) > 0. Indeed, one can observe that max λ min(s + Λ U) = max {t : Λ U + S ti 0}, U U t,u U

5 S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 5 (2.5) = min max {t + tr(x(λ U + S ti))}, X 0 t,u U = min X 0 tr(sx) + Λ X : tr(x) = 1, where the second equaity foows from the Lagrangian duaity since its associated Sater condition is satisfied. Let Ω := {X S p : tr(x) = 1, X 0}. By the assumption Diag(S) + Λ > 0, we see that Λ > 0 for a i j and S ii + Λ ii > 0 for every i. Since Ω S p +, we have tr(sx) 0 for a X Ω. If there exists some k such that X k > 0, then i j Λ X > 0 and hence, (2.6) tr(sx) + Λ X > 0, X Ω. Otherwise, one has X = 0 for a i j, which, together with the facts that S ii +Λ ii > 0 for a i and tr(x) = 1, impies that for a X Ω, tr(sx) + Λ X = i (S ii + Λ ii )X ii tr(x) min i (S ii + Λ ii ) > 0. Hence, (2.6) again hods. Combining (2.5) with (2.6), one can see that max λ min(s + U U Λ U) > 0. Therefore, probem (2.4) has at east a feasibe soution. We next show that probem (2.4) has an optima soution. Let Ū be a feasibe point of (2.4), and Ω := {U U : og det(s + Λ U) og det(s + Λ Ū), S + Λ U 0}. One can observe that {S + Λ U : U U} is compact. Using this fact, it is not hard to see that og det(s + Λ U) as U U and λ min (S + Λ U) 0. Thus there exists some δ > 0 such that which impies that Ω {U U : S + Λ U δi}, Ω = {U U : og det(s + Λ U) og det(s + Λ Ū), S + Λ U δi}. Hence, Ω is a compact set. In addition, one can observe that probem (2.4) is equivaent to max og det(s + Λ U). U Ω The atter probem ceary has an optima soution and so is probem (2.4). Finay we show that X = (S+Λ U ) 1 is the unique optima soution of (2.3), where U is an optima soution of (2.4). Since S + Λ U 0, we have X 0. By the definitions of U and X, and the first-order optimaity conditions of (2.4) at U, one can have U = 1 if X > 0; β [ 1, 1] if X = 0; 1 otherwise.

6 6 FUSED MULTIPLE GRAPHICAL LASSO It foows that Λ U ( Λ X ) at X = X, where ( ) stands for the subdifferentia of the associated convex function. For convenience, et f(x) denote the objective function of (2.3). Then we have (X ) 1 + S + Λ U f(x ), which, together with X = (S + Λ U ) 1, impies that 0 f(x ). Hence, X is an optima soution of (2.3) and moreover it is unique due to the strict convexity of og det( ). (b) By statement (a), probem (2.3) has a finite optima vaue f. Hence, the above sub-eve set L is nonempty. We can observe that for any X L, 1 Λ X = f(x) [ og det(x) + tr(sx) + 1 Λ X ], 2 2 }{{} f(x) (2.7) α f, where f := inf{f(x) : X 0}. By the assumption Diag(S) + Λ > 0, one has Diag(S) + Λ/2 > 0. This together with statement (a) yieds f R. Notice that Λ > 0 for a i j. This reation and (2.7) impy that X is bounded for a X L and i j. In addition, it is we-known that det(x) X 11 X 22 X pp for a X 0. Using this reation, the definition of f( ), and the boundedness of X for a X L and i j, we have that for every X L, og(x ii ) + (S ii + Λ ii )X ii f(x) (S X + Λ X ), i j (2.8) i α i j (S X + Λ X ) δ for some δ > 0. In addition, notice from the assumption that S ii + Λ ii > 0 for a i, and hence og(x ii ) + (S ii + Λ ii )X ii 1 + min og(s kk + Λ kk ) =: σ k for a i. This reation together with (2.8) impies that for every X L and a i, og(x ii ) + (S ii + Λ ii )X ii δ (p 1)σ, and hence X ii is bounded for a i and X L. We thus concude that L is bounded. In view of this resut and the definition of f, it is not hard to see that there exists some ν > 0 such that λ min (X) ν for a X L. Hence, one has L = {X νi : f(x) α}. By the continuity of f on {X : X νi}, it foows that L is cosed. Hence, L is compact. We are now ready to prove Theorem 2.1. Proof. Since λ 1 > 0 and diag(s (k) ) > 0, k = 1,..., K, it foows from Lemma 2.2 that there exists some δ such that for each k = 1,..., K, og det(θ (k) ) + tr(s (k) Θ (k) ) + λ 1 Θ (k) δ, Θ(k) 0. i j

7 S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 7 For convenience, et h(θ) denote the objective function of (2.2) and Θ = ( Θ (1),..., Θ (K) ) an arbitrary feasibe point of (2.2). Let Ω = { Θ = (Θ (1),..., Θ (K) ) : h(θ) h( Θ), Θ (k) 0, k = 1,..., K }, Ω k = {Θ (k) 0 : og det(θ (k) ) + tr(s (k) Θ (k) ) + λ 1 i j Θ(k) δ } for k = 1,..., K, where δ = h( Θ) (K 1)δ. Then it is not hard to observe that Ω Ω := Ω 1 Ω K. Moreover, probem (2.2) is equivaent to (2.9) min h(θ). Θ Ω In view of Lemma 2.2, we know that Ω k is compact for a k, which impies that Ω is aso compact. Notice that h is continuous and stricty convex on Ω. Hence, probem (2.9) has a unique optima soution and so is probem (2.2). 3. The screening rue for fused mutipe graphica asso. Due to the presence of the og determinant, it is chaenging to sove the formuations invoving the penaized og-ikeihood efficienty. The existing methods for singe graphica asso are not scaabe to the probems with a arge amount of features because of the high computationa compexity. Recent studies have shown that the graphica mode may contain many connected components, which are disjoint with each other, due to the sparsity of the graphica mode, i.e., the corresponding precision matrix has a bock diagona structure (subject to some rearrangement of features). To reduce the computationa compexity, it is advantageous to first identify the bock structure and then compute the diagona bocks of the precision matrix instead of the whoe matrix. Danaher et a. [7] deveoped a simiar necessary and sufficient condition for fused graphica asso with two graphs, thus the bock structure can be identified. However, it remains a chaenge to derive the necessary and sufficient condition for the soution of fused mutipe graphica asso to be bock diagona for K > 2 graphs. In this section, we first present a theorem demonstrating that FMGL can be decomposabe once its soution has a bock diagona structure. Then we derive a necessary and sufficient condition for the soution of FMGL to be bock diagona for arbitrary number of graphs. Let C 1,..., C L be a partition of the p features into L non-overapping sets, with C C =, and L =1 C = {1,..., p}. We say that the soution Θ of FMGL (2.2) is bock diagona with L known bocks consisting of features in the sets C, = 1,..., L if there exists a permutation matrix U R p p such that each estimation precision matrix takes the form of (3.1) Θ (k) = U Θ (k) 1... Θ (k) L UT, k = 1,..., K. For simpicity of presentation, we assume throughout this paper that U = I. The foowing decomposition resut for probem (2.2) is straightforward. Its proof is thus omitted. Theorem 3.1. Suppose that the soution Θ of FMGL (2.2) is bock diagona with L known C, = 1,..., L, i.e., each estimated precision matrix has the form (3.1)

8 8 FUSED MULTIPLE GRAPHICAL LASSO with U = I. Let Θ = ( (3.2) Θ = arg min Θ 0 (1) Θ,..., K ( (K) Θ ) for = 1,..., L. Then there hods: og det(θ (k) ) + tr(s (k) Θ (k) ) ) + P (Θ ), = 1,..., L, where Θ (k) and S (k) are the C C symmetric submatrices of Θ (k) and S (k) corresponding to the -th diagona bock, respectivey, for k = 1,..., K, and Θ = (Θ (1),..., Θ (K) ) for = 1,..., L. The above theorem demonstrates that if a arge-scae FMGL probem has a bock diagona soution, it can then be decomposed into a group of smaer sized FMGL probems. The computationa cost for the atter probems can be much cheaper. Now one natura question is how to efficienty identify the bock diagona structure of the FMGL soution before soving the probem. We address this question in the remaining part of this section. The foowing theorem provides a necessary and sufficient condition for the soution of FMGL to be bock diagona with L bocks C, = 1,..., L, which is a key for deveoping efficient decomposition scheme for soving FMGL. Since its proof requires some substantia deveopment of other technica resuts, we sha postpone the proof unti the end of this section. Theorem 3.2. The FMGL (2.2) has a bock diagona soution Θ (k), k = 1,..., K with L known bocks C, = 1,..., L if and ony if S (k), k = 1,..., K satisfy the foowing inequaities: t S(k) tλ 1 + λ 2, (3.3) t 1 k=0 S(r+k) tλ 1 + 2λ 2, 2 r K t, t S(K t+k) tλ 1 + λ 2, K S(k) Kλ 1 for t = 1,..., K 1, i C, j C,. One immediate consequence of Theorem 3.2 is that the conditions (3.3) can be used as a screening rue to identify the bock diagona structure of the FMGL soution. The steps about this rue are described as foows. 1. Construct an adjacency matrix E = I p p. Set E = E ji = 0 if S (k), k = 1,..., K satisfy the conditions (3.3). Otherwise, set E = E ji = Identify the connected components of the adjacency matrix E (for exampe, it can be done by caing the Matab function graphconncomp ). In view of Theorem 3.2, it is not hard to observe that the resuting connected components are the partition of the p features into nonoverapping sets. It then foows from Theorem 3.1 that a arge-scae FMGL probem can be decomposed into a group of smaer sized FMGL probems restricted to the features in each connected component. The computationa cost for the atter probems can be much cheaper. Therefore, this approach may enabe us to sove arge-scae FMGL probems very efficienty. In the remainder of this section we provide a proof for Theorem 3.2. proceeding, we estabish severa technica emmas as foows. Before

9 S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 9 Lemma 3.3. Given any two arbitrary index sets I {1,, n} and J {1,, n 1}, et Ī and J be the compement of I and J with respect to {1,, n} and {1,, n 1}, respectivey. Define (3.4) P I,J = { y R n : y I 0, y Ī 0, y J y J+1 0, y J y J+1 0 }, where J + 1 = {j + 1 : j J} and J + 1 = {j + 1 : j J}. Then, the foowing statements hod: (i) Either P I,J = {0} or P I,J is unbounded; (ii) 0 is the unique extreme point of P I,J ; (iii) Suppose that P I,J is unbounded. Then, = ext(p I,J ) Q, where ext(p I,J ) denotes the set of a extreme rays of P I,J, and (3.5) Q := {α(0,, 0, 1,, 1, 0,, 0) T R n : α 0, m 0, 1 n}. }{{}}{{} m Proof. (i) We observe that 0 P I,J. If P I,J {0}, then there exists 0 y P I,J. Hence, {αy : α 0} P I,J, which impies that P I,J is unbounded. (ii) It is easy to see that 0 P I,J and moreover there exist n ineary independent active inequaities at 0. Hence, 0 is an extreme point of P I,J. On the other hand, suppose y is an arbitrary extreme point of P I,J. Then there exist n ineary independent active inequaities at y, which together with the definition of P I,J immediatey impies y = 0. Therefore, 0 is the unique extreme point of P I,J. (iii) Suppose that P I,J is unbounded. By statement (ii), we know that P I,J has a unique extreme point. Using Minkowski s resoution theorem (e.g., see [3]), we concude that ext(p I,J ). Let d ext(p I,J ) be arbitrariy chosen. Then d 0. It foows from (3.4) that d satisfies the inequaities (3.6) d I 0, d Ī 0, d J d J+1 0, d J d J+1 0, and moreover, the number of independent active inequaities at d is n 1. If a entries of d are nonzero, then d must satisfy d J d J+1 = 0 and d J d J+1 = 0 (with a tota number n 1), which impies d 1 = d 2 = = d n and thus d Q. We now assume that d has at east one zero entry. Then, there exist positive integers k, {m i } k i=1 and {n i } k i=1 satisfying m i n i < m i+1 n i+1 for i = 1,..., k 1 such that (3.7) {i : d i = 0} = {m 1,, n 1 } {m 2,, n 2 } {m k,, n k }. One can immediatey observe that (3.8) d mi = = d ni = 0, d j d j+1 = 0, m i j n i 1, 1 i k. We next divide the rest of proof into four cases. Case (a): m 1 = 1 and n k = n. In view of (3.7), one can observe that d mi 1 d mi 0 and d ni 1 d ni for i = 2,..., k. We then see from (3.6) that except the active inequaities given in (3.8), a other possibe active inequaities at d are (3.9) d j d j+1 = 0, n i 1 < j < m i 1, 2 i k (with a tota number k i=2 (m i n i 1 2)). Notice that the tota number of independent active inequaities given in (3.8) is k i=1 (n i m i + 1). Hence, the number

10 10 FUSED MULTIPLE GRAPHICAL LASSO of independent active inequaities at d is at most k (n i m i + 1) + i=1 k (m i n i 1 2) = n k m 1 k + 2 = n k + 1. i=2 Reca that the number of independent active inequaities at d is n 1. Hence, we have n k + 1 n 1, which impies k 2. Due to d 0, we observe that k 1 hods for this case. Aso, we know that k > 0. Hence, k = 2. We then see that a possibe active inequaities described in (3.9) must be active at d, which together with k = 2 immediatey impies that d Q. Case (b): m 1 = 1 and n k < n. Using (3.7), we observe that d mi 1 d mi 0 for i = 2,..., k and d ni d ni +1 0 for i = 1,..., k. In view of these reations and a simiar argument as in case (a), one can see that the number of independent active inequaities at d is at most k (n i m i + 1) + i=1 k (m i n i 1 2) + n n k 1 = n m 1 k + 1 = n k. i=2 Simiary as in case (a), we can concude from the above reation that k = 1 and d Q. Case (c): m 1 > 1 and n k = n. By (3.7), one can observe that d mi 1 d mi 0 for i = 1,..., k and d ni d ni+1 0 for i = 1,..., k 1. Using these reations and a simiar argument as in case (a), we see that the number of independent active inequaities at d is at most m k (n i m i + 1) + i=1 k (m i n i 1 2) = n k k = n k. i=2 Simiary as in case (a), we can concude from the above reation that k = 1 and d Q. Case (d): m 1 > 1 and n k < n. From (3.7), one can observe that d mi 1 d mi 0 for i = 1,..., k and d ni d ni +1 0 for i = 1,..., k. By virtue of these reations and a simiar argument as in case (a), one can see that the number of independent active inequaities at d is at most m k (n i m i + 1) + i=1 k (m i n i 1 2) + n n k 1 = n k 1. i=2 Reca that k 1 and the number of independent active inequaities at d is n 1. Hence, this case cannot occur. Combining the above four cases, we concude that ext(p I,J ) Q. Lemma 3.4. Let P IJ and Q be defined in (3.4) and (3.5), respectivey. Then, {ext(p I,J ) : I {1,, n}, J {1,, n 1}} = Q. Proof. It foows from Lemma 3.3 (iii) that {ext(p I,J ) : I {1,, n}, J {1,, n 1}} Q.

11 We next show that S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 11 {ext(p I,J ) : I {1,, n}, J {1,, n 1}} Q. Indeed, et d Q be arbitrariy chosen. Then, there exist α 0 and positive integers m 1 and n 1 satisfying 1 m 1 n 1 such that d i = α for m 1 i n 1 and the rest of d i s are 0. If α > 0, it is not hard to see that d ext(p I,J ) with I = {1,, n} and J = {m 1,, n 1}. Simiary, if α < 0, d ext(p I,J ) with I = and J being the compement of J = {m1,, n 1}. Hence, d {ext(p I,J ) : I {1,, n}, J {1,, n 1}}. Lemma 3.5. Let x R n, λ 1, λ 2 0 be given, and et f(y) := x T y λ 1 n i=1 n 1 y i λ 2 y i y i+1. Then, f(y) 0 for a y R n if and ony if x satisfies the foowing inequaities: i=1 k j=1 x j kλ 1 + λ 2, k 1 j=0 x i+j kλ 1 + 2λ 2, 2 i n k, k j=1 x n k+j kλ 1 + λ 2, n j=1 x j nλ 1 for k = 1,..., n 1. Proof. Let P I,J be defined in (3.4) for any I {1,..., n} and J {1,..., n 1}. We observe that (a) R n = {P I,J : I {1,..., n}, J {1,..., n 1}}; (b) f(y) 0 for a y R n if and ony if f(y) 0 for a y P I,J, and every I {1,..., n} and J {1,..., n 1}; (c) f(y) is a inear function of y when restricted to the set P I,J for every I {1,..., n} and J {1,..., n 1}. If P I,J is bounded, we have P I,J = {0} and f(y) = 0 for y P I,J. Suppose that P I,J is unbounded. By Lemma 3.3 and Minkowski s resoution theorem, P I,J equas the finitey generated cone by ext(p I,J ). It then foows that f(y) 0 for a y P I,J if and ony if f(d) 0 for a d ext(p I,J ). Using these facts and Lemma 3.4, we see that f(y) 0 for a y R n if and ony if f(d) 0 for a d Q, where Q is defined in (3.5). By the definitions of Q and f, we further observe that f(y) 0 for a y R n if and ony if f(d) 0 for a d ±(0, } {{, 0, 1,, 1, 0,, 0) T R n : m 0, 1 n }}{{}, m which together with the definition of f immediatey impies that the concusion of this emma hods.

12 12 FUSED MULTIPLE GRAPHICAL LASSO (3.10) Lemma 3.6. Let x R n, λ 1, λ 2 0 be given. The inear system x 1 + λ 1 γ 1 + λ 2 v 1 = 0, x i + λ 1 γ i + λ 2 (v i v i 1 ) = 0, 2 i n 1, x n + λ 1 γ n λ 2 v n 1 = 0, 1 γ i 1, i = 1,..., n, 1 v i 1, i = 1,..., n 1 has a soution (γ, v) if and ony if (x, λ 1, λ 2 ) satisfies the foowing inequaities: k j=1 x j kλ 1 + λ 2, k 1 j=0 x i+j kλ 1 + 2λ 2, 2 i n k, k j=1 x n k+j kλ 1 + λ 2, n j=1 x j nλ 1 for k = 1,..., n 1. Proof. The inear system (3.10) has a soution if and ony if the inear programming (3.11) min γ,v {0T γ + 0 T v : (γ, v) satisfies (3.10)} has an optima soution. The Lagrangian dua of (3.11) is max min y γ,v { which is equivaent to x T y + λ 1 n i=1 } n 1 y i γ i + λ 2 (y i y i+1 )v i : 1 γ, v 1, i=1 (3.12) max f(y) := x T y λ 1 y n i=1 n 1 y i λ 2 y i y i+1. i=1 By the Lagrangian duaity theory, probem (3.11) has an optima soution if and ony if its dua probem (3.12) has optima vaue 0, which is equivaent to f(y) 0 for a y R n. The concusion of this emma then immediatey foows from Lemma 3.5. We are now ready to prove Theorem 3.2. Proof. For the sake of convenience, we denote the inverse of Θ (k) as Ŵ(k) for k = 1,..., K. By the first-order optimaity conditions, we observe that Θ (k) 0, k = 1,..., K is the optima soution of probem (2.2) if and ony if it satisfies (3.13) (3.14) (3.15) (3.16) Ŵ(k) ii + S (k) ii = 0, 1 k K, Ŵ(1) + S (1) + λ 1 γ (1) + λ 2 υ (1,2) = 0, Ŵ(k) Ŵ(K) + S (k) + S (K) + λ 1 γ (k) + λ 1 γ (K) + λ 2 ( υ (k 1,k) λ 2 υ (K 1,K) = 0 + υ (k,k+1) ) = 0, 2 k K 1,

13 for a i, j = 1,..., p, i j, where γ (k) υ (k,k+1) ( Θ (k) S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 13 is a subgradient of Θ (k), Θ (k+1) and υ (k,k+1) ), that is, υ (k,k+1) [ 1, 1] if = Θ (k) Θ (k+1) is a subgradient of Θ (k) at Θ(k) = Θ (k) = 1 if (k+1) Θ. with respect to Θ (k) (k+1) > Θ, υ (k,k+1) Θ (k) ; and at (Θ (k), Θ(k+1) ) = (k) (k+1) = 1 if Θ < Θ, Necessity: Suppose that Θ (k), k = 1,..., K is a bock diagona optima soution of probem (2.2) with L known bocks C, = 1,..., L. Note that Ŵ(k) has the same bock diagona structure as Θ (k). Hence, Ŵ(k) (k) = Θ = 0 for i C, j C,. This together with (3.14)-(3.16) impies that for each i C, j C,, there exist, v(k,k+1) ), k = 1,..., K 1 and γ (K) such that (γ (k) (3.17) S (1) + λ 1 γ (1) + λ 2 υ (1,2) = 0, S (k) S (K) + λ 1 γ (k) + λ 1 γ (K) + λ 2 ( υ (k 1,k) λ 2 υ (K 1,K) = 0, 1 γ (k) 1, 1 k K, 1 v (k,k+1) 1, 1 k K 1. + υ (k,k+1) ) = 0, 2 k K 1, Using (3.17) and Lemma 3.6, we see that (3.3) hods for t = 1,..., K 1, i C, j C,. Sufficiency: Suppose that (3.3) hods for t = 1,..., K 1, i C, j C,. It then foows from Lemma 3.6 that for each i C, j C,, there exist (γ (k), v(k,k+1) ), k = 1,..., K 1 and γ (K) such that (3.17) hods. Now et Θ (k), k = 1,..., K be a bock diagona matrix as defined in (3.1) with U = I, where Θ = (1) (K) ( Θ,..., Θ ) is given by (3.2) for = 1,..., L. Aso, et Ŵ(k) be the inverse of Θ (k) for k = 1,..., K. Since Θ is the optima soution of probem (3.2), the first-order optimaity conditions impy that (3.13)-(3.16) hod for a i, j C, i j, = 1,..., L. (k) Notice that Θ = Ŵ(k) = 0 for every i C, j C,. Using this fact and (3.17), we observe that (3.13)-(3.16) aso hod for a i C, j C,. It then foows that Θ (k), k = 1,..., K is an optima soution of probem (2.2). In addition, Θ (k), k = 1,..., K is bock diagona with L known bocks C, = 1,..., L. The concusion thus hods. 4. Second-order method. The screening rue proposed in Section 3 is capabe of partitioning a features into a group of smaer sized bocks. Accordingy, a argescae FMGL (2.2) can be decomposed into a number of smaer sized FMGL probems. For each bock, we need to compute its individua estimated precision matrix Θ (k) by soving the FMGL (2.2) with S (k) repaced by S (k). In this section, we discuss how to sove those singe bock FMGL probems efficienty. For simpicity of presentation, we assume throughout this section that the FMGL (2.2) has ony one bock, that is, L = 1. We now propose a second-order method to sove the FMGL (2.2). For simpicity of notation, we et Θ := (Θ (1),..., Θ (K) ) and use t to denote the Newton iteration index. Let Θ t = (Θ (1) t,..., Θ (K) t ) be the approximate soution obtained at the t-th Newton iteration.

14 14 FUSED MULTIPLE GRAPHICAL LASSO The optimization probem (2.2) can be rewritten as (4.1) where min F (Θ) := K f k (Θ (k) ) + P (Θ), Θ 0 f k (Θ (k) ) = og det(θ (k) ) + tr(s (k) Θ (k) ). In the second-order method, we approximate the objective function F (Θ) at the current iterate Θ t by a quadratic mode Q t (Θ): (4.2) min Θ Q t(θ) := K q k (Θ (k) ) + P (Θ), where q k is the quadratic approximation of f k at Θ (k) t, that is, q k (Θ (k) ) = 1 2 tr(w(k) t D (k) W (k) D (k) ) + tr((s (k) W (k) )D (k) ) + f k (Θ (k) t with W (k) t = (Θ (k) t ) 1 and D (k) = Θ (k) Θ (k) t. Suppose that Θ t+1 is the optima soution of (4.2). Then we obtain the Newton search direction t t ) (4.3) D = Θ t+1 Θ t. We sha mention that the subprobem (4.2) can be suitaby soved by the nonmonotone spectra projected gradient (NSPG) method (see, for exampe, [43, 27]). It was shown by Lu and Zhang [27] that the NSPG method is ocay ineary convergent. Numerous computationa studies have demonstrated that the NSPG method is very efficient though its goba convergence rate is so far unknown. When appied to (4.2), the NSPG method requires soving the proxima subprobems in the form of (4.4) min Θ 1 2 K Θ (k) G (k) 2 F + αp (Θ) for some G = (G (1),..., G (K) ) and α > 0. By the definition of P (Θ), it is not hard to see that probem (4.4) can be decomposed into a set of independent and smaer sized probems (4.5) 1 min Θ (k),,...,k 2 K (Θ (k) G (k) )2 + α 1 K K 1 Θ (k) + α 2 Θ (k) Θ (k+1) for a i j, j = 1,..., p, where (α 1, α 2 ) = α(λ 1, λ 2 ). The probem (4.5) is known as the fused asso signa approximator, which can be soved very efficienty and exacty [6, 24]. In addition, they are independent from each other and thus can be soved in parae. Given the current search direction D = (D (1),..., D (K) ) that is computed above, we need to find the suitabe step ength β (0, 1] to ensure a sufficient reduction in the objective function of (2.2) and positive definiteness of the next iterate Θ (k) Θ (k) t t+1 = + βd (k), k = 1,..., K. In the context of the standard (singe) graphica asso,

15 S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 15 Hsieh et a. [16] have shown that a step ength satisfying the above requirements aways exists. We can simiary prove that the desired step ength aso exists for the FMGL (2.2). Lemma 4.1. Let Θ t = (Θ (1) t,..., Θ (K) t ) be such that Θ (k) t 0 for k = 1,..., K, and et D = (D (1),..., D (K) ) be the associated Newton search direction computed according to (4.2). Suppose D 0. 1 Then there exists a β > 0 such that Θ (k) t + βd (k) 0 and the sufficient reduction condition (4.6) F (Θ t + βd) F (Θ t ) + σβδ hods for a 0 < β < β, where σ (0, 1) is a given constant and δ = K tr((s (k) W (k) t )D (k) ) + P (Θ t + D) P (Θ t ). Proof. Let β = 1/ max{ (Θ (k) t ) 1 D (k) 2 : k = 1,..., K}, where 2 denotes the spectra norm of a matrix. Since D 0 and Θ (k) 0, k = 1,..., K, we see that β > 0. Moreover, we have for a 0 < β < β and k = 1,..., K, ( ) Θ (k) t + βd (k) (Θ (k) t ) 1 2 = I + β(θ (k) t (Θ (k) t ) 1 2 By the definition of D and (4.2), one can easiy show that δ K t ) 1 2 D (k) (Θ (k) t ) 1 2 (1 β (Θ (k) t ) 1 D (k) 2 )I 0. tr(w (k) t D (k) W (k) t D (k) ), which together with the fact that W (k) t 0, k = 1,..., K and D 0 impies that δ < 0. Using differentiabiity of f k, convexity of P, and the definition of δ, we obtain that for a sufficienty sma β > 0, F (Θ t + βd) F (Θ t ) = K (f k(θ (k) t + βd (k) ) f k (Θ (k) t )) + P (Θ t + βd) P (Θ t ), = K tr((s(k) W (k) t )D (k) )β + o(β) + P (β(θ t + D) + (1 β)θ t ) P (Θ t ), K tr((s(k) W (k) t )D (k) )β + o(β) + βp (Θ t + D) + (1 β)p (Θ t ) P (Θ t ), βδ + o(β). This inequaity together with δ < 0 and σ (0, 1) impies that there exists ˆβ > 0 such that for a β (0, ˆβ), F (Θ t + βd) F (Θ t ) σβδ. It then foows that the concusion of this emma hods for β = min{ β, ˆβ}. By virtue of Lemma 4.1, we can adopt the we-known Armo s backtracking ine search rue [38] to seect a step ength β (0, 1] so that Θ (k) t + βd (k) 0 and (4.6) hods. In particuar, we choose β to be the argest number of the sequence {1, 1/2,..., 1/2 i,...} that satisfies these requirements. We can use the Choesky factorization to check the positive definiteness of Θ (k) t + βd (k), k = 1,..., K [16]. In addition, the associated terms og det(θ (k) t + βd (k) ) and (Θ (k) t + βd (k) ) 1 can be efficienty computed as a byproduct of the Choesky decomposition of Θ (k) t + βd (k). 1 It is we known that if D = 0, Θ t is the optima soution of probem (2.2).

16 16 FUSED MULTIPLE GRAPHICAL LASSO 4.1. Shrinking scheme. Given the arge number of unknown variabes in (4.2), it is advantageous to minimize (4.2) in a reduced space. The issue now is how to identify the reduced space. In the case of a singe graph (K = 1), probem (4.2) degenerates to a asso probem of size p 2. Hsieh et a. [16] proposed a strategy to determine a subset of variabes that are aowed to be updated in each Newton iteration for singe graphica asso. Specificay, the p 2 variabes in singe graphica asso are partitioned into two sets, J free and J fixed, based on the gradient at the start of each Newton iteration, and then the minimization is ony performed on the variabes in J free. We ca this technique shrinking in this paper. Due to the sparsity of the precision matrix, the size of J free is usuay much smaer than p 2. Moreover, it has been shown in the singe graph case that the size of J free wi decrease quicky [16]. The shrinking technique can thus improve the computationa efficiency. This technique was aso successfuy used in [18, 33, 45]. We show that shrinking can be extended to the fused mutipe graphica asso based on the resuts estabished in Section 3. G (k) t Denote the gradient of f k at t-th iteration by eement by. Then we have the foowing resut. G (k) t, = S (k) W (k) t Lemma 4.2. For Θ t in the t-th iteration, define the fixed set J fixed as, and its (i, j)-th J fixed = {(i, j) Θ (1) t, =... = Θ(K) (1) (K) t, = 0 and G t,,..., G t, satisfy the inequaities beow}. (4.7) u (k) G t, < uλ 1 + λ 2, u 1 (r+k) k=0 G t, < uλ 1 + 2λ 2, 2 r K u, u (K u+k) G t, < uλ 1 + λ 2, K G (k) t, < Kλ 1 for u = 1,..., K 1. Then, the soution of the foowing optimization probem is D (1) =... = D (K) = 0 : (4.8) min Q t(θ t + D) such that D (1) =... = D (K) = 0, (i, j) / J fixed. D (4.9) Proof. Consider probem (4.8), which can be reformuated to where H (k) t ( 1 K min D 2 vec(d(k) ) T H (k) t vec(d (k) ) + vec( +P (Θ t + D), s.t. D (1) = W (k) t =... = D (K) = 0, (i, j) / J fixed, W (k) t. Because of the constraint D (1) ) (k) G t ) T vec(d (k) ) =... = D (K) = 0, (i, j) / J fixed, we ony consider the variabes in the set J fixed. According to Lemma 3.6, it is easy to see that D Jfixed = 0 satisfies the optimaity condition of the foowing probem min D Jfixed K vec( G (k) t,j fixed ) T vec(d (k) J fixed ) + P (D Jfixed ).

17 S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 17 Since K vec(d(k) ) T H (k) t vec(d (k) ) 0, the optima soution of (4.8) is given by D (1) =... = D (K) = 0. Lemma 4.2 provides a shrinking scheme to partition the variabes into the free set J free and the fixed set J fixed. With shrinking, each Newton step of the proposed second-order method fas into a bock coordinate gradient descent framework [38]. Lemma 4.2 shows that when the variabes in the free set J free are fixed, no update is needed for the variabes in the fixed set J fixed. Minimization of (4.2) restricted to the free set can therefore guarantee the convergence to the unique optima soution [16, 38]. In addition, it has been shown that oca quadratic convergence rate can be achieved when the exact Hessian is used (see, for exampe, [16, 21]). The resuting second-order method for soving the fused mutipe graphica asso is summarized in Agorithm 1. Agorithm 1: Proposed second-order method for Fused Mutipe Graphica Lasso (FMGL) Input: S (k), k = 1,..., K, λ 1, λ 2 Output: Θ (k), k = 1,..., K Initiaization: Θ (k) 0 = (diag(s (k) )) 1 ; whie Not Converged do Determine the sets of free and fixed indices J free and J fixed using Lemma 4.2. Compute the Newton direction D (k), k = 1,..., K by soving (4.2) and (4.3) over the free variabes J free. Choose Θ (k) t+1 by performing the Armo backtracking ine search aong Θ (k) t + βd (k) for k = 1,..., K. end return Θ (k), k = 1,..., K; 5. Experimenta resuts. In this section, we evauate the proposed agorithm and screening rue on synthetic datasets and two rea datasets: ADHD and FDG- PET images 3. The experiments are performed on a PC with quad-core Inte 2.67GHz CPU and 9GB memory Simuation Efficiency. We conduct experiments to demonstrate the effectiveness of the proposed screening rue and the efficiency of our method FMGL. The foowing agorithms are incuded in our comparisons: FMGL: the proposed second-order method in Agorithm 1. ADMM: ADMM method. FMGL-S: FMGL with screening. ADMM-S: ADMM with screening. Both FMGL and ADMM are written in Matab, and they are avaiabe onine 4. Since both methods invove soving (4.4) which invoves a doube oop, we impement the sub-routine for soving (4.4) in C for a fair comparison

18 18 FUSED MULTIPLE GRAPHICAL LASSO The synthetic covariance matrices are generated as foows. We first generate K bock diagona ground truth precision matrices Θ (k) with L bocks, and each bock Θ (k) is of size (p/l) (p/l). Each Θ (k), = 1,..., L, k = 1,..., K has random sparsity structures. We contro the number of nonzeros in each Θ (k) to be about 10p/L so that the tota number of nonzeros in the K precision matrices is 10Kp. Given the precision matrices, we draw 5p sampes from each Gaussian distribution to compute the sampe covariance matrices. The fused penaty parameter λ 2 is fixed to 0.1, and the 1 reguarization parameter λ 1 is seected so that the tota number of nonzeros in the soution is about 10Kp. We terminate the NSPG in the FMGL when the reative error max{ Θ(k) r Θ(k) r 1 } 1e-6. The FMGL is terminated when max{ Θ (k) r 1 } the reative error of the objective vaue is smaer than 1e-5, and ADMM stops unti it achieves an objective vaue equa to or smaer than that of FMGL. The resuts presented in Tabe 5.1 show that FMGL is consistenty faster than ADMM. FMGL converges much more quicky than ADMM. Moreover, the screening rue can achieve great computationa gain. The speedup with the screening rue is about 10 and 20 times for L = 5 and 10 respectivey. Tabe 5.1 Comparison of the proposed FMGL and ADMM with and without screening in terms of average computationa time (seconds). FMGL-S and ADMM-S are FMGL and ADMM with screening respectivey. p stands for the dimension, K is the number of graphs, L is the number of bocks, and λ 1 is the 1 reguarization parameter. The fused penaty parameter λ 2 is fixed to 0.1. Θ 0 represents the tota number of nonzero entries in ground truth precision matrices Θ (k), k = 1,..., K, and Θ 0 is the number of nonzeros in the soution. Data and parameter setting Computationa time (iteration numbers) p K L Θ 0 λ 1 Θ 0 FMGL-S FMGL ADMM-S ADMM (6) (152) (6) (140) (6) (146) (6) (150) (6) (152) (6) (155) (7) (155) (6) (155) (7) (153) (6) (172) (6) (157) (6) (168) Stabiity. We conduct experiments to demonstrate the effectiveness of FMGL. The synthetic sparse precision matrices are generated in the foowing way: we set the first precision matrix Θ (1) as 0.25I p p, where p = 100. When adding an edge (i, j) in the graph, we add σ to θ (1) ii and θ (1) jj, and subtract σ from θ(1) and θ (1) ji to keep the positive definiteness of Θ (1), where σ is uniformy drawn from [0.1, 0.3]. When deeting an edge (i, j) from the graph, we reverse the above steps with σ = θ (1). We randomy assign 200 edges for Θ(1). Θ (2) is obtained by adding 25 edges and deeting 25 different edges from Θ (1). Θ (3) is obtained from Θ (2) in the same way. For each precision matrix, we randomy draw n sampes from the Gaussian distribution with the corresponding precision matrix, where n varies from 40 to 200 with a step of 20. We perform 500 repications for each n. For each n,

19 S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 19 λ 2 is fixed to 0.08, and λ 1 is adjusted to make sure that the edge number is about 200. The accuracy n d /n g is used to measure the performance of FMGL and GLasso, where n d is the number of true edges detected by FGML and GLasso, and n g is the number of true edges. The resuts are shown in Figure 5.1. We can see from the figure that FMGL achieves higher accuracies, demonstrating the effectiveness of FMGL for earning mutipe graphica modes simutaneousy GLasso FMGL Θ (1) GLasso FMGL Θ (2) GLasso FMGL Θ (3) Accuracy Accuracy Accuracy Sampe Size Sampe Size Sampe Size Fig Comparison of FMGL and GLasso in detecting true edges. Sampe size varies from 40 to 200 with a step of Rea data ADHD-200. Attention Deficit Hyperactivity Disorder (ADHD) affects at east 5-10% of schoo-age chidren with annua costs exceeding 36 biion/year in the United States. The ADHD-200 project has reeased resting-state functiona magnetic resonance images (fmri) of 491 typicay deveoping chidren and 285 ADHD chidren, aiming to encourage the research on ADHD. The data used in this experiment is preprocessed using the NIAK pipeine, and downoaded from neurobureau 5. More detais about the preprocessing strategy can be found in the same website. The dataset we choose incudes 116 typicay deveoping chidren (TDC), 29 ADHD-Combined (ADHD-C), and 49 ADHD-Inattentive (ADHD-I). There are 231 time series and 2834 brain regions for each subject. We want to estimate the graphs of the three groups simutaneousy. The sampe covariance matrix is computed using a data from the same group. Since the number of brain regions p is 2834, obtaining the precision matrices is computationay intensive. We use this data to test the effectiveness of the proposed screening rue. λ 1 and λ 2 are set to 0.6 and The convergence criterion is 1e-5. The comparison of FMGL and ADMM in terms of the objective vaue curve is shown in Figure 5.2. The resut shows that FMGL converges much faster than ADMM. The computationa times of FMGL and ADMM are and seconds respectivey. However, utiizing the screening, the computationa times of FMGL-S and ADMM-S are and seconds respectivey, demonstrating the superiority of the screening rue. The obtained soution has 1443 bocks. The argest one incuding 634 nodes is shown in Figure 5.3. The bock structures of the FMGL soution are the same as those identified by the screening rue. The screening rue can be used to anayze the rough structures of the graphs. The cost of identifying bocks using the screening rue is negigibe compared to that of estimating the graphs. For high-dimensiona data such as ADHD-200, it 5

20 20 FUSED MULTIPLE GRAPHICAL LASSO 10 0 FMGL ADMM reative error (og scae) time (seconds) Fig Comparison of FMGL and ADMM in terms of objective vaue curve on the ADHD- 200 dataset. The dimension p is 2834, and the number of graphs K is 3. Fig A subgraph of ADHD-200 identified by FMGL with the proposed screening rue. The grey edges are common edges among the three graphs; the red, green, and bue edges are the specific edges for TDC, ADHD-I, and ADHD-C respectivey. is practica to use the screening rue to identify the bock structure before estimating the arge graphs. We use the screening rue to identify bock structures on ADHD-200 data with varying λ 1 and λ 2. The size distribution is shown in Figure 5.4. We can observe that the number of bocks increases, and the size of bocks deceases when the

21 S. YANG, Z. LU, X. SHEN, P. WONKA, AND J. YE 21 og 10 (Size of Bock) >500 og 10 (Size of Bock) > λ 1 (a) λ 2 (b) Fig The size distribution of bocks (in the ogarithmic scae) identified by the proposed screening rue. The coor represents the number of bocks of a specified size. (a): λ 1 varies from 0.5 to 0.95 with λ 2 fixed to (b): λ 2 varies from 0 to 0.2 with λ 1 fixed to reguarization parameter vaue increases FDG-PET. In this experiment, we use FDG-PET images from 74 Azheimer s disease (AD), 172 mid cognitive impairment (MCI), and 81 norma contro (NC) subjects downoaded from the Azheimer s disease neuroimaging initiative (ADNI) database. The different regions of the whoe brain voume can be represented by 116 anatomica voumes of interest (AVOI), defined by Automated Anatomica Labeing (AAL) [39]. Then we extracted data from each of the 116 AVOIs, and derived the average of each AVOI for each subject. The 116 AVOIs can be categorized into 10 groups: prefronta obe, other parts of the fronta obe, parieta obe, occipita obe, thaamus, insua, tempora obe, corpus striatum, cerebeum, and vermis. More detais about the categories can be found in [39, 41]. We remove two sma groups (thaamus and insua) containing ony 4 AVOIs in our experiments. To examine whether FMGL can effectivey utiize the information of common structures, we randomy seect g percent sampes from each group, where g varies from 20 to 100 with a step size of 10. For each g, λ 2 is fixed to 0.1, and λ 1 is adjusted to make sure the number of edges in each group is about the same. We perform 500 repications for each g. The edges with probabiity arger than 0.85 are considered as stabe edges. The resuts showing the numbers of stabe edges are summarized in Figure 5.5. We can observe that FMGL is more stabe than GLasso. When the sampe size is too sma (say 20%), there are ony 20 stabe edges in the graph of NC obtained by GLasso. But the graph of NC obtained by FMGL sti has about 140 stabe edges, iustrating the superiority of FMGL in stabiity. The brain connectivity modes obtained by FMGL are shown in Figure 5.6. We can see that the number of connections within the prefronta obe significanty increases, and the number of connections within the tempora obe significanty decreases from NC to AD, which are supported by previous iteratures [1, 14]. The connections between the prefronta and occipita obes increase from NC to AD, and connections within cerebeum decrease. We can aso find that the adjacent graphs are simiar, indicating that FMGL can identify the common structures, but aso keep

Moreau-Yosida Regularization for Grouped Tree Structure Learning

Moreau-Yosida Regularization for Grouped Tree Structure Learning Moreau-Yosida Reguarization for Grouped Tree Structure Learning Jun Liu Computer Science and Engineering Arizona State University J.Liu@asu.edu Jieping Ye Computer Science and Engineering Arizona State

More information

CS229 Lecture notes. Andrew Ng

CS229 Lecture notes. Andrew Ng CS229 Lecture notes Andrew Ng Part IX The EM agorithm In the previous set of notes, we taked about the EM agorithm as appied to fitting a mixture of Gaussians. In this set of notes, we give a broader view

More information

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network An Agorithm for Pruning Redundant Modues in Min-Max Moduar Network Hui-Cheng Lian and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University 1954 Hua Shan Rd., Shanghai

More information

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia

More information

A. Distribution of the test statistic

A. Distribution of the test statistic A. Distribution of the test statistic In the sequentia test, we first compute the test statistic from a mini-batch of size m. If a decision cannot be made with this statistic, we keep increasing the mini-batch

More information

Primal and dual active-set methods for convex quadratic programming

Primal and dual active-set methods for convex quadratic programming Math. Program., Ser. A 216) 159:469 58 DOI 1.17/s117-15-966-2 FULL LENGTH PAPER Prima and dua active-set methods for convex quadratic programming Anders Forsgren 1 Phiip E. Gi 2 Eizabeth Wong 2 Received:

More information

Convergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems

Convergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems Convergence Property of the Iri-Imai Agorithm for Some Smooth Convex Programming Probems S. Zhang Communicated by Z.Q. Luo Assistant Professor, Department of Econometrics, University of Groningen, Groningen,

More information

Source and Relay Matrices Optimization for Multiuser Multi-Hop MIMO Relay Systems

Source and Relay Matrices Optimization for Multiuser Multi-Hop MIMO Relay Systems Source and Reay Matrices Optimization for Mutiuser Muti-Hop MIMO Reay Systems Yue Rong Department of Eectrica and Computer Engineering, Curtin University, Bentey, WA 6102, Austraia Abstract In this paper,

More information

Asynchronous Control for Coupled Markov Decision Systems

Asynchronous Control for Coupled Markov Decision Systems INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Schoo of Computer Science Probabiistic Graphica Modes Gaussian graphica modes and Ising modes: modeing networks Eric Xing Lecture 0, February 0, 07 Reading: See cass website Eric Xing @ CMU, 005-07 Network

More information

A Brief Introduction to Markov Chains and Hidden Markov Models

A Brief Introduction to Markov Chains and Hidden Markov Models A Brief Introduction to Markov Chains and Hidden Markov Modes Aen B MacKenzie Notes for December 1, 3, &8, 2015 Discrete-Time Markov Chains You may reca that when we first introduced random processes,

More information

Problem set 6 The Perron Frobenius theorem.

Problem set 6 The Perron Frobenius theorem. Probem set 6 The Perron Frobenius theorem. Math 22a4 Oct 2 204, Due Oct.28 In a future probem set I want to discuss some criteria which aow us to concude that that the ground state of a sef-adjoint operator

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence

More information

Statistical Learning Theory: A Primer

Statistical Learning Theory: A Primer Internationa Journa of Computer Vision 38(), 9 3, 2000 c 2000 uwer Academic Pubishers. Manufactured in The Netherands. Statistica Learning Theory: A Primer THEODOROS EVGENIOU, MASSIMILIANO PONTIL AND TOMASO

More information

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After

More information

(This is a sample cover image for this issue. The actual cover is not yet available at this time.)

(This is a sample cover image for this issue. The actual cover is not yet available at this time.) (This is a sampe cover image for this issue The actua cover is not yet avaiabe at this time) This artice appeared in a journa pubished by Esevier The attached copy is furnished to the author for interna

More information

THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES

THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES by Michae Neumann Department of Mathematics, University of Connecticut, Storrs, CT 06269 3009 and Ronad J. Stern Department of Mathematics, Concordia

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso Journa of Machine Learning Research 13 (2012) 723-736 Submitted 8/11; Revised 1/12; Pubished 3/12 Exact Covariance Threshoding into Connected Components for Large-Scae Graphica Lasso Rahu Mazumder Trevor

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,

More information

Lecture Note 3: Stationary Iterative Methods

Lecture Note 3: Stationary Iterative Methods MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW Abstract. One of the most efficient methods for determining the equiibria of a continuous parameterized

More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction Akaike Information Criterion for ANOVA Mode with a Simpe Order Restriction Yu Inatsu * Department of Mathematics, Graduate Schoo of Science, Hiroshima University ABSTRACT In this paper, we consider Akaike

More information

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

Distributed Optimization With Local Domains: Applications in MPC and Network Flows

Distributed Optimization With Local Domains: Applications in MPC and Network Flows Distributed Optimization With Loca Domains: Appications in MPC and Network Fows João F. C. Mota, João M. F. Xavier, Pedro M. Q. Aguiar, and Markus Püsche Abstract In this paper we consider a network with

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

Approximation. Inderjit S. Dhillon Dept of Computer Science UT Austin. SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina.

Approximation. Inderjit S. Dhillon Dept of Computer Science UT Austin. SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina. Using Quadratic Approximation Inderjit S. Dhillon Dept of Computer Science UT Austin SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina Sept 12, 2012 Joint work with C. Hsieh, M. Sustik and

More information

Cryptanalysis of PKP: A New Approach

Cryptanalysis of PKP: A New Approach Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in

More information

arxiv: v2 [stat.ml] 19 Oct 2016

arxiv: v2 [stat.ml] 19 Oct 2016 Sparse Quadratic Discriminant Anaysis and Community Bayes arxiv:1407.4543v2 [stat.ml] 19 Oct 2016 Ya Le Department of Statistics Stanford University ye@stanford.edu Abstract Trevor Hastie Department of

More information

FRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA)

FRST Multivariate Statistics. Multivariate Discriminant Analysis (MDA) 1 FRST 531 -- Mutivariate Statistics Mutivariate Discriminant Anaysis (MDA) Purpose: 1. To predict which group (Y) an observation beongs to based on the characteristics of p predictor (X) variabes, using

More information

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating

More information

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7

6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7 6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17 Soution 7 Probem 1: Generating Random Variabes Each part of this probem requires impementation in MATLAB. For the

More information

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION SAHAR KARIMI AND STEPHEN VAVASIS Abstract. In this paper we present a variant of the conjugate gradient (CG) agorithm in which we invoke a subspace minimization

More information

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks ower Contro and Transmission Scheduing for Network Utiity Maximization in Wireess Networks Min Cao, Vivek Raghunathan, Stephen Hany, Vinod Sharma and. R. Kumar Abstract We consider a joint power contro

More information

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization Scaabe Spectrum ocation for Large Networks ased on Sparse Optimization innan Zhuang Modem R&D Lab Samsung Semiconductor, Inc. San Diego, C Dongning Guo, Ermin Wei, and Michae L. Honig Department of Eectrica

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION

NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION Hsiao-Chang Chen Dept. of Systems Engineering University of Pennsyvania Phiadephia, PA 904-635, U.S.A. Chun-Hung Chen

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SIAM J. NUMER. ANAL. Vo. 0, No. 0, pp. 000 000 c 200X Society for Industria and Appied Mathematics VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW

More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

The Group Structure on a Smooth Tropical Cubic

The Group Structure on a Smooth Tropical Cubic The Group Structure on a Smooth Tropica Cubic Ethan Lake Apri 20, 2015 Abstract Just as in in cassica agebraic geometry, it is possibe to define a group aw on a smooth tropica cubic curve. In this note,

More information

Statistical Inference, Econometric Analysis and Matrix Algebra

Statistical Inference, Econometric Analysis and Matrix Algebra Statistica Inference, Econometric Anaysis and Matrix Agebra Bernhard Schipp Water Krämer Editors Statistica Inference, Econometric Anaysis and Matrix Agebra Festschrift in Honour of Götz Trenker Physica-Verag

More information

arxiv: v1 [cs.lg] 31 Oct 2017

arxiv: v1 [cs.lg] 31 Oct 2017 ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT

More information

Rate-Distortion Theory of Finite Point Processes

Rate-Distortion Theory of Finite Point Processes Rate-Distortion Theory of Finite Point Processes Günther Koiander, Dominic Schuhmacher, and Franz Hawatsch, Feow, IEEE Abstract We study the compression of data in the case where the usefu information

More information

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems Componentwise Determination of the Interva Hu Soution for Linear Interva Parameter Systems L. V. Koev Dept. of Theoretica Eectrotechnics, Facuty of Automatics, Technica University of Sofia, 1000 Sofia,

More information

Statistical Learning Theory: a Primer

Statistical Learning Theory: a Primer ??,??, 1 6 (??) c?? Kuwer Academic Pubishers, Boston. Manufactured in The Netherands. Statistica Learning Theory: a Primer THEODOROS EVGENIOU AND MASSIMILIANO PONTIL Center for Bioogica and Computationa

More information

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm 1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete

More information

A BUNDLE METHOD FOR A CLASS OF BILEVEL NONSMOOTH CONVEX MINIMIZATION PROBLEMS

A BUNDLE METHOD FOR A CLASS OF BILEVEL NONSMOOTH CONVEX MINIMIZATION PROBLEMS SIAM J. OPTIM. Vo. 18, No. 1, pp. 242 259 c 2007 Society for Industria and Appied Mathematics A BUNDLE METHOD FOR A CLASS OF BILEVEL NONSMOOTH CONVEX MINIMIZATION PROBLEMS MIKHAIL V. SOLODOV Abstract.

More information

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische

More information

A Statistical Framework for Real-time Event Detection in Power Systems

A Statistical Framework for Real-time Event Detection in Power Systems 1 A Statistica Framework for Rea-time Event Detection in Power Systems Noan Uhrich, Tim Christman, Phiip Swisher, and Xichen Jiang Abstract A quickest change detection (QCD) agorithm is appied to the probem

More information

Iterative Decoding Performance Bounds for LDPC Codes on Noisy Channels

Iterative Decoding Performance Bounds for LDPC Codes on Noisy Channels Iterative Decoding Performance Bounds for LDPC Codes on Noisy Channes arxiv:cs/060700v1 [cs.it] 6 Ju 006 Chun-Hao Hsu and Achieas Anastasopouos Eectrica Engineering and Computer Science Department University

More information

Learning Fully Observed Undirected Graphical Models

Learning Fully Observed Undirected Graphical Models Learning Fuy Observed Undirected Graphica Modes Sides Credit: Matt Gormey (2016) Kayhan Batmangheich 1 Machine Learning The data inspires the structures we want to predict Inference finds {best structure,

More information

Nonlinear Gaussian Filtering via Radial Basis Function Approximation

Nonlinear Gaussian Filtering via Radial Basis Function Approximation 51st IEEE Conference on Decision and Contro December 10-13 01 Maui Hawaii USA Noninear Gaussian Fitering via Radia Basis Function Approximation Huazhen Fang Jia Wang and Raymond A de Caafon Abstract This

More information

Incremental Reformulated Automatic Relevance Determination

Incremental Reformulated Automatic Relevance Determination IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 60, NO. 9, SEPTEMBER 22 4977 Incrementa Reformuated Automatic Reevance Determination Dmitriy Shutin, Sanjeev R. Kukarni, and H. Vincent Poor Abstract In this

More information

Investigation on spectrum of the adjacency matrix and Laplacian matrix of graph G l

Investigation on spectrum of the adjacency matrix and Laplacian matrix of graph G l Investigation on spectrum of the adjacency matrix and Lapacian matrix of graph G SHUHUA YIN Computer Science and Information Technoogy Coege Zhejiang Wani University Ningbo 3500 PEOPLE S REPUBLIC OF CHINA

More information

FRIEZE GROUPS IN R 2

FRIEZE GROUPS IN R 2 FRIEZE GROUPS IN R 2 MAXWELL STOLARSKI Abstract. Focusing on the Eucidean pane under the Pythagorean Metric, our goa is to cassify the frieze groups, discrete subgroups of the set of isometries of the

More information

XSAT of linear CNF formulas

XSAT of linear CNF formulas XSAT of inear CN formuas Bernd R. Schuh Dr. Bernd Schuh, D-50968 Kön, Germany; bernd.schuh@netcoogne.de eywords: compexity, XSAT, exact inear formua, -reguarity, -uniformity, NPcompeteness Abstract. Open

More information

Pricing Multiple Products with the Multinomial Logit and Nested Logit Models: Concavity and Implications

Pricing Multiple Products with the Multinomial Logit and Nested Logit Models: Concavity and Implications Pricing Mutipe Products with the Mutinomia Logit and Nested Logit Modes: Concavity and Impications Hongmin Li Woonghee Tim Huh WP Carey Schoo of Business Arizona State University Tempe Arizona 85287 USA

More information

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999

More information

arxiv: v1 [math.co] 17 Dec 2018

arxiv: v1 [math.co] 17 Dec 2018 On the Extrema Maximum Agreement Subtree Probem arxiv:1812.06951v1 [math.o] 17 Dec 2018 Aexey Markin Department of omputer Science, Iowa State University, USA amarkin@iastate.edu Abstract Given two phyogenetic

More information

12.2. Maxima and Minima. Introduction. Prerequisites. Learning Outcomes

12.2. Maxima and Minima. Introduction. Prerequisites. Learning Outcomes Maima and Minima 1. Introduction In this Section we anayse curves in the oca neighbourhood of a stationary point and, from this anaysis, deduce necessary conditions satisfied by oca maima and oca minima.

More information

Florida State University Libraries

Florida State University Libraries Forida State University Libraries Eectronic Theses, Treatises and Dissertations The Graduate Schoo Construction of Efficient Fractiona Factoria Mixed-Leve Designs Yong Guo Foow this and additiona works

More information

Maximizing Sum Rate and Minimizing MSE on Multiuser Downlink: Optimality, Fast Algorithms and Equivalence via Max-min SIR

Maximizing Sum Rate and Minimizing MSE on Multiuser Downlink: Optimality, Fast Algorithms and Equivalence via Max-min SIR 1 Maximizing Sum Rate and Minimizing MSE on Mutiuser Downink: Optimaity, Fast Agorithms and Equivaence via Max-min SIR Chee Wei Tan 1,2, Mung Chiang 2 and R. Srikant 3 1 Caifornia Institute of Technoogy,

More information

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION J. Korean Math. Soc. 46 2009, No. 2, pp. 281 294 ORHOGONAL MLI-WAVELES FROM MARIX FACORIZAION Hongying Xiao Abstract. Accuracy of the scaing function is very crucia in waveet theory, or correspondingy,

More information

Available online at ScienceDirect. IFAC PapersOnLine 50-1 (2017)

Available online at   ScienceDirect. IFAC PapersOnLine 50-1 (2017) Avaiabe onine at www.sciencedirect.com ScienceDirect IFAC PapersOnLine 50-1 (2017 3412 3417 Stabiization of discrete-time switched inear systems: Lyapunov-Metzer inequaities versus S-procedure characterizations

More information

A Comparison Study of the Test for Right Censored and Grouped Data

A Comparison Study of the Test for Right Censored and Grouped Data Communications for Statistica Appications and Methods 2015, Vo. 22, No. 4, 313 320 DOI: http://dx.doi.org/10.5351/csam.2015.22.4.313 Print ISSN 2287-7843 / Onine ISSN 2383-4757 A Comparison Study of the

More information

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS TONY ALLEN, EMILY GEBHARDT, AND ADAM KLUBALL 3 ADVISOR: DR. TIFFANY KOLBA 4 Abstract. The phenomenon of noise-induced stabiization occurs

More information

Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments

Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments Learning Structura Changes of Gaussian Graphica Modes in Controed Experiments Bai Zhang and Yue Wang Bradey Department of Eectrica and Computer Engineering Virginia Poytechnic Institute and State University

More information

SVM: Terminology 1(6) SVM: Terminology 2(6)

SVM: Terminology 1(6) SVM: Terminology 2(6) Andrew Kusiak Inteigent Systems Laboratory 39 Seamans Center he University of Iowa Iowa City, IA 54-57 SVM he maxima margin cassifier is simiar to the perceptron: It aso assumes that the data points are

More information

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations Goba Optimaity Principes for Poynomia Optimization Probems over Box or Bivaent Constraints by Separabe Poynomia Approximations V. Jeyakumar, G. Li and S. Srisatkunarajah Revised Version II: December 23,

More information

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA ON THE SYMMETRY OF THE POWER INE CHANNE T.C. Banwe, S. Gai {bct, sgai}@research.tecordia.com Tecordia Technoogies, Inc., 445 South Street, Morristown, NJ 07960, USA Abstract The indoor power ine network

More information

MONOCHROMATIC LOOSE PATHS IN MULTICOLORED k-uniform CLIQUES

MONOCHROMATIC LOOSE PATHS IN MULTICOLORED k-uniform CLIQUES MONOCHROMATIC LOOSE PATHS IN MULTICOLORED k-uniform CLIQUES ANDRZEJ DUDEK AND ANDRZEJ RUCIŃSKI Abstract. For positive integers k and, a k-uniform hypergraph is caed a oose path of ength, and denoted by

More information

Semidefinite relaxation and Branch-and-Bound Algorithm for LPECs

Semidefinite relaxation and Branch-and-Bound Algorithm for LPECs Semidefinite reaxation and Branch-and-Bound Agorithm for LPECs Marcia H. C. Fampa Universidade Federa do Rio de Janeiro Instituto de Matemática e COPPE. Caixa Posta 68530 Rio de Janeiro RJ 21941-590 Brasi

More information

Equilibrium of Heterogeneous Congestion Control Protocols

Equilibrium of Heterogeneous Congestion Control Protocols Equiibrium of Heterogeneous Congestion Contro Protocos Ao Tang Jiantao Wang Steven H. Low EAS Division, Caifornia Institute of Technoogy Mung Chiang EE Department, Princeton University Abstract When heterogeneous

More information

Emmanuel Abbe Colin Sandon

Emmanuel Abbe Colin Sandon Detection in the stochastic bock mode with mutipe custers: proof of the achievabiity conjectures, acycic BP, and the information-computation gap Emmanue Abbe Coin Sandon Abstract In a paper that initiated

More information

Paragraph Topic Classification

Paragraph Topic Classification Paragraph Topic Cassification Eugene Nho Graduate Schoo of Business Stanford University Stanford, CA 94305 enho@stanford.edu Edward Ng Department of Eectrica Engineering Stanford University Stanford, CA

More information

8 Digifl'.11 Cth:uits and devices

8 Digifl'.11 Cth:uits and devices 8 Digif'. Cth:uits and devices 8. Introduction In anaog eectronics, votage is a continuous variabe. This is usefu because most physica quantities we encounter are continuous: sound eves, ight intensity,

More information

Melodic contour estimation with B-spline models using a MDL criterion

Melodic contour estimation with B-spline models using a MDL criterion Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex

More information

II. PROBLEM. A. Description. For the space of audio signals

II. PROBLEM. A. Description. For the space of audio signals CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

HILBERT? What is HILBERT? Matlab Implementation of Adaptive 2D BEM. Dirk Praetorius. Features of HILBERT

HILBERT? What is HILBERT? Matlab Implementation of Adaptive 2D BEM. Dirk Praetorius. Features of HILBERT Söerhaus-Workshop 2009 October 16, 2009 What is HILBERT? HILBERT Matab Impementation of Adaptive 2D BEM joint work with M. Aurada, M. Ebner, S. Ferraz-Leite, P. Godenits, M. Karkuik, M. Mayr Hibert Is

More information

Multilayer Kerceptron

Multilayer Kerceptron Mutiayer Kerceptron Zotán Szabó, András Lőrincz Department of Information Systems, Facuty of Informatics Eötvös Loránd University Pázmány Péter sétány 1/C H-1117, Budapest, Hungary e-mai: szzoi@csetehu,

More information

arxiv: v2 [cs.lg] 4 Sep 2014

arxiv: v2 [cs.lg] 4 Sep 2014 Cassification with Sparse Overapping Groups Nikhi S. Rao Robert D. Nowak Department of Eectrica and Computer Engineering University of Wisconsin-Madison nrao2@wisc.edu nowak@ece.wisc.edu ariv:1402.4512v2

More information

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation

Robust Sensitivity Analysis for Linear Programming with Ellipsoidal Perturbation Robust Sensitivity Anaysis for Linear Programming with Eipsoida Perturbation Ruotian Gao and Wenxun Xing Department of Mathematica Sciences Tsinghua University, Beijing, China, 100084 September 27, 2017

More information

Fast Blind Recognition of Channel Codes

Fast Blind Recognition of Channel Codes Fast Bind Recognition of Channe Codes Reza Moosavi and Erik G. Larsson Linköping University Post Print N.B.: When citing this work, cite the origina artice. 213 IEEE. Persona use of this materia is permitted.

More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Haar Decomposition and Reconstruction Algorithms

Haar Decomposition and Reconstruction Algorithms Jim Lambers MAT 773 Fa Semester 018-19 Lecture 15 and 16 Notes These notes correspond to Sections 4.3 and 4.4 in the text. Haar Decomposition and Reconstruction Agorithms Decomposition Suppose we approximate

More information

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver

More information

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0 Bocconi University PhD in Economics - Microeconomics I Prof M Messner Probem Set 4 - Soution Probem : If an individua has an endowment instead of a monetary income his weath depends on price eves In particuar,

More information

Appendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model

Appendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model Appendix of the Paper The Roe of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Mode Caio Ameida cameida@fgv.br José Vicente jose.vaentim@bcb.gov.br June 008 1 Introduction In this

More information

On the Goal Value of a Boolean Function

On the Goal Value of a Boolean Function On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor

More information

arxiv: v1 [cs.db] 1 Aug 2012

arxiv: v1 [cs.db] 1 Aug 2012 Functiona Mechanism: Regression Anaysis under Differentia Privacy arxiv:208.029v [cs.db] Aug 202 Jun Zhang Zhenjie Zhang 2 Xiaokui Xiao Yin Yang 2 Marianne Winsett 2,3 ABSTRACT Schoo of Computer Engineering

More information

Analysis of Emerson s Multiple Model Interpolation Estimation Algorithms: The MIMO Case

Analysis of Emerson s Multiple Model Interpolation Estimation Algorithms: The MIMO Case Technica Report PC-04-00 Anaysis of Emerson s Mutipe Mode Interpoation Estimation Agorithms: The MIMO Case João P. Hespanha Dae E. Seborg University of Caifornia, Santa Barbara February 0, 004 Anaysis

More information

Integrating Factor Methods as Exponential Integrators

Integrating Factor Methods as Exponential Integrators Integrating Factor Methods as Exponentia Integrators Borisav V. Minchev Department of Mathematica Science, NTNU, 7491 Trondheim, Norway Borko.Minchev@ii.uib.no Abstract. Recenty a ot of effort has been

More information

Homogeneity properties of subadditive functions

Homogeneity properties of subadditive functions Annaes Mathematicae et Informaticae 32 2005 pp. 89 20. Homogeneity properties of subadditive functions Pá Burai and Árpád Száz Institute of Mathematics, University of Debrecen e-mai: buraip@math.kte.hu

More information

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING 8 APPENDIX 8.1 EMPIRICAL EVALUATION OF SAMPLING We wish to evauate the empirica accuracy of our samping technique on concrete exampes. We do this in two ways. First, we can sort the eements by probabiity

More information

An approximate method for solving the inverse scattering problem with fixed-energy data

An approximate method for solving the inverse scattering problem with fixed-energy data J. Inv. I-Posed Probems, Vo. 7, No. 6, pp. 561 571 (1999) c VSP 1999 An approximate method for soving the inverse scattering probem with fixed-energy data A. G. Ramm and W. Scheid Received May 12, 1999

More information